FeatureExtractor
Extract features from datetime columns.
Create new features by extracting datetime elements (day, month, year, etc.) from the provided columns. Columns of dtype datetime64 are used as is. Categorical columns that can be successfully converted to a datetime format (less than 30% NaT values after conversion) are also used.
This class can be accessed from atom through the feature_extraction method. Read more in the user guide.
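The 30% NaT rule for categorical columns can be pictured with a small sketch. This is not ATOM's internal code: the helper name `looks_like_datetime` and the constant `MAX_NAT_FRACTION` are hypothetical, and the only assumption taken from the description above is that a column qualifies when fewer than 30% of its values become NaT after conversion.

```python
import pandas as pd

# Threshold quoted in the class description (assumed to be a strict "<").
MAX_NAT_FRACTION = 0.3


def looks_like_datetime(column, fmt=None):
    """Return True if the column converts to datetime with few failures.

    Hypothetical helper: values that cannot be parsed become NaT, and the
    column is accepted when the NaT fraction stays below the threshold.
    """
    converted = pd.to_datetime(column, format=fmt, errors="coerce")
    return bool(converted.isna().mean() < MAX_NAT_FRACTION)


# One unparsable value out of four: 25% NaT, so the column is accepted.
mostly_dates = pd.Series(["01/02/2018", "15/03/2018", "31/12/2018", "not a date"])

# Three unparsable values out of four: 75% NaT, so the column is rejected.
mostly_text = pd.Series(["01/02/2018", "benign", "malignant", "unknown"])

accepted = looks_like_datetime(mostly_dates, fmt="%d/%m/%Y")
rejected = looks_like_datetime(mostly_text, fmt="%d/%m/%Y")
```

Passing `errors="coerce"` turns parsing failures into NaT rather than raising, which is what makes the fraction measurable.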
Warning
Decision-tree-based algorithms build their split rules on one feature at a time. As a result, they cannot correctly process cyclic features, since the sin/cos pair should be treated as a single coordinate system.
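To make the warning concrete, here is a minimal sketch of a sin/cos encoding using only the standard library. The formula (angle = 2π · value / period) is a common convention and an assumption here, not necessarily the exact one ATOM applies; `encode_cyclic` and `circle_distance` are hypothetical names.

```python
import math


def encode_cyclic(value, period):
    """Map a cyclic value (e.g. month 1..12) to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b, period):
    """Euclidean distance between the encoded points of two cyclic values."""
    sin_a, cos_a = encode_cyclic(a, period)
    sin_b, cos_b = encode_cyclic(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# As raw integers, December (12) and January (1) are 11 apart...
raw_gap = abs(12 - 1)

# ...but on the circle they are as close as any two neighbouring months.
dec_jan = circle_distance(12, 1, period=12)
jan_feb = circle_distance(1, 2, period=12)

# The sine alone cannot tell February from April; only the pair can.
feb_sin, feb_cos = encode_cyclic(2, period=12)
apr_sin, apr_cos = encode_cyclic(4, period=12)
```

Because February and April share the same sine value, a split on the sine column alone cannot separate them. A tree needs splits on both columns to recover the position on the circle, which is the limitation the warning describes.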
Example
>>> import pandas as pd
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Add a datetime column
>>> X["date"] = pd.date_range(start="1/1/2018", periods=len(X))
>>> atom = ATOMClassifier(X, y)
>>> atom.feature_extraction(features=["day"], fmt="%d/%m/%Y", verbose=2)
Fitting FeatureExtractor...
Extracting datetime features...
--> Extracting features from column date.
--> Creating feature date_day.
>>> # Note the date_day column
>>> print(atom.dataset)
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry ... worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension date_day target
0 13.49 22.30 86.91 561.0 0.08752 0.07698 0.04751 0.03384 0.1809 ... 698.8 0.1162 0.17110 0.22820 0.12820 0.2871 0.06917 19 1
1 19.27 26.47 127.90 1162.0 0.09401 0.17190 0.16570 0.07593 0.1853 ... 1813.0 0.1509 0.65900 0.60910 0.17850 0.3672 0.11230 3 0
2 13.00 25.13 82.61 520.2 0.08369 0.05073 0.01206 0.01762 0.1667 ... 628.5 0.1218 0.10930 0.04462 0.05921 0.2306 0.06291 4 1
3 23.29 26.67 158.90 1685.0 0.11410 0.20840 0.35230 0.16200 0.2200 ... 1986.0 0.1536 0.41670 0.78920 0.27330 0.3198 0.08762 22 0
4 12.77 22.47 81.72 506.3 0.09055 0.05761 0.04711 0.02704 0.1585 ... 653.6 0.1419 0.15230 0.21770 0.09331 0.2829 0.08067 16 0
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 12.77 29.43 81.35 507.9 0.08276 0.04234 0.01997 0.01499 0.1539 ... 594.7 0.1234 0.10640 0.08653 0.06498 0.2407 0.06484 7 1
565 11.15 13.08 70.87 381.9 0.09754 0.05113 0.01982 0.01786 0.1830 ... 440.8 0.1341 0.08971 0.07116 0.05506 0.2859 0.06772 3 1
566 19.17 24.80 132.40 1123.0 0.09740 0.24580 0.20650 0.11180 0.2397 ... 1332.0 0.1037 0.39030 0.36390 0.17670 0.3176 0.10230 13 0
567 11.43 15.39 73.06 399.8 0.09639 0.06889 0.03503 0.02875 0.1734 ... 462.0 0.1190 0.16480 0.13990 0.08476 0.2676 0.06765 18 1
568 13.48 20.82 88.40 559.2 0.10160 0.12550 0.10630 0.05439 0.1720 ... 740.4 0.1610 0.42250 0.50300 0.22580 0.2807 0.10710 9 0
[569 rows x 32 columns]
>>> import pandas as pd
>>> from atom.feature_engineering import FeatureExtractor
>>> from sklearn.datasets import load_breast_cancer
>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Add a datetime column
>>> X["date"] = pd.date_range(start="1/1/2018", periods=len(X))
>>> fe = FeatureExtractor(features=["day"], fmt="%Y-%m-%d", verbose=2)
>>> X = fe.transform(X)
Extracting datetime features...
--> Extracting features from column date.
--> Creating feature date_day.
>>> # Note the date_day column
>>> print(X)
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry ... worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension date_day
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.30010 0.14710 0.2419 ... 184.60 2019.0 0.16220 0.66560 0.7119 0.2654 0.4601 0.11890 1
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.08690 0.07017 0.1812 ... 158.80 1956.0 0.12380 0.18660 0.2416 0.1860 0.2750 0.08902 2
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.19740 0.12790 0.2069 ... 152.50 1709.0 0.14440 0.42450 0.4504 0.2430 0.3613 0.08758 3
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.24140 0.10520 0.2597 ... 98.87 567.7 0.20980 0.86630 0.6869 0.2575 0.6638 0.17300 4
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.19800 0.10430 0.1809 ... 152.20 1575.0 0.13740 0.20500 0.4000 0.1625 0.2364 0.07678 5
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 21.56 22.39 142.00 1479.0 0.11100 0.11590 0.24390 0.13890 0.1726 ... 166.10 2027.0 0.14100 0.21130 0.4107 0.2216 0.2060 0.07115 19
565 20.13 28.25 131.20 1261.0 0.09780 0.10340 0.14400 0.09791 0.1752 ... 155.00 1731.0 0.11660 0.19220 0.3215 0.1628 0.2572 0.06637 20
566 16.60 28.08 108.30 858.1 0.08455 0.10230 0.09251 0.05302 0.1590 ... 126.70 1124.0 0.11390 0.30940 0.3403 0.1418 0.2218 0.07820 21
567 20.60 29.33 140.10 1265.0 0.11780 0.27700 0.35140 0.15200 0.2397 ... 184.60 1821.0 0.16500 0.86810 0.9387 0.2650 0.4087 0.12400 22
568 7.76 24.54 47.92 181.0 0.05263 0.04362 0.00000 0.00000 0.1587 ... 59.16 268.6 0.08996 0.06444 0.0000 0.0000 0.2871 0.07039 23
[569 rows x 31 columns]
Methods
fit | Do nothing. |
fit_transform | Fit to data, then transform it. |
get_params | Get parameters for this estimator. |
inverse_transform | Do nothing. |
set_output | Set output container. |
set_params | Set the parameters of this estimator. |
transform | Extract the new features. |
fit
Do nothing.
Implemented for continuity of the API.
fit_transform
Fit to data, then transform it.
get_params
Get parameters for this estimator.
Parameters
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
params : dict
    Parameter names mapped to their values.
inverse_transform
Do nothing.
Returns the input unchanged. Implemented for continuity of the API.
set_output
Set output container.
See sklearn's user guide on how to use the set_output API and the linked description of the available choices.
set_params
Set the parameters of this estimator.
Parameters
**params : dict
    Estimator parameters.
Returns
self : estimator instance
    Estimator instance.
transform
Extract the new features.