FeatureExtractor
Create new features by extracting datetime elements (day, month,
year, etc.) from the provided columns. Columns of dtype
datetime64 are used as is. Categorical columns that can be
successfully converted to a datetime format (less than 30% NaT
values after conversion) are also used.
This class can be accessed from atom through the feature_extraction method. Read more in the user guide.
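The column-selection rule in the description can be illustrated with plain pandas. This is a minimal sketch, not ATOM's implementation: the helper name `looks_like_datetime` and its `max_nat` argument are hypothetical, and the 30% threshold is taken from the description above.

```python
from typing import Optional

import pandas as pd


def looks_like_datetime(
    column: pd.Series, fmt: Optional[str] = None, max_nat: float = 0.3
) -> bool:
    """Hypothetical helper mirroring the rule in the description."""
    # Columns that already hold datetimes are used as is.
    if pd.api.types.is_datetime64_any_dtype(column):
        return True
    # Only categorical (non-numeric) columns are candidates for conversion.
    if pd.api.types.is_numeric_dtype(column):
        return False
    # Values that cannot be parsed become NaT instead of raising an error.
    converted = pd.to_datetime(column, format=fmt, errors="coerce")
    return bool(converted.isna().mean() < max_nat)


dates = pd.Series(["01/02/2018", "15/03/2018", "not a date", "28/04/2018"], name="date")
labels = pd.Series(["a", "b", "c", "d"], name="label")

date_ok = looks_like_datetime(dates, fmt="%d/%m/%Y")    # 1 of 4 values is NaT (25%)
label_ok = looks_like_datetime(labels, fmt="%d/%m/%Y")  # every value is NaT

# Extract the day element from the converted column; the unparseable row stays missing.
date_day = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce").dt.day
```

Under these assumptions, the `dates` column qualifies because only a quarter of its values fail to parse, while `labels` is rejected.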
Warning
Decision-tree-based algorithms build their split rules one feature at a time. As a result, they cannot correctly process cyclic features, because the sin/cos features should be treated as a single coordinate system.
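To make the warning concrete, the sketch below shows one common way to build sin/cos features for a cyclic value such as the month. It is an illustration under assumptions, not ATOM's own encoding: the helper `cyclic_encode` and its `period` argument are hypothetical names.

```python
import numpy as np
import pandas as pd


def cyclic_encode(values: pd.Series, period: int) -> pd.DataFrame:
    """Hypothetical helper: map a cyclic value onto the unit circle."""
    angle = 2 * np.pi * values / period
    return pd.DataFrame(
        {
            f"{values.name}_sin": np.sin(angle),
            f"{values.name}_cos": np.cos(angle),
        }
    )


months = pd.Series([1, 6, 12], name="month")
encoded = cyclic_encode(months, period=12)

# Euclidean distances in the (sin, cos) plane.
jan, jun, dec = encoded.to_numpy()
dist_dec_jan = np.linalg.norm(dec - jan)  # neighbouring months
dist_jan_jun = np.linalg.norm(jan - jun)  # months five steps apart
```

December and January end up close together only when both coordinates are read jointly. A split on `month_sin` or `month_cos` alone sees just one axis of the circle, which is the limitation the warning describes.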
Example
>>> import pandas as pd
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Add a datetime column
>>> X["date"] = pd.date_range(start="1/1/2018", periods=len(X))
>>> atom = ATOMClassifier(X, y)
>>> atom.feature_extraction(features=["day"], fmt="%d/%m/%Y", verbose=2)
Fitting FeatureExtractor...
Extracting datetime features...
 --> Extracting features from column date.
   --> Creating feature date_day.
>>> # Note the date_day column
>>> print(atom.dataset)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  date_day  target
0         17.140         16.40          116.00      912.7          0.11860           0.22760         0.22290             0.140100         0.3040  ...      1461.0           0.15450             0.3949           0.3853               0.25500          0.4066                  0.10590        26       0
1         12.770         22.47           81.72      506.3          0.09055           0.05761         0.04711             0.027040         0.1585  ...       653.6           0.14190             0.1523           0.2177               0.09331          0.2829                  0.08067        16       0
2         19.170         24.80          132.40     1123.0          0.09740           0.24580         0.20650             0.111800         0.2397  ...      1332.0           0.10370             0.3903           0.3639               0.17670          0.3176                  0.10230        13       0
3         17.990         10.38          122.80     1001.0          0.11840           0.27760         0.30010             0.147100         0.2419  ...      2019.0           0.16220             0.6656           0.7119               0.26540          0.4601                  0.11890         1       0
4         14.690         13.98           98.22      656.1          0.10310           0.18360         0.14500             0.063000         0.2086  ...       809.2           0.13120             0.3635           0.3219               0.11080          0.2827                  0.09208        26       1
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...         ...               ...                ...              ...                   ...             ...                      ...       ...     ...
564        9.397         21.68           59.75      268.8          0.07969           0.06053         0.03735             0.005128         0.1274  ...       301.0           0.10860             0.1887           0.1868               0.02564          0.2376                  0.09206        17       1
565       20.640         17.35          134.80     1335.0          0.09446           0.10760         0.15270             0.089410         0.1571  ...      1946.0           0.15620             0.3055           0.4159               0.21120          0.2689                  0.07055         9       0
566       18.820         21.97          123.70     1110.0          0.10180           0.13890         0.15940             0.087440         0.1943  ...      1603.0           0.13900             0.3463           0.3912               0.17080          0.3007                  0.08314        10       0
567       13.940         13.17           90.31      594.2          0.12480           0.09755         0.10100             0.066150         0.1976  ...       653.3           0.13940             0.1364           0.1559               0.10150          0.2160                  0.07253        13       1
568       13.740         17.91           88.12      585.0          0.07944           0.06376         0.02881             0.013290         0.1473  ...       725.9           0.09711             0.1824           0.1564               0.06019          0.2350                  0.07014        30       1
[569 rows x 32 columns]
>>> import pandas as pd
>>> from atom.feature_engineering import FeatureExtractor
>>> from sklearn.datasets import load_breast_cancer
>>> X, _ = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Add a datetime column
>>> X["date"] = pd.date_range(start="1/1/2018", periods=len(X))
>>> fe = FeatureExtractor(features=["day"], fmt="%Y-%m-%d", verbose=2)
>>> X = fe.transform(X)
Extracting datetime features...
 --> Extracting features from column date.
   --> Creating feature date_day.
>>> # Note the date_day column
>>> print(X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst perimeter  worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  date_day
0          17.99         10.38          122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419  ...           184.60      2019.0           0.16220            0.66560           0.7119                0.2654          0.4601                  0.11890         1
1          20.57         17.77          132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812  ...           158.80      1956.0           0.12380            0.18660           0.2416                0.1860          0.2750                  0.08902         2
2          19.69         21.25          130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069  ...           152.50      1709.0           0.14440            0.42450           0.4504                0.2430          0.3613                  0.08758         3
3          11.42         20.38           77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597  ...            98.87       567.7           0.20980            0.86630           0.6869                0.2575          0.6638                  0.17300         4
4          20.29         14.34          135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809  ...           152.20      1575.0           0.13740            0.20500           0.4000                0.1625          0.2364                  0.07678         5
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...              ...         ...               ...                ...              ...                   ...             ...                      ...       ...
564        21.56         22.39          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726  ...           166.10      2027.0           0.14100            0.21130           0.4107                0.2216          0.2060                  0.07115        19
565        20.13         28.25          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752  ...           155.00      1731.0           0.11660            0.19220           0.3215                0.1628          0.2572                  0.06637        20
566        16.60         28.08          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590  ...           126.70      1124.0           0.11390            0.30940           0.3403                0.1418          0.2218                  0.07820        21
567        20.60         29.33          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397  ...           184.60      1821.0           0.16500            0.86810           0.9387                0.2650          0.4087                  0.12400        22
568         7.76         24.54           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587  ...            59.16       268.6           0.08996            0.06444           0.0000                0.0000          0.2871                  0.07039        23
[569 rows x 31 columns]
Methods
| fit | Do nothing. | 
| fit_transform | Fit to data, then transform it. | 
| get_params | Get parameters for this estimator. | 
| inverse_transform | Do nothing. | 
| set_output | Set output container. | 
| set_params | Set the parameters of this estimator. | 
| transform | Extract the new features. | 
fit: Does nothing. Implemented for continuity of the API.
get_params: Get parameters for this estimator.
| Parameters | deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators. |
| Returns | params : dict. Parameter names mapped to their values. |
inverse_transform: Returns the input unchanged. Implemented for continuity of the API.
set_output: Set output container. See sklearn's user guide on how to use the
set_output API and for a description of the available choices.
set_params: Set the parameters of this estimator.
| Parameters | **params : dict. Estimator parameters. |
| Returns | self : estimator instance. Estimator instance. |
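The methods listed above follow the scikit-learn transformer convention. As a rough sketch of that pattern, the stateless transformer below has a no-op fit and takes get_params and set_params from sklearn's BaseEstimator. The class `DayExtractor` and its `column` parameter are hypothetical and are not part of ATOM.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DayExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical stateless transformer that extracts the day of a datetime column."""

    def __init__(self, column: str = "date"):
        self.column = column

    def fit(self, X, y=None):
        # Nothing is learned from the data; present for API continuity.
        return self

    def transform(self, X):
        X = X.copy()
        # Add the new feature and drop the original datetime column.
        X[f"{self.column}_day"] = X[self.column].dt.day
        return X.drop(columns=self.column)


X = pd.DataFrame({"date": pd.date_range(start="2018-01-01", periods=3)})
extractor = DayExtractor(column="date")

out = extractor.fit_transform(X)            # fit is a no-op, then transform
params = extractor.get_params()             # parameter names mapped to their values
same = extractor.set_params(column="date")  # returns the estimator itself
```

Dropping the source column here mirrors the example output above, where date_day appears in place of date. Whether ATOM does this in every configuration is not stated in this section.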