Encoder
The encoding type depends on the number of classes in the column:
- If n_classes=2 or ordinal feature, use Ordinal-encoding.
- If 2 < n_classes <= max_onehot, use OneHot-encoding.
- If n_classes > max_onehot, use strategy-encoding.
Missing values are propagated to the output column. Unknown classes encountered during transforming are imputed according to the selected strategy. Infrequent classes can be replaced with a given value to prevent excessively high cardinality.
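The selection rule above can be sketched as a small helper. This is a toy illustration of the dispatch logic only, not the Encoder's actual internals; the function name and parameters are hypothetical:

```python
def choose_encoding(n_classes, max_onehot=10, strategy="target", is_ordinal=False):
    """Toy sketch: pick an encoding type for one categorical column."""
    if n_classes == 2 or is_ordinal:
        return "ordinal"      # binary or ordinal feature -> Ordinal-encoding
    elif n_classes <= max_onehot:
        return "onehot"       # low cardinality -> OneHot-encoding
    else:
        return strategy       # high cardinality -> the selected strategy

# With max_onehot=10 and strategy="target", the three example features below
# get: 2 classes -> "ordinal", 3 classes -> "onehot", 20 classes -> "target".
```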
This class can be accessed from atom through the encode method. Read more in the user guide.
Warning
Three category-encoders estimators are unavailable:
- OneHotEncoder: Use the max_onehot parameter.
- HashingEncoder: Incompatibility of APIs.
- LeaveOneOutEncoder: Incompatibility of APIs.
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> from numpy.random import randint
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> X["cat_feature_1"] = [f"x{i}" for i in randint(0, 2, len(X))]
>>> X["cat_feature_2"] = [f"x{i}" for i in randint(0, 3, len(X))]
>>> X["cat_feature_3"] = [f"x{i}" for i in randint(0, 20, len(X))]
>>> atom = ATOMClassifier(X, y, random_state=1)
>>> print(atom.X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  cat_feature_1  cat_feature_2  cat_feature_3
0          13.48         20.82           88.40      559.2          0.10160           0.12550         0.10630              0.05439         0.1720  ...            0.1610            0.42250           0.5030               0.22580          0.2807                  0.10710             x0             x1            x10
1          18.31         20.58          120.80     1052.0          0.10680           0.12480         0.15690              0.09451         0.1860  ...            0.1492            0.25360           0.3759               0.15100          0.3074                  0.07863             x1             x1            x15
2          17.93         24.48          115.20      998.9          0.08855           0.07027         0.05699              0.04744         0.1538  ...            0.1315            0.18060           0.2080               0.11360          0.2504                  0.07948             x1             x1            x17
3          15.13         29.81           96.71      719.5          0.08320           0.04605         0.04686              0.02739         0.1852  ...            0.1148            0.09866           0.1547               0.06575          0.3233                  0.06165             x0             x1            x10
4           8.95         15.76           58.74      245.2          0.09462           0.12430         0.09263              0.02308         0.1305  ...            0.1179            0.18790           0.1544               0.03846          0.1652                  0.07722             x1             x0            x19
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...               ...                ...              ...                   ...             ...                      ...            ...            ...            ...
564        14.34         13.47           92.51      641.2          0.09906           0.07624         0.05724              0.04603         0.2075  ...            0.1297            0.15250           0.1632               0.10870          0.3062                  0.06072             x0             x2            x18
565        13.17         21.81           85.42      531.5          0.09714           0.10470         0.08259              0.05252         0.1746  ...            0.1503            0.39040           0.3728               0.16070          0.3693                  0.09618             x0             x1             x9
566        17.30         17.08          113.00      928.2          0.10080           0.10410         0.12660              0.08353         0.1813  ...            0.1416            0.24050           0.3378               0.18570          0.3138                  0.08113             x0             x2             x6
567        17.68         20.74          117.40      963.7          0.11150           0.16650         0.18550              0.10540         0.1971  ...            0.1418            0.34980           0.3583               0.15150          0.2463                  0.07738             x1             x2             x5
568        14.80         17.66           95.88      674.8          0.09179           0.08890         0.04069              0.02260         0.1893  ...            0.1226            0.18810           0.2060               0.08308          0.3600                  0.07285             x0             x0             x9
[569 rows x 33 columns]
>>> atom.encode(strategy="target", max_onehot=10, verbose=2)
Fitting Encoder...
Encoding categorical columns...
 --> Ordinal-encoding feature cat_feature_1. Contains 2 classes.
 --> OneHot-encoding feature cat_feature_2. Contains 3 classes.
 --> Target-encoding feature cat_feature_3. Contains 20 classes.
>>> # Note the one-hot encoded column with name [feature]_[class]
>>> print(atom.X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst concavity  worst concave points  worst symmetry  worst fractal dimension  cat_feature_1  cat_feature_2_x1  cat_feature_2_x0  cat_feature_2_x2  cat_feature_3
0          13.48         20.82           88.40      559.2          0.10160           0.12550         0.10630              0.05439         0.1720  ...           0.5030               0.22580          0.2807                  0.10710            0.0               1.0               0.0               0.0       0.541418
1          18.31         20.58          120.80     1052.0          0.10680           0.12480         0.15690              0.09451         0.1860  ...           0.3759               0.15100          0.3074                  0.07863            1.0               1.0               0.0               0.0       0.650825
2          17.93         24.48          115.20      998.9          0.08855           0.07027         0.05699              0.04744         0.1538  ...           0.2080               0.11360          0.2504                  0.07948            1.0               1.0               0.0               0.0       0.613359
3          15.13         29.81           96.71      719.5          0.08320           0.04605         0.04686              0.02739         0.1852  ...           0.1547               0.06575          0.3233                  0.06165            0.0               1.0               0.0               0.0       0.541418
4           8.95         15.76           58.74      245.2          0.09462           0.12430         0.09263              0.02308         0.1305  ...           0.1544               0.03846          0.1652                  0.07722            1.0               0.0               1.0               0.0       0.619953
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...              ...                   ...             ...                      ...            ...               ...               ...               ...            ...
564        14.34         13.47           92.51      641.2          0.09906           0.07624         0.05724              0.04603         0.2075  ...           0.1632               0.10870          0.3062                  0.06072            0.0               0.0               0.0               1.0       0.651395
565        13.17         21.81           85.42      531.5          0.09714           0.10470         0.08259              0.05252         0.1746  ...           0.3728               0.16070          0.3693                  0.09618            0.0               1.0               0.0               0.0       0.607243
566        17.30         17.08          113.00      928.2          0.10080           0.10410         0.12660              0.08353         0.1813  ...           0.3378               0.18570          0.3138                  0.08113            0.0               0.0               0.0               1.0       0.669235
567        17.68         20.74          117.40      963.7          0.11150           0.16650         0.18550              0.10540         0.1971  ...           0.3583               0.15150          0.2463                  0.07738            1.0               0.0               0.0               1.0       0.638596
568        14.80         17.66           95.88      674.8          0.09179           0.08890         0.04069              0.02260         0.1893  ...           0.2060               0.08308          0.3600                  0.07285            0.0               0.0               1.0               0.0       0.607243
[569 rows x 35 columns]
>>> from atom.data_cleaning import Encoder
>>> from sklearn.datasets import load_breast_cancer
>>> from numpy.random import randint
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> X["cat_feature_1"] = [f"x{i}" for i in randint(0, 2, len(X))]
>>> X["cat_feature_2"] = [f"x{i}" for i in randint(0, 3, len(X))]
>>> X["cat_feature_3"] = [f"x{i}" for i in randint(0, 20, len(X))]
>>> print(X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  cat_feature_1  cat_feature_2  cat_feature_3
0          17.99         10.38          122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419  ...           0.16220            0.66560           0.7119                0.2654          0.4601                  0.11890             x1             x2             x5
1          20.57         17.77          132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812  ...           0.12380            0.18660           0.2416                0.1860          0.2750                  0.08902             x1             x2            x13
2          19.69         21.25          130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069  ...           0.14440            0.42450           0.4504                0.2430          0.3613                  0.08758             x0             x0            x15
3          11.42         20.38           77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597  ...           0.20980            0.86630           0.6869                0.2575          0.6638                  0.17300             x0             x2            x10
4          20.29         14.34          135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809  ...           0.13740            0.20500           0.4000                0.1625          0.2364                  0.07678             x1             x1            x17
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...               ...                ...              ...                   ...             ...                      ...            ...            ...            ...
564        21.56         22.39          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726  ...           0.14100            0.21130           0.4107                0.2216          0.2060                  0.07115             x1             x1            x12
565        20.13         28.25          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752  ...           0.11660            0.19220           0.3215                0.1628          0.2572                  0.06637             x0             x2            x14
566        16.60         28.08          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590  ...           0.11390            0.30940           0.3403                0.1418          0.2218                  0.07820             x0             x1             x3
567        20.60         29.33          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397  ...           0.16500            0.86810           0.9387                0.2650          0.4087                  0.12400             x1             x0             x2
568         7.76         24.54           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587  ...           0.08996            0.06444           0.0000                0.0000          0.2871                  0.07039             x1             x1            x11
[569 rows x 33 columns]
>>> encoder = Encoder(strategy="target", max_onehot=10, verbose=2)
>>> X = encoder.fit_transform(X, y)
Fitting Encoder...
Encoding categorical columns...
 --> Ordinal-encoding feature cat_feature_1. Contains 2 classes.
 --> OneHot-encoding feature cat_feature_2. Contains 3 classes.
 --> Target-encoding feature cat_feature_3. Contains 20 classes.
>>> # Note the one-hot encoded column with name [feature]_[class]
>>> print(X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst concavity  worst concave points  worst symmetry  worst fractal dimension  cat_feature_1  cat_feature_2_x2  cat_feature_2_x0  cat_feature_2_x1  cat_feature_3
0          17.99         10.38          122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419  ...           0.7119                0.2654          0.4601                  0.11890            1.0               1.0               0.0               0.0       0.645086
1          20.57         17.77          132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812  ...           0.2416                0.1860          0.2750                  0.08902            1.0               1.0               0.0               0.0       0.604148
2          19.69         21.25          130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069  ...           0.4504                0.2430          0.3613                  0.08758            0.0               0.0               1.0               0.0       0.675079
3          11.42         20.38           77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597  ...           0.6869                0.2575          0.6638                  0.17300            0.0               1.0               0.0               0.0       0.706297
4          20.29         14.34          135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809  ...           0.4000                0.1625          0.2364                  0.07678            1.0               0.0               0.0               1.0       0.716566
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...              ...                   ...             ...                      ...            ...               ...               ...               ...            ...
564        21.56         22.39          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726  ...           0.4107                0.2216          0.2060                  0.07115            1.0               0.0               0.0               1.0       0.598024
565        20.13         28.25          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752  ...           0.3215                0.1628          0.2572                  0.06637            0.0               1.0               0.0               0.0       0.683185
566        16.60         28.08          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590  ...           0.3403                0.1418          0.2218                  0.07820            0.0               0.0               0.0               1.0       0.472908
567        20.60         29.33          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397  ...           0.9387                0.2650          0.4087                  0.12400            1.0               0.0               1.0               0.0       0.585452
568         7.76         24.54           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587  ...           0.0000                0.0000          0.2871                  0.07039            1.0               0.0               0.0               1.0       0.516759
[569 rows x 35 columns]
Methods
| Method | Description |
| --- | --- |
| fit | Fit to data. |
| fit_transform | Fit to data, then transform it. |
| get_feature_names_out | Get output feature names for transformation. |
| get_params | Get parameters for this estimator. |
| inverse_transform | Do nothing. |
| set_output | Set output container. |
| set_params | Set the parameters of this estimator. |
| transform | Encode the data. |
Note that leaving y=None can lead to errors if the strategy
encoder requires target values. For multioutput tasks, only
the first target column is used to fit the encoder.
| Parameters | X: dataframe-like
Feature set with shape=(n_samples, n_features).
y: sequence or dataframe-like
Target column(s) corresponding to X. |
| Returns | Self
Estimator instance. |
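The note above about target values can be illustrated with the core idea behind strategy="target": each class is replaced by the mean of the target over the rows where that class occurs. This is a toy sketch of the concept, not the category-encoders implementation (which also applies smoothing toward the global mean):

```python
from collections import defaultdict

def target_encode(categories, targets):
    """Toy sketch: replace each class with its observed mean target value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, tgt in zip(categories, targets):
        sums[cat] += tgt
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return [means[cat] for cat in categories]

# target_encode(["x0", "x0", "x1"], [1, 0, 1]) -> [0.5, 0.5, 1.0]
```

Without y there is nothing to average over, which is why target-based strategies fail when y=None.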
| Parameters | input_features: sequence or None, default=None
Only used to validate feature names with the names seen in fit. |
| Returns | np.ndarray
Transformed feature names. |
| Parameters | deep: bool, default=True
If True, return the parameters for this estimator and contained subobjects that are estimators. |
| Returns | params: dict
Parameter names mapped to their values. |
Returns the input unchanged. Implemented for continuity of the API.
See sklearn's user guide on how to use the set_output API and for a description of the available choices.
| Parameters | **params: dict
Estimator parameters. |
| Returns | self: estimator instance
Estimator instance. |