Balancer
class atom.data_cleaning.Balancer(strategy="ADASYN", n_jobs=1, verbose=0, logger=None, random_state=None, **kwargs)[source]
Balance the number of samples per class in the target column.
When oversampling, newly created samples get an increasing integer index if the index is numerical, and an index of the form [estimator]_N otherwise, where N stands for the N-th sample in the data set. Use only for classification tasks.
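The indexing rule above can be sketched in plain Python. This is an illustration only, not the library's code: the helper name `new_sample_index` is hypothetical, and the sketch assumes that N continues from the current number of rows, which is one reading of "the N-th sample in the data set".

```python
def new_sample_index(index, n_new, estimator="smote"):
    """Return index labels for n_new synthetic rows (illustrative only)."""
    if all(isinstance(i, int) for i in index):
        # Numerical index: continue counting after the largest existing label.
        start = max(index) + 1
        return list(range(start, start + n_new))
    # Non-numerical index: label each new row "<estimator>_N". N is ASSUMED
    # here to be the row's 1-based position in the resulting data set.
    start = len(index) + 1
    return [f"{estimator}_{n}" for n in range(start, start + n_new)]


numeric = new_sample_index([0, 1, 2, 3], n_new=2)
named = new_sample_index(["a", "b", "c"], n_new=2)
print(numeric)  # [4, 5]
print(named)    # ['smote_4', 'smote_5']
```

Check the index of the returned data set if the exact labels matter, since the numbering used by the library may differ from this assumption.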
This class can be accessed from atom through the balance method. Read more in the user guide.
Warning
The clustercentroids estimator is unavailable because of API incompatibilities.
See Also
Encoder: Perform encoding of categorical features.
Imputer: Handle missing values in the data.
Pruner: Prune outliers from the data.
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> atom = ATOMClassifier(X, y)
>>> print(atom.train)
mean radius mean texture ... worst fractal dimension target
0 18.030 16.85 ... 0.08225 0
1 10.950 21.35 ... 0.09606 0
2 14.250 22.15 ... 0.11320 0
3 17.570 15.05 ... 0.07919 0
4 10.600 18.95 ... 0.07587 1
.. ... ... ... ... ...
451 8.888 14.64 ... 0.10840 1
452 21.090 26.57 ... 0.12840 0
453 16.160 21.54 ... 0.07619 0
454 11.260 19.83 ... 0.07613 1
455 12.000 15.65 ... 0.07924 1
[456 rows x 31 columns]
>>> atom.balance(strategy="smote", verbose=2)
Oversampling with SMOTE...
--> Adding 116 samples to class 0.
>>> # Note that the number of rows has increased
>>> print(atom.train)
mean radius mean texture ... worst fractal dimension target
0 11.420000 20.380000 ... 0.173000 0
1 9.876000 17.270000 ... 0.073800 1
2 13.470000 14.060000 ... 0.093260 1
3 16.300000 15.700000 ... 0.072300 1
4 12.250000 17.940000 ... 0.081320 1
.. ... ... ... ... ...
567 12.975558 20.580996 ... 0.118509 0
568 11.786135 17.120749 ... 0.091266 0
569 16.194544 19.737215 ... 0.106434 0
570 16.780524 21.261883 ... 0.086889 0
571 20.705316 22.635645 ... 0.085362 0
[572 rows x 31 columns]
>>> from atom.data_cleaning import Balancer
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> print(X)
mean radius mean texture ... worst symmetry worst fractal dimension
0 17.99 10.38 ... 0.4601 0.11890
1 20.57 17.77 ... 0.2750 0.08902
2 19.69 21.25 ... 0.3613 0.08758
3 11.42 20.38 ... 0.6638 0.17300
4 20.29 14.34 ... 0.2364 0.07678
.. ... ... ... ... ...
564 21.56 22.39 ... 0.2060 0.07115
565 20.13 28.25 ... 0.2572 0.06637
566 16.60 28.08 ... 0.2218 0.07820
567 20.60 29.33 ... 0.4087 0.12400
568 7.76 24.54 ... 0.2871 0.07039
[569 rows x 30 columns]
>>> balancer = Balancer(strategy="smote", verbose=2)
>>> X, y = balancer.transform(X, y)
Oversampling with SMOTE...
--> Adding 145 samples to class 0.
>>> # Note that the number of rows has increased
>>> print(X)
mean radius mean texture ... worst symmetry worst fractal dimension
0 17.990000 10.380000 ... 0.460100 0.118900
1 20.570000 17.770000 ... 0.275000 0.089020
2 19.690000 21.250000 ... 0.361300 0.087580
3 11.420000 20.380000 ... 0.663800 0.173000
4 20.290000 14.340000 ... 0.236400 0.076780
.. ... ... ... ... ...
709 14.824550 17.497674 ... 0.345200 0.100678
710 20.170649 23.997572 ... 0.538881 0.099281
711 21.006050 22.305044 ... 0.277181 0.076740
712 20.791828 25.103989 ... 0.388202 0.122836
713 17.081185 23.560768 ... 0.342508 0.082558
[714 rows x 30 columns]
Methods
| fit | Does nothing. |
| fit_transform | Fit to data, then transform it. |
| get_params | Get parameters for this estimator. |
| inverse_transform | Does nothing. |
| log | Print message and save to log file. |
| save | Save the instance to a pickle file. |
| set_params | Set the parameters of this estimator. |
| transform | Balance the data. |
method fit(X=None, y=None, **fit_params)[source]
Does nothing.
Implemented for continuity of the API.
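A no-op fit is common for stateless transformers: it lets the object be used wherever a fit/transform interface is expected. The sketch below shows the pattern with a hypothetical class, `NoOpFitTransformer`. It is not ATOM's code, and returning self from fit is an assumption based on the scikit-learn convention.

```python
class NoOpFitTransformer:
    """Hypothetical stateless transformer; not ATOM's actual code."""

    def fit(self, X=None, y=None, **fit_params):
        # Nothing to learn. Return self (assumed convention) so calls chain.
        return self

    def transform(self, X):
        # Placeholder transformation: return a copy of the rows unchanged.
        return list(X)

    def fit_transform(self, X=None, y=None, **fit_params):
        # Equivalent to calling fit and then transform.
        return self.fit(X, y, **fit_params).transform(X)


t = NoOpFitTransformer()
same_object = t.fit([1, 2, 3]) is t
result = t.fit_transform([1, 2, 3])
print(same_object, result)  # True [1, 2, 3]
```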
method fit_transform(X=None, y=None, **fit_params)[source]
Fit to data, then transform it.
method get_params(deep=True)[source]
Get parameters for this estimator.
Parameters
    deep : bool, default=True
        If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
    params : dict
        Parameter names mapped to their values.
method inverse_transform(X=None, y=None)[source]
Does nothing.
method log(msg, level=0, severity="info")[source]
Print message and save to log file.
method save(filename="auto", save_data=True)[source]
Save the instance to a pickle file.
Parameters
    filename : str, default="auto"
        Name of the file. Use "auto" for automatic naming.
    save_data : bool, default=True
        Whether to save the dataset with the instance. This parameter is ignored if the method is not called from atom. If False, remember to add the data to ATOMLoader when loading the file.
method set_params(**params)[source]
Set the parameters of this estimator.
Parameters
    **params : dict
        Estimator parameters.
Returns
    self : estimator instance
        Estimator instance.
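The get_params and set_params contract can be illustrated with a small stand-in class. `TinyEstimator` is hypothetical and only mimics the documented behavior (a dict of parameter names mapped to values, and set_params returning the instance). It borrows three constructor arguments from the Balancer signature for readability and makes no claim about how the library implements these methods.

```python
import inspect


class TinyEstimator:
    """Hypothetical estimator mimicking the get_params/set_params contract."""

    def __init__(self, strategy="ADASYN", n_jobs=1, verbose=0):
        self.strategy = strategy
        self.n_jobs = n_jobs
        self.verbose = verbose

    def get_params(self, deep=True):
        # Read the constructor's argument names and look up current values.
        # `deep` is accepted for compatibility; there are no nested estimators.
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning the instance allows chaining


est = TinyEstimator()
params = est.set_params(strategy="smote", verbose=2).get_params()
print(params)  # {'strategy': 'smote', 'n_jobs': 1, 'verbose': 2}
```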
method transform(X, y=-1)[source]
Balance the data.
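To make the effect of balancing concrete, here is a minimal sketch of naive random oversampling in plain Python. It is not SMOTE or ADASYN, which create synthetic samples rather than duplicates, and it is not the library's implementation. The function name `random_oversample` and the toy data are assumptions for illustration. The class sizes (357 and 212) match the breast cancer data set used in the stand-alone example above, so the number of added samples agrees with the logged "Adding 145 samples to class 0".

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_rows, new_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - count):
            new_rows.append(rng.choice(pool))
            new_labels.append(cls)
    return new_rows, new_labels


# Toy data: 357 samples of class 1 and 212 samples of class 0 (569 rows).
labels = [1] * 357 + [0] * 212
rows = [[float(i)] for i in range(len(labels))]

X_bal, y_bal = random_oversample(rows, labels)
balanced_counts = Counter(y_bal)
print(len(y_bal) - len(labels))                # 145
print(balanced_counts[0], balanced_counts[1])  # 357 357
```

The resulting 714 rows match the row count printed after balancing in the stand-alone example.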