Balancer
class atom.data_cleaning.Balancer(strategy="ADASYN", n_jobs=1, verbose=0, random_state=None, **kwargs)
Balance the number of samples per class in the target column.
When oversampling, newly created samples receive an increasing integer index if the existing index is numerical, and an index of the form [estimator]_N otherwise, where N stands for the N-th sample in the data set. Use this class only for classification tasks.
This class can be accessed from atom through the balance method. Read more in the user guide.
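The index rule for oversampled rows can be sketched in plain Python. This is an illustration only, not ATOM's implementation: the helper name `new_sample_index` is hypothetical, and reading N as the row's position in the enlarged data set is an assumption based on the description above.

```python
def new_sample_index(index, n_new, estimator="smote"):
    """Return index labels for n_new oversampled rows (hypothetical helper)."""
    if index and all(isinstance(i, int) for i in index):
        # Numerical index: keep counting after the largest existing label
        start = max(index) + 1
        return list(range(start, start + n_new))
    # Non-numerical index: [estimator]_N, where N is assumed to be the
    # position of the new row in the enlarged data set
    start = len(index) + 1
    return [f"{estimator}_{n}" for n in range(start, start + n_new)]


print(new_sample_index([0, 1, 2], 2))      # [3, 4]
print(new_sample_index(["a", "b"], 2))     # ['smote_3', 'smote_4']
print(new_sample_index(range(456), 116)[-1])  # 571
```

The last call mirrors a training set of 456 rows receiving 116 new samples, where the final row ends up with index 571.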
Warning
- The clustercentroids estimator is unavailable because of API incompatibilities.
- The Balancer class does not support multioutput tasks.
 
See Also
Perform encoding of categorical features.
Handle missing values in the data.
Prune outliers from the data.
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> atom = ATOMClassifier(X, y, random_state=1)
>>> print(atom.train)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst perimeter  worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  target
0          13.48         20.82           88.40      559.2          0.10160           0.12550         0.10630             0.054390         0.1720  ...           107.30       740.4            0.1610            0.42250          0.50300               0.22580          0.2807                  0.10710       0
1          18.31         20.58          120.80     1052.0          0.10680           0.12480         0.15690             0.094510         0.1860  ...           142.20      1493.0            0.1492            0.25360          0.37590               0.15100          0.3074                  0.07863       0
2          17.93         24.48          115.20      998.9          0.08855           0.07027         0.05699             0.047440         0.1538  ...           135.10      1320.0            0.1315            0.18060          0.20800               0.11360          0.2504                  0.07948       0
3          15.13         29.81           96.71      719.5          0.08320           0.04605         0.04686             0.027390         0.1852  ...           110.10       931.4            0.1148            0.09866          0.15470               0.06575          0.3233                  0.06165       0
4           8.95         15.76           58.74      245.2          0.09462           0.12430         0.09263             0.023080         0.1305  ...            63.34       270.0            0.1179            0.18790          0.15440               0.03846          0.1652                  0.07722       1
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...              ...         ...               ...                ...              ...                   ...             ...                      ...     ...
451        19.73         19.82          130.70     1206.0          0.10620           0.18490         0.24170             0.097400         0.1733  ...           159.80      1933.0            0.1710            0.59550          0.84890               0.25070          0.2749                  0.12970       0
452        12.72         13.78           81.78      492.1          0.09667           0.08393         0.01288             0.019240         0.1638  ...            88.54       553.7            0.1298            0.14720          0.05233               0.06343          0.2369                  0.06922       1
453        11.51         23.93           74.52      403.5          0.09261           0.10210         0.11120             0.041050         0.1388  ...            82.28       474.2            0.1298            0.25170          0.36300               0.09653          0.2112                  0.08732       1
454        10.75         14.97           68.26      355.3          0.07793           0.05139         0.02251             0.007875         0.1399  ...            77.79       441.2            0.1076            0.12230          0.09755               0.03413          0.2300                  0.06769       1
455        25.22         24.91          171.50     1878.0          0.10630           0.26650         0.33390             0.184500         0.1829  ...           211.70      2562.0            0.1573            0.60760          0.64760               0.28670          0.2355                  0.10510       0
[456 rows x 31 columns]
>>> atom.balance(strategy="smote", verbose=2)
Oversampling with SMOTE...
 --> Adding 116 samples to class 0.
>>> # Note that the number of rows has increased
>>> print(atom.train)
     mean radius  mean texture  mean perimeter    mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst perimeter   worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension  target
0      13.480000     20.820000       88.400000   559.200000         0.101600          0.125500        0.106300             0.054390       0.172000  ...       107.300000   740.400000          0.161000           0.422500         0.503000              0.225800        0.280700                 0.107100       0
1      18.310000     20.580000      120.800000  1052.000000         0.106800          0.124800        0.156900             0.094510       0.186000  ...       142.200000  1493.000000          0.149200           0.253600         0.375900              0.151000        0.307400                 0.078630       0
2      17.930000     24.480000      115.200000   998.900000         0.088550          0.070270        0.056990             0.047440       0.153800  ...       135.100000  1320.000000          0.131500           0.180600         0.208000              0.113600        0.250400                 0.079480       0
3      15.130000     29.810000       96.710000   719.500000         0.083200          0.046050        0.046860             0.027390       0.185200  ...       110.100000   931.400000          0.114800           0.098660         0.154700              0.065750        0.323300                 0.061650       0
4       8.950000     15.760000       58.740000   245.200000         0.094620          0.124300        0.092630             0.023080       0.130500  ...        63.340000   270.000000          0.117900           0.187900         0.154400              0.038460        0.165200                 0.077220       1
..           ...           ...             ...          ...              ...               ...             ...                  ...            ...  ...              ...          ...               ...                ...              ...                   ...             ...                      ...     ...
567    15.182945     22.486774       98.949465   711.386079         0.092513          0.102732        0.113923             0.069481       0.179224  ...       107.689157   826.276172          0.126730           0.199259         0.295172              0.142325        0.265352                 0.068318       0
568    19.990378     20.622944      130.491182  1253.735467         0.091583          0.117753        0.117236             0.082771       0.202428  ...       167.456689  1995.896044          0.132457           0.289652         0.332006              0.182989        0.299088                 0.084150       0
569    18.158121     18.928220      119.907435  1027.331092         0.113149          0.147089        0.171862             0.103942       0.209306  ...       135.286302  1319.270051          0.127029           0.233493         0.260138              0.133851        0.302406                 0.079535       0
570    23.733233     26.433751      158.185672  1724.145541         0.098008          0.193789        0.231158             0.139527       0.188817  ...       207.483796  2844.559632          0.150495           0.463361         0.599077              0.266433        0.290828                 0.091542       0
571    17.669575     16.375717      115.468589   968.552411         0.093636          0.109983        0.101005             0.075283       0.174505  ...       133.767576  1227.195245          0.118221           0.264624         0.249798              0.135098        0.268044                 0.076533       0
[572 rows x 31 columns]
>>> from atom.data_cleaning import Balancer
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> print(X)
     mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst texture  worst perimeter  worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension
0          17.99         10.38          122.80     1001.0          0.11840           0.27760         0.30010              0.14710         0.2419  ...          17.33           184.60      2019.0           0.16220            0.66560           0.7119                0.2654          0.4601                  0.11890
1          20.57         17.77          132.90     1326.0          0.08474           0.07864         0.08690              0.07017         0.1812  ...          23.41           158.80      1956.0           0.12380            0.18660           0.2416                0.1860          0.2750                  0.08902
2          19.69         21.25          130.00     1203.0          0.10960           0.15990         0.19740              0.12790         0.2069  ...          25.53           152.50      1709.0           0.14440            0.42450           0.4504                0.2430          0.3613                  0.08758
3          11.42         20.38           77.58      386.1          0.14250           0.28390         0.24140              0.10520         0.2597  ...          26.50            98.87       567.7           0.20980            0.86630           0.6869                0.2575          0.6638                  0.17300
4          20.29         14.34          135.10     1297.0          0.10030           0.13280         0.19800              0.10430         0.1809  ...          16.67           152.20      1575.0           0.13740            0.20500           0.4000                0.1625          0.2364                  0.07678
..           ...           ...             ...        ...              ...               ...             ...                  ...            ...  ...            ...              ...         ...               ...                ...              ...                   ...             ...                      ...
564        21.56         22.39          142.00     1479.0          0.11100           0.11590         0.24390              0.13890         0.1726  ...          26.40           166.10      2027.0           0.14100            0.21130           0.4107                0.2216          0.2060                  0.07115
565        20.13         28.25          131.20     1261.0          0.09780           0.10340         0.14400              0.09791         0.1752  ...          38.25           155.00      1731.0           0.11660            0.19220           0.3215                0.1628          0.2572                  0.06637
566        16.60         28.08          108.30      858.1          0.08455           0.10230         0.09251              0.05302         0.1590  ...          34.12           126.70      1124.0           0.11390            0.30940           0.3403                0.1418          0.2218                  0.07820
567        20.60         29.33          140.10     1265.0          0.11780           0.27700         0.35140              0.15200         0.2397  ...          39.42           184.60      1821.0           0.16500            0.86810           0.9387                0.2650          0.4087                  0.12400
568         7.76         24.54           47.92      181.0          0.05263           0.04362         0.00000              0.00000         0.1587  ...          30.37            59.16       268.6           0.08996            0.06444           0.0000                0.0000          0.2871                  0.07039
[569 rows x 30 columns]
>>> balancer = Balancer(strategy="smote", verbose=2)
>>> X, y = balancer.fit_transform(X, y)
Oversampling with SMOTE...
 --> Adding 145 samples to class 0.
>>> # Note that the number of rows has increased
>>> print(X)
     mean radius  mean texture  mean perimeter    mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...  worst texture  worst perimeter   worst area  worst smoothness  worst compactness  worst concavity  worst concave points  worst symmetry  worst fractal dimension
0      17.990000     10.380000      122.800000  1001.000000         0.118400          0.277600        0.300100             0.147100       0.241900  ...      17.330000       184.600000  2019.000000          0.162200           0.665600         0.711900              0.265400        0.460100                 0.118900
1      20.570000     17.770000      132.900000  1326.000000         0.084740          0.078640        0.086900             0.070170       0.181200  ...      23.410000       158.800000  1956.000000          0.123800           0.186600         0.241600              0.186000        0.275000                 0.089020
2      19.690000     21.250000      130.000000  1203.000000         0.109600          0.159900        0.197400             0.127900       0.206900  ...      25.530000       152.500000  1709.000000          0.144400           0.424500         0.450400              0.243000        0.361300                 0.087580
3      11.420000     20.380000       77.580000   386.100000         0.142500          0.283900        0.241400             0.105200       0.259700  ...      26.500000        98.870000   567.700000          0.209800           0.866300         0.686900              0.257500        0.663800                 0.173000
4      20.290000     14.340000      135.100000  1297.000000         0.100300          0.132800        0.198000             0.104300       0.180900  ...      16.670000       152.200000  1575.000000          0.137400           0.205000         0.400000              0.162500        0.236400                 0.076780
..           ...           ...             ...          ...              ...               ...             ...                  ...            ...  ...            ...              ...          ...               ...                ...              ...                   ...             ...                      ...
709    18.182301     24.944043      121.442258  1048.093046         0.105821          0.177739        0.225725             0.116642       0.193406  ...      33.361608       158.493074  1666.140527          0.144662           0.409220         0.565327              0.179898        0.294844                 0.095960
710    11.851902     18.713059       78.007613   441.464565         0.110844          0.151534        0.121607             0.051901       0.229830  ...      28.119008       119.347928   888.852911          0.163612           0.575805         0.692879              0.154555        0.475255                 0.139890
711    15.422292     24.668732      103.240756   745.189688         0.110060          0.171101        0.179765             0.089365       0.196931  ...      35.127849       144.459281  1269.764324          0.169552           0.580809         0.659512              0.202655        0.383515                 0.105496
712    15.550268     20.580991      103.085807   752.576384         0.115896          0.160149        0.194129             0.091891       0.196371  ...      29.636524       139.647722  1340.919647          0.171716           0.412331         0.592770              0.210292        0.320508                 0.106742
713    12.526830     23.731139       84.203000   480.830934         0.116830          0.228817        0.216860             0.082392       0.204680  ...      38.778068        99.031745   710.424650          0.183927           0.963409         1.018880              0.216792        0.419279                 0.192472
[714 rows x 30 columns]
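The number of added samples in the stand-alone example follows from the class counts. A minimal sketch, assuming the oversampler raises every class to the size of the largest one (the default behaviour of imbalanced-learn's over-samplers); the helper `samples_to_add` is hypothetical and not part of ATOM:

```python
from collections import Counter


def samples_to_add(y):
    """Return how many rows each class needs to match the majority class."""
    counts = Counter(y)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items() if n < target}


# Class distribution of the full breast cancer data set:
# 212 samples of class 0 and 357 samples of class 1
y = [0] * 212 + [1] * 357
extra = samples_to_add(y)
print(extra)                         # {0: 145}
print(len(y) + sum(extra.values()))  # 714
```

The result matches the log line "Adding 145 samples to class 0" and the 714 rows printed after balancing.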
Methods
| fit | Fit to data. | 
| fit_transform | Fit to data, then transform it. | 
| get_feature_names_out | Get output feature names for transformation. | 
| get_params | Get parameters for this estimator. | 
| inverse_transform | Do nothing. | 
| set_output | Set output container. | 
| set_params | Set the parameters of this estimator. | 
| transform | Balance the data. | 
method fit(X, y)
Fit to data.
| Parameters | X: dataframe-like. Feature set with shape=(n_samples, n_features). |
| | y: sequence. Target column corresponding to X. |
| Returns | Self. Estimator instance. |
method fit_transform(X=None, y=None, **fit_params)
Fit to data, then transform it.
method get_feature_names_out(input_features=None)
Get output feature names for transformation.
method get_params(deep=True)
Get parameters for this estimator.
| Parameters | deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators. |
| Returns | params : dict. Parameter names mapped to their values. |
method inverse_transform(X=None, y=None, **fit_params)
Do nothing.
Returns the input unchanged. Implemented for continuity of the API.
method set_output(transform=None)
Set output container.
See sklearn's user guide on how to use the set_output API. A description of the choices is given separately in the documentation.
method set_params(**params)
Set the parameters of this estimator.
| Parameters | **params : dict. Estimator parameters. |
| Returns | self : estimator instance. Estimator instance. |
method transform(X, y)
Balance the data.
| Parameters | X: dataframe-like. Feature set with shape=(n_samples, n_features). |
| | y: sequence. Target column corresponding to X. |
| Returns | dataframe. Balanced dataframe. |
| | series. Transformed target column. |
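The method contract listed above (fit returns the instance, transform returns both the features and the target, inverse_transform returns its input unchanged) can be illustrated with a toy stand-in. This is not ATOM's Balancer: `ToyBalancer` is a hypothetical class that works on plain lists and uses random duplication of minority rows instead of SMOTE or ADASYN, and its signatures are simplified.

```python
import random
from collections import Counter


class ToyBalancer:
    """Hypothetical stand-in for illustration: random oversampling on lists."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # Nothing is learned here; fit only returns the instance
        return self

    def transform(self, X, y):
        rng = random.Random(self.random_state)
        counts = Counter(y)
        target = max(counts.values())
        X_new, y_new = list(X), list(y)
        for label, n in counts.items():
            # Rows of this class are the candidates for duplication
            rows = [row for row, lab in zip(X, y) if lab == label]
            for _ in range(target - n):
                X_new.append(rng.choice(rows))
                y_new.append(label)
        # New rows are appended after the original ones
        return X_new, y_new

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X, y)

    def inverse_transform(self, X, y):
        # Do nothing: the input is returned unchanged
        return X, y


X = [[1.0], [1.2], [0.9], [5.0]]
y = [1, 1, 1, 0]
X_bal, y_bal = ToyBalancer(random_state=1).fit_transform(X, y)
print(sorted(Counter(y_bal).items()))  # [(0, 3), (1, 3)]
```

Because transform changes the number of rows, it has to return the target together with the features, which is why both X and y appear in its signature and in its return value.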