Example: Regression

This example shows how to use ATOM to apply PCA to the data and run a regression pipeline.

Download the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.

Load the data

In [1]:

# Import packages
import pandas as pd
from atom import ATOMRegressor

UserWarning: The pandas version installed (1.5.3) does not match the supported pandas version in Modin (1.5.2). This may cause undesired side effects!

In [2]:

# Load the data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()

Out[2]:

	Sex	Length	Diameter	Height	Whole weight	Shucked weight	Viscera weight	Shell weight	Rings
0	M	0.455	0.365	0.095	0.5140	0.2245	0.1010	0.150	15
1	M	0.350	0.265	0.090	0.2255	0.0995	0.0485	0.070	7
2	F	0.530	0.420	0.135	0.6770	0.2565	0.1415	0.210	9
3	M	0.440	0.365	0.125	0.5160	0.2155	0.1140	0.155	10
4	I	0.330	0.255	0.080	0.2050	0.0895	0.0395	0.055	7

In [3]:

# Initialize atom for regression tasks
atom = ATOMRegressor(X, "Rings", verbose=2, random_state=42)

<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 195 (0.6%)
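
The split sizes and the categorical share in the stats above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 20% hold-out with the test size rounded down; that fraction is an assumption inferred from the printed sizes, not something stated in the output.

```python
# Figures taken from the dataset stats printed above.
n_rows, n_cols = 4177, 9

# Assumption: 20% of the rows are held out, rounded down.
test_fraction = 0.2
n_test = int(test_fraction * n_rows)
n_train = n_rows - n_test

# One of the nine columns is the target (Rings), leaving 8 features.
n_features = n_cols - 1
categorical_share = 1 / n_features * 100  # one categorical feature: Sex

print(n_train, n_test)               # 3342 835
print(f"{categorical_share:.1f}%")   # 12.5%
```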

In [4]:

# Encode the categorical features
atom.encode()

Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
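
To make the encoding step concrete, here is a minimal pure-Python sketch of one-hot encoding applied to the Sex values of the five rows shown earlier. It is not ATOM's Encoder; the helper `one_hot` and the column names such as `Sex_F` are hypothetical and only illustrate how one column with 3 classes becomes three 0/1 indicator columns.

```python
def one_hot(values, prefix):
    """Return one dict per row with a 0/1 indicator for every class."""
    classes = sorted(set(values))
    return [
        {f"{prefix}_{cls}": int(value == cls) for cls in classes}
        for value in values
    ]


# Sex values of the five rows printed by X.head()
sex = ["M", "M", "F", "M", "I"]
encoded = one_hot(sex, prefix="Sex")

print(encoded[0])  # {'Sex_F': 0, 'Sex_I': 0, 'Sex_M': 1}
print(encoded[2])  # {'Sex_F': 1, 'Sex_I': 0, 'Sex_M': 0}
```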

In [5]:

# Plot the dataset's correlation matrix
atom.plot_correlation()

In [6]:

# Apply PCA for dimensionality reduction
atom.feature_selection(strategy="pca", n_features=6)

Fitting FeatureSelector...
Performing feature selection ...
 --> Applying Principal Component Analysis...
   --> Scaling features...
   --> Keeping 6 components.
   --> Explained variance ratio: 0.97
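
The explained variance ratio reported above is the share of the total variance captured by the components that are kept. The sketch below shows that calculation on made-up per-component variances; the numbers are toy values chosen for easy arithmetic and are not taken from the abalone data, and `explained_variance_ratio` is a hypothetical helper, not an ATOM function.

```python
def explained_variance_ratio(variances, n_components):
    """Share of the total variance captured by the largest n_components."""
    ordered = sorted(variances, reverse=True)
    return sum(ordered[:n_components]) / sum(ordered)


# Toy per-component variances (illustrative only, not from the dataset).
toy_variances = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]

ratio = explained_variance_ratio(toy_variances, n_components=3)
print(ratio)  # 0.875, i.e. (4 + 2 + 1) / 8
```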

In [7]:

# Note that the features are automatically renamed to pca0, pca1, etc.
atom.columns

Out[7]:

Index(['pca0', 'pca1', 'pca2', 'pca3', 'pca4', 'pca5', 'Rings'], dtype='object')

In [8]:

# Use the plotting methods to see the retained variance ratio
atom.plot_pca()

In [9]:

atom.plot_components()

Run the pipeline

In [10]:

atom.run(
    models=["Tree", "Bag", "ET"],
    metric="mse",
    n_trials=5,
    n_bootstrap=5,
)

Training ========================= >>
Models: Tree, Bag, ET
Metric: neg_mean_squared_error


Running hyperparameter tuning for DecisionTree...
| trial |   criterion | splitter | max_depth | min_samples_split | min_samples_leaf | max_features | ccp_alpha | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht |    state |
| ----- | ----------- | -------- | --------- | ----------------- | ---------------- | ------------ | --------- | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0     | absolute_.. |     best |         5 |                 8 |               10 |         None |     0.035 |                -6.5456 |                     -6.5456 |     0.303s |  0.303s | COMPLETE |
| 1     | squared_e.. |     best |        10 |                 5 |                1 |          0.5 |      0.03 |                -7.1959 |                     -6.5456 |     0.064s |  0.367s | COMPLETE |
| 2     | absolute_.. |   random |        14 |                15 |               16 |         sqrt |     0.025 |                -8.5859 |                     -6.5456 |     0.073s |  0.440s | COMPLETE |
| 3     | friedman_.. |   random |         4 |                10 |               17 |          0.9 |      0.01 |                -7.4933 |                     -6.5456 |     0.065s |  0.505s | COMPLETE |
| 4     |     poisson |     best |        12 |                15 |                8 |          0.6 |      0.02 |                -5.8126 |                     -5.8126 |     0.074s |  0.579s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 4
Best parameters:
 --> criterion: poisson
 --> splitter: best
 --> max_depth: 12
 --> min_samples_split: 15
 --> min_samples_leaf: 8
 --> max_features: 0.6
 --> ccp_alpha: 0.02
Best evaluation --> neg_mean_squared_error: -5.8126
Time elapsed: 0.579s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.2977
Test evaluation --> neg_mean_squared_error: -7.1923
Time elapsed: 0.025s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -7.6026 ± 0.3783
Time elapsed: 0.086s
-------------------------------------------------
Total time: 0.690s


Running hyperparameter tuning for Bagging...
| trial | n_estimators | max_samples | max_features | bootstrap | bootstrap_features | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht |    state |
| ----- | ------------ | ----------- | ------------ | --------- | ------------------ | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0     |          190 |         1.0 |          0.9 |      True |               True |                -4.5751 |                     -4.5751 |     2.829s |  2.829s | COMPLETE |
| 1     |          440 |         0.8 |          0.9 |     False |               True |                -6.7839 |                     -4.5751 |     8.532s | 11.360s | COMPLETE |
| 2     |          100 |         0.6 |          0.6 |      True |              False |                -5.0065 |                     -4.5751 |     0.854s | 12.214s | COMPLETE |
| 3     |           70 |         0.6 |          0.7 |     False |              False |                -5.4027 |                     -4.5751 |     0.952s | 13.166s | COMPLETE |
| 4     |          300 |         0.5 |          0.8 |      True |              False |                -5.0964 |                     -4.5751 |     2.610s | 15.776s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 0
Best parameters:
 --> n_estimators: 190
 --> max_samples: 1.0
 --> max_features: 0.9
 --> bootstrap: True
 --> bootstrap_features: True
Best evaluation --> neg_mean_squared_error: -4.5751
Time elapsed: 15.776s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -0.7581
Test evaluation --> neg_mean_squared_error: -5.7896
Time elapsed: 3.644s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.9893 ± 0.1646
Time elapsed: 14.288s
-------------------------------------------------
Total time: 33.709s


Running hyperparameter tuning for ExtraTrees...
| trial | n_estimators |     criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | max_samples | ccp_alpha | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht |    state |
| ----- | ------------ | ------------- | --------- | ----------------- | ---------------- | ------------ | --------- | ----------- | --------- | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0     |          190 | squared_error |         8 |                13 |                3 |          0.5 |      True |         0.6 |     0.025 |                -5.1462 |                     -5.1462 |     0.326s |  0.326s | COMPLETE |
| 1     |          230 | absolute_er.. |         8 |                 8 |                8 |         sqrt |      True |         0.6 |       0.0 |                -9.3444 |                     -5.1462 |     1.419s |  1.746s | COMPLETE |
| 2     |          180 | absolute_er.. |         7 |                 2 |                3 |          0.6 |      True |         0.6 |      0.03 |                -5.7371 |                     -5.1462 |     1.767s |  3.512s | COMPLETE |
| 3     |          100 | squared_error |        14 |                15 |                8 |         None |      True |         0.9 |     0.005 |                -5.1938 |                     -5.1462 |     0.279s |  3.792s | COMPLETE |
| 4     |          340 | squared_error |         6 |                15 |                8 |         None |      True |         0.8 |      0.01 |                -4.8716 |                     -4.8716 |     0.557s |  4.348s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 4
Best parameters:
 --> n_estimators: 340
 --> criterion: squared_error
 --> max_depth: 6
 --> min_samples_split: 15
 --> min_samples_leaf: 8
 --> max_features: None
 --> bootstrap: True
 --> max_samples: 0.8
 --> ccp_alpha: 0.01
Best evaluation --> neg_mean_squared_error: -4.8716
Time elapsed: 4.348s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -5.4808
Test evaluation --> neg_mean_squared_error: -6.3445
Time elapsed: 0.563s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -6.3694 ± 0.0737
Time elapsed: 2.491s
-------------------------------------------------
Total time: 7.402s


Final results ==================== >>
Total time: 42.254s
-------------------------------------
DecisionTree --> neg_mean_squared_error: -7.6026 ± 0.3783 ~
Bagging      --> neg_mean_squared_error: -5.9893 ± 0.1646 ~ !
ExtraTrees   --> neg_mean_squared_error: -6.3694 ± 0.0737 ~
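
The `±` values in the final results come from the bootstrap step (`n_bootstrap=5`). The following is a rough sketch of the general idea, not ATOM's implementation: resample the training targets with replacement, refit a model on each resample, score it on the test set, and report the mean and standard deviation of the scores. The mean predictor, the toy target values and the use of the sample standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def neg_mse(y_true, y_pred):
    """Negated mean squared error, so that higher is better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_scores(y_train, y_test, n_bootstrap, seed):
    """Score a toy mean predictor fitted on resampled training targets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_bootstrap):
        # Draw a resample of the same size, with replacement.
        sample = rng.choices(y_train, k=len(y_train))
        prediction = statistics.fmean(sample)  # "fit" the toy model
        scores.append(neg_mse(y_test, [prediction] * len(y_test)))
    return scores


# Toy ring counts (illustrative values only).
y_train = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19]
y_test = [14, 10, 11, 10]

scores = bootstrap_scores(y_train, y_test, n_bootstrap=5, seed=42)
mean_score = statistics.fmean(scores)
std_score = statistics.stdev(scores)
print(f"neg_mean_squared_error: {mean_score:.4f} ± {std_score:.4f}")
```

Because the metric is the negated MSE, a score closer to zero is better. Bagging's -5.9893, for example, corresponds to an MSE of 5.9893, or an RMSE of roughly 2.45 rings.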

Analyze the results

In [11]:

# Use the residuals and errors plots to check the models' performance
atom.plot_residuals()
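
As a reminder of what a residuals plot displays, the sketch below computes residuals by hand. The true values are the Rings of the five rows shown earlier; the predictions are hypothetical numbers, and the sign convention (true minus predicted) is an assumption that may differ from the one used by the plot.

```python
def residuals(y_true, y_pred):
    """Difference between true and predicted values, one per sample."""
    return [t - p for t, p in zip(y_true, y_pred)]


y_true = [15, 7, 9, 10, 7]            # Rings of the rows in X.head()
y_pred = [12.0, 8.0, 9.5, 10.0, 6.0]  # hypothetical model predictions

res = residuals(y_true, y_pred)
mean_residual = sum(res) / len(res)

print(res)            # [3.0, -1.0, -0.5, 0.0, 1.0]
print(mean_residual)  # 0.5
```

Residuals scattered evenly around zero suggest an unbiased model; a mean far from zero or a visible trend against the predicted value points to systematic error.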

In [12]:

atom.plot_errors()

In [13]:

# Analyze the relation between the target response and the features
atom.plot_partial_dependence(columns=(0, 1, 2, 3))
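
Partial dependence measures how the average prediction changes when one feature is forced to a given value for every row. Here is a minimal sketch of that computation under simple assumptions: `partial_dependence` and `toy_model` are hypothetical helpers, and the rows are toy data rather than the PCA components used above.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`."""
    averages = []
    for value in grid:
        # Overwrite the chosen feature in every row, keep the rest as is.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        predictions = [predict(row) for row in modified]
        averages.append(sum(predictions) / len(predictions))
    return averages


def toy_model(row):
    """Hypothetical fitted model: a fixed linear function of two features."""
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(curve)  # [3.0, 5.0, 7.0]
```

For this linear toy model the curve rises by 2.0 per unit of the first feature, which matches its coefficient. For tree-based models the curve is typically a step function.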