Regression¶
This example shows how to use ATOM to apply PCA to the data and run a regression pipeline.
Download the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal of this dataset is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
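If the local CSV is not available, the raw data can be fetched straight from the UCI repository. The raw file has no header row, so the column names have to be supplied by hand; the exact URL and column order below are assumptions based on UCI's standard layout and the dataset description.

# Hypothetical fetch of the raw data from UCI (the URL and the header-less
# layout are assumptions; adjust if the repository structure has changed)
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
columns = [
    "Sex", "Length", "Diameter", "Height", "Whole weight",
    "Shucked weight", "Viscera weight", "Shell weight", "Rings",
]
X = pd.read_csv(url, header=None, names=columns)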
Load the data¶
In [1]:
# Import packages
import pandas as pd
from atom import ATOMRegressor
In [2]:
# Load the data
X = pd.read_csv("./datasets/abalone.csv")
# Let's have a look
X.head()
Out[2]:
|   | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
In [3]:
# Initialize atom for regression tasks
atom = ATOMRegressor(X, "Rings", verbose=2, warnings=False, random_state=42)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ====================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 192 (0.6%)
---------------------------------------
Train set size: 3342
Test set size: 835
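The 80/20 split reported above can be checked directly. A minimal sketch, assuming the train and test DataFrame properties that ATOM attaches to the trainer:

# Sanity-check the split sizes from the log (train/test properties are assumed)
print(atom.train.shape)  # expected: (3342, 9)
print(atom.test.shape)   # expected: (835, 9)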
In [4]:
# Encode the categorical features
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
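Conceptually, the step above replaces the three-class Sex column with one binary column per class. A sketch of the same transformation in plain pandas, purely for illustration (this is not ATOM's internal implementation):

# Illustration only: one-hot encoding Sex with pandas
import pandas as pd

dummies = pd.get_dummies(X["Sex"], prefix="Sex")  # columns Sex_F, Sex_I, Sex_M
X_encoded = pd.concat([X.drop(columns="Sex"), dummies], axis=1)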
In [5]:
# Plot the dataset's correlation matrix
atom.plot_correlation()
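The same correlations can also be computed numerically with pandas. A sketch, assuming the dataset property that exposes the current (encoded) data as a DataFrame:

# Numeric counterpart of the correlation plot (dataset property is assumed)
corr = atom.dataset.corr()
print(corr["Rings"].sort_values(ascending=False))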
In [6]:
# Apply PCA for dimensionality reduction
atom.feature_selection(strategy="pca", n_features=6)
Fitting FeatureSelector...
Performing feature selection ...
 --> Applying Principal Component Analysis...
   >>> Scaling features...
   >>> Total explained variance: 0.976
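For reference, the scale-then-project step logged above can be reproduced with scikit-learn directly. A minimal sketch, using the hypothetical X_encoded frame from the earlier encoding illustration:

# Stand-alone scikit-learn equivalent of the PCA step
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pca = make_pipeline(StandardScaler(), PCA(n_components=6))
components = pca.fit_transform(X_encoded.drop(columns="Rings"))
print(pca[-1].explained_variance_ratio_.sum())  # should be close to 0.976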
In [7]:
# Note that the features are automatically renamed to Component 1, 2, etc...
atom.columns
Out[7]:
['Component 1', 'Component 2', 'Component 3', 'Component 4', 'Component 5', 'Component 6', 'Rings']
In [8]:
# Use the plotting methods to see the retained variance ratio
atom.plot_pca()
In [9]:
atom.plot_components(figsize=(8, 6))
Run the pipeline¶
In [10]:
atom.run(
models=["Tree", "Bag", "ET"],
metric="MSE",
n_calls=5,
n_initial_points=2,
bo_params={"base_estimator": "GBRT", "cv": 1},
n_bootstrap=5,
)
Training ===================================== >>
Models: Tree, Bag, ET
Metric: neg_mean_squared_error
Running BO for Decision Tree...
Initial point 1 ---------------------------------
Parameters --> {'criterion': 'mae', 'splitter': 'random', 'max_depth': 7, 'min_samples_split': 8, 'min_samples_leaf': 19, 'max_features': None, 'ccp_alpha': 0.016}
Evaluation --> neg_mean_squared_error: -8.3677 Best neg_mean_squared_error: -8.3677
Time iteration: 0.048s Total time: 0.054s
Initial point 2 ---------------------------------
Parameters --> {'criterion': 'mae', 'splitter': 'best', 'max_depth': 6, 'min_samples_split': 3, 'min_samples_leaf': 12, 'max_features': 0.9, 'ccp_alpha': 0.0}
Evaluation --> neg_mean_squared_error: -8.2055 Best neg_mean_squared_error: -8.2055
Time iteration: 0.198s Total time: 0.406s
Iteration 3 -------------------------------------
Parameters --> {'criterion': 'mae', 'splitter': 'best', 'max_depth': 6, 'min_samples_split': 14, 'min_samples_leaf': 9, 'max_features': 0.9, 'ccp_alpha': 0.005}
Evaluation --> neg_mean_squared_error: -6.1540 Best neg_mean_squared_error: -6.1540
Time iteration: 0.180s Total time: 0.741s
Iteration 4 -------------------------------------
Parameters --> {'criterion': 'mae', 'splitter': 'random', 'max_depth': 7, 'min_samples_split': 15, 'min_samples_leaf': 4, 'max_features': 0.7, 'ccp_alpha': 0.018}
Evaluation --> neg_mean_squared_error: -7.9567 Best neg_mean_squared_error: -6.1540
Time iteration: 0.077s Total time: 0.988s
Iteration 5 -------------------------------------
Parameters --> {'criterion': 'mae', 'splitter': 'best', 'max_depth': 6, 'min_samples_split': 14, 'min_samples_leaf': 5, 'max_features': 0.9, 'ccp_alpha': 0.009}
Evaluation --> neg_mean_squared_error: -7.1330 Best neg_mean_squared_error: -6.1540
Time iteration: 0.181s Total time: 1.341s
Results for Decision Tree:
Bayesian Optimization ---------------------------
Best parameters --> {'criterion': 'mae', 'splitter': 'best', 'max_depth': 6, 'min_samples_split': 14, 'min_samples_leaf': 9, 'max_features': 0.9, 'ccp_alpha': 0.005}
Best evaluation --> neg_mean_squared_error: -6.154
Time elapsed: 1.494s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.3073
Test evaluation --> neg_mean_squared_error: -5.5317
Time elapsed: 0.327s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.678 ± 0.2464
Time elapsed: 1.142s
-------------------------------------------------
Total time: 2.967s
Running BO for Bagging Regressor...
Initial point 1 ---------------------------------
Parameters --> {'n_estimators': 112, 'max_samples': 0.9, 'max_features': 0.6, 'bootstrap': False, 'bootstrap_features': False}
Evaluation --> neg_mean_squared_error: -5.7680 Best neg_mean_squared_error: -5.7680
Time iteration: 0.949s Total time: 0.953s
Initial point 2 ---------------------------------
Parameters --> {'n_estimators': 131, 'max_samples': 0.5, 'max_features': 0.5, 'bootstrap': False, 'bootstrap_features': False}
Evaluation --> neg_mean_squared_error: -6.8254 Best neg_mean_squared_error: -5.7680
Time iteration: 0.660s Total time: 1.642s
Iteration 3 -------------------------------------
Parameters --> {'n_estimators': 50, 'max_samples': 0.9, 'max_features': 0.6, 'bootstrap': False, 'bootstrap_features': True}
Evaluation --> neg_mean_squared_error: -5.4895 Best neg_mean_squared_error: -5.4895
Time iteration: 0.479s Total time: 2.253s
Iteration 4 -------------------------------------
Parameters --> {'n_estimators': 74, 'max_samples': 0.5, 'max_features': 0.5, 'bootstrap': False, 'bootstrap_features': True}
Evaluation --> neg_mean_squared_error: -6.0363 Best neg_mean_squared_error: -5.4895
Time iteration: 0.349s Total time: 2.749s
Iteration 5 -------------------------------------
Parameters --> {'n_estimators': 36, 'max_samples': 0.9, 'max_features': 0.6, 'bootstrap': True, 'bootstrap_features': False}
Evaluation --> neg_mean_squared_error: -6.0037 Best neg_mean_squared_error: -5.4895
Time iteration: 0.242s Total time: 3.133s
Results for Bagging Regressor:
Bayesian Optimization ---------------------------
Best parameters --> {'n_estimators': 50, 'max_samples': 0.9, 'max_features': 0.6, 'bootstrap': False, 'bootstrap_features': True}
Best evaluation --> neg_mean_squared_error: -5.4895
Time elapsed: 3.277s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -0.0867
Test evaluation --> neg_mean_squared_error: -4.9533
Time elapsed: 0.578s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.2363 ± 0.1099
Time elapsed: 2.335s
-------------------------------------------------
Total time: 6.190s
Running BO for Extra-Trees...
Initial point 1 ---------------------------------
Parameters --> {'n_estimators': 112, 'criterion': 'mae', 'max_depth': 1, 'min_samples_split': 9, 'min_samples_leaf': 7, 'max_features': 0.6, 'bootstrap': True, 'ccp_alpha': 0.016, 'max_samples': 0.6}
Evaluation --> neg_mean_squared_error: -10.2607 Best neg_mean_squared_error: -10.2607
Time iteration: 0.374s Total time: 0.382s
Initial point 2 ---------------------------------
Parameters --> {'n_estimators': 369, 'criterion': 'mae', 'max_depth': None, 'min_samples_split': 3, 'min_samples_leaf': 12, 'max_features': 0.9, 'bootstrap': True, 'ccp_alpha': 0.035, 'max_samples': 0.8}
Evaluation --> neg_mean_squared_error: -9.4727 Best neg_mean_squared_error: -9.4727
Time iteration: 5.091s Total time: 5.509s
Iteration 3 -------------------------------------
Parameters --> {'n_estimators': 385, 'criterion': 'mse', 'max_depth': None, 'min_samples_split': 6, 'min_samples_leaf': 18, 'max_features': 0.9, 'bootstrap': False, 'ccp_alpha': 0.02}
Evaluation --> neg_mean_squared_error: -5.5174 Best neg_mean_squared_error: -5.5174
Time iteration: 0.537s Total time: 6.412s
Iteration 4 -------------------------------------
Parameters --> {'n_estimators': 425, 'criterion': 'mse', 'max_depth': 1, 'min_samples_split': 20, 'min_samples_leaf': 19, 'max_features': 0.7, 'bootstrap': False, 'ccp_alpha': 0.016}
Evaluation --> neg_mean_squared_error: -9.1980 Best neg_mean_squared_error: -5.5174
Time iteration: 0.315s Total time: 6.905s
Iteration 5 -------------------------------------
Parameters --> {'n_estimators': 445, 'criterion': 'mse', 'max_depth': None, 'min_samples_split': 7, 'min_samples_leaf': 20, 'max_features': 0.6, 'bootstrap': False, 'ccp_alpha': 0.004}
Evaluation --> neg_mean_squared_error: -6.9959 Best neg_mean_squared_error: -5.5174
Time iteration: 0.430s Total time: 7.522s
Results for Extra-Trees:
Bayesian Optimization ---------------------------
Best parameters --> {'n_estimators': 385, 'criterion': 'mse', 'max_depth': None, 'min_samples_split': 6, 'min_samples_leaf': 18, 'max_features': 0.9, 'bootstrap': False, 'ccp_alpha': 0.02}
Best evaluation --> neg_mean_squared_error: -5.5174
Time elapsed: 7.732s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.1021
Test evaluation --> neg_mean_squared_error: -5.0002
Time elapsed: 0.708s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -4.9204 ± 0.0591
Time elapsed: 3.270s
-------------------------------------------------
Total time: 11.711s
Final results ========================= >>
Duration: 20.870s
------------------------------------------
Decision Tree --> neg_mean_squared_error: -5.678 ± 0.2464 ~
Bagging Regressor --> neg_mean_squared_error: -5.2363 ± 0.1099 ~
Extra-Trees --> neg_mean_squared_error: -4.9204 ± 0.0591 ~ !
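The fitted models can now be compared programmatically. A sketch, assuming the results table and winner shortcut that ATOM exposes after a run:

# Compare models after training (results and winner are assumed attributes)
print(atom.results)      # one row per model with train/test/bootstrap scores
print(atom.winner.name)  # best model on the chosen metric; here ET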
Analyze the results¶
In [11]:
# Use the errors or residuals plots to check the model performances
atom.plot_residuals()
In [12]:
atom.plot_errors()
In [13]:
# Analyze the relation between the target response and the features
atom.n_jobs = 8 # The method can be slow...
atom.ET.plot_partial_dependence(features=(0, 1, (2, 3)), figsize=(12, 5))
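Finally, the winning model can be used for predictions on held-out data. A sketch, assuming the predict method and the X_test property behave as in recent ATOM versions:

# Predict the number of rings for the test set with the Extra-Trees model
predictions = atom.ET.predict(atom.X_test)
print(predictions[:5])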