Regression¶
This example shows how to use ATOM to apply PCA on the data and run a regression pipeline.
Download the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal is to predict the number of rings (a proxy for age) of abalone shells from physical measurements.
Load the data¶
In [14]:

# Import packages
import pandas as pd
from atom import ATOMRegressor
In [15]:

# Load the data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()
Out[15]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
In [16]:

# Initialize atom for regression tasks
atom = ATOMRegressor(X, "Rings", verbose=2, warnings=False, random_state=42)
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 192 (0.6%)
-------------------------------------
Train set size: 3342
Test set size: 835
In [17]:

# Encode the categorical features
atom.encode()
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
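Because Sex has only three classes (M, F, I), ATOM falls back to one-hot encoding: one binary column per class. The same transformation can be sketched directly with pandas, independent of ATOM (the mini-frame below is a made-up stand-in for the real column):

```python
import pandas as pd

# Hypothetical mini-frame mimicking the abalone Sex column
df = pd.DataFrame({"Sex": ["M", "F", "I", "M"]})

# One-hot encode: one binary indicator column per class
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())  # ['Sex_F', 'Sex_I', 'Sex_M']
```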
In [18]:

# Plot the dataset's correlation matrix
atom.plot_correlation()
In [19]:

# Apply PCA for dimensionality reduction
atom.feature_selection(strategy="pca", n_features=6)
Fitting FeatureSelector...
Performing feature selection ...
 --> Applying Principal Component Analysis...
   >>> Scaling features...
   >>> Explained variance ratio: 0.976
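Note the automatic scaling step: PCA is sensitive to feature scales, so ATOM standardizes the data before fitting. The two steps can be sketched with plain scikit-learn (synthetic data here, just to show the mechanics):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))  # stand-in for the encoded abalone features

# Standardize first, since PCA is scale-sensitive (ATOM does this for you)
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=6).fit(X_scaled)

# Fraction of the total variance retained by the 6 components
print(round(pca.explained_variance_ratio_.sum(), 3))
```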
In [20]:

# Note that the features are automatically renamed to component 1, 2, etc.
atom.columns
Out[20]:
['component 1', 'component 2', 'component 3', 'component 4', 'component 5', 'component 6', 'Rings']
In [21]:

# Use the plotting methods to see the retained variance ratio
atom.plot_pca()
In [22]:

atom.plot_components(figsize=(8, 6))
Run the pipeline¶
In [23]:

atom.run(
    models=["Tree", "Bag", "ET"],
    metric="MSE",
    n_calls=5,
    n_initial_points=2,
    bo_params={"base_estimator": "GBRT"},
    n_bootstrap=5,
)
Training ========================= >>
Models: Tree, Bag, ET
Metric: neg_mean_squared_error

Running BO for Decision Tree...

| call | criterion | splitter | max_depth | min_samples_split | min_samples_leaf | max_features | ccp_alpha | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | absolute_.. | random | 7 | 8 | 19 | auto | 0.0161 | -7.725 | -7.725 | 0.109s | 0.125s |
| Initial point 2 | absolute_.. | best | 6 | 3 | 12 | None | 0.0 | -8.3733 | -7.725 | 0.219s | 0.469s |
| Iteration 3 | poisson | random | 7 | 7 | 17 | auto | 0.0177 | -8.2151 | -7.725 | 0.047s | 0.703s |
| Iteration 4 | poisson | random | 6 | 9 | 19 | None | 0.0018 | -9.7263 | -7.725 | 0.047s | 0.922s |
| Iteration 5 | friedman_.. | random | 7 | 10 | 19 | auto | 0.0093 | -7.0726 | -7.0726 | 0.047s | 1.156s |

Bayesian Optimization ---------------------------
Best call --> Iteration 5
Best parameters --> {'criterion': 'friedman_mse', 'splitter': 'random', 'max_depth': 7, 'min_samples_split': 10, 'min_samples_leaf': 19, 'max_features': 'auto', 'ccp_alpha': 0.0093}
Best evaluation --> neg_mean_squared_error: -7.0726
Time elapsed: 1.344s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.3272
Test evaluation --> neg_mean_squared_error: -5.4156
Time elapsed: 0.016s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.4807 ± 0.1807
Time elapsed: 0.031s
-------------------------------------------------
Total time: 1.391s

Running BO for Bagging...

| call | n_estimators | max_samples | max_features | bootstrap | bootstrap_features | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | 112 | 0.9 | 0.6 | False | False | -6.007 | -6.007 | 1.172s | 1.172s |
| Initial point 2 | 131 | 0.5 | 0.5 | False | False | -7.047 | -6.007 | 0.828s | 2.125s |
| Iteration 3 | 50 | 0.9 | 0.6 | False | True | -5.3357 | -5.3357 | 0.484s | 2.828s |
| Iteration 4 | 74 | 0.5 | 0.5 | False | True | -6.1811 | -5.3357 | 0.391s | 3.766s |
| Iteration 5 | 18 | 0.8 | 0.6 | True | False | -6.3648 | -5.3357 | 0.159s | 4.160s |

Bayesian Optimization ---------------------------
Best call --> Iteration 3
Best parameters --> {'n_estimators': 50, 'max_samples': 0.9, 'max_features': 0.6, 'bootstrap': False, 'bootstrap_features': True}
Best evaluation --> neg_mean_squared_error: -5.3357
Time elapsed: 4.332s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -0.0867
Test evaluation --> neg_mean_squared_error: -4.9533
Time elapsed: 0.563s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.1436 ± 0.0702
Time elapsed: 2.578s
-------------------------------------------------
Total time: 7.472s

Running BO for Extra-Trees...

| call | n_estimators | criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | ccp_alpha | max_samples | neg_mean_squared_error | best_neg_mean_squared_error | time | total_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Initial point 1 | 112 | absolute_er.. | 1 | 9 | 7 | 0.6 | True | 0.0161 | 0.6 | -10.2144 | -10.2144 | 0.406s | 0.422s |
| Initial point 2 | 369 | absolute_er.. | None | 3 | 12 | None | True | 0.0347 | 0.7 | -9.2503 | -9.2503 | 5.063s | 5.563s |
| Iteration 3 | 383 | absolute_er.. | None | 6 | 20 | None | True | 0.0271 | 0.8 | -6.4866 | -6.4866 | 5.989s | 11.724s |
| Iteration 4 | 412 | absolute_er.. | 1 | 13 | 17 | 0.6 | True | 0.0282 | 0.8 | -10.0739 | -6.4866 | 1.866s | 13.862s |
| Iteration 5 | 125 | squared_error | None | 14 | 20 | None | True | 0.0156 | 0.9 | -6.2858 | -6.2858 | 0.220s | 14.291s |

Bayesian Optimization ---------------------------
Best call --> Iteration 5
Best parameters --> {'n_estimators': 125, 'criterion': 'squared_error', 'max_depth': None, 'min_samples_split': 14, 'min_samples_leaf': 20, 'max_features': None, 'bootstrap': True, 'ccp_alpha': 0.0156, 'max_samples': 0.9}
Best evaluation --> neg_mean_squared_error: -6.2858
Time elapsed: 14.482s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.0139
Test evaluation --> neg_mean_squared_error: -4.9885
Time elapsed: 0.206s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -4.984 ± 0.0229
Time elapsed: 0.998s
-------------------------------------------------
Total time: 15.686s

Final results ==================== >>
Duration: 24.549s
-------------------------------------
Decision Tree --> neg_mean_squared_error: -5.4807 ± 0.1807 ~
Bagging --> neg_mean_squared_error: -5.1436 ± 0.0702 ~
Extra-Trees --> neg_mean_squared_error: -4.984 ± 0.0229 ~ !
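The scores are negative because scikit-learn scorers always maximize, so MSE (where lower is better) is reported as neg_mean_squared_error: values closer to 0 are better. A tiny sketch with made-up predictions:

```python
from sklearn.metrics import mean_squared_error

# Hypothetical ring counts: true values vs. model predictions
y_true = [10, 7, 9]
y_pred = [9, 8, 9]

mse = mean_squared_error(y_true, y_pred)  # 2/3: lower is better
# The negated score is what the training log reports: higher (closer to 0) is better
print(-mse)
```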
Analyze the results¶
In [24]:

# Use the errors or residuals plots to check the models' performance
atom.plot_residuals()
In [25]:

atom.plot_errors()
In [26]:

# Analyze the relation between the target response and the features
atom.n_jobs = 8  # The method can be slow...
atom.ET.plot_partial_dependence(columns=(0, (2, 3)), figsize=(12, 5))