Example: Regression
This example shows how to use ATOM to apply PCA (principal component analysis) to the data and run a regression pipeline.
Download the abalone dataset from https://archive.ics.uci.edu/ml/datasets/Abalone. The goal is to predict the number of rings of abalone shells, which indicates their age, from physical measurements.
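The notebook reads a local `./datasets/abalone.csv` that already has a header row. The raw `abalone.data` file on UCI has no header, so the column names have to be attached when loading it. Below is a minimal standard-library sketch of that step. The helper `load_abalone` is hypothetical, and the column order is the one listed in the UCI description (the same order as in the `X.head()` table further below). The UCI description also notes that the ring count plus 1.5 gives the age in years.

```python
import csv
import io

# Column order as listed in the UCI description (the raw file has no header row)
COLUMNS = [
    "Sex", "Length", "Diameter", "Height", "Whole weight",
    "Shucked weight", "Viscera weight", "Shell weight", "Rings",
]


def load_abalone(text):
    """Parse raw abalone.data content into a list of dicts (hypothetical helper)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        for name in COLUMNS[1:-1]:
            record[name] = float(record[name])  # physical measurements
        record["Rings"] = int(record["Rings"])  # regression target
        records.append(record)
    return records


sample = (
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7\n"
)
rows = load_abalone(sample)
# Ring count + 1.5 = age in years, per the UCI description
print(rows[0]["Sex"], rows[0]["Rings"], rows[0]["Rings"] + 1.5)  # prints: M 15 16.5
```

With pandas, `pd.read_csv("abalone.data", header=None, names=COLUMNS)` achieves the same in one call.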
Load the data
In [1]:

```python
# Import packages
import pandas as pd
from atom import ATOMRegressor
```

```text
UserWarning: The pandas version installed (1.5.3) does not match the supported pandas version in Modin (1.5.2). This may cause undesired side effects!
```
In [2]:

```python
# Load the data
X = pd.read_csv("./datasets/abalone.csv")

# Let's have a look
X.head()
```

Out[2]:
| | Sex | Length | Diameter | Height | Whole weight | Shucked weight | Viscera weight | Shell weight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.5140 | 0.2245 | 0.1010 | 0.150 | 15 |
| 1 | M | 0.350 | 0.265 | 0.090 | 0.2255 | 0.0995 | 0.0485 | 0.070 | 7 |
| 2 | F | 0.530 | 0.420 | 0.135 | 0.6770 | 0.2565 | 0.1415 | 0.210 | 9 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 4 | I | 0.330 | 0.255 | 0.080 | 0.2050 | 0.0895 | 0.0395 | 0.055 | 7 |
In [3]:

```python
# Initialize atom for regression tasks
atom = ATOMRegressor(X, "Rings", verbose=2, random_state=42)
```

```text
<< ================== ATOM ================== >>
Algorithm task: regression.

Dataset stats ==================== >>
Shape: (4177, 9)
Train set size: 3342
Test set size: 835
-------------------------------------
Memory: 509.72 kB
Scaled: False
Categorical features: 1 (12.5%)
Outlier values: 195 (0.6%)
```
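The split sizes reported above are consistent with holding out 20% of the 4177 rows (4177 × 0.2 = 835.4, giving 835 test rows and 3342 training rows). The plain-Python sketch below illustrates why fixing `random_state` makes such a split reproducible. The 0.2 fraction, the truncating rounding and the helper `seeded_split` are assumptions for illustration, not ATOM's actual splitting code.

```python
import random


def seeded_split(n_rows, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) for a reproducible random split (hypothetical helper)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_test = int(n_rows * test_size)      # assumed rounding: truncate toward zero
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = seeded_split(4177)
print(len(train_idx), len(test_idx))  # prints: 3342 835

# Re-running with the same seed reproduces exactly the same split
assert seeded_split(4177) == seeded_split(4177)
```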
In [4]:

```python
# Encode the categorical features
atom.encode()
```

```text
Fitting Encoder...
Encoding categorical columns...
 --> OneHot-encoding feature Sex. Contains 3 classes.
```
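One-hot encoding replaces the single `Sex` column with one binary column per class (here M, F and I). A minimal pure-Python sketch of the idea follows. The helper `one_hot` and the `Sex_<class>` naming scheme are assumptions for illustration, not necessarily the column names ATOM produces.

```python
def one_hot(values, prefix):
    """Map a list of category labels to binary indicator columns (hypothetical helper)."""
    classes = sorted(set(values))  # deterministic column order
    columns = [f"{prefix}_{c}" for c in classes]
    encoded = [[1 if v == c else 0 for c in classes] for v in values]
    return columns, encoded


# The Sex values of the first five rows shown by X.head()
columns, encoded = one_hot(["M", "M", "F", "M", "I"], prefix="Sex")
print(columns)     # prints: ['Sex_F', 'Sex_I', 'Sex_M']
print(encoded[2])  # prints: [1, 0, 0] (row 2 is female)
```

In pandas, `pd.get_dummies(X, columns=["Sex"])` performs the same transformation.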
In [5]:

```python
# Plot the dataset's correlation matrix
atom.plot_correlation()
```
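Each cell of a correlation matrix is a pairwise correlation coefficient. Assuming the plot uses Pearson's coefficient (the usual default), each entry is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it on the first five rows of `Length`, `Diameter` and `Rings`. Five rows are far too few to be representative, so this is illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson coefficients for a dict of columns."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


data = {
    "Length": [0.455, 0.350, 0.530, 0.440, 0.330],
    "Diameter": [0.365, 0.265, 0.420, 0.365, 0.255],
    "Rings": [15, 7, 9, 10, 7],
}
names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(f"{name:>8}", " ".join(f"{v:6.2f}" for v in row))
```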
In [6]:

```python
# Apply PCA for dimensionality reduction
atom.feature_selection(strategy="pca", n_features=6)
```

```text
Fitting FeatureSelector...
Performing feature selection ...
 --> Applying Principal Component Analysis...
 --> Scaling features...
 --> Keeping 6 components.
 --> Explained variance ratio: 0.97
```
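The log shows two steps: the features are scaled, then projected onto 6 principal components. The plain-Python sketch below mirrors those steps (standardize, covariance matrix, leading eigenvectors, projection) to show what an explained variance ratio is: a component's eigenvalue divided by the total variance. For brevity it uses power iteration with deflation. This is an illustration under those assumptions, not how ATOM or scikit-learn compute PCA (scikit-learn's `PCA` is SVD-based).

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]


def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Covariance matrix of rows whose columns already have zero mean."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]


def top_eigenpairs(matrix, k, iters=1000, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Power iteration finds the dominant eigenvector; deflation then removes it
    so the next iteration finds the following one.
    """
    rng = random.Random(seed)
    d = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            if math.sqrt(sum(x * x for x in w)) < 1e-12:
                break  # no variance left in the deflated matrix
            v = normalize(w)
        value = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((value, v))
        m = [[m[i][j] - value * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def pca(rows, n_components):
    """Scale features, then project them onto the leading principal components."""
    z = standardize(rows)
    cov = covariance(z)
    pairs = top_eigenpairs(cov, n_components)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace of the covariance
    ratios = [value / total_variance for value, _ in pairs]
    projected = [[sum(a * b for a, b in zip(row, vec)) for _, vec in pairs] for row in z]
    return projected, ratios


# Toy data: the second column is exactly twice the first
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
projected, ratios = pca(rows, n_components=2)
print([round(r, 6) for r in ratios])  # the first component carries all the variance
```

Summing the ratios of the kept components gives the share of variance retained; we read the 0.97 in the log as that kind of total for the 6 kept components.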
In [7]:

```python
# Note that the features are automatically renamed to pca0, pca1, etc.
atom.columns
```

Out[7]:

```text
Index(['pca0', 'pca1', 'pca2', 'pca3', 'pca4', 'pca5', 'Rings'], dtype='object')
```
In [8]:

```python
# Use the plotting methods to see the retained variance ratio
atom.plot_pca()
```
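A retained-variance plot is usually read cumulatively: add up the per-component ratios until the desired share of variance is reached. The helper below does that arithmetic. The ratios are made-up placeholder values, not the ones from this dataset, and the 0.97 threshold simply mirrors the figure reported in the feature selection log.

```python
from itertools import accumulate


def components_for_threshold(ratios, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for k, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return k
    return len(ratios)  # threshold not reachable: keep every component


# Hypothetical per-component explained variance ratios (sorted, summing to 1)
ratios = [0.60, 0.16, 0.09, 0.06, 0.045, 0.02, 0.015, 0.01]
print([round(c, 3) for c in accumulate(ratios)])
print(components_for_threshold(ratios, 0.97))  # prints: 6
```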
In [9]:

```python
atom.plot_components()
```
Run the pipeline
In [10]:

```python
atom.run(
    models=["Tree", "Bag", "ET"],
    metric="mse",
    n_trials=5,
    n_bootstrap=5,
)
```
```text
Training ========================= >>
Models: Tree, Bag, ET
Metric: neg_mean_squared_error

Running hyperparameter tuning for DecisionTree...
| trial | criterion | splitter | max_depth | min_samples_split | min_samples_leaf | max_features | ccp_alpha | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht | state |
| ----- | ----------- | -------- | --------- | ----------------- | ---------------- | ------------ | --------- | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0 | absolute_.. | best | 5 | 8 | 10 | None | 0.035 | -6.5456 | -6.5456 | 0.303s | 0.303s | COMPLETE |
| 1 | squared_e.. | best | 10 | 5 | 1 | 0.5 | 0.03 | -7.1959 | -6.5456 | 0.064s | 0.367s | COMPLETE |
| 2 | absolute_.. | random | 14 | 15 | 16 | sqrt | 0.025 | -8.5859 | -6.5456 | 0.073s | 0.440s | COMPLETE |
| 3 | friedman_.. | random | 4 | 10 | 17 | 0.9 | 0.01 | -7.4933 | -6.5456 | 0.065s | 0.505s | COMPLETE |
| 4 | poisson | best | 12 | 15 | 8 | 0.6 | 0.02 | -5.8126 | -5.8126 | 0.074s | 0.579s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 4
Best parameters:
 --> criterion: poisson
 --> splitter: best
 --> max_depth: 12
 --> min_samples_split: 15
 --> min_samples_leaf: 8
 --> max_features: 0.6
 --> ccp_alpha: 0.02
Best evaluation --> neg_mean_squared_error: -5.8126
Time elapsed: 0.579s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -6.2977
Test evaluation --> neg_mean_squared_error: -7.1923
Time elapsed: 0.025s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -7.6026 ± 0.3783
Time elapsed: 0.086s
-------------------------------------------------
Total time: 0.690s

Running hyperparameter tuning for Bagging...
| trial | n_estimators | max_samples | max_features | bootstrap | bootstrap_features | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht | state |
| ----- | ------------ | ----------- | ------------ | --------- | ------------------ | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0 | 190 | 1.0 | 0.9 | True | True | -4.5751 | -4.5751 | 2.829s | 2.829s | COMPLETE |
| 1 | 440 | 0.8 | 0.9 | False | True | -6.7839 | -4.5751 | 8.532s | 11.360s | COMPLETE |
| 2 | 100 | 0.6 | 0.6 | True | False | -5.0065 | -4.5751 | 0.854s | 12.214s | COMPLETE |
| 3 | 70 | 0.6 | 0.7 | False | False | -5.4027 | -4.5751 | 0.952s | 13.166s | COMPLETE |
| 4 | 300 | 0.5 | 0.8 | True | False | -5.0964 | -4.5751 | 2.610s | 15.776s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 0
Best parameters:
 --> n_estimators: 190
 --> max_samples: 1.0
 --> max_features: 0.9
 --> bootstrap: True
 --> bootstrap_features: True
Best evaluation --> neg_mean_squared_error: -4.5751
Time elapsed: 15.776s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -0.7581
Test evaluation --> neg_mean_squared_error: -5.7896
Time elapsed: 3.644s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -5.9893 ± 0.1646
Time elapsed: 14.288s
-------------------------------------------------
Total time: 33.709s

Running hyperparameter tuning for ExtraTrees...
| trial | n_estimators | criterion | max_depth | min_samples_split | min_samples_leaf | max_features | bootstrap | max_samples | ccp_alpha | neg_mean_squared_error | best_neg_mean_squared_error | time_trial | time_ht | state |
| ----- | ------------ | ------------- | --------- | ----------------- | ---------------- | ------------ | --------- | ----------- | --------- | ---------------------- | --------------------------- | ---------- | ------- | -------- |
| 0 | 190 | squared_error | 8 | 13 | 3 | 0.5 | True | 0.6 | 0.025 | -5.1462 | -5.1462 | 0.326s | 0.326s | COMPLETE |
| 1 | 230 | absolute_er.. | 8 | 8 | 8 | sqrt | True | 0.6 | 0.0 | -9.3444 | -5.1462 | 1.419s | 1.746s | COMPLETE |
| 2 | 180 | absolute_er.. | 7 | 2 | 3 | 0.6 | True | 0.6 | 0.03 | -5.7371 | -5.1462 | 1.767s | 3.512s | COMPLETE |
| 3 | 100 | squared_error | 14 | 15 | 8 | None | True | 0.9 | 0.005 | -5.1938 | -5.1462 | 0.279s | 3.792s | COMPLETE |
| 4 | 340 | squared_error | 6 | 15 | 8 | None | True | 0.8 | 0.01 | -4.8716 | -4.8716 | 0.557s | 4.348s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 4
Best parameters:
 --> n_estimators: 340
 --> criterion: squared_error
 --> max_depth: 6
 --> min_samples_split: 15
 --> min_samples_leaf: 8
 --> max_features: None
 --> bootstrap: True
 --> max_samples: 0.8
 --> ccp_alpha: 0.01
Best evaluation --> neg_mean_squared_error: -4.8716
Time elapsed: 4.348s
Fit ---------------------------------------------
Train evaluation --> neg_mean_squared_error: -5.4808
Test evaluation --> neg_mean_squared_error: -6.3445
Time elapsed: 0.563s
Bootstrap ---------------------------------------
Evaluation --> neg_mean_squared_error: -6.3694 ± 0.0737
Time elapsed: 2.491s
-------------------------------------------------
Total time: 7.402s

Final results ==================== >>
Total time: 42.254s
-------------------------------------
DecisionTree --> neg_mean_squared_error: -7.6026 ± 0.3783 ~
Bagging --> neg_mean_squared_error: -5.9893 ± 0.1646 ~ !
ExtraTrees --> neg_mean_squared_error: -6.3694 ± 0.0737 ~
```
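The metric was requested as `"mse"` but is reported as `neg_mean_squared_error`. This follows scikit-learn's scorer convention that greater is better, so the mean squared error is negated and -5.99 beats -7.60. We read the bootstrap lines as a mean ± spread over `n_bootstrap=5` resampled evaluations. The sketch below illustrates both ideas on made-up predictions. It resamples (true, predicted) pairs of a fixed test set, which is one common bootstrap scheme and not necessarily the one ATOM uses. The helper names are hypothetical.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mse(y_true, y_pred):
    """Negated MSE so that greater is better, as in scikit-learn scorers."""
    return -mse(y_true, y_pred)


def bootstrap_score(y_true, y_pred, n_bootstrap=5, seed=42):
    """Mean and sample std of neg_mse over resamples drawn with replacement (hypothetical)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # row indices drawn with replacement
        scores.append(neg_mse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)


# Made-up ring counts and predictions, for illustration only
y_true = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19]
y_pred = [11.2, 7.9, 9.5, 9.8, 7.1, 8.6, 14.0, 12.5, 10.1, 13.3]
print(neg_mse(y_true, y_pred))
mean, std = bootstrap_score(y_true, y_pred)
print(f"neg_mean_squared_error: {mean:.4f} ± {std:.4f}")
```

The sketch uses the sample standard deviation; ATOM may compute the spread differently.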
Analyze the results
In [11]:

```python
# Use the errors or residuals plots to check model performance
atom.plot_residuals()
```
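A residuals plot shows, for each test sample, the difference between the true and the predicted target. The sketch below computes residuals as true minus predicted, which is one common sign convention; we have not confirmed which convention ATOM's plot uses. A well-behaved model gives residuals centered on zero with no trend against the predictions. The data are made up.

```python
import statistics


def residuals(y_true, y_pred):
    """Residual per sample: true value minus prediction."""
    return [t - p for t, p in zip(y_true, y_pred)]


# Made-up ring counts and predictions, for illustration only
y_true = [15, 7, 9, 10, 7, 8, 20, 16]
y_pred = [11.5, 7.9, 9.5, 9.8, 7.1, 8.6, 14.0, 12.5]
res = residuals(y_true, y_pred)
print([round(r, 2) for r in res])
print("mean residual:", round(statistics.mean(res), 3))
# A positive residual means the model underestimated the ring count
print("largest underestimate:", max(res), "for true value", y_true[res.index(max(res))])
```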
In [12]:

```python
atom.plot_errors()
```
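An errors plot compares predicted against true values, and points on the diagonal are perfect predictions. A common single-number summary of how close such a scatter lies to the diagonal is the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below computes it on the first five ring counts from `X.head()`; it is a generic illustration, not a claim about what ATOM draws on the plot.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [15, 7, 9, 10, 7]
print(r2_score(y_true, y_true))               # prints: 1.0 (every point on the diagonal)
print(r2_score(y_true, [9.6] * len(y_true)))  # prints: 0.0 (always predicting the mean)
```

Negative values are possible and mean the model does worse than always predicting the mean.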
In [13]:

```python
# Analyze the relation between the target response and the features
atom.plot_partial_dependence(columns=(0, 1, 2, 3))
```
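Partial dependence shows how a model's average prediction changes as one feature is varied while the other features keep their observed values. With `columns=(0, 1, 2, 3)` the plot presumably covers the first four columns, which after feature selection are `pca0` to `pca3`. A brute-force sketch of the computation follows. It uses a hypothetical stand-in `model` (a simple linear function) instead of one of the trained estimators.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value in every row."""
    averages = []
    for value in grid:
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        predictions = [predict(row) for row in modified]
        averages.append(sum(predictions) / len(predictions))
    return averages


def model(row):
    """Hypothetical stand-in for a fitted regressor: 2 * x0 + x1."""
    return 2 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]  # the mean of x1 is 3.0
result = partial_dependence(model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(result)  # prints: [3.0, 5.0, 7.0]
```

For a linear model the partial dependence is a straight line with the feature's coefficient as slope (here 2); for tree ensembles like those trained above it is typically a step function.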