{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example: Automated feature scaling\n", "------------------------------------\n", "\n", "This example shows how ATOM handles models that require automated feature scaling.\n", "\n", "Import the breast cancer dataset from [sklearn.datasets](https://scikit-learn.org/stable/datasets/index.html#wine-dataset). This is a small and easy to train dataset whose goal is to predict whether a patient has breast cancer or not." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import packages\n", "from sklearn.datasets import load_breast_cancer\n", "from atom import ATOMClassifier" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load the data\n", "X, y = load_breast_cancer(return_X_y=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the pipeline" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<< ================== ATOM ================== >>\n", "Algorithm task: binary classification.\n", "\n", "Dataset stats ==================== >>\n", "Shape: (569, 31)\n", "Train set size: 456\n", "Test set size: 113\n", "-------------------------------------\n", "Memory: 141.24 kB\n", "Scaled: False\n", "Outlier values: 167 (1.2%)\n", "\n" ] } ], "source": [ "atom = ATOMClassifier(X, y, verbose=2, random_state=1)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | acronym | \n", "model | \n", "needs_scaling | \n", "
---|---|---|---|
0 | \n", "AdaB | \n", "AdaBoost | \n", "False | \n", "
1 | \n", "Bag | \n", "Bagging | \n", "False | \n", "
2 | \n", "BNB | \n", "BernoulliNB | \n", "False | \n", "
3 | \n", "CatB | \n", "CatBoost | \n", "True | \n", "
4 | \n", "CatNB | \n", "CategoricalNB | \n", "False | \n", "
5 | \n", "CNB | \n", "ComplementNB | \n", "False | \n", "
6 | \n", "Tree | \n", "DecisionTree | \n", "False | \n", "
7 | \n", "Dummy | \n", "Dummy | \n", "False | \n", "
8 | \n", "ETree | \n", "ExtraTree | \n", "False | \n", "
9 | \n", "ET | \n", "ExtraTrees | \n", "False | \n", "
10 | \n", "GNB | \n", "GaussianNB | \n", "False | \n", "
11 | \n", "GP | \n", "GaussianProcess | \n", "False | \n", "
12 | \n", "GBM | \n", "GradientBoosting | \n", "False | \n", "
13 | \n", "hGBM | \n", "HistGradientBoosting | \n", "False | \n", "
14 | \n", "KNN | \n", "KNearestNeighbors | \n", "True | \n", "
15 | \n", "LGB | \n", "LightGBM | \n", "True | \n", "
16 | \n", "LDA | \n", "LinearDiscriminantAnalysis | \n", "False | \n", "
17 | \n", "lSVM | \n", "LinearSVM | \n", "True | \n", "
18 | \n", "LR | \n", "LogisticRegression | \n", "True | \n", "
19 | \n", "MLP | \n", "MultiLayerPerceptron | \n", "True | \n", "
20 | \n", "MNB | \n", "MultinomialNB | \n", "False | \n", "
21 | \n", "PA | \n", "PassiveAggressive | \n", "True | \n", "
22 | \n", "Perc | \n", "Perceptron | \n", "True | \n", "
23 | \n", "QDA | \n", "QuadraticDiscriminantAnalysis | \n", "False | \n", "
24 | \n", "RNN | \n", "RadiusNearestNeighbors | \n", "True | \n", "
25 | \n", "RF | \n", "RandomForest | \n", "False | \n", "
26 | \n", "Ridge | \n", "Ridge | \n", "True | \n", "
27 | \n", "SGD | \n", "StochasticGradientDescent | \n", "True | \n", "
28 | \n", "SVM | \n", "SupportVectorMachine | \n", "True | \n", "
29 | \n", "XGB | \n", "XGBoost | \n", "True | \n", "