Models
Predefined models
ATOM provides 32 estimators for classification and regression tasks
that can be used to fit the data in the pipeline. After fitting, a
class containing the estimator is attached to the trainer as an
attribute. We refer to these "subclasses" as models. Apart from the
estimator, the models contain a variety of attributes and methods to
help you understand how the underlying estimator performed. They can
be accessed using their acronym, e.g. atom.LGB to access the LightGBM
model. The available models and their corresponding acronyms are:
- "Dummy" for Dummy Classification/Regression
- "GP" for Gaussian Process
- "GNB" for Gaussian Naive Bayes
- "MNB" for Multinomial Naive Bayes
- "BNB" for Bernoulli Naive Bayes
- "CatNB" for Categorical Naive Bayes
- "CNB" for Complement Naive Bayes
- "OLS" for Ordinary Least Squares
- "Ridge" for Ridge Classification/Regression
- "Lasso" for Lasso Regression
- "EN" for Elastic Net
- "BR" for Bayesian Ridge
- "ARD" for Automated Relevance Determination
- "LR" for Logistic Regression
- "LDA" for Linear Discriminant Analysis
- "QDA" for Quadratic Discriminant Analysis
- "KNN" for K-Nearest Neighbors
- "RNN" for Radius Nearest Neighbors
- "Tree" for Decision Tree
- "Bag" for Bagging
- "ET" for Extra-Trees
- "RF" for Random Forest
- "AdaB" for AdaBoost
- "GBM" for Gradient Boosting Machine
- "XGB" for XGBoost
- "LGB" for LightGBM
- "CatB" for CatBoost
- "lSVM" for Linear-SVM
- "kSVM" for Kernel-SVM
- "PA" for Passive Aggressive
- "SGD" for Stochastic Gradient Descent
- "MLP" for Multi-layer Perceptron
Tip
The acronyms are case-insensitive, e.g. atom.lgb also calls the LightGBM model.
Warning
The models cannot be initialized directly by the user! Only use them through the trainers.
Custom models
It is also possible to create your own models in ATOM's pipeline. For example, imagine we want to use sklearn's Lars estimator (note that it is not included in ATOM's predefined models). There are two ways to achieve this:
- Using ATOMModel (recommended). With this approach you can pass the required model characteristics to the pipeline.
from sklearn.linear_model import Lars
from atom import ATOMRegressor, ATOMModel
model = ATOMModel(models=Lars, fullname="Lars Regression", needs_scaling=True)
atom = ATOMRegressor(X, y)
atom.run(model)
- Using the estimator's class or an instance of the class. This approach also calls ATOMModel under the hood, but leaves its parameters at their default values.
from sklearn.linear_model import Lars
from atom import ATOMRegressor
atom = ATOMRegressor(X, y)
atom.run(Lars)
Additional things to take into account:
- Custom models can be accessed through their acronym like any other model, e.g. atom.lars in the example above.
- Custom models are not restricted to sklearn estimators, but they should follow sklearn's API, i.e. have fit and predict methods.
- Parameter customization (for the initializer) is only possible for custom models whose estimator has a set_params() method, i.e. is a child class of BaseEstimator.
- Hyperparameter optimization for custom models is ignored unless appropriate dimensions are provided through bo_params.
- If the estimator has an n_jobs and/or random_state parameter that is left at its default value, it automatically adopts the value from the trainer it's called from.
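To make the API requirements above concrete, here is a minimal sketch of a custom estimator that satisfies them: it has fit and predict methods, sklearn-style get_params/set_params, and n_jobs/random_state parameters in its initializer. The class name and its (trivial) prediction logic are made up for illustration; they are not part of ATOM or sklearn.

```python
class MeanRegressor:
    """Toy regressor that always predicts the mean of the training target."""

    def __init__(self, n_jobs=1, random_state=None):
        # sklearn-style parameters; when left at their defaults, ATOM
        # fills them in with the values from the trainer.
        self.n_jobs = n_jobs
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"n_jobs": self.n_jobs, "random_state": self.random_state}

    def set_params(self, **params):
        # Required for parameter customization through the initializer.
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


reg = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(reg.predict([[5]]))  # [2.0]
```

An estimator like this could then be passed to atom.run, either directly or wrapped in ATOMModel as shown above.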
Deep learning
Deep learning models can be used through ATOM's custom models as long as they follow sklearn's API. For example, models implemented with the Keras package should use the sklearn wrappers KerasClassifier or KerasRegressor.
Many deep learning use cases, for example in computer vision, use datasets with more than 2 dimensions, e.g. image data can have shape (n_samples, length, width, rgb). Such data structures cannot be stored directly in a two-dimensional pandas dataframe, but, since ATOM requires a dataframe for its internal API, datasets with more than two dimensions are stored in a single column called "Multidimensional feature", where every row contains one (multidimensional) sample. Note that the data cleaning, feature engineering and some of the plotting methods are unavailable in this case.
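The single-column layout described above can be sketched as follows. The column name comes from the text; the sample data and shapes are made up for illustration, and this is a sketch of the storage idea, not ATOM's internal code.

```python
import numpy as np
import pandas as pd

# Fake image data: 4 samples of 8x8 RGB images, shape (4, 8, 8, 3).
X = np.random.rand(4, 8, 8, 3)

# Single-column layout: each row of the dataframe holds one entire
# (multidimensional) sample as an array.
df = pd.DataFrame({"Multidimensional feature": list(X)})

print(df.shape)                                 # (4, 1)
print(df["Multidimensional feature"].iloc[0].shape)  # (8, 8, 3)
```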
See this example for how to use ATOM to train and validate a Convolutional Neural Network implemented with Keras.