Nomenclature

This documentation uses a consistent set of terms for concepts related to this package. The most frequent ones are described below.

ATOM

Refers to this package.

atom

Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name).

branch

A pipeline, corresponding dataset and models fitted to that dataset. See the branches section of the user guide.

categorical columns

Refers to all columns of type object or category.
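
As a rough illustration, such columns can be picked out by dtype using plain pandas. This is a sketch, not ATOM's own code, and the column names and data are made up.

```python
import pandas as pd

# Hypothetical example data; dtypes are set explicitly so the result
# does not depend on the defaults of the installed pandas version.
df = pd.DataFrame(
    {
        "age": pd.Series([31, 45, 27], dtype="int64"),
        "city": pd.Series(["Delft", "Lyon", "Porto"], dtype=object),
        "grade": pd.Series(["a", "b", "a"], dtype="category"),
    }
)

# Columns of dtype object or category count as categorical.
categorical = list(df.select_dtypes(include=["object", "category"]).columns)
print(categorical)
```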

class

Unique value in a column, e.g. a binary classifier has 2 classes in the target column.

dataframe

Two-dimensional, size-mutable, potentially heterogeneous tabular data of type pd.DataFrame or its modin counterpart.

dataframe-like

Any object from which a dataframe can be created. This includes an iterable, a dict whose values are 1d-arrays, a two-dimensional list, tuple, np.ndarray or sps.csr_matrix, and most commonly, a dataframe. This is the standard input format for any dataset.
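
A few of these inputs can be turned into the same dataframe with the plain pandas constructor. The sketch below only illustrates the idea. How ATOM converts its inputs internally is not shown here, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

columns = ["a", "b"]  # hypothetical column names

# Three dataframe-like inputs holding the same values.
inputs = {
    "dict of 1d-arrays": {"a": np.array([1, 2]), "b": np.array([3, 4])},
    "two-dimensional list": [[1, 3], [2, 4]],
    "np.ndarray": np.array([[1, 3], [2, 4]]),
}

frames = {}
for name, data in inputs.items():
    if isinstance(data, dict):
        # A dict already carries its own column names.
        frames[name] = pd.DataFrame(data)
    else:
        frames[name] = pd.DataFrame(data, columns=columns)

for name, frame in frames.items():
    print(name, frame.shape, list(frame.columns))
```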

estimator

An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. Should implement a fit method. Often used interchangeably with predictor because of user preference.

index

Immutable sequence used for indexing and alignment of type pd.Index, pd.MultiIndex or their modin counterparts.

missing values

All values in the missing attribute, as well as None, NaN, +inf and -inf.
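
A minimal sketch of this definition in plain Python follows. The `extra` parameter is a hypothetical stand-in for the values listed in the missing attribute, and the function name is made up for illustration.

```python
import math


def is_missing(value, extra=()):
    """Return True for None, NaN, +inf, -inf, or any value listed in `extra`."""
    if value is None:
        return True
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return True
    # `extra` plays the role of the user-defined missing values (assumption).
    return value in extra


values = [1.0, None, float("nan"), float("inf"), float("-inf"), "?", "ok"]
flags = [is_missing(v, extra=("?",)) for v in values]
print(flags)
```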

model

Instance of a model in the pipeline. Not to be confused with estimator.

outliers

Samples that contain one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy.

outlier value

Value that lies further than 3 times the standard deviation away from the mean of its column, i.e. |z-score| > 3.
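
The rule can be sketched with the standard library as follows. The example assumes the population standard deviation (the definition above does not say which variant is meant), and the data is made up.

```python
import statistics

# Hypothetical column: nineteen ordinary values and one extreme value.
column = [10.0] * 19 + [100.0]

mean = statistics.fmean(column)
std = statistics.pstdev(column)  # population standard deviation (assumption)

# A value is an outlier value when its absolute z-score exceeds 3.
z_scores = [(value - mean) / std for value in column]
outlier_values = [value for value, z in zip(column, z_scores) if abs(z) > 3]
print(outlier_values)
```

With the population standard deviation, the largest possible |z-score| in a column of n values is sqrt(n - 1), so no value in a column of ten or fewer rows can cross the threshold.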

pipeline

Sequence of transformers in a specific (usually the current) branch.

predictor

An estimator implementing a predict method.

scorer

A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond to a better score. See sklearn's documentation.
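
A minimal sketch of the "greater is better" convention, using the `scorer(estimator, X, y)` call signature from sklearn. The toy estimator and the data are hypothetical, and the error metric is negated so that a larger value means a better fit.

```python
class MeanPredictor:
    """Hypothetical toy estimator that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def neg_mean_squared_error(estimator, X, y):
    """Scorer: callable(estimator, X, y) returning a number, greater is better."""
    predictions = estimator.predict(X)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)
    return -mse  # negate the error so that a larger value means a better fit


X, y = [[0], [1], [2]], [1.0, 2.0, 3.0]
estimator = MeanPredictor().fit(X, y)
score = neg_mean_squared_error(estimator, X, y)
print(score)
```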

sequence

A one-dimensional array of type list, tuple, np.ndarray or series. This is the standard input format for a dataset's target column.

series

One-dimensional ndarray with axis labels of type pd.Series or its modin counterpart.

target

The dependent variable in a supervised learning task. Passed as y to an estimator's fit method.

task

One of the six supervised machine learning approaches that ATOM supports:

transformer

An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes.
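
The estimator, predictor and transformer entries differ only in which methods an object implements. The sketch below checks for those methods by duck typing. The helper `roles` and both toy classes are hypothetical and are not part of ATOM or sklearn.

```python
def roles(obj):
    """List the glossary terms that apply to `obj`, based on its methods."""
    found = []
    if callable(getattr(obj, "fit", None)):
        found.append("estimator")
        if callable(getattr(obj, "predict", None)):
            found.append("predictor")
        if callable(getattr(obj, "transform", None)):
            found.append("transformer")
    return found


class ToyScaler:
    """Hypothetical transformer: divides every value by the fitted maximum."""

    def fit(self, X, y=None):
        self.max_ = max(max(row) for row in X)
        return self

    def transform(self, X):
        return [[value / self.max_ for value in row] for row in X]


class ToyRegressor:
    """Hypothetical predictor: always returns the mean of the target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


print(roles(ToyScaler()))
print(roles(ToyRegressor()))
```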