Nomenclature

This documentation uses a consistent set of terms for concepts related to this package. The most frequently used terms are described below.

ATOM

Refers to this package.

atom

Instance of the ATOMClassifier, ATOMForecaster or ATOMRegressor classes (note that the examples use it as the default variable name).

categorical columns

Refers to all columns of type object, category, string or boolean.
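As a rough illustration of this definition, the sketch below lists the columns of a pandas dataframe whose dtype is object, category, string or boolean. The helper `categorical_columns` is a hypothetical name used only for this example; it is not part of ATOM's API, and ATOM's own selection logic may differ.

```python
import pandas as pd
from pandas.api import types as pdt


def categorical_columns(df):
    """Hypothetical helper: names of object, category, string or boolean columns."""
    return [
        name
        for name, dtype in df.dtypes.items()
        if pdt.is_object_dtype(dtype)
        or pdt.is_string_dtype(dtype)
        or pdt.is_bool_dtype(dtype)
        or isinstance(dtype, pd.CategoricalDtype)
    ]


df = pd.DataFrame(
    {
        "age": [25, 32, 47],  # numerical
        "city": ["Paris", "Rome", "Oslo"],  # object or string dtype
        "grade": pd.Categorical(["a", "b", "a"]),  # category dtype
        "member": [True, False, True],  # boolean dtype
    }
)

print(categorical_columns(df))
```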

class

Unique value in a column, e.g., a binary classifier has two classes in the target column.

dataframe

Two-dimensional, size-mutable, potentially heterogeneous tabular data. The type is usually pd.DataFrame, but could potentially be any of the dataframe types backed by the selected data engine.

dataframe-like

Any object from which a pd.DataFrame can be created. This includes an iterable, a dict whose values are 1d-arrays, a two-dimensional list, tuple, np.ndarray or sps.csr_matrix, or any object that follows the dataframe interchange protocol. This is the standard input format for any dataset.

Additionally, you can provide a callable whose output is any of the aforementioned types. This is useful when the dataset is very large and you are performing parallel operations, since it can avoid broadcasting a large dataset from the driver to the workers.
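The following sketch shows a few of the inputs listed above being turned into a pd.DataFrame, including the callable form. The helper `to_dataframe` is a hypothetical stand-in written for this example; how ATOM converts its inputs internally is not shown here.

```python
import numpy as np
import pandas as pd


def to_dataframe(data):
    """Hypothetical helper: resolve a callable first, then build a pd.DataFrame."""
    if callable(data):
        data = data()  # the dataset is only created when it is needed
    return pd.DataFrame(data)


from_dict = to_dataframe({"a": [1, 2], "b": [3, 4]})
from_list = to_dataframe([[1, 3], [2, 4]])
from_array = to_dataframe(np.array([[1, 3], [2, 4]]))
from_callable = to_dataframe(lambda: {"a": [1, 2], "b": [3, 4]})
```

Passing a callable defers building the data until the call happens, which is what makes it possible to create the dataset on a worker instead of shipping it from the driver.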

estimator

An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. Should implement a fit method. Often used interchangeably with predictor because of user preference.

missing values

All values in the missing attribute, as well as None, NaN, +inf and -inf.
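A minimal sketch of this definition in plain Python is given below. The `extra_markers` parameter stands in for the values of the missing attribute; the markers used in the example calls are placeholders chosen for illustration, not the package's actual defaults.

```python
import math


def is_missing(value, extra_markers=()):
    """Hypothetical check: None, NaN, +inf, -inf or a user-defined marker."""
    if value is None:
        return True
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return True
    return value in extra_markers


print(is_missing(float("nan")))  # NaN counts as missing
print(is_missing(float("-inf")))  # so do +inf and -inf
print(is_missing("?", extra_markers=("?",)))  # placeholder marker
```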

model

Instance of a model in atom. Not to be confused with estimator.

outliers

Samples that contain one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy.

outlier value

Value that lies more than 3 standard deviations away from the mean of its column, i.e., |z-score| > 3.
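The z-score rule can be sketched with the standard library as follows. The function name `outlier_mask` is hypothetical, and the use of the population standard deviation is an assumption made for this example; the text does not say which variant is used.

```python
from statistics import mean, pstdev


def outlier_mask(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [False] * len(values)  # a constant column has no outlier values
    return [abs((v - mu) / sigma) > threshold for v in values]


column = [10.0] * 20 + [100.0]
mask = outlier_mask(column)  # only the last value is flagged
```

A sample (row) would then count as an outlier if this mask is true for at least one of its columns.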

predictor

An estimator implementing a predict method.

scorer

A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation.
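To make the "greater is better" convention concrete, here is a small sketch of a scorer built on an error metric. It assumes the `scorer(estimator, X, y)` call signature used by sklearn; `ConstantPredictor` and `neg_mae_scorer` are hypothetical names invented for the example.

```python
class ConstantPredictor:
    """Toy predictor that always returns the same value."""

    def __init__(self, value):
        self.value = value

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [self.value for _ in X]


def neg_mae_scorer(estimator, X, y):
    """Negated mean absolute error, so that a greater number is a better score."""
    predictions = estimator.predict(X)
    mae = sum(abs(p - t) for p, t in zip(predictions, y)) / len(y)
    return -mae


X, y = [[0], [1], [2]], [1.0, 2.0, 3.0]
good_score = neg_mae_scorer(ConstantPredictor(2.0), X, y)
bad_score = neg_mae_scorer(ConstantPredictor(10.0), X, y)
```

The mean absolute error itself is a metric where lower is better; negating it is one common way to turn it into a scorer.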

segment

Subset (segment) of a sequence, obtained either by slicing or by generating a range of values. When given as a parameter type, it includes both range and slice.
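The two forms of a segment can be illustrated with built-in Python objects. The helper `take_segment` is a hypothetical name for this sketch and only shows that a range and a slice can select the same elements.

```python
def take_segment(sequence, segment):
    """Hypothetical helper: select elements using either a slice or a range."""
    if isinstance(segment, slice):
        return list(sequence[segment])
    return [sequence[i] for i in segment]


columns = ["a", "b", "c", "d", "e"]
by_slice = take_segment(columns, slice(1, 4))
by_range = take_segment(columns, range(1, 4))
```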

sequence

A one-dimensional, indexable array of type sequence (except string), np.ndarray, pd.Index or series. This is the standard input format for a dataset's target column.

series

One-dimensional ndarray with axis labels. The type is usually pd.Series, but could potentially be any of the series types backed by the selected data engine.

target

The dependent variable in a supervised learning task. Passed as y to an estimator's fit method.

task

One of the supervised machine learning approaches that ATOM supports, e.g., classification, regression or forecasting.

transformer

An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes.
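The relationship between estimator, predictor and transformer in this glossary comes down to which methods an object implements. The sketch below expresses that with duck-typing checks; all class and function names are hypothetical and exist only for this example.

```python
class MeanPredictor:
    """Toy predictor: implements fit and predict."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class Centerer:
    """Toy transformer: implements fit and transform."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def is_estimator(obj):
    return callable(getattr(obj, "fit", None))


def is_predictor(obj):
    return is_estimator(obj) and callable(getattr(obj, "predict", None))


def is_transformer(obj):
    return is_estimator(obj) and callable(getattr(obj, "transform", None))


centered = Centerer().fit([1.0, 2.0, 3.0]).transform([1.0, 2.0, 3.0])
```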