Nomenclature
This documentation uses a consistent set of terms to refer to concepts related to this package. The most frequent terms are described below.
Refers to this package.
Instance of the ATOMClassifier or ATOMRegressor classes (note that the examples use it as the default variable name).
A pipeline, corresponding dataset and models fitted to that dataset. See the branches section of the user guide.
Refers to all columns of type object or category.
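As a minimal sketch of this definition, the snippet below selects the columns with dtype object or category from a small, made-up dataframe using plain pandas. The column names and values are illustrative assumptions, not part of the package.

```python
import pandas as pd

# Hypothetical toy data: one numerical and two categorical columns.
df = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "city": pd.Series(["Paris", "Lima", "Oslo"], dtype=object),
        "grade": pd.Categorical(["a", "b", "a"]),
    }
)

# Columns of type object or category, in their original order.
categorical = df.select_dtypes(include=["object", "category"]).columns.tolist()
print(categorical)
```

The `city` column is given an explicit object dtype so that the result does not depend on how a particular pandas version infers string columns.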
Unique value in a column, e.g. a binary classifier has 2 classes in the target column.
Two-dimensional, size-mutable, potentially heterogeneous tabular data of type pd.DataFrame or its modin counterpart.
Any object from which a dataframe can be created. This includes an iterable, a dict whose values are 1d-arrays, a two-dimensional list, tuple, np.ndarray or sps.csr_matrix, and, most commonly, a dataframe. This is the standard input format for any dataset.
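To illustrate what such inputs can look like, the sketch below builds the same small table from a dict of 1d-arrays, a two-dimensional list and a two-dimensional np.ndarray using the plain pandas constructor. How the package itself converts these inputs is not shown here; the data and column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

columns = ["a", "b"]

# A dict whose values are 1d-arrays.
from_dict = pd.DataFrame({"a": np.array([1, 2]), "b": np.array([3, 4])})

# A two-dimensional list.
from_list = pd.DataFrame([[1, 3], [2, 4]], columns=columns)

# A two-dimensional numpy array.
from_array = pd.DataFrame(np.array([[1, 3], [2, 4]]), columns=columns)

for frame in (from_dict, from_list, from_array):
    print(frame.shape, list(frame.columns))
```

Sparse matrices are left out to keep the example free of extra dependencies; pandas offers `pd.DataFrame.sparse.from_spmatrix` for that case.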
An object which manages the estimation and decoding of an algorithm. The algorithm is estimated as a deterministic function of a set of parameters, a dataset and a random state. Should implement a fit method. Often used interchangeably with predictor because of user preference.
Immutable sequence used for indexing and alignment of type pd.Index, pd.MultiIndex or their modin counterparts.
All values in the missing attribute, as well as None, NaN, +inf and -inf.
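The following is a rough sketch of how the four built-in values named above (None, NaN, +inf and -inf) could be flagged in a pandas column. It deliberately ignores any extra values listed in the missing attribute, and the sample data is made up.

```python
import numpy as np
import pandas as pd

column = pd.Series([1.0, None, np.nan, np.inf, -np.inf, 4.0])

# Treat +inf and -inf like NaN, then flag every missing entry.
is_missing = column.replace([np.inf, -np.inf], np.nan).isna()
print(int(is_missing.sum()))
```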
Sample that contains one or more outlier values. Note that the Pruner class can use a different definition for outliers depending on the chosen strategy.
Value that lies more than 3 standard deviations away from the mean of its column, i.e., |z-score| > 3.
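The two definitions above can be sketched with pandas as follows. This is only an illustration of the z-score rule on made-up data; it assumes the population standard deviation (`ddof=0`), which the text does not specify, and it does not reproduce the alternative strategies of the Pruner class.

```python
import pandas as pd

# Hypothetical data: column "x" holds one extreme value in the last row.
df = pd.DataFrame(
    {
        "x": [10.0] * 19 + [50.0],
        "y": [float(i) for i in range(20)],
    }
)

# Column-wise z-scores (assumption: population standard deviation).
z_scores = (df - df.mean()) / df.std(ddof=0)

# An outlier value has |z-score| > 3 ...
outlier_values = z_scores.abs() > 3

# ... and an outlier row is a sample with at least one outlier value.
outlier_rows = outlier_values.any(axis=1)
print(df[outlier_rows])
```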
Sequence of transformers in a specific (usually the current) branch.
An estimator implementing a predict method.
A non-estimator callable object which evaluates an estimator on given test data, returning a number. Unlike evaluation metrics, a greater returned number must correspond with a better score. See sklearn's documentation.
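A minimal, dependency-free sketch of the scorer convention: a callable that takes a fitted estimator plus test data and returns a number where greater is better. Because mean absolute error is a loss, the sketch negates it. `MeanPredictor` and `neg_mean_absolute_error` are hypothetical names written for this example, not objects from the package.

```python
class MeanPredictor:
    """Hypothetical predictor that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def neg_mean_absolute_error(estimator, X, y):
    """Scorer: evaluates an estimator on test data and returns a number.

    The error is negated so that a greater value means a better score.
    """
    predictions = estimator.predict(X)
    mae = sum(abs(true - pred) for true, pred in zip(y, predictions)) / len(y)
    return -mae


X = [[0], [1], [2], [3]]
y = [1.0, 2.0, 3.0, 4.0]

model = MeanPredictor().fit(X, y)
score = neg_mean_absolute_error(model, X, y)
print(score)
```

The sign flip is what separates a scorer from a plain evaluation metric: two models can always be compared with "higher is better", regardless of whether the underlying metric is a loss or a gain.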
A one-dimensional array of type list, tuple, np.ndarray or series. This is the standard input format for a dataset's target column.
The dependent variable in a supervised learning task. Passed as y to an estimator's fit method.
One of the six supervised machine learning approaches that ATOM supports:
An estimator implementing a transform method. This encompasses all data cleaning and feature engineering classes.
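The relationship between estimators, predictors and transformers described on this page comes down to which methods an object implements. The sketch below expresses that with simple duck-typing checks. The `roles` helper and the two toy classes are hypothetical and exist only for illustration.

```python
def roles(obj):
    """Return the glossary roles an object fulfils, based on its methods."""
    found = []
    if callable(getattr(obj, "fit", None)):
        found.append("estimator")
        if callable(getattr(obj, "predict", None)):
            found.append("predictor")
        if callable(getattr(obj, "transform", None)):
            found.append("transformer")
    return found


class Centerer:
    """Hypothetical transformer: subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [value - self.mean_ for value in X]


class MeanModel:
    """Hypothetical predictor: always predicts the mean of y."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


print(roles(Centerer()))
print(roles(MeanModel()))
```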