Nomenclature
This documentation uses a consistent set of terms for concepts related to this package. The most frequent terms are described below.
dataframe-like
Any type of object from which a pd.DataFrame can be created. This
includes an iterable, a dict whose values are 1d-arrays, a
two-dimensional list, tuple, np.array or scipy.sparse.matrix, and most
commonly, a dataframe. This is the standard input format for any dataset.
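As a minimal sketch of what "dataframe-like" means in practice, the snippet below builds the same small table from three of the input types listed above. The column names and values are made up for illustration, and the calls are plain pandas, not part of this package's API.

```python
import numpy as np
import pandas as pd

columns = ["a", "b"]  # illustrative column names

# A dict whose values are 1d-arrays (here: plain lists).
from_dict = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# A two-dimensional list.
from_list = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

# A two-dimensional np.array.
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=columns)
```

All three objects hold the same values, which is why any of these inputs can serve as a dataset.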
atom
Instance of the ATOMClassifier or
ATOMRegressor classes (note that the
examples use it as the default variable name).
ATOM
Refers to this package.
branch
Collection of transformers fitted to a specific dataset. See
the branches section.
BO
Bayesian optimization algorithm used for hyperparameter tuning.
categorical columns
Refers to all columns of type object or category.
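To illustrate the definition, here is a small sketch that selects the categorical columns of a dataframe with plain pandas. The dataframe and its column names are hypothetical, and the package may detect these columns differently internally.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [25, 32, 47],  # numerical column
        "city": pd.Series(["Paris", "Rome", "Oslo"], dtype=object),  # object dtype
        "size": pd.Categorical(["S", "M", "S"]),  # category dtype
    }
)

# Keep only the columns whose dtype is object or category.
categorical = df.select_dtypes(include=["object", "category"]).columns.tolist()
```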
class
Unique value in a column, e.g. a binary classifier has 2 classes in the
target column.
estimator
An object which manages the estimation and decoding of an algorithm.
The algorithm is estimated as a deterministic function of a set of
parameters, a dataset and a random state.
missing values
All values in the missing attribute, as well as None, NaN, +inf
and -inf.
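The definition can be sketched as a boolean mask over a column. In the snippet below, `extra_missing` is a hypothetical stand-in for the contents of the missing attribute (the actual values are not listed here), and `is_missing` is an illustrative helper, not a function of this package.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the values stored in the missing attribute.
extra_missing = ["", "?", "NA"]


def is_missing(column: pd.Series) -> pd.Series:
    """Flag None, NaN, +inf, -inf and any value in extra_missing."""
    return column.isna() | column.isin(extra_missing + [np.inf, -np.inf])


column = pd.Series(["a", None, "?", np.inf, 1.5], dtype=object)
mask = is_missing(column)
```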
model
Instance of a model in the pipeline.
outlier
Sample that contains one or more outlier values. Note that the
Pruner class can use a different
definition for outliers depending on the chosen strategy.
outlier value
Value that lies further than 3 times the standard deviation away
from the mean of its column, i.e. |z-score| > 3.
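The two definitions above (outlier value and outlier) can be sketched with NumPy as follows. The function names are hypothetical, and the sketch assumes the population standard deviation (`ddof=0`) and columns with non-zero variance; the package's own implementation may differ.

```python
import numpy as np


def outlier_values(X, threshold=3.0):
    """Boolean mask of entries whose |z-score| exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    # z-scores are computed per column (assumes non-zero column variance).
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.abs(z) > threshold


def outlier_samples(X, threshold=3.0):
    """A sample (row) is an outlier if it contains at least one outlier value."""
    return outlier_values(X, threshold).any(axis=1)


# First column: 20 zeros and one extreme value. Second column: 0..20.
X = np.column_stack([[0.0] * 20 + [100.0], np.arange(21.0)])
rows = outlier_samples(X)
```

Only the last row is flagged, because its first entry lies more than 3 standard deviations from the column mean.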
pipeline
Dataset, transformers and models in a specific branch.
scorer
A non-estimator callable object which evaluates an estimator on given
test data, returning a number. Unlike evaluation metrics, a greater
returned number must correspond with a better score. See sklearn's
documentation.
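As an illustration of the "greater is better" convention, the sketch below wraps scikit-learn's `mean_squared_error` (a metric where lower is better) into a scorer with `make_scorer`. The tiny dataset and the `DummyRegressor` are only there to have something to score.

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error

# greater_is_better=False makes the scorer negate the metric, so that
# a greater returned number still corresponds with a better model.
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)

X = [[0], [1], [2], [3]]
y = [0.0, 1.0, 2.0, 3.0]

model = DummyRegressor(strategy="mean").fit(X, y)  # always predicts 1.5
score = neg_mse(model, X, y)  # called as scorer(estimator, X, y)
```

Here the mean squared error is 1.25, so the scorer returns -1.25.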
sequence
A one-dimensional array of type list, tuple, np.array or pd.Series.
This is the standard input format for a dataset's target column.
target
Name of the dependent variable, passed as y to an estimator's fit method.
task
One of the three supervised machine learning approaches that ATOM supports:
trainer
Instance of a class that trains and evaluates the models (implements a
run method). The following classes are considered trainers:
- ATOMClassifier
- ATOMRegressor
- DirectClassifier
- DirectRegressor
- SuccessiveHalvingClassifier
- SuccessiveHalvingRegressor
- TrainSizingClassifier
- TrainSizingRegressor
transformer
An estimator implementing a transform method. This encompasses all
data cleaning and feature engineering classes.
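For readers unfamiliar with the concept, a minimal transformer in the scikit-learn style could look like the sketch below. `ColumnCenterer` is a hypothetical class written for this illustration; it is not one of this package's data cleaning or feature engineering classes.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts each column's mean."""

    def fit(self, X, y=None):
        # Learn the column means from the training data.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the learned means to any dataset with the same columns.
        return np.asarray(X, dtype=float) - self.mean_


# fit_transform is provided by TransformerMixin (fit, then transform).
centered = ColumnCenterer().fit_transform([[1, 10], [3, 30]])
```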