Branch

class atom.data.branch.Branch(name, data=None, holdout=None, memory=None)

Object that contains the data.

A branch contains a specific pipeline, the dataset transformed through that pipeline, the models fitted on that dataset, and all data and utility attributes that refer to that dataset. Branches can be created and accessed through atom's branch attribute.

All public properties and attributes of the branch can be accessed from the parent.
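
One common way to expose an inner object's public attributes on its parent is attribute delegation. The sketch below illustrates that pattern only; `BranchSketch` and `ParentSketch` are hypothetical names, and this is not necessarily how atom implements the forwarding.

```python
class BranchSketch:
    """Hypothetical branch holding a couple of public data attributes."""

    def __init__(self, name, shape):
        self.name = name
        self.shape = shape


class ParentSketch:
    """Hypothetical parent that forwards unknown public attributes to its branch."""

    def __init__(self, branch):
        self.branch = branch

    def __getattr__(self, item):
        # __getattr__ runs only after normal attribute lookup has failed
        branch = self.__dict__.get("branch")
        if branch is not None and not item.startswith("_") and hasattr(branch, item):
            return getattr(branch, item)
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {item!r}"
        )


parent = ParentSketch(BranchSketch("main", (569, 31)))
print(parent.shape)  # resolved through parent.branch.shape
```

Private names (leading underscore) are deliberately not forwarded in this sketch, so only the branch's public surface is reachable from the parent.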

Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 177 (1.3%)



>>> # Train a model
>>> atom.run("RF")


Training ========================= >>
Models: RF
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.993
Time elapsed: 0.182s
-------------------------------------------------
Time: 0.182s


Final results ==================== >>
Total time: 0.185s
-------------------------------------
RandomForest --> f1: 0.993


>>> # Change the branch and apply feature scaling
>>> atom.branch = "scaled"

Successfully created new branch: scaled.


>>> atom.scale()

Fitting Scaler...
Scaling features...

>>> atom.run("RF_scaled")


Training ========================= >>
Models: RF_scaled
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.9861
Time elapsed: 0.178s
-------------------------------------------------
Time: 0.178s


Final results ==================== >>
Total time: 0.181s
-------------------------------------
RandomForest --> f1: 0.9861


>>> # Compare the models
>>> atom.plot_roc()

Attributes

pipeline: Pipeline

Pipeline of transformers.

Tip

Use the plot_pipeline method to visualize the pipeline.

mapping: dict[str, dict[str, int | float]]

Encoded values and their respective mapped values.

The column name is the key to its mapping dictionary. Only columns that are encoded into a single column (e.g., Ordinal, Leave-one-out) are included.
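
As an illustration of the structure described above, the following sketch builds a mapping by hand and inverts it to recover the original labels. The column name `size`, its categories, and the `decode` helper are all hypothetical; a real mapping is produced by the encoding steps in the pipeline.

```python
# Hypothetical mapping for one ordinal-encoded column named "size":
# outer key = column name, inner dict = original value -> encoded value
mapping = {"size": {"small": 0, "medium": 1, "large": 2}}


def decode(column, encoded_values, mapping):
    """Translate encoded values back to their original labels."""
    inverse = {code: label for label, code in mapping[column].items()}
    return [inverse[value] for value in encoded_values]


decoded = decode("size", [2, 0, 1], mapping)
print(decoded)
```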

dataset: pd.DataFrame

Complete data set.

train: pd.DataFrame

Training set.

test: pd.DataFrame

Test set.

X: pd.DataFrame

Feature set.

y: pd.Series | pd.DataFrame

Target column(s).

holdout: pd.DataFrame | None

Holdout set.

X_train: pd.DataFrame

Features of the training set.

y_train: pd.Series | pd.DataFrame

Target column(s) of the training set.

X_test: pd.DataFrame

Features of the test set.

y_test: pd.Series | pd.DataFrame

Target column(s) of the test set.

shape: tuple[int, int]

Shape of the dataset (n_rows, n_columns).

columns: pd.Index

Names of all the columns.

n_columns: int

Number of columns.

features: pd.Index

Names of the features.

n_features: int

Number of features.

target: str | list[str]

Name of the target column(s).

Methods

check_scaling	Whether the feature set is scaled.
load	Load the branch's data from memory.
store	Store the branch's data as a pickle in memory.

method check_scaling()

Whether the feature set is scaled.

A dataset is considered scaled when its features have a mean of approximately 0 and a standard deviation of approximately 1, or when there is a scaler in the pipeline. Categorical and binary columns (only zeros and ones) are excluded from the calculation. Sparse datasets always return False.

Returns

bool

Whether the feature set is scaled.
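
The statistical part of this check can be sketched in plain Python as follows. The function name `looks_scaled`, the `has_scaler` flag, and the `tol` threshold are assumptions made for illustration; the tolerances the library actually uses are not stated here, and sparse data is not modelled.

```python
from statistics import mean, pstdev


def looks_scaled(columns, has_scaler=False, tol=0.05):
    """Heuristic sketch: True if a scaler was applied or the data has mean ~0 and std ~1.

    `columns` maps column names to lists of values (hypothetical input format).
    """
    if has_scaler:
        return True

    means, stds = [], []
    for values in columns.values():
        # Skip categorical columns (anything that is not purely numeric)
        if not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            continue
        # Skip binary columns (only zeros and ones)
        if set(values) <= {0, 1}:
            continue
        means.append(mean(values))
        stds.append(pstdev(values))

    if not means:
        # Nothing left to measure; returning False is an arbitrary choice here
        return False
    return abs(mean(means)) < tol and abs(mean(stds) - 1) < tol


raw = {"age": [20, 30, 40, 50], "member": [0, 1, 1, 0], "city": ["a", "b", "a", "b"]}

# Standardize the only continuous column by hand
mu, sigma = mean(raw["age"]), pstdev(raw["age"])
scaled = dict(raw, age=[(v - mu) / sigma for v in raw["age"]])

print(looks_scaled(raw), looks_scaled(scaled))
```

Averaging the per-column statistics keeps the check cheap, at the cost of letting a few badly scaled columns hide among many well-scaled ones.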

method load(assign=True)

Load the branch's data from memory.

This method is used to restore the data of inactive branches.

Parameters

assign: bool, default=True

Whether to assign the loaded data to self.

Returns

DataContainer or None

Own data information. Returns None if no data is set.

method store(assign=True)

Store the branch's data as a pickle in memory.

After storage, the data is deleted, and the branch is no longer usable until load is called. This method is used to store the data for inactive branches.

Note

This method is skipped silently for branches with no memory allocation.

Parameters

assign: bool, default=True

Whether to assign None to the data in self.
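
The store/load cycle for inactive branches can be sketched as below. `StorableBranch` and its `use_memory` flag are hypothetical, and "memory" is modelled here as an in-process bytes buffer, which is an assumption; the library's real storage backend may differ.

```python
import pickle


class StorableBranch:
    """Hypothetical stand-in for a branch that parks its data while inactive."""

    def __init__(self, data, use_memory=True):
        self._data = data            # live data while the branch is active
        self._stored = None          # pickled bytes while the branch is inactive
        self._use_memory = use_memory

    def store(self, assign=True):
        # Skipped silently when the branch has no memory allocation
        if not self._use_memory or self._data is None:
            return
        self._stored = pickle.dumps(self._data)
        if assign:
            self._data = None        # branch is unusable until load() is called

    def load(self, assign=True):
        if self._stored is None:
            return self._data        # nothing stored: return whatever is set (may be None)
        data = pickle.loads(self._stored)
        if assign:
            self._data = data
        return data


branch = StorableBranch({"X": [[1.0, 2.0]], "y": [0]})
branch.store()                       # data is pickled and dropped from the branch
restored = branch.load()             # data is unpickled and assigned back
```

With assign=False, store keeps the live data in place and load returns the stored copy without touching the branch, which is useful for inspecting a stored branch without reactivating it.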