Branch


class atom.data.branch.Branch(name, data=None, holdout=None, memory=None)
Object that contains the data.

A branch contains a specific pipeline, the dataset transformed through that pipeline, the models fitted on that dataset, and all data and utility attributes that refer to that dataset. Branches can be created and accessed through atom's branch attribute.

All public properties and attributes of the branch can be accessed from the parent.

Read more in the user guide.

Warning

This class should not be called directly. Branches are created internally by the ATOMClassifier, ATOMForecaster and ATOMRegressor classes.

Parameters

name: str
Name of the branch.

data: DataContainer or None, default=None
Data for the branch.

holdout: pd.DataFrame or None, default=None
Holdout data set.

memory: str, Memory or None, default=None
Memory object for pipeline caching and to store the data when the branch is inactive.


See Also

BranchManager

Object that manages branches.


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 177 (1.3%)



>>> # Train a model
>>> atom.run("RF")


Training ========================= >>
Models: RF
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.993
Time elapsed: 0.182s
-------------------------------------------------
Time: 0.182s


Final results ==================== >>
Total time: 0.185s
-------------------------------------
RandomForest --> f1: 0.993


>>> # Change the branch and apply feature scaling
>>> atom.branch = "scaled"

Successfully created new branch: scaled.


>>> atom.scale()

Fitting Scaler...
Scaling features...

>>> atom.run("RF_scaled")


Training ========================= >>
Models: RF_scaled
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.9861
Time elapsed: 0.178s
-------------------------------------------------
Time: 0.178s


Final results ==================== >>
Total time: 0.181s
-------------------------------------
RandomForest --> f1: 0.9861


>>> # Compare the models
>>> atom.plot_roc()


Attributes

pipeline: Pipeline
Pipeline of transformers.

Tip

Use the plot_pipeline method to visualize the pipeline.

mapping: dict[str, dict[str, int | float]]
Encoded values and their respective mapped values.

The column name is the key to its mapping dictionary. Only available for columns that are mapped to a single column (e.g., Ordinal or Leave-one-out encoding).

dataset: pd.DataFrame
Complete data set.
train: pd.DataFrame
Training set.
test: pd.DataFrame
Test set.
X: pd.DataFrame
Feature set.
y: pd.Series | pd.DataFrame
Target column(s).
holdout: pd.DataFrame | None
Holdout set.
X_train: pd.DataFrame
Features of the training set.
y_train: pd.Series | pd.DataFrame
Target column(s) of the training set.
X_test: pd.DataFrame
Features of the test set.
y_test: pd.Series | pd.DataFrame
Target column(s) of the test set.
shape: tuple[int, int]
Shape of the dataset (n_rows, n_columns).
columns: pd.Index
Name of all the columns.
n_columns: int
Number of columns.
features: pd.Index
Name of the features.
n_features: int
Number of features.
target: str | list[str]
Name of the target column(s).
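
To make the relationship between these attributes concrete, the sketch below rebuilds the same views from a plain pandas DataFrame. It is an illustration only, not ATOM's implementation: the column names, the `n_train` split point, and the rule that the first rows form the training set are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical data set: two features and one target column.
dataset = pd.DataFrame(
    {
        "feature_1": [0.1, 0.4, 0.3, 0.9, 0.5],
        "feature_2": [10, 20, 30, 40, 50],
        "label": [0, 1, 0, 1, 1],
    }
)
target = "label"
n_train = 3  # Assumed split point: first rows train, remaining rows test.

# Row-wise views: train and test together make up the complete data set.
train, test = dataset.iloc[:n_train], dataset.iloc[n_train:]

# Column-wise views: X holds the features, y holds the target.
X, y = dataset.drop(columns=target), dataset[target]
X_train, y_train = train.drop(columns=target), train[target]
X_test, y_test = test.drop(columns=target), test[target]

# Utility attributes derived from the views above.
shape = dataset.shape
columns = dataset.columns
features = X.columns
n_columns, n_features = len(columns), len(features)
```

In this sketch, `shape` counts the target column while `n_features` does not, which matches the distinction the list draws between columns and features.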


Methods

check_scaling: Whether the feature set is scaled.
load: Load the branch's data from memory.
store: Store the branch's data as a pickle in memory.


method check_scaling()
Whether the feature set is scaled.

A data set is considered scaled when its features have a mean of approximately 0 and a standard deviation of approximately 1, or when there is a scaler in the pipeline. Categorical and binary columns (containing only zeros and ones) are excluded from the calculation. Sparse datasets always return False.

Returns

bool
Whether the feature set is scaled.
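
The rule described above can be sketched with pandas as follows. This is a hedged illustration rather than the library's code: the function name `looks_scaled`, the `has_scaler` flag, the tolerances, and the choice to average per-column statistics are assumptions of this example.

```python
import numpy as np
import pandas as pd


def looks_scaled(X, has_scaler=False, mean_tol=0.05, std_tol=0.15):
    """Heuristic scaling check (tolerances are assumptions of this sketch)."""
    # Sparse data sets are never reported as scaled.
    if any(isinstance(dtype, pd.SparseDtype) for dtype in X.dtypes):
        return False

    # A scaler in the pipeline counts as scaled, whatever the statistics say.
    if has_scaler:
        return True

    # Exclude categorical columns by keeping numeric ones only.
    numeric = X.select_dtypes(include="number")

    # Exclude binary columns, i.e. those that contain only zeros and ones.
    kept = [c for c in numeric.columns if not numeric[c].isin([0, 1]).all()]
    if not kept:
        return False  # Nothing left to judge; this sketch reports "not scaled".

    # Average the per-column mean and standard deviation.
    mean = numeric[kept].mean().mean()
    std = numeric[kept].std().mean()
    return bool(abs(mean) < mean_tol and abs(std - 1) < std_tol)


# Usage: compare raw data with a manually standardized copy.
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {
        "a": rng.normal(50, 10, 200),
        "b": rng.normal(-3, 2, 200),
        "flag": rng.integers(0, 2, 200),  # Binary column, ignored by the check.
    }
)
scaled = raw.copy()
for col in ["a", "b"]:
    scaled[col] = (raw[col] - raw[col].mean()) / raw[col].std()

raw_result = looks_scaled(raw)
scaled_result = looks_scaled(scaled)
```

Here the raw data fails the check because the column means are far from zero, while the standardized copy passes. The binary `flag` column does not influence either result.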



method load(assign=True)
Load the branch's data from memory.

This method is used to restore the data of inactive branches.

Parameters

assign: bool, default=True
Whether to assign the loaded data to self.

Returns

DataContainer or None
Own data information. Returns None if no data is set.



method store(assign=True)
Store the branch's data as a pickle in memory.

After storage, the data is deleted, and the branch is no longer usable until load is called. This method is used to store the data for inactive branches.

Note

This method is skipped silently for branches with no memory allocation.

Parameters

assign: bool, default=True
Whether to assign None to the data in self.
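
The store and load cycle can be pictured with the minimal sketch below. It is not the Branch implementation: `CachedData` is a hypothetical class, and writing a pickle file into a directory stands in for whatever the memory object does internally. Only the `assign` semantics and the silent skip when no memory is allocated follow the descriptions above.

```python
import pickle
import tempfile
from pathlib import Path


class CachedData:
    """Hypothetical holder that parks its data on disk while inactive."""

    def __init__(self, name, data=None, location=None):
        self.name = name
        self._data = data
        # Without a location there is no memory allocation to store into.
        self._path = Path(location) / f"{name}.pkl" if location else None

    def store(self, assign=True):
        """Pickle the data; optionally drop the in-memory copy."""
        if self._path is None or self._data is None:
            return  # Skipped silently when there is nowhere (or nothing) to store.
        with open(self._path, "wb") as f:
            pickle.dump(self._data, f)
        if assign:
            self._data = None

    def load(self, assign=True):
        """Read the data back; optionally reattach it to the object."""
        if self._data is not None:
            return self._data
        if self._path is None or not self._path.exists():
            return None
        with open(self._path, "rb") as f:
            data = pickle.load(f)
        if assign:
            self._data = data
        return data


with tempfile.TemporaryDirectory() as tmp:
    branch = CachedData("scaled", data={"rows": [1, 2, 3]}, location=tmp)

    branch.store()                      # Data is written and released.
    dropped_after_store = branch._data is None

    peeked = branch.load(assign=False)  # Read without reattaching.
    still_dropped = branch._data is None

    restored = branch.load()            # Read and reattach.
    reattached = branch._data is not None
```

Passing `assign=False` to `load` in this sketch returns the data without making the object usable again, which is handy for a one-off inspection of an inactive holder.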