BranchManager


class atom.data.branchmanager.BranchManager(memory=None)[source]

Object that manages branches.

Maintains references to a series of branches and to the current active branch. Additionally, it always stores an 'original' branch containing the original dataset (prior to any transformations). The branches share a reference to a holdout set, not to the manager instance itself. When a memory object is specified, inactive branches are stored at that memory location.

Read more in the user guide.

Warning

This class should not be instantiated directly. The BranchManager is created internally by the ATOMClassifier, ATOMForecaster and ATOMRegressor classes.

Parameters

memory: str, Memory or None, default=None
Location to store inactive branches. If None, all branches are kept in memory. This memory object is passed to the branches for pipeline caching.
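
The idea of storing inactive branches at a location can be illustrated with a small toy model. This is a minimal sketch using only the standard library, not ATOM's implementation: `ToyBranchStore`, its `activate` method and the pickle file naming are hypothetical names introduced for illustration.

```python
import pickle
import tempfile
from pathlib import Path


class ToyBranchStore:
    """Hypothetical store that keeps only the active branch's data in RAM."""

    def __init__(self, memory=None):
        # memory=None keeps every branch in RAM, mirroring the default above.
        self.location = Path(memory) if memory is not None else None
        self.loaded = {}  # branch name -> data currently held in RAM
        self.current = None

    def _path(self, name):
        return self.location / f"branch_{name}.pkl"

    def add(self, name, data):
        self.loaded[name] = data
        self.activate(name)

    def activate(self, name):
        # Offload the previously active branch to disk if a location is set.
        previous = self.current
        if self.location is not None and previous not in (None, name):
            with open(self._path(previous), "wb") as f:
                pickle.dump(self.loaded.pop(previous), f)
        # Load the requested branch back if it was offloaded earlier.
        if name not in self.loaded:
            if self.location is None:
                raise KeyError(name)
            with open(self._path(name), "rb") as f:
                self.loaded[name] = pickle.load(f)
        self.current = name
        return self.loaded[name]


with tempfile.TemporaryDirectory() as tmp:
    store = ToyBranchStore(memory=tmp)
    store.add("main", [1, 2, 3])
    store.add("scaled", [0.1, 0.2, 0.3])
    in_ram = sorted(store.loaded)  # only the active branch remains in RAM
    on_disk = sorted(p.name for p in Path(tmp).iterdir())
    restored = store.activate("main")  # reloads main, offloads scaled
```

With `memory=None` the toy store never writes to disk, so all branches stay in `loaded`.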

Attributes

branches: ClassMap
Collection of branches.

og: Branch
Branch containing the original dataset. It is either one of the branches in branches or an internally created branch called og.

current: Branch
Current active branch.


See Also

Branch

Object that contains the data.


Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 180 (1.3%)


>>> # Train a model
>>> atom.run("RF")


Training ========================= >>
Models: RF
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.965
Time elapsed: 0.171s
-------------------------------------------------
Time: 0.171s


Final results ==================== >>
Total time: 0.174s
-------------------------------------
RandomForest --> f1: 0.965

>>> # Change the branch and apply feature scaling
>>> atom.branch = "scaled"

Successfully created new branch: scaled.

>>> atom.scale()

Fitting Scaler...
Scaling features...

>>> atom.run("RF_scaled")


Training ========================= >>
Models: RF_scaled
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.9718
Time elapsed: 0.172s
-------------------------------------------------
Time: 0.172s


Final results ==================== >>
Total time: 0.175s
-------------------------------------
RandomForest --> f1: 0.9718

>>> # Compare the models
>>> atom.plot_roc()



Methods

add: Add a new branch to the manager.
fill: Fill the current branch with data.
reset: Reset this instance to its initial state.


method add(name, parent=None)[source]

Add a new branch to the manager.

If the branch is called og (reserved name for the original branch), it's created separately and stored in memory.

Parameters

name: str
Name for the new branch.

parent: Branch or None, default=None
Parent branch. Data and attributes from the parent are passed to the new branch.
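
How a parent's data might be passed to a new branch can be sketched with a toy manager. This is an illustration under stated assumptions, not the library's code: `ToyBranch` and `ToyManager` are hypothetical, the data is deep-copied so the branches stay independent, and the new branch is assumed to become the current one (as in the example above, where assigning `atom.branch` switches to the new branch).

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyBranch:
    name: str
    data: dict = field(default_factory=dict)


class ToyManager:
    def __init__(self):
        self.branches = {"main": ToyBranch("main")}
        self.current = self.branches["main"]

    def add(self, name, parent=None):
        if name in self.branches:
            raise ValueError(f"Branch {name!r} already exists.")
        # Copy, so later changes to the child leave the parent untouched.
        data = deepcopy(parent.data) if parent is not None else {}
        self.branches[name] = ToyBranch(name, data)
        # Assumption: the newly added branch becomes the active one.
        self.current = self.branches[name]
        return self.current


manager = ToyManager()
manager.current.data["X"] = [1.0, 2.0, 3.0]

# Branch off from main, then transform only the new branch.
scaled = manager.add("scaled", parent=manager.branches["main"])
scaled.data["X"] = [x / 3.0 for x in scaled.data["X"]]
```

After this runs, the toy `main` branch still holds the untransformed values, which is the property that makes comparing models across branches meaningful.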



method fill(data, holdout=None)[source]

Fill the current branch with data.

This call resets the cached holdout calculation.

Parameters

data: DataContainer
New data for the current branch.

holdout: dataframe or None, default=None
Holdout data set (if any).
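
Resetting a cached calculation when new data arrives is a common pattern. The sketch below shows one way to do it with `functools.cached_property`. It is a hypothetical toy class, not ATOM's implementation: the doubling step merely stands in for whatever computation is applied to the holdout set, and `n_computed` exists only to make the caching visible.

```python
from functools import cached_property


class ToyManager:
    def __init__(self):
        self.data = None
        self._holdout = None
        self.n_computed = 0  # counts how often the holdout is recomputed

    @cached_property
    def holdout(self):
        # Stand-in for an expensive transformation of the raw holdout set.
        self.n_computed += 1
        if self._holdout is None:
            return None
        return [2 * v for v in self._holdout]

    def fill(self, data, holdout=None):
        self.data = data
        self._holdout = holdout
        # Drop the cached value so the next access recomputes it.
        self.__dict__.pop("holdout", None)


m = ToyManager()
m.fill([1, 2, 3], holdout=[10, 20])
first = m.holdout
again = m.holdout  # served from the cache, no recomputation

m.fill([4, 5, 6], holdout=[30])
second = m.holdout  # recomputed because fill cleared the cache
```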



method reset(hard=False)[source]

Reset this instance to its initial state.

The initial state of the BranchManager contains a single branch called main with no data. There's no reference to an original (og) branch.

Parameters

hard: bool, default=False
If True, completely flushes the cache.
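
The difference between a soft and a hard reset can be sketched as follows. This is a toy model under assumptions, not the library's code: `ToyManager` and its `cache` dictionary are hypothetical stand-ins for the manager and its cached results.

```python
class ToyManager:
    def __init__(self):
        self.cache = {}  # hypothetical stand-in for cached results
        self.reset()

    def reset(self, hard=False):
        self.branches = {"main": None}  # a single branch called main, no data
        self.current = "main"
        self.og = None  # no reference to an original branch
        if hard:
            self.cache.clear()


m = ToyManager()
m.branches["scaled"] = [0.1, 0.2]
m.current = "scaled"
m.cache["scaler"] = "fitted"

m.reset()  # branches are gone, the cache survives
soft_cache = dict(m.cache)

m.reset(hard=True)  # the cache is flushed as well
```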