BranchManager

class atom.data.branchmanager.BranchManager(memory=None)

Object that manages branches.

Maintains references to a series of branches and to the current active branch. Additionally, it always stores an 'original' branch containing the original dataset (prior to any transformations). The branches share a reference to a holdout set, not to the manager instance itself. When a memory object is specified, inactive branches are stored at that memory location instead of being kept in memory.

Example

>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer

>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)

>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)

<< ================== ATOM ================== >>

Configuration ==================== >>
Algorithm task: Binary classification.

Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 180 (1.3%)


>>> # Train a model
>>> atom.run("RF")


Training ========================= >>
Models: RF
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.965
Time elapsed: 0.171s
-------------------------------------------------
Time: 0.171s


Final results ==================== >>
Total time: 0.174s
-------------------------------------
RandomForest --> f1: 0.965

>>> # Change the branch and apply feature scaling
>>> atom.branch = "scaled"

Successfully created new branch: scaled.

>>> atom.scale()

Fitting Scaler...
Scaling features...

>>> atom.run("RF_scaled")


Training ========================= >>
Models: RF_scaled
Metric: f1


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.9718
Time elapsed: 0.172s
-------------------------------------------------
Time: 0.172s


Final results ==================== >>
Total time: 0.175s
-------------------------------------
RandomForest --> f1: 0.9718

>>> # Compare the models
>>> atom.plot_roc()

Attributes

branches: ClassMap

Collection of branches.

og: Branch

Branch containing the original dataset. It can be any branch in branches or an internally made branch called og.

current: Branch

Current active branch.

Methods

add	Add a new branch to the manager.
fill	Fill the current branch with data.
reset	Reset this instance to its initial state.

method add(name, parent=None)

Add a new branch to the manager.

If the branch is called og (reserved name for the original branch), it's created separately and stored in memory.

Parameters

name: str

Name for the new branch.

parent: Branch or None, default=None

Parent branch. Data and attributes from the parent are passed to the new branch.
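
The behaviour described for add can be pictured with a small, self-contained toy. This is only a sketch of the documented semantics, not ATOM's implementation: ToyBranch and ToyBranchManager are hypothetical names, plain dicts stand in for the real data containers, and it assumes that a newly added branch becomes the active one and that parent data is passed on as an independent copy.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToyBranch:
    """Hypothetical stand-in for a branch: a name plus some data."""

    name: str
    data: dict = field(default_factory=dict)


class ToyBranchManager:
    """Illustrative only; not ATOM's BranchManager."""

    def __init__(self):
        self.branches: dict[str, ToyBranch] = {}
        self.og: ToyBranch | None = None
        self.current: ToyBranch | None = None
        self.add("main")

    def add(self, name: str, parent: ToyBranch | None = None) -> None:
        # Assumption: parent data is handed over as an independent copy,
        # so later changes to the new branch leave the parent untouched.
        data = copy.deepcopy(parent.data) if parent else {}
        if name == "og":
            # Reserved name: kept apart from the regular branches.
            self.og = ToyBranch("og", data)
        else:
            self.branches[name] = ToyBranch(name, data)
            # Assumption: the new branch becomes the active one.
            self.current = self.branches[name]


manager = ToyBranchManager()
manager.current.data["X"] = [1.0, 2.0, 3.0]

# Branch off from main and transform only the new branch
manager.add("scaled", parent=manager.branches["main"])
manager.current.data["X"] = [x / 3.0 for x in manager.current.data["X"]]
```

In this toy, the main branch still holds the untransformed values after the scaled branch is modified, which is the property that makes it possible to compare models trained on different branches.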

method fill(data, holdout=None)

Fill the current branch with data.

This call resets the cached holdout calculation.

Parameters

data: DataContainer

New data for the current branch.

holdout: dataframe or None, default=None

Holdout data set (if any).
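
One common way to reset a cached calculation in Python is to drop the value that functools.cached_property stored on the instance. The sketch below illustrates that pattern under stated assumptions: ToyBranch, its attributes and the doubling "calculation" are hypothetical, and the real class may cache the holdout set differently.

```python
from functools import cached_property


class ToyBranch:
    """Illustrative only; not ATOM's Branch."""

    def __init__(self):
        self._data = None
        self._holdout = None
        self.n_computed = 0  # counts how often the holdout is recomputed

    @cached_property
    def holdout(self):
        # Stand-in for an expensive transformation of the raw holdout set
        self.n_computed += 1
        if self._holdout is None:
            return None
        return [2 * x for x in self._holdout]

    def fill(self, data, holdout=None):
        self._data = data
        self._holdout = holdout
        # cached_property stores its result in the instance __dict__;
        # removing the entry forces a recomputation on the next access.
        self.__dict__.pop("holdout", None)


branch = ToyBranch()
branch.fill([1, 2], holdout=[3])
first = branch.holdout   # computed once
again = branch.holdout   # served from the cache

branch.fill([1, 2], holdout=[4])
second = branch.holdout  # recomputed from the new holdout set
```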

method reset(hard=False)

Reset this instance to its initial state.

The initial state of the BranchManager contains a single branch called main with no data. There's no reference to an original (og) branch.

Parameters

hard: bool, default=False

If True, completely flushes the cache.
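
The difference between a soft and a hard reset can be sketched with a toy object. Everything here is hypothetical: ToyManager is not ATOM's class, the _cache dict merely stands in for whatever the real manager caches, and it is an assumption that a soft reset leaves that cache untouched.

```python
class ToyManager:
    """Illustrative only; not ATOM's BranchManager."""

    def __init__(self):
        self._cache = {}  # hypothetical cache
        self.reset()

    def reset(self, hard=False):
        # Initial state: one empty branch called "main", no original branch
        self.branches = {"main": None}
        self.current = "main"
        self.og = None
        if hard:
            self._cache.clear()


manager = ToyManager()
manager.branches["main"] = [1, 2, 3]
manager.branches["scaled"] = [0.1, 0.2, 0.3]
manager.current = "scaled"
manager.og = [1, 2, 3]
manager._cache["holdout"] = [4, 5]

manager.reset()  # soft reset: branches are gone, cache is kept
soft_cache = dict(manager._cache)

manager.reset(hard=True)  # hard reset: cache is emptied as well
```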

Parameters

memory: str, Memory or None, default=None

Location to store inactive branches. If None, all branches are kept in memory. This memory object is passed to the branches for pipeline caching.
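
The idea of moving inactive branches out of RAM and into a storage location can be illustrated with the standard library alone. This is a rough sketch under loud assumptions: ToyStore, deactivate and activate are hypothetical names, and pickle files in a temporary directory stand in for whatever mechanism the memory object actually uses.

```python
import pickle
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical helper that offloads inactive branch data to disk."""

    def __init__(self, location=None):
        # None mirrors the documented default: everything stays in memory
        self.location = Path(location) if location is not None else None
        self.in_ram = {}

    def deactivate(self, name):
        if self.location is None:
            return  # nothing to offload
        with (self.location / f"{name}.pkl").open("wb") as f:
            pickle.dump(self.in_ram.pop(name), f)

    def activate(self, name):
        # Load the branch back only if it was offloaded earlier
        if name not in self.in_ram:
            with (self.location / f"{name}.pkl").open("rb") as f:
                self.in_ram[name] = pickle.load(f)
        return self.in_ram[name]


with tempfile.TemporaryDirectory() as tmp:
    store = ToyStore(location=tmp)
    store.in_ram["main"] = {"X": [1, 2, 3]}

    store.deactivate("main")
    offloaded = "main" not in store.in_ram

    restored = store.activate("main")
```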