Branch
A branch contains a specific pipeline, the dataset transformed
through that pipeline, the models fitted on that dataset, and all
data and utility attributes that refer to that dataset. Branches
can be created and accessed through atom's branch
attribute.
All public properties and attributes of the branch can be accessed from the parent.
Read more in the user guide.
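As an illustration of reaching branch attributes from the parent, here is a minimal sketch of attribute delegation through `__getattr__`. It is not ATOM's implementation: `ToyBranch` and `ToyParent` are hypothetical stand-ins, and the assumption is only that the parent forwards unknown public names to its current branch.

```python
class ToyBranch:
    """Hypothetical stand-in for a branch holding data-related attributes."""

    def __init__(self, name, rows, columns):
        self.name = name
        self._rows = rows
        self.columns = list(columns)

    @property
    def shape(self):
        # (n_rows, n_columns), like the `shape` attribute listed on this page.
        return (len(self._rows), len(self.columns))


class ToyParent:
    """Hypothetical stand-in for the object that owns the branch."""

    def __init__(self, branch):
        self.branch = branch

    def __getattr__(self, item):
        # Called only when normal lookup on ToyParent fails.
        # Private names are not forwarded; the "branch" guard avoids
        # infinite recursion if the attribute has not been set yet.
        if item.startswith("_") or item == "branch":
            raise AttributeError(item)
        return getattr(self.branch, item)


parent = ToyParent(
    ToyBranch("main", rows=[[1, 2, 0], [3, 4, 1]], columns=["a", "b", "target"])
)

# The same value is reachable through the branch and through the parent.
same_shape = parent.shape == parent.branch.shape
```

In this sketch only public names are forwarded, which matches the statement that public properties and attributes are the ones exposed on the parent.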
Warning
This class should not be called directly. Branches are created internally by the ATOMClassifier, ATOMForecaster and ATOMRegressor classes.
Parameters | name: str
Name of the branch.
data: DataContainer or None, default=None
Data for the branch.
holdout: pd.DataFrame or None, default=None
Holdout data set.
memory: str, Memory or None, default=None
Memory object for pipeline caching and to store the data when
the branch is inactive.
|
Example
>>> from atom import ATOMClassifier
>>> from sklearn.datasets import load_breast_cancer
>>> X, y = load_breast_cancer(return_X_y=True, as_frame=True)
>>> # Initialize atom
>>> atom = ATOMClassifier(X, y, verbose=2)
<< ================== ATOM ================== >>
Configuration ==================== >>
Algorithm task: Binary classification.
Dataset stats ==================== >>
Shape: (569, 31)
Train set size: 456
Test set size: 113
-------------------------------------
Memory: 138.97 kB
Scaled: False
Outlier values: 177 (1.3%)
>>> # Train a model
>>> atom.run("RF")
Training ========================= >>
Models: RF
Metric: f1
Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.993
Time elapsed: 0.182s
-------------------------------------------------
Time: 0.182s
Final results ==================== >>
Total time: 0.185s
-------------------------------------
RandomForest --> f1: 0.993
>>> # Change the branch and apply feature scaling
>>> atom.branch = "scaled"
Successfully created new branch: scaled.
>>> atom.scale()
Fitting Scaler...
Scaling features...
>>> atom.run("RF_scaled")
Training ========================= >>
Models: RF_scaled
Metric: f1
Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> f1: 1.0
Test evaluation --> f1: 0.9861
Time elapsed: 0.178s
-------------------------------------------------
Time: 0.178s
Final results ==================== >>
Total time: 0.181s
-------------------------------------
RandomForest --> f1: 0.9861
>>> # Compare the models
>>> atom.plot_roc()
Attributes
Attributes | pipeline: Pipeline
Pipeline of transformers.
Tip
Use the plot_pipeline method to visualize the pipeline.
mapping: dict[str, dict[str, int | float]]
Encoded values and their respective mapped values. The column name is the key to its mapping dictionary. Only for columns mapped to a single column (e.g., Ordinal, Leave-one-out, etc.).
dataset: pd.DataFrame
Complete data set.
train: pd.DataFrame
Training set.
test: pd.DataFrame
Test set.
X: pd.DataFrame
Feature set.
y: pd.Series | pd.DataFrame
Target column(s).
holdout: pd.DataFrame | None
Holdout set.
X_train: pd.DataFrame
Features of the training set.
y_train: pd.Series | pd.DataFrame
Target column(s) of the training set.
X_test: pd.DataFrame
Features of the test set.
y_test: pd.Series | pd.DataFrame
Target column(s) of the test set.
shape: tuple[int, int]
Shape of the dataset (n_rows, n_columns).
columns: pd.Index
Name of all the columns.
n_columns: int
Number of columns.
features: pd.Index
Name of the features.
n_features: int
Number of features.
target: str | list[str]
Name of the target column(s).
|
Methods
check_scaling | Whether the feature set is scaled. |
load | Load the branch's data from memory. |
store | Store the branch's data as a pickle in memory. |
check_scaling()
A data set is considered scaled when it has mean~0 and std~1, or when there is a scaler in the pipeline. Categorical and binary columns (only zeros and ones) are excluded from the calculation. Sparse datasets always return False.
Returns | bool
Whether the feature set is scaled.
|
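The rule described for check_scaling can be sketched with the standard library alone. This is an illustration of the heuristic, not the library's code: the function name `looks_scaled`, the `has_scaler` flag, and the tolerances `mean_tol` and `std_tol` are all assumptions made for this example, and sparse data is not modeled.

```python
from statistics import fmean, pstdev


def looks_scaled(columns, has_scaler=False, mean_tol=0.05, std_tol=0.15):
    """Heuristic check on `columns`, a dict mapping column name to a list of values.

    `mean_tol` and `std_tol` are assumed tolerances chosen for this sketch.
    """
    if has_scaler:
        # A scaler in the pipeline is enough to count as scaled.
        return True

    # Keep numeric columns only, and drop binary columns (only zeros and ones).
    numeric = {
        name: values
        for name, values in columns.items()
        if all(isinstance(v, (int, float)) for v in values)
        and not set(values) <= {0, 1}
    }
    if not numeric:
        return False

    # Average the per-column means and standard deviations.
    mean = fmean(fmean(v) for v in numeric.values())
    std = fmean(pstdev(v) for v in numeric.values())
    return abs(mean) < mean_tol and abs(std - 1) < std_tol


ages = [20, 35, 50, 65]
mu, sigma = fmean(ages), pstdev(ages)

raw = {"age": ages, "flag": [0, 1, 1, 0], "city": ["a", "b", "a", "c"]}
standardized = {**raw, "age": [(a - mu) / sigma for a in ages]}

raw_is_scaled = looks_scaled(raw)
standardized_is_scaled = looks_scaled(standardized)
```

The binary `flag` column and the categorical `city` column do not influence the result, so only `age` decides the outcome in both cases.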
load(assign=True)
This method is used to restore the data of inactive branches.
Parameters | assign: bool, default=True
Whether to assign the loaded data to self.
|
Returns | DataContainer or None
Own data information. Returns None if no data is set.
|
store(assign=True)
After storage, the data is deleted, and the branch is no longer usable until load is called. This method is used to store the data for inactive branches.
Note
This method is skipped silently for branches with no memory allocation.
Parameters | assign: bool, default=True
Whether to assign None to the data in self.
|
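The store and load cycle described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `MiniBranch` is a hypothetical class, `memory` is treated as a plain directory path, and the data is a dictionary rather than a DataContainer.

```python
import pickle
import tempfile
from pathlib import Path


class MiniBranch:
    """Hypothetical stand-in for a branch that can park its data on disk."""

    def __init__(self, name, data=None, memory=None):
        self.name = name
        self._data = data
        # `memory` is a directory path here; None means no memory allocation.
        self._location = None if memory is None else Path(memory) / f"{name}.pkl"

    def store(self, assign=True):
        """Pickle the data; with assign=True, drop it from the instance."""
        if self._location is None:
            return  # Skipped silently when there is no memory allocation.
        if self._data is not None:
            with open(self._location, "wb") as f:
                pickle.dump(self._data, f)
        if assign:
            self._data = None

    def load(self, assign=True):
        """Return the data, reading it back from disk if it was stored."""
        if self._data is not None:
            return self._data
        if self._location is None or not self._location.exists():
            return None  # No data is set.
        with open(self._location, "rb") as f:
            data = pickle.load(f)
        if assign:
            self._data = data
        return data


with tempfile.TemporaryDirectory() as tmp:
    branch = MiniBranch("main", data={"X": [[1, 2], [3, 4]], "y": [0, 1]}, memory=tmp)
    branch.store()
    inactive = branch._data is None  # The branch holds no data after storing.
    restored = branch.load()  # Reads the pickle and reassigns the data.

no_memory = MiniBranch("scaled", data={"y": [1]})
no_memory.store()  # No memory allocation, so nothing happens.
```

Passing `assign=False` to `load` in this sketch returns the stored data without reattaching it to the instance, which mirrors the role of the `assign` parameter documented above.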