Predicting

Prediction methods

After training a model, you will usually want to make predictions on new, unseen data. As with a sklearn estimator, you can call the prediction methods directly from the model, e.g. atom.tree.predict(X).

All prediction methods transform the provided data through the pipeline in the model's branch before making predictions. Transformers that should only be applied to the training set (e.g. outlier pruning or class balancing) are excluded from this step.
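The idea of skipping train-only transformers at prediction time can be sketched in plain Python. This is a minimal illustration, not ATOM's implementation: the classes Scaler and OutlierPruner, the train_only flag and the function transform_for_prediction are all hypothetical names introduced for this example.

```python
# Hypothetical sketch: none of these names are part of ATOM's API.


class Scaler:
    """Divides every value by a fixed factor; applies to any data."""

    train_only = False

    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [[value / self.factor for value in row] for row in rows]


class OutlierPruner:
    """Drops rows holding extreme values; only meaningful while training."""

    train_only = True

    def __init__(self, limit):
        self.limit = limit

    def transform(self, rows):
        return [row for row in rows if all(abs(v) <= self.limit for v in row)]


def transform_for_prediction(pipeline, rows):
    """Run new data through the pipeline, skipping train-only steps."""
    for step in pipeline:
        if step.train_only:
            continue  # pruning or balancing must not alter the rows to predict
        rows = step.transform(rows)
    return rows


pipeline = [OutlierPruner(limit=100), Scaler(factor=10)]
new_rows = [[10, 20], [5000, 30]]
prepared = transform_for_prediction(pipeline, new_rows)
print(prepared)  # [[1.0, 2.0], [500.0, 3.0]]
```

The second row would have been dropped by the pruner, but because that step is skipped, every provided row still receives a prediction.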

The available prediction methods are the ones most commonly found on estimators in sklearn's API:

decision_function	Get confidence scores on new data or rows in the dataset.
predict	Get class predictions on new data or rows in the dataset.
predict_log_proba	Get class log-probabilities on new data or rows in the dataset.
predict_proba	Get class probabilities on new data or rows in the dataset.
score	Get a metric score on new data.
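How the outputs in the table above relate to each other can be shown with a small worked example. It relies on two conventions that hold for many sklearn classifiers but are not guaranteed for every estimator: log-probabilities are the element-wise logarithm of the probabilities, and the predicted class is the one with the highest probability. The class names and probability values are made up for illustration.

```python
import math

# Assumed example values: class probabilities for three samples, two classes.
classes = ["no", "yes"]
proba = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]

# Log-probabilities are conventionally the element-wise log of the probabilities.
log_proba = [[math.log(p) for p in row] for row in proba]

# The class prediction is commonly the class with the highest probability.
# On a tie, list.index returns the first maximum, so the first class wins here.
predictions = [classes[row.index(max(row))] for row in proba]

print(predictions)  # ['no', 'yes', 'no']
```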

Prediction attributes

Predictions can be calculated on the train, test and holdout sets. You can access them through attributes of the form [method]_[data_set], e.g. atom.mnb.predict_train, atom.mnb.predict_test or atom.mnb.predict_holdout. These predictions are not calculated until the attribute is accessed for the first time, which avoids potentially expensive calculations that are never used and saves time and memory.
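The lazy calculation described above follows a common pattern that can be sketched with Python's functools.cached_property. This is only an illustration of the pattern, not ATOM's actual code: the LazyModel class, its toy decision rule and its clear method are hypothetical.

```python
from functools import cached_property


class LazyModel:
    """Hypothetical stand-in for a trained model; not ATOM's implementation."""

    def __init__(self, test_rows):
        self.test_rows = test_rows
        self.calls = 0  # counts how often predictions are actually computed

    def predict(self, rows):
        self.calls += 1
        return [int(sum(row) > 1) for row in rows]  # toy decision rule

    @cached_property
    def predict_test(self):
        # Runs on first access only; the result is then stored on the instance.
        return self.predict(self.test_rows)

    def clear(self):
        # Drop the cached predictions to free memory.
        self.__dict__.pop("predict_test", None)


model = LazyModel([[0.2, 0.3], [0.9, 0.8]])
calls_before = model.calls   # 0: nothing has been computed yet
first = model.predict_test   # computed on this first access
second = model.predict_test  # served from the cache, no recomputation
print(first, model.calls)    # [0, 1] 1
```

Because the result is stored on the instance after the first access, the memory it occupies stays in use until the cache is dropped, which is what the toy clear method does here.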

Warning

The prediction attributes for the score method return atom's metric score on that set, not the metric returned by sklearn's score method for estimators. Use the method's metric parameter to calculate a different metric.

Note

Many of the plots use the prediction attributes, which stores the calculated predictions in the instance. For large datasets, this can considerably increase the instance's size. Use the clear method if you need to free some memory.

Predictions on rows in the dataset

It's also possible to get the prediction for one or more specific rows in the dataset by providing the names or positions of their indices. For example, atom.rf.predict(10) returns the random forest's prediction for the row at position 10 in the dataset, and atom.rf.predict_proba(["index1", "index2"]) returns the class probabilities for the rows with indices index1 and index2.
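Selecting rows by index name or by position can be sketched as follows. The helper select_rows is hypothetical and only illustrates the lookup. It assumes that integers are zero-based positions and that strings are index names, which may differ from how ATOM resolves ambiguous cases.

```python
# Hypothetical sketch: select_rows is not an ATOM function.


def select_rows(index, rows, selection):
    """Return the rows matching index names (str) or positions (int)."""
    if not isinstance(selection, (list, tuple)):
        selection = [selection]  # allow a single name or position
    picked = []
    for item in selection:
        # Assumption: integers are zero-based positions, strings are names.
        position = item if isinstance(item, int) else index.index(item)
        picked.append(rows[position])
    return picked


index = ["index0", "index1", "index2"]
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

by_position = select_rows(index, rows, 2)
by_name = select_rows(index, rows, ["index1", "index2"])
print(by_position)  # [[5.0, 6.0]]
print(by_name)      # [[3.0, 4.0], [5.0, 6.0]]
```

The selected rows would then be passed through the pipeline and the estimator in the same way as new data.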