Predicting

Prediction methods

After training a model, you will usually want to make predictions on new, unseen data. As with an sklearn estimator, you can call the prediction methods directly on the model, e.g. atom.tree.predict(X).

All prediction methods transform the provided data through the pipeline in the model's branch before making the predictions. Transformers that should only be applied on the training set are excluded from this step (e.g. outlier pruning or class balancing).

The available prediction methods are the most common methods for estimators in sklearn's API:

decision_function	Get confidence scores on new data or rows in the dataset.
predict	Get class predictions on new data or rows in the dataset.
predict_log_proba	Get class log-probabilities on new data or rows in the dataset.
predict_proba	Get class probabilities on new data or rows in the dataset.
score	Get a metric score on new data.
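As a rough illustration of what these five methods return, the sketch below calls them on a plain sklearn pipeline rather than on an ATOM model. The dataset, the scaler and the logistic regression are choices made for this example only. The pipeline also mirrors the idea that new data is transformed before the estimator makes its prediction.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data only: a binary classification task bundled with sklearn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# The scaler is applied to X_new automatically before the estimator predicts
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

scores = model.decision_function(X_new)     # confidence scores, shape (n_samples,)
labels = model.predict(X_new)               # class predictions, shape (n_samples,)
log_proba = model.predict_log_proba(X_new)  # log-probabilities, shape (n_samples, 2)
proba = model.predict_proba(X_new)          # probabilities, shape (n_samples, 2)
accuracy = model.score(X_new, y_new)        # sklearn's default score (mean accuracy)
```

Note that score here is sklearn's default for classifiers (mean accuracy), which is not necessarily the metric an ATOM model reports.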

Prediction attributes

The prediction methods can also be calculated on the train, test and holdout sets. You can access them through attributes of the form [method]_[data_set], e.g. atom.mnb.predict_train, atom.mnb.predict_test or atom.mnb.predict_holdout. The results are cached after the first call to avoid repeating expensive calculations. This caching can increase the memory footprint of the instance for large datasets. Use the clear method if you need to free the memory.
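The cache-on-first-access behaviour can be pictured with a small, self-contained sketch. The CachedPredictions class, its predict_fn argument and its n_computations counter are hypothetical names invented for this illustration; this is not ATOM's implementation, only one way such a mechanism could work.

```python
class CachedPredictions:
    """Hypothetical stand-in for a model that caches predictions per data set."""

    def __init__(self, predict_fn, data_sets):
        self._predict_fn = predict_fn  # function applied to every row
        self._data_sets = data_sets    # e.g. {"train": [...], "test": [...]}
        self._cache = {}
        self.n_computations = 0        # counts how often predictions are computed

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        method, _, data_set = name.rpartition("_")
        if method != "predict" or data_set not in self._data_sets:
            raise AttributeError(name)
        if name not in self._cache:
            # First access: compute and store the result
            self.n_computations += 1
            rows = self._data_sets[data_set]
            self._cache[name] = [self._predict_fn(row) for row in rows]
        return self._cache[name]

    def clear(self):
        """Drop all cached predictions to free memory."""
        self._cache.clear()


model = CachedPredictions(
    predict_fn=lambda x: int(x > 0),
    data_sets={"train": [-1, 2, 3], "test": [4, -5]},
)

first = model.predict_test   # computed on first access
second = model.predict_test  # served from the cache
model.clear()                # cache emptied
third = model.predict_test   # computed again
```

The trade-off is the usual one for caching: repeated access is cheap, but every stored result stays in memory until it is cleared.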

Warning

The prediction attributes for the score method return atom's metric score on that set, not the metric returned by sklearn's score method for estimators. Use the method's metric parameter to calculate a different metric.

Note

The predict_proba method of some meta-estimators for multioutput tasks (such as MultiOutputClassifier) returns three-dimensional output, namely a list of arrays with shape=(n_samples, n_classes), one array per target column. Since ATOM's prediction methods return pandas objects, such 3-dimensional output is converted to a multiindex pd.DataFrame, where the first level of the row index contains the target columns and the second level contains the classes. Use .loc[[name_of_target_column]] to select the predictions for a single target.
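To see the raw output this note refers to, the sketch below fits sklearn's MultiOutputClassifier on a small synthetic dataset and inspects what predict_proba returns. The data, the two target columns and the choice of LogisticRegression are assumptions made for the example. It shows only the sklearn side, before any conversion to a pandas object.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic example data: 60 samples, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# Two binary target columns derived from the features
Y = np.column_stack([
    (X[:, 0] > 0).astype(int),
    (X[:, 1] + X[:, 2] > 0).astype(int),
])

clf = MultiOutputClassifier(LogisticRegression()).fit(X, Y)

# predict_proba returns a list with one array per target column,
# each of shape (n_samples, n_classes)
proba = clf.predict_proba(X[:5])
shapes = [p.shape for p in proba]
```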

Predictions on rows in the dataset

It's also possible to get the prediction for specific rows in the dataset by providing their names or positions to the prediction methods. For example, atom.rf.predict(10) returns the random forest's prediction for the row at position 10 in the dataset, and atom.rf.predict_proba(["index1", "index2"]) returns the class probabilities for the rows with indices index1 and index2.