Getting started
Installation
Standard installation
Install ATOM's newest release easily via pip:
$ pip install -U atom-ml
or via conda:
$ conda install -c conda-forge atom-ml
Note
Since atom was already taken, download the package under the name atom-ml!
Optional dependencies
To install the optional dependencies, add [models] after the package's name.
$ pip install -U atom-ml[models]
Latest source
Sometimes, new features and bug fixes are already implemented in the development branch but are still waiting for the next release. If you can't wait for that, it's possible to install the package directly from git.
$ pip install git+https://github.com/tvdboom/ATOM.git@development#egg=atom-ml
Don't forget to include #egg=atom-ml to explicitly name the project, so that pip can track metadata for it without having to run the setup.py script.
Contributing
If you are planning to contribute to the project, you'll need the development dependencies. Install them by adding [dev] after the package's name.
$ pip install -U atom-ml[dev]
See PyPI for a complete list of package files for all versions published.
Usage
ATOM contains a variety of classes and functions to perform data cleaning, feature engineering, model training, plotting and much more. The easiest way to use everything ATOM has to offer is through one of the main classes:
- ATOMClassifier for binary or multiclass classification tasks.
- ATOMRegressor for regression tasks.
Let's walk through an example. Click on the Google Colab badge at the top of this section to run the example yourself.
Make the necessary imports and load the data.
import pandas as pd
from atom import ATOMClassifier
# Load the Australian Weather dataset
X = pd.read_csv("https://raw.githubusercontent.com/tvdboom/ATOM/master/examples/datasets/weatherAUS.csv")
X.head()
Initialize the ATOMClassifier or ATOMRegressor class. These two classes are convenient wrappers for the whole machine learning pipeline. Unlike sklearn's API, they are initialized by providing the data you want to manipulate. This data is stored in the instance and can be accessed at any moment through atom's data attributes. You can either let atom split the dataset into a train and test set or provide the sets yourself.
atom = ATOMClassifier(X, y="RainTomorrow", test_size=0.3, verbose=2)
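The split can be checked right away. A minimal sketch, assuming atom exposes the complete dataset and the train and test sets as pandas DataFrames through the dataset, train and test attributes:
# Inspect the train/test split made by atom
# (dataset, train and test are assumed data attributes on the instance)
atom.dataset.shape  # complete dataset
atom.train.shape    # training set
atom.test.shape     # test set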
Data transformations are applied through atom's methods. For example, calling the impute method will initialize an Imputer instance, fit it on the training set and transform the whole dataset. The transformations are applied immediately after calling the method (no fit and transform commands necessary).
atom.impute(strat_num="median", strat_cat="most_frequent")
atom.encode(strategy="LeaveOneOut", max_onehot=8)
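Because the transformations are applied to the data stored in the instance, their effect can be verified immediately. A minimal sketch, assuming the dataset attribute is a pandas DataFrame:
# Count the remaining missing values; expected to be 0 after imputing
# (dataset is assumed to be a pandas DataFrame held by the atom instance)
atom.dataset.isna().sum().sum()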
Similarly, models are trained and evaluated using the run method. Here, we fit both a Random Forest and AdaBoost model, and apply hyperparameter tuning.
atom.run(models=["RF", "AdaB"], metric="auc", n_calls=10, n_initial_points=4)
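Before plotting, the trained models can be compared on the chosen metric. A minimal sketch, assuming atom keeps an overview of the trained models in a results attribute (atom.rf below is the accessor for the fitted Random Forest, as used in the plotting example that follows):
# Overview of the trained models and their scores
# (results is assumed to be a pandas DataFrame kept by atom)
atom.results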
Lastly, visualize the results using the integrated plots.
atom.plot_roc()
atom.rf.plot_confusion_matrix(normalize=True)