Example: Multivariate forecast
This example shows how to use ATOM to forecast a multivariate time series that includes exogenous features.
Import the macroeconomic dataset from sktime.datasets. It is a small dataset of quarterly US macroeconomic measurements from 1959Q1 to 2009Q3.
Load the data
In [1]:
# Import packages
import numpy as np
from sktime.datasets import load_macroeconomic
from atom import ATOMForecaster
In [2]:
# Load the data
X = load_macroeconomic()
print(X)
          realgdp  realcons   realinv  realgovt  realdpi      cpi      m1  \
Period                                                                      
1959Q1   2710.349    1707.4   286.898   470.045   1886.9   28.980   139.7   
1959Q2   2778.801    1733.7   310.859   481.301   1919.7   29.150   141.7   
1959Q3   2775.488    1751.8   289.226   491.260   1916.4   29.350   140.5   
1959Q4   2785.204    1753.7   299.356   484.052   1931.3   29.370   140.0   
1960Q1   2847.699    1770.5   331.722   462.199   1955.5   29.540   139.6   
...           ...       ...       ...       ...      ...      ...     ...   
2008Q3  13324.600    9267.7  1990.693   991.551   9838.3  216.889  1474.7   
2008Q4  13141.920    9195.3  1857.661  1007.273   9920.4  212.174  1576.5   
2009Q1  12925.410    9209.2  1558.494   996.287   9926.4  212.671  1592.8   
2009Q2  12901.504    9189.0  1456.678  1023.528  10077.5  214.469  1653.6   
2009Q3  12990.341    9256.0  1486.398  1044.088  10040.6  216.385  1673.9   
        tbilrate  unemp      pop  infl  realint  
Period                                           
1959Q1      2.82    5.8  177.146  0.00     0.00  
1959Q2      3.08    5.1  177.830  2.34     0.74  
1959Q3      3.82    5.3  178.657  2.74     1.09  
1959Q4      4.33    5.6  179.386  0.27     4.06  
1960Q1      3.50    5.2  180.007  2.31     1.19  
...          ...    ...      ...   ...      ...  
2008Q3      1.17    6.0  305.270 -3.16     4.33  
2008Q4      0.12    6.9  305.952 -8.79     8.91  
2009Q1      0.22    8.1  306.547  0.94    -0.71  
2009Q2      0.18    9.2  307.226  3.37    -3.19  
2009Q3      0.12    9.6  308.013  3.56    -3.44  
[203 rows x 12 columns]
Analyze the data
In [3]:
# We specify the last two columns as our target columns
atom = ATOMForecaster(X, y=(-2, -1), verbose=2, random_state=1)
<< ================== ATOM ================== >>
Configuration ==================== >>
Algorithm task: Multivariate forecast.
Dataset stats ==================== >>
Shape: (203, 12)
Train set size: 163
 --> From: 1959Q1  To: 1999Q3
Test set size: 40
 --> From: 1999Q4  To: 2009Q3
-------------------------------------
Memory: 29.41 kB
Scaled: False
Outlier values: 9 (0.5%)
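The train and test sizes reported above are consistent with a chronological holdout in which the last 20% of the rows form the test set. The sketch below reproduces those numbers with plain pandas. The 0.2 fraction and the `chronological_split` helper are assumptions made for illustration, not a description of ATOM's internals.

```python
import pandas as pd


def chronological_split(data, test_size=0.2):
    """Split without shuffling: the last rows become the test set.

    Hypothetical helper for illustration only.
    """
    n_test = int(test_size * len(data))
    if n_test < 1:
        raise ValueError("test_size is too small for this dataset")
    return data.iloc[:-n_test], data.iloc[-n_test:]


# Quarterly index with the same span as the macroeconomic dataset
index = pd.period_range(start="1959Q1", periods=203, freq="Q")
frame = pd.DataFrame({"value": range(203)}, index=index)

train, test = chronological_split(frame)
print(len(train), train.index[0], train.index[-1])
print(len(test), test.index[0], test.index[-1])
```

Keeping the rows in time order matters for forecasting: a shuffled split would let the model see observations from the future of the period it is evaluated on.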
In [4]:
atom.dataset
Out[4]:
| Period | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1959Q1 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.980 | 139.7 | 2.82 | 5.8 | 177.146 | 0.00 | 0.00 | 
| 1959Q2 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.150 | 141.7 | 3.08 | 5.1 | 177.830 | 2.34 | 0.74 | 
| 1959Q3 | 2775.488 | 1751.8 | 289.226 | 491.260 | 1916.4 | 29.350 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 
| 1959Q4 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.370 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 
| 1960Q1 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.540 | 139.6 | 3.50 | 5.2 | 180.007 | 2.31 | 1.19 | 
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | 
| 2008Q3 | 13324.600 | 9267.7 | 1990.693 | 991.551 | 9838.3 | 216.889 | 1474.7 | 1.17 | 6.0 | 305.270 | -3.16 | 4.33 | 
| 2008Q4 | 13141.920 | 9195.3 | 1857.661 | 1007.273 | 9920.4 | 212.174 | 1576.5 | 0.12 | 6.9 | 305.952 | -8.79 | 8.91 | 
| 2009Q1 | 12925.410 | 9209.2 | 1558.494 | 996.287 | 9926.4 | 212.671 | 1592.8 | 0.22 | 8.1 | 306.547 | 0.94 | -0.71 | 
| 2009Q2 | 12901.504 | 9189.0 | 1456.678 | 1023.528 | 10077.5 | 214.469 | 1653.6 | 0.18 | 9.2 | 307.226 | 3.37 | -3.19 | 
| 2009Q3 | 12990.341 | 9256.0 | 1486.398 | 1044.088 | 10040.6 | 216.385 | 1673.9 | 0.12 | 9.6 | 308.013 | 3.56 | -3.44 | 
203 rows × 12 columns
In [5]:
# Examine the targets
atom.plot_series()
In [6]:
atom.plot_decomposition()
Run the pipeline
In [7]:
# Exogenous features are transformed the usual way
atom.normalize()
Fitting Normalizer...
Normalizing features...
In [8]:
atom.warnings = True  # Let's turn on warnings for a sec
In [9]:
# Use the apply method to transform the target columns
atom.apply(np.sqrt, columns=atom.target)
Fitting FunctionTransformer...
C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\pandas\core\internals\blocks.py:366: RuntimeWarning: invalid value encountered in sqrt
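The warning above comes from taking the square root of negative numbers, which NumPy evaluates to NaN. The sketch below reproduces the effect on the last five `infl` values printed earlier; the array is copied by hand from that output for illustration.

```python
import warnings

import numpy as np

# Last five infl values from the printed dataset (2008Q3 to 2009Q3)
infl = np.array([-3.16, -8.79, 0.94, 3.37, 3.56])

with warnings.catch_warnings():
    # Silence the "invalid value encountered in sqrt" RuntimeWarning
    warnings.simplefilter("ignore", RuntimeWarning)
    root = np.sqrt(infl)

n_nans = int(np.isnan(root).sum())
print(root)
print("NaNs introduced:", n_nans)
```

Only the negative entries become NaN; the non-negative ones are transformed as expected.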
In [10]:
# Note from the warnings that we might have NaNs in the dataset now
atom.nans
Out[10]:
realgdp      0
realcons     0
realinv      0
realgovt     0
realdpi      0
cpi          0
m1           0
tbilrate     0
unemp        0
pop          0
infl         6
realint     52
dtype: int64
In [11]:
# And, indeed, we can see them in the target columns
atom.y
Out[11]:
| Period | infl | realint |
|---|---|---|
| 1959Q1 | 0.000000 | 0.000000 | 
| 1959Q2 | 1.529706 | 0.860233 | 
| 1959Q3 | 1.655295 | 1.044031 | 
| 1959Q4 | 0.519615 | 2.014944 | 
| 1960Q1 | 1.519868 | 1.090871 | 
| ... | ... | ... | 
| 2008Q3 | NaN | 2.080865 | 
| 2008Q4 | NaN | 2.984962 | 
| 2009Q1 | 0.969536 | NaN | 
| 2009Q2 | 1.835756 | NaN | 
| 2009Q3 | 1.886796 | NaN | 
203 rows × 2 columns
In [12]:
# Impute the missing values created by the transformation
atom.impute(strat_num="bfill", columns=atom.target)
Fitting Imputer...
Imputing missing values...
 --> Imputing 6 missing values with bfill in column infl.
 --> Imputing 52 missing values with bfill in column realint.
In [13]:
atom.y
Out[13]:
| Period | infl | realint |
|---|---|---|
| 1959Q1 | 0.000000 | 0.000000 | 
| 1959Q2 | 1.529706 | 0.860233 | 
| 1959Q3 | 1.655295 | 1.044031 | 
| 1959Q4 | 0.519615 | 2.014944 | 
| 1960Q1 | 1.519868 | 1.090871 | 
| ... | ... | ... | 
| 2008Q3 | 0.969536 | 2.080865 | 
| 2008Q4 | 0.969536 | 2.984962 | 
| 2009Q1 | 0.969536 | 2.984962 | 
| 2009Q2 | 1.835756 | 2.984962 | 
| 2009Q3 | 1.886796 | 2.984962 | 
203 rows × 2 columns
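Comparing the two target tables shows what the imputation did: gaps in `infl` take the next valid value, while the trailing gaps in `realint`, which have no later value to copy, end up with the last observed one. In plain pandas, a backward fill followed by a forward fill gives the same result on these rows. That an additional forward fill is what happens internally is an assumption drawn from the output, and the small frame below is copied by hand from the tables.

```python
import numpy as np
import pandas as pd

# Last five rows of the transformed targets, before imputation
y = pd.DataFrame(
    {
        "infl": [np.nan, np.nan, 0.969536, 1.835756, 1.886796],
        "realint": [2.080865, 2.984962, np.nan, np.nan, np.nan],
    },
    index=pd.Index(
        ["2008Q3", "2008Q4", "2009Q1", "2009Q2", "2009Q3"], name="Period"
    ),
)

# Backward fill copies the next valid value upward; it cannot fill
# trailing gaps, so a forward fill handles whatever remains.
only_bfill = y.bfill()
filled = y.bfill().ffill()

print(only_bfill)
print(filled)
```

Note that a backward fill copies information from later periods into earlier ones, which is worth keeping in mind when the imputed rows span the boundary between train and test sets.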
In [14]:
atom.run(["BATS", "MSTL"], n_trials=10, warnings=False)
Training ========================= >>
Models: BATS, MSTL
Metric: mape
Running hyperparameter tuning for BATS...
| trial | use_box_cox | use_trend | use_damped_trend | use_arma_errors |    mape | best_mape | time_trial | time_ht |    state |
| ----- | ----------- | --------- | ---------------- | --------------- | ------- | --------- | ---------- | ------- | -------- |
| 0     |       False |      True |             None |            True | -0.6224 |   -0.6224 |     5.472s |  5.472s | COMPLETE |
| 1     |        None |     False |             True |           False |  -0.638 |   -0.6224 |     0.159s |  5.631s | COMPLETE |
| 2     |        None |      True |            False |           False | -0.9856 |   -0.6224 |     0.322s |  5.954s | COMPLETE |
| 3     |       False |     False |            False |           False |  -0.638 |   -0.6224 |     0.158s |  6.112s | COMPLETE |
| 4     |        None |      True |            False |           False | -0.9856 |   -0.6224 |     0.001s |  6.113s | COMPLETE |
| 5     |       False |     False |            False |           False |  -0.638 |   -0.6224 |     0.002s |  6.115s | COMPLETE |
| 6     |        None |     False |            False |           False |  -0.638 |   -0.6224 |     0.157s |  6.272s | COMPLETE |
| 7     |       False |      True |             None |           False | -0.6224 |   -0.6224 |     1.245s |  7.517s | COMPLETE |
| 8     |        True |      True |             None |            True | -0.6224 |   -0.6224 |     5.364s | 12.881s | COMPLETE |
| 9     |        True |      None |             None |           False | -0.5939 |   -0.5939 |     1.417s | 14.299s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 9
Best parameters:
 --> use_box_cox: True
 --> use_trend: None
 --> use_damped_trend: None
 --> use_arma_errors: False
Best evaluation --> mape: -0.5939
Time elapsed: 14.299s
Fit ---------------------------------------------
Train evaluation --> mape: -31036189868254.254
Test evaluation --> mape: -0.7341
Time elapsed: 1.168s
-------------------------------------------------
Time: 15.467s
Running hyperparameter tuning for MSTL...
| trial | seasonal_deg | trend_deg | low_pass_deg |  robust |    mape | best_mape | time_trial | time_ht |    state |
| ----- | ------------ | --------- | ------------ | ------- | ------- | --------- | ---------- | ------- | -------- |
| 0     |            1 |         1 |            0 |   False | -0.6357 |   -0.6357 |    20.777s | 20.777s | COMPLETE |
| 1     |            1 |         1 |            1 |   False | -0.6357 |   -0.6357 |     0.095s | 20.872s | COMPLETE |
| 2     |            1 |         1 |            1 |   False | -0.6357 |   -0.6357 |     0.001s | 20.873s | COMPLETE |
| 3     |            1 |         0 |            1 |   False | -0.6357 |   -0.6357 |     0.104s | 20.977s | COMPLETE |
| 4     |            0 |         0 |            1 |   False | -0.6357 |   -0.6357 |     0.119s | 21.096s | COMPLETE |
| 5     |            0 |         1 |            1 |    True | -0.6357 |   -0.6357 |     0.123s | 21.219s | COMPLETE |
| 6     |            0 |         1 |            1 |    True | -0.6357 |   -0.6357 |     0.003s | 21.222s | COMPLETE |
| 7     |            0 |         1 |            1 |    True | -0.6357 |   -0.6357 |     0.002s | 21.224s | COMPLETE |
| 8     |            1 |         0 |            0 |    True | -0.6357 |   -0.6357 |     0.107s | 21.331s | COMPLETE |
| 9     |            1 |         0 |            0 |    True | -0.6357 |   -0.6357 |     0.001s | 21.332s | COMPLETE |
Hyperparameter tuning ---------------------------
Best trial --> 0
Best parameters:
 --> stl_kwargs: {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 0, 'robust': False}
Best evaluation --> mape: -0.6357
Time elapsed: 21.332s
Fit ---------------------------------------------
Train evaluation --> mape: -40954763135468.12
Test evaluation --> mape: -0.7308
Time elapsed: 0.107s
-------------------------------------------------
Time: 21.439s
Final results ==================== >>
Total time: 37.028s
-------------------------------------
BATS --> mape: -0.7341
MSTL --> mape: -0.7308 !
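The negative scores and the extreme train-set values are easier to read with the metric's definition at hand. The output suggests that error metrics are reported with a flipped sign so that higher is always better (an assumption based on the numbers above). MAPE divides each error by the absolute true value, so a target that is zero or close to zero, such as the first row of `infl` and `realint`, inflates the mean enormously. A minimal sketch with NumPy, where the `mape` helper and its epsilon guard are illustrative rather than the exact implementation used here:

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (not a percentage).

    Hypothetical helper; the denominator is clipped at machine epsilon
    to avoid dividing by exactly zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    eps = np.finfo(np.float64).eps
    ratio = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), eps)
    return float(np.mean(ratio))


# Well-behaved case: both predictions are off by 10%
regular = mape([1.0, 2.0], [1.1, 1.8])

# One true value is exactly zero: a small absolute error dominates
with_zero = mape([0.0, 2.0], [0.1, 2.0])

print(regular)
print(with_zero)
```

For targets that cross or touch zero, a scale-dependent metric such as MAE or RMSE is usually easier to interpret than MAPE.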
Analyze the results
In [15]:
atom.evaluate()
Out[15]:
| | mae | mape | mse | r2 | rmse |
|---|---|---|---|---|---|
| BATS | -0.553500 | -0.734100 | -0.505500 | -0.012300 | -0.693300 | 
| MSTL | -0.552600 | -0.730800 | -0.504800 | -0.011100 | -0.692900 | 
In [16]:
with atom.canvas():
    atom.winner.plot_forecast(target=0)
    atom.winner.plot_forecast(target=1)