2. Scikit-learn


Decision Trees on Mauna Loa CO2 data

This example uses monthly average atmospheric CO2 concentrations (in parts per million by volume, ppm) collected at the Mauna Loa Observatory in Hawaii between 1958 and 2001. The objective is to model the CO2 concentration as a function of time t.

Build the dataset

We will derive a dataset from the air samples collected at the Mauna Loa Observatory. We are interested in estimating the CO2 concentration and extrapolating it to future years. First, we load the original dataset, available on OpenML.
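A minimal sketch of the loading step using scikit-learn's `fetch_openml`; the OpenML `data_id` (41187) is the identifier used in scikit-learn's own Mauna Loa CO2 example, and is an assumption here:

```python
from sklearn.datasets import fetch_openml

# Fetch the Mauna Loa CO2 dataset from OpenML as a pandas DataFrame
co2 = fetch_openml(data_id=41187, as_frame=True)
co2.frame.head()
```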

Next, we process the original dataframe to create a date index and select only the CO2 column.
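A possible implementation, assuming the OpenML frame exposes `year`, `month`, `day`, and `co2` columns:

```python
import pandas as pd

co2_data = co2.frame
# Combine the year/month/day columns into a single datetime index
co2_data["date"] = pd.to_datetime(co2_data[["year", "month", "day"]])
co2_data = co2_data[["date", "co2"]].set_index("date")
co2_data.index.min(), co2_data.index.max()
```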

Out: (Timestamp('1958-03-29 00:00:00'), Timestamp('2001-12-29 00:00:00'))

We see that we get CO2 concentrations for some days from March 1958 to December 2001. We can plot this raw information to get a better understanding.

[Figure: raw daily CO2 measurements, 1958–2001]

We will preprocess the dataset by taking a monthly average and dropping months for which no measurements were collected. This processing will have a smoothing effect on the data.
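One way to do this with pandas, continuing from the `co2_data` frame built above:

```python
# Take the monthly average and drop months with no measurements
co2_data = co2_data.resample("M").mean().dropna(axis="index", how="any")
```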

Cross-validation is an important step in machine learning. In cross-validation, the model is trained and evaluated on different subsets of the input data. This step is crucial for clean evaluation, better generalization, and reduced risk of underfitting and overfitting.

[Figure: monthly-averaged CO2 concentration, 1958–2001]

The idea in this example is to predict the CO2 concentration as a function of the date. We are also interested in extrapolating to years after 2001.

As a first step, we will separate the data from the target to estimate. Since the data is a date, we will convert it into a numeric value.
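A sketch of this step; the 80/20 chronological split is an assumption, chosen so the validation data comes strictly after the training data:

```python
# Convert the datetime index to a decimal-year number; the target is the CO2 level
X = (co2_data.index.year + co2_data.index.month / 12).to_numpy().reshape(-1, 1)
y = co2_data["co2"].to_numpy()

# Chronological hold-out split (assumed 80/20)
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]
```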

Model fitting using Decision Tree Regression

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

Decision trees learn from data to approximate a function with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules. Deeper trees are more powerful, but this can lead to overfitting.

[Figure: decision tree regression illustration. Image from https://dinhanhthi.com/decision-tree-regression/]

Next, we evaluate the performance on the validation data.
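A sketch of the fit-and-evaluate loop; the depths (2 and 11) match the output below, while `random_state=0` is an assumption:

```python
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

for depth in (2, 11):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    mse = mean_squared_error(y_val, tree.predict(X_val))
    print(f"Validation MSE for Decision Tree with depth {depth}: {mse:.4f}")
```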

Out:
Validation MSE for Decision Tree with depth 2: 210.6841
Validation MSE for Decision Tree with depth 11: 96.5067

Now, we will use the fitted models to predict on the training data, the validation data, and dates beyond the training range.

To do so, we create synthetic data from 1958 to the current month.
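A possible construction, predicting with the last tree fitted above (`tree`); the grid size of 2,000 points is an arbitrary choice:

```python
import datetime
import numpy as np

# Decimal-year grid from 1958 up to the current month
today = datetime.datetime.now()
X_future = np.linspace(1958, today.year + today.month / 12, 2_000).reshape(-1, 1)
y_future = tree.predict(X_future)
```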

[Figure: decision tree fits on the training domain and flat extrapolation beyond it]

As you can see, the decision tree regression fit the data within the existing domain quite well. The decision criteria are intuitive and can be understood with a simple visualization. However, it completely failed to predict any future trend outside the domain it was trained on.

Model the derivative of the data

To improve the model’s generalization, we will predict on differences in CO2 rather than absolute CO2 levels, using a “sliding window” of recent CO2 differences as the input. We will also normalize the data.
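A sketch of the windowing step; `make_windows` is a hypothetical helper, and the window length of 12 months is an assumption:

```python
import numpy as np

# First differences of the monthly CO2 series
diffs = np.diff(co2_data["co2"].to_numpy())

# Normalize, keeping the statistics so we can invert the transform later
diff_mean, diff_std = diffs.mean(), diffs.std()
diffs_norm = (diffs - diff_mean) / diff_std

WINDOW = 12  # assumed window length: one year of monthly differences

def make_windows(series, window):
    """Stack `window` consecutive values as features; the next value is the target."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

X_diff, y_diff = make_windows(diffs_norm, WINDOW)
```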

We will train 3 decision trees at different max depths.

Next, we will evaluate the trained decision trees on the validation data.
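A sketch under the same assumptions (80/20 chronological split, `random_state=0`); the depths match the output below:

```python
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Chronological split on the windowed differences
split = int(0.8 * len(X_diff))
X_train, X_val = X_diff[:split], X_diff[split:]
y_train, y_val = y_diff[:split], y_diff[split:]

trees = {}
for depth in (2, 10, 25):
    trees[depth] = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    mse = mean_squared_error(y_val, trees[depth].predict(X_val))
    print(f"Validation MSE for Decision Tree with depth {depth}: {mse:.4f}")
```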

Out:
Validation MSE for Decision Tree with depth 2: 0.1530
Validation MSE for Decision Tree with depth 10: 0.1828
Validation MSE for Decision Tree with depth 25: 0.1621

Now we will generate test data that runs all the way to the present day to see the model’s predictions. We use the model’s own prediction as part of the sliding window for the next prediction, to extrapolate arbitrarily far into the future.
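A sketch of the autoregressive loop; `extrapolate` is a hypothetical helper, and the horizon of 240 steps (20 years of months) is an assumption:

```python
def extrapolate(model, history, n_steps, window=WINDOW):
    """Feed each prediction back into the sliding window to forecast n_steps ahead."""
    values = list(history[-window:])
    preds = []
    for _ in range(n_steps):
        nxt = model.predict(np.array(values[-window:]).reshape(1, -1))[0]
        preds.append(nxt)
        values.append(nxt)
    return np.array(preds)

future_diffs = {depth: extrapolate(t, diffs_norm, n_steps=240) for depth, t in trees.items()}
```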

Let’s plot the results. Note that it is still showing the differences (derivative) rather than the absolute value, and it’s still normalized.

[Figure: predicted normalized CO2 differences for each tree depth]

We can see that all of the decision trees fit the past data nearly perfectly but do not entirely agree on their future predictions.

Let’s convert the predictions back into absolute CO2 levels.
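A sketch of the inversion, undoing the normalization and integrating the differences from the last observed CO2 value:

```python
# Invert the normalization, then take the cumulative sum of the differences,
# starting from the last observed absolute CO2 level
last_co2 = co2_data["co2"].to_numpy()[-1]
co2_future = {
    depth: last_co2 + np.cumsum(preds * diff_std + diff_mean)
    for depth, preds in future_diffs.items()
}
```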

And plot the results:

[Figure: absolute CO2 predictions reconstructed from the differences, per tree depth]

The results can vary quite a bit between runs. The method for fitting the decision trees is stochastic, and our many input variables are all similarly informative, so the tree’s hierarchy can vary significantly. Decision trees are not robust, so slight changes in input or in the tree structure can drastically alter predictions.

Recurrent Neural Networks (RNN)

Long Short-Term Memory (LSTM) network

Training a Neural Network (NN) is computationally expensive, and becomes more resource-intensive as the model grows more complex. Thus, we will use hardware acceleration (a GPU) to speed up the computation.
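The standard PyTorch device selection, falling back to the CPU when no GPU is available:

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
```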

Out: device(type='cuda')

Hyperparameters are external configuration settings for a machine learning model that are not learned from the data but are set prior to training, influencing the model’s performance and behavior. Tuning hyperparameters for research projects requires systematic approaches and packages (e.g. SHERPA, Optuna). However, we will do manual hyperparameter tuning (trial and error) for now.
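The values actually used are not shown in the text; the ones below are placeholders to make the later cells concrete:

```python
# Assumed hyperparameter values, chosen by trial and error
HIDDEN_SIZE = 64     # LSTM hidden state size
NUM_LAYERS = 1       # number of stacked LSTM layers
LEARNING_RATE = 1e-3 # Adam step size
N_EPOCHS = 200       # training passes over the data
WINDOW = 12          # same sliding-window length as before
```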

Then, we will reformat the input data for the autoregressive LSTM task.

There are several Python frameworks for building neural network models. The two most popular are:

  1. PyTorch
  2. TensorFlow

We will use PyTorch in this example. First, we need to convert the data into PyTorch tensors.
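A sketch of the conversion, reusing the windowed-difference split from the decision tree section:

```python
# LSTM expects input of shape (batch, sequence length, features)
X_train_t = torch.tensor(X_train, dtype=torch.float32).unsqueeze(-1).to(device)
y_train_t = torch.tensor(y_train, dtype=torch.float32).to(device)
X_val_t = torch.tensor(X_val, dtype=torch.float32).unsqueeze(-1).to(device)
y_val_t = torch.tensor(y_val, dtype=torch.float32).to(device)
```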

After that, we define the LSTM model and instantiate the other model configuration parameters. Among these, the optimizer and the loss function are two important configurations that affect the trained LSTM model.
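A minimal sketch of such a model, under the placeholder hyperparameters above; the original architecture may differ:

```python
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """Hypothetical minimal model: one LSTM layer followed by a linear head."""
    def __init__(self, hidden_size=HIDDEN_SIZE, num_layers=NUM_LAYERS):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)                      # out: (batch, seq, hidden)
        return self.head(out[:, -1]).squeeze(-1)  # predict from the last time step

model = LSTMRegressor().to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```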

Afterward, we train the LSTM model for multiple epochs (an epoch is a single pass through the entire training dataset).
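A minimal full-batch training loop under the assumptions above (the original may well train in mini-batches):

```python
for epoch in range(N_EPOCHS):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train_t), y_train_t)  # MSE on the training windows
    loss.backward()
    optimizer.step()
```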

Then, we evaluate the trained model on the validation data.
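The evaluation mirrors the decision tree case, with gradients disabled:

```python
model.eval()
with torch.no_grad():
    val_mse = criterion(model(X_val_t), y_val_t).item()
print(f"Validation MSE for RNN: {val_mse:.4f}")
```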

Out: Validation MSE for RNN: 0.0710

Next, we make predictions on the training and validation data, and extrapolate to future dates the same way we did in the decision tree example.
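A sketch of the autoregressive loop for the LSTM; `extrapolate_lstm` is a hypothetical helper analogous to `extrapolate` above, and the 240-step horizon is again an assumption:

```python
def extrapolate_lstm(model, history, n_steps, window=WINDOW):
    """Feed each LSTM prediction back into the sliding window."""
    values = list(history[-window:])
    preds = []
    model.eval()
    with torch.no_grad():
        for _ in range(n_steps):
            x = torch.tensor(values[-window:], dtype=torch.float32,
                             device=device).view(1, window, 1)
            nxt = model(x).item()
            preds.append(nxt)
            values.append(nxt)
    return np.array(preds)

rnn_future = extrapolate_lstm(model, diffs_norm, n_steps=240)
```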

Finally, we convert the CO2 derivatives back to absolute CO2 levels and compare with the results we got with the decision trees.
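The inversion is the same as for the decision trees, applied to the RNN forecast:

```python
# Undo the normalization and integrate from the last observed CO2 level
rnn_co2 = last_co2 + np.cumsum(rnn_future * diff_std + diff_mean)
```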

[Figure: LSTM predictions of absolute CO2 levels compared with the decision trees]

Based on the visualization and the validation mean squared error, it is clear that the LSTM (RNN) model successfully fits the training data and, unlike the decision tree models, extrapolates into the future by learning the underlying pattern.