# Fractionally Differentiated Price as Feature for Machine Learning Models

Updated: Dec 3, 2020

We discussed the memory content and the statistical properties of fractionally differentiated (fracdiff for short) time series in this __previous post__. In this publication we compare the performance of the fracdiff time series as a predictive feature against baseline predictions made with the undifferentiated price series or the d=1 differenced series.

We start from the premise of __this publication__: that past behaviour of the __SPY index__ can predict the value of the __VIX indicator__. We will also assume that predictions can be obtained from past data, and we will employ commonly used methods to generate a simple "up" or "down" prediction for several days into the future.

Our first step is to obtain the required price and VIX value data using __Quantconnect__ data addition methods:

```
qb = QuantBook()
spy = qb.AddEquity('SPY').Symbol
vix = qb.AddData(CBOE, 'VIX').Symbol
symbol_list = [spy, vix]
#Dates for training the model:
start_date, end_date = datetime(2005, 1, 1), datetime(2015, 1, 1)
history = qb.History(qb.Securities.Keys, start_date, end_date, Resolution.Daily)
```

The resulting dataframe has to be reindexed and pivoted. We are going to use only the "close" price information this time, ignoring the volume and bid-ask data that could be included in a more complex model:

```
reindex_history = history.reset_index()
reindex_history['time'] = reindex_history['time'].dt.date
close_prices = reindex_history.pivot(index='time', columns='symbol', values='close').dropna()
```

From our __previous publication__ we know that the price of SPY is non-stationary during this period. In contrast, we know that the VIX closing value for this period is stationary. In any case we are going to check the statistical properties of both in a fracdiff range of 0 (original time series) to 1 (day-to-day difference). We are using the __MLFinlab library__ again. The creators of MLFinlab have changed their approach to the development of the library; we still think the library is very good and useful, and we maintain our __patronage__ within our (modest) possibilities:

```
import numpy as np
from statsmodels.tsa.stattools import adfuller
from mlfinlab.features.fracdiff import frac_diff_ffd

symbol_list = [spy, vix]
fracdiff_series_dictionary = {}
fracdiff_series_adf_tests = {}
fracdiff_values = np.linspace(0, 1, 11)
for frac in fracdiff_values:
    for symbol in symbol_list:
        frac = round(frac, 1)
        dic_index = (symbol, frac)
        fracdiff_series_dictionary[dic_index] = frac_diff_ffd(close_prices[[symbol]], frac, thresh=0.0001)
        fracdiff_series_dictionary[dic_index].dropna(inplace=True)
        fracdiff_series_adf_tests[dic_index] = adfuller(fracdiff_series_dictionary[dic_index])
```
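For intuition on what `frac_diff_ffd` computes, the fixed-width window weights follow the iterative relation w_k = -w_(k-1) (d - k + 1) / k, truncated once the weights fall below the threshold. This is a minimal illustrative sketch, not MLFinlab's actual implementation:

```python
import numpy as np

def ffd_weights(d, thresh=1e-4, max_size=1000):
    # Illustrative fixed-width fracdiff weights:
    # w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k, stop when |w_k| < thresh.
    w = [1.0]
    k = 1
    while k < max_size:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < thresh:
            break
        w.append(w_k)
        k += 1
    return np.array(w)

print(ffd_weights(0.5)[:4].tolist())  # [1.0, -0.5, -0.125, -0.0625]
print(ffd_weights(1.0).tolist())      # [1.0, -1.0]: d=1 recovers the plain first difference
```

Note how d=1 collapses to the usual day-to-day difference, while fractional d values keep a slowly decaying memory of past prices.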

We keep only those fracdiff series that are stationary. These are committed to dictionaries in case we want, in the future, to use additional symbols or data or just inspect them.

```
stationary_fracdiffs = [key for key in fracdiff_series_adf_tests.keys() if fracdiff_series_adf_tests[key][1] < 0.05]
first_stationary_fracdiff = {}
diff_series = {}
for symbol, d in stationary_fracdiffs:
    ds = [kvp for kvp in stationary_fracdiffs if kvp[0] == symbol]
    first_stationary_fracdiff[symbol] = sorted(ds, key=lambda x: x[1], reverse=False)[0][1]
    diff_series[symbol] = fracdiff_series_dictionary[(symbol, first_stationary_fracdiff[symbol])]
    diff_series[symbol].columns = ['diff_close_' + str(symbol)]
for symbol in symbol_list:
    print("The lowest stationary fractionally differentiated series for", symbol, "is for d =", first_stationary_fracdiff[symbol], ".")
```

```
The lowest stationary fractionally differentiated series for SPY is for d = 0.5.
The lowest stationary fractionally differentiated series for VIX.CBOE is for d = 0.0.
```

We verify that, for this period, SPY differentiated by d = 0.5 yields a statistically stationary series, while the VIX value is stationary at d = 0, with no differentiation needed. The VIX time series and the 0.5-differentiated SPY series should now exhibit well-behaved statistical properties that permit the use of machine learning tools to generate predictions. We will define the prediction target as the "days into the future" movement, up (True) or down (False). In this manner our machine learning model just (as if it were easy) has to learn to classify the prediction input data into one of two classes:

```
prediction_target_days = [1, 3, 5, 7, 13, 23]
predicted_symbol = vix
predicting_symbol = spy
data = close_prices
data['SPY_1D_Return'] = fracdiff_series_dictionary[(spy, 1)]
data['VIX_1D_Return'] = fracdiff_series_dictionary[(vix, 1)]
for symbol in symbol_list:
    data = data.merge(diff_series[symbol], left_index=True, right_index=True)
    for future in prediction_target_days:
        data[('fut_dir_' + str(future) + '_' + str(symbol))] = data[[symbol]].shift(-future) > data[[symbol]]
data.dropna(inplace=True)
```
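The future-direction label is just a comparison of today's close with the close several days ahead. A toy illustration with hypothetical prices:

```python
import pandas as pd

# Toy prices (hypothetical values) to illustrate the future-direction label.
s = pd.DataFrame({'close': [10, 11, 9, 12, 13]})
future = 2
# True when the close `future` days ahead is higher than today's close;
# the trailing rows compare against NaN, evaluate to False, and are later dropped.
s['fut_dir_2'] = s['close'].shift(-future) > s['close']
print(s['fut_dir_2'].tolist())  # [False, True, True, False, False]
```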

This data is split into a training set and a testing set. We are using a small testing set in this case as we will use cross-validation to train our prediction model and eventually (next publication) run a trading backtest that will serve as the effective validation of the model:

```
# Get sizes of each of the datasets
test_size = 0.1
num_test = int(test_size*len(data))
num_train = len(data)- num_test
print("Training data points: %d " %num_train)
print("Test data points: %d" % num_test)
# Split into train, cv, and test
train = data[:num_train]
test = data[num_train:]
print("Training DataFrame Shape: ",train.shape)
print("Testing DataFrame Shape: ",test.shape)
```

```
Training data points: 2086
Test data points: 231
Training DataFrame Shape: (2086, 17)
Testing DataFrame Shape: (231, 17)
```

Testing and training data can now be scaled. Remember to fit the scalers to the training data first and use that fit to transform the testing data, otherwise data leaks can occur. We will use a minimum-maximum scaler and a standard scaler in sequence. The function becomes a bit monstrous as we are keeping the scaling transform data in dictionaries so that we can eventually return it when needed.

```
from sklearn.preprocessing import StandardScaler, MinMaxScaler
def scale_data(selected_features, selected_target, predict_window, train, test):
    min_max_scaler_parameters = {}
    standard_scaler_parameters = {}
    scaled_train_features = pd.DataFrame()
    scaled_test_features = pd.DataFrame()
    for feature in selected_features:
        min_max_scaler = MinMaxScaler(feature_range=(-1, 1))
        min_max_scaler_parameters[feature] = min_max_scaler.fit(np.array(train[feature]).reshape(-1, 1))
        scaled_train_features[str(feature) + "_scaled"] = min_max_scaler.transform(np.array(train[feature]).reshape(-1, 1)).flatten()
        standard_scaler = StandardScaler()
        standard_scaler_parameters[feature] = standard_scaler.fit(np.array(scaled_train_features[str(feature) + "_scaled"]).reshape(-1, 1))
        scaled_train_features[str(feature) + "_scaled"] = standard_scaler.transform(np.array(scaled_train_features[str(feature) + "_scaled"]).reshape(-1, 1)).flatten()
        if test_size == 0.0:
            continue
        scaled_test_features[str(feature) + "_scaled"] = min_max_scaler.transform(np.array(test[feature]).reshape(-1, 1)).flatten()
        scaled_test_features[str(feature) + "_scaled"] = standard_scaler.transform(np.array(scaled_test_features[str(feature) + "_scaled"]).reshape(-1, 1)).flatten()
    X_train = np.array(scaled_train_features)
    X_test = np.array(scaled_test_features)
    y_train = train['fut_dir_' + str(predict_window) + '_' + str(selected_target)]
    y_test = test['fut_dir_' + str(predict_window) + '_' + str(selected_target)]
    return X_train, X_test, y_train, y_test
```

We now have entries for each day with the 0.5 fracdiff of the closing SPY price, the daily change in SPY price and the current value of VIX. We can use this data to predict the future direction; to do so we will feed the data points sequentially into our machine learning model in windows of several days. Our training data will include a look-back time series of our prediction values, containing as many previous days' data as we wish. We can flexibly generate prediction inputs, targets and prediction horizons for deeper analysis:

```
def input_features(data, lookback_window, predict_window):
    nb_samples = len(data) - lookback_window - predict_window
    input_list = [np.expand_dims(data[i:lookback_window + i, :], axis=0) for i in range(nb_samples)]
    input_mat = np.concatenate(input_list, axis=0)
    return input_mat

def output_targets(data, lookback_window, predict_window):
    nb_samples = len(data) - lookback_window - predict_window
    input_list = [np.expand_dims(data[i:lookback_window + i][-1], axis=0) for i in range(nb_samples)]
    input_mat = np.concatenate(input_list, axis=0)
    return input_mat

def generate_windows(X_train, X_test, y_train, y_test, lookback_window, predict_window, predict_symbol):
    # Format the features into flattened look-back windows.
    X_train_windows = input_features(X_train, lookback_window, predict_window).reshape(-1, lookback_window * len(selected_features))
    X_test_windows = input_features(X_test, lookback_window, predict_window).reshape(-1, lookback_window * len(selected_features))
    y_train = np.array(train['fut_dir_' + str(predict_window) + "_" + str(predict_symbol)])
    y_test = np.array(test['fut_dir_' + str(predict_window) + "_" + str(predict_symbol)])
    y_train_windows = output_targets(y_train, lookback_window, predict_window).astype(int)
    y_test_windows = output_targets(y_test, lookback_window, predict_window).astype(int)
    return X_train_windows, X_test_windows, y_train_windows, y_test_windows
```

Notice the double shape change in the definition of the look-back matrices: the windows are first stacked into a three-dimensional array and then flattened for sklearn. This look-back window generator is also used to feed another LSTM model that requires the three-dimensional array, so it is a valid LSTM input with minor changes. With the data in place, let's put the differentiated features to the test with an array of spot models to gain a quick indication of which classification model (out of a subsample of all sklearn classifiers) may have better initial performance:
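To make the double shape change concrete, here is a minimal sketch (with hypothetical dimensions mirroring the article's 5 features and 22-day look-back) of stacking look-back windows into a 3-D array for an LSTM, then flattening the same windows for sklearn:

```python
import numpy as np

# Hypothetical dimensions: 100 days, 5 features, 22-day look-back window.
n_days, n_features, lookback = 100, 5, 22
X = np.random.rand(n_days, n_features)

# Stack overlapping look-back windows into (samples, lookback, features).
windows = np.stack([X[i:i + lookback] for i in range(n_days - lookback)])
print(windows.shape)  # (78, 22, 5): a valid LSTM input

# Flatten each window for sklearn estimators: (samples, lookback * features).
flat = windows.reshape(len(windows), -1)
print(flat.shape)  # (78, 110)
```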

```
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.model_selection import cross_val_score, TimeSeriesSplit

def spot_models(X_train_matrix, y_train_matrix):
    models = []
    models.append(('Logistic Regression', LogisticRegression(solver='liblinear')))
    models.append(('K-Nearest Neighbours', KNeighborsClassifier()))
    models.append(('Random Forest', RandomForestClassifier(n_estimators=10)))
    models.append(('Support Vector Classifier', svm.SVC(gamma='auto')))
    models.append(('Multi-layer Perceptron', MLPClassifier(max_iter=1000, solver='sgd', learning_rate='adaptive', hidden_layer_sizes=(3,))))
    results = []
    names = []
    tscv = TimeSeriesSplit(n_splits=10)
    for name, model in models:
        cv_results = cross_val_score(model, X_train_matrix, y_train_matrix, cv=tscv, scoring='roc_auc')
        results.append(cv_results)
        names.append(name)
        print('%s: Mean Acc.: %.2f std: %.2f - min: %.2f - max: %.2f' % (name, cv_results.mean(), cv_results.std(), cv_results.min(), cv_results.max()))
    # Compare algorithms
    plt.boxplot(results, labels=names)
    plt.title('Algorithm Accuracy Comparison')
    plt.show()
```

A __time series cross validation__ scheme is used, as it better respects the chronological nature of price changes.
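As a minimal illustration with toy data, `TimeSeriesSplit` builds expanding training folds in which the training indices always end strictly before the test indices begin, so no future information leaks into the fit:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve dummy chronological samples, three expanding-window splits.
X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Each training fold ends strictly before its test fold begins.
    print("train:", train_idx, "test:", test_idx)
    assert train_idx.max() < test_idx.min()
```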

We are going to fit our model in sequence to a logistic regression, a multi-layered perceptron, a K-nearest neighbours classifier, a random forest classifier and a support vector machine classifier. We will not worry ourselves with the detailed definition of these models; they are just there to gain a high-level view of which approach could be better for our test. These models are then fit and scored with all the available input data we have for prediction: the stationary fracdiff of SPY prices, the current SPY price, the current VIX value and the 1-day returns of both SPY and VIX:

```
selected_features = ['diff_close_SPY', vix, 'SPY_1D_Return', spy, 'VIX_1D_Return']
lookback_window = 22
predict_window = 5
selected_target = vix
X_train, X_test, y_train, y_test = scale_data(selected_features, selected_target, predict_window, train, test)
X_train_windows, X_test_windows, y_train_windows, y_test_windows = generate_windows(X_train, X_test, y_train, y_test, lookback_window, predict_window, selected_target)
spot_models(X_train_windows, y_train_windows)
```

The cross-validation score statistics for each of the models are:

**Logistic Regression**: Mean Acc.: 0.56, std: 0.07 - min: 0.47 - max: 0.68
**K-Nearest Neighbours**: Mean Acc.: 0.52, std: 0.04 - min: 0.46 - max: 0.62
**Random Forest**: Mean Acc.: 0.55, std: 0.07 - min: 0.42 - max: 0.67
**Support Vector Classifier**: Mean Acc.: 0.56, std: 0.06 - min: 0.46 - max: 0.70
**Multi-layer Perceptron**: Mean Acc.: 0.56, std: 0.07 - min: 0.45 - max: 0.71

In the form of a boxplot, this comparison can be much easier to interpret:

For very similar mean scores of around 0.55, the support vector classifier model offers the greatest potential among the considered classifiers. We will select the __SVC__ model as our baseline and try to improve its performance using a parameter grid search, basically brute-forcing our way into the parameters without deeper investigation:

```
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

model = svm.SVC()
param_search = {
    'kernel': ['linear', 'rbf', 'sigmoid'],
    'gamma': [1e-2, 1e-3],
    'class_weight': ['balanced'],
    'C': [1, 10, 100],
    'decision_function_shape': ['ovo', 'ovr']
}
tscv = TimeSeriesSplit(n_splits=10)
gsearch = GridSearchCV(estimator=model, cv=tscv, param_grid=param_search, scoring='roc_auc', verbose=2)
gsearch.fit(X_train_windows, y_train_windows)
best_score = gsearch.best_score_
best_model = gsearch.best_estimator_
```

Verbose level 2 in the sklearn parameter grid search produces a good amount of output and indicates the training time for each parameter combination, in case we want to go deeper into parameter selection in a time-efficient manner. This is a partial output of the grid search log:

```
Fitting 10 folds for each of 36 candidates, totalling 360 fits
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total= 0.0s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total= 0.0s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear, total= 0.1s
[CV] C=1, class_weight=balanced, decision_function_shape=ovo, gamma=0.01, kernel=linear
```

After finding the best possible model from the parameter search grid it is time to test the model with the validation data we have kept separate so far:

```
print('Best Model Score:', best_score.round(2))
print('Best Model Parameters:', best_model)
validation_score = best_model.score(X_test_windows, y_test_windows).round(2)
print('Best Model Validation Score:', validation_score)
```

**Best Model Score**: 0.58
**Best Model Parameters**: SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0,
decision_function_shape='ovo', degree=3, gamma=0.001, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
**Best Model Validation Score**: 0.55

So we have been able to correctly classify 55% of the test data points. The test set contains 104 True and 100 False points, so this is slightly above a naive classifier. Let's finally investigate whether our fracdiff feature was important in obtaining these results by removing it from the training data and repeating the whole process:
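As a sanity check, the naive majority-class baseline for the 104 True / 100 False test split can be computed directly:

```python
# Majority-class baseline for a test set with 104 True and 100 False labels:
# a classifier that always predicts "True" gets this accuracy for free.
n_true, n_false = 104, 100
baseline_acc = max(n_true, n_false) / (n_true + n_false)
print(round(baseline_acc, 2))  # 0.51, so 0.55 is a modest edge over always predicting True
```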

**Best Model Score:** 0.58
**Best Model Parameters**: SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0,
decision_function_shape='ovo', degree=3, gamma=0.001, kernel='sigmoid',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
**Best Model Validation Score:** 0.49

The fracdiff feature is definitely contributing positively to the model's score. Without it, the favored kernel is sigmoid instead of rbf, suggesting that the fracdiff feature may carry most of the information that the gaussian (rbf) kernel was exploiting in the previous model, information that is lost when the feature is removed.

It seems that using a fractional differentiation approach to obtain a feature we previously suspected to have predictive power improves the results of our prediction. In our next post we will implement the best model from this research in a validation backtest and evaluate the profitability of the model's predictive power.

Remember that publications from Ostirion.net are not financial advice. Ostirion.net does not hold any positions on any of the mentioned instruments at the time of publication. If you need further information, asset management support, automated trading strategy development or tactical strategy deployment you can __contact us here__.