A common example of a low-bias, high-variance model is a polynomial spline fit applied to a non-linear data set. The parameter of the model (the degree of the polynomial) could be adjusted to fit such a model very precisely (i.e. low bias on the training data), but additions of new points would almost certainly force the model to modify its polynomial degree to fit the new data. This would make it a very high-variance model on the in-sample data. Such a model would likely have very poor predictive or inferential capability on out-of-sample data.

Overfitting can also manifest itself in the trading strategy and not just the statistical model. For instance, we could optimise the Sharpe ratio by varying entry and exit threshold parameters. While this may improve profitability in the backtest (or minimise risk substantially), such behaviour would likely not be replicated when the strategy was deployed live, as we might have been fitting such optimisations to noise in the historical data.

We will discuss techniques below to minimise overfitting as much as possible. However, one has to be aware that it is an ever-present danger in both algorithmic trading and statistical analysis in general.

16.2 Model Selection

In this section we are going to consider how to optimise the statistical model that will underlie a trading strategy. In the fields of statistics and machine learning this is known as Model Selection. While I won't present an exhaustive discussion on the various model selection techniques, I will describe some of the basic mechanisms, such as Cross Validation and Grid Search, that work well for trading strategies.

16.2.1 Cross Validation

Cross Validation is a technique used to assess how a statistical model will generalise to new data that it has not been exposed to before. Such a technique is usually used on predictive models, such as the aforementioned supervised classifiers used to predict the sign of the following daily returns of an asset price series.
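To make the setting concrete, the following is a minimal sketch of such a predictive model: a classifier fitted on lagged daily returns to predict the sign of the next return. The synthetic data and the two-lag feature choice are assumptions for illustration only. Note that the in-sample accuracy it prints says nothing about generalisation, which is precisely what cross validation is designed to measure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic daily returns stand in for a real asset price series
rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, size=500)

# Hypothetical features: the previous two daily returns for each day
X = np.column_stack([returns[1:-1], returns[:-2]])

# Target: the sign of the current daily return (flat days counted as "up")
y = np.sign(returns[2:])
y[y == 0] = 1

model = LogisticRegression()
model.fit(X, y)

# In-sample accuracy only -- an overfit model can score well here
# while generalising poorly to unseen data
print("In-sample accuracy: %.3f" % model.score(X, y))
```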
Fundamentally, the goal of cross validation is to minimise error on out-of-sample data without leading to an overfit model. In this section we will describe the training/test split and k-fold cross validation, as well as use techniques within Scikit-Learn to automatically carry out these procedures on statistical models we have already developed.

Train/Test Split

The simplest example of cross validation is known as a training/test split, or a 2-fold cross validation. Once a prior historical data set is assembled (such as a daily time series of asset prices), it is split into two components. The ratio of the split is usually varied between 0.5 and 0.8. In the latter case this means 80% of the data is used for training and 20% is used for testing.
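As a minimal sketch of the 80/20 case, Scikit-Learn's train_test_split can carry out this split directly. The synthetic returns and lagged-return features below are assumptions for illustration, not part of the original text:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic daily returns stand in for a real price series
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=500)

# Hypothetical features: the previous two daily returns; target: sign of today's return
X = np.column_stack([returns[1:-1], returns[:-2]])
y = np.where(returns[2:] >= 0.0, 1, -1)

# An 80/20 split, as in the text; shuffle=False preserves the time ordering,
# so the test set is the most recent 20% of the history
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LogisticRegression().fit(X_train, y_train)
print("Train accuracy: %.3f" % model.score(X_train, y_train))
print("Test accuracy:  %.3f" % model.score(X_test, y_test))
```

Setting shuffle=False is a design choice for time series data: shuffling before splitting would leak future information into the training set, which is exactly the kind of look-ahead bias a backtest must avoid.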