ssrn-id240727 - Optimization of Trading Physics Models of...


Optimization of Trading Physics Models of Markets

Lester Ingber
Lester Ingber Research, POB 06440 Sears Tower, Chicago, IL 60606
and DRW Investments LLC, 311 S Wacker Dr, Ste 900, Chicago, IL 60606
ingber@ingber.com, ingber@alumni.caltech.edu

and

Radu Paul Mondescu
DRW Investments LLC, 311 S Wacker Dr, Ste 900, Chicago, IL 60606
rmondescu@drwtrading.com

ABSTRACT

We describe an end-to-end real-time S&P futures trading system. Inner-shell stochastic nonlinear dynamic models are developed, and Canonical Momenta Indicators (CMI) are derived from a fitted Lagrangian used by outer-shell trading models dependent on these indicators. Recursive and adaptive optimization using Adaptive Simulated Annealing (ASA) is used for fitting parameters shared across these shells of dynamic and trading models.

Keywords: Simulated Annealing; Statistical Mechanics; Trading Financial Markets

© 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

1. INTRODUCTION

1.1. Approaches

Real-world problems are often almost intractable analytically, yet methods must be devised to deal with this complexity and extract practical information in finite time. This is certainly true in financial engineering, where the time series of various financial instruments reflect nonequilibrium, highly nonlinear, possibly even chaotic [1] underlying processes. A further difficulty is the huge amount of data that must be processed. Under these circumstances, developing models and schemes for automated, profitable trading is a nontrivial task.

In the context of this paper, it is important to stress that dealing with such complex systems invariably requires modeling of dynamics, modeling of actions on these dynamics, and algorithms to fit the parameters of these models to real data. We have elected to use methods of mathematical physics for our models of the dynamics, artificial intelligence (AI) heuristics for our models of trading rules acting on indicators derived from our dynamics, and methods of sampling global optimization for fitting our parameters. Too often there is confusion about how these three elements are used in a complete system. For example, the literature often discusses neural net trading systems or genetic algorithm trading systems. However, neural net models (used for either or both of the models discussed here) also require some method of fitting their parameters, and genetic algorithms must have some kind of cost function or process specified to sample a parameter space, etc.

Over the years, powerful methods have emerged from at least two directions. One direction is based on inferring rules from past and current behavior of market data, leading to learning-based, inductive techniques such as neural networks or fuzzy logic. Another direction starts from the bottom up, trying to build physical and mathematical models based on different economic prototypes.
In many ways, these two directions are complementary, and a proper understanding of their main strengths and weaknesses should lead to synergistic effects beneficial to their common goals. Among approaches in the first direction, neural networks have already won a prominent role in the financial community, due to their ability to handle large quantities of data and to uncover and model nonlinear functional relationships between various combinations of fundamental indicators and price data [2,3].

In the second direction we can include models based on nonequilibrium statistical mechanics [4], fractal geometry [5], turbulence [6], spin glasses and random matrix theory [7], the renormalization group [8], and gauge theory [9]. Although the very complex nonlinear multivariate character of financial markets is recognized [10], these approaches seem to have had less impact on current quantitative finance practice, though it is becoming increasingly clear that this direction can lead to practical trading strategies and models.

To bridge the gap between theory and practice, as well as to afford a comparison with neural network techniques, here we focus on presenting an effective trading system for S&P futures, anchored in the physical principles of nonequilibrium statistical mechanics applied to financial markets [4,11]. Starting with multivariate nonlinear stochastic differential equation descriptions of the price evolution of cash and futures indices, we build an algebraic cost function in terms of a Lagrangian. Then, a maximum likelihood fit to the data is performed using a global optimization algorithm, Adaptive Simulated Annealing (ASA) [12]. Using concepts firmly rooted in field theory, we derive market canonical momenta indicators, and we use these as technical signals in a recursive ASA optimization that tunes the outer shell of trading rules.
We do not employ metaphors for these physical indicators, but rather derive them directly from models fit to data.

The outline of the paper is as follows. Just below we briefly discuss the optimization method and momenta indicators. In the next three sections we establish the theoretical framework supporting our model, the statistical mechanics approach, and the optimization method. In Section 5 we detail the trading system, and in Section 6 we describe our results. Our conclusions are presented in Section 7.

1.2. Optimization

Large-scale nonlinear fits of stochastic nonlinear forms to financial data require methods that are robust across data sets. (Tick data for the regular trading hours of just one day can reach 10,000-30,000 data points.) Simple regression techniques are deficient at obtaining reasonable fits: they too often get trapped in the local minima typically found in nonlinear stochastic models of such data. Compared with other global optimization methods, such as genetic algorithms and combinatorial optimization, ASA has the advantage not only of an efficient importance-sampling search strategy but also of a statistical guarantee of finding the best optima [13,14]. This gives some confidence that a global minimum can be found, provided, of course, that care is taken to tune the algorithm as necessary [15]. Such powerful sampling algorithms are often also required by models of complex systems other than those we use here [16]. For example, neural network models have taken advantage of ASA [17-19], as have other financial and economic studies [20,21].

1.3. Indicators

In general, neural network approaches attempt classification and identification of patterns, or try to forecast patterns and the future evolution of financial time series.
Statistical mechanical methods attempt to find dynamic indicators derived from physical models based on general principles of nonequilibrium stochastic processes that reflect certain market factors. These indicators are subsequently used to generate trading signals or to try to forecast upcoming data. In this paper, the main indicators are called Canonical Momenta Indicators (CMI), as they carry the precise mathematical meaning of market momentum, where the “mass” is inversely proportional to the price volatility (the “masses” are just the elements of the metric tensor in this Lagrangian formalism) and the “velocity” is the rate of price change.

2. MODELS

2.1. Langevin Equations for Random Walks

The use of Brownian motion as a model for financial systems is generally attributed to Bachelier [22], though he incorrectly intuited that the noise scaled linearly instead of as the square root relative to the random log-price variable. Einstein is generally credited with using the correct mathematical description in a larger physical context of statistical systems. However, several studies imply that changing prices of many markets do not follow a random walk, that they may have long-term dependences in price correlations, and that they may not be efficient in quickly arbitraging new information [23-25]. A random walk for returns (the rate of change of prices over prices) is described by a Langevin equation with simple additive noise $\eta$, typically representing the continual random influx of information into the market:

$$ \dot M = -f + g\,\eta, \qquad \dot M = dM/dt, \qquad \langle \eta(t) \rangle_\eta = 0, \qquad \langle \eta(t), \eta(t') \rangle_\eta = \delta(t - t'), \tag{1} $$

where $f$ and $g$ are constants, and $M$ is the logarithm of (scaled) price, $M(t) = \log(P(t)/P(t - dt))$. Price, although the most dramatic observable, may not be the only appropriate dependent variable or order parameter for the system of markets [26].
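A minimal sketch, not the authors' code: Eq. (1) discretized with the Euler-Maruyama scheme in Python. The function names `simulate_eq1` and `increment_stats` and the parameter values are illustrative assumptions. The noise increment is drawn with standard deviation $\sqrt{dt}$, the square-root scaling mentioned above, so the increments of $M$ should have a mean near $-f\,dt$ and a variance near $g^2 dt$.

```python
import math
import random


def simulate_eq1(f, g, dt, n_steps, m0=0.0, seed=0):
    """Euler-Maruyama discretization of Eq. (1): dM = -f dt + g dW, dW ~ N(0, dt)."""
    rng = random.Random(seed)
    path = [m0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # noise amplitude scales as sqrt(dt), not dt
        path.append(path[-1] - f * dt + g * dw)
    return path


def increment_stats(path):
    """Sample mean and unbiased sample variance of the increments M_{k+1} - M_k."""
    inc = [b - a for a, b in zip(path, path[1:])]
    mean = sum(inc) / len(inc)
    var = sum((d - mean) ** 2 for d in inc) / (len(inc) - 1)
    return mean, var


# Illustrative values: increments should have mean near -f*dt = -0.005
# and variance near g**2 * dt = 0.04.
path = simulate_eq1(f=0.5, g=2.0, dt=0.01, n_steps=20000)
mean, var = increment_stats(path)
```

Setting `g=0` removes the noise and leaves the deterministic drift $-f\,t$, which is a quick sanity check on the discretization.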
This possibility has also been called the “semistrong form of the efficient market hypothesis” [23]. The generalization of this approach to include multivariate nonlinear nonequilibrium markets led to a model of statistical mechanics of financial markets (SMFM) [11].

2.2. Adaptive Optimization of F^x Models

Our S&P model for the futures price $F$ is

$$ dF = \mu\,dt + \sigma F^x\,dz, \qquad \langle dz \rangle = 0, \qquad \langle dz(t)\,dz(t') \rangle = dt\,\delta(t - t'). $$

We have used this model in several ways to fit the distribution's volatility, defined in terms of a scale and an exponent of the independent variable [4].

A major component of our trading system is the use of adaptive optimization, essentially retuning the parameters of our dynamic model each time new data are encountered in our training, testing and real-time applications. The parameters $\{\mu, \sigma\}$ are constantly tuned using a quasi-local simplex code [27,28] included with the ASA (Adaptive Simulated Annealing) code [12]. We have compared several quasi-local codes with robust ASA adaptive optimizations for this kind of trading problem, and the faster quasi-local codes seem to work quite well for adaptive updates once a zeroth-order parameter set has been found by ASA [29,30].

3. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)

3.1. Statistical Mechanics of Large Systems

Aggregation problems in nonlinear nonequilibrium systems typically are “solved” (accommodated) by developing new entities or languages at the disparate scales in order to pass information efficiently back and forth between scales. This is quite different from the nature of quasi-equilibrium quasi-linear systems, where thermodynamic or cybernetic approaches are possible; these thermodynamic approaches typically fail for nonequilibrium nonlinear systems.

Many systems are aptly modeled in terms of multivariate differential rate equations, known as Langevin equations [31],

$$ \dot M^G = f^G + \hat g^G_j \eta^j, \qquad (G = 1, \ldots, \Lambda), \qquad (j = 1, \ldots, N), $$
$$ \dot M^G = dM^G/dt, \qquad \langle \eta^j(t) \rangle_\eta = 0, \qquad \langle \eta^j(t), \eta^{j'}(t') \rangle_\eta = \delta^{jj'} \delta(t - t'), \tag{2} $$

where $f^G$ and $\hat g^G_j$ are generally nonlinear functions of the mesoscopic order parameters $M^G$, $j$ is a microscopic index indicating the source of fluctuations, and $N \ge \Lambda$. The Einstein convention of summing over repeated indices is used. Vertical bars on an index, e.g., $|j|$, imply that no sum is to be taken on repeated indices.

Via a somewhat lengthy, albeit instructive, calculation outlined in several other papers [11,32,33], involving an intermediate derivation of a corresponding Fokker-Planck or Schrödinger-type equation for the conditional probability distribution $P[M(t) \mid M(t_0)]$, the Langevin rate Eq. (2) is developed into the more useful probability distribution for $M^G$ at the long-time macroscopic time event $t = (u + 1)\theta + t_0$, in terms of a Stratonovich path integral over mesoscopic Gaussian conditional probabilities [34-38]. Here, macroscopic variables are defined as the long-time limit of the evolving mesoscopic system.

The corresponding Schrödinger-type equation is [36,37]

$$ \partial P/\partial t = \tfrac{1}{2} (g^{GG'} P)_{,GG'} - (g^G P)_{,G} + V, $$
$$ g^{GG'} = k_T \delta^{jk} \hat g^G_j \hat g^{G'}_k, \qquad g^G = f^G + \tfrac{1}{2} \delta^{jk} \hat g^{G'}_j \hat g^G_{k,G'}, \qquad [\ldots]_{,G} = \partial[\ldots]/\partial M^G. \tag{3} $$

This is properly referred to as a Fokker-Planck equation when $V \equiv 0$. Note that although the partial differential Eq. (3) contains information regarding $M^G$ as in the stochastic differential Eq. (2), all references to $j$ have been properly averaged over. That is, $\hat g^G_j$ in Eq. (2) is an entity with parameters in both microscopic and mesoscopic spaces, but $M$ is a purely mesoscopic variable, and this is more clearly reflected in Eq. (3).

The path integral representation is given in terms of the “Feynman” Lagrangian $L$:

$$ P[M_t \mid M_{t_0}]\, dM(t) = \int \cdots \int \mathcal{D}M \exp(-S)\, \delta[M(t_0) = M_0]\, \delta[M(t) = M_t], $$
$$ S = k_T^{-1} \min \int_{t_0}^{t} dt'\, L, \qquad \mathcal{D}M = \lim_{u \to \infty} \prod_{v=1}^{u+1} g^{1/2} \prod_G (2\pi\theta)^{-1/2}\, dM^G_v, $$
$$ L(\dot M^G, M^G, t) = \tfrac{1}{2} (\dot M^G - h^G) g_{GG'} (\dot M^{G'} - h^{G'}) + \tfrac{1}{2} h^G_{;G} + R/6 - V, $$
$$ h^G = g^G - \tfrac{1}{2} g^{-1/2} (g^{1/2} g^{GG'})_{,G'}, \qquad g_{GG'} = (g^{GG'})^{-1}, \qquad g = \det(g_{GG'}), $$
$$ h^G_{;G} = h^G_{,G} + \Gamma^F_{GF} h^G = g^{-1/2} (g^{1/2} h^G)_{,G}, $$
$$ \Gamma^F_{JK} \equiv g^{LF} [JK, L] = g^{LF} (g_{JL,K} + g_{KL,J} - g_{JK,L}), $$
$$ R = g^{JL} R_{JL} = g^{JL} g^{JK} R_{FJKL}, $$
$$ R_{FJKL} = \tfrac{1}{2} (g_{FK,JL} - g_{JK,FL} - g_{FL,JK} + g_{JL,FK}) + g_{MN} (\Gamma^M_{FK} \Gamma^N_{JL} - \Gamma^M_{FL} \Gamma^N_{JK}). \tag{4} $$

Mesoscopic variables have been defined as $M^G$ in the Langevin and Fokker-Planck representations, in terms of their development from the microscopic system labeled by $j$. The Riemannian curvature term $R$ arises from the nonlinear $g_{GG'}$, which is a bona fide metric of this space [36]. Even if a stationary solution, i.e., $\dot M^G = 0$, is ultimately sought, a necessarily prior stochastic treatment of the $\dot M^G$ terms gives rise to these Riemannian “corrections.” Even for a constant metric, the term $h^G_{;G}$ contributes to $L$ for a nonlinear mean $h^G$. $V$ may include terms such as $\sum_{T'} J_{T'G} M^G$, where the Lagrange multipliers $J_{T'G}$ are constraints on $M^G$, which are advantageously modeled as extrinsic sources in this representation; they too may be time-dependent.

For our purposes, the above Feynman Lagrangian defines a kernel of the short-time conditional probability distribution, in the curved space defined by the metric, in the limit of continuous time, whose iteration yields the solution of the partial differential equation Eq. (3). This differs from the Lagrangian that satisfies the requirement that the action be stationary to first order in $dt$ (the WKBJ approximation), which does not include the first-order correction to the WKBJ approximation that the Feynman Lagrangian includes.
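To make the idea of a short-time kernel concrete, here is a one-variable Python sketch under simplifying assumptions that are not in the text: constant diffusion (so the metric is constant and $R = 0$), $V = 0$, and a hypothetical linear drift. The drift is evaluated at the prepoint, i.e., this is the prepoint form of the Lagrangian (Eq. (7)) rather than the midpoint Feynman form, so the $\tfrac12 h^G_{;G}$ term does not appear. The check is that the kernel, carrying the measure factor $g^{1/2}(2\pi\theta)^{-1/2}$ from $\mathcal{D}M$, integrates to one and has mean $M_v + \theta g^G(M_v)$.

```python
import math


def short_time_kernel(m_next, m_prev, theta, drift, diffusion):
    """One-variable short-time kernel P[m_next | m_prev] for a time step theta.

    `drift(m)` stands in for g^G and `diffusion` for a constant g^{GG}, so the
    metric is g_GG = 1/diffusion and R = 0. The drift is taken at the prepoint.
    """
    velocity = (m_next - m_prev) / theta                     # theta * dM/dt -> M_{v+1} - M_v
    lagrangian = (velocity - drift(m_prev)) ** 2 / (2.0 * diffusion)
    measure = (2.0 * math.pi * theta * diffusion) ** -0.5    # g^{1/2} (2 pi theta)^{-1/2}
    return measure * math.exp(-theta * lagrangian)


def trapezoid(fn, lo, hi, n=20000):
    """Plain trapezoidal rule; ample for a smooth, narrow Gaussian integrand."""
    h = (hi - lo) / n
    total = 0.5 * (fn(lo) + fn(hi))
    for i in range(1, n):
        total += fn(lo + i * h)
    return total * h


def mean_reverting(m):
    """Hypothetical linear drift, used only for this check."""
    return -0.3 * m


theta, diffusion, m0 = 0.01, 0.5, 1.0


def kernel(m):
    return short_time_kernel(m, m0, theta, mean_reverting, diffusion)


mass = trapezoid(kernel, m0 - 1.0, m0 + 1.0)                   # close to 1
mean = trapezoid(lambda m: m * kernel(m), m0 - 1.0, m0 + 1.0)   # close to m0 + theta * drift(m0)
```

Iterating such kernels over many steps $\theta$ is what builds up the long-time distribution; the sketch only checks a single step.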
This latter Lagrangian differs from the Feynman Lagrangian essentially by replacing $R/6$ above by $R/12$ [39]. In this sense, the WKBJ Lagrangian is more useful for some theoretical discussions [40]. However, the use of the Feynman Lagrangian coincides with the numerical method we use for the long-time development of our distributions with our PATHINT code for other financial products, e.g., options [4]. This is also consistent with our use of relatively short-time “forecasts” of data points using the most probable path [41],

$$ dM^G/dt = g^G - g^{1/2} (g^{-1/2} g^{GG'})_{,G'}. \tag{5} $$

Using the variational principle, $J_{TG}$ may also be used to constrain $M^G$ to regions where they are empirically bound. More complicated constraints may be affixed to $L$ using methods of optimal control theory [42]. With respect to a steady state $\bar P$, when it exists, the information gain in state $P$ is defined by

$$ \Upsilon[P] = \int \cdots \int \mathcal{D}M'\, P \ln(P/\bar P), \qquad \mathcal{D}M' = \mathcal{D}M / dM_{u+1}. \tag{6} $$

In the economics literature, there appears to be a preference for defining Eq. (2) by the Itô, rather than the Stratonovich, prescription. It is true that Itô integrals have martingale properties not possessed by Stratonovich integrals [43], which lead to risk-neutral theorems for markets [44,45], but the nature of the proper mathematics (actually a simple transformation between these two discretizations) should eventually be determined by proper aggregation of relatively microscopic models of markets. It should be noted that virtually all investigations of other physical systems, which are also continuous-time models of discrete processes, conclude that the Stratonovich interpretation coincides with reality when multiplicative noise with zero correlation time, modeled in terms of white noise $\eta^j$, is properly considered as the limit of real noise with finite correlation time [46].

The path integral succinctly demonstrates the difference between the two. The Itô prescription corresponds to the prepoint discretization of $L$, wherein $\theta \dot M(t) \to M_{v+1} - M_v$ and $M(t) \to M_v$. The Stratonovich prescription corresponds to the midpoint discretization of $L$, wherein $\theta \dot M(t) \to M_{v+1} - M_v$ and $M(t) \to \tfrac{1}{2}(M_{v+1} + M_v)$. In terms of the functions appearing in the Fokker-Planck Eq. (3), the Itô prescription of the prepoint-discretized Lagrangian, $L_I$, is relatively simple, albeit deceptively so because of its nonstandard calculus:

$$ L_I(\dot M^G, M^G, t) = \tfrac{1}{2} (\dot M^G - g^G) g_{GG'} (\dot M^{G'} - g^{G'}) - V. \tag{7} $$

In the absence of a nonphenomenological microscopic theory, the difference between an Itô prescription and a Stratonovich prescription is simply a transformed drift [39].

There are several other advantages to Eq. (4) over Eq. (2). Extrema and most probable states of $M^G$, $\langle\langle M^G \rangle\rangle$, are simply derived by a variational principle, similar to conditions sought in previous studies [47]. In the Stratonovich prescription, necessary, albeit not sufficient, conditions are given by

$$ \delta_G L = L_{,G} - L_{,\dot G:t} = 0, \qquad L_{,\dot G:t} = L_{,\dot G G'} \dot M^{G'} + L_{,\dot G \dot G'} \ddot M^{G'}. \tag{8} $$

For stationary states, $\dot M^G = 0$, and $\partial \bar L/\partial \bar M^G = 0$ defines $\langle\langle \bar M^G \rangle\rangle$, where the bars identify stationary variables; in this case, the macroscopic variables are equal to their mesoscopic counterparts. Note that $\bar L$ is not the stationary solution of the system, e.g., to Eq. (3) with $\partial P/\partial t = 0$. However, in some cases [48], $\bar L$ is a definite aid to finding such stationary states. Often only properties of stationary states are examined, but here a temporal dependence is included. For example, the $\dot M^G$ terms in $L$ permit steady states and their fluctuations to be investigated in a nonequilibrium context. Note that Eq. (8) must be derived from the path integral, Eq. (4), which is at least one reason to justify its development.

3.2. Algebraic Complexity Yields Simple Intuitive Results

It must be emphasized that the output of this formalism is not confined to complex algebraic forms or tables of numbers. Because $L$ possesses a variational principle, sets of contour graphs, at different long-time epochs of the path integral of $P$ over its variables at all intermediate times, give a visually intuitive and accurate decision aid for viewing the dynamic evolution of the scenario. For example, this Lagrangian approach permits a quantitative assessment of concepts usually only loosely defined:

$$ \text{“Momentum”} = \Pi^G = \frac{\partial L}{\partial(\partial M^G/\partial t)}, $$
$$ \text{“Mass”} = g_{GG'} = \frac{\partial^2 L}{\partial(\partial M^G/\partial t)\, \partial(\partial M^{G'}/\partial t)}, $$
$$ \text{“Force”} = \frac{\partial L}{\partial M^G}, $$
$$ \text{“}F = ma\text{”}: \quad \delta L = 0 = \frac{\partial L}{\partial M^G} - \frac{\partial}{\partial t} \frac{\partial L}{\partial(\partial M^G/\partial t)}, \tag{9} $$

where $M^G$ are the variables and $L$ is the Lagrangian. These physical entities provide another form of intuitive, but quantitatively precise, presentation of these analyses. For example, daily newspapers use some of this terminology to discuss the movement of security prices. In this paper, the $\Pi^G$ serve as canonical momenta indicators (CMI) for these systems.

3.2.1. Derived Canonical Momenta Indicators (CMI)

The extreme sensitivity of the CMI gives rapid feedback on changes in trends as well as in the volatility of markets, so the CMI are good indicators to use for trading rules [29]. A time-locked moving average provides manageable indicators for trading signals. The current project uses such CMI, developed as a byproduct of the ASA fits described below.

3.3. Correlations

In this paper we report results of our one-variable trading model. However, it is straightforward to include multivariable trading models in our approach, and we have done this, for example, with coupled cash and futures S&P markets. Correlations between variables are modeled explicitly in the Lagrangian as a parameter usually designated $\rho$.
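For one variable with constant drift $f$ and volatility $\sigma$ (assumptions made here for illustration), the prepoint Lagrangian is $L = (\dot M - f)^2/(2\sigma^2)$, so Eq. (9) gives a “mass” $1/\sigma^2$ and a momentum $\Pi = (\dot M - f)/\sigma^2$. The Python sketch below computes this CMI from a price series with forward differences and then smooths it with a trailing moving average. The function names, and the reading of “time-locked” as “uses only current and past values”, are assumptions rather than the authors' definitions.

```python
def cmi_series(prices, dt, drift, sigma):
    """One-variable CMI: Pi = (dM/dt - drift) / sigma**2, with dM/dt from forward differences.

    The factor 1/sigma**2 is the "mass" (metric) of the Lagrangian
    L = (dM/dt - drift)**2 / (2 sigma**2); drift and sigma are assumed constant here.
    """
    mass = 1.0 / sigma ** 2
    return [mass * ((b - a) / dt - drift) for a, b in zip(prices, prices[1:])]


def trailing_average(values, window):
    """Time-locked moving average: each output uses only the current and past values."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]        # drop the value that left the window
        out.append(running / min(i + 1, window))  # shorter average during warm-up
    return out


signals = trailing_average(cmi_series([100.0, 101.0, 103.0, 102.0], 1.0, 0.5, 2.0), 2)
# -> [0.125, 0.25, 0.0]
```

A larger volatility $\sigma$ means a smaller mass, so the same price velocity produces a weaker momentum signal in a volatile market.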
This section uses a simple two-factor model to develop the correspondence between the correlation $\rho$ in the Lagrangian and the correlation among the commonly written Wiener processes $dz$. Consider coupled stochastic differential equations for futures $F$ and cash $C$:

$$ dF = f^F(F, C)\,dt + \hat g^F(F, C)\,\sigma_F\,dz_F, $$
$$ dC = f^C(F, C)\,dt + \hat g^C(F, C)\,\sigma_C\,dz_C, $$
$$ \langle dz_i \rangle = 0, \qquad i = \{F, C\}, $$
$$ \langle dz_i(t)\,dz_j(t') \rangle = dt\,\delta(t - t'), \qquad i = j, $$
$$ \langle dz_i(t)\,dz_j(t') \rangle = \rho\,dt\,\delta(t - t'), \qquad i \ne j, $$
$$ \delta(t - t') = \begin{cases} 0, & t \ne t', \\ 1, & t = t', \end{cases} \tag{10} $$

where $\langle \cdot \rangle$ denotes expectations with respect to the multivariate distribution.

These can be rewritten as Langevin equations (in the Itô prepoint discretization)

$$ dF/dt = f^F + \hat g^F \sigma_F (\gamma^+ \eta_1 + \operatorname{sgn}\rho\, \gamma^- \eta_2), $$
$$ dC/dt = f^C + \hat g^C \sigma_C (\operatorname{sgn}\rho\, \gamma^- \eta_1 + \gamma^+ \eta_2), $$
$$ \gamma^\pm = \frac{1}{\sqrt{2}} \left[ 1 \pm (1 - \rho^2)^{1/2} \right]^{1/2}, \qquad \eta_i = (dt)^{1/2} p_i, $$

where $p_1$ and $p_2$ are independent [0,1] Gaussian distributions. The equivalent short-time probability distribution $P$ for the above set of equations, with one factor $(2\pi\,dt)^{-1/2}$ per variable as in the measure $\mathcal{D}M$ of Eq. (4), is

$$ P = g^{1/2} (2\pi\,dt)^{-1} \exp(-L\,dt), \qquad L = \tfrac{1}{2} \mathbf{M}^\dagger \mathbf{g} \mathbf{M}, \tag{11} $$
$$ \mathbf{M} = \begin{pmatrix} dF/dt - f^F \\ dC/dt - f^C \end{pmatrix}, \qquad g = \det(\mathbf{g}). \tag{12} $$

The metric $\mathbf{g}$ in $\{F, C\}$-space is the inverse of the covariance matrix,

$$ \mathbf{g}^{-1} = \begin{pmatrix} (\hat g^F \sigma_F)^2 & \rho\, \hat g^F \hat g^C \sigma_F \sigma_C \\ \rho\, \hat g^F \hat g^C \sigma_F \sigma_C & (\hat g^C \sigma_C)^2 \end{pmatrix}. \tag{13} $$

The CMI are given by

$$ \Pi^F = \frac{dF/dt - f^F}{(\hat g^F \sigma_F)^2 (1 - \rho^2)} - \frac{\rho\,(dC/dt - f^C)}{\hat g^F \hat g^C \sigma_F \sigma_C (1 - \rho^2)}, $$
$$ \Pi^C = \frac{dC/dt - f^C}{(\hat g^C \sigma_C)^2 (1 - \rho^2)} - \frac{\rho\,(dF/dt - f^F)}{\hat g^C \hat g^F \sigma_C \sigma_F (1 - \rho^2)}. \tag{14} $$

3.4. ASA Outline

The Adaptive Simulated Annealing (ASA) algorithm fits short-time probability distributions to observed data, using a maximum likelihood technique on the Lagrangian. This algorithm has been developed to fit observed data to a theoretical cost function over a $D$-dimensional parameter space [13], adapting for varying sensitivities of parameters during the fit.
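The following Python sketch checks the two-factor algebra numerically. It builds correlated noises from independent Gaussians with the $\gamma^\pm$ construction (since $\gamma_+^2 + \gamma_-^2 = 1$ and $2\gamma_+\gamma_- = |\rho|$, the mixed noises have unit variance and correlation $\rho$), and it evaluates Eq. (14), which should equal the inverse of the covariance matrix of Eq. (13) applied to the vector $\mathbf{M}$ of Eq. (12). Function names are hypothetical; `a` and `b` stand for $\hat g^F\sigma_F$ and $\hat g^C\sigma_C$.

```python
import math
import random


def gammas(rho):
    """gamma_+ and gamma_- from the text: (1/sqrt(2)) [1 +/- (1 - rho^2)^(1/2)]^(1/2)."""
    s = math.sqrt(1.0 - rho * rho)
    return math.sqrt(0.5 * (1.0 + s)), math.sqrt(0.5 * (1.0 - s))


def correlated_noise(rho, n, seed=0):
    """n pairs of unit-variance noises with correlation rho, mixed from independent N(0, 1)."""
    gp, gm = gammas(rho)
    sgn = 1.0 if rho >= 0.0 else -1.0
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        p1, p2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((gp * p1 + sgn * gm * p2, sgn * gm * p1 + gp * p2))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p for p, _ in pairs) / n
    my = sum(q for _, q in pairs) / n
    sxy = sum((p - mx) * (q - my) for p, q in pairs)
    sxx = sum((p - mx) ** 2 for p, _ in pairs)
    syy = sum((q - my) ** 2 for _, q in pairs)
    return sxy / math.sqrt(sxx * syy)


def cmi_two_factor(v_f, v_c, a, b, rho):
    """Eq. (14): v_f = dF/dt - f^F, v_c = dC/dt - f^C, a = ghat^F sigma_F, b = ghat^C sigma_C."""
    d = 1.0 - rho * rho
    pi_f = v_f / (a * a * d) - rho * v_c / (a * b * d)
    pi_c = v_c / (b * b * d) - rho * v_f / (a * b * d)
    return pi_f, pi_c


pairs = correlated_noise(rho=-0.4, n=20000)
estimate = sample_correlation(pairs)        # close to -0.4
pi_f, pi_c = cmi_two_factor(v_f=0.2, v_c=-0.1, a=1.5, b=0.5, rho=-0.3)
```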
The ASA code can be obtained at no charge, via WWW from http://www.ingber.com/ or via FTP from ftp.ingber.com [12].

3.4.1. General Description

Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [49], as an extension of a Monte Carlo importance-sampling technique developed in 1953 for chemical physics problems. It helps to visualize the problems presented by such complex systems as a geographical terrain. For example, consider a mountain range with two “parameters,” e.g., along the North-South and East-West directions. We wish to find the lowest valley in this terrain. SA approaches this problem much like a bouncing ball that can bounce over mountains from valley to valley. We start at a high “temperature,” where the temperature is an SA parameter that mimics the effect of a fast-moving particle in a hot object such as molten metal, thereby permitting the ball to make very high bounces and to bounce over any mountain to access any valley, given enough bounces. As the temperature is made relatively colder, the ball cannot bounce so high, and it can also settle and become trapped in relatively smaller ranges of valleys.

We imagine that our mountain range is aptly described by a “cost function.” We define probability distributions of the two directional parameters, called generating distributions since they generate possible valleys or states we are to explore. We define another distribution, called the acceptance distribution, which depends on the difference between the cost functions of the present generated valley we are to explore and the last saved lowest valley. The acceptance distribution decides probabilistically whether to stay in a new lower valley or to bounce out of it. All the generating and acceptance distributions depend on temperatures.
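To make the bouncing-ball picture concrete, here is a minimal classic simulated annealing loop in Python on a hypothetical two-parameter “terrain”. This is plain SA, not ASA: it uses a Gaussian generating distribution, Boltzmann (Metropolis-style) acceptance judged against the current state (a common convention), and geometric cooling, which carries no guarantee of finding the global minimum. ASA's own generating distribution and schedule appear in Section 3.4.2. All names and constants are illustrative.

```python
import math
import random


def terrain(x, y):
    """Hypothetical two-parameter cost surface with many valleys; global minimum 0 at (0, 0)."""
    return x * x + y * y + 2.0 * (2.0 - math.cos(3.0 * x) - math.cos(3.0 * y))


def simulated_annealing(cost, start, t0=10.0, cooling=0.995, steps=5000, step=1.0, seed=1):
    """Classic SA (not ASA): Gaussian generating distribution, Boltzmann acceptance."""
    rng = random.Random(seed)
    current = list(start)
    c_cur = cost(*current)
    best, c_best = list(current), c_cur
    temp = t0
    for _ in range(steps):
        # Generating distribution: jumps shrink as the temperature drops.
        width = step * math.sqrt(temp / t0)
        trial = [v + rng.gauss(0.0, width) for v in current]
        c_trial = cost(*trial)
        delta = c_trial - c_cur
        # Acceptance distribution: always move downhill, sometimes "bounce" uphill.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, c_cur = trial, c_trial
            if c_cur < c_best:
                best, c_best = list(current), c_cur
        temp *= cooling  # geometric cooling: simple, but without SA's statistical guarantee
    return best, c_best


best, c_best = simulated_annealing(terrain, [4.0, -3.5])
```

Early on, the high temperature lets the walker accept uphill moves and cross ridges; late in the run only downhill moves survive, so the walker settles into whichever valley it occupies.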
In 1984 [50], it was proved that, by carefully controlling the rates of cooling of temperatures, SA could statistically find the best minimum, e.g., the lowest valley of our example above. This was good news for people trying to solve hard problems that could not be solved by other algorithms. The bad news was that the guarantee held only if they were willing to run SA forever. In 1987, a method of fast annealing (FA) was developed [51], which permitted lowering the temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in some finite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated Reannealing (VFSR) was developed in 1987 [13], now called Adaptive Simulated Annealing (ASA), which is exponentially faster than FA. ASA has been applied to many problems by many people in many disciplines [15,16,52]. The feedback of many users regularly scrutinizing the source code helps ensure its soundness as it becomes more flexible and powerful.

3.4.2. Mathematical Outline

ASA considers a parameter $\alpha^i_k$ in dimension $i$ generated at annealing-time $k$ with the range

$$ \alpha^i_k \in [A_i, B_i], $$

calculated with the random variable $y^i$,

$$ \alpha^i_{k+1} = \alpha^i_k + y^i (B_i - A_i), \tag{15} $$
$$ y^i \in [-1, 1]. \tag{16} $$

The generating function $g_T(y)$ is defined,

$$ g_T(y) = \prod_{i=1}^{D} \frac{1}{2(|y^i| + T_i) \ln(1 + 1/T_i)} \equiv \prod_{i=1}^{D} g^i_T(y^i), \tag{17} $$

where the subscript $i$ on $T_i$ specifies the parameter index, and the $k$-dependence in $T_i(k)$ for the annealing schedule has been dropped for brevity. Its cumulative probability distribution is

$$ G_T(y) = \int_{-1}^{y^1} \cdots \int_{-1}^{y^D} dy'^1 \cdots dy'^D\, g_T(y') \equiv \prod_{i=1}^{D} G^i_T(y^i), $$
$$ G^i_T(y^i) = \frac{1}{2} + \frac{\operatorname{sgn}(y^i)}{2} \frac{\ln(1 + |y^i|/T_i)}{\ln(1 + 1/T_i)}. \tag{18} $$

$y^i$ is generated from a $u^i$ drawn from the uniform distribution $u^i \in U[0, 1]$,

$$ y^i = \operatorname{sgn}(u^i - \tfrac{1}{2})\, T_i \left[ (1 + 1/T_i)^{|2u^i - 1|} - 1 \right]. \tag{19} $$

It is straightforward to calculate that for an annealing schedule for $T_i$,

$$ T_i(k) = T_{0i} \exp(-c_i k^{1/D}), \tag{20} $$

a global minimum can statistically be obtained. That is,

$$ \sum_{k_0}^{\infty} g_k \approx \sum_{k_0}^{\infty} \left[ \prod_{i=1}^{D} \frac{1}{2|y^i| c_i} \right] \frac{1}{k} = \infty. \tag{21} $$

Control can be taken over $c_i$, such that

$$ T_{fi} = T_{0i} \exp(-m_i) \ \text{when}\ k_f = \exp n_i, \qquad c_i = m_i \exp(-n_i/D), \tag{22} $$

where $m_i$ and $n_i$ can be considered “free” parameters to help tune ASA for specific problems.

ASA has over 100 OPTIONS available for tuning. A few important ones were used in this project.

3.4.3. Reannealing

Whenever doing a multidimensional search in the course of a complex nonlinear physical problem, one inevitably must deal with different, changing sensitivities of the $\alpha^i$ in the search. At any given annealing-time, the range over which the relatively insensitive parameters are being searched can be “stretched out” relative to the ranges of the more sensitive parameters. This can be accomplished by periodically rescaling the annealing-time $k$, essentially reannealing, every hundred or so acceptance events (or at some user-defined modulus of the number of accepted or generated states), in terms of the sensitivities $s_i$ calculated at the most current minimum value of the cost function $C$,

$$ s_i = \partial C/\partial \alpha^i. \tag{23} $$

In terms of the largest $s_i = s_{\max}$, a default rescaling is performed for each $k_i$ of each parameter dimension, whereby a new index $k'_i$ is calculated from each $k_i$,

$$ k_i \to k'_i, \qquad T'_{ik'} = T_{ik} (s_{\max}/s_i), \qquad k'_i = \left( \ln(T_{i0}/T_{ik'})/c_i \right)^D. \tag{24} $$

$T_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.

3.4.4. Quenching

Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This is applied by noting that the temperature schedule above can be redefined as

$$ T_i(k_i) = T_{0i} \exp(-c_i k_i^{Q_i/D}), \qquad c_i = m_i \exp(-n_i Q_i/D), \tag{25} $$

in terms of the “quenching factor” $Q_i$.
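Eqs. (15)-(25) translate almost directly into code. The Python sketch below implements the per-parameter generating step of Eq. (19) and its cumulative distribution, Eq. (18), the schedule with optional quenching (Eqs. (20), (22), (25)), and the reannealing index of Eq. (24). It is a sketch, not the ASA C code: function names are hypothetical, out-of-range candidates in `new_state` are simply redrawn, and capping $T'$ at $T_0$ in `reanneal_index` is an assumption.

```python
import math
import random


def asa_generate_y(temp, u):
    """Eq. (19): map u ~ U[0, 1] to y in [-1, 1] at temperature temp."""
    sign = 1.0 if u >= 0.5 else -1.0          # sgn(u - 1/2); y = 0 at u = 1/2 either way
    return sign * temp * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)


def asa_cdf(y, temp):
    """Eq. (18): one-dimensional cumulative distribution G_T(y)."""
    sign = (y > 0) - (y < 0)
    return 0.5 + 0.5 * sign * math.log(1.0 + abs(y) / temp) / math.log(1.0 + 1.0 / temp)


def schedule_constant(m, n, dim, quench=1.0):
    """Eqs. (22)/(25): c = m exp(-n Q / D), so that T(k_f = e^n) = T0 exp(-m)."""
    return m * math.exp(-n * quench / dim)


def asa_temperature(k, t0, c, dim, quench=1.0):
    """Eqs. (20)/(25): T(k) = T0 exp(-c k^(Q/D)); quench = 1 recovers Eq. (20)."""
    return t0 * math.exp(-c * k ** (quench / dim))


def new_state(alpha, lower, upper, temps, rng):
    """Eq. (15) for every parameter; out-of-range candidates are redrawn (an assumption)."""
    out = []
    for a, lo, hi, t in zip(alpha, lower, upper, temps):
        while True:
            cand = a + asa_generate_y(t, rng.random()) * (hi - lo)
            if lo <= cand <= hi:
                out.append(cand)
                break
    return out


def reanneal_index(t_k, s_max, s_i, t0, c, dim):
    """Eq. (24): stretch T by s_max/s_i and return the matching index k'.

    Capping the stretched temperature at t0 (so k' >= 0) is an assumption of this sketch.
    """
    t_new = min(t_k * (s_max / s_i), t0)
    return (math.log(t0 / t_new) / c) ** dim


rng = random.Random(0)
c = schedule_constant(m=5.0, n=10.0, dim=2)
temps = [asa_temperature(100.0, 1.0, c, 2)] * 2
state = new_state([0.5, -2.0], [0.0, -5.0], [1.0, 5.0], temps, rng)
```

Because Eq. (19) is the inverse of Eq. (18), feeding a generated $y$ back through `asa_cdf` recovers the uniform draw $u$, and a less sensitive parameter ($s_i < s_{\max}$) is assigned a smaller $k'$, i.e., a hotter, wider search.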
The sampling proof fails if $Q_i > 1$, as

$$ \sum_k \prod^{D} 1/k^{Q_i/D} = \sum_k 1/k^{Q_i} < \infty. \tag{26} $$

This simple calculation shows how the “curse of dimensionality” arises, and also gives a possible way of living with this disease. In ASA, the influence of large dimensions becomes clearly focused on the exponent $1/D$ of $k$, as the annealing required to properly sample the space becomes prohibitively slow. So, if resources cannot be committed to properly sample the space, then for some systems the next best procedure may be to turn on quenching, whereby $Q_i$ can become on the order of the number of dimensions.

The scale of the power of the $1/D$ temperature schedule used for the acceptance function can be altered in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used without damaging the sampling property.

3.4.5. Avoiding Repeating Cost Functions

Recursive optimization is very CPU-intensive, since essentially the cross product of the parameter spaces among the various levels of optimization is required. Therefore, for some of the parameters in the outer-shell trading-model optimization of training sets, we have used the ASA OPTION ASA_QUEUE, which sets up a first-in first-out (FIFO) queue of user-defined size Queue_Size to collect generated states. When a new state is generated, its parameters are tested against the queued states, within the resolutions specified by a user-defined array Queue_Resolution. When a parameter set is repeated within this queue, the saved value of the cost function is returned without having to repeat the calculation.

3.4.6. Multiple Local Minima

Our criterion for the global minimum of our cost function is minus the largest profit over a selected training data set (or, in some cases, this value divided by the maximum drawdown). However, in many cases this may not give us the best set of parameters for profitable trading in test sets or in real-time trading.
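The idea behind ASA_QUEUE can be sketched in a few lines of Python. This is not ASA's implementation: the class `CostQueue` and its method are hypothetical, and `queue_size` and `queue_resolution` merely play the roles of Queue_Size and Queue_Resolution. A new parameter set lying within the resolution of a queued one returns the cached cost; otherwise the cost is computed and stored, and the oldest entry drops out once the queue is full.

```python
from collections import deque


class CostQueue:
    """FIFO cache of (parameters, cost) pairs, in the spirit of ASA_QUEUE (hypothetical)."""

    def __init__(self, queue_size, queue_resolution):
        self.entries = deque(maxlen=queue_size)   # oldest entry is discarded first
        self.resolution = list(queue_resolution)
        self.hits = 0

    def _matches(self, a, b):
        """True if every parameter agrees within its resolution."""
        return all(abs(x - y) <= r for x, y, r in zip(a, b, self.resolution))

    def evaluate(self, params, cost_fn):
        for saved_params, saved_cost in self.entries:
            if self._matches(params, saved_params):
                self.hits += 1
                return saved_cost                  # repeated state: skip the expensive call
        cost = cost_fn(params)
        self.entries.append((tuple(params), cost))
        return cost


def expensive_cost(params):
    """Stand-in for a costly inner-shell fit (hypothetical)."""
    return sum(p * p for p in params)


queue = CostQueue(queue_size=100, queue_resolution=[0.01, 1.0])
first = queue.evaluate((0.50, 20.0), expensive_cost)
again = queue.evaluate((0.505, 20.4), expensive_cost)   # within resolution: cached cost
```

The linear scan is adequate for small queues; a large Queue_Size would call for a spatial index instead.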
Other considerations, such as the total number of trades developed by the global minimum versus other close local minima, may be relevant. For example, if the global minimum has just a few trades, while some nearby local minima (in terms of the value of the cost function) have many trades and are profitable in spite of our slippage factors, then the scenario with more trades might be statistically more dependable in delivering profits across testing and real-time data sets. Therefore, for the outer-shell global optimization of training sets, we have used the ASA OPTION MULTI_MIN, which saves a user-defined number of the closest local minima within a user-defined resolution of the parameters. We then examine these results under several testing sets.

4. TRADING SYSTEM

4.1. Use of CMI

As the CMI formalism carries the relevant information regarding price dynamics, we have used it as a signal generator for an automated trading system for S&P futures. Based on previous work [30] applied to daily closing data, the overall structure of the trading system consists of two layers, as follows.

We first construct the “short-time” Lagrangian function in the Itô representation (with the notation introduced in Section 3.3),

$$ L(i \mid i - 1) = \frac{1}{2\sigma^2 F_{i-1}^{2x}} \left( \frac{dF_i}{dt} - f^F \right)^2, \tag{27} $$

with $i$ the postpoint index, corresponding to the one-factor price model

$$ dF = f^F dt + \sigma F^x dz(t), \tag{28} $$

where $f^F$ and $\sigma > 0$ are taken to be constants, $F(t)$ is the S&P futures price, and $dz$ is standard Gaussian noise with zero mean and unit standard deviation. We perform a global maximum likelihood fit to the whole set of price data using ASA. This procedure produces the optimization parameters $\{x, f^F\}$ that are used to generate the CMI. One computational approach was to fix the diffusion multiplier $\sigma$ to 1 during training for convenience, but to use it as a free parameter in the adaptive testing and real-time fits.
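The inner-shell cost function implied by Eqs. (27) and (28) is the negative log-likelihood of the Gaussian short-time distribution, $\sum_i [L(i|i-1)\,dt + \tfrac12 \ln(2\pi\sigma^2 F_{i-1}^{2x} dt)]$, and the CMI is $\Pi = \partial L/\partial(dF/dt)$. The Python sketch below evaluates both for given $\{x, f^F\}$; the global search over these parameters (done with ASA in the text) is not shown, and the function names are hypothetical. The default `sigma=1.0` mirrors the training convention described above.

```python
import math


def lagrangian(price_prev, price_next, dt, drift, sigma, x):
    """Eq. (27): L(i|i-1) = (dF_i/dt - f^F)^2 / (2 sigma^2 F_{i-1}^{2x}), prepoint form."""
    velocity = (price_next - price_prev) / dt
    return (velocity - drift) ** 2 / (2.0 * sigma ** 2 * price_prev ** (2.0 * x))


def neg_log_likelihood(prices, dt, drift, x, sigma=1.0):
    """Cost for the maximum-likelihood fit of {x, f^F}: -sum of ln P over short-time steps."""
    total = 0.0
    for a, b in zip(prices, prices[1:]):
        variance = sigma ** 2 * a ** (2.0 * x) * dt   # conditional variance of dF
        total += lagrangian(a, b, dt, drift, sigma, x) * dt + 0.5 * math.log(2.0 * math.pi * variance)
    return total


def cmi(prices, dt, drift, x, sigma=1.0):
    """CMI Pi = dL/d(dF/dt) = (dF/dt - f^F) / (sigma^2 F_{i-1}^{2x})."""
    return [((b - a) / dt - drift) / (sigma ** 2 * a ** (2.0 * x))
            for a, b in zip(prices, prices[1:])]


prices = [100.0, 100.5, 100.2, 100.9, 101.4, 101.1]
cost = neg_log_likelihood(prices, dt=1.0, drift=0.2, x=0.5)
momenta = cmi(prices, dt=1.0, drift=0.2, x=0.5)
```

For fixed $x$ and $\sigma$, only the Lagrangian term depends on $f^F$, so the cost is minimized by the mean of $dF/dt$ weighted by $F_{i-1}^{-2x}$; the nonconvex part of the fit lies in the exponent $x$, which is where a global optimizer earns its keep.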
Another approach was to fix the scale of the volatility, using an improved model,

\[ dF = f^F\, dt + \sigma \left( \frac{F}{\langle F \rangle} \right)^x dz(t) , \qquad (29) \]

where \(\sigma\) is now calculated as the standard deviation of the price increments \(\Delta F / dt^{1/2}\), and \(\langle F \rangle\) is just the average of the prices. As already remarked, to enhance the CMI sensitivity and response time to local variations (across a certain window size) in the distribution of price increments, the momenta are generated by an adaptive procedure: after each new data reading, another set of \(\{f^F, \sigma\}\) parameters is calculated for the last window of data, with the exponent \(x\), a contextual indicator of the noise statistics, fixed to the value obtained from the global fit.

The CMI computed in this manner are fed into the outer shell of the trading system, where an AI-type optimization of the trading rules is executed, using ASA once again. The trading rules are a collection of logical conditions among the CMI, prices, and optimization parameters, which could be window sizes, time resolutions, or trigger thresholds. Based on the relationships between the CMI and the optimization parameters, a trading decision is made. The cost function in the outer shell is either the overall equity or the risk-adjusted profit (essentially the return). The inner- and outer-shell optimizations are coupled through some of the optimization parameters (e.g., time resolution of the data, window sizes), which justifies the recursive nature of the optimization. Next, we describe in more detail the concrete implementation of this system.

4.2. Data Processing

The CMI formalism is general and by construction permits us to treat multivariate coupled markets.
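The volatility scale of Eq. (29) can be estimated directly from the data. In this sketch, which assumes uniform sampling and uses the sample (n-1) standard deviation as our own choice of estimator, \(\sigma\) is the standard deviation of \(\Delta F / dt^{1/2}\) and \(\langle F \rangle\) is the plain mean of the prices; the helper names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def volatility_scale(prices: Sequence[float], dt: float) -> Tuple[float, float]:
    """Return (sigma, <F>) for Eq. (29)."""
    scaled = [(cur - prev) / math.sqrt(dt) for prev, cur in zip(prices, prices[1:])]
    sigma = statistics.stdev(scaled)       # sample standard deviation (our choice)
    mean_price = statistics.fmean(prices)  # <F>
    return sigma, mean_price


def diffusion(F: float, sigma: float, mean_price: float, x: float) -> float:
    """Noise amplitude sigma * (F/<F>)^x multiplying dz(t) in Eq. (29)."""
    return sigma * (F / mean_price) ** x


sigma, mean_price = volatility_scale([100.0, 102.0, 101.0, 103.0], dt=4.0)
```

Dividing by \(\langle F \rangle\) keeps the noise amplitude of order \(\sigma\) regardless of the price level, so \(\sigma\) and \(x\) are less strongly correlated in the fit.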
Under certain conditions (e.g., shorter time scales of data), and also due to superior scalability across different markets, it is desirable to have a trading system for a single instrument, in our case the S&P futures contracts that are traded electronically on the Chicago Mercantile Exchange (CME). The focus of our system was intra-day trading, with the time scales of the data used in generating the buy/sell signals ranging from 10 to 60 secs. In particular, we give here some results obtained when using data having a time resolution \(\Delta t\) of 55 secs (the time between consecutive data elements is 55 secs). This particular choice of time resolution reflects the set of optimization parameters that have been applied in actual trading.

It is important to remark that a data point in our model does not necessarily mean an actual tick datum. For some trading time scales and for noise-reduction purposes, data are pre-processed into sampling bins of length \(\Delta t\) using either a standard averaging procedure or spectral filtering (e.g., wavelets, Fourier) of the tick data. Alternatively, the data can be defined in block bins that contain disjoint sets of averaged tick data, or in overlapping bins of width \(\Delta t\) that update at every \(\Delta t' < \Delta t\), such that an effective resolution \(\Delta t'\) shorter than the width of the sampling bin is obtained. We present here work in which we have used disjoint block bins and a standard average of the tick data with time stamps falling within the bin width.

In Figs. 1 and 2 we present examples of S&P futures data sampled with 55-sec resolution. We remark that there are several time scales, from minutes to one hour, at which an automated trading system might extract profits. Fig. 2 illustrates a sustained short trading region of 1.5 hours and several shorter long and short trading regions of about 10-20 mins. Fig. 1 illustrates that the profitable regions are prominent even for data representing a relatively flat market period.
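The disjoint block binning described above can be sketched as follows, assuming ticks arrive as (timestamp in seconds, price) pairs sorted by time. Bins with no ticks are simply skipped here, which is our assumption; a production system might instead carry the previous value forward. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def block_bins(ticks: Sequence[Tuple[float, float]], dt: float,
               t0: Optional[float] = None) -> List[Tuple[float, float]]:
    """Average tick prices in disjoint bins [t0 + k*dt, t0 + (k+1)*dt).

    Returns (bin_start_time, mean_price) for every non-empty bin.
    """
    if not ticks:
        return []
    start = ticks[0][0] if t0 is None else t0
    sums: Dict[int, Tuple[float, int]] = {}
    for t, price in ticks:
        k = int((t - start) // dt)          # index of the bin containing t
        total, count = sums.get(k, (0.0, 0))
        sums[k] = (total + price, count + 1)
    return [(start + k * dt, total / count)
            for k, (total, count) in sorted(sums.items())]


ticks = [(0, 100.0), (10, 100.5), (54, 101.0), (55, 101.5), (100, 102.0), (120, 99.0)]
bars = block_bins(ticks, dt=55)
```

Overlapping bins of width \(\Delta t\) updated every \(\Delta t'\) could be built the same way by shifting `t0` in steps of \(\Delta t'\) and interleaving the results.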
That is, June 20 shows an uptrend region of about 1 hour 20 mins and several short and long trading domains between 10 mins and 20 mins. In both situations, there is a larger number of opportunities at time resolutions smaller than 5 mins. The time scale at which we sample the data for trading is itself a parameter that is extracted from the optimization of the trading rules and of the Lagrangian cost function Eq. (27). This is one of the coupling parameters between the inner- and the outer-shell optimizations.

4.3. Inner-Shell Optimization

A cycle of optimization runs has three parts: training, testing, and finally real-time use, a variant of testing. Training consists of choosing a data set and performing the recursive optimization, which produces optimization parameters for trading. In our case there are six parameters: the time resolution \(\Delta t\) of the price data, the length of the window \(W\) used in the local fitting procedures and in the computation of moving averages of trading signals, the drift \(f^F\), the volatility coefficient \(\sigma\) and the exponent \(x\) from Eq. (28), and a multiplicative factor \(M\) necessary for the trading-rules module, as discussed below. The optimization parameters computed from the training set are then applied to various test sets, and final profit/loss analyses are produced. Based on these, the best set of optimization parameters is chosen to be applied in real-time trading runs.

We remark once again that a single training data set could support more than one profitable set of parameters, and the choice can be a function of the trader's interest and the specific market dynamics targeted (e.g., short/long time scales). The optimization parameters corresponding to the global minimum in the training session may not necessarily represent the parameters that lead to robust profits across real-time data.

The training optimization occurs in two inter-related stages. An inner-shell maximum-likelihood optimization over all training data is performed.
The cost function that is fitted to the data is the effective action constructed from the Lagrangian Eq. (27), including the pre-factors coming from the measure element in the expression of the short-time probability distribution Eq. (12). This is based on the fact [39] that, in the context of Gaussian multiplicative stochastic noise, the macroscopic transition probability \(P(F, t | F', t')\) to start with the price \(F'\) at \(t'\) and reach the price \(F\) at \(t\) is determined by the short-time Lagrangian Eq. (27),

\[ P(F, t | F', t') = \prod_{i=1}^{N} \frac{1}{\left( 2\pi \sigma^2 F_{i-1}^{2x}\, dt_i \right)^{1/2}} \exp\left( -\sum_{i=1}^{N} L(i|i-1)\, dt_i \right) , \qquad (30) \]

with \(dt_i = t_i - t_{i-1}\).

Recall that the main assumption of our model is that price increments (or the logarithms of price ratios, depending on which variables are considered independent) can be described by a system of coupled stochastic, nonlinear equations as in Eq. (10). These equations are deceptively simple in structure, yet, depending on the functional form of the drift coefficients and the multiplicative noise, they can describe a variety of interactions between financial instruments in various market conditions (e.g., the constant elasticity of variance model [53], stochastic volatility models, etc.). In particular, this type of model includes the case of Black-Scholes price dynamics (\(x = 1\)). In the system presented here, we have applied the model of Eq. (28). The fitted parameters were the drift coefficient \(f^F\) and the exponent \(x\). In the case of a coupled futures and cash system, besides the corresponding values of \(f^F\) and \(x\) for the cash index, another parameter, the correlation coefficient \(\rho\) as introduced in Eq. (10), must be considered.

4.4. Trading Rules (Outer-Shell) Recursive Optimization

In the second part of the training optimization, we calculate the CMI and execute trades as required by a selected set of trading rules based on CMI values, price data, or combinations of both indicators.
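Taking minus the logarithm of Eq. (30) gives the effective action that the inner shell minimizes. The sketch below evaluates it for uniformly sampled data and, purely for illustration, replaces the ASA global fit with a brute-force grid search over \(\{f^F, x\}\) with \(\sigma\) held fixed, as in the training approach of Section 4.1. The grid search is a stand-in, not the authors' method, and is only practical for two or three parameters; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def effective_action(prices: Sequence[float], dt: float,
                     f_F: float, sigma: float, x: float) -> float:
    """-ln P(F,t|F',t') from Eq. (30) for uniformly sampled prices.

    Each step contributes L(i|i-1) dt plus half the log of the Gaussian
    normalization 2*pi*sigma^2*F_{i-1}^{2x}*dt.
    """
    total = 0.0
    for prev, cur in zip(prices, prices[1:]):
        g2 = sigma ** 2 * prev ** (2 * x)
        lagrangian = ((cur - prev) / dt - f_F) ** 2 / (2.0 * g2)
        total += lagrangian * dt + 0.5 * math.log(2.0 * math.pi * g2 * dt)
    return total


def grid_fit(prices: Sequence[float], dt: float, sigma: float,
             f_grid: Sequence[float], x_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Return (action, f_F, x) minimizing the action on the grid (ASA stand-in)."""
    return min((effective_action(prices, dt, f, sigma, x), f, x)
               for f in f_grid for x in x_grid)


action, f_best, x_best = grid_fit([100.0, 101.0, 103.0, 104.0], dt=1.0, sigma=1.0,
                                  f_grid=[0.0, 0.5, 1.0, 1.5, 2.0], x_grid=[0.0, 0.5])
```

Keeping the normalization term matters: without it, larger \(x\) would always shrink the Lagrangian term and the fit would drift toward ever-larger noise.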
Recall that three outer-shell optimization parameters are defined: the time resolution \(\Delta t\) of the data, expressed as the time interval between consecutive data points; the window length \(W\) (in number of time epochs or data points) used in the adaptive calculation of the CMI; and a numerical coefficient \(M\) that scales the momentum uncertainty discussed below. At each moment a local refit of \(f^F\) and \(\sigma\) over the data in the local window \(W\) is executed, moving the window \(W\) across the training data set and using the zeroth-order optimization parameters \(f^F\) and \(x\) resulting from the inner-shell optimization as a first guess. It was found that a faster quasi-local code is sufficient for computational purposes for these adaptive updates. In more complicated models, ASA can be successfully applied recursively, although in real-time trading the response time of the system is a major factor that requires attention.

All expressions that follow can be generalized to coupled systems in the manner described in Section 3. Here we use the one-factor nonlinear model given by Eq. (28). At each time epoch we calculate the following momentum-related quantities:

\[ \Pi^F = \frac{1}{\sigma^2 F^{2x}} \left( \frac{dF}{dt} - f^F \right) , \qquad \Pi^F_0 = -\frac{f^F}{\sigma^2 F^{2x}} , \]
\[ \Delta\Pi^F = \left\langle \left( \Pi^F - \langle \Pi^F \rangle \right)^2 \right\rangle^{1/2} = \frac{1}{\sigma F^x \sqrt{dt}} , \qquad (31) \]

where we have used \(\langle \Pi^F \rangle = 0\) as implied by Eqs. (28) and (27). In the previous expressions, \(\Pi^F\) is the CMI, \(\Pi^F_0\) is the neutral line, or the momentum of a zero change in prices, and \(\Delta\Pi^F\) is the uncertainty of the momentum. The last quantity reflects the Heisenberg principle, as derived from Eq. (28) by calculating

\[ \Delta F \equiv \left\langle \left( dF - \langle dF \rangle \right)^2 \right\rangle^{1/2} = \sigma F^x \sqrt{dt} , \qquad \Delta\Pi^F \, \Delta F \geq 1 , \qquad (32) \]

where all expectations are in terms of the exact noise distribution, and the calculation implies the Ito approximation (equivalent to considering non-anticipative functions). Various moving averages of these momentum signals are also constructed. Other dynamical quantities, such as the Hamiltonian, could be used as well.
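The adaptive refit and Eq. (31) can be combined in a short sketch. The paper states only that a fast quasi-local code was used for the window refits; as a stand-in we use the closed-form maximum-likelihood estimates for Eq. (28) with \(x\) fixed, obtained by setting the derivatives of the effective action to zero: \(f^F = \sum_i w_i\, dF_i / (dt \sum_i w_i)\) and \(\sigma^2 = \sum_i w_i (dF_i - f^F dt)^2 / (N\, dt)\), with \(w_i = F_{i-1}^{-2x}\). Each CMI is evaluated with the fit from the \(W\) points ending at that epoch; all names are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


def local_fit(window: Sequence[float], dt: float, x: float) -> Tuple[float, float]:
    """Closed-form ML estimates of (f_F, sigma) for Eq. (28) on one window, x fixed."""
    pairs = list(zip(window, window[1:]))
    w = [prev ** (-2 * x) for prev, _ in pairs]          # 1 / F_{i-1}^{2x}
    dF = [cur - prev for prev, cur in pairs]
    f_F = sum(wi * d for wi, d in zip(w, dF)) / (dt * sum(w))
    var = sum(wi * (d - f_F * dt) ** 2 for wi, d in zip(w, dF)) / (dt * len(dF))
    return f_F, math.sqrt(var)


class Momenta(NamedTuple):
    cmi: float          # Pi^F
    neutral: float      # Pi^F_0
    uncertainty: float  # Delta Pi^F


def canonical_momenta(F_prev: float, F_cur: float, dt: float,
                      f_F: float, sigma: float, x: float) -> Momenta:
    """Eq. (31) at epoch i; prefactors use the pre-point F_{i-1} (Ito)."""
    g2 = sigma ** 2 * F_prev ** (2 * x)                   # sigma^2 F^{2x}
    return Momenta(cmi=((F_cur - F_prev) / dt - f_F) / g2,
                   neutral=-f_F / g2,
                   uncertainty=1.0 / (sigma * F_prev ** x * math.sqrt(dt)))


def adaptive_cmi(prices: Sequence[float], dt: float, x: float, W: int) -> List[Momenta]:
    """Refit (f_F, sigma) on the last W points, then evaluate the newest CMI."""
    out = []
    for i in range(W - 1, len(prices)):
        f_F, sigma = local_fit(prices[i - W + 1:i + 1], dt, x)
        out.append(canonical_momenta(prices[i - 1], prices[i], dt, f_F, sigma, x))
    return out


momenta = adaptive_cmi([100.0, 101.0, 103.0, 104.0], dt=1.0, x=0.0, W=3)
```

A window in which the price moves perfectly linearly gives \(\sigma = 0\) and a division by zero, so noisy real data (or a floor on \(\sigma\)) is assumed. Note that \(\Delta\Pi^F\) times \(\sigma F^x \sqrt{dt}\) is exactly 1 here, the equality case of Eq. (32).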
(By analogy to the energy concept, we found that the Hamiltonian carries information regarding the overall trend of the market, giving another useful measure of price volatility.)

Regarding the practical implementation of the previous relations for trading, some comments are necessary. In terms of discretization, if the CMI are calculated at epoch \(i\), then \(dF_i = F_i - F_{i-1}\), \(dt_i = t_i - t_{i-1} = \Delta t\), and all prefactors are computed at moment \(i-1\) by the Ito prescription (e.g., \(\sigma F^x = \sigma F_{i-1}^x\)). The momentum uncertainty band \(\Delta\Pi^F\) can be calculated from the discretized theoretical value Eq. (31), or by computing the estimator of the standard deviation from the actual time series of \(\Pi^F\).

There are also two ways of calculating averages over CMI values. One way is to use the set of local optimization parameters \(\{f^F, \sigma\}\) obtained from the local fit procedure in the current window \(W\) for all CMI data within that window (local-model average). The second way is to calculate each CMI in the current local window \(W\) with another set \(\{f^F, \sigma\}\) obtained from a previous local fit window, measured from that CMI datum backwards \(W\) points (multiple-models average, as each CMI corresponds to a different model in terms of the fitting parameters \(\{f^F, \sigma\}\)).

The last observation is that the neutral line divides all CMI into two classes: long signals, when \(\Pi^F > \Pi^F_0\), as any CMI satisfying this condition indicates a positive price change, and short signals, when \(\Pi^F < \Pi^F_0\), which reflects a negative price change.

After the CMI are calculated, based on their meaning as statistical momentum indicators, trades are executed following a relatively simple model: entry into and exit from long (short) trades are defined as points where the value of the CMI is greater (smaller) than a certain fraction of the uncertainty band, \(M \Delta\Pi^F\) (\(-M \Delta\Pi^F\)), where \(M\) is the multiplicative factor mentioned at the beginning of this subsection.
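The symmetric entry/exit rule just described can be sketched as a small state machine over the CMI series and the band \(\Delta\Pi^F\). Whether a position is kept or closed inside the band is left as a flag, since both choices are discussed in the text; the function name and the +1/-1/0 position encoding are our assumptions.

```python
from typing import List, Sequence


def symmetric_positions(cmi: Sequence[float], band: Sequence[float], M: float,
                        stay_inside_band: bool = True) -> List[int]:
    """Map CMI values to positions: +1 long, -1 short, 0 flat.

    Long when Pi^F > M * DeltaPi^F, short when Pi^F < -M * DeltaPi^F.
    Inside the band, keep the previous position or go flat.
    """
    position = 0
    positions = []
    for p, dp in zip(cmi, band):
        if p > M * dp:
            position = 1
        elif p < -M * dp:
            position = -1
        elif not stay_inside_band:
            position = 0
        positions.append(position)
    return positions


cmi = [0.0, 0.2, 0.05, -0.3, 0.0]
band = [1.0] * len(cmi)
held = symmetric_positions(cmi, band, M=0.15)
flat = symmetric_positions(cmi, band, M=0.15, stay_inside_band=False)
```

Because the same \(M\) multiplies both sides of the band, the rule is symmetric by construction; an asymmetric variant would use separate long and short factors.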
This is a choice of a symmetric trading rule, as \(M\) is the same for long and short trading signals, which is suitable for volatile markets without a sustained trend, yet does not diminish profits too severely in a strictly bull or bear region. Inside the momentum uncertainty band, one could define rules to stay in a previously open trade, or to exit immediately, because by its nature the momentum uncertainty band implies that the probabilities of price movements in either direction (up or down) are balanced. From another perspective, this type of trading rule exploits the relaxation time of a strong market advance or decline, until a trend reversal occurs or becomes more probable.

Other sets of trading rules are certainly possible, utilizing not only the current values of the momentum indicators, but also their local-model or multiple-models averages. A trading rule based on the maximum distance between the current CMI datum \(\Pi^F_i\) and the neutral line \(\Pi^F_0\) shows a faster response to market evolution and may be more suitable for automatic trading under certain conditions.

Stepping through the trading decisions for each trading day of the training set determines the profit/loss of the training set as a single value of the outer-shell cost function. As ASA importance-samples the outer-shell parameter space \(\{\Delta t, W, M\}\), these parameters are fed into the inner shell, and a new inner-shell recursive optimization cycle begins. The final values of the optimization parameters in the training set are fixed when the largest net profit (calculated from the total equity by subtracting the transaction costs defined by the slippage factor) is realized. In practice, we have collected optimization parameters from multiple local minima that are near the global minimum (the outer-shell cost function is defined with the sign reversed) of the training set.
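Collecting several distinct near-optimal parameter sets, as done with MULTI_MIN, can be mimicked as follows. This is a sketch of the described behavior rather than ASA's implementation: states are (cost, parameters) pairs with the cost equal to minus the net profit, and two parameter sets count as the same minimum when every component differs by no more than a user-chosen resolution. The numbers in the example are illustrative only and are not results from this work.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, Tuple[float, ...]]  # (cost = -net profit, parameters)


def keep_multi_min(states: Sequence[State], n_keep: int,
                   resolution: Sequence[float]) -> List[State]:
    """Return up to n_keep lowest-cost states that are mutually distinct."""
    kept: List[State] = []
    for cost, params in sorted(states, key=lambda s: s[0]):
        duplicate = any(
            all(abs(a - b) <= r for a, b, r in zip(params, q, resolution))
            for _, q in kept)
        if not duplicate:
            kept.append((cost, params))
        if len(kept) == n_keep:
            break
    return kept


# Illustrative outer-shell states {dt, W, M}; costs are invented.
states = [(-900.0, (50, 80, 0.10)), (-890.0, (50, 81, 0.10)),
          (-850.0, (40, 70, 0.20)), (-700.0, (30, 40, 0.30)),
          (-600.0, (60, 90, 0.10))]
minima = keep_multi_min(states, n_keep=3, resolution=(5, 2, 0.02))
```

The second state is discarded because it lies within resolution of the best one, so the retained minima describe genuinely different trading regimes that can then be compared on test sets.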
The values of the optimization parameters \(\{\Delta t, W, M, f^F, \sigma, x\}\) resulting from a training cycle are then applied to out-of-sample test sets. During the test run, the drift coefficient \(f^F\) and the volatility coefficient \(\sigma\) are refitted adaptively as described previously. All other parameters are fixed. We have mentioned that the optimization parameters corresponding to the highest profit in the training set may not be sufficiently robust across test sets. Therefore, for all test sets, we have tested the optimization parameters related to the multiple minima (i.e., the global maximum profit, the second-best profit, etc.) resulting from the training set. We performed a bootstrap-type reversal of the training-test sets (repeating the training-run procedures using one of the test sets, and including the previous training set in the new batch of test sets), followed by a selection of the best parameters across all data sets. This is necessary to increase the chances of successful trading sessions in real time.

5. RESULTS

5.1. Alternative Algorithms

In the previous sections we noted that there are different combinations of methods of processing data, methods of computing the CMI, and various sets of trading rules that need to be tested, at least in a sampling manner, before launching trading runs in real time:

1. Data can be preprocessed in block or overlapping bins, or forecasted data derived from the most probable transition path [41] could be used, as in one of our most recent models.
2. Exponential smoothing, wavelets, or Fourier decomposition can be applied for statistical processing. We presently favor exponential moving averages.
3. The CMI can be calculated using averaged data or directly with tick data, although the optimization parameters were fitted from preprocessed (averaged) price data.
4. The trading rules can be based on current signals (no average is performed over the signals themselves), on various averages of the CMI trading signals, on various combinations of CMI data (momenta, neutral line, uncertainty band), on symmetric or asymmetric trading rules, or on mixed price-CMI trading signals.
5. Different models (one- and two-factor coupled) can be applied to the same market instrument, e.g., to define complementary indicators.

The selection process evidently must consider many specific economic factors (e.g., the liquidity of a given market), besides all other physical, mathematical, and technical considerations. In the work presented here, as we tested our system and drew on previous experience, we focused on S&P 500 futures electronic trading, using block-processed data and symmetric, local-model and multiple-models trading rules. In Table 1 we show results obtained for several training and testing sets in this context.

5.2. Trading System Design

The design of a successful electronic trading system is complex, as it must incorporate several aspects of a trader's actions that are sometimes difficult to translate into computer code. Three important features that must be implemented are factoring in the transaction costs, devising money-management techniques, and coping with execution deficiencies. Generally, most trading costs can be included under the "slippage factor," although this could easily lead to poor estimates. Given that the profit margins from exploiting market inefficiencies are thin, a high slippage factor can easily result in a non-profitable trading system. In our situation, for testing purposes we used a $35 slippage factor per buy & sell order, a value we believe is rather high for an electronic trading environment, although it represents less than three ticks of a mini-S&P futures contract. (The mini-S&P is the S&P futures contract that is traded electronically on CME.)
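To make the slippage accounting concrete, the sketch below subtracts a fixed charge per round turn from the gross P/L of a list of trades. We read "$35 per buy & sell order" as one charge per completed buy-and-sell pair, which is an assumption; the mini-S&P multiplier of $50 per index point (tick size 0.25 points, i.e. $12.50) is used to confirm that $35 is indeed less than three ticks. The function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

POINT_VALUE = 50.0   # dollars per S&P index point for the mini-S&P contract
TICK_SIZE = 0.25     # index points per tick
SLIPPAGE = 35.0      # dollars per round turn (our reading of the text)


def net_profit(trades: Sequence[Tuple[int, float, float]],
               point_value: float = POINT_VALUE,
               slippage: float = SLIPPAGE) -> float:
    """trades: (direction, entry_price, exit_price); direction +1 long, -1 short."""
    gross = sum(d * (exit_price - entry) * point_value
                for d, entry, exit_price in trades)
    return gross - slippage * len(trades)


trades = [(1, 1450.0, 1452.0), (-1, 1452.0, 1453.5)]
result = net_profit(trades)
```

Here a winning long (+$100) and a losing short (-$75) net to +$25 gross, and the two slippage charges turn the session into a $45 loss, which illustrates how quickly a high slippage factor erodes thin margins.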
This higher value was chosen to protect ourselves against the bid-ask spread, as our trigger price (the price at which the CMI was generated) and execution price (the price at which a trade signaled by a CMI was executed) were taken to be equal to the trading price. (We have changed this aspect of our algorithm in later models.) The slippage is also strongly influenced by the time resolution of the data. Although the slippage is linked to bid-ask spreads and market volatility in various formulas [54], the best estimate is obtained from experience and actual trading.

Money management was introduced in terms of a trailing-stop condition that is a function of the price volatility, and a stop-loss threshold that we fixed by experiment to a multiple of the mini-S&P contract value ($200). It is tempting to tighten the trailing stop or to work with a small stop-loss value, yet we found, as otherwise expected, that higher losses occurred as the signals generated by our stochastic model were bypassed.

Regarding the execution process, we have to account for the response of the system to various execution conditions in the interaction with the electronic exchange: partial fills, rejections, the uptick rule (for equity trading), etc. Except for some special conditions, all these steps must be automated.

5.3. Some Explicit Results

Typical CMI data in Figs. 3 and 4 (obtained from real-time trading after a full cycle of training and testing was performed) are related to the price data in Figs. 1 and 2. We have plotted the fastest (55 secs apart) CMI values \(\Pi^F\), the neutral line \(\Pi^F_0\), and the uncertainty band \(\Delta\Pi^F\). All CMI data were produced using the optimization parameter set {55 secs, 88 epochs, 0.15} of the second-best net profit obtained with the training set "4D ESM0 0321-0324" (Table 1).
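The money-management layer can be sketched as a scan of post-entry prices for the first stop that is hit. The paper specifies only that the trailing stop depends on price volatility and that the stop-loss was fixed by experiment; the linear form k times volatility for the trailing distance, the expression of the stop-loss in index points, and all parameter values below are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_stop_hit(direction: int, entry: float, prices: Sequence[float],
                   volatility: float, k: float,
                   stop_loss_points: float) -> Optional[Tuple[int, float]]:
    """Return (index, price) of the first stop hit after entry, or None.

    direction: +1 for a long position, -1 for a short one.
    Trailing stop: k * volatility behind the best price seen so far.
    Stop-loss: a fixed distance stop_loss_points from the entry price.
    """
    best = entry
    hard_stop = entry - direction * stop_loss_points
    for i, p in enumerate(prices):
        best = max(best, p) if direction > 0 else min(best, p)
        trailing_stop = best - direction * k * volatility
        if direction > 0 and (p <= trailing_stop or p <= hard_stop):
            return i, p
        if direction < 0 and (p >= trailing_stop or p >= hard_stop):
            return i, p
    return None


long_exit = first_stop_hit(1, 1450.0, [1451.0, 1453.0, 1452.0, 1450.5],
                           volatility=1.0, k=2.0, stop_loss_points=4.0)
short_exit = first_stop_hit(-1, 1450.0, [1448.0, 1447.0, 1449.5],
                            volatility=1.0, k=2.0, stop_loss_points=4.0)
```

Shrinking `k` or `stop_loss_points` makes exits fire earlier, which is exactly the tightening that, per the text, tended to cut off the trades the model's signals would otherwise have carried.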
Although the CMI exhibit an inherently ragged nature and oscillate around a zero mean value within the uncertainty band (the width of which decreases with increasing price volatility, as the uncertainty principle would also indicate), there are time scales at which the CMI averages, or their persistence times, are not balanced about the neutral line. These characteristics, which we try to exploit in our system, are better depicted in Figs. 5 and 6.

One set of trading signals, the local-model average of the neutral line \(\langle \Pi^F_0 \rangle\) and the uncertainty band multiplied by the optimization factor \(M = 0.15\), centered around the theoretical zero mean of the CMI, is represented versus time. Note the entry point into a short trading position (\(\langle \Pi^F_0 \rangle > M \Delta\Pi^F\)) at around 10:41 (Fig. 5 in conjunction with the S&P data in Fig. 1), with a possible exit at 11:21 (or later), and a first long entry (\(\langle \Pi^F_0 \rangle < -M \Delta\Pi^F\)) at 12:15. After 14:35, a stay-long region appears (\(\langle \Pi^F_0 \rangle < 0\)), which correctly indicates the price movement in Fig. 1. In Fig. 6, corresponding to the June 22 price data of Fig. 2, a first long signal is generated at around 12:56 and a first short signal is generated at 14:16 that reflects the long downtrend region in Fig. 2. Due to the averaging process, a time lag is introduced, reflected by the long signal at 12:56 in Fig. 4, related to a past upward trend seen in Fig. 2; yet the neutral line relaxes rather rapidly (given the 55-sec time resolution and the window of 88 epochs ≈ 1.5 hours) toward the uncertainty band. A judicious choice of trading rules, or avoiding standard averaging methods, helps in controlling this lag problem.

In Tables 1 and 2 we show some results obtained for several training and testing sets following the procedures described at the end of the previous section.
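The averaging lag mentioned above can be quantified with a unit step. The sketch compares a trailing simple moving average over \(W = 88\) epochs with an exponential moving average using the common span convention \(\alpha = 2/(W+1)\) (our choice of convention; the paper does not give one) and reports how many epochs after the step each average first reaches half the step height. The function names are hypothetical.

```python
from typing import List, Sequence


def sma(values: Sequence[float], W: int) -> List[float]:
    """Trailing simple moving average over the last W values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - W + 1):i + 1]
        out.append(sum(window) / len(window))
    return out


def ema(values: Sequence[float], W: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (W + 1)."""
    alpha = 2.0 / (W + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def half_response_time(smoothed: Sequence[float], level: float = 0.5) -> int:
    """First index at which a smoothed unit step reaches `level`."""
    return next(i for i, v in enumerate(smoothed) if v >= level)


step = [0.0] * 200 + [1.0] * 300   # unit step at epoch 200
sma_lag = half_response_time(sma(step, 88)) - 200
ema_lag = half_response_time(ema(step, 88)) - 200
```

For this input the simple average needs 43 epochs to reach half height and the exponential average 30, which illustrates why exponential smoothing, favored in Section 5.1, reduces but does not remove the lag.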
In both tables, under the heading "Training" or "Testing Set" we specify the data set used (e.g., "4D ESM0 0321-0324" represents four days of data from the mini-S&P futures contract that expired in June). The type of trading rules used is identified by the "LOCAL MODEL" or "MULTIPLE MODELS" tags. These tags refer to how we calculate the averages of the trading signals: either by using a single pair of optimization parameters \(\{f^F, \sigma\}\) for all CMI data within the current adaptive fit window, or a different pair \(\{f^F, \sigma\}\) for each CMI datum. In the "Statistics" column we report the net (after subtracting the slippage) profit or loss (in parentheses) across the whole data set, the total number of trades ("trades"), the number of days with a positive balance ("days +"), and the percentage of winning trades ("winners"). The "Parameters" are the optimization parameters resulting from the first three best profit maxima of each listed training set. The parameters are listed in the order \(\{\Delta t, W, M\}\), with the data time resolution \(\Delta t\) measured in seconds, the length of the local fit window \(W\) measured in time epochs, and \(M\) the numerical coefficient of the momentum uncertainty band \(\Delta\Pi^F\).

Recall that the trading rules presented are symmetric (the long and short entry/exit signals are controlled by the same \(M\) factor), and we apply a stay-long condition if the neutral line is below the average momentum \(\langle \Pi^F \rangle = 0\) and a stay-short condition if \(\Pi^F_0 > 0\). The drift \(f^F\) and volatility coefficient \(\sigma\) are refitted adaptively, and the exponent \(x\) is fixed to the value obtained in the training set. Typical values are \(f^F \in \pm[0.003 : 0.05]\) and \(x \in \pm[0.01 : 0.03]\). During the local fit, due to the shorter time scale involved, the drift may increase by a factor of ten, and \(\sigma \in [0.01 : 1.2]\).
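The quantities in the "Statistics" column can be reproduced from a list of closed trades, as in this sketch. It assumes each trade's P/L is already net of slippage and is attributed to a single trading day; the Trade record, its field names, and the example values are hypothetical, and losses are formatted in parentheses as in the tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trade:
    day: str         # trading day label, e.g. "0321"
    net_pnl: float   # dollars, already net of slippage


def table_statistics(trades: List[Trade]) -> Dict[str, float]:
    """Net P/L, number of trades, days with positive balance, % winners."""
    daily: Dict[str, float] = {}
    for t in trades:
        daily[t.day] = daily.get(t.day, 0.0) + t.net_pnl
    n = len(trades)
    return {
        "net": sum(t.net_pnl for t in trades),
        "trades": n,
        "days +": sum(1 for v in daily.values() if v > 0),
        "winners": 100.0 * sum(1 for t in trades if t.net_pnl > 0) / n if n else 0.0,
    }


def accounting(value: float) -> str:
    """Format losses in parentheses, e.g. -1250 -> '(1,250)'."""
    return f"({abs(value):,.0f})" if value < 0 else f"{value:,.0f}"


example_trades = [Trade("0321", 120.0), Trade("0321", -45.0),
                  Trade("0322", -80.0), Trade("0323", 60.0)]
stats = table_statistics(example_trades)
```

Counting "days +" from daily sums rather than from individual trades means a day with one large winner and several small losers still counts as positive.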
Comparing the data in the training and testing tables, we note that the most robust optimization parameters, in terms of the maximum cumulative profit over all test sets, do not correspond to the maximum profit in the training sets: for the local-model rules the optimum parameters are {55, 88, 0.15}, and for the multiple-models rules the optimum set is {45, 72, 0.2}, both realized by the training set "4D ESM0 0321-0324." Other observations are that, for the data presented here, the multiple-models-average trading rules consistently performed better and were more robust than the local-model-average trading rules. The number of trades is similar, varying between 15 and 35 (eliminating cumulative values smaller than 10 trades), and the time scale of the local fit is rather long, in the 30-min to 1.5-hour range. In the current set-up, this extended time scale implies that it is advisable to deploy this system as a trader-assisted tool.

An important factor is the average length of the trades. For the type of rules presented in this work, this length is several minutes, up to one hour, as the time scale of the local fit window mentioned above suggests. Related to the length of a trade is the length of a winning long/short trade in comparison to a losing long/short trade. Our experience indicates that a ratio of 2:1 between the length of a winning trade and the length of a losing trade is desirable for a reliable trading system. Here, using the local-model trading rules seems to offer an advantage, although this is not as clear as one would expect.

Finally, the training-set data (Table 1) show that the percentage of winners is markedly higher with multiple-models-average than with local-model-average trading rules. In the testing sets (Table 2) the situation is almost reversed, albeit the overall profits (losses) are higher (smaller) in the multiple-models case.
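The selection criterion used in this comparison, maximum cumulative profit over all test sets rather than maximum training profit, reduces to a one-line choice once the test results are tabulated. In this sketch the labels and P/L values are invented for illustration and do not come from Tables 1 and 2; the function name is hypothetical.

```python
from typing import Dict, List


def most_robust(test_pnl: Dict[str, List[float]]) -> str:
    """Label of the parameter set with the largest cumulative test-set P/L."""
    return max(test_pnl, key=lambda label: sum(test_pnl[label]))


# Training-set ranking of the retained minima (best training profit first).
training_rank = ["best", "second", "third"]

# Hypothetical net P/L of each retained parameter set on three test sets.
test_pnl = {
    "best": [300.0, -450.0, 100.0],
    "second": [150.0, 200.0, 50.0],
    "third": [-100.0, 250.0, -50.0],
}
chosen = most_robust(test_pnl)
```

In this toy case the training winner loses money across the test sets, while the second-ranked minimum is the most robust choice, the same qualitative pattern the text reports.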
Apparently, the multiple-models trading rules can stay in winning trades longer to increase profits, relative to the losses incurred with these rules in losing trades. (In the testing sets, this correlates with the higher number of trades executed using local-model trading rules.)

6. CONCLUSIONS

6.1. Main Features

The main stages of building and testing this system were:

1. We developed a multivariate, nonlinear statistical mechanics model of S&P futures and cash markets, based on a system of coupled stochastic differential equations.
2. We constructed a two-stage, recursive optimization procedure using methods of ASA global optimization: an inner shell extracts the characteristics of the stochastic price distribution, and an outer shell generates the technical indicators and optimizes the trading rules.
3. We trained the system on different sets of data and retained the multiple minima generated (corresponding to the global maximum net profit realized and the neighboring profit maxima).
4. We tested the system on out-of-sample data sets, searching for the most robust optimization parameters to be used in real-time trading. Robustness was estimated by the cumulative profit/loss across diverse test sets, and by testing the system against a bootstrap-type reversal of training-testing sets in the optimization cycle.

Modeling the market as a dynamical physical system makes possible a direct representation of empirical notions such as market momentum in terms of CMI derived naturally from our theoretical model. We have shown that other physical concepts, such as the uncertainty principle, may lead to quantitative signals (the momentum uncertainty band \(\Delta\Pi^F\)) that capture other aspects of market dynamics and can be used in real-time trading.

6.2. Summary

We have presented a description of a trading system composed of an outer-shell trading-rule model and an inner-shell nonlinear stochastic dynamic model of the market of interest, S&P 500.
The inner shell is developed adhering to the mathematical physics of multivariate nonlinear statistical mechanics, from which we develop indicators for the trading-rule model, i.e., canonical momenta indicators (CMI). We have found that keeping our model faithful to the underlying mathematical physics is not a limiting constraint on the profitability of our system; quite the contrary. An important result of our work is that the ideas for our algorithms, and the proper use of the mathematical physics faithful to these algorithms, must be supplemented by many practical considerations en route to developing a profitable trading system. For example, since there is a subset of parameters, e.g., time resolution parameters, shared by the inner- and outer-shell models, recursive optimization is used to get the best fits to data, as well as to develop multiple minima with approximately similar profitability. The multiple minima often have additional features requiring consideration for real-time trading, e.g., more trades per day increasing the robustness of the system. The nonlinear stochastic nature of our data required a robust global optimization algorithm. The parameters output from these training sets were then applied to testing sets of out-of-sample data. The best models and parameters were then used in real time by traders, further testing the models as a precursor to eventual deployment in automated electronic trading.

We have used methods of statistical mechanics to develop our inner-shell model of market dynamics and a heuristic AI-type model for our outer-shell trading-rule model, but there are many other candidate (quasi-)global algorithms for developing a cost function that can be used to fit parameters to data, e.g., neural nets, fractal scaling models, etc.
To perform our fits to data, we selected an algorithm, Adaptive Simulated Annealing (ASA), with which we were familiar, but there are several other candidate algorithms that likely would suffice, e.g., genetic algorithms, tabu search, etc. We have shown that a minimal set of trading signals (the CMI, the neutral line representing the momentum of the trend of a given time window of data, and the momentum uncertainty band) can generate a rich and robust set of trading rules that identify profitable domains of trading at various time scales. This is a confirmation of the hypothesis that markets are not efficient, as noted in other studies [11,30,55].

6.3. Future Directions

Although this paper focused on trading a single instrument, the S&P 500 futures, the code we have developed can accommodate trading on multiple markets. For example, in the case of tick-resolution coupled cash and futures markets, which was previously prototyped for inter-day trading [29,30], the utility of CMI stems from three directions: (a) The inner-shell fitting process requires a global optimization of all parameters in both futures and cash markets. (b) The CMI for futures contain, by our Lagrangian construction, the coupling with the cash market through the off-diagonal correlation terms of the metric tensor. The correlation between the futures and cash markets is explicitly present in all futures variables. (c) The CMI of both markets can be used as complementary technical indicators for trading in the futures market.
Several near-term future directions are of interest: orienting the system toward shorter trading time scales (10-30 secs) more suitable for electronic trading; introducing fast-response "averaging" methods and time-scale identifiers (exponential smoothing, wavelet decomposition); identifying mini-crash points using renormalization group techniques; investigating the use of CMI in pattern-recognition-based trading rules; and exploring the use of forecasted data evaluated from the most probable transition path formalism. Our efforts indicate the invaluable utility of a joint approach (AI-based and quantitative) in developing automated trading systems.

6.4. Standard Disclaimer

We must emphasize that there are no claims that all results are positive or that the present system is a safe source of riskless profits. There are as many negative results as positive ones, and a lot of work is necessary to extract meaningful information. Our purpose here is to describe an approach to developing an electronic trading system complementary to those based on neural-network-type technical analysis and pattern-recognition methods. The system discussed in this paper is rooted in the physical principles of nonequilibrium statistical mechanics, and we have shown that there are conditions under which such a model can be successful.

ACKNOWLEDGMENTS

We thank Donald Wilson for his financial support. We thank K.S. Balasubramaniam and Colleen Chen for their programming support and participation in formulating parts of our trading system. Data were extracted from the DRW Reuters feed.

REFERENCES

[1] E. Peters, Chaos and Order in the Capital Markets, Wiley & Sons, New York, NY, 1991.
[2] E.M. Azoff, Neural Network Time Series Forecasting of Financial Markets, Wiley & Sons, New York, NY, 1994.
[3] E. Gately, Neural Networks for Financial Forecasting, Wiley & Sons, New York, NY, 1996.
[4] L. Ingber, “High-resolution path-integral development of financial options,” Physica A, vol. 283, pp. 529-558, 2000.
[5] B.B. Mandelbrot, Fractals and Scaling in Finance, Springer-Verlag, New York, NY, 1997.
[6] R.N. Mantegna and H.E. Stanley, “Turbulence and financial markets,” Nature, vol. 383, pp. 587-588, 1996.
[7] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, “Noise dressing of financial correlation matrices,” Phys. Rev. Lett., vol. 83, pp. 1467-1470, 1999.
[8] A. Johansen, D. Sornette, and O. Ledoit, Predicting financial crashes using discrete scale invariance, http://xxx.lanl.gov/cond-mat, 2000.
[9] K. Ilinsky and G. Kalinin, Black-Scholes equation from gauge theory of arbitrage, http://xxx.lanl.gov/hep-th/9712034, 1997.
[10] J.C. Hull, Options, Futures, and Other Derivatives, 4th ed., Prentice Hall, Upper Saddle River, NJ, 2000.
[11] L. Ingber, “Statistical mechanics of nonlinear nonequilibrium financial markets,” Math. Modelling, vol. 5, pp. 343-361, 1984.
[12] L. Ingber, “Adaptive Simulated Annealing (ASA),” Global optimization C-code, Caltech Alumni Association, Pasadena, CA, 1993.
[13] L. Ingber, “Very fast simulated re-annealing,” Mathl. Comput. Modelling, vol. 12, pp. 967-973, 1989.
[14] L. Ingber and B. Rosen, “Genetic algorithms and very fast simulated reannealing: A comparison,” Oper. Res. Management Sci., vol. 33, p. 523, 1993.
[15] L. Ingber, “Adaptive simulated annealing (ASA): Lessons learned,” Control and Cybernetics, vol. 25, pp. 33-54, 1996.
[16] L. Ingber, “Simulated annealing: Practice versus theory,” Mathl. Comput. Modelling, vol. 18, pp. 29-57, 1993.
[17] G. Indiveri, G. Nateri, L. Raffo, and D. Caviglia, “A neural network architecture for defect detection through magnetic inspection,” Report, University of Genova, Genova, Italy, 1993.
[18] B. Cohen, Training synaptic delays in a recurrent neural network, M.S. Thesis, Tel-Aviv University, Tel-Aviv, Israel, 1994.
[19] R.A. Cozzio-Bueler, “The design of neural networks using a priori knowledge,” Ph.D. Thesis, Swiss Fed. Inst. Tech., Zürich, Switzerland, 1995.
[20] D.G. Mayer, P.M. Pepper, J.A. Belward, K. Burrage, and A.J. Swain, “Simulated annealing - A robust optimization technique for fitting nonlinear regression models,” in Proceedings ‘Modelling, Simulation and Optimization’ Conference, International Association of Science and Technology for Development (IASTED), 6-9 May 1996, Gold Coast, 1996.
[21] S. Sakata and H. White, “High breakdown point conditional dispersion estimation with application to S&P 500 daily returns volatility,” Econometrica, vol. 66, pp. 529-567, 1998.
[22] L. Bachelier, “Théorie de la Spéculation,” Annales de l’École Normale Supérieure, vol. 17, pp. 21-86, 1900.
[23] M.C. Jensen, “Some anomalous evidence regarding market efficiency, an editorial introduction,” J. Finan. Econ., vol. 6, pp. 95-101, 1978.
[24] B.B. Mandelbrot, “When can price be arbitraged efficiently? A limit to the validity of the random walk and martingale models,” Rev. Econ. Statist., vol. 53, pp. 225-236, 1971.
[25] S.J. Taylor, “Tests of the random walk hypothesis against a price-trend hypothesis,” J. Finan. Quant. Anal., vol. 17, pp. 37-61, 1982.
[26] P. Brown, A.W. Kleidon, and T.A. Marsh, “New evidence on the nature of size-related anomalies in stock prices,” J. Fin. Econ., vol. 12, pp. 33-56, 1983.
[27] J.A. Nelder and R. Mead, “A simplex method for function minimization,” Computer J. (UK), vol. 7, pp. 308-313, 1964.
[28] G.P. Barabino, G.S. Barabino, B. Bianco, and M. Marchesi, “A study on the performances of simplex methods for function minimization,” in Proc. IEEE Int. Conf. Circuits and Computers, 1980, pp. 1150-1153.
[29] L. Ingber, “Canonical momenta indicators of financial markets and neocortical EEG,” in Progress in Neural Information Processing, ed. by S.-I. Amari, L. Xu, I. King, and K.-S. Leung, Springer, New York, 1996, pp. 777-784.
[30] L. Ingber, “Statistical mechanics of nonlinear nonequilibrium financial markets: Applications to optimized trading,” Mathl. Computer Modelling, vol. 23, pp. 101-121, 1996.
[31] H. Haken, Synergetics, 3rd ed., Springer, New York, 1983.
[32] L. Ingber, M.F. Wehner, G.M. Jabbour, and T.M. Barnhill, “Application of statistical mechanics methodology to term-structure bond-pricing models,” Mathl. Comput. Modelling, vol. 15, pp. 77-98, 1991.
[33] L. Ingber, “Statistical mechanics of neocortical interactions: A scaling paradigm applied to electroencephalography,” Phys. Rev. A, vol. 44, pp. 4017-4060, 1991.
[34] K.S. Cheng, “Quantization of a general dynamical system by Feynman’s path integration formulation,” J. Math. Phys., vol. 13, pp. 1723-1726, 1972.
[35] H. Dekker, “Functional integration and the Onsager-Machlup Lagrangian for continuous Markov processes in Riemannian geometries,” Phys. Rev. A, vol. 19, pp. 2102-2111, 1979.
[36] R. Graham, “Path-integral methods in nonequilibrium thermodynamics and statistics,” in Stochastic Processes in Nonequilibrium Systems, ed. by L. Garrido, P. Seglar, and P.J. Shepherd, Springer, New York, NY, 1978, pp. 82-138.
[37] F. Langouche, D. Roekaerts, and E. Tirapegui, “Discretization problems of functional integrals in phase space,” Phys. Rev. D, vol. 20, pp. 419-432, 1979.
[38] F. Langouche, D. Roekaerts, and E. Tirapegui, “Short derivation of Feynman Lagrangian for general diffusion process,” J. Phys. A, vol. 13, pp. 449-452, 1980.
[39] F. Langouche, D. Roekaerts, and E. Tirapegui, Functional Integration and Semiclassical Expansions, Reidel, Dordrecht, The Netherlands, 1982.
[40] M. Rosa-Clot and S. Taddei, A path integral approach to derivative security pricing: I. Formalism and analytic results, INFN, Firenze, Italy, 1999.
[41] H. Dekker, “On the most probable transition path of a general diffusion process,” Phys. Lett. A, vol. 80, pp. 99-101, 1980.
[42] P. Hagedorn, Non-Linear Oscillations, Oxford Univ., New York, NY, 1981.
[43] B. Oksendal, Stochastic Differential Equations, Springer, New York, NY, 1998.
[44] J.M. Harrison and D. Kreps, “Martingales and arbitrage in multiperiod securities markets,” J. Econ. Theory, vol. 20, pp. 381-408, 1979.
[45] S.R. Pliska, Introduction to Mathematical Finance, Blackwell, Oxford, UK, 1997.
[46] C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, Springer-Verlag, Berlin, Germany, 1983.
[47] R.C. Merton, “An intertemporal capital asset pricing model,” Econometrica, vol. 41, pp. 867-887, 1973.
[48] L. Ingber, “Statistical mechanics of neocortical interactions: Stability and duration of the 7±2 rule of short-term-memory capacity,” Phys. Rev. A, vol. 31, pp. 1183-1186, 1985.
[49] S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671-680, 1983.
[50] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distribution and the Bayesian restoration in images,” IEEE Trans. Patt. Anal. Mac. Int., vol. 6, pp. 721-741, 1984.
[51] H. Szu and R. Hartley, “Fast simulated annealing,” Phys. Lett. A, vol. 122, pp. 157-162, 1987.
[52] M. Wofsey, “Technology: Shortcut tests validity of complicated formulas,” The Wall Street Journal, vol. 222, p. B1, 1993.
[53] J.C. Cox and S.A. Ross, “The valuation of options for alternative stochastic processes,” J. Fin. Econ., vol. 3, pp. 145-166, 1976.
[54] P.J. Kaufman, Trading Systems and Methods, 3rd ed., John Wiley & Sons, New York, NY, 1998.
[55] W. Brock, J. Lakonishok, and B. LeBaron, “Simple technical trading rules and the stochastic properties of stock returns,” J. Finance, vol. 47, pp. 1731-1763, 1992.

FIGURE CAPTIONS

Figure 1. Futures and cash data, contract ESU0, June 20: solid line: futures; dashed line: cash.
Figure 2. Futures and cash data, contract ESU0, June 22: solid line: futures; dashed line: cash.
Figure 3. CMI data, real-time trading, June 20: solid line: CMI; dashed line: neutral line; dotted line: uncertainty band.
Figure 4. CMI data, real-time trading, June 22: solid line: CMI; dashed line: neutral line; dotted line: uncertainty band.
Figure 5. CMI trading signals, real-time trading, June 20: dashed line: local-model average of the neutral line; dotted line: uncertainty band multiplied by the optimization parameter M = 0.15.
Figure 6. CMI trading signals, real-time trading, June 22: dashed line: local-model average of the neutral line; dotted line: uncertainty band multiplied by the optimization parameter M = 0.15.

TABLE CAPTIONS

Table 1. Matrix of Training Runs.
Table 2. Matrix of Testing Runs.
[Figure 1. ESU0 data, June 20, time resolution = 55 secs. Curves: Futures, Cash. Axes: S&P vs. TIME (mm-dd hh-mm-ss).]
[Figure 2. ESU0 data, June 22, time resolution = 55 secs. Curves: Futures, Cash. Axes: S&P vs. TIME (mm-dd hh-mm-ss).]
[Figure 3. Canonical Momenta Indicators (CMI), June 20, time resolution = 55 secs. Curves: Π^F (CMI Futures), Π_0^F (neutral CMI), ΔΠ^F (theory). Axes: CMI vs. TIME (mm-dd hh-mm-ss).]
[Figure 4. Canonical Momenta Indicators (CMI), June 22, time resolution = 55 secs. Curves: Π^F (CMI Futures), Π_0^F (neutral CMI), ΔΠ^F (theory). Axes: CMI vs. TIME (mm-dd hh-mm-ss).]
[Figure 5. Canonical Momenta Indicators (CMI), June 20, time resolution = 55 secs. Curves: <Π_0^F> (local), M ΔΠ^F. Axes: CMI vs. TIME (mm-dd hh-mm-ss).]
[Figure 6. Canonical Momenta Indicators, June 22, time resolution = 55 secs. Curves: <Π_0^F>, M ΔΠ^F. Axes: CMI vs. TIME (mm-dd hh-mm-ss).]

Table 1. TRAINING SET TRADING RULES STATISTICS
Parameters are (Δt W M); losses are shown in parentheses.

Training set        Model      Δt     W      M   $ profit (loss)  # trades  # days +  % winners
4D ESM0 0321-0324   Local      55    90  0.125             1390        16         3         75
                    Local      55    88  0.15              1215        16         3         75
                    Local      60    40  0.275             1167        17         3         76
                    Multiple   45    76  0.175             2270        18         4         83
                    Multiple   45    72  0.20            2167.5        17         4         88
                    Multiple   60    59  0.215           1117.5        17         3         76
5D ESM0 0327-0331   Local      20    22  0.60               437        15         3         67
                    Local      20    24  0.55               352        16         3         63
                    Local      10    54  0.5               (35)         1         0          0
                    Multiple   45    74  0.25             657.5         3         5        100
                    Multiple   40    84  0.175              635        19         3         68
                    Multiple   30   110  0.15             227.5        26         2         65
5D ESM0 0410-0414   Local      50   102  0.10              1875        35         3         60
                    Local      50   142  0.10              1847        19         3         58
                    Local      35   142  0.10              1485        34         4         62
                    Multiple   45    46  0.25              2285        39         3         72
                    Multiple   40    48  0.30              2145        23         3         87
                    Multiple   60    34  0.30            1922.5        29         3         72

Table 2. TESTING SETS STATISTICS
Parameters are (Δt W M); losses are shown in parentheses.

Model      Δt     W      M   Testing set        $ profit (loss)  # trades  # days +  % winners
Local      55    90  0.125   5D ESM0 0327-0331            (712)        20         2         50
                             4D ESM0 0403-0407             (30)        18         3         56
                             5D ESM0 0410-0414              750        30         3         60
Local      55    88  0.15    5D ESM0 0327-0331            (857)        17         2         47
                             4D ESM0 0403-0407              258        13         3         54
                             5D ESM0 0410-0414             1227        21         3         62
Local      60    40  0.275   5D ESM0 0327-0331           (1472)        16         1         44
                             4D ESM0 0403-0407              602        16         2         56
                             5D ESM0 0410-0414            (117)        23         3         48
Multiple   45    76  0.175   5D ESM0 0327-0331            (605)        18         3         67
                             4D ESM0 0403-0407             1340        16         1         50
                             5D ESM0 0410-0414            (530)        23         2         48
Multiple   45    72  0.20    5D ESM0 0327-0331            (220)        12         1         67
                             4D ESM0 0403-0407             2130        17         1         53
                             5D ESM0 0410-0414           (1125)        20         2         50
Multiple   60    59  0.215   5D ESM0 0327-0331            (185)        11         1         54
                             4D ESM0 0403-0407              932        13         1         38
                             5D ESM0 0410-0414            (380)        18         3         50
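The trading parameters reported in Tables 1 and 2 are (Δt, W, M), and Figures 5 and 6 plot an average of the neutral line against the uncertainty band multiplied by M. The sketch below shows one plausible way a rule could be parametrized by these three quantities. It is a hypothetical reading, not the rule actually traded. We assume a one-variable model dM = f dt + σ dW with constant f and σ, take W as the window length in samples and the neutral line as the momentum of the average trend over the last W points, use the one-step momentum standard deviation 1/(σ√Δt) as the uncertainty band, and go long (short) when the neutral line lies above (below) ±M times the band. The function name `band_signals` and its arguments are illustrative; in the paper such trading parameters are fitted with ASA.

```python
import numpy as np


def band_signals(prices, dt, window, m, drift, sigma):
    """Hypothetical neutral-line / uncertainty-band rule parametrized by (dt, window, m).

    Assumptions (illustrative only, not the authors' trading code):
      * `prices` are already sampled every `dt` seconds, and window >= 1;
      * 1-D model dM = drift*dt + sigma*dW, so the one-step discretized momentum
        Pi = (dM/dt - drift) / sigma**2 has standard deviation 1 / (sigma * sqrt(dt));
      * the neutral line is the momentum of the average trend over the last `window` points.
    Returns +1 (long), -1 (short) or 0 (flat) for every bar from index `window` onward.
    """
    p = np.asarray(prices, dtype=float)
    trend = (p[window:] - p[:-window]) / (window * dt)  # average velocity over the trailing window
    pi0 = (trend - drift) / sigma**2                    # neutral-line momentum
    band = 1.0 / (sigma * np.sqrt(dt))                  # one-step momentum uncertainty
    threshold = m * band                                # band scaled by the multiplier M
    return np.where(pi0 > threshold, 1, np.where(pi0 < -threshold, -1, 0))


print(band_signals([1500.0, 1500.5, 1501.0, 1501.5], dt=55.0, window=2, m=0.15,
                   drift=0.0, sigma=0.1))  # [1 1]
```

Using the one-step band is a conservative choice. Scaling it down by 1/√W, the uncertainty of a W-step average, is an equally defensible assumption and would make the rule trade more often.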