Optimization of Trading Physics Models of Markets
Lester Ingber
Lester Ingber Research
POB 06440 Sears Tower, Chicago, IL 60606
and
DRW Investments LLC
311 S Wacker Dr, Ste 900, Chicago, IL 60606
ingber@ingber.com, ingber@alumni.caltech.edu and
Radu Paul Mondescu
DRW Investments LLC
311 S Wacker Dr, Ste 900, Chicago, IL 60606
rmondescu@drwtrading.com

ABSTRACT
We describe an end-to-end real-time S&P futures trading system. Inner-shell stochastic nonlinear
dynamic models are developed, and Canonical Momenta Indicators (CMI) are derived from a fitted
Lagrangian used by outer-shell trading models dependent on these indicators. Recursive and adaptive
optimization using Adaptive Simulated Annealing (ASA) is used for fitting parameters shared across
these shells of dynamic and trading models.
Keywords: Simulated Annealing; Statistical Mechanics; Trading Financial Markets

© 2000 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be
obtained from the IEEE.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without
notice, after which this version may no longer be accessible.
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and
all rights therein are retained by authors or by other copyright holders. All persons copying this information
are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these
works may not be reposted without the explicit permission of the copyright holder.

Optimization of Trading 2 Ingber & Mondescu

1. INTRODUCTION

1.1. Approaches
Real-world problems are almost intractable analytically, yet methods must be devised to deal with
this complexity to extract practical information in finite time. This is indeed true in the field of financial
engineering, where time series of various financial instruments reflect nonequilibrium, highly nonlinear,
possibly even chaotic [1] underlying processes. A further difficulty is the huge amount of data that must
be processed. Under these circumstances, developing models and schemes for automated, profitable
trading is a nontrivial task.
In the context of this paper, it is important to stress that dealing with such complex systems
invariably requires modeling of dynamics, modeling of actions on these dynamics, and algorithms to ﬁt
parameters in these models to real data. We have elected to use methods of mathematical physics for our
models of the dynamics, artiﬁcial intelligence (AI) heuristics for our models of trading rules acting on
indicators derived from our dynamics, and methods of sampling global optimization for ﬁtting our
parameters. Too often there is confusion about how these three elements are being used for a complete
system. For example, in the literature there often is discussion of neural net trading systems or genetic
algorithm trading systems. However, neural net models (used for either or both models discussed here)
also require some method of ﬁtting their parameters, and genetic algorithms must have some kind of cost
function or process speciﬁed to sample a parameter space, etc.
Some powerful methods have emerged over the years, from at least two directions: One
direction is based on inferring rules from past and current behavior of market data, leading to learning-based, inductive techniques, such as neural networks or fuzzy logic. Another direction starts from the
bottom up, trying to build physical and mathematical models based on different economic prototypes. In
many ways, these two directions are complementary, and a proper understanding of their main strengths
and weaknesses should lead to synergetic effects beneficial to their common goals.
Among approaches in the ﬁrst direction, neural networks already have won a prominent role in the
ﬁnancial community, due to their ability to handle large quantities of data, and to uncover and model
nonlinear functional relationships between various combinations of fundamental indicators and price
data [2,3].
In the second direction we can include models based on nonequilibrium statistical mechanics [4],
fractal geometry [5], turbulence [6], spin glasses and random matrix theory [7], renormalization group [8],
and gauge theory [9]. Although the very complex nonlinear multivariate character of financial markets is
recognized [10], these approaches seem to have had a lesser impact on current quantitative finance
practice, although it is becoming increasingly clear that this direction can lead to practical trading strategies
and models.
To bridge the gap between theory and practice, as well as to afford a comparison with neural
network techniques, here we focus on presenting an effective trading system for S&P futures, anchored in
the physical principles of nonequilibrium statistical mechanics applied to financial markets [4,11].
Starting with nonlinear, multivariate stochastic differential equation descriptions of the
price evolution of cash and futures indices, we build an algebraic cost function in terms of a Lagrangian.
Then, a maximum likelihood fit to the data is performed using a global optimization algorithm, Adaptive
Simulated Annealing (ASA) [12]. Firmly rooted in field-theoretical concepts, we derive market
canonical momenta indicators, and we use these as technical signals in a recursive ASA optimization that
tunes the outer shell of trading rules. We do not employ metaphors for these physical indicators, but
rather derive them directly from models fit to data.
The outline of the paper is as follows: Just below we briefly discuss the optimization method and
momenta indicators. In the next three sections we establish the theoretical framework supporting our
model, the statistical mechanics approach, and the optimization method. In Section 5 we detail the
trading system, and in Section 6 we describe our results. Our conclusions are presented in Section 7.

1.2. Optimization
Large-scale, nonlinear fits of stochastic nonlinear forms to financial data require methods that are robust
across data sets. (Tick data for just one day of regular trading hours could reach 10,000-30,000 data
points.) Simple regression techniques exhibit deficiencies with respect to obtaining reasonable fits. They
too often get trapped in local minima typically found in nonlinear stochastic models of such data. ASA is
a global optimization algorithm that has the advantage, with respect to other global optimization
methods such as genetic algorithms, combinatorial optimization, etc., not only of being efficient in its
importance-sampling search strategy, but of having the statistical guarantee of finding the best
optima [13,14]. This gives some confidence that a global minimum can be found, provided care
is taken as necessary to tune the algorithm [15].
It should be noted that such powerful sampling algorithms also are often required by models
of complex systems other than those we use here [16]. For example, neural network models have taken
advantage of ASA [17-19], as have other financial and economic studies [20,21].

1.3. Indicators
In general, neural network approaches attempt classiﬁcation and identiﬁcation of patterns, or try
forecasting patterns and future evolution of ﬁnancial time series. Statistical mechanical methods attempt
to ﬁnd dynamic indicators derived from physical models based on general principles of nonequilibrium
stochastic processes that reﬂect certain market factors. These indicators are used subsequently to generate
trading signals or to try forecasting upcoming data.
In this paper, the main indicators are called Canonical Momenta Indicators (CMI), as they faithfully
carry the mathematical significance of market momentum, where the “mass” is inversely proportional to
the price volatility (the “masses” are just the elements of the metric tensor in this Lagrangian formalism)
and the “velocity” is the rate of price changes.

2. MODELS

2.1. Langevin Equations for Random Walks
The use of Brownian motion as a model for financial systems is generally attributed to
Bachelier [22], though he incorrectly intuited that the noise scaled linearly instead of as the square root
relative to the random log-price variable. Einstein is generally credited with using the correct
mathematical description in a larger physical context of statistical systems. However, several studies
imply that changing prices of many markets do not follow a random walk, that they may have long-term
dependences in price correlations, and that they may not be efficient in quickly arbitraging new
information [23-25]. A random walk for returns (the rate of change of prices divided by prices) is described by a
Langevin equation with simple additive noise $\eta$, typically representing the continual random influx of
information into the market.

$$
\begin{aligned}
\dot{M} &= -f + g\eta , \\
\dot{M} &= dM/dt , \\
\langle \eta(t) \rangle_\eta &= 0 , \quad \langle \eta(t), \eta(t') \rangle_\eta = \delta(t - t') ,
\end{aligned}
\tag{1}
$$

where $f$ and $g$ are constants, and $M$ is the logarithm of (scaled) price, $M(t) = \log(P(t)/P(t - dt))$. Price,
although the most dramatic observable, may not be the only appropriate dependent variable or order
parameter for the system of markets [26]. This possibility has also been called the “semi-strong form of
the efficient market hypothesis” [23].
The generalization of this approach to include multivariate nonlinear nonequilibrium markets led to
a model of statistical mechanics of financial markets (SMFM) [11].

2.2. Adaptive Optimization of $F^x$ Models
Our S&P model for the futures $F$ is

$$
\begin{aligned}
dF &= \mu\, dt + \sigma F^x dz , \\
\langle dz \rangle &= 0 , \\
\langle dz(t)\, dz(t') \rangle &= dt\, \delta(t - t') .
\end{aligned}
$$

We have used this model in several ways to fit the distribution's volatility, defined in terms of a scale
and an exponent of the independent variable [4].
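To make the roles of the scale $\sigma$ and the exponent $x$ concrete, the following is a minimal sketch of how a path of this model could be simulated with a simple Euler (prepoint) step. It is an illustration only: the function name, the parameter values, and the positivity floor are assumptions and are not taken from the trading system described here.

```python
import math
import random


def simulate_fx_path(f0, mu, sigma, x, dt, n_steps, seed=0):
    """Euler (prepoint) discretization of dF = mu*dt + sigma*F**x*dz."""
    rng = random.Random(seed)
    path = [f0]
    for _ in range(n_steps):
        f = path[-1]
        # dz has zero mean and variance dt.
        dz = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        f_next = f + mu * dt + sigma * (f ** x) * dz
        # Illustrative floor (an assumption) so that F**x stays defined.
        path.append(max(f_next, 1e-12))
    return path


# Hypothetical parameter values, chosen only to exercise the function.
path = simulate_fx_path(f0=1400.0, mu=0.0, sigma=0.5, x=0.7, dt=1.0, n_steps=1000)
print(len(path), min(path), max(path))
```

With $x = 0$ the noise is additive, while $x = 1$ gives noise proportional to price; intermediate exponents interpolate between the two.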
A major component of our trading system is the use of adaptive optimization, essentially constantly
retuning the parameters of our dynamic model each time new data is encountered in our training, testing
and real-time applications. The parameters $\{\mu, \sigma\}$ are constantly tuned using a quasi-local simplex
code [27,28] included with the ASA (Adaptive Simulated Annealing) code [12].
We have tested several quasi-local codes for this kind of trading problem, versus using robust ASA
adaptive optimizations, and the faster quasi-local codes seem to work quite well for adaptive updates after
a zeroth-order parameter set is found by ASA [29,30].

3. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)

3.1. Statistical Mechanics of Large Systems
Aggregation problems in nonlinear nonequilibrium systems typically are “solved” (accommodated)
by having new entities/languages developed at these disparate scales in order to efficiently pass
information back and forth between scales. This is quite different from the nature of quasi-equilibrium
quasi-linear systems, where thermodynamic or cybernetic approaches are possible. These thermodynamic
approaches typically fail for nonequilibrium nonlinear systems.
Many systems are aptly modeled in terms of multivariate differential rate-equations, known as
Langevin equations [31],

$$
\begin{aligned}
\dot{M}^G &= f^G + \hat{g}^G_j \eta^j , \quad (G = 1, \ldots, \Lambda) , \quad (j = 1, \ldots, N) , \\
\dot{M}^G &= dM^G/dt , \\
\langle \eta^j(t) \rangle_\eta &= 0 , \quad \langle \eta^j(t), \eta^{j'}(t') \rangle_\eta = \delta^{jj'} \delta(t - t') ,
\end{aligned}
\tag{2}
$$

where $f^G$ and $\hat{g}^G_j$ are generally nonlinear functions of mesoscopic order parameters $M^G$, $j$ is a
microscopic index indicating the source of fluctuations, and $N \ge \Lambda$. The Einstein convention of summing
over repeated indices is used. Vertical bars on an index, e.g., $|j|$, imply no sum is to be taken on repeated
indices.
Via a somewhat lengthy, albeit instructive calculation, outlined in several other papers [11,32,33],
involving an intermediate derivation of a corresponding Fokker-Planck or Schrödinger-type equation for
the conditional probability distribution $P[M(t)|M(t_0)]$, the Langevin rate Eq. (2) is developed into the
more useful probability distribution for $M^G$ at long-time macroscopic time event $t = (u + 1)\theta + t_0$, in
terms of a Stratonovich path-integral over mesoscopic Gaussian conditional probabilities [34-38]. Here,
macroscopic variables are defined as the long-time limit of the evolving mesoscopic system.
The corresponding Schrödinger-type equation is [36,37]

$$
\begin{aligned}
\partial P/\partial t &= \tfrac{1}{2} (g^{GG'} P)_{,GG'} - (g^G P)_{,G} + V , \\
g^{GG'} &= k_T \delta^{jk} \hat{g}^G_j \hat{g}^{G'}_k , \\
g^G &= f^G + \tfrac{1}{2} \delta^{jk} \hat{g}^{G'}_j \hat{g}^G_{k,G'} , \\
[\ldots]_{,G} &= \partial[\ldots]/\partial M^G .
\end{aligned}
\tag{3}
$$

This is properly referred to as a Fokker-Planck equation when $V \equiv 0$. Note that although the partial
differential Eq. (3) contains information regarding $M^G$ as in the stochastic differential Eq. (2), all
references to $j$ have been properly averaged over. I.e., $\hat{g}^G_j$ in Eq. (2) is an entity with parameters in both
microscopic and mesoscopic spaces, but $M$ is a purely mesoscopic variable, and this is more clearly
reflected in Eq. (3).
The path integral representation is given in terms of the “Feynman” Lagrangian $L$.

$$
\begin{aligned}
P[M_t|M_{t_0}]\, dM(t) &= \int \ldots \int DM \exp(-S)\, \delta[M(t_0) = M_0]\, \delta[M(t) = M_t] , \\
S &= k_T^{-1} \min \int_{t_0}^{t} dt'\, L , \\
DM &= \lim_{u \to \infty} \prod_{v=1}^{u+1} g^{1/2} \prod_G (2\pi\theta)^{-1/2} dM^G_v , \\
L(\dot{M}^G, M^G, t) &= \tfrac{1}{2} (\dot{M}^G - h^G) g_{GG'} (\dot{M}^{G'} - h^{G'}) + \tfrac{1}{2} h^G_{;G} + R/6 - V , \\
h^G &= g^G - \tfrac{1}{2} g^{-1/2} (g^{1/2} g^{GG'})_{,G'} , \\
g_{GG'} &= (g^{GG'})^{-1} , \\
g &= \det(g_{GG'}) , \\
h^G_{;G} &= h^G_{,G} + \Gamma^F_{GF} h^G = g^{-1/2} (g^{1/2} h^G)_{,G} , \\
\Gamma^F_{JK} &\equiv g^{LF} [JK, L] = g^{LF} (g_{JL,K} + g_{KL,J} - g_{JK,L}) , \\
R &= g^{JL} R_{JL} = g^{JL} g^{JK} R_{FJKL} , \\
R_{FJKL} &= \tfrac{1}{2} (g_{FK,JL} - g_{JK,FL} - g_{FL,JK} + g_{JL,FK}) + g_{MN} (\Gamma^M_{FK} \Gamma^N_{JL} - \Gamma^M_{FL} \Gamma^N_{JK}) .
\end{aligned}
\tag{4}
$$

Mesoscopic variables have been defined as $M^G$ in the Langevin and Fokker-Planck representations, in
terms of their development from the microscopic system labeled by $j$. The Riemannian curvature term $R$
arises from nonlinear $g_{GG'}$, which is a bona fide metric of this space [36]. Even if a stationary solution,
i.e., $\dot{M}^G = 0$, is ultimately sought, a necessarily prior stochastic treatment of $\dot{M}^G$ terms gives rise to these
Riemannian “corrections.” Even for a constant metric, the term $h^G_{;G}$ contributes to $L$ for a nonlinear
mean $h^G$. $V$ may include terms such as $\sum_{T'} J_{T'G} M^G$, where the Lagrange multipliers $J_{T'G}$ are constraints
on $M^G$, which are advantageously modeled as extrinsic sources in this representation; they too may be
time-dependent.
For our purposes, the above Feynman Lagrangian defines a kernel of the short-time conditional
probability distribution, in the curved space defined by the metric, in the limit of continuous time, whose
iteration yields the solution of the previous partial differential equation Eq. (3). This differs from the
Lagrangian which satisfies the requirement that the action is stationary to the first order in $dt$ (the
WKBJ approximation), but which does not include the first-order correction to the WKBJ approximation
as does the Feynman Lagrangian. This latter Lagrangian differs from the Feynman Lagrangian,
essentially by replacing $R/6$ above by $R/12$ [39]. In this sense, the WKBJ Lagrangian is more useful for
some theoretical discussions [40]. However, the use of the Feynman Lagrangian coincides with the
numerical method we use for long-time development of our distributions using our PATHINT code for
other financial products, e.g., options [4]. This also is consistent with our use of relatively short-time
“forecast” of data points using the most probable path [41]

$$
dM^G/dt = g^G - g^{1/2} (g^{-1/2} g^{GG'})_{,G'} .
\tag{5}
$$

Using the variational principle, $J_{TG}$ may also be used to constrain $M^G$ to regions where they are
empirically bound. More complicated constraints may be affixed to $L$ using methods of optimal control
theory [42]. With respect to a steady state $\bar{P}$, when it exists, the information gain in state $\tilde{P}$ is defined by

$$
\Upsilon[\tilde{P}] = \int \ldots \int DM'\, \tilde{P} \ln(\tilde{P}/\bar{P}) , \quad DM' = DM/dM_{u+1} .
\tag{6}
$$
In the economics literature, there appears to be sentiment to define Eq. (2) by the Itô, rather than the
Stratonovich prescription. It is true that Itô integrals have Martingale properties not possessed by
Stratonovich integrals [43], which leads to risk-neutral theorems for markets [44,45], but the nature of the
proper mathematics (actually a simple transformation between these two discretizations) should
eventually be determined by proper aggregation of relatively microscopic models of markets. It should be
noted that virtually all investigations of other physical systems, which are also continuous time models of
discrete processes, conclude that the Stratonovich interpretation coincides with reality, when
multiplicative noise with zero correlation time, modeled in terms of white noise $\eta^j$, is properly considered
as the limit of real noise with finite correlation time [46]. The path integral succinctly demonstrates the
difference between the two: The Itô prescription corresponds to the prepoint discretization of $L$, wherein
$\theta \dot{M}(t) \to M_{v+1} - M_v$ and $M(t) \to M_v$. The Stratonovich prescription corresponds to the midpoint
discretization of $L$, wherein $\theta \dot{M}(t) \to M_{v+1} - M_v$ and $M(t) \to \tfrac{1}{2}(M_{v+1} + M_v)$. In terms of the functions
appearing in the Fokker-Planck Eq. (3), the Itô prescription of the prepoint discretized Lagrangian, $L_I$, is
relatively simple, albeit deceptively so because of its nonstandard calculus.

$$
L_I(\dot{M}^G, M^G, t) = \tfrac{1}{2} (\dot{M}^G - g^G) g_{GG'} (\dot{M}^{G'} - g^{G'}) - V .
\tag{7}
$$

In the absence of a nonphenomenological microscopic theory, the difference between an Itô prescription
and a Stratonovich prescription is simply a transformed drift [39].
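As a worked illustration of this transformed drift (an added example under stated assumptions, not a calculation from the source), take a single variable $F$ with a single noise source and multiplicative noise $\hat{g}(F) = \sigma F^x$. The drift relation in Eq. (3), $g^G = f^G + \tfrac{1}{2} \delta^{jk} \hat{g}^{G'}_j \hat{g}^G_{k,G'}$, then reduces to

$$
g = f + \tfrac{1}{2}\, \hat{g}\, \frac{\partial \hat{g}}{\partial F}
  = f + \tfrac{1}{2}\, (\sigma F^x)(x \sigma F^{x-1})
  = f + \tfrac{1}{2}\, x\, \sigma^2 F^{2x-1} .
$$

For additive noise ($x = 0$) the two drifts coincide, and for $x = 1$ they differ by $\tfrac{1}{2} \sigma^2 F$. Here $f$ denotes the drift of the Langevin Eq. (2) and $g$ the drift entering Eqs. (3) and (7).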
There are several other advantages to Eq. (4) over Eq. (2). Extrema and most probable states of
$M^G$, $\ll M^G \gg$, are simply derived by a variational principle, similar to conditions sought in previous
studies [47]. In the Stratonovich prescription, necessary, albeit not sufficient, conditions are given by

$$
\begin{aligned}
\delta_G L &= L_{,G} - L_{,\dot{G}:t} = 0 , \\
L_{,\dot{G}:t} &= L_{,\dot{G}G'} \dot{M}^{G'} + L_{,\dot{G}\dot{G}'} \ddot{M}^{G'} .
\end{aligned}
\tag{8}
$$

For stationary states, $\dot{M}^G = 0$, and $\partial \bar{L}/\partial \bar{M}^G = 0$ defines $\ll \bar{M}^G \gg$, where the bars identify stationary variables; in this case, the macroscopic variables are equal to their mesoscopic counterparts. Note that $\bar{L}$
is not the stationary solution of the system, e.g., to Eq. (3) with $\partial P/\partial t = 0$. However, in some cases [48],
$\bar{L}$ is a definite aid to finding such stationary states. Many times only properties of stationary states are
examined, but here a temporal dependence is included. E.g., the $\dot{M}^G$ terms in $L$ permit steady states and
their fluctuations to be investigated in a nonequilibrium context. Note that Eq. (8) must be derived from
the path integral, Eq. (4), which is at least one reason to justify its development.

3.2. Algebraic Complexity Yields Simple Intuitive Results
It must be emphasized that the output of this formalism is not confined to complex algebraic forms
or tables of numbers. Because $L$ possesses a variational principle, sets of contour graphs, at different
long-time epochs of the path-integral of $P$ over its variables at all intermediate times, give a visually
intuitive and accurate decision-aid to view the dynamic evolution of the scenario. For example, this
Lagrangian approach permits a quantitative assessment of concepts usually only loosely defined.

$$
\begin{aligned}
\text{“Momentum”} &= \Pi^G = \frac{\partial L}{\partial(\partial M^G/\partial t)} , \\
\text{“Mass”} &= g_{GG'} = \frac{\partial^2 L}{\partial(\partial M^G/\partial t)\, \partial(\partial M^{G'}/\partial t)} , \\
\text{“Force”} &= \frac{\partial L}{\partial M^G} , \\
\text{“}F = ma\text{”:} \quad \delta L &= 0 = \frac{\partial L}{\partial M^G} - \frac{\partial}{\partial t} \frac{\partial L}{\partial(\partial M^G/\partial t)} ,
\end{aligned}
\tag{9}
$$

where $M^G$ are the variables and $L$ is the Lagrangian. These physical entities provide another form of
intuitive, but quantitatively precise, presentation of these analyses. For example, daily newspapers use
some of this terminology to discuss the movement of security prices. In this paper, the $\Pi^G$ serve as
canonical momenta indicators (CMI) for these systems.

3.2.1. Derived Canonical Momenta Indicators (CMI)
The extreme sensitivity of the CMI gives rapid feedback on changes in trends as well as the
volatility of markets, and therefore the CMI are good indicators to use for trading rules [29]. A time-locked
moving average provides manageable indicators for trading signals. This current project uses such CMI
developed as a byproduct of the ASA fits described below.

3.3. Correlations
In this paper we report results of our one-variable trading model. However, it is straightforward to
include multivariable trading models in our approach, and we have done this, for example, with coupled
cash and futures S&P markets.
Correlations between variables are modeled explicitly in the Lagrangian as a parameter usually
designated $\rho$. This section uses a simple two-factor model to develop the correspondence between the
correlation $\rho$ in the Lagrangian and that among the commonly written Wiener distribution $dz$.
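Consider coupled stochastic differential equations for futures $F$ and cash $C$:

$$
\begin{aligned}
dF &= f^F(F, C)\, dt + \hat{g}^F(F, C)\, \sigma_F\, dz_F , \\
dC &= f^C(F, C)\, dt + \hat{g}^C(F, C)\, \sigma_C\, dz_C , \\
\langle dz_i \rangle &= 0 , \quad i = \{F, C\} , \\
\langle dz_i(t)\, dz_j(t') \rangle &= dt\, \delta(t - t') , \quad i = j , \\
\langle dz_i(t)\, dz_j(t') \rangle &= \rho\, dt\, \delta(t - t') , \quad i \ne j , \\
\delta(t - t') &= \begin{cases} 0 , & t \ne t' , \\ 1 , & t = t' , \end{cases}
\end{aligned}
\tag{10}
$$

where $\langle \cdot \rangle$ denotes expectations with respect to the multivariate distribution.
These can be rewritten as Langevin equations (in the Itô prepoint discretization)

$$
\begin{aligned}
dF/dt &= f^F + \hat{g}^F \sigma_F (\gamma^+ \eta_1 + \operatorname{sgn}\rho\, \gamma^- \eta_2) , \\
dC/dt &= f^C + \hat{g}^C \sigma_C (\operatorname{sgn}\rho\, \gamma^- \eta_1 + \gamma^+ \eta_2) , \\
\gamma^\pm &= \frac{1}{\sqrt{2}} \left[ 1 \pm (1 - \rho^2)^{1/2} \right]^{1/2} , \\
n_i &= (dt)^{1/2} p_i ,
\end{aligned}
$$

where $p_1$ and $p_2$ are independent Gaussian distributions with zero mean and unit variance.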
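The decomposition into $\gamma^\pm$ can be checked numerically. Since $(\gamma^+)^2 + (\gamma^-)^2 = 1$ and $2\gamma^+\gamma^- = |\rho|$, the two mixed noises each have unit variance and correlation $\rho$. The sketch below is an illustration only; the function names, sample size, and seed are assumptions introduced here.

```python
import math
import random


def gammas(rho):
    """gamma^+ and gamma^- as defined in the Langevin equations above."""
    s = math.sqrt(1.0 - rho * rho)
    return math.sqrt((1.0 + s) / 2.0), math.sqrt((1.0 - s) / 2.0)


def correlated_pair(rho, rng):
    """Mix two independent unit Gaussians into a pair with correlation rho."""
    g_plus, g_minus = gammas(rho)
    sgn = math.copysign(1.0, rho)
    p1, p2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    noise_f = g_plus * p1 + sgn * g_minus * p2
    noise_c = sgn * g_minus * p1 + g_plus * p2
    return noise_f, noise_c


def sample_correlation(rho, n=20000, seed=1):
    """Monte Carlo estimate of <noise_f * noise_c>; both have unit variance."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        a, b = correlated_pair(rho, rng)
        acc += a * b
    return acc / n


print(gammas(-0.6), sample_correlation(-0.6))
```

The sample estimate should lie close to the chosen $\rho$, up to Monte Carlo error of order $n^{-1/2}$.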
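The equivalent short-time probability distribution, $P$, for the above set of equations is

$$
\begin{aligned}
P &= g^{1/2} (2\pi\, dt)^{-1/2} \exp(-L\, dt) , \\
L &= \tfrac{1}{2} M^\dagger \mathbf{g} M ,
\end{aligned}
\tag{11}
$$

$$
M = \begin{pmatrix} dF/dt - f^F \\ dC/dt - f^C \end{pmatrix} , \quad g = \det(\mathbf{g}) .
\tag{12}
$$

$\mathbf{g}$, the metric in $\{F, C\}$-space, is the inverse of the covariance matrix,

$$
\mathbf{g}^{-1} = \begin{pmatrix}
(\hat{g}^F \sigma_F)^2 & \rho\, \hat{g}^F \hat{g}^C \sigma_F \sigma_C \\
\rho\, \hat{g}^F \hat{g}^C \sigma_F \sigma_C & (\hat{g}^C \sigma_C)^2
\end{pmatrix} .
\tag{13}
$$

The CMI indicators are given by the formulas

$$
\begin{aligned}
\Pi^F &= \frac{(dF/dt - f^F)}{(\hat{g}^F \sigma_F)^2 (1 - \rho^2)} - \frac{\rho\, (dC/dt - f^C)}{\hat{g}^F \hat{g}^C \sigma_F \sigma_C (1 - \rho^2)} , \\
\Pi^C &= \frac{(dC/dt - f^C)}{(\hat{g}^C \sigma_C)^2 (1 - \rho^2)} - \frac{\rho\, (dF/dt - f^F)}{\hat{g}^C \hat{g}^F \sigma_C \sigma_F (1 - \rho^2)} .
\end{aligned}
\tag{14}
$$

3.4. ASA Outline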
The algorithm Adaptive Simulated Annealing (ASA) fits short-time probability distributions to
observed data, using a maximum likelihood technique on the Lagrangian. This algorithm has been
developed to fit observed data to a theoretical cost function over a $D$-dimensional parameter space [13],
adapting for varying sensitivities of parameters during the fit. The ASA code can be obtained at no
charge, via WWW from http://www.ingber.com/ or via FTP from ftp.ingber.com [12].

3.4.1. General Description
Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [49], as an
extension of a Monte Carlo importance-sampling technique developed in 1953 for chemical physics
problems. It helps to visualize the problems presented by such complex systems as a geographical terrain.
For example, consider a mountain range, with two “parameters,” e.g., along the North-South and
East-West directions. We wish to find the lowest valley in this terrain. SA approaches this problem
similar to using a bouncing ball that can bounce over mountains from valley to valley. We start at a high
“temperature,” where the temperature is an SA parameter that mimics the effect of a fast-moving particle
in a hot object like a hot molten metal, thereby permitting the ball to make very high bounces and being
able to bounce over any mountain to access any valley, given enough bounces. As the temperature is
made relatively colder, the ball cannot bounce so high, and it also can settle to become trapped in
relatively smaller ranges of valleys.
We imagine that our mountain range is aptly described by a “cost function.” We deﬁne probability
distributions of the two directional parameters, called generating distributions since they generate possible
valleys or states we are to explore. We deﬁne another distribution, called the acceptance distribution,
which depends on the difference of cost functions of the present generated valley we are to explore and
the last saved lowest valley. The acceptance distribution decides probabilistically whether to stay in a new
lower valley or to bounce out of it. All the generating and acceptance distributions depend on
temperatures.
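The bouncing-ball picture can be sketched in a few lines. This is a generic simulated-annealing illustration, not the ASA algorithm: the terrain function, the $T_0/k$ cooling schedule, the Gaussian generating distribution, and the Metropolis-style acceptance test against the current point are all assumptions chosen for brevity.

```python
import math
import random


def terrain(x, y):
    """Hypothetical two-parameter cost function with many local valleys."""
    return 0.1 * (x * x + y * y) - math.cos(3.0 * x) - math.cos(3.0 * y)


def anneal(cost, start, t0=5.0, n_steps=5000, seed=0):
    rng = random.Random(seed)
    current, c_current = start, cost(*start)
    best, c_best = current, c_current
    for k in range(1, n_steps + 1):
        temp = t0 / k  # illustrative cooling schedule, not the ASA schedule
        width = math.sqrt(temp)  # generating distribution narrows as it cools
        cand = (current[0] + rng.gauss(0.0, width),
                current[1] + rng.gauss(0.0, width))
        c_cand = cost(*cand)
        delta = c_cand - c_current
        # Acceptance: always take a lower valley; otherwise bounce uphill
        # with a probability that shrinks as the temperature drops.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, c_current = cand, c_cand
            if c_current < c_best:
                best, c_best = current, c_current
    return best, c_best


best, c_best = anneal(terrain, start=(4.0, -4.0))
print(best, c_best)
```

Both the generating width and the acceptance probability depend on the temperature, which is the feature the description above emphasizes.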
In 1984 [50], it was established that SA possessed a proof that, by carefully controlling the rates of
cooling of temperatures, it could statistically ﬁnd the best minimum, e.g., the lowest valley of our
example above. This was good news for people trying to solve hard problems which could not be solved
by other algorithms. The bad news was that the guarantee was only good if they were willing to run SA
forever. In 1987, a method of fast annealing (FA) was developed [51], which permitted lowering the
temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in
some ﬁnite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated
Reannealing (VFSR) was developed in 1987 [13], now called Adaptive Simulated Annealing (ASA),
which is exponentially faster than FA.
ASA has been applied to many problems by many people in many disciplines [15,16,52]. The
feedback of many users regularly scrutinizing the source code ensures its soundness as it becomes more
flexible and powerful.

3.4.2. Mathematical Outline
ASA considers a parameter $\alpha^i_k$ in dimension $i$ generated at annealing-time $k$ with the range
$\alpha^i_k \in [A_i, B_i]$, calculated with the random variable $y^i$,

$$
\alpha^i_{k+1} = \alpha^i_k + y^i (B_i - A_i) ,
\tag{15}
$$

$$
y^i \in [-1, 1] .
\tag{16}
$$

The generating function $g_T(y)$ is defined,

$$
g_T(y) = \prod_{i=1}^{D} \frac{1}{2(|y^i| + T_i) \ln(1 + 1/T_i)} \equiv \prod_{i=1}^{D} g^i_T(y^i) ,
\tag{17}
$$

where the subscript $i$ on $T_i$ specifies the parameter index, and the $k$-dependence in $T_i(k)$ for the annealing
schedule has been dropped for brevity. Its cumulative probability distribution is

$$
\begin{aligned}
G_T(y) &= \int_{-1}^{y^1} \ldots \int_{-1}^{y^D} dy'^1 \ldots dy'^D\, g_T(y') \equiv \prod_{i=1}^{D} G^i_T(y^i) , \\
G^i_T(y^i) &= \frac{1}{2} + \frac{\operatorname{sgn}(y^i)}{2} \frac{\ln(1 + |y^i|/T_i)}{\ln(1 + 1/T_i)} .
\end{aligned}
\tag{18}
$$

$y^i$ is generated from a $u^i$ from the uniform distribution

$$
\begin{aligned}
u^i &\in U[0, 1] , \\
y^i &= \operatorname{sgn}(u^i - \tfrac{1}{2})\, T_i \left[ (1 + 1/T_i)^{|2u^i - 1|} - 1 \right] .
\end{aligned}
\tag{19}
$$

It is straightforward to calculate that for an annealing schedule for $T_i$

$$
T_i(k) = T_{0i} \exp(-c_i k^{1/D}) ,
\tag{20}
$$

a global minimum statistically can be obtained. I.e.,

$$
\sum_{k_0}^{\infty} g_k \approx \sum_{k_0}^{\infty} \left[ \prod_{i=1}^{D} \frac{1}{2 |y^i| c_i} \right] \frac{1}{k} = \infty .
\tag{21}
$$

Control can be taken over $c_i$, such that

$$
\begin{aligned}
T_{fi} &= T_{0i} \exp(-m_i) \quad \text{when} \quad k_f = \exp n_i , \\
c_i &= m_i \exp(-n_i/D) ,
\end{aligned}
\tag{22}
$$

where $m_i$ and $n_i$ can be considered “free” parameters to help tune ASA for specific problems.
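Equations (15) to (22) translate almost directly into code. The following is a minimal sketch written for this text, not an excerpt from the ASA distribution: all function names are hypothetical, and the choice to redraw $y^i$ until the new parameter falls inside $[A_i, B_i]$ is an assumption about how out-of-range candidates are handled.

```python
import math
import random


def asa_c(m, n, dim):
    """Eq. (22): c_i = m_i * exp(-n_i / D)."""
    return m * math.exp(-n / dim)


def asa_temperature(k, t0, c, dim):
    """Eq. (20): T_i(k) = T_0i * exp(-c_i * k**(1/D))."""
    return t0 * math.exp(-c * k ** (1.0 / dim))


def asa_sample_y(u, temp):
    """Eq. (19): map a uniform u in [0, 1] to y in [-1, 1]."""
    sign = 1.0 if u >= 0.5 else -1.0
    return sign * temp * ((1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0)


def asa_cdf(y, temp):
    """Eq. (18): one-dimensional cumulative distribution G_T^i(y^i)."""
    sign = 1.0 if y >= 0.0 else -1.0
    return 0.5 + 0.5 * sign * math.log(1.0 + abs(y) / temp) / math.log(1.0 + 1.0 / temp)


def asa_new_parameter(alpha, lo, hi, temp, rng):
    """Eqs. (15)-(16); redraw until the candidate lies in [lo, hi] (assumption)."""
    while True:
        cand = alpha + asa_sample_y(rng.random(), temp) * (hi - lo)
        if lo <= cand <= hi:
            return cand


rng = random.Random(0)
print(asa_new_parameter(0.5, 0.0, 1.0, temp=0.1, rng=rng))
```

Applying `asa_cdf` to the output of `asa_sample_y` returns the original uniform number, which confirms that Eq. (19) is the inverse of Eq. (18). Similarly, evaluating `asa_temperature` at $k_f = \exp n_i$ with $c_i$ from `asa_c` gives $T_{0i} \exp(-m_i)$, as Eq. (22) requires.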
ASA has over 100 OPTIONS available for tuning. A few important ones were used in this project.

3.4.3. Reannealing
Whenever doing a multidimensional search in the course of a complex nonlinear physical problem,
inevitably one must deal with different changing sensitivities of the $\alpha^i$ in the search. At any given
annealing-time, the range over which the relatively insensitive parameters are being searched can be
“stretched out” relative to the ranges of the more sensitive parameters. This can be accomplished by
periodically rescaling the annealing-time $k$, essentially reannealing, every hundred or so acceptance-events (or at some user-defined modulus of the number of accepted or generated states), in terms of the
sensitivities $s_i$ calculated at the most current minimum value of the cost function, $C$,

$$
s_i = \partial C/\partial \alpha^i .
\tag{23}
$$

In terms of the largest $s_i = s_{\max}$, a default rescaling is performed for each $k_i$ of each parameter dimension,
whereby a new index $k'_i$ is calculated from each $k_i$,

$$
\begin{aligned}
k_i &\to k'_i , \\
T'_{ik'} &= T_{ik} (s_{\max}/s_i) , \\
k'_i &= \left( \ln(T_{i0}/T_{ik'})/c_i \right)^D .
\end{aligned}
\tag{24}
$$

$T_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.

3.4.4. Quenching
Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This
is applied by noting that the temperature schedule above can be redefined as

$$
T_i(k_i) = T_{0i} \exp(-c_i k_i^{Q_i/D}) , \quad c_i = m_i \exp(-n_i Q_i/D) ,
\tag{25}
$$

in terms of the “quenching factor” $Q_i$. The sampling proof fails if $Q_i > 1$ as

$$
\sum_k \prod^D 1/k^{Q_i/D} = \sum_k 1/k^{Q_i} < \infty .
\tag{26}
$$

This simple calculation shows how the “curse of dimensionality” arises, and also gives a possible
way of living with this disease. In ASA, the influence of large dimensions becomes clearly focused on
the exponential of the power of $k$ being $1/D$, as the annealing required to properly sample the space
becomes prohibitively slow. So, if resources cannot be committed to properly sample the space, then for
some systems perhaps the next best procedure may be to turn on quenching, whereby $Q_i$ can become on
the order of the size of number of dimensions.
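The convergence argument of Eq. (26) can be seen numerically. The sketch below, with hypothetical function names and all $Q_i$ set to a common value, evaluates the quenched schedule of Eq. (25) and the partial sums of Eq. (26). For $Q = 1$ the partial sums keep growing roughly like $\ln N$, while for $Q = 2$ they stay bounded by $\pi^2/6$.

```python
import math


def quenched_temperature(k, t0, m, n, dim, q=1.0):
    """Eq. (25): T(k) = T0 * exp(-c * k**(Q/D)), with c = m * exp(-n*Q/D)."""
    c = m * math.exp(-n * q / dim)
    return t0 * math.exp(-c * k ** (q / dim))


def sampling_sum(q, dim, n_terms):
    """Partial sum of Eq. (26) with every Q_i equal to q."""
    total = 0.0
    for k in range(1, n_terms + 1):
        # Product over D dimensions of k**(-q/D) equals k**(-q).
        total += (k ** (-q / dim)) ** dim
    return total


for q in (1.0, 2.0):
    print(q, [round(sampling_sum(q, dim=4, n_terms=n), 3) for n in (10, 1000, 100000)])
```

A bounded sum means that states are no longer guaranteed to be visited infinitely often in annealing-time, which is why the sampling proof is lost once quenching is turned on.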
The scale of the power of $1/D$ temperature schedule used for the acceptance function can be altered
in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used
without damaging the sampling property.

3.4.5. Avoiding Repeating Cost Functions
Doing a recursive optimization is very CPU expensive, as essentially the cross-product of parameter
spaces among the various levels of optimization is required.
Therefore, we have used an ASA OPTION for some of the parameters in the outer-shell trading
model optimization of training sets, ASA_QUEUE, which sets up a first-in first-out (FIFO) queue, of
user-defined size Queue_Size, to collect generated states. When a new state is generated, its parameters
are tested, within specified resolutions of a user-defined array Queue_Resolution. When parameter sets
are repeated within this queue, the saved value of the cost function is returned without having to repeat
the calculation.

3.4.6. Multiple Local Minima
Our criterion for the global minimum of our cost function is minus the largest profit over a selected
training data set (or in some cases, this value divided by the maximum drawdown). However, in many
cases this may not give us the best set of parameters to find profitable trading in test sets or in real-time
trading. Other considerations such as the total number of trades developed by the global minimum versus
other close local minima may be relevant. For example, if the global minimum has just a few trades,
while some nearby local minima (in terms of the value of the cost function) have many trades and were
profitable in spite of our slippage factors, then the scenario with more trades might be more statistically
dependable to deliver profits across testing and real-time data sets.
Therefore, for the outer-shell global optimization of training sets, we have used an ASA OPTION,
MULTI_MIN, which saves a user-defined number of closest local minima within a user-defined resolution
of the parameters. We then examine these results under several testing sets.

4. TRADING SYSTEM

4.1. Use of CMI
As the CMI formalism carries the relevant information regarding the price dynamics, we have used
it as a signal generator for an automated trading system for S&P futures.
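For reference, the two-factor CMI of Eq. (14) in Section 3.3 can be evaluated either from the closed-form expressions or by multiplying the metric (the inverse of the covariance matrix of Eq. (13)) into the vector $M$ of Eq. (12). The sketch below does both so that they can be compared. The function names and the numerical inputs are hypothetical, and `gs_f`, `gs_c` stand for the products $\hat{g}^F \sigma_F$ and $\hat{g}^C \sigma_C$.

```python
def cmi_closed_form(df_dt, dc_dt, f_f, f_c, gs_f, gs_c, rho):
    """Eq. (14), with gs_f = g^F * sigma_F and gs_c = g^C * sigma_C."""
    m_f, m_c = df_dt - f_f, dc_dt - f_c
    one_minus = 1.0 - rho * rho
    pi_f = m_f / (gs_f ** 2 * one_minus) - rho * m_c / (gs_f * gs_c * one_minus)
    pi_c = m_c / (gs_c ** 2 * one_minus) - rho * m_f / (gs_c * gs_f * one_minus)
    return pi_f, pi_c


def cmi_via_metric(df_dt, dc_dt, f_f, f_c, gs_f, gs_c, rho):
    """Same CMI, computed as (inverse covariance matrix) times M."""
    m_f, m_c = df_dt - f_f, dc_dt - f_c
    # Covariance matrix of Eq. (13).
    a, b, d = gs_f ** 2, rho * gs_f * gs_c, gs_c ** 2
    det = a * d - b * b
    # Explicit 2x2 inverse gives the metric.
    g_ff, g_fc, g_cc = d / det, -b / det, a / det
    return g_ff * m_f + g_fc * m_c, g_fc * m_f + g_cc * m_c


# Hypothetical inputs, chosen only to compare the two routes.
args = dict(df_dt=2.5, dc_dt=1.5, f_f=0.5, f_c=0.5, gs_f=2.0, gs_c=1.0, rho=0.5)
print(cmi_closed_form(**args), cmi_via_metric(**args))
```

Agreement between the two routes is a quick check that a hand-coded version of Eq. (14) has its signs and cross terms in the right places.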
Based on a previous work [30] applied to daily closing data, the overall structure of the trading
system consists of 2 layers, as follows: We first construct the “short-time” Lagrangian function in the Itô
representation (with the notation introduced in Section 3.3)

$$
L^F(i|i-1) = \frac{1}{2\sigma^2 F_{i-1}^{2x}} \left( \frac{dF_i}{dt} - f^F \right)^2
\tag{27}
$$

with $i$ the post-point index, corresponding to the one-factor price model

$$
dF = f^F dt + \sigma F^x dz(t) ,
\tag{28}
$$

where $f^F$ and $\sigma > 0$ are taken to be constants, $F(t)$ is the S&P future price, and $dz$ is the standard
Gaussian noise with zero mean and unit standard deviation. We perform a global, maximum likelihood fit
to the whole set of price data using ASA. This procedure produces the optimization parameters $\{x, f^F\}$
that are used to generate the CMI. One computational approach was to fix the diffusion multiplier $\sigma$ to 1
during training for convenience, but to use it as a free parameter in the adaptive testing and real-time fits.
Another approach was to fix the scale of the volatility, using an improved model,

$$
dF = f^F dt + \sigma \left( \frac{F}{\langle F \rangle} \right)^x dz(t) ,
\tag{29}
$$

where $\sigma$ now is calculated as the standard deviation of the price increments $\Delta F/dt^{1/2}$, and $\langle F \rangle$ is just
the average of the prices.
As already remarked, to enhance the CMI sensitivity and response time to local variations (across a
certain window size) in the distribution of price increments, the momenta are generated by applying an
adaptive procedure: after each new data reading, another set of $\{f^F, \sigma\}$ parameters is calculated for
the last window of data, with the exponent x (a contextual indicator of the noise statistics) fixed to the
value obtained from the global fit.
The CMI computed in this manner are fed into the outer shell of the trading system, where an AI-type
optimization of the trading rules is executed, using ASA once again.
The trading rules are a collection of logical conditions among the CMI, prices and optimization
parameters that could be window sizes, time resolutions, or trigger thresholds. Based on the relationships
between CMI and optimization parameters, a trading decision is made. The cost function in the outer
shell is either the overall equity or the risk-adjusted profit (essentially the return). The inner- and
outer-shell optimizations are coupled through some of the optimization parameters (e.g., time resolution of
the data, window sizes), which justifies the recursive nature of the optimization.
Next, we describe in more detail the concrete implementation of this system.

4.2. Data Processing
The CMI formalism is general and by construction permits us to treat multivariate coupled markets.
In certain conditions (e.g., shorter time scales of data), and also due to superior scalability across different
markets, it is desirable to have a trading system for a single instrument, in our case the S&P futures
contracts that are traded electronically on the Chicago Mercantile Exchange (CME). The focus of our system
was intraday trading, with the data used in generating the buy/sell signals at time scales from 10 to 60 secs.
In particular, we give here some results obtained when using data having a time resolution $\Delta t$ of 55 secs
(the time between consecutive data elements is 55 secs). This particular choice of time resolution reflects the
set of optimization parameters that have been applied in actual trading.
It is important to remark that a data point in our model does not necessarily mean an actual tick
datum. For some trading time scales and for noise reduction purposes, data is preprocessed into
sampling bins of length $\Delta t$ using either a standard averaging procedure or spectral filtering (e.g.,
wavelets, Fourier) of the tick data. Alternatively, the data can be defined in block bins that contain disjoint
sets of averaged tick data, or in overlapping bins of width $\Delta t$ that update at every $\Delta t' < \Delta t$, such
that an effective resolution $\Delta t'$ shorter than the width of the sampling bin is obtained. We present here
work in which we have used disjoint block bins and a standard average of the tick data with time stamps
falling within the bin width.
In Figs. 1 and 2 we present examples of S&P futures data sampled with 55 secs resolution. We
remark that there are several time scales, from minutes to one hour, at which an automated trading
system might extract profits. Fig. 2 illustrates a sustained short trading region of 1.5 hours and several
shorter long and short trading regions of about 10-20 mins. Fig. 1 illustrates that the profitable regions are
prominent even for data representing a relatively flat market period. Specifically, June 20 shows an uptrend
region of about 1 hour 20 mins and several short and long trading domains between 10 mins and 20 mins. In
both situations, there are a larger number of opportunities at time resolutions smaller than 5 mins.
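The disjoint block-bin averaging described earlier in this subsection can be illustrated as follows. This is a minimal sketch, assuming ticks arrive as (time in seconds, price) pairs; the function name and the choice to skip empty bins are hypothetical and not taken from the authors' system.

```python
def block_bin_average(ticks, dt, t0=None):
    """Average tick prices in disjoint bins of width dt seconds.

    ticks: iterable of (timestamp_seconds, price) pairs.
    t0:    start of the first bin; defaults to the first tick's time stamp.
    Returns a list of (bin_start, mean_price); bins with no ticks are skipped
    (an assumption; other policies, such as carrying the last value, are possible).
    """
    ticks = list(ticks)
    if not ticks:
        return []
    if t0 is None:
        t0 = ticks[0][0]
    sums = {}
    for t, price in ticks:
        k = int((t - t0) // dt)            # index of the bin containing t
        total, count = sums.get(k, (0.0, 0))
        sums[k] = (total + price, count + 1)
    return [(t0 + k * dt, total / count)
            for k, (total, count) in sorted(sums.items())]
```

For example, with `dt=55` the ticks at 0, 10 and 54 seconds fall in the first bin, while a tick at exactly 55 seconds opens the second.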
The time scale at which we sample the data for trading is itself a parameter that is extracted from
the optimization of the trading rules and of the Lagrangian cost function Eq. (27). This is one of the
coupling parameters between the inner- and the outer-shell optimizations.

4.3. Inner-Shell Optimization
A cycle of optimization runs has three parts: training, testing, and finally real-time use, which is a
variant of testing. Training consists of choosing a data set and performing the recursive optimization,
which produces optimization parameters for trading. In our case there are six parameters: the time
resolution $\Delta t$ of price data, the length of window W used in the local fitting procedures and in
computation of moving averages of trading signals, the drift $f^F$, volatility coefficient $\sigma$ and exponent x
from Eq. (28), and a multiplicative factor M necessary for the trading rules module, as discussed below.
The optimization parameters computed from the training set are then applied to various test sets and
final profit/loss analyses are produced. Based on these, the best set of optimization parameters is chosen
to be applied in real-time trading runs. We remark once again that a single training data set could support
more than one profitable set of parameters, and the choice among them can depend on the trader’s interest
and the specific market dynamics targeted (e.g., short/long time scales). The optimization parameters
corresponding to the global minimum in the training session may not necessarily be the parameters that
lead to robust profits across real-time data.
The training optimization occurs in two interrelated stages. An inner-shell maximum likelihood
optimization over all training data is performed. The cost function that is fitted to data is the effective
action constructed from the Lagrangian Eq. (27), including the prefactors coming from the measure
element in the expression of the short-time probability distribution Eq. (12). This is based on the fact [39]
that in the context of Gaussian multiplicative stochastic noise, the macroscopic transition probability
$P(F, t | F', t')$ to start with the price $F'$ at $t'$ and reach the price $F$ at $t$ is determined by the short-time
Lagrangian Eq. (27),

$$P(F, t | F', t') = \frac{1}{(2\pi \sigma^2 F_{i-1}^{2x}\, dt_i)^{1/2}} \exp\left( -\sum_{i=1}^{N} L_F(i|i-1)\, dt_i \right) , \qquad (30)$$

with $dt_i = t_i - t_{i-1}$. Recall that the main assumption of our model is that price increments (or the
logarithm of price ratios, depending on which variables are considered independent) could be described
by a system of coupled stochastic, nonlinear equations as in Eq. (10). These equations are deceptively
simple in structure, yet depending on the functional form of the drift coefficients and the multiplicative
noise, they could describe a variety of interactions between financial instruments in various market
conditions (e.g., constant elasticity of variance model [53], stochastic volatility models, etc.). In
particular, this type of model includes the case of Black-Scholes price dynamics (x = 1).
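A minimal sketch of the inner-shell cost function may help here: the negative logarithm of the short-time distribution, accumulated over all increments, for the one-factor model of Eq. (28). It assumes a uniform time step and accumulates the logarithm of the measure prefactor at every step; the function name is hypothetical, and this is not the authors' ASA cost routine.

```python
import math


def effective_action(prices, dt, f, sigma, x):
    """Negative log-likelihood of a price series under dF = f dt + sigma F^x dz.

    Each step contributes L(i|i-1) * dt plus the log of the measure prefactor
    (2 pi sigma^2 F_{i-1}^{2x} dt)^{1/2}, with prefactors at the prepoint i-1.
    """
    total = 0.0
    for prev, curr in zip(prices[:-1], prices[1:]):
        var = sigma ** 2 * prev ** (2.0 * x) * dt      # variance of dF over dt
        d_f = curr - prev
        lagrangian_dt = (d_f - f * dt) ** 2 / (2.0 * var)
        total += lagrangian_dt + 0.5 * math.log(2.0 * math.pi * var)
    return total
```

A global optimizer would minimize this quantity over the fitted parameters; smaller values mean the parameters explain the observed increments better.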
In the system presented here, we have applied the model from Eq. (28). The fitted parameters were
the drift coefficient $f^F$ and the exponent x. In the case of a coupled futures and cash system, besides the
corresponding values of $f^F$ and x for the cash index, another parameter, the correlation coefficient $\rho$ as
introduced in Eq. (10), must be considered.

4.4. Trading Rules (Outer-Shell) Recursive Optimization
In the second part of the training optimization, we calculate the CMI and execute trades as required
by a selected set of trading rules based on CMI values, price data or combinations of both indicators.
Recall that three external shell optimization parameters are defined: the time resolution $\Delta t$ of the
data expressed as the time interval between consecutive data points, the window length W (in number of
time epochs or data points) used in the adaptive calculation of CMI, and a numerical coefficient M that
scales the momentum uncertainty discussed below.
At each moment a local refit of $f^F$ and $\sigma$ over data in the local window W is executed, moving the
window M across the training data set and using the zeroth-order optimization parameters $f^F$ and x
resulting from the inner-shell optimization as a first guess. It was found that a faster quasi-local code is
sufficient for computational purposes for these adaptive updates. In more complicated models, ASA can
be successfully applied recursively, although in real-time trading the response time of the system is a
major factor that requires attention.
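The adaptive local refit can be illustrated with a closed-form estimate. For the one-factor model of Eq. (28) with the exponent x held fixed, the maximum-likelihood drift and volatility on a window have explicit expressions (a weighted mean of the increments and the weighted residual variance). The sketch below uses those expressions in place of the quasi-local search code mentioned above, which is not described in the text; all function names are hypothetical.

```python
import math


def local_refit(window_prices, dt, x):
    """Maximum-likelihood (f, sigma) on one window, with exponent x held fixed.

    Weights F_{i-1}^{-2x} follow from the prepoint volatility sigma * F_{i-1}^x.
    """
    pairs = list(zip(window_prices[:-1], window_prices[1:]))
    weights = [prev ** (-2.0 * x) for prev, _ in pairs]
    increments = [curr - prev for prev, curr in pairs]
    f = sum(w * d for w, d in zip(weights, increments)) / (dt * sum(weights))
    var = sum(w * (d - f * dt) ** 2
              for w, d in zip(weights, increments)) / (len(pairs) * dt)
    return f, math.sqrt(var)


def rolling_refit(prices, dt, x, window):
    """Refit (f, sigma) on the last `window` increments at every epoch i >= window."""
    return [local_refit(prices[i - window:i + 1], dt, x)
            for i in range(window, len(prices))]
```

Because the estimate is explicit, no first guess is needed in this simplified setting; an iterative fit would start from the inner-shell values as the text describes.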
All expressions that follow can be generalized to coupled systems in the manner described in
Section 3. Here we use the one-factor nonlinear model given by Eq. (28). At each time epoch we
calculate the following momentum-related quantities:

$$\Pi^F = \frac{1}{\sigma^2 F^{2x}} \left( \frac{dF}{dt} - f^F \right) , \quad
\Pi_0^F = -\frac{f^F}{\sigma^2 F^{2x}} , \quad
\Delta\Pi^F = \langle (\Pi^F - \langle \Pi^F \rangle)^2 \rangle^{1/2} = \frac{1}{\sigma F^x \sqrt{dt}} , \qquad (31)$$

where we have used $\langle \Pi^F \rangle = 0$ as implied by Eqs. (28) and (27). In the previous expressions, $\Pi^F$ is the
CMI, $\Pi_0^F$ is the neutral line or the momentum of a zero change in prices, and $\Delta\Pi^F$ is the uncertainty of
momentum. The last quantity reflects the Heisenberg principle, as derived from Eq. (28) by calculating

$$\Delta F \equiv \langle (dF - \langle dF \rangle)^2 \rangle^{1/2} = \sigma F^x \sqrt{dt} , \quad
\Delta\Pi^F\, \Delta F \ge 1 , \qquad (32)$$

where all expectations are in terms of the exact noise distribution, and the calculation implies the Itô
approximation (equivalent to considering non-anticipative functions). Various moving averages of these
momentum signals are also constructed. Other dynamical quantities, such as the Hamiltonian, could be used
as well. (By analogy to the energy concept, we found that the Hamiltonian carries information regarding the
overall trend of the market, giving another useful measure of price volatility.)
Regarding the practical implementation of the previous relations for trading, some comments are
necessary. In terms of discretization, if the CMI are calculated at epoch i, then $dF_i = F_i - F_{i-1}$,
$dt_i = t_i - t_{i-1} = \Delta t$, and all prefactors are computed at moment i − 1 by the Itô prescription (e.g.,
$\sigma F^x = \sigma F_{i-1}^x$). The momentum uncertainty band $\Delta\Pi^F$ can be calculated from the discretized
theoretical value Eq. (31), or by computing the estimator of the standard deviation from the actual time
series of $\Pi^F$.
There are also two ways of calculating averages over CMI values. One way is to use the set of local
optimization parameters $\{f^F, \sigma\}$ obtained from the local fit procedure in the current window W for all
CMI data within that window (local-model average). The second way is to calculate each CMI in the
current local window W with another set $\{f^F, \sigma\}$ obtained from a previous local fit window measured
from the CMI data backwards W points (multiple-models averaged, as each CMI corresponds to a
different model in terms of the fitting parameters $\{f^F, \sigma\}$).
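The discretized momentum quantities of Eq. (31) can be written out directly. The sketch below evaluates the CMI, the neutral line and the theoretical uncertainty band at a single epoch, with all prefactors taken at epoch i − 1 as the Itô prescription requires; the function name and argument layout are hypothetical.

```python
import math


def cmi_at_epoch(prices, i, dt, f, sigma, x):
    """Return (Pi, Pi0, dPi) at epoch i for the model dF = f dt + sigma F^x dz.

    Pi  : canonical momentum indicator
    Pi0 : neutral line (momentum of a zero price change)
    dPi : theoretical momentum uncertainty band
    """
    prev = prices[i - 1]                      # prepoint price F_{i-1}
    d_f = prices[i] - prev                    # dF_i = F_i - F_{i-1}
    g2 = sigma ** 2 * prev ** (2.0 * x)       # sigma^2 F_{i-1}^{2x}
    pi = (d_f / dt - f) / g2
    pi0 = -f / g2
    d_pi = 1.0 / (sigma * prev ** x * math.sqrt(dt))
    return pi, pi0, d_pi
```

In a local-model average, the same (f, sigma) pair would be passed for every epoch of the window; in a multiple-models average, each epoch would receive the pair refitted on its own trailing window.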
The last observation is that the neutral line divides all CMI into two classes: long signals, when
$\Pi^F > \Pi_0^F$, as any CMI satisfying this condition indicates a positive price change, and short signals,
when $\Pi^F < \Pi_0^F$, which reflects a negative price change.
After the CMI are calculated, based on their meaning as statistical momentum indicators, trades are
executed following a relatively simple model: entry and exit points of a long (short) trade are
defined as points where the value of the CMI is greater (smaller) than a certain fraction of the uncertainty
band $M \Delta\Pi^F$ ($-M \Delta\Pi^F$), where M is the multiplicative factor mentioned at the beginning of this
subsection. This is a choice of a symmetric trading rule, as M is the same for long and short trading
signals, which is suitable for volatile markets without a sustained trend, yet without diminishing too
severely profits in a strictly bull or bear region.
Inside the momentum uncertainty band, one could deﬁne rules to stay in a previously open trade, or
exit immediately, because by its nature the momentum uncertainty band implies that the probabilities of
price movements in either direction (up or down) are balanced. From another perspective, this type of
trading rule exploits the relaxation time of a strong market advance or decline, until a trend reversal
occurs or it becomes more probable.
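One possible reading of the symmetric rule and of the two in-band options (stay in the open trade, or exit immediately) is sketched below. The encoding of positions as +1, 0 and −1, the function name and the `hold_inside_band` flag are assumptions made for illustration; the authors' actual rule set is not given in this detail.

```python
def next_position(position, pi, d_pi, m, hold_inside_band=True):
    """Update a position in {-1, 0, +1} from the current CMI value.

    pi   : current CMI (or an average of CMI values)
    d_pi : momentum uncertainty band
    m    : multiplicative factor M, the same for long and short signals
    """
    threshold = m * d_pi
    if pi > threshold:
        return 1                  # go (or stay) long above +M * dPi
    if pi < -threshold:
        return -1                 # go (or stay) short below -M * dPi
    # Inside the band: keep the open trade, or exit, depending on the chosen rule.
    return position if hold_inside_band else 0
```

An asymmetric variant would simply use different factors for the upper and lower thresholds.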
Other sets of trading rules are certainly possible, by utilizing not only the current values of the
momenta indicators, but also their local-model or multiple-models averages. A trading rule based on the
maximum distance between the current CMI data $\Pi_i^F$ and the neutral line $\Pi_0^F$ shows faster response to
market evolution and may be more suitable to automatic trading in certain conditions.
Stepping through the trading decisions each trading day of the training set determines the
profit/loss of the training set as a single value of the outer-shell cost function. As ASA importance-samples
the outer-shell parameter space $\{\Delta t, W, M\}$, these parameters are fed into the inner shell, and a
new inner-shell recursive optimization cycle begins. The final values for the optimization parameters in
the training set are fixed when the largest net profit (calculated from the total equity by subtracting the
transaction costs defined by the slippage factor) is realized. In practice, we have collected optimization
parameters from multiple local minima that are near the global minimum (the outer-shell cost function is
defined with the sign reversed) of the training set.
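The outer-shell loop and the retention of several near-optimal parameter sets can be sketched schematically. To keep the example self-contained, plain uniform random sampling stands in for ASA, the `cost` callable stands in for the full inner-shell fit plus trading simulation, and the sampling ranges for W and M are hypothetical; only the 10 to 60 secs range for the time resolution comes from the text.

```python
import random


def sample_outer_shell(cost, n_samples=200, keep=3, seed=0):
    """Sample (dt, W, M) and return the `keep` lowest-cost distinct parameter sets.

    cost: callable taking a (dt, W, M) tuple and returning minus the net profit.
    Returns a list of ((dt, W, M), cost_value) sorted by increasing cost.
    """
    rng = random.Random(seed)
    evaluated = {}
    for _ in range(n_samples):
        params = (
            rng.randrange(10, 61, 5),            # dt in secs, 10..60
            rng.randrange(20, 121),              # W in epochs (assumed range)
            round(rng.uniform(0.05, 0.5), 2),    # M (assumed range)
        )
        if params not in evaluated:              # reuse stored costs for repeats
            evaluated[params] = cost(params)
    ranked = sorted(evaluated.items(), key=lambda item: item[1])
    return ranked[:keep]
```

Each retained parameter set would then be carried forward to the test sets, rather than the single best one.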
The values of the optimization parameters $\{\Delta t, W, M, f^F, \sigma, x\}$ resulting from a training cycle are
then applied to out-of-sample test sets. During the test run, the drift coefficient $f^F$ and the volatility
coefficient $\sigma$ are refitted adaptively as described previously. All other parameters are fixed. We have
mentioned that the optimization parameters corresponding to the highest profit in the training set may not
be sufficiently robust across test sets. Therefore, for all test sets, we have tested optimization parameters
related to the multiple minima (i.e., the global maximum profit, the second best profit, etc.) resulting from
the training set.
We performed a bootstrap-type reversal of the training-test sets (repeating the training run
procedures using one of the test sets, including the previous training set in the new batch of test sets),
followed by a selection of the best parameters across all data sets. This is necessary to increase the
chances of successful trading sessions in real time.

5. RESULTS

5.1. Alternative Algorithms
In the previous sections we noted that there are different combinations of methods of processing
data, methods of computing the CMI and various sets of trading rules that need to be tested, at least in a
sampling manner, before launching trading runs in real time:
1. Data can be preprocessed in block or overlapping bins, or forecasted data derived from the most
probable transition path [41] could be used as in one of our most recent models.
2. Exponential smoothing, wavelets or Fourier decomposition can be applied for statistical
processing. We presently favor exponential moving averages.
3. The CMI can be calculated using averaged data or directly with tick data, although the
optimization parameters were fitted from preprocessed (averaged) price data.
4. The trading rules can be based on current signals (no average is performed over the signals
themselves), on various averages of the CMI trading signals, on various combinations of CMI data
(momenta, neutral line, uncertainty band), on symmetric or asymmetric trading rules, or on mixed
price-CMI trading signals.
5. Different models (one- and two-factor coupled) can be applied to the same market instrument,
e.g., to define complementary indicators.
The selection process evidently must consider many specific economic factors (e.g., liquidity of a
given market), besides all other physical, mathematical and technical considerations. In the work
presented here, as we tested our system and using previous experience, we focused on S&P 500
futures electronic trading, using block-processed data, and symmetric, local-model and multiple-models
trading rules. In Table 1 we show results obtained for several training and testing sets in the mentioned
context.

5.2. Trading System Design
The design of a successful electronic trading system is complex, as it must incorporate several
aspects of a trader’s actions that sometimes are difficult to translate into computer code. Three important
features that must be implemented are factoring in the transaction costs, devising money management
techniques, and coping with execution deficiencies.
Generally, most trading costs can be included under the “slippage factor,” although this could easily
lead to poor estimates. Given that the profit margins from exploiting market inefficiencies are thin, a
high slippage factor can easily result in a non-profitable trading system. In our situation, for testing
purposes we used a $35 slippage factor per buy & sell order, a value we believe is rather high for an
electronic trading environment, although it represents less than three ticks of a mini-S&P futures contract.
(The mini-S&P is the S&P futures contract that is traded electronically on CME.) This higher value was
chosen to protect ourselves against the bid-ask spread, as our trigger price (the price at which the CMI was
generated) and execution price (the price at which a trade signaled by a CMI was executed) were taken to be
equal to the trading price. (We have changed this aspect of our algorithm in later models.) The slippage
is also strongly influenced by the time resolution of the data. Although the slippage is linked to bid-ask
spreads and market volatility in various formulas [54], the best estimate is obtained from experience and
actual trading.
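As a rough check of the "less than three ticks" remark, assume the standard mini-S&P contract specification, namely a multiplier of \$50 per index point and a minimum tick of 0.25 index points (neither figure is stated in the text above):

```latex
\[
\text{tick value} = 0.25 \times \$50 = \$12.50, \qquad
\frac{\$35}{\$12.50} = 2.8 \ \text{ticks} \; < \; 3 \ \text{ticks} \ (= \$37.50).
\]
```

Under that assumption the slippage charged per round trip is indeed just under three ticks.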
Money management was introduced in terms of a trailing stop condition that is a function of the
price volatility, and a stop-loss threshold that we fixed by experiment to a multiple of the mini-S&P
contract value ($200). It is tempting to tighten the trailing stop or to work with a small stop-loss value,
yet we found, as might be expected, that higher losses occurred as the signals generated by our
stochastic model were bypassed.
Regarding the execution process, we have to account for the response of the system to various
execution conditions in the interaction with the electronic exchange: partial fills, rejections, uptick rule
(for equity trading), etc. Except for some special conditions, all these steps must be automated.

5.3. Some Explicit Results
Typical CMI data in Figs. 3 and 4 (obtained from real-time trading after a full cycle of
training-testing was performed) are related to the price data in Figs. 1 and 2. We have plotted the fastest
(55 secs apart) CMI values $\Pi^F$, the neutral line $\Pi_0^F$ and the uncertainty band $\Delta\Pi^F$. All CMI data
were produced using the optimization parameters set {55 secs, 88 epochs, 0.15} of the second-best net
profit obtained with the training set “4D ESM0 0321-0324” (Table 1).
Although the CMIs exhibit an inherently ragged nature and oscillate around a zero mean value
within the uncertainty band (the width of which decreases with increasing price volatility, as the
uncertainty principle would also indicate), there are time scales at which the CMI average, or some
persistence time, is not balanced about the neutral line.
These characteristics, which we try to exploit in our system, are better depicted in Figs. 5 and 6.
One set of trading signals, the local-model average of the neutral line $\langle \Pi_0^F \rangle$ and the uncertainty band
multiplied by the optimization factor M = 0.15 and centered around the theoretical zero mean of the
CMI, is represented versus time. Note entry points in a short trading position
($\langle \Pi_0^F \rangle > M \Delta\Pi^F$) at around 10:41 (Fig. 5 in conjunction with S&P data in Fig. 1) with a possible exit at
11:21 (or later), and a first long entry ($\langle \Pi_0^F \rangle < -M \Delta\Pi^F$) at 12:15. After 14:35, a stay-long region
appears ($\langle \Pi_0^F \rangle < 0$), which correctly indicates the price movement in Fig. 1.
In Fig. 6, corresponding to June 22 price data from Fig. 2, a first long signal is generated at around
12:56 and a first short signal is generated at 14:16 that reflects the long downtrend region in Fig. 2. Due
to the averaging process, a time lag is introduced, reflected by the long signal at 12:56 in Fig. 4, related to
a past upward trend seen in Fig. 2; yet the neutral line relaxes rather rapidly (given the 55 sec time
resolution and the window of 88 epochs ≈ 1.5 hours) toward the uncertainty band. A judicious choice of
trading rules, or avoiding standard averaging methods, helps in controlling this lag problem.
In Tables 1 and 2 we show some results obtained for several training and testing sets following the
procedures described at the end of the previous section. In both tables, under the heading “Training” or
“Testing Set” we specify the data set used (e.g., “4D ESM0 0321-0324” represents four days of data from
the mini-S&P futures contract that expired in June). The type of trading rules used is identified by
“LOCAL MODEL” or “MULTIPLE MODELS” tags. These tags refer to how we calculate the averages
of the trading signals: either by using a single pair of optimization parameters $\{f^F, \sigma\}$ for all CMI data
within the current adaptive fit window, or a different pair $\{f^F, \sigma\}$ for each CMI datum. In the “Statistics”
column we report the net (subtracting the slippage) profit or loss (in parentheses) across the whole data
set, the total number of trades (“trades”), the number of days with positive balance (“days +”), and the
percentage of winning trades (“winners”). The “Parameters” are the optimization parameters resulting
from the first three best profit maxima of each listed training set. The parameters are listed in the order
$\{\Delta t, W, M\}$, with the data time resolution $\Delta t$ measured in seconds, the length of the local fit window W
measured in time epochs, and M the numerical coefficient of the momentum uncertainty band $\Delta\Pi^F$.
Recall that the trading rules presented are symmetric (the long and short entry/exit signals are
controlled by the same M factor), and we apply a stay-long condition if the neutral line is below the
average momentum $\langle \Pi^F \rangle = 0$ and stay-short if $\Pi_0^F > 0$. The drift $f^F$ and volatility coefficient $\sigma$ are
refitted adaptively and the exponent x is fixed to the value obtained in the training set. Typical values are
$f^F \in \pm[0.003 : 0.05]$, $x \in \pm[0.01 : 0.03]$. During the local fit, due to the shorter time scale involved, the
drift may increase by a factor of ten, and $\sigma \in [0.01 : 1.2]$.
Comparing the data in the training and testing tables, we note that the most robust optimization
factors, in terms of maximum cumulative profit over all test sets, do not correspond to the
maximum profit in the training sets: for the local-model rules, the optimum parameters are
{55, 88, 0.15}, and for the multiple-models rules the optimum set is {45, 72, 0.2}, both realized by the
training set “4D ESM0 0321-0324.”
Other observations are that, for the data presented here, the multiple-models averages trading rules
consistently performed better and are more robust than the local-model averages trading rules. The
number of trades is similar, varying between 15 and 35 (eliminating cumulative values smaller than 10
trades), and the time scale of the local fit is rather long, in the 30 mins to 1.5 hour range. In the current
setup, this extended time scale implies that it is advisable to deploy this system as a trader-assisted tool.
An important factor is the average length of the trades. For the type of rules presented in this work,
this length is from several minutes up to one hour, as the time scale of the local fit window mentioned above
suggested.
Related to the length of a trade is the length of a winning long/short trade in comparison to a losing
long/short trade. Our experience indicates that a ratio of 2:1 between the length of a winning trade and
the length of a losing trade is desirable for a reliable trading system. Here, using the local-model trading
rules seems to offer an advantage, although this is not as clear as one would expect.
Finally, the training sets data (Table 1) show that the percentage of winners is markedly higher in
the case of multiple-models average than local-average trading rules. In the testing sets (Table 2) the
situation is almost reversed, although the overall profits (losses) are higher (smaller) in the multiple-models
case. Apparently, the multiple-models trading rules can stay in winning trades longer to increase profits,
relative to losses incurred with these rules in losing trades. (In the testing sets, this correlates with the
higher number of trades executed using local-model trading rules.)

6. CONCLUSIONS

6.1. Main Features
The main stages of building and testing this system were:
1. We developed a multivariate, nonlinear statistical mechanics model of S&P futures and cash
markets, based on a system of coupled stochastic differential equations.
2. We constructed a two-stage, recursive optimization procedure using methods of ASA global
optimization: an inner shell extracts the characteristics of the stochastic price distribution and an
outer shell generates the technical indicators and optimizes the trading rules.
3. We trained the system on different sets of data and retained the multiple minima generated
(corresponding to the global maximum net profit realized and the neighboring profit maxima).
4. We tested the system on out-of-sample data sets, searching for the most robust optimization
parameters to be used in real-time trading. Robustness was estimated by the cumulative profit/loss across
diverse test sets, and by testing the system against a bootstrap-type reversal of training-testing sets in the
optimization cycle.
Modeling the market as a dynamical physical system makes possible a direct representation of
empirical notions such as market momentum in terms of CMI derived naturally from our theoretical model.
We have shown that other physical concepts such as the uncertainty principle may lead to quantitative
signals (the momentum uncertainty band $\Delta\Pi^F$) that capture other aspects of market dynamics and which
can be used in real-time trading.

6.2. Summary
We have presented a description of a trading system composed of an outer-shell trading-rule model
and an inner-shell nonlinear stochastic dynamic model of the market of interest, the S&P 500. The inner
shell is developed adhering to the mathematical physics of multivariate nonlinear statistical mechanics, from
which we develop indicators for the trading-rule model, i.e., canonical momenta indicators (CMI). We
have found that keeping our model faithful to the underlying mathematical physics is not a limiting
constraint on profitability of our system; quite the contrary.
An important result of our work is that the ideas for our algorithms, and the proper use of the
mathematical physics faithful to these algorithms, must be supplemented by many practical
considerations en route to developing a profitable trading system. For example, since there is a subset of
parameters, e.g., time resolution parameters, shared by the inner- and outer-shell models, recursive
optimization is used to get the best fits to data, as well as to develop multiple minima with approximately
similar profitability. The multiple minima often have additional features requiring consideration for
real-time trading, e.g., more trades per day increasing robustness of the system, etc. The nonlinear stochastic
nature of our data required a robust global optimization algorithm. The parameters output from
these training sets were then applied to testing sets on out-of-sample data. The best models and
parameters were then used in real time by traders, further testing the models as a precursor to eventual
deployment in automated electronic trading.
We have used methods of statistical mechanics to develop our inner-shell model of market
dynamics and a heuristic AI-type model for our outer-shell trading-rule model, but there are many other
candidate (quasi-)global algorithms for developing a cost function that can be used to fit parameters to
data, e.g., neural nets, fractal scaling models, etc. To perform our fits to data, we selected an algorithm,
Adaptive Simulated Annealing (ASA), that we were familiar with, but there are several other candidate
algorithms that likely would suffice, e.g., genetic algorithms, tabu search, etc.
We have shown that a minimal set of trading signals (the CMI, the neutral line representing the
momentum of the trend of a given time window of data, and the momentum uncertainty band) can
generate a rich and robust set of trading rules that identify profitable domains of trading at various time
scales. This is a confirmation of the hypothesis that markets are not efficient, as noted in other
studies [11,30,55].

6.3. Future Directions
Although this paper focused on trading of a single instrument, the S&P 500 futures, the code we
have developed can accommodate trading on multiple markets. For example, in the case of
tick-resolution coupled cash and futures markets, which was previously prototyped for interday
trading [29,30], the utility of CMI stems from three directions:
(a) The inner-shell fitting process requires a global optimization of all parameters in both futures
and cash markets.
(b) The CMI for futures contain, by our Lagrangian construction, the coupling with the cash market
through the off-diagonal correlation terms of the metric tensor. The correlation between the futures and
cash markets is explicitly present in all futures variables.
(c) The CMI of both markets can be used as complementary technical indicators for trading in
the futures market.
Several near-term future directions are of interest: orienting the system toward shorter trading time
scales (10-30 secs) more suitable for electronic trading, introducing fast-response “averaging” methods
and time scale identifiers (exponential smoothing, wavelet decomposition), identifying mini-crash
points using renormalization group techniques, investigating the use of CMI in pattern-recognition based
trading rules, and exploring the use of forecasted data evaluated from the most probable transition path
formalism.
Our efforts indicate the invaluable utility of a joint approach (AI-based and quantitative) in
developing automated trading systems.

6.4. Standard Disclaimer
We must emphasize that there are no claims that all results are positive or that the present system is
a safe source of riskless profits. There are as many negative results as positive, and a lot of work is
necessary to extract meaningful information. Our purpose here is to describe an approach to developing an
electronic trading system complementary to those based on neural-network type technical analysis and
pattern recognition methods. The system discussed in this paper is rooted in the physical principles of
nonequilibrium statistical mechanics, and we have shown that there are conditions under which such a
model can be successful.

ACKNOWLEDGMENTS
We thank Donald Wilson for his financial support. We thank K.S. Balasubramaniam and Colleen
Chen for their programming support and participation in formulating parts of our trading system. Data
was extracted from the DRW Reuters feed.

REFERENCES
[1] E. Peters, Chaos and Order in the Capital Markets, Wiley & Sons, New York, NY, 1991.
[2] E.M. Azoff, Neural Network Time Series Forecasting of Financial Markets, Wiley & Sons, New
York, NY, 1994.
[3] E. Gately, Neural Networks for Financial Forecasting, Wiley & Sons, New York, NY, 1996.
[4] L. Ingber, "High-resolution path-integral development of financial options," Physica A, vol. 283,
pp. 529-558, 2000.
[5] B.B. Mandelbrot, Fractals and Scaling in Finance, Springer-Verlag, New York, NY, 1997.
[6] R.N. Mantegna and H.E. Stanley, "Turbulence and financial markets," Nature, vol. 383, pp.
587-588, 1996.
[7] L. Laloux, P. Cizeau, J.P. Bouchaud, and M. Potters, "Noise dressing of financial correlation
matrices," Phys. Rev. Lett., vol. 83, pp. 1467-1470, 1999.
[8] A. Johansen, D. Sornette, and O. Ledoit, Predicting financial crashes using discrete scale
invariance, http://xxx.lanl.gov/cond-mat, 2000.
[9] K. Ilinsky and G. Kalinin, Black-Scholes equation from gauge theory of arbitrage,
http://xxx.lanl.gov/hep-th/9712034, 1997.
[10] J.C. Hull, Options, Futures, and Other Derivatives, 4th Edition, Prentice Hall, Upper Saddle River,
NJ, 2000.
[11] L. Ingber, "Statistical mechanics of nonlinear nonequilibrium financial markets," Math. Modelling,
vol. 5, pp. 343-361, 1984.
[12] L. Ingber, "Adaptive Simulated Annealing (ASA)," Global optimization C-code, Caltech Alumni
Association, Pasadena, CA, 1993.
[13] L. Ingber, "Very fast simulated reannealing," Mathl. Comput. Modelling, vol. 12, pp. 967-973,
1989.
[14] L. Ingber and B. Rosen, "Genetic algorithms and very fast simulated reannealing: A comparison,"
Oper. Res. Management Sci., vol. 33, pp. 523, 1993.
[15] L. Ingber, "Adaptive simulated annealing (ASA): Lessons learned," Control and Cybernetics, vol.
25, pp. 33-54, 1996.
[16] L. Ingber, "Simulated annealing: Practice versus theory," Mathl. Comput. Modelling, vol. 18, pp.
29-57, 1993.
[17] G. Indiveri, G. Nateri, L. Raffo, and D. Caviglia, "A neural network architecture for defect
detection through magnetic inspection," Report, University of Genova, Genova, Italy, 1993.
[18] B. Cohen, Training synaptic delays in a recurrent neural network, M.S. Thesis, Tel-Aviv University,
Tel-Aviv, Israel, 1994.
[19] R.A. Cozzio-Büeler, "The design of neural networks using a priori knowledge," Ph.D. Thesis,
Swiss Fed. Inst. Tech., Zurich, Switzerland, 1995.
[20] D.G. Mayer, P.M. Pepper, J.A. Belward, K. Burrage, and A.J. Swain, "Simulated annealing - A
robust optimization technique for fitting nonlinear regression models," in Proceedings 'Modelling,
Simulation and Optimization' Conference, International Association of Science and Technology for
Development (IASTED), 6-9 May 1996 Gold Coast, 1996.
[21] S. Sakata and H. White, "High breakdown point conditional dispersion estimation with application
to S&P 500 daily returns volatility," Econometrica, vol. 66, pp. 529-567, 1998.
[22] L. Bachelier, "Théorie de la Spéculation," Annales de l'Ecole Normale Supérieure, vol. 17, pp.
21-86, 1900.
[23] M.C. Jensen, "Some anomalous evidence regarding market efficiency, an editorial introduction," J.
Finan. Econ., vol. 6, pp. 95-101, 1978.
[24] B.B. Mandelbrot, "When can price be arbitraged efficiently? A limit to the validity of the random
walk and martingale models," Rev. Econ. Statist., vol. 53, pp. 225-236, 1971.
[25] S.J. Taylor, "Tests of the random walk hypothesis against a price-trend hypothesis," J. Finan.
Quant. Anal., vol. 17, pp. 37-61, 1982.
[26] P. Brown, A.W. Kleidon, and T.A. Marsh, "New evidence on the nature of size-related anomalies
in stock prices," J. Fin. Econ., vol. 12, pp. 33-56, 1983.
[27] J.A. Nelder and R. Mead, "A simplex method for function minimization," Computer J. (UK), vol.
7, pp. 308-313, 1964.
[28] G.P. Barabino, G.S. Barabino, B. Bianco, and M. Marchesi, "A study on the performances of
simplex methods for function minimization," in Proc. IEEE Int. Conf. Circuits and Computers,
1980, pp. 1150-1153.
[29] L. Ingber, "Canonical momenta indicators of financial markets and neocortical EEG," in Progress
in Neural Information Processing, ed. by S.I. Amari, L. Xu, I. King, and K.S. Leung, Springer,
New York, 1996, pp. 777-784.
[30] L. Ingber, "Statistical mechanics of nonlinear nonequilibrium financial markets: Applications to
optimized trading," Mathl. Computer Modelling, vol. 23, pp. 101-121, 1996.
[31] H. Haken, Synergetics, 3rd ed., Springer, New York, 1983.
[32] L. Ingber, M.F. Wehner, G.M. Jabbour, and T.M. Barnhill, "Application of statistical mechanics
methodology to term-structure bond-pricing models," Mathl. Comput. Modelling, vol. 15, pp.
77-98, 1991.
[33] L. Ingber, "Statistical mechanics of neocortical interactions: A scaling paradigm applied to
electroencephalography," Phys. Rev. A, vol. 44, pp. 4017-4060, 1991.
[34] K.S. Cheng, "Quantization of a general dynamical system by Feynman's path integration
formulation," J. Math. Phys., vol. 13, pp. 1723-1726, 1972.
[35] H. Dekker, "Functional integration and the Onsager-Machlup Lagrangian for continuous Markov
processes in Riemannian geometries," Phys. Rev. A, vol. 19, pp. 2102-2111, 1979.
[36] R. Graham, "Path-integral methods in nonequilibrium thermodynamics and statistics," in
Stochastic Processes in Nonequilibrium Systems, ed. by L. Garrido, P. Seglar, and P.J. Shepherd,
Springer, New York, NY, 1978, pp. 82-138.
[37] F. Langouche, D. Roekaerts, and E. Tirapegui, "Discretization problems of functional integrals in
phase space," Phys. Rev. D, vol. 20, pp. 419-432, 1979.
[38] F. Langouche, D. Roekaerts, and E. Tirapegui, "Short derivation of Feynman Lagrangian for
general diffusion process," J. Phys. A, vol. 113, pp. 449-452, 1980.
[39] F. Langouche, D. Roekaerts, and E. Tirapegui, Functional Integration and Semiclassical
Expansions, Reidel, Dordrecht, The Netherlands, 1982.
[40] M. Rosa-Clot and S. Taddei, A path integral approach to derivative security pricing: I. Formalism
and analytic results, INFN, Firenze, Italy, 1999.
[41] H. Dekker, "On the most probable transition path of a general diffusion process," Phys. Lett. A, vol.
80, pp. 99-101, 1980.
[42] P. Hagedorn, Non-Linear Oscillations, Oxford Univ., New York, NY, 1981.
[43] B. Oksendal, Stochastic Differential Equations, Springer, New York, NY, 1998.
[44] J.M. Harrison and D. Kreps, "Martingales and arbitrage in multiperiod securities markets," J.
Econ. Theory, vol. 20, pp. 381-408, 1979.
[45] S.R. Pliska, Introduction to Mathematical Finance, Blackwell, Oxford, UK, 1997.
[46] C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences,
Springer-Verlag, Berlin, Germany, 1983.
[47] R.C. Merton, "An intertemporal capital asset pricing model," Econometrica, vol. 41, pp. 867-887,
1973.
[48] L. Ingber, "Statistical mechanics of neocortical interactions: Stability and duration of the 7±2 rule
of short-term-memory capacity," Phys. Rev. A, vol. 31, pp. 1183-1186, 1985.
[49] S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, "Optimization by simulated annealing," Science,
vol. 220, pp. 671-680, 1983.
[50] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distribution and the Bayesian restoration in
images," IEEE Trans. Patt. Anal. Mac. Int., vol. 6, pp. 721-741, 1984.
[51] H. Szu and R. Hartley, "Fast simulated annealing," Phys. Lett. A, vol. 122, pp. 157-162, 1987.
[52] M. Wofsey, "Technology: Shortcut tests validity of complicated formulas," The Wall Street
Journal, vol. 222, pp. B1, 1993.
[53] J.C. Cox and S.A. Ross, "The valuation of options for alternative stochastic processes," J. Fin.
Econ., vol. 3, pp. 145-166, 1976.
[54] P.J. Kaufman, Trading Systems and Methods, 3rd edition, John Wiley & Sons, New York, NY, 1998.
[55] W. Brock, J. Lakonishok, and B. LeBaron, "Simple technical trading rules and the stochastic
properties of stock returns," J. Finance, vol. 47, pp. 1731-1763, 1992.
FIGURE CAPTIONS
Figure 1. Futures and cash data, contract ESU0 June 20: solid line - futures; dashed line - cash.
Figure 2. Futures and cash data, contract ESU0 June 22: solid line - futures; dashed line - cash.
Figure 3. CMI data, real-time trading June 20: solid line - CMI; dashed line - neutral line;
dotted line - uncertainty band.
Figure 4. CMI data, real-time trading June 22: solid line - CMI; dashed line - neutral line;
dotted line - uncertainty band.
Figure 5. CMI trading signals, real-time trading June 20: dashed line - local-model average of the
neutral line; dotted line - uncertainty band multiplied by the optimization parameter M = 0.15.
Figure 6. CMI trading signals, real-time trading June 22: dashed line - local-model average of the
neutral line; dotted line - uncertainty band multiplied by the optimization parameter M = 0.15.
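The band described in the captions of Figures 5 and 6 can be illustrated with a short sketch that locates a CMI value relative to the local average of the neutral line plus or minus M times the uncertainty. The function name, its arguments, and the labels "above", "below", and "inside" are assumptions made for illustration; the mapping from band position to actual buy or sell orders is not restated here and is not implied by this sketch.

```python
def classify_cmi(cmi, neutral_avg, uncertainty, m=0.15):
    """Locate a CMI value relative to the band neutral_avg +/- m * uncertainty.

    m = 0.15 mirrors the value of M quoted in the captions of Figures 5 and 6;
    every other name here is a hypothetical placeholder.
    """
    half_width = m * uncertainty
    if cmi > neutral_avg + half_width:
        return "above"
    if cmi < neutral_avg - half_width:
        return "below"
    return "inside"


# Hypothetical values: band is 0.0 +/- 0.15 * 2.0 = +/- 0.3.
for value in (0.5, 0.1, -0.5):
    print(value, classify_cmi(value, neutral_avg=0.0, uncertainty=2.0))
```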
TABLE CAPTIONS
Table 1. Matrix of Training Runs.
Table 2. Matrix of Testing Runs.
[Figure 1: ESU0 data June 20, time resolution = 55 secs. Vertical axis: S&P; horizontal axis: TIME (mmdd hhmmss). Curves: Futures, Cash.]
[Figure 2: ESU0 data June 22, time resolution = 55 secs. Vertical axis: S&P; horizontal axis: TIME (mmdd hhmmss). Curves: Futures, Cash.]
[Figure 3: Canonical Momenta Indicators (CMI), time resolution = 55 secs. Vertical axis: CMI; horizontal axis: TIME (mmdd hhmmss). Curves: Π^F (CMI Futures), Π^F_0 (neutral CMI), ΔΠ^F (theory).]
[Figure 4: Canonical Momenta Indicators (CMI), time resolution = 55 secs. Vertical axis: CMI; horizontal axis: TIME (mmdd hhmmss). Curves: Π^F (CMI Futures), Π^F_0 (neutral CMI), ΔΠ^F (theory).]
[Figure 5: Canonical Momenta Indicators (CMI), time resolution = 55 secs. Vertical axis: CMI; horizontal axis: TIME (mmdd hhmmss). Curves: <Π^F_0> (local), M ΔΠ^F.]
[Figure 6: Canonical Momenta Indicators, time resolution = 55 secs. Vertical axis: CMI; horizontal axis: TIME (mmdd hhmmss). Curves: <Π^F_0>, M ΔΠ^F.]
Table 1

TRAINING SET       TRADING RULES    STATISTICS       PARAMETERS (Δt W M)
4D ESM0 0321-0324  LOCAL MODEL      Parameters       55 90 0.125   55 88 0.15    60 40 0.275
                                    $ profit (loss)  1390          1215          1167
                                    # trades         16            16            17
                                    # days +         3             3             3
                                    % winners        75            75            76
                   MULTIPLE MODELS  Parameters       45 76 0.175   45 72 0.20    60 59 0.215
                                    $ profit (loss)  2270          2167.5        1117.5
                                    # trades         18            17            17
                                    # days +         4             4             3
                                    % winners        83            88            76
5D ESM0 0327-0331  LOCAL MODEL      Parameters       20 22 0.60    20 24 0.55    10 54 0.5
                                    $ profit (loss)  437           352           (35)
                                    # trades         15            16            1
                                    # days +         3             3             0
                                    % winners        67            63            0
                   MULTIPLE MODELS  Parameters       45 74 0.25    40 84 0.175   30 110 0.15
                                    $ profit (loss)  657.5         635           227.5
                                    # trades         3             19            26
                                    # days +         5             3             2
                                    % winners        100           68            65
5D ESM0 0410-0414  LOCAL MODEL      Parameters       50 102 0.10   50 142 0.10   35 142 0.10
                                    $ profit (loss)  1875          1847          1485
                                    # trades         35            19            34
                                    # days +         3             3             4
                                    % winners        60            58            62
                   MULTIPLE MODELS  Parameters       45 46 0.25    40 48 0.30    60 34 0.30
                                    $ profit (loss)  2285          2145          1922.5
                                    # trades         39            23            29
                                    # days +         3             3             3
                                    % winners        72            87            72

Table 2

                                    PARAMETERS (Δt W M)
                                    LOCAL MODEL                               MULTIPLE MODELS
TESTING SETS       STATISTICS       55 90 0.125   55 88 0.15    60 40 0.275   45 76 0.175   45 72 0.20    60 59 0.215
5D ESM0 0327-0331  $ profit (loss)  (712)         (857)         (1472)        (605)         (220)         (185)
                   # trades         20            17            16            18            12            11
                   # days +         2             2             1             3             1             1
                   % winners        50            47            44            67            67            54
4D ESM0 0403-0407  $ profit (loss)  (30)          258           602           1340          2130          932
                   # trades         18            13            16            16            17            13
                   # days +         3             3             2             1             1             1
                   % winners        56            54            56            50            53            38
5D ESM0 0410-0414  $ profit (loss)  750           1227          (117)         (530)         (1125)        (380)
                   # trades         30            21            23            23            20            18
                   # days +         3             3             3             2             2             3
                   % winners        60            62            48            48            50            50
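The testing-run profit figures of Table 2 can be tallied with a few lines of arithmetic. The sketch below transcribes the $ profit (loss) entries, with parenthesized losses entered as negative numbers, and sums them per parameter set; the variable names and the choice of summary statistics are assumptions made for illustration, not quantities reported in the paper.

```python
# $ profit (loss) per testing set, transcribed from Table 2.
# Parenthesized losses are entered as negative numbers.
# Order within each list: 0327-0331, 0403-0407, 0410-0414.
table2_profit = {
    ("LOCAL MODEL", "55 90 0.125"): [-712, -30, 750],
    ("LOCAL MODEL", "55 88 0.15"): [-857, 258, 1227],
    ("LOCAL MODEL", "60 40 0.275"): [-1472, 602, -117],
    ("MULTIPLE MODELS", "45 76 0.175"): [-605, 1340, -530],
    ("MULTIPLE MODELS", "45 72 0.20"): [-220, 2130, -1125],
    ("MULTIPLE MODELS", "60 59 0.215"): [-185, 932, -380],
}

# Net result of each parameter set over the three testing sets.
totals = {key: sum(values) for key, values in table2_profit.items()}

# Count how many individual testing runs ended with a loss.
losing_runs = sum(1 for values in table2_profit.values() for p in values if p < 0)
total_runs = sum(len(values) for values in table2_profit.values())

for (rule, params), total in totals.items():
    print(f"{rule:16s} {params:12s} net = {total}")
print(f"losing runs: {losing_runs} of {total_runs}")
```

With the figures as transcribed, 11 of the 18 testing runs show a loss, which is consistent with the statement in the Standard Disclaimer that there are as many negative results as positive.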