v0.36, 24 Mar 2013
1. Wengert List: List of all the non over-writable program variables (Wengert (1964)) – can also be seen as a linearization of the computational graph. By construction, it is an intermediate variable.
2. Intermediate Wengert Canonical Variable: These are intermediate financial variables those are fixed from the point-of-view of the output Jacobians and the input parameters that serve as computation graph parsimonious optimizers (Figures 1 and 2).
3. Wengert fan-in and fan-out: Reduction of a set of initial/intermediate Wengert variates onto the subsequent set is called fan-in; the opposite is fan-out.
4. Wengert funneling: Same as Wengert fan-in.
5. Micro-Jacobian: Change in the calibrated instrument measure coefficients to unit change in the quoted instrument measures.
6. Self-Jacobian: Self-Jacobian refers to the Jacobian of the Objective Function at any point in the variate to the Objective Function at the segment nodes, i.e., . Self-Jacobian is a type of micro-Jacobian.
7. Derivative Entity: The entity whose dynamics are determined by the evolution of a stochastic variate, and whose specific facets/measures are observable.
8. Path-wise Derivative Estimator: , where V is the value of the derivative, and is the starting value for a specific stochastic variate.
9. Non-Parsimonized Parameters: Parameters that map one-to-one with the input instrument set, e.g., typical curve bootstrapping.
10. Parsimonization: Reduction of the parameter space from the input measure space.
Motivation and Literature Review
1. AD History: Iri (1991)
2. Mathematical Foundations: Griewank (2000)
3. Survey: Berz (1996)
4. Implementation Tools, Methodologies, Processes, and Techniques (Bischof, Hovland, and Norris (2005))
5. AD Resource: http://www.autodiff.org/
1. Focus has been primarily on Monte-Carlo methodologies.
2. Although path-wise optimized sensitivity generation had been employed earlier (Glasserman (2004)), Giles and Glasserman (2006) first discussed adjoint methods in path-wise sensitivity generation.
3. Full extension to LMM based stochastic variate evolution and a corresponding exotic (in this case Bermudan) swap option evaluation (Leclerc, Liang, and Schneider (2009)), as well as to correlated defaults and their sensitivities (Capriotti and Giles (2011)).
4. Capriotti (2011) covers automated Greek generation, but with a focus on automatic differentiation, and in the context of Monte-Carlo methods.
5. Finally, algorithmic differentiation has also been applied to addressing the issue of calibration along with sensitivity generation (Schlenkirch (2011)).
1. Stochastic Analysis Formulation Grounding SKU + Software Framework.
2. Stochastic Analysis Sensitivity Generation
3. Parametric Calibration, and back-fit from Observations.
4. Dimensionality reduction and Path Evolution Optimization Techniques – including evaluation of semi-closed form solutions.
5. Bayesian/MLE type path likelihood maximization conformant prediction?
1. Definition: Automatic differentiation is a set of techniques for transforming a program that calculates the numerical values of a function into a program that calculates numerical values for derivatives of that function with about the same accuracy and efficiency as the function values themselves (Bartholomew-Biggs, Brown, Christianson, and Dixon (2000)).
2. Symbolic Derivatives: Calculate the local symbolic derivatives rather than the a) divided differences, or b) numerical differentials (Automatic Differentiation - Wikipedia Entry).
3. Calculation Speed: Same number of Objective Function Calculation as the original; however, potential “chain rule” multiplication factor effects.
4. Accuracy vs. Performance: Due to the usage of symbolics, accuracy of Automatic Differentiation always better than numerical differentials; however, due to the chain-rule issue, may not be always faster.
5. Scalability at infinitesimal variates: Since Automatic Differentiation is always symbolic and therefore infinitesimal, it will automatically scale to arbitrarily small variate infinitesimals – reduced errors due to bit cancellation etc:
6. Higher-order derivatives: Automatic Differentiation does not need additional objective function evaluations for higher order derivative calculations (beyond the chain-rule issues); therefore, those are infinitesimally correct too.
1. Forward Automatic Differentiation: Express the final and the intermediate variables as a consequence of a computed forward graph, and derive the symbolic forward derivative graph.
· Effectively computes the gradient of the intermediate variables to the variates or the “independent variables” and transmits them up the graph.
2. Reverse Automatic Differentiation: Express the intermediate variables and the input variates as nodes in the computed reverse graph, and derive the symbolic reverse derivative graph.
· Often may still need the forward path to store the calculated intermediates needed on the way back.
· Effectively computes the gradient of the intermediate variables to the “dependent variables” and transmits them down the graph.
· Forward Mode => Speed proportional to n, the number of “independent” variables
· Reverse Mode => Speed proportional to m, the number of “dependent” variables
4. Memory Usage (Ghaffari, Li, Li, and Nie (2007)):
· Forward Mode => a) Each Wengert variable, b) Forward Jacobian for each Wengert, c) Forward Dependency Graph
· Reverse Mode => a) Each Wengert Adjoint, b) Reverse Jacobian for each Wengert, c) Forward/Backward Dependency graph
5. When the difference is minimal: When the dependence of the final Jacobian sensitivity step is the dominating factor, and the adjointing step is not the rate-determining part, then the performance will always be , where n is the number of sensitivities – for e.g., if , given that is trivial to calculate, the performance will always be .
· For instance, given a univariate objective function (as in constrained/unconstrained optimization (e.g., maximization/minimization) problems), either forward or reverse Automatic Differentiation is an equally good choice for sensitivity generation, owing to its performance.
1. Program Line-level decomposition: Canonicalization decomposes the program/statement units into specific analysis bits.
2. Canonicalization Implementation: In general, canonicalization and other related Automatic Differentiation Source Code generation/transformation techniques should go hand in hand with optimizing compiled code emission techniques, program active variable activity analysis.
3. Equivalence with Wengert Structuring: Given that canonicalization consists of hoisting all the l-value updates separately without side effects, it is effectively the same as Wengert un-rolling and DAG linearization.
4. Limitations with the implementation: For many of the reasons above, automated implementations of canonicalization (like other automated code generation/re-structuring) might result in “invisible inefficiencies”, and the had-drafted techniques those are based upon essentially the same principles may be more optimal.
Post canonicalized Enhancement Cost: Given that the worst case is division, going from to results in going from 1 function unit execution cost to 4 automatic differentiation execution unit costs. Typically due to “weird” functions, the worst-case addition to a single post-canonicalized statement is a factor of 5, not 4.
1. Deep-dig perspective: Re-purposed Automatic Differentiation perspective forces the visualization of the computation at the granularity of the symbolic functional forms of the objective function.
a. Objective Function evaluator over-loading => This requires propagation of the inner most symbolic graph nodes through the graph chain => causes additional cognitive SKU export!!
b. Objective Function Neighborhood Behavior => With every Wengert variable, calculation of the set of forward sensitivities and the reverse Jacobians builds a local picture of the Objective Function without having to evaluate it.
2. Block-level View Fixation: Source code transformation techniques are very invasive, and require highly locally frozen view fixation, and are therefore less cognitive. Operator overloading techniques enable retention of the domain focus, and are therefore more cognitive.
a. Naïve operator overloading would simply generate a block-level (or function call level) adjoint. This can explode the required storage, in addition to generating sub-optimal reverse-mode code. Needless to mention, source code transformation techniques can be built to overcome this – in practice, however, many may not quite do it.
3. Complied language Automatic Differentiation implementation: Without the usage of obfuscating “versatile” templates, auto-generation of very generic forward/reverse accumulation code is impossible. Therefore source level function overloading and automated program instrumentation techniques are very hard.
a. Further, compiled language source code transformation appears to be a vestige of “smart compiler” efforts of the ‘90s – classic instance of where a simple idea is “intellectually transmitted” than “built out-of the-box”.
1. Combination of Forward/Reverse Modes: Forward (n inputs) and reverse (m outputs) mode represent just two possible (extreme) ways of recursing through the chain rule. For n > 1 and m > 1 there is a golden mean, but finding the optimal way is probably an NP-hard problem (Berland (2006)) – optimal Jacobian accumulation is NP-complete (Naumann (2008)).
2. Wengert Intermediate Fan-in and possibly fan-out: See Figures 1 to 3 for illustrate this.
· Wengert Intermediate Performance Enhancement => If there exists an intermediate quantity that is fixed from the point-of-view of the output Jacobians and the input parameters, the performance may be improved (see Figure 1).
· Reusable Intermediate Performance Improvement => If the input/output computation leads to sufficient commonality among the Wengert intermediate calculation, that may also reduce computation by promoting reuse, thereby improving efficiency.
· Wengert Funneling Criterion => For non-optimizing, non-parsimonized Wengert funnels, for the Wengert fan to be a funneling fan-in – otherwise rippling out causes huge non-diagonal Markov matrices. This is true for a) I -> P, b) P-> W, and c) W -> O.
3. Standardized Computational Finance Structures: In computational finance (esp. computational fixed income finance), the payout/product/pricer object serves the function of the intermediate Wengert variate indicated above. From below this variate you have the inputs/parameters rippling up, and from above you have the Jacobians/output measure adjoints feeding down (Figure 2).
1. Math Modules:
· Forward differentials and auto-adjointing of math modules => May be needed for most of them.
· Every block, compute the base “value”, forward differential, and reverse adjoint.
· In fact, for every active double-precision variable v, source code transformation automatic differentiation techniques recursively automatically generate the doublet . Further, this calculation may also be parallelized.
o This particular calculation may also be propagated at the function call level, so that the assignment outputs are automatically generated for the doublet/multiple.
· Computational structures => Design restrictions may also be imposed by the computability of the AD of a math module, i.e., would the financial MarketParamsContainer be broken down into further parameter units?
2. Stochastic Variate Automatic Differentiation: Evolution of stochastic variates and their derivative entities may be further optimized by exploiting sparse-ness of the multi-factor co-variance matrix, thereby evolving the variate/derivative matrix that is sparse optimally (as opposed to blind delta bumps that may happen when computing differentials).
3. Quasi-analytic Computation Models: No Monte-Carlo evolution needed at all, but still Wengert intermediate level reformulation necessary to enhance the quasi-analytics analysis (e.g., Copula methods).
4. Calibration and entity-variate focus:
· De-convolving the instrument entity/measure combination is necessary for the extraction of the parameter set (this is accomplished by the calibration process).
· Of course, calibration occurs among the elastic and the inelastic dimensions, and the inelastics are parameter set!
· Parameter calibration/parameterization etc: inherently involve parsimonization – this is where the models come in.
1. Advantages: In addition to the usual advantage that Automatic Differentiation provides on doing Greeks on the same run as pricing, there is no need for multiple bumped curves anymore – but the proper Jacobians need to be calculated.
2. Curve Calibration Deltas: Typical deltas are with respect to the
3. Span/Segment Elastic Variates: There are 5 different kinds.
4. Span/Segment variate relations: For a given calculated/formulated output measure , the following are true by definition:
5. Sensitivities to the elastic variates:
6. Piece-wise constant segment variate: In this case, .
7. Splined segment variate: Recall that segment spline coefficient calibration is simply a problem of matching to a terminal node (which is the quoted instrument measure at the terminal node). Thus, for a formulated output , at node k, it is obvious that .
8. Linear Dependence of Integrand Quadrature: For many functional formulations in finance, the calculated product measure () has a linear dependence on the stochastic evolution variate, i.e., . This implies that , i.e., only, and not on the quadrature details.
Stochastic Entity Evolution
1. Evolution Dynamics: Simplest evolution of stochastic variables will be ones with constant forward volatilities. Once the dynamics is formulated according to where is the component drift, and is the component co-variance to the factor , subsequent evolution can be determined.
2. Evolution of the derivative entity: Once the stochastic variate dynamics is established, the dynamics of the observed derivative entity can be progressively determined.
3. Derivative Entity Measure path-wise evolution: Evolution sequence can be determined for the individual pay-off measures as well. These measures may further be dependent on the differentials of the derivative entity, so those may also need to be evolved using automatic differentiation.
4. Derivative Entity Computational efficiency enhancement:
5. Derivative Entity Measure Calculation: [Instrument, Measure] input to [Instrument, Measure] output is equivalently maintained in the Jacobian, although the real computation Jacobian may hold [Instrument-specific calibrated parameter] to [Instrument, Measure] Output map.
1. The forward rates (or indeed any term instrument measures) need to evolve such that
2. For e.g., the evolution dynamics of the forward rates (or indeed any term instrument measures) can still be via LMM, but splines may still be applicable to the intermediate nodes, as the segment spline coefficients adjust to the forward rate nodes.
3. Splines may also be used for any term instrument measure determinant (e.g., the volatility surface maybe also be interpolatively constructed using splines), so as to preserve the continuity/smoothness, as opposed to piece-wise discreteness.
Rewrite #1: , where
Formulation of Sensitivities for Pay-off Functions
Bermudan Swap Option Sensitivities
1. H x M Exercised Bermudan Swap Option Delta/Parameter Sensitivity [Piterbarg (2004), Capriotti and Giles (2011)]:
2. Individual Cash-flow PV and Greeks [Leclerc, Liang, and Schneider (2009)]:
3. Cash-flow PV Delta:
NTD Basket Sensitivities