Lakshmi Krishnamurthy

**v0.41**, 20 January 2014

**Introduction**

1. __Wengert
List__: The list of all the non-over-writable program variables (Wengert (1964))
– it can also be seen as a linearization of the computational graph. By
construction, each entry in the list is an intermediate variable.
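As a minimal sketch (the function and the variable names are illustrative, not taken from the text), the Wengert list for y = sin(x1 * x2) + x1 assigns every elementary operation to its own non-over-writable variable:

```python
import math

# Wengert list for y = sin(x1 * x2) + x1: one single-assignment,
# non-over-writable variable per elementary operation.
def wengert_list(x1, x2):
    w1 = x1           # input variate
    w2 = x2           # input variate
    w3 = w1 * w2      # intermediate Wengert variable
    w4 = math.sin(w3) # intermediate Wengert variable
    w5 = w4 + w1      # output
    return [w1, w2, w3, w4, w5]

trace = wengert_list(2.0, 0.5)
y = trace[-1]
```

The list is exactly the linearized computational graph: differentiating the program then reduces to differentiating each single-assignment line and chaining.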

2. __Intermediate
Wengert Canonical Variable__: These are intermediate financial variables
that are fixed from the point-of-view of the output Jacobians and the input
parameters, and that serve as parsimonious optimizers of the computation graph
(Figures 1 and 2).

3. __Wengert
fan-in and fan-out:__ Reduction of a set of initial/intermediate Wengert
variates onto the subsequent set is called fan-in; the opposite is fan-out.

4. __Wengert
funneling:__ Same as Wengert fan-in.

5. __Micro-Jacobian__:
The change in the calibrated instrument measure coefficients per unit change in
the quoted instrument measures.

6. __Self-Jacobian__:
The Jacobian of the Objective Function at any point in the variate domain with
respect to the Objective Function at the segment nodes, i.e., _{}. The Self-Jacobian is a type of micro-Jacobian.

7. __Derivative
Entity__: The entity whose dynamics are determined by the evolution of a
stochastic variate, and whose specific facets/measures are observable.

8. __Path-wise
Derivative Estimator__: $\partial V / \partial X(0)$, where $V$ is the value of the derivative, and $X(0)$ is the starting value
for a specific stochastic variate.
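A hedged illustration of the path-wise derivative estimator: for a European call under geometric Brownian motion (the payoff, the dynamics, and all parameter values here are assumptions for this sketch), the path-wise delta accumulates, along each path, the discounted payoff indicator times dS(T)/dS(0) = S(T)/S(0).

```python
import math
import random

# Path-wise delta estimator sketch for a European call under GBM.
# S0, K, r, sigma, T are illustrative parameters, not from the text.
def pathwise_call_delta(S0, K, r, sigma, T, n_paths, seed=42):
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        ST = S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        # dV/dS0 along this path: discounted indicator * dST/dS0 = ST / S0
        if ST > K:
            acc += math.exp(-r * T) * ST / S0
    return acc / n_paths

delta = pathwise_call_delta(100.0, 100.0, 0.05, 0.2, 1.0, 20000)
```

For these illustrative parameters the estimate should land near the Black-Scholes delta N(d1), roughly 0.64, up to Monte-Carlo noise.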

9. __Non-Parsimonized
Parameters__: Parameters that map one-to-one with the input instrument set,
e.g., in typical curve bootstrapping.

10. __Parsimonization:__
Reduction of the parameter space from the input measure space.

1. AD History: Iri (1991)

2. Mathematical Foundations: Griewank (2000)

3. Survey: Berz (1996)

4. Implementation Tools, Methodologies, Processes, and Techniques (Bischof, Hovland, and Norris (2005))

5. AD
Resource: *http://www.autodiff.org/*

1. Focus has been primarily on Monte-Carlo methodologies.

2. Although path-wise optimized sensitivity generation had been employed earlier (Glasserman (2004)), Giles and Glasserman (2006) first discussed adjoint methods in path-wise sensitivity generation.

3. Full extension to LMM based stochastic variate evolution and a corresponding exotic (in this case Bermudan) swap option evaluation (Leclerc, Liang, and Schneider (2009)), as well as to correlated defaults and their sensitivities (Capriotti and Giles (2011)).

4. Capriotti (2011) covers automated Greek generation, but with a focus on automatic differentiation, and in the context of Monte-Carlo methods.

5. Finally, algorithmic differentiation has also been applied to addressing the issue of calibration along with sensitivity generation (Schlenkirch (2011)).

1. Introduction

a. Glossary

b. Overview

c. Algorithmic Differentiation in Finance

d. Document Layout

2. Algorithmic Differentiation - Basics

a. Motivation and Advantages

b. Program Sequence Construction Modes

c. Canonicalization – Program Sequence Simplification by Decomposition

d. Challenges of Automating the Differentiation

e. Wengert Representation and Optimal Program Structure Synthesis

f. Optimization using Pre-accumulation and Check-Pointing

g. Financial Application Space Customization

3. Sensitivity Generation During Curve Construction

a. Introduction

b. Curve Jacobian

4. Stochastic Entity Evolution

a. Sensitivity Formulation

b. Sensitivities to Stochastic Latent State Variables and Dynamical Parameters

c. Stochastic Variate Evolution Constrained by Splines

d. Formulation of the Evolution of the Stochastic Variate Self-Jacobian

e. Correlated Stochastic Variables Evolution

f. LMM Forward Rate Evolution

5. Formulation of Sensitivities for Pay-off Functions

a. Formulation of Pay-off Function Stochastic Evolution

b. Path Greek

c. Payoff Sensitivity to the Correlation Matrix

d. Algorithmic Differentiation in the Payoff Sensitivities Calculation

6. Bermudan Swap Option Sensitivities

a. Base Formulation

b. Greek Estimation

c. LSM Formulation

7. NTD Basket Sensitivities

a. NTD Product Formulation

8. Basket Options

9. References

10. Figures

11. Stochastic Analyzer Software Components

a. Univariate Function Package

b. Univariate Calculus Package

**Algorithmic Differentiation
- Basics**

1. __Definition__:
Automatic differentiation is a set of
techniques for transforming a program that calculates the numerical values of a
function into a program that calculates numerical values for derivatives of
that function with about the same accuracy and efficiency as the function
values themselves (Bartholomew-Biggs,
Brown, Christianson, and Dixon (2000)).

2. __Symbolic
Derivatives__: Calculate the local symbolic derivatives rather than a)
divided differences, or b) numerical differentials (*Automatic
Differentiation - Wikipedia Entry*).

3. __Calculation
Speed__: The same number of Objective Function calculations as the original;
however, there are potential “chain rule” multiplication factor effects.

4. __Accuracy
vs. Performance__: Due to the usage of symbolics, the accuracy of Automatic
Differentiation is always better than that of numerical differentials; however,
due to the chain-rule issue, it may not always be faster.

5. __Scalability
at infinitesimal variates__: Since Automatic Differentiation is always
symbolic and therefore infinitesimal, it will automatically scale to
arbitrarily small variate infinitesimals – reducing errors due to bit
cancellation etc.

6. __Higher-order
derivatives__: Automatic Differentiation does not need additional objective
function evaluations for higher order derivative calculations (beyond the
chain-rule issues); therefore, those are infinitesimally correct too.

1. __Forward
Automatic Differentiation__: Express the final and the intermediate variables
as a consequence of a computed forward graph, and derive the symbolic forward
derivative graph.

· Effectively computes the gradient of the intermediate variables to the variates or the “independent variables” and transmits them up the graph.
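A minimal forward-mode sketch using dual numbers: every active variable carries the doublet (value, derivative), seeded at the chosen independent variable; the class and function names are illustrative, not the library's API.

```python
import math

# Forward-mode AD via dual numbers: each value carries (v, dv), the
# doublet of value and derivative w.r.t. one chosen independent variable.
class Dual:
    def __init__(self, v, dv=0.0):
        self.v, self.dv = v, dv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.v + other.v, self.dv + other.dv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule applied locally at this elementary operation
        return Dual(self.v * other.v, self.dv * other.v + self.v * other.dv)
    __rmul__ = __mul__

def dsin(x):
    # chain rule for the unit function sin
    return Dual(math.sin(x.v), math.cos(x.v) * x.dv)

# d/dx1 of y = sin(x1 * x2) + x1 at (2, 0.5)
x1 = Dual(2.0, 1.0)   # seed: derivative of the independent variable is 1
x2 = Dual(0.5, 0.0)
y = dsin(x1 * x2) + x1
```

One forward sweep per independent variable yields the gradient column, matching the n-proportional cost noted below.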

2. __Reverse
Automatic Differentiation__: Express the intermediate variables and the input
variates as nodes in the computed reverse graph, and derive the symbolic
reverse derivative graph.

· Often may still need the forward path to store the calculated intermediates needed on the way back.

· Effectively computes the gradient of the intermediate variables to the “dependent variables” and transmits them down the graph.
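A matching reverse-mode sketch for a toy function y = sin(x1 * x2) + x1 (all names illustrative): the forward sweep records the Wengert intermediates, and the reverse sweep accumulates the adjoints of the output with respect to every intermediate.

```python
import math

# Reverse-mode AD: forward sweep stores intermediates, reverse sweep
# propagates adjoints (d y / d w) from the output back to the inputs.
def reverse_mode(x1, x2):
    # forward sweep: y = sin(x1 * x2) + x1
    w3 = x1 * x2
    w4 = math.sin(w3)
    y = w4 + x1
    # reverse sweep: the adjoint of the output y is 1
    a_y = 1.0
    a_w4 = a_y                   # y = w4 + x1
    a_x1 = a_y
    a_w3 = a_w4 * math.cos(w3)   # w4 = sin(w3)
    a_x1 += a_w3 * x2            # w3 = x1 * x2
    a_x2 = a_w3 * x1
    return y, (a_x1, a_x2)

y, grad = reverse_mode(2.0, 0.5)
```

Note that the reverse sweep needs the stored intermediate w3, illustrating the forward-path storage requirement mentioned above; a single reverse sweep delivers the whole gradient, matching the m-proportional cost.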

3. __Speed__:

· Forward Mode => Speed proportional to n, the number of “independent” variables

· Reverse Mode => Speed proportional to m, the number of “dependent” variables

4. __Memory
Usage (Ghaffari, Li, Li, and Nie (2007))__:

· Forward Mode => a) Each Wengert variable, b) Forward Jacobian for each Wengert, c) Forward Dependency Graph

· Reverse Mode => a) Each Wengert Adjoint, b) Reverse Jacobian for each Wengert, c) Forward/Backward Dependency graph

5. __When
the difference is minimal__: When the dependence of the final Jacobian
sensitivity step is the dominating factor, and the adjointing step is not the
rate-determining part, the performance will always be _{}, where n is the number of sensitivities – e.g., if _{}, given that _{} is trivial to
calculate, the performance will always be _{}.

· For instance, given a univariate objective function (as in constrained/unconstrained optimization (e.g., maximization/minimization) problems), either forward or reverse Automatic Differentiation is an equally good choice for sensitivity generation, owing to its performance.


1. __Program
Line-level decomposition__: Canonicalization decomposes the program/statement
units into specific analysis bits.

- Canonicalization is commonly used in many areas of computer science, e.g., in compiler design/code generation, SKU formulation/synthesis/customization etc.

2. __Canonicalization
Implementation__: In general, canonicalization and other related Automatic
Differentiation source code generation/transformation techniques should go hand
in hand with optimizing compiled-code emission techniques and program
active-variable analysis.

- The canonicalization sequence should include steps (Bischof, Hovland, and Norris (2005)) where you would be able to mark the mathematical “Automatically Differentiable” code segments to separate them from the others during, for instance, pre-processing.
- For true program transformation effectiveness, Hot-Spot type dynamic run-time analysis is needed in addition to static compile-time data-flow analysis.
- In VM-oriented languages like Java, the run-time GC already works, so the VM might make a good candidate for embedding AD execution/selective sensitivity generation in.

3. __Equivalence
with Wengert Structuring__: Given that canonicalization consists of hoisting
all the l-value updates separately without side effects, it is effectively the
same as Wengert un-rolling and DAG linearization.

4. __Limitations
of the implementation__: For many of the reasons above, automated
implementations of canonicalization (like other automated code
generation/re-structuring) might result in “invisible inefficiencies”, and
hand-crafted techniques that are based upon essentially the same principles may
be more optimal.

5. __Post-canonicalized
Enhancement Cost__: Given that the worst-case operation is
division, going from _{} to _{} results in going from
1 function-unit execution cost to 4 automatic differentiation execution unit
costs. Typically, due to “weird” functions, the worst-case addition to a single
post-canonicalized statement is a factor of 5, not 4.

6. __Divided-Differences-based
Differentiation Fallback__: _{}.

1. __Deep-dig
perspective__: The re-purposed Automatic Differentiation perspective forces the
visualization of the computation at the granularity of the symbolic functional
forms of the objective function.

a. Objective Function evaluator over-loading => This requires propagation of the innermost symbolic graph nodes through the graph chain, which causes additional cognitive SKU export.

b. Objective Function Neighborhood Behavior => With every Wengert variable, calculation of the set of forward sensitivities and the reverse Jacobians builds a local picture of the Objective Function without having to evaluate it.

2. __Block-level
View Fixation__: Source code transformation techniques are very invasive, and
require highly locally frozen view fixation, and are therefore less cognitive.
Operator overloading techniques enable retention of the domain focus, and are
therefore more cognitive.

a. Naïve operator overloading would simply generate a block-level (or function-call level) adjoint. This can explode the required storage, in addition to generating sub-optimal reverse-mode code. Needless to say, source code transformation techniques can be built to overcome this – in practice, however, many may not quite do it.

3. __Compiled-language
Automatic Differentiation implementation__: Without the usage of
obfuscating “versatile” templates, auto-generation of very generic
forward/reverse accumulation code is impossible. Therefore source-level
function overloading and automated program instrumentation techniques are very
hard.

a. Further, compiled-language source code transformation appears to be a vestige of the “smart compiler” efforts of the ‘90s – a classic instance where a simple idea is “intellectually transmitted” rather than “built out-of-the-box”.

4. __Symbolic
Differentiation Challenges with certain Unit Functional Forms__: When you
consider functions such as _{}, and you seek _{} symbolically, the
higher order symbolic differentiations become much more challenging - _{}, _{} and so on for higher
orders. Thus symbolically handling these series this way gets out of control
fast!

1. __Combination of Forward/Reverse
Modes__: Forward
(n inputs) and reverse (m outputs) mode represent just two possible (extreme)
ways of recursing through the chain rule. For n > 1 and m > 1 there is a
golden mean, but finding the optimal way is probably an NP-hard problem
(Berland (2006)) – optimal Jacobian accumulation is NP-complete (Naumann
(2008)).

2. __Wengert
Intermediate Fan-in and possibly Fan-out__: See Figures 1 to 3 for an
illustration.

· Wengert Intermediate Performance Enhancement => If there exists an intermediate quantity that is fixed from the point-of-view of the output Jacobians and the input parameters, the performance may be improved (see Figure 1).

· Reusable Intermediate Performance Improvement => If the input/output computation leads to sufficient commonality among the Wengert intermediate calculation, that may also reduce computation by promoting reuse, thereby improving efficiency.

· Wengert Funneling Criterion => For non-optimizing, non-parsimonized Wengert funnels, _{} for the Wengert fan to be a funneling fan-in – otherwise rippling out causes huge non-diagonal Markov matrices. This is true for a) I -> P, b) P -> W, and c) W -> O.

3. __Standardized
Computational Finance Structures__: In computational finance (esp.
computational fixed income finance), the payout/product/pricer object serves
the function of the intermediate Wengert variate indicated above. From below
this variate you have the inputs/parameters rippling up, and from above you
have the Jacobians/output measure adjoints feeding down (Figure 2).

- Reactive Tree Upticks => Every intermediate element in Figure 2 is a reactive tree dependent node from the entity below, so forwarding/adjointing should happen with every real-time uptick.
- Automatic Differentiation for the Wengert Canonicals => This involves the following:
- Identifying the abstractable financial canonical/reusable common object structures (market parameters, product parameters, pricer parameters, etc.)
- Working out their forward differentials and the reverse adjoints.

- One Financial Automatic Differentiation view => The Intermediate Wengert Canonical View is the conceptual parsimonization of the variate parameter space and the Jacobian measure space.

1. __Math
Modules__:

· Forward differentials and auto-adjointing of math modules => May be needed for most of them.

· For every block, compute the base “value”, the forward differential, and the reverse adjoint.

· In fact, for every active double-precision variable v, source code transformation automatic differentiation techniques recursively and automatically generate the doublet _{}. Further, this calculation may also be parallelized.

o This particular calculation may also be propagated at the function call level, so that the assignment outputs are automatically generated for the doublet/multiple.

o Computational structures => Design restrictions may also be imposed by the computability of the AD of a math module, i.e., would the financial **MarketParamsContainer** be broken down into further parameter units?

2. __Stochastic
Variate Automatic Differentiation__: Evolution of stochastic variates and
their derivative entities may be further optimized by exploiting the sparseness
of the multi-factor co-variance matrix, thereby evolving the sparse
variate/derivative matrix optimally (as opposed to the blind delta bumps that
may happen when computing differentials).

- Variance Reduction along the forward path => If a specific forward path a) does not need to be traveled, or b) certain forward Wengert intermediates automatically compute to zero, then these produce zero path derivatives. Further, external pre-computations can be done during the adjoint generation.

- Delta effects on the Optimal Exercise Dates => This imposes restrictions on how the path derivatives may be computed using automatic differentiation. This may also be used in conjunction with regression analysis for estimating optimal exercise times, which certainly enables adjoint automatic differentiation techniques to be used.

- Tangent multi-mode arc derivatives =>
- Identifying the circumstances under which they are re-usable
- Arc derivatives extraction intermediates may also be re-used
- Depends (as always) on the speed up and memory used.

3. __Quasi-analytic
Computation Models__: No Monte-Carlo evolution needed at all, but still
Wengert intermediate level reformulation necessary to enhance the
quasi-analytics analysis (e.g., Copula methods).

- Adjoint-Natural Formulation Mode => Typical formulation works out the Wengerts backwards from the final measure (e.g., say from PV), so they are automatically amenable to the adjoint mode of automatic differentiation.

4. __Latent
State Calibration from Observed Manifest Measures__:

· Formulation of the de-convolution of the latent state from the observed manifest measure is necessary for the extraction of the latent state parameter set (this is accomplished by the calibration process).

· Of course, latent state calibration occurs among the elastic and the inelastic dimensions, and the inelastics are the parameter set.

· Latent state calibration/parameterization etc. inherently involve parsimonization – this is where the models come in.


1. __Advantages__:
In addition to the usual advantage that Automatic Differentiation provides on
doing accurate Greeks on the same run as pricing, there is no need for multiple
bumped curves anymore – but the proper Jacobians need to be calculated.

- Further speed up => The segment micro-Jacobian needs to be pre-calculated right during the calibration – we need to calculate the Jacobian _{}, where _{} is the i^{th} coefficient, and _{} is the j^{th} input.
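A toy micro-Jacobian sketch, assuming a single-segment calibration in which the calibrated coefficient is a zero rate z implied by a quoted discount factor Q over maturity t, i.e., Q = exp(-z t) (the symbols z, Q, t are illustrative assumptions, not the text's notation); the analytic dz/dQ is checked against a central divided difference.

```python
import math

# Calibration: recover the zero rate z from the quoted discount factor Q.
def calibrate_zero_rate(Q, t):
    return -math.log(Q) / t

# Micro-Jacobian: z = -ln(Q)/t  =>  dz/dQ = -1/(t*Q), available
# analytically right at calibration time, with no extra bumped runs.
def micro_jacobian_analytic(Q, t):
    return -1.0 / (t * Q)

Q, t = 0.95, 2.0
bump = 1e-6
fd = (calibrate_zero_rate(Q + bump, t) - calibrate_zero_rate(Q - bump, t)) / (2 * bump)
an = micro_jacobian_analytic(Q, t)
```

The analytic entry is what gets pre-calculated during calibration; the divided difference is only the fallback check.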

2. __Curve
Calibration Deltas__: Typical deltas are with respect to the

- dynamical latent state stochastic variates (e.g., the forward rates)
- calibrated parameters (e.g., the segment spline coefficients)
- unit change in the quoted instrument measures (e.g., 1 bp change) - here the Jacobians need to ripple upwards from the quoted instrument manifest measures.

3. __Span/Segment
Elastic Variates__: Consider the situation where the latent state in itself
(not its transformation) is explicitly measured. There are 5 different kinds of
latent state proxies to consider:

- _{} => Span stochastic latent state evolution variate.
- _{} => Stochastic latent state evolution variate for segment k.
- _{} => Implied Span Quoted Instrument Manifest Measure.
- _{} => Implied Quoted Instrument Manifest Measure for Segment k.
- _{} => Observed Quoted Instrument Manifest Measure for Segment k at precisely a single variate point – typically, the observations are done at the anterior/posterior terminal ends of the segment.

4. __Span/Segment
variate relations__: For a given calculated/formulated output manifest
measure _{}, the following are true by definition:

_{}_{}

5. __Sensitivities
to the elastic variates__:

- Sensitivity to Stochastic Evolution Variate => _{}
- Sensitivity to Implied Span Quoted Instrument Measure => _{}
- Sensitivity to Observed Span Quoted Instrument Measure => _{}

_{} _{} (Case c above) is what you need to calculate the hedge ratio.

6. __Piece-wise
constant segment variate__: In this case, _{}.

7. __Splined
segment variate__: Recall that segment spline coefficient calibration is
simply a problem of matching to a terminal node (which is the quoted instrument
measure at the terminal node). Thus, for a formulated output _{}, at node k, it is obvious that _{}.

- Stochastic Evolution Variate Derivative => For the case where _{} refers to the discount factor, it can be shown that _{}, where t_{j} < t < t_{j+1}. Thus, _{}
- Quoted Instrument Manifest Measure Derivative => This depends on the actual details of the quadrature. Thus, _{}

8. __Linear
Dependence of the Integrand Quadrature__: For many functional formulations in
finance, the calculated product measure (_{}) has a linear dependence on the stochastic evolution variate,
i.e., _{}. This implies that _{}, i.e., it depends on _{} only, and not on the
quadrature details.


**Stochastic Entity Evolution**


1. __Evolution
Dynamics__: The simplest evolutions of stochastic variables _{} are the ones with
constant forward volatilities. Once the dynamics are formulated according to _{}, where _{} is the component
drift, and _{} is the component
co-variance with the factor _{}, the subsequent evolution can be determined.

- The Eulerized version of the above is _{}, where h is the time-step, and Z is the Wiener random variable.
- In the case of forward rates, e.g., the drifts can be established by a no-arbitrage condition binding the forward rate drifts to their variances.
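The Eulerized step can be sketched together with the path-wise self-Jacobian dX(t)/dX(0), assuming geometric Brownian motion dynamics (drift r·X, volatility vol·X – an assumption for this sketch, with illustrative parameter names); since drift and volatility are linear in X, the tangent recursion uses the same multiplicative factor as the state update.

```python
import math
import random

# Euler step X(t+h) = X(t) + mu*h + sigma*sqrt(h)*Z, carrying the
# path-wise self-Jacobian J(t) = dX(t)/dX(0) along the same path.
# Under GBM (mu = r*X, sigma = vol*X), dmu/dX = r and dsigma/dX = vol,
# so both X and J multiply by the same per-step factor.
def evolve_with_jacobian(x0, r, vol, T, steps, seed=7):
    rng = random.Random(seed)
    h = T / steps
    x, jac = x0, 1.0
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        factor = 1.0 + r * h + vol * math.sqrt(h) * z
        x *= factor      # Euler step for the state
        jac *= factor    # tangent (forward) recursion for dX(t)/dX(0)
    return x, jac

x, jac = evolve_with_jacobian(100.0, 0.03, 0.2, 1.0, 252)
```

Averaging jac over paths, composed with a payoff derivative, yields the path-wise delta without any bumped re-simulation.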

2. __Evolution
of the derivative entity__: Once the stochastic variate dynamics is
established, the dynamics of the observed derivative entity can be
progressively determined.

3. __Derivative
Entity Measure path-wise evolution__: Evolution sequence can be determined
for the individual pay-off measures as well. These measures may further be
dependent on the differentials of the derivative entity, so those may also need
to be evolved using automatic differentiation.

4. __Derivative
Entity Computational efficiency enhancement__:

- Using the adjoint automatic differentiation methods
- Using optimal combination of forward and adjoint automatic differentiation methods
- Further optimizations using the sparseness of the multi-factor co-variance matrix, thereby evolving the sparse variate/derivative matrix optimally (as opposed to the blind delta bumps that may happen when computing differentials).
- Quasi-analytic computation models and automatic differentiation techniques => No Monte-Carlo evolution needed at all, but still Wengert intermediate level reformulation necessary to enhance the quasi-analytics analysis (e.g., Copula methods).

5. __Derivative
Entity Measure Calculation__: [Instrument, Manifest Measure] input to
[Instrument, Manifest Measure] output is equivalently maintained in the
Jacobian. Alternately, the computation may also hold [Latent State calibrated
parameter] to [Instrument, Manifest Measure] Output map.


1. The forward rates (or indeed any term instrument measures) need to evolve such that

- They are continuous at the boundaries
- The first (and possibly the second) derivatives are continuous at the boundaries
- The boundary conditions (either financial or tensional) are retained intact

2. E.g., the evolution dynamics of the forward rates (or indeed of any term instrument measures) can still be via LMM, and splines may still be applicable at the intermediate nodes, as the segment spline coefficients adjust to the forward rate nodes.

3. Splines may also be used for any term instrument measure determinant (e.g., the volatility surface may also be interpolatively constructed using splines), so as to preserve continuity/smoothness, as opposed to piece-wise discreteness.

Rewrite #1: _{}, where _{}

**Formulation of Sensitivities
for Pay-off Functions**

**Bermudan Swap Option
Sensitivities**

1. __H
x M Exercised Bermudan Swap Option Delta/Parameter Sensitivity [Piterbarg
(2004), Capriotti and Giles (2011)]__: _{} _{}

2. __Individual
Cash-flow PV and Greeks [Leclerc, Liang, and Schneider (2009)]__:

_{} _{}

- Remember that _{}, where _{} is given by the LMM formulation presented earlier.

3. __Cash-flow
PV Delta__:

_{}_{}_{}

**NTD Basket Sensitivities**

**Basket Options**

1. __DerivativeControl__:
DerivativeControl
provides bumps needed for numerically approximating derivatives. Bumps can be
absolute or relative, and they default to a floor.
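A sketch of the described bump logic, assuming a relative bump with an absolute floor (the function name and the default values are illustrative, not the component's actual API):

```python
# Bump used for numerically approximating a derivative: relative to the
# variate's magnitude where it is large enough, floored otherwise so the
# bump never collapses to zero near x = 0.
def bump_size(x, relative=1e-6, floor=1e-8):
    return max(abs(x) * relative, floor)
```

For example, a variate of 100.0 gets a relative bump, while a variate of 0.0 falls back to the floor.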

2. __Differential__:
Differential holds the incremental differentials for the variate and the
objective functions.

3. __WengertJacobian__:
WengertJacobian
contains the Jacobian of the given set of Wengert variables to the set of
parameters. It exposes the following functionality:

o Set/Retrieve
the Wengert variables

o Accumulate the
Partials

o Scale the
partial entries

o Merge the
Jacobian with another

o Retrieve the
WengertJacobian elements

o Display the
contents of the WengertJacobian
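A minimal Python analogue of the WengertJacobian functionality listed above (the actual component belongs to the library; these method names and signatures are illustrative only):

```python
# Sketch: holds d(Wengert_i)/d(parameter_j) entries and supports the
# operations listed above -- accumulate, scale, merge, retrieve.
class WengertJacobian:
    def __init__(self, num_wengert, num_params):
        self._jack = [[0.0] * num_params for _ in range(num_wengert)]

    def accumulate_partial(self, i, j, value):
        # accumulate the partial d(Wengert_i)/d(parameter_j)
        self._jack[i][j] += value
        return True

    def scale(self, factor):
        # scale every partial entry
        for row in self._jack:
            for j in range(len(row)):
                row[j] *= factor
        return True

    def merge(self, other):
        # merge another Jacobian by entry-wise accumulation
        for i, row in enumerate(other._jack):
            for j, v in enumerate(row):
                self._jack[i][j] += v
        return True

    def entry(self, i, j):
        return self._jack[i][j]

wj = WengertJacobian(2, 2)
wj.accumulate_partial(0, 1, 0.5)
wj.scale(2.0)
```

Accumulate-then-scale is the pattern a reverse sweep would drive: each adjoint contribution lands via accumulate_partial, and a common chain-rule factor is applied via scale.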

4. __Integrator__:
Integrator
implements the following routines for integrating the objective functions:

o Linear
Quadrature

o Mid-Point
Scheme

o Trapezoidal
Scheme

o Simpson/Simpson38
Schemes

o Boole Scheme
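Two of the listed schemes – the trapezoidal and the Simpson rules – can be sketched as follows (routine names are illustrative, not the library's API); both integrate a univariate objective function over [a, b] with n sub-intervals.

```python
import math

# Composite trapezoidal rule: endpoints weighted 1/2, interior points 1.
def trapezoidal(f, a, b, n):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# Composite Simpson rule: weights 1, 4, 2, 4, ..., 2, 4, 1 over an even
# number of sub-intervals.
def simpson(f, a, b, n):
    if n % 2:
        n += 1  # Simpson needs an even sub-interval count
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0

trap = trapezoidal(math.sin, 0.0, math.pi, 200)
simp = simpson(math.sin, 0.0, math.pi, 200)
```

On a smooth integrand such as sin over [0, π] (exact value 2), Simpson's O(h⁴) error visibly beats the trapezoid's O(h²) at the same quadrature count.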