Rough Volatility

Bergomi's model revisited

Variance swap

A variance swap with maturity $T$ is a contract which pays out the realized variance of the logarithmic total returns up to $T$ less a strike called the variance swap rate $V_{0}^{T}$ , determined in such a way that the contract has zero value today.

The annualized realized variance of a stock price process $S$ for the period $[0, T]$ with business days $0 = t_{0} < \dots < t_{n} = T$ is usually defined as $R V^{0, T} := \frac{d}{n} i = 1 \sum n (lo g \frac{S _{t_{i}}}{S _{t_{i - 1}}})^{2}$ The constant $d$ denotes the number of trading days per year and is usually fixed to 252 so that $\frac{d}{n} \approx \frac{1}{T}$ . We assume the market is arbitrage-free and prices of traded instruments are represented as conditional expectations with respect to an equivalent pricing measure $Q$ .

A standard result gives that as $sup_{i = 1, \dots, n} ∣ t_{i} - t_{i - 1} ∣ ⟶ 0$ , we have

$i = 1 \sum n (lo g \frac{S _{t_{i}}}{S _{t_{i - 1}}})^{2} ⟶ ⟨ lo g S ⟩_{T} in probability$

when $(S_{t})_{t \geq 0}$ is a continuous semimartingale.

Approximating the realized variance by the quadratic variation of the log returns works very well for variance swaps, but care should be taken in practise if we price short dated non-linear payoffs on realized variance. Denote by $V_{t}^{T}$ , the price at time $t$ of a variance swap with maturity $T < \infty$ . It is given under $Q$ by $V_{t}^{T} = E_{t}^{Q} [R V^{0, T}] = E_{t}^{Q} [⟨ lo g S ⟩_{T}]$

We define the forward variance curve $(ξ^{T})_{T \geq 0}$ as $ξ_{t}^{T} := \partial_{T} V_{t}^{T}, T \geq t \geq 0$ Note that, if we assume that the S&PX index follows a diffusion process, $d S_{t} =$ $μ_{t} S_{t} d t + σ_{t} S_{t} d W_{t}$ with a general stochastic volatility process, $σ$ , the forward variance is given by $ξ_{t}^{T} = E_{t}^{Q} (σ_{T}^{2})$ It can be seen as the forward instantaneous variance for date $T$ , observed at $t$ . In particular $ξ_{t}^{t} = σ_{t}^{2}, \forall t \geq 0$

The current price of a variance swap, $V_{t}^{T}$ , is given in terms of the forward variances as $V_{t}^{T} = ⟨ lo g S ⟩_{t} + \int_{t}^{T} ξ_{t}^{u} d u$ The models used in practice are based on diffusion dynamics where forward variance curves are given as a functional of a finite-dimensional Markov-process: $ξ_{t}^{T} = G (T; t, Z_{t})$ where the function $G$ and the m-dimensional Markov-process $Z$ satisfy some consistency condition, which essentially ensures that for every fixed maturity $T > 0$ , the forward variance $(ξ_{t}^{T})_{t \leq T}$ is a martingale.

※ Pricing under rough volatility ※

ATM volatility skew

$ψ (τ) := \frac{\partial}{\partial k} σ_{BS} (k, τ)_{k = 0}$ 其中 $τ = T - t$ 是离到期日的时间， $k = lo g K / S_{t}$ 是log-strike．在传统随机波动率模型中， $ψ (τ)$ 对短期时间是常数，对长时间与 $τ$ 成反比．经验上观测到 $ψ (τ)$ 对某些 $0 < α < 1/2$ 与 $1/ τ^{α}$ 成比例．

forward variance curve

$v_{u} = σ_{u}^{2}$ 表示 $u > t$ 时刻瞬时方差．则 forward variance curve 为： $ξ_{t} (u) = E [v_{u} ∣ F_{t}], u \geq t$

Wick exponential

对零均值的 Gaussian R.V. ，其 Wick exponential 为 $E (Ψ) = exp (Ψ - \frac{1}{2} E [∣Ψ ∣^{2}])$

这里只作记号使用，不涉及其运算．

模型推导

Gatheral et al. (2014) 发现已实现方差（realized variance）与如下模型一致 $lo g σ_{t + Δ} - lo g σ_{t} = v (W_{t + Δ}^{H} - W_{t}^{H}) (1)$

其中 $W^{H}$ 是 fBm．This relationship was found to hold for all 21 equity indices in the Oxford-Man database, Bund futures, Crude Oil futures, and Gold futures. Perhaps this feature of the time series of volatility is universal?

考虑 fBm $W^{H}$ 的 Mandelbrot-Van Ness 表示 $W_{t}^{H} = C_{H} {\int_{- \infty}^{t} \frac{d W _{s}^{P}}{( t - s ) ^{γ}} - \int_{- \infty}^{0} \frac{d W _{s}^{P}}{( - s ) ^{γ}}} (2)$

其中 $γ = 1/2 - H$ ， $C_{H} = \frac{2 H Γ ( 3/2 - H )}{Γ ( H + 1/2 ) Γ ( 2 - 2 H )}$ 这样选取是为了保证 $E [W_{t}^{H} W_{s}^{H}] = \frac{1}{2} {t^{2 H} + s^{2 H} - ∣ t - s ∣^{2 H}}$

将（2）带入（1）并由 $v_{t} = σ_{t}^{2}$ ，可以得到 $v_{u}$ 基于 physical measure $P$ 的变化: $lo g v_{u} - lo g v_{t} = = = 2 v C_{H} {\int_{- \infty}^{u} \frac{d W _{s}^{P}}{( u - s ) ^{γ}} - \int_{- \infty}^{t} \frac{d W _{s}^{P}}{( t - s ) ^{γ}}} 2 ν C_{H} {\int_{t}^{u} \frac{1}{( u - s ) ^{γ}} d W_{s}^{P} + \int_{- \infty}^{t} [\frac{1}{( u - s ) ^{γ}} - \frac{1}{( t - s ) ^{γ}}] d W_{s}^{P}} : 2 v C_{H} {M_{t} (u) + Z_{t} (u)}$

注意到 $Z_{t} (u)$ 是 $F_{t}$ -可测，而 $M_{t} (u)$ 与 $F_{t}$ 独立且是 Gaussian with mean zero and variance $(u - t)^{2 H} / (2 H)$ ．用如下记号： $\tilde{W}_{t}^{P} (u) := 2 H \int_{t}^{u} \frac{d W _{s}^{P}}{( u - s ) ^{γ}} (3)$ 和 $M_{t} (u)$ 有相同分布，仅仅方差变为 $(u - t)^{2 H}$ ．记 $η := 2 v C_{H} / 2 H$ 则有 $2 v C_{H} M_{t} (u) =$ $η \tilde{W}_{t}^{P} (u)$ 且 $E^{P} [v_{u} ∣ F_{t}] = v_{t} exp {2 v C_{H} Z_{t} (u) + \frac{1}{2} η^{2} E \tilde{W}_{t}^{P} (u)^{2}}$ 结合 Wick expenential $v_{u} = v_{t} exp {η \tilde{W}_{t}^{P} (u) + 2 ν C_{H} Z_{t} (u)} = E^{P} [v_{u} ∣ F_{t}] E (η \tilde{W}_{t}^{P} (u)) (4)$ 这里，由 1 式可知 $Z_{t} (u)$ 依赖 $W^{P}$ 的整个历史，所以 $v_{u}$ 是 non-Markovian．而 2 式表示 the conditional distribution of $v_{u}$ depends on $F_{t}$ only through the instantaneous variance forecasts $E^{P} [v_{u} ∣ F_{t}], u > t .$

总结，得到如下模型基于实际概率测度 $P$ : $\frac{d S _{u}}{S _{u}} v_{u} = μ_{u} d u + v_{u} d Z_{u}^{P} = v_{t} exp {η \tilde{W}_{t}^{P} (u) + 2 v C_{H} Z_{t} (u)} (5)$ 其中，两个布朗运动 $Z^{P}$ 和 $W^{P}$ 相关系数为 $ρ$ ．

Pricing under Q

期权在 t 时刻的定价基于等价鞅测度 $Q \sim P$ on $[t, T]$ s.t. 资产价格过程 $S_{t}$ 在 $Q$ 下是一个鞅．

在固定的时间域 $[t, T]$ 中，通过 Girsanov 变换， $d Z_{u}^{Q} = d Z_{u}^{P} + \frac{μ _{u}}{v _{u}} d u, t \leq u \leq T$ 使得 $\frac{d S _{u}}{S _{u}} = v_{u} d Z_{u}^{Q}, t \leq u \leq T$

另一方面 $\tilde{W}_{t}^{P}$ 由 $W^{P}$ 而来，而 $W^{P}$ 是一个布朗运动与 $Z_{u}^{P}$ 以如下关系相关， $W^{P} = ρ Z^{P} + \overset{ρ}{ˉ} \overset{ˉ}{Z}^{P}, ρ^{2} + \overset{ρ}{ˉ}^{2} = 1$ 其中 $(Z^{P}, \overset{ˉ}{Z}^{P})$ 是一对独立的标准布朗运动．对第二项的一个标准的测度变换为 $d \overset{ˉ}{Z}_{u}^{Q} = d \overset{ˉ}{Z}_{u}^{P} + γ_{u} d u$ 其中 $γ = γ (u, ω)$ ，for $u \in [t, T]$ ，是一个合适的适应过程，称为波动率风险的市场价格．所以有 $d W_{u}^{Q} = d W_{u}^{P} + [ρ μ_{u} / v_{u} + \overset{ρ}{ˉ} γ_{u}] d u, t \leq u \leq T ，$ 将其重写为 $d W_{s}^{P} = d W_{s}^{Q} + λ_{s} d s$ 由 4 ，在 $P :$ 下， $v_{u} = E^{P} [v_{u} ∣ F_{t}] E (η \tilde{W}_{t}^{P} (u))$ 特别的， $E^{P} [v_{u} ∣ F_{t}]$ 适应于由 $W^{P}$ 生成的域流（和由 $W^{Q}$ 生成的域流一致）．把上式重写为 $v_{u} = E^{P} = E^{P} [v_{u} ∣ F_{t}] exp {η 2 H \times \int_{t}^{u} \frac{1}{( u - s ) ^{γ}} d W_{s}^{P} - \frac{η ^{2}}{2} (u - t)^{2 H}} [v_{u} ∣ F_{t}] E (η \tilde{W}_{t}^{Q} (u)) \times exp {η 2 H \int_{t}^{u} \frac{λ _{s}}{( u - s ) ^{γ}} d s} (6)$ 指数中的最后一项明显改变了 $v_{u}$ 的边缘分布．虽然 $v_{u}$ 在 $P$ 下的条件分布是对数正态的，它在 $Q$ 下不是对数正态．

rBergomi model

考虑最简单的测度变换， $d W_{s}^{P} = d W_{s}^{Q} + λ (s) d s$ assuming for simplicity, resp. as a first approximation, $λ (s)$ 是关于 $s$ 的确定性函数．则由 6 我们有 $v_{u} = E^{P} [v_{u} ∣ F_{t}] E (η \tilde{W}_{t}^{Q} (u)) \times exp {η 2 H \int_{t}^{u} \frac{1}{( u - s ) ^{γ}} λ (s) d s} = ξ_{t} (u) E (η \tilde{W}_{t}^{Q} (u)) (7)$ 其中 $ξ_{t} (u) = E^{Q} [v_{u} ∣ F_{t}]$ ．进一步有 forward variance curve $ξ_{t} (u) = E^{P} [v_{u} ∣ F_{t}] exp {η 2 H \int_{t}^{u} \frac{1}{( u - s ) ^{γ}} λ (s) d s}$

是如下两项的乘积： $E^{P} [v_{u} ∣ F_{t}]$ 依赖于驱动布朗运动的历史；另一项依赖于风险价格 $λ (s)$ .

模型 7 是 non-Markovian 因为 $v_{t} : E^{Q} [v_{u} ∣ F_{t}] \neq =$ $E^{Q} [v_{u} ∣ v_{t}]$ ．

※on deep calibration of rough sv model※

一．介绍

Bayer C, Horvath B, Muguruza A, et al. On deep calibration of (rough) stochastic volatility models[J]. arXiv preprint arXiv:1908.08806, 2019.

从隐含波动率按 moneyness 和 maturity 的变化可以观察到存在着著名的 smiles 和 at-the-money(ATM) skews 现象，与 BS 公式相悖．特别的，Bayer, Friz, and Gatheral 经验性地表明 ATM skew 符合如下形式：

$\frac{\partial}{\partial m} σ_{iv} (m, T) \sim T^{- 0.4}, T \to 0 (1)$ 其中 $m$ 为 $lo g$ moneynessand ， $T$ 为 time to maturity .

根据 Gatheral ，扩散的随机波动率模型不能复现当 time to maturity 趋于零时 volatility skew 的幂指数爆炸现象，反而表现为常数现象．

RSV 可定义为一族连续路径的随机波动率模型，其瞬时波动率由一个 Holder 正则性比布兰运动小的随机过程驱动，通常刻画为 Hurst 系数 H<1/2 的分形布朗运动．

这种范式转变的证据现在是 overwhelming ，一方面在物理测度下，时间序列分析表明对数已实现波动率的 Holder 正则为 0.1 阶；另一方面，在定价测度下经验性观察也表明在零附近由模型能够生成 power-law behaviour 的 volatility skew．

模型的一大难点来自于分形布朗运动的非马尔科夫性．

本文介绍两种方法

one-step approach : 直接学习从隐含波动率曲面到模型参数的映射，
two-step approach : 第一步学习从模型参数到期权价格的映射，然后根据实际市场价格校准模型．又分为 point-wise approach 和 grid-wise approach，前者将行权价和到期日作为输入，后者事先设定好这两项．

二．模型校准概述（未使用神经网络）

校准（calibration）意思是调整模型参数以使得模型曲面符合由欧式期权通过BS公式计算出的经验隐含波动率曲面．

假设模型有一个参数集 $Θ$ 决定， i.e.，由 $θ \in Θ$ ．进一步，我们假设期权由参数集 $ζ \in Z$ 决定．E.g.，对看涨看跌期权我们有 $ζ = (T, k)$ ，分别为到期日和 log-moneyness．有些参数由市场观测得到，如现价、利率等，不在校准过程中．定价映射为 $(θ, ζ) \mapsto P (θ, ζ)$ 带参数 $θ$ 的模型中带参数 $ζ$ 的期权的价格．我们通过 $P (ζ)$ 给定了有限子集 $ζ \in Z^{'} \subset Z$ 以及所有可能的期权参数对应的期权价格．校准是决定模型参数以使模型价格 $(P (θ, ζ))_{ζ \in Z^{'}}$ 和市场价格 $(P (ζ))_{ζ \in Z^{'}}$ 在给定距离度量下最小，i.e.: $θ = θ \in Θ argmin δ ((P (θ, ζ))_{ζ \in Z^{'}}, (P (ζ))_{ζ \in Z^{'}}) .$

事实上，最常用的 $δ$ 是加权最小二乘： $θ = θ \in Θ argmin ζ \in Z^{'} \sum w_{ζ} (P (θ, ζ) - P (ζ))^{2}$ 这里的权重 $w_{ζ}$ 反映了 $ζ$ 对应期权的重要性以及 $P (ζ)$ 的可靠性．例如可以选择 bid-ask spread 的倒数．

只要模型参数比 $∣ Z^{'} ∣$ 少，此时就是超定(overdetermined)的非线性最小二乘问题，通常采用数值迭代的方法解决，如 Levenberg-Marquardt（LM）算法．

rBergomi ：表示为 $M^{rBergomi} (Θ^{rBergomi})$ ，参数 $θ = (ξ_{0}, η, ρ, H) \in Θ^{rBergomi}$ ，例如可以设为 $Θ^{rBergomi} = R_{> 0} \times R_{> 0} \times [- 1, 1] \times] 0, 1/2 [$ 模型基于如下系统 $d X_{t} V_{t} = - \frac{1}{2} V_{t} d t + V_{t} d W_{t}, for t > 0, X_{0} = 0 = ξ_{0} (t) E (2 H η \int_{0}^{t} (t - s)^{H - 1/2} d Z_{s}), for t > 0, V_{0} = v_{0} > 0$ 其中 $H$ 是 Hurst 系数， $η > 0$ ， $E (\cdot)$ 是 Wick exponential， $ξ_{0} (\cdot) > 0$ 表示初始forward variance curve， $W$ 和 $Z$ 是以 $ρ \in [- 1, 1]$ 相关的布朗运动．

三．深度校准

3.1 one-step approach

Hernandez A. Model calibration with neural networks[J]. Available at SSRN 2812140, 2016.

直接学习校准过程，即将模型参数视作市场价格（隐含波动率）的函数，i.e. $Π^{- 1} : (P (ζ))_{ζ \in Z^{'}} \mapsto θ$ 更具体地，训练神经网络基于标签数据 $(x_{i}, y_{i}) ， i = 1 ， \dots, N$ ， $x_{i} = (P (ζ))_{ζ \in Z_{i}^{'}}$ 及其对应标签 $y_{i} = θ_{i}$

3.2 two-step approach

首先学习定价映射，将模型参数映射为市场价格（或隐含波动率），然后使用标准校准方法进行校准．我们用 $P (θ, ζ) \approx P (θ, ζ)$ 表示 $P$ 是 $P$ 的通过神经网络得到的近似．然后第二步我们进行校准 $θ = θ \in Θ argmin δ ((P (θ, ζ))_{ζ \in Z^{'}}, (P (ζ))_{ζ \in Z^{'}}) (2)$

两步方法相较而言最大的好处如下：

神经网络只负责期权定价，所以能用人工数据来训练．
自然地将误差分为定价误差和模型误差．神经网络表现和模型对市场适应性做出的调整相互独立．

3.2.1 two-step approach: 逐点训练（pointwise）和基于网格(grid-based)训练

In this section, we examine its advantages and present an analysis of the objective function with the goal to enhance learning performance. Within this framework, the pointwise approach has the ability to asses the quality of $P$ using Monte Carlo or PDE methods, and indeed it is superior training in terms of robustness.

Pointwise learning

step 1:学习映射 $P (θ, T, k) = σ^{M (θ)} (T, k)$ 即上述（2）式令 $ζ = (T, k)$ ．在标准化期权（vanilla option， $(ζ = (T, k))$ ）情况下，我们可以直接学习隐含波动率映射 $map \tilde{σ}^{M (θ)} (T, k)$ ，而不是期权定价的映射 $P (θ, T, k)$ ．用 $F (w; θ, ζ)$ 表示神经网络，最优化问题如下： $ω := w \in R^{n} argmin i = 1 \sum N_{Train} η_{i} (F (w; θ_{i}, T_{i}, k_{i}) - σ^{M} (θ_{i}, T_{i}, k_{i}))^{2}$

Step 2: 解决经典的模型校准步骤： $θ := θ \in Θ argmin j = 1 \sum m β_{j} (σ^{M (θ)} (θ, T_{j}, k_{j}) - σ_{BS}^{MKT} (k_{j}, T_{j}))^{2}$

这里 $P (θ, T, k)$ 或者 $σ^{M (θ)} (T, k)$ 被替换成 step1 中的近似网络 $F (ω; θ, T, k)$ ．

第一步中，关键在于训练数据和网络结构的选择．训练数据在于选择 $θ$ 和 $ζ$ $(= (T, k)$ 的‘先验’的、有实际意义的分布．

Implicit & grid-based learning

用 $Δ := {k_{i}, T_{j}}_{i = 1, j = 1}^{n,} m_{i = 1}$ 记关于到期日和行权价的网格．则

step 1：学习映射 $F (θ) = {σ_{BS}^{M (θ)} (T_{i}, k_{j})}_{i = 1, j = 1}^{n, m}$ ，输入是 $θ \in Θ$ ，输出是 ${σ_{BS}^{M (θ)} (T_{i}, k_{j})}_{i = 1, j = 1}^{n, m}$ 这样的 $n \times m$ 网格． $F$ 取值在 $R^{L}$ 中，其中 $L =$ strikes $\times$ maturities $= nm .$ 最优化问题变为如下： $ω := w \in R^{n} argmin i = 1 \sum N_{Train}^{reduced} j = 1 \sum L η_{j} (F (θ_{i})_{j} - σ^{M} (θ_{i}, T_{j}, k_{j}))^{2}$ 其中 $N_{Train} = N_{Train}^{reduced} \times L$ ． Step 2: $θ := θ \in Θ argmin i = 1 \sum L β_{j} (F (θ)_{i} - σ_{BS}^{MKT} (T_{i}, k_{i}))^{2}$

这里期权的参数 $ζ = (T, k)$ 是固定了的，不再是学习的一部分．

3.2.2 pointwise versus grid-based

最大的不同在于 grid-based 在遇到不在网格上的 T,K 时需要手动插值
grid-based 方法自然地有 reduction of variance ，
pointwise 中对使样本符合实际金融数据的操作更简单，改变采样的分布．而 grid-wise 则是通过改变权重或者网格密度．
grid-based 方法可以看做是一种降低维度的操作，将输入的维度转移到了输出的维度．

四．Pratical implementation

4.1 网络结构与训练

隐藏层为 3 层的全连接前馈神经网络，每层 30 个结点
输入维度记 $= n$
输出维度为 $= 11 \times 8$
总共有 $(n + 1) \times 30 + 3 \times (1 + 30) \times 30 + (30 + 1) \times 88 = 30 n + 5548$ 个参数．
激活函数选择 Elu， $σ_{Elu} = α (e^{x} - 1)$ ，梯度下降选择 Adam．

4.2 校准

使用第二节中讲的 LM 等算法．

五．数值实验

5.1定价近似网络的速度和精确度

※Deep learning volatility: a deep neural network perspective on pricing and calibration in (rough）volatility models※

Horvath B, Muguruza A, Tomas M. Deep learning volatility: a deep neural network perspective on pricing and calibration in (rough) volatility models[J]. Quantitative Finance, 2021, 21(1): 11-27.

Github 代码

fBm的 Monte-Carlo 模拟

1.理论基础

Horvath B, Jacquier A J, Muguruza A. Functional central limit theorems for rough volatility[J]. Available at SSRN 3078743, 2017.

Notations: 在单位区间 $I := [0, 1] 上, C (I)$ 表示连续函数空间， $C^{α} (I)$ 表示 $α$ -Hölder 连续函数空间， $α \in (0, 1)$ ． $C^{1} (I)$ 和 $C_{b}^{1} (I)$ 是 $I$ 上连续可微和有界连续可微函数空间．

1.1. Hölder spaces and fractional operators

For $β \in (0, 1]$ , the $β$ -Hölder space $C^{β} (I)$ , with the norm $∥ f ∥_{β} := ∣ f ∣_{β} + ∥ f ∥_{\infty} = t \neq = s t , s \in I sup \frac{∣ f ( t ) - f ( s ) ∣}{∣ t - s ∣ ^{β}} + t \in I max ∣ f (t) ∣$ is a non-separable Banach space. Following the spirit of Riemann-Liouville fractional operators recalled in Appendix $A$ , we introduce the class of Generalised Fractional Operators (GFO). For any $λ \in (0, 1)$ we introduce the intervals $ℜ^{λ} := {α \in (- 1, 1) such that α + λ \in (0, 1)}$ $ℜ_{+}^{λ} := ℜ^{λ} \cap (0, 1), ℜ_{-}^{λ} := ℜ^{λ} \cap (- 1, 0)$ , and the space $L^{α} := {u \mapsto u^{α} L (u) : L \in C_{b}^{1} (I)}$ , for any $α \in ℜ^{λ}$

Definition 1.1. For any $λ \in (0, 1)$ and $α \in R^{λ}$ , the GFO associated to $g \in L^{α}$ is defined on $C^{λ} (I)$ as $(G^{α} f) (t) := {\int_{0}^{t} f (s) \frac{d}{d t} g (t - s) d s, \frac{d}{d t} \int_{0}^{t} f (s) g (t - s) d s, if α \in [0, 1 - λ) if α \in (- λ, 0) (1.1)$ We shall further use the notation $G (t) := \int_{0}^{t} g (u) d u$ , for any $t \in I$ . Of particular interest in mathematical finance are the following kernels and operators: $R i e mann - L i o uv i ll e : G amma f r a c t i o na l : P o w er - l a w : g (u) = u^{α}, g (u) = u^{α} e^{β u}, g (u) = u^{α} (1 + u)^{β - α}, f or α \in (- 1, 1) f or α \in (- 1, 1), β < 0 f or α \in (- 1, 1), β < - 1 (1.2)$

Proposition 1.2. For any $λ \in (0, 1)$ and $α \in R^{λ}$ , the operator $G^{α} : C^{λ} (I) \to C^{λ + α} (I)$ is continuous.

We develop here an approximation scheme for the following system, generalising the concept of rough volatility in the context of mathematical finance, where the process $X$ represents the dynamics of the logarithm of a stock price process: $d X_{t} V_{t} = - \frac{1}{2} V_{t} d t + V_{t} d B_{t}, = Φ (G^{α} Y) (t), X_{0} = 0 V_{0} > 0 (1.3)$ with $α \in (- \frac{1}{2}, \frac{1}{2})$ , and $Y$ the (strong) solution to the stochastic differential equation $(1.4)$ $d Y_{t} = b (Y_{t}) d t + a (Y_{t}) d W_{t}, Y_{0} \in D_{Y} (1.4)$

where $D_{Y}$ denotes the state space of the process $Y$ , usually $R$ or $R_{+} .$ The two Brownian motions $B$ and $W$ , defined on a common filtered probability space $(Ω, F, (F_{t})_{t \in I}, P)$ , are correlated by the parameter $ρ \in [- 1, 1]$ , and the functional $Φ$ is assumed to be smooth on $C^{1} (I) .$ This is enough to ensure that the first stochastic differential equation is well defined. It remains to formulate the precise definition for $G^{α} Y$ (Proposition 1.4) to fully specify the system (1.3) and clarify the existence of solutions. Existence and (strong) uniquess of a solution to the second $SDE$ in (1.4) is guaranteed by the following standard assumption :

Assumption 1.3. There exist $C_{b}, C_{a} > 0$ such that, for all $x, y \in D_{Y}$ $∣ b (x) - b (y) ∣ \leq C_{b} ∣ x - y ∣ and ∣ a (x) - a (y) ∣ \leq C_{a} ∣ x - y ∣$

Proposition 1.4. For any $α \in R^{1/2}$ ，the equality $(G^{α} W) (t) = \int_{0}^{t} g (t - s) d W_{s}$ holds almost surely for $t \in I$ ．

Example 1.5. This example is the rough Bergomi model introduced by Bayer, Friz and Gatheral, where $V_{t} = ξ_{0} (t) E (2 ν C_{H} \int_{0}^{t} (t - s)^{α} d W_{s})$ with $V_{0}, ν, ξ_{0} (\cdot) > 0, α \in ℜ^{1/2}$ and $E (\cdot)$ is the Wick stochastic exponential. This corresponds exactly to $(1.3)$ with $g (u) \equiv u^{α}, Y = W$ and $Φ (φ) (t) := ξ_{0} (t) exp (2 ν C_{H} φ (t)) exp {- 2 ν^{2} C_{H}^{2} \int_{0}^{t} (t - s)^{2 α} d s}$

1.2 The approximation scheme

We now move on to the core of the project, namely an approximation scheme for the system (1.3). The basic ingredient to construct approximating sequences is a family of iid random variables, which satisfies the following assumption: Assumption 1.6. The family $(ξ_{i})_{i \geq 1}$ forms an iid sequence of centered random variables with finite moments of all orders and $E [ξ_{1}^{2}] = σ^{2} > 0$

Following Donsker and Lamperti, we first define, for any $ω \in Ω, n \geq 1, t \in I$ , the approximating sequence for the driving Brownian motion $B$ as $B_{n} (t) := \frac{1}{σ n} k = 1 \sum ⌊ n t ⌋ ξ_{k} + \frac{n t - ⌊ n t ⌋}{σ n} ξ_{⌊ n t ⌋ + 1} (1.5)$ As will be explained later, a similar construction holds to approximate the process $Y$ : $Y_{n} (t) := \frac{1}{n} k = 1 \sum ⌊ n t ⌋ b (Y_{n}^{k - 1}) + \frac{n t - ⌊ n t ⌋}{n} b (Y_{n}^{⌊ n t ⌋}) + \frac{1}{σ n} k = 1 \sum ⌊ n t ⌋ a (Y_{n}^{k - 1}) ζ_{k} + \frac{n t - ⌊ n t ⌋}{σ n} a (Y_{n}^{⌊ n t ⌋}) ζ ⌊ n t ⌋ + 1 (1.6)$ where $Y_{n}^{k} := Y_{n} (t_{k})$ and $T_{n} := {t_{k} = \frac{k}{n}} .$ Here ${ξ}_{i = 1}^{⌊ n t ⌋}$ and ${ζ}_{i = 1}^{⌊ n t ⌋}$ satisfy Assumption $1.6$ , with appropriate correlation structure between the pairs ${(ζ_{i}, ξ_{i})}_{i = 1}^{⌊ n t ⌋}$ that will be made precise later. We shall always use $(ξ_{i})$ to denote the sequence generating $B$ and $(ζ_{i})$ the one generating $W$ . Consequently, we deduce an approximating scheme (up to the interpolating term which decays to zero by Chebyshev's inequality) for $X$ as $X_{n} (t) := - \frac{1}{2 n} k = 1 \sum ⌊ n t ⌋ Φ (G^{α} Y_{n}) (t_{k}) + \frac{1}{σ n} k = 1 \sum ⌊ n t ⌋ Φ (G^{α} Y_{n}) (t_{k}) (B_{n}^{k + 1} - B_{n}^{k}) (1.7)$ All the approximations above, as well as all the convergence statements below should be understood pathwise, but we omit the $ω$ dependence in the notations for clarity. The main result here is a convergence statement about the approximating sequence $(X_{n})_{n \geq 1}$ ． As usual in weak convergence analysis, convergence is stated in the Skorokhod space $(D (I), ∥ \cdot ∥_{D})$ of càdlàg processes equipped with the Skorokhod topology. Theorem 1.7. The sequence $(X_{n})_{n \geq 1}$ converges weakly to $X$ in $(D (I), ∥ \cdot ∥_{D})$ . The construction of the proof allows to extend the convergence to the case where $Y$ is a $d$ -dimensional diffusion without additional work. The proof of the theorem requires a certain number of steps: we start with the convergence of the approximation $(Y_{n})$ in some Hölder space, which we translate, first into convergence of the stochastic integral in $(1.3)$ , then, by continuity of the mapping $Φ$ , into convergence of the sequence $(Φ (G^{α} Y_{n}))$ . All these ingredients are detailed in Section 1.3 below. Once this is achieved, the proof of the theorem itself is relatively straightforward.

1.3. Monte-Carlo.

Theorem 1.7 introduces the theoretical foundations of Monte-Carlo methods (in particular for path-dependent options) for rough volatility models. In this section we give a general and easy-to-understand recipe to implement the class of rough volatility models (1.3). For the numerical recipe to be as general as possible, we shall consider the general time partition $T := {t_{i} = \frac{i T}{n}}_{i = 0, \dots, n}$ on $[0, T]$ with $T > 0$ .

Algorithm 1.8 (Simulation of rough volatility models). (1) Simulate two $N (0, 1)$ matrices ${ξ_{j, i}}_{j = 1, \dots, M} i = 1, \dots, n$ and ${ζ_{j, i}}_{j = 1, \dots, M} i = 1, \dots, n$ with $corr (ξ_{j, i}, ζ_{j, i}) = ρ$ ; (2) simulate M paths of $Y_{n}$ viad $Y_{n}^{j} (t_{i}) = \frac{T}{n} k = 1 \sum i b (Y_{n}^{j} (t_{k - 1})) + \frac{T}{n} k = 1 \sum i a (Y_{n}^{j} (t_{k - 1})) ζ_{j, k}, i = 1, \dots, n and j = 1, \dots, M$ and also compute $Δ Y_{n}^{j} (t_{i}) := Y_{n}^{j} (t_{i}) - Y_{n}^{j} (t_{i - 1}), i = 1, \dots, n and j = 1, \dots, M$ (3) Simulate $M$ paths of the fractional driving process $((G^{α} Y_{n}) (t))_{t \in T}$ using $(G^{α} Y_{n})^{j} (t_{i}) := k = 1 \sum i g (t_{i - k + 1}) Δ Y_{n}^{j} (t_{k}) = k = 1 \sum i g (t_{k}) Δ Y_{n}^{j} (t_{i - k + 1}), i = 1, \dots, n and j = 1, \dots, M$ The complexity of this step is in general of order $O (n^{2})$ (see Appendix $[B$ for details). However, this step is easily implemented using discrete convolution with complexity $O (n lo g n)$ (see Algorithm [B.4 in Appendix $[B$ for details in the implementation). With the vectors $g := (g (t_{i}))_{i = 1, \dots, n}$ and $Δ Y_{n}^{j} :=$ $(Δ Y_{n}^{j} (t_{i}))_{i = 1, \dots, n}$ for $j = 1, \dots, M$ , we can write $(G^{α} Y_{n})^{j} (T) = \frac{T}{n} (g * Δ Y_{n}^{j})$ , for $j = 1, \dots, M$ , where $*$ represents the discrete convolution operator. (4) Use the forward Euler scheme to simulate the log-stock process, for all $i = 1, \dots, n, j = 1, \dots, M$ , as $X^{j} (t_{i}) = X^{j} (t_{i - 1}) - \frac{1}{2} \frac{T}{n} k = 1 \sum i Φ (G^{α} Y_{n})^{j} (t_{k - 1}) + \frac{T}{n} k = 1 \sum i Φ (G^{α} Y_{n})^{j} (t_{k - 1}) ξ_{j, k}$

Remark:

When $Y = W$ , we may skip step (2) and replace $Δ Y_{n}^{j} (t_{i})$ by $T / n ζ_{i, j}$ on step (33).
Step (3) may be replaced by the Hybrid scheme algorithm 11 only when $Y = W$ .

Antithetic variates in Algorithm 1.8 are easy to implement as it suffices to consider the uncorrelated random vectors $ζ_{j} := (ζ_{j, 1}, ζ_{j, 2}, \dots, ζ_{j, n})$ and $ξ_{j} := (ξ_{j, 1}, ξ_{j, 2}, \dots, ξ_{j, n})$ , for $j = 1, \dots, M .$ Then $(ρ ξ_{j} + \overset{ρ}{ˉ} ζ_{j}, ξ_{j}), (ρ ξ_{j} -$ $\overset{ρ}{ˉ} ζ_{j}, ξ_{j}), (- ρ ξ_{j} - \overset{ρ}{ˉ} ζ_{j}, - ξ_{j})$ and $(- ρ ξ_{j} + \overset{ρ}{ˉ} ζ_{j}, - ξ_{j})$ , for $j = 1, \dots, M$ , constitute the antithetic variates, which significantly improves the performance of the Algorithm 1.8 by reducing memory requirements, reducing variance and accelerating execution by exploiting symmetry of the antithetic random variables.

1.3.1 Enhancing performance. A standard practice in Monte-Carlo simulation is to match moments of the approximating sequence with the target process. In particular, when the process is Gaussian, matching first and second moments suffices. We only illustrate this approximation for Brownian motion: the left-point approximation may be modified to match moments as $(G^{α} W) (t_{i}) \approx \frac{1}{σ n} k = 1 \sum i g (t_{k}^{*}) ζ_{k}, for i = 0, \dots, n (1.8)$ where $t_{k}^{*}$ is chosen optimally. Since the kernel $g (\cdot)$ is deterministic, there is no confusion with the Stratonovich stochastic integral, and the resulting approximation will always converge to the Itô integral. The first two moments of $G^{α} W$ read $E ((G^{α} W) (t)) = 0 and V ((G^{α} W) (t)) = \int_{0}^{t} g (t - s)^{2} d s$ The first moment of the approximating sequence 1.8 is always zero, and the second moment reads $V (\frac{1}{σ n} k = 1 \sum j - 1 g (t_{k}^{*}) ζ_{k}) = \frac{1}{n} k = 1 \sum j - 1 g (t_{k}^{*})^{2}$ Equating the theoretical and approximating quantities we obtain $\frac{1}{n} g (t_{k}^{*})^{2} d s = \int_{t_{k - 1}}^{t_{k}} g (t - s)^{2} d s$ for $k = 1, \dots, n$ , so that the optimal evaluation point can be computed as $g (t_{k}^{*}) = n \int_{t_{k - 1}}^{t_{k}} g (t - s)^{2} d s, for any k = 1, \dots, n$ In the Riemann-Liouville fractional Brownian motion case, $g (u) = u^{H - 1/2}$ , and the optimal point can be computed in closed form as $t_{k}^{*} = (\frac{n}{2 H} [(t - t_{k - 1})^{2 H} - (t - t_{k})^{2 H}])^{1/ (2 H - 1)}, for each k = 1, \dots, n$

1.3.2 Reducing Variance.

As Bayer, Friz and Gatheral, a major drawback in simulating rough volatility models is the very high variance of the estimators, so that a large number of simulations are needed to produce a decent price estimate. Nevertheless, the rDonsker scheme admits a very simple conditional expectation technique which reduces both memory requirements and variance while also admitting antithetic variates. This approach is best suited for calibrating European type options. We consider $F_{t}^{B} = σ (B_{s} : s \leq t)$ and $F_{t}^{W} = σ (W_{s} : s \leq t)$ the natural filtrations generated by the Brownian motions $B$ and $W .$ In particular the conditional variance process $V_{t} ∣ F_{t}^{W}$ is deterministic. As discussed by Romano and Touzi, and recently adapted to the rBergomi case by McCrickerd and Pakkanen, we can decompose the stock price process as $e^{X_{t}} = E (ρ \int_{0}^{t} Φ (G^{α} Y) (s) d B_{s}) E (1 - ρ^{2} \int_{0}^{t} Φ (G^{α} Y) (s) d B_{s}^{⊥}) := e^{X_{t}^{1}} e^{X_{t}^{2}}$ and notice that $X_{t} ∣ (F_{t}^{W} \land F_{0}^{B}) \sim N (X_{t}^{1} - (1 - ρ^{2}) \int_{0}^{t} Φ (G^{α} Y) (s) d s, (1 - ρ^{2}) \int_{0}^{t} Φ (G^{α} Y) (s) d s)$ Thus $exp (X_{t})$ becomes log-normal and the Black-Scholes closed-form formulae are valid here (European, Barrier options, maximum,...). The advantage of this approach is that the orthogonal Brownian motion $B^{⊥}$ is completely unnecessary for the simulation, hence the generation of random numbers is reduced to a half, yielding proportional memory saving. Not only this, but also this simple trick reduces the variance of the Monte-Carlo estimate, hence fewer simulations are needed to obtain the same precision. We present a simple algorithm to implement the rDonsker with conditional expectation and assuming that $Y = W$ .

Algorithm 1.9 (Simulation of rough volatility models with Brownian drivers). Consider the equidistant grid $T$ . (1) Draw a random matrix ${ζ_{j, i}}_{j = 1, \dots, M /2}$ with unit variance, and create antithetic variates ${- ζ_{j, i}}_{j = 1, \dots, M /2}$ ; (2) Create a correlated matrix ${ξ_{j, i}}$ as above; (3) Simulate $M$ paths of the fractional driving process $G^{α} W$ using discrete convolution: $(G^{α} W)^{j} (T) = \frac{T}{n} (g * ζ_{j}), j = 1, \dots, M$ and store in memory $(1 - ρ^{2}) \int_{0}^{T} (G^{α} W)^{j} (s) d s \approx (1 - ρ^{2}) \frac{T}{n} \sum_{k = 0}^{n - 1} (G^{α} W)^{j} (t_{k}) =: Σ^{j}$ for each $j = 1, \dots, M$ (4) use the forward Euler scheme to simulate the log-stock process, for each $i = 1, \dots, n, j = 1, \dots, M$ , as $X^{j} (t_{i}) = X^{j} (t_{i - 1}) - \frac{ρ ^{2}}{2} \frac{T}{n} k = 1 \sum i Φ (G^{α} W)^{j} (t_{k - 1}) + ρ \frac{T}{n} k = 1 \sum i Φ (G^{α} W)^{j} (t_{k - 1}) ξ_{j, i}$ (5) Finally, we have $X^{j} (T) \sim N (X_{T}^{j} - Σ^{j}, Σ^{j})$ for $j = 1, \dots, M;$ we may compute any option using the Black-Scholes formula. For instance a Call option with strike $K$ would be given by $C^{j} (K) =$ $exp (X_{T}^{j}) N (d_{1}^{j}) - K N (d_{2}^{j})$ for $j = 1, \dots, M$ , where $d_{1}^{j} := \frac{1}{Σ ^{j}} (X_{T}^{j} - lo g (K) + \frac{1}{2} Σ^{j})$ and $d_{2}^{j} = d_{1}^{j} - Σ^{j}$ Thus, the output of the model would be $C (K) = \frac{1}{M} \sum_{k = 1}^{M} C^{j} (K)$

The algorithm is easily adapted to the case of general diffusions $Y$ as drivers of the volatility (see Algorithm 1.8 step 2). Algorithm 1.8 is obviously faster than 1.9, especially when using control variates. Nevertheless, with the same number of paths, Algorithm 1.9 remarkably reduces the Monte-Carlo variance, meaning in turn that fewer simulations are needed, making it very competitive for calibration.

2.传统cholesky分解法模拟

If you need to generate $n$ correlated Gaussian distributed random variables $Y \sim N (μ, Σ)$ where $Y = (Y_{1}, \dots, Y_{n})$ is the vector you want to simulate, $μ = (μ_{1}, \dots, μ_{n})$ the vector of means and $Σ$ the given covariance matrix, 1.you first need to simulate a vector of uncorrelated Gaussian random variables, $Z$ 2.then find a square root of $Σ$ , i.e. a matrix $C$ such that $CC^{⊤} = Σ$ . Your target vector is given by $Y = μ + CZ$ A popular choice to calculate $C$ is the Cholesky decomposition.

而对于本 rBergomi 模型， $S_{t} v_{u} = S_{0} E (\int_{0}^{t} v_{u} d Z_{u}) = ξ_{0} (u) E (η 2 H \int_{0}^{u} \frac{1}{( u - s ) ^{γ}} d W_{s}) = ξ_{0} (u) E (η \tilde{W}_{u})$

where $\tilde{W}$ is a Volterra processt with the scaling property $Var [\tilde{W}_{u}] = u^{2 H}$ . So far $\tilde{W}$ behaves just like $fBm$ . However, the dependence structure is different. Specifically, for $v > u$ $E [\tilde{W}_{v} \tilde{W}_{u}] = u^{2 H} G (\frac{v}{u})$ where, for $x \geq 1$ and with $γ = 1/2 - H$ , $G (x) = 2 H \int_{0}^{1} \frac{d s}{( 1 - s ) ^{γ} ( x - s ) ^{γ}} = \frac{1 - 2 γ}{1 - γ} x^{γ}_{2} F_{1} (1, γ, 2 - γ, x)$ where $_{2} F_{1} (\cdot)$ denotes the confluent hypergeometric function. Remark $4.1$ The dependence structure of the Volterra process $\tilde{W}$ is markedly different from that of $fBm$ with the MolchanGolosov kernel $K (u, s)$ given by $K (u, s) = c_{H} \frac{1}{( u - s ) ^{γ}}_{2} F_{1} (1/2 - H, H - 1/2, H + 1/2, \frac{s - u}{s}) 0 < s < t$ for some constant $c_{H} .$ In particular, for small $H$ , correlations drop precipitously as the ratio $u / v$ moves away from 1 .

We also need covariances of the Brownian motion $Z$ with the Volterra process $\tilde{W}$ . With $v \geq u$ , these are given by $E [\tilde{W}_{v} Z_{u}] = ρ D_{H} {v^{H + 1/2} - (v - u)^{H + 1/2}}$ and $E [Z_{v} \tilde{W}_{u}] = ρ D_{H} u^{H + 1/2}$ where for future convenience, we have defined the constant, $D_{H} = \frac{2 H}{H + 1/2}$ These two formulae may be conveniently combined as $E [\tilde{W}_{v} Z_{u}] = ρ D_{H} {v^{H + 1/2} - (v - min (u, v))^{H + 1/2}}$ Lastly, of course, for $v \geq u, E [Z_{v} Z_{u}] = u$ . With $m$ the number of time steps and $n$ the number of simulations, our rBergomi model simulation algorithm may then be summarized as follows.

Construct the joint covariance matrix for the Volterra process $\tilde{W}$ and the Brownian motion $Z$ and compute its Cholesky decomposition.
For each time, generate iid normal random vectors and multiply them by the lower triangular matrix obtained by the Cholesky decomposition to get a $m \times 2 n$ matrix of paths of $\tilde{W}$ and $Z$ with the correct joint marginals.
With these paths held in memory, we may evaluate the expectation under $Q$ of any payoff of interest.

we simulate the process $\int_{0}^{t} (t - s)^{H - 1/2} d W_{s}$

import numpy as np
import matplotlib.pyplot as plt
import scipy.special as special
def fBm_path_chol(grid_points, M, H, T):
    """
    @grid_points: # points in the simulation grid
    @H: Hurst Index
    @T: time horizon
    @M: # paths to simulate
    """
    
    assert 0<H<1.0
    
    ## Step1: create partition 
    
    X=np.linspace(0, 1, num=grid_points)
    
    # get rid of starting point
    X=X[1:grid_points]
    
    ## Step 2: compute covariance matrix
    Sigma=np.zeros((grid_points-1,grid_points-1))
    for j in range(grid_points-1):
        for i in range(grid_points-1):
            if i==j:
                Sigma[i,j]=np.power(X[i],2*H)/2/H
            else:
                s=np.minimum(X[i],X[j])
                t=np.maximum(X[i],X[j])
                Sigma[i,j]=np.power(t-s,H-0.5)/(H+0.5)*np.power(s,0.5+H)*special.hyp2f1(0.5-H, 0.5+H, 1.5+H, -s/(t-s))
        
    ## Step 3: compute Cholesky decomposition
    
    P=np.linalg.cholesky(Sigma)
    
    ## Step 4: draw Gaussian rv
    
    Z=np.random.normal(loc=0.0, scale=1.0, size=[M,grid_points-1])
    
    ## Step 5: get V
    
    W=np.zeros((M,grid_points))
    for i in range(M):
        W[i,1:grid_points]=np.dot(P,Z[i,:])
        
    #Use self-similarity to extend to [0,T] 
    
    return W*np.power(T,H)

3.rDonker方法

def fBm_path_rDonsker(grid_points, M, H, T, kernel="optimal"):
    """
    @grid_points: # points in the simulation grid
    @H: Hurst Index
    @T: time horizon
    @M: # paths to simulate
    @kernel: kernel evaluation point use "optimal" for momen-match or "naive" for left-point
    """
    
    assert 0<H<1.0
    
    ## Step1: create partition 
    dt=1./(grid_points-1)
    X=np.linspace(0, 1, num=grid_points)
    
    # get rid of starting point
    X=X[1:grid_points]
    
    ## Step 2: Draw random variables
    
    dW = np.power(dt, H) *np.random.normal(loc=0, scale=1, size=[M, grid_points-1])
        
    ## Step 3: compute the kernel evaluation points
    i=np.arange(grid_points-1) + 1
    # By default use optimal moment-matching
    if kernel=="optimal":
        opt_k=np.power((np.power(i,2*H)-np.power(i-1.,2*H))/2.0/H,0.5)
    # Alternatively use left-point evaluation
    elif kernel=="naive" : 
        opt_k=np.power(i,H-0.5)
    else:
        raise NameError("That was not a valid kernel")
    
    
    ## Step 4: Compute the convolution
    
    Y = np.zeros([M, n])
    for i in range(int(M)):
        Y[i, 1:n] = np.convolve(opt_k, dW[i, :])[0:n - 1]
        
    #Use self-similarity to extend to [0,T] 
    
    return Y*np.power(T,H)

Github 代码

※使用GAN对LSV模型的校准※

Cuchiero C, Khosrawi W, Teichmann J. A generative adversarial network approach to calibration of local stochastic volatility models[J]. Risks, 2020, 8(4): 101.

This means parameterizing the model pool in a way which is accessible for machine learning techniques and interpreting the inverse problem as a training task of a generative network, whose quality is assessed by anadversary.We pursue this approach in the presentarticle and use as generative models so-called neural stochastic differential equations (SDE),which just means to parameterize the drift and volatility of an Itˆo-SDE by neural networks.

1．介绍

文中指的neural SDE即通过神经网络来对Ito-SDE的漂移项和波动率进行参数化．
这里考虑的某资产的折现后价格过程（discounted price process） $(S_{t})_{t \geq 0}$ ： $d S_{t} = S_{t} L (t, S_{t}) α_{t} d W_{t} (1.1)$ 其中 $(α_{t})_{t \geq 0}$ 是某个 $R$ 中取值的随机过程， $L (t, s)$ 称为杠杆函数（Leverage function）取决于 $t$ 和资产当前价格．
$L$ 的选取非常重要，需要很好地校准市场上观测到的隐含波动率．故 $L$ 需要满足如下条件： $L^{2} (t, s) = \frac{σ _{Dup}^{2} ( t , s )}{E [ α _{t}^{2} ∣ S _{t} = s ]}, (1.1)$ 其中 $σ_{Dup}$ 指 Dupire 的local volatility function．注意到（1.1）是 $L$ 的隐式方程，因为 $E [α_{t}^{2} ∣ S_{t} = s]$ 中需要 $(S_{t}, α_{t})$ ．故此时 $(S_{t})_{t \geq 0}$ 满足的SDE也成为了一个McKean-Vlasov SDE．

本文采用了 fully data-driven 方法，规避了其他计算 Dupire 局部波动率的方法中必须的对波动率曲面插值的做法，即此方法只需离散数据．

令 $T_{0} = 0$ , $T_{1} < T_{2} < ... < T_{n}$ 为不同期权的到期日．使用神经网络族 $F^{i} : R \to R$ 将杠杆函数参数化，参数为 $θ_{i} \in Θ_{i}$ ，i.e. $L (s, t, θ) = 1 + F^{i} (s, θ_{i}) ， t \in [T_{i - 1}, T_{i}) ， i \in 1, ..., n ．$

于是有了neural SDE的生成模型组（generative model class），即使用带参数 $θ$ 的神经网络来参数化漂移项 $μ (\cdot,\cdot, θ)$ 和波动率项 $σ (\cdot,\cdot, θ)$ ，i.e. $d X_{t} (θ) = μ (X_{t} (θ), t, θ) d t + σ (X_{t} (θ), t, θ) d W_{t}, X_{0} (θ) = x$
本文中，没有漂移项，波动率项如下所示： $σ (S_{t} (θ), t, θ) = S_{t} (θ) (1 + i = 1 \sum n F^{i} (S_{t} (θ), θ_{i}) 1_{[T_{i - 1}, T_{i})} (t)) α_{t} .$

依次对每个到期日，参数优化采用如下的校准法则： $θ in f γ sup j = 1 \sum J w_{j}^{γ} ℓ^{γ} (π_{j}^{mod} (θ) - π_{j}^{mkt})$ 其中 $J$ 是期权的数目， $π_{j}^{mod}$ 和 $π_{j}^{mkt}$ 是模型与市场分别的价格．

对固定的 $γ$ ， $ℓ^{γ}$ 是非线性非负凸函数满足 $ℓ^{γ} (0) = 0$ 且 $ℓ^{γ} (x) > 0$ 对 $x \neq = 0$ ，衡量模型和市场价的距离． $w_{j}^{γ}$ 某种权重，参数 $γ$ 扮演了对抗（adversarial）的部分，注意到 $ℓ^{γ}$ 和 $w_{j}^{γ}$ 都受 $γ$ 控制．本文中 $w_{j}^{γ}$ 采用的是 Cont R, Ben Hamida S. Recovering volatility from option prices by evolutionary optimization[J]. 2004.中的 vega-type．

2.VARIANCE REDUCTION FOR PRICING AND CALIBRATION VIA HEDGING AND DEEP HEDGING

介绍在蒙特卡洛定价和校准中利用对冲投资组合作为控制变量的方差缩减技术．在 LSV 校准中非常重要．

考虑有限时域 $T > 0$ ，已折现的市场中有 $r$ 个交易中的金融产品 $(Z_{t})_{t \in [0, T]}$ ，它是在某个概率空间 $(Ω, (F_{t})_{t \in [0, T]}, F, Q)$ 上在 $R^{r}$ 中取值的随机变量． $Q$ 是风险中性测度， $(F_{t})_{t \in [0, T]}$ 假设是右连续的．特别的，假设 $(Z_{t})_{t \in [0, T]}$ 是有右连左极路径的 $r$ 维平方可积鞅．

令 $C$ 是 $F_{T}$ 可测的随机变量，表示表示某个欧式期权在到期日 $T > 0$ 的支付．那么通常的对这个期权价格的 Monte Carlo 估计是： $π = \frac{1}{N} n = 1 \sum N C_{n} ． (2.1)$ 其中， $(C_{1}, \dots, C_{N})$ 是以分布 $C$ ， $N \in N$ i.i.d 的．可以简单改造这个估计，加上关于 $Z$ 的随机积分．考虑一个策略 $(h_{t})_{t \in [0, T]} \in L^{2} (Z)$ 和某个常数 $c$ ．用 $I = (h ∙ Z)_{T}$ 记关于 $Z$ 的随机积分，考虑如下估计： $π = \frac{1}{N} n = 1 \sum N (C_{n} - c I_{n}) ． (2.2)$ 其中， $(I_{1}, \dots, I_{N})$ 是以分布 $I$ i.i.d 的．则对于任意的 $(h_{t})_{t \in [0, T]} \in L^{2} (Z)$ 和 $c$ ，这个估计仍是期权价格的无偏估计，因为随机积分的期望消失了．记 $H = \frac{1}{N} n = 1 \sum N I_{n} ．$ 则 $π$ 的方差为： $Var (π) = Var (π) + c^{2} Var (H) - 2 c Cov (π, H) ，$ 在以下取法下达到最小 $c = \frac{Cov ( π , H )}{Var ( H )} ．$ 此时 $Var (π) = (1 - Corr^{2} (π, H)) Var (π) ．$ 特别地，在沿路径完美对冲的情形下， $π = H$ a.s.，有 $Corr (π, H) = 1$ 和 $Var (π) = 0$ ，此时 $Var (π) = Var (H) = Cov (π, H)$ 因此，找到一个好的近似对冲投资组合使得 $Corr^{2} (π, H)$ 大是很重要的．

2.1 Black&Scholes Delta Hedge

In many cases, of local stochastic volatility models as of form (1.1) and options depending only on the terminal value of the price process, a Delta hedge of the BlackâĂŞScholes model works well.

令 $C = g (S_{T})$ ， $π_{BS}^{g} (t, T, s, σ)$ 是 BS 模型下 $t$ 时刻的价格．对冲策略为： $h_{t} = \partial_{s} π_{BS}^{g} (t, T, S_{t}, L (t, S_{t}) α_{t})$

2.2 Hedging Strategies as Neural Networks-Deep Hedging

在对冲产品数很多等情况下时，可以将对冲策略用神经网络参数化．令期权的支付是对冲产品最终价值的函数，i.e.， $C = g (Z_{T})$ ．在马尔科夫模型中，可以用函数表示对冲策略： $h : R_{+} \times R^{r} \to R^{r}, h_{t} = h (t, z)$ 对应这样一个神经网络： $(t, z) \mapsto h (t, z, δ) \in N N_{r + 1, r}$ ． $δ$ 是网络参数．根据Buehler H, Gonon L, Teichmann J, et al. Deep hedging[J]. Quantitative Finance, 2019, 19(8): 1271-1291. 给定 $π^{mkt}$ 的最优对冲可以如下计算 $δ \in Δ in f E [u (- C + π^{mkt} + (h (\cdot, Z_{--}, δ) ∙ Z \cdot)_{T})]$ $u : R \to R_{+}$ 是凸的损失函数．

为了解决这个最优问题，采用随机梯度下降，随机目标函数 $Q (δ) (ω)$ 为： $Q (δ) (ω) = u (- C (ω) + π^{mkt} + (h (\cdot, Z_{. -}, δ) (ω) ∙ Z . (ω))_{T})$ 记最优的参数 $δ^{*}$ 和最优对冲策略 $h (\cdot, \cdot, δ^{*})$ ．

假定激活函数和凸损失函数是光滑的．下面要证明 $Q (δ)$ 的梯度是： $\nabla_{δ} Q (δ) (ω) = u^{'} (- C (ω) + π^{mkt} + (h (\cdot, Z_{--}, δ) (ω) ∙ Z \cdot (ω))_{T}) (\nabla_{δ} h (\cdot, Z_{. -}, δ) (ω) ∙ Z . (ω))_{T}$ i.e.，我们可以把梯度移到随机积分中．为此，我们要使用下述定理．

定理 2.1： $\forall ε \geq 0$ ，令 $Z^{ε}$ 是

Theorem 2.1. For ling, let $Z^{ε}$ be a solution of a stochastic differential equation as described in Theorem $A .3$ with drivers $Y = (Y^{1}, \dots, Y^{d})$ , functionally Lipschitz operators $F_{j}^{ε, i}, i = 1, \dots, r$ , $j = 1, \dots, d$ and a process $(J^{ε, 1}, \dots J^{ε, r})$ , which is here for all $ε \geq 0$ simply $J 1_{{t = 0}} (t)$ for some constant vector $J \in R^{r} = J$ , i.e. $Z_{t}^{ε, i} = J^{i} + j = 1 \sum d \int_{0}^{t} F_{j}^{ε, i} (Z^{ε})_{s -} d Y_{s}^{j}, t \geq 0$ Let $(ε, t, z) \mapsto f^{ε} (t, z)$ be a map, such that the bounded càglàd process $f^{ε} := f^{ε} (. -, Z_{. -}^{0})$ converges $u c p$ to $f^{0} := f^{0} (. -, Z_{. -}^{0})$ , then $ε \to 0 lim (f^{ε} ∙ Z^{ε}) = (f^{0} ∙ Z^{0})$ holds true.

证明过程

推论 2.2： $\forall ε > 0$ ，令 $Z^{ε}$ 为对冲产品过程 $Z \equiv Z^{0}$ 的离散，使得定理 2.1 中的条件都满足．对应的对冲策略 $(t, z, δ) \mapsto h^{ε} (t, z, δ)$ 由神经网络 $N N_{r + 1, r}$ 给出，其中网络的激活函数有界 $C^{1}$ ，且导数有界．那么

(i) 随机积分在 $δ_{0}$ 点关于 $δ$ 导数 $\nabla_{δ} (h (\cdot, Z . -, δ) ∙ Z)$ 满足 $\nabla_{δ} (h (\cdot, Z_{--}, δ_{0}) ∙ Z) = (\nabla_{δ} h (\cdot, Z_{-}, δ_{0}) ∙ Z)$

(ii) 若当 $ε \to 0$ 时， $\nabla_{δ} h^{ε} (\cdot, Z_{--}, δ_{0})$ ucp 收敛到 $\nabla_{δ} h (\cdot, Z_{--}, δ_{0})$ ，则离散积分的方向导数，i.e. $\nabla_{δ} (h^{ε} (\cdot, Z_{-}^{ε}, δ_{0}) ∙ Z^{ε})$ 随着离散刻度 $ε \to 0$ 收敛到 $ε \to 0 lim (\nabla_{δ} h^{ε} (\cdot, Z_{--}^{ε}, δ_{0}) ∙ Z^{ε}) = (\nabla_{δ} h (\cdot, Z_{\cdot-}, δ_{0}) ∙ Z)$

ucp means uniform convergence on compacts in probability,i.e.，if $P (s < t sup ∣ X_{s}^{n} - X_{s} ∣ > ϵ) \to 0$ for all $ϵ, t > 0$ . The notation $X^{n} ⟶ ucp X$ is sometimes used, and $X^{n}$ is said to converge ucp to $X$ .

3. LSV的校准

考虑定义在某个概率空间 $(Ω, (F_{t})_{t \in [0, T]}, F, Q)$ 上的（1.1）LSV模型， $Q$ 是风险中性测度．假定随机过程 $α$ 固定．所以实际中我们可以先令 $L \equiv 1$ 来近似校准其他参数并固定他们．

主要目标是确定符合市场数据的杠杆函数 $L$ ，根据通用近似定理（universal approximation properties），对其参数化．令 $T_{0} = 0 ， 0 < T_{1} \dots < T_{n} = T$ 为欧式看涨期权的到期日．将 $L (t, s)$ 用如下神经网络近似 $L (t, s, θ) = (1 + i = 1 \sum n F^{i} (s, θ_{i}) 1_{[T_{i - 1}, T_{i})} (t)) (3.1)$ 其中 $F^{i} \in N N_{1, 1}$ ， $i = 1, \dots, n$ ．方便起见，通常省略 $θ_{i} \in Θ_{i}$ ．当我们写 $S_{t} (θ)$ 时， $θ$ 表示 $t$ 时刻前所有的参数 $θ_{i}$ ．

训练过程中，我们需要计算 LSV 过程关于 $θ$ 的导数．以下结果可以看做 $\nabla_{θ} S (θ)$ 对应的链式法则．从附录 A 推导而来．

定理 3.1：令 $(t, s, θ) \mapsto L (t, s, θ)$ 为（3.1）形式，神经网络 $(s, θ_{i}) \mapsto F^{i} (s, θ_{i})$ 有界且 $C^{1}$ ，导数有界且 Lipschitz 连续．则关于 $θ$ 在 $θ$ 点处的导数满足： $d (\nabla_{θ} S_{t} (θ)) = (\nabla_{θ} S_{t} (θ) L (t, S_{t} (θ), θ) + S_{t} (θ) \partial_{s} L (t, S_{t} (θ), θ) \nabla_{θ} S_{t} (θ) + S_{t} (θ) \nabla_{θ} L (t, S_{t} (θ), θ)) α_{t} d W_{t} (3.2)$ 初值为 0．这个可以通过常数变易来解，i.e. $\nabla_{θ} S_{t} (θ) = \int_{0}^{t} P_{t - s} S_{s} (θ) \nabla_{θ} L (s, S_{s} (θ), θ) α_{s} d W_{s} (3.3)$ 其中 $P_{t} = E (\int_{0}^{t} (L (s, S_{s} (θ), θ) + S_{s} (θ) \nabla_{s} L (s, S_{s} (θ), θ)) α_{s} d W_{s})$ $E$ 表示随机指数（stochastic exponential）．

证明过程

Remark $3.2$

(i) 只看存在唯一性的话， $d S_{t} (θ) = S_{t} (θ) L (t, S_{t} (θ), θ) α_{t} d W_{t}$ $L (t, s, θ)$ 为 (3.1) 形式，那么神经网络 $s \mapsto F^{i} (s, θ_{i})$ 有界以及 Lipschitz 足够了， $\forall i = 1, \dots, n$ ．

(ii) 公式 (3.3) 可以用来倒向传播．

定理 3.1 保证了导数过程的存在唯一性．这也保证了基于梯度搜索的学习算法的建立．

下面叙述如何具体优化．为了记号方便，省略权重 $w$ 和损失函数 $ℓ$ 对应的参数 $γ$ ．对每个到期日 $T_{i}$ ，我们假定有 $J_{i}$ 个期权，行权价为 $K_{ij}, j \in {1, \dots, J_{i}}$ ．对第 $i$ 个到期日，校准函数的形式为 $θ_{i} \in Θ_{i} argmin j = 1 \sum J_{i} w_{ij} ℓ (π_{ij}^{mod} (θ_{i}) - π_{ij}^{mkt}), i \in {1, \dots, n} (3.5)$ 回忆 $π_{ij}^{mod} (θ_{i})$ 指的是对应到期日 $T_{i}$ 和行权价 $K_{ij}$ 的模型期权价格． $ℓ : R \to R_{+}$ 是某个非负非线性凸的损失函数满足 $ℓ (0) = 0 ， ℓ (x) > 0$ 对 $x \neq = 0$ ． $w_{ij}$ 是权重．

我们通过迭代地计算最优化问题（3.5），从 $T_{1}$ 和 $θ_{1}$ 出发，计算 $π_{2 j}^{mod} (θ_{2})$ ，然后解决对应 $T_{2}$ 的（3.5）．为了简便记号，去掉 $i$ ，考虑一般的到期日 $T > 0$ ，(3.5)变为 $θ \in Θ argmin j = 1 \sum J w_{j} ℓ (π_{j}^{mod} (θ) - π_{j}^{mkt})$ 模型价格由下式给出 $π_{j}^{mod} (θ) = E [(S_{T} (θ) - K_{j})^{+}] (3.6)$ 我们有 $π_{j}^{mod} (θ) - π_{j}^{mkt} = E [Q_{j} (θ)]$ ，其中 $Q_{j} (θ) (ω) := (S_{T} (θ) (ω) - K_{j})^{+} - π_{j}^{mkt} (3.7)$ 那么校准问题变为寻找最小的 $f (θ) := j = 1 \sum J w_{j} ℓ (E [Q_{j} (θ)]) (3.8)$ 因为 $ℓ$ 是非线性函数，不是 B.1 中的期望形式，标准的随机梯度下降方法不能直接用．我们通过第二节中讲的对冲控制变量 (hedge control variates) 解决这个问题．

3.1 极小化校准方程

考虑标准的对（3.8） $E [Q_{j} (θ)]$ 的 Monte-Carlo 模拟： $f^{MC} (θ) := j = 1 \sum J w_{j} ℓ (\frac{1}{N} n = 1 \sum N Q_{j} (θ) (ω_{n})) (3.9)$ 对 i.i.d 的样本 ${ω_{1}, \dots, ω_{N}} \in Ω$ ．Monte-Carlo 误差以 $\frac{1}{N}$ 递减．模拟次数 $N$ 必须很大 $(\approx 1 0^{8})$ ．因为由于 $ℓ$ 非线性，随机梯度下降不能直接使用，所以看起来要计算整个函数 $f (θ)$ 的梯度来最小化（3.9）．但 $N \approx 1 0^{8}$ ，这一做法计算成本太大且不稳定，因为要计算 $1 0^{8}$ 项的和的导数．

一个方便的做法是应用对冲控制变量来降低方差，可以将 Monte-Carlo 的样本数 $N$ 降为大约 $5 \times 1 0^{4}$ ．

假定我们有 $r$ 个对冲产品（包含价格过程 $S$ ），用 $(Z_{t})_{t \in [0, T]}$ 表示，为 $Q$ 下的平方可积鞅，在 $R^{r}$ 下取值．对 $j = 1, \dots, J$ ，策略 $h_{j} : [0, T] \times R^{r} \to R^{r}$ 使得 $h (\cdot, Z .) \in L^{2} (Z)$ ， $c$ 为常数，定义 $X_{j} (θ) (ω) := Q_{j} (θ) (ω) - c (h_{j} (\cdot, Z_{. -} (θ) (ω)) ∙ Z . (θ) (ω))_{T} (3.10)$ 则校准函数（3.8）和（3.9）可以通过替换 $Q_{j} (θ) (ω)$ 为 $X_{j} (θ) (ω)$ 来定义，变为最小化 $f (θ) (ω_{1}, \dots, ω_{N}) = j = 1 \sum J w_{j} ℓ (\frac{1}{N} n = 1 \sum N X_{j} (θ) (ω_{n})) (3.11)$ 对此，我们应用如下梯度下降的变种：从初始猜测 $θ^{(0)}$ 出发，迭代计算 $θ^{(k + 1)} = θ^{(k)} - η_{k} G (θ^{(k)}) (ω_{1}^{(k)}, \dots, ω_{N}^{(k)}) (3.12)$ 对某个学习率 $η_{k}$ ，i.i.d 样本 $(ω_{1}^{(k)}, \dots, ω_{N}^{(k)})$ ．其中 $G (θ^{(k)}) (ω_{1}^{(k)}, \dots, ω_{N}^{(k)})$ 是基于梯度待确定的量，样本在每次迭代中可以一样，可以另取．本文中另取．

最简单情形下，可以令 $G (θ^{(k)}) (ω_{1}^{(k)}, \dots, ω_{N}^{(k)}) = \nabla f (θ) (ω_{1}^{(k)}, \dots, ω_{N}^{(k)}) (3.13)$

注意到（3.10）中随机积分项的导数计算通常是昂贵的．我们进行下述改造．令 $ω^{N} Q_{j}^{N} (θ) (ω^{N}) Q^{N} (θ) (ω^{N}) = (ω_{1}, \dots, ω_{N}) = \frac{1}{N} n = 1 \sum N Q_{j} (θ) (ω_{n}) = (Q_{1}^{N} (θ) (ω^{N}), \dots, Q_{J}^{N} (θ) (ω^{N}))$ 定义 $\tilde{f} : R^{J} \to R$ ： $\tilde{f} (x) = j = 1 \sum J w_{j} ℓ (x_{j})$ 然后令 $G (θ) (ω^{N}) = D_{x} (\tilde{f}) (X^{N} (θ) (ω^{N})) D_{θ} (Q^{N}) (θ) (ω^{N})$ 注意到基于倒向传播，这一项计算起来是很简单的．Moreover, leaving the stochastic integral away in the inner derivative is justified by its vanishing expectation. During the forward pass, the stochastic integral terms are included in the computation; however the contribution to the gradient (during the backward pass) is partly neglected, which can e.g. be implemented via the tensorflow stop_gradient function.

关于对冲策略的选择，我们可以按照 2.2 节中的方法将其用神经网络参数化，并通过下式计算最优的权重 $δ$ ： $δ \in Δ argmin \frac{1}{N} n = 1 \sum N u (- X_{j} (θ, δ) (ω_{n}))$ 对 i.i.d 样本 ${ω_{1}, \dots, ω_{N}} \in Ω$ 和损失函数 $u$ ．此处 $X_{j} (θ, δ) (ω) = (S_{T} (θ) (ω) - K_{j})^{+} - (h_{j} (\cdot, Z_{--} (θ) (ω), δ) ∙ Z . (θ) (ω))_{T} - π_{j}^{mkt}$ 这意味着迭代两个优化步骤，i.e.，优化（3.11）中的 $θ$ （固定 $δ$ ）和（3.14）中的 $δ$ （固定 $θ$ ）．

4. 数值实验流程

实际使用的 SABR-LSV 模型如下 $d S_{t} d α_{t} d ⟨ W, B ⟩_{t} = S_{t} L (t, S_{t}) α_{t} d W_{t} = ν α_{t} d B_{t} = ϱ d t$ 参数为 $ν \in R, ϱ \in [- 1, 1]$ ，初值有 $α_{0} > 0, S_{0} > 0$ ． $B$ 和 $W$ 是两个相关的布朗运动．

Remark：一般使用的是关于 $S$ 的对数价格 $X := lo g S$ ．故模型也可写为： $d X_{t} d α_{t} d ⟨ W, B ⟩_{t} = α_{t} L (t, X_{t}) d W_{t} - \frac{1}{2} α_{t}^{2} L^{2} (t, X_{t}) d t = ν α_{t} d B_{t} = ϱ d t$ 注意到 $α$ 是一个几何布朗运动，也就是说它有表达式： $α_{t} = α_{0} exp (- \frac{ν ^{2}}{2} t + ν B_{t})$

生成样本

在已有文献中，有推荐的局部波动率函数族 $a_{ξ}$ 如下： $ξ = (p_{1}, p_{2}, σ_{0}, σ_{1}, σ_{2})$ 其中 $p_{0} = 1 - (p_{1} + p_{2})$ 且参数满足如下约束： $σ_{0}, σ_{1}, σ_{2}, p_{1}, p_{2} > 0 and p_{1} + p_{2} \leq 1 ．$ 令 $k (t, x, σ) = exp (- x^{2} / (2 t σ^{2}) - t σ^{2} /8)$ ， $a_{ξ}$ 如下定义： $a_{ξ}^{2} (t, x) = \frac{\sum _{i = 0}^{2} p _{i} σ _{i} k ( t , x , σ _{i} )}{\sum _{i = 0}^{2} ( p _{i} / σ _{i} ) k ( t , x , σ _{i} )}$ 文中作者修改为： $a_{ξ}^{2} (t, x) = \frac{1}{4} \times min 2, \frac{( \sum _{i = 0}^{2} p _{i} σ _{i} k ( t , x , σ _{i} ) + Λ ( t , x ) ) ( 1 - 0.6 \times 1 _{(t > 0.1)} )}{\sum _{i = 0}^{2} ( p _{i} / σ _{i} ) k ( t , x , σ _{i} ) + 0.01}$ 其中 $Λ (t, x) := (\frac{1 _{(t \leq 0.1)}}{1 + 0.1 t})^{λ_{2}} min {(γ_{1} (x - β_{1})_{+} + γ_{2} (- x - β_{2})_{+})^{κ}, λ_{1}}$ 注意到 $a_{ξ}^{2}$ 与 $t$ 有关．所以在做 Monte Carlo 模拟时，我们将 $a_{ξ}^{2} (0, x)$ 替换为 $a_{ξ}^{2} (Δ_{t}, x)$ ， $Δ_{t}$ 是 Monte Carlo 模拟的时间间隔． What is left to be specified are the parameters $ξ = (p_{1}, p_{2}, σ_{0}, σ_{1}, σ_{2})$ 模型变为： $d X_{t} = - \frac{1}{2} a_{ξ}^{2} (t, X_{t}) d t + a_{ξ} (t, X_{t}) d W_{t} (1)$ 上式是用来生成人工市场价格样本的．

所以我们实际的做法是随机对 $a_{ξ}$ 中的 $ξ$ 采样再根据 (1) 计算出价格，然后对 SABR-LSV 模型进行校准，i.e. 寻找使模型符合上述价格的参数 $ν ， ϱ$ ， $α_{0}$ 以及 $L$ ．

到期日 $T_{1} < \dots < T_{n}$ ，每个到期日 $T_{i}$ 对应行权价为 $K_{ij}, j \in {1, \dots, J_{i}}$ ．用 Monte-Carlo 模拟以 $Δ_{t} = 1/100$ 间隔计算价格．

具体如下：

在 $m = 1, \dots, 200$ 下对 $ξ_{m}$ 以给定分布进行模拟．
对每个 $m$ ，根据（1）式计算 $T_{i}$ and strikes $K_{ij}$ for $i = 1, \dots, n = 4$ and $j = 1, \dots, 20$ 对应的欧式期权的价格．每个 $m$ 分别使用不同的 $1 0^{7}$ 条布朗运动轨道．
保存这些价格数据

$†$ 准备做的工作 $†$ （弃案）

寻找最适合市场波动率曲面的“复合”模型，即假设市场波动率曲面实际是由一些波动率模型的凸组合决定的．

回忆：波动率曲面即隐含波动率以 $T$ ：time to maturity 和 $lo g (K / S_{0})$ ：log-moneyness 为自变量构成的曲面．例如，我们可以假设当前 ${σ_{ma r k e t}^{T, K} = a σ_{Hes t o n}^{T, K} + b σ_{ro ug h}^{T, K}}_{T, K}$ ，其中 $a + b = 1 ， a \geq 0 ， b \geq 0$ ．

大致做法

记号：分别以 $ξ_{1}$ 、 $ξ_{2}$ 记 Heston 和 rough 模型的参数集，以 $N_{1}$ 、 $N_{2}$ 记该两者通过神经网络训练得到的从模型参数到市场价格（隐含波动率）的映射， $a$ 、 $b$ 为前述凸组合系数．

我们这里考虑直接通过神经网络来学习市场波动率曲面 ${σ_{ma r k e t}^{T, K}}$ 到凸组合系数 $a, b$ 的映射．

我们对两个模型的参数以及 $a, b$ 分别均匀采样，然后根据两者的模型分别模拟出不同凸组合下两者的复合波动率曲面，但要注意的是两者采用同一个参数 $ρ$ （即两个标准布朗运动的相关系数）并且一个凸组合下两模型使用同一条 Monte-Carlo 轨道．这时，忽略掉模型的参数，我们有了带有 ${a, b}$ 标签的许多波动率曲面样本，我们利用前述 grid-based 的方法通过神经网络学习从波动率曲面到凸组合系数的映射．

知道了 $a, b$ 后，如何校准出两个模型分别的参数？

$†$ LSV-ROUGH 模型的校准 $†$

模型： $d S_{t} V_{t} d ⟨ W, Z ⟩_{t} = S_{t} L (t, S_{t}) V_{t} d W_{t} = ξ_{0} (t) E (2 H η \int_{0}^{t} (t - s)^{H - 1/2} d Z_{s}), for t > 0, V_{0} = v_{0} > 0 = ϱ d t (1.1)$

杠杆函数： $L^{2} (t, s) = \frac{σ _{Dup}^{2} ( t , s )}{E [ V _{t} ∣ S _{t} = s ]}$

主要共有两个神经网络，一个负责 Rough 的部分，一个负责 LSV 的部分．

一方面，Rough 部分的网络对应的即 Bayer 提出的 two-step 校准方法，即如下模型（（1.1）中 $L \equiv 1$ 时）： $d S_{t} V_{t} d ⟨ W, B ⟩_{t} = S_{t} V_{t} d W_{t} = ξ_{0} (t) E (2 H η \int_{0}^{t} (t - s)^{H - 1/2} d Z_{s}), for t > 0, V_{0} = v_{0} > 0 = ϱ d t (1.1)$ 对应的从模型参数到模型对应价格的映射的网络．只需用人工模拟数据训练一次后，网络就固定住了，在校准等步骤中是不会再变动的．

回忆：用 $Δ := {k_{i}, T_{j}}_{i = 1, j = 1}^{n,} m_{i = 1}$ 记关于到期日和行权价的网格．则 step 1：学习映射 $F (θ) = {σ_{BS}^{M (θ)} (T_{i}, k_{j})}_{i = 1, j = 1}^{n, m}$ ，输入是 $θ \in Θ$ ，输出是 ${σ_{BS}^{M (θ)} (T_{i}, k_{j})}_{i = 1, j = 1}^{n, m}$ 这样的 $n \times m$ 网格． $F$ 取值在 $R^{L}$ 中，其中 $L =$ strikes $\times$ maturities $= nm .$ 最优化问题变为如下： $ω := w \in R^{n} argmin i = 1 \sum N_{Train}^{reduced} j = 1 \sum L η_{j} (F (θ_{i})_{j} - σ^{M} (θ_{i}, T_{j}, k_{j}))^{2}$ 其中 $N_{Train} = N_{Train}^{reduced} \times L$ ． Step 2: $θ := θ \in Θ argmin i = 1 \sum L β_{j} (F (θ)_{i} - σ_{BS}^{MKT} (T_{i}, k_{i}))^{2}$

另一方面，我们将 $L (t, S_{t})$ 这个函数用网络近似，这个网络中的参数是随着校准不断变动的．具体地，令 $T_{0} = 0 ， 0 < T_{1} \dots < T_{n} = T$ 为欧式看涨期权的到期日．将 $L (t, s)$ 用如下神经网络近似 $L (t, s, θ) = (1 + i = 1 \sum n F^{i} (s, θ_{i}) 1_{[T_{i - 1}, T_{i})} (t)) (1.2)$ 其中 $F^{i} \in N N_{1, 1}$ ， $i = 1, \dots, n$ ．

为了记号方便，省略权重 $w$ 和损失函数 $ℓ$ 对应的参数 $γ$ ．对每个到期日 $T_{i}$ ，我们假定有 $J_{i}$ 个期权，行权价为 $K_{ij}, j \in {1, \dots, J_{i}}$ ．对第 $i$ 个到期日，校准函数的形式为 $θ_{i} \in Θ_{i} argmin j = 1 \sum J_{i} w_{ij} ℓ (π_{ij}^{mod} (θ_{i}) - π_{ij}^{mkt}), i \in {1, \dots, n} (1.3)$ 回忆 $π_{ij}^{mod} (θ_{i})$ 指的是对应到期日 $T_{i}$ 和行权价 $K_{ij}$ 的模型期权价格． $ℓ : R \to R_{+}$ 是某个非负非线性凸的损失函数满足 $ℓ (0) = 0 ， ℓ (x) > 0$ 对 $x \neq = 0$ ． $w_{ij}$ 是权重．

我们通过迭代地计算最优化问题（1.3），从 $T_{1}$ 和 $θ_{1}$ 出发，计算 $π_{2 j}^{mod} (θ_{2})$ ，然后解决对应 $T_{2}$ 的（1.3）．为了简便记号，去掉 $i$ ，考虑一般的到期日 $T > 0$ ，(1.3)变为 $θ \in Θ argmin j = 1 \sum J w_{j} ℓ (π_{j}^{mod} (θ) - π_{j}^{mkt})$ 模型价格由下式给出 $π_{j}^{mod} (θ) = E [(S_{T} (θ) - K_{j})^{+}] (1.4)$ 我们有 $π_{j}^{mod} (θ) - π_{j}^{mkt} = E [Q_{j} (θ)]$ ，其中 $Q_{j} (θ) (ω) := (S_{T} (θ) (ω) - K_{j})^{+} - π_{j}^{mkt} (1.5)$ 那么校准问题变为寻找最小的 $f (θ) := j = 1 \sum J w_{j} ℓ (E [Q_{j} (θ)]) (1.6)$ 我们通过第二节中讲的对冲控制变量 (hedge control variates) 解决这个问题．

考虑标准的对（3.8） $E [Q_{j} (θ)]$ 的 Monte-Carlo 模拟： $f^{MC} (θ) := j = 1 \sum J w_{j} ℓ (\frac{1}{N} n = 1 \sum N Q_{j} (θ) (ω_{n})) (1.7)$ 对 i.i.d 的样本 ${ω_{1}, \dots, ω_{N}} \in Ω$ ．Monte-Carlo 误差以 $\frac{1}{N}$ 递减．模拟次数 $N$ 必须很大 $(\approx 1 0^{8})$ ．因为由于 $ℓ$ 非线性，随机梯度下降不能直接使用，所以看起来要计算整个函数 $f (θ)$ 的梯度来最小化（3.9）．但 $N \approx 1 0^{8}$ ，这一做法计算成本太大且不稳定，因为要计算 $1 0^{8}$ 项的和的导数．

一个方便的做法是应用对冲控制变量来降低方差，可以将 Monte-Carlo 的样本数 $N$ 降为大约 $5 \times 1 0^{4}$ ．

算法1：模型的校准步骤

# 初始化网络参数
$ini t ia l i ze θ_{1}, \dots, θ_{4}$
# 定义初始模拟轨道数和初始步骤值
$N, k = 500, 1$
# 定义时间离散间隔和误差容忍度
$Δ_{t}, t o l = 0.01, 0.001$
$for i = 1, \dots, 4$ :
$n e x t s l i ce = F a l se$
# 计算此次切片的初始正规化权重
$w_{j} = \tilde{w}_{j} / \sum_{l = 1}^{20} \tilde{w}_{l}$
$while n e x t s l i ce == F a l se$
$d o :$
$模拟 N 条模型到 T_{i} 时刻的轨道，计算支付$
$d o :$
$计算这些轨道对应的到 T_{i} 时刻的对冲的随机积分$
$d o :$
$计算 (1.9) 式的校准函数$
$d o :$
$进行一步优化 θ_{i}^{(k - 1)} 到 θ_{i}^{(k)}$
$d o :$
$利用算法 2 更新参数 N ， n e x t s l i ce 并计算模型价格 π_{m o d e l}$
$d o :$
$k = k + 1$

算法2：超参的更新

附录

定理3.1

证明：

首先定理 A.2 暗示了 $d S_{t} (θ) = S_{t} (θ) L (t, S_{t} (θ), θ) α_{t} d W_{t}$ $\forall θ$ 的解存在唯一性．这里驱动过程是一维的 $Y = \int_{0}^{*} α_{s} d W_{s}$ ．事实上，若 $(t, s) \mapsto L (t, s, θ)$ 有界，对 $t$ 左极右连，对 $s$ Lipschitz 连续以一个与 $t$ 无关的 Lipschitz 常数． $S \mapsto S \cdot (θ) L (\cdot, S \cdot (θ), θ)$ 为 functionally Lipschitz，得到结论．这些条件由 $L (t, s, θ)$ 的形式和 $F^{i}$ 的条件保证．

为了证明导数过程的形式，我们对如下系统应用定理 A.3： $d S_{t} (θ) = S_{t} (θ) L (t, S_{t} (θ), θ) α_{t} d W_{t}$ 和 $d S_{t} (θ + εθ) = S_{t} (θ + εθ) L (t, S_{t} (θ + εθ), θ + εθ) α_{t} d W_{t}$ 以及 $d \frac{S _{t} ( θ + εθ ) - S _{t} ( θ )}{ε} = = \frac{S _{t} ( θ + εθ ) L ( t , S _{t} ( θ + εθ ) , θ + εθ ) - S _{t} ( θ ) L ( t , S _{t} ( θ ) , θ )}{ε} α_{t} d W_{t} (\frac{S _{t} ( θ + εθ ) - S _{t} ( θ )}{ε} L (t, S_{t} (θ + εθ), θ + εθ) + S_{t} (θ) \frac{L ( t , S _{t} ( θ + εθ ) , θ + εθ ) - L ( t , S _{t} ( θ ) , θ )}{ε} α_{t} d W_{t}$ 在定理 A.3 中， $Z^{ε, 1} = S (θ), Z^{ε, 2} = S (θ + εθ), Z^{ε, 3} = \frac{S _{t} ( θ + εθ ) - S _{t} ( θ )}{ε}$ ． $F^{ε, 3}$ 为 $F^{ε, 3} (Z_{t}^{0}) = Z_{t}^{0, 3} L (t, Z_{t}^{0, 2}, θ + εθ) + Z_{t}^{0, 1} \partial_{s} L (t, Z_{t}^{0, 1}, θ) Z_{t}^{0, 3} + O (ε) + Z_{t}^{0, 1} \frac{L ( t , Z _{t}^{0, 1} , θ + εθ ) - L ( t , Z _{t}^{0, 1} , θ )}{ε} (3.4)$ ucp 收敛到 $F^{0, 3} (Z_{t}^{0}) = Z_{t}^{0, 3} L (t, Z_{t}^{0, 2}, θ) + Z_{t}^{0, 1} \partial_{s} L (t, Z_{t}^{0, 1}, θ) Z_{t}^{0, 3} + Z_{t}^{0, 1} \nabla_{θ} L (t, Z_{t}^{0, 1}, θ)$ 事实上， $\forall t$ ， ${s \mapsto L (t, s, θ + εθ), ∣ ε \in [0, 1]}$ 等度连续．因此，点点收敛暗示对 $s$ 的一致连续．This together with $L (t, s, θ)$ being piecewise constant in $t$ yields: $ε \to 0 lim (t, s) sup ∣ L (t, s, θ + εθ) - L (t, s, θ) ∣ = 0$ whence ucp convergence of the first term in (3.4). The convergence of term two is clear. The one of term three follows again from the fact that the family ${s \mapsto \nabla_{θ} L (t, s, θ + εθ) ∣ ε \in [0, 1]}$ is equicontinuous, which is again a consequence of the form of the neural networks.

By the assumptions on the derivatives, $F^{0, 3}$ is functionally Lipschitz. Hence Theorem A.2 yields the existence of a unique solution to (3.2) and Theorem A.3 implies convergence. $□$

定理2.1

Proof. Consider the extended system $d (f^{ε} ∙ Z^{ε}) = j = 1 \sum d f^{ε} (t -, Z_{t -}^{ε}) F_{j}^{ε, i} (Z^{ε})_{t -} d Y_{t}^{j}$ and $d Z_{t}^{ε, i} = j = 1 \sum d F_{j}^{ε, i} (Z^{ε})_{t -} d Y_{t}^{j}$ where we obtain existence, uniqueness and stability for the second equation by Theorem A.3, and from where we obtain ucp convergence of the integrand of the first equation: since stochastic integration is continuous with respect to the ucp topology we obtain the result.

文献

首次在波动率校准中运用神经网络 Hernandez A. Model calibration with neural networks[J]. Available at SSRN 2812140, 2016.
rough波动率模型的神经网络校准 Bayer C, Horvath B, Muguruza A, et al. On deep calibration of (rough) stochastic volatility models[J]. arXiv preprint arXiv:1908.08806, 2019.和 Horvath B, Muguruza A, Tomas M. Deep learning volatility: a deep neural network perspective on pricing and calibration in (rough) volatility models[J]. Quantitative Finance, 2021, 21(1): 11-27. Github 代码
LSV模型GAN校准 Cuchiero C, Khosrawi W, Teichmann J. A generative adversarial network approach to calibration of local stochastic volatility models[J]. Risks, 2020, 8(4): 101. Github 代码
损失函数中不同期权权重取法 Cont R, Ben Hamida S. Recovering volatility from option prices by evolutionary optimization[J]. 2004.
fBm的MC模拟 Horvath B, Jacquier A J, Muguruza A. Functional central limit theorems for rough volatility[J]. Available at SSRN 3078743, 2017. Github 代码
rBergomi提出 Bayer C, Friz P, Gatheral J. Pricing under rough volatility[J]. Quantitative Finance, 2016, 16(6): 887-904.

布武不舞