Generalized Hyperbolic Distribution#

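The Generalized Hyperbolic Distribution is defined as the normal variance-mean mixture with the Generalized Inverse Gaussian distribution as the mixing distribution. The "hyperbolic" characterization refers to the fact that the shape of the log-probability density can be described as a hyperbola. Hyperbolic distributions are sometimes called semi-fat tailed because their probability density decreases more slowly than that of "sub-hyperbolic" distributions (e.g. the normal distribution, whose log-probability decreases quadratically), but faster than that of other "extreme value" distributions (e.g. the pareto distribution, whose log-probability decreases logarithmically).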

Functions#

Different parameterizations exist in the literature; SciPy implements the "4th parameterization" in Prause (1999).

\begin{eqnarray*} f(x, p, a, b) & = & \frac{(a^2 - b^2)^{p/2}} {\sqrt{2\pi}a^{p-0.5} K_p\Big(\sqrt{a^2 - b^2}\Big)} e^{bx} \times \frac{K_{p - 1/2} (a \sqrt{1 + x^2})} {(\sqrt{1 + x^2})^{1/2 - p}} \end{eqnarray*}

for

\(x, p \in ( - \infty; \infty)\)
\(|b| < a\) if \(p \ge 0\)
\(|b| \le a\) if \(p < 0\)
\(K_{p}(\cdot)\) denotes the modified Bessel function of the second kind and order \(p\) (scipy.special.kv, which accepts a real order; scipy.special.kn is restricted to integer order)
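As a sanity check on the formula above, the following sketch evaluates the standardized density term by term with `scipy.special.kv` and compares it with `scipy.stats.genhyperbolic.pdf`. The helper name `gh_pdf` and the parameter values are illustrative assumptions, not part of SciPy. Evaluating the formula directly may lose accuracy for large arguments, where a log-space evaluation would be safer.

```python
import numpy as np
from scipy import special, stats


def gh_pdf(x, p, a, b):
    """Standardized GH density written directly from the formula (illustrative)."""
    x = np.asarray(x, dtype=float)
    gamma = np.sqrt(a**2 - b**2)   # sqrt(a^2 - b^2)
    s = np.sqrt(1.0 + x**2)        # sqrt(1 + x^2)
    # (a^2 - b^2)^(p/2) equals gamma^p
    norm_const = gamma**p / (
        np.sqrt(2.0 * np.pi) * a**(p - 0.5) * special.kv(p, gamma)
    )
    return norm_const * np.exp(b * x) * special.kv(p - 0.5, a * s) / s**(0.5 - p)


x = np.linspace(-5.0, 5.0, 11)
p, a, b = 1.5, 2.0, 0.5            # arbitrary valid values with |b| < a

direct = gh_pdf(x, p, a, b)
reference = stats.genhyperbolic.pdf(x, p, a, b)
print(np.max(np.abs(direct - reference)))
```

The two evaluations should agree up to floating-point rounding for moderate arguments such as these.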

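The probability density above is defined in the "standardized" form. To shift and/or scale the distribution, use the \(\text{loc}\) and \(\text{scale}\) parameters. Specifically, \(f(x, p, a, b, \text{loc}, \text{scale})\) is identically equivalent to \(\frac{1}{\text{scale}}f(y, p, a, b)\) with \(y = \frac{1}{\text{scale}}(x - \text{loc})\).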
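The loc/scale identity can be checked numerically. This is a minimal sketch; the parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy import stats

p, a, b = 1.0, 1.5, -0.3
loc, scale = 2.0, 0.5
x = np.linspace(-3.0, 6.0, 10)

# Density evaluated with loc and scale passed to SciPy
shifted = stats.genhyperbolic.pdf(x, p, a, b, loc=loc, scale=scale)

# Same density from the standardized form: f(y) / scale with y = (x - loc) / scale
y = (x - loc) / scale
manual = stats.genhyperbolic.pdf(y, p, a, b) / scale

print(np.allclose(shifted, manual))
```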

This parameterization derives from the original \((\lambda, \alpha, \beta, \delta, \mu)\) parameterization in Barndorff-Nielsen (1978) by setting

\(\lambda = p\)
\(\alpha = \frac{a}{\delta} = \frac{\hat{\alpha}}{\delta}\)
\(\beta = \frac{b}{\delta} = \frac{\hat{\beta}}{\delta}\)
\(\delta = \text{scale}\)
\(\mu = \text{location}\)
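Inverting the relations above gives \(a = \alpha\delta\) and \(b = \beta\delta\). The sketch below wraps that conversion in a hypothetical helper, `gh_from_original` (not a SciPy function), and checks the resulting mean against the standard closed form \(\mu + \frac{\delta\beta}{\gamma}\frac{K_{\lambda+1}(\delta\gamma)}{K_{\lambda}(\delta\gamma)}\) with \(\gamma = \sqrt{\alpha^2 - \beta^2}\). The numeric values are arbitrary.

```python
import numpy as np
from scipy import special, stats


def gh_from_original(lam, alpha, beta, delta, mu):
    """Hypothetical helper: freeze genhyperbolic from (lambda, alpha, beta, delta, mu)."""
    # a = alpha * delta and b = beta * delta; delta and mu map to scale and loc
    return stats.genhyperbolic(lam, alpha * delta, beta * delta, loc=mu, scale=delta)


lam, alpha, beta, delta, mu = 1.0, 2.0, 0.5, 1.5, 0.3
dist = gh_from_original(lam, alpha, beta, delta, mu)

# Closed-form mean in the original parameterization
gamma = np.sqrt(alpha**2 - beta**2)
expected_mean = mu + (delta * beta / gamma) * (
    special.kv(lam + 1.0, delta * gamma) / special.kv(lam, delta * gamma)
)
print(dist.mean(), expected_mean)
```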

Random variates for scipy.stats.genhyperbolic can be sampled efficiently from the normal variance-mean mixture mentioned above, where scipy.stats.geninvgauss is parameterized as \(GIG\Big(p = p, b = \sqrt{\hat{\alpha}^2 - \hat{\beta}^2}, \text{loc} = \text{location}, \text{scale} = \frac{1}{\sqrt{\hat{\alpha}^2 - \hat{\beta}^2}}\Big)\), so that: \(GH(p, \hat{\alpha}, \hat{\beta}) = \hat{\beta} \cdot GIG + \sqrt{GIG} \cdot N(0,1)\)
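The mixture representation can be sketched as follows for the standardized case. The helper name `sample_gh_mixture`, the seed, and the sample size are assumptions made for illustration; the GIG variates are drawn with its default location, and this is not presented as SciPy's internal implementation. A Kolmogorov-Smirnov test against `genhyperbolic` gives a rough check that the samples follow the intended distribution.

```python
import numpy as np
from scipy import stats


def sample_gh_mixture(p, a, b, size, seed=0):
    """Draw standardized GH variates as b * V + sqrt(V) * Z with V ~ GIG, Z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    gamma = np.sqrt(a**2 - b**2)
    # Mixing variable: geninvgauss with shape b = gamma and scale = 1 / gamma
    v = stats.geninvgauss.rvs(p, gamma, scale=1.0 / gamma, size=size,
                              random_state=rng)
    z = rng.standard_normal(size)
    return b * v + np.sqrt(v) * z


p, a, b = 1.0, 2.0, 0.5
samples = sample_gh_mixture(p, a, b, size=2000)

# Compare the empirical distribution with the genhyperbolic CDF
ks = stats.kstest(samples, stats.genhyperbolic(p, a, b).cdf)
print(ks.statistic, ks.pvalue)
```

A small KS statistic is consistent with the mixture construction; for a shifted and scaled variate, the result would be multiplied by `scale` and shifted by `loc`.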

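The "generalized" characterization reflects the fact that this distribution is a superclass of several other probability distributions, for instance: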

\(f(p = -\nu/2, a = 0, b = 0, \text{loc} = 0, \text{scale} = \sqrt{\nu})\) has a Student's t-distribution (scipy.stats.t) with \(\nu\) degrees of freedom.
\(f(p = 1, a = \hat{\alpha}, b = \hat{\beta}, \text{loc} = \mu, \text{scale} = \delta)\) has a Hyperbolic Distribution.
\(f(p = - 1/2, a = \hat{\alpha}, b = \hat{\beta}, \text{loc} = \mu, \text{scale} = \delta)\) has a Normal Inverse Gaussian Distribution (scipy.stats.norminvgauss).
\(f(p = 1, a = \delta, b = 0, \text{loc} = \mu, \text{scale} = \delta)\) has a Laplace Distribution (scipy.stats.laplace) for \(\delta \rightarrow 0\).
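Two of these special cases are easy to probe numerically. The sketch below compares the \(p = -1/2\) case with `scipy.stats.norminvgauss` and approximates the Laplace limit with a small but finite \(\delta\). The value `delta = 1e-3` is an arbitrary assumption, so the Laplace comparison is only approximate.

```python
import numpy as np
from scipy import stats

x = np.linspace(-4.0, 4.0, 9)

# p = -1/2 should reproduce the normal inverse Gaussian density
a, b = 2.0, 0.7
gh_nig = stats.genhyperbolic.pdf(x, -0.5, a, b)
nig = stats.norminvgauss.pdf(x, a, b)
nig_gap = float(np.max(np.abs(gh_nig - nig)))

# p = 1, a = delta, b = 0, scale = delta approaches Laplace(0, 1) as delta -> 0
delta = 1e-3
gh_laplace = stats.genhyperbolic.pdf(x, 1.0, delta, 0.0, loc=0.0, scale=delta)
laplace_gap = float(np.max(np.abs(gh_laplace - stats.laplace.pdf(x))))

print(nig_gap, laplace_gap)
```

The first gap should be at rounding level, while the second shrinks as `delta` decreases.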

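Examples#

It is useful to understand how the parameters affect the shape of the distribution. While interpreting \(b\) as the skewness is fairly straightforward, the difference between \(a\) and \(p\) is less obvious, because both affect the kurtosis of the distribution. \(a\) can be interpreted as the speed of decay of the probability density (for \(a > 1\) the asymptotic decay is faster than \(\log_e\), and vice versa) or, equivalently, as the slope of the asymptote of the log-probability hyperbola (for \(a > 1\) the decay is faster than \(|1|\), and vice versa). \(p\) can be seen as the width of the shoulders of the probability density (\(p < 1\) results in narrow shoulders, and vice versa) or, equivalently, as the shape of the log-probability hyperbola, which is convex for \(p < 1\) and concave otherwise.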
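The roles of \(b\) and \(a\) described above can also be read off the standardized moments. This sketch uses `genhyperbolic.stats` to show that the sign of \(b\) sets the sign of the skewness, and that a larger \(a\) gives a smaller excess kurtosis. The helper `skew_kurt` and the parameter values are illustrative assumptions.

```python
from scipy import stats


def skew_kurt(p, a, b):
    """Return (skewness, excess kurtosis) of the standardized GH distribution."""
    skew, kurt = stats.genhyperbolic.stats(p, a, b, moments="sk")
    return float(skew), float(kurt)


# b controls the direction of the skew
skew_neg, _ = skew_kurt(1.0, 1.0, -0.5)
skew_pos, _ = skew_kurt(1.0, 1.0, 0.5)

# a controls how heavy the tails are (symmetric case, b = 0)
_, kurt_small_a = skew_kurt(1.0, 0.5, 0.0)
_, kurt_large_a = skew_kurt(1.0, 5.0, 0.0)

print(skew_neg, skew_pos)
print(kurt_small_a, kurt_large_a)
```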

import numpy as np
from matplotlib import pyplot as plt
from scipy import stats

p, a, b, loc, scale = 1, 1, 0, 0, 1
x = np.linspace(-10, 10, 100)

# Plot the GH density for different values of p (log-scaled y-axis)
plt.figure(0)
plt.title("Generalized Hyperbolic | -10 < p < 10")
# Baseline curve
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        label='GH(p=1, a=1, b=0, loc=0, scale=1)')
# Legend entry for p > 1, followed by the faint family of curves
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        color='red', alpha=0.5, label='GH(p>1, a=1, b=0, loc=0, scale=1)')
for p_i in np.linspace(1, 10, 10):
    plt.plot(x, stats.genhyperbolic.pdf(x, p_i, a, b, loc, scale),
            color='red', alpha=0.1)
# Legend entry for p < 1, followed by the faint family of curves
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        color='blue', alpha=0.5, label='GH(p<1, a=1, b=0, loc=0, scale=1)')
for p_i in np.linspace(-10, 1, 10):
    plt.plot(x, stats.genhyperbolic.pdf(x, p_i, a, b, loc, scale),
            color='blue', alpha=0.1)
# Reference densities for comparison
plt.plot(x, stats.norm.pdf(x, loc, scale), label='N(loc=0, scale=1)')
plt.plot(x, stats.laplace.pdf(x, loc, scale), label='Laplace(loc=0, scale=1)')
plt.plot(x, stats.pareto.pdf(x+1, 1, loc, scale), label='Pareto(a=1, loc=0, scale=1)')
plt.ylim(1e-15, 1e2)
plt.yscale('log')
plt.legend(bbox_to_anchor=(1.1, 1))
plt.subplots_adjust(right=0.5)

# Plot the GH density for different values of a (log-scaled y-axis)
plt.figure(1)
plt.title("Generalized Hyperbolic | 0 < a < 10")
# Baseline curve
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        label='GH(p=1, a=1, b=0, loc=0, scale=1)')
# Legend entry for a > 1, followed by the faint family of curves
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        color='blue', alpha=0.5, label='GH(p=1, a>1, b=0, loc=0, scale=1)')
for a_i in np.linspace(1, 10, 10):
    plt.plot(x, stats.genhyperbolic.pdf(x, p, a_i, b, loc, scale),
            color='blue', alpha=0.1)
# Legend entry for 0 < a < 1, followed by the faint family of curves
plt.plot(x, stats.genhyperbolic.pdf(x, p, a, b, loc, scale),
        color='red', alpha=0.5, label='GH(p=1, 0<a<1, b=0, loc=0, scale=1)')
# a = 0 violates |b| < a for p >= 0, so the grid starts just above zero
for a_i in np.linspace(0.1, 1, 10):
    plt.plot(x, stats.genhyperbolic.pdf(x, p, a_i, b, loc, scale),
            color='red', alpha=0.1)
# Reference densities for comparison
plt.plot(x, stats.norm.pdf(x, loc, scale), label='N(loc=0, scale=1)')
plt.plot(x, stats.laplace.pdf(x, loc, scale), label='Laplace(loc=0, scale=1)')
plt.plot(x, stats.pareto.pdf(x+1, 1, loc, scale), label='Pareto(a=1, loc=0, scale=1)')
plt.ylim(1e-15, 1e2)
plt.yscale('log')
plt.legend(bbox_to_anchor=(1.1, 1))
plt.subplots_adjust(right=0.5)

plt.show()

References#

Normal Variance-Mean Mixture https://en.wikipedia.org/wiki/Normal_variance-mean_mixture
Generalized Hyperbolic Distribution https://en.wikipedia.org/wiki/Generalised_hyperbolic_distribution
O. Barndorff-Nielsen, “Hyperbolic Distributions and Distributions on Hyperbolae”, Scandinavian Journal of Statistics, Vol. 5(3), pp. 151-157, 1978. https://www.jstor.org/stable/4615705
Eberlein E., Prause K. (2002) The Generalized Hyperbolic Model: Financial Derivatives and Risk Measures. In: Geman H., Madan D., Pliska S.R., Vorst T. (eds) Mathematical Finance - Bachelier Congress 2000. Springer Finance. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-12429-1_12
Scott, David J, Würtz, Diethelm, Dong, Christine and Tran, Thanh Tam, (2009), Moments of the generalized hyperbolic distribution, MPRA Paper, University Library of Munich, Germany, https://EconPapers.repec.org/RePEc:pra:mprapa:19081.

Implementation: scipy.stats.genhyperbolic