scipy.stats.energy_distance

scipy.stats.energy_distance(u_values, v_values, u_weights=None, v_weights=None)

Compute the energy distance between two 1D distributions.

New in version 1.0.0.

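Parameters:

u_values, v_values : array_like
    Values observed in the (empirical) distribution.
u_weights, v_weights : array_like, optional
    Weight for each value. If unspecified, each value is assigned the same weight. u_weights (resp. v_weights) must have the same length as u_values (resp. v_values). If the weight sum differs from 1, it must still be positive and finite so that the weights can be normalized to sum to 1.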

Returns:

distance : float
    The computed distance between the distributions.
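The weight handling described under Parameters can be checked with a short sketch. The sample values and weights below are illustrative assumptions, not part of the original page: omitting the weights should match passing equal weights explicitly, and rescaling the weights should leave the result unchanged because they are normalized to sum to 1.

```python
import numpy as np
from scipy.stats import energy_distance

# Illustrative values only.
u_values = [0.0, 1.0, 3.0]
v_values = [2.0, 5.0]

# Omitted weights versus explicit equal weights.
d_default = energy_distance(u_values, v_values)
d_uniform = energy_distance(u_values, v_values, [1, 1, 1], [1, 1])

# Unnormalized weights versus the same weights rescaled to sum to 1.
d_raw = energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
d_norm = energy_distance([0, 8], [0, 8], [0.75, 0.25], [0.5, 0.5])

print(np.isclose(d_default, d_uniform))
print(np.isclose(d_raw, d_norm))
```

Both comparisons use `np.isclose` rather than `==` because the results are floating-point values.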

Notes

The energy distance between two distributions \(u\) and \(v\), whose respective CDFs are \(U\) and \(V\), equals

\[D(u, v) = \left( 2\mathbb E|X - Y| - \mathbb E|X - X'| - \mathbb E|Y - Y'| \right)^{1/2}\]

where \(X\) and \(X'\) (resp. \(Y\) and \(Y'\)) are independent random variables whose probability distribution is \(u\) (resp. \(v\)).
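For equally weighted samples, the expectations in this definition reduce to means over all pairs of observations, so the formula can be evaluated directly. The following is a minimal sketch under that assumption; `energy_distance_from_definition` is a hypothetical helper written for illustration, not part of SciPy, and the sample values are arbitrary.

```python
import numpy as np
from scipy.stats import energy_distance


def energy_distance_from_definition(x, y):
    """Hypothetical helper: evaluate D(u, v) for equally weighted samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # For empirical distributions, each expectation is the mean absolute
    # difference over all pairs (pairs of an observation with itself included).
    e_xy = np.abs(x[:, None] - y[None, :]).mean()
    e_xx = np.abs(x[:, None] - x[None, :]).mean()
    e_yy = np.abs(y[:, None] - y[None, :]).mean()
    return np.sqrt(2 * e_xy - e_xx - e_yy)


x = [0.0, 1.0, 3.0]
y = [2.0, 5.0]
d_def = energy_distance_from_definition(x, y)
d_scipy = energy_distance(x, y)
print(np.isclose(d_def, d_scipy))
```

The pairwise approach needs memory proportional to the product of the sample sizes, so it is only meant as a check on small inputs.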

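Sometimes the square of this quantity is referred to as the “energy distance” (e.g. in [2], [4]), but as noted in [1] and [3], only the definition above satisfies the axioms of a distance function (metric).

As shown in [2], for one-dimensional real-valued variables, the energy distance is linked to the non-distribution-free version of the Cramér-von Mises distance: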

\[D(u, v) = \sqrt{2}\, l_2(u, v) = \left( 2 \int_{-\infty}^{+\infty} (U(x)-V(x))^2 \, dx \right)^{1/2}\]

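Note that the common Cramér-von Mises criterion uses the distribution-free version of the distance. See [2] (section 2) for more details about both versions of the distance.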
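The CDF form of the distance can be sketched for equally weighted samples, whose empirical CDFs are step functions that are constant between consecutive points of the pooled sample. `energy_distance_from_cdfs` is a hypothetical helper written for illustration, and the sample values are arbitrary; this is not presented as SciPy's internal implementation.

```python
import numpy as np
from scipy.stats import energy_distance


def energy_distance_from_cdfs(x, y):
    """Hypothetical helper: sqrt(2 * integral of (U - V)^2) for equal weights."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Pool and sort all observations; both CDFs are constant on each
    # interval between consecutive pooled points.
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)
    # Empirical CDF values at the left end of every interval.
    u_cdf = np.searchsorted(x, grid[:-1], side="right") / x.size
    v_cdf = np.searchsorted(y, grid[:-1], side="right") / y.size
    # The integral becomes a finite sum of (U - V)^2 times interval width.
    return np.sqrt(2 * np.sum((u_cdf - v_cdf) ** 2 * widths))


x = [0.0, 1.0, 3.0]
y = [2.0, 5.0]
d_cdf = energy_distance_from_cdfs(x, y)
print(np.isclose(d_cdf, energy_distance(x, y)))
```

Outside the range of the pooled sample both CDFs are equal (both 0 or both 1), so those regions contribute nothing to the integral.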

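The input distributions can be empirical, therefore coming from samples whose values are effectively inputs of the function, or they can be seen as generalized functions, in which case they are weighted sums of Dirac delta functions located at the specified values.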
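The two readings of the inputs can be compared in a small sketch: a weighted sum of point masses at 0 and 8 should give the same distance as an unweighted sample in which each value is repeated in proportion to its weight. The repeated samples below are an assumption made for illustration.

```python
import numpy as np
from scipy.stats import energy_distance

# Weighted point masses: u puts weights 3 and 1 on the values 0 and 8,
# v puts weights 2 and 2 on the same values.
d_weighted = energy_distance([0, 8], [0, 8], [3, 1], [2, 2])

# The same distributions written as unweighted samples with repeats.
d_repeated = energy_distance([0, 0, 0, 8], [0, 0, 8, 8])

print(np.isclose(d_weighted, d_repeated))
```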

References

[1]

Rizzo, Szekely “Energy distance.” Wiley Interdisciplinary Reviews: Computational Statistics, 8(1):27-38 (2015).

[2]

Szekely “E-statistics: The energy of statistical samples.” Bowling Green State University, Department of Mathematics and Statistics, Technical Report 02-16 (2002).

[3]

“Energy distance”, https://en.wikipedia.org/wiki/Energy_distance

[4]

Bellemare, Danihelka, Dabney, Mohamed, Lakshminarayanan, Hoyer, Munos “The Cramer Distance as a Solution to Biased Wasserstein Gradients” (2017). arXiv:1705.10743.

Examples

>>> from scipy.stats import energy_distance
>>> energy_distance([0], [2])
2.0000000000000004
>>> energy_distance([0, 8], [0, 8], [3, 1], [2, 2])
1.0000000000000002
>>> energy_distance([0.7, 7.4, 2.4, 6.8], [1.4, 8. ],
...                 [2.1, 4.2, 7.4, 8. ], [7.6, 8.8])
0.88003340976158217