scipy.stats.weightedtau

scipy.stats.weightedtau(x, y, rank=True, weigher=None, additive=True)

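Compute a weighted version of Kendall’s \(\tau\).

The weighted \(\tau\) is a version of Kendall’s \(\tau\) in which exchanges of high weight are more influential than exchanges of low weight. The default parameters compute the additive hyperbolic version of the index, \(\tau_\mathrm h\), which has been shown to provide the best balance between important and unimportant elements [1].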

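The weighting is defined by two ingredients. The first is a rank array, which assigns a nonnegative rank to each element; more important elements get smaller values, so 0 is the highest possible rank. The second is a weigher function, which assigns a weight to each element based on its rank. The weight of an exchange is then the sum or the product of the weights of the ranks of the exchanged elements. The default parameters compute \(\tau_\mathrm h\): an exchange between elements with ranks \(r\) and \(s\) (starting from zero) has weight \(1/(r+1) + 1/(s+1)\).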
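As a minimal sketch of the weighting scheme just described, the weight of an exchange can be computed from the two ranks alone. The helper names `hyperbolic_weigher` and `exchange_weight` are illustrative assumptions and are not part of SciPy.

```python
def hyperbolic_weigher(rank):
    """Map a zero-based rank to the weight 1/(rank + 1)."""
    return 1.0 / (rank + 1)


def exchange_weight(r, s, weigher=hyperbolic_weigher, additive=True):
    """Weight of an exchange between elements of rank r and rank s."""
    wr, ws = weigher(r), weigher(s)
    # Additive: sum of the two rank weights; otherwise their product.
    return wr + ws if additive else wr * ws


# Exchanging the two most important elements (ranks 0 and 1) ...
top = exchange_weight(0, 1)       # 1/1 + 1/2 = 1.5
# ... weighs far more than exchanging two unimportant ones.
bottom = exchange_weight(8, 9)    # 1/9 + 1/10
print(top, bottom)
```

With `additive=False` the same two ranks give the product of the weights instead, so `exchange_weight(0, 1, additive=False)` is 0.5.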

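Specifying a rank array is meaningful only if you have an external criterion of importance in mind. If, as usually happens, you have no specific rank in mind, the weighted \(\tau\) is defined by averaging the values obtained using the decreasing lexicographical rank by (x, y) and by (y, x). This is the behavior with the default parameters. Note that the ranking convention used here (lower values imply higher importance) is the opposite of the one used by other SciPy statistical functions.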
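The decreasing lexicographical rank can be illustrated with a short sketch. The helper `decreasing_lex_rank` is a hypothetical name introduced here, not a SciPy function, and it assumes plain Python lists without NaNs.

```python
def decreasing_lex_rank(x, y):
    """Zero-based ranks: larger x first, larger y breaks ties in x."""
    order = sorted(range(len(x)), key=lambda i: (x[i], y[i]), reverse=True)
    rank = [0] * len(x)
    for position, index in enumerate(order):
        rank[index] = position   # element `index` is the position-th most important
    return rank


x = [12, 2, 1, 12, 2]
y = [1, 4, 7, 1, 0]
rank_xy = decreasing_lex_rank(x, y)
rank_yx = decreasing_lex_rank(y, x)
print(rank_xy)   # [0, 2, 4, 1, 3]
print(rank_yx)   # [2, 1, 0, 3, 4]
```

The two rank arrays differ, which is why ranking by (x, y) alone is not symmetric in x and y and why the default averages both orderings. Elements tied in both x and y (indices 0 and 3 here) receive consecutive ranks in an arbitrary order.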

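Parameters:

x, y : array_like
    Arrays of scores, of the same shape. If the arrays are not 1-D, they will be flattened to 1-D.
rank : array_like of ints or bool, optional
    A nonnegative rank assigned to each element. If it is None, the decreasing lexicographical rank by (x, y) will be used: elements of higher rank will be those with larger x-values, using y-values to break ties (in particular, swapping x and y will give a different result). If it is False, the element indices will be used directly as ranks. The default is True, in which case this function returns the average of the values obtained using the decreasing lexicographical rank by (x, y) and by (y, x).
weigher : callable, optional
    The weigher function. Must map nonnegative integers (zero representing the most important element) to a nonnegative weight. The default, None, provides hyperbolic weighing, that is, rank \(r\) is mapped to weight \(1/(r+1)\).
additive : bool, optional
    If True, the weight of an exchange is computed by adding the weights of the ranks of the exchanged elements; otherwise, the weights are multiplied. The default is True.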
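To make the interaction of `rank`, `weigher`, and `additive` concrete, the following is a quadratic-time reference sketch of the weighted \(\tau\) for an explicit rank list. It is an illustration under stated assumptions, not SciPy's implementation: `weighted_tau_bruteforce` and `sign` are hypothetical names, the inputs are plain lists without NaNs, and the degenerate case where all pairs are tied (zero denominator) is not handled.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def weighted_tau_bruteforce(x, y, rank, weigher=None, additive=True):
    """O(n^2) weighted tau for an explicit rank list (illustrative only)."""
    if weigher is None:
        weigher = lambda r: 1.0 / (r + 1)   # hyperbolic weighting
    n = len(x)
    num = norm_x = norm_y = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            wi, wj = weigher(rank[i]), weigher(rank[j])
            w = wi + wj if additive else wi * wj
            sx, sy = sign(x[i] - x[j]), sign(y[i] - y[j])
            num += sx * sy * w      # concordant pairs add, discordant subtract
            norm_x += sx * sx * w   # pairs tied in x contribute nothing
            norm_y += sy * sy * w   # pairs tied in y contribute nothing
    return num / math.sqrt(norm_x * norm_y)


x = [12, 2, 1, 12, 2]
y = [1, 4, 7, 1, 0]
# Decreasing lexicographical ranks, worked out by hand for this data.
rank_xy = [0, 2, 4, 1, 3]   # by (x, y)
rank_yx = [2, 1, 0, 3, 4]   # by (y, x)
tau_xy = weighted_tau_bruteforce(x, y, rank_xy)   # counterpart of rank=None
tau_yx = weighted_tau_bruteforce(y, x, rank_yx)
tau_default = (tau_xy + tau_yx) / 2               # counterpart of rank=True
# Counterpart of rank=False: element indices are used directly as ranks.
tau_index = weighted_tau_bruteforce(x, y, list(range(len(x))))
print(tau_xy, tau_yx, tau_default, tau_index)
```

On this data the sketch agrees with the values listed in the Examples section: roughly -0.4158 and -0.7181 for the two orderings, and roughly -0.5669 for their average. Passing `weigher=lambda r: 1` makes every exchange weigh the same, which reduces the result to Kendall’s \(\tau\).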

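Returns:

res : SignificanceResult
    An object containing attributes:

    statistic : float
        The weighted \(\tau\) correlation index.
    pvalue : float
        Presently np.nan, as the null distribution of the statistic is unknown (even in the additive hyperbolic case).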

See also

kendalltau: Calculates Kendall’s tau.
spearmanr: Calculates a Spearman rank-order correlation coefficient.
theilslopes: Computes the Theil-Sen estimator for a set of points (x, y).

Notes

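This function uses an \(O(n \log n)\), mergesort-based algorithm [1] that is a weighted extension of Knight’s algorithm for Kendall’s \(\tau\) [2]. It can compute Shieh’s weighted \(\tau\) [3] between rankings without ties (i.e., permutations) by setting additive and rank to False, as the definition given in [1] is a generalization of Shieh’s.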
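The merge-sort idea behind this kind of algorithm can be sketched for the simplest case: unweighted, tie-free data. The code below only counts discordant pairs (exchanges) while sorting; it is not SciPy's implementation, and the weighted variant in [1] needs additional bookkeeping of exchange weights that is omitted here. The name `count_exchanges` is an illustrative assumption.

```python
def count_exchanges(seq):
    """Return (sorted copy, number of discordant pairs) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, a = count_exchanges(seq[:mid])
    right, b = count_exchanges(seq[mid:])
    merged, exchanges = [], a + b
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps over every element still waiting in `left`.
            exchanges += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, exchanges


# y listed in increasing order of x, for tie-free data.
y_by_x = [3, 1, 4, 2]
_, discordant = count_exchanges(y_by_x)
n = len(y_by_x)
# Unweighted Kendall's tau without ties: 1 - 2 * D / (n * (n - 1) / 2).
tau = 1 - 4 * discordant / (n * (n - 1))
print(discordant, tau)   # 3 0.0
```

Each merge step counts, in one addition, all the pairs that a right-hand element overtakes, which is what brings the cost down from \(O(n^2)\) pair comparisons to \(O(n \log n)\).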

NaNs are considered the smallest possible score.

Added in version 0.19.0.

References

[1] Sebastiano Vigna, “A weighted correlation index for rankings with ties”, Proceedings of the 24th international conference on World Wide Web, pp. 1166-1176, ACM, 2015.

[2] W.R. Knight, “A Computer Method for Calculating Kendall’s Tau with Ungrouped Data”, Journal of the American Statistical Association, Vol. 61, No. 314, Part 1, pp. 436-439, 1966.

[3] Grace S. Shieh, “A weighted Kendall’s tau statistic”, Statistics & Probability Letters, Vol. 39, No. 1, pp. 17-24, 1998.

Examples

>>> import numpy as np
>>> from scipy import stats
>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> res = stats.weightedtau(x, y)
>>> res.statistic
-0.56694968153682723
>>> res.pvalue
nan
>>> res = stats.weightedtau(x, y, additive=False)
>>> res.statistic
-0.62205716951801038

NaNs are considered the smallest possible score:

>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, np.nan]
>>> res = stats.weightedtau(x, y)
>>> res.statistic
-0.56694968153682723

With a constant weigher, the result is exactly Kendall’s tau:

>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> res = stats.weightedtau(x, y, weigher=lambda x: 1)
>>> res.statistic
-0.47140452079103173

>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> stats.weightedtau(x, y, rank=None)
SignificanceResult(statistic=-0.4157652301037516, pvalue=nan)
>>> stats.weightedtau(y, x, rank=None)
SignificanceResult(statistic=-0.7181341329699028, pvalue=nan)