chi2_contingency

scipy.stats.contingency.chi2_contingency(observed, correction=True, lambda_=None, *, method=None)

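Chi-square test of independence of variables in a contingency table.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table [1] observed. The expected frequencies are computed from the marginal sums under the assumption of independence; see scipy.stats.contingency.expected_freq. The number of degrees of freedom is (expressed using numpy functions and attributes):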

dof = observed.size - sum(observed.shape) + observed.ndim - 1
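The formula above can be checked directly against the dof attribute of the result. The following is a minimal sketch; contingency_dof is a hypothetical helper introduced only for illustration and is not part of SciPy.

```python
import numpy as np
from scipy.stats import chi2_contingency


def contingency_dof(observed):
    """Degrees of freedom from the formula above (hypothetical helper)."""
    observed = np.asarray(observed)
    return observed.size - sum(observed.shape) + observed.ndim - 1


two_way = np.array([[10, 10, 20], [20, 20, 20]])  # shape (2, 3)
four_way = np.ones((2, 2, 2, 2), dtype=int)       # shape (2, 2, 2, 2)

dof_two_way = contingency_dof(two_way)    # 6 - 5 + 2 - 1
dof_four_way = contingency_dof(four_way)  # 16 - 8 + 4 - 1

# The value reported by chi2_contingency for the two-way table.
reported_dof = chi2_contingency(two_way).dof
```

For a two-way R x C table the formula reduces to the familiar (R - 1) * (C - 1).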

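Parameters:

observed : array_like
    The contingency table. The table contains the observed frequencies (i.e. number of occurrences) in each category. In the two-dimensional case, the table is often described as an "R x C table".
correction : bool, optional
    If True, and the degrees of freedom is 1, apply Yates' correction for continuity. The effect of the correction is to adjust each observed value by 0.5 towards the corresponding expected value.
lambda_ : float or str, optional
    By default, the statistic computed in this test is Pearson's chi-squared statistic [2]. lambda_ allows a statistic from the Cressie-Read power divergence family [3] to be used instead. See scipy.stats.power_divergence for details.
method : ResamplingMethod, optional
    Defines the method used to compute the p-value. Compatible only with correction=False, the default lambda_, and two-way tables. If method is an instance of PermutationMethod/MonteCarloMethod, the p-value is computed using scipy.stats.permutation_test/scipy.stats.monte_carlo_test with the provided configuration options and other appropriate settings. Otherwise, the p-value is computed as documented in the Notes. Note that if method is an instance of MonteCarloMethod, the rvs attribute must be left unspecified; Monte Carlo samples are always drawn using the rvs method of scipy.stats.random_table.

    Added in version 1.15.0.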
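To illustrate the correction parameter, the sketch below reproduces the Yates-corrected statistic by hand for a 2 x 2 table (one degree of freedom). Capping the shift at the absolute gap between observed and expected values is an assumption about how cells closer than 0.5 to their expected value are treated; in this table every gap exceeds 0.5, so each cell simply moves by 0.5.

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import expected_freq

obs = np.array([[12, 3], [17, 16]])
expected = expected_freq(obs)

# Move each observed count toward its expected value by at most 0.5.
# (The cap at |expected - obs| is an assumption; it is inactive here.)
diff = expected - obs
adjusted = obs + np.sign(diff) * np.minimum(0.5, np.abs(diff))

# Pearson statistic computed from the adjusted counts.
manual_stat = ((adjusted - expected) ** 2 / expected).sum()

corrected = chi2_contingency(obs, correction=True)
uncorrected = chi2_contingency(obs, correction=False)
```

Because the adjusted counts sit closer to the expected values, the corrected statistic is smaller than the uncorrected one, which makes the test more conservative.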

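Returns:

res : Chi2ContingencyResult
    An object containing attributes:

    statistic : float
        The test statistic.
    pvalue : float
        The p-value of the test.
    dof : int
        The degrees of freedom. NaN if method is not None.
    expected_freq : ndarray, same shape as observed
        The expected frequencies, based on the marginal sums of the table.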
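For a two-way table, the expected frequencies returned in expected_freq are the outer product of the row and column totals divided by the grand total. A minimal sketch of that calculation, using a small table chosen for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])

row_totals = obs.sum(axis=1)  # marginal sums over columns
col_totals = obs.sum(axis=0)  # marginal sums over rows

# Expected count under independence: row total * column total / grand total.
manual_expected = np.outer(row_totals, col_totals) / obs.sum()

res = chi2_contingency(obs)
matches = np.allclose(manual_expected, res.expected_freq)
```

The outer-product form applies only to two-way tables; for higher-dimensional tables, scipy.stats.contingency.expected_freq handles the general case.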

See also

scipy.stats.contingency.expected_freq
scipy.stats.fisher_exact
scipy.stats.chisquare
scipy.stats.power_divergence
scipy.stats.barnard_exact
scipy.stats.boschloo_exact
Chi-square test of independence of variables in a contingency table: Extended example

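Notes

An often quoted guideline for the validity of this calculation is that the test should be used only if the observed and expected frequencies in each cell are at least 5.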
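The guideline can be checked before running the test. The sketch below uses a hypothetical helper, meets_min_count_guideline, which is not part of SciPy; the threshold of 5 is taken from the guideline above.

```python
import numpy as np
from scipy.stats.contingency import expected_freq


def meets_min_count_guideline(observed, minimum=5):
    """Return True if every observed and expected count is >= minimum.

    Hypothetical helper for illustration; not a SciPy function.
    """
    observed = np.asarray(observed)
    expected = expected_freq(observed)
    return bool((observed >= minimum).all() and (expected >= minimum).all())


ok_table = [[10, 10, 20], [20, 20, 20]]
sparse_table = [[12, 3], [17, 16]]  # one observed cell is below 5

ok_result = meets_min_count_guideline(ok_table)
sparse_result = meets_min_count_guideline(sparse_table)
```

When the check fails, an exact test such as scipy.stats.fisher_exact (for 2 x 2 tables) may be more appropriate.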

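This is a test for the independence of different categories of a population. The test is only meaningful when the dimension of observed is two or more. Applying the test to a one-dimensional table will always result in expected equal to observed and a chi-square statistic equal to 0.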
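The degenerate one-dimensional case can be seen in a short sketch. The counts are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

obs_1d = np.array([10, 20, 30])
res = chi2_contingency(obs_1d)

# With a single dimension there is nothing to be independent of, so the
# expected frequencies simply reproduce the observed counts.
same_as_observed = np.allclose(res.expected_freq, obs_1d)
statistic = res.statistic
```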

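This function does not handle masked arrays, because the calculation does not make sense with missing values.

Like scipy.stats.chisquare, this function computes a chi-square statistic; the convenience this function provides is to work out the expected frequencies and degrees of freedom from the given contingency table. If these were already known, and if Yates' correction was not required, one could use scipy.stats.chisquare. That is, if one calls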

res = chi2_contingency(obs, correction=False)

then the following is true

(res.statistic, res.pvalue) == stats.chisquare(obs.ravel(),
                                               f_exp=res.expected_freq.ravel(),
                                               ddof=obs.size - 1 - res.dof)
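A runnable version of this comparison is sketched below, taking the expected frequencies and degrees of freedom from the result object. It compares with np.isclose rather than exact equality, as a precaution against floating-point differences.

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, correction=False)

# chisquare uses k - 1 - ddof degrees of freedom for k cells, so ddof is
# chosen to make that equal to the contingency-table dof.
ref = stats.chisquare(obs.ravel(),
                      f_exp=res.expected_freq.ravel(),
                      ddof=obs.size - 1 - res.dof)

same = (np.isclose(res.statistic, ref.statistic)
        and np.isclose(res.pvalue, ref.pvalue))
```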

The lambda_ argument was added in version 0.13.0 of scipy.

References

[1] "Contingency table", https://en.wikipedia.org/wiki/Contingency_table

[2] "Pearson's chi-squared test", https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

[3] Cressie, N. and Read, T. R. C., "Multinomial Goodness-of-Fit Tests", J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464.

Examples

A two-way example (2 x 3):

>>> import numpy as np
>>> from scipy.stats import chi2_contingency
>>> obs = np.array([[10, 10, 20], [20, 20, 20]])
>>> res = chi2_contingency(obs)
>>> res.statistic
2.7777777777777777
>>> res.pvalue
0.24935220877729619
>>> res.dof
2
>>> res.expected_freq
array([[ 12.,  12.,  16.],
       [ 18.,  18.,  24.]])

Perform the test using the log-likelihood ratio (i.e. the "G-test") instead of Pearson's chi-squared statistic.

>>> res = chi2_contingency(obs, lambda_="log-likelihood")
>>> res.statistic
2.7688587616781319
>>> res.pvalue
0.25046668010954165
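As a cross-check, the G statistic can be computed by hand as twice the sum of observed * log(observed / expected), with the p-value taken from the chi-square distribution with the table's degrees of freedom. This is a sketch for a table with no zero counts and more than one degree of freedom, so no continuity correction is involved.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

obs = np.array([[10, 10, 20], [20, 20, 20]])
res = chi2_contingency(obs, lambda_="log-likelihood")
expected = res.expected_freq

# G = 2 * sum(O * ln(O / E)); requires all observed counts to be positive.
g_stat = 2.0 * (obs * np.log(obs / expected)).sum()

# Upper-tail probability of the chi-square distribution with res.dof dof.
p_value = chi2.sf(g_stat, res.dof)
```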

A four-way example (2 x 2 x 2 x 2):

>>> obs = np.array(
...     [[[[12, 17],
...        [11, 16]],
...       [[11, 12],
...        [15, 16]]],
...      [[[23, 15],
...        [30, 22]],
...       [[14, 17],
...        [15, 16]]]])
>>> res = chi2_contingency(obs)
>>> res.statistic
8.7584514426741897
>>> res.pvalue
0.64417725029295503

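When the sum of the elements in a two-way table is small, the p-value produced by the default asymptotic approximation may be inaccurate. Consider passing a PermutationMethod or MonteCarloMethod as the method parameter with correction=False.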

>>> from scipy.stats import PermutationMethod
>>> obs = np.asarray([[12, 3],
...                   [17, 16]])
>>> res = chi2_contingency(obs, correction=False)
>>> ref = chi2_contingency(obs, correction=False, method=PermutationMethod())
>>> res.pvalue, ref.pvalue
(0.0614122539870913, 0.1074)  # may vary

For a more detailed example, see Chi-square test of independence of variables in a contingency table.