scipy.stats.

bootstrap#

scipy.stats.bootstrap(data, statistic, *, n_resamples=9999, batch=None, vectorized=None, paired=False, axis=0, confidence_level=0.95, alternative='two-sided', method='BCa', bootstrap_result=None, rng=None)[source]#

計算統計量的雙邊 bootstrap 信賴區間。

當 method 為 'percentile' 且 alternative 為 'two-sided' 時，bootstrap 信賴區間會依照以下程序計算。

重新取樣資料：對於 data 中的每個樣本以及每個 n_resamples，從原始樣本中隨機取樣（放回取樣），取樣大小與原始樣本相同。
計算統計量的 bootstrap 分佈：對於每組重新取樣，計算檢定統計量。
決定信賴區間：找出 bootstrap 分佈的區間，該區間
- 關於中位數對稱且
- 包含 confidence_level 的重新取樣統計量值。

雖然 'percentile' 方法最直觀，但在實務中很少使用。另有兩種更常用的方法，'basic'（「反向百分位數」）和 'BCa'（「偏差校正和加速」）；它們在步驟 3 的執行方式上有所不同。

如果 data 中的樣本是從各自的分佈中隨機抽取 \(n\) 次，則 bootstrap 返回的信賴區間將包含這些分佈的統計量的真實值，大約 confidence_level\(\, \times \, n\) 次。

參數:

dataarray-like 序列

data 的每個元素都是一個樣本，其中包含來自底層分佈的純量觀測值。data 的元素必須可廣播為相同的形狀（可能除了由 axis 指定的維度外）。

在版本 1.14.0 中變更：如果 data 的元素的形狀不相同（除了由 axis 指定的維度外），bootstrap 現在將發出 FutureWarning。從 SciPy 1.16.0 開始，bootstrap 將在執行計算之前，明確地將元素廣播為相同的形狀（除了沿 axis 的維度）。

statisticcallable

要計算信賴區間的統計量。statistic 必須是一個可呼叫的物件，它接受 len(data) 個樣本作為個別的引數，並返回結果統計量。如果將 vectorized 設定為 True，則 statistic 也必須接受關鍵字引數 axis，並且必須向量化以沿提供的 axis 計算統計量。

n_resamplesint，預設值：9999

執行重新取樣以形成統計量的 bootstrap 分佈的次數。

batchint，選用

在每次向量化呼叫 statistic 時要處理的重新取樣次數。記憶體使用量為 O( batch * n )，其中 n 是樣本大小。預設值為 None，在這種情況下，batch = n_resamples（或對於 method='BCa'，batch = max(n_resamples, n)）。

vectorizedbool，選用

如果將 vectorized 設定為 False，則不會將關鍵字引數 axis 傳遞給 statistic，並且預期僅針對 1D 樣本計算統計量。如果為 True，則會將關鍵字引數 axis 傳遞給 statistic，並且預期在傳遞 ND 樣本陣列時，沿 axis 計算統計量。如果為 None（預設值），則如果 axis 是 statistic 的參數，則 vectorized 將設定為 True。使用向量化統計量通常可以減少計算時間。

pairedbool，預設值：False

統計量是否將 data 中樣本的對應元素視為成對。如果為 True，bootstrap 會重新取樣一個索引陣列，並對 data 中的所有陣列使用相同的索引；否則，bootstrap 會獨立地重新取樣每個陣列的元素。

axisint，預設值：0

data 中樣本的軸，沿該軸計算 statistic。

confidence_levelfloat，預設值：0.95

信賴區間的信賴水準。

alternative{‘two-sided’, ‘less’, ‘greater’}，預設值：'two-sided'

選擇 'two-sided'（預設值）以取得雙邊信賴區間，選擇 'less' 以取得下限為 -np.inf 的單邊信賴區間，以及選擇 'greater' 以取得上限為 np.inf 的單邊信賴區間。單邊信賴區間的另一個界限與 confidence_level 距離 1.0 兩倍的雙邊信賴區間的界限相同；例如，95% 'less' 信賴區間的上限與 90% 'two-sided' 信賴區間的上限相同。

method{‘percentile’, ‘basic’, ‘bca’}，預設值：'BCa'

是否返回「百分位數」bootstrap 信賴區間 ('percentile')、「基本」（又稱「反向」）bootstrap 信賴區間 ('basic')，或偏差校正和加速 bootstrap 信賴區間 ('BCa')。

bootstrap_resultBootstrapResult，選用

提供先前呼叫 bootstrap 返回的結果物件，以將先前的 bootstrap 分佈包含在新的 bootstrap 分佈中。例如，這可用於變更 confidence_level、變更 method，或查看執行額外重新取樣而不重複計算的效果。

rng{None, int, numpy.random.Generator}，選用

如果透過關鍵字傳遞 rng，則會將 numpy.random.Generator 以外的類型傳遞給 numpy.random.default_rng 以實例化 Generator。如果 rng 已經是 Generator 實例，則使用提供的實例。指定 rng 以獲得可重複的函數行為。

如果此引數是透過位置傳遞，或 random_state 是透過關鍵字傳遞，則適用於引數 random_state 的傳統行為

如果 random_state 為 None（或 numpy.random），則使用 numpy.random.RandomState 單例。
如果 random_state 為 int，則會使用新的 RandomState 實例，並以 random_state 作為種子。
如果 random_state 已經是 Generator 或 RandomState 實例，則使用該實例。

在版本 1.15.0 中變更：作為從使用 numpy.random.RandomState 過渡到 numpy.random.Generator 的 SPEC-007 過渡的一部分，此關鍵字已從 random_state 變更為 rng。在過渡期間，這兩個關鍵字將繼續有效，儘管一次只能指定一個。在過渡期之後，使用 random_state 關鍵字的函數呼叫將發出警告。random_state 和 rng 的行為都如上所述，但在新程式碼中應僅使用 rng 關鍵字。

返回:

resBootstrapResult

具有屬性的物件

confidence_intervalConfidenceInterval: bootstrap 信賴區間，作為 collections.namedtuple 的實例，具有屬性 low 和 high。
bootstrap_distributionndarray: bootstrap 分佈，即每次重新取樣的 statistic 值。最後一個維度對應於重新取樣（例如，res.bootstrap_distribution.shape[-1] == n_resamples）。
standard_errorfloat 或 ndarray: bootstrap 標準誤，即 bootstrap 分佈的樣本標準差。

警告:

DegenerateDataWarning: 當 method='BCa' 且 bootstrap 分佈退化時產生（例如，所有元素都相同）。

附註

如果 bootstrap 分佈退化（例如，所有元素都相同），則對於 method='BCa'，信賴區間的元素可能為 NaN。在這種情況下，請考慮使用另一種 method 或檢查 data 以尋找其他分析可能更適合的跡象（例如，所有觀測值都相同）。

參考文獻

[1]

B. Efron 和 R. J. Tibshirani，《An Introduction to the Bootstrap》，Chapman & Hall/CRC，Boca Raton，FL，美國 (1993)

[2]

Nathaniel E. Helwig，「Bootstrap Confidence Intervals」，http://users.stat.umn.edu/~helwig/notes/bootci-Notes.pdf

[3]

Bootstrapping (statistics)，Wikipedia，https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

範例

假設我們已從未知分佈中取樣資料。

>>> import numpy as np
>>> rng = np.random.default_rng()
>>> from scipy.stats import norm
>>> dist = norm(loc=2, scale=4)  # our "unknown" distribution
>>> data = dist.rvs(size=100, random_state=rng)

我們對分佈的標準差感興趣。

>>> std_true = dist.std()      # the true value of the statistic
>>> print(std_true)
4.0
>>> std_sample = np.std(data)  # the sample statistic
>>> print(std_sample)
3.9460644295563863

bootstrap 用於近似我們預期會發生的變異性，如果我們要從未知分佈中重複取樣，並每次計算樣本的統計量。它透過重複從原始樣本中放回取樣值，並計算每次重新取樣的統計量來做到這一點。這會產生統計量的「bootstrap 分佈」。

>>> import matplotlib.pyplot as plt
>>> from scipy.stats import bootstrap
>>> data = (data,)  # samples must be in a sequence
>>> res = bootstrap(data, np.std, confidence_level=0.9, rng=rng)
>>> fig, ax = plt.subplots()
>>> ax.hist(res.bootstrap_distribution, bins=25)
>>> ax.set_title('Bootstrap Distribution')
>>> ax.set_xlabel('statistic value')
>>> ax.set_ylabel('frequency')
>>> plt.show()

../../_images/scipy-stats-bootstrap-1_00_00.png

標準誤量化了這種變異性。它計算為 bootstrap 分佈的標準差。

>>> res.standard_error
0.24427002125829136
>>> res.standard_error == np.std(res.bootstrap_distribution, ddof=1)
True

統計量的 bootstrap 分佈通常近似於常態分佈，其尺度等於標準誤。

>>> x = np.linspace(3, 5)
>>> pdf = norm.pdf(x, loc=std_sample, scale=res.standard_error)
>>> fig, ax = plt.subplots()
>>> ax.hist(res.bootstrap_distribution, bins=25, density=True)
>>> ax.plot(x, pdf)
>>> ax.set_title('Normal Approximation of the Bootstrap Distribution')
>>> ax.set_xlabel('statistic value')
>>> ax.set_ylabel('pdf')
>>> plt.show()

../../_images/scipy-stats-bootstrap-1_01_00.png

這表示我們可以根據此常態分佈的分位數，建構統計量的 90% 信賴區間。

>>> norm.interval(0.9, loc=std_sample, scale=res.standard_error)
(3.5442759991341726, 4.3478528599786)

由於中央極限定理，此常態近似對於樣本底層的各種統計量和分佈都是準確的；但是，在所有情況下，近似都不可靠。由於 bootstrap 旨在與任意底層分佈和統計量一起使用，因此它使用更先進的技術來產生準確的信賴區間。

>>> print(res.confidence_interval)
ConfidenceInterval(low=3.57655333533867, high=4.382043696342881)

如果我們從原始分佈中取樣 100 次，並為每個樣本形成一個 bootstrap 信賴區間，則信賴區間包含統計量的真實值大約 90% 的時間。

>>> n_trials = 100
>>> ci_contains_true_std = 0
>>> for i in range(n_trials):
...    data = (dist.rvs(size=100, random_state=rng),)
...    res = bootstrap(data, np.std, confidence_level=0.9,
...                    n_resamples=999, rng=rng)
...    ci = res.confidence_interval
...    if ci[0] < std_true < ci[1]:
...        ci_contains_true_std += 1
>>> print(ci_contains_true_std)
88

除了編寫迴圈之外，我們也可以一次確定所有 100 個樣本的信賴區間。

>>> data = (dist.rvs(size=(n_trials, 100), random_state=rng),)
>>> res = bootstrap(data, np.std, axis=-1, confidence_level=0.9,
...                 n_resamples=999, rng=rng)
>>> ci_l, ci_u = res.confidence_interval

在這裡，ci_l 和 ci_u 包含每個 n_trials = 100 個樣本的信賴區間。

>>> print(ci_l[:5])
[3.86401283 3.33304394 3.52474647 3.54160981 3.80569252]
>>> print(ci_u[:5])
[4.80217409 4.18143252 4.39734707 4.37549713 4.72843584]

再次，大約 90% 包含真實值，std_true = 4。

>>> print(np.sum((ci_l < std_true) & (std_true < ci_u)))
93

bootstrap 也可用於估計多樣本統計量的信賴區間。例如，為了獲得平均值差異的信賴區間，我們編寫一個接受兩個樣本引數並僅返回統計量的函數。使用 axis 引數可確保所有平均值計算都在單次向量化呼叫中執行，這比在 Python 中迴圈處理成對的重新取樣更快。

>>> def my_statistic(sample1, sample2, axis=-1):
...     mean1 = np.mean(sample1, axis=axis)
...     mean2 = np.mean(sample2, axis=axis)
...     return mean1 - mean2

在這裡，我們使用預設 95% 信賴水準的「百分位數」方法。

>>> sample1 = norm.rvs(scale=1, size=100, random_state=rng)
>>> sample2 = norm.rvs(scale=2, size=100, random_state=rng)
>>> data = (sample1, sample2)
>>> res = bootstrap(data, my_statistic, method='basic', rng=rng)
>>> print(my_statistic(sample1, sample2))
0.16661030792089523
>>> print(res.confidence_interval)
ConfidenceInterval(low=-0.29087973240818693, high=0.6371338699912273)

bootstrap 標準誤的估計值也可用。

>>> print(res.standard_error)
0.238323948262459

成對樣本統計量也適用。例如，考慮 Pearson 相關係數。

>>> from scipy.stats import pearsonr
>>> n = 100
>>> x = np.linspace(0, 10, n)
>>> y = x + rng.uniform(size=n)
>>> print(pearsonr(x, y)[0])  # element 0 is the statistic
0.9954306665125647

我們包裝 pearsonr，使其僅返回統計量，確保我們使用 axis 引數，因為它可用。

>>> def my_statistic(x, y, axis=-1):
...     return pearsonr(x, y, axis=axis)[0]

我們使用 paired=True 呼叫 bootstrap。

>>> res = bootstrap((x, y), my_statistic, paired=True, rng=rng)
>>> print(res.confidence_interval)
ConfidenceInterval(low=0.9941504301315878, high=0.996377412215445)

結果物件可以傳回給 bootstrap 以執行額外的重新取樣

>>> len(res.bootstrap_distribution)
9999
>>> res = bootstrap((x, y), my_statistic, paired=True,
...                 n_resamples=1000, rng=rng,
...                 bootstrap_result=res)
>>> len(res.bootstrap_distribution)
10999

或變更信賴區間選項

>>> res2 = bootstrap((x, y), my_statistic, paired=True,
...                  n_resamples=0, rng=rng, bootstrap_result=res,
...                  method='percentile', confidence_level=0.9)
>>> np.testing.assert_equal(res2.bootstrap_distribution,
...                         res.bootstrap_distribution)
>>> res.confidence_interval
ConfidenceInterval(low=0.9941574828235082, high=0.9963781698210212)

而無需重複計算原始 bootstrap 分佈。