scipy.stats.siegelslopes

scipy.stats.siegelslopes(y, x=None, method='hierarchical')

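Compute the Siegel estimator for a set of points (x, y).

siegelslopes implements a robust linear regression method that uses repeated medians (see [1]) to fit a line to the points (x, y). The method is robust to outliers, with an asymptotic breakdown point of 50%.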
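To make the robustness claim concrete, the following is a small illustrative check, not a proof of the asymptotic breakdown point. The data set is an assumption made up for this sketch: 50 points on the line y = 2x + 1, of which the first 20 (40%) are overwritten with a constant outlier value. The least-squares fit from linregress is included only for contrast.

```python
import numpy as np
from scipy import stats

# Hypothetical data: an exact line y = 2x + 1 with 40% of the points corrupted
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0
y[:20] = 1000.0  # gross outliers, still fewer than half of the points

robust = stats.siegelslopes(y, x)   # repeated-median fit
lsq = stats.linregress(x, y)        # ordinary least squares, for contrast

print("siegelslopes:", robust.slope, robust.intercept)
print("linregress:  ", lsq.slope, lsq.intercept)
```

Because more than half of the points still lie on the original line, the repeated-median fit recovers its slope and intercept, while the least-squares fit is pulled far away by the outliers.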

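Parameters:

y : array_like
    Dependent variable.
x : array_like or None, optional
    Independent variable. If None, arange(len(y)) is used instead.
method : {'hierarchical', 'separate'}
    If 'hierarchical', the intercept is estimated using the estimated slope (default). If 'separate', the intercept is estimated independently of the estimated slope. See Notes for details.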
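As a brief sketch of how these parameters interact, the snippet below calls siegelslopes with and without x and with both values of method. The sample values in y are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample; the last value is an outlier
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0, 30.0])

# Omitting x is equivalent to passing arange(len(y))
res_default = stats.siegelslopes(y)
res_explicit = stats.siegelslopes(y, np.arange(len(y)))

# `method` only changes how the intercept is estimated
res_hier = stats.siegelslopes(y, method='hierarchical')
res_sep = stats.siegelslopes(y, method='separate')

print(res_default.slope, res_explicit.slope)
print(res_hier.intercept, res_sep.intercept)
```

The slope is the same for both methods; the two intercept estimates may differ, depending on the data.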

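Returns:

result : SiegelslopesResult instance
    The return value is an object with the following attributes:

    slope : float
        Estimate of the slope of the regression line.
    intercept : float
        Estimate of the intercept of the regression line.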

See also

theilslopes : a similar technique without repeated medians

Notes

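With n = len(y), compute m_j as the median of the slopes from the point (x[j], y[j]) to all other n-1 points. slope is then the median of all slopes m_j. Two ways of estimating the intercept are given in [1]; they can be selected with the parameter method. The hierarchical approach uses the estimated slope and computes intercept as the median of y - slope*x. The other approach estimates the intercept separately, as follows: for each point (x[j], y[j]), compute the intercepts of all n-1 lines through that point and each of the remaining points, and take the median i_j. intercept is the median of the i_j.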
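The procedure described above can be sketched directly in NumPy. This is an illustrative reading of the description, not SciPy's actual implementation; the function name repeated_median_line is hypothetical, and the sketch assumes that all x values are distinct so that no pairwise slope divides by zero.

```python
import numpy as np
from scipy import stats

def repeated_median_line(x, y, method="hierarchical"):
    """Illustrative repeated-median fit (hypothetical helper).

    Assumes all x values are distinct.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    slopes = np.empty(n)
    intercepts = np.empty(n)
    for j in range(n):
        others = np.arange(n) != j  # the n-1 remaining points
        dx = x[others] - x[j]
        dy = y[others] - y[j]
        # m_j: median slope of the lines from point j to every other point
        slopes[j] = np.median(dy / dx)
        # i_j: median intercept of those same n-1 lines; the line through
        # (x_j, y_j) and (x_k, y_k) meets the y-axis at
        # (x_k*y_j - x_j*y_k) / (x_k - x_j)
        intercepts[j] = np.median((x[others] * y[j] - x[j] * y[others]) / dx)
    slope = np.median(slopes)
    if method == "hierarchical":
        intercept = np.median(y - slope * x)
    else:  # 'separate'
        intercept = np.median(intercepts)
    return slope, intercept

# Compare against scipy on made-up noisy data with distinct x values
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 21)
y = 2.0 * x + 1.0 + rng.normal(size=x.size)

sketch_hier = repeated_median_line(x, y, "hierarchical")
sketch_sep = repeated_median_line(x, y, "separate")
scipy_hier = stats.siegelslopes(y, x, method="hierarchical")
scipy_sep = stats.siegelslopes(y, x, method="separate")
```

The loop takes n medians over n-1 values each, so the cost grows quadratically with the number of points.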

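The implementation computes the median of a vector of size n a total of n times, which can be slow for large vectors. More efficient algorithms exist (see [2]) but are not implemented here.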

For compatibility with older versions of SciPy, the return value acts like a namedtuple of length 2, with fields slope and intercept, so one can continue to write:

slope, intercept = siegelslopes(y, x)
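As a short sketch of this behaviour, the snippet below reads the same result through tuple unpacking, attribute access, and positional indexing. The exactly linear data set is an assumption chosen so that the fitted values are easy to recognise.

```python
import numpy as np
from scipy import stats

# Hypothetical, exactly linear data: y = 3x + 2
x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0

res = stats.siegelslopes(y, x)
slope, intercept = res  # tuple-style unpacking

# Attribute access and positional indexing refer to the same values
same_slope = (res.slope == slope == res[0])
same_intercept = (res.intercept == intercept == res[1])
print(same_slope, same_intercept)
```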

References

[1] A. Siegel, "Robust Regression Using Repeated Medians", Biometrika, Vol. 69, pp. 242-244, 1982.

[2] A. Stein and M. Werman, "Finding the repeated median regression line", Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 409-413, 1992.

Examples

>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt

>>> x = np.linspace(-5, 5, num=150)
>>> y = x + np.random.normal(size=x.size)
>>> y[11:15] += 10  # add outliers
>>> y[-5:] -= 7

Compute the slope and intercept. For comparison, also compute the least-squares fit with linregress:

>>> res = stats.siegelslopes(y, x)
>>> lsq_res = stats.linregress(x, y)

Plot the results. The Siegel regression line is shown in red. The green line shows the least-squares fit for comparison.

>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(x, y, 'b.')
>>> ax.plot(x, res[1] + res[0] * x, 'r-')
>>> ax.plot(x, lsq_res[1] + lsq_res[0] * x, 'g-')
>>> plt.show()
