scipy.optimize.

isotonic_regression#

scipy.optimize.isotonic_regression(y, *, weights=None, increasing=True)[原始碼]#

非參數等張迴歸。

使用鄰近違反者集值演算法 (PAVA) 計算一個與 y 相同長度的（非嚴格）單調遞增陣列 x，參見 [1]。更多詳細資訊請參見「註解」章節。

參數:

y(N,) array_like: 反應變數。
weights(N,) array_like 或 None: 案例權重。
increasingbool: 若為 True，則擬合單調遞增（即等張）迴歸。若為 False，則擬合單調遞減（即反張）迴歸。預設值為 True。

回傳值:

resOptimizeResult

最佳化結果以 OptimizeResult 物件表示。重要的屬性有

x: 等張迴歸解，即一個與 y 相同長度的遞增（或遞減）陣列，元素值範圍從 min(y) 到 max(y)。
weights : 陣列，包含每個區塊（或群組）B 的案例權重總和。
blocks: 長度為 B+1 的陣列，包含每個區塊（或群組）B 的起始位置索引。第 j 個區塊由 x[blocks[j]:blocks[j+1]] 給出，其中所有值都相同。

註解

給定資料 \(y\) 和案例權重 \(w\)，等張迴歸解決以下最佳化問題

\[\operatorname{argmin}_{x_i} \sum_i w_i (y_i - x_i)^2 \quad \text{subject to } x_i \leq x_j \text{ whenever } i \leq j \,.\]

對於每個輸入值 \(y_i\)，它產生一個值 \(x_i\)，使得 \(x\) 是遞增的（但非嚴格遞增），即 \(x_i \leq x_{i+1}\)。這是通過 PAVA 完成的。解決方案包含群組或區塊，即 \(x\) 的相鄰元素，例如 \(x_i\) 和 \(x_{i+1}\)，它們都具有相同的值。

最有趣的是，如果將平方損失替換為廣泛的 Bregman 函數類別，解決方案保持不變，Bregman 函數類別是均值的唯一嚴格一致評分函數類別，參見 [2] 及其中的參考文獻。

根據 [1] 實作的 PAVA 版本，其計算複雜度為 O(N)，其中 N 為輸入大小。

參考文獻

[1] (1,2)

Busing, F. M. T. A. (2022). Monotone Regression: A Simple and Fast O(n) PAVA Implementation. Journal of Statistical Software, Code Snippets, 102(1), 1-25. DOI:10.18637/jss.v102.c01

[2]

Jordan, A.I., Mühlemann, A. & Ziegel, J.F. Characterizing the optimal solutions to the isotonic regression problem for identifiable functionals. Ann Inst Stat Math 74, 489-514 (2022). DOI:10.1007/s10463-021-00808-0

範例

此範例示範了 isotonic_regression 確實解決了一個受約束的最佳化問題。

>>> import numpy as np
>>> from scipy.optimize import isotonic_regression, minimize
>>> y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0]
>>> def objective(yhat, y):
...     return np.sum((yhat - y)**2)
>>> def constraint(yhat, y):
...     # This is for a monotonically increasing regression.
...     return np.diff(yhat)
>>> result = minimize(objective, x0=y, args=(y,),
...                   constraints=[{'type': 'ineq',
...                                 'fun': lambda x: constraint(x, y)}])
>>> result.x
array([1.25      , 1.25      , 4.        , 5.56666667, 5.56666667,
       5.56666667, 7.8       , 8.25      , 8.25      , 9.25      ,
       9.25      ])
>>> result = isotonic_regression(y)
>>> result.x
array([1.25      , 1.25      , 4.        , 5.56666667, 5.56666667,
       5.56666667, 7.8       , 8.25      , 8.25      , 9.25      ,
       9.25      ])

isotonic_regression 相較於呼叫 minimize 的最大優勢在於，它更使用者友善，也就是說，使用者不需要定義目標函數和約束函數，而且速度快了好幾個數量級。在一般硬體上（2023 年），對於長度為 1000 的常態分佈輸入 y，最佳化器大約需要 4 秒，而 isotonic_regression 大約需要 200 微秒。