calibration_curve

sklearn.calibration.calibration_curve(y_true, y_prob, *, pos_label=None, n_bins=5, strategy='uniform')

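Compute true and predicted probabilities for a calibration curve.

The method assumes the inputs come from a binary classifier and discretizes the [0, 1] interval into bins.

Calibration curves are also known as reliability diagrams.

Read more in the User Guide.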

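Parameters:

y_true : array-like of shape (n_samples,)

True targets.

y_prob : array-like of shape (n_samples,)

Probabilities of the positive class.

pos_label : int, float, bool or str, default=None

The label of the positive class.

Added in version 1.1.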

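n_bins : int, default=5

Number of bins used to discretize the [0, 1] interval. A larger number requires more data. Bins with no samples (i.e. with no corresponding values in y_prob) are not returned, so the returned arrays may have fewer than n_bins values.

strategy : {'uniform', 'quantile'}, default='uniform'

Strategy used to define the widths of the bins.

uniform: The bins have identical widths.
quantile: The bins have the same number of samples and depend on y_prob.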
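
The effect of n_bins and strategy is easiest to see on a small, skewed sample. The following is a minimal sketch: the arrays are made-up illustrative data (an assumption, not part of the reference above), chosen so that no probability falls exactly on a bin edge.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Illustrative data: most predictions cluster near 0, two sit near 1.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.90, 0.95])

# 'uniform': four equal-width bins over [0, 1]. The two middle bins
# receive no samples, so they are dropped from the result.
true_uniform, pred_uniform = calibration_curve(
    y_true, y_prob, n_bins=4, strategy="uniform"
)

# 'quantile': bin edges follow the distribution of y_prob, so every
# bin is populated for this data.
true_quantile, pred_quantile = calibration_curve(
    y_true, y_prob, n_bins=4, strategy="quantile"
)

print(len(pred_uniform))   # 2 (empty bins are not returned)
print(len(pred_quantile))  # 4
```

With skewed predictions like these, 'quantile' keeps every bin populated, at the cost of bins whose widths differ.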

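Returns:

prob_true : ndarray of shape (n_bins,) or smaller

The proportion of samples whose class is the positive class, in each bin (fraction of positives).

prob_pred : ndarray of shape (n_bins,) or smaller

The mean predicted probability in each bin.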
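
To make the two returned arrays concrete, the following sketch recomputes them by hand with NumPy for the 'uniform' strategy only. The helper name manual_calibration_curve is hypothetical, and the code is an illustration of the documented behavior, not necessarily how the library implements it.

```python
import numpy as np


def manual_calibration_curve(y_true, y_prob, n_bins=5):
    """Hypothetical helper: uniform-width binning of binary 0/1 targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # n_bins equal-width bins over [0, 1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each prediction to a bin index in 0..n_bins-1 via the interior edges.
    bin_ids = np.searchsorted(edges[1:-1], y_prob)

    counts = np.bincount(bin_ids, minlength=n_bins)
    sum_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    sum_prob = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)

    # Bins without samples are dropped, so the outputs may be shorter than n_bins.
    nonempty = counts > 0
    prob_true = sum_true[nonempty] / counts[nonempty]  # fraction of positives
    prob_pred = sum_prob[nonempty] / counts[nonempty]  # mean predicted probability
    return prob_true, prob_pred


y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9, 1.0])
manual_true, manual_pred = manual_calibration_curve(y_true, y_prob, n_bins=3)
```

For a perfectly calibrated classifier, prob_true would be close to prob_pred in every bin.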

References

Alexandru Niculescu-Mizil and Rich Caruana (2005). Predicting Good Probabilities With Supervised Learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML). See section 4 (Qualitative Analysis of Predictions).

Examples

>>> import numpy as np
>>> from sklearn.calibration import calibration_curve
>>> y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
>>> y_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9,  1.])
>>> prob_true, prob_pred = calibration_curve(y_true, y_pred, n_bins=3)
>>> prob_true
array([0. , 0.5, 1. ])
>>> prob_pred
array([0.2  , 0.525, 0.85 ])