compute_sample_weight

sklearn.utils.class_weight.compute_sample_weight(class_weight, y, *, indices=None)

Estimate sample weights by class for unbalanced datasets.

Parameters:

class_weight : dict, list of dicts, "balanced", or None

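Weights associated with classes, in the form {class_label: weight}. If not given, all classes are assumed to have weight 1. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.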

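Note that for multi-output (including multilabel) problems, weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification the weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).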

For multi-output, the weights of each column of y will be multiplied.
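A minimal sketch of the multi-output case: one dict per column of y, in column order, with the per-column weights multiplied for each sample. The label matrix and the weight values are made up for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Two output columns, each with classes 0 and 1 (illustrative data).
y = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# One dict per column, in column order; every class in a column gets an entry.
class_weight = [{0: 1, 1: 2}, {0: 1, 1: 5}]

weights = compute_sample_weight(class_weight=class_weight, y=y)

# Each sample weight is the product of its per-column class weights:
# [1 * 1, 1 * 5, 2 * 1, 2 * 5]
print(weights)
```

Here the last row belongs to class 1 in both columns, so its weight is 2 * 5 = 10.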

y : {array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)

Array of original class labels per sample.

indices : array-like of shape (n_subsample,), default=None

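Array of indices to be used in a subsample. Its length can be less than n_samples in the case of a subsample, or equal to n_samples in the case of a bootstrap subsample with repeated indices. If None, the sample weights are calculated over the full sample. If this parameter is provided, only the "balanced" mode is supported for class_weight.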
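A short sketch of how indices interacts with the "balanced" mode: class frequencies are taken from the subsample, while the returned array still has one weight per entry of the original y. The index values below are a hypothetical draw chosen for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([1, 1, 1, 1, 0, 0])

# Hypothetical subsample with a repeated index: y[indices] is [1, 0, 0, 0].
indices = np.array([0, 4, 4, 5])

weights = compute_sample_weight(class_weight="balanced", y=y, indices=indices)

# In the subsample, class 0 occurs 3 times and class 1 once, so
# class 0 -> 4 / (2 * 3) and class 1 -> 4 / (2 * 1).
print(weights.shape)  # one weight per entry of the original y
print(weights)
```

Class 1 is rare in this subsample, so every sample of class 1 in the full y receives the larger weight, even though class 1 is the majority in y itself.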

Returns:

sample_weight_vect : ndarray of shape (n_samples,)

Array with sample weights as applied to the original y.

Examples

>>> from sklearn.utils.class_weight import compute_sample_weight
>>> y = [1, 1, 1, 1, 0, 0]
>>> compute_sample_weight(class_weight="balanced", y=y)
array([0.75, 0.75, 0.75, 0.75, 1.5 , 1.5 ])
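As a rough check, the output above can be reproduced by hand from the "balanced" formula and compared with an explicit dict and with None. The sketch assumes the labels are the contiguous integers 0..n_classes-1, so that np.bincount lines up with the class labels; the variable names and the weights 3.0 and 1.0 are illustrative only.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([1, 1, 1, 1, 0, 0])

# Per-class weights from n_samples / (n_classes * np.bincount(y)).
# np.bincount needs non-negative integer labels; position i holds class i.
n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]
per_class = n_samples / (n_classes * np.bincount(y))

# Index by label to expand per-class weights into per-sample weights.
manual = per_class[y]
balanced = compute_sample_weight(class_weight="balanced", y=y)

# Explicit {class label: weight} mapping, for comparison.
explicit = compute_sample_weight(class_weight={0: 3.0, 1: 1.0}, y=y)

# class_weight=None gives every class a weight of 1.
uniform = compute_sample_weight(class_weight=None, y=y)

print(manual)
print(explicit)
print(uniform)
```

With two samples of class 0 and four of class 1, the formula gives 6 / (2 * 2) = 1.5 for class 0 and 6 / (2 * 4) = 0.75 for class 1, matching the array shown above.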