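KFold

class sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

K-Fold cross-validator.

Provides train/test indices to split data into train/test sets. Splits the dataset into k consecutive folds (without shuffling by default).

Each fold is then used once as the validation set, while the remaining k - 1 folds form the training set.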
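The rotation described above can be checked with a short sketch. The toy array `X` and the counter `test_counts` are illustrative names, not part of the API; the only library calls used are `KFold` and its `split` method.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 6 samples, 2 features (values are arbitrary).
X = np.arange(12).reshape(6, 2)
kf = KFold(n_splits=3)

# Count how many times each sample lands in a validation fold.
test_counts = np.zeros(len(X), dtype=int)
for train_index, test_index in kf.split(X):
    test_counts[test_index] += 1
    # Train and validation indices are disjoint and together cover all samples.
    assert np.intersect1d(train_index, test_index).size == 0
    assert len(train_index) + len(test_index) == len(X)

print(test_counts)  # [1 1 1 1 1 1]
```

Every sample is held out exactly once, and in each iteration the training set is the complement of the held-out fold.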

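Read more in the User Guide.

For a visualization of cross-validation behavior and a comparison between common scikit-learn split methods, refer to Visualizing cross-validation behavior in scikit-learn.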

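Parameters:

n_splits : int, default=5
    Number of folds. Must be at least 2.

    Changed in version 0.22: n_splits default value changed from 3 to 5.

shuffle : bool, default=False
    Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled.

random_state : int, RandomState instance or None, default=None
    When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls. See the Glossary.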
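As a rough illustration of the shuffle parameter, the sketch below compares the test indices produced with and without shuffling. The variable names (`ordered`, `shuffled`) and the seed value 0 are arbitrary choices for this example.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)

# Default behavior: folds are consecutive blocks of indices.
ordered = [test.tolist() for _, test in KFold(n_splits=2).split(X)]

# shuffle=True: samples are assigned to folds at random, but the indices
# returned for each fold are still listed in ascending order.
shuffled = [
    test.tolist()
    for _, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X)
]

print(ordered)   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(shuffled)  # two non-contiguous folds, each sorted
```

Shuffling changes which samples end up in which fold; it does not reorder the samples inside a fold.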

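See also

StratifiedKFold: Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
GroupKFold: K-fold iterator variant with non-overlapping groups.
RepeatedKFold: Repeats K-Fold n times.

Examples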

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for i, (train_index, test_index) in enumerate(kf.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[2 3]
  Test:  index=[0 1]
Fold 1:
  Train: index=[0 1]
  Test:  index=[2 3]

Notes

The first n_samples % n_splits folds have size n_samples // n_splits + 1, and the other folds have size n_samples // n_splits, where n_samples is the number of samples.
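The fold-size rule can be worked through for a case where n_samples is not divisible by n_splits. The values 10 and 3 below are chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 10, 3
X = np.zeros((n_samples, 1))

# Observed size of each test fold.
fold_sizes = [len(test) for _, test in KFold(n_splits=n_splits).split(X)]

# Sizes predicted by the rule: the first (n_samples % n_splits) folds
# get one extra sample.
n_larger = n_samples % n_splits
expected = [n_samples // n_splits + 1] * n_larger + [n_samples // n_splits] * (
    n_splits - n_larger
)

print(fold_sizes)  # [4, 3, 3]
print(expected)    # [4, 3, 3]
```

Here 10 % 3 = 1, so only the first fold receives the extra sample.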

Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
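A minimal sketch of the reproducibility point, assuming an arbitrary integer seed of 42: calling split twice on the same seeded splitter yields the same folds.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(8).reshape(-1, 1)

# With an integer random_state, every call to split() reproduces the same
# shuffled folds.
kf = KFold(n_splits=4, shuffle=True, random_state=42)
first_call = [test.tolist() for _, test in kf.split(X)]
second_call = [test.tolist() for _, test in kf.split(X)]

print(first_call == second_call)  # True
```

With shuffle=True and random_state left as None, two such calls are not guaranteed to agree, which matters when the same splitter object is reused across several evaluations.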

get_metadata_routing()

Get metadata routing of this object. Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest
    A MetadataRequest encapsulating routing information.

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters: