sample_without_replacement

sklearn.utils.random.sample_without_replacement(n_population, n_samples, method='auto', random_state=None)

Sample integers without replacement.

Select n_samples integers from the set [0, n_population) without replacement.

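Parameters:

n_population : int

The size of the set to sample from.

n_samples : int

The number of integers to sample.

random_state : int, RandomState instance or None, default=None

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.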

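method : {"auto", "tracking_selection", "reservoir_sampling", "pool"}, default='auto'

If method == "auto", the ratio n_samples / n_population determines which algorithm is used: if the ratio is between 0 and 0.01, tracking selection is used. If the ratio is between 0.01 and 0.99, numpy.random.permutation is used. If the ratio is greater than 0.99, reservoir sampling is used. The order of the selected integers is undefined. If a random order is needed, the selected subset should be shuffled.

If method == "tracking_selection", a set-based implementation is used, which is suitable when n_samples is much smaller than n_population.

If method == "reservoir_sampling", a reservoir sampling algorithm is used, which is suitable under tight memory constraints or when O(n_samples) ~ O(n_population). The order of the selected integers is undefined. If a random order is needed, the selected subset should be shuffled.

If method == "pool", a pool-based algorithm is used. It is particularly fast, even faster than the tracking selection method. However, a vector containing the entire population has to be initialized. If n_samples ~ n_population, the reservoir sampling method is faster.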
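The method options above can be tried side by side. The following is a minimal sketch, not a benchmark: the population size, sample size, and seeds are arbitrary values chosen for illustration, and the explicit shuffle uses NumPy's RandomState.permutation as one possible way to randomize the order.

```python
import numpy as np
from sklearn.utils.random import sample_without_replacement

# Illustrative sizes (assumptions, not recommendations).
n_population, n_samples = 1000, 20

# Draw the same number of integers with each documented method.
samples = {
    method: sample_without_replacement(
        n_population, n_samples, method=method, random_state=0
    )
    for method in ("auto", "tracking_selection", "reservoir_sampling", "pool")
}

# The order of the selected integers may not be random (see the notes on
# "auto" and "reservoir_sampling"), so shuffle explicitly when order matters.
rng = np.random.RandomState(0)  # illustrative seed
shuffled = rng.permutation(samples["reservoir_sampling"])
```

Every method returns a valid sample of distinct integers; the choice mainly affects speed and memory use, as described above.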

Returns:

out : ndarray of shape (n_samples,)

The sampled subset of integers. The selected integers might not be in random order; see the method parameter.
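The properties of the returned array can be checked directly. This is a small sketch with arbitrary illustrative arguments (50, 10, and the seed); the variable names are hypothetical.

```python
import numpy as np
from sklearn.utils.random import sample_without_replacement

out = sample_without_replacement(50, 10, random_state=0)

# One entry per requested sample.
has_expected_shape = out.shape == (10,)
# Sampling is without replacement, so no integer appears twice.
all_distinct = np.unique(out).size == out.size
# Every value lies in the half-open interval [0, n_population).
in_range = bool(((out >= 0) & (out < 50)).all())
```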

Examples

>>> from sklearn.utils.random import sample_without_replacement
>>> sample_without_replacement(10, 5, random_state=42)
array([8, 1, 5, 0, 7])
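A common use is to treat the sampled integers as row indices. The sketch below assumes a small made-up array X and an arbitrary seed; passing the same integer random_state again reproduces the same draw.

```python
import numpy as np
from sklearn.utils.random import sample_without_replacement

# Hypothetical data: 10 rows, 2 columns.
X = np.arange(20).reshape(10, 2)

# Pick 4 distinct row indices and use them to subsample the rows.
idx = sample_without_replacement(X.shape[0], 4, random_state=0)
X_subset = X[idx]

# The same integer seed gives the same indices on a repeated call.
idx_again = sample_without_replacement(X.shape[0], 4, random_state=0)
is_reproducible = np.array_equal(idx, idx_again)
```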