make_blobs

sklearn.datasets.make_blobs(n_samples=100, n_features=2, *, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, return_centers=False)

Generate isotropic Gaussian blobs for clustering.

Read more in the User Guide.
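To make "isotropic Gaussian blobs" concrete, the following is a minimal NumPy sketch of the idea: pick a few centers, then draw points around each center from a normal distribution with the same standard deviation in every direction. It is an illustration only, not scikit-learn's implementation; the variable names and the choice of 3 blobs with 50 points each are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_blobs, n_features, points_per_blob = 3, 2, 50  # illustrative sizes
std = 1.0  # same spread along every feature axis (isotropic)

# Draw the blob centers uniformly inside a bounding box.
centers = rng.uniform(-10.0, 10.0, size=(n_blobs, n_features))

# Draw points around each center, then stack them into one array.
X = np.vstack(
    [rng.normal(loc=c, scale=std, size=(points_per_blob, n_features)) for c in centers]
)

# Integer label recording which blob each row came from.
y = np.repeat(np.arange(n_blobs), points_per_blob)
```

make_blobs wraps this pattern and adds options such as per-cluster sample counts, per-cluster standard deviations, and shuffling.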

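Parameters:

n_samples (int or array-like, default=100): If int, it is the total number of points equally divided among clusters. If array-like, each element of the sequence indicates the number of samples per cluster.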

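Changed in version v0.20: one can now pass an array-like to the n_samples parameter.
n_features (int, default=2): The number of features for each sample.
centers (int or array-like of shape (n_centers, n_features), default=None): The number of centers to generate, or the fixed center locations. If n_samples is an int and centers is None, 3 centers are generated. If n_samples is array-like, centers must be either None or an array of length equal to the length of n_samples.
cluster_std (float or array-like of float, default=1.0): The standard deviation of the clusters.
center_box (tuple of float (min, max), default=(-10.0, 10.0)): The bounding box for each cluster center when centers are generated at random.
shuffle (bool, default=True): Whether to shuffle the samples.
random_state (int, RandomState instance or None, default=None): Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.
return_centers (bool, default=False): If True, then return the centers of each cluster.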

Added in version 0.23.
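As a sketch of how centers and cluster_std interact, the example below passes fixed center locations together with one standard deviation per cluster. The specific coordinates, spreads, and seed are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Fixed center locations: one row per cluster, one column per feature.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])

# One standard deviation per cluster; the length must match the number of centers.
X, y = make_blobs(
    n_samples=300,
    centers=centers,
    cluster_std=[0.5, 1.0, 2.0],
    random_state=42,
)

# With an int n_samples, the points are split evenly across the clusters.
counts = np.bincount(y)

# Average per-feature spread of each cluster, for comparison with cluster_std.
spreads = [X[y == k].std(axis=0).mean() for k in range(len(centers))]
```

When centers is given as an array, the number of features follows from its second dimension, so n_features does not need to be set here.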

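Returns:

X (ndarray of shape (n_samples, n_features)): The generated samples.
y (ndarray of shape (n_samples,)): The integer labels for cluster membership of each sample.
centers (ndarray of shape (n_centers, n_features)): The centers of each cluster. Only returned if return_centers=True.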
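The following sketch shows the three-value return form. It requests per-cluster sample counts, narrows center_box, and sets return_centers=True so that the randomly generated centers can be inspected. The counts, box limits, and seed are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y, centers = make_blobs(
    n_samples=[20, 30, 50],   # one entry per cluster, so 3 clusters
    n_features=4,
    center_box=(-3.0, 3.0),   # random centers are drawn inside this box
    random_state=0,
    return_centers=True,      # adds the centers as a third return value
)

print(X.shape, y.shape, centers.shape)

# Number of samples that ended up in each cluster.
per_cluster = np.bincount(y)

# Every coordinate of every center lies inside center_box.
inside_box = bool(np.all((centers >= -3.0) & (centers <= 3.0)))
```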

See also

make_classification: A more intricate variant.

Examples

>>> from sklearn.datasets import make_blobs
>>> X, y = make_blobs(n_samples=10, centers=3, n_features=2,
...                   random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 0, 1, 0, 2, 2, 2, 1, 1, 0])
>>> X, y = make_blobs(n_samples=[3, 3, 4], centers=None, n_features=2,
...                   random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 1, 2, 0, 2, 2, 2, 1, 1, 0])
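As a further sketch, the example below illustrates two of the parameters described above: passing the same int as random_state reproduces the output, and shuffle=False leaves the samples grouped by cluster. The variable names are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two calls with the same integer seed give identical data.
X1, y1 = make_blobs(n_samples=10, centers=3, random_state=0)
X2, y2 = make_blobs(n_samples=10, centers=3, random_state=0)
same_output = np.array_equal(X1, X2) and np.array_equal(y1, y2)

# Without shuffling, samples stay in cluster order, so labels never decrease.
X_ordered, y_ordered = make_blobs(
    n_samples=10, centers=3, shuffle=False, random_state=0
)
labels_sorted = bool(np.all(np.diff(y_ordered) >= 0))
```

Keeping shuffle=True (the default) is usually preferable when the data feeds an algorithm that is sensitive to sample order.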