ComplementNB#

class sklearn.naive_bayes.ComplementNB(*, alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None, norm=False)[source]#

The Complement Naive Bayes classifier described in Rennie et al. (2003).

The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

See also

BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
CategoricalNB: Naive Bayes classifier for categorical features.
GaussianNB: Gaussian Naive Bayes.
MultinomialNB: Naive Bayes classifier for multinomial models.

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]

fit(X, y, sample_weight=None)[source]#

Fit Naive Bayes classifier according to X, y.

Parameters:

X {array-like, sparse matrix} of shape (n_samples, n_features): Training vectors, where n_samples is the number of samples and n_features is the number of features.
y array-like of shape (n_samples,): Target values.
sample_weight array-like of shape (n_samples,), default=None: Weights applied to individual samples (1. for unweighted).

Returns:

self object: Returns the instance itself.
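
As a minimal sketch of how sample_weight affects fit, the snippet below uses made-up count data and weights. It illustrates that, for this count-based model, giving one sample a weight of 2 accumulates the same statistics as including that sample twice.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Toy count data, invented for illustration only.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(6, 10))
y = np.array([0, 0, 1, 1, 2, 2])

# Fit with the first sample weighted twice as heavily as the others.
weights = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])
weighted = ComplementNB().fit(X, y, sample_weight=weights)

# Fit on a dataset in which the first sample is physically duplicated.
X_dup = np.vstack([X, X[:1]])
y_dup = np.append(y, y[0])
duplicated = ComplementNB().fit(X_dup, y_dup)

# Both fits accumulate the same counts, so the learned weights agree.
same_weights = np.allclose(weighted.feature_log_prob_, duplicated.feature_log_prob_)
print(same_weights)
```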

get_metadata_routing()[source]#

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deep bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params dict: Parameter names mapped to their values.
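
A short sketch of get_params, with arbitrary parameter values chosen for illustration. The returned dictionary can be passed back to the constructor to build an unfitted estimator with the same configuration.

```python
from sklearn.naive_bayes import ComplementNB

# Arbitrary, illustrative hyperparameter values.
clf = ComplementNB(alpha=0.5, norm=True)

# Dictionary mapping constructor parameter names to their current values.
params = clf.get_params()
print(params)

# The dictionary can be used to construct an equivalent, unfitted estimator.
rebuilt = ComplementNB(**params)
```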

partial_fit(X, y, classes=None, sample_weight=None)[source]#

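Incremental fit on a batch of samples.

This method is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning.

This is especially useful when the whole dataset is too big to fit in memory at once.

This method has some performance overhead, so it is better to call partial_fit on chunks of data that are as large as possible (as long as they fit in the memory budget) to hide the overhead.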

Parameters:

X {array-like, sparse matrix} of shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y array-like of shape (n_samples,)

Target values.

classes array-like of shape (n_classes,), default=None

List of all the classes that can possibly appear in the y vector.

Must be provided at the first call to partial_fit; can be omitted in subsequent calls.

sample_weight array-like of shape (n_samples,), default=None

Weights applied to individual samples (1. for unweighted).

Returns:

self object: Returns the instance itself.
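
The chunked workflow described above can be sketched as follows. The data, the chunk size of 20, and the name all_classes are made up for illustration; in a real out-of-core setting each chunk would be loaded from disk or a stream rather than sliced from an in-memory array.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Toy count data standing in for a dataset too large to load at once.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(60, 20))
y = np.arange(60) % 3
all_classes = np.array([0, 1, 2])  # every label that can ever appear in y

clf = ComplementNB()
for start in range(0, len(X), 20):
    chunk = slice(start, start + 20)
    if start == 0:
        # classes must be given on the first call.
        clf.partial_fit(X[chunk], y[chunk], classes=all_classes)
    else:
        clf.partial_fit(X[chunk], y[chunk])

# For comparison: a single fit on the full data learns the same weights.
full = ComplementNB().fit(X, y)
matches_full_fit = np.allclose(clf.feature_log_prob_, full.feature_log_prob_)
print(matches_full_fit)
```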

predict(X)[source]#

Perform classification on an array of test vectors X.

Parameters:

X array-like of shape (n_samples, n_features): The input samples.

Returns:

C ndarray of shape (n_samples,): Predicted target values for X.

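predict_joint_log_proba(X)[source]#

Return joint log probability estimates for the test vector X.

For each row x of X and class y, the joint log probability is given by log P(x, y) = log P(y) + log P(x|y), where log P(y) is the class prior probability and log P(x|y) is the class-conditional probability.

Parameters:

X array-like of shape (n_samples, n_features): The input samples.

Returns:

C ndarray of shape (n_samples, n_classes): Returns the joint log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.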

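predict_log_proba(X)[source]#

Return log-probability estimates for the test vector X.

Parameters:

X array-like of shape (n_samples, n_features): The input samples.

Returns:

C array-like of shape (n_samples, n_classes): Returns the log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.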

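predict_proba(X)[source]#

Return probability estimates for the test vector X.

Parameters:

X array-like of shape (n_samples, n_features): The input samples.

Returns:

C array-like of shape (n_samples, n_classes): Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.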
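
To illustrate how the columns map to classes_, the sketch below (toy data reused from the class example) checks that each row of predict_proba sums to one and that picking the column with the highest probability recovers the label returned by predict.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Same toy data as the class-level example.
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
clf = ComplementNB().fit(X, y)

proba = clf.predict_proba(X[:3])  # shape (3, n_classes)
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# Column j holds the probability of class clf.classes_[j].
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]
agrees_with_predict = np.array_equal(labels_from_proba, clf.predict(X[:3]))
print(rows_sum_to_one, agrees_with_predict)
```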

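score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters:

X array-like of shape (n_samples, n_features): Test samples.
y array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score float: Mean accuracy of self.predict(X) with respect to y.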
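
A brief sketch of what score computes, using invented data and weights. Without sample_weight it matches the plain fraction of correct predictions; with sample_weight it matches the weighted average of the per-sample correctness.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Toy data and weights, invented for illustration only.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(30, 10))
y = np.arange(30) % 3
weights = rng.uniform(0.5, 2.0, size=30)

clf = ComplementNB().fit(X, y)
correct = clf.predict(X) == y  # boolean array, one entry per sample

accuracy = clf.score(X, y)
weighted_accuracy = clf.score(X, y, sample_weight=weights)

# Manual equivalents of the two scores.
manual = correct.mean()
manual_weighted = np.average(correct, weights=weights)
print(accuracy, weighted_accuracy)
```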

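set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB[source]#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). See the User Guide for how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the sample_weight parameter in fit.

Returns:

self object: The updated object.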

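set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params dict: Estimator parameters.

Returns:

self estimator instance: Estimator instance.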
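
The two forms of set_params can be sketched as follows. The step names "vect" and "clf" and the parameter values are arbitrary choices for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline

# Simple estimator: parameters are set by their plain names.
clf = ComplementNB()
clf.set_params(alpha=0.5, norm=True)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([("vect", CountVectorizer()), ("clf", ComplementNB())])
pipe.set_params(clf__alpha=0.1)

print(clf.alpha, clf.norm, pipe.named_steps["clf"].alpha)
```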

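set_partial_fit_request(*, classes: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB[source]#

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). See the User Guide for how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

classes str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the classes parameter in partial_fit.
sample_weight str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the sample_weight parameter in partial_fit.

Returns:

self object: The updated object.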

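set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB[source]#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). See the User Guide for how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for the sample_weight parameter in score.

Returns:

self object: The updated object.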

Gallery examples#

Sample pipeline for text feature extraction and evaluation

Classification of text documents using sparse features