Commit f6e680c

Update README.rst
1 parent 344540c commit f6e680c

1 file changed

Lines changed: 28 additions & 0 deletions

README.rst

@@ -445,6 +445,34 @@ Logistic Regression
Naive Bayes Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Naive Bayes text classification has long been used in both industry
and academia. It builds on Bayes' theorem, named after Thomas Bayes
(1701-1761), and has been studied for text and document categorization
since the 1950s. The Naive Bayes Classifier (NBC) is a generative
model and one of the most traditional methods of text categorization,
widely used in information retrieval. Many researchers have adapted and
extended this technique for their applications. We start with the most basic
version of NBC, which uses term-frequency (bag-of-words) feature
extraction, that is, counting the number of occurrences of each word
in a document.
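The decision rule behind this basic version can be written compactly. The block below is a sketch of the standard multinomial formulation with additive smoothing; the symbols are our own notation and do not come from the original text. Here :math:`f_{w,d}` is the term frequency of word :math:`w` in document :math:`d`, :math:`N_{w,c}` is the total count of :math:`w` in the training documents of class :math:`c`, :math:`N_c = \sum_w N_{w,c}`, :math:`V` is the vocabulary, and :math:`\alpha > 0` is an assumed smoothing constant.

```latex
\hat{c}(d) = \operatorname*{arg\,max}_{c}
    \Big[ \log P(c) + \sum_{w \in V} f_{w,d} \, \log P(w \mid c) \Big],
\qquad
P(w \mid c) = \frac{N_{w,c} + \alpha}{N_{c} + \alpha \, |V|}
```

The "naive" part is the assumption that words are conditionally independent given the class, which is what lets the likelihood factor into a sum of per-word terms.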

.. code:: python

    from sklearn.naive_bayes import MultinomialNB

    # count_vect, tfidf_transformer, X_train_tfidf and twenty_train are
    # assumed to be defined beforehand; they are not created in this snippet.
    clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

    # New documents must pass through the same fitted transformers.
    docs_new = ['God is love', 'OpenGL on the GPU is fast']
    X_new_counts = count_vect.transform(docs_new)
    X_new_tfidf = tfidf_transformer.transform(X_new_counts)

    predicted = clf.predict(X_new_tfidf)

    # Map each predicted class index back to its category name.
    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))
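To make the word-counting step concrete without relying on variables defined elsewhere, here is a minimal from-scratch sketch of a term-frequency Naive Bayes classifier in plain Python. It is an illustration under stated assumptions, not scikit-learn's implementation: the function names ``train_nb`` and ``predict_nb``, the whitespace tokenizer, the smoothing value ``alpha=1.0``, and the toy corpus are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Collect per-class document counts and term frequencies (hypothetical helper)."""
    class_docs = Counter(labels)            # number of training documents per class
    word_counts = defaultdict(Counter)      # class -> word -> term frequency
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())  # naive whitespace tokenizer
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha


def predict_nb(model, text):
    """Return the class with the highest log-posterior for one document."""
    class_docs, word_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    # Words never seen during training are ignored.
    tokens = [w for w in text.lower().split() if w in vocab]
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)            # log prior
        for w in tokens:
            # Additively smoothed log likelihood of word w given the class.
            score += math.log((word_counts[label][w] + alpha)
                              / (total_words + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy corpus, made up for illustration only.
train_docs = [
    "the gpu renders graphics fast",
    "opengl graphics on the gpu",
    "god is love and faith",
    "faith and prayer in church",
]
train_labels = ["graphics", "graphics", "religion", "religion"]

model = train_nb(train_docs, train_labels)
for doc in ["opengl on the gpu is fast", "god is love"]:
    print(repr(doc), "=>", predict_nb(model, doc))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.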

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
K-nearest Neighbor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
