@@ -445,6 +445,34 @@ Logistic Regression
445445Naive Bayes Classifier
446446~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
447447
+ Naive Bayes text classification has long been used in industry
+ and academia. It is based on Bayes' theorem, named after Thomas Bayes
+ (1701-1761), and has been studied for text and document
+ categorization since the 1950s. The Naive Bayes Classifier (NBC) is a generative
+ model and one of the most traditional methods of text categorization;
+ it is widely used in information retrieval. Many researchers have adapted and extended this technique
+ for their applications. We start with the most basic version
+ of NBC, which is built on term-frequency (bag-of-words)
+ feature extraction, that is, counting the number of
+ times each word occurs in a document.
+
+
+ .. code:: python
+
+     from sklearn.naive_bayes import MultinomialNB
+
+     # Assumes twenty_train, count_vect, tfidf_transformer and X_train_tfidf
+     # are already defined and fitted on the training documents.
+     clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)
+
+     # Transform unseen documents with the same fitted vectorizers.
+     docs_new = ['God is love', 'OpenGL on the GPU is fast']
+     X_new_counts = count_vect.transform(docs_new)
+     X_new_tfidf = tfidf_transformer.transform(X_new_counts)
+
+     predicted = clf.predict(X_new_tfidf)
+
+     # Map each predicted class index back to its category name.
+     for doc, category in zip(docs_new, predicted):
+         print('%r => %s' % (doc, twenty_train.target_names[category]))
+
+
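To make the word-counting idea concrete without external dependencies, here is a minimal sketch of a multinomial Naive Bayes classifier written in plain Python. It is not the scikit-learn implementation: the helper names `train_nb` and `predict_nb`, the whitespace tokenization, the add-one (Laplace) smoothing, and the four-document toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts and raw term frequencies (bag of words)."""
    class_docs = Counter(labels)        # number of training documents per class
    word_counts = defaultdict(Counter)  # term frequencies per class
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log P(class)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs above zero.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, invented for illustration only.
train_docs = [
    "god is love and faith",
    "faith and prayer in church",
    "opengl renders on the gpu",
    "the gpu makes graphics fast",
]
train_labels = ["religion", "religion", "graphics", "graphics"]

model = train_nb(train_docs, train_labels)
for doc in ["God is love", "OpenGL on the GPU is fast"]:
    print("%r => %s" % (doc, predict_nb(model, doc)))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term is what lets a class still be scored when a document contains a word that class never produced in training.
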
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
K-nearest Neighbor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~