Commit b389b5b: Update README.rst
Boosting
---------

.. image:: docs/pic/Boosting.PNG

**Boosting** is an ensemble learning meta-algorithm for primarily reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Boosting grew out of the question posed by `Michael Kearns <https://en.wikipedia.org/wiki/Michael_Kearns_(computer_scientist)>`__ and Leslie Valiant (1988, 1989): can a set of weak learners create a single strong learner? A weak learner is defined as a classifier that is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well correlated with the true classification.

.. code:: python

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.pipeline import Pipeline
    from sklearn import metrics
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.datasets import fetch_20newsgroups

    newsgroups_train = fetch_20newsgroups(subset='train')
    newsgroups_test = fetch_20newsgroups(subset='test')
    X_train = newsgroups_train.data
    X_test = newsgroups_test.data
    y_train = newsgroups_train.target
    y_test = newsgroups_test.target

    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', GradientBoostingClassifier(n_estimators=100)),
                         ])

    text_clf.fit(X_train, y_train)

    predicted = text_clf.predict(X_test)

    print(metrics.classification_report(y_test, predicted))

Output:

.. code:: python

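The weak-to-strong conversion described above can also be seen directly on synthetic data. The following is a minimal sketch (not part of the README; the dataset shape and parameter values are illustrative assumptions) comparing one decision stump against an AdaBoost ensemble of stumps:

```python
# Minimal sketch: a single decision stump (weak learner) versus an
# AdaBoost ensemble of stumps on synthetic classification data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 tree is only slightly better than random guessing here.
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoost's default base estimator is such a depth-1 stump; boosting
# reweights examples so each new stump focuses on earlier mistakes.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print('single stump  :', stump.score(X_test, y_test))
print('boosted stumps:', boosted.score(X_test, y_test))
```

On data like this the boosted ensemble typically scores well above the lone stump, which is exactly the weak-to-strong effect the paragraph describes.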

-------
Bagging
-------

.. image:: docs/pic/Bagging.PNG

.. code:: python

    from sklearn.ensemble import BaggingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from sklearn import metrics
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.datasets import fetch_20newsgroups

    newsgroups_train = fetch_20newsgroups(subset='train')
    newsgroups_test = fetch_20newsgroups(subset='test')
    X_train = newsgroups_train.data
    X_test = newsgroups_test.data
    y_train = newsgroups_train.target
    y_test = newsgroups_test.target

    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', BaggingClassifier(KNeighborsClassifier())),
                         ])

    text_clf.fit(X_train, y_train)

    predicted = text_clf.predict(X_test)

    print(metrics.classification_report(y_test, predicted))
808+
809+
Output:
810+
811+
.. code:: python
812+
813+
precision recall f1-score support
814+
0 0.57 0.74 0.65 319
815+
1 0.60 0.56 0.58 389
816+
2 0.62 0.54 0.58 394
817+
3 0.54 0.57 0.55 392
818+
4 0.63 0.54 0.58 385
819+
5 0.68 0.62 0.65 395
820+
6 0.55 0.46 0.50 390
821+
7 0.77 0.67 0.72 396
822+
8 0.79 0.82 0.80 398
823+
9 0.74 0.77 0.76 397
824+
10 0.81 0.86 0.83 399
825+
11 0.74 0.85 0.79 396
826+
12 0.67 0.49 0.57 393
827+
13 0.78 0.51 0.62 396
828+
14 0.76 0.78 0.77 394
829+
15 0.71 0.81 0.76 398
830+
16 0.73 0.73 0.73 364
831+
17 0.64 0.79 0.71 376
832+
18 0.45 0.69 0.54 310
833+
19 0.61 0.54 0.57 251
834+
835+
avg / total 0.67 0.67 0.67 7532
836+
837+
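Bagging's variance reduction can be sketched on synthetic data as well. The following is a minimal illustration (not part of the README; the dataset parameters are assumptions) comparing one fully grown decision tree against a bagged ensemble of such trees, each trained on a bootstrap resample:

```python
# Minimal sketch: one deep decision tree (high variance) versus a
# bagged ensemble of bootstrap-trained trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A fully grown tree tends to overfit its training sample.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# BaggingClassifier defaults to decision-tree base estimators; averaging
# 50 trees fit on bootstrap samples smooths out individual errors.
bagged = BaggingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print('single tree :', tree.score(X_test, y_test))
print('bagged trees:', bagged.score(X_test, y_test))
```

The bagged ensemble usually matches or beats the single tree on held-out data, because averaging over bootstrap samples reduces the variance component of the error.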
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Logistic Regression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
