~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rocchio classification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first version of the Rocchio algorithm was introduced by Rocchio in 1971 to use relevance feedback in querying full-text databases. Since then, many researchers have addressed and developed this technique for text and document classification. This method uses TF-IDF weights for each informative word instead of a set of Boolean features. Using a training set of documents, Rocchio's algorithm builds a prototype vector for each class, which is the average of all training document vectors that belong to that class. It then assigns each test document to the class whose prototype vector is most similar to the test document.
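A minimal pure-Python sketch of the procedure just described: build TF-IDF vectors, average them per class into prototype vectors, and assign a new document to the class whose prototype has the highest cosine similarity. The whitespace tokenizer, the smoothed IDF formula, the choice of cosine similarity, the toy documents, and the names `RocchioClassifier`, `tokenize`, and `cosine` are assumptions made for illustration, not this repository's implementation. In practice, scikit-learn's `NearestCentroid` applied to TF-IDF vectors is closely related (its documentation calls it the Rocchio classifier in that setting), although it uses Euclidean distance by default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class RocchioClassifier:
    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        # Document frequency of each term, then a smoothed IDF.
        df = Counter(term for toks in tokenized for term in set(toks))
        self.idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}
        # Prototype vector = average of the TF-IDF vectors of a class's documents.
        sums, counts = {}, Counter()
        for toks, label in zip(tokenized, labels):
            proto = sums.setdefault(label, {})
            for term, w in self._tfidf(toks).items():
                proto[term] = proto.get(term, 0.0) + w
            counts[label] += 1
        self.prototypes = {
            label: {term: w / counts[label] for term, w in vec.items()}
            for label, vec in sums.items()
        }
        return self

    def _tfidf(self, toks):
        """L2-normalized TF-IDF vector; terms unseen during training are ignored."""
        tf = Counter(t for t in toks if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec

    def predict(self, doc):
        """Return the class whose prototype is most cosine-similar to the document."""
        vec = self._tfidf(tokenize(doc))
        return max(self.prototypes, key=lambda label: cosine(vec, self.prototypes[label]))


docs = [
    "the team won the football match",
    "a great goal in the match",
    "stocks fell on the market today",
    "the market rallied as stocks rose",
]
labels = ["sports", "sports", "finance", "finance"]
clf = RocchioClassifier().fit(docs, labels)
print(clf.predict("football goal"), clf.predict("stocks market"))  # sports finance
```

Training only needs one pass to compute the class averages, which is why Rocchio is cheap compared with iterative classifiers; the trade-off is that a single prototype per class cannot capture classes made of several distinct sub-topics.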

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Support Vector Machine (SVM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The original version of SVM was introduced by Vapnik and Chervonenkis in 1963. In the early 1990s, a nonlinear version was proposed by B. E. Boser et al. The original SVM was designed for binary classification problems, but many researchers have since applied this well-established technique to multi-class problems.
590+
591+
According to the scikit-learn documentation, the advantages of support vector machines are:

* Effective in high-dimensional spaces.
* Still effective in cases where the number of dimensions is greater than the number of samples.
* Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
* Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:

* If the number of features is much greater than the number of samples, avoiding over-fitting through the choice of kernel function and regularization term is crucial.
* SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see "Scores and probabilities" in the scikit-learn SVM documentation).

.. image:: docs/pic/SVM.png
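As a rough illustration of the binary, linear case, here is a sketch of a soft-margin linear SVM in pure Python. It swaps in the Pegasos stochastic sub-gradient method on the primal objective instead of solving the dual quadratic program, and it uses no kernels. The function names (`train_linear_svm`, `decision_function`, `predict`), the toy data, and the hyperparameters `lam` and `epochs` are hypothetical choices for this example. Note that `decision_function` returns a signed score rather than a probability, consistent with the disadvantage listed above.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear soft-margin SVM with the Pegasos sub-gradient method.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    Labels must be +1 / -1. A constant 1 is appended to every sample so the
    last weight acts as the bias (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            margin = label * dot(w, x)                # margin under current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # step for the regularizer
            if margin < 1.0:                          # hinge loss active for this sample
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def decision_function(w, x):
    """Signed score; its sign is the predicted class (not a probability)."""
    return dot(w, list(x) + [1.0])


def predict(w, x):
    return 1 if decision_function(w, x) >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [2.5, 2.5]), predict(w, [-2.5, -2.0]))  # 1 -1
```

For multi-class text classification, a binary learner like this one is typically combined with a one-vs-rest or one-vs-one scheme, and libraries such as scikit-learn additionally offer nonlinear kernels.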
