Commit 259d931

Update README.rst
1 parent ef6e88a commit 259d931

1 file changed

Lines changed: 7 additions & 0 deletions

README.rst

@@ -35,9 +35,16 @@ Text and Document Feature Extraction
3535
----
3636

3737

38+
Text feature extraction and pre-processing are very important for classification algorithms. In this section, we start with text cleaning, since most documents contain a lot of noise. We then discuss two main methods of text feature extraction: word embedding and weighted words.
39+
40+
3841
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3942
Text Cleaning and Pre-processing
4043
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
44+
45+
In Natural Language Processing (NLP), most text and document datasets contain many unnecessary words such as stopwords, misspellings, and slang. In this section, we briefly explain some techniques and methods for cleaning and pre-processing text datasets. In many algorithms, especially statistical and probabilistic learning algorithms, noise and unnecessary features can hurt system performance, so one solution is to eliminate these features as a pre-processing step.
46+
47+
4148
-------------
4249
Tokenization
4350
-------------
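
As a rough illustration of the pre-processing step that the added "Text Cleaning and Pre-processing" paragraph describes (dropping stopwords and other noisy tokens before classification), here is a minimal sketch in plain Python. The tiny stopword list and the function name `clean_text` are assumptions made for this example and do not come from the README; a real pipeline would more likely take a fuller stopword list from an NLP library.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# real projects usually rely on a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it"}


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase the text, strip non-letter characters, and drop stopwords."""
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Split on whitespace and keep only tokens that are not stopwords.
    return [tok for tok in text.split() if tok not in stopwords]


print(clean_text("The quality of the data is important, and noise hurts!"))
```

The function returns a list of tokens rather than a joined string, so the result could be passed directly to a later feature-extraction step.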
