This final assignment concludes the Python for Text Analysis course. Now that you have learned the basics of the Python programming language, it is time to put your skills into practice and work on a code project of your own.
This is a group assignment in which you will work together with one other student. You can form your own team.
For this assignment, you will choose a classification task and a corresponding dataset (see complete description) for which you are asked to:
- download/obtain the data;
- split the dataset into train/test sets;
- read and process the files in your dataset;
- extract relevant statistics from those files;
- store the computed statistics in a useful format (e.g. CSV/TSV);
- present the statistics to the user by means of visualization;
- use the computed statistics as features for the classification task;
- save the predictions of your model on the test data in a separate file;
- BONUS: evaluate your system's accuracy on the test set.
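
The steps above can be sketched end to end with the standard library alone. This is only an illustration, not a required design. The toy `DATA` list, the three statistics, the file paths and the nearest-centroid classifier are all assumptions made for the example; your own project will use a real dataset, features suited to your task, and most likely an existing classifier. The visualization step is left out here because it needs a plotting library.

```python
import csv
import random
from statistics import mean

# Hypothetical toy data: (text, label) pairs standing in for a real dataset.
DATA = [
    ("great film , loved it", "pos"),
    ("wonderful acting and a great story", "pos"),
    ("loved the wonderful soundtrack", "pos"),
    ("a great , great experience", "pos"),
    ("terrible plot , hated it", "neg"),
    ("boring and terrible acting", "neg"),
    ("hated the boring ending", "neg"),
    ("a terrible , boring mess", "neg"),
]

# Names of the statistics used as features (an assumption for this sketch).
FEATURES = ["n_tokens", "avg_token_len", "type_token_ratio"]


def split_data(items, test_fraction=0.25, seed=42):
    """Shuffle a copy of the items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def extract_stats(text):
    """Compute simple per-text statistics on whitespace-separated tokens."""
    tokens = text.split()
    return {
        "n_tokens": len(tokens),
        "avg_token_len": mean(len(t) for t in tokens) if tokens else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


def write_tsv(path, rows, fieldnames):
    """Store a list of dictionaries as a tab-separated file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def train_centroids(rows):
    """Average every feature per label (a nearest-centroid classifier)."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row["label"], []).append(row)
    return {
        label: [mean(r[name] for r in group) for name in FEATURES]
        for label, group in by_label.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    vector = [row[name] for name in FEATURES]
    return min(
        centroids,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(vector, centroids[label])
        ),
    )


def run_pipeline(data, stats_path, predictions_path):
    """Split, extract statistics, store them, classify, save predictions."""
    train, test = split_data(data)
    train_rows = [{"label": lab, **extract_stats(txt)} for txt, lab in train]
    test_rows = [{"label": lab, **extract_stats(txt)} for txt, lab in test]
    write_tsv(stats_path, train_rows + test_rows, ["label"] + FEATURES)

    centroids = train_centroids(train_rows)
    predictions = [predict(centroids, row) for row in test_rows]
    write_tsv(
        predictions_path,
        [{"gold": r["label"], "predicted": p}
         for r, p in zip(test_rows, predictions)],
        ["gold", "predicted"],
    )
    # BONUS step: accuracy is the fraction of correct test predictions.
    correct = sum(p == r["label"] for r, p in zip(test_rows, predictions))
    return correct / len(test_rows)
```

Calling `run_pipeline(DATA, "stats.tsv", "predictions.tsv")` writes both files and returns the accuracy on the held-out items. Keeping each step in its own function makes the code easier to document and to test.
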

| When? | What? |
|---|---|
| Friday 22 December 2017 (13:00) | Decision about the task & dataset (please inform us by e-mail) |
| Monday 29 January 2018 (15:30-17:15) | 5-minute presentation |
| Sunday 4 February 2018 (23:59) | Deadline submission final assignment |

| Criterion | Weight |
|---|---|
| Code Accuracy | 20 |
| Code Structure | 20 |
| Content & Features | 35 |
| Visualizations | 10 |
| Documentation | 10 |
| Presentation | 5 |
| BONUS | 5 |