Text genre classification project
Text classification is the automatic categorisation of texts based on features extracted from them. Texts can be classified by type (e.g. scientific, news, story), by subject (e.g. football, cars, computers) or by genre (objective or subjective, positive or negative). In this project, genre refers to whether a text is positive or negative about a certain topic, so texts are classified as positive or negative opinions. The classification is applied to movie reviews, game reviews, restaurant reviews and book reviews. A number of different feature sets are implemented to try to capture the opinion expressed in a text. Machine learning techniques such as Naïve Bayes or decision tree learning are then applied to train a classifier on the extracted features. Finally, the classifier is tested on another domain (e.g. a movie classifier is tested on the restaurant reviews) to see whether its performance generalises.
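The train-on-one-domain, test-on-another setup can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the project's actual implementation: the bag-of-words feature set, the hypothetical names tokenize, NaiveBayes and accuracy, the add-one smoothing, and the handful of invented toy reviews that stand in for the real movie and restaurant data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercased word tokens: one simple, assumed feature set (bag of words)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Invented toy data standing in for the real review collections.
movie_texts = [
    "a wonderful film with a great cast",
    "great story and excellent acting",
    "a boring plot and terrible acting",
    "awful film, a terrible waste of time",
]
movie_labels = ["pos", "pos", "neg", "neg"]

restaurant_texts = [
    "great food and excellent service",
    "terrible service and awful food",
]
restaurant_labels = ["pos", "neg"]

# Train on the movie domain, then evaluate on the restaurant domain.
model = NaiveBayes().fit(movie_texts, movie_labels)
cross_domain_acc = accuracy(model, restaurant_texts, restaurant_labels)
print("cross-domain accuracy:", cross_domain_acc)
```

In this sketch, domain-specific words such as "food" or "service" never occur in the movie training data and are simply ignored, so the prediction rests entirely on opinion words shared across domains. That is one reason cross-domain performance can differ from in-domain performance, and it is the kind of effect the cross-domain test is meant to expose.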