Using LightSIDE, we loaded the sentiment_sentences.csv dataset,
extracted features with the Unigram feature space on the Feature Extraction panel,
and then ran the experiment using Logistic Regression with 10-fold
cross-validation.
The experiment aims to count the positive and negative
words in a review taken from the sentiment_sentences dataset to tell whether the overall review
is a good or a bad one.
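LightSIDE does all of this through its interface, so no code is needed to run the experiment. As a rough illustration of what the two steps involve, here is a minimal Python sketch of unigram counting and a 10-fold split. The function names and the example sentences are made up for illustration and are not taken from LightSIDE or from the dataset.

```python
from collections import Counter


def unigram_features(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists so every item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Made-up sentences for illustration only.
sentences = ["a good movie", "a bad bad plot"]
features = [unigram_features(s) for s in sentences]

# 20 items split into 10 folds: each fold trains on 18 and tests on 2.
splits = list(k_fold_indices(20, k=10))
```

In each fold the classifier is trained on the train indices and scored on the held-out test indices, and the reported accuracy is computed over all held-out predictions.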
The model evaluation metrics are shown below:
Accuracy = 75.9%
Kappa = .52
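Kappa corrects accuracy for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The sketch below computes it from a confusion matrix. The matrix values are invented for illustration and are not the confusion matrix from this experiment.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of items on the diagonal (this is the accuracy).
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Invented two-class matrix: 80 of 100 items classified correctly.
example = [[40, 10], [10, 40]]
kappa = cohens_kappa(example)  # observed 0.8, expected 0.5
```

If chance agreement were 0.5 (an assumption that holds when both classes are equally frequent in the labels and in the predictions), an accuracy of 75.9% would give (0.759 - 0.5) / 0.5, or about 0.52, which is consistent with the kappa reported here.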
Competency 8.2
To make better use of the positive and negative words
in the review, we extended the basic feature extractor
with bigrams and trigrams alongside unigrams. The model
evaluation metrics below are slightly better than those of the baseline model with
unigrams only, shown above.
Accuracy = 76.6%
Kappa = .53
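To show what bigrams and trigrams add, here is a small Python sketch that builds all n-grams up to length 3 from a sentence. It is an illustration only: the helper names and the example sentence are hypothetical, and LightSIDE's own tokenization may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count unigrams, bigrams, and trigrams (up to max_n) in one feature bag."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


# Made-up sentence: 4 unigrams + 3 bigrams + 2 trigrams = 9 features.
feats = ngram_features("not a good movie")
```

A unigram model only sees "good" here, while the trigram "not a good" keeps the negation attached to it, which is one plausible reason for the small gain.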
Competency 8.3
When we set the number of features to 3500, we got an
accuracy of 76.9% and a kappa of .54
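Capping the number of features means keeping only the highest-ranked ones. The sketch below ranks features by how many documents they appear in; that ranking criterion is an assumption made for illustration, and LightSIDE may rank features differently. The documents are made up.

```python
from collections import Counter


def select_top_features(docs, max_features):
    """Keep the max_features tokens that occur in the most documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    # Sort by descending document frequency, then alphabetically for ties.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feat for feat, _ in ranked[:max_features]]


# Made-up tokenized documents; the experiment used a cap of 3500, not 2.
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
kept = select_top_features(docs, max_features=2)
```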
Competency 8.5
Using a dataset from another text category (Movie Reviews.csv)
and configuring basic features such as Unigrams, Bigrams, Trigrams and punctuation,
we got an accuracy of 76.3% and a kappa of .45
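Punctuation features treat marks such as "!" and "?" as signals in their own right. A minimal Python sketch, assuming each punctuation character is simply counted (LightSIDE's punctuation option may behave differently, and the example text is made up):

```python
import string
from collections import Counter


def punctuation_features(text):
    """Count each ASCII punctuation character that appears in the text."""
    return Counter(ch for ch in text if ch in string.punctuation)


# Made-up review text for illustration.
punct = punctuation_features("Great film!!! Really?")
```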