Thursday 18 December 2014

Data and Text Mining - sentiment_sentences dataset

Competency 8.1

Using LightSIDE, the sentiment_sentences.csv dataset was loaded and features were extracted with the Unigram feature space on the Feature Extraction panel. The experiment was then run using Logistic Regression with 10-fold cross-validation.
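To make the two ideas in this setup concrete, here is a minimal Python sketch of unigram extraction and a 10-fold split. It is not what LightSIDE runs internally: the function names `unigram_features` and `k_fold_indices`, the tokenization rule and the toy sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def unigram_features(sentence):
    """Lowercase the sentence and count each word token (unigram)."""
    # Assumed tokenization: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return Counter(tokens)


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # Item i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


# Toy sentence, not taken from the dataset.
features = unigram_features("A great film, a great cast")
splits = list(k_fold_indices(25, k=10))
```

Each item is held out exactly once across the ten folds, so every sentence is used for testing once and for training nine times.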

The aim of the experiment is to count the positive and negative words in a review taken from the sentiment_sentences dataset to determine whether the overall review is good or bad.
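The counting idea can be sketched in a few lines of Python. This is only an illustration of the intuition: the word lists `POSITIVE` and `NEGATIVE` are a made-up lexicon, and the Logistic Regression model used in the experiment learns a weight for each word from the data rather than using a fixed list.

```python
# Hypothetical lexicon, invented for this example.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "boring"}


def sentiment_by_count(review):
    """Label a review by comparing counts of positive and negative words."""
    tokens = review.lower().split()
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


# Toy review, not taken from the dataset.
label = sentiment_by_count("great cast but boring and bad plot")
```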


The model evaluation metrics are shown below:

Accuracy = 75.9%

Kappa = 0.52
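For readers unfamiliar with kappa, the sketch below shows how Cohen's kappa relates to accuracy: it rescales the observed agreement by the agreement expected by chance. The confusion matrix is invented for illustration (1000 sentences split evenly between the two classes, chosen so that accuracy is 75.9%); it is not the matrix reported by LightSIDE.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of items on the diagonal (this is accuracy).
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column proportions, summed per class.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(size)
    )
    return (observed - expected) / (1 - expected)


# Hypothetical counts, assumed for this example only.
confusion = [[380, 120],
             [121, 379]]
kappa = cohens_kappa(confusion)
```

With these assumed counts the chance agreement is 0.5, so kappa is (0.759 - 0.5) / (1 - 0.5) = 0.518, which rounds to 0.52.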


Competency 8.2

To make better use of the positive and negative words in the review, we extended the basic feature extractor by adding bigrams and trigrams to the model along with unigrams. The model evaluation metrics below are slightly better than those of the baseline model with just unigrams shown above.

Accuracy = 76.6%
Kappa = 0.53
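As a rough illustration of what adding bigrams and trigrams does, the Python sketch below builds all n-grams up to length three from a sentence. The helper names `ngrams` and `extract_features` and the whitespace tokenization are assumptions; LightSIDE's own extractor may tokenize differently.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence, max_n=3):
    """Collect unigrams, bigrams and trigrams (for max_n=3) as a feature set."""
    tokens = sentence.lower().split()  # assumed: simple whitespace tokenization
    features = set()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


# Toy sentence, not taken from the dataset.
feats = extract_features("not a good movie")
```

The trigram "not a good" captures a negation that the unigram "good" alone would miss, which is one plausible reason for the small improvement.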

Competency 8.3
When we set the number of features to 3500, we got an accuracy of 76.9% and a kappa of 0.54.
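Capping the feature count means ranking the candidate features and keeping only the top ones. The sketch below ranks by document frequency purely as a stand-in: the criterion LightSIDE used is not stated here, and `select_top_features` and the toy documents are hypothetical.

```python
from collections import Counter


def select_top_features(documents, max_features=3500):
    """Keep the max_features terms that occur in the most documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(doc.lower().split()))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_features]]


# Toy documents with a small cap, for illustration only.
docs = ["good movie", "bad movie", "good plot"]
vocab = select_top_features(docs, max_features=2)
```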


Competency 8.5

Using a dataset from another text category (Movie Reviews.csv), configured with basic features such as Unigrams, Bigrams, Trigrams and punctuation, we got an accuracy of 76.3% and a kappa of 0.45.
  





  
  
