A STING Algorithm and Multi-dimensional Vectors Used for English Sentiment Classification in a Distributed System
Date
2017-12-20
Authors
Vo, Ngoc Phu
Vo, Thi Ngoc Tran
Journal title
Journal ISSN
Volume title
Publisher
Nguyễn Tất Thành University (NTT Journal of Science and Technology)
License
Abstract
In this research, we propose a new model for Big Data sentiment classification in a parallel network environment: a Cloudera system with Hadoop Map (M) and Hadoop Reduce (R). The model uses the Statistical Information Grid Algorithm (STING) with multi-dimensional vectors and the 2,000,000 English documents of our training data set for English document-level sentiment classification. It can classify the sentiment of millions of English documents in the parallel network environment, using the training documents as its basis. We tested the model on our testing data set (1,000,000 English reviews: 500,000 positive and 500,000 negative) and achieved 83.92% accuracy.
Description
19 pp.
Keywords
Sentiment Classification, English Sentiment Classification, Opinion Mining