Investigation of Opinion Mining on Derived Twitter Data using Big Data Tools
Keywords:
Opinion Mining, Twitter Data, Sentiment Analysis, Hadoop Component
Abstract
With rapid innovation and a growing web population, petabytes of data are generated every second. Processing and analyzing this enormous volume of data is a tedious task nowadays, and the amount of data generated in real time is growing rapidly. Nearly 80% of this data is unstructured, and analyzing unstructured data in real time is a very difficult task. Existing traditional business intelligence (BI) tools perform best only with a pre-defined schema. In this paper, a solution is proposed that fetches real-time Twitter data and stores it in Hadoop components. Sentiment analysis is then performed on the stored data using big-data tools such as Apache Flume, Apache Hive and Apache Pig. The results of the analysis of the Twitter data are presented with the help of tables, diagrams and snapshots, followed by a performance comparison of the tools used for the sentiment analysis. The comparison shows that Pig runs faster than Hive and requires fewer MapReduce jobs.
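The abstract does not state how an individual tweet is scored. One common approach in Hive and Pig sentiment pipelines is a dictionary (lexicon) lookup: split each tweet into words, join the words against a list of rated words, and average the ratings per tweet. The Python sketch below mimics that join-and-average as a map phase followed by a reduce phase. It is an illustration under stated assumptions, not the paper's implementation: the lexicon, its ratings, and the function names `map_phase`, `reduce_phase`, `label` and `analyze` are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical AFINN-style lexicon: word -> integer polarity.
# A real pipeline would load a full dictionary file instead of this toy list.
LEXICON: Dict[str, int] = {
    "good": 3,
    "great": 3,
    "love": 3,
    "fast": 2,
    "bad": -3,
    "slow": -2,
    "hate": -3,
}

# Lowercase word tokens (letters and apostrophes only).
TOKEN_RE = re.compile(r"[a-z']+")


def map_phase(tweets: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (tweet_id, rating) for every token found in the lexicon."""
    for tweet_id, text in tweets:
        for token in TOKEN_RE.findall(text.lower()):
            if token in LEXICON:
                yield tweet_id, LEXICON[token]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Group ratings by tweet_id and average them (like GROUP BY ... AVG)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for tweet_id, rating in pairs:
        grouped[tweet_id].append(rating)
    return {tid: sum(r) / len(r) for tid, r in grouped.items()}


def label(score: float) -> str:
    """Map an average rating to a sentiment class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(tweets: List[Tuple[str, str]]) -> Dict[str, str]:
    """Label every tweet; tweets with no lexicon words count as neutral."""
    scores = reduce_phase(map_phase(tweets))
    return {tid: label(scores.get(tid, 0.0)) for tid, _ in tweets}


if __name__ == "__main__":
    sample = [
        ("1", "I love how fast this is!"),
        ("2", "Bad service and slow delivery"),
        ("3", "Just landed in Delhi"),
        ("4", "Good phone, bad battery"),
    ]
    for tid, sentiment in analyze(sample).items():
        print(tid, sentiment)
```

Run as a script, it labels the four sample tweets positive, negative, neutral and neutral. Tweet 4 shows a known limitation of plain averaging: positive and negative words can cancel out, so a mixed opinion is reported as neutral.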
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.