Performance Analysis of Big Data Parallel Processing using Message Passing Interface
Keywords:
Big Data, Data Mining, Transactional Database, FP-Growth, MPI, Parallelization
Abstract
This paper examines the use of the Message Passing Interface (MPI) for big data processing in a clustered system. We mine frequent itemsets from a big data transactional database in parallel, using MPI in a distributed environment. Two algorithms are commonly used for mining in distributed computing: FP-Growth and Apriori. Here, the FP-Growth algorithm, which extracts significant information from bulk data (i.e., data mining), is parallelized with MPI. We apply FP-Growth both sequentially and in parallel and observe a significant change in the time taken to mine the transactions. In the parallel version, MPI performs the message communication between processes. The key measure is the time elapsed for processing: the paper compares the time spent in parallel computing with the time spent on a single processor. The performance of the parallel algorithm is also evaluated.
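As a rough illustration of the kind of step that can be distributed, the sketch below partitions a transactional database, counts item supports per partition, and merges the partial counts. This corresponds to the first pass of FP-Growth only. It is not the paper's implementation: the function names, the round-robin partitioning, and the toy transactions are all assumptions made for illustration, and the "ranks" are simulated in a single process rather than run under MPI. In an actual MPI program, the partitioning and merging steps would typically be handled by collectives such as MPI_Scatter and MPI_Reduce.

```python
from collections import Counter


def partition(transactions, n_ranks):
    """Split transactions round-robin into n_ranks chunks.

    Stand-in for distributing data to MPI processes (assumption:
    the paper does not state how its data is partitioned).
    """
    return [transactions[r::n_ranks] for r in range(n_ranks)]


def local_counts(chunk):
    """Count, for one chunk, how many transactions contain each item."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # set(): count an item once per transaction
    return counts


def merge_counts(partials):
    """Sum the per-rank counters (stand-in for an MPI reduction)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def frequent_items(transactions, min_support, n_ranks=1):
    """Return items whose support count is at least min_support."""
    partials = [local_counts(chunk) for chunk in partition(transactions, n_ranks)]
    total = merge_counts(partials)
    return {item: n for item, n in total.items() if n >= min_support}


# Hypothetical toy database, not taken from the paper.
TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
    ["milk"],
]

if __name__ == "__main__":
    sequential = frequent_items(TRANSACTIONS, min_support=3, n_ranks=1)
    partitioned = frequent_items(TRANSACTIONS, min_support=3, n_ranks=3)
    print(sequential == partitioned)
```

Because support counts are additive across disjoint partitions, the merged result matches the sequential one regardless of how many partitions are used. Any speedup in practice would depend on communication cost and data size, which this single-process sketch does not measure.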
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.