Data Mining to Databases: A Review Paper
Keywords:
Preprocessing, Clustering, Classification, Association Rule Mining, Anomaly Detection
Abstract
Data generation has surged in recent years, creating a need for analytical methods that can efficiently process, analyze, and extract value from large volumes of data. This study aims to integrate data mining techniques into database systems to strengthen their analytical capabilities and streamline decision-making. We focus on five key aspects: preprocessing, clustering, classification, association rule mining, and anomaly detection. Preprocessing ensures data quality and consistency by eliminating noise and handling missing values. Clustering groups similar data points based on their attributes, facilitating pattern recognition and data segmentation. Classification assigns data to predefined classes, enabling predictive modeling and a better understanding of relationships among data points. Association rule mining identifies frequent itemsets and generates rules that uncover relationships among variables, supporting business intelligence and decision-making.
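To make the association rule mining step concrete, the following is a minimal sketch of how frequent itemsets and rules might be computed. It is not the method used in this study: the toy transactions, the function names, and the thresholds are all hypothetical, and the search is a brute-force enumeration rather than an optimized algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical toy transactions for illustration only; not data from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets of up to `max_size` items
    whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def rules(freq, min_confidence):
    """Derive rules A -> B from frequent itemsets.

    confidence(A -> B) = support(A and B together) / support(A)
    """
    out = []
    for itemset, s in freq.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so freq[a] is guaranteed to exist.
                conf = s / freq[a]
                if conf >= min_confidence:
                    out.append((set(a), set(itemset - a), s, conf))
    return out


if __name__ == "__main__":
    freq = frequent_itemsets(transactions, min_support=0.6)
    for antecedent, consequent, s, conf in rules(freq, min_confidence=0.9):
        print(antecedent, "->", consequent, f"support={s:.2f}", f"confidence={conf:.2f}")
```

Support measures how often an itemset occurs across all transactions, while confidence measures how often the consequent appears given the antecedent. Raising either threshold trades fewer rules for stronger ones.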
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.