TY  - JOUR
AU  - Mounir, Amal Mohamed
AU  - Marie, Mohamed Ibrahim
AU  - Abd-Elhamid, Laila
PY  - 2024
TI  - Big Data Framework for Predicting Infectious Diseases to Improve Healthcare by Discovering New Symptom Patterns
JF  - Journal of Computer Science
VL  - 20
IS  - 10
DO  - 10.3844/jcssp.2024.1251.1262
UR  - https://thescipub.com/abstract/jcssp.2024.1251.1262
AB  - The use of big data in infectious disease control is a promising opportunity, because these novel data streams can improve the timeliness of preventive measures. Healthcare providers in both the public and private sectors generate, store, and analyse large datasets to improve the quality of the services they deliver. Recently, the outbreak of the novel coronavirus disease COVID-19 has posed serious threats to human health and life, production, social connections, and international relations. Consequently, big data technologies have played a pivotal role in the response to the pandemic. Infectious diseases arise when a person is infected by a pathogen, for example one transmitted by another person, and they pose challenges at both the individual and the population level. Furthermore, previously unknown patterns of infectious illnesses add complexity to the prediction process. This study aims to establish a big data framework for predicting infectious diseases by uncovering new symptom patterns, ultimately enhancing infection prevention and control in healthcare. To this end, machine-learning algorithms such as K-Nearest Neighbors (KNN) and Random Forest (RF) were used to clean and maintain large datasets collected from December 2019 to June 2020. Additionally, the FP-growth and Park, Chen, and Yu (PCY) algorithms were applied to identify new patterns. Among the classifiers evaluated, the Support Vector Machine (SVM) classifier achieved the highest accuracy (98.2%) and the highest F1 score (94.80%), while the RF classifier had the highest precision (92.80%). Among the pattern-mining algorithms, PCY outperformed FP-growth, achieving an accuracy of 98.5%. These findings underscore the potential of big data and machine learning for pattern recognition and infectious disease prediction, ultimately contributing to improved public health outcomes.