
A Mutual-Information-Guided and ADASYN-Augmented Machine Learning Framework for Early Prediction of Parkinson’s Disease

Ghadeer Aqil Ali, Leila Sharifi, Parviz Rashidi-Khazaee, Hossein Nahid-Titkanlue
2025, Management Strategies and Engineering Sciences

Background and Objective: Early detection of Parkinson’s disease (PD) is essential for timely medical intervention and improving patient outcomes. Speech signal analysis offers a non-invasive, cost-effective, and easily deployable diagnostic pathway. However, achieving reliable early prediction remains challenging due to data imbalance, redundant features, and model instability. This study aims to develop an optimized and robust machine learning framework that enhances the predictive accuracy and stability of PD detection from speech data.

Methods: An optimized machine learning model based on eXtreme Gradient Boosting (XGBoost) was developed for early PD prediction. The model’s hyperparameters were tuned using the Tree-structured Parzen Estimator (TPE), while Mutual Information (MI) was employed to select the most informative features from the speech dataset. To address class imbalance, the Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN) was applied to generate synthetic minority samples. Model performance and stability were evaluated using ten independent runs of Stratified 10-Fold Cross-Validation (SCV).
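The MI-based feature selection and ADASYN oversampling steps can be illustrated with a minimal Python sketch that uses only the standard library. This is not the authors' implementation. The function names (`mutual_information`, `select_top_k`, `adasyn`), the equal-width binning, the neighbour count `k`, the toy data, and the order of the two steps are all assumptions made for illustration. XGBoost training, TPE tuning, and the repeated cross-validation are omitted.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels, bins=4):
    """Estimate MI (in nats) between one feature and the class labels
    using equal-width binning (a simplification chosen for this sketch)."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features
    binned = [min(int((v - lo) / width), bins - 1) for v in feature]
    n = len(labels)
    p_x, p_y = Counter(binned), Counter(labels)
    p_xy = Counter(zip(binned, labels))
    return sum(
        (c / n) * math.log(c * n / (p_x[x] * p_y[cls]))
        for (x, cls), c in p_xy.items()
    )


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest estimated MI."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def adasyn(X, y, minority=1, k=3, seed=0):
    """ADASYN-style oversampling for binary labels: minority points with more
    majority-class neighbours receive a larger share of the synthetic samples."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = len(y) - 2 * len(minority_idx)  # majority count minus minority count
    if n_needed <= 0 or len(minority_idx) < 2:
        return list(X), list(y)

    def nearest(i, candidates):
        others = [j for j in candidates if j != i]
        return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

    # Fraction of majority-class points among each minority point's neighbours.
    ratios = []
    for i in minority_idx:
        nbrs = nearest(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in nbrs) / len(nbrs))
    total = sum(ratios)
    # If no minority point has majority neighbours, spread samples uniformly.
    weights = [r / total for r in ratios] if total else [1 / len(ratios)] * len(ratios)

    X_new, y_new = list(X), list(y)
    for i, w in zip(minority_idx, weights):
        # Rounding means the final classes may be only approximately balanced.
        for _ in range(round(w * n_needed)):
            j = rng.choice(nearest(i, minority_idx))
            lam = rng.random()
            # Interpolate between the minority point and a minority neighbour.
            X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
            y_new.append(minority)
    return X_new, y_new


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1 * i, (i * 7) % 5] for i in range(12)] + [[3.0 + 0.2 * i, (i * 3) % 5] for i in range(4)]
y = [0] * 12 + [1] * 4

top = select_top_k(X, y, k=1)
X_sel = [[row[j] for j in top] for row in X]
X_bal, y_bal = adasyn(X_sel, y)
```

In practice, maintained implementations such as `mutual_info_classif` in scikit-learn and `ADASYN` in imbalanced-learn would be the more likely choice. The abstract does not say which libraries were used.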

Results: The proposed framework achieved superior predictive performance with an average accuracy of 97.27%, precision of 98.79%, F1-score of 97.18%, recall of 95.77%, and ROC-AUC of 98.11% across multiple evaluations. Comparative analysis with similar studies demonstrated improved robustness, reliability, and balance between sensitivity and specificity.
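For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts, and how per-fold values are averaged. The counts are hypothetical and are not taken from the study. The helper name `metrics` is likewise an assumption. ROC-AUC is omitted because it requires predicted scores rather than counts.

```python
def metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Two hypothetical folds (illustrative counts only).
folds = [metrics(tp=9, fp=1, fn=1, tn=9), metrics(tp=5, fp=0, fn=5, tn=10)]

# Average each metric across folds, as is usual when reporting cross-validation.
mean = {name: sum(f[name] for f in folds) / len(folds) for name in folds[0]}

# F1 recomputed from the averaged precision and recall, for comparison.
f1_of_means = (
    2 * mean["precision"] * mean["recall"] / (mean["precision"] + mean["recall"])
)
```

Because F1 is a harmonic mean, the average of per-fold F1 scores generally differs from the F1 computed from averaged precision and recall. Averaged figures therefore need not satisfy the F1 formula exactly.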

Conclusion: The integration of MI-based feature selection and ADASYN-based data augmentation significantly enhanced the performance and stability of the XGBoost model for early PD prediction. The proposed model demonstrates strong potential for clinical use as a decision support system, providing a low-cost, non-invasive, and remotely deployable tool for early PD diagnosis using patient speech signals.

---