This documentation explains the machine learning architecture, preprocessing pipeline, optimization strategy, and the reasoning behind simplifying healthcare features for both predictive performance and end-user usability.
The prediction system uses Logistic Regression with One-vs-Rest (OvR) because the task involves multi-class classification across patient readmission risk categories.
Logistic Regression was selected because it provides strong interpretability, stable behavior on structured healthcare data, and reliable probabilistic outputs.
Linear Regression predicts continuous numerical values, while this project focuses on categorical risk classification. Logistic Regression is therefore significantly more suitable for healthcare prediction categories.
Binary Encoding was used to efficiently transform categorical variables into numerical form while minimizing unnecessary feature expansion.
RandomizedSearchCV was used for efficient hyperparameter tuning to identify stronger-performing parameter combinations without exhaustive computational search.
One important design decision in this project was simplifying raw clinical variables into cleaner, more meaningful feature groups.
Healthcare datasets often contain fragmented, inconsistent, or highly detailed medical values. Directly using raw features can increase complexity, dimensionality, and reduce interpretability.
The objective was not only improving predictive performance, but also creating a healthcare-oriented AI system that remains understandable, structured, and usable for real-world interaction.