Sequential EXtreme Gradient Boosting-Based Descriptor Reduction for Size Prediction of Zwitterionic Polymer-Based Nanoparticles

Sima Rezvantalab, Sara Mihandoost, Roger M Pallares, Fabian Kiessling
2025/7/31, ACS Omega

A machine learning (ML) approach was applied to investigate how structural descriptors affect the size of zwitterionic polymer (ZP)-based drug delivery systems (DDSs). The study examined the structural characteristics of the components of ZP-based DDSs to determine how they contribute to self-assembly and, consequently, to the final size of such DDSs. A new data set was curated through an extensive literature review, incorporating a wide range of descriptors related to molecular structure and pH. Because each molecule was initially characterized by 312 descriptors, reducing this set to a manageable number was a crucial step. To address this, we introduced a descriptor reduction strategy based on eXtreme Gradient Boosting (XGB), termed Sequential XGB (SXGB), designed to identify the most significant descriptors systematically and efficiently while minimizing information loss. Applying SXGB reduced the data set to 11 key descriptors, which were then used for further analysis. Among these, pH was the most influential parameter. Furthermore, Local Interpretable Model-agnostic Explanations (LIME) were used to interpret six local samples, allowing the influence of each selected descriptor to be analyzed. To improve the robustness and generalizability of the model, we implemented a data augmentation process that combined synthetic and original data for training and testing, exposing the model to a more diverse range of scenarios and improving its ability to handle variability and uncertainty.
Compared with the other evaluated models (support vector regression and random forest), the SXGB model performed best in predicting the size of zwitterionic polymer-based nanoparticles (NPs), achieving a coefficient of determination (R²) of 84.2% on the training set and 80.9% on the testing set.
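
For readers unfamiliar with the metric, the coefficient of determination reported above can be computed as R² = 1 − SS_res / SS_tot. The following is a minimal sketch in plain Python; the function name `r2_score` and the nanoparticle sizes are hypothetical values chosen for illustration, not data or code from the study.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted nanoparticle sizes in nm (illustrative only).
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

r2 = r2_score(measured, predicted)
print(f"R^2 = {r2:.1%}")
```

An R² of 1 means the predictions match the measurements exactly, while an R² of 0 means the model does no better than always predicting the mean size. Reporting the value separately for training and testing data, as the abstract does, shows how much performance drops on unseen samples.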

---