Microfluidics enables high-throughput and precise droplet generation in immiscible fluids. However, a full understanding of the rules governing droplet generation requires in-depth analysis of the influential factors and their interactions. Previous semi-empirical correlations and machine learning (ML) models often fail to include all relevant features as inputs, namely microchannel geometries, flow conditions, and fluid properties. To address this problem, we compiled an extensive dataset containing over 1800 experimental data points across 39 distinct microchannels and 31 biphasic fluid systems. We then developed robust tree-based ML models, notably XGBoost and stacking models, achieving superior prediction accuracy (R² > 0.98, RMSE < 0.03). These models streamline the generation of user-specific droplets, reducing the reliance on extensive expertise and iterative experiments. Furthermore, SHapley Additive exPlanations (SHAP) analysis elucidates the underlying microfluidic mechanisms, in particular revealing that geometric features contribute 22.9% to the prediction, a share that was previously underestimated. This combination of ML prediction and SHAP analysis highlights the potential of data-mechanism dual-driven modeling in advancing fundamental microfluidic research.
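To illustrate how a group-level contribution such as the geometric share quoted above can be derived from per-feature attributions, the following is a minimal sketch, not the pipeline used in this work. It assumes a hypothetical three-feature linear predictor (`toy_predict`) with illustrative feature names and weights, and it computes exact Shapley values by brute-force coalition enumeration against a single baseline point, whereas tree models are normally explained with the far more efficient TreeSHAP algorithm. The share of a feature group is taken here as its summed mean absolute Shapley value divided by the total over all features, which is one common convention and an assumption on our part.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition are replaced by the corresponding
    background (baseline) value. Cost grows as 2^n, so this is only
    practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                members = set(subset)
                without_i = [x[j] if j in members else background[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def group_shares(predict, samples, background, groups):
    """Fraction of total mean |Shapley value| attributed to each feature group."""
    n = len(background)
    mean_abs = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, background)):
            mean_abs[i] += abs(p) / len(samples)
    total = sum(mean_abs)
    return {name: sum(mean_abs[i] for i in idx) / total
            for name, idx in groups.items()}


# Hypothetical stand-in for a trained model; the feature order is
# [channel width ratio, flow rate ratio, viscosity ratio] and the
# weights are illustrative only, not fitted to any data.
def toy_predict(features):
    width_ratio, flow_ratio, viscosity_ratio = features
    return 0.5 * width_ratio + 1.0 * flow_ratio + 0.25 * viscosity_ratio


# Hypothetical grouping of feature indices into the three input categories.
GROUPS = {"geometry": [0], "flow": [1], "fluid": [2]}
BACKGROUND = [0.0, 0.0, 0.0]
SAMPLES = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]

if __name__ == "__main__":
    print(group_shares(toy_predict, SAMPLES, BACKGROUND, GROUPS))
```

For a linear predictor, each Shapley value reduces to the feature weight times the feature's deviation from the baseline, which gives a simple check on the enumeration. In practice, the per-feature attributions would come from a SHAP explainer applied to the trained tree model, and only the grouping step in `group_shares` would carry over.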
This research is supported by the National Natural Science Foundation of China (21991104), the Tsinghua University Initiative Scientific Research Program (20233080063), and the Tsinghua-Sinopec Green Chemical Joint Research Institute Grant (20212930034).