1st International Congress of Artificial Intelligence in Medical Sciences Posters
Using Machine Learning Models to Evaluate the Need for COVID-19 Vaccination
A. Ziaee1, SM. Tabatabaei2, SH. Alavi1, A. Asghari3, M. Ziaee3, F. Osmani3, AS. Pagheh*
1 Mashhad University of Medical Sciences, Mashhad, Iran.
2 Department of Medical Informatics, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Background and aims: Amid the ongoing COVID-19 pandemic, Artificial Intelligence (AI) has gained considerable attention, and Machine Learning is one of its most practical branches. Although vaccine adherence is now high, early in the pandemic many people distrusted vaccines and believed that vaccination was unnecessary once they had been infected with COVID-19. However, evidence indicates that COVID-19 antibodies do not remain positive permanently, so booster vaccination is vital. This study aims to develop a pilot machine learning model that predicts, without a laboratory test, whether an unvaccinated patient's serum IgG antibody level is sufficient or whether vaccination is needed.
Method: This study used the symptoms and demographic data of 206 confirmed COVID-19 patients whose COVID-19-specific serum IgG had been measured; the number of months elapsed since infection was recorded and added as a variable. Data were gathered from January to October 2021, before vaccination began in Iran. The data were preprocessed and cleaned, important features were selected, and the serum IgG level was converted into a binary variable using a cutoff of 1.20.
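The two derived variables described above (months since infection and the binary IgG target) can be sketched as follows. This is a minimal Python illustration, not the authors' R code, and it rests on stated assumptions: time since infection is counted in whole calendar months up to the date of blood sampling, and an IgG value at or above 1.20 is labeled 1 ("sufficient"). The abstract gives neither the units of the cutoff nor which side of it counts as sufficient, and the function names are hypothetical.

```python
from datetime import date

IGG_CUTOFF = 1.20  # cutoff reported in the abstract; units not stated

def months_since(infection: date, sampled: date) -> int:
    """Whole calendar months between infection and blood sampling (assumed definition)."""
    months = (sampled.year - infection.year) * 12 + (sampled.month - infection.month)
    if sampled.day < infection.day:
        months -= 1  # the last month is not yet complete
    return months

def igg_label(igg_value: float, cutoff: float = IGG_CUTOFF) -> int:
    """1 = IgG at or above the cutoff (assumed 'sufficient'), 0 = below it."""
    return 1 if igg_value >= cutoff else 0

# Hypothetical record: infected 15 Jan 2021, sampled 10 Jun 2021, IgG 0.95.
print(months_since(date(2021, 1, 15), date(2021, 6, 10)))  # 4
print(igg_label(0.95))  # 0
```

Whether a boundary value of exactly 1.20 counts as sufficient changes only the comparison operator in `igg_label`.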
The binary IgG variable served as the prediction target. The data were randomly split into training and test sets in proportions of 80% and 20%, respectively. Random Forest, Support Vector Machine, Neural Network, Naïve Bayes, and XGBoost models, among others, were trained with 5-fold cross-validation, evaluated, and compared, and the best-performing model was selected. The models were implemented in RStudio using packages including “randomForest”, “caret”, “e1071”, “neuralnet”, “naivebayes”, and “xgboost”, and the evaluation metrics were recorded. The final model was exported and uploaded to a GitHub repository so that the analysis can be reused.
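The partitioning scheme in the Method can be sketched as a shuffled 80/20 train/test split followed by 5-fold cross-validation inside the training set. This is a minimal Python sketch under assumptions: the study used R (caret), the split here is a plain unstratified shuffle, cross-validation is assumed to run on the training set only, and `train_test_split`, `k_fold_indices`, `cross_validate`, the `fit_and_score` callback and the seed are hypothetical names chosen for illustration.

```python
import random

def train_test_split(ids, train_frac=0.8, seed=42):
    """Shuffle sample IDs and cut them into training and test lists (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def k_fold_indices(n, k=5, seed=42):
    """Split positions 0..n-1 into k shuffled folds of near-equal size."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

def cross_validate(train_ids, fit_and_score, k=5, seed=42):
    """Mean of fit_and_score(fit_ids, val_ids) over k folds of the training set."""
    folds = k_fold_indices(len(train_ids), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        fit_ids = [train_ids[j] for j in range(len(train_ids)) if j not in held]
        val_ids = [train_ids[j] for j in held_out]
        scores.append(fit_and_score(fit_ids, val_ids))
    return sum(scores) / k

train, test = train_test_split(range(206))
print(len(train), len(test))
```

For 206 patients this plain shuffle yields 165 training and 41 test samples. The study reports 162 and 44, so its partitioning routine evidently differed from this sketch. In practice `fit_and_score` would fit one candidate model on `fit_ids` and return its accuracy on `val_ids`, and the model with the best mean score would be kept.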
Results: The training and test sets included 162 and 44 samples, respectively. The selected features were gender, age, hospitalization, time since infection, urban or rural residence, education level, occupation, chronic disease, fever, headache, cough, malaise, restlessness, sore throat, bone pain, conjunctivitis, anosmia, loss of taste, sweating, nausea, vomiting, stomachache, diarrhea, chest pain, dyspnea, history of COVID-19 infection in family members, and disease severity. The reported accuracies of the Random Forest, SVM, Decision Tree, Neural Network, Naïve Bayes, and XGBoost models were 0.8409, 0.7955, 0.6818, 0.7045, 0.6818, and 0.7727, respectively. With its default settings (500 trees, 4 features per split), the 5-fold cross-validated Random Forest model achieved an accuracy of 0.8409 (95% CI: 0.6993 to 0.9336); its sensitivity and specificity were 0.7692 and 0.8710, respectively. The precision (positive predictive value) was 0.7143, the negative predictive value was 0.9000, and a ROC curve was plotted.
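The Random Forest test-set metrics are mutually consistent with a single confusion matrix over the 44 test samples: 10 true positives, 3 false negatives, 4 false positives and 27 true negatives. These counts are inferred from the reported ratios (10/13 = 0.7692, 27/31 = 0.8710, 37/44 = 0.8409); they are not stated in the abstract, and neither is the class treated as positive. Note that 0.7143 = 10/14 is TP/(TP + FP), i.e. precision or positive predictive value, whereas recall is the same quantity as sensitivity. The Python sketch below recomputes each metric from the inferred counts and adds an exact Clopper-Pearson interval for accuracy. It assumes the reported CI is of that kind, which is what caret's `confusionMatrix` returns (it takes the accuracy interval from R's `binom.test`).

```python
from math import comb

def confusion_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # identical to precision
        "npv": tn / (tn + fn),
    }

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(f, target, tol=1e-12):
    """Find p in [0, 1] with f(p) == target for a non-decreasing f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for the proportion x/n."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail rises with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

# Counts inferred from the reported metrics (not stated in the abstract).
TP, FN, FP, TN = 10, 3, 4, 27
m = confusion_metrics(TP, FN, FP, TN)
print({k: round(v, 4) for k, v in m.items()})
lo, hi = clopper_pearson(TP + TN, TP + FN + FP + TN)
print(round(lo, 4), round(hi, 4))
```

Run as-is, it prints accuracy 0.8409, sensitivity 0.7692, specificity 0.871, PPV 0.7143 and NPV 0.9, and an accuracy interval of approximately 0.6993 to 0.9336, matching the reported values.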
Conclusion: The Random Forest model performed satisfactorily and yielded notable findings, such as the importance of occupation in how long sufficient COVID-19 serum IgG levels persist. We provide a model for predicting the need for vaccination in unvaccinated individuals previously infected with COVID-19; this study may serve as a stepping stone toward determining whether booster doses should be administered based on the time since the last vaccination or infection.
Keywords: Machine Learning; COVID-19; Immunoglobulin G; Vaccine