Background
On average, humans spend about 33% of each day sleeping. The recommended amount of sleep for a healthy adult is at least 7 hours in a 24-hour period.^1^ Good sleep is not only vital to our day-to-day functioning but also allows us to lead healthy lives. In the US, 50-70 million people suffer from insomnia.^2^ This dashboard uses the 2018 NHANES data to explore the variables that may affect self-reported problems with sleeping.^3^
Choosing the best models
The top variables that affect sleep were identified using variable importance plots generated by random forest models. The R Markdown file that takes the NHANES data to an analysis-ready dataset, and the final dataset itself, can be found here.
Several classification models were trained and tested, and the models with the highest AUC (maximum 0.61) were chosen for the app. The code can be found here.
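As a rough illustration of the selection metric, AUC can be read as the probability that a randomly chosen respondent who reports sleep problems receives a higher predicted score than one who does not. The sketch below computes it in plain Python from hypothetical labels and scores. The app itself is built in R, the function name `auc` is an assumption made for this example, and none of the numbers come from the NHANES analysis.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical data: 1 = reports sleep problems, 0 = does not.
example_labels = [1, 0, 1, 0, 1, 0]
example_scores = [0.62, 0.35, 0.41, 0.48, 0.55, 0.30]

example_auc = auc(example_labels, example_scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a maximum of 0.61 indicates only modest discrimination.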
Using the App
In this app, users can explore the data and run a logistic regression model to learn about these variables in depth. The regularized logistic regression and the support vector machine with a radial basis function kernel incorporated in the app use tidymodels workflows under the hood. These workflows apply the data preprocessing steps, fit the models on the dataset in the app, and show predictions. Please note that this app does not use test data.
In the exploratory data analysis tab, users can explore the relationships between variables by changing the "X", "Y", and "Color by group" inputs.
The logistic regression tab fits a general logistic regression model and uses the user's inputs (age, BMI, mental and physical health, and a few laboratory values) to predict the probability of a patient reporting sleep problems. The predicted probability appears in the upper left panel. The upper right figure shows how a chosen variable affects that probability. The lower left figure presents the importance of the variables, and the lower right figure presents the coefficients of the logistic regression model.
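To make the prediction step concrete, here is a minimal Python sketch of how a logistic regression turns user input into a probability: the inputs are combined into a linear predictor on the log-odds scale and passed through the logistic function. The coefficient values, variable names, and the helper `predict_probability` are hypothetical and chosen only for illustration; they are not the values fitted in the app, which is written in R.

```python
import math


def predict_probability(intercept, coefficients, inputs):
    """Logistic regression prediction: logistic function of the linear predictor."""
    linear_predictor = intercept + sum(
        coefficients[name] * inputs[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients on the log-odds scale (NOT the app's fitted values).
example_intercept = -2.0
example_coefficients = {"age": 0.01, "bmi": 0.03, "poor_mental_health_days": 0.05}

# Hypothetical user input, as might be entered through the tab's controls.
example_inputs = {"age": 50, "bmi": 30.0, "poor_mental_health_days": 10}

probability = predict_probability(
    example_intercept, example_coefficients, example_inputs
)
print(f"Predicted probability of reporting sleep problems: {probability:.2f}")
```

Each coefficient is the change in log-odds per one-unit increase in its variable, so exponentiating a coefficient gives the corresponding odds ratio.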
Finally, in the machine learning models tab, the regularized logistic regression uses a lasso penalty to select the top three predictors, and the support vector machine uses cost = 32 and rbf_sigma = 1e-05; it showed the same variable importance as the regularized regression. The top two panels react to user input: they show the local prediction as a probability and a breakdown of the contribution of each variable. The lower two panels show the variable importance plot and a confusion matrix giving the proportions of responses misclassified by the model.
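The confusion matrix panel can be read as a table of proportions over actual and predicted responses. The Python sketch below shows one way such a table could be computed. It assumes each cell is divided by the total number of responses (the app may normalize differently, for example by row), and the data and helper names are hypothetical.

```python
from collections import Counter


def confusion_proportions(actual, predicted):
    """Map each (actual, predicted) pair to its share of all responses."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one response is required")
    counts = Counter(zip(actual, predicted))
    total = len(actual)
    return {pair: count / total for pair, count in counts.items()}


def misclassified_proportion(proportions):
    """Sum the off-diagonal cells, where the prediction differs from the truth."""
    return sum(share for (truth, guess), share in proportions.items() if truth != guess)


# Hypothetical responses to "do you have trouble sleeping?" and model predictions.
example_actual = ["yes", "yes", "no", "no", "no", "yes", "no", "no"]
example_predicted = ["yes", "no", "no", "no", "yes", "no", "no", "no"]

example_proportions = confusion_proportions(example_actual, example_predicted)
example_error = misclassified_proportion(example_proportions)

for (truth, guess), share in sorted(example_proportions.items()):
    print(f"actual={truth:<3} predicted={guess:<3} proportion={share:.3f}")
print(f"misclassified: {example_error:.3f}")
```

Dividing by the total makes all cells sum to 1, so the off-diagonal cells add up directly to the overall misclassification rate.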
References
[1] Watson NF, Badr MS, Belenky G, et al.; Consensus Conference Panel. Joint consensus statement of the American Academy of Sleep Medicine and Sleep Research Society on the recommended amount of sleep for a healthy adult: methodology and discussion. Sleep. 2015;38:1161–1183.
[2] Ford ES, Wheaton AG, Cunningham TJ, Giles WH, Chapman DP, Croft JB. Trends in outpatient visits for insomnia, sleep apnea, and prescriptions for sleep medications among US adults: findings from the National Ambulatory Medical Care Survey 1999–2010. Sleep. 2014;37(8):1283–1293.
[3] Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2018.
[4] Packages: flexdashboard, shiny, tidyverse, ggiraphExtra, plotly, DALEX and DALEXtra, tidymodels, jtools, vip
"Sleep is the swiss-army knife of health. When sleep is deficient, there is sickness and disease. And when sleep is abundant, there is vitality and health."
- Matthew Walker, PhD
Disclaimer
This app was created as a project for Advanced Data Science for Biomedical Engineering (EN580.464/EN580.664)/Advanced Data Science for Public Health (PH140.628/PH140.629) and is NOT validated. If you have any concerns relating to your sleep, please consult a healthcare provider. This app does not reflect the views of the Johns Hopkins University, the Johns Hopkins Hospital, or any of their affiliates.
Authors
Briha Ansari, MD: data cleaning, machine learning
Feng-Chiao Lee: logistic regression
Tim Lee, MD: EDA, flexdashboard, final compilation