The Common Cold of Mental Disabilities
16% of Americans will experience major depression at some point in their lives.
In its most severe form, depression is the leading cause of disability in the world and is the leading risk factor for suicide.
Our Study
Recently, the Center for Behavioral Intervention Technologies at Northwestern University collected passive mobile phone data from 208 people over about six weeks. They wanted to use the data to detect depression in a given person, but they did not achieve good predictive power. We took the analysis in a different direction: we tried to predict, for a given person, whether a given day was a workday based on the data collected from their phone. This has potential applications for mHealth as a whole.
Using the phone sensor data, we created 129 features that may differ depending on whether an individual was at work or at home. For example, we computed the percentage of the day someone spent driving, because people may drive for longer periods to get to work. Daily self-reports indicated whether or not each day was a workday, and these provided the labels.
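As an illustration of the kind of feature described above, here is a minimal sketch of a per-day driving fraction. It assumes activity samples arrive as evenly spaced (timestamp, label) pairs; the function name and the `"in_vehicle"` label are hypothetical and are not taken from the study's data format.

```python
from collections import defaultdict
from datetime import datetime


def driving_fraction_per_day(samples, driving_label="in_vehicle"):
    """Return {date: fraction of that day's samples labeled as driving}.

    samples: iterable of (iso_timestamp, activity_label) pairs.
    Assumes samples are evenly spaced, so a count ratio approximates
    a time ratio. The label name is a hypothetical placeholder.
    """
    totals = defaultdict(int)
    driving = defaultdict(int)
    for timestamp, label in samples:
        day = datetime.fromisoformat(timestamp).date()
        totals[day] += 1
        if label == driving_label:
            driving[day] += 1
    return {day: driving[day] / totals[day] for day in totals}
```

Each day's value could then serve as one column in the feature table, alongside the self-reported workday label for that day.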
We then compared the accuracies for different classifiers to find the one that best fit the dataset and the extracted features. Random forest with 150 trees consistently reported the highest accuracy. With this information, we tried multiple methods with the random forest classifier to continue to improve our results:
- Removing partial workdays
- Converting partial workdays to workdays
- Removing features that ranked low in the random forest classifier
- Personalized model for each participant: For each individual, we trained on 66% of that person's data and tested on the remaining 34%.
- Clustering based on:
  - Days of participation: We divided the participants into quartiles by the number of days they participated in the study, so that each bin held the same number of people. We then ran each bin through the classifier.
  - Missing data: Many individuals were missing data for WiFi, light, and applications. For each of these categories, we ran the classifier separately on the files with no missing data and on the files with missing data.
  - Number of phone touches in a day: We divided the participants into quartiles by how often they touched their phone, so that each bin held the same number of people. We then ran each bin through the classifier.
  - Age: We divided the participants into quartiles by birth year, so that each bin held the same number of people. We then ran each bin through the classifier.
  - Gender: We ran the classifier on only females and then on only males.
  - Mental state: We created four files based on whether the individual was a control, anxious, depressed, or both anxious and depressed. We then ran the classifier on each of these.
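Two of the steps in the list above, the per-participant 66% split and the equal-sized quartile bins, can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and taking the first 66% of days in the order given is an assumption, since the write-up does not say whether the split was chronological or random.

```python
def split_days(days, train_fraction=0.66):
    """Split one participant's days into (train, test).

    Assumption: the first `train_fraction` of days, in the order given,
    is used for training and the rest for testing.
    """
    cut = int(len(days) * train_fraction)
    return days[:cut], days[cut:]


def quartile_bins(value_by_participant):
    """Sort participants by a value and cut them into four bins.

    value_by_participant: dict mapping participant id to a number
    (e.g. days of participation, daily touch count, or birth year).
    Bin sizes differ by at most one when the count is not divisible by 4.
    """
    ordered = sorted(value_by_participant, key=value_by_participant.get)
    n = len(ordered)
    bounds = [(i * n) // 4 for i in range(5)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(4)]
```

Each bin returned by `quartile_bins` would then be passed through the classifier on its own, as described for days of participation, phone touches, and age.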
Results.
We achieved moderately good results in detecting workdays from mobile phone sensor data. See our paper for specific results.
Limitations.
1. The CS120 dataset may not generalize to the broader American population. The dataset may be biased, since 80% of participants are women. The study also took place at different times for different participants. For some participants, data collection fell during the holiday season, when people's schedules and habits are less representative of the rest of the year.
2. Many participants had missing sensor data. We found much greater accuracy when classifying data from participants with no missing data; however, very few participants had the complete set of data, so this result probably reflects overfitting to a small sample.
3. More data is needed to better detect mental health conditions from phone sensor data.
Special Thanks.
This project would not have been possible without the help and dedication of the following people. Thank you so much!
- CBITS, Dr. Sohrab Saeb
- CBITS, Dr. Emily Lattie
- CBITS, Dr. David Mohr
- HABits Lab, Professor Nabil Alshurafa
- Northwestern TA: Shibo Zhang
- Northwestern EECS, McCormick Engineering