Published Online: https://doi.org/10.1176/appi.prcp.20220015

Abstract

Objective

To evaluate whether a machine learning approach can accurately predict antidepressant treatment outcome using electronic health records (EHRs) from patients with depression.

Method

This study examined 808 patients with depression at a New York City‐based outpatient mental health clinic between June 13, 2016 and June 22, 2020. Antidepressant treatment outcome was defined based on the trend in depression symptom severity over time and was categorized as either “Recovering” or “Worsening” (i.e., non‐Recovering), measured by the slope of the individual‐level Patient Health Questionnaire‐9 (PHQ‐9) score trajectory spanning 6 months following treatment initiation. A patient was designated as “Recovering” if the slope was less than 0 and as “Worsening” if the slope was 0 or greater. Multiple machine learning (ML) models, including L2 norm regularized Logistic Regression, Naive Bayes, Random Forest, and Gradient Boosting Decision Tree (GBDT), were used to predict treatment outcome based on additional data from EHRs, including demographics and diagnoses. Shapley Additive Explanations were applied to identify the most important predictors.

Results

The GBDT achieved the best results for predicting “Recovering” (AUC: 0.7654 ± 0.0227; precision: 0.6002 ± 0.0215; recall: 0.5131 ± 0.0336). When patients with low baseline PHQ‐9 scores (<10) were excluded, an AUC of 0.7254 ± 0.0218 (precision: 0.5392 ± 0.0437; recall: 0.4431 ± 0.0513) was obtained for predicting “Recovering.” Prior diagnosis of anxiety, psychotherapy, recurrent depression, and baseline depression symptom severity were strong predictors.

Conclusions

The results demonstrate the potential utility of using ML on longitudinal EHRs to predict antidepressant treatment outcome. Our predictive tool holds promise for accelerating personalized medical management in patients with psychiatric illnesses.

Highlights

  • Longitudinal questionnaire data were used to measure antidepressant treatment outcome.

  • Machine learning models were used to predict outcome from electronic health records.

  • The gradient boosting decision tree model achieved the best predictive results.

  • Diagnostic codes and baseline severity were strong predictors of treatment outcome.

Depression is one of the most prevalent psychiatric disorders, affecting approximately 14% of the global population (1). The economic costs resulting from depression are staggering, and depression has become the second leading contributor to global disease burden (2). Antidepressants are commonly prescribed to patients suffering from depression (3), but because of the complex etiology and heterogeneous symptomatology of depression, prior studies suggest that antidepressant treatment efficacy is usually low, with as few as 11–30% of depressed patients obtaining remission after initial treatment (4). The use of prediction tools in areas of medicine such as oncology, cardiology, and radiology has played an important role in clinical decision‐making (5, 6), suggesting the potential utility of such tools for predicting antidepressant treatment efficacy.

Recent studies have assessed the prediction of antidepressant treatment outcomes, such as response or the achievement of remission, using brain imaging, social status, and electronic health records (EHRs). In particular, in studies using functional magnetic resonance imaging (fMRI) data (6, 7, 8, 9), the mean activation and differential response were analyzed for case and control groups, and specific observations were found to be predictive of the response to antidepressant treatment. However, the cost and time associated with collecting and processing fMRI data may hinder the practical use of this approach (4). Previous studies have also used patient self‐report data, including socioeconomic status (4, 10, 11), to evaluate whether patients would achieve symptomatic remission. However, self‐reported social information may be less precise as an outcome measure and is prone to nonresponse bias.

In addition, prior studies based on EHRs mainly extracted feature information from patients' medical records (10, 12, 13, 14), such as medication dose information, to predict treatment dropout or remission after receiving antidepressants. However, these studies did not consider patients' baseline depression severity or other clinical data, such as diagnostic codes, in prediction model development. Furthermore, most previous studies defined the outcome based on a behavioral assessment summarized in a numerical variable (e.g., Montgomery‐Asberg Depression Rating Scale (MADRS) score or Patient Health Questionnaire‐9 (PHQ‐9) score) (6), considering only the baseline and final scores of a specified treatment period, without taking into account the variation in patients' self‐reported scores and depression severity over that interval.

This study applies a data‐driven approach to address these limitations: it evaluates trends in depressive symptom severity over time and assesses the ability of EHRs and machine learning methods to predict antidepressant treatment outcome. In particular:

(1) We defined antidepressant treatment outcome by fitting a slope to all PHQ‐9 scores across a treatment period using a linear regression model. Compared with the aforementioned studies, which only considered the difference between the PHQ‐9 scores at baseline and at the end of a treatment period, our repeated‐measures method takes into account the intermediate changes in PHQ‐9 scores across multiple time points. In this way, we can capture fine‐grained changes in PHQ‐9 scores over time and measure treatment outcome more accurately.

(2) We incorporated baseline depression severity information and a range of EHR‐derived data types, including patient demographics, comorbidities, procedures, and prescription medications, to train the predictive models.

(3) We investigated the impact of longitudinal EHR availability on predictive performance across different observation periods. The most important features of the predictive models were also identified and examined.

MATERIALS AND METHODS

Dataset

Fully de‐identified study data were acquired from an outpatient behavioral health clinic at a major academic medical center in New York City whose mission is to facilitate access to outpatient mental health services for managed care patients. Patients who receive care at this clinic require a mental health referral from a primary care provider and, upon intake, undergo a strict screening assessment by a psychiatrist to confirm that this closely monitored setting is appropriate for their care needs. There were 3380 patients with PHQ‐9 scores in the dataset between June 13, 2016, and June 22, 2020. The PHQ‐9 is a multipurpose instrument for screening, monitoring, and measuring the severity of depression. While not a diagnostic instrument, the PHQ‐9 has been supported and used to define treatment outcome measures in prior studies (15, 16, 17). A total of 808 adult patients (≥18 years old) who received at least one antidepressant prescription (Supplementary eTable 1) were included in the final analysis (Cohort‐A). An advantage of this dataset is that close monitoring ensures continuity of care and that antidepressants prescribed in this clinic are highly likely to be intended for the presenting mental health condition. In a sub‐analysis, we built a separate cohort (Cohort‐B) consisting of 467 patients after removing patients with baseline PHQ‐9 scores less than 10, to account for baseline depression severity (18). The exclusion cascade is shown in Supplementary eFigure 1. Additional details regarding the dataset are shown in the Supplemental files. IRB exemption was granted for use of this database for research.

Outcome

The primary outcome was established as the course of depression during the first 6 months following the prescription of an antidepressant, which for the benefit of clinical interpretability was categorized as either “Recovering” or “Worsening.” The outcome was measured using all PHQ‐9 scores recorded during the 6‐month period. The overall trend of each patient's PHQ‐9 score trajectory was represented by a slope, which was obtained using a linear regression model (19). This technique of fitting a slope to all PHQ‐9 scores has the potential to capture the overall trend in a patient's symptom severity over time. A slope less than 0 indicated a declining trajectory of depression severity over time, and the treatment outcome was classified as “Recovering”; otherwise, the outcome was classified as “Worsening.”
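As a minimal sketch of this labeling rule (not the study's actual code), the snippet below fits an ordinary least‐squares slope to each patient's PHQ‐9 scores over the 6 months after the index prescription and assigns “Recovering” or “Worsening.” The long‐format columns patient_id, days_since_index, and phq9 are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def label_outcomes(scores: pd.DataFrame) -> pd.Series:
    """Label each patient "Recovering" (slope < 0) or "Worsening" (slope >= 0)
    based on PHQ-9 scores recorded in the 6 months after the index prescription."""
    window = scores[scores["days_since_index"].between(0, 182)]
    # Require at least two measurements to fit a slope.
    window = window.groupby("patient_id").filter(lambda g: len(g) >= 2)

    def slope(group: pd.DataFrame) -> float:
        # Ordinary least-squares slope of PHQ-9 score against time (degree-1 fit).
        return np.polyfit(group["days_since_index"], group["phq9"], deg=1)[0]

    slopes = window.groupby("patient_id").apply(slope)
    return slopes.map(lambda s: "Recovering" if s < 0 else "Worsening")
```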

Study Design

Antidepressant treatment outcome was predicted based on patient demographics, baseline PHQ‐9 scores, comorbidities, procedures, and prescription medication data derived from the EHRs at the time of the index antidepressant prescription, censoring all subsequent data (Figure 1). In particular, the information during the observation window before time “t” (the time of the first antidepressant prescription) was used to predict the “Recovering” or “Worsening” outcome during the next 6 months following the antidepressant prescription. The labels “Recovering” or “Worsening” were determined based on the PHQ‐9 scores extracted between t and the prediction time t + T. Three experiments were conducted by setting different durations of the observation window, including 1 year, 2 years, and all years.
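To make the windowing concrete, the sketch below keeps only EHR events recorded before each patient's index prescription (time t), optionally restricted to a fixed number of years before t; the “all years” setting corresponds to no restriction. The column names (patient_id, event_date) and the per‐patient index_dates Series are illustrative assumptions, not the study's data model.

```python
import pandas as pd


def censor_to_window(ehr: pd.DataFrame,
                     index_dates: pd.Series,
                     window_years: float | None = None) -> pd.DataFrame:
    """Keep EHR rows recorded before the index antidepressant prescription (time t),
    optionally restricted to an observation window of `window_years` before t."""
    merged = ehr.merge(index_dates.rename("index_date"),
                       left_on="patient_id", right_index=True)
    before_index = merged["event_date"] < merged["index_date"]
    if window_years is None:  # the "all years" setting
        return merged[before_index]
    window_start = merged["index_date"] - pd.to_timedelta(365.25 * window_years, unit="D")
    return merged[before_index & (merged["event_date"] >= window_start)]
```

Under these assumptions, the three experiments correspond to window_years of 1, 2, and None.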


FIGURE 1. The framework for predicting antidepressant treatment outcome.

Feature Derivation

Features for building risk prediction models were extracted from patient demographics (age, sex, and race), baseline PHQ‐9 scores, medical and psychiatric comorbidities, procedures (i.e., interventions), and prescription medication orders. The comorbidity, procedure, and medication codes were extracted from the EHRs by counting the number of times that each code appeared in a patient's past medical history. We considered only features that appeared in the medical histories of at least 10 patients. For each diagnostic code (e.g., ICD‐10 codes A60.02, A60.04, A60.09), the first three characters (e.g., A60) were used to aggregate similar disease diagnostic codes together. Each patient was represented by a count vector based on their past diagnosis, procedure, and prescription medication history, which was combined with demographic features and baseline PHQ‐9 scores for training the predictive models. The counts of these codes are shown in Supplementary eTable 2.
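A rough sketch of this count‐vector construction is shown below, assuming hypothetical long‐format tables of (patient_id, code) rows for diagnoses, procedures, and medications; the function names and assembly step are illustrative, not the study's implementation.

```python
import pandas as pd


def code_count_features(records: pd.DataFrame,
                        min_patients: int = 10,
                        truncate_icd: bool = False) -> pd.DataFrame:
    """Turn long-format (patient_id, code) rows into one count column per code,
    keeping only codes found in the histories of at least `min_patients` patients."""
    records = records.copy()
    if truncate_icd:
        # Aggregate related ICD-10 diagnoses by their first three characters (A60.02 -> A60).
        records["code"] = records["code"].str[:3]
    counts = (records.groupby(["patient_id", "code"]).size()
                     .unstack(fill_value=0))
    frequent = (counts > 0).sum(axis=0) >= min_patients
    return counts.loc[:, frequent]


# Hypothetical assembly of the final feature matrix: code counts plus demographics
# and the baseline PHQ-9 score.
# X = pd.concat([code_count_features(diagnoses, truncate_icd=True),
#                code_count_features(procedures),
#                code_count_features(medications),
#                demographics, baseline_phq9], axis=1).fillna(0)
```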

Prediction Models and Metrics

We applied popular machine learning (ML) models, including L2 norm regularized Logistic Regression (LR) (20), Naive Bayes (NB) (21), Random Forest (RF) (22), and Gradient Boosting Decision Tree (GBDT), to predict antidepressant treatment outcome. We used nested cross‐validation for each model (23), in which an outer 5‐fold cross‐validation was used to split the dataset into training data and testing data, and an inner 5‐fold cross‐validation was used on the training data for tuning the parameters. For LR, NB, and RF, we used the scikit‐learn software library (24). For GBDT, we utilized the XGBoost software library (25). The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. Precision and recall were also calculated for reference.
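The nested cross‐validation setup can be sketched as follows for the XGBoost GBDT. The feature matrix X and binary labels y are assumed to come from the preceding steps, and the hyperparameter grid is illustrative rather than the study's actual search space.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# X: feature matrix; y: 1 for "Recovering", 0 for "Worsening" (assumed prepared upstream).
param_grid = {"max_depth": [3, 5, 7],
              "n_estimators": [100, 300],
              "learning_rate": [0.05, 0.1]}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # parameter tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # performance estimation

gbdt = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                    param_grid, scoring="roc_auc", cv=inner_cv)
auc_scores = cross_val_score(gbdt, X, y, scoring="roc_auc", cv=outer_cv)
print(f"GBDT AUC: {auc_scores.mean():.4f} ± {auc_scores.std():.4f}")
```

The same outer/inner split can be reused with scikit‐learn's LogisticRegression, GaussianNB, and RandomForestClassifier estimators in place of the XGBoost classifier.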

RESULTS

Study Cohort Characteristics

Data from 808 patients were analyzed, comprising 243 “Worsening” and 565 “Recovering” patients. The median age was 35 (IQR 29.0–45.0) years. Females comprised 61.39% of the study cohort, and 53.47% of patients were White. At baseline, the median PHQ‐9 score was 11 (IQR 7.0–16.0); the median baseline score in the “Recovering” group was three points higher than in the “Worsening” group. There were 341 (42.20%) patients exhibiting no or mild depression (PHQ‐9 ≤ 9), 209 (25.87%) with moderate depression (9 < PHQ‐9 ≤ 14), and 258 (31.93%) with moderately severe to severe depression (PHQ‐9 > 14) at baseline. The “Worsening” group contained a larger share of patients with no or mild depression at baseline (n = 139, 57.20%), whereas the “Recovering” group contained a larger share with moderately severe to severe depression (n = 206, 36.46%). Additional details about the study cohort are shown in Table 1.

TABLE 1. Characteristics of the study cohort.
Characteristic | All patients | “Recovering” group | “Worsening” group | p value
Age, median (Q1–Q3) | 35.0 (29.0–45.0) | 36.0 (30.0–45.0) | 34.0 (28.0–45.5) | 0.06
Gender, n (%) | | | | 0.79
  Female | 496 (61.39%) | 349 (61.77%) | 147 (60.49%) |
  Male | 312 (38.61%) | 216 (38.23%) | 96 (39.51%) |
Race, n (%) | | | | 0.98
  White | 432 (53.47%) | 302 (53.45%) | 130 (53.50%) |
  Asian | 64 (7.92%) | 46 (8.14%) | 18 (7.41%) |
  Black | 51 (6.31%) | 36 (6.37%) | 15 (6.17%) |
  Other | 261 (32.30%) | 181 (32.04%) | 80 (32.92%) |
Baseline PHQ‐9 score, median (Q1–Q3) | 11.0 (7.0–16.0) | 12.0 (8.0–17.0) | 9.0 (5.0–13.0) | <0.001
Baseline PHQ‐9 score category, n (%) | | | | <0.001
  No or mild | 341 (42.20%) | 202 (35.75%) | 139 (57.20%) |
  Moderate | 209 (25.87%) | 157 (27.79%) | 52 (21.40%) |
  Severe | 258 (31.93%) | 206 (36.46%) | 52 (21.40%) |

Model Discrimination

Table 2 shows the AUC for the LR, NB, RF, and GBDT models across different data collection window lengths. Incorporating more longitudinal data improved prediction performance, as demonstrated by increases in the AUC, and GBDT outperformed LR, NB, and RF. The best performance for GBDT (AUC 0.7654 ± 0.0227) was obtained when the data collection window was set to “all years”; this result was based on Cohort‐A, which contained 341 (42.20%) patients with no or mild depression (PHQ‐9 < 10) at baseline.

TABLE 2. Prediction performance of machine learning (ML) models (AUC, mean ± standard deviation (SD)).
Observation window | LR | NB | RF | GBDT
1 year's EHRs | 0.6881 ± 0.0408 | 0.6985 ± 0.0513 | 0.7082 ± 0.0221 | 0.7197 ± 0.0301
2 years' EHRs | 0.6971 ± 0.0512 | 0.7101 ± 0.0456 | 0.7197 ± 0.0131 | 0.7363 ± 0.0215
All years' EHRs | 0.7204 ± 0.0463 | 0.7392 ± 0.0301 | 0.7496 ± 0.0307 | 0.7654 ± 0.0227

In clinical practice, patients with baseline PHQ‐9 scores less than 10 (i.e., with no or mild depression) may not receive antidepressant treatment or may receive low‐intensity treatment (18). Furthermore, a lower initial PHQ‐9 score leaves less room for a decreasing trend over time. Therefore, we excluded patients with baseline PHQ‐9 scores less than 10 and conducted a sub‐analysis of prediction performance on Cohort‐B. The GBDT model was trained on “all years” of data, and an AUC of 0.7254 ± 0.0218 was obtained. More details about the number of patients in the “Recovering” and “Worsening” groups, as well as the prediction performance (AUC, precision, recall), are shown in Supplementary eTable 3. Because of its clinical relevance, we conducted the following experiments on Cohort‐B.

Prediction Performance Using Different Types of Information with Gradient Boosting Decision Tree

We examined the prediction performance of GBDT using different types of EHR information extracted from Cohort‐B. The AUC obtained for each type of information is shown in Figure 2. Diagnostic code information yielded the best prediction performance relative to the other types of information, while procedure and medication information yielded similar performance. Combining demographic and baseline PHQ‐9 score information produced better results, perhaps because the baseline PHQ‐9 score itself played an important role in predicting the outcome.
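A simple way to reproduce this kind of comparison is to evaluate the GBDT on each feature subset separately, as sketched below; the per‐type column lists (demo_phq9_cols, diagnosis_cols, procedure_cols, medication_cols) are hypothetical names assumed to partition the feature matrix X.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# feature_groups maps each information type to an (assumed) list of column names in X.
feature_groups = {
    "demographics + baseline PHQ-9": demo_phq9_cols,
    "diagnoses": diagnosis_cols,
    "procedures": procedure_cols,
    "medications": medication_cols,
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, cols in feature_groups.items():
    aucs = cross_val_score(XGBClassifier(eval_metric="logloss"),
                           X[cols], y, scoring="roc_auc", cv=cv)
    print(f"{name}: AUC {aucs.mean():.4f} ± {aucs.std():.4f}")
```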


FIGURE 2. Comparison of different types of information using the Gradient Boosting Decision Tree (GBDT).

Feature Importance

A large number of features were important in predicting outcome. Each feature's importance was investigated on Cohort‐B, and the top 10 features were selected using the feature importance scores obtained from the GBDT output. The positive or negative impact of these features on the prediction was then derived using the SHapley Additive exPlanations (SHAP) tool (26), which explains each prediction by computing the individual contribution of every feature to it. This allowed the prediction to be fairly distributed among the top ten pre‐treatment features, which included baseline PHQ‐9 score as well as anxiety, psychotherapy, fatigue, stress, lorazepam, and acetaminophen (see Figure 3). An interesting finding was that acetaminophen was among the top features predictive of PHQ‐9 slopes, possibly because acetaminophen can alter emotions and reduce emotional distress in the presence of physical pain. A previous study has shown that nonsteroidal anti‐inflammatory drugs (NSAIDs) and paracetamol can yield effects similar to antidepressants, in particular selective serotonin reuptake inhibitors (SSRIs) (27).
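A SHAP attribution of this kind can be sketched in a few lines, assuming a fitted XGBoost classifier (gbdt_model) and the Cohort‐B feature matrix X_b as a pandas DataFrame (both names are illustrative).

```python
import shap

# gbdt_model: a fitted XGBClassifier; X_b: the Cohort-B feature matrix as a pandas DataFrame.
explainer = shap.TreeExplainer(gbdt_model)  # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_b)
# Beeswarm summary: features ranked by mean |SHAP value|, colored by feature value
# (the style of plot shown in Figure 3).
shap.summary_plot(shap_values, X_b, max_display=10)
```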


FIGURE 3. The features' positive or negative impact on the prediction. Variables are ranked in descending order to demonstrate feature importance. Each point represents a single observation. The horizontal location shows whether the effect of that SHapley Additive exPlanations (SHAP) value is associated with a positive (value > 0) or negative (value < 0) impact on prediction. Color shows whether the original value of that variable is high (in red) or low (in blue) for that observation.

DISCUSSION

This study demonstrates the potential of using machine learning to identify clinically meaningful predictors of antidepressant treatment outcome, particularly when a slope fit of all PHQ‐9 scores is used to represent longitudinal treatment recovery. In this investigation of 808 individuals with EHRs, predictive models including LR, NB, RF, and GBDT were built to discriminate outcome by integrating multiple types of clinical information such as demographics, diagnostic codes, and medications. Combining multiple types of features builds a more complete representation of each patient, which can improve the predictive performance of ML models.

In this study, discrimination was modest, with an AUC of 0.7654 (SD: 0.0227) obtained by GBDT when considering EHRs from “all years.” Discrimination differed when considering 1 year, 2 years, and all years of clinical data, and considering more longitudinal data tended to improve prediction performance. After removing individuals with low baseline PHQ‐9 scores (<10), GBDT obtained an AUC of 0.7254 (SD: 0.0218). This exclusion lowered the AUC, which measures how well the predictive model distinguishes patients who recovered from those who worsened. It is reasonable to expect that excluding patients with low baseline PHQ‐9 scores limits the model's ability to distinguish worsening individuals, because it is unusual for patients with severe baseline scores to worsen further after treatment initiation. In this sense, a broader range of baseline PHQ‐9 scores may offer more potential for discrimination. We also observed that different types of clinical information played different roles in prediction. Diagnostic information such as anxiety, psychotherapy, and recurrent depression contributed the most. Demographic and baseline PHQ‐9 score information combined played a more important role than procedure or medication codes separately, which may be because the baseline PHQ‐9 score was a vital contributor on its own. These findings corroborate a previous report (15). In addition, if a patient was more complex and had more severe comorbidities such as anxiety and stress prior to taking the antidepressant, the model tended to predict the outcome as “Worsening.”

Our study differs substantially from previous work investigating antidepressant treatment outcome (6, 10, 12, 28, 29, 30). To the best of our knowledge, no prior work has modeled antidepressant treatment outcome based on the slope of repeated self‐report PHQ‐9 measurements over time, and few studies have utilized complete, longitudinal EHRs for predicting antidepressant treatment outcome. Previous literature considered only the difference between the final and baseline PHQ‐9 scores for a given treatment period (6, 10, 30, 31). Relying on these two timepoints alone can be misleading, because the single difference in scores might suggest a worsening or improving outcome when in fact the course of the outcome was mostly in the other direction. Our work fills an important gap by fitting a linear regression slope to all PHQ‐9 scores throughout a treatment period. This method is advantageous because it captures the evolution of, and intermediate oscillations in, PHQ‐9 scores over time when modeling the overall treatment outcome.

Pradier et al. attempted to predict treatment dropout after antidepressant initiation using EHRs (12). In their study, the primary outcome was treatment discontinuation following the index prescription, defined as less than 90 days of prescription availability and no evidence of non‐pharmacologic psychiatric treatment. However, treatment discontinuation may not necessarily reflect a “recovering” or “worsening” treatment outcome, and it cannot account for variation among patients in depression severity at the time of treatment. Common measures such as the PHQ‐9 and the Hamilton Depression Rating Scale (HDRS) are more robust for investigating antidepressant treatment outcome (15, 30), because they are based on standardized questionnaires and provide more complete and objective information for estimating treatment outcome (32).

Our study has important clinical implications. First, the machine learning classification models predicted antidepressant treatment outcome using patients' medical history, which may encourage clinicians and patients to conduct more follow‐up visits during the course of treatment (12). Additionally, the predictive results obtained from the models can aid clinicians in developing treatment plans that combine multiple elements in sequence or in parallel. Furthermore, predictive models using EHRs can contribute to personalized treatment management strategies in psychiatry (33). Beyond informing targeted treatments, these predictive models may also contribute to the design of a new generation of EHR‐linked clinical trials (34). For example, clinicians could stratify patients into “high‐risk” and “low‐risk” groups based on the predicted outcome (“Worsening” or “Recovering”) and pay closer attention to the treatment and prognosis of the “high‐risk” group (35).

There are several potential limitations to our study. First, this study considered data from a single academic medical center, which limits the generalizability of the model across different health systems. This may also have resulted in a sample that was selective in geography, payer, and patient population, and may not fully represent the population at risk. As with most EHR‐based studies, we are limited to visits captured within the EHR network, and clinical care sought outside this network may be missing. Also, when defining the outcome using the slope, a slope of zero may not always be the clinically meaningful cut‐point between recovering and worsening depression, as regression to the mean would be expected. Further investigation is necessary, as different thresholds may produce different “Recovering” and “Worsening” distributions and hence different prediction performance. In addition, it was unknown for which diagnosis a given patient was referred or receiving treatment, and we did not build separate predictive models for different classes and doses of antidepressants. Accounting for antidepressant class, extending the study period to explore longer‐term outcomes, and conducting cross‐site validation may further enhance the results and applicability of this study, and will be investigated in future work. Subsequent studies may also apply the techniques used here to information examined in previous antidepressant outcome prediction studies, for example, socioeconomic status, for which objective measures and structured EHR data are expanding. Additionally, natural language processing techniques could be applied in the future to clinical text. With these limitations in mind, these results provide insights for personalizing antidepressant treatment and encourage researchers to pursue modeling of this simple but highly valuable outcome.

CONCLUSION

Using routinely collected longitudinal EHRs and ML algorithms, we predicted overall changes in depression severity after the start of antidepressant treatment. Antidepressant treatment outcome was defined based on multiple PHQ‐9 scores using a linear regression model. Multiple types of EHR data were integrated to train the models, and GBDT showed the best prediction performance. These investigations have the potential to drive the development of a clinical decision‐making tool for personalized management of depression.

Weill Cornell Medicine, New York, New York, USA (Z. Xu, V. Vekaria, F. Wang, J. Cukor, P. Adekkanattu, Y. Xiao, G. Alexopoulos, J. Pathak); Temple University, Philadelphia, Pennsylvania, USA (C. Su); University of Washington, Seattle, Washington, USA (P. Brandt); Mayo Clinic, Rochester, Minnesota, USA (G. Jiang, R. C. Kiefer); Northwestern University, Chicago, Illinois, USA (Y. Luo, L. V. Rasmussen); University of Florida, Gainesville, Florida, USA (J. Xu)
Send correspondence to Dr. Pathak.

This research is funded in part by grants from the US National Institutes of Health (R01 GM105688, R01 MH119177, R01 MH121907, and R01 MH121922). The authors report no competing interests.

Zhenxing Xu and Veer Vekaria are co‐first authors.

REFERENCES

1 Huang SH, LePendu P, Iyer SV, Tai-Seale M, Carrell D, Shah NH. Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inf Assoc. 2014;21(6):1069–1075. https://doi.org/10.1136/amiajnl-2014-002733

2 Xu Z, Zhang Q, Li W, Li M, Yip PSF. Individualized prediction of depressive disorder in the elderly: a multitask deep learning approach. Int J Med Inf. 2019;132:103973. https://doi.org/10.1016/j.ijmedinf.2019.103973

3 Bozzatello P, Rocca P, De Rosa ML, Bellino S. Current and emerging medications for borderline personality disorder: is pharmacotherapy alone enough? Expet Opin Pharmacother. 2020;21(1):47–61. https://doi.org/10.1080/14656566.2019.1686482

4 Chekroud AM, Zotti RJ, Shehzad Z, Gueorguieva R, Johnson MK, Trivedi MH, et al. Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatr. 2016;3(3):243–250. https://doi.org/10.1016/s2215-0366(15)00471-x

5 Kato M, Hori H, Inoue T, Iga J, Iwata M, Inagaki T, et al. Discontinuation of antidepressants after remission with antidepressant medication in major depressive disorder: a systematic review and meta-analysis. Mol Psychiatr. 2021;26(1):118–133. https://doi.org/10.1038/s41380-020-0843-0

6 Zhdanov A, Atluri S, Wong W, Vaghei Y, Daskalakis ZJ, Blumberger DM, et al. Use of machine learning for predicting escitalopram treatment outcome from electroencephalography recordings in adult patients with depression. JAMA Netw Open. 2020;3(1):e1918377. https://doi.org/10.1001/jamanetworkopen.2019.18377

7 Williams LM, Korgaonkar MS, Song YC, Paton R, Eagles S, Goldstein-Piekarski A, et al. Amygdala reactivity to emotional faces in the prediction of general and medication-specific responses to antidepressant treatment in the randomized iSPOT-D trial. Neuropsychopharmacology. 2015;40(10):2398–2408. https://doi.org/10.1038/npp.2015.89

8 Fu CH, Williams SC, Cleare AJ, Scott J, Mitterschiffthaler MT, Walsh ND, et al. Neural responses to sad facial expressions in major depression following cognitive behavioral therapy. Biol Psychiatr. 2008;64(6):505–512. https://doi.org/10.1016/j.biopsych.2008.04.033

9 Crane NA, Jenkins LM, Bhaumik R, Dion C, Gowins JR, Mickey BJ, et al. Multidimensional prediction of treatment response to antidepressants with cognitive control and functional MRI. Brain. 2017;140(2):472–486. https://doi.org/10.1093/brain/aww326

10 Viglione A, Chiarotti F, Poggini S, Giuliani A, Branchi I. Predicting antidepressant treatment outcome based on socioeconomic status and citalopram dose. Pharmacogenomics J. 2019;19(6):538–546. https://doi.org/10.1038/s41397-019-0080-6

11 Finegan M, Firth N, Delgadillo J. Adverse impact of neighbourhood socioeconomic deprivation on psychological treatment outcomes: the role of area-level income and crime. Psychother Res. 2020;30(4):546–554. https://doi.org/10.1080/10503307.2019.1649500

12 Pradier MF, McCoy TH Jr, Hughes M, Perlis RH, Doshi-Velez F. Predicting treatment dropout after antidepressant initiation. Transl Psychiatry. 2020;10(1):1–8. https://doi.org/10.1038/s41398-020-0716-y

13 O'Driscoll C, Buckman JEJ, Fried EI, Saunders R, Cohen ZD, Ambler G, et al. The importance of transdiagnostic symptom level assessment to understanding prognosis for depressed adults: analysis of data from six randomised control trials. BMC Med. 2021;19(1):1–14. https://doi.org/10.1186/s12916-021-01971-0

14 Saunders R, Cohen ZD, Ambler G, DeRubeis RJ, Wiles N, Kessler D, et al. A patient stratification approach to identifying the likelihood of continued chronic depression and relapse following treatment for depression. J Personalized Med. 2021;11(12):1295. https://doi.org/10.3390/jpm11121295

15 Rossom RC, Solberg LI, Vazquez-Benitez G, Whitebird RR, Crain AL, Beck A, et al. Predictors of poor response to depression treatment in primary care. Psychiatr Serv. 2016;67(12):1362–1367. https://doi.org/10.1176/appi.ps.201400285

16 Löwe B, Kroenke K, Herzog W, Grafe K. Measuring depression outcome with a brief self-report instrument: sensitivity to change of the Patient Health Questionnaire (PHQ-9). J Affect Disord. 2004;81(1):61–66. https://doi.org/10.1016/s0165-0327(03)00198-8

17 McMillan D, Gilbody S, Richards D. Defining successful treatment outcome in depression using the PHQ-9: a comparison of methods. J Affect Disord. 2010;127(1-3):122–129. https://doi.org/10.1016/j.jad.2010.04.030

18 Button KS, Kounali D, Thomas L, Wiles NJ, Peters TJ, Welton NJ, et al. Minimal clinically important difference on the Beck Depression Inventory-II according to the patient's perspective. Psychol Med. 2015;45(15):3269–3279. https://doi.org/10.1017/s0033291715001270

19 Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. John Wiley & Sons; 2021.

20 Lee S-I, Lee H, Abbeel P, Ng AY. Efficient L1 regularized logistic regression. In: AAAI; 2006.

21 Murphy KP. Naive Bayes classifiers. Univ British Columbia. 2006;18(60):1–8.

22 Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/a:1010933404324

23 Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics. 2005;21(5):631–643. https://doi.org/10.1093/bioinformatics/bti033

24 Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.

25 Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016.

26 Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67. https://doi.org/10.1038/s42256-019-0138-9

27 Köhler O, Petersen L, Mors O, Gasse C. Inflammation and depression: combined use of selective serotonin reuptake inhibitors and NSAIDs or paracetamol and psychiatric outcomes. Brain Behav. 2015;5(8):e00338. https://doi.org/10.1002/brb3.338

28 Frodl T. Recent advances in predicting responses to antidepressant treatment. F1000Research. 2017;6:619. https://doi.org/10.12688/f1000research.10300.1

29 Löwe B, Schenkel I, Carney-Doebbeling C, Gobel C. Responsiveness of the PHQ-9 to psychopharmacological depression treatment. Psychosomatics. 2006;47(1):62–67. https://doi.org/10.1176/appi.psy.47.1.62

30 Sharma V, Khan M, Baczynski C, Boate I. Predictors of response to antidepressants in women with postpartum depression: a systematic review. Archives of Women's Mental Health; 2020. p. 1–11.

31 Rajpurkar P, Yang J, Dass N, Vale V, Keller AS, Irvin J, et al. Evaluation of a machine learning model based on pretreatment symptoms and electroencephalographic features to predict outcomes of antidepressant treatment in adults with depression: a prespecified secondary analysis of a randomized clinical trial. JAMA Netw Open. 2020;3(6):e206653. https://doi.org/10.1001/jamanetworkopen.2020.6653

32 Cameron IM, Crawford JR, Lawton K, Reid IC. Psychometric comparison of PHQ-9 and HADS for measuring depression severity in primary care. Br J Gen Pract. 2008;58(546):32–36. https://doi.org/10.3399/bjgp08x263794

33 Cipriani A, Furukawa TA, Salanti G, Chaimani A, Atkinson LZ, Ogawa Y, et al. Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. Focus. 2018;16(4):420–429. https://doi.org/10.1176/appi.focus.16407

34 Perlis RH, Fava M, McCoy TH Jr. Can electronic health records revive central nervous system clinical trials? Mol Psychiatr. 2019;24(8):1096–1098. https://doi.org/10.1038/s41380-018-0278-z

35 Perlis RH. Abandoning personalization to get to precision in the pharmacotherapy of depression. World Psychiatr. 2016;15(3):228–235. https://doi.org/10.1002/wps.20345