Researchers from the University of Southern California have developed a new machine learning tool capable of detecting certain speech-related diagnostic criteria in patients being evaluated for depression. Known as SimSensei, the tool listens to patients' voices during diagnostic interviews for reductions in vowel expression characteristic of psychological and neurological disorders that may not be sufficiently clear to human interviewers. The idea is (of course) not to replace those interviewers, but to add objective weight to the diagnostic process.
The group's work is described in the journal IEEE Transactions on Affective Computing.
Depression misdiagnosis is a huge problem in health care, particularly in cases in which a primary care doctor is making (or not making) the diagnosis. A 2009 meta-study covering some 50,000 patients found that docs were correctly identifying depression only about half the time, with false positives outnumbering false negatives by a ratio of about three-to-one. That's totally unacceptable.
But it's also understandable. Doctors, especially general practitioners, will pretty much always overdiagnose an illness for two simple and related reasons: one, diagnosing an illness that isn't there is almost always safer than missing one that is; two, ruling out any single diagnosis with certainty requires more expertise and more confidence than leaving it on the table. See also: overprescribing antibiotics.
A big part of the problem in diagnosing depression is that it's a very heterogeneous disease. It has many different causes and is expressed in many different ways. Figure that a primary care doctor is seeing maybe hundreds of patients in a week, for all manner of illness, and the challenge involved in extracting a psychiatric diagnosis from the vagaries of self-reported symptoms and interview-based observations is pretty clear. There exists a huge hole, then, for something like SimSensei.
The depression-related variations in speech tracked by SimSensei are already well-documented. "Prior investigations revealed that depressed patients often display flattened or negative affect, reduced speech variability and monotonicity in loudness and pitch, reduced speech, reduced articulation rate, increased pause duration, and varied switching pause duration," the USC paper notes. "Further, depressed speech was found to show increased tension in the vocal tract and the vocal folds."
This is an obvious problem for machine learning, which is all about making predictions based on noisy data. Speech analysis, in general, is one of the field's primary concerns.
The analysis being made here is simple enough on its face. Reduce a patient's speech to only vowel sounds and then run a frequency analysis of the first and second formants (spectral peaks) of the vowels a, i, and u. The first two parts of this process involve the actual speech detector and an accompanying formant tracker. The third part is the algorithm, which is itself a fairly old (1967) machine learning device known as the k-means algorithm. Basically, it works by taking data points and grouping them into different clusters, each centered around a mean or average value.
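To make the clustering step a little more concrete, here is a minimal sketch of k-means applied to first/second formant pairs. It is not the USC team's code: the `kmeans` helper, the formant values, and the starting centers are all invented for illustration, and the real pipeline involves a speech detector and formant tracker that are skipped here.

```python
from math import dist

def kmeans(points, centers, iterations=20):
    """Plain k-means on 2-D points; `centers` are the starting guesses."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: nothing moved
        centers = new_centers
    return centers

# Hypothetical (F1, F2) formant pairs in Hz, one per voiced frame.
frames = [
    (700, 1200), (720, 1250), (740, 1180),  # roughly a-like
    (300, 2300), (280, 2250), (320, 2350),  # roughly i-like
    (320, 800), (300, 850), (340, 900),     # roughly u-like
]
# Assumed starting guesses near typical a, i, and u positions.
start = [(750, 1300), (250, 2200), (350, 900)]
centers = kmeans(frames, start)
```

With this toy data, each returned center is simply the average F1 and F2 of the frames that landed closest to it, which is all "centered around a mean value" amounts to.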
The result of the clustering is a triangular space/graph with the spectral peaks of a, i, and u at each corner. The region inside of this triangle represents a vowel space and this is what the algorithm spits out. This resulting space is then compared against a reference "normal" vowel space, and this ratio is what's being tested here as the depression (and PTSD) indicator.
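The ratio itself comes down to comparing two triangle areas. The sketch below assumes each corner is an (F1, F2) pair and uses the standard shoelace formula for the area; the function names and every number in it are made up for illustration, and the paper's exact computation may differ.

```python
def triangle_area(a, i, u):
    """Area of the triangle with corners a, i, u (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

def vowel_space_ratio(speaker, reference):
    """Speaker's vowel triangle area relative to a reference triangle."""
    return triangle_area(*speaker) / triangle_area(*reference)

# Hypothetical (F1, F2) corners in Hz for a, i, and u.
reference = [(800, 1300), (300, 2300), (300, 800)]  # assumed "normal" space
speaker = [(720, 1210), (300, 2300), (320, 850)]    # assumed measured space

ratio = vowel_space_ratio(speaker, reference)
```

Here the speaker's triangle covers roughly 78 percent of the reference triangle. A ratio below 1 means a smaller vowel space than the reference, which is the kind of reduction being tested as an indicator.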
"We evaluate the automatically assessed vowel space in experiments with a sample of 253 individuals and show that the novel measure reveals a significantly reduced vowel space in subjects that reported symptoms of depression and PTSD," the USC team concludes. "We show that the measure is robust when analyzing only parts of the full interactions or limited amounts of speech data, which has implications on the approach's practicality. Lastly, we successfully show the measure's statistical robustness against varying demographics of individuals and articulation rate."
The resulting vowel space ratios found among depressed and nondepressed patients aren't hugely different, but they are enough to matter. The most significant problem with this study may be the initial classification of depressed/nondepressed, which is based on subjects' self-reported assessments. Moreover, the vowel space reduction seen isn't likely to be limited to depression and PTSD, and will need to be investigated in conditions such as schizophrenia and Parkinson's disease as well.