Introduction: Research related to the automatic detection of Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional diagnostic methods.
Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. There has been a recent proliferation of classification models for AD, but these vary in the datasets used, model types, and training and testing paradigms. In this study, we compare and contrast the performance of two common approaches for automatic AD detection from speech on the same, well-matched dataset, to determine the advantages of using domain knowledge vs. transfer learning from large pre-trained models.
Two approaches were used for classification of these speech samples: (1) using domain knowledge: extracting an extensive set of clinically relevant linguistic and acoustic features derived from speech and transcripts based on prior literature; and (2) using transfer learning and leveraging large pre-trained machine learning models: using transcript representations that are automatically derived from state-of-the-art pre-trained language models, by fine-tuning Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models.
Results: We compared the utility of speech transcript representations obtained from recent natural language processing models (i.e., BERT) with that of clinically informed, feature-based approaches. Both the feature-based approaches and fine-tuned BERT models significantly outperformed the baseline linguistic model using a small set of linguistic features, demonstrating the importance of extensive linguistic information for detecting cognitive impairments relating to AD. We observed that fine-tuned BERT models numerically outperformed feature-based approaches on the AD detection task, but the difference was not statistically significant.
Our main contribution is the observation that, when trained on the same, demographically balanced dataset and tested on independent, unseen data, both domain-knowledge-based and pre-trained linguistic models have good predictive performance for detecting AD from speech. It is notable that linguistic information alone is capable of achieving comparable, and even numerically better, performance than models including both acoustic and linguistic features here.
We also try to shed light on the inner workings of the more black-box natural language processing model by performing an interpretability analysis, and find that attention weights reveal interesting patterns such as higher attribution to more important information content units in the picture description task, as well as pauses and filler words.
Conclusion: These results support the value of well-performing machine learning and linguistically focused processing techniques for detecting AD from speech, and highlight the need to compare model performance on carefully balanced datasets, using consistent training parameters and independent test datasets, in order to determine the best-performing predictive model.
Alzheimer's disease (AD) is a progressive neurodegenerative disease that causes problems with memory, thinking, and behavior. AD affects over 40 million people worldwide, with high costs of acute and long-term care (Prince et al.). Current forms of diagnosis are both time-consuming and expensive (Prabhakaran et al.). Studies have shown that valuable clinical information indicative of cognition can be obtained from spontaneous speech elicited using pictures (Goodglass et al.). Studies have capitalized on this clinical observation, using speech analysis, natural language processing (NLP), and machine learning (ML) to distinguish between speech from healthy and cognitively impaired participants in datasets including semi-structured speech tasks such as picture description.
These models serve as quick, objective, and non-invasive assessments of an individual's cognitive status, which could be developed into more accessible tools to facilitate clinical screening and diagnosis. Since these initial reports, there has been a proliferation of studies reporting classification models for AD based on speech, as described by recent reviews and meta-analyses (Slegers et al.).
The existing studies that have addressed differences between AD and non-AD speech and worked on developing speech-based AD biomarkers are often descriptive rather than predictive. Thus, they often overlook common biases in evaluations of AD detection methods, such as repeated occurrences of speech from the same participant, variations in the audio quality of speech samples, and imbalances of gender and age distribution in the datasets used, as noted in the systematic reviews and meta-analyses published on this topic (Slegers et al.). As such, the existing ML models may be prone to the biases introduced in available data.
For these reasons, it is difficult to compare model performance across papers and datasets, since they are rarely matched in terms of data and model characteristics. To overcome the problem of bias and overfitting and to introduce a common dataset on which to compare model performance, the ADReSS challenge (Luz et al.) was organized. The challenge consisted of two key tasks: (1) a speech classification task: classifying speech as AD or non-AD; and (2) a neuropsychological score regression task: predicting MMSE scores from speech.
The organizers restricted access to the test dataset, keeping it completely unseen by participants, in order to ensure fair evaluation of model performance. The work presented in this paper focuses entirely on this new balanced dataset and follows the ADReSS challenge's evaluation process.
As such, the models presented in this paper are more generalizable to unseen data than those developed in the previously discussed studies. Using domain knowledge: with this approach, we extract clinically relevant linguistic features from transcripts of speech, and acoustic features from corresponding audio files, for binary AD vs. non-AD classification.
The features extracted are informed by clinical and ML research in the space of cognitive impairment detection (Fraser et al.). Using transfer learning: with this approach, we fine-tune pre-trained BERT models on transcripts for the same classification task. The overwhelming majority of NLP and ML approaches to AD detection from speech are still based on hand-crafted engineering of clinically relevant features (de la Fuente Garcia et al.).
Previous work that focused on automatic AD detection from speech uses acoustic features such as the zero-crossing rate and Mel-frequency cepstral coefficients (Fraser et al.; Yancheva et al.). Detecting AD or predicting MMSE scores with pre-engineered features of speech, and thereby infusing domain knowledge into the task, has several advantages, such as more interpretable model decisions, the possibility to represent speech in different modalities (both acoustic and linguistic), and potentially lower computational resource requirements when paired with conventional ML models.
However, there are also a few disadvantages, such as the expensive and time-consuming process of engineering such features by hand. In recent years, transfer learning, or in other words utilizing language representations from huge pre-trained neural models that learn robust representations for text, has become ubiquitous in NLP (Young et al.). The Transformer architecture, on which such models are built, offers enhanced parallelization and better modeling of long-range dependencies in text and, as such, has achieved state-of-the-art performance on a variety of tasks in NLP.
Previous research (Jawahar et al.) has examined the linguistic information captured by BERT's internal representations. BERT uses powerful attention mechanisms to encode global dependencies between the input and output, which allows it to achieve state-of-the-art results on a suite of benchmarks (Devlin et al.). Fine-tuning BERT for a few epochs can potentially attain good performance even on small datasets. Transfer learning in general, and the BERT model specifically, are promising approaches for the task of AD detection from speech, because such a technique eliminates the need for expensive and time-consuming feature engineering, mitigates the need for big training datasets, and potentially results in more generalizable models.
However, a common critique is that BERT is pre-trained on corpora of healthy language and as such may not be suitable for detecting AD. In addition, BERT is not directly interpretable, unlike feature-based models. Finally, the original version of the BERT model is only able to use text as input, eliminating the possibility of employing the acoustic modality of speech when detecting AD.
All of these may be reasons why BERT was not previously used for developing predictive models for AD detection, even though its performance on many other NLP tasks is exceptional. Our motivation in this work is to benchmark a BERT training procedure on transcripts from a pathological speech dataset, and to evaluate the effectiveness of high-level language representations from BERT in detecting AD.
We are specifically interested in understanding whether BERT has the potential to outperform traditional, widely used domain-knowledge-based approaches, given that it does not include acoustic features, while at the same time increasing the generalizability of the predictive models. To eliminate the biases of unbalanced data, we perform all our experiments on the carefully demographically matched ADReSS dataset. To understand how well the presented models generalize to unseen data, we evaluate the performance of the models using both cross-validation and testing on an unseen, held-out dataset.
When evaluation is performed on the unseen, held-out test data, the fine-tuned BERT text sequence classification models achieve the highest AD detection accuracy. These results show that extensive feature-based approaches informed by domain knowledge and fine-tuned BERT models both detect AD well, with BERT performing at least as well despite using linguistic information alone. Speech is elicited from participants through the Cookie Theft picture from the Boston Diagnostic Aphasia Exam (Goodglass et al.).
Recordings were acoustically enhanced by the challenge organizers with stationary noise removal, and audio volume normalization was applied across all speech segments to control for variation caused by recording conditions such as microphone placement (Luz et al.). The speech dataset is divided into a train set and an unseen, held-out test set.
MMSE (Cockrell and Folstein) scores are available for all but one of the participants in the train set (see Tables 1-3). The speech transcripts in the dataset are manually transcribed as per the CHAT protocol (MacWhinney), and include speech segments from both the participant and an investigator.
We only use the portion of the transcripts corresponding to the participant. Additionally, we combine all participant speech segments corresponding to a single picture description when extracting acoustic features. We extract manually engineered features from transcripts and associated audio files (see Tables 4-6). These features have been identified as indicators of cognitive impairment in the literature, and hence encode domain knowledge.
All the features are divided into three higher-level categories. Lexico-syntactic features: frequencies of various production rules from the constituency parsing tree of the transcripts (Chae and Nenkova), and speech-graph-based features (Mota et al.). We also extract syntactic features (Ai and Lu) such as the proportion of various POS tags, and the similarity between consecutive utterances. Acoustic and temporal features: Mel-frequency cepstral coefficients (MFCCs), fundamental frequency, statistics related to the zero-crossing rate, as well as proportions of various pauses (for example, filled and unfilled pauses, and the ratio of the number of pauses to the number of words).
Semantic features based on picture description content (25): proportions of various information content units in the picture, identified as being relevant to memory impairment in prior literature (Croisile et al.). We benchmark the following training regimes for classification: classifying features extracted at the transcript level, and a BERT model fine-tuned on transcripts.
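As a toy illustration of the temporal features above, a pause-to-word ratio can be computed directly from a transcript. The pause markers and filler words matched here ("uh", "um", "(.)"-style CHAT pauses) are illustrative assumptions, not the paper's exact inventory:

```python
import re

def pause_word_ratio(transcript: str) -> float:
    """Ratio of pause/filler markers to words in a CHAT-style transcript.

    The marker set below (unfilled pauses like "(.)", fillers "uh"/"um")
    is an illustrative assumption, not the exact feature definition.
    """
    text = transcript.lower()
    pauses = len(re.findall(r"\(\.+\)|\buh\b|\bum\b", text))
    words = len(re.findall(r"\b[a-z']+\b", text))
    return pauses / words if words else 0.0
```

A higher ratio would then feed into the classifier alongside the other acoustic and temporal features.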
The random forest classifier fits an ensemble of decision trees and considers a random subset of features, of size equal to the square root of the total number of features, when looking for the best split. The minimum number of samples required to split an internal node is 2, and the minimum number of samples required to be at a leaf node is 2. Bootstrap samples are used when building the trees. All other parameters are set to their default values.
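The settings described above map directly onto scikit-learn parameters. This is a sketch on synthetic data; the number of trees (`n_estimators`) is an assumption, since it is not stated in the text:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Random forest configured as described: sqrt(n_features) candidates per
# split, min 2 samples to split a node, min 2 samples per leaf, bootstrap on.
clf = RandomForestClassifier(
    n_estimators=100,      # assumed; the tree count is not given above
    max_features="sqrt",
    min_samples_split=2,
    min_samples_leaf=2,
    bootstrap=True,
    random_state=0,
)

# Synthetic stand-in for the extracted feature matrix and AD/non-AD labels.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
clf.fit(X, y)
```

All remaining parameters stay at their scikit-learn defaults, mirroring the description above.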
The NN used consists of two layers of 10 units each (note that we varied both the number of units and the number of layers while tuning for the optimal hyperparameter setting).
The ReLU activation function is used at each hidden layer. The model is trained using Adam (Kingma and Ba) for a fixed number of epochs, with a batch size equal to the number of samples in the train set in each fold. All other parameters are default. The number of features is jointly optimized with the classification model parameters. All our experiments are based on the bert-base-uncased variant (Devlin et al.).
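A minimal sketch of the neural network described above, using scikit-learn's `MLPClassifier` on synthetic data; `max_iter` here stands in for the unspecified epoch count:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Two hidden layers of 10 ReLU units each, trained with Adam, as described.
nn = MLPClassifier(
    hidden_layer_sizes=(10, 10),
    activation="relu",
    solver="adam",
    max_iter=500,        # assumed; the epoch count is tuned in the paper
    random_state=0,
)

# Synthetic stand-in for the selected feature matrix and labels.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
nn.fit(X, y)
```

In the paper's setup, the hidden-layer sizes and layer count were themselves tuned during cross-validation.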
The maximum input length is capped at BERT's limit of 512 tokens. Cross-entropy loss is used while fine-tuning for AD detection. While the base BERT model is pre-trained with sentence pairs, our input to the model consists of speech transcripts with several transcribed utterances, with start and separator special tokens from the BERT vocabulary at the beginning and end of each utterance, respectively, following Liu and Lapata. This is done to ensure that utterance boundaries are easily encoded, since cross-utterance information such as coherence and utterance transitions is important for reliable AD detection (Fraser et al.).
An embedding, following Devlin et al., is then passed to the classification layer, and the combined model is fine-tuned on the AD detection task, all using an open-source PyTorch (Paszke et al.) implementation. The transcript input to the classification model consists of several transcribed utterances with corresponding start and end tokens for each utterance, following Liu and Lapata. As noted by Devlin et al., the final hidden state corresponding to the first start ([CLS]) token, which summarizes the information across all tokens in the transcript via BERT's self-attention mechanism, is used as the aggregate representation and is passed to the classification layer.
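The utterance-level input formatting can be sketched as below. Whitespace tokenization is a simplification standing in for BERT's WordPiece tokenizer; only the placement of the special tokens follows the scheme described above:

```python
def format_transcript(utterances):
    """Wrap each utterance in [CLS] ... [SEP] so utterance boundaries are
    explicit in the model input (after Liu and Lapata). Whitespace splitting
    is a stand-in for BERT's WordPiece tokenization."""
    tokens = []
    for utt in utterances:
        tokens.append("[CLS]")
        tokens.extend(utt.split())
        tokens.append("[SEP]")
    return tokens

# Hypothetical two-utterance participant transcript.
formatted = format_transcript(["the boy is on the stool", "uh the cookie jar"])
```

The hidden state at the first `[CLS]` position would then serve as the transcript-level representation fed to the classification layer.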
This model is then fine-tuned on the training data. Hyperparameter tuning: we optimize the number of epochs to 10 by varying it from 1 to 12 during CV. The Adam optimizer (Kingma and Ba) and linear scheduling for the learning rate (Paszke et al.) are used.
Values of performance metrics for each model are averaged across three runs with different random seeds in all cases. Predictions on ADReSS test set: We generate three predictions with different seeds from each hyperparameter-optimized classifier trained on the complete train set, and then produce a majority prediction to avoid overfitting.
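The majority vote over the three seeded runs can be sketched as follows (binary labels, one prediction list per seed):

```python
from collections import Counter

def majority_prediction(runs):
    """Element-wise majority vote over per-seed prediction lists.
    With three runs and binary labels, ties cannot occur."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*runs)]

# Hypothetical predictions from three seeds: 1 = AD, 0 = non-AD.
votes = majority_prediction([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
```

Using an odd number of runs guarantees a strict majority for every test sample.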
We report performance on the challenge test set, as obtained from the challenge organizers. We also report precision, recall, specificity, and F1 with respect to the positive (AD) class. Domain knowledge-based approach: for this task, we benchmark two kinds of regression models, linear and ridge, using pre-engineered features as input.
MMSE scores are always within the range of 0-30, so predictions are clipped to a range between 0 and 30. We perform feature selection by choosing the top-k features, based on an F-score computed from the correlation of each feature with the MMSE score.
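A minimal sketch of this regression pipeline on synthetic data, assuming an illustrative k=10 and using scikit-learn's `SelectKBest` with `f_regression` as the F-score-based selector:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 50 samples, 40 features, MMSE-like targets in [0, 30].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 40))
y = np.clip(3 * X[:, 0] + 15 + rng.normal(size=50), 0, 30)

# Top-k selection by F-score of each feature vs. MMSE, then ridge regression.
model = make_pipeline(SelectKBest(f_regression, k=10), Ridge())
model.fit(X, y)

# Clip predictions to the valid MMSE range, as described above.
preds = np.clip(model.predict(X), 0, 30)
```

In the paper's setup, k is tuned jointly with the regression model parameters rather than fixed.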
Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech