The Clinical Applications of NLP: Workshop at EMNLP 2020 | by Jerry Wei | Nov, 2020


1. Dr. Summarize: Global Summarization of Medical Dialogue by Exploiting Local Structures [Paper]

This paper focuses on text summarization in the context of medical dialogue. The idea is that when a patient has a conversation with a doctor, you want to be able to automatically summarize what transpired in the conversation. So for example, if the conversation went something like this:

Doctor: “What have you been experiencing today?”

Patient: “I’m having pain in my stomach.”

Doctor: “How severe is the pain on a scale from 1 to 10”

Patient: “8”

Then you want to be able to produce a transcript-like record of what happened during the visit.

In this paper, the authors propose a model that can summarize these dialogues by filling in words using local structures. For example, the model would be able to fill in the patient’s response to “how severe is the pain on a scale from 1 to 10” (which was originally “8”) into something like “The severity of pain is 8 on a scale from 1 to 10”. Doing this multiple times, the model could arrive at something like

“The patient has pain in their stomach that is at a severity of 8 on a scale from 1 to 10”.

The model is 80% accurate at summarization when compared with a baseline model, so this is a promising direction for text summarization in medical dialogues.

2. Generating Accurate Electronic Health Assessment from Medical Graph [Paper]

In this paper, the authors try to generate a diagnosis for a patient based on their medical records. Essentially, their model is inputted some information about a patient along with a central complaint, and it outputs an assessment of the patient.

The main contribution of the paper is that the authors use a transformer-based model which outperforms a baseline LSTM model. This is quite interesting because it helps demonstrate that transformer-based models (e.g., BERT) can achieve high performance on domain-specific datasets and not just the general datasets that they are usually used for.

Another interesting part of the paper is that the model is able to incorporate background medical contexts to increase performance. For example, it can actually help visualize why certain drugs have been recommended for a certain patient. Here’s an example:

In an example that the authors presented during the talk, the graph that the model builds is able to make it clear why the drug “Saxenda” was recommended instead of “lantus insulin.” The graph shows that the model has found patient specific information (that the patient is looking for a cure for type 2 diabetes and is allergic to lantus insulin) and combined it with general medical knowledge (that Saxenda and lantus insulin are both cures for type 2 diabetes) in order to show that the doctor has suggested Saxenda instead of lanxus insulin because the patient was allergic to lantus insulin.

3. On the diminishing return of labeling clinical reports [Paper]

This paper, which was nominated for the best paper award at the workshop, attempted to address the question: is there a certain amount of data needed for clinical reports for the model to converge?

To answer this question, they took a dataset and obtained subsets of the dataset by taking a random percentage of the training data (the testing set is fixed). They then trained multiple models using the subsets in order to analyze the models’ performances at varying percentages of data that was used. The paper had 2 main findings:

  1. Deep learning models outperform rule-based models starting at only 5–10% of the training data. The authors used 4 deep learning models and 2 rule-based models for this experiment. The deep learning models were superior to the rule based model, and this is pretty much expected because deep learning models allow for much more complexity in analysis than rule-based models.
  2. The models converged at a sample size of ~6000 reports, while other text classification tasks usually need hundreds of thousands of data points. This is the part that answers the main question. It turns out that you only need a few thousand reports for the models to converge, and so there isn’t actually much to be gained by obtaining more data. This could be incredibly important because it provides evidence that could allow researchers to apply NLP models to clinical settings that they previously thought they didn’t have enough data for. If a researcher had a few thousand data points but is stuck trying to get more data, this paper could provide evidence that they don’t need that extra data to obtain high performance.


Source link

Write a comment