Stefan Petrik, Christina Drexel, Leo Fessler, Jeremy Jancsary, Alexandra Klein, Gernot Kubin, Johannes Matiasek, Franz Pernkopf, and Harald Trost
Computer Speech and Language, vol. 25(2), pp. 363-385, 2011
Automatic speech recognition (ASR) has become a valuable tool in large document production environments such as medical dictation. While manual post-processing is still needed for correcting speech recognition errors and for creating documents that adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. Improving the quality of the system output requires knowledge about the multi-layered relationship between the dictated texts and the final documents. With this knowledge, typical speech recognition errors can be avoided, and proper style and formatting can be anticipated in the ASR part of the document production process. Yet, while vast amounts of recognition results and manually edited final reports are constantly being produced, the error-free literal transcripts of the actually dictated texts are a scarce and costly resource because they have to be created by manually transcribing the audio files.
To obtain large corpora of literal transcripts for medical dictation, we propose a method for automatically reconstructing them from draft speech recognition transcripts plus the corresponding final medical reports. The main innovative aspect of our method is the combination of two independent knowledge sources: phonetic information for the identification of speech recognition errors and semantic information for detecting post-editing concerning format and style. Speech recognition results and final reports are first aligned, then properly matched based on semantic and phonetic similarity, and finally categorised and selectively combined into a reconstruction hypothesis. This method can be used for various applications in language technology, e.g., adaptation for ASR, document production, or, generally, the development of parallel text corpora from non-literal text resources. In an experimental evaluation, which also includes an assessment of the quality of the reconstructed transcripts compared to manual transcriptions, the described method yields a relative word error rate reduction of 7.74% after retraining the standard language model with reconstructed transcripts.
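The align-match-combine idea from the abstract can be illustrated with a minimal sketch: align the ASR draft against the final report, then, for each mismatched region, keep the report's word when the two tokens are phonetically close (a likely recognition error) and keep the dictated word otherwise (a likely stylistic post-edit). The `phonetically_similar` proxy below is a stand-in assumption; the paper itself uses real phonetic information, not character-level similarity.

```python
import difflib

def phonetically_similar(a, b, threshold=0.6):
    """Crude stand-in for phonetic similarity: character-level ratio of the
    lowercased tokens. The actual method uses genuine phonetic information;
    this proxy is illustrative only."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def reconstruct(draft_tokens, report_tokens):
    """Align the ASR draft with the final report, then pick, per region,
    the side more likely to match what was actually dictated."""
    hypothesis = []
    matcher = difflib.SequenceMatcher(None, draft_tokens, report_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            hypothesis.extend(draft_tokens[i1:i2])   # both sources agree
        elif op == "replace":
            for d, r in zip(draft_tokens[i1:i2], report_tokens[j1:j2]):
                # Phonetically close mismatch -> likely recognition error:
                # take the corrected word from the report. Otherwise it is
                # probably a stylistic post-edit: keep the dictated word.
                hypothesis.append(r if phonetically_similar(d, r) else d)
        elif op == "delete":
            hypothesis.extend(draft_tokens[i1:i2])   # dictated, removed by the editor
        # "insert": words added during post-editing were never dictated -> skip
    return hypothesis

draft  = "the pateint was diagnosed with diabetis on march third".split()
report = "The patient was diagnosed with diabetes on March 3rd".split()
print(" ".join(reconstruct(draft, report)))
# → The patient was diagnosed with diabetes on March third
```

Note how the sketch corrects the misrecognitions ("pateint", "diabetis") from the report while reverting the editor's formatting change ("3rd" back to the dictated "third"), which is the kind of selective combination the reconstruction hypothesis requires.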
Please cite as:
@article{Petrik2011,
  title   = "Semantic and phonetic automatic reconstruction of medical dictations",
  journal = "Computer Speech & Language",
  volume  = "25",
  number  = "2",
  pages   = "363--385",
  year    = "2011",
  author  = "Stefan Petrik and Christina Drexel and Leo Fessler and Jeremy Jancsary and Alexandra Klein and Gernot Kubin and Johannes Matiasek and Franz Pernkopf and Harald Trost"
}