Abstract
We present a new method for preparing a lexical-phonetic database as a resource for acoustic model training. The research is an offshoot of the ongoing Project Ravnur (Speech Recognition for Faroese), but the method is language-independent. At NODALIDA 2019 we demonstrate the method (called SHARP) online, showing how a traditional lexical-phonetic dictionary (with a very rich phone inventory) is transformed into an ASR-friendly database (with a reduced phone inventory, which prevents data sparseness). The mapping procedure is informed by a corpus of speech transcripts. We conclude with a discussion of the benefits of a well-thought-out BLARK (Basic Language Resource Kit) design, which makes tools like SHARP possible.
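To make the idea of a corpus-informed phone reduction concrete, the following is a minimal sketch and not the SHARP procedure itself, which the abstract does not specify. It assumes one simple rule: a rich phone that occurs fewer than `min_count` times in the transcript corpus is merged into a broader phone listed in a hand-written fallback table. The lexicon entries, phone labels, fallback table, threshold, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> rich phone sequence (labels are illustrative only).
LEXICON = {
    "word_a": ["a", "b", "c_long"],
    "word_b": ["a", "c"],
    "word_c": ["b", "c_asp"],
}

# Hypothetical fallback table: rich phone -> broader phone it may be merged into.
FALLBACK = {"c_long": "c", "c_asp": "c"}

# Hypothetical transcript corpus, given as a flat list of word tokens.
CORPUS = ["word_a", "word_b", "word_b", "word_c", "word_b", "word_a"]


def phone_counts(lexicon, corpus):
    """Count how often each rich phone occurs when the corpus is phonetized."""
    counts = Counter()
    for word in corpus:
        counts.update(lexicon.get(word, []))  # out-of-lexicon words add nothing
    return counts


def build_mapping(counts, fallback, min_count):
    """Map each phone to itself, or to its fallback if it is rare in the corpus."""
    mapping = {}
    for phone in set(counts) | set(fallback):
        if counts[phone] < min_count and phone in fallback:
            mapping[phone] = fallback[phone]
        else:
            mapping[phone] = phone
    return mapping


def reduce_lexicon(lexicon, mapping):
    """Rewrite every pronunciation using the reduced phone inventory."""
    return {
        word: [mapping.get(phone, phone) for phone in phones]
        for word, phones in lexicon.items()
    }


counts = phone_counts(LEXICON, CORPUS)
mapping = build_mapping(counts, FALLBACK, min_count=3)
reduced = reduce_lexicon(LEXICON, mapping)
```

With this toy data, the two rare variants are folded into the broader phone while the frequent phones are kept unchanged. A real system would presumably derive its merges from phonetic knowledge as well as from frequency; the threshold rule here only illustrates how transcript counts can guard against sparsely attested phones.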
Original language | English |
---|---|
Title of host publication | Proceedings of the 22nd Nordic Conference on Computational Linguistics |
Place of Publication | Turku |
Publisher | Linköping University Electronic Press |
Pages | 395-399 |
Number of pages | 5 |
Volume | 2019 |
Edition | September–October |
Publication status | Published - 2019 |
Keywords
- Phonetics
- Databases