Korean speech recognition using deep learning (in Korean)

Apr 30, 2019

Suji Lee

Sukjin Han

Sewon Park*

Kyeongwon Lee

Jaeyong Lee

Abstract

In this paper, we propose an end-to-end deep learning model for Korean speech recognition that incorporates a Bayesian neural network. Korean speech recognition was previously a complicated task because conventional systems involve many intermediate steps with an excessive number of parameters and require expert knowledge of the Korean language. Recent breakthroughs in “end-to-end” models have made the task far more manageable: an end-to-end model decodes mel-frequency cepstral coefficients (MFCCs) directly into text without any intermediate processing. Models trained with Connectionist Temporal Classification (CTC) loss and attention-based models are two examples of this end-to-end approach. In addition, we implement the end-to-end model as a Bayesian neural network and obtain Monte Carlo estimates. Finally, we carry out experiments on the “WorimalSam” online dictionary dataset and obtain a word error rate (WER) of 4.58%, an improvement over the Google and Naver APIs.
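The 4.58% result is reported as a word error rate. As a point of reference, WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference transcript, divided by the number of words in the reference. The Python sketch below assumes whitespace-delimited words (eojeol, in the case of Korean). The function names `edit_distance` and `word_error_rate` are hypothetical illustrations; the paper's exact scoring procedure (text normalization, tokenization) is not described here, so this is a sketch of the standard metric rather than the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first (i - 1) items of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference, hypothesis):
    """Hypothetical helper: word-level edit distance divided by reference length.

    Assumes words are separated by whitespace; no other normalization is applied.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One substituted word out of three reference words gives a WER of 1/3.
    print(word_error_rate("나는 학교에 간다", "나는 학교 간다"))
```

Because insertions are counted against the reference length, WER is not bounded by 1 and can exceed 100% for a hypothesis with many spurious words.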

Type

Journal article

Publication

The Korean Journal of Applied Statistics, 32 (2)

Last updated on Apr 30, 2019

Deep Learning

Authors

Sewon Park*

Assistant Professor

← SOS: Score-based oversampling for tabular data Aug 14, 2022