Resolution limits on visual speech recognition
Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan 2014. Resolution limits on visual speech recognition. In: IEEE International Conference on Image Processing (ICIP). IEEE.
|Authors||Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan|
Visual-only speech recognition depends on a number of factors that can be difficult to control, such as lighting, identity, motion, emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so that we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
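As a rough illustration of the four-pixel limit stated in the abstract, the sketch below checks whether a given downsampling factor would push the resting lip height under that limit. It assumes that uniform downsampling scales pixel distances linearly. The function names and the 20-pixel starting height are hypothetical choices made for this example and are not taken from the paper.

```python
# Limit stated in the abstract: lip-reading is unlikely to work reliably
# when the resting lip height is less than four pixels.
MIN_LIP_HEIGHT_PX = 4.0


def lip_height_after_downsampling(height_px: float, factor: float) -> float:
    """Resting lip height in pixels after shrinking each frame dimension by `factor`.

    Assumption: pixel distances scale linearly with the downsampling factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return height_px / factor


def below_resolution_limit(height_px: float, factor: float,
                           limit_px: float = MIN_LIP_HEIGHT_PX) -> bool:
    """True if the downsampled resting lip height is under the stated limit."""
    return lip_height_after_downsampling(height_px, factor) < limit_px


if __name__ == "__main__":
    original_height_px = 20.0  # hypothetical value, for illustration only
    for factor in (1, 2, 4, 5, 8):
        height = lip_height_after_downsampling(original_height_px, factor)
        flag = below_resolution_limit(original_height_px, factor)
        print(f"factor {factor}: {height:.2f} px, below limit: {flag}")
```

A height at or above the limit only means the abstract's lower bound is not violated; it says nothing about the accuracy that would actually be achieved.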
|Keywords||Shape; Accuracy; Hidden Markov models; Visualization; Lips; Active appearance model; Face|
|Book title||IEEE International Conference on Image Processing (ICIP)|
|Publication process dates|
|Deposited||10 Mar 2017|
|Event||IEEE International Conference on Image Processing (ICIP) 2014|
|Web address (URL)||http://ieeexplore.ieee.org/document/7025274/|
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
|Accepted author manuscript|