Finding phonemes: improving machine lip-reading

Conference paper

Bear, Y., Harvey, Richard W. and Lan, Yuxuan 2015. Finding phonemes: improving machine lip-reading. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 115-120

Publication dates
Authors	Bear, Y., Harvey, Richard W. and Lan, Yuxuan
Type	Conference paper
Abstract	In machine lip-reading, there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps, each with a different number of visemes, ranging from two to 45. Viseme classes are formed by mapping articulated phonemes that have been confused during phoneme recognition into viseme groups. Using these maps with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness, and so demonstrate that word recognition with phoneme classifiers is not just possible, but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better still.
Keywords	visual-only speech recognition; computer lipreading; visemes; classification; pattern recognition
Year	2015
Conference	FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Publisher	International Speech Communication Association
Accepted author manuscript	Finding phonemes - Helen Bear.pdf License CC BY
Print	15 Sep 2015
Publication process dates
Deposited	09 Feb 2017
Accepted	Sep 2015
Web address (URL)	http://www.isca-speech.org/archive/avsp15/papers/av15_115.pdf
Page range	115-120

Permalink - https://repository.uel.ac.uk/item/854v4

Related outputs

Resolution limits on visual speech recognition

Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan 2014. Resolution limits on visual speech recognition. in: IEEE International Conference on Image Processing (ICIP) IEEE.

Some observations on computer lip-reading: moving from the dream to the reality

Bear, Y., Owen, Gari, Harvey, Richard and Theobald, Barry-John 2014. Some observations on computer lip-reading: moving from the dream to the reality. Proceedings of SPIE. 9253. https://doi.org/10.1117/12.2067464

Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Bear, Y., Harvey, Richard W., Theobald, Barry-John and Lan, Yuxuan 2014. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in: Bebis, George, Boyle, Richard, Parvin, Bahram, Koracin, Darko, McMahan, Ryan, Jerald, Jason, Zhang, Hui, Drucker, Steven M., Kambhamettu, Chandra, Choubassi, Maha El, Deng, Zhigang and Carlson, Mark (ed.) Advances in Visual Computing: 10th International Symposium, ISVC 2014, Las Vegas, NV, USA, December 8-10, 2014, Proceedings, Part II Springer International Publishing.

Decoding visemes: Improving machine lip-reading

Bear, Y. and Harvey, Richard 2016. Decoding visemes: Improving machine lip-reading. in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE.

Speaker-independent machine lip-reading with speaker-dependent viseme classifiers

Bear, Y., Cox, Stephen J. and Harvey, Richard W. 2015. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 190-195