Finding phonemes: improving machine lip-reading

Conference paper


Bear, Y., Harvey, Richard W. and Lan, Yuxuan 2015. Finding phonemes: improving machine lip-reading. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 115-120
Authors: Bear, Y., Harvey, Richard W. and Lan, Yuxuan
Type: Conference paper
Abstract

In machine lip-reading there is continued debate and research
around the correct classes to be used for recognition.
In this paper we use a structured approach for devising
speaker-dependent viseme classes, which enables the creation
of a set of phoneme-to-viseme maps where each has a different
quantity of visemes ranging from two to 45. Viseme classes
are based upon the mapping of articulated phonemes, which
have been confused during phoneme recognition, into viseme
groups.
Using these maps with the LiLIR dataset, we show the
effect of changing the viseme map size in speaker-dependent
machine lip-reading, measured by word recognition correctness,
and so demonstrate that word recognition with phoneme classifiers
is not just possible, but often better than word recognition
with viseme classifiers. Furthermore, there are intermediate
units between visemes and phonemes which are better still.
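
The idea of grouping phonemes that were confused during recognition into viseme classes can be illustrated with a small sketch. This is not the authors' procedure, only one plausible greedy merging scheme under stated assumptions: the function names, the confusion counts, and the averaged symmetric score are all hypothetical, and the toy phoneme set is invented for illustration.

```python
from itertools import combinations


def merge_to_visemes(phonemes, confusions, n_classes):
    """Greedily merge the most-confused groups until n_classes remain.

    confusions[(a, b)] is a hypothetical count of how often phoneme a
    was recognised as phoneme b; missing pairs count as zero.
    """
    groups = [frozenset([p]) for p in phonemes]

    def score(g1, g2):
        # Symmetric confusion between two groups, averaged over phoneme pairs.
        total = sum(confusions.get((a, b), 0) + confusions.get((b, a), 0)
                    for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > n_classes:
        g1, g2 = max(combinations(groups, 2), key=lambda pair: score(*pair))
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
    return groups


def phoneme_to_viseme_map(groups):
    # Label each group V1, V2, ... and map every phoneme to its label.
    ordered = sorted(groups, key=lambda g: sorted(g))
    return {p: f"V{i}" for i, g in enumerate(ordered, start=1) for p in g}


# Invented toy data: six phonemes and a few confusion counts.
toy_phonemes = ["p", "b", "m", "f", "v", "t"]
toy_confusions = {("p", "b"): 9, ("b", "m"): 7, ("p", "m"): 6,
                  ("f", "v"): 8, ("t", "f"): 1}
toy_map = phoneme_to_viseme_map(
    merge_to_visemes(toy_phonemes, toy_confusions, n_classes=3))
```

Calling `merge_to_visemes` with each value of `n_classes` from two up to the size of the phoneme set would give a family of maps of different sizes, one per target class count, built separately for each speaker's confusion counts.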

Keywords: visual-only speech recognition; computer lipreading; visemes; classification; pattern recognition
Year: 2015
Conference: FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Publisher: International Speech Communication Association
Accepted author manuscript
License: CC BY
Publication dates
Print: 15 Sep 2015
Publication process dates
Deposited: 09 Feb 2017
Accepted: Sep 2015
Web address (URL): http://www.isca-speech.org/archive/avsp15/papers/av15_115.pdf
Page range: 115-120
Permalink: https://repository.uel.ac.uk/item/854v4

Related outputs

Resolution limits on visual speech recognition
Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan 2014. Resolution limits on visual speech recognition. in: IEEE International Conference on Image Processing (ICIP) IEEE.
Some observations on computer lip-reading: moving from the dream to the reality
Bear, Y., Owen, Gari, Harvey, Richard and Theobald, Barry-John 2014. Some observations on computer lip-reading: moving from the dream to the reality. Proceedings of SPIE. 9253. https://doi.org/10.1117/12.2067464
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
Bear, Y., Harvey, Richard W., Theobald, Barry-John and Lan, Yuxuan 2014. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in: Bebis, George, Boyle, Richard, Parvin, Bahram, Koracin, Darko, McMahan, Ryan, Jerald, Jason, Zhang, Hui, Drucker, Steven M., Kambhamettu, Chandra, Choubassi, Maha El, Deng, Zhigang and Carlson, Mark (ed.) Advances in Visual Computing: 10th International Symposium, ISVC 2014, Las Vegas, NV, USA, December 8-10, 2014, Proceedings, Part II Springer International Publishing.
Decoding visemes: Improving machine lip-reading
Bear, Y. and Harvey, Richard 2016. Decoding visemes: Improving machine lip-reading. in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE.
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
Bear, Y., Cox, Stephen J. and Harvey, Richard W. 2015. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 190-195
Phoneme-to-viseme mappings: the good, the bad, and the ugly
Bear, Y. and Harvey, Richard 2017. Phoneme-to-viseme mappings: the good, the bad, and the ugly. Speech Communication. 95, pp. 40-67. https://doi.org/10.1016/j.specom.2017.07.001
Comparing phonemes and visemes with DNN-based lipreading
Thangthai, Kwanchiva, Bear, Y. and Harvey, Richard 2017. Comparing phonemes and visemes with DNN-based lipreading. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.
Visual speech recognition: aligning terminologies for better understanding
Bear, Y. and Taylor, Sarah L. 2017. Visual speech recognition: aligning terminologies for better understanding. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.
Visual gesture variability between talkers in continuous speech
Bear, Y. 2017. Visual gesture variability between talkers in continuous speech. 28th British Machine Vision Conference. London, UK 04 - 07 Sep 2017 BMVA Press.