Decoding visemes: Improving machine lip-reading

Book chapter

Bear, Y. and Harvey, Richard 2016. Decoding visemes: Improving machine lip-reading. in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE.

Authors	Bear, Y. and Harvey, Richard
Abstract	To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
Keywords	visemes; weak learning; visual speech; lip-reading; recognition; classification
Book title	2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Year	2016
Publisher	IEEE
Publication dates
Print	19 May 2016
Publication process dates
Deposited	24 Feb 2017
Accepted	Mar 2016
Event	The 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN	978-1-4799-9988-0
	978-1-4799-9987-3
Digital Object Identifier (DOI)	https://doi.org/10.1109/ICASSP.2016.7472029
Web address (URL)	http://ieeexplore.ieee.org/document/7472029/
Additional information	© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript	Decoding Visemes.pdf

Permalink -

https://repository.uel.ac.uk/item/850yz

Related outputs

Resolution limits on visual speech recognition

Bear, Y., Harvey, Richard, Theobald, Barry-John and Lan, Yuxuan 2014. Resolution limits on visual speech recognition. in: IEEE International Conference on Image Processing (ICIP) IEEE.

Some observations on computer lip-reading: moving from the dream to the reality

Bear, Y., Owen, Gari, Harvey, Richard and Theobald, Barry-John 2014. Some observations on computer lip-reading: moving from the dream to the reality. Proceedings of SPIE. 9253. https://doi.org/10.1117/12.2067464

Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Bear, Y., Harvey, Richard W., Theobald, Barry-John and Lan, Yuxuan 2014. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in: Bebis, George, Boyle, Richard, Parvin, Bahram, Koracin, Darko, McMahan, Ryan, Jerald, Jason, Zhang, Hui, Drucker, Steven M., Kambhamettu, Chandra, Choubassi, Maha El, Deng, Zhigang and Carlson, Mark (ed.) Advances in Visual Computing: 10th International Symposium, ISVC 2014, Las Vegas, NV, USA, December 8-10, 2014, Proceedings, Part II Springer International Publishing.

Finding phonemes: improving machine lip-reading

Bear, Y., Harvey, Richard W. and Lan, Yuxuan 2015. Finding phonemes: improving machine lip-reading. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 115-120

Speaker-independent machine lip-reading with speaker-dependent viseme classifiers

Bear, Y., Cox, Stephen J. and Harvey, Richard W. 2015. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers. FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing. Education Centre of the Jesuits, Vienna, Austria 11 - 13 Sep 2015 International Speech Communication Association. pp. 190-195

Phoneme-to-viseme mappings: the good, the bad, and the ugly

Comparing phonemes and visemes with DNN-based lipreading

Visual speech recognition: aligning terminologies for better understanding

Visual gesture variability between talkers in continuous speech