Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity

Yanushevskaya, Irena; Gobl, Christer; Ni Chasaide, Ailbhe; Murphy, Andrew

dc.contributor.author	Yanushevskaya, Irena	en
dc.contributor.author	Gobl, Christer	en
dc.contributor.author	Ni Chasaide, Ailbhe	en
dc.contributor.author	Murphy, Andrew	en
dc.date.accessioned	2022-03-03T14:31:40Z
dc.date.available	2022-03-03T14:31:40Z
dc.date.created	10-11 June 2021	en
dc.date.issued	2021	en
dc.date.submitted	2021	en
dc.identifier.citation	Murphy, A., Yanushevskaya, I., Ní Chasaide, A., Gobl, C., Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity, 2021 32nd Irish Signals and Systems Conference (ISSC), Athlone, Ireland, 10-11 June 2021, 2021, 1 - 6	en
dc.identifier.other	Y	en
dc.description	PUBLISHED	en
dc.description	Athlone, Ireland	en
dc.description.abstract	This paper reports an experiment exploring how a voice analysis-synthesis system, GlórCáil, can be used to add expressiveness to the synthetic voice in text-to-speech (TTS) systems. This implementation focuses on the Irish ABAIR TTS voices, where such voice control would facilitate many current/envisaged applications. GlórCáil allows voice control of synthesized speech, and for this experiment was integrated into a DNN-based TTS framework. Utterances were generated with f0, voice quality and vocal tract parameter manipulations targeting shifts in speaker identity and in the affective coloring of utterances. Scaling factors used for the manipulations were suggested in an earlier study. They involved global changes without sentence-internal dynamic variation, with a view to ascertaining whether such global shifts might alter listeners' perception of speaker identity and affect. Results demonstrate affect shifts compatible with expectations. However, there were confounding factors. The female/child voices were poorly differentiated, which was expected given the similarity in the scaling factors used. The affect transformations suggest the baseline voice used had an intrinsically sad quality so that there is weak differentiation between the sad and no emotion stimuli. The male angry voice was the least successful, suggesting that dynamic, within-utterance variation is essential for the signaling of certain affects.	en
dc.format.extent	1	en
dc.format.extent	6	en
dc.language.iso	en	en
dc.rights	Y	en
dc.subject	Speech synthesis	en
dc.subject	Voice quality	en
dc.subject	Voice transformation	en
dc.subject	Affect	en
dc.subject	Speaker characteristics	en
dc.title	Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity	en
dc.title.alternative	2021 32nd Irish Signals and Systems Conference (ISSC)	en
dc.type	Conference Paper	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/yanushi	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/cegobl	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/anichsid	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/amurph48	en
dc.identifier.rssinternalid	238245	en
dc.identifier.doi	http://dx.doi.org/10.1109/ISSC52156.2021.9467853	en
dc.rights.ecaccessrights	openAccess
dc.identifier.orcid_id	0000-0003-1161-4625	en
dc.status.accessible	N	en
dc.identifier.uri	http://hdl.handle.net/2262/98196

Files in this item

Name: Murphy et al. ISSC 2021 camera ...
Size: 770.3Kb
Format: PDF
Description: Published (author's copy) - Peer ...

Name: license.txt
Size: 3.424Kb
Format: Text file

This item appears in the following Collection(s)

Centre for Language and Communication Studies (Scholarly Publications)
CLCS (Scholarly Publications)
RSS Feeds
