
DC field | Value | Language
dc.contributor.author | Yanushevskaya, Irena | en
dc.contributor.author | Gobl, Christer | en
dc.contributor.author | Ni Chasaide, Ailbhe | en
dc.contributor.author | Murphy, Andrew | en
dc.date.accessioned | 2022-03-03T14:31:40Z
dc.date.available | 2022-03-03T14:31:40Z
dc.date.created | 10-11 June 2021 | en
dc.date.issued | 2021 | en
dc.date.submitted | 2021 | en
dc.identifier.citation | Murphy, A., Yanushevskaya, I., Ní Chasaide, A., Gobl, C., Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity, 2021 32nd Irish Signals and Systems Conference (ISSC), Athlone, Ireland, 10-11 June 2021, 2021, 1-6 | en
dc.identifier.other | Y | en
dc.description | PUBLISHED | en
dc.description | Athlone, Ireland | en
dc.description.abstract | This paper reports an experiment exploring how a voice analysis-synthesis system, GlórCáil, can be used to add expressiveness to the synthetic voice in text-to-speech (TTS) systems. This implementation focuses on the Irish ABAIR TTS voices, where such voice control would facilitate many current and envisaged applications. GlórCáil allows voice control of synthesized speech and, for this experiment, was integrated into a DNN-based TTS framework. Utterances were generated with f0, voice quality and vocal tract parameter manipulations targeting shifts in speaker identity and in the affective coloring of utterances. The scaling factors used for the manipulations were suggested in an earlier study. They involved global changes without sentence-internal dynamic variation, with a view to ascertaining whether such global shifts might alter listeners' perception of speaker identity and affect. Results demonstrate affect shifts compatible with expectations. However, there were confounding factors. The female and child voices were poorly differentiated, which was expected given the similarity of the scaling factors used. The affect transformations suggest that the baseline voice had an intrinsically sad quality, so there was weak differentiation between the sad and no-emotion stimuli. The male angry voice was the least successful, suggesting that dynamic, within-utterance variation is essential for signaling certain affects. | en
dc.format.extent | 1 | en
dc.format.extent | 6 | en
dc.language.iso | en | en
dc.rights | Y | en
dc.subject | Speech synthesis | en
dc.subject | Voice quality | en
dc.subject | Voice transformation | en
dc.subject | Affect | en
dc.subject | Speaker characteristics | en
dc.title | Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity | en
dc.title.alternative | 2021 32nd Irish Signals and Systems Conference (ISSC) | en
dc.type | Conference Paper | en
dc.type.supercollection | scholarly_publications | en
dc.type.supercollection | refereed_publications | en
dc.identifier.peoplefinderurl | http://people.tcd.ie/yanushi | en
dc.identifier.peoplefinderurl | http://people.tcd.ie/cegobl | en
dc.identifier.peoplefinderurl | http://people.tcd.ie/anichsid | en
dc.identifier.peoplefinderurl | http://people.tcd.ie/amurph48 | en
dc.identifier.rssinternalid | 238245 | en
dc.identifier.doi | http://dx.doi.org/10.1109/ISSC52156.2021.9467853 | en
dc.rights.ecaccessrights | openAccess
dc.identifier.orcid_id | 0000-0003-1161-4625 | en
dc.status.accessible | N | en
dc.identifier.uri | http://hdl.handle.net/2262/98196

