
dc.contributor.advisor | Harte, Naomi | en
dc.contributor.author | Roddy, Matthew | en
dc.date.accessioned | 2021-05-11T09:05:23Z |
dc.date.available | 2021-05-11T09:05:23Z |
dc.date.issued | 2021 | en
dc.date.submitted | 2021 | en
dc.identifier.citation | Roddy, Matthew, Neural Turn-Taking Models for Spoken Dialogue Systems, Trinity College Dublin. School of Engineering, 2021 | en
dc.identifier.other | Y | en
dc.description | APPROVED | en
dc.description.abstract | In order to simulate naturalistic turn-taking behaviours, such as fast turn switches, intentional overlap, backchanneling, and barge-in, spoken dialogue systems (SDSs) will need computational models of turn-taking that are both predictive and incremental. They will need to be predictive in the sense that they predict future user turn-taking behaviours rather than respond to behaviours that have already occurred, as is typically done in traditional endpointing-based systems. The projection theory of Sacks et al. (1974) proposes that humans are capable of anticipating turn endings before they occur. We argue that SDSs which aim to converse in a human-like manner should be capable of anticipating user behaviours as well. To make decisions based on these predictions, the system must process information incrementally, while the user is still speaking. In this thesis we develop recurrent neural network (RNN) based models of turn-taking that are both predictive and incremental. Continuous turn-taking (CTT) models, as proposed by Skantze (2017), were taken as a starting point. We investigated these models and proposed a number of improvements and extensions. First, we performed an analysis of input features for CTT models, gained insights into the utility of different varieties of features, and proposed optimal feature sets. We then proposed architectural improvements to the original CTT model in the form of a multiscale RNN architecture that allows features to be processed at independent rates. We then designed a control process based on partially observable Markov decision processes (POMDPs) that is able to employ the predictive nature of our RNN models to make responsive turn-taking decisions. Our investigations led to the development of a different variety of model that can be used for generating naturalistic response timings using features from both the user's turn and the system's turn. Our response timing networks (RTNets) are motivated by the observation that response timings carry communicative importance, and that listeners associate different timings with different types of responses. RTNets are still both predictive and incremental, but they differ from CTT models in many other aspects, such as their objective functions and architectures. We propose that these models address an overlooked aspect of SDS response generation that can increase the realism of SDS interactions. | en
dc.publisher | Trinity College Dublin. School of Engineering. Discipline of Electronic & Elect. Engineering | en
dc.rights | Y | en
dc.subject | Machine Learning | en
dc.subject | Dialogue Systems | en
dc.title | Neural Turn-Taking Models for Spoken Dialogue Systems | en
dc.type | Thesis | en
dc.type.supercollection | thesis_dissertations | en
dc.type.supercollection | refereed_publications | en
dc.type.qualificationlevel | Doctoral | en
dc.identifier.peoplefinderurl | https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:RODDYM | en
dc.identifier.rssinternalid | 229160 | en
dc.rights.ecaccessrights | openAccess |
dc.contributor.sponsor | Adapt Centre | en
dc.identifier.uri | http://hdl.handle.net/2262/96251 |

