Neural Turn-Taking Models for Spoken Dialogue Systems

Roddy, Matthew

dc.contributor.advisor	Harte, Naomi	en
dc.contributor.author	Roddy, Matthew	en
dc.date.accessioned	2021-05-11T09:05:23Z
dc.date.available	2021-05-11T09:05:23Z
dc.date.issued	2021	en
dc.date.submitted	2021	en
dc.identifier.citation	Roddy, Matthew, Neural Turn-Taking Models for Spoken Dialogue Systems, Trinity College Dublin.School of Engineering, 2021	en
dc.identifier.other	Y	en
dc.description	APPROVED	en
dc.description.abstract	To simulate naturalistic turn-taking behaviours, such as fast turn switches, intentional overlap, backchanneling, and barge-in, spoken dialogue systems (SDSs) will need computational models of turn-taking that are both predictive and incremental. They will need to be predictive in the sense that they predict future user turn-taking behaviours rather than respond to behaviours that have already occurred, as is typically done in traditional endpointing-based systems. The projection theory of Sacks et al. (1974) proposes that humans are capable of anticipating turn endings before they occur. We argue that SDSs which aim to converse in a human-like manner should be capable of anticipating user behaviours as well. To make decisions based on these predictions, the system must process information incrementally, while the user is still speaking. In this thesis we develop recurrent neural network (RNN) based models of turn-taking that are both predictive and incremental. Continuous turn-taking (CTT) models as proposed by Skantze (2017) were taken as a starting point. We investigated these models and proposed a number of improvements and extensions. First, we performed an analysis of input features for CTT models, gained insights into the utility of different varieties of features, and proposed optimal sets. We then proposed architectural improvements to the original CTT model in the form of a multiscale RNN architecture that allows features to be processed at independent rates. We then designed a control process based on partially observable Markov decision processes (POMDPs) that is able to employ the predictive nature of our RNN models to make responsive turn-taking decisions. Our investigations led to the development of a different variety of model that can be used for generating naturalistic response timings using features from both the user's turn and the system's turn.
Our response timing networks (RTNets) are motivated by the observation that response timings carry communicative importance, and that listeners associate different timings with different types of responses. RTNets are still both predictive and incremental, but they differ from CTT models in many other aspects, such as their objective functions and architectures. We propose that these models address an overlooked aspect of SDS response generation that can increase the realism of SDS interactions.	en
dc.publisher	Trinity College Dublin. School of Engineering. Discipline of Electronic & Elect. Engineering	en
dc.rights	Y	en
dc.subject	Machine Learning	en
dc.subject	Dialogue Systems	en
dc.title	Neural Turn-Taking Models for Spoken Dialogue Systems	en
dc.type	Thesis	en
dc.type.supercollection	thesis_dissertations	en
dc.type.supercollection	refereed_publications	en
dc.type.qualificationlevel	Doctoral	en
dc.identifier.peoplefinderurl	https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:RODDYM	en
dc.identifier.rssinternalid	229160	en
dc.rights.ecaccessrights	openAccess
dc.contributor.sponsor	Adapt Centre	en
dc.identifier.uri	http://hdl.handle.net/2262/96251

Files in this item

Name: mroddy_thesis_rev_6.pdf
Size: 4.317Mb
Format: PDF

Name: license.txt
Size: 3.530Kb
Format: Text file

This item appears in the following Collection(s)

Electronic & Electrical Eng (Theses and Dissertations)
Trinity College Dublin Theses & Dissertations