Show simple item record

dc.contributor.author: Harte, Naomi
dc.date.accessioned: 2019-08-19T08:17:28Z
dc.date.available: 2019-08-19T08:17:28Z
dc.date.issued: 2017
dc.date.submitted: 2017 [en]
dc.identifier.citation: Roddy, M., Harte, N. Towards predicting dialog acts from previous speakers' non-verbal cues, BIBTEX 2017, 2017, 1- [en]
dc.identifier.other: Y
dc.description.abstract: In studies of response times during conversational turn-taking, a modal time of 200 ms has been observed as a universal value across languages and cultures. This 200 ms value is also seen as the limit of human response times to any stimulus (e.g., the response time to a starting gun in a race). It has also been shown that human language production is slow and can take up to 1500 ms to generate even a short clause. Given these two observations, a person must start formulating their turn long before the end of their interlocutor's turn. To do this we must predict elements of what a person will say in order to formulate our responses and sustain the flow of conversation. In this sense, the end of a person's turn can be viewed as a trigger for a prepared response. This model of human language production informs incremental approaches to the design of dialog systems, where dialog options are evaluated incrementally while the system processes user utterances. One way we can form our predictions is by reading the non-linguistic signals produced by our interlocutor. For example, prosodic information such as pitch inflection can be used to infer whether a question is being asked or a statement is being made. Pitch and intensity information can also be used to infer whether a backchannel is an appropriate response. Backchannel prediction models based on non-linguistic cues can be used by conversational agents to carry out more fluid interactions with users. The development of better prediction models that exploit the social signals humans use will lead to agents that reproduce human interaction behaviors more effectively. In this analysis we look at non-verbal speaker signals that can be used to predict the appropriate dialogue act that will follow the speaker's utterance.
We define three categories of dialogue acts: (1) response (as in a response to a question), (2) statement (a general turn switch which does not include other dialogue act types), and (3) backchannel (vocalizations that encourage the speaker to continue speaking). In addition, we define a fourth category, no-response, which is not strictly a dialogue act but is a relevant category for agent interactions. We identify four types of non-verbal signals that can be used to predict the appropriate type of response dialogue act: inner eyebrow movement, outer eyebrow movement, blinks, and gaze. We analyze the behavior of these four signals in the vicinity of the dialogue acts. [en]
dc.format.extent: 1- [en]
dc.language.iso: en [en]
dc.rights: Y [en]
dc.subject: Predicting dialog acts [en]
dc.subject.lcsh: Predicting dialog acts [en]
dc.title: Towards predicting dialog acts from previous speakers' non-verbal cues [en]
dc.title.alternative: BIBTEX 2017 [en]
dc.type: Conference Paper [en]
dc.type.supercollection: scholarly_publications [en]
dc.type.supercollection: refereed_publications [en]
dc.identifier.peoplefinderurl: http://people.tcd.ie/nharte
dc.identifier.rssinternalid: 205254
dc.rights.ecaccessrights: openAccess
dc.identifier.uri: http://hdl.handle.net/2262/89214


