Show simple item record

dc.contributor.author: Harte, Naomi
dc.date.accessioned: 2019-08-19T08:17:28Z
dc.date.available: 2019-08-19T08:17:28Z
dc.date.issued: 2017
dc.date.submitted: 2017 [en]
dc.identifier.citation: Roddy, M., Harte, N. Towards predicting dialog acts from previous speakers' non-verbal cues, BIBTEX 2017, 2017, 1- [en]
dc.identifier.other: Y
dc.description.abstract: In studies of response times during conversational turn-taking, a modal time of 200 ms has been observed as a universal value across languages and cultures. This 200 ms value is also seen as the limit of human response times to any stimulus (e.g., the response time to a starting gun in a race). It has also been shown that human language production is slow and can take up to 1500 ms to generate even a short clause. Given these two observations, a person must start formulating their turn long before the end of their interlocutor's turn. To do this we must predict elements of what a person will say in order to formulate our responses and sustain the flow of conversation. In this sense, the end of a person's turn can be viewed as a trigger for a prepared response. This model of human language production informs incremental approaches to the design of dialog systems, where dialog options are evaluated incrementally while the system processes user utterances. One way we can form our predictions is by reading the non-linguistic signals produced by our interlocutor. For example, prosodic information such as pitch inflection can be used to infer whether a question is being asked or a statement is being made. Pitch and intensity information can also be used to infer whether a backchannel is an appropriate response. Backchannel prediction models based on non-linguistic cues can be used by conversational agents to carry out more fluid interactions with users. The development of better prediction models that exploit the social signals humans use will lead to agents that reproduce human interaction behaviors more effectively. In this analysis we look at non-verbal speaker signals that can be used to predict the appropriate dialogue act that will follow the speaker's utterance.
We define three categories of dialogue acts: (1) response (as in a response to a question), (2) statement (a general turn switch which does not include other dialogue act types), and (3) backchannel (vocalizations that encourage the speaker to continue speaking). In addition, we define a fourth category, no-response, which is not strictly a dialogue act but is a relevant category for agent interactions. We identify four types of non-verbal signals that can be used to predict the appropriate type of response dialogue act: inner eyebrow movement, outer eyebrow movement, blinks, and gaze. We analyze the behavior of these four signals in the vicinity of the dialogue acts. [en]
dc.format.extent: 1- [en]
dc.language.iso: en [en]
dc.rights: Y [en]
dc.subject: Predicting dialog acts [en]
dc.subject.lcsh: Predicting dialog acts [en]
dc.title: Towards predicting dialog acts from previous speakers' non-verbal cues [en]
dc.title.alternative: BIBTEX 2017 [en]
dc.type: Conference Paper [en]
dc.type.supercollection: scholarly_publications [en]
dc.type.supercollection: refereed_publications [en]
dc.identifier.peoplefinderurl: http://people.tcd.ie/nharte
dc.identifier.rssinternalid: 205254
dc.rights.ecaccessrights: openAccess
dc.identifier.uri: http://hdl.handle.net/2262/89214


