Show simple item record

dc.contributor.advisor: Harte, Naomi
dc.contributor.author: Kotey, Samantha
dc.date.accessioned: 2025-05-22T10:58:25Z
dc.date.available: 2025-05-22T10:58:25Z
dc.date.issued: 2025 (en)
dc.date.submitted: 2025
dc.identifier.citation: Samantha Kotey, 'Multimodal Summarization in Natural Language Conversations', [Thesis], Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science, 2025 (en)
dc.identifier.other: Y (en)
dc.description: APPROVED (en)
dc.description.abstract: Summarization is the task of creating a shorter version of a piece of content that is representative of the original. This process applies not only to written text but is also essential for verbal exchanges in the form of dialogue. When engaging in conversation, humans naturally recount snippets of relevant information from stories and events. These stories are composed and summarized subconsciously from memory, incorporating linguistic, auditory, and visual elements. For machines to replicate this behaviour, multiple modalities should be integrated into the summarization process. Although substantial progress has been achieved in text summarization, research that leverages additional modalities, particularly in conversation analysis, remains underexplored. To this end, we address the challenges associated with developing multimodal systems that can automatically summarize human conversations. We explore emerging natural language domains such as podcasting and online video conferencing meetings, where lengthy conversations frequently occur and where applications are in high demand. Automatic summarization would significantly reduce the time required to create trailers and teasers for podcast episodes, or minutes of meetings in a professional context. The objective of this thesis is to contribute knowledge that advances the field of multimodal dialogue summarization. To achieve this, we investigate the strengths and limitations of using individual modalities, such as text and audio, to generate summaries independently. Specifically, we propose a method to generate long, fine-grained summaries of podcast conversations, demonstrating the effectiveness of text as a single modality. Similarly, we explore the limitations of using audio independently by introducing a method to generate audio clip summaries directly from raw audio embeddings.
Following this, we examine the challenges of multimodal integration, focusing in particular on incorporating video into summarization systems. To study the complex interactions between modalities, we create dense utterance-level annotations for an existing multimodal dataset. Additionally, we propose a multimodal transformer architecture that incorporates cost-sensitive learning and gated fusion techniques. This thesis presents a comprehensive overview of the proposed methods and thoroughly evaluates their performance. The implications of the findings are also discussed, along with potential applications within the field. (en)
dc.language.iso: en (en)
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science (en)
dc.rights: Y (en)
dc.subject: Multimodal Summarization (en)
dc.subject: Spoken Document Summarization (en)
dc.subject: Audio Summarization (en)
dc.subject: Speech Summarization (en)
dc.subject: Meeting Summarization (en)
dc.subject: Large Language Models (en)
dc.subject: Machine Learning (en)
dc.title: Multimodal Summarization in Natural Language Conversations (en)
dc.type: Thesis (en)
dc.type.supercollection: thesis_dissertations (en)
dc.type.supercollection: refereed_publications (en)
dc.type.qualificationlevel: Doctoral (en)
dc.identifier.peoplefinderurl: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:KOTEYS (en)
dc.identifier.rssinternalid: 278155 (en)
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsor: Irish Research Council (IRC) (en)
dc.contributor.sponsorGrantNumber: GOIPG/2019/2353 (en)
dc.identifier.uri: https://hdl.handle.net/2262/111819


