Automatic Metadata Mining from Multilingual Enterprise Content

WADE, VINCENT; SAH, MELIKE

dc.contributor.author	WADE, VINCENT	en
dc.contributor.author	SAH, MELIKE	en
dc.date.accessioned	2012-02-29T15:01:18Z
dc.date.available	2012-02-29T15:01:18Z
dc.date.issued	2012	en
dc.date.submitted	2012	en
dc.identifier.citation	Melike Sah, Vincent Wade, Automatic Metadata Mining from Multilingual Enterprise Content, Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 11, 2012, 41-62	en
dc.identifier.other	Y	en
dc.description	PUBLISHED	en
dc.description.abstract	Personalization is increasingly vital, especially for enterprises that need to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manual annotation of large knowledge bases with such rich metadata is not scalable. In addition, automatic mining of cognitive metadata is challenging, since it is very difficult to automatically understand the underlying intellectual knowledge in a document. At the same time, Web content is becoming increasingly multilingual, since a growing amount of the data generated on the Web is non-English. Current metadata extraction systems are generally designed for English content and need to be fundamentally rethought to adapt to the changing dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework that combines a novel fuzzy-based method for automatic cognitive metadata generation with different document parsing algorithms to extract rich metadata from multilingual enterprise content, using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process relies on DocBook-structured enterprise content, our framework focuses on enterprise documents and content that loosely follow DocBook formatting. DocBook is a common documentation format for formally producing corporate data, and it has been adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German and French versions of the Symantec Norton 360 knowledge bases. The user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% for the metadata values of document difficulty, document interactivity level and document interactivity type. 
The proposed fuzzy inference system achieves better results than a rule-based reasoner for difficulty metadata extraction (an improvement of about 11%). In addition, user-perceived metadata quality scores were found to be high (mean of 5.57 out of 6), and automated metadata analysis showed that the extracted metadata is of high quality and can be suitable for personalized information retrieval.	en
dc.description.sponsorship	This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at University of Dublin, Trinity College.	en
dc.format.extent	41-62	en
dc.language.iso	en	en
dc.relation.ispartofseries	Journal of Web Semantics: Science, Services and Agents on the World Wide Web	en
dc.relation.ispartofseries	11	en
dc.rights	Y	en
dc.subject	Computer science	en
dc.subject	Automatic metadata generation	en
dc.subject	ontologies	en
dc.subject	personalization	en
dc.title	Automatic Metadata Mining from Multilingual Enterprise Content	en
dc.type	Journal Article	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/vwade	en
dc.identifier.rssinternalid	75815	en
dc.subject.TCDTheme	Intelligent Content & Communications	en
dc.identifier.rssuri	http://dx.doi.org/10.1016/j.websem.2011.11.001	en
dc.contributor.sponsor	Science Foundation Ireland (SFI)	en
dc.contributor.sponsorGrantNumber	07/CE/I1142	en
dc.identifier.uri	http://hdl.handle.net/2262/62420

Files in this item

Name:: Automatic Metadata Mining from ...
Size:: 1.742Mb
Format:: PDF
Description:: Published (author's copy) - Peer ...

Name:: license.txt
Size:: 3.243Kb
Format:: Text file

This item appears in the following Collection(s)

Computer Science (Scholarly Publications)
RSS Feeds
