Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing

Hill, Nathan

This item is covered by a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

File Type:

PDF

Item Type:

Journal Article

Date:

2023

Access:

openAccess

Citation:

Li, Shihua; Hill, Nathan W., Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing, Journal of Open Humanities Data, 9, 15, 2023, 1-8

Download Item:

(Li and Hill 2023 Transkribus.pdf) 1.340Mb

Abstract:

This study presents a dataset that serves as a benchmark for recognizing printed text in lexical lists with Chinese-IPA glossing. The paper gives an overview of the baseline model, the transcription model, and the PyLaia engines used in the research. It explains the specific need for digitizing these lexical lists, outlines how the baseline model was trained for layout analysis, and describes how the transcription model was trained on ground truth data generated in Transkribus, taking as input both the images of the lexical list content and their corresponding transcriptions. The study also discusses the limitations of the model and identifies avenues for future development. Because the dataset is openly accessible, researchers seeking to digitize lexical lists with Chinese-IPA glossing can use it. Moreover, since the model can recognize both Chinese characters and IPA symbols, it has the potential to contribute to the linguistic analysis of languages documented in Chinese-IPA glossing.

URI:

http://hdl.handle.net/2262/104040

Author's Homepage:

http://people.tcd.ie/hillna

Description:

PUBLISHED

Series/Report no:

Journal of Open Humanities Data, 9, 15

Availability:

Full text available

DOI:

http://dx.doi.org/10.5334/johd.119

ISSN:

2059-481X
