Computerising a library catalogue using optical character recognition

Anderson, Glynn

dc.contributor.advisor	Byrne, John G.
dc.contributor.author	Anderson, Glynn
dc.date.accessioned	2021-07-22T13:39:23Z
dc.date.available	2021-07-22T13:39:23Z
dc.date.issued	1993
dc.identifier.citation	Glynn Anderson, 'Computerising a library catalogue using optical character recognition', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 1993, pp 164
dc.identifier.other	THESIS 2842
dc.description.abstract	Trinity College Library contains several million books. Catalogues for the more modern books have been computerised to give readers a fast and efficient means of locating a book. The 1872 Printed Catalogue, which lists books owned by the library before 1872, has not yet been computerised. The catalogue lists 165,000 books, some of which are among the most valuable in the library. The purpose of this project is to write a computer program that will automatically computerise the catalogue using optical character recognition (OCR). OCR is the process by which a digital picture of a portion of text is converted into computer-readable text. Each character on the page is represented by a group, or ‘blob’, of dots or pixels. The role of the computer is twofold: first, to decide which pixels should be grouped together (i.e. which belong to the same character), and second, to decide which character each blob of pixels represents. The output of the OCR program is sent to a database and will eventually be incorporated into the existing DYNIX© database currently in use in the library. The thesis contains a review of several different approaches to OCR, including feature vector analysis, discrimination trees, stroke analysis and neural networks. The implementation and results of a selection of these methods are described. The recognition or classification method used in this project, template matching, has not been implemented before as a primary classification method. The results of this thesis show that template matching compares very favourably with other classification methods. The thesis describes the considerable work undertaken in deriving a good matching algorithm, which is the key to the success of template matching. The segmentation of lines and characters is described in full, including the development of a very efficient perimeter tracing algorithm.
Before the final chapters on results, conclusion and future work, a chapter explains how a state machine is used, during classification, to delimit the fields within each entry on a catalogue page.
dc.format	1 volume
dc.language.iso	en
dc.publisher	Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof	http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb12469081
dc.subject	Computer Science, M.Sc.
dc.subject	M.Sc. Trinity College Dublin 1993
dc.subject	Printed Catalogue	en
dc.subject	1872 Catalogue	en
dc.subject	Trinity College Library	en
dc.title	Computerising a library catalogue using optical character recognition
dc.type	thesis
dc.type.supercollection	thesis_dissertations
dc.type.supercollection	refereed_publications
dc.type.qualificationlevel	Master thesis (research)
dc.type.qualificationname	Master in Science (M.Sc.)
dc.rights.ecaccessrights	openAccess
dc.format.extentpagination	pp 164
dc.description.note	TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
dc.identifier.uri	http://hdl.handle.net/2262/96773
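
The abstract describes OCR as two steps: grouping pixels into blobs, then deciding which character each blob represents by template matching. The Python sketch below illustrates that idea on a toy bitmap. It is not the thesis's algorithm: the flood-fill grouping, the naive pixel-agreement score, the left-to-right ordering, and every name in it (`find_blobs`, `crop`, `match_score`, `recognise`, `PAGE`, `TEMPLATES`) are assumptions made for illustration only. The abstract notes that deriving a good matching algorithm took considerable work, so a real matcher would be far more involved than this one.

```python
from collections import deque


def find_blobs(image):
    """Group 'on' pixels (1s) into 8-connected blobs using a breadth-first flood fill.

    Returns a list of blobs, each a list of (row, col) pixel coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                # Visit all eight neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def crop(blob):
    """Return the blob as a tight bounding-box bitmap (rows of 0/1)."""
    top = min(y for y, _ in blob)
    left = min(x for _, x in blob)
    height = max(y for y, _ in blob) - top + 1
    width = max(x for _, x in blob) - left + 1
    bitmap = [[0] * width for _ in range(height)]
    for y, x in blob:
        bitmap[y - top][x - left] = 1
    return bitmap


def match_score(bitmap, template):
    """Fraction of pixels that agree; 0.0 if sizes differ (this sketch does no rescaling)."""
    if len(bitmap) != len(template) or len(bitmap[0]) != len(template[0]):
        return 0.0
    agree = sum(b == t
                for brow, trow in zip(bitmap, template)
                for b, t in zip(brow, trow))
    return agree / (len(bitmap) * len(bitmap[0]))


def recognise(image, templates):
    """Segment the image into blobs, order them left to right, and label each one."""
    blobs = sorted(find_blobs(image), key=lambda blob: min(x for _, x in blob))
    labels = []
    for blob in blobs:
        bitmap = crop(blob)
        labels.append(max(templates, key=lambda ch: match_score(bitmap, templates[ch])))
    return "".join(labels)


# Hypothetical 3x3 templates and a toy one-line "page" containing an L and a T.
TEMPLATES = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
PAGE = [
    [1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 0],
]

print(recognise(PAGE, TEMPLATES))
```

Treating every connected blob as one character is a simplification: touching characters, or characters made of several disconnected parts such as "i", would need the kind of line and character segmentation the abstract says the thesis describes in full.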

Files in this item

Name: Anderson TCD THESIS 2842 Compu ...
Size: 67.76Mb
Format: PDF

Name: license.txt
Size: 3.530Kb
Format: Text file

This item appears in the following Collection(s)

Computer Science (Theses and Dissertations)
Trinity College Dublin Theses & Dissertations
