dc.contributor.advisor: Byrne, John G.
dc.contributor.author: Anderson, Glynn
dc.date.accessioned: 2021-07-22T13:39:23Z
dc.date.available: 2021-07-22T13:39:23Z
dc.date.issued: 1993
dc.identifier.citation: Glynn Anderson, 'Computerising a library catalogue using optical character recognition', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 1993, pp 164
dc.identifier.other: THESIS 2842
dc.description.abstract: Trinity College Library contains several million books. Catalogues for the more modern books have been computerised to allow readers a fast and efficient means of locating a book. The 1872 Printed Catalogue, which lists books owned by the library before 1872, has not yet been computerised. The catalogue lists 165,000 books, some of which are the most valuable in the library. The purpose of this project is to write a computer program that will automatically computerise the catalogue using optical character recognition (OCR). OCR is the process by which a digital picture of a portion of text is converted into computer-readable text. Each character on the page is represented by a group or 'blob' of dots or pixels. The role of the computer is twofold: first, to decide which pixels should be grouped together (i.e. which belong to the same character), and second, to decide which character each blob of pixels represents. The output of the OCR program is sent to a database and will eventually be incorporated into the existing DYNIX© database, currently in use in the library. The thesis contains a review of several different approaches to OCR, including feature vector analysis, discrimination trees, stroke analysis and neural networks. The implementation and results of a selection of these methods are described. The recognition or classification method used in this project, template matching, has not been implemented before as a primary classification method. The results of this thesis show that template matching compares very favourably with other classification methods. The thesis describes the considerable work undertaken in deriving a good matching algorithm, which is the key to the success of template matching. The segmentation of lines and characters is described in full, including the development of a very efficient perimeter tracing algorithm.
Before the final chapters on results, conclusion and future work, there is a chapter explaining how a state machine is used, while classifying, to delimit the fields within each entry on a catalogue page.
dc.format: 1 volume
dc.language.iso: en
dc.publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb12469081
dc.subject: Computer Science, M.Sc.
dc.subject: M.Sc. Trinity College Dublin 1993
dc.subject: Printed Catalogue (en)
dc.subject: 1872 Catalogue (en)
dc.subject: Trinity College Library (en)
dc.title: Computerising a library catalogue using optical character recognition
dc.type: thesis
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Master thesis (research)
dc.type.qualificationname: Master in Science (M.Sc.)
dc.rights.ecaccessrights: openAccess
dc.format.extentpagination: pp 164
dc.description.note: TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie
dc.identifier.uri: http://hdl.handle.net/2262/96773

