DC Field | Value | Language
dc.contributor.advisor | Dahyot, Rozenn | en
dc.contributor.author | Ulicny, Matej | en
dc.date.accessioned | 2021-05-04T07:20:59Z |
dc.date.available | 2021-05-04T07:20:59Z |
dc.date.issued | 2021 | en
dc.date.submitted | 2021 | en
dc.identifier.citation | Ulicny, Matej, Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision, Trinity College Dublin. School of Computer Science & Statistics, 2021 | en
dc.identifier.other | Y | en
dc.description | APPROVED | en
dc.description.abstract | Convolutional neural networks (CNNs) have become a paradigm for designing vision-based intelligent systems. These models are controlled by a vast number of parameters, which are learned thanks to the availability of annotated datasets. Image data is available in multiple formats, including JPEG, which uses Discrete Cosine Transform (DCT) coefficients to efficiently encode and compress visual information. We first propose using these DCT coefficients of JPEG images directly as input to CNN models, removing the need to fully decode the JPEG format before applying CNNs. Furthermore, we propose expressing the convolutional filters in any layer of a CNN with DCT basis functions, and we show that this provides advantageous regularization during training. We show that expressing weights in DCT bases can increase performance and speed up training. We improve several popular models on standard benchmarks: ImageNet classification accuracy by 1%, MS COCO object detection average precision by 1%, and Pascal VOC semantic segmentation IoU score by 1.1%. We propose to exploit properties of natural images by restricting the set of basis functions used during training. Suppressing the low-frequency component in the first layer can make models insensitive to illumination effects. High-frequency truncation in multiple layers can in turn add stability and efficiently compress a model without any significant loss in accuracy. Using DCT bases provides a prior that reduces overfitting, especially when compression is applied, and helps with generalization when fewer samples are available. Lastly, standard DCT-based compression is modified and extended to apply to any weight tensor used in neural networks. We propose to reshape a tensor into a 2-dimensional matrix and reorder its rows based on pairwise distances between the columns in order to make the matrix more coherent.
The reordered matrix is transformed with a 1-dimensional DCT, and its high frequencies are truncated. We further correct the scale and bias parameters of batch normalization layers to account for the compression of the preceding layers. Promising results are achieved even without fine-tuning the model. A short fine-tuning of one epoch can yield models with three times fewer parameters without loss of accuracy. | en
dc.publisher | Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science | en
dc.rights | Y | en
dc.subject | Boundary Detection | en
dc.subject | Convolutional Neural Network | en
dc.subject | Discrete Cosine Transform | en
dc.subject | Image Classification | en
dc.subject | Model Compression | en
dc.subject | Object Detection | en
dc.subject | Semantic Segmentation | en
dc.title | Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision | en
dc.type | Thesis | en
dc.type.supercollection | thesis_dissertations | en
dc.type.supercollection | refereed_publications | en
dc.type.qualificationlevel | Doctoral | en
dc.identifier.peoplefinderurl | https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:ULINM | en
dc.identifier.rssinternalid | 228403 | en
dc.rights.ecaccessrights | openAccess |
dc.contributor.sponsor | Science Foundation Ireland (SFI) | en
dc.identifier.uri | http://hdl.handle.net/2262/96207 |
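The abstract's first proposal feeds a CNN with JPEG's DCT coefficients instead of decoded pixels. JPEG computes a 2-D DCT on every 8×8 pixel block, so the coefficients can be arranged as 64 frequency channels at 1/8 of the image resolution. The NumPy sketch below illustrates that layout under stated assumptions: a single grayscale channel whose sides are multiples of 8, and no quantization, chroma subsampling or entropy coding (a real pipeline would read the quantized coefficients from the JPEG file rather than recompute them). `dct_matrix` and `blockwise_dct` are hypothetical helper names, not code from the thesis.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of a length-n vector x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2)  # scale the DC row so that C @ C.T == I
    return c * np.sqrt(2 / n)


def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Hypothetical helper: 2-D DCT of every block x block tile of a grayscale
    image, returned as (block*block, H/block, W/block) frequency channels."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    c = dct_matrix(block)
    # (H, W) -> (H/b, b, W/b, b) -> (H/b, W/b, b, b): one b x b tile per position
    tiles = image.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # Separable 2-D DCT of each tile: C @ X @ C^T (matmul broadcasts over tiles)
    coeffs = c @ tiles @ c.T
    # Flatten each tile's b x b coefficients into b*b channels, channels first
    return coeffs.reshape(h // block, w // block, block * block).transpose(2, 0, 1)


# Example: a constant image has energy only in the DC channel
out = blockwise_dct(np.full((16, 16), 3.0))
print(out.shape)  # (64, 2, 2)
```

For the constant 16×16 image of value 3, only channel 0 (the DC coefficient) is non-zero, holding 24 at each of the 2×2 block positions: the orthonormal 2-D DC coefficient of an 8×8 block is 8 times the block mean.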

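Expressing convolutional filters in DCT bases means a layer learns one coefficient per fixed k×k DCT basis function instead of raw filter taps; the filter is the coefficient-weighted sum of the bases. The NumPy sketch below, for a single 2-D filter, also shows the two basis restrictions the abstract describes. Two choices are assumptions made for illustration: the suppressed low-frequency component is taken to be the DC (constant) basis, and high-frequency truncation keeps only bases whose frequency indices satisfy u + v ≤ `max_freq`. `dct_filter_bank` and `filter_from_coeffs` are hypothetical names.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2)
    return c * np.sqrt(2 / n)


def dct_filter_bank(k: int) -> np.ndarray:
    """All k*k separable 2-D DCT basis functions, shape (k, k, k, k):
    bank[u, v] is the k x k filter with vertical frequency u, horizontal v."""
    c = dct_matrix(k)
    return np.einsum("ui,vj->uvij", c, c)


def filter_from_coeffs(coeffs, bank, drop_dc=False, max_freq=None):
    """Hypothetical helper: build a k x k filter as a weighted sum of DCT bases.
    drop_dc removes the constant (u = v = 0) basis; max_freq keeps only bases
    with u + v <= max_freq (an assumed high-frequency truncation rule)."""
    k = bank.shape[0]
    u, v = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    mask = np.ones((k, k), dtype=bool)
    if drop_dc:
        mask[0, 0] = False
    if max_freq is not None:
        mask &= (u + v) <= max_freq
    # Zero the excluded coefficients, then sum coeffs[u, v] * bank[u, v]
    return np.einsum("uv,uvij->ij", coeffs * mask, bank)


rng = np.random.default_rng(0)
coeffs = rng.normal(size=(3, 3))  # stand-in for learned coefficients
bank = dct_filter_bank(3)
zero_mean = filter_from_coeffs(coeffs, bank, drop_dc=True)
print(abs(zero_mean.sum()) < 1e-9)  # True
```

Dropping the DC basis makes every synthesized filter sum to zero, so adding a constant brightness offset to the input leaves the filter's response unchanged away from padded borders, which is one way to read the illumination-insensitivity claim. With `max_freq=1`, a 3×3 filter keeps 3 of its 9 coefficients.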

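The last contribution compresses an arbitrary weight tensor: reshape it to 2-D, reorder it so neighbouring entries are similar, apply a 1-D DCT and discard high frequencies. The NumPy sketch below follows those steps with several assumptions the abstract does not specify: the tensor is reshaped to (out_channels, in_channels·kh·kw), the DCT runs down each column (along the reordered row axis), and rows are ordered by a greedy nearest-neighbour walk over pairwise Euclidean distances between rows, a simplification rather than the thesis's exact reordering criterion. The batch-normalization correction and fine-tuning are omitted, and all function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C @ x is the 1-D DCT of x, C.T inverts it."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2)
    return c * np.sqrt(2 / n)


def greedy_row_order(m: np.ndarray) -> np.ndarray:
    """Hypothetical heuristic: start at row 0 and repeatedly append the nearest
    unvisited row (Euclidean distance) so that neighbouring rows are similar."""
    d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)  # pairwise distances
    visited = np.zeros(len(m), dtype=bool)
    visited[0] = True
    order = [0]
    for _ in range(len(m) - 1):
        nxt = int(np.argmin(np.where(visited, np.inf, d[order[-1]])))
        order.append(nxt)
        visited[nxt] = True
    return np.array(order)


def compress(weight: np.ndarray, keep: int):
    """Hypothetical sketch: reshape (out, in, kh, kw) -> (out, in*kh*kw), reorder
    rows, DCT down each column, keep the `keep` lowest-frequency rows."""
    m = weight.reshape(weight.shape[0], -1)
    order = greedy_row_order(m)
    coeffs = dct_matrix(m.shape[0]) @ m[order]  # 1-D DCT along the row axis
    return coeffs[:keep], order                  # truncate high frequencies


def decompress(coeffs: np.ndarray, order: np.ndarray, shape) -> np.ndarray:
    n = shape[0]
    padded = np.zeros((n, coeffs.shape[1]))
    padded[: coeffs.shape[0]] = coeffs           # dropped frequencies become 0
    reordered = dct_matrix(n).T @ padded         # inverse DCT (C is orthonormal)
    m = np.empty_like(reordered)
    m[order] = reordered                         # undo the row permutation
    return m.reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 4, 3, 3))
small, order = compress(w, keep=8)
approx = decompress(small, order, w.shape)
print(small.shape, approx.shape)  # (8, 36) (16, 4, 3, 3)
```

For a (16, 4, 3, 3) tensor, the compressed form is a `keep` × 36 coefficient block plus a 16-entry row permutation. With `keep` equal to the number of rows the round trip is exact; smaller values trade reconstruction error for size.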