Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision

Ulicny, Matej

dc.contributor.advisor	Dahyot, Rozenn	en
dc.contributor.author	Ulicny, Matej	en
dc.date.accessioned	2021-05-04T07:20:59Z
dc.date.available	2021-05-04T07:20:59Z
dc.date.issued	2021	en
dc.date.submitted	2021	en
dc.identifier.citation	Ulicny, Matej, Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision, Trinity College Dublin. School of Computer Science & Statistics, 2021	en
dc.identifier.other	Y	en
dc.description	APPROVED	en
dc.description.abstract	Convolutional neural networks (CNNs) have become a paradigm for designing vision-based intelligent systems. These models are controlled by a vast number of parameters, which are learned thanks to the availability of annotated datasets. Image data is available in multiple formats, including JPEG, which uses Discrete Cosine Transform (DCT) coefficients to efficiently encode and compress visual information. We first propose to use these DCT coefficients of JPEG images directly as input to CNN models, removing the need to fully decode the JPEG format before applying CNNs. Furthermore, we propose to use DCT basis functions to express convolutional filters in any layer of a CNN, and we show that this provides an advantageous regularization during the training process. We show that expressing weights within DCT bases can increase performance and speed up training. We improve several popular models on standard benchmarks: ImageNet classification accuracy by 1%, MS COCO object detection average precision by 1%, and Pascal VOC semantic segmentation IoU score by 1.1%. We propose to exploit properties of natural images by restricting the set of basis functions used during training. Suppressing the low-frequency component in the first layer can make models insensitive to illumination effects. High-frequency truncation on multiple layers can in turn add stability and efficiently compress a model without any significant loss in accuracy. Using the DCT bases provides a prior that reduces overfitting, especially when compression is applied, and helps with generalization when fewer samples are available. Lastly, the standard DCT-based compression is modified and extended to be applicable to any weight tensor used in neural networks. We propose to reshape a tensor into a 2-dimensional matrix and reorder its rows based on pairwise distances between the columns in order to make the matrix more coherent. The reordered matrix is transformed via 1-dimensional DCT and high frequencies are truncated. We further correct the scale and bias parameters of batch normalization layers to take into account compression of the preceding layers. Promising results are achieved even without model fine-tuning. A short fine-tuning of one epoch can lead to models with three times fewer parameters without a loss in accuracy.	en
dc.publisher	Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science	en
dc.rights	Y	en
dc.subject	Boundary Detection	en
dc.subject	Convolutional Neural Network	en
dc.subject	Discrete Cosine Transform	en
dc.subject	Image Classification	en
dc.subject	Model Compression	en
dc.subject	Object Detection	en
dc.subject	Semantic Segmentation	en
dc.title	Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision	en
dc.type	Thesis	en
dc.type.supercollection	thesis_dissertations	en
dc.type.supercollection	refereed_publications	en
dc.type.qualificationlevel	Doctoral	en
dc.identifier.peoplefinderurl	https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:ULINM	en
dc.identifier.rssinternalid	228403	en
dc.rights.ecaccessrights	openAccess
dc.contributor.sponsor	Science Foundation Ireland (SFI)	en
dc.identifier.uri	http://hdl.handle.net/2262/96207
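
The abstract describes transforming weights with a 1-dimensional DCT and truncating high frequencies. The sketch below illustrates only that generic step on a single vector, using an orthonormal DCT-II written in plain Python. It is not the thesis implementation: the function names (dct_1d, idct_1d, truncate_high_frequencies) and the keep parameter are hypothetical, and the tensor reshaping, row reordering, and batch normalization correction mentioned in the abstract are left out.

```python
import math


def _scale(k, n):
    # Orthonormal scaling: the DC term (k == 0) uses sqrt(1/n), others sqrt(2/n).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            _scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def truncate_high_frequencies(x, keep):
    """Keep the `keep` lowest-frequency DCT coefficients of x, zero the rest,
    and return the reconstructed vector (hypothetical helper, for illustration)."""
    if not 0 <= keep <= len(x):
        raise ValueError("keep must be between 0 and len(x)")
    coeffs = dct_1d(x)
    # Only `keep` coefficients would need to be stored; the rest are dropped.
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct_1d(truncated)


if __name__ == "__main__":
    weights = [0.9, 1.0, 1.1, 1.3, 1.2, 1.0]  # made-up, smoothly varying values
    approx = truncate_high_frequencies(weights, keep=3)
    error = max(abs(a - b) for a, b in zip(weights, approx))
    print("stored coefficients:", 3, "of", len(weights))
    print("max reconstruction error:", error)
```

Storing only the kept coefficients is what yields the parameter saving; how much accuracy survives depends on how smooth the vector is, which is presumably why the abstract reorders rows before applying the transform.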

Files in this item

Name: matej_ulicny_thesis.pdf
Size: 16.59Mb
Format: PDF

Name: license.txt
Size: 3.530Kb
Format: Text file

This item appears in the following Collection(s)

Computer Science (Theses and Dissertations)
Trinity College Dublin Theses & Dissertations