Automatic vectorization through superword level parallelism with associative chain re-ordering and loop shifting

ROGERS, STEPHEN

This item is covered by a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

File Type:

PDF

Item Type:

Thesis

Date:

2018

Author:

ROGERS, STEPHEN

Access:

openAccess

Citation:

ROGERS, STEPHEN, Automatic vectorization through superword level parallelism with associative chain re-ordering and loop shifting, Trinity College Dublin. School of Computer Science & Statistics, 2018

Download Item:

(Thesis) 2.319Mb

Abstract:

Single instruction, multiple data (SIMD) is a class of parallel computing that involves executing a single operation across multiple pieces of data. A common type of SIMD is vector processing, which executes a single instruction across one-dimensional arrays of data called vectors. A category of compiler optimization called automatic vectorization has been developed since the introduction of vector processing to allow 'vectorizing compilers' to target such processor capabilities without direct intervention from application programmers.

Convolution is a fundamental concept in image processing. It applies a matrix called a kernel to compute, for every pixel in an image, a weighted sum of that pixel and its adjacent pixels. This process is used to perform tasks such as image blurring, edge detection and noise reduction.

In this thesis, we explore the challenges of automatic vectorization of image convolutions implemented in C and C++. We describe the fundamentals of vectorization and image convolutions and propose an approach for the effective vectorization of these convolutions. Our approach combines vectorization through Superword Level Parallelism (SLP) with tentative loop unrolling, loop shifting, and the reordering of associative and commutative chains of instructions. Most modern optimizing compilers are capable of vectorizing 3x3 image convolutions, but tend to fail at vectorizing larger convolutions, such as 5x5. The vectorizer we describe in this thesis, with the aid of its combined optimizations, is designed to vectorize such larger convolutions. Through this combination of optimizations, we have measured performance improvements for 5x5, 7x7, and 9x9 image convolutions. For convolutions operating on integer data types we measured performance improvements between 2.01x and 6.97x, and for floating-point types, between 2.19x and 5.34x.
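The convolution described in the abstract can be sketched in C as follows. This is a minimal scalar illustration and not code from the thesis: the function name `convolve2d`, the row-major `int` image layout, and the decision to skip border pixels are all assumptions made for the example. The innermost accumulation forms a chain of additions, which is presumably the kind of associative and commutative chain the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thesis): applies an odd-sized
 * k x k kernel to the interior of a row-major grayscale image.
 * Pixels within k/2 of an edge are left untouched in `out`. */
void convolve2d(const int *in, int *out, size_t width, size_t height,
                const int *kernel, size_t k)
{
    assert(k % 2 == 1);                  /* kernel needs a centre element */
    size_t r = k / 2;                    /* kernel radius */
    if (width < k || height < k)
        return;                          /* no interior pixels to process */

    for (size_t y = r; y < height - r; ++y) {
        for (size_t x = r; x < width - r; ++x) {
            int sum = 0;
            /* Weighted sum of the pixel and its neighbours. Barring
             * overflow, integer addition is associative and commutative,
             * so these k*k terms can be summed in any order. */
            for (size_t ky = 0; ky < k; ++ky)
                for (size_t kx = 0; kx < k; ++kx)
                    sum += kernel[ky * k + kx]
                         * in[(y + ky - r) * width + (x + kx - r)];
            out[y * width + x] = sum;
        }
    }
}
```

Calling `convolve2d(in, out, 7, 7, kernel, 5)` on a 7x7 image processes the 3x3 block of interior pixels. Whether a given compiler vectorizes this loop nest depends on the compiler, its flags, and the kernel size.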

URI:

http://hdl.handle.net/2262/85533

Author's Homepage:

http://people.tcd.ie/rogersst

Description:

APPROVED

Advisor:

Gregg, David

Publisher:

Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science

Type of material:

Thesis

Availability:

Full text available

Subject:

automatic vectorization, SLP, software pipelining, compiler optimisation
