Show simple item record

dc.contributor.advisor: Gregg, David
dc.contributor.author: Xu, Shixiong
dc.date.accessioned: 2017-08-29T09:43:27Z
dc.date.available: 2017-08-29T09:43:27Z
dc.date.issued: 2017
dc.date.submitted: 2017
dc.identifier.citation: XU, SHIXIONG, Data layout oriented compilation techniques in vectorization for multi-/many-cores, Trinity College Dublin. School of Computer Science & Statistics. COMPUTER SYSTEMS, 2017
dc.identifier.other: Y
dc.description: APPROVED
dc.description.abstract: Single instruction, multiple data (SIMD) architectures are widely adopted in both general-purpose processors and graphics processing units to exploit data-level parallelism. Writing high-performance code that uses the SIMD execution units on either platform is tedious and error-prone, so programmers often rely on automatic code generation in compilers. However, it is not trivial for compilers to generate high-performance code without considering the layout of the data used in the computation. Data layout determines data access patterns, which in turn have a great impact on the memory performance of the automatically generated code for both CPUs and GPUs. In this thesis, we demonstrate several data-layout-oriented compilation techniques for efficient vectorization. We put forward semi-automatic data layout transformation to help users easily change their programs and exploit the best possible data layout for vectorization. Our proposed vectorization based on hyper loop parallelism provides a way to take advantage of the relationship between data layout and computation structure, and our experimental results demonstrate that it can yield significant performance gains. In addition, we show that this technique is of great use for boosting memory performance on CUDA GPUs. We also present pioneering work that uses loop vectorization techniques to handle nested thread-level parallelism (TLP) on CUDA GPUs. Because loop vectorization prioritizes vectorizing loops with contiguous data accesses, it is of great help in achieving an efficient mapping strategy for nested TLP on CUDA GPUs. Finally, our new bitslice vector computing for customizable arithmetic precision on general-purpose processors with SIMD extensions not only breaks the limit of hardware arithmetic precision but also achieves strong performance. It also shows the great power of logic optimization, widely used in hardware synthesis, in optimizing C/C++ code with a large number of logic operations.
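The abstract's central idea is that data layout determines whether memory accesses are contiguous and therefore profitably vectorizable. As an illustrative sketch only (the thesis itself targets C/C++ and CUDA; the function names below are hypothetical and not taken from the work), the following Python code shows the classic array-of-structures to structure-of-arrays transformation that such data layout changes typically perform, giving each field the unit-stride accesses that SIMD hardware favors:

```python
def aos_to_soa(records):
    """Convert an array-of-structures (list of dicts) into a
    structure-of-arrays (dict of lists). After the transform, each
    field occupies one contiguous array rather than being
    interleaved with the other fields record by record."""
    return {field: [r[field] for r in records] for field in records[0]}

def scale_field(soa, field, s):
    """Operate on one field as a whole column; on a real SIMD target
    this unit-stride loop maps naturally onto packed instructions."""
    soa[field] = [v * s for v in soa[field]]

# AoS: the x and y values of each particle are interleaved in memory,
# so a loop over just the x values has a strided access pattern.
particles_aos = [{"x": 1.0, "y": 10.0},
                 {"x": 2.0, "y": 20.0},
                 {"x": 3.0, "y": 30.0}]

particles_soa = aos_to_soa(particles_aos)
scale_field(particles_soa, "x", 2.0)
print(particles_soa["x"])  # [2.0, 4.0, 6.0]
```

The same reshaping is what makes the difference between gather/scatter-style access and contiguous vector loads on both CPUs and GPUs, which is why the thesis treats layout choice as a first-class compilation concern.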
dc.language.iso: en
dc.publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
dc.rights: Y
dc.subject: compiler
dc.subject: data layout
dc.subject: vectorization
dc.subject: bitslice
dc.subject: source-to-source transformation
dc.subject: single instruction multiple data (SIMD)
dc.title: Data layout oriented compilation techniques in vectorization for multi-/many-cores
dc.type: Thesis
dc.type.supercollection: thesis_dissertations
dc.type.supercollection: refereed_publications
dc.type.qualificationlevel: Postgraduate Doctor
dc.identifier.peoplefinderurl: http://people.tcd.ie/xushen
dc.identifier.rssinternalid: 176597
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsor: Science Foundation Ireland grant 10/CE/I1855 to Lero - the Irish Software Engineering Research Centre (www.lero.ie).
dc.identifier.uri: http://hdl.handle.net/2262/81727

