Show simple item record

dc.contributor.author: Garland, James Philip
dc.contributor.author: Gregg, David
dc.date.accessioned: 2021-05-13T15:56:17Z
dc.date.available: 2021-05-13T15:56:17Z
dc.date.issued: 2018
dc.date.submitted: 2018
dc.identifier.citation: James Philip Garland, David Gregg, 'Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing', 2018, ACM Transactions on Architecture and Code Optimization, 15, 3
dc.identifier.issn: 1544-3566
dc.identifier.other: Y
dc.description: PUBLISHED
dc.description.abstract: Convolutional neural networks (CNNs) are among the most successful machine-learning techniques for image, voice, and video processing, but they require large amounts of processing capacity and memory bandwidth. Hardware accelerators proposed for CNNs typically contain large numbers of multiply-accumulate (MAC) units, whose multipliers are costly in integrated circuit (IC) gate count and power consumption. “Weight-sharing” accelerators have been proposed in which the full range of weight values in a trained CNN is compressed into bins, and the bin index is used to access the weight-shared value. We reduce the power and area of the CNN by implementing a parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin. The accumulated value is computed in a subsequent multiply phase, significantly reducing the gate count and power consumption of the CNN. In this article, we implement PASM in a weight-shared CNN convolution hardware accelerator and analyze its effectiveness. Experiments show that, at a clock speed of 1 GHz on a 45 nm ASIC process, our approach results in fewer gates, smaller logic, and reduced power, with only a slight increase in latency. We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has limited numbers of digital signal processor (DSP) units to accelerate the MAC operations.
dc.format.extent: 31:1
dc.format.extent: 31:24
dc.language.iso: en
dc.relation.ispartofseries: ACM Transactions on Architecture and Code Optimization
dc.relation.ispartofseries: 15
dc.relation.ispartofseries: 3
dc.rights: Y
dc.subject: same weight-shared-with-PASM
dc.subject: Convolutional neural networks (CNNs)
dc.subject: image, voice, and video processing
dc.subject.lcsh: same weight-shared-with-PASM
dc.subject.lcsh: Convolutional neural networks (CNNs)
dc.subject.lcsh: image, voice, and video processing
dc.title: Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing
dc.type: Journal Article
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/jgarland
dc.identifier.peoplefinderurl: http://people.tcd.ie/dgregg
dc.identifier.rssinternalid: 215132
dc.rights.ecaccessrights: openAccess
dc.relation.source: ACM Transactions on Architecture and Code Optimization
dc.subject.TCDTag: Computer Hardware
dc.subject.TCDTag: Context-aware Computing
dc.identifier.rssuri: https://dl.acm.org/doi/10.1145/3233300
dc.relation.sourceuri: https://dl.acm.org/doi/10.1145/3233300
dc.identifier.orcid_id: 0000-0002-8688-9407
dc.status.accessible: N
dc.contributor.sponsor: SFI stipend
dc.contributor.sponsorGrantNumber: 12/IA/1381
dc.contributor.sponsor: Science Foundation Ireland
dc.identifier.uri: http://hdl.handle.net/2262/96284
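The PASM technique summarized in the abstract can be sketched in software: instead of one multiply per activation, activations are first accumulated into one bin per shared weight value, and a single multiply per bin follows in a later phase. This is an illustrative sketch of the idea only, not the authors' hardware implementation; all names are hypothetical.

```python
# Sketch of a parallel accumulate shared MAC (PASM) for a weight-shared CNN,
# as described in the abstract. In a weight-shared network each weight is an
# index into a small table of shared weight values.

def conventional_mac(activations, weight_indices, shared_weights):
    """Baseline MAC: one multiply per activation."""
    return sum(a * shared_weights[w] for a, w in zip(activations, weight_indices))

def pasm_mac(activations, weight_indices, shared_weights):
    """PASM: accumulate activations into per-weight bins (adds only),
    then multiply once per bin in a subsequent phase."""
    bins = [0] * len(shared_weights)      # one accumulator per shared weight
    for a, w in zip(activations, weight_indices):
        bins[w] += a                      # accumulate phase: no multipliers
    # multiply phase: only len(shared_weights) multiplies in total
    return sum(b * wv for b, wv in zip(bins, shared_weights))

acts = [3, 1, 4, 1, 5, 9, 2, 6]
idx = [0, 1, 0, 2, 1, 2, 0, 1]            # bin index per activation
weights = [0.5, -1.0, 2.0]                # shared (binned) weight values
assert conventional_mac(acts, idx, weights) == pasm_mac(acts, idx, weights)
```

The hardware benefit the paper claims follows from this restructuring: the inner loop needs only adders, and the small number of multiplies is deferred to a post-pass, which is why gate count and power drop at the cost of a slight latency increase.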

