The GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; the LoadStoreVectorizer should take advantage of this wider vector length and pack up to 16 dwords or 8 quadwords into a single access.
This is a re-submission.
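The element counts in the summary follow from simple arithmetic: 16 dwords of 32 bits each give a 512-bit access, which holds 8 quadwords of 64 bits. A minimal sketch of that calculation is shown here; it is not the code from this revision. The `AddrSpace` enum, the function names, and the 128-bit fallback for other address spaces are all hypothetical and chosen for illustration only.

```cpp
// Hypothetical address-space tags for illustration; these are not LLVM's enumerators.
enum class AddrSpace { Global, Constant, Other };

// 16 consecutive dwords x 32 bits = 512 bits, the width named in the summary.
constexpr unsigned kWideBits = 16 * 32;
// Assumed fallback width for other address spaces (illustrative, not taken from the patch).
constexpr unsigned kDefaultBits = 128;

// Widest load/store the vectorizer may form for a given address space (sketch).
constexpr unsigned vecRegBitWidth(AddrSpace AS) {
  return (AS == AddrSpace::Global || AS == AddrSpace::Constant) ? kWideBits
                                                                : kDefaultBits;
}

// Number of elements of ElemBits bits that fit into one such access.
constexpr unsigned maxPackedElements(AddrSpace AS, unsigned ElemBits) {
  return vecRegBitWidth(AS) / ElemBits;
}

// Compile-time checks matching the 16/8 figures in the summary.
static_assert(maxPackedElements(AddrSpace::Constant, 32) == 16, "16 dwords");
static_assert(maxPackedElements(AddrSpace::Global, 64) == 8, "8 quadwords");
```

Under these assumptions, a constant-address-space chain of i32 loads could be merged into at most 16 elements, and a chain of i64 loads into at most 8.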
Differential D44179
[AMDGPU] Widened vector length for global/constant address space.
Closed, Public. Authored by FarhanaAleen on Mar 6 2018, 4:11 PM.
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 5 others. (Mar 6 2018, 4:11 PM)
This revision is now accepted and ready to land. (Mar 6 2018, 4:15 PM)
Closed by commit rL326910: [AMDGPU] Increased vector length for global/constant loads. (authored by faaleen). (Mar 7 2018, 9:11 AM)
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 137289
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
lib/Target/AMDGPU/SIISelLowering.cpp
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
test/CodeGen/AMDGPU/load-constant-f32.ll
test/CodeGen/AMDGPU/load-constant-f64.ll
test/CodeGen/AMDGPU/waitcnt-looptest.ll