This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Allow vectorization of packed types
ClosedPublic

Authored by arsenm on May 1 2017, 4:11 PM.

Details

Reviewers
kzhuravl
cfang


Event Timeline

arsenm created this revision. May 1 2017, 4:11 PM
arsenm updated this revision to Diff 97478. May 2 2017, 11:05 AM

Add a few more tests

arsenm updated this revision to Diff 97490. May 2 2017, 12:54 PM

Add missing test file

arsenm updated this revision to Diff 100010. May 23 2017, 3:10 PM

Use new TTI hook

This revision is now accepted and ready to land. Jun 20 2017, 12:26 PM
cfang accepted this revision. Jun 20 2017, 1:14 PM
cfang added inline comments.
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp, line 261

Interleaving was disabled based on the SHOC DeviceMemory readLocalMemory test. We requested that CQE do a complete performance measurement around this, and the results with interleaving disabled were very positive. The main reason for disabling it was concern about register usage.

I remember re-measuring DeviceMemory performance later, when the new waitcnt insertion was introduced, and it turned out that enabling it makes no difference for DeviceMemory readLocalMemory!

Not sure about the other tests that CQE found to benefit from disabling it.

arsenm closed this revision. Jun 20 2017, 1:39 PM

r305844

test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll