This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Widen extending scalar loads to 32-bits
ClosedPublic

Authored by wdng on Jul 7 2017, 2:40 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng created this revision.Jul 7 2017, 2:40 PM
arsenm edited edge metadata.Jul 10 2017, 9:08 AM

Needs tests

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
448 ↗(On Diff #105700)

There's no point to having this since you never set it to anything else

453 ↗(On Diff #105700)

Ideally this would handle vectors like <2 x i8>

454 ↗(On Diff #105700)

The address space check should be first

458 ↗(On Diff #105700)

The builder already has a getInt32Ty

460 ↗(On Diff #105700)

I.getPointerOperand

468 ↗(On Diff #105700)

This should also skip volatile loads

wdng updated this revision to Diff 106894.Jul 17 2017, 9:55 AM
wdng marked 5 inline comments as done.
  1. Address code reviews. Looks like adding "isDereferenceableAndAlignedPointer" is too strong to prevent expected code transformations.
  2. Modify related LIT tests.
wdng marked an inline comment as done.Jul 17 2017, 10:09 AM

This needs a dedicated test

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
452 ↗(On Diff #106894)

This should be able to handle vectors. This should also use the DataLayout so it works for pointers

456 ↗(On Diff #106894)

getI32Ty is the wrong thing to use here

wdng added inline comments.Jul 17 2017, 2:51 PM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
452 ↗(On Diff #106894)

This one (VT && VT->getBitWidth() < 32) is able to handle vectors with bitwidth < 32 or scalar (!VT). Are you saying to use DataLayout to handle the pointer dereferenceable issue?

wdng updated this revision to Diff 107684.Jul 21 2017, 9:20 AM
wdng marked 2 inline comments as done.

Address code reviews.

wdng marked an inline comment as done.Jul 21 2017, 9:21 AM
arsenm added inline comments.Jul 21 2017, 9:55 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
449 ↗(On Diff #107684)

No space after *

451 ↗(On Diff #107684)

M->getDataLayout()

462–475 ↗(On Diff #107684)

You don't need an entire block of code that is mostly the same for the vector case. The non-vector case requires a bitcast as well.

test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
3 ↗(On Diff #107684)

These should be running just this pass with opt

12 ↗(On Diff #107684)

This should not be converted because you don't know if it's 4 byte aligned

48 ↗(On Diff #107684)

Needs tests with half and <2 x half>, i1, and maybe another exotic size. I'm pretty sure this will assert for half as is now. Also need tests with various alignments and volatile

arsenm added inline comments.Jul 21 2017, 9:56 AM
test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
48 ↗(On Diff #107684)

Also would be good to have a test specifically loading from the dispatch packet like happens in the workitem ID calculation

kzhuravl added inline comments.Jul 21 2017, 10:10 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
42 ↗(On Diff #107684)

Included twice.

wdng updated this revision to Diff 107736.Jul 21 2017, 3:42 PM
wdng marked 6 inline comments as done.

Address code reviews.

wdng marked 2 inline comments as done.Jul 21 2017, 3:42 PM
arsenm added inline comments.Jul 24 2017, 10:14 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
455 ↗(On Diff #107736)

Move align check before DA. It would also be better to move the datalayout alignment check into a helper function checked here

test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
5 ↗(On Diff #107736)

Don't use FUNC, you don't have this as a check prefix

24–27 ↗(On Diff #107736)

This isn't checking the relevant parts

133 ↗(On Diff #107736)

_f16 is the naming convention

145 ↗(On Diff #107736)

_v2f16

161 ↗(On Diff #107736)

Needs to check the type

wdng updated this revision to Diff 107960.Jul 24 2017, 1:52 PM
wdng marked 4 inline comments as done.

Address code reviews.

wdng marked an inline comment as done.Jul 24 2017, 1:53 PM
wdng updated this revision to Diff 107967.Jul 24 2017, 2:38 PM

Fixed a mistake for align2_i8 lit test.

wdng updated this revision to Diff 107982.Jul 24 2017, 3:47 PM

Upload correct diff file.

arsenm added inline comments.Jul 25 2017, 9:42 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
130–135 ↗(On Diff #107982)

The comment explains too specifically what it is doing, it should be describing intent and why.

test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
1 ↗(On Diff #107982)

These are IR check lines, so shouldn't use GCN. You also don't need or use the HSA check prefix

88–90 ↗(On Diff #107982)

This should check the integer type/operands

172 ↗(On Diff #107982)

Spelling _addrespace1

wdng updated this revision to Diff 108113.Jul 25 2017, 10:16 AM
wdng marked 4 inline comments as done.

Address code reviews.

arsenm added inline comments.Jul 26 2017, 1:29 PM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
238 ↗(On Diff #108113)

This doesn't actually do the widening, so the name should be something like canWidenScalarExtLoad

wdng updated this revision to Diff 108351.Jul 26 2017, 1:43 PM
wdng marked an inline comment as done.

Modified function name.

arsenm accepted this revision.Jul 26 2017, 1:52 PM

LGTM

This revision is now accepted and ready to land.Jul 26 2017, 1:52 PM
arsenm added inline comments.Jul 26 2017, 1:53 PM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
245 ↗(On Diff #108351)

This should really be I.isSimple(), in case it's atomic.

wdng marked an inline comment as done.Jul 26 2017, 2:08 PM
This revision was automatically updated to reflect the committed changes.