This is an archive of the discontinued LLVM Phabricator instance.

Fix invalid addrspacecast due to combining alloca with global var
ClosedPublic

Authored by yaxunl on Nov 30 2016, 2:33 PM.

Details

Summary

For function-scope variables with large initialisation list, FE usually generates a global variable to hold the initializer, then generates memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references.

However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, global variable cannot be in private addr space i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu).

To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of alloca until reaching the load instructions, then replaces load from alloca with load from the global variable. If during the chasing bitcast and GEP are encountered, new bitcast and GEP based on the global variable are generated and used in the load instructions.

A debug output is also added to amdgpu backend to facilitate debugging such issues.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl updated this revision to Diff 79811.Nov 30 2016, 2:33 PM
yaxunl retitled this revision from to Fix invalid addrspacecast due to combining alloca with global var.
yaxunl updated this object.
yaxunl added a subscriber: llvm-commits.
yaxunl updated this object.Nov 30 2016, 2:38 PM
yaxunl edited edge metadata.
majnemer edited edge metadata.Nov 30 2016, 4:15 PM

Please use auto * instead of auto when dealing with pointer types.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
240–241 ↗(On Diff #79811)

Formatting.

243 ↗(On Diff #79811)

Please format this appropriately.

244 ↗(On Diff #79811)

Don't use llvm:: here.

268–269 ↗(On Diff #79811)

Please brace this.

test/Transforms/InstCombine/memcpy-addrspace.ll
66 ↗(On Diff #79811)

Please fix.

yaxunl updated this revision to Diff 79913.Dec 1 2016, 8:14 AM
yaxunl edited edge metadata.
yaxunl marked 5 inline comments as done.

Revised by David's comments.

yaxunl updated this object.Dec 1 2016, 8:17 AM
yaxunl edited edge metadata.
majnemer added inline comments.Dec 7 2016, 12:05 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
238–240 ↗(On Diff #79913)

I'd make these cast<> instead of dyn_cast<> and remove PT and NT from the assert.

253–256 ↗(On Diff #79913)

Is it possible for you to create instructions which you never RAUW?

345–351 ↗(On Diff #79913)

Formatting looks funny.

arsenm added inline comments.Dec 8 2016, 11:49 AM
lib/Target/AMDGPU/SIISelLowering.cpp
37–39 ↗(On Diff #79913)

I think these will break -debug-only=isel so this should also be removed

2141–2142 ↗(On Diff #79913)

This should be removed

yaxunl added inline comments.Dec 8 2016, 12:28 PM
lib/Target/AMDGPU/SIISelLowering.cpp
2141–2142 ↗(On Diff #79913)

I can keep it to myself but without this debug output it is much difficult to debug the "invalid addrspacecast" issue.

arsenm added inline comments.Dec 8 2016, 12:31 PM
lib/Target/AMDGPU/SIISelLowering.cpp
2141–2142 ↗(On Diff #79913)

It already produces an error here which is easy to track down

yaxunl added inline comments.Dec 8 2016, 12:35 PM
lib/Target/AMDGPU/SIISelLowering.cpp
2141–2142 ↗(On Diff #79913)

but it does not tell you what SDNode causes it. Even if you break into it you still need to dump it to know what happens.

yaxunl marked 3 inline comments as done.Dec 8 2016, 2:00 PM
yaxunl added inline comments.
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
253–256 ↗(On Diff #79913)

Yes, but then these instructions will be removed. The lit test checks that.

yaxunl updated this revision to Diff 80822.Dec 8 2016, 2:04 PM
yaxunl marked an inline comment as done.

Revised by David's and Matt's comments.

majnemer added inline comments.Dec 8 2016, 2:07 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
253–256 ↗(On Diff #79913)

It is bad form to create new instructions in instcombine if you aren't sure you will want to keep them. Please keep in mind that IC is heavily relied upon and shows up several times in the pass pipeline.

There are two different things going on in this patch: a correctness fix and an optimization. How about we split out the correctness fix out to another patch?

yaxunl marked 2 inline comments as done.Dec 9 2016, 7:42 AM

There are two different things going on in this patch: a correctness fix and an optimization. How about we split out the correctness fix out to another patch?

The correctness-only fix will generate less optimal code which cause a regression in another lit test.

I have a fix for the issue you mentioned above.

yaxunl updated this revision to Diff 81114.Dec 12 2016, 11:28 AM

Revised so that no useless instructions generated during instruction combination.

Hi David,

Do I still need to split this patch into two: one for disabling the optimisation to fix the functional issue, and the second to add the new optimization?

Thanks.

rivanvx added a subscriber: rivanvx.Feb 9 2017, 7:32 AM

Whoa, this seems to fix the remaining part of https://bugs.freedesktop.org/show_bug.cgi?id=99552
Reviewing.

rivanvx accepted this revision.Feb 9 2017, 11:22 AM

To the extent of my knowledge and testing, this looks good.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
256 ↗(On Diff #81114)

If the first user isn't an instruction, what guarantees that others aren't? How come there isn't a continue here instead?

287–288 ↗(On Diff #81114)

Does the order matter here? If not, can you make it consistent with other two cases?

This revision is now accepted and ready to land.Feb 9 2017, 11:22 AM
yaxunl added inline comments.Feb 9 2017, 12:54 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
256 ↗(On Diff #81114)

The function is recursive. It tries to iterate through the users and then the users of users, so long so forth, to find a chain of GEP/casts that ends with a load, then try to replace the uses of the load with the load of another pointer. The replacing only happens when we reach the load instruction.

For example, we want to replace a with aa, which are pointers to the same pointee type in different addr space:

b = bitcast a
c = GEP b, 1
d = load c
// uses of d

so we create new casts/GEPs/loads

bb = bitcast aa
cc = GEP bb, 1
dd = load cc
// replace uses of d with dd

If the first user is not an instruction, that means this will not end up as a chain of instructions, so we will just return so that the recursive function will try the next path.

287–288 ↗(On Diff #81114)

The handling of load instructions is different from the handling of GEP/casts. For example, we want to replace a with aa

b = cast a
// uses of b
c = load b
// uses of c

We will create new casts following the uses of a,

bb = cast aa
cc = load bb
// replaces uses of c with uses of cc

However, we should not replace uses of b with bb since they have different address space therefore different type. Also we may not be able to remove b since it may have other users. However, since one inst combine pass contains multiple iterations, it will be removed in the next iteration if it does not have any users.

This revision was automatically updated to reflect the committed changes.