Page MenuHomePhabricator

Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address

Authored by wengxt on Jun 16 2015, 12:26 PM.



This is done by first adding two additional instructions to convert the
alloca returned address to local and convert it back to generic. Then
replace all uses of alloca instruction with the converted generic
address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine
the generic addresscast and the corresponding Load, Store, Bitcast, GEP
Instruction together.

Patched by Xuetian Weng (

Diff Detail

Event Timeline

wengxt updated this revision to Diff 27779.Jun 16 2015, 12:26 PM
wengxt retitled this revision from to Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address.
wengxt updated this object.
wengxt edited the test plan for this revision. (Show Details)
wengxt added reviewers: jingyue, jholewinski.
wengxt added a subscriber: Unknown Object (MLST).

This will depends on D10482 to eliminate alloca across the addrspacecast boundary, otherwise there maybe performance regression because SROA will not work properly.

Though moving SROA before NVPTXLowerAlloca could also resolve the issue, but NVPTXLowerKernelArgs, NVPTXLowerAlloca, NVPTXFavorNonGenericAddrSpaces will merge someday, so it's better to fix this issue in SROA.

jingyue added inline comments.Jun 16 2015, 12:48 PM

The comment here is imprecise. You're not adding "a cast to local"; you are adding a *pair* of addrspacecasts.


I'd mention the end result -- what the PTX looks like after this change, because that's what you are optimizing for.


Can this pass be a BasicBlockPass? Looks like all the changes are intra-BB.



for (auto &B : F) {
  for (auto &I : B) {

unless you invalidate the iterators.


Similarly, I think there's a uses() that returns a range?


and BitCast. Can you comment in code why you only check these types? What would happen if you replace all uses?


Is this %SP + 0 removed due to your change? If so, add a CHECK-NOT in the test.

jholewinski edited edge metadata.Jun 16 2015, 1:01 PM

Other than the issues raised by Jingyue, this looks good to me! Thanks!

jingyue edited edge metadata.Jun 16 2015, 1:04 PM
jingyue added subscribers: eliben, broune, meheff.

feel free to throw in comments

wengxt updated this revision to Diff 27808.Jun 16 2015, 6:48 PM
wengxt edited edge metadata.

Try to address jingyue's comments.

For line 77 comment, the use iterator may be invalidated by changing
operands below.

for call-with-alloca-buffer.ll comment The %SP + 0 is not a critical
check, and it can appear in other position depending on some other
pass and instruction, so this check is removed and replaced by check
the dest register of instruction.

The updated patch also move NVPTXLowerAlloca and NVPTXFavorNonGenericAddrSpaces
after SROA pass. So we don't need to depends on D10482 for now. This
can be changed back depending on how D10482 goes.

jingyue accepted this revision.Jun 17 2015, 12:16 PM
jingyue edited edge metadata.

LGTM with some minors


; emits st.u32

just to show the difference from the optimized version


"Check Load, Store, GEP and BitCast Uses"


"For other types"


The first part of this comment still makes sense, right?

This revision is now accepted and ready to land.Jun 17 2015, 12:16 PM
wengxt updated this revision to Diff 27875.Jun 17 2015, 2:48 PM
wengxt edited edge metadata.

Fix issues according to jingyue's comments

jingyue added inline comments.Jun 17 2015, 3:19 PM

What are these "//continue"?

wengxt updated this revision to Diff 27880.Jun 17 2015, 3:22 PM

sorry for mistake..

jingyue updated this object.
jingyue closed this revision.Jun 17 2015, 3:35 PM