This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix address space handling when removing allocas
AbandonedPublic

Authored by jprice on Jan 4 2017, 10:03 AM.

Details

Reviewers
majnemer
Summary

This fixes the bug(s) originally described on llvm-dev in November (with some more discussion in the last couple of days):
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106955.html

As described by @mehdi_amini in his reply, the approach here is to recursively propagate the address space of the global through the allocas uses, instead of introducing a potentially illegal addrspacecast.

Apologies for any obvious issues with either the approach or the style, I'm a first time contributor to LLVM. One thing in particular I'm not sure about is where I had to modify test2 to check for the nonnull attribute as well - I couldn't work out how to correctly retain the original function attributes.

Event Timeline

jprice updated this revision to Diff 83067.Jan 4 2017, 10:03 AM
jprice retitled this revision from to [InstCombine] Fix address space handling when removing allocas.
jprice updated this object.
jprice added a reviewer: majnemer.
jprice added a subscriber: mehdi_amini.
jprice added a subscriber: llvm-commits.
jprice updated this revision to Diff 83108.Jan 4 2017, 12:41 PM

Fix memtransfer function type deduction

arsenm added a subscriber: arsenm.Jan 5 2017, 10:00 AM
arsenm added inline comments.
lib/Transforms/InstCombine/InstCombineInternal.h
447

SmallVector

471

SmallVector

472

nullptr

477–480

You should be able to do all of this in the initial constructor

487–488

This can be a C array

504–506

I think you'll also need to handle AtomicRMW and cmpxchg

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
62

35 is a strange choice for vector size

arsenm added inline comments.Jan 5 2017, 10:02 AM
lib/Transforms/InstCombine/InstCombineInternal.h
444

This could also be a utility somewhere more general

jprice updated this revision to Diff 83374.Jan 6 2017, 9:15 AM
jprice marked 6 inline comments as done.
  • Address review comments from arsenm
lib/Transforms/InstCombine/InstCombineInternal.h
444

I did wonder this, but currently this function makes several assumptions that are specific to the particular optimisation that uses it (e.g. lack of handling for certain instructions as per comment below). I think it may need a fair bit more work to make it more general purpose.

504–506

At the moment, the isOnlyCopiedFromConstantGlobal method doesn't allow this optimisation to occur if these instructions are present.

Ping.

I don't know if it's too late, but I'd love for this to make it into the 4.0 release. Although this bug has been present for a while so perhaps isn't strictly a regression, there is a regression in downstream projects that I'm working on due to (valid) changes in the IR generated by Clang since 3.9 which expose this bug.

Is this optimization safe in general if the address spaces don't match? Could, for example, some of the users being affected be some target-specific intrinsic that is only valid for certain address spaces?

lib/Transforms/InstCombine/InstCombineInternal.h
502

llvm_unreachable

jprice updated this revision to Diff 85837.Jan 25 2017, 4:23 PM
jprice marked an inline comment as done.
  • Handle different global and function arg address spaces properly
  • Use llvm_unreachable instead of assert(false)
  • Rebase to current trunk

Is this optimization safe in general if the address spaces don't match? Could, for example, some of the users being affected be some target-specific intrinsic that is only valid for certain address spaces?

This optimisation will always preserve the original address space of parameters passed to any call instructions, so I don't think this is an issue. I did just notice that this wasn't handled correctly by this patch in certain cases, so I've made sure that this is now the case (and added an appropriate test).

Ping.

I don't know if it's too late, but I'd love for this to make it into the 4.0 release.

I don't believe it is too late at this point.

@arsenm, since you started reviewing, do you mind getting another look at it? (I also know that AMDGPU is a heavy user of address space)

A few inline comments

lib/Transforms/InstCombine/InstCombineInternal.h
442

Nit: Don't repeat the name of the function in the comment, we use to do this but we changed the doxygen settings some time ago and this is deprecated.

444

Can you turn this into a work-queue iterative loop? I'm worried about unbounded recursion.

445

Why not shortcut here:

If (OldPtr->getType()->getPointerAddressSpace() == NewAddrSpace) {
  replaceInstUsesWith(OldPtr, NewPtr);
  return;
}
448

Add a comment why you need to copy?

449

I haven't figured why it isn't a for-range loop?

457

Shouldn't this be if (NewPtr->getType() == ...?
(If so, you're missing a test to cover this)

479

Is it needed for loads? (Maybe because of typeless pointers?)

499

I'm worried about the validity of this. I would expect passing a pointer to a function to block this transformation when the address space mismatch.

505

Nit: Do not use braces for trivial blocks.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
62

Document it.

103

It seems that you will return false if there is any mismatch (line 168).

Do we need a vector? Can't we just keep one AS and return false as soon as we find a mismatch here?

165

Nit: period to end sentence.

314

Spurious change

jprice updated this revision to Diff 85916.Jan 26 2017, 8:24 AM
jprice marked 11 inline comments as done.
  • Address review comments from mehdi_amini

Thanks for the feedback, responses inline.

lib/Transforms/InstCombine/InstCombineInternal.h
445

This breaks certain cases where OldPtr is the source to an addrspacecast which may result in casting NewPtr to an address space that it shouldn't.

We can however shortcut inside the loop over users, as long as we handle addrspacecast instructions first. I've done this in the latest patch.

457

You're right.

There is a test that covers this case (test11), but since this check just results in a shortcut, it was passing anyway (by introducing an unnecessary NOP cast which was nuked anyway).

499

It does indeed block the transformation, unless we know we can use an addrspacecast because there is already one being used on the global when it is passed as a source to the memcpy. I've added a comment here to make this assumption clear.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
103

I'm not sure I see how this would work. There are either one or two valid address spaces that can be used for function arguments (the source and raw source of the memory transfer that we are trying to remove). We don't know whether we will see the memory transfer first, or the function calls that use the alloca.

jprice updated this revision to Diff 86416.Jan 31 2017, 6:11 AM
  • Remove accidental debug statement
jprice abandoned this revision.Feb 24 2017, 12:49 PM

This patch is now obsolete - the core issue has now been fixed by D27283. While that fix is a little more conservative (i.e. it doesn't replace the alloca when it sometimes could), it's good enough to fix all of the issues that I was experiencing in my downstream projects, so I'm happy to leave it at that.

Closing.