This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Enhance cast-of-load->load-of-bitcast fold to handle multiple identical casts
AbandonedPublic

Authored by lebedev.ri on Mar 3 2020, 12:33 AM.

Details

Reviewers
spatel
nikic
jfb
Summary

Seen in real code, although not sure it's really impactful.
Currently the fold is limited to the case where there is a single use of load,
which is a cast, which generally makes sense. But if originally have
several identical casts in different BB's, and don't decide to hoist
them into some one BB, we won't CSE them, and the fold won't happen..

Instead, let's be smart, and let's simply natively handle the case
where all uses of load are identical casts.

Diff Detail

Event Timeline

lebedev.ri created this revision.Mar 3 2020, 12:33 AM

This feels like we're pushing the limits of instcombine. What if we view this as a SimplifyCFG problem instead: for any no-op cast or other zero-cost instruction with operand defined in a different BB, hoist the cast up to its operand's BB because that helps CSE and instcombine?

Hm, why SimplifyCFG?
I'm aware that we (still) don't have specialized hoisting/sinking pass..

nikic added inline comments.Mar 3 2020, 12:23 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
589

This should use replaceInstUsesWith().

jfb added a comment.Mar 3 2020, 12:37 PM

Please add tests for volatile loads.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
561

This comment is pretty obvious, I'd rather not have it.

563

Same here.

571

This seems like a pre-existing bug: you should bail on volatile as well. For example, if user code does a volatile load of int32 and then casts to float, we should probably try to emit an integer load (even if that's not guaranteed by the language).

576

Does this bail on address space casts to different address spaces (when the value is a pointer)?

lebedev.ri marked 2 inline comments as done.Mar 3 2020, 12:43 PM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

I can do that, but could you point me to the appropriate langref wording?

576
/// This function determines if the CastInst does not require any bits to be
/// changed in order to effect the cast. Essentially, it identifies cases where
/// no code gen is necessary for the cast, hence the name no-op cast.  For
/// example, the following are all no-op casts:
/// # bitcast i32* %x to i8*
/// # bitcast <2 x i32> %x to <4 x i16>
/// # ptrtoint i32* %x to i32     ; on 32-bit plaforms only
/// Determine if the described cast is a no-op.
bool CastInst::isNoopCast(Instruction::CastOps Opcode,
                          Type *SrcTy,
                          Type *DestTy,
                          const DataLayout &DL) {
  switch (Opcode) {
    default: llvm_unreachable("Invalid CastOp");
    case Instruction::Trunc:
    case Instruction::ZExt:
    case Instruction::SExt:
    case Instruction::FPTrunc:
    case Instruction::FPExt:
    case Instruction::UIToFP:
    case Instruction::SIToFP:
    case Instruction::FPToUI:
    case Instruction::FPToSI:
    case Instruction::AddrSpaceCast:
      // TODO: Target informations may give a more accurate answer here.
      return false;
    case Instruction::BitCast:
      return true;  // BitCast never modifies bits.
    case Instruction::PtrToInt:
      return DL.getIntPtrType(SrcTy)->getScalarSizeInBits() ==
             DestTy->getScalarSizeInBits();
    case Instruction::IntToPtr:
      return DL.getIntPtrType(DestTy)->getScalarSizeInBits() ==
             SrcTy->getScalarSizeInBits();
  }
}
589

I suppose it would be good to eradicate usages of "wrong" spellings so we don't accidentally use the "deprecated" one.

Hm, why SimplifyCFG?
I'm aware that we (still) don't have specialized hoisting/sinking pass..

Time to change that? :)
I was just grepping around in SimplifyCFG and saw that it has hoisting transforms gated by instruction costs, so I thought maybe there's a way to squeeze this into the existing code, but I know that's a stretch too.

lebedev.ri marked an inline comment as done.Mar 3 2020, 1:25 PM

Hm, why SimplifyCFG?
I'm aware that we (still) don't have specialized hoisting/sinking pass..

Time to change that? :)

In principle - hell yeah :)

But i'm not sure i have clear understanding how that would look,
so for now i'd settle for either just not touching the pattern,
or doing this conservative instcombine-level change.

I was just grepping around in SimplifyCFG and saw that it has hoisting transforms gated by instruction costs, so I thought maybe there's a way to squeeze this into the existing code, but I know that's a stretch too.

jfb added inline comments.Mar 3 2020, 1:35 PM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

I don't think we're super clear on what we guarantee, but knowing people who use volatile: when they say "load an integer and cast it to a float" they expect to do an integer, followed by an integer->float register transfer. That's a reasonable expectation IMO.

lebedev.ri marked 4 inline comments as done.Mar 4 2020, 2:09 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

People also expect UB to be ignored by compiler.
Unless it's spelled out in langref i don't believe i know why this should be changed.

jfb added inline comments.Mar 4 2020, 10:58 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

That's a user-hostile approach to compiler optimizations. It's not a useful thing to optimize, and developers don't have a way to express this, so they'd have to turn to assembly. We could simply be nice and do what's trivial for us to do.

lebedev.ri marked an inline comment as done.Mar 4 2020, 11:24 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

That is not what i'm saying :)

You have either pointed out a miscompile, or not. If it's a miscompile,
then someone knowledgeable (you) should document it [in langref]
for future reference, to avoid having this disscussion in the future
and simplify the life of whoever would want to be aware of it later.

I don't care either way, i just don't want to have such undocumented
limitation/guaranteed spelled out in the code and not in documentation.

lebedev.ri marked an inline comment as done.Mar 5 2020, 12:05 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
570–573

Ah, i see why volatile is currently not being transformed.

lebedev.ri updated this revision to Diff 248767.Mar 6 2020, 9:45 AM
lebedev.ri marked 7 inline comments as done.

Review notes addressed, PTAL.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
571

Tests are being added by D75644, this fold is apparently not happening
for volatile loads and i'm not planning on changing that.
Nothing further to do.

jfb added a comment.Mar 6 2020, 10:09 AM

Can you add a few tests for this specifically, both with identical subsequent bitcasts as well as with mixed ones?

In D75505#1910065, @jfb wrote:

Can you add a few tests for this specifically, both with identical subsequent bitcasts as well as with mixed ones?

I'm not sure i'm following.
There are negative tests for this in llvm/test/Transforms/InstCombine/multi-use-load-casts.ll.
On top of those, what other tests would you like to see?

In D75505#1910065, @jfb wrote:

Can you add a few tests for this specifically, both with identical subsequent bitcasts as well as with mixed ones?

I'm not sure i'm following.
There are negative tests for this in llvm/test/Transforms/InstCombine/multi-use-load-casts.ll.
On top of those, what other tests would you like to see?

up

@spatel could you please specify if you don't want this in instcombine

@spatel could you please specify if you don't want this in instcombine

If the consensus is that this is good, I'm not going to block it, but...

  1. Loads are usually between 10-25% of all instructions, so walking the users of every load has potential to substantially increase compile-time.
  2. According to the summary text, we don't have practical evidence of the benefit.
  3. We agree that we could do hoisting/sinking more generally and more efficiently in a stand-alone pass.

So I'd prefer an alternative route or at least some preliminary perf data if that's feasible.

I was just rummaging around in CGP and noticed that it has CodeGenPrepare::tryToSinkFreeOperands(). Maybe we could move that up as a utility function and extend it to try hoisting too?

lebedev.ri abandoned this revision.Aug 12 2020, 2:26 PM