This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable reassociation for ADD instructions
ClosedPublic

Authored by Carrot on Oct 20 2022, 5:07 PM.

Details

Summary

ADD is an associative and commutative operation, so we can do reassociation for it.

Diff Detail

Event Timeline

Carrot created this revision.Oct 20 2022, 5:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 5:07 PM
Carrot requested review of this revision.Oct 20 2022, 5:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 5:07 PM
pengfei added inline comments.Oct 20 2022, 11:56 PM
llvm/test/CodeGen/X86/reassociate-add.ll
4

No idea if we intended to not do reassociation in ADD instructions.
Is there problem when unexpected overflow/underflow may be generated during reassociation?

RKSimon added inline comments.Oct 21 2022, 6:49 AM
llvm/test/CodeGen/X86/reassociate-add.ll
2

Drop -mcpu=x86-64

Pre-commit these test with current codegen to trunk and rebase to show the patch diffs.

4

Not that I can think of - EFLAGS will be the same and we already handles the $dst=$src0 constraint

llvm/test/CodeGen/X86/xmulo.ll
1871

Annoying - normally I'd expect reassociation to help us reduce stack use

spatel added inline comments.Oct 21 2022, 7:01 AM
llvm/test/CodeGen/X86/reassociate-add.ll
4

I was working on this a long time ago (~2015), so it's hard to remember exactly, but I don't think there was a fundamental reason to exclude integer ADD.

It just seemed like it did not have much potential gain with the limited register set and could interfere with other transforms like LEA formation.

If there's evidence that this improves something (and doesn't cause regressions), then it should be ok.

xbolva00 added inline comments.
llvm/test/CodeGen/X86/lea-opt-cse4.ll
105–111

New leas, +1

Carrot added inline comments.Oct 21 2022, 11:14 AM
llvm/test/CodeGen/X86/reassociate-add.ll
4

EFLAGS may be different. Suppose we are adding three bytes, 250 + 10 + 10.

  • If we add them in the order (250 + 10) + 10, the first ADD generates carry/overflow flags, the second ADD doesn't generate carry/overflow.
  • If we add them in the order (10 + 10) + 250, the first ADD doesn't generates carry/overflow flags, the second ADD generates these flags.

So we need to check if the definition of EFLAGS is dead.

Maybe this is the reason @spatel didn't add ADD instructions in 2015.

Thank @pengfei for the reminds.

craig.topper added inline comments.Oct 21 2022, 11:15 AM
llvm/test/CodeGen/X86/reassociate-add.ll
4

Scalar MUL would have the same EFLAGs issue. So hopefully we already solved it?

Carrot added inline comments.Oct 21 2022, 11:51 AM
llvm/test/CodeGen/X86/reassociate-add.ll
4

Indeed. So IMULrr should be handled similar to ADD. But I can't find any code for it :(

craig.topper added inline comments.Oct 21 2022, 11:55 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
8700

Here is the EFLAGS code

Carrot added inline comments.Oct 21 2022, 1:12 PM
llvm/lib/Target/X86/X86InstrInfo.cpp
8700

Thanks for the pointer. So now EFLAGS is not a problem for ADD.

Carrot added inline comments.Oct 21 2022, 3:01 PM
llvm/test/CodeGen/X86/reassociate-add.ll
2

Test case has been sent out as https://reviews.llvm.org/D136501.

-mcpu=x86-64 is required because in MachineCombiner the reassociation is performed either the new sequence reduce latency, or function doSubstitute returns true. In its implementation, if -mcpu is not specified, the target will not have a valid schedule model, and then doSubstitute returns true, and reassociation is performed.

bool MachineCombiner::doSubstitute(unsigned NewSize, unsigned OldSize,
                                   bool OptForSize) {
  if (OptForSize && (NewSize < OldSize))
    return true;
  if (!TSchedModel.hasInstrSchedModelOrItineraries())
    return true;
  return false;
}
llvm/test/CodeGen/X86/xmulo.ll
1871

Agree it is annoying.
But I think the impact of reassociation to register allocation is quite randomly, it depends on if the operands can be killed by the first instruction, and later be scheduled earlier to shorten the operands' live range.

In this specific case, although it increases stack use, it reduces register spill/reload instructions.

$ git diff origin/main xmulo.ll | grep "+;" | wc
    104     518    3715
$ git diff origin/main xmulo.ll | grep "\-;" | wc
    108     542    3846
RKSimon added inline comments.Oct 22 2022, 1:06 AM
llvm/test/CodeGen/X86/reassociate-add.ll
2

OK - that makes sense!

4

Sorry, bad description - EFLAGS will be handled (we check they are ignored...) the same as the existing ops

Carrot updated this revision to Diff 470288.Oct 24 2022, 2:10 PM

Rebase test cases.

pengfei accepted this revision.Oct 24 2022, 6:45 PM

LGTM.

This revision is now accepted and ready to land.Oct 24 2022, 6:45 PM
This revision was automatically updated to reflect the committed changes.

Hey, this broke a Dexter test: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/47878/

@jmorse and/or @Orlando can you help make sure this isn't a Dexter issue?

Hey, this broke a Dexter test: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/47878/

@jmorse and/or @Orlando can you help make sure this isn't a Dexter issue?

Here is the patch https://reviews.llvm.org/D136778 waiting for review.

Enjoyably not a Dexter bug -- the output with -v indicates that there's a sequence in the "main" function of that test where the main_result variable becomes unavailable with this patch. That would appear to be because TargetInstrInfo::reassociateOps isn't instrumented for instruction referencing, which I'll file as a fault. This patch just exposes that omission.

Sadly though, this Dexter test isn't supposed to be checking for this sort of error, it's supposed to be making sure the other functions in the file have reasonable debug-info. The only reason we see this is because we didn't mark main with __attribute__((optnone)). I'll add that attribute and un-xfail the test as I'm out of cycles; thanks for pinging!

It seems this patch cause spec2017/548 performance regression, as it increased register pressure after reassociating ADD operation.

llvm/test/CodeGen/X86/mul-constant-i64.ll