Page MenuHomePhabricator

X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
ClosedPublic

Authored by MatzeB on Sep 30 2021, 11:19 AM.

Details

Summary

This extends optimizeCompareInstr to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example:

CMP x, 13
...
CMP x, 12   ; can be removed if we change the SETg
SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for x > C the fallthrough optimization switches true/false blocks with !(x > C) --> x <= C and canonicalization turns this into x < C + 1.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MatzeB requested review of this revision.Sep 30 2021, 11:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2021, 11:19 AM
MatzeB edited the summary of this revision. (Show Details)Sep 30 2021, 11:33 AM
foad added a subscriber: foad.Oct 1 2021, 6:20 AM
foad added inline comments.
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

Is it OK to use INT64_MIN >> Shift here? I think the standard says right shift of a negative value is implementation-defined.

MatzeB marked an inline comment as done.Oct 1 2021, 10:00 AM
MatzeB added inline comments.
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

Good question!

I think we can rely on two-complement arithmetic shift behavior here. I'm not aware of any compiler behaving differently, certainly not the one ones specified to be used for LLVM in the documentation: https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library

Other code like APInt seems to rely on this as well: https://github.com/llvm/llvm-project/blob/4f0225f6d21b601d62b73dce913bf59d8fb93d87/llvm/include/llvm/ADT/APInt.h#L796

That said if you have a simple alternative to this, I'd be happy to change the code...

MatzeB marked an inline comment as done.Oct 1 2021, 10:04 AM
MatzeB added inline comments.Oct 1 2021, 10:11 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

How about I add this:

static_assert(INT64_MIN >> 16 == INT32_MIN && INT64_MIN >> 24 == INT8_MIN,
              "expects compiler with twos complement right shift");

and then whoever has a whacky compiler will see the problem and can go fix it.

foad added inline comments.Oct 1 2021, 10:20 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

I was mostly worried a buildbot with the "undefined behaviour" sanitiser complaining about it -- though I'm not actually sure if ubsan detects this by default.

I don't really mind what fix you use, if any. The static_assert sounds fine to me. Alternatively you could rewrite the expression, e.g. as ~(INT64_MAX >> Shift), but it's much less obvious what it means.

MatzeB updated this revision to Diff 376576.Oct 1 2021, 11:02 AM
xbolva00 added inline comments.
llvm/test/CodeGen/X86/jump_sign.ll
133 ↗(On Diff #376576)

cmp now removed

RKSimon added inline comments.Oct 14 2021, 2:51 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

What about using APInt?

MatzeB added inline comments.Oct 14 2021, 10:42 AM
llvm/test/CodeGen/X86/jump_sign.ll
133 ↗(On Diff #376576)

Good catch. So:

  • I believe the comment to be outdated:
  • The new version of the code is correct; you see that previously it used a cmpl ecx, edx, but edx and eax contain the same value and the sub in bb.0 the only predecessor of bb.1 is just the reverse operation and will produce the reverse flags. So we can indeed remove the cmp when we reverse the user flags cmovle -> cmovl.
  • I accidentally put this test change onto the wrong commit in the stack. The test actually changes in D110862.
MatzeB added inline comments.Oct 14 2021, 11:01 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

What about using APInt?

Seemed a bit overengineered given the value fits neatly in an uint64_t, but sure I will change it to use an APInt then.

MatzeB added inline comments.Oct 14 2021, 11:02 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4493

Or let me factor out a common helper to shift an uint64_t that we can use in APInt and here.

MatzeB updated this revision to Diff 379823.Oct 14 2021, 1:35 PM

rebase; use APInt to compute maximum/minimum constants.

foad added inline comments.Oct 15 2021, 12:24 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4492

Surely you need APInt::getSignedMinValue(BitWidth)?

MatzeB added inline comments.Oct 15 2021, 11:02 AM
llvm/lib/Target/X86/X86InstrInfo.cpp
4492

Good point. Changed them now.

I just realized that technically this check isn't necessary as the tests I added still pass. This is because the ImmDelta calculations are actually done on int64_t values so they don't wrap around for 8/16/32 bit immediates and instead just overflow to an unrepresentable immediate. We would have overflow problems for 64bit immediates, but x86 doesn't support those.

Anyway I think I'll keep the tests here anyway as I like the expressed intent, and if someone ever introduces a 64bit immediates we are prepared :)

MatzeB updated this revision to Diff 380064.Oct 15 2021, 11:03 AM
MatzeB updated this revision to Diff 380065.
RKSimon accepted this revision.Oct 16 2021, 2:35 AM

LGTM - cheers

This revision is now accepted and ready to land.Oct 16 2021, 2:35 AM
hans added a subscriber: hans.Nov 3 2021, 8:56 AM

This seems to have caused a miscompile of Chromium, see https://bugs.chromium.org/p/chromium/issues/detail?id=1265339 (there's even a screenshot).

It appears that in a switch statement, one of the cases gets lost.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1265339#c38 for the attached IR, and how the codegen changes with this patch. Sadly it's not reduced, but at least it's stand-alone.

What we've got is something like:

cmp $21, reg
jle foo

foo:
cmp $20, reg
jle bar

bar:
je baz

I suspect what's happening is that this transformation figures the second cmp is redundant if it changes "jle bar" to "jl bar", but it doesn't take into account that the "je baz" was also depending on that second cmp.

MatzeB added a comment.Nov 3 2021, 9:01 AM

Thanks for reporting!

I suspect what's happening is that this transformation figures the second cmp is redundant if it changes "jle bar" to "jl bar", but it doesn't take into account that the "je baz" was also depending on that second cmp.

The optimization should have detected EFLAGS being live out of the bar block in your example and abort because there could potentially be more users (see line 4550).

Is there a way to reproduce this easily?

MatzeB added a comment.Nov 3 2021, 9:02 AM

Actually I think I see what is going wrong, let me attempt a fix in the next hours.

FWIW, we also saw a non-determinism issue as a result of this patch in a stage2 PGO'd build of clang.

MatzeB reopened this revision.Nov 3 2021, 10:22 AM
This revision is now accepted and ready to land.Nov 3 2021, 10:22 AM
MatzeB updated this revision to Diff 384505.Nov 3 2021, 10:25 AM
  • Fix EFLAGS live-out not triggering, even though we need to update CC flags for ImmDelta != 0.
  • While on it rename IsSafe to the more descriptive FlagsMayLiveOut.
  • Added new tests opt_adjusted_imm_multiple_blocks, opt_adjusted_imm_multiple_blocks_noopt.

I've reverted back to green in https://github.com/llvm/llvm-project/commit/a2a58d91e82db38fbdf88cc317dcb3753d79d492 until this can be fixed.

FWIW, we also saw a non-determinism issue as a result of this patch in a stage2 PGO'd build of clang.

I believe the new version of this diff fixes the reported issue. Do you have a way to confirm?

I've reverted back to green in https://github.com/llvm/llvm-project/commit/a2a58d91e82db38fbdf88cc317dcb3753d79d492 until this can be fixed.

FWIW, we also saw a non-determinism issue as a result of this patch in a stage2 PGO'd build of clang.

I believe the new version of this diff fixes the reported issue. Do you have a way to confirm?

In theory you should be able to reproduce the non-determinism issue, as it's just clang (nothing internal), but it may depend on some particular workflow we do internally. I'll patch it and report back.

I don't have a way to verify the miscompile Hans is talking about.

nathanchance added a subscriber: nathanchance.EditedNov 3 2021, 11:18 AM

I too saw issues with this patch with PGO (I was in the middle of writing up a reproducer when I saw Hans comment). My symptom was that stage 3 check-llvm would not pass, whereas stage 2 would. This version of the patch on top of 7277d2e1c86bf4d75321efc1195d88ade4bedfa1 does not have that same issue.

This version of the patch LGTM with regards to non-determinism

hans added a comment.Nov 3 2021, 12:29 PM

The reproducer I was using looks good with the latest version of the patch. Thanks!

MatzeB added a comment.Nov 3 2021, 2:07 PM

Thanks for confirming! Will re-commit the new version then.

This revision was landed with ongoing or failed builds.Nov 3 2021, 2:13 PM
This revision was automatically updated to reflect the committed changes.
bgraur added a subscriber: bgraur.EditedDec 9 2021, 7:58 AM

We have root caused to this revision a mis-compile affecting AMD Rome machines.
I have attached a reproducer program that shows the problem when compiled with -O2 or -O3.

Please revert this change until you have a chance to look at the problem.

NOTE: the repro uses the abseil library that can be found at https://github.com/abseil/abseil-cpp. The reproducer might probably be reduced more (if needed) but it would be great if you could revert early.

Reproducer:

We have root caused to this revision a mis-compile affecting AMD Rome machines.
I have attached a reproducer program that shows the problem when compiled with -O2 or -O3.

Please revert this change until you have a chance to look at the problem.

NOTE: the repro uses the abseil library that can be found at https://github.com/abseil/abseil-cpp. The reproducer might probably be reduced more (if needed) but it would be great if you could revert early.

Reproducer:

Please post standalone reproducer, one that does not require guessing versions of external dependecies,
and the actual reproduction steps, as in the actual complete compilation and run command.

We have root caused to this revision a mis-compile affecting AMD Rome machines.
I have attached a reproducer program that shows the problem when compiled with -O2 or -O3.

Please revert this change until you have a chance to look at the problem.

NOTE: the repro uses the abseil library that can be found at https://github.com/abseil/abseil-cpp. The reproducer might probably be reduced more (if needed) but it would be great if you could revert early.

Reproducer:

Please post standalone reproducer, one that does not require guessing versions of external dependecies,
and the actual reproduction steps, as in the actual complete compilation and run command.

Reduced reproducer:

clang -O3 -std=gnu++17 -fsized-deallocation -x c++ reduced.cc -lc++abi -pthread -static -o repro

./repro is expected to always return 0.
It will return 0 on intel machines and 186 on amd when built with clang at this revision.
It will return 0 on all machines when built with clang a previous version.

Please revert and let's work together after that if you need more info.

Thanks for investigating and root-causing! I’m currently on vacation, please revert in the meantime.

Thanks for investigating and root-causing! I’m currently on vacation, please revert in the meantime.

There appears to be one change:
https://gcc.godbolt.org/z/TfKrs8M18

llvm13:

shlq $32, %rax
testq %rax, %rax
je .LBB5_11
jle .LBB5_11

(current) trunk:

shlq $32, %rax
je .LBB5_11
jle .LBB5_11

Thanks for investigating and root-causing! I’m currently on vacation, please revert in the meantime.

There appears to be one change:
https://gcc.godbolt.org/z/TfKrs8M18

llvm13:

shlq $32, %rax
testq %rax, %rax
je .LBB5_11
jle .LBB5_11

(current) trunk:

shlq $32, %rax
je .LBB5_11
jle .LBB5_11

The jle would use the OF flag which isn't defined for shifts of more than 1. And even then doesn't have the same meaning.

An LGTM on the revert would be greatly appreciated.

RKSimon reopened this revision.Dec 10 2021, 10:35 AM
This revision is now accepted and ready to land.Dec 10 2021, 10:35 AM
RKSimon requested changes to this revision.Dec 10 2021, 10:35 AM
This revision now requires changes to proceed.Dec 10 2021, 10:35 AM
MatzeB updated this revision to Diff 397268.Jan 4 2022, 4:49 AM

Fix incomplete condition (again) on when to abort on eflags being live-out of the block. This fixes the problem reported by @bgraur

RKSimon accepted this revision.Jan 10 2022, 9:59 AM

LGTM - cheers

This revision is now accepted and ready to land.Jan 10 2022, 9:59 AM
RKSimon closed this revision.Jan 26 2022, 5:51 AM

Closing - this was pushed again at rGad25f8a556d239d8b7d17383cf1a0771359521fd