This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] convert sub from (Pow2-1) constant into xor
AbandonedPublic

Authored by spatel on Jun 17 2022, 11:38 AM.

Details

Summary

An xor with constant should have equivalent perf as sub-from-constant on any target (or better perf in the case that the sub constant is harder to materialize and use).

This is translated from an existing fold in InstCombine, but the pattern can show up in codegen as shown in a variety of test diffs.

Alive2 proof:
https://alive2.llvm.org/ce/z/8CgggY

This might help remove some diffs from D127115. We don't see test diffs on basic x86 examples because this fold already exists somewhere later in that target's codegen, but that may be too late to combine with other transforms.

Diff Detail

Event Timeline

spatel created this revision.Jun 17 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2022, 11:38 AM
spatel requested review of this revision.Jun 17 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2022, 11:38 AM
RKSimon accepted this revision.Jun 18 2022, 3:52 AM

LGTM - I'd be interested to know if this will allow us to remove the x86 specific fold?

This revision is now accepted and ready to land.Jun 18 2022, 3:52 AM

It doesn't looks like it is helping D127115, but it looks useful on its own.

It doesn't looks like it is helping D127115, but it looks useful on its own.

Yes, I stepped through one of those tests, and we'd have to do something else...I'm not sure what that is yet.

I also tracked down where this happens in x86 codegen:
https://github.com/llvm/llvm-project/blob/cd64a427efa0baaf1bb7ae624d4301908afc07f7/llvm/lib/Target/X86/X86InstrCompiler.td#L1535

So this fold and the InstCombine that I copied from are actually over-constrained. We don't need a low-bit-mask constant:
https://alive2.llvm.org/ce/z/OUm6N_

I'll add a TODO comment for now. I can add more tests and generalize the transforms as a follow-up.

spatel added subscribers: craig.topper, arsenm.

I also tracked down where this happens in x86 codegen:
https://github.com/llvm/llvm-project/blob/cd64a427efa0baaf1bb7ae624d4301908afc07f7/llvm/lib/Target/X86/X86InstrCompiler.td#L1535

So this fold and the InstCombine that I copied from are actually over-constrained. We don't need a low-bit-mask constant:
https://alive2.llvm.org/ce/z/OUm6N_

And digging further, that change was added with D48557, but that was a target-limited form of the original patch D48529...and that was the better version of this patch! :)

I applied that patch today, and the PowerPC regression is no longer present, but there's a different test with an extra load immediate. That seems like a minor regression that could be fixed easily.

The AMDGPU diff seen here was considered a regression then, so I'm assuming that's still true now.

There are now RISCV tests with diffs that look like real improvements in test/CodeGen/RISCV/atomic-signext.ll and /test/CodeGen/RISCV/atomic-rmw.ll:

-; RV32IA-NEXT:    li a5, 24
-; RV32IA-NEXT:    sub a3, a5, a3
+; RV32IA-NEXT:    xori a3, a3, 24

@arsenm @craig.topper - given that we have almost this transform in IR as a canonicalization, do we still want to make this target-specific for DAGCombiner?

llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
112

This diff was called a code-size regression in D48529.

RKSimon added a subscriber: foad.Jun 18 2022, 7:14 AM
RKSimon added inline comments.
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
112

@foad Is this still an issue? Not sure if its relevant, but does the loss of the offset:65535 compensate?

I put up D128123 as the potential better/final patch, so we can see the current test diffs.

foad added inline comments.Jun 20 2022, 1:22 AM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
112

It's still a code size regression and the loss of the offset:65535 does not compensate. Also AMDGPU has a decent set of subrev instructions so I don't think this transform is of any use.

spatel abandoned this revision.Jun 20 2022, 3:27 AM

Abandoning - looks like this will need a target hook of some kind along with the more general transform in D128123.