This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SelectionDAG] Add target-specific implementation of srem
ClosedPublic

Authored by bcl5980 on Apr 2 2022, 12:06 AM.

Details

Summary
  1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first.
  2. Add AArch64 faster path for SREM only pow2 case

Fix https://github.com/llvm/llvm-project/issues/54649

Diff Detail

Event Timeline

bcl5980 created this revision.Apr 2 2022, 12:06 AM
bcl5980 requested review of this revision.Apr 2 2022, 12:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2022, 12:06 AM
bcl5980 updated this revision to Diff 420710.Apr 5 2022, 10:19 PM

add test case srem 2

efriedma added inline comments.Apr 12 2022, 11:41 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13591

Probably worth a comment explaining this. Lg2 == 0 folds to a constant. Not obvious why we're special-casing Lg2 == (VT.getScalarSizeInBits() - 1); does this sequence not work somehow?

llvm/test/CodeGen/AArch64/srem-lkk.ll
171 ↗(On Diff #420710)

Maybe we also want some tests for i16?

Please post a separate patch to add the new tests, so it's easier to compare the impact. And actually, while you're at it, maybe stick all the power-of-two tests in a separate file with a better name.

llvm/test/CodeGen/AArch64/srem-vector-lkk.ll
188

I think this somehow ended up shorter with D122829? That might be fine, though; it looks like this version avoids the expensive add-with-shift instructions.

efriedma added inline comments.Apr 12 2022, 11:44 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4325

I'm not sure I like the modified API for visitSDIVLike? It's sort of weird that the return value is either the quotient or the remainder, depending on the value of BuildRem. And that BuildRem is both an input and an output.

bcl5980 added inline comments.
llvm/test/CodeGen/AArch64/srem-lkk.ll
171 ↗(On Diff #420710)

separate patch: D123671
@efriedma Can you help to review?

bcl5980 added inline comments.Apr 13 2022, 4:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4325

Yeah, this part is ugly. I will refactor this part today

bcl5980 updated this revision to Diff 422491.Apr 13 2022, 5:52 AM
  1. refactor visitSDIVLike
  2. remove Lg2 == (VT.getScalarSizeInBits() - 1)
  3. add srem-pow2.ll, will rebase after D123671 land

Please ignore the check in rGe2d77a160c5b. I made a mistake when check in another baseline.

bcl5980 updated this revision to Diff 422549.Apr 13 2022, 10:02 AM

rebase main

efriedma added inline comments.Apr 13 2022, 11:02 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

Do we need to do something special for something like the following, to match the existing handling for div/rem pairs? I guess it's only a couple instructions different either way, but maybe worth considering.

define void @sdivrem(i32 %x, i32* %ap, i32* %bp) {
  %a = sdiv i32 %x, 4
  %b = srem i32 %x, 4
  store i32 %a, i32* %ap
  store i32 %b, i32* %bp
  ret void
}
23902

Fix this comment? Not sure how you compute srem "by right shifting"

bcl5980 added inline comments.Apr 13 2022, 9:29 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

For now I have no AArch64 device to test, do we have some tools to verify instructions similar to alive2.llvm.org/ce/ on AArch64?
If no maybe I should use qemu to build a test environment.

add a IR level proof for this change
https://alive2.llvm.org/ce/z/djRQRY

bcl5980 added inline comments.Apr 14 2022, 1:44 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

I use IR to write the logic. Is this the behavior you want?
https://alive2.llvm.org/ce/z/hGZBBk

bcl5980 added inline comments.Apr 14 2022, 2:40 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

Based on the IR to asm by llc, it looks sdiv/srem pair can't get benifit from this pattern.
https://godbolt.org/z/Pns5YvYfY

And it looks we lack dag combine from neg + cmp to negs based on the test.

efriedma added inline comments.Apr 14 2022, 11:38 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

I just meant that I wasn't sure if we would end up with the same code sequence as we would without this patch; I wasn't suggesting any specific transform. (I guess what happens might be sensitive to whether we visit the SDIV or the SREM first...)

bcl5980 added inline comments.Apr 14 2022, 9:12 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4582

I check the !DAG.doesNodeExist(ISD::SDIV, N->getVTList(), {N0, N1}) at line 4412, SDIV / SREM paire won't go into the srem only path

efriedma accepted this revision.Apr 15 2022, 12:53 PM

LGTM with one minor comment

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
422

Please rename this so it's clear it's not the primary visitor function for SREM. Maybe call it "buildOptimizedSREM" or something like that.

4582

Oh, I see; that's fine, then.

This revision is now accepted and ready to land.Apr 15 2022, 12:53 PM
bcl5980 updated this revision to Diff 423219.Apr 15 2022, 9:15 PM
bcl5980 marked 6 inline comments as done.

rebase and update comments

This revision was landed with ongoing or failed builds.Apr 15 2022, 9:30 PM
This revision was automatically updated to reflect the committed changes.

FYI, I’ve bisected a misoptimization down to this commit. I’ll follow up later with a proper repro…

FYI, I’ve bisected a misoptimization down to this commit. I’ll follow up later with a proper repro…

This commit causes misoptimizations of the following function:

int func(int val) {
    val += val % 2;
    return val;
}

Tested with a caller like this:

#include <stdio.h>
int func(int val);
int main(int argc, char* argv[]) {
    for (int i = -3; i <= 3; i++)
        printf("%d -> %d\n", i, (int)func(i));
    return 0;
}

Originally this function returns the following values:

-3 -> -4
-2 -> -2
-1 -> -2 
0 -> 0
1 -> 2
2 -> 2
3 -> 4

After this commit, it prints the following:

-3 -> -2
-2 -> -2
-1 -> 0 
0 -> 0
1 -> 0
2 -> 2
3 -> 2

Originally, the function was compiled into the following (with -O2):

func:
        cmp     w0, #0
        cinc    w8, w0, lt
        and     w8, w8, #0xfffffffe
        sub     w8, w0, w8
        add     w0, w8, w0
        ret

After this change, it gets incorrectly compiled into this:

func:
        and     w8, w0, #0x1
        cmp     w0, #0
        cneg    w8, w8, ge
        add     w0, w8, w0
        ret

After this change, it gets incorrectly compiled into this:

func:
        and     w8, w0, #0x1
        cmp     w0, #0
        cneg    w8, w8, ge

Should be 'csneg w8, w8, w8, ge' here

add     w0, w8, w0
ret

Thanks for the finding. I have no AArch64 device for now , so can you help to modify line 13600 to

SDValue Cmp = getAArch64Cmp(N0, Zero, ISD::SETGE, CCVal, DAG, DL);
bcl5980 updated this revision to Diff 423329.Apr 17 2022, 8:20 PM

Fix logical error

efriedma reopened this revision.Apr 18 2022, 8:44 AM

If you want to try out some stuff yourself, on Linux, it isn't too hard to run stuff using qemu-user. Get a cross-compile toolchain (e.g. https://developer.arm.com/-/media/Files/downloads/gnu/11.2-2022.02/binrel/gcc-arm-11.2-2022.02-x86_64-aarch64-none-linux-gnu.tar.xz ). The do something like:

clang --gcc-toolchain=/path/to/toolchain --target=aarch64-none-linux-gnu test.c -o test -static
qemu-aarch64 test


I'll run this through some tests today.

This revision is now accepted and ready to land.Apr 18 2022, 8:44 AM

If you want to try out some stuff yourself, on Linux, it isn't too hard to run stuff using qemu-user. Get a cross-compile toolchain (e.g. https://developer.arm.com/-/media/Files/downloads/gnu/11.2-2022.02/binrel/gcc-arm-11.2-2022.02-x86_64-aarch64-none-linux-gnu.tar.xz ). The do something like:

clang --gcc-toolchain=/path/to/toolchain --target=aarch64-none-linux-gnu test.c -o test -static
qemu-aarch64 test


I'll run this through some tests today.

Thanks for the way running on qemu user mode.
I will build the environment on WSL and test on my local machine.

This revision was landed with ongoing or failed builds.Apr 18 2022, 11:56 AM
This revision was automatically updated to reflect the committed changes.
dewen added a subscriber: dewen.Jan 18 2023, 7:05 PM