This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
ClosedPublic

Authored by david-arm on Nov 22 2021, 3:57 AM.

Details

Summary

When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.

New tests added here:

CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Diff Detail

Event Timeline

david-arm created this revision.Nov 22 2021, 3:57 AM
david-arm requested review of this revision.Nov 22 2021, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 3:57 AM

Couple of discussion points. I can see the rationale, but I wonder about some of the test changes and whether this could be revealing a latent bug.

llvm/test/CodeGen/X86/vector-fshl-512.ll
1107 ↗(On Diff #388865)

Can anyone comment if the sign extensions in these constants are NFC?

llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗(On Diff #388865)

Can anyone comment if these sll -> mul changes are expected & harmless?

lebedev.ri added inline comments.
llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗(On Diff #388865)

These look like regressions to me.

david-arm added inline comments.Nov 24 2021, 2:22 AM
llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗(On Diff #388865)

Hi @lebedev.ri, thanks for providing input here. Do you know if this means that there is a latent bug in the code where we should be explicitly using zero-extend here? I'm worried about cases where code is relying upon getAnyExtOrTrunc zero-extending.

RKSimon added inline comments.Nov 24 2021, 5:35 AM
llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗(On Diff #388865)

I'm not certain, but vXi8 shl by constant will fold to vXi8 multiplies by (pow2) constant, and will then be extended to vXi16 to make use of PMULLW - the upper bits aren't demanded so were any-extended, which was treated as a zero-extension during constant folding which preserved the pow2 nature which allowed it to be lowered back to a vXi16 shl. My guess is that now that they are sign-extended, the pow2 isn't seen any more.

craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1319

There's no guarantee that the caller would use getAnyExtOrTrunc. I think this should be in the constant folding for getNode(ISD::ANY_EXTEND)

craig.topper added inline comments.Nov 24 2021, 8:51 AM
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
119

Pre-commit the new tests so we can see the change?

david-arm planned changes to this revision.Dec 16 2021, 7:02 AM
david-arm updated this revision to Diff 398142.Jan 7 2022, 7:30 AM
david-arm retitled this revision from [CodeGen] Change getAnyExtOrTrunc to use SIGN_EXTEND for some constants to [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants.
david-arm edited the summary of this revision. (Show Details)
  • Rewrote the patch to make use of isSExtCheaperThanZExt instead so that this becomes a AArch64-specific change.
david-arm edited the summary of this revision. (Show Details)Jan 7 2022, 7:30 AM
craig.topper added inline comments.Jan 7 2022, 4:16 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
1142

I think you can write if (!V)

david-arm updated this revision to Diff 398561.Jan 10 2022, 3:34 AM
  • Use if (!V) when checking for null SDValue objects.
david-arm marked an inline comment as done.Jan 10 2022, 3:34 AM
RKSimon added inline comments.Jan 10 2022, 3:54 AM
llvm/include/llvm/CodeGen/TargetLowering.h
2647–2648

Please can you update the documentation to explain the V arg (is it the src/dst?) and that it can be SDValue() if unknown.

david-arm updated this revision to Diff 398920.Jan 11 2022, 5:37 AM
  • Updated comments above isSExtCheaperThanZExt.
david-arm marked an inline comment as done.Jan 11 2022, 5:37 AM
RKSimon accepted this revision.Jan 12 2022, 3:44 AM

LGTM with one minor

llvm/test/CodeGen/AArch64/fast-isel-cmp-vec.ll
2

Regenerate + pre-commit before committing the patch so the patch just shows the codegen diff.

This revision is now accepted and ready to land.Jan 12 2022, 3:44 AM

I've bisected a miscompilation to this file.

To reproduce:

$ git clone git://source.ffmpeg.org/ffmpeg
$ cd ffmpeg
$ ./configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync
$ make -j$(nproc)
$ make fate-dpcm-interplay

The breakage happens in the libavformat/ipmovie.c file. (I also saw a couple other broken tests, so I think there are other source files affected too, but I didn't bisect and pinpoint those failures.)

The issue can be observed with https://martin.st/temp/ipmovie-preproc.c, with clang -target aarch64-linux-gnu -O3 -o - ipmovie-preproc.c. The generated code contains differences like this:

--- old.s       2022-01-18 10:30:24.726016244 +0200
+++ new.s       2022-01-18 10:30:01.650536299 +0200
@@ -506,7 +506,7 @@
        mov     w1, #56
        bl      av_log
        add     x9, x19, #1104
-       mov     w21, #65535
+       mov     w21, #-1
 .LBB3_9:                                // %while.end
        ldr     x0, [x19]
        ldr     w8, [x0, #44]

I've bisected a miscompilation to this file.

To reproduce:

$ git clone git://source.ffmpeg.org/ffmpeg
$ cd ffmpeg
$ ./configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync
$ make -j$(nproc)
$ make fate-dpcm-interplay

The breakage happens in the libavformat/ipmovie.c file. (I also saw a couple other broken tests, so I think there are other source files affected too, but I didn't bisect and pinpoint those failures.)

The issue can be observed with https://martin.st/temp/ipmovie-preproc.c, with clang -target aarch64-linux-gnu -O3 -o - ipmovie-preproc.c. The generated code contains differences like this:

--- old.s       2022-01-18 10:30:24.726016244 +0200
+++ new.s       2022-01-18 10:30:01.650536299 +0200
@@ -506,7 +506,7 @@
        mov     w1, #56
        bl      av_log
        add     x9, x19, #1104
-       mov     w21, #65535
+       mov     w21, #-1
 .LBB3_9:                                // %while.end
        ldr     x0, [x19]
        ldr     w8, [x0, #44]

Hi @mstorsjo, thanks for the info. I'll revert the patch again for now. It's strange because ANY_EXTEND should really mean "any", i.e. you don't care if it's sign-extend or zero-extend! I suspect there is a bug in codegen somewhere that either relies upon ANY_EXTEND actually being ZERO_EXTEND, or relies upon ANY_EXTEND being consistently the same. I'm worried because the isSExtCheaperThanZExt interface definitely allows for the possibility of sometimes choosing one over the other depending upon the types.

Hi @mstorsjo, thanks for the info. I'll revert the patch again for now. It's strange because ANY_EXTEND should really mean "any", i.e. you don't care if it's sign-extend or zero-extend! I suspect there is a bug in codegen somewhere that either relies upon ANY_EXTEND actually being ZERO_EXTEND, or relies upon ANY_EXTEND being consistently the same. I'm worried because the isSExtCheaperThanZExt interface definitely allows for the possibility of sometimes choosing one over the other depending upon the types.

Thanks, and that does indeed seem worrying.

For the sake of finding other possible similar related cases, with the same instructions, but with make -j$(nproc) fate it runs all tests - which currently triggered 24 failed tests (of which I believe there's maybe a 3-6 actual individual breakages), if you want to rerun more tests when you think you've pinpointed the root cause.

I've bisected a miscompilation to this file.

To reproduce:

$ git clone git://source.ffmpeg.org/ffmpeg
$ cd ffmpeg
$ ./configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync
$ make -j$(nproc)
$ make fate-dpcm-interplay

The breakage happens in the libavformat/ipmovie.c file. (I also saw a couple other broken tests, so I think there are other source files affected too, but I didn't bisect and pinpoint those failures.)

The issue can be observed with https://martin.st/temp/ipmovie-preproc.c, with clang -target aarch64-linux-gnu -O3 -o - ipmovie-preproc.c. The generated code contains differences like this:

--- old.s       2022-01-18 10:30:24.726016244 +0200
+++ new.s       2022-01-18 10:30:01.650536299 +0200
@@ -506,7 +506,7 @@
        mov     w1, #56
        bl      av_log
        add     x9, x19, #1104
-       mov     w21, #65535
+       mov     w21, #-1
 .LBB3_9:                                // %while.end
        ldr     x0, [x19]
        ldr     w8, [x0, #44]

Hi @mstorsjo, thanks for the info. I'll revert the patch again for now. It's strange because ANY_EXTEND should really mean "any", i.e. you don't care if it's sign-extend or zero-extend! I suspect there is a bug in codegen somewhere that either relies upon ANY_EXTEND actually being ZERO_EXTEND, or relies upon ANY_EXTEND being consistently the same. I'm worried because the isSExtCheaperThanZExt interface definitely allows for the possibility of sometimes choosing one over the other depending upon the types.

I believe the culprit is FunctionLoweringInfo::ComputePHILiveOutRegInfo which assumes that ConstantInt inputs to phis will be zero extended.