This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] G_FMINNUM and G_FMAXNUM vector lowering
ClosedPublic

Authored by dmgreen on Jul 24 2023, 3:06 PM.

Details

Summary

This attempts to expand the handling for G_FMAXNUM/G_FMINNUM for vector types, which is hopefully fairly straightforward now that fptrunc and fpext are working.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 24 2023, 3:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 3:06 PM
dmgreen requested review of this revision.Jul 24 2023, 3:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 3:06 PM
Herald added a subscriber: wdng. · View Herald Transcript
tschuett added inline comments.Jul 24 2023, 9:50 PM
llvm/test/CodeGen/AArch64/fminmax.ll
271

The 3 x f32 seems to fail again, but it should be in reach.

402

The 7 is ugly for both.

1409

For 16 Gisel seems to be better?!?

dmgreen added inline comments.Jul 25 2023, 3:45 AM
llvm/test/CodeGen/AArch64/fminmax.ll
271

It's just uglyness from buildvectors with undef elements. You can see it moving out-of and into the same register:

mov s2, v0.s[1]
..
mov v0.s[1], v2.s[0]

And it shouldn't be setting v0.s[3], as those value are just undef. All that can be cleaned up with other combines, they just haven't been implemented yet (and they resolve around what happens to propagated undef elements, so have some dragons).

402

The calling convention here is very odd. They seem to match between the two, at least.

1409

The SDAG for fp16 without fullfp16 has always scalarized many instructions, and I don't believe anyone has looked into fixing it, as it's fairly low priority. It's worth getting it right with GISel if we can though.

May I ask you to add :

...
---
name:            3xs32_legal_with_full_fp16
alignment:       4
body:             |
  bb.0:
    liveins: $h0, $h1
    %a:_(<3 x s32>) = IMPLICIT_DEF
    %b:_(<3 x s32>) = IMPLICIT_DEF
    %minnum:_(<3 x s32>) = G_FMINNUM %a, %b
    %c:_(<3 x s32>) = COPY %minnum(<3 x s32>)
    RET_ReallyLR implicit %c

to llvm/test/CodeGen/AArch64/GlobalISel/legalize-fminnum.mir?

May I ask you to add :

...
---
name:            3xs32_legal_with_full_fp16
alignment:       4
body:             |
  bb.0:
    liveins: $h0, $h1
    %a:_(<3 x s32>) = IMPLICIT_DEF
    %b:_(<3 x s32>) = IMPLICIT_DEF
    %minnum:_(<3 x s32>) = G_FMINNUM %a, %b
    %c:_(<3 x s32>) = COPY %minnum(<3 x s32>)
    RET_ReallyLR implicit %c

to llvm/test/CodeGen/AArch64/GlobalISel/legalize-fminnum.mir?

This needs to use G_IMPLICIT_DEF, not IMPLICIT_DEF

aemerson accepted this revision.EditedJul 25 2023, 2:56 PM

LGTM (with other comments addressed), thanks.

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
929–934

General musing out loud, not for this patch: we should find a way to factor out these common vector rules cleanly.

This revision is now accepted and ready to land.Jul 25 2023, 2:56 PM
dmgreen updated this revision to Diff 545441.Jul 30 2023, 7:26 AM

Update and rebase.

May I ask you to add :

...
---
name:            3xs32_legal_with_full_fp16
alignment:       4
body:             |
  bb.0:
    liveins: $h0, $h1
    %a:_(<3 x s32>) = IMPLICIT_DEF
    %b:_(<3 x s32>) = IMPLICIT_DEF
    %minnum:_(<3 x s32>) = G_FMINNUM %a, %b
    %c:_(<3 x s32>) = COPY %minnum(<3 x s32>)
    RET_ReallyLR implicit %c

to llvm/test/CodeGen/AArch64/GlobalISel/legalize-fminnum.mir?

This needs to use G_IMPLICIT_DEF, not IMPLICIT_DEF

I've added a test, but it's a little different. I am not a huge fan of tests with IMPLICIT_DEF/undef as they could just be simplified away in the future, and the <3 x s32> needs to go somewhere legal in order to be valid (I think). It adds a test for the legalization that is currently seen when lowering.

This revision was landed with ongoing or failed builds.Jul 30 2023, 11:35 PM
This revision was automatically updated to reflect the committed changes.