This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Select SMULL for zero extended vectors when top bit is zero
ClosedPublic

Authored by zjaffal on Sep 27 2022, 12:55 AM.

Details

Summary

we can safely replace a zext instruction with sext if the top bit is zero. This is useful because we can select smull when both operands are sign extended.

Diff Detail

Event Timeline

zjaffal created this revision.Sep 27 2022, 12:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2022, 12:55 AM
zjaffal requested review of this revision.Sep 27 2022, 12:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2022, 12:55 AM
HsiangKai added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4617

NewOpc has no value if LastBitValue is not zero in the previous if block. It will lost some opportunities to use UMULL to do unsigned multiplication. It is why some test cases become worse.

Do you think it is a good way to check NewOpc before checking isN0ZExt && isN1ZExt? That is

if (!NewOpc) {
  if (isN0ZExt && isN1ZExt)
    NewOpc = AArch64ISD::UMULL;
  else if (isN1SExt || isN1ZExt) {
    // ...
  }
}
fhahn requested changes to this revision.Sep 28 2022, 1:29 AM

Thanks for the patch @zjaffal! Please update the code as suggested by @HsiangKai to fix the regressions in the test case.

This revision now requires changes to proceed.Sep 28 2022, 1:29 AM
zjaffal updated this revision to Diff 463505.Sep 28 2022, 5:01 AM

Fix regression caused by not selecting UMULL when we can

fhahn added inline comments.Sep 28 2022, 5:30 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4600

The logic with the nested ifs is a bit hard to follow now.

Would it be possible to simplify the code by moving all those checks to a static helper function which returns the new opcode (and updates the other variables as required)? This would allow to use early exits to reduce the need for nested of conditions.

spatel added inline comments.Sep 28 2022, 5:49 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4606

Just a couple of drive-by comments:

  1. Why is this passing depth 4?
  2. Use DAG.SignBitIsZero() to make this easier to read? ("last bit" usually means the LSB, not the MSB/signbit).
zjaffal added inline comments.Sep 29 2022, 3:00 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4606

The depth is 4 following a similar code block in the same file

// Check if the value is zero-extended from i1 to i8
static bool checkZExtBool(SDValue Arg, const SelectionDAG &DAG) {
  unsigned SizeInBits = Arg.getValueType().getSizeInBits();
  if (SizeInBits < 8)
    return false;

  APInt RequredZero(SizeInBits, 0xFE);
  KnownBits Bits = DAG.computeKnownBits(Arg, 4);
  bool ZExtBool = (Bits.Zero & RequredZero) == RequredZero;
  return ZExtBool;
}

I tested the implementation using DAG.SignBitIsZero() and it works with depth=0

zjaffal updated this revision to Diff 463823.Sep 29 2022, 3:20 AM

Refactor common code for selecting multiplication instruction out of the main code for LowerMul, rewrite checking for the top bit to use DAG.SignBitIsZero and refactor on top of the added new tests

fhahn added a comment.Sep 29 2022, 4:02 AM

Thanks for the update! Could you split out the code that adds selectUmullSmull into a separate NFC patch and then update this patch to just add the new handling for converting zext to sext?

zjaffal updated this revision to Diff 464178.Sep 30 2022, 1:23 AM

rebase after lint of parent patch

zjaffal updated this revision to Diff 464301.Sep 30 2022, 9:12 AM

Rebase on top of main

dmgreen added inline comments.Nov 9 2022, 2:06 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4531

Is there some other changes needed here, to pass a DL through to the function from the callsites of selectUmullSmull?

zjaffal updated this revision to Diff 474252.Nov 9 2022, 6:17 AM

Rebase on top of main and edit the function call to include DL

dmgreen accepted this revision.Nov 9 2022, 2:06 PM

Thanks. A small nit, but otherwise this looks like a good patch. The newly created node will never be used, but I don't think that would be a problem. LGTM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4540

Small nit but can you move this above the new code? This condition and the new one are mutually exclusive, and it would seem a little better to have the check for two zext next to the bool IsN0ZExt = ..., the same as the base smull case.

zjaffal updated this revision to Diff 475369.Nov 15 2022, 1:41 AM

Move code blocko

fhahn accepted this revision.Dec 7 2022, 7:39 AM

LGTM, thanks!

This revision is now accepted and ready to land.Dec 7 2022, 7:39 AM
This revision was landed with ongoing or failed builds.Dec 7 2022, 11:06 PM
This revision was automatically updated to reflect the committed changes.