This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Emit zext move when the source of the zext is AssertZext or AssertSext
ClosedPublic

Authored by wwei on Sep 16 2020, 8:51 AM.

Details

Summary

When the source of the zext is AssertZext or AssertSext, it is hard to know any information about the upper 32 bits,
so we should insert a zext move before emitting SUBREG_TO_REG to define the lower 32 bits.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=47543

Diff Detail

Event Timeline

wwei created this revision.Sep 16 2020, 8:51 AM
wwei requested review of this revision.Sep 16 2020, 8:51 AM
efriedma added a subscriber: spop.Sep 16 2020, 12:31 PM
efriedma added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.h
425

AssertSext/AssertZext don't really indicate anything either way; it would make sense to check the operand, maybe? That said, I guess the operand will usually be a CopyFromReg, so maybe it doesn't matter much.

More generally, I don't like the general approach of guessing what isel will do, as opposed to examining what isel actually did. Due to the way the dot product patterns were written, it's actually possible for an i32 EXTRACT_VECTOR_ELT to produce a value that isn't zero-extended (testcase follows). And I'm not confident there aren't other weird edge cases. I'd be happier doing this after isel, when we can tell what instruction actually produced the value in question.

define i64 @test_udot_v8i8(i8* nocapture readonly %a, i8* nocapture readonly %b) {
entry:
; CHECK-LABEL: test_udot_v8i8:
; CHECK:  udot {{v[0-9]+}}.2s, {{v[0-9]+}}.8b, {{v[0-9]+}}.8b
  %0 = bitcast i8* %a to <8 x i8>*
  %1 = load <8 x i8>, <8 x i8>* %0
  %2 = zext <8 x i8> %1 to <8 x i32>
  %3 = bitcast i8* %b to <8 x i8>*
  %4 = load <8 x i8>, <8 x i8>* %3
  %5 = zext <8 x i8> %4 to <8 x i32>
  %6 = mul nuw nsw <8 x i32> %5, %2
  %7 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> %6)
  %8 = zext i32 %7 to i64
  ret i64 %8
}
declare i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32>)
llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll
1

This testcase is way too complicated; can you extract out the bit that actually triggers this issue?

llvm/test/CodeGen/AArch64/shift_minsize.ll
62

This mov shouldn't be necessary, but the reason isn't really related to this patch, I guess.

wwei updated this revision to Diff 292542.Sep 17 2020, 9:27 AM
wwei added inline comments.Sep 17 2020, 9:30 AM
llvm/test/CodeGen/AArch64/arm64-assert-zext-sext.ll
1

Done, test cases have been simplified,

wwei added inline comments.Sep 17 2020, 10:10 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.h
425

I basically agree with you, it seems more reasonable doing it after isel.
Especially when DAG isel is based on basic block scope, it is difficult to consider global data/control flow. For example,in arm64-assert-zext-sext.ll case, both AssertZext and AssertZext will have operand CopyFromReg, but what really matters is where the operand (maybe EXTRACT_SUBREG or Truncate) of CopyFromReg comes from. Unfortunately, they're from other blocks. Therefore, in the ISEL process, we can only perform conservative instruction selection. This is the right way to fix bugs in pr47543.
I have also considered adding check for the operand of AssertSext/AssertZext, like below:

static inline bool isDef32(const SDNode &N) {
  unsigned Opc = N.getOpcode();
  if(Opc == AssertSext || Opc == AssertZext)
    Opc = N.getOperand(0).getOpcode();
  return Opc != ISD::TRUNCATE && Opc != TargetOpcode::EXTRACT_SUBREG &&
         Opc != ISD::CopyFromReg;
}

But I'm not sure whether the check is required. Finally, I refer to the implementation in x86.

def def32 : PatLeaf<(i32 GR32:$src), [{
  return N->getOpcode() != ISD::TRUNCATE &&
         N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&
         N->getOpcode() != ISD::CopyFromReg &&
         N->getOpcode() != ISD::AssertSext &&
         N->getOpcode() != ISD::AssertZext;
}]>;

Doing it after isel will be better. However, many scenarios may need to be considered. It is not clear what edge or strange cases may occur. The current patch is used as a trade-off solution.

efriedma accepted this revision.Sep 17 2020, 11:29 AM

LGTM

I'm okay with taking this as an incremental improvement, and then looking into a different approach later.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
425

I don't anticipate any complex issues performing the transform after isel, but I could be missing something.

This revision is now accepted and ready to land.Sep 17 2020, 11:29 AM
yutsumi added a subscriber: yutsumi.Nov 4 2021, 5:58 PM