This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Do not emit an extra zero-extend for i1 argument
ClosedPublic

Authored by asavonic on Jul 30 2021, 7:11 AM.

Details

Summary

AAPCS requires i1 argument to be zero-extended to 8-bits by the caller.
Emit a new AArch64ISD::ASSERT_ZEXT_BOOL hint to enable some
optimization opportunities. In particular, when the argument is forwarded
to the callee, we can avoid zero-extension and use it as-is.

Diff Detail

Event Timeline

asavonic created this revision.Jul 30 2021, 7:11 AM
asavonic requested review of this revision.Jul 30 2021, 7:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2021, 7:11 AM

On the SelectionDAG ISel side, I'm not really happy we're working around the usual framework for dealing with known bits. Can we use some sort of target-specific equivalent to AssertZExt to encode the known bits, then use those known bits when we generate the call?

If it's possible to handle this using AssertZExt on the SDAG side, I think it would be nice to handle it similarly on the GISel side using G_ASSERT_ZEXT.

llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
1079

I think that you can handle most of the logic in here using MIPatternMatch:

Register OrigReg;
if (!mi_match(AI.Regs[0], MRI, m_GTrunc(m_Reg(OrigReg))))
   return false;
1088

Maybe this loop can be replaced with llvm::any_of?

asavonic updated this revision to Diff 368054.Aug 23 2021, 3:20 AM
asavonic edited the summary of this revision. (Show Details)

Used an AssertZExt to keep information about zero-extended low bits.
There were a number of places where the compiler failed to optimize this correctly (found by LIT tests), so I fixed them as well.

Matt added a subscriber: Matt.Aug 23 2021, 10:48 AM

This turned out much messier than I was hoping for...

I guess we have the following possibilities:

  1. Go back to the original patch.
  2. Go with something like this approach, maybe with a few small cleanups.
  3. Try to use target-specific nodes to dodge the legalization issues. Instead of using the target-independent nodes, introduce AArch64ISD::AssertZextBool/AArch64::ZERO_EXTEND_BOOL which have an i32 argument/result, and then DAGCombine them away after type legalization cleans up all the extend/truncate ops which make this harder to analyze. Maybe more code, but easier to understand the implications, I think.

You might want to add a testcase where the call isn't in the entry block; I think maybe the original patch handles this?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3889 ↗(On Diff #368054)

You can't unconditionally truncate to i1 here, I think; this might be after type legalization.

asavonic updated this revision to Diff 369430.Aug 30 2021, 6:11 AM
asavonic edited the summary of this revision. (Show Details)

Added AArch64ISD::ASSERT_ZEXT_BOOL opcode to avoid legalization issues with AssertZExt.
Added a test case with a call not in the entry block.

This turned out much messier than I was hoping for...

I guess we have the following possibilities:

  1. Go back to the original patch.
  2. Go with something like this approach, maybe with a few small cleanups.
  3. Try to use target-specific nodes to dodge the legalization issues. Instead of using the target-independent nodes, introduce AArch64ISD::AssertZextBool/AArch64::ZERO_EXTEND_BOOL which have an i32 argument/result, and then DAGCombine them away after type legalization cleans up all the extend/truncate ops which make this harder to analyze. Maybe more code, but easier to understand the implications, I think.

Thanks a lot! I decided to try the option 3, and it seems to work fine.

You might want to add a testcase where the call isn't in the entry block; I think maybe the original patch handles this?

This case does not work well with SelectionDAG - the hint and the call are in different blocks, and thus not visible.
I noticed that zeroext flag is propagated correctly between blocks, so we might be able to do the same for the assertion.
I guess that ideally we need to propagate known bits instead, but I'm not sure if it is possible to do this.

For cross-block information, we do some limited tracking in target-independent code. See FunctionLoweringInfo::ComputePHILiveOutRegInfo. We could maybe mess with that code to track more information, or add a target hook to SelectionDAG around GetLiveOutRegInfo(). Actually, we could maybe even just feed the info from ComputePHILiveOutRegInfo directly into ComputeKnownBits; we actually keep all the information around already? (Not sure what the history is here...)

But we can leave that for a followup. The current SelectionDAG changes seem fine. I'm not familiar enough with the globalisel bits to approve that.

If it's possible to handle this using AssertZExt on the SDAG side, I think it would be nice to handle it similarly on the GISel side using G_ASSERT_ZEXT.

Can you please check if the GISel changes are fine?

aemerson added inline comments.Sep 28 2021, 5:40 PM
llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
553

This can be LLT::scalar(8)

587

If these are bool args then we can assume OrigTy.getScalarSizeInBits() == 1 right?

588–589

MIRBuilder's builder methods return a MachineInstrBuilder that can be directly passed into most source operands of other builder methods. You can simplify these lines into something like:

MIRBuilder.buildTrunc(OrigReg, MIRBuilder.buildAssertZExt(WideTy, WideReg, 1));
asavonic updated this revision to Diff 378203.Oct 8 2021, 7:49 AM
asavonic added inline comments.Oct 8 2021, 7:51 AM
llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
553

Done.

587

Right. Added an assert and replaced the call with 1.

588–589

Thank you. Done.

aemerson accepted this revision.Oct 11 2021, 12:03 AM
This revision is now accepted and ready to land.Oct 11 2021, 12:03 AM
This revision was automatically updated to reflect the committed changes.