This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.
ClosedPublic

Authored by rogfer01 on Jan 15 2018, 12:25 AM.

Details

Summary

In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do copies CPSR ↔ GPR but not all Thumb1 targets implement them.

The schedule can attempt, before attempting a copy, to clone the instructions but it does not currently do that for nodes with input glue. In this patch we introduce a target-hook to let the hook decide if a glued machinenode is still eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .

As a follow-up of this change we should actually implement the copies for the Thumb1 targets that do implement them and restrict the hook to the targets that can't really do such copy as these clones are not-ideal.

Diff Detail

Repository
rL LLVM

Event Timeline

rogfer01 created this revision.Jan 15 2018, 12:25 AM

If ADDC/ADDE produced glue, the check would be mostly straightforward: the transform is safe iff no value produced by the ADDE or a node glued to the ADDE is a predecessor of the ADDC or a node glued to the ADDC.

It's possible we can clone our way out of this problem: ScheduleDAGRRList::CopyAndMoveSuccessors has code to clone nodes. This actually already works in some cases... but not in all cases: it will clone an tADDSrr, but not a tADCS, because tADCS has a glue operand. Maybe we can modify this code to clone tADCS nodes. Or maybe we can add a hook to ScheduleDAGRRList::PickNodeToScheduleBottomUp to let the ARM target "copy" CPSR the way it wants to (using a tADCS/tADDSrr pair, instead of a COPY machineinstr we can't lower). Related to this, we should probably change the Thumb1 implementation of getCrossCopyRegClass; not strictly necessary, I think, but it would be easier to debug in the future if we crash earlier in the pipeline.

I don't think the approach in this patch works. Fundamentally, for nodes without glue, the DAG's CSE will merge them in ways which require either cross-class CPSR copies or cloning. Even patterns which look okay locally can collapse into patterns which generate an illegal COPY given DAGCombine optimizations and CSE.

Testcase follows which still fails with your patch:

target datalayout = "e-m:e-p:64:64-i128:64-v128:64:128-a:0:64-n64-S64"
target triple = "thumbv6---gnueabi"

; Function Attrs: norecurse nounwind readonly
define i128 @a(i64* nocapture readonly %z) local_unnamed_addr #0 {
entry:
  %0 = load i64, i64* %z, align 4
  %conv.i = zext i64 %0 to i128
  %arrayidx1 = getelementptr inbounds i64, i64* %z, i64 2
  %1 = load i64, i64* %arrayidx1, align 4
  %conv.i38 = zext i64 %1 to i128
  %shl.i39 = shl nuw i128 %conv.i38, 64
  %or = or i128 %shl.i39, %conv.i
  %arrayidx3 = getelementptr inbounds i64, i64* %z, i64 1
  %2 = load i64, i64* %arrayidx3, align 4
  %conv.i37 = zext i64 %2 to i128
  %arrayidx5 = getelementptr inbounds i64, i64* %z, i64 3
  %3 = load i64, i64* %arrayidx5, align 4
  %conv.i35 = zext i64 %3 to i128
  %shl.i36 = shl nuw i128 %conv.i35, 64
  %or7 = or i128 %shl.i36, %conv.i37
  %arrayidx10 = getelementptr inbounds i64, i64* %z, i64 4
  %4 = load i64, i64* %arrayidx10, align 4
  %conv.i64 = zext i64 %4 to i128
  %shl.i33 = shl nuw i128 %conv.i64, 64
  %or12 = or i128 %shl.i33, %conv.i
  %arrayidx15 = getelementptr inbounds i64, i64* %z, i64 5
  %5 = load i64, i64* %arrayidx15, align 4
  %conv.i30 = zext i64 %5 to i128
  %shl.i = shl nuw i128 %conv.i30, 64
  %or17 = or i128 %shl.i, %conv.i37
  %add = add i128 %or7, %or
  %add18 = add i128 %or17, %or12
  %mul = mul i128 %add18, %add
  ret i128 %mul
}

Hi @efriedma thanks a lot for the suggestions and the testcase. I already considered gluing though it impacts scheduling and the change will be noisy for tests but I'll look into your other suggestions too.

rogfer01 updated this revision to Diff 130418.Jan 18 2018, 8:40 AM
rogfer01 retitled this revision from [ARM] Avoid having to schedule a copy between CPSR and GPR in Thumb1 mode to [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR..
rogfer01 edited the summary of this revision. (Show Details)

Hi @efriedma gluing the nodes is too disruptive so I am exploring one of the alternatives you suggested. What do you think, does this look workable?

This approach seems okay. I mean, worst-case, you're causing an O(N^2) explosion in codesize, but it triggers very rarely in practice.

(Adding some x86 reviewers; you might want to use this hook to avoid generating pushf/popf.)

lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1136 ↗(On Diff #130418)

errs()?

1144 ↗(On Diff #130418)

Maybe clang-format this?

rogfer01 updated this revision to Diff 130884.Jan 22 2018, 6:55 AM

Update patch based on the review comments.

rogfer01 marked 2 inline comments as done.Jan 22 2018, 6:56 AM
This revision is now accepted and ready to land.Jan 29 2018, 3:21 PM

Thank you very much @efriedma !

This revision was automatically updated to reflect the committed changes.