This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][ARM] Enable Swing Module Scheduling for ARM
ClosedPublic

Authored by dpenry on Mar 29 2022, 11:06 AM.

Details

Summary

This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Diff Detail

Event Timeline

dpenry created this revision.Mar 29 2022, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2022, 11:06 AM
dpenry requested review of this revision.Mar 29 2022, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2022, 11:06 AM

Sounds like a nice addition.

llvm/lib/CodeGen/MachinePipeliner.cpp
1671

Can you take the NFC cleanup and spelling fixes and commit those separately. They look like good fixes on their own.

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
505

Are these changes for removeBranch/insertBranch/reverseBranchCondition etc needed here, or should they be part of the patch that adds analyzeBranch for t2LoopEnd?

6760

I think it's usually preferred to use llvm_unreachable to assert(false)

llvm/test/CodeGen/ARM/O3-pipeline.ll
8

This looks like it should be removed.

dpenry updated this revision to Diff 419563.Mar 31 2022, 1:42 PM

Responding to review comments:

  • Split out NFC cleanup
  • Removed t2LoopEnd related changes
  • llvm_unreachable added
  • Removed generated header line from test
dpenry marked 4 inline comments as done.Mar 31 2022, 1:44 PM

mark comments done

It's great to see extensions to the pipeliner so that it works on more targets and loops that don't assume a hardware loop. A little while back a couple of functions, see shoulgnoreForPipelining, were added to support regular loops. However, I don't see that function used. Perhaps that effort was never completely finished? Do we need to get rid of that function, or try to combine the work you've done with that work?

lkail added a subscriber: lkail.Apr 6 2022, 7:17 PM
dpenry added a comment.EditedApr 7 2022, 8:05 AM

It's great to see extensions to the pipeliner so that it works on more targets and loops that don't assume a hardware loop. A little while back a couple of functions, see shoulgnoreForPipelining, were added to support regular loops. However, I don't see that function used. Perhaps that effort was never completely finished? Do we need to get rid of that function, or try to combine the work you've done with that work?

shouldIgnoreForPipelining is being called in SMSchedule::computeUnpipelinableNodes and is implemented in ARMPipelinerLoopInfo. I have made a slight change to its semantics. Instead of returning true for all instructions which should stay in stage 0, it need only return a "seed" instruction or instructions -- typically the one which computes the loop exit condition. Then computeUnpipelineableNodes finds all the transitive fan-in to that instruction (or group of instructions) in the dependence graph and marks those instructions as staying in stage 0 -- which is required for correctness. I moved the discovery of the transitive fan-in into SMSchedule because a) all regular loops on any architecture are going to need to do this, so common code seems a good place for it; and b) SMSchedule has been given the dependence graph, so it already has the information needed to compute the transitive fan-in, while shouldIgnoreForPipelining would have needed an interface change to get the best/most information (either the graph or things like AA needed to compute an equivalent graph). Note also that this will work not just for loops with a nice canonical induction variable, but also is correct for while loops and loops in which the exit condition is memory-dependent.

dmgreen added inline comments.Apr 12 2022, 12:59 AM
llvm/lib/CodeGen/MachinePipeliner.cpp
2715

I don't think you need to clear this if it has just been created

llvm/lib/CodeGen/ModuloSchedule.cpp
318

Do you have test cases for these different fixes? I see one, but it would be good to make sure we are testing various situations if they can come up.

llvm/test/CodeGen/Thumb2/swp-fixedii.mir
6–7

You can probably remove this, maybe hidden, local_unnamed_addr and sometimes the attributes too.

55

I usually delete some of this that isn't needed. Like the FrameInfo and empty stacks.

dpenry marked 4 inline comments as done.Apr 12 2022, 11:18 AM

Made modifications per dmgreen. Uploading new patch.

dpenry updated this revision to Diff 422301.Apr 12 2022, 11:18 AM

Changes as per dmgreen

dmgreen accepted this revision.Apr 14 2022, 6:26 AM

Thanks for the changes. This LGTM, but please wait a couple of days in case there are other comments.

This revision is now accepted and ready to land.Apr 14 2022, 6:26 AM
dpenry updated this revision to Diff 424548.Apr 22 2022, 11:33 AM

Rebase in hopes the pre-merge buildbot will be kind

JanekvO added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
2793

Is it possible to get an ordering where a use is scheduled prior to the CycleDef?

dpenry added inline comments.Apr 26 2022, 8:12 AM
llvm/lib/CodeGen/MachinePipeliner.cpp
2793

It shouldn't be possible.

When dependence latencies are greater than zero, schedulePipeline only assigns uses after deps in the full, uncollapsed schedule. When they are zero, it might assign them to the same cycle and get them in the wrong order -- if it assigns them to a different cycle they won't be wrong. The collapsing of the schedule can introduce misorderings when the use and def are not in the same stage. orderDependence fixes both of these cases for virtual registers, but not physical registers. The first test here reports any cross-stage physical register use/def as invalid, thus conservatively taking care of problems caused when collapsing. The second test reports same-cycle/same-stage use/def pairs as invalid, thus conservatively taking care of problems in the original ordering.

There wouldn't be anything wrong with making this a <= check just to be sure -- and it would be more resilient to changes in other parts of the code -- so I'll go ahead and do that.

dpenry updated this revision to Diff 425239.Apr 26 2022, 8:56 AM

Edit in response to JanekvO's comment

thopre added a subscriber: thopre.Apr 27 2022, 3:36 AM
thopre added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
1796–1797

Should we use ignoreDependence here?

1802

Sorry for the naive question but why are anti-dependences ok here?

2272

Maybe fix the typo while you're at it

2277

Can we drop Dep.isArtificial()?

dpenry marked an inline comment as done.Apr 27 2022, 8:32 AM
dpenry added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
1796–1797

Even if we were to change ignoreDependence to be just a test for boundary node and a guarded test for anti-dependence, I think we shouldn't call it here. We're not trying to ignore a dependence here, but rather, we're trying to avoid putting the ExitSU node into the node set, and directly testing for that is more clear. If I'm reading the history of the code right, the artificial test was originally an attempt to do that and then the addition of more artificial edges due to the CopyToPHI DAG mutation obscured what is going on.

I'd prefer to sort out "where is only a boundary node check" and "where is ignoreDependence" and "can we drop artificial edge checks" in another patch, in part because I was hoping that this particular patch wouldn't affect Hexagon or PowerPC.

1802

No problem. The more people that understand the code, the better off we'll be!

At any rate, the original SMS paper wants to add all connected nodes here, including those connected by anti-dependences.

"E is the set of edges, where each edge (u,v) Î E
represents a dependence from operation u to operation
v. Only data dependences (flow, anti and output-dependences)
are included ..."

"Finally, the remaining nodes are
grouped into sets of the same priority but this priority is
lower than that of the sets containing recurrences. Each one
of these sets consists of the nodes of a connected
component of the graph that do not belong to any previous
set."

2272

I had some typo fixes, but was requested to move them to a separate patch.

2277

I expect we could, but as I mentioned above, I'd prefer to clean up the extra artificial checks in a separate patch.

thopre added inline comments.Apr 27 2022, 12:52 PM
llvm/lib/CodeGen/MachinePipeliner.cpp
1796–1797

Ah ok, thanks for the explanation.

2272

Ok

2277

Ok.

This revision was landed with ongoing or failed builds.Apr 28 2022, 1:01 PM
This revision was automatically updated to reflect the committed changes.
dpenry marked an inline comment as done.

The reversion is because buildbot clang builds failed; there's an unused private field in a structure that isn't liked.