This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Add support for sinking operands to their users, if they are free.
ClosedPublic

Authored by fhahn on Jan 29 2019, 3:27 AM.

Details

Summary

This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

I would appreciate any feedback, especially related to where to best put
the target hook.

Event Timeline

fhahn created this revision.Jan 29 2019, 3:27 AM

Adding Simon and Sanjay in case this might be useful for X86 as well.

Hi Florian,

I've had a case where I have wanted to do this, so it looks useful to me! CGP seems like a reasonable place to perform this too and I think TargetLowering is an aptly named place to include the hook.

cheers,

llvm/lib/CodeGen/CodeGenPrepare.cpp
5984

Is this order of ops enforced?

5991

This doesn't need to larger than OpsToSink, same goes for MaybeDead.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8292

I'm assuming cast is all you need, otherwise you're missing a nullptr check.

We already have various sinking transforms in CGP, so this is the right approach IMO. I do wonder if we could adapt the existing (cmp/and-mask/etc) sinking that we already have in CGP to use this hook as a refinement, but that can be a follow-up of this patch.

I'm not sure yet if/how we'd use this with x86 shuffles, but there's always hope. :)
You might be interested in looking at some recent DAGCombiner and x86 patches that are motivated by narrowing the width of vector ops and shuffles:
D55126
D55866
D56604
D56875
D57156
D57336

fhahn updated this revision to Diff 184348.Jan 30 2019, 11:59 AM
fhahn marked an inline comment as done.

Address Sam's comments. Thanks!

fhahn marked 2 inline comments as done.Jan 30 2019, 12:06 PM
fhahn added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
5984

Do you mean enforced as in by an assertion? Not at the moment. I think we would could check it with OrderedInstructions, if you think it would be beneficial.

samparker accepted this revision.Jan 31 2019, 6:25 AM

LGTM with one comment.

llvm/lib/CodeGen/CodeGenPrepare.cpp
5984

It's okay, I hadn't noticed that you've added this requirement in the documentation.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8292

the same for below.

This revision is now accepted and ready to land.Jan 31 2019, 6:25 AM
fhahn added a comment.Feb 1 2019, 6:14 AM

Thanks Sam! I plan to commit this early next week, unless there are additional comments.

fhahn added a comment.Feb 1 2019, 6:16 AM

We already have various sinking transforms in CGP, so this is the right approach IMO. I do wonder if we could adapt the existing (cmp/and-mask/etc) sinking that we already have in CGP to use this hook as a refinement, but that can be a follow-up of this patch.

Thanks, I'll take a look at that as a follow-up.

I'm not sure yet if/how we'd use this with x86 shuffles, but there's always hope. :)

It is not really related to shuffles, but rather to any operations that could be done for free with a target instruction.

You might be interested in looking at some recent DAGCombiner and x86 patches that are motivated by narrowing the width of vector ops and shuffles:
D55126
D55866
D56604
D56875
D57156
D57336

Great, those look quite useful!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2019, 2:27 AM