This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Reverse LICM pass for shift and rotate patterns.
Needs ReviewPublic

Authored by tkrupa on Jun 27 2018, 11:40 PM.

Details

Summary

This patch fetches back parts of shift and rotate patterns which were hoisted out of the loop in LICM pass (which prevented the patterns from combining). This resolves https://bugs.llvm.org/show_bug.cgi?id=37417 and enables D46946 and D47019 to proceed.

Diff Detail

Event Timeline

tkrupa created this revision.Jun 27 2018, 11:40 PM

I'm not sure D46946 and D47019 are good ideas in the first place, particularly D47019. Expanding an intrinsic to an 8-instruction sequence is getting past the point where we're actually getting any benefit from transforming intrinsic to native IR. Emitting a complicated lowering like that, and trying to recover it in isel seems very tricky to get right, and as far as I can tell we don't get much benefit.

If we are going to expand out x86 shift and rotate intrinsics, we should probably consider pattern-matching on IR, rather than waiting for SelectionDAG. Trying to work around isel limitations in this fashion is fragile, and will probably have a wider effect than you want.

For rotates, there was a proposal to add a generic IR intrinsic for variable rotates on llvmdev, due to the complications involved in late pattern-matching.

I'm not sure D46946 and D47019 are good ideas in the first place, particularly D47019. Expanding an intrinsic to an 8-instruction sequence is getting past the point where we're actually getting any benefit from transforming intrinsic to native IR. Emitting a complicated lowering like that, and trying to recover it in isel seems very tricky to get right, and as far as I can tell we don't get much benefit.

If we are going to expand out x86 shift and rotate intrinsics, we should probably consider pattern-matching on IR, rather than waiting for SelectionDAG. Trying to work around isel limitations in this fashion is fragile, and will probably have a wider effect than you want.

For rotates, there was a proposal to add a generic IR intrinsic for variable rotates on llvmdev, due to the complications involved in late pattern-matching.

For reference, that was here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html

And the discussion stopped with no objections that I see, but there are some open questions about the semantics and form of any intrinsic. The proposal was made by Fabian Giessen. I don't find a Phab ID for Fabian, so maybe we need to ping the dev thread?

I'm not sure D46946 and D47019 are good ideas in the first place, particularly D47019. Expanding an intrinsic to an 8-instruction sequence is getting past the point where we're actually getting any benefit from transforming intrinsic to native IR. Emitting a complicated lowering like that, and trying to recover it in isel seems very tricky to get right, and as far as I can tell we don't get much benefit.

If we are going to expand out x86 shift and rotate intrinsics, we should probably consider pattern-matching on IR, rather than waiting for SelectionDAG. Trying to work around isel limitations in this fashion is fragile, and will probably have a wider effect than you want.

For rotates, there was a proposal to add a generic IR intrinsic for variable rotates on llvmdev, due to the complications involved in late pattern-matching.

FWIW we don't do the whole pattern matching in lowering stage - most of the pattern disappears during the combining right after creating the DAG because the extra instructions are only needed in IR to avoid creating poison values. Because of this, we can't do pattern matching in IR. If it was possible I would just emit shortened IR - unless I'm missing something?

But all in all yeah, the whole thing is rather tricky.

spatel added a comment.Jul 2 2018, 9:35 AM

I'm not sure D46946 and D47019 are good ideas in the first place, particularly D47019. Expanding an intrinsic to an 8-instruction sequence is getting past the point where we're actually getting any benefit from transforming intrinsic to native IR. Emitting a complicated lowering like that, and trying to recover it in isel seems very tricky to get right, and as far as I can tell we don't get much benefit.

If we are going to expand out x86 shift and rotate intrinsics, we should probably consider pattern-matching on IR, rather than waiting for SelectionDAG. Trying to work around isel limitations in this fashion is fragile, and will probably have a wider effect than you want.

For rotates, there was a proposal to add a generic IR intrinsic for variable rotates on llvmdev, due to the complications involved in late pattern-matching.

FWIW we don't do the whole pattern matching in lowering stage - most of the pattern disappears during the combining right after creating the DAG because the extra instructions are only needed in IR to avoid creating poison values. Because of this, we can't do pattern matching in IR. If it was possible I would just emit shortened IR - unless I'm missing something?

But all in all yeah, the whole thing is rather tricky.

I've replied to the existing llvm-dev thread and referenced this patch. Let's see if we can find a good definition for target-independent intrinsic(s) there.