This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
ClosedPublic

Authored by bjope on Nov 24 2017, 9:21 AM.

Details

Summary

Found out, at code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.

A BUILD_PAIR is always having the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads, for big endian targets, we should check
if the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.

Normally this bug only resulted in missed oppurtunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.

Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get

t76: i64,ch = load<LD8[FixedStack-9]

instead of

t37: i32,ch = load<LD4[FixedStack-10]>
t35: i32,ch = load<LD4[FixedStack-9]>
t41: i64 = build_pair t37, t35

before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transfomation is correct now.

Diff Detail

Repository
rL LLVM

Event Timeline

bjope created this revision.Nov 24 2017, 9:21 AM
niravd edited edge metadata.Nov 26 2017, 10:56 AM

This endianness problem is probably also latent where do load combination, but we should sink this check into areNonVolatileConsecutiveLoads and MatchLoadCombine.

bjope added a comment.Nov 27 2017, 7:16 AM

This endianness problem is probably also latent where do load combination, but we should sink this check into areNonVolatileConsecutiveLoads and MatchLoadCombine.

I've looked at MatchLoadCombine and it looks like it is trying to support both little and big endian. As well as combining the loads and doing a bswap if the ordering in memory does not match how the bytes are ordered after OR:ing the pieces together.

And areNonVolatileConsecutiveLoads is working just as it is described. It checks if one load is "Bytes * Dist" bytes after another load. I think it depends on the use case how to use the result of areNonVolatileConsecutiveLoads, and making it aware of endianess only mess up such a low level function.

I've done a double check myself for all in-tree uses and all the uses of areNonVolatileConsecutiveLoads should be fine as well.

So modulo a test case this LGTM.

This endianness problem is probably also latent where do load combination, but we should sink this check into areNonVolatileConsecutiveLoads and MatchLoadCombine.

I've looked at MatchLoadCombine and it looks like it is trying to support both little and big endian. As well as combining the loads and doing a bswap if the ordering in memory does not match how the bytes are ordered after OR:ing the pieces together.

And areNonVolatileConsecutiveLoads is working just as it is described. It checks if one load is "Bytes * Dist" bytes after another load. I think it depends on the use case how to use the result of areNonVolatileConsecutiveLoads, and making it aware of endianess only mess up such a low level function.

bjope added a comment.Nov 27 2017, 8:41 AM

I've done a double check myself for all in-tree uses and all the uses of areNonVolatileConsecutiveLoads should be fine as well.

So modulo a test case this LGTM.

So the existing test/CodeGen/PowerPC/anon_aggr.ll test was not enough?

What kind of test are you requesting? Something that dumps output after ISel to see that we got some load combines?
I had a hard time trying to do something like that. I basically need some target that will produce a BUILD_PAIR during ISel, often that happens when having some argument passed byval(?).
I found out that I could get the optimization to trigger for PowerPC, but PowerPC is during legalization splitting the combined load into two smaller loads again. That is what happens in test/CodeGen/PowerPC/anon_aggr.ll for one of the RUN-lines. So there is already an in-tree test for which the optimization now triggers, and where the end result ends up the same as before.

I actually wanted to create a test case that showed that we could get miscompiles earlier, but I haven't been able to do that for any in-tree-target. For that to happen I need SelectionDAG to create a BUILD_PAIR where the two loads are in non-consecutive order. I don't know how to trigger that.

Passing a struct of i16 by load to generate a BUILD_PAIR and

I had a hard time trying to do something like that. I basically need some target that will produce a BUILD_PAIR during ISel, often that happens when having some argument passed byval(?).
I found out that I could get the optimization to trigger for PowerPC, but PowerPC is during legalization splitting the combined load into two smaller loads again. That is what happens in test/CodeGen/PowerPC/anon_aggr.ll for one of the RUN-lines. So there is already an in-tree test for which the optimization now triggers, and where the end result ends up the same as before.

I actually wanted to create a test case that showed that we could get miscompiles earlier, but I haven't been able to do that for any in-tree-target. For that to happen I need SelectionDAG to create a BUILD_PAIR where the two loads are in non-consecutive order. I don't know how to trigger that.

Do they have to be non-consecutive? Could you get something by doing a direct operation on the BUILD_PAIR. Something like load a struct of {i16, i16} , citcast the struct to an i32, apply a (+ 1), and return it. Ideally we could have that case, but I suspect it's not worth the time to spend too much time working something out and a case dumping debug information is reasonable.

I've done a double check myself for all in-tree uses and all the uses of areNonVolatileConsecutiveLoads should be fine as well.

So modulo a test case this LGTM.

So the existing test/CodeGen/PowerPC/anon_aggr.ll test was not enough?

What kind of test are you requesting? Something that dumps output after ISel to see that we got some load combines?
I had a hard time trying to do something like that. I basically need some target that will produce a BUILD_PAIR during ISel, often that happens when having some argument passed byval(?).
I found out that I could get the optimization to trigger for PowerPC, but PowerPC is during legalization splitting the combined load into two smaller loads again. That is what happens in test/CodeGen/PowerPC/anon_aggr.ll for one of the RUN-lines. So there is already an in-tree test for which the optimization now triggers, and where the end result ends up the same as before.

I actually wanted to create a test case that showed that we could get miscompiles earlier, but I haven't been able to do that for any in-tree-target. For that to happen I need SelectionDAG to create a BUILD_PAIR where the two loads are in non-consecutive order. I don't know how to trigger that.

bjope updated this revision to Diff 125371.Dec 4 2017, 10:41 AM

Added a test case that checks that the build_pair -> combine load transform is done during ISel.

niravd accepted this revision.Dec 4 2017, 10:48 AM

LGTM. Thanks

This revision is now accepted and ready to land.Dec 4 2017, 10:48 AM
This revision was automatically updated to reflect the committed changes.