This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold endian-independent load sequence into a single load.
Changes PlannedPublic

Authored by ab on Oct 21 2014, 5:21 PM.

Details

Reviewers
majnemer
hfinkel
Summary

This patch combines endian-independent load sequences, that look like this:

(x[3]<<24 | x[2]<<16 | x[1]<<8 | x[0])

into a single load (bswapped if the data endianness is different from the target endianness):

*((int*)x)

One notable issue is alignment: this patch produces 1-aligned loads, no matter the size. In practice, this means that on some targets (ARM comes to mind), the load will codegen into the same original shift/or sequence. I'm not sure if there's a way to discover alignment for this sort of situation. But no matter what, this should always be profitable.

It doesn't happen very often in practice (LNT is still running), so I tried to avoid being too expensive (this is pretty different from the other OR combines).

Also, the combining is aborted if there are *any* stores between the first and last load in the sequence. This needs a loop scanning up from the last instruction. As a safeguard, I put a max number of instructions on that loop, but by then, we already know the combine is valid, so I'm not sure if it's a good idea to abort that late.

Thanks!

  • Ahmed

Diff Detail

Event Timeline

ab updated this revision to Diff 15220.Oct 21 2014, 5:21 PM
ab retitled this revision from to [InstCombine] Fold endian-independent load sequence into a single load..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added a reviewer: majnemer.
ab added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2069

As you require these loads to all be in the same block and the initial instruction when you scan later, why don't you get the load's parent block here and bail out earlier?

2211

Okay, but why not be smart? Making an AA query is easy.

ab updated this revision to Diff 15257.Oct 22 2014, 10:47 AM

Address Hal's comments:

  • bail out early if the byte load instruction isn't in the same block as the shift
  • query AA instead of bailing out for all stores

Also, test that the unaligned folded load is made to be aligned (this is done by other InstCombines, but it's pretty important for this; I can remove the test if desired)

ab updated this revision to Diff 15266.Oct 22 2014, 11:58 AM
ab added a reviewer: hfinkel.
ab removed a subscriber: hfinkel.

Keep the alignment from the first byte load instruction.
Also, refactor the ByteLoads handling to make it endian-independent (from a memory standpoint).

The previous patch relied on the later InstCombines to set the alignment, but we can also do it here. I didn't do that from the beginning because I couldn't think of a case where that's possible but the later InstCombine isn't. I still can't, but that's no reason not to do it here!

ab planned changes to this revision.Nov 13 2014, 9:09 AM