This is an archive of the discontinued LLVM Phabricator instance.

move DAGCombiner's allowableAlignment() helper function into the TLI
ClosedPublic

Authored by spatel on Jul 2 2015, 1:21 PM.

Details

Summary

Making allowableAlignment() more accessible was suggested as a predecessor patch
for D10662, so I've pulled it into TargetLowering. This let's us remove 4 instances
of duplicate logic in LegalizeDAG.

There's a subtle functional change in the implementation: the existing
allowableAlignment() code was using getPrefTypeAlignment() when checking
alignment with the DataLayout and assumed that was fast. In this implementation,
we use getABITypeAlignment() and assume that is fast. See the TODO comment
for future improvements in this implementation (don't use the data layout at all).

There are no regression test changes from this difference, and I'm not sure how to
expose it via a test. I think we actually do want to provide the 'Fast' param when
checking this from DAGCombiner::MergeConsecutiveStores(). Ie, we shouldn't merge
stores if the new stores are not going to be fast. But that change will require
fixing allowsMisalignedMemoryAccess() overrides as noted in D10662.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 28968.Jul 2 2015, 1:21 PM
spatel retitled this revision from to move DAGCombiner's allowableAlignment() helper function into the TLI.
spatel updated this object.
spatel added reviewers: jyknight, qcolombet.
spatel added a subscriber: llvm-commits.
jyknight added inline comments.Jul 6 2015, 1:51 PM
lib/CodeGen/TargetLoweringBase.cpp
1547 ↗(On Diff #28968)

This won't return the right result in Fast.

E.g. allowsMisalignedMemoryAccesses might return with { *Fast = false; return true; }, which would then cause this function to claim that a properly aligned memory access is not fast.

That's obviously wrong, but I'm not really sure what Right is.

What's the actual contract for the ABI alignment and Preferred alignment information in DataLayout?

I think there's three things that LLVM might want to know about alignment and memory accesses:

  1. If I can choose any alignment for this data, what should it be? (getPrefTypeAlignment should return that, I think)
  1. Is this memory access guaranteed to work *AT ALL*, no matter the speed. I think that's a combination of getABITypeAlignment, and then allowsMisalignedMemoryAccesses without checking "Fast".

    Although: is it always guaranteed that the ABI alignment for a type is always okay to load/store from? It'd be sane for it to be, but it's not really totally crazy to have a target where that's not true: e.g. ABI alignment of 32bits for an i64, but requires 64bit alignment for an i64 load/store?) I guess that's not the case of any existing target, since LLVM doesn't support it.
  1. Is it worthwhile to combine some memory accesses into a larger memory access, given alignment of X? I'm not sure this is actually adequately expressed in LLVM right now. Could be it's intended to be getPrefTypeAlignment and allowsMisalignedMemoryAccesses (with testing Fast==true) is supposed to be this.

    But, is "preferred alignment" and "fast to access" supposed to always be the same thing? Because, e.g. MIPS says i8:8:32-i16:16:32-i64:64 -- so both i8 and i16 have abi alignment of their size, but preferred alignment of 32 bits. I don't know a lot about MIPS, so I don't know why it claims a preferred alignment of 32bits, but I think it has a perfectly good 1/2-byte load instructions that work at their natural alignment.

    So on MIPS (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access allowed), it seems like it should always be an improvement to merge 8-bit loads which are 16-bit aligned into a 16-bit load. Yet, going by getPrefTypeAlignment would say that's not ok.
spatel added inline comments.
lib/CodeGen/TargetLoweringBase.cpp
1547 ↗(On Diff #28968)

Hi James -

Thanks for the close review - much appreciated!

I can certainly fix the Fast bug that you noted, but I also don't know what the right answer is.

My guess is that the MIPS DataLayout is wrong. Let's see if we can find out...

The 32-bit preferred alignment for MIPS i8/i16 was introduced way back here:
http://reviews.llvm.org/rL53585

Adding Bruno (author) and Daniel (currently listed as owner of the MIPS backend).

dsanders added inline comments.Jul 7 2015, 1:24 PM
lib/CodeGen/TargetLoweringBase.cpp
1547 ↗(On Diff #28968)

I'm afraid I'm not sure why we prefer 32-bit alignment for i8 and i16. My best guess is that it pays off in library calls by allowing better-optimised memcpy/strcpy/strcmp/etc.

I don't know a lot about MIPS, so I don't know why it claims a preferred alignment of 32bits,
but I think it has a perfectly good 1/2-byte load instructions that work at their natural alignment.

That's right. As far as I know they're usually (possibly always) the same latency too.

... (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access
allowed) ...

That's right. 32r6/64r6 dropped the unaligned load/store left/right instruction pairs in favour a promise that something in the system will make unaligned load/store work without special instructions. Depending on the MIPS implementation, the 'something' could be full hardware support, or a software exception handler, or a hybrid such as hardware-support for unaligned accesses inside a cache-line and software otherwise.

So on MIPS (even pre-MIPS32r6/MIPS64r6 which apparently made unaligned access
allowed), it seems like it should always be an improvement to merge 8-bit loads which are
16-bit aligned into a 16-bit load. Yet, going by getPrefTypeAlignment would say that's not ok.

As far as the load is concerned, I agree it's always an improvement to merge like this. The risk is that manipulating the data might cost more than the saving.

It's worth mentioning that some MIPS implementations will do this optimization in hardware without the risk of needing additional data manipulation.

jyknight edited edge metadata.Jul 13 2015, 9:58 AM
jyknight added a subscriber: nlewycky.

nlewycky: chandler says you might be able to help answer the questions in the comment thread.

mehdi_amini added inline comments.
lib/CodeGen/TargetLoweringBase.cpp
1553 ↗(On Diff #28968)

I'm removing the getDataLayout() from TargetLowering soon (D11079), you'll have to find another way of getting it (argument to the function?).

jyknight added inline comments.Jul 14 2015, 8:02 AM
lib/CodeGen/TargetLoweringBase.cpp
1547 ↗(On Diff #28968)

Even if there were no reason at all for MIPS to have preferred alignment set as it is, I think I've convinced myself that using "preferred" alignment in this function isn't correct. The "Fast" output should be indicating whether it's faster to do a potentially-misaligned load/store, or to use multiple load/store instructions. But "preferred" alignment is indicating the overall BEST alignment choice, not just what is going to be faster-or-equal than separate loads.

E.g. consider an i64:32:64 alignment spec, with a target that doesn't support under-aligned memory access (it supports the abi alignment of 32bits, but not below). It still is quite possible that the target behavior is such that merging two 32-bit loads to a 64bit load is a good idea -- e.g. that it's at worst the same speed as two 32-bit loads, but is still faster when there's 64bit alignment.

So perhaps the DataLayout should not actually be used to determine what memory alignments are valid for load/store on a target at all? That is, I'm thinking that targets maybe ought to implement "allowsMemoryAccess" directly, describing the entirety of that target's rules for what memory alignments work, and are "fast", rather than having "allowsMisalignedMemoryAccess" which describes only half of them, the other half intuited from the DataLayout. That seems a potentially saner model -- leaving the "Data Layout" spec to just describe the layout, and the target to describe what the hardware/runtime-environment can actually do, independent of that.

Alternatively it might be okay for the ABI alignment to just be assumed to be fast. If that's not the case, that's certainly a suboptimal ABI, but it wouldn't surprise me wouldn't surprise me if some ABIs were suboptimal in that way.

spatel added inline comments.Jul 21 2015, 9:35 AM
lib/CodeGen/TargetLoweringBase.cpp
1547 ↗(On Diff #28968)

I see your point about making a new target hook that implements all of 'allowsMemoryAccess' directly, but a quick look at the implementations of DataLayout::getAlignment() and DataLayout::getAlignmentInfo() makes me scared...

Let me post an updated version of the patch that assumes the ABI alignment is fast and see what people think.

spatel updated this revision to Diff 30265.Jul 21 2015, 9:51 AM
spatel edited edge metadata.

Patch updated:

  1. Changed implementation of allowsMemoryAccess() to check ABI alignment first and assume that ABI alignment is fast.
  2. Added DataLayout as parameter because it's no longer available from TargetLowering.
qcolombet added inline comments.Jul 28 2015, 2:29 PM
lib/CodeGen/TargetLoweringBase.cpp
1539 ↗(On Diff #30265)

I’d say that’s fine for the default behavior, but we could provide a way to change that at the target discretion by making the hook virtual.

qcolombet added inline comments.Jul 28 2015, 2:31 PM
lib/CodeGen/TargetLoweringBase.cpp
1539 ↗(On Diff #30265)

Note: We could also do the virtual thing when we get an actual use case. I do not feel strongly either way.

jyknight accepted this revision.Jul 28 2015, 10:17 PM
jyknight edited edge metadata.

I do still think it might be ultimately better and cleaner to not involve DataLayout at all, and to have the target hook describe the complete rules on what alignments can be loaded/stored, since DataLayout is really about the system's defined ABI, which can legitimately differ in different environments, even though the hardware's behavior does not. And I expect it'd actually be easier to understand that way as well.

But this is definitely a solid improvement, and is a good change both if considered as a step towards the above and as the final state, so LGTM. Please update the description to match the new revision before committing.

[I'll also note that there's still a few other places left that call allowsMisalignedMemoryAccesses after this; I bet most of those could also be updated to call allowsMemoryAccess instead.]

This revision is now accepted and ready to land.Jul 28 2015, 10:17 PM

I do still think it might be ultimately better and cleaner to not involve DataLayout at all, and to have the target hook describe the complete rules on what alignments can be loaded/stored, since DataLayout is really about the system's defined ABI, which can legitimately differ in different environments, even though the hardware's behavior does not. And I expect it'd actually be easier to understand that way as well.

Thanks, James and Quentin. I had considered Quentin's suggestion of making the hook virtual, but I'd prefer to do that when we have evidence that it's needed. That may come when I mess around with MergeConsecutiveStores() again. :)

I'll add a 'TODO' comment, so we have a reminder of this discussion in the code.

This revision was automatically updated to reflect the committed changes.
spatel updated this object.Jul 29 2015, 11:32 AM
spatel edited edge metadata.