Add 32-bit alignment minimum for globals in Thumb1 code
Needs Review (Public)

Authored by mroth on Apr 17 2014, 8:08 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Thumb1 already requests small types to be word-aligned. This patch extends that requirement to globals by setting the minimum alignment for them to 32 bits.

Event Timeline

Hi Moritz,

> Thumb1 already requests small types to be word-aligned.

How does Thumb1 make this request? It's got LDRB and friends for accessing non-word aligned variables.

I can imagine Cortex-M0 embedded developers getting a bit put out when suddenly all of their globals take 4 bytes of memory.

Cheers.

Tim.

Hi Tim,

Sorry for the delay, and thanks for your thoughts.

> How does Thumb1 make this request?

I probably should've worded this differently. In addition to the natural (ABI) alignment of each type, the Thumb1 target datalayout string specifies a preferred alignment of 32 bits for i1, i8 and i16. If I'm not mistaken, that means certain parts of the back-end will try to raise the alignment of these types beyond what the ABI requires.
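To make the ABI versus preferred distinction concrete, here is a minimal sketch that splits one integer component of a datalayout string, written in the LangRef form `i<size>:<abi>[:<pref>]`, into its parts. The names `IntAlign` and `parse_int_spec` are hypothetical helpers, not LLVM APIs, and the spec `i8:8:32` used below only illustrates the kind of entry described above; it is not copied from the target's actual string.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// ABI and preferred alignment (both in bits) for one integer width.
// Hypothetical type for illustration; not part of LLVM.
struct IntAlign {
    unsigned size_bits;
    unsigned abi_bits;
    unsigned pref_bits;
};

// Parse a single "i<size>:<abi>[:<pref>]" component.
// If <pref> is omitted, it defaults to the ABI alignment.
inline IntAlign parse_int_spec(const std::string& spec) {
    IntAlign a{0, 0, 0};
    int n = std::sscanf(spec.c_str(), "i%u:%u:%u",
                        &a.size_bits, &a.abi_bits, &a.pref_bits);
    assert(n >= 2 && "expected at least i<size>:<abi>");
    if (n == 2) {
        a.pref_bits = a.abi_bits;
    }
    return a;
}
```

With this sketch, `parse_int_spec("i8:8:32")` reports an ABI alignment of 8 bits and a preferred alignment of 32 bits. That is the situation described here: byte alignment is all the ABI demands, but the target would like more where it can get it.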

> It's got LDRB and friends for accessing non-word aligned variables.

The underlying problem is that the front-end always emits these variables as byte-aligned, which prevents the back-end from doing any kind of alignment-based optimisation. For example, a strcpy of a (small) global string constant could be fully inlined (using LDM and similar instructions) if the access were known to be word-aligned. Of course, this is mainly a performance concern, since a call to memcpy/strcpy can deal with non-word-aligned variables just fine.
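As a rough illustration of why known alignment matters, the sketch below copies a buffer word by word when both pointers are 4-byte aligned and falls back to bytes otherwise, counting the load/store pairs it performs. It is a software model only, not what the back-end emits: `copy_counting_accesses` and `is_word_aligned` are hypothetical names, and the alignment check happens at run time here, whereas the compiler would need to know the alignment statically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if p sits on a 4-byte boundary.
inline bool is_word_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 4 == 0;
}

// Copy n bytes from src to dst and return the number of load/store
// pairs used, as a crude proxy for the instruction count.
inline std::size_t copy_counting_accesses(void* dst, const void* src,
                                          std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    std::size_t accesses = 0;

    // Word-sized copies are only taken when both sides are word-aligned.
    if (is_word_aligned(d) && is_word_aligned(s)) {
        for (; n >= 4; n -= 4, d += 4, s += 4) {
            std::uint32_t w;
            std::memcpy(&w, s, 4);  // one 32-bit load
            std::memcpy(d, &w, 4);  // one 32-bit store
            ++accesses;
        }
    }
    // Remaining (or unaligned) bytes go one at a time.
    for (; n > 0; --n, ++d, ++s) {
        *d = *s;
        ++accesses;
    }
    return accesses;
}
```

For an 8-byte string constant, the aligned path needs two load/store pairs where the byte path needs eight. That gap is what makes an inline expansion attractive once alignment is guaranteed.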

> I can imagine Cortex-M0 embedded developers getting a bit put out when suddenly all of their globals take 4 bytes of memory.

I agree with your concerns. Perhaps there should be a compiler option for this? The optimisations enabled by the greater alignment might improve code size and performance in certain cases, but since it's the front-end that makes the final alignment decision, I'm not sure if it's even possible to do some kind of analysis there. Also consider that this patch shouldn't affect any alignments explicitly specified by the programmer.
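The size side of the trade-off can be estimated with a small layout model. The sketch below assumes globals are placed in declaration order with no reordering or merging by the linker, which real toolchains may not honour. `Global`, `align_up` and `section_size` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round offset up to the next multiple of align (a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 &&
           "alignment must be a power of two");
    return (offset + align - 1) & ~(align - 1);
}

// Size and natural alignment of one global, in bytes.
struct Global {
    std::size_t size;
    std::size_t natural_align;
};

// Bytes occupied when every global gets at least min_align alignment
// and globals are laid out in order (an assumption of this sketch).
inline std::size_t section_size(const std::vector<Global>& globals,
                                std::size_t min_align) {
    std::size_t offset = 0;
    for (const Global& g : globals) {
        std::size_t align =
            g.natural_align > min_align ? g.natural_align : min_align;
        offset = align_up(offset, align) + g.size;
    }
    return offset;
}
```

For two `char`s, a `short`, an `int` and one more `char`, in that order, this model gives 9 bytes with natural alignment and 17 bytes with a 4-byte minimum. The cost falls mostly on programs with many small globals.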

I guess more investigation is needed here in terms of the trade-off, but at the moment the optimisations that would actually make use of this (ARMLoadStoreOptimizer, inline memcpy expansion) are disabled for Thumb1...

Any other thoughts would be appreciated!

Cheers
Moritz.