This is an archive of the discontinued LLVM Phabricator instance.

Output alignment in zerofill and comm only if needed
AbandonedPublic

Authored by gchatelet on Nov 28 2022, 1:09 AM.

Details

Summary

13 years ago, the following commit changed the way alignment was represented moving from Log2 of alignment to plain number of bytes.
https://github.com/llvm/llvm-project/commit/6a715dccdf596d39d2b480beac48385f1fa73219#diff-d6520fbf2b954c8679d0835ca53c6ae423b4354e5e517dcfb043446042249f98

This patch introduced a regression, relevant lines are the following:
Before : if (Pow2Alignment != 0) OS << ',' << Pow2Alignment;
After : if (ByteAlignment != 0) OS << ',' << Log2_32(ByteAlignment);

This impacted the emitCommonSymbol and emitZeroFill functions. Functions added later on like emitLocalCommonSymbol indeed output the alignment only when it is greater than one. e.g.,
https://github.com/llvm/llvm-project/blob/10d183b889daab4512d476c1645d24d4e8946e8c/llvm/lib/MC/MCAsmStreamer.cpp#L997
https://github.com/llvm/llvm-project/blob/10d183b889daab4512d476c1645d24d4e8946e8c/llvm/lib/MC/MCAsmStreamer.cpp#L1059

This patch makes the behavior uniform for all sections and outputs alignment only when greater than one.

Diff Detail

Event Timeline

gchatelet created this revision.Nov 28 2022, 1:09 AM
gchatelet requested review of this revision.Nov 28 2022, 1:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 28 2022, 1:09 AM
gchatelet retitled this revision from Output alignment in zerofill and comm only is needed to Output alignment in zerofill and comm only if needed.Nov 28 2022, 1:15 AM

I'm not familiar with .comm directives. Do we document them somewhere in LLVM ?

All I could find is this and the doc for gnu as. The latter states that:

When using ELF, the .comm directive takes an optional third argument. This
is the desired alignment of the symbol, specified as a byte boundary (for
example, an alignment of 16 means that the least significant 4 bits of the
address should be zero). The alignment must be an absolute expression, and
it must be a power of two. If ld allocates uninitialized memory for the
common symbol, it will use the alignment when placing the symbol. If no
alignment is specified, as will set the alignment to the largest power of
two less than or equal to the size of the symbol, up to a maximum of 16.

This seems to imply that the directives generated by LLVM are incorrect, as we're emitting log2 values rather than alignment values. What did I miss ?

This seems to imply that the directives generated by LLVM are incorrect, as we're emitting log2 values rather than alignment values. What did I miss ?

AFAIU (I'm no expert here), we write log2 values only if getCOMMDirectiveAlignmentIsInBytes() returns false.
I found the following instances :

@MaskRay WDYT?

MaskRay requested changes to this revision.EditedDec 4 2022, 5:06 PM

Sorry for the belated response. This change looks wrong. Without the third argument, the alignment appears min(st_size, 16) on ELF in GNU as.

alignment=1 (st_value=1) COMMON symbols will get wrong alignments.

This revision now requires changes to proceed.Dec 4 2022, 5:06 PM
gchatelet added a comment.EditedDec 6 2022, 6:02 AM

Sorry for the belated response. This change looks wrong. Without the third argument, the alignment appears min(st_size, 16) on ELF in GNU as.

alignment=1 (st_value=1) COMMON symbols will get wrong alignments.

Thx ! Is there a reason to differentiate between 0 and 1 here ? Would that be OK to output alignment unconditionally ?

edit: AFAICT the value 0 is never used. All tests pass with assert(ByteAlignment != 0); in both common and zerofill functions.

For the producer, it should be fine if the third argument is always produced. I think its existence may sometimes imply that the producer doesn't care about the alignment.
The patch as is doesn't seem improve things.

gchatelet abandoned this revision.Dec 19 2022, 5:41 AM