This is an archive of the discontinued LLVM Phabricator instance.

[AIX] asm output: use character literals in byte lists for strings
ClosedPublic

Authored by hubert.reinterpretcast on Jun 1 2020, 2:17 PM.

Details

Summary

This patch improves the assembly output produced for string literals by using character literals in byte lists. This provides the benefits of having printable characters appear as such in the assembly output and of having strings kept as logical units on the same line.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2020, 2:17 PM

I'm sure this may have been considered, but given that .byte also supports a limited string syntax (given with the limitation that it can support ), is there a reason we can't just glue consecutive runs of printable characters together?

e.g. .byte 'h,'e,'l,'l,'o,' ,'w,'o,'r,'l,'d,'!,0012,0000 becomes:

.byte   'hello world!'
.byte 0012,0000

(Essentially mode shift between writing printable as a string and the comma seperated octal form, and just emitting a new .byte directive when we encounter the other type)

I'm sure this may have been considered, but

Yes, the patch description says consideration was given towards "having strings kept as logical units on the same line". This is beneficial for the purposes of being able to grep even when sequences mix "printable" and "non-printable" characters. For example, the UTF-8 code unit sequence for e with acute would likely end up on a line of its own if the alternative you mention is used. For a program with strings in French, I imagine that could be rather annoying.

daltenty added inline comments.Jun 3 2020, 8:50 PM
llvm/lib/MC/MCAsmStreamer.cpp
1052–1053

Why not ward off trouble below and just make the check reflect whether AsciizDirective is actually usable in this case upfront?

Data.size() == 1 || !(MAI->getAsciiDirective() || MAI->getByteListDirective() || MAI->getAsciizDirective() && Data.back() == 0)
1080

See above comment.

hubert.reinterpretcast marked 3 inline comments as done.
  • Address comment: Avoid situation where assert arises; reduce duplication
llvm/lib/MC/MCAsmStreamer.cpp
1052–1053

I've refactored to remove the duplication of the conditions.

daltenty accepted this revision.Jun 5 2020, 8:04 AM

LGTM

This revision is now accepted and ready to land.Jun 5 2020, 8:04 AM
llvm/lib/MC/MCAsmStreamer.cpp
986

May need to hedge against -Werror,-Wformat-security warning as seen recently on another commit.

daltenty added inline comments.Jun 30 2020, 2:49 PM
llvm/lib/MC/MCAsmStreamer.cpp
986

I'm assuming specifically we're worried about -Wformat-nonliteral.

I think the MAI property supporting other arbitrary character formatting strings is a nice to have, but not essential to this patch, since AIX is currently the only user. We could simplify to the just supporting a character prefix (single quote in our case) or something like that so we can use a fixed format for the character itself.

hubert.reinterpretcast marked an inline comment as done.
  • Update to avoid stored format string by using switch on a enum value
llvm/lib/MC/MCAsmStreamer.cpp
986

@daltenty, the patch has been updated to encode the known (and used) formats into an enumeration type. The implementation of the formatting itself is selected via a switch, which passes different lambdas depending on the case chosen.

daltenty accepted this revision.Sep 29 2020, 8:07 AM

LGTM, thanks!

llvm/lib/MC/MCAsmStreamer.cpp
983

nit: the name of this lambda isn't very descriptive, but I don't really have a better suggestion

This revision was landed with ongoing or failed builds.Sep 29 2020, 6:14 PM
This revision was automatically updated to reflect the committed changes.