This is an archive of the discontinued LLVM Phabricator instance.

Allow GNU inline asm using target-specific dialect variant
Needs ReviewPublic

Authored by uweigand on Jan 26 2021, 7:09 AM.

Details

Summary

LLVM has two notions of "assembler dialect" that are conceptually distinct, but are currently somewhat intermixed.

The first notion corresponds to platform-specific syntactical differences in assembler text, e.g. the distinction between AT&T and Intel-style assembler for x86 (and similar differences on other architectures). The second notion corresponds to C/C++ source level differences in handling the inputs/outputs of *inline* assembler statements, specifically the difference between GNU inline asm and MS inline asm.

These are conceptually different, but the current inline assembler handling always enforces AT&T style assembler with GNU inline asm and Intel-style assembler with MS inline asm. This is done even on non-Intel platforms, which may not even have AT&T vs. Intel assembler -- that's not an issue for MS inline asm, as this itself only exists on Intel, but it is a problem if a platform wants to use different assembler styles for *GNU* inline asm. (Specifically, we want to do this on SystemZ to distinguish between GAS and HLASM style assembler text.)

This patch allows for more flexibility here by moving the "connection" between assembler styles and inline asm types into platform-specific code. Specifically, in addition to the existing "MAI->getAssemblerDialect" which returns the default style to be used for assembler reading and writing (which is currently uses everywhere *except* for inline asm), this patch introduces a "MAI->getInlineAsmDialect" which returns the assembler style to be used for parsing inline asm. The routine gets the inline asm type as type and therefore can return different styles for GNU and MS inline asm, which is then done in the X86 back-end.

Diff Detail

Event Timeline

uweigand created this revision.Jan 26 2021, 7:09 AM
uweigand requested review of this revision.Jan 26 2021, 7:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 26 2021, 7:09 AM
hubert.reinterpretcast added inline comments.
llvm/include/llvm/MC/MCAsmInfo.h
572

The parameter here needs to be explained more clearly. This function seems to be compensating for problems in the applicability/correctness of what the input argument represents. I think a function that just actually gives a default dialect for the inline assembly would be easier to understand. I guess that in cases where the input to this function matters, we should consider changing the code where the input argument was originally set.

llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
21–22

Why is this comment removed?

rnk added a comment.Feb 4 2021, 10:24 AM

Sorry, I got a bit lost initially thinking about MS/Intel/GNU inline asm. The main motivation for this patch is SystemZ HLASM vs GNU asm. The idea behind this patch is that:

  • LLVM will continue to emit GNU-style assembly on SystemZ (non-inline writing case)
  • LLVM will continue to parse .s files in the GNU dialect by default (non-inline reading case)
  • Inline asm will now be parsed into MCInstructions using the HLASM dialect, and then the MCInstructions will either be directly emitted or textually emitted in the GNU dialect

That seems reasonable, but I wonder if the change should be in the frontend. Clang actually supports mixing GCC and MSVC inline asm in the same translation unit, and it uses the dialect flag in the IR to control this (I think). This is important if you want to support LTO between two TUs that use GCC and HLASM. Maybe there is exactly zero inline asm that uses GCC-style asm for SystemZ, so maybe this isn't a concern, but I thought I'd ask.

llvm/include/llvm/MC/MCAsmInfo.h
572

Can this be a field instead of a virtual method? Most MCAsmInfo customization points seem to be data instead of code, and I think it's usually easier to reason about data than code. I see that this method has a parameter and that the x86 implementations use it, and I'm not immediately sure how to replace that, but it seems like there might be a way to do it.

In D95444#2542739, @rnk wrote:
  • Inline asm will now be parsed into MCInstructions using the HLASM dialect, and then the MCInstructions will either be directly emitted or textually emitted in the GNU dialect

Yes this is the idea behind this patch and a series of subsequent patches (main RFC about introducing HLASM syntax):

That seems reasonable, but I wonder if the change should be in the frontend. Clang actually supports mixing GCC and MSVC inline asm in the same translation unit, and it uses the dialect flag in the IR to control this (I think). This is important if you want to support LTO between two TUs that use GCC and HLASM. Maybe there is exactly zero inline asm that uses GCC-style asm for SystemZ, so maybe this isn't a concern, but I thought I'd ask.

(I'm also commenting on this since I will be working on putting up some of the patches pertaining to the HLASM syntax soon.).

Would you please be able to expand on this a bit more? I don't believe we will allow intermixing of gnu asm and hlasm in an inline asm context anyway. Since the standard format of an HLASM statement (atleast the support that we're adding for it) has the same format of a gnu asm statement (with respect to its inputs, outputs, clobbers etc.). The dialect's "rules" comes into play while parsing the inline asm instructions into corresponding MCInst (like you mentioned) and have them emitted (either to object file or assembly file). The parsing of the input operands, output operands, clobbers remain unchanged. My apologies if I haven't seen your point you're trying to make right away.

In D95444#2542739, @rnk wrote:

Clang actually supports mixing GCC and MSVC inline asm in the same translation unit, and it uses the dialect flag in the IR to control this (I think). This is important if you want to support LTO between two TUs that use GCC and HLASM. Maybe there is exactly zero inline asm that uses GCC-style asm for SystemZ, so maybe this isn't a concern, but I thought I'd ask.

I'm concerned that "GCC" is overloaded here. The "GCC" that contrasts with MSVC is a C/C++ level parsing difference. The "GCC" that contrasts with HLASM has to do with the string content in a GNU-style inline assembly construct.

rnk added a comment.Feb 5 2021, 2:42 PM
In D95444#2542739, @rnk wrote:

Clang actually supports mixing GCC and MSVC inline asm in the same translation unit, and it uses the dialect flag in the IR to control this (I think). This is important if you want to support LTO between two TUs that use GCC and HLASM. Maybe there is exactly zero inline asm that uses GCC-style asm for SystemZ, so maybe this isn't a concern, but I thought I'd ask.

I'm concerned that "GCC" is overloaded here. The "GCC" that contrasts with MSVC is a C/C++ level parsing difference. The "GCC" that contrasts with HLASM has to do with the string content in a GNU-style inline assembly construct.

Yes, sorry, I meant the assembly dialect that the GNU binutils assembler parses. What I'm trying to get at is that you might want to support separate blobs that use different dialects:

void foo() { asm volatile ("gnu style asm"); }
void bar() { asm volatile ("HLASM style asm"); }

If there is one global setting for the inline asm dialect, this won't work. You could create a flag to control the setting, but then if you want to use LTO on two objects that use different assembly dialects, the command line flag isn't sufficient. To fully address this use case, you would want to put the dialect on each inline assembly blob, similar to the way we handle the existing intel/gnu dialect flag.

In D95444#2546235, @rnk wrote:

Yes, sorry, I meant the assembly dialect that the GNU binutils assembler parses. What I'm trying to get at is that you might want to support separate blobs that use different dialects:

void foo() { asm volatile ("gnu style asm"); }
void bar() { asm volatile ("HLASM style asm"); }

If there is one global setting for the inline asm dialect, this won't work. You could create a flag to control the setting, but then if you want to use LTO on two objects that use different assembly dialects, the command line flag isn't sufficient. To fully address this use case, you would want to put the dialect on each inline assembly blob, similar to the way we handle the existing intel/gnu dialect flag.

At least for our use case, this is not relevant. We want to use the "GCC style asm" whenever compiling for the z/OS target, and the "HLASM style asm" whenever compiling for the Linux target. These are incompatible anyway (different ABI, different target OS), so they cannot be combined using LTO. We simply need to use asm dialects since both targets are implemented in a single SystemZ back-end. (While the OS and ABI are different, the ISA is of course the same, so it does not make sense to use a different back-end.)

For this use case to work correctly, the back-end must be able to set the asm variant to be used for everything: writing asm source output, assembling a full asm source file, function-level inline asm, and module-level inline asm. There's really not much point in the front-end getting involved here, since the asm variant is already fully determined by the target triple.

Originally, I thought this would be a simple matter of the back-end setting the AssemblerDialect variable in its MCAsmInfo. However, it turned out this variable was actually only used for writing asm source output and assembling a full asm source file. For function-level inline asm, the dialect is taked from the IR, and for module-level inline asm, the dialect is hard-coded to 0 no matter what. This is what this patch was intended to fix.

Now, I fully agree that there are open questions on what the best implementation would be. Specifically, as Hubert points out above, the IR-level "dialect" field is currently overloaded with two semantics: it indicates both that the inline asm should be parsed using the MS-asm input/output syntax and that the Intel-style mnemonic/operand syntax should be used.

The idea of this patch is to explicitly decouple those two semantics by having the IR-level "dialect" only refer to the GCC vs. MS input/output syntax, and allowing the mnemonic/operand syntax to be always set by the back-end with no influence from the front-end / IR. In order to allow the X86 back-end to preserve existing behaviour, this meant that the back-end must be allowed to chose different mnemonic/operand dialects depending for GCC vs. MS input/output syntax inline asm statements, and therefore this is not a simple variable but a function.

If you have any better suggestions on how to implement this, I'd be happy to try those out as well ...

llvm/include/llvm/MC/MCAsmInfo.h
572

The intent is not to "compensate for problems", really, but to clearly separate the two notions of "GNU vs. MS input/output syntax" from "AT&T vs. Intel mnemonic/operand syntax". This function maps a value from the first domain into one of the second domain, allowing the back-end to provide the appropriate default mnemonic/operand syntax depending on the GNU vs. MS inline asm statement.

llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
21–22

Well, the whole reason the numbering had to match was that the value is currently overloaded between the "GNU vs. MS" values and "AT&T vs Intel" values. With the patch, the overload is no longer present (the correspondence is rather explicitly provided via a function), and therefore the numbers no longer have to match.

At least for our use case, this is not relevant. We want to use the "GCC style asm" whenever compiling for the z/OS target, and the "HLASM style asm" whenever compiling for the Linux target. These are incompatible anyway (different ABI, different target OS), so they cannot be combined using LTO. We simply need to use asm dialects since both targets are implemented in a single SystemZ back-end. (While the OS and ABI are different, the ISA is of course the same, so it does not make sense to use a different back-end.)

@uweigand, its the other way right? We want to use the "GNU style asm" when compiling for the Linux target (on the Z backend) and use the "HLASM style asm" when compiling for the z/OS target (on the Z backend).

At least for our use case, this is not relevant. We want to use the "GCC style asm" whenever compiling for the z/OS target, and the "HLASM style asm" whenever compiling for the Linux target. These are incompatible anyway (different ABI, different target OS), so they cannot be combined using LTO. We simply need to use asm dialects since both targets are implemented in a single SystemZ back-end. (While the OS and ABI are different, the ISA is of course the same, so it does not make sense to use a different back-end.)

@uweigand, its the other way right? We want to use the "GNU style asm" when compiling for the Linux target (on the Z backend) and use the "HLASM style asm" when compiling for the z/OS target (on the Z backend).

Ah, of course, thanks for catching this. The rest of the argument remains unchanged, though :-)

llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
21–22

This patch provides a correspondence from some input to these values. I am not seeing where the consumers (i.e., via AsmWriterFlavor and AssemblerDialect, the callers of getAssemblerDialect) are being updated by this patch to not be sensitive to the specific numbering here.

uweigand added inline comments.Feb 9 2021, 1:46 AM
llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
21–22

It used to be the case that those numbers had to match the values of the AsmDialect enum (i.e. ATT == AsmDialect::AD_ATT and Intel == AsmDialect::AD_Intel). This is what I interpreted the comment to refer to, and that no longer holds true.

The only consumers of that value should be now the interpretation of {..|..} variants in assembler strings. Of course it is still the case that the variant numbers cannot be arbitrarily changed if inline asm statements in existing source code already use the {..|..} syntax. If that's what the comment is supposed to refer to, then it would remain true.

llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
21–22

Yes, that syntax is what I interpret the comment to refer to. The EmitGCCInlineAsmStr function operates using these values as arguments to its AsmPrinterVariant parameter.

llvm/include/llvm/MC/MCAsmInfo.h
572

The intent is [ ... ] to clearly separate the two notions of "GNU vs. MS input/output syntax" from "AT&T vs. Intel mnemonic/operand syntax".

Some renaming of the enumerators in the first domain would aid comprehension and make this patch easier to move forward.

Matt added a subscriber: Matt.Oct 2 2021, 6:37 AM