This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/test/CodeGen/
-
test/
-
CodeGen/
-
thinlto-inline-asm.c
-
llvm/lib/Object/
-
lib/
-
Object/
-
ModuleSymbolTable.cpp

Differential D82862

[ThinLTO] Always parse module level inline asm with At&t dialect
ClosedPublic

Authored by hans on Jun 30 2020, 4:08 AM.

Download Raw Diff

Details

Reviewers

pcc
thakis

Commits

rGa8e582c8307b: [ThinLTO] Always parse module level inline asm with At&t dialect (PR46503)

Summary

clang-cl passes -x86-asm-syntax=intel to the cc1 invocation so that assembly listings produced by the /FA flag are printed in intel dialect. That flag however should not affect the *parsing* of inline assembly in the program. (See llvm.org/r322652)

When compiling normally, AsmPrinter::emitInlineAsm is used for assembling and defaults to At&t dialect. However, when compiling for ThinLTO, the code which parses module level inline asm to find symbols for the symbol table was failing to set the dialect. This patch fixes that. (See PR46503)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hans created this revision.Jun 30 2020, 4:08 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2020, 4:08 AM

Herald added subscribers: llvm-commits, dexonsmith, steven_wu and 2 others. · View Herald Transcript

thakis accepted this revision.Jun 30 2020, 7:09 AM

This revision is now accepted and ready to land.Jun 30 2020, 7:09 AM

Closed by commit rGa8e582c8307b: [ThinLTO] Always parse module level inline asm with At&t dialect (PR46503) (authored by hans). · Explain WhyJul 1 2020, 1:02 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 1:03 AM

Hi @hans , we're having some issues with using the AssemblerDialect mechanism to support both the GNU assembler and the IBM HLASM assembler in the SystemZ back-end. See also the discussion started here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146875.html

One of the issues that showed up is what you're refering to above:

That flag however should not affect the *parsing* of inline assembly in the program.

I'm wondering why this is true. I mean, it is true that the flag currently does not affect parsing of inline asm, but I'm wondering whether it *should* be that way.

Note that the -x86-asm-syntax=intel LLVM flag is used to implement the -masm=intel clang command line option, which exists also in GCC and where hopefully the two compilers should be compatible. But in GCC, that flag is documented to affect parsing of inline assembly just like it affects output. See https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/x86-Options.html#x86-Options

-masm=dialect

Output assembly instructions using selected dialect. Also affects which dialect is used for basic asm (see Basic Asm) and extended asm (see Extended Asm). Supported choices (in dialect order) are ‘att’ or ‘intel’. The default is ‘att’. Darwin does not support ‘intel’.

What is the reason for treating this differently in LLVM?

Herald added a subscriber: pengfei. · View Herald TranscriptJan 14 2021, 7:15 AM

uweigand mentioned this in D94250: [SystemZ] Introducing assembler dialects for the Z backend.Jan 14 2021, 7:17 AM

In D82862#2498242, @uweigand wrote:
Hi @hans , we're having some issues with using the AssemblerDialect mechanism to support both the GNU assembler and the IBM HLASM assembler in the SystemZ back-end. See also the discussion started here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146875.html

One of the issues that showed up is what you're refering to above:

That flag however should not affect the *parsing* of inline assembly in the program.

I'm wondering why this is true. I mean, it is true that the flag currently does not affect parsing of inline asm, but I'm wondering whether it *should* be that way.

Note that the -x86-asm-syntax=intel LLVM flag is used to implement the -masm=intel clang command line option, which exists also in GCC and where hopefully the two compilers should be compatible. But in GCC, that flag is documented to affect parsing of inline assembly just like it affects output. See https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/x86-Options.html#x86-Options
-masm=dialect
Output assembly instructions using selected dialect. Also affects which dialect is used for basic asm (see Basic Asm) and extended asm (see Extended Asm). Supported choices (in dialect order) are ‘att’ or ‘intel’. The default is ‘att’. Darwin does not support ‘intel’.
What is the reason for treating this differently in LLVM?

The motivation for my change was really just to make ThinLTO compiles work the same as non-ThinLTO ones.

Maybe the fact that -x86-asm-syntax=intel doesn't affect inline asm is a bug. I wasn't aware that Clang and GCC's -masm= flags behaved differently in that way, but that certainly suggests there's a problem here.

(From clang-cl's point of view we just want to set the *output* dialect to Intel, but we could invent a different flag for that if necessary.)

What is the reason for treating this differently in LLVM?

I'm not sure if it is related to this. I think one difference is that LLVM is supporting MS inline assembly. Although it uses Intel dialect, it has different representation in input, output, clobber etc. with GCC'.

In D82862#2500038, @pengfei wrote:

What is the reason for treating this differently in LLVM?

I'm not sure if it is related to this. I think one difference is that LLVM is supporting MS inline assembly. Although it uses Intel dialect, it has different representation in input, output, clobber etc. with GCC'.

That is indeed another complication. On the one hand, we have two different inline asm formats, GNU assembly and MS assembly. These differ completely in how asm arguments (passed in from surrounding C/C++ code) are handled and therefore need to be parsed quite differently. On the other hand, we have the different assembler dialects (AT&T vs. Intel on x86), which affect how the single asm instructions are to be parsed.

Unfortunately, the distinction between the two seems to be somewhat muddled in LLVM at this point. In particular, MS assembly statements are represented at the LLVM IR level via the "inteldialect" token. This is a bit confusing because while MS asm indeed always uses the Intel dialect, we can also have GNU asm with the Intel dialect -- at least this is the case in GCC when using -masm=intel.

What I think we should have is:

the LLVM IR level distinction between GNU and MS asm statements; and in addition
for GNU asm statements, a target-specific selection of assembler variants, which may depend on the target triple and/or command line options like -masm=

If you look at AsmPrinter::emitInlineAsm, this is actually partially implemented already:

// The variant of the current asmprinter.
int AsmPrinterVariant = MAI->getAssemblerDialect();
AsmPrinter *AP = const_cast<AsmPrinter*>(this);
if (MI->getInlineAsmDialect() == InlineAsm::AD_ATT)
  EmitGCCInlineAsmStr(AsmStr, MI, MMI, AsmPrinterVariant, AP, LocCookie, OS);
else
  EmitMSInlineAsmStr(AsmStr, MI, MMI, AP, LocCookie, OS);

So here the LLVM IR marker is used to select between GCC and MS inline asm instructions, and for the GCC inline asm statements, the target is queried as to the proper asm dialect variant to use.

However, later on we have a

Parser->setAssemblerDialect(Dialect);

call, where the Dialect is again taken from the LLVM IR setting -- so for GNU asm, this gets now always set back to AT&T. It seems to me that this is simply wrong.

Then, we finally have module level inline asm statements, which are always treated as GNU style (although this may not really make any difference since module level statements do not have arguments anyway). Again because of the above setAssemblerDialect call they would therefore be also treated as AT&T dialect, which I guess is what @hans originally noticed.

In D82862#2498785, @hans wrote:

The motivation for my change was really just to make ThinLTO compiles work the same as non-ThinLTO ones.

Maybe the fact that -x86-asm-syntax=intel doesn't affect inline asm is a bug. I wasn't aware that Clang and GCC's -masm= flags behaved differently in that way, but that certainly suggests there's a problem here.

So I'm wondering, if I remove the above setAssemblerDialect line and revert this commit, we should have inline asm (both module-level and GNU function-leve) respect the target-selected asm dialect variant both for ThinLTO and non-ThinLTO, so they should match again. Would that also solve the problem you were originally tracking?

In D82862#2498785, @hans wrote:

The motivation for my change was really just to make ThinLTO compiles work the same as non-ThinLTO ones.

Maybe the fact that -x86-asm-syntax=intel doesn't affect inline asm is a bug. I wasn't aware that Clang and GCC's -masm= flags behaved differently in that way, but that certainly suggests there's a problem here.

So I'm wondering, if I remove the above setAssemblerDialect line and revert this commit, we should have inline asm (both module-level and GNU function-leve) respect the target-selected asm dialect variant both for ThinLTO and non-ThinLTO, so they should match again. Would that also solve the problem you were originally tracking?

Not completely, because clang-cl sets -x86-asm-syntax=intel to enable intel-style asm in assembly listing output. We'd have to find another way of doing that without affecting the inline asm dialect.

pengfei added subscribers: LiuChen3, FreddyYe.Jan 15 2021, 8:35 PM

In D82862#2500717, @hans wrote:

In D82862#2498785, @hans wrote:

The motivation for my change was really just to make ThinLTO compiles work the same as non-ThinLTO ones.

Maybe the fact that -x86-asm-syntax=intel doesn't affect inline asm is a bug. I wasn't aware that Clang and GCC's -masm= flags behaved differently in that way, but that certainly suggests there's a problem here.

So I'm wondering, if I remove the above setAssemblerDialect line and revert this commit, we should have inline asm (both module-level and GNU function-leve) respect the target-selected asm dialect variant both for ThinLTO and non-ThinLTO, so they should match again. Would that also solve the problem you were originally tracking?

Not completely, because clang-cl sets -x86-asm-syntax=intel to enable intel-style asm in assembly listing output. We'd have to find another way of doing that without affecting the inline asm dialect.

So why do you want GNU inline asm for clang-cl anyway? I thought the whole point of clang-cl was to be compatible with the Microsoft Visual Studio compiler, which I understand only supports the MS asm syntax?

In D82862#2512908, @uweigand wrote:

So why do you want GNU inline asm for clang-cl anyway? I thought the whole point of clang-cl was to be compatible with the Microsoft Visual Studio compiler, which I understand only supports the MS asm syntax?

We have users, in this case, I think it's V8, who would prefer to use gcc-style module level assembly if it is available. Their motivation is somewhat arbitrary, but generally, clang-cl supports a variety of extensions, some inherited from GCC, in all modes. Part of the point of switching compilers from MSVC to clang is to get access to those extensions.

In D82862#2513044, @rnk wrote:

In D82862#2512908, @uweigand wrote:

So why do you want GNU inline asm for clang-cl anyway? I thought the whole point of clang-cl was to be compatible with the Microsoft Visual Studio compiler, which I understand only supports the MS asm syntax?

We have users, in this case, I think it's V8, who would prefer to use gcc-style module level assembly if it is available. Their motivation is somewhat arbitrary, but generally, clang-cl supports a variety of extensions, some inherited from GCC, in all modes. Part of the point of switching compilers from MSVC to clang is to get access to those extensions.

I might be mistaken here but couldn't this be done by introducing an option and tying it to setting the AssemblerDialect field in the target's respective MCAsmInfo instance and then querying that? Wouldn't this remove the dependency on setting the dialect on the parser. Module level assembly are only going to be treated as GNU statements (from Uli's comment above), and if its just the case of using a different variant (within the gnu dialect), then this can be set in the respective (Target)MCAsmInfo instances appropriately?

ormris added a subscriber: ormris.Jan 25 2021, 9:59 AM

In D82862#2513044, @rnk wrote:

In D82862#2512908, @uweigand wrote:

So why do you want GNU inline asm for clang-cl anyway? I thought the whole point of clang-cl was to be compatible with the Microsoft Visual Studio compiler, which I understand only supports the MS asm syntax?

We have users, in this case, I think it's V8, who would prefer to use gcc-style module level assembly if it is available. Their motivation is somewhat arbitrary, but generally, clang-cl supports a variety of extensions, some inherited from GCC, in all modes. Part of the point of switching compilers from MSVC to clang is to get access to those extensions.

I see, thanks. I think what would make sense to preserve existing behavior while still allowing other platforms to use different dialects for GNU inline asm would be to move the decision which dialect to use for inline asm to the back-end. I've posted a patch along those lines as https://reviews.llvm.org/D95444 - I hope we can continue the discussion there.

Revision Contents

Path

Size

clang/

test/

CodeGen/

thinlto-inline-asm.c

21 lines

llvm/

lib/

Object/

ModuleSymbolTable.cpp

5 lines

Diff 274709

clang/test/CodeGen/thinlto-inline-asm.c

This file was added.

				// REQUIRES: x86-registered-target

				// RUN: %clang_cc1 -triple x86_64-pc-windows-msvc19.11.0 -emit-llvm-bc \
				// RUN: -flto=thin -mllvm -x86-asm-syntax=intel -v \
				// RUN: -o %t.obj %s 2>&1 \| FileCheck --check-prefix=CLANG %s
				//
				// RUN: llvm-lto2 dump-symtab %t.obj \| FileCheck --check-prefix=SYMTAB %s

				// Module-level inline asm is parsed with At&t syntax. Test that the
				// -x86-asm-syntax flag does not affect this.

				// CLANG-NOT: unknown token in expression
				// SYMTAB: D------X foo
				// SYMTAB: D------X bar

				void foo() {}

				asm(".globl bar \n"
				"bar: \n"
				" xor %eax, %eax\n"
				" ret \n");

llvm/lib/Object/ModuleSymbolTable.cpp

Show All 17 Lines
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalAlias.h"		#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
		#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDirectives.h"		#include "llvm/MC/MCDirectives.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCObjectFileInfo.h"		#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCParser/MCAsmParser.h"		#include "llvm/MC/MCParser/MCAsmParser.h"
#include "llvm/MC/MCParser/MCTargetAsmParser.h"		#include "llvm/MC/MCParser/MCTargetAsmParser.h"
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	initializeRecordStreamer(const Module &M,
std::unique_ptr<MCAsmParser> Parser(		std::unique_ptr<MCAsmParser> Parser(
createMCAsmParser(SrcMgr, MCCtx, Streamer, *MAI));		createMCAsmParser(SrcMgr, MCCtx, Streamer, *MAI));

std::unique_ptr<MCTargetAsmParser> TAP(		std::unique_ptr<MCTargetAsmParser> TAP(
T->createMCAsmParser(STI, Parser, *MCII, MCOptions));		T->createMCAsmParser(STI, Parser, *MCII, MCOptions));
if (!TAP)		if (!TAP)
return;		return;

		// Module-level inline asm is assumed to use At&t syntax (see
		// AsmPrinter::doInitialization()).
		Parser->setAssemblerDialect(InlineAsm::AD_ATT);

Parser->setTargetParser(*TAP);		Parser->setTargetParser(*TAP);
if (Parser->Run(false))		if (Parser->Run(false))
return;		return;

Init(Streamer);		Init(Streamer);
}		}

void ModuleSymbolTable::CollectAsmSymbols(		void ModuleSymbolTable::CollectAsmSymbols(
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines