This is an archive of the discontinued LLVM Phabricator instance.

[MC][X86] Allow assembler variable assignment to register name.
ClosedPublic

Authored by niravd on May 30 2018, 11:08 AM.

Details

Summary

Allow extended parsing of variable assembler assignment syntax and modify X86 to permit
VAR = register assignment. As we emit these as .set directives when possible, we inline
such expressions in output assembly.

Fixes PR37425.

Diff Detail

Repository
rL LLVM

Event Timeline

niravd created this revision.May 30 2018, 11:08 AM
rnk added a comment.May 30 2018, 11:20 AM

This breaks the invariant that an MCSymbol is always a relocatable symbol. I wonder if the better way to do this would be to pretend what we actually saw was a macro definition when the RHS of an assignment doesn't parse as an MCExpr, which is basically a language for relocatable expressions.

I'm not familiar enough with gas syntax to know what the equivalent directives would be. I see the #define suggestion in the bug, but is there a gas-only way to express that? We seem to support macros with arguments, but is there a way to express argument-less macros?

In D47545#1116590, @rnk wrote:

This breaks the invariant that an MCSymbol is always a relocatable symbol. I wonder if the better way to do this would be to pretend what we actually saw was a macro definition when the RHS of an assignment doesn't parse as an MCExpr, which is basically a language for relocatable expressions.

That's part of why these symbols are inlined and assignments are elided in the assembly output. I agree it'd better to other variable layers, but neither the C++ processor or assembler macro work. For the former case because of order of evaluation compared to assembler functions; for the latter because assembler macro handling assumes instantiated macros are complete assembly statements.

nickdesaulniers added inline comments.
llvm/lib/Target/X86/MCTargetDesc/X86MCExpr.h
15 ↗(On Diff #149166)

wrong guard?

niravd updated this revision to Diff 149196.May 30 2018, 2:03 PM
niravd edited the summary of this revision. (Show Details)

Address Nick's comments and restrict this to change to V = E statements

rnk added inline comments.May 30 2018, 2:21 PM
llvm/lib/MC/MCParser/AsmParser.cpp
1114–1121 ↗(On Diff #149196)

Given that we already have logic like this, I'm OK with this. I guess gnu as treats = assignments much more textually than we do.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

Does this generalize to more than just registers? Could it include things like foo = $42, or foo = 0x40(%rcx) as some kind of alias for an accessor? We might want to allow these things so that we don't have to do this fire drill again the next time a Linux developer writes some creative GNU as.

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

or tries to do this with another ISA, like arm64. I guess I'm curious if this is needs to be implemented in the other parsers as well? We only see the issue on x86 in the kernel currently, but I'd think this kind of functionality would be ISA independent?

void added inline comments.May 30 2018, 3:19 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

Something like foo = 0x40(%rcx) should be an illegal expression according to their own documentation....but then again so should using a register as a primary symbol. :-/

niravd added inline comments.May 31 2018, 7:55 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

Extending to generic instruction operands seems relatively straightforward, but given that it's illegal and there's really only a few cases we need to support, I'd prefer to keep this as simple as possible and deal with it as it comes along.

Hopefully, once we have clang consistently building the linux kernel with the integrated assembler (which I believe is now just this) additional things we need to workaround in this we should get quick push back and that'll be enough social pressure to prevent new violations needing compiler support vs. a code rewrite.

niravd added a comment.Jun 4 2018, 8:31 AM

Since we've only found 2 cases needing this extension and none requiring us to accept a larger set of expressions, I think we should defer any more work until we find such a case in the wild.

Reid, can I get an LGTM?

rnk added inline comments.Jun 4 2018, 10:46 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

I'm still curious to know what gas actually supports in practice before we commit to limiting ourselves to things that look like registers. If gas really treats this as a textual macro, that's probably the way we should go.

They may document that foo = 0x40(%rcx) is illegal, but documentation has been known to be wrong.

void added inline comments.Jun 4 2018, 11:03 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

In an ideal world, we could do that because it would be documented. But as you mentioned their documentation seems faulty. In order to know exactly what gas supports, we would need to unravel its code. That's a fairly complex task, which would take a lot of time and may not be useful.

In this case, it used to be an acceptable practice to allow registers in assembly macros, but it was uncommon. It's possible that its support in gas was unknown to the documentation writers.

IMO, we should go with their documentation and change only when we detect a deviation from it.

niravd marked an inline comment as done.Jun 4 2018, 11:29 AM
niravd added inline comments.
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

From testing, it's not just textual replacement. It looks like we always interpret assigned variables either as registers or memory references.

foo = <Register> -> valid.

foo = ($<immediate expression>)
foo = $<Immediate expression>
foo = <immediate expression>

Both valid. Always interpreted as memory access i.e. xorl $eax, foo translates to xorl $eax, (<immediate>)

foo = 40(%rcx)
foo = 40 + (%rcx)
foo = 40 + %rcx

All invalid.

That means, the only things we're missing from gas is the parsing of memory references off of immediate (4 + $100), but this isn't parsed currently in straight line code, e.g., "xor %eax, ($40)".

rnk accepted this revision.Jun 4 2018, 1:01 PM

lgtm

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

@void I think my perspective is informed by doing lots of MS compatibility work and triaging user complaints about incompatibility. I lean more towards doing all the experiments we can to try to understand the principle of what gas (or GCC, or MSVC) is really trying to do. It usually saves time in the long run if we can discover the underlying principle and implement it if it is not too far from our existing model. Knowing what's a bridge too far is tough, though.

@niravd Thanks for doing the experiments! The examples convince me that gas users aren't likely to rely on this extension for much more than register names. ($imm) is just not that useful, and is easy to rewrite as an absolute symbol definition. Actually, isn't that just an absolute symbol? Do we not already handle that? Whatever, it's not relevant to this change.

This revision is now accepted and ready to land.Jun 4 2018, 1:01 PM
void added inline comments.Jun 4 2018, 1:43 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2197 ↗(On Diff #149196)

@rnk For the record, I don't disagree with you. And I certainly would like to have full compatibility. One thing we're doing is trying to compile the Linux kernel with the integrated assembler. They have a lot of assembly that does wonky thing. I think it will be a good test for Clang. However, we do need to progress past this bug to find more incompatibilities...It's a bit of a chicken-'n-egg thing, unfortunately. :-(

This revision was automatically updated to reflect the committed changes.