This is an archive of the discontinued LLVM Phabricator instance.

[ms] [llvm-ml] Use default RIP-relative addressing for x64 MASM.
ClosedPublic

Authored by epastor on Jan 22 2020, 2:06 PM.

Details

Summary

When parsing 64-bit MASM, treat memory operands with unspecified base register as RIP-based.

Documented in several places, including https://software.intel.com/en-us/articles/introduction-to-x64-assembly: "Unfortunately, MASM does not allow this form of opcode, but other assemblers like FASM and YASM do. Instead, MASM embeds RIP-relative addressing implicitly."

Diff Detail

Event Timeline

epastor created this revision.Jan 22 2020, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2020, 2:06 PM

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

epastor updated this revision to Diff 244898.Feb 16 2020, 1:48 PM

Rebase on top of corrected predecessors

epastor updated this revision to Diff 245989.Feb 21 2020, 2:03 PM

Add basic regression test

Yuck.

What's the plan for 64-bit inline asm?

epastor added a comment.EditedFeb 25 2020, 11:20 AM

Yuck.

There's a reasonable debate that this reflects the x64 instruction set better, but yeah.

What's the plan for 64-bit inline asm?

It should probably be the same, admittedly. To clarify: isParsingInlineAsm will only be true when parsing inline MS-style assembly? If so, I'll rename it to isParsingInlineMSAsm in this patch, and let the handling trigger off of both.

What's the plan for 64-bit inline asm?

It should probably be the same, admittedly. To clarify: isParsingInlineAsm will only be true when parsing inline MS-style assembly? If so, I'll rename it to isParsingInlineMSAsm in this patch, and let the handling trigger off of both.

Are there any cases where MS style assembly can be injected inline on x86_64? As far as I know, MSVC didn't support any inline assembly at all on platforms other than i386.

If that's the case, I think the sane thing would be to keep handling inline asm like it always has been, as gcc style inline assembly. There could, concievably, be projects that contains gcc style inline assembly within #ifdef __clang__ (as this probably worked so far).

What's the plan for 64-bit inline asm?

It should probably be the same, admittedly. To clarify: isParsingInlineAsm will only be true when parsing inline MS-style assembly? If so, I'll rename it to isParsingInlineMSAsm in this patch, and let the handling trigger off of both.

Are there any cases where MS style assembly can be injected inline on x86_64? As far as I know, MSVC didn't support any inline assembly at all on platforms other than i386.

If that's the case, I think the sane thing would be to keep handling inline asm like it always has been, as gcc style inline assembly. There could, concievably, be projects that contains gcc style inline assembly within #ifdef __clang__ (as this probably worked so far).

I'd argue that the sane thing would be if x86_64 inline asm when targeting the msvc abi worked like llvm-ml (and ml64) works. Anything else seems extremely surprising to me.

Anyhoo, if that's the long-term plan I'd try starting with the inline asm part of that change, and then make llvm-ml match that.

epastor added a comment.EditedFeb 25 2020, 11:41 AM

What's the plan for 64-bit inline asm?

It should probably be the same, admittedly. To clarify: isParsingInlineAsm will only be true when parsing inline MS-style assembly? If so, I'll rename it to isParsingInlineMSAsm in this patch, and let the handling trigger off of both.

Are there any cases where MS style assembly can be injected inline on x86_64? As far as I know, MSVC didn't support any inline assembly at all on platforms other than i386.

If that's the case, I think the sane thing would be to keep handling inline asm like it always has been, as gcc style inline assembly. There could, concievably, be projects that contains gcc style inline assembly within #ifdef __clang__ (as this probably worked so far).

It's true, MSVC doesn't support inline assembly at all on non-i386 platforms.

However, LLVM currently does support MS-style inline assembly on x86-64, if you use the MS-style asm blocks... it just doesn't quite work as expected. (We've seen a few bug reports to this effect, including http://llvm.org/PR44272 - several of which have said the option of MS-style inline assembly on x86-64 is a big bonus.) Handling that here would move it closer to fully working.

epastor updated this revision to Diff 246523.Feb 25 2020, 11:46 AM

Support default RIP-relative addressing on x64 MS-style assembly (including inline).

It's true, MSVC doesn't support inline assembly at all on non-i386 platforms.

However, LLVM currently does support MS-style inline assembly on x86-64, if you use the MS-style asm blocks... it just doesn't quite work as expected. (We've seen a few bug reports to this effect, including http://llvm.org/PR44272 - several of which have said the option of MS-style inline assembly on x86-64 is a big bonus.) Handling that here would move it closer to fully working.

Oh, ok - I see. In that case, keeping the gcc style inline assembly behaviour for asm() blocks and ms style for __asm {} probably is best - if it isn't a huge extra effort.

MS-style inline assembly on x64 should also benefit now.

rnk added inline comments.Feb 25 2020, 2:22 PM
llvm/include/llvm/MC/MCParser/MCAsmParser.h
168–169

I guess I'd prefer isParsingMSInlineAsm. Do you mind uploading that and committing it separately?

171

We already have ASMLexer::LexMasmIntegers. Can this be collapsed with that so that we don't have two indicators of "masm-ness"?

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2072–2073

Does this not make the check for isParsingInlindMSAsm() below redundant? If it's true, we always return here.

2078–2080

What ends up going wrong if we set the basereg to RIP in these conditions when it is not already set? I'm wondering if it would be cleaner to do the defaulting here, and then adjust the downstream code to treat RIP-based memory operands the same as no-base memory operands.

llvm/lib/Target/X86/AsmParser/X86Operand.h
61

I'd really like to get by without adding more operand members that don't correspond to parts of the x86 addressing mode.

epastor updated this revision to Diff 246790.Feb 26 2020, 10:49 AM
epastor marked 9 inline comments as done.

Responses to rnk's comments

epastor marked an inline comment as done.
epastor added inline comments.
llvm/include/llvm/MC/MCParser/MCAsmParser.h
168–169

Rename split out as a prerequisite differential: https://reviews.llvm.org/D75198

171

Since that's a Lexer control property, we might need to stick to setting it based on isParsingMasm() instead...

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2072–2073

Whoops. Yep, revised to set the default base register to RIP in CreateMemForInlineAsm instead.

2078–2080

I started by doing that, and we ended up with RIP as the BaseReg in many inappropriate situations. The issue is that this only applies when the Operand is matched as a non-absolute reference... which we can't tell at parse time.

It would have been much cleaner, but since the interpretation is context-dependent, I couldn't find a better option. I'm open to suggestions, though!

llvm/lib/Target/X86/AsmParser/X86Operand.h
61

Agreed. Unfortunately, the only obvious way to do that would be to match the x64 addressing mode more closely, and assume that all non-absolute memory references lacking a BaseReg are RIP-relative... which would break GNU-style assembly, since they made the decision to enforce explicit RIP-relative referencing.

epastor updated this revision to Diff 247499.Mar 1 2020, 5:51 AM
epastor marked an inline comment as done.

Rebasing to HEAD

thakis added inline comments.Apr 1 2020, 12:40 PM
llvm/lib/Target/X86/AsmParser/X86Operand.h
61

It sounds like ms-style 64-bit assembly with a msvc triple will already be incompatible with gnu-style assembly, due to the rip-relative differences. Is that incorrect?

If not, I think we should have the GNU mode and the msvc mode here, and not a third kinda-hybrid mode. It sounds like this is currently some hybrid mode.

If needed, we could have the full commandline flag / pragma dance for picking a mode for inline asm and only setting the default mode off the triple if use cases for that emerge, but that can happen later.

(Maybe I misunderstood what you're saying, though.)

epastor marked 2 inline comments as done.Apr 1 2020, 2:36 PM
epastor added inline comments.
llvm/lib/Target/X86/AsmParser/X86Operand.h
61

What I'm saying is that we can't know whether or not to assume RIP-relative (in MSVC mode) until AFTER the Operand has been matched to an instruction, as only MASM instructions that allow relative addressing will default to RIP-relative if no other base register is specified. However, once the Operand is matched, we lose information on whether it's come from MASM or GNU-style assembly. (We do allow GNU-style assembly for MSVC platforms already.) This means that when we're finally about to emit our instruction and need the base register, we don't know whether to translate a NoReg base as actually implying RIP.

To handle that, I opted to extend the Operand here with a DefaultBaseReg, which we use as our Base if no base register has been specified. When we parse GNU-style assembly, this defaults to NoReg... when we parse MASM-style assembly, it gets set to RIP.

epastor marked an inline comment as done.Apr 22 2020, 7:46 AM

Any further thoughts here?

epastor updated this revision to Diff 259287.Apr 22 2020, 7:49 AM

Rebase to HEAD

epastor marked an inline comment as done.Apr 22 2020, 8:01 AM
epastor added inline comments.
llvm/lib/Target/X86/AsmParser/X86Operand.h
61

One more try at clarifying this: MS-style assembly with an MSVC triple and GNU-style assembly with an MSVC triple both work now, and will both work in the future. The only difference is that MS-style assembly instructions that allow relative addressing will use a default RIP-relative address, while GNU-style assembly forces the user to always be explicit about RIP-relative addressing.

Since only relative-addressing contexts in MS-style assembly use the implicit RIP-relative convention, we can't just pick a mode and stick with it. Instead, we label our Operands with how their source assembly wants them treated if used in relative-addressing context without a base register. After we've matched in relative-addressing context, if the base register is not present, we implement the approach given by the source assembly.

epastor updated this revision to Diff 262117.May 5 2020, 8:08 AM

Rebase against HEAD

epastor updated this revision to Diff 264007.May 14 2020, 8:50 AM

Rebasing to HEAD

thakis accepted this revision.Jul 1 2020, 7:33 AM

I think being compatible with ml64 by default is good, and making progress here seems better than holding this up for even longer.

Going forward, we likely will want to have the full suite of toggles for this (pragma, compiler flag, accepting an alternative form with an explicit RIP, etc), but let's start with this.

This revision is now accepted and ready to land.Jul 1 2020, 7:33 AM
epastor updated this revision to Diff 274813.Jul 1 2020, 7:51 AM

Rebase to HEAD

This revision was automatically updated to reflect the committed changes.