This is an archive of the discontinued LLVM Phabricator instance.

[X86][AsmParser] re-introduce 'offset' operator
Needs RevisionPublic

Authored by coby on Sep 5 2017, 3:27 AM.

Details

Summary

Amend MS offset operator implementation, to more closely fit with its MS counterpart:

  1. InlineAsm: evaluate non-local source entities to their (address) location
  2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous
  3. address PR32530

Presented diffs are relative to D37412

InlineAsm tests: D37466

Diff Detail

Repository
rL LLVM

Event Timeline

coby created this revision.Sep 5 2017, 3:27 AM

Adding @mcrosier , as he originally introduced the feature

coby edited the summary of this revision. (Show Details)Sep 6 2017, 2:47 AM
rnk added inline comments.Sep 18 2017, 1:58 PM
test/MC/X86/pr32530.s
5

Can you add some more tests? I think this combines with everything: scale, index, base, segment, and integer displacement.

coby added a comment.Sep 19 2017, 12:50 AM

sure, will update shortly

coby updated this revision to Diff 117285.EditedOct 1 2017, 7:45 AM

Minimal source changes - introduced 'new' enum values definitions as adapted on dependent change (D37412)
Added few more assembly level testing, as requested by @rnk
Minimized 'offset_operator.ll' test-case, and auto-updated it checks

RKSimon edited edge metadata.Nov 2 2017, 3:13 AM

@coby What is happening with this patch?

coby added a comment.Nov 2 2017, 3:43 AM

@coby What is happening with this patch?

currently suspended due to a more dire matters.
nonetheless - as far as functionality is to be considered, and unless any of the listed reviewers will object, this one require no further development.

avt77 edited edge metadata.Nov 28 2017, 11:31 PM

@coby, is it possible to commit this patch? I'd like to fix https://bugs.llvm.org/show_bug.cgi?id=35115 based on this source.

mcrosier resigned from this revision.Apr 11 2018, 11:29 AM
thakis added a subscriber: thakis.Jul 2 2018, 8:05 AM

Thanks for working on this! I'd be happy to see this fixed. It also likely helps with PR36676.

I patched it in to play with it (it still applies with just two hunks not applying via patch, but manually applying them seems easy.) I got an assert on this code (which is from Windows.h):

$ cat repro.ii
void Int64ShrlMod32(unsigned __int64 Value) {
  __asm {
    mov edx, dword ptr [Value+4]
  }
}
$ bin/clang-7.0 -cc1 -triple i386-pc-windows-msvc19.11.0 -S -disable-free -main-file-name moved.i -mrelocation-model static -mthread-model posix -relaxed-aliasing -fmath-errno -masm-verbose -mconstructor-aliases -target-cpu pentium4 -mllvm -x86-asm-syntax=intel -D_MT -flto-visibility-public-std --dependent-lib=libcmt --dependent-lib=oldnames -fms-volatile -fdiagnostics-format msvc -dwarf-column-info -debugger-tuning=gdb -target-linker-version 305 -momit-leaf-frame-pointer -ffunction-sections -coverage-notes-file /Users/thakis/src/llvm-build-goma/moved.gcno -resource-dir /Users/thakis/src/llvm-build-goma/lib/clang/7.0.0 -Os -Wno-msvc-not-found -WCL4 -fdebug-compilation-dir /Users/thakis/src/llvm-build-goma -ferror-limit 19 -fmessage-length 254 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.11 -fdelayed-template-parsing -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o moved.asm -x cpp-output repro.ii
Assertion failed: (AR.IntelExp.isValid() && "cannot write invalid intel expression"), function parseMSInlineAsm, file /Users/thakis/src/llvm-rw/lib/MC/MCParser/AsmParser.cpp, line 5709.
Stack dump:
0.	Program arguments: /Users/thakis/src/llvm-build-goma/bin/clang-7.0 -cc1 -triple i386-pc-windows-msvc19.11.0 -S -disable-free -main-file-name moved.i -mrelocation-model static -mthread-model posix -relaxed-aliasing -fmath-errno -masm-verbose -mconstructor-aliases -target-cpu pentium4 -mllvm -x86-asm-syntax=intel -D_MT -flto-visibility-public-std --dependent-lib=libcmt --dependent-lib=oldnames -fms-volatile -fdiagnostics-format msvc -dwarf-column-info -debugger-tuning=gdb -target-linker-version 305 -momit-leaf-frame-pointer -ffunction-sections -coverage-notes-file /Users/thakis/src/llvm-build-goma/moved.gcno -resource-dir /Users/thakis/src/llvm-build-goma/lib/clang/7.0.0 -Os -Wno-msvc-not-found -WCL4 -fdebug-compilation-dir /Users/thakis/src/llvm-build-goma -ferror-limit 19 -fmessage-length 254 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.11 -fdelayed-template-parsing -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o moved.asm -x cpp-output repro.ii 
1.	repro.ii:5:1: current parser token '}'
2.	repro.ii:1:45: parsing function body 'Int64ShrlMod32'
3.	repro.ii:1:45: in compound statement ('{}')
0  clang-7.0                0x000000010d82ebc8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  clang-7.0                0x000000010d82ddc5 llvm::sys::RunSignalHandlers() + 85
2  clang-7.0                0x000000010d82f1d2 SignalHandler(int) + 258
3  libsystem_platform.dylib 0x00007fffc1fd2b3a _sigtramp + 26
4  libsystem_platform.dylib 0x00007fff532a277c _sigtramp + 2435644508
5  libsystem_c.dylib        0x00007fffc1e57420 abort + 129
6  libsystem_c.dylib        0x00007fffc1e1e893 basename_r + 0
7  clang-7.0                0x000000010d5320cd (anonymous namespace)::AsmParser::parseMSInlineAsm(void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, unsigned int&, unsigned int&, llvm::SmallVectorImpl<std::__1::pair<void*, bool> >&, llvm::SmallVectorImpl<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >&, llvm::SmallVectorImpl<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >&, llvm::MCInstrInfo const*, llvm::MCInstPrinter const*, llvm::MCAsmParserSemaCallback&) + 7981
8  clang-7.0                0x000000010e936e52 clang::Parser::ParseMicrosoftAsmStatement(clang::SourceLocation) + 5746
9  clang-7.0                0x000000010e92bf66 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::AllowedConstructsKind, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) + 1238
10 clang-7.0                0x000000010e92b938 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::AllowedConstructsKind, clang::SourceLocation*) + 264
11 clang-7.0                0x000000010e933739 clang::Parser::ParseCompoundStatementBody(bool) + 1497

@coby Are you still looking as this? If not (@coby hasn't been active on phab since 2017) @thakis would you be in a position to commandeer this and complete it?

@coby Are you still looking as this? If not (@coby hasn't been active on phab since 2017) @thakis would you be in a position to commandeer this and complete it?

I can try, but I don't know this code well. I've uploaded a rebased but otherwise unmodified version at https://reviews.llvm.org/D49648

The crash I posted above is because IntelExpr() used to map a Scale of 0 to 1. If I add that back in, my crash goes away.

However, offset is also supposed to work with quoted symbols. If I allow that like so:

+  if (!isParsingInlineAsm()) {
+    if ((getTok().isNot(AsmToken::Identifier) &&
+         getTok().isNot(AsmToken::String)) ||
+        getParser().parsePrimaryExpr(Val, End))
+      return Error(Start, "unexpected token!");

then one of the two examples in PR36676 starts working, but the other asserts due to an unknown relocation. The problem is that offset is modeled as a X86Operand::CreateImm() with a fixup, but since intel asm instrs don't have sizes the logic sees that the imm is 0 and hence creates a 16-bit imm and we then get a 2 byte reloc, which aren't implemented. This is wrong anyhow since push offset foo should create a pointer size relocation. X86AsmParser::ParseIntelOperand() could not create an imm but a memory ref if SM.isOffsetOperator() and that makes things go, but then the push is encoded with a PUSH32rmm instead of a PUSH32i8, which is a byte larger and different from what clang-cl writes if you don't pass /FA, so I'd like to fix that too. I haven't yet figured out how :-)

Ping @thakis do you have any updates on this?

Is this enough to keep using CreateImm and fix the push issue

@@ -3163,7 +3202,7 @@ bool X86AsmParser::MatchAndEmitIntelInstruction(SMLoc IDLoc, unsigned &Opcode,
       // If it's not a constant fall through and let remainder take care of it.
       const auto *CE = dyn_cast<MCConstantExpr>(X86Op->getImm());
       unsigned Size = getPointerWidth();
-      if (CE &&
+      if (!CE ||
           (isIntN(Size, CE->getValue()) || isUIntN(Size, CE->getValue()))) {
         SmallString<16> Tmp;
         Tmp += Base;

This extends the hack we have to use pointer sized push to also do it when the immediate can't be resolved to a constant.

thakis added a comment.Feb 1 2019, 1:14 PM

(I rebased this again and uploaded it with the assert fix at D49648. A few other tests fail in check-clang with it, and apparently clang-cl /FA -m32 no longer writes asm with offset, so some of my old tests no longer work. Will look more over the weekend.)

Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 1:14 PM
thakis added a comment.Feb 4 2019, 8:23 AM

There are at least 3 things missing to get check-clang to work:

  1. clang currently supports offset localvar, which cl.exe doesn't support. This is currently rewritten to the stack address; after this patch it instead becomes offset localvar but localvar isn't an existing symbol of course. So for locals, this would still have to do the rewrite.
  1. For offset Foo::bar, clang used to rewrite that to a local symbol, not it just prints offset Foo::bar – I think we probably need to put the mangled name there instead.
  1. For __asm mov [eax], offset var we get ambiguous operand size for instruction 'mov' now; that needs fixing.

I don't know if I'll have time soon to look into these.

reupen added a subscriber: reupen.Feb 21 2019, 1:43 PM
epastor added a subscriber: epastor.EditedDec 12 2019, 1:59 PM

There are at least 3 things missing to get check-clang to work:

  1. clang currently supports offset localvar, which cl.exe doesn't support. This is currently rewritten to the stack address; after this patch it instead becomes offset localvar but localvar isn't an existing symbol of course. So for locals, this would still have to do the rewrite.
  1. For offset Foo::bar, clang used to rewrite that to a local symbol, not it just prints offset Foo::bar – I think we probably need to put the mangled name there instead.
  1. For __asm mov [eax], offset var we get ambiguous operand size for instruction 'mov' now; that needs fixing.

I've got a draft that fixes (1) and (2), handling both as operands. It avoids having to special-case local symbols too deeply, handling everything as an operand - local symbols have their location on the stack passed in through a register (and so can't be included in compound expressions), global symbols come in as an immediate (and so can be included in compound expressions).

(3) is interesting. Looking at https://godbolt.org/z/QQ9ASx, currently clang rewrites offset x to an explicitly pointer-sized register, with the choice of register hard-wired to RBX/EBX. Now that I'm handling local symbol offset as an operand, we get operand size ambiguity... which seems appropriate to me. Would it be reasonable to consider this the new correct behavior, or do we need to preserve it as is?

thakis requested changes to this revision.Sep 23 2021, 12:23 PM
This revision now requires changes to proceed.Sep 23 2021, 12:23 PM