This is an archive of the discontinued LLVM Phabricator instance.

[X86][MC] Fix parsing Intel syntax indirect branch with symbol only
ClosedPublic

Authored by alvinhochun on May 1 2023, 4:31 AM.

Details

Summary

Clang on Windows targets often requires indirect calls through the
import address table (IAT), and also .refptr stubs for MinGW target.
On 32-bit this generates assembly in the form of
call dword ptr [__imp__func], which MC had failed to handle correctly.
64-bit targets are not affected because rip-relative addressing is used.

Reported on: https://github.com/llvm/llvm-project/issues/62010

Depends on D149695, D149920

Diff Detail

Event Timeline

alvinhochun created this revision.May 1 2023, 4:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2023, 4:31 AM
alvinhochun requested review of this revision.May 1 2023, 4:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2023, 4:31 AM
alvinhochun edited the summary of this revision. (Show Details)

Format the commit message better

alvinhochun retitled this revision from [X86][MC] Fix parsing Intel syntax indirect branch with identifier only to [X86][MC] Fix parsing Intel syntax indirect branch with symbol only.May 1 2023, 5:26 AM
epastor accepted this revision.May 1 2023, 6:13 AM

Looks about right to me, as an extension of D124413 - but I would appreciate it if ayzhao@ can weigh in here, since that was his patch.

This revision is now accepted and ready to land.May 1 2023, 6:13 AM

I'm aware of the check-clang failure in CodeGen/ms-inline-asm-functions.c. It seems to be caused by the MS inline assembly being translated in a way that breaks with this patch, which means I may have to look into fixing that code too. That also revealed an issue with my patch regarding how call [offset function] is handled -- apparently GAS treats it the same as call function. I'll try to get this fixed.

Turns out this change affects MS inline assembly and possibly other inline assemblies as well. Looking at ms-inline-asm-functions.c (on Compiler Explorer - https://godbolt.org/z/3xfW75Go4):

  • Direct call: __asm call k; generates the inline assembly call dword ptr ${1:P} (which seems wrong, should be call ${1:P} instead?), which is converted into call dword ptr _k, gets compiled as call dword ptr [_k] with this patch (incorrect).
  • Call through a global function pointer, which is marked as a "broken case" in the test: __asm call kptr; generates the inline assembly call $4, which is converted into call [offset _kptr], gets compiled as call dword ptr [_kptr] with this patch, which actually fixes the test case, except that call [offset _kptr] doesn't seem right (see below).
  • Direct jump: __asm jmp k; generates the inline assembly jmp dword ptr ${1:P}, same issue as direct call.

Among these, the usage of operator offset in call seems questionable. I checked the behaviour of other assemblers:

  • NASM:
    • Does not support offset operator.
  • MASM (ml) Version 14.35.32215.0:
    • call offset fn is oddly assembled as push cs \n call fn.
    • call [offset fn] generates error A2023:instruction operand must have size.
    • call dword ptr [offset fn] is assembled as call dword ptr [fn].
  • GAS (Intel syntax):
    • Both call offset fn and call [offset fn] are assembled the same way as call fn, which seems odd.
    • call dword ptr [offset fn] is assembled as call dword ptr [fn].

I think this issue may end up being too much for me to handle. I already spent more time than I wanted tracing the code and trying various hacks, but couldn't quite come up with proper fixes. Will someone be willing to take over this?

alvinhochun requested review of this revision.May 2 2023, 6:53 AM
MaskRay added a comment.EditedMay 2 2023, 1:20 PM

It looks we accept a case (t2, w/o or with the patch) that GNU as rejects.

% as -msyntax=intel =(printf 'call qword ptr [rip + fn_ref]') -o gas.o
/tmp/zshjfuv2e: Assembler messages:
/tmp/zshjfuv2e: Warning: end of file not at end of a line; newline inserted
/tmp/zshjfuv2e:1: Error: invalid operands (*UND* and *UND* sections) for `+'

We parse it as callq *fn_ref(%rip).

It looks we accept a case (t2, w/o or with the patch) that GNU as rejects.

% as -msyntax=intel =(printf 'call qword ptr [rip + fn_ref]') -o gas.o
/tmp/zshjfuv2e: Assembler messages:
/tmp/zshjfuv2e: Warning: end of file not at end of a line; newline inserted
/tmp/zshjfuv2e:1: Error: invalid operands (*UND* and *UND* sections) for `+'

It does work if you add .intel_syntax noprefix to the assembly or pass -mnaked-reg to as.

Turns out this change affects MS inline assembly and possibly other inline assemblies as well. Looking at ms-inline-asm-functions.c (on Compiler Explorer - https://godbolt.org/z/3xfW75Go4):

  • Direct call: __asm call k; generates the inline assembly call dword ptr ${1:P} (which seems wrong, should be call ${1:P} instead?), which is converted into call dword ptr _k, gets compiled as call dword ptr [_k] with this patch (incorrect).
  • Call through a global function pointer, which is marked as a "broken case" in the test: __asm call kptr; generates the inline assembly call $4, which is converted into call [offset _kptr], gets compiled as call dword ptr [_kptr] with this patch, which actually fixes the test case, except that call [offset _kptr] doesn't seem right (see below).
  • Direct jump: __asm jmp k; generates the inline assembly jmp dword ptr ${1:P}, same issue as direct call.

Among these, the usage of operator offset in call seems questionable. I checked the behaviour of other assemblers:

  • NASM:
    • Does not support offset operator.
  • MASM (ml) Version 14.35.32215.0:
    • call offset fn is oddly assembled as push cs \n call fn.
    • call [offset fn] generates error A2023:instruction operand must have size.
    • call dword ptr [offset fn] is assembled as call dword ptr [fn].
  • GAS (Intel syntax):
    • Both call offset fn and call [offset fn] are assembled the same way as call fn, which seems odd.
    • call dword ptr [offset fn] is assembled as call dword ptr [fn].

I think this issue may end up being too much for me to handle. I already spent more time than I wanted tracing the code and trying various hacks, but couldn't quite come up with proper fixes. Will someone be willing to take over this?

OK. I think after D149695 (landed), D149920, and this patch D149579, these cases should be correct.

MaskRay added inline comments.May 4 2023, 8:52 PM
llvm/test/MC/X86/intel-syntax-branch.s
49

ICC and MSVC parse this differently.

Is this syntax valid?

alvinhochun edited the summary of this revision. (Show Details)

Rebase and add test cases with offset operator, and some TODOs/FIXMEs for comment

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 8:15 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
alvinhochun marked an inline comment as not done.May 5 2023, 8:18 AM

OK. I think after D149695 (landed), D149920, and this patch D149579, these cases should be correct.

Thank you so much for looking into this!

I have one question in inline comment.

llvm/test/MC/X86/intel-syntax-branch.s
49

MASM ml.exe assembles this as jmp dword ptr [fn_ref] in my test, so does GAS. I don't know how valid this syntax is though.

62–68

About the call [offset fn_ref] case, do you think we should reject it since MASM does not accept this syntax and GAS appears to behave oddly with it? The ms-inline-asm-functions.c test does generate call [offset _kptr] so this will also need to be changed.

MaskRay added inline comments.May 5 2023, 10:59 AM
llvm/test/MC/X86/intel-syntax-branch.s
49

LG. cl.exe compiles void (*kptr)(); ... __asm call kptr; to call DWORD PTR _kptr, so it seems that the bracket can be omitted.

62–68

Yes, I think we should reject call [offset fn_ref]. Confirmed with a MinGW dev (lh_mouse) that this is invalid.

MaskRay accepted this revision.May 5 2023, 11:03 PM

Thanks!

This revision is now accepted and ready to land.May 5 2023, 11:03 PM
MaskRay added inline comments.May 6 2023, 1:04 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2660

I think this patch can be pushed now.

Remove #if 0 from this patch. Just add the lines in D150048.

MaskRay added inline comments.May 6 2023, 1:05 PM
llvm/test/MC/X86/intel-syntax-branch-fail.s
1 ↗(On Diff #519869)

Don't create a separate test for negative tests. Use the --defsym=ERR=1 and .ifdef ERR pattern to keep positive and negative tests in one file.

MaskRay added inline comments.May 6 2023, 1:07 PM
llvm/test/MC/X86/intel-syntax-branch-fail.s
5 ↗(On Diff #519869)

[[#@LINE+-1]] => [[#@LINE-1]]

Remove the file name intel-syntax-branch-fail.s.

I'd remove TODO-CHECK in this patch and add `// CHECK: :[[#@LINE+-1]]:1: error: OFFSET` operator cannot be used in an unconditional branch`` in D150048.

Apply suggestions: remove TODO, put negative tests in the same file as positive tests.