This is an archive of the discontinued LLVM Phabricator instance.

[llvm-ml] Improve indirect call parsing
ClosedPublic

Authored by ayzhao on Apr 25 2022, 1:08 PM.

Details

Summary

In MASM, if a QWORD symbol is passed to a jmp or call instruction in
64-bit mode or a DWORD or WORD symbol is passed in 32-bit mode, then
MSVC's assembler recognizes that as an indirect call. Additionally, if
the operand is qualified as a ptr, then that should also be an indirect
call.

Furthermore, in 64-bit mode, such operands are implicitly rip-relative
(in fact, MSVC's assembler ml64.exe does not allow explicitly specifying
rip as a base register.)

To keep this patch managable, this patch does not include:

  • error messages for wrong operand types (e.g. passing a QWORD in 32-bit mode)
  • resolving indirect calls if the symbol is declared after it's first use (llvm-ml currently only runs a single pass).
  • imlementing the extern keyword (required to resolve https://crbug.com/762167.)

This patch is likely missing a bunch of edge cases, so please do point
them out in the review.

Diff Detail

Event Timeline

ayzhao created this revision.Apr 25 2022, 1:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 1:08 PM
ayzhao requested review of this revision.Apr 25 2022, 1:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 1:08 PM
ayzhao updated this revision to Diff 425011.Apr 25 2022, 1:15 PM

Remove extra spaces and leftover debugging lines

MaskRay added inline comments.
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
1110

Fix the function naming to use lowerCase while updating the signature.

llvm/test/tools/llvm-ml/unconditional_branch_x64.asm
12

t0 alone may not be unique in the output.

62

Consider -NEXT: if appropriate.

epastor added inline comments.Apr 25 2022, 1:39 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2642

Why do we need to set the DefaultBaseReg unconditionally for the unconditional branches? Aren't they only implicitly RIP-relative if referencing a named symbol (i.e., one with known type, which is what the SM.getElementSize() below is testing)?

(Also, I'm a little surprised if the conditional branches have different handling, though anything's possible.)

2655

Rather than lookUpData, can't we just use the type info that's already available in SM? (getType if you want the name, and others if you want other information.) There's even a chance this might account for the QWORD PTR "cast" already.

2658

Can't we check the type size, rather than the name? If it's actually special-cased on these types alone, then maybe we should consider marking the type info with a special flag rather than handling the mnemonic here.

hans added inline comments.Apr 26 2022, 5:59 AM
llvm/test/tools/llvm-ml/unconditional_branch_x64.asm
6

probably a silly question, but what's the 28 for?

70

These are great test cases!

Can we also have some tests where instead of jmp or call, "fn", "[fn]" etc. are moved into a register?
And should there be some tests with conditional jumps too?
It seems from this patch that they might be different, but in either case it seems good to cover that with tests.

llvm/test/tools/llvm-ml/unconditional_branch_x86.asm
2

Since they're mostly the same, maybe instead of having separate test files for 32- and 64-bit, maybe it would be simpler to combine them into one. file with two "RUN" lines.
You could use a macro to define e.g. "WORD" to "qword" or "dword", and use "CHECK32" and "CHECK64" for when the expectations are different.

ayzhao updated this revision to Diff 425347.Apr 26 2022, 5:10 PM
ayzhao marked 9 inline comments as done.

code review feedback

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
2642

Why do we need to set the DefaultBaseReg unconditionally for the unconditional branches? Aren't they only implicitly RIP-relative if referencing a named symbol (i.e., one with known type, which is what the SM.getElementSize() below is testing)?

Done.

The difficulty here is that in direct jumps, the operand is a relative displacement, not a memory address. My original thought was that direct jumps are inherently rip relative (e.g. jmp 0xAB means jump to the address [rip + 0xAB], not jump to 0xAB], and direct jumps and indirect jumps (e.g. jump to the address [rip + 0xAB] vs jump to the address stored at memory location [rip + 0xAB] would be differentiated by BaseReg ad MaybeDirectBranchDest. In hindsight though, this would've meant that we have to set DefaultBaseReg = X86::RIP for all branching instructions in 64-bit mode, which feels pretty excessive.

To complicate things, there are also far jumps, which apparently do jump to absolute addresses. This patch ignores those.

(Also, I'm a little surprised if the conditional branches have different handling, though anything's possible.)

All branching instructions that I've encountered only accept offsets as parameters[0][1], so I _think_ only call and jmp can do an indirect branch. OTOH, I could be completely missing something.

[0]: https://www.felixcloutier.com/x86/jcc
[1]: https://www.felixcloutier.com/x86/loop:loopcc

2658

I played around with MSVC ml and it turns out that it is entirely dependent on the type size rather than the name, so, for example, an 8 byte sized struct would still work. I added a test case for this scenario.

llvm/test/tools/llvm-ml/unconditional_branch_x64.asm
6

Dummy value

70

Can we also have some tests where instead of jmp or call, "fn", "[fn]" etc. are moved into a register?

Already covered in rip-relative-addressing.asm

And should there be some tests with conditional jumps too?

Done (though conditional jumps only accept labels)

llvm/test/tools/llvm-ml/unconditional_branch_x86.asm
2

Done.

At this time, type aliases are defined with the TYPEDEF keyword, which is not yet implemented, so I ended up splitting it into multiple blocks.

epastor accepted this revision.Apr 27 2022, 6:59 AM

Looks good to me, but probably make sure Hans agrees first.

llvm/test/tools/llvm-ml/indirect_branch.asm
189 ↗(On Diff #425347)

nit: let's add the end-of-file newline.

This revision is now accepted and ready to land.Apr 27 2022, 6:59 AM
hans added a comment.Apr 27 2022, 8:04 AM

The logic seems pretty tricky to me, but I defer to Eric for that.
And I'm slowly coming to terms that brackets doesn't seem to mean anything to masm, it's really just about the operand types.

I'm basically happy, but maybe we can make the test file clearer by defining some macro with the /D option?

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4727

This change looks like a no-op?

llvm/lib/Target/X86/AsmParser/X86Operand.h
76

A comment would be good.

llvm/test/tools/llvm-ml/indirect_branch.asm
186 ↗(On Diff #425347)

Do we give a decent error for "je fn_ref"?

llvm/test/tools/llvm-ml/unconditional_branch_x86.asm
2

Sorry, I was thinking of "macro" as in those that can be defined on the command-line, e.g. "ml /DWORD=dword". I didn't check if llvm-ml supports the /D flag, but in case it does that might be easier?

ayzhao updated this revision to Diff 425580.Apr 27 2022, 11:29 AM
ayzhao marked 3 inline comments as done.

more code review comments

ayzhao marked an inline comment as done.Apr 27 2022, 11:30 AM
ayzhao added inline comments.
llvm/test/tools/llvm-ml/indirect_branch.asm
186 ↗(On Diff #425347)

llvm-ml seems to accept je fn_ref by treating fn_ref as a label. My guess is that llvm-ml (and perhaps other assembler dialects) are more tolerant of what can be treated as a label.

This is definitely an inconsistency, but fixing it feels like it's out of the scope of this patch.

hans accepted this revision.Apr 28 2022, 4:09 AM

lgtm

MaskRay accepted this revision.Apr 28 2022, 8:51 AM
MaskRay added inline comments.
llvm/test/tools/llvm-ml/indirect_branch.asm
7 ↗(On Diff #425580)

Two-column indentation is much more common.

ayzhao updated this revision to Diff 425815.Apr 28 2022, 9:28 AM

indirect_branch_asm spacing

ayzhao marked an inline comment as done.Apr 28 2022, 9:29 AM
ayzhao updated this revision to Diff 425822.Apr 28 2022, 9:49 AM

one more indent

This revision was landed with ongoing or failed builds.Apr 28 2022, 10:17 AM
This revision was automatically updated to reflect the committed changes.