This is an archive of the discontinued LLVM Phabricator instance.

[x86][inline-asm]Extend support for memory reference expression
AbandonedPublic

Authored by coby on Jul 23 2017, 6:05 AM.

Details

Summary

Extend support for expressions which represent a variable access in ms-style inline-asm, to allow the incorporation of both registers and variables.
Currently, expression such as 'asm mov eax, [var + eax]' would have been reduced to the (equivalent of) 'asm mov eax, [var]'
This patch amends it
clang counterpart: D35775

Diff Detail

Repository
rL LLVM

Event Timeline

coby created this revision.Jul 23 2017, 6:05 AM
rnk edited edge metadata.Jul 24 2017, 3:24 PM

I would really prefer to see fewer rewrites and more pass through of inline assembly to MC. The way we parse inline asm in two phases is a confusing source of bugs. It also makes it harder to test the Intel asm parser, since the more rewrites we do the more cross-repository tests we have to add.

include/llvm/MC/MCParser/MCTargetAsmParser.h
77

format

test/MC/X86/intel-syntax-error.s
21–24

I see, these are in fact valid x86. :) Can you move them to a valid test case so we don't lose coverage, like test/MC/X86/intel-syntax.s?

coby added a comment.Jul 25 2017, 12:57 AM

I would really prefer to see fewer rewrites and more pass through of inline assembly to MC
As do I. What are you suggesting? Keep rewrites to the absolute necessary minimum?
Pepole do some funky stuff on MS-Inline-Asm, a large portion of it should be 'legalized' in order for the Assembler to be able to digest.
The other way around is to allow this nonsense in the Assembler as well.
Both options are bad, but I believe the latter is a bit more.

If it's of any comfort - have this 'new' rewrite is to be allowed - AOK_Imm/AOK_ImmPrefix/AOK_SizeDirective can be implemented as a special case of it (AOK_DotOperator should be cancelled).
Net effect will remain the same - but it'll be (possibly) less of a mess of code.

coby updated this revision to Diff 108014.Jul 25 2017, 1:01 AM

Addressed @rnk 's (most of) comments

rnk added a comment.Jul 26 2017, 4:21 PM
In D35774#819706, @coby wrote:

I would really prefer to see fewer rewrites and more pass through of inline assembly to MC
As do I. What are you suggesting? Keep rewrites to the absolute necessary minimum?
Pepole do some funky stuff on MS-Inline-Asm, a large portion of it should be 'legalized' in order for the Assembler to be able to digest.
The other way around is to allow this nonsense in the Assembler as well.
Both options are bad, but I believe the latter is a bit more.

If it's of any comfort - have this 'new' rewrite is to be allowed - AOK_Imm/AOK_ImmPrefix/AOK_SizeDirective can be implemented as a special case of it (AOK_DotOperator should be cancelled).
Net effect will remain the same - but it'll be (possibly) less of a mess of code.

Hm, rewriting everything in terms of a general AOK_IntelExpr does sound better, even if it is kind of the opposite direction from what I was thinking. :) What I'm imagining is that any time there is a symbolic reference to C/C++ enums, globals, or variables, we basically fully parse that operand and decompose it into the usual parts of an X86 memory operand: segment, base, index, scale, displacement, etc, check that those parts make sense, and emit a completely rewritten operand. Right now the parsing code in MC has a hard time deciding what's legal because it doesn't know whether certain C/C++ variable references refer to stack locals or global variables. A local variable will require using a base register, but a global will not, but on x86_64, a global will require %rip-relative addressing, which can't use base+index registers.

Doing this exposes a lot of x86 details that aren't really supposed to be in the supposedly target neutral asmparser interface. That battle may already be lost, though.

coby marked 2 inline comments as done.EditedJul 30 2017, 3:07 AM
In D35774#822227, @rnk wrote:
In D35774#819706, @coby wrote:

I would really prefer to see fewer rewrites and more pass through of inline assembly to MC
As do I. What are you suggesting? Keep rewrites to the absolute necessary minimum?
Pepole do some funky stuff on MS-Inline-Asm, a large portion of it should be 'legalized' in order for the Assembler to be able to digest.
The other way around is to allow this nonsense in the Assembler as well.
Both options are bad, but I believe the latter is a bit more.

If it's of any comfort - have this 'new' rewrite is to be allowed - AOK_Imm/AOK_ImmPrefix/AOK_SizeDirective can be implemented as a special case of it (AOK_DotOperator should be cancelled).
Net effect will remain the same - but it'll be (possibly) less of a mess of code.

Hm, rewriting everything in terms of a general AOK_IntelExpr does sound better, even if it is kind of the opposite direction from what I was thinking. :) What I'm imagining is that any time there is a symbolic reference to C/C++ enums, globals, or variables, we basically fully parse that operand and decompose it into the usual parts of an X86 memory operand: segment, base, index, scale, displacement, etc, check that those parts make sense, and emit a completely rewritten operand. Right now the parsing code in MC has a hard time deciding what's legal because it doesn't know whether certain C/C++ variable references refer to stack locals or global variables. A local variable will require using a base register, but a global will not, but on x86_64, a global will require %rip-relative addressing, which can't use base+index registers.

Doing this exposes a lot of x86 details that aren't really supposed to be in the supposedly target neutral asmparser interface. That battle may already be lost, though.

This battle can fairly been viewed as lost, until a 'global' refactoring/reconsideration will be announced (if ever) regarding the way targets' specific parsers and AsmParser are to integrate one with each other, in terms of responsibilities. those 'leaks' you've mentioned from X86 to AsmParser are a fine exhibition of it. I have some ideas regarding how we may overcome/'hide' such phenomena in a way which should make rewrites unnecessary, but they are currently seem 'too heavy' (and reaching code parts wihch I should get more familiar with ere tempering). perhaps a llvm-dev discussion should be commenced? to either find a more clean and elegant way, or setting 'guidelines' which will confine this witchcraft to an agreed upon, tolerable minimum.
In the meantime - I deem we should 'allow' ourselves to 'rewrite' only where no scale-able alternative can be conceived, and try to concentrate/uniform all rewrites under one hood (in an utopic future - it should ease rewrites elimination)

coby added a comment.Aug 10 2017, 6:01 AM

lads?
anyone?

rnk added inline comments.Aug 14 2017, 1:31 PM
lib/Target/X86/AsmParser/X86AsmParser.cpp
1281

Please keep the curly braces on the for loop. I know elsewhere they are omitted, but this is just too clever for me.

1283

If we have to keep doing comparing source locations, this code would greatly benefit from some helper functions that do things like the following: isLocBetween, isLocAfter, isLocBefore, so you don't have to deal in pointers all the time.

1308–1319

This is too clever. IMO we should take SymName, make an SMLoc out of it, and use the comparison helpers to figure out if the string is before or after '[', and do the appropriate rewrite.

1525

User-facing diagnostics are supposed to be sentence fragments that start with lower case. We also try to avoid contractions in user-facing diagnostics, so this should be "cannot" instead of "Can't".

coby added a comment.Aug 20 2017, 6:06 AM

This patch is largely covered by D36793, and so better be conducted afterwards.
will upload updated version once D36793 is approved and commited. sorry for all the hassle.

coby abandoned this revision.Sep 3 2017, 5:11 AM

superseded by D37413