Page MenuHomePhabricator

[MC] Separate masm integer literal lexer support from inline asm
ClosedPublic

Authored by rnk on Oct 22 2018, 3:58 PM.

Details

Summary

This effectively renames the IsParsingMSInlineAsm member variable of
AsmLexer to LexMasmIntegers and moves it up to MCAsmLexer. This is the
only behavior controlled by that variable. I added a public setter, so
that it can be set from outside or from the command line.

Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.

However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.

Essentially, we only lex masm literals when the asm blob itself is
marked as using intel syntax. If the user emits .intel_syntax directly
in the asm string, masm literals will not be lexed, as is the case if we
did a standalone .s compilation.

Hopefully this addresses PR36144.

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Oct 22 2018, 3:58 PM

FWIW, a different bug report where normal GNU non-inline assembly is broken by the MS inline asm handling is PR32973. It doesn't seem to get fixed by this change unfortunately - I haven't dug any further to see how closely related the issues are; that's also a case where e.g. 8h gets treated as a hex literal and gets replaced by 8 where it definitely shouldn't - where no intel syntax is involved whatsoever.

rnk updated this revision to Diff 170783.Oct 23 2018, 4:37 PM
  • fix mstorsjo's bug too
rnk updated this revision to Diff 170785.Oct 23 2018, 4:42 PM
  • add test case
rnk added a comment.Oct 23 2018, 4:44 PM

FWIW, a different bug report where normal GNU non-inline assembly is broken by the MS inline asm handling is PR32973. It doesn't seem to get fixed by this change unfortunately - I haven't dug any further to see how closely related the issues are; that's also a case where e.g. 8h gets treated as a hex literal and gets replaced by 8 where it definitely shouldn't - where no intel syntax is involved whatsoever.

I looked at the code, and there was extra 'h' hex literal handling that wasn't conditionalized the way it was supposed to be. Hopefully this new patch addresses it.

In D53535#1273593, @rnk wrote:

FWIW, a different bug report where normal GNU non-inline assembly is broken by the MS inline asm handling is PR32973. It doesn't seem to get fixed by this change unfortunately - I haven't dug any further to see how closely related the issues are; that's also a case where e.g. 8h gets treated as a hex literal and gets replaced by 8 where it definitely shouldn't - where no intel syntax is involved whatsoever.

I looked at the code, and there was extra 'h' hex literal handling that wasn't conditionalized the way it was supposed to be. Hopefully this new patch addresses it.

Nice! Indeed, it does seem to be fixed by this change now. Thanks!

thegameg accepted this revision.Oct 24 2018, 4:31 AM

LGTM, thanks for looking into this Reid!

This revision is now accepted and ready to land.Oct 24 2018, 4:31 AM
This revision was automatically updated to reflect the committed changes.