This is an archive of the discontinued LLVM Phabricator instance.

[MC] Fixed parsing of macro arguments where expressions with spaces are present.
ClosedPublic

Authored by s.egerton on Oct 9 2015, 6:31 AM.

Details

Summary

Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which has a space between '272' and '+'. The parser would then parse '272' and '+8' as two arguments instead of a single expression resulting in one too many arguments in the pseudo instruction.
The reason that the test case has been changed is so that the expected
output matches the output of the GNU assembler.

Diff Detail

Event Timeline

s.egerton updated this revision to Diff 36944.Oct 9 2015, 6:31 AM
s.egerton retitled this revision from to [MC] Fixed parsing of macro arguments where expressions with spaces are present..
s.egerton updated this object.
s.egerton added reviewers: dsanders, vkalintiris.
s.egerton added a subscriber: llvm-commits.
vkalintiris requested changes to this revision.Oct 9 2015, 2:27 PM
vkalintiris edited edge metadata.

Tests? Also, the macro-gas.s test case change should be in a separate patch.

This revision now requires changes to proceed.Oct 9 2015, 2:27 PM
dsanders edited edge metadata.Oct 10 2015, 8:39 AM

Fixed an issue for mips where an instruction such as:
sdc1 $f1, SC_FPREGS+8(a0) would return an error as SC_FPREGS and +8 are
interpreted as two arguments instead of an expression.

I believe you've missed a key part of the problem description here. I don't have the original test to hand so I might not have this exactly right but it was about the expansion of a macro where the arguments to the macro were 'sdc1 $f1, SC_FPREGS+8(a0)'. After pre-processing, it would end up as something like 'macroname sdc1 $f1, 32 +8(a0)' which was parsed a macro named 'macroname' followed by four arguments ('sdc1', '$f1', '32', '+8(a0)'). The macro was expecting three arguments.

Could you update the description?

Tests? Also, the macro-gas.s test case change should be in a separate patch.

+1 to needing a minimal version of the original test case in the patch.

The whitespace change in macro-gas.s is necessary though. The bug was about how the parser handles expressions-with-spaces and whitespace separated arguments. The updated test checks for the same string as GAS emits.

s.egerton updated this revision to Diff 41139.Nov 25 2015, 6:25 AM
s.egerton edited edge metadata.

I have added test cases. I also had to update the patch to ensure all of the tests passed.

s.egerton updated this object.Nov 26 2015, 3:08 AM
s.egerton edited edge metadata.
s.egerton updated this object.Dec 7 2015, 7:29 AM
dsanders requested changes to this revision.Dec 8 2015, 3:07 AM
dsanders edited edge metadata.

Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which
has a space between '272' and '+'. The parser would then parse '272' and '+8' as
two arguments instead of a single expression resulting in one too many arguments
in the pseudo instruction.

This isn't quite right since 'sdc1 $f1 272 + 8(a0)' would parse successfully by itself. This change is about parsing the arguments to a user-defined macro and dividing the input tokens into tokens for each of the three arguments of the macro (insn, reg, and src). At the moment, we find three argument separators (because we don't handle whitespace separated arguments correctly) leaving us with four arguments to the EX user-defined macro. This macro only accepts three arguments so we emit an error.

lib/MC/MCParser/AsmParser.cpp
2010

I think these two arguments do the wrong thing here:

4 +sym
4 +(1)

*NextChar will be neither a space or a digit so the loop will terminate at the 'AddTokens == 0 && SpaceEaten' check having only consumed '4 +'.

I think that dropping the AddTokens variable in favour of adding the tokens and continuing should fix this:

if (isOperator(Lexer.getKind())) {
  MA.push_back(getTok());
  Lex();

  // Whitespace after an operator can be ignored.
  if (Lexer.is(AsmToken::Space))
    Lex();

  continue;
}
test/MC/Mips/macro-sdc1.s
1 ↗(On Diff #41139)

(about filename): By our naming conventions, this would be a test for a macro called 'sdc1' but the test is really about argument parsing and the times it's correct to treat whitespace as an argument separator. Something like user-macro-argument-separation.s would be clearer.

9 ↗(On Diff #41139)

Indentation. Likewise for the comments below

12–22 ↗(On Diff #41139)

I think we need a few extra cases here. At the moment we don't test prefix operators, symbols, expressions in parentheses, and combinations of them. I've tried a few of these on GAS and there were a couple surprising behaviours.

Here's the test case I ran through GAS:

.extern sym
# imm and rs are deliberately swapped to test whitespace separated arguments.
.macro EX2 insn, rd, imm, rs
.ex\@: \insn \rd, \rs, \imm
.endm

EX2 addiu $2, 1 $3
EX2 addiu $2, ~1 $3
EX2 addiu $2, ~ 1 $3
EX2 addiu $2, 1+1 $3
EX2 addiu $2, 1+ 1 $3
EX2 addiu $2, 1 +1 $3
EX2 addiu $2, 1 + 1 $3
EX2 addiu $2, 1+~1 $3
EX2 addiu $2, 1+~ 1 $3
EX2 addiu $2, 1+ ~1 $3
EX2 addiu $2, 1 +~1 $3
EX2 addiu $2, 1 +~ 1 $3
EX2 addiu $2, 1 + ~1 $3
EX2 addiu $2, 1 + ~ 1 $3
# Each of the next four produce variations of '1+(1)$3' as a single argument.
EX2 addiu $2, 1+(1) $3
EX2 addiu $2, 1 +(1) $3
EX2 addiu $2, 1+ (1) $3
EX2 addiu $2, 1 + (1) $3
EX2 addiu $2, 1+(1)+1 $3
EX2 addiu $2, 1 +(1)+1 $3
EX2 addiu $2, 1+ (1)+1 $3
EX2 addiu $2, 1 + (1)+1 $3
nop
EX2 addiu $2, sym $3
EX2 addiu $2, -sym $3
EX2 addiu $2, - sym $3
EX2 addiu $2, 1+sym $3
EX2 addiu $2, 1+ sym $3
EX2 addiu $2, 1 +sym $3
EX2 addiu $2, 1 + sym $3
EX2 addiu $2, 1+~sym $3
EX2 addiu $2, 1+~ sym $3
EX2 addiu $2, 1+ ~sym $3
EX2 addiu $2, 1 +~sym $3
EX2 addiu $2, 1 +~ sym $3
EX2 addiu $2, 1 + ~sym $3
EX2 addiu $2, 1 + ~ sym $3
# Each of the next four produce variations of '1+(sym)$3' as a single argument.
EX2 addiu $2, 1+(sym) $3
EX2 addiu $2, 1 +(sym) $3
EX2 addiu $2, 1+ (sym) $3
EX2 addiu $2, 1 + (sym) $3
EX2 addiu $2, 1+(1)+sym $3
EX2 addiu $2, 1 +(1)+sym $3
EX2 addiu $2, 1+ (1)+sym $3
EX2 addiu $2, 1 + (1)+sym $3

Removing the commas also produces some surprising arguments such as '$2~1'.

This revision now requires changes to proceed.Dec 8 2015, 3:07 AM
s.egerton updated this revision to Diff 44213.EditedJan 7 2016, 7:49 AM
s.egerton edited edge metadata.
s.egerton marked 4 inline comments as done.

Responded to reviewers comments. In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires this in order to pass all test cases. The patch ca be found here: http://reviews.llvm.org/D15949

dsanders accepted this revision.Feb 5 2016, 1:50 AM
dsanders edited edge metadata.

In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires this in order to pass all test cases. The patch ca be found here: http://reviews.llvm.org/D15949

Hmm. The code changed by that patch already breaks the encapsulation of the MCExpr hierarchy and needs to be moved inside MipsMCExpr. I tried doing this a few weeks ago and it's a lot harder than it looks because there's two mutually exclusive ways to handle things like %hi and we seem to have chosen both. D15949 makes the encapsulation worse by duplicating a functionality from MCExpr so I'd rather not commit that and instead work on putting the current contents of that function inside the MCExpr hierarchy.

Dropping the tests involving 'sym' doesn't fundamentally change the testing w.r.t whitespace separated arguments so I think the lesser evil is to remove the sym tests from this patch. On balance, I think we should go with this patch by itself and drop the tests marked below.

LGTM with the two removals indicated below.

lib/Target/Mips/AsmParser/MipsAsmParser.cpp
3906–3920 ↗(On Diff #44213)

Given that we're dropping the symbol test cases for now, we should drop this section too and leave it for a later patch

test/MC/Mips/user-macro-argument-separation.s
41–70 ↗(On Diff #44213)

As explained above, I'm not keen on this suggestion but it's the lesser of two evils. Could you remove these tests so that we don't need D15949?

This revision was automatically updated to reflect the committed changes.
s.egerton marked 2 inline comments as done.

Closing bug PR24319 https://bugs.llvm.org/show_bug.cgi?id=24319 because your fix had fixed this bug as well.