Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which has a space between '272' and '+'. The parser would then parse '272' and '+8' as two arguments instead of a single expression resulting in one too many arguments in the pseudo instruction.
The reason that the test case has been changed is so that the expected
output matches the output of the GNU assembler.
Details
Diff Detail
Event Timeline
Fixed an issue for mips where an instruction such as:
sdc1 $f1, SC_FPREGS+8(a0) would return an error as SC_FPREGS and +8 are
interpreted as two arguments instead of an expression.
I believe you've missed a key part of the problem description here. I don't have the original test to hand so I might not have this exactly right but it was about the expansion of a macro where the arguments to the macro were 'sdc1 $f1, SC_FPREGS+8(a0)'. After pre-processing, it would end up as something like 'macroname sdc1 $f1, 32 +8(a0)' which was parsed a macro named 'macroname' followed by four arguments ('sdc1', '$f1', '32', '+8(a0)'). The macro was expecting three arguments.
Could you update the description?
+1 to needing a minimal version of the original test case in the patch.
The whitespace change in macro-gas.s is necessary though. The bug was about how the parser handles expressions-with-spaces and whitespace separated arguments. The updated test checks for the same string as GAS emits.
I have added test cases. I also had to update the patch to ensure all of the tests passed.
Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which
has a space between '272' and '+'. The parser would then parse '272' and '+8' as
two arguments instead of a single expression resulting in one too many arguments
in the pseudo instruction.
This isn't quite right since 'sdc1 $f1 272 + 8(a0)' would parse successfully by itself. This change is about parsing the arguments to a user-defined macro and dividing the input tokens into tokens for each of the three arguments of the macro (insn, reg, and src). At the moment, we find three argument separators (because we don't handle whitespace separated arguments correctly) leaving us with four arguments to the EX user-defined macro. This macro only accepts three arguments so we emit an error.
lib/MC/MCParser/AsmParser.cpp | ||
---|---|---|
2105–2106 | I think these two arguments do the wrong thing here: 4 +sym 4 +(1) *NextChar will be neither a space or a digit so the loop will terminate at the 'AddTokens == 0 && SpaceEaten' check having only consumed '4 +'. I think that dropping the AddTokens variable in favour of adding the tokens and continuing should fix this: if (isOperator(Lexer.getKind())) { MA.push_back(getTok()); Lex(); // Whitespace after an operator can be ignored. if (Lexer.is(AsmToken::Space)) Lex(); continue; } | |
test/MC/Mips/macro-sdc1.s | ||
1 ↗ | (On Diff #41139) | (about filename): By our naming conventions, this would be a test for a macro called 'sdc1' but the test is really about argument parsing and the times it's correct to treat whitespace as an argument separator. Something like user-macro-argument-separation.s would be clearer. |
9 ↗ | (On Diff #41139) | Indentation. Likewise for the comments below |
12–22 ↗ | (On Diff #41139) | I think we need a few extra cases here. At the moment we don't test prefix operators, symbols, expressions in parentheses, and combinations of them. I've tried a few of these on GAS and there were a couple surprising behaviours. Here's the test case I ran through GAS: .extern sym # imm and rs are deliberately swapped to test whitespace separated arguments. .macro EX2 insn, rd, imm, rs .ex\@: \insn \rd, \rs, \imm .endm EX2 addiu $2, 1 $3 EX2 addiu $2, ~1 $3 EX2 addiu $2, ~ 1 $3 EX2 addiu $2, 1+1 $3 EX2 addiu $2, 1+ 1 $3 EX2 addiu $2, 1 +1 $3 EX2 addiu $2, 1 + 1 $3 EX2 addiu $2, 1+~1 $3 EX2 addiu $2, 1+~ 1 $3 EX2 addiu $2, 1+ ~1 $3 EX2 addiu $2, 1 +~1 $3 EX2 addiu $2, 1 +~ 1 $3 EX2 addiu $2, 1 + ~1 $3 EX2 addiu $2, 1 + ~ 1 $3 # Each of the next four produce variations of '1+(1)$3' as a single argument. EX2 addiu $2, 1+(1) $3 EX2 addiu $2, 1 +(1) $3 EX2 addiu $2, 1+ (1) $3 EX2 addiu $2, 1 + (1) $3 EX2 addiu $2, 1+(1)+1 $3 EX2 addiu $2, 1 +(1)+1 $3 EX2 addiu $2, 1+ (1)+1 $3 EX2 addiu $2, 1 + (1)+1 $3 nop EX2 addiu $2, sym $3 EX2 addiu $2, -sym $3 EX2 addiu $2, - sym $3 EX2 addiu $2, 1+sym $3 EX2 addiu $2, 1+ sym $3 EX2 addiu $2, 1 +sym $3 EX2 addiu $2, 1 + sym $3 EX2 addiu $2, 1+~sym $3 EX2 addiu $2, 1+~ sym $3 EX2 addiu $2, 1+ ~sym $3 EX2 addiu $2, 1 +~sym $3 EX2 addiu $2, 1 +~ sym $3 EX2 addiu $2, 1 + ~sym $3 EX2 addiu $2, 1 + ~ sym $3 # Each of the next four produce variations of '1+(sym)$3' as a single argument. EX2 addiu $2, 1+(sym) $3 EX2 addiu $2, 1 +(sym) $3 EX2 addiu $2, 1+ (sym) $3 EX2 addiu $2, 1 + (sym) $3 EX2 addiu $2, 1+(1)+sym $3 EX2 addiu $2, 1 +(1)+sym $3 EX2 addiu $2, 1+ (1)+sym $3 EX2 addiu $2, 1 + (1)+sym $3 Removing the commas also produces some surprising arguments such as '$2~1'. |
Responded to reviewers comments. In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires this in order to pass all test cases. The patch ca be found here: http://reviews.llvm.org/D15949
In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires this in order to pass all test cases. The patch ca be found here: http://reviews.llvm.org/D15949
Hmm. The code changed by that patch already breaks the encapsulation of the MCExpr hierarchy and needs to be moved inside MipsMCExpr. I tried doing this a few weeks ago and it's a lot harder than it looks because there's two mutually exclusive ways to handle things like %hi and we seem to have chosen both. D15949 makes the encapsulation worse by duplicating a functionality from MCExpr so I'd rather not commit that and instead work on putting the current contents of that function inside the MCExpr hierarchy.
Dropping the tests involving 'sym' doesn't fundamentally change the testing w.r.t whitespace separated arguments so I think the lesser evil is to remove the sym tests from this patch. On balance, I think we should go with this patch by itself and drop the tests marked below.
LGTM with the two removals indicated below.
lib/Target/Mips/AsmParser/MipsAsmParser.cpp | ||
---|---|---|
3906–3920 | Given that we're dropping the symbol test cases for now, we should drop this section too and leave it for a later patch | |
test/MC/Mips/user-macro-argument-separation.s | ||
41–70 | As explained above, I'm not keen on this suggestion but it's the lesser of two evils. Could you remove these tests so that we don't need D15949? |
Closing bug PR24319 https://bugs.llvm.org/show_bug.cgi?id=24319 because your fix had fixed this bug as well.
I think these two arguments do the wrong thing here:
*NextChar will be neither a space or a digit so the loop will terminate at the 'AddTokens == 0 && SpaceEaten' check having only consumed '4 +'.
I think that dropping the AddTokens variable in favour of adding the tokens and continuing should fix this: