This is an archive of the discontinued LLVM Phabricator instance.

[MC] Fixed parsing of macro arguments where expressions with spaces are present.
Closed (Public)

Authored by s.egerton on Oct 9 2015, 6:31 AM.


Details

Reviewers

dsanders
vkalintiris

Commits

rGa1fa68ac9c0d: [MC] Fixed parsing of macro arguments where expressions with spaces are present.
rL260521: [MC] Fixed parsing of macro arguments where expressions with spaces are present.

Summary

Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which has a space between '272' and '+'. The parser would then parse '272' and '+8' as two arguments instead of a single expression resulting in one too many arguments in the pseudo instruction.
The reason that the test case has been changed is so that the expected
output matches the output of the GNU assembler.

Diff Detail

Event Timeline

s.egerton updated this revision to Diff 36944. Oct 9 2015, 6:31 AM

s.egerton retitled this revision to "[MC] Fixed parsing of macro arguments where expressions with spaces are present."

s.egerton updated this object.

s.egerton added reviewers: dsanders, vkalintiris.

s.egerton added a subscriber: llvm-commits.

Tests? Also, the macro-gas.s test case change should be in a separate patch.

This revision now requires changes to proceed. Oct 9 2015, 2:27 PM

Fixed an issue for mips where an instruction such as:
sdc1 $f1, SC_FPREGS+8(a0) would return an error as SC_FPREGS and +8 are
interpreted as two arguments instead of an expression.

I believe you've missed a key part of the problem description here. I don't have the original test to hand so I might not have this exactly right, but it was about the expansion of a macro where the arguments to the macro were 'sdc1 $f1, SC_FPREGS+8(a0)'. After pre-processing, it would end up as something like 'macroname sdc1 $f1, 32 +8(a0)', which was parsed as a macro named 'macroname' followed by four arguments ('sdc1', '$f1', '32', '+8(a0)'). The macro was expecting three arguments.

Could you update the description?

In D13592#263958, @vkalintiris wrote:

Tests? Also, the macro-gas.s test case change should be in a separate patch.

+1 to needing a minimal version of the original test case in the patch.

The whitespace change in macro-gas.s is necessary though. The bug was about how the parser handles expressions-with-spaces and whitespace separated arguments. The updated test checks for the same string as GAS emits.

I have added test cases. I also had to update the patch to ensure all of the tests passed.

Herald added a subscriber: dsanders. Nov 25 2015, 6:25 AM

s.egerton updated this object. Nov 26 2015, 3:08 AM

s.egerton edited edge metadata.

s.egerton updated this object. Dec 7 2015, 7:29 AM

Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which
has a space between '272' and '+'. The parser would then parse '272' and '+8' as
two arguments instead of a single expression resulting in one too many arguments
in the pseudo instruction.

This isn't quite right since 'sdc1 $f1 272 + 8(a0)' would parse successfully by itself. This change is about parsing the arguments to a user-defined macro and dividing the input tokens into tokens for each of the three arguments of the macro (insn, reg, and src). At the moment, we find three argument separators (because we don't handle whitespace separated arguments correctly) leaving us with four arguments to the EX user-defined macro. This macro only accepts three arguments so we emit an error.

lib/MC/MCParser/AsmParser.cpp
2105–2106	I think these two arguments do the wrong thing here:
	4 +sym
	4 +(1)
*NextChar will be neither a space nor a digit so the loop will terminate at the 'AddTokens == 0 && SpaceEaten' check having only consumed '4 +'. I think that dropping the AddTokens variable in favour of adding the tokens and continuing should fix this:
	if (isOperator(Lexer.getKind())) {
	  MA.push_back(getTok());
	  Lex();
	  // Whitespace after an operator can be ignored.
	  if (Lexer.is(AsmToken::Space))
	    Lex();
	  continue;
	}
test/MC/Mips/macro-sdc1.s
1 ↗	(On Diff #41139)	(about filename): By our naming conventions, this would be a test for a macro called 'sdc1' but the test is really about argument parsing and the times it's correct to treat whitespace as an argument separator. Something like user-macro-argument-separation.s would be clearer.
9 ↗	(On Diff #41139)	Indentation. Likewise for the comments below
12–22 ↗	(On Diff #41139)	I think we need a few extra cases here. At the moment we don't test prefix operators, symbols, expressions in parentheses, and combinations of them. I've tried a few of these on GAS and there were a couple surprising behaviours. Here's the test case I ran through GAS:
	.extern sym
	# imm and rs are deliberately swapped to test whitespace separated arguments.
	.macro EX2 insn, rd, imm, rs
	.ex\@: \insn \rd, \rs, \imm
	.endm
	EX2 addiu $2, 1 $3
	EX2 addiu $2, ~1 $3
	EX2 addiu $2, ~ 1 $3
	EX2 addiu $2, 1+1 $3
	EX2 addiu $2, 1+ 1 $3
	EX2 addiu $2, 1 +1 $3
	EX2 addiu $2, 1 + 1 $3
	EX2 addiu $2, 1+~1 $3
	EX2 addiu $2, 1+~ 1 $3
	EX2 addiu $2, 1+ ~1 $3
	EX2 addiu $2, 1 +~1 $3
	EX2 addiu $2, 1 +~ 1 $3
	EX2 addiu $2, 1 + ~1 $3
	EX2 addiu $2, 1 + ~ 1 $3
	# Each of the next four produce variations of '1+(1)$3' as a single argument.
	EX2 addiu $2, 1+(1) $3
	EX2 addiu $2, 1 +(1) $3
	EX2 addiu $2, 1+ (1) $3
	EX2 addiu $2, 1 + (1) $3
	EX2 addiu $2, 1+(1)+1 $3
	EX2 addiu $2, 1 +(1)+1 $3
	EX2 addiu $2, 1+ (1)+1 $3
	EX2 addiu $2, 1 + (1)+1 $3
	nop
	EX2 addiu $2, sym $3
	EX2 addiu $2, -sym $3
	EX2 addiu $2, - sym $3
	EX2 addiu $2, 1+sym $3
	EX2 addiu $2, 1+ sym $3
	EX2 addiu $2, 1 +sym $3
	EX2 addiu $2, 1 + sym $3
	EX2 addiu $2, 1+~sym $3
	EX2 addiu $2, 1+~ sym $3
	EX2 addiu $2, 1+ ~sym $3
	EX2 addiu $2, 1 +~sym $3
	EX2 addiu $2, 1 +~ sym $3
	EX2 addiu $2, 1 + ~sym $3
	EX2 addiu $2, 1 + ~ sym $3
	# Each of the next four produce variations of '1+(sym)$3' as a single argument.
	EX2 addiu $2, 1+(sym) $3
	EX2 addiu $2, 1 +(sym) $3
	EX2 addiu $2, 1+ (sym) $3
	EX2 addiu $2, 1 + (sym) $3
	EX2 addiu $2, 1+(1)+sym $3
	EX2 addiu $2, 1 +(1)+sym $3
	EX2 addiu $2, 1+ (1)+sym $3
	EX2 addiu $2, 1 + (1)+sym $3
Removing the commas also produces some surprising arguments such as '$2~1'.
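The separation rule discussed in these comments can be illustrated with a small standalone sketch. It is not the LLVM code: the tokenizer, the `Kind` enum, the operator set in `isOperatorChar`, and the function `splitMacroArgs` are all hypothetical simplifications introduced for illustration. The assumed rule is that, outside parentheses, a comma always ends an argument, and whitespace ends an argument unless an operator sits directly before or after it; inside parentheses, whitespace is kept as part of the argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified token kinds; the real AsmToken enum has many more.
enum class Kind { Space, Comma, LParen, RParen, Operator, Other };

struct Token {
  Kind kind;
  std::string text;
};

// Hypothetical single-character operator set (an assumption of this sketch).
bool isOperatorChar(char c) {
  return std::string("+-*/~|&^!%<>").find(c) != std::string::npos;
}

bool isDelimiter(char c) {
  return c == ' ' || c == '\t' || c == ',' || c == '(' || c == ')' ||
         isOperatorChar(c);
}

// Splits the text into tokens; a run of blanks becomes one Space token.
std::vector<Token> tokenize(const std::string &s) {
  std::vector<Token> toks;
  std::size_t i = 0;
  while (i < s.size()) {
    char c = s[i];
    std::size_t j = i + 1;
    if (c == ' ' || c == '\t') {
      while (j < s.size() && (s[j] == ' ' || s[j] == '\t')) ++j;
      toks.push_back({Kind::Space, " "});
    } else if (c == ',') {
      toks.push_back({Kind::Comma, ","});
    } else if (c == '(') {
      toks.push_back({Kind::LParen, "("});
    } else if (c == ')') {
      toks.push_back({Kind::RParen, ")"});
    } else if (isOperatorChar(c)) {
      toks.push_back({Kind::Operator, std::string(1, c)});
    } else {
      while (j < s.size() && !isDelimiter(s[j])) ++j;
      toks.push_back({Kind::Other, s.substr(i, j - i)});
    }
    i = j;
  }
  return toks;
}

// Divides the text following a macro name into arguments.
std::vector<std::string> splitMacroArgs(const std::string &line) {
  std::vector<Token> toks = tokenize(line);
  std::vector<std::string> args;
  std::size_t pos = 0;
  auto skipSpace = [&] {
    while (pos < toks.size() && toks[pos].kind == Kind::Space) ++pos;
  };
  skipSpace();
  while (pos < toks.size()) {
    std::string arg;
    unsigned parenLevel = 0;
    while (pos < toks.size()) {
      bool spaceEaten = false;
      if (parenLevel == 0) {
        if (toks[pos].kind == Kind::Comma) break;  // explicit separator
        if (toks[pos].kind == Kind::Space) {
          spaceEaten = true;
          if (++pos == toks.size()) break;  // trailing blank
        }
        if (toks[pos].kind == Kind::Operator) {
          // An operator continues the current argument; a blank after it
          // is dropped as well.
          arg += toks[pos++].text;
          if (pos < toks.size() && toks[pos].kind == Kind::Space) ++pos;
          continue;
        }
        if (spaceEaten) break;  // blank not next to an operator: separator
      }
      if (toks[pos].kind == Kind::LParen) ++parenLevel;
      else if (toks[pos].kind == Kind::RParen && parenLevel) --parenLevel;
      arg += toks[pos++].text;
    }
    args.push_back(arg);
    if (pos < toks.size() && toks[pos].kind == Kind::Comma) ++pos;
    skipSpace();
  }
  return args;
}
```

Under these assumptions, `splitMacroArgs("sdc1 $f1, 272 +8(a0)")` gives the three arguments `sdc1`, `$f1`, and `272+8(a0)`, while `splitMacroArgs("addiu $2 ~1 $3")` gives `addiu`, `$2~1`, and `$3`, which mirrors the kind of surprising argument mentioned for the comma-free case. The sketch says nothing about how GAS itself treats these inputs.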

This revision now requires changes to proceed. Dec 8 2015, 3:07 AM

Responded to reviewers' comments. In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires it in order to pass all test cases. The patch can be found here: http://reviews.llvm.org/D15949

s.egerton added a parent revision: D15949: [mips] Changed the way expressions in instructions are printed to support different kinds of expressions. Jan 7 2016, 7:49 AM

ping

In order to pass the newly added test cases, I had to produce another patch to fix issues whilst printing expressions. This patch requires this in order to pass all test cases. The patch can be found here: http://reviews.llvm.org/D15949

Hmm. The code changed by that patch already breaks the encapsulation of the MCExpr hierarchy and needs to be moved inside MipsMCExpr. I tried doing this a few weeks ago and it's a lot harder than it looks because there are two mutually exclusive ways to handle things like %hi and we seem to have chosen both. D15949 makes the encapsulation worse by duplicating functionality from MCExpr, so I'd rather not commit that and instead work on putting the current contents of that function inside the MCExpr hierarchy.

Dropping the tests involving 'sym' doesn't fundamentally change the testing w.r.t whitespace separated arguments so I think the lesser evil is to remove the sym tests from this patch. On balance, I think we should go with this patch by itself and drop the tests marked below.

LGTM with the two removals indicated below.

lib/Target/Mips/AsmParser/MipsAsmParser.cpp
3906–3920	Given that we're dropping the symbol test cases for now, we should drop this section too and leave it for a later patch
test/MC/Mips/user-macro-argument-separation.s
41–70	As explained above, I'm not keen on this suggestion but it's the lesser of two evils. Could you remove these tests so that we don't need D15949?

dsanders mentioned this in D15949: [mips] Changed the way expressions in instructions are printed to support different kinds of expressions. Feb 5 2016, 2:09 AM

Closed by commit rL260521: [MC] Fixed parsing of macro arguments where expressions with spaces are present. (authored by s.egerton). Feb 11 2016, 5:53 AM

This revision was automatically updated to reflect the committed changes.

s.egerton marked 2 inline comments as done.

Closing bug PR24319 https://bugs.llvm.org/show_bug.cgi?id=24319 because your fix also fixed this bug.

Revision Contents

Path	Size
lib/MC/MCParser/AsmParser.cpp	36 lines
lib/Target/Mips/AsmParser/MipsAsmParser.cpp	15 lines
test/MC/AsmParser/macros-gas.s	4 lines
test/MC/Mips/user-macro-argument-separation.s	70 lines

Diff 44213

lib/MC/MCParser/AsmParser.cpp

static bool isOperator(AsmToken::TokenKind kind) {
[...]
case AsmToken::EqualEqual:		case AsmToken::EqualEqual:
case AsmToken::Pipe:		case AsmToken::Pipe:
case AsmToken::PipePipe:		case AsmToken::PipePipe:
case AsmToken::Caret:		case AsmToken::Caret:
case AsmToken::Amp:		case AsmToken::Amp:
case AsmToken::AmpAmp:		case AsmToken::AmpAmp:
case AsmToken::Exclaim:		case AsmToken::Exclaim:
case AsmToken::ExclaimEqual:		case AsmToken::ExclaimEqual:
case AsmToken::Percent:
case AsmToken::Less:		case AsmToken::Less:
case AsmToken::LessEqual:		case AsmToken::LessEqual:
case AsmToken::LessLess:		case AsmToken::LessLess:
case AsmToken::LessGreater:		case AsmToken::LessGreater:
case AsmToken::Greater:		case AsmToken::Greater:
case AsmToken::GreaterEqual:		case AsmToken::GreaterEqual:
case AsmToken::GreaterGreater:		case AsmToken::GreaterGreater:
return true;		return true;
[...]
if (Vararg) {
if (Lexer.isNot(AsmToken::EndOfStatement)) {		if (Lexer.isNot(AsmToken::EndOfStatement)) {
StringRef Str = parseStringToEndOfStatement();		StringRef Str = parseStringToEndOfStatement();
MA.emplace_back(AsmToken::String, Str);		MA.emplace_back(AsmToken::String, Str);
}		}
return false;		return false;
}		}

unsigned ParenLevel = 0;		unsigned ParenLevel = 0;
unsigned AddTokens = 0;

// Darwin doesn't use spaces to delmit arguments.		// Darwin doesn't use spaces to delmit arguments.
AsmLexerSkipSpaceRAII ScopedSkipSpace(Lexer, IsDarwin);		AsmLexerSkipSpaceRAII ScopedSkipSpace(Lexer, IsDarwin);

		bool SpaceEaten;

for (;;) {		for (;;) {
		SpaceEaten = false;
if (Lexer.is(AsmToken::Eof) || Lexer.is(AsmToken::Equal))		if (Lexer.is(AsmToken::Eof) || Lexer.is(AsmToken::Equal))
return TokError("unexpected token in macro instantiation");		return TokError("unexpected token in macro instantiation");

if (ParenLevel == 0 && Lexer.is(AsmToken::Comma))		if (ParenLevel == 0) {

		if (Lexer.is(AsmToken::Comma))
break;		break;

if (Lexer.is(AsmToken::Space)) {		if (Lexer.is(AsmToken::Space)) {
		SpaceEaten = true;
Lex(); // Eat spaces		Lex(); // Eat spaces
		}

// Spaces can delimit parameters, but could also be part an expression.		// Spaces can delimit parameters, but could also be part an expression.
// If the token after a space is an operator, add the token and the next		// If the token after a space is an operator, add the token and the next
// one into this argument		// one into this argument
if (!IsDarwin) {		if (!IsDarwin) {
if (isOperator(Lexer.getKind())) {		if (isOperator(Lexer.getKind())) {
// Check to see whether the token is used as an operator,		MA.push_back(getTok());
// or part of an identifier		Lex();
const char *NextChar = getTok().getEndLoc().getPointer();
if (*NextChar == ' ')
AddTokens = 2;
}

if (!AddTokens && ParenLevel == 0) {		// Whitespace after an operator can be ignored.
break;		if (Lexer.is(AsmToken::Space))
		Lex();

		continue;
}		}
}		}
		if (SpaceEaten)
		break;
}		}

// handleMacroEntry relies on not advancing the lexer here		// handleMacroEntry relies on not advancing the lexer here
// to be able to fill in the remaining default parameter values		// to be able to fill in the remaining default parameter values
if (Lexer.is(AsmToken::EndOfStatement))		if (Lexer.is(AsmToken::EndOfStatement))
break;		break;

// Adjust the current parentheses level.		// Adjust the current parentheses level.
if (Lexer.is(AsmToken::LParen))		if (Lexer.is(AsmToken::LParen))
++ParenLevel;		++ParenLevel;
else if (Lexer.is(AsmToken::RParen) && ParenLevel)		else if (Lexer.is(AsmToken::RParen) && ParenLevel)
--ParenLevel;		--ParenLevel;

// Append the token to the current argument list.		// Append the token to the current argument list.
MA.push_back(getTok());		MA.push_back(getTok());
if (AddTokens)
AddTokens--;
Lex();		Lex();
}		}

if (ParenLevel != 0)		if (ParenLevel != 0)
return TokError("unbalanced parentheses in macro argument");		return TokError("unbalanced parentheses in macro argument");
return false;		return false;
}		}


lib/Target/Mips/AsmParser/MipsAsmParser.cpp

[...]
if (ResTy == MatchOperand_ParseFail)
return true;		return true;

DEBUG(dbgs() << ".. Generic Parser\n");		DEBUG(dbgs() << ".. Generic Parser\n");

switch (getLexer().getKind()) {		switch (getLexer().getKind()) {
default:		default:
Error(Parser.getTok().getLoc(), "unexpected token in operand");		Error(Parser.getTok().getLoc(), "unexpected token in operand");
return true;		return true;
		case AsmToken::Identifier: {
		StringRef Identifier;
		if (Parser.parseIdentifier(Identifier))
		return true;

		SMLoc S = Parser.getTok().getLoc();
		SMLoc E = SMLoc::getFromPointer(S.getPointer());
		MCSymbol *Sym = getContext().getOrCreateSymbol(Identifier);
		// Otherwise create a symbol reference.
		const MCExpr *Res =
		MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, getContext());

		Operands.push_back(MipsOperand::CreateImm(Res, S, E, *this));
		return false;
		}
case AsmToken::Dollar: {		case AsmToken::Dollar: {
// Parse the register.		// Parse the register.
SMLoc S = Parser.getTok().getLoc();		SMLoc S = Parser.getTok().getLoc();

// Almost all registers have been parsed by custom parsers. There is only		// Almost all registers have been parsed by custom parsers. There is only
// one exception to this. $zero (and it's alias $0) will reach this point		// one exception to this. $zero (and it's alias $0) will reach this point
// for div, divu, and similar instructions because it is not an operand		// for div, divu, and similar instructions because it is not an operand
// to the instruction definition but an explicit register. Special case		// to the instruction definition but an explicit register. Special case

test/MC/AsmParser/macros-gas.s


	// CHECK: .ascii "1 2 3 \003"			// CHECK: .ascii "1 2 3 \003"
	test3 1, 2 3			test3 1, 2 3

	.macro test3_prime _a _b _c			.macro test3_prime _a _b _c
	.ascii "\_a \_b \_c"			.ascii "\_a \_b \_c"
	.endm			.endm

	// CHECK: .ascii "1 (23) "			// CHECK: .ascii "1 (2 3) "
	test3_prime 1, (2 3)			test3_prime 1, (2 3)

	// CHECK: .ascii "1 (23) "			// CHECK: .ascii "1 (2 3) "
	test3_prime 1 (2 3)			test3_prime 1 (2 3)

	// CHECK: .ascii "1 2 "			// CHECK: .ascii "1 2 "
	test3_prime 1 2			test3_prime 1 2

	.macro test5 _a			.macro test5 _a
	.globl \_a			.globl \_a
	.endm			.endm

test/MC/Mips/user-macro-argument-separation.s

This file was added.

				# RUN: llvm-mc %s -triple=mipsel-unknown-linux -show-encoding -mcpu=mips32r2 | \
				# RUN: FileCheck %s
				# RUN: llvm-mc %s -triple=mipsel-unknown-linux -show-encoding -mcpu=mips32r2 | \
				# RUN: FileCheck %s
				# RUN: llvm-mc %s -triple=mips-unknown-linux -show-encoding -mcpu=mips32r2 | \
				# RUN: FileCheck %s

				# Check that the IAS expands macro instructions in the same way as GAS

				.extern sym
				# imm and rs are deliberately swapped to test whitespace separated arguments.
				.macro EX2 insn, rd, imm, rs
				.ex\@: \insn \rd, \rs, \imm
				.endm

				.option pic0

				EX2 addiu $2, 1 $3 # CHECK: addiu $2, $3, 1
				EX2 addiu $2, ~1 $3 # CHECK: addiu $2, $3, -2
				EX2 addiu $2, ~ 1 $3 # CHECK: addiu $2, $3, -2
				EX2 addiu $2, 1+1 $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1+ 1 $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1 +1 $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1 + 1 $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1+~1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1+~ 1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1+ ~1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1 +~1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1 +~ 1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1 + ~1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1 + ~ 1 $3 # CHECK: addiu $2, $3, -1
				EX2 addiu $2, 1+(1) $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1 +(1) $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1+ (1) $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1 + (1) $3 # CHECK: addiu $2, $3, 2
				EX2 addiu $2, 1+(1)+1 $3 # CHECK: addiu $2, $3, 3
				EX2 addiu $2, 1 +(1)+1 $3 # CHECK: addiu $2, $3, 3
				EX2 addiu $2, 1+ (1)+1 $3 # CHECK: addiu $2, $3, 3
				EX2 addiu $2, 1 + (1)+1 $3 # CHECK: addiu $2, $3, 3
				nop # CHECK: nop
				EX2 addiu $2, sym $3 # CHECK: addiu $2, $3, sym
				EX2 addiu $2, -sym $3 # CHECK: addiu $2, $3, -sym
				EX2 addiu $2, - sym $3 # CHECK: addiu $2, $3, -sym
				EX2 addiu $2, 1+sym $3 # CHECK: addiu $2, $3, 1+sym
				EX2 addiu $2, 1+ sym $3 # CHECK: addiu $2, $3, 1+sym
				EX2 addiu $2, 1 +sym $3 # CHECK: addiu $2, $3, 1+sym
				EX2 addiu $2, 1 + sym $3 # CHECK: addiu $2, $3, 1+sym
				EX2 addiu $2, 1+~sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1+~ sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1+ ~sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1 +~sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1 +~ sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1 + ~sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1 + ~ sym $3 # CHECK: addiu $2, $3, 1+~sym
				EX2 addiu $2, 1+(sym) $3 # CHECK: addiu $2, $3, 1+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1 +(sym) $3 # CHECK: addiu $2, $3, 1+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1+ (sym) $3 # CHECK: addiu $2, $3, 1+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1 + (sym) $3 # CHECK: addiu $2, $3, 1+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1+(1)+sym $3 # CHECK: addiu $2, $3, 2+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1 +(1)+sym $3 # CHECK: addiu $2, $3, 2+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1+ (1)+sym $3 # CHECK: addiu $2, $3, 2+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32
				EX2 addiu $2, 1 + (1)+sym $3 # CHECK: addiu $2, $3, 2+sym
				# CHECK: fixup A - offset: 0, value: sym, kind: fixup_Mips_32