Clang generates a '%' prefix for some registers in CFI directives. E.g.
".cfi_register lr, r12" becomes ".cfi_register lr, %r12" after
processing.
Details
- Reviewers
hfinkel nickdesaulniers MaskRay - Group Reviewers
Restricted Project - Commits
- rGb2f663073917: [PowerPC] Allow a '%' prefix for registers in CFI directives
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I was thinking "would it be better to move this logic down into PPCAsmParser::MatchRegisterName? Well, I would suppose that would depend on if the % prefix on registers is valid in other contexts, for instance maybe other places that directly call PPCAsmParser::MatchRegisterName rather than `PPCAsmParser::tryParseRegister which you've modified here.
Right away, the next caller of PPCAsmParser::MatchRegisterName I found is in PPCAsmParser::ParseOperand:
1480 switch (getLexer().getKind()) { 1481 // Special handling for register names. These are interpreted 1482 // as immediates corresponding to the register number. 1483 case AsmToken::Percent: 1484 Parser.Lex(); // Eat the '%'. 1485 unsigned RegNo; 1486 int64_t IntVal; 1487 if (MatchRegisterName(RegNo, IntVal)) 1488 return Error(S, "invalid register name"); ... 1553 switch (getLexer().getKind()) { 1554 case AsmToken::Percent: 1555 Parser.Lex(); // Eat the '%'. 1556 unsigned RegNo; 1557 if (MatchRegisterName(RegNo, IntVal))
So it looks like it's currently idiosyncratic that some callers lex the % and others do not. I highly doubt that's intentional. In that case, would you mind please sinking the lexing of % into PPCAsmParser::MatchRegisterName, then removing the existing % lexing, that way all sites that expect to parse registers conditionally with % prefixed do so in one place? I guess the existing sites expect there to be a % unconditionally; I suspect it should be conditional, at least to match GNU as portability concerns.
When I do that, I get several register parsing failures, including for this testcase. I wanted to restrict this change to the CFI directives (this function seems to be used by those instead of normal functions). X86 has similar code in the same place.
I think what may be happening is that the MatchRegisterName function is looking at the token that's sometimes already lexed, and so isn't always "current" when the function calls.
Oh?
diff --git a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp index 458edf71d6c8..5e4d17ec9a13 100644 --- a/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp +++ b/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp @@ -1201,6 +1201,8 @@ bool PPCAsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode, } bool PPCAsmParser::MatchRegisterName(unsigned &RegNo, int64_t &IntVal) { + if (getParser().getTok().is(AsmToken::Percent)) + getParser().Lex(); // Eat the '%'. if (getParser().getTok().is(AsmToken::Identifier)) { StringRef Name = getParser().getTok().getString(); if (Name.equals_lower("lr")) { @@ -1481,7 +1483,6 @@ bool PPCAsmParser::ParseOperand(OperandVector &Operands) { // Special handling for register names. These are interpreted // as immediates corresponding to the register number. case AsmToken::Percent: - Parser.Lex(); // Eat the '%'. unsigned RegNo; int64_t IntVal; if (MatchRegisterName(RegNo, IntVal)) @@ -1552,7 +1553,6 @@ bool PPCAsmParser::ParseOperand(OperandVector &Operands) { int64_t IntVal; switch (getLexer().getKind()) { case AsmToken::Percent: - Parser.Lex(); // Eat the '%'. unsigned RegNo; if (MatchRegisterName(RegNo, IntVal)) return Error(S, "invalid register name");
passes all tests for me, other than this newly added one; for some reason we get an additional newline before the blr instruction:
llvm-mc -triple powerpc64le-unknown-unknown /android0/llvm-project/llvm/test/MC/PowerPC/cfi-register-directive-parse.s .text .globl __test1 __test1: .cfi_startproc mflr 12 .cfi_register lr, r12 blr .cfi_endproc
(Though maybe that's a bug in MCAsmStreamer for the assembler .cfi directives, or how registers are printed? Maybe DwarfRegNumForCFI is related?)
PPCAsmParser::ParseOperand also has a comment that %rNN is used for ELF but not Macho; maybe this should be conditioned on isDarwin()? Though that comment and the current implementation of PPCAsmParser::ParseOperand don't look like they match to me. Also, the above print out is curious to me how r12 gets printed without r prefix for mflr. LLVM looks a little inconsistent here in how PPC asm register operands are printed.
I'm not sure how much ppc Darwin is much relevant anymore. If anyone cares, we can always make the % token consumption conditional.