This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add Custom Parser for Atomic Memory Operands
ClosedPublic

Authored by lenary on Jul 24 2019, 6:18 AM.

Details

Summary

GCC Accepts both (reg) and 0(reg) for atomic instruction memory
operands. These instructions do not allow for an offset in their
encoding, so in the latter case, the 0 is silently dropped.

Due to how we have structured the RISCVAsmParser, the easiest way to add
support for parsing this offset is to add a custom AsmOperand and
parser. This parser drops all the parens, and just keeps the register.

This commit also adds a custom printer for these operands, which matches
the GCC canonical printer, printing both (a0) and 0(a0) as (a0).

Event Timeline

lenary created this revision.Jul 24 2019, 6:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 6:18 AM

Yeah this looks like a nice way to do it, better than making an InstAlias for each one. GNU as will accept any expression that evaluates to the constant 0 in place of the 0 (i.e. it's just an imm0, same logic as all the other [su]immX fields, other than discarding it). Could we not therefore reuse part of the generic parser, just with a tweak to drop the immediate if parsed and 0? Something like:

// Attempt to parse token as a register.
if (parseRegister(Operands, true) == MatchOperand_Success)
  return false;

// Attempt to parse token as an immediate
if (parseImmediate(Operands) == MatchOperand_Success) {
  // Discard immediate and check it evaluates to 0
  std::unique_ptr<RISCVOperand> ImmOp = Operands.pop_back_val();
  int64_t Imm;
  RISCVMCExpr::VariantKind VK;
  bool IsConstantImm = evaluateConstantImm(ImmOp->getImm(), Imm, VK);
  if (!IsConstantImm || Imm != 0)
    return Error(ImmOp->getStartLoc(), "immediate must be zero");
  // Parse memory base register if present
  if (getLexer().is(AsmToken::LParen))
    return parseMemOpBaseReg(Operands) != MatchOperand_Success;
  return false;
}

// Finally we have exhausted all options and must declare defeat.
Error(getLoc(), "unknown operand");
return true;

You could avoid the pushing and subsequent popping from the vector if you also split RISCVAsmParser::parseImmediate (and thus also RISCVAsmParser::parseOperandWithModifier) up into everything but the pushing.

Yeah this looks like a nice way to do it, better than making an InstAlias for each one. GNU as will accept any expression that evaluates to the constant 0 in place of the 0 (i.e. it's just an imm0, same logic as all the other [su]immX fields, other than discarding it). Could we not therefore reuse part of the generic parser, just with a tweak to drop the immediate if parsed and 0? Something like:

... snip ...

I am worried about this approach, mostly because of the lines like 1268 in parseMemOpBaseReg, where they're pushing on (/) tokens, which aren't in the instruction operand strings (any more). It does seem like these might complicate the pushing/popping more than necessary. I am also not sure we want to support bare register operands.

That said, I do understand the value of using parseImmediate, to allow for any value that evaluates to zero.

You could avoid the pushing and subsequent popping from the vector if you also split RISCVAsmParser::parseImmediate (and thus also RISCVAsmParser::parseOperandWithModifier) up into everything but the pushing.

This feels intrusive. At the moment I'm happy to refactor to using parseImmediate (but doing all my own register parsing to avoid paren issues), and I'll add a test to make sure it works with more complex expressions.

Yeah this looks like a nice way to do it, better than making an InstAlias for each one. GNU as will accept any expression that evaluates to the constant 0 in place of the 0 (i.e. it's just an imm0, same logic as all the other [su]immX fields, other than discarding it). Could we not therefore reuse part of the generic parser, just with a tweak to drop the immediate if parsed and 0? Something like:

... snip ...

I am worried about this approach, mostly because of the lines like 1268 in parseMemOpBaseReg, where they're pushing on (/) tokens, which aren't in the instruction operand strings (any more). It does seem like these might complicate the pushing/popping more than necessary. I am also not sure we want to support bare register operands.

Is there a reason not to include them in the instruction operand string? It would mean you don't need the custom printer, no?

That said, I do understand the value of using parseImmediate, to allow for any value that evaluates to zero.

You could avoid the pushing and subsequent popping from the vector if you also split RISCVAsmParser::parseImmediate (and thus also RISCVAsmParser::parseOperandWithModifier) up into everything but the pushing.

This feels intrusive. At the moment I'm happy to refactor to using parseImmediate (but doing all my own register parsing to avoid paren issues), and I'll add a test to make sure it works with more complex expressions.

Yes, that's another sensible alternative.

Yeah this looks like a nice way to do it, better than making an InstAlias for each one. GNU as will accept any expression that evaluates to the constant 0 in place of the 0 (i.e. it's just an imm0, same logic as all the other [su]immX fields, other than discarding it). Could we not therefore reuse part of the generic parser, just with a tweak to drop the immediate if parsed and 0? Something like:

... snip ...

I am worried about this approach, mostly because of the lines like 1268 in parseMemOpBaseReg, where they're pushing on (/) tokens, which aren't in the instruction operand strings (any more). It does seem like these might complicate the pushing/popping more than necessary. I am also not sure we want to support bare register operands.

Is there a reason not to include them in the instruction operand string? It would mean you don't need the custom printer, no?

This is actually really complex. If I include the parens in the tablegen instruction definition, then we definitely cannot support 0(a0) because the parser requires a ( where the 0 is. If i don't include them in the instruction definition, then when parseMemBaseOp runs, it adds ( tokens into the Operands, which means later the register operand is not in the expected place in Operands when operand matching happens.

That said, I do understand the value of using parseImmediate, to allow for any value that evaluates to zero.

You could avoid the pushing and subsequent popping from the vector if you also split RISCVAsmParser::parseImmediate (and thus also RISCVAsmParser::parseOperandWithModifier) up into everything but the pushing.

This feels intrusive. At the moment I'm happy to refactor to using parseImmediate (but doing all my own register parsing to avoid paren issues), and I'll add a test to make sure it works with more complex expressions.

Yes, that's another sensible alternative.

Update: I looked into doing this, and it's a lot more complex than I hoped. parseImmediate accepts left parens, and we cannot backtrack through a parse. So, if you write lr.w a0, (a1), parseImmediate tries to parse (a1) as an immediate, which succeeds, and then only fails when you try to check it's a constant. I would therefore have to duplicate my memory operand parsing code (which drops parens, because above) to attempt to parse a register first, and then if that fails, backtrack and attempt to parse an immediate then a register. There are other approaches (doing lookahead, and assuming that (<token>) must refer to a register, so skipping the immediate parsing if the third token is a right paren) but they all feel really hacky and not wonderful.

For the moment I'm going to leave the code approximately* how it is currently, because I do not see the added value in the extra complexity by using parseImmediate, compared to the added assembly constructs (computed offsets) that it supports.

*= I may have a more robust way of checking the immediate is zero, based on your code and other isImm* functions, which I want to polish and add to this patch.

lenary updated this revision to Diff 211719.Jul 25 2019, 5:04 AM
  • Extract isImmZero check, Better Errors

Is there a reason not to include them in the instruction operand string? It would mean you don't need the custom printer, no?

This is actually really complex. If I include the parens in the tablegen instruction definition, then we definitely cannot support 0(a0) because the parser requires a ( where the 0 is. If i don't include them in the instruction definition, then when parseMemBaseOp runs, it adds ( tokens into the Operands, which means later the register operand is not in the expected place in Operands when operand matching happens.

Yeah, I see the problem now, we can't trigger our custom parser early enough.

Yes, that's another sensible alternative.

Update: I looked into doing this, and it's a lot more complex than I hoped. parseImmediate accepts left parens, and we cannot backtrack through a parse. So, if you write lr.w a0, (a1), parseImmediate tries to parse (a1) as an immediate, which succeeds, and then only fails when you try to check it's a constant. I would therefore have to duplicate my memory operand parsing code (which drops parens, because above) to attempt to parse a register first, and then if that fails, backtrack and attempt to parse an immediate then a register. There are other approaches (doing lookahead, and assuming that (<token>) must refer to a register, so skipping the immediate parsing if the third token is a right paren) but they all feel really hacky and not wonderful.

For the moment I'm going to leave the code approximately* how it is currently, because I do not see the added value in the extra complexity by using parseImmediate, compared to the added assembly constructs (computed offsets) that it supports.

*= I may have a more robust way of checking the immediate is zero, based on your code and other isImm* functions, which I want to polish and add to this patch.

I personally don't think it's too bad, and created D65291 based on this to demonstrate it. I will let others decide which they prefer though.

llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1292

GNU as

llvm/lib/Target/RISCV/RISCVInstrInfoA.td
19

GNU as, not GCC

lenary updated this revision to Diff 212366.Jul 30 2019, 9:38 AM

Address review comments

  • GCC -> GNU as
lenary updated this revision to Diff 212381.Jul 30 2019, 10:42 AM
  • Initialise RISCVMCExpr::VariantKind in isImmZero. This matches a patch that has landed in the interim to prevent UB in other uses of evaluateConstantImm
lenary marked 2 inline comments as done.Jul 30 2019, 10:42 AM
lewis-revill added inline comments.Jul 31 2019, 3:08 AM
llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1306

Purely a style nitpick, but IMO this can be renamed as something like OptionalImmOp, then you can check for its existence instead of using the FoundInteger variable.

lenary updated this revision to Diff 212562.Jul 31 2019, 6:17 AM

Address review feedback:

  • Remove FoundInteger in favour of checking OptionalImmOp for non-null pointer.
lenary marked an inline comment as done.Jul 31 2019, 6:17 AM
asb accepted this revision.Aug 1 2019, 5:10 AM

Added a nit around a doc comment, but this is looking good - thanks Sam. I agree that the error messages are nicer in this patch vs D65291 for now. However I'm very open to further improvements and refactoring of AsmParser logic in the future in follow-on patches. Only caveat to that is that the asm parser for any backend isn't particularly pretty, and that sometimes a little bit of repetition can be easier to read than an all-singing-all-dancing function that can do anything based on what function args you pass.

llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1299

"if it is zero"? expects->expect?

This revision is now accepted and ready to land.Aug 1 2019, 5:10 AM
This revision was automatically updated to reflect the committed changes.