This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use new assembler diags for ARM
ClosedPublic

Authored by olista01 on Mar 31 2017, 4:06 AM.

Details

Summary

This converts the ARM AsmParser to use the new assembly matcher error
reporting mechanism, which allows errors to be reported for multiple
instruction encodings when it is ambiguous which one the user intended
to use.

By itself this doesn't improve many error messages, because we don't have diagnostic text for most operand types, but as we add that then this will allow more of those diagnostic strings to be used when they are relevant.

Diff Detail

Repository
rL LLVM

Event Timeline

olista01 created this revision.Mar 31 2017, 4:06 AM
rengolin edited edge metadata.Mar 31 2017, 4:20 AM

Hi Oliver,

This is an interesting development and I agree with many of your comments in the code, especially the one that says this is not ARM specific at all.

I imagine a generic engine which calls a target specific for the messages.

Also, can you give an example of a more general error that should silence the more specific ones?

cheers,
--renato

I imagine a generic engine which calls a target specific for the messages.

Do you think it's ok to develop this as ARM-specific code first then refactor it once we have stabilised it, or would it be better to start with the target-independent core?

Also, can you give an example of a more general error that should silence the more specific ones?

There are plenty of examples of this in Thumb, where we have 16-bit and 32-bit versions of an instruction, where the 32-bit version accepts a wider range of immediates or registers. Note that this case won't be hit often at all with just this patch, because we have diagnostic messages for very few operand classes (just those listed in getOperandMatchFailDiag). Expanding that is on my TODO list.

sanwou01 edited reviewers, added: sanwou01; removed: sanne.wouda.Mar 31 2017, 5:57 AM

Do you think it's ok to develop this as ARM-specific code first then refactor it once we have stabilised it, or would it be better to start with the target-independent core?

I'm not sure. Right now, the implementation is very much ARM dependent, which could make matters worse when trying to other targets. Maybe splitting now, that the functionality is small and the relationships are obvious, would be a much easier way to evolve later. Or maybe it'll just complicate more things now and make it harder to predict.

I have no strong opinion in this particular case. I have gone with both approaches in the past, and have had mixed results with both. :)

There are plenty of examples of this in Thumb, where we have 16-bit and 32-bit versions of an instruction, where the 32-bit version accepts a wider range of immediates or registers. Note that this case won't be hit often at all with just this patch, because we have diagnostic messages for very few operand classes (just those listed in getOperandMatchFailDiag). Expanding that is on my TODO list.

Interesting, that's a very fine grained level of warnings. I assume this wouldn't be enabled for the normal lowering, just assembly files/snippets, right? Or is this a tool to help us predict better codegen?

Regardless, they're both interesting tools that we could control independently (but use the same machinery).

cheers,
--renato

I have no strong opinion in this particular case. I have gone with both approaches in the past, and have had mixed results with both. :)

Ok, I'll stick with the current approach for now, but keep in mind the fact that this will need to be refactored at some point in the future.

I assume this wouldn't be enabled for the normal lowering, just assembly files/snippets, right?

Correct, this is only used when parsing textual assembly (both files and inline asm), not during code-generation from C/C++. Most of my motivating examples come from hand-written Thumb code, where there are a lot of different immediate ranges, and the assembler often just gives a "requires arm-mode" error, which is useless on a Cortex-M target that doesn't have ARM mode.

Most of my motivating examples come from hand-written Thumb code, where there are a lot of different immediate ranges, and the assembler often just gives a "requires arm-mode" error, which is useless on a Cortex-M target that doesn't have ARM mode.

Right, that indeed needs a better solution! :)

olista01 updated this revision to Diff 93641.Mar 31 2017, 7:30 AM
olista01 retitled this revision from [ARM] Use new assembler diags for ARM (WIP) to [ARM] Use new assembler diags for ARM.

Updated tests, the regression suite is now fully passing.

A few of the tests appear to show less-detailed error messages than before, printing "invalid instruction" rather than "invalid operand for instruction". However, these are cases where changing the operand in question would not make the instruction valid, so I think this is OK. There may be value in emitting a message when multiple operands are invalid, but that can come later.

olista01 edited the summary of this revision. (Show Details)Apr 3 2017, 8:02 AM

Hi Oliver,

I'm still getting my bearings. The inline comments are more as a way to understand than to criticise your patch.

I think this is, in the long run, a good way forward. But I don't want to muddle the field before we get there.

So, before I look at the patch per se, we should discuss the error messages, what do they mean and what we can do.

Then later we can agree on multiple steps to get there, even if not all problems are fixed on the first commit (I'm sure they won't).

cheers,
--renato

test/MC/ARM/arm-branch-errors.s
14 ↗(On Diff #93877)

This is a bit hard to understand... I'm not sure what you mean by "for one encoding".

Of course, the best message here would be "address needs to be 4-byte aligned on ARM", I'm not sure how we could get that with the current mechanism.

But it's also not clear (to me at least), how is this better than what we had.

test/MC/ARM/diagnostics.s
183 ↗(On Diff #93877)

Why did this one get worse?

test/MC/ARM/fullfp16-neg.s
5 ↗(On Diff #93877)

requires what?

I know it was there already, but it's confusing. If we're trying to show off the new (better) error messages, we should display them in their full glory. :)

test/MC/ARM/invalid-fp-armv8.s
38 ↗(On Diff #93877)

This is technically "better" because you have chosen -mattr=-neon, but still doesn't say why the instruction is invalid.

Some error messages have the "requires ARMv8", but others don't. This one should know that it requires NEON, maybe a way to get the right info?

test/MC/ARM/lsl-zero-errors.s
49 ↗(On Diff #93877)

It says multiple near misses, but only shows one... Not very helpful.

olista01 added inline comments.Apr 4 2017, 3:33 AM
test/MC/ARM/arm-branch-errors.s
14 ↗(On Diff #93877)

To make it clearer, here is the full error message for this instruction, with all of the context lines:

<stdin>:1:3: error: invalid instruction, multiple near-miss encodings found
  b #2
  ^
<stdin>:1:3: note: for one encoding: instruction requires: thumb
  b #2
  ^
<stdin>:1:5: note: for one encoding: invalid operand for instruction
  b #2
    ^

The two "note: for one encoding:" lines correspond to rows in the table-generated matcher table (which roughly corresponds to encodings in the ARMARM) which nearly match this instruction.

Two of these rows are Thumb branch instructions, for which an immediate of #2 is valid. I'm squashing these down into the one "instruction requires: thumb" message, as the second one wouldn't add anything.

Another of the rows is the ARM branch instruction, which doesn't match because the immediate must be a multiple of 4 (or a label, with alignment checked later). We don't currently have a diagnostic string for this operand class, so we emit the generic "invalid operand for instruction" message. Once this has been committed we can add a diagnostic string like "address needs to be 4-byte aligned" for the operand class.

test/MC/ARM/diagnostics.s
183 ↗(On Diff #93877)

This one was previously picking up the diagnostic for MCK_Imm0_255, which is only valid for Thumb targets (this test is using ARM mode). If the immediate was reduced, this would actually match the MCK_ModImm class. However, that doesn't have a diagnostic associated with it yet. Once it does, we'll get a more useful diagnostic than before.

test/MC/ARM/fullfp16-neg.s
5 ↗(On Diff #93877)

The full message is "instruction requires: full half-float", I'll update the test.

Many of these instructions also get an "invalid operand for instruction" error pointing to the ".f16" token, because they would be valid if they had ".f32" instead. I haven't updated the test to show this as I expect it to change once I add diagnostics for token operands.

test/MC/ARM/invalid-fp-armv8.s
38 ↗(On Diff #93877)

A lot of the instructions in this file (including this one) are invalid both because NEON is disabled, and because they have incorrect operands (so would not be correct even if NEON were enabled). We don't emit a more specific error here because neither change alone would make the instruction valid.

test/MC/ARM/lsl-zero-errors.s
49 ↗(On Diff #93877)

A lot of the instructions in this test now get multiple near-misses (mostly different operands invalid, valid in other ISA), I'll update the test.

olista01 updated this revision to Diff 94059.Apr 4 2017, 5:48 AM
rengolin accepted this revision.Apr 18 2017, 4:28 AM

I guess this will be a long road anyway, so we should cope with some vague messages now, and add better ones with time.

Thanks for doing the effort, this is an area that is really bad at error messages. :)

LGTM.

This revision is now accepted and ready to land.Apr 18 2017, 4:28 AM

Thanks, could you also review D27620, which this depends on?

I guess this will be a long road anyway, so we should cope with some vague messages now, and add better ones with time.

Yep, my plan was to get this change in without regressing anything too badly, then add diagnostics for each operand and register class. I've got a few patches that build on this up in phab, but I'm currently hitting a problem caused by the shouldOmitCCOutOperand function in the ARMAsmParser causing many instructions to fail to match all of their operands, not just the visible one. Once I've found a good solution to that, adding diags will hopefully only require adding the diagnostic string in the tablegen description of the operand.

olista01 updated this revision to Diff 110986.Aug 14 2017, 8:56 AM

After discussing this with someone in our documentation team, I've changed the wording of the messages slightly. They now look like this (with some patches adding diagnostic text for more operands, which I'm almost ready to upstream):

<stdin>:2:3: error: invalid instruction, any one of the following would fix this:
  adds r0, r1, #8
  ^
<stdin>:2:3: note: instruction requires: thumb2
  adds r0, r1, #8
  ^
<stdin>:2:16: note: operand must be an immediate in the range [0,7]
  adds r0, r1, #8
               ^
<stdin>:2:16: note: operand must be a register in range [r0, r7]
  adds r0, r1, #8
               ^

I've changed the initial error test to be more explicit about the fact that the notes relate to the error, and that fixing any one of them will make the instruction valid, rather than having to fix all of them. I've removed the "for one encoding" text from the notes, as this is now redundant, and seemed to be causing confusion.

This is dependent on D27620, which is still in review.

This revision was automatically updated to reflect the committed changes.
echristo edited edge metadata.Oct 9 2017, 1:30 PM

In general a fan of this FWIW:

One question is it intended that ultimately this is going to be turned on by default? From looking at the error messages it seems like it'd be goodness?

Yes, I agree that it would be good to get this turned on for all targets. I'm developing it on ARM only at the moment so that I can make changes to the general mechanism without breaking lots of target's tests. Once it's stabilised, here's what I think needs doing to use it for other targets:

  • Move the ReportNearMisses and FilterNearMisses from the ARM asm parser to somewhere target-independent. There is some ARM-specific code in there at the moment, so this will need some sort of callbacks into the target asm parser.
  • Switch each asm parser to use the new mechanism. This should involve changing the signature of MatchInstruction, implementing any target-specific callbacks needed, and flipping the bit in the tablegen. I think the biggest part of this work will be updating the tests to match the new diagnostic format.

I plan on doing this for AArch64 once I'm done with ARM, but I'm not sure I know enough about the other targets assembly to be able to update the tests myself.

SGTM. Thanks!

llvm/trunk/test/MC/ARM/neon-complex.s