This is an archive of the discontinued LLVM Phabricator instance.

[MC layer][AArch64] llvm-mc accepts 4-bit immediate values for "msr pan, #imm"
ClosedPublic

Authored by labrinea on Sep 21 2015, 5:13 AM.

Details

Summary

llvm-mc incorrectly encodes and decodes "msr pan, #imm" instructions. Only 1-bit immediate values should be valid. Currently 4-bit immediates are accepted.

echo "0x9f 0x42 0x00 0xd5" | llvm-mc -disassemble -triple=aarch64 -mattr=+v8.1a
        .text
        msr     PAN, #2
$ echo "msr pan, #2" | llvm-mc -triple=aarch64 -mattr=+v8.1a
        .text
        msr     PAN, #2

Diff Detail

Repository
rL LLVM

Event Timeline

labrinea updated this revision to Diff 35231.Sep 21 2015, 5:13 AM
labrinea retitled this revision from to [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for "msr pan, #imm".
labrinea updated this object.
labrinea added reviewers: grosbach, t.p.northover.

Hi Alexandros,

Is this really an error? The MSR immediate is a 4-bit field. I understand why privileged access only needs a boolean, but if the CPU ignores the other three bits, we could maybe safely ignore it? Just a guess....

cheers,
--renato

As you can see from the above commands, the MC layer does not give any errors or warnings when encoding an assembly string. Furthermore it incorrectly decodes a byte stream.

As you can see from the above commands, the MC layer does not give any errors or warnings when encoding an assembly string. Furthermore it incorrectly decodes a byte stream.

No, I mean, in the CPU. Would that trigger a trap if the immediate is not 000x?

grosbach edited edge metadata.Sep 21 2015, 10:18 AM

The MSR instruction is explicitly documented as taking a 4-bit immediate operand.

You're arguing that the assembler should contextually decide what is or isn't legal for the second operand based on the value of the first? That's generally not something we do for these sorts of instructions unless there is documentation requiring it.

rengolin requested changes to this revision.Sep 22 2015, 5:12 PM
rengolin added a reviewer: rengolin.

You're arguing that the assembler should contextually decide what is or isn't legal for the second operand based on the value of the first? That's generally not something we do for these sorts of instructions unless there is documentation requiring it.

I second that. Until we have an update on the ARM ARM stating that PAM's immediate has to be 1 bit encoded and/or we have hardware to show that it traps, we shouldn't be changing this.

This revision now requires changes to proceed.Sep 22 2015, 5:12 PM

Until we have an update on the ARM ARM stating that PAM's immediate has to be 1 bit encoded

The ARM ARMv8.1 (which hasn't been published yet, but LLVM's implementation of ARMv8.1 is based on it) lists all encodings of "MSR PAN" with immediates > 1 as UNPREDICTABLE, meaning that the processor may do any of these:

  • Trap
  • Treat the instruction as a NOP
  • Ignore the high bits of the immediate
  • Write an arbitrary value to the destination register

Shall we reconsider keeping this patch after Oliver's feedback?

The ARM ARMv8.1 (which hasn't been published yet, but LLVM's implementation of ARMv8.1 is based on it) lists all encodings of "MSR PAN" with immediates > 1 as UNPREDICTABLE

When is it going to be public?

I'm not doubting you, but I don't want to set the precedent to implement things that aren't public. Though, as you say, all ARMv8.1 is based on that document, so I'm not sure what to choose.

Personally, I'm ok with this going in, as long as we're explicit that this is v8.1 only (even if PAN is not available before, not everyone knows that). Jim, are you ok with this, too?

Also, please pass clang-format on your change.

cheers,
--renato

lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
3929 ↗(On Diff #35231)

Include the fact that this is armv8.1 only in the comment.

3946 ↗(On Diff #35231)

bad format, please pass clang-format on this.

I don't see the text which describes the other values as UNPREDICTABLE. Can you provide a doc number and section reference? The v8.1 extension doc describes the other bits of the CRm field as being zero, but I don't see a mention of what happens if they're not. I believe you, I'm just wondering where I should be looking that I'm obviously not.

That aside, is there perhaps a cleaner way to implement this? Perhaps via a generic instruction definition that allows all encoding values and then assembly aliases that have more restricted operand sets? If we can avoid it, I'd really like to stay away from more C++ fiddling in the parser. We already have way too much of it. :(

Can you provide a doc number and section reference?

The v8.1 spec, section 8.1.3, lists the top 3 bits of the CRm field as "(0)(0)(0)". The v8A ARMARM, section C2.2.1 defines that the behaviour is CONSTRAINED UNPREDICTABLE if any of these bits are 1, and lists the allowed behaviours.

labrinea updated this revision to Diff 36340.Oct 2 2015, 4:47 AM
labrinea edited edge metadata.

Reimplemented via a generic instruction definition that allows all encoding values and then assembly aliases that have more restricted operand sets.

rengolin accepted this revision.Oct 3 2015, 7:32 AM
rengolin edited edge metadata.

Hi Alexandros,

With the nitpick, looks good to me. Thanks!

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Nitpick: I'd extract this on the line above, like:

auto State = Immed < 2 ? AArch64::MSRpstateImm1 : AArch64::MSRpstateImm4;
return CurDAG->getMachineNode(State, DL, ...
This revision is now accepted and ready to land.Oct 3 2015, 7:32 AM
labrinea added inline comments.Oct 4 2015, 12:35 PM
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Sure, thanks!

rengolin added inline comments.Oct 4 2015, 12:38 PM
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Wait, wouldn't this make it use the wrong imm encoding for other calls if their values happened to be < 2?

labrinea added inline comments.Oct 5 2015, 3:07 AM
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Good point the check should be based on the value of Reg, and MVT::i16 should be changed to MVT::i1 too for AArch64::MSRpstateImm1 opcode. I am updating the patch.

labrinea added inline comments.Oct 5 2015, 3:34 AM
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Sorry I got confused, I thought MVT::i16 means values from 0 to 16 rather than 16 bits. I think MVT enumeration doesn't really matter since MVT::i16 was already more than the 4 bits required for the Immediate. I think we should just change the patch like this:

auto State = (Reg == AArch64PState::PAN) ? AArch64::MSRpstateImm1
                                         : AArch64::MSRpstateImm4;
rengolin added inline comments.Oct 5 2015, 3:38 AM
lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2268 ↗(On Diff #36340)

Yes, that's what I was thinking, too. Maybe also adding an assert?

if (Reg == AArch64PState::PAN)
  assert(Immed < 2 && "Bad imm");
labrinea updated this revision to Diff 36499.Oct 5 2015, 4:07 AM
labrinea edited edge metadata.

AArch64DAGToDAGISel::SelectWriteRegister updated.

With the nitpick, LGTM. Thanks!

lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2272 ↗(On Diff #36499)

nitpick: no new line here. :)

This revision was automatically updated to reflect the committed changes.