This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Add the non-MVE instructions in Arm v8.1-M.

Authored by simon_tatham on May 30 2019, 8:19 AM.



This adds support for the new family of conditional selection /
increment / negation instructions; the low-overhead branch
instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole
list of registers at once; the new VMRS/VMSR and VLDR/VSTR
instructions to get data in and out of 8.1-M system registers,
particularly including the new VPR register used by MVE vector

To support this, we also add a register name 'zr' (used by the CSEL
family to force one of the inputs to the constant 0), and operand
types for lists of registers that are also allowed to include APSR or
VPR (used by CLRM). The VLDR/VSTR instructions also need some new
addressing modes.

The low-overhead branch instructions exist in their own separate
architecture extension, which we treat as enabled by default, but you
can say -mattr=-lob or equivalent to turn it off.

Diff Detail


Event Timeline

simon_tatham created this revision.May 30 2019, 8:19 AM

I will continue going over this tomorrow.

5128 ↗(On Diff #202202)

I think isTerminator can also be = 0, I can't remember seeing anything about it needing to be at the end of a basic block.

2300 ↗(On Diff #202202)

Where has this come from..?

2672 ↗(On Diff #202202)

Is this correct? What is IndexModePost?

2687 ↗(On Diff #202202)

Why does the post form have one more operand than the pre? This doesn't really look like the existing pre/post load/store instructions.

264 ↗(On Diff #202202)

I've just came here from the CSEL instructions, so forgive my limited scope, but why can't this register class access SP?

341 ↗(On Diff #202202)

Is the i32 type just to copy us to easily copy the value around?

343 ↗(On Diff #202202)

Not needed.

samparker added inline comments.May 31 2019, 1:28 AM
98 ↗(On Diff #202202)

Why is this needed? And why just the A57?

5754 ↗(On Diff #202202)

Looks like these two functions could be refactored into one.

861 ↗(On Diff #202202)

This is a lot of copy-paste, these three cases can be folded into one.

955 ↗(On Diff #202202)

So does this mean we just haven't created, and selected, an operand properly? Why?

1034 ↗(On Diff #202202)

Same question here.

dmgreen added inline comments.May 31 2019, 2:16 AM
98 ↗(On Diff #202202)

I think this one is because the schedule has CompleteModel = 1 (it may be the only Arm schedule like that). A similar thing was done for SVE on older CPU's in the AArch64 backend not long ago.

samparker added inline comments.May 31 2019, 2:56 AM
2672 ↗(On Diff #202202)

So to answer my own question, this are bits defined in the Arm backend and are later used to populate the target-specific instruction flags. So I'm pretty certain that this should be IndexModeNone. Not sure of the real purpose of the flags though so I'm unsure how to test it.

samparker added inline comments.May 31 2019, 3:12 AM
2687 ↗(On Diff #202202)

Scrap that, it does look like the existing ones but it still doesn't make sense to me. I'll have a closer look at the original descriptions.

simon_tatham marked 5 inline comments as done.May 31 2019, 6:20 AM
simon_tatham added inline comments.
2300 ↗(On Diff #202202)

Now you mention it, it looks redundant to me – the base class VFPI (via VFPAI) sets the same value in any case. Removing it leaves the tablegen diagnostic output unchanged.

264 ↗(On Diff #202202)

It's used by the CSEL instructions themselves, and the vector/scalar forms of VCMP and VPT. VCMP can't access sp at all (the ArmARM says "see related encodings"), and all the others are listed as CONSTRAINED UNPREDICTABLE if you try.

341 ↗(On Diff #202202)

I didn't write this code personally, but that seems likely to me: it means you can register-allocate an i32 value into VPR at one point, and into a GPR at another point, and make copy instructions that 'spill' it back and forth, without having to change its type all the time.

98 ↗(On Diff #202202)

Yes – this is the only SchedMachineModel that would otherwise give errors at tablegen time because it doesn't know what all the MVE instructions look like. And I didn't fancy making up a whole set of nonsense about what MVE instructions do look like for purposes of scheduling them on a Cortex-A57, for obvious reasons :-)

955 ↗(On Diff #202202)

It looks as if that FIXME comment is identical to the one in the previous function, which wasn't added by this patch. I guess this function must have been cloned-and-hacked from the previous one with adjustments only to the offset field size.

As I read the comment, it's saying that it would be nicer to have a different division of work between this level of decoding and whatever created the MC operands that it's starting from. I suppose if anyone ever changes the format of the MC operands as the comment suggests, then all the functions with copies of this FIXME comment will need updating to match.

samparker added inline comments.May 31 2019, 6:32 AM
955 ↗(On Diff #202202)

Ah, the above function wasn't automatically shown in my browser. Then, can we combine these two into a templated function?

samparker added inline comments.May 31 2019, 6:41 AM
264 ↗(On Diff #202202)

cheers, I missed that bit and just saw that PC wasn't permitted.

This should address several of the previous comments.

simon_tatham marked 7 inline comments as done.May 31 2019, 8:52 AM
simon_tatham added inline comments.
955 ↗(On Diff #202202)

Done, and also, there was another function added later on by D62680 (the load/store instructions) which I've also managed to replace with another instance of this template.

samparker added inline comments.Jun 4 2019, 5:23 AM
5114 ↗(On Diff #202445)

Have you added a target feature to identify whether the low-overhead branch extension is supported? We need one to be attached as a predicate to the instructions extending this class.

simon_tatham marked an inline comment as done.

I haven't finished updating this yet, but a minor fix: a colleague
pointed out that unlike ldm and stm, the register-list-based
instruction clrm has no need to warn if its register list isn't in
ascending order, since that would not cause any misunderstanding of
its effects, and it's useful to be able to list some register aliases
in an inline-assembler clrm instruction without worrying about their

miyuki added a subscriber: miyuki.Jun 7 2019, 10:54 AM
miyuki added inline comments.
7094 ↗(On Diff #203571)

Missing break

5614 ↗(On Diff #203571)

Probably missing LLVM_FALLTHROUGH

Fixed nits, and added a LOB feature which disables the hardware branch/loop instructions.

simon_tatham edited the summary of this revision. (Show Details)Jun 10 2019, 7:07 AM
simon_tatham marked 3 inline comments as done.
simon_tatham edited the summary of this revision. (Show Details)
samparker accepted this revision.EditedJun 10 2019, 7:36 AM

Thanks! LGTM. (no need to re-review for the csel in it-block test)

634 ↗(On Diff #203814)

I don't see any tests that cover this.

286 ↗(On Diff #203814)

Another refactoring nit...

This revision is now accepted and ready to land.Jun 10 2019, 7:36 AM
SjoerdMeijer added inline comments.Jun 10 2019, 7:53 AM
5187 ↗(On Diff #203814)

Nit, I think this is more pleasant to read:

def t2CSNEG : CS<"csneg", 0b1011>;
def t2CSINC : CS<"csinc", 0b1001>;

perhaps an easy refactoring before you commit, but will leave that up to you.

This revision was automatically updated to reflect the committed changes.
simon_tatham marked 3 inline comments as done.Jun 10 2019, 8:45 AM
simon_tatham added inline comments.
5187 ↗(On Diff #203814)

Done, but due to a commit-time goof, it ended up in rL362955 instead.

simon_tatham reopened this revision.Jun 10 2019, 9:27 AM
simon_tatham marked an inline comment as done.

I had to revert this work because it was missing a piece which caused build failures. (Sorry about that – next time I'll check more carefully which other patches I have in my local git checkout when I run the tests before pushing.)

The extra piece of code is from one of the not-yet-reviewed MVE patches, so I'll upload a revised version of this patch tomorrow with that part added. But not today, because it doesn't look as if my test run is going to finish in time.

This revision is now accepted and ready to land.Jun 10 2019, 9:27 AM

Right. Sorry about the mess. I've reopened this review, and uploaded a
fresh patch which adds the missing pieces of the AddrModeT2_i7s4
addressing mode so that the vldr and vstr instructions won't cause
a build failure. Also addressed Sam's remaining review comment by
refactoring reasonForFixupRelaxation.

simon_tatham marked an inline comment as done.Jun 11 2019, 1:53 AM
samparker accepted this revision.Jun 11 2019, 2:16 AM


This revision was automatically updated to reflect the committed changes.
chill added a subscriber: chill.Aug 7 2019, 3:58 AM
chill added inline comments.

Why was this change needed? I would think ARM::GPRPair_with_gsub_1_in_rGPRRegClass was the correct one (except that it does not exist anymore :/ )