This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add vendor-defined XTheadMemIdx (Indexed Memory Operations) extension

Authored by mtsamis on Feb 17 2023, 2:14 AM.



The vendor-defined XTHeadMemIdx (no comparable standard extension exists
at the time of writing) extension adds indexed load/store instructions
as well as load/store and update register instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:

Support for these instructions has already landed in GNU Binutils:;a=commit;h=27cfd142d0a7e378d19aa9a1278e2137f849b71b

Depends on D144002

Diff Detail

Event Timeline

mtsamis created this revision.Feb 17 2023, 2:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2023, 2:14 AM
mtsamis requested review of this revision.Feb 17 2023, 2:14 AM
craig.topper added inline comments.Feb 17 2023, 3:09 PM

alphabetize above mempair


Use STI.hasFeature(RISCV::FeatureVendorXTHeadMemIdx). I just rewrote all the existing code to that form.


alphbetize above MemPair


I would calling RegisterClass VT since there's all ValueType class that would typically be associated with VT. Looks like we use Ty or RegTy or some other variation involving "Ty" or "ty".


Split this line. Keep string opcodestr> on the first line lined up with RegisterClass on the previous line. Put the reset on a new line


When is VT not GPR?


Line this up with 0b100 on the previous line


Use the hyphenated spelling of T-Head in the comments.


Can we avoid AddedComplexity here by using the complexity parameter in the ComplexPattern?




80 columns


These last 2 should be IsRV64 I think?


This one should be IsRV64 only I think?


Is there a plain load pattern for TH_LRW for RV32?


These last 2 patterns should be IsRV32 only I think?

mtsamis updated this revision to Diff 498821.Feb 20 2023, 6:08 AM

Correct rewrite of all the MemIdx patterns
Use Complexity argument of ComplexPattern
Fix identation, line breaks, typos, names

mtsamis marked 14 inline comments as done.Feb 20 2023, 6:19 AM
mtsamis added inline comments.

Indeed, changed to Ty


Sorry, I forgot to mention the reasoning behind these.
I'm also developing the XTHeadFMemIdx (floating point indexed ops) extension which reuses these patterns with FPR32/64 for RegisterClass and f32/f64 for ValueType to avoid excess code duplication.

Apart from me forgetting to tell the reason, If you prefer it I can change this to only work with GPR for now and move the generic version to the FMemIdx patch.


Done, but for some reason I had to set the Complexity pararameter to 10 in order to achieve the same effect (i.e. the zext pattern being preferred from the plain one).
I don't know why it is different from AddedComplexity.


Thanks for spotting these. While trying to refactor the ZEXT and normal patterns into a single multiclass I completely messed these up.

I have re-wrote them all to be correct now (including RV32/64 as per your comments) and also removed the unnecessarry passing of the i32/i64 arguments since these match XLenVT.
In order to work correctly, the new code has separate LdIdxPat, LdZextIdxPat, StIdxPat, StZextIdxPat multiclasses (since the zext index is only meaningfull on RV64).

craig.topper added inline comments.Feb 20 2023, 5:21 PM

Do we need to check the shift amount is 1,2, or 3?


We can leave it like this. I should have checked the spec for an FP version.


We might look at a ComplexPattern here to match a shifted simm5 and return the simm5 and the shift amount as 2 results.

mtsamis marked 5 inline comments as done.Feb 21 2023, 12:14 AM
mtsamis added inline comments.

I wanted to make this pattern generic and not limited to specific shift amounts.
For the specific MemIdx extension the shift amount should be enforced on the pattern side with (AddrRegRegScale GPR:$rs1, GPR:$rs2, uimm2:$uimm2).
If I'm not mistaken, SelectAddrRegRegScale can match any shift amount but that will be rejected by the uimm2 there.


Sounds good, I'll try to refactor that with a ComplexPattern.

craig.topper added inline comments.Feb 21 2023, 12:17 AM

I worry the ComplexPattern might have bypassed that check.

mtsamis added inline comments.Feb 21 2023, 12:30 AM

I just checked and what you say is true (i.e. if SelectAddrRegRegScale returns value not in the 0-3 range the compiler crashes with 'Bad machine code: Invalid immediate').

Then, is there a way for the ComplexPattern to take into account the restrictions of the matched types (uimm2 in this case)?
I could add a template argument that explicitly sets the range accepted for the shift (and then have ... (AddrRegRegScale<4> ...)) but I wanted something more elegant that doesn't require this.
Are you aware of other ways to implement that restriction?

P.S. I will make sure to add testcases for that.

mtsamis added inline comments.Feb 21 2023, 1:31 AM

After adding additional tests, I found that there is a similar issue if the ADDI offset matched is large. I will also see how to properly address that in the AddrRegRegScale complex pattern.

craig.topper added inline comments.Feb 21 2023, 12:35 PM

I don't know of a way to do that from tablegen with the ComplexPattern. Tablegen would normally emit the check when it visited the shl. But since the shl is visited inside the complex pattern, the tablegen generated checks never see it and no checks are generated after the ComplexPattern is called.

mtsamis updated this revision to Diff 499537.Feb 22 2023, 9:09 AM

Restrict shift amount to 2 bits
Only re-arrange ADDI if is simm12
Introduce Complex pattern simm5shl2 for StoreUpdate constants.
Introduce negative constants in StoreUpdate tests.

mtsamis marked 4 inline comments as done.Feb 22 2023, 9:13 AM
mtsamis added inline comments.

Ok, I implemented it with a MaxShiftAmount argument that is exposed as a templated argument to the .td file.
Also restricted the re-arranged constant to be a simm12 one.

New test functions have been added for both.


Changed to a simm5shl2 complex pattern which makes the StoreUpdate patterns much more readable.

Unfortunately the same pattern didn't fit for use in tryIndexedLoad because there is a potential negation of the offset and that made the code messier when the same pattern was used there.

craig.topper added inline comments.Feb 22 2023, 11:35 AM

Declare Shift before this loop. It adds a line, but the declaration is barely noticeable where it is now.


Use EVT instead of auto. It didn't save any characters.

craig.topper added inline comments.Feb 22 2023, 11:59 AM

Can we put AM == ISD::PRE_INC and Ext == ISD::ZEXTLOAD into a variables IsPre and IsZExt. That should reduce the amount of text here.

mtsamis updated this revision to Diff 499744.Feb 22 2023, 11:53 PM
mtsamis marked an inline comment as done.

Create variables IsPre, IsPost, IsZExt
Change some local variable declarations

mtsamis marked 3 inline comments as done.Feb 22 2023, 11:57 PM
mtsamis added inline comments.

Thanks for the suggestion, done.

I set IsPre = (AM == ISD::PRE_INC || AM == ISD::PRE_DEC); which allowed to remove these statements entirely

if (AM == ISD::PRE_DEC)
else if (AM == ISD::POST_DEC)
This revision is now accepted and ready to land.Feb 23 2023, 1:53 PM
This revision was landed with ongoing or failed builds.Feb 23 2023, 3:18 PM
This revision was automatically updated to reflect the committed changes.