This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add codegen pattern matching for bit manipulation assembly instructions.
Needs ReviewPublic

Authored by PaoloS on Sep 9 2019, 4:34 AM.



This patch provides optimization of single block bit manipulation operations by enabling the +b target feature.
The matching patterns are reduced to equivalent single assembly bit manipulation instructions.

This patch is based on Clifford Wolf's proposal of the bit manipulation extension for RISCV:

Diff Detail

Event Timeline

PaoloS created this revision.Sep 9 2019, 4:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2019, 4:34 AM
PaoloS edited the summary of this revision. (Show Details)Sep 9 2019, 4:42 AM
PaoloS updated this revision to Diff 220532.Sep 17 2019, 10:08 AM

Updated tests that failed due to latest commit in LLVM.

PaoloS updated this revision to Diff 220543.Sep 17 2019, 11:10 AM

Removed new lines and trailing spaces.

PaoloS updated this revision to Diff 222842.Oct 2 2019, 8:35 AM

Added matching of LLVM bit manipulation instrinsics like bswap, bitreverse, fshl and fshr to the corresponding asm instructions in the RISC-V bit manipulation ISA extension. Added codegen tests accordingly. Added codegen tests for the intrinsics ctlz, cttz and ctpop, to check that they are matched with the assembly instructions clz, ctz and pcnt.

edward-jones added inline comments.Nov 7 2019, 4:06 AM

Is there a way to use a symbolic value for the CC here instead of the (i32 20) magic number? I notice that other backends appear to use "SETLT", "SETULT" and similar in their DAG patterns.


Checking for the exact codegen for ctlz without the Zbb extension seems like it could be fragile. Is it necessary to test the non-Zbb expansion of ctlz given that it is just the default lowering?

If the test exists to make sure that "clz" isn't generated unless Zbb is present then it might be easier to change this test into a single "RV32I-NOT clz".

PaoloS marked an inline comment as done.Nov 28 2019, 9:26 AM
PaoloS added inline comments.

I thought that too, but if I use the symbolic name the pattern doesn't match. Instead the CC Constant is turned into a TargetConstant and the pattern is matched with a generic select with CC. It seems that at the early stage in which I apply the pattern recognition for min/max/minu/maxu the symbolic names SETLT, SETULT, SETEQ... don't match.

PaoloS updated this revision to Diff 231857.Dec 3 2019, 4:00 AM
PaoloS edited the summary of this revision. (Show Details)

Update codegen pattern matching to RISCV BitManip v0.92:

Add pattern matching for packu, packh, sext.b, sext.h.
Add correspondent tests.
Remove pattern matching for bfp. It is very unlikely to find an exact user implementation of such operation in the source code that can be matched automatically. And it is fragile to maintain.
Revert the lowering of select CC instructions into cmov. cmov would be used for any select CC instruction regardless the condition code and such condition wouldn't be computed.
Changed the tests in a way that they check just whether the bitmanip instruction is generated or not. There's no need for this patch to check the correctness of the selection without the B extension.

simoncook updated this revision to Diff 242091.Feb 3 2020, 8:27 AM
simoncook edited the summary of this revision. (Show Details)

Rebase on other bitmanip patches

simoncook updated this revision to Diff 243326.Feb 7 2020, 5:10 PM

Update Target Features to be of form zb<x> rather than just b<x>.

simoncook updated this revision to Diff 243330.Feb 7 2020, 5:21 PM

Update to keep in sync with feature names in D65649

simoncook updated this revision to Diff 250769.Mar 17 2020, 7:40 AM

Rebase on updated D65649

All looks good apart from some test nitpicks. Also add more reviewers


This comment is not the case for these files anymore right?


I'm not certain what the underscore is for, assuming it's to avoid clashing with LLVM intrinsics? If so shouldn't all LLVM intrinsics which cause a clash have a lowering?

simoncook added inline comments.Mar 19 2020, 4:39 AM

nitpick: can you remove these double spaces after instruction names


Looking at the other RISC-V InstrInfo files, we don't indent here and add comments at the closing block indicating which predicates no longer apply, could you update to be consistent


These are inconsistently rewrapped to avoid hitting 80 columns, can you wrap those that aren't correctly done so

simoncook updated this revision to Diff 254220.Apr 1 2020, 8:54 AM

Rebase on updated MC patch

PaoloS marked 5 inline comments as done.Apr 10 2020, 10:17 AM
PaoloS added inline comments.

That was a very old design choice of mine to be sure to avoid conflicts, but I should have removed it after verifying that it doesn't clash at all. I'll remove the underscore and check again.

PaoloS updated this revision to Diff 256604.Apr 10 2020, 10:17 AM
PaoloS marked an inline comment as done.

Fixed the order of the patterns according to follow the order in the opcode table in the specs.
Same for the tests.
Fixed the flags of the tests.
Removed duplicate patterns.
Added pattern matching for rev8.h from bswap.
Removed underscores from names of the functions in the tests.
Wrapped indentation to fit into 80 columns.

Looking at the structure this is looking good, it's probably worth renaming the codegen tests to match what I did with the MC patch before commiting ('Z'->'z') but other than a few formatting changes, this seems good. I haven't yet read through all the patterns, I'll add more comments as I go through it.


missing a space here after if


missing a space here after if


missing a space here after if


I think the patterns shouldn't be indented after a let, the other backend tablegen files don't indent here, so we should be consistent.

simoncook added inline comments.Apr 13 2020, 7:36 AM

These seem inconsistent as to whether they put GPR:$rs1 or the other argument first.


Isn't this the same pattern as on line 664?


Are these doing the operations in the right order? By my reading the shifts and the and are the wrong way around.

Isn't this the pattern you're trying to match?

(or GPR:$rs1, (or (shl (and GPR:$rs1, (i32 0x55555555), (i32 1))),
                  (srl (and GPR:$rs1, (i32 0xaaaaaaaa), (i32 1)))))

(The same applies to all the GREVI/GORCI instructions)


Can we do anything neater than raw ISD::CondCode values here?


This line and the one below should be split out to a HasStdExtZbb, IsRV32 like the 64-bit variants below


These patterns are the same as the ones in the block above?


I think this might have the same shift/and issue and GREVI/GORCI?

PaoloS marked 16 inline comments as done.Apr 14 2020, 9:12 AM
PaoloS added inline comments.

This feature belongs to the MC patch, but I haven't seen this comment here. I'll fix that for the next commit on the encodings, or hopefully once we'll have the official release 1.00


Fair point, but LLVM seems to prefer to give priority to the shift. Therefore it changes e.g. 0x55555555 to 0xaaaaaaaa and vice versa in order to compensate the fact that the shift happens before.
It does the same operation.
In the case of GORCI there's also an exchange of the order of the two operands of the OR operation. The result of the operation is eventually ORed with rs1 itself and it seems that LLVM prefers to do this OR only with one of the two operands, the first one, that eventually will be ORed anyway with the other.
Here again, the outcome is the same.

Just as an example I implemented a test following the exact C syntax of the spec:

uint32_t _grev1(uint32_t a) {
  return ((a & 0x55555555) << 1) | ((a & 0xaaaaaaaa) >> 1);

And the pattern that is created is the one I described that privileges the shifts.

I also implemented in C the version that does the shifts first:

uint32_t _grev1b(uint32_t a) {
  return ((a << 1) & 0xaaaaaaaa) | ((a >> 1) & 0x55555555);

And the pattern was the same, with the shifts coming first.


If I add for instance the ConCode from the enum, like SETEQ, it breaks.
I could maybe add some macros with similar names, like RVB_SETEQ in this case. But I'm not sure this would be a preferred way to do it. Would it?

Besides if I use the labels SETEQ and similar other riscv codegen tests not related to bitmanip fail.

By having a brief look at other targets like for instance PowerPC I've seen that they use values like SETEQ and others in the pattern matching as I do, but they actually rely on selectcc while RISCV uses its own riscv_selectcc.


Yes same pattern, but LLVM uses a 64 bit value for the CondCode and complains if I don't specify the type in the pattern of riscv_selectcc. for this reason I have to use 2 versions of the same pattern, one with the 32 bit CondCode and the other with the 64 bit one.
If I'm missing something here of course any suggestion is welcome.


It is similar to the case of GREVI/GORCI. In this case though the difference with the description of the operation in the spec:

uint32_t shuffle32_stage(uint32_t src, uint32_t maskL, uint32_t maskR, int N)
  uint32_t x = src & ~(maskL | maskR);
  x |= ((src << N) & maskL) | ((src >> N) & maskR);
  return x;

is that the OR-increment of x is carried out not at the end, but with one of the operands of the OR on the right (the first or the second depending on the parenthesis in the implementation), and always with an equivalent outcome. So instead of having something like this:

src & ~(maskL | maskR);
 x |= ((src << N) & maskL) | ((src >> N) & maskR);

we have this:

uint32_t x = src & ~(maskL | maskR);
x = (x | ((src << N) & maskL)) | ((src >> N) & maskR);

Actually I noticed that what happens is that x is ORed with the second operand of the OR:

uint32_t x = src & ~(maskL | maskR);
x = ((src << N) & maskL) | (x | ((src >> N) & maskR));

but then I verified that of course that doesn't matter as it is commutative and LLVM selects correctly

shfli a0 8

Well, I used the tool for the tests, and I see that many other tests left that message. If it doesn't do any harm...

PaoloS marked 3 inline comments as done.Apr 14 2020, 9:22 AM
simoncook added inline comments.Apr 14 2020, 9:48 AM

Ok, I hadn't noticed that the constants were the other way round to what I'd expect. If this is the canonical pattern SelectionDAG wants to match then indeed, use that pattern.

PaoloS marked 3 inline comments as done.Apr 15 2020, 11:19 AM
PaoloS added inline comments.

Correction, I edited the tests after generating them, so yes you were right. Thank you!

PaoloS updated this revision to Diff 257771.Apr 15 2020, 11:21 AM

Fixed indentation.
Regrouped/rearranged the patterns to follow the order of the instructions in the encoding table in the specs.
Removed autogenerated comments in the tests.

asb added a comment.May 7 2020, 6:08 AM

Hi Paolo. I'm sorry this has been left hanging for some time. On the one hand, with this being an experimental feature and purely additive the bar for merging is slightly lower than e.g. a rewrite of all our existing codegen patterns (which could cause new regressions). On the other hand, this pre-commit review is realistically going to be the time when the codegen patterns and associated tests get most scrutiny and it would be a shame to skip that.

I think this patch is currently of a size where it's quite difficult to review all in one go (I've certainly sat down several times to try to do so, and failed). So I'd like to propose that you split it up so we can incrementally review and merge the bitmanip subsets individually. I think we can do this pretty quickly, e.g. as long as there are no unforeseen issues found I'd imagine we might get through at least one a day over the next week (possibly more).

Does that sound like a reasonable path forwards for you?

PaoloS added a comment.May 7 2020, 6:42 AM

Thanks Alex.
Yes splitting it sounds reasonable. It is indeed quite a big monolithic patch and I struggle to review it myself sometimes.
Splitting it into subsets sounds a practical approach, will do.

As suggested I split the patch by subextension.
You can find in order the pieces that add pattern-matching for zbb, zbp, zbbp, zbs and zbt here:

I'll keep this revision as a reference as long as the pieces are reviewed.
Thank you again for the time and the useful comments.