
[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions

Authored by PaoloS on May 13 2020, 8:29 AM.



This patch provides optimization of bit manipulation operations by enabling
the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the base subset (zbb subextension) of the
experimental B extension of RISC-V.
It also adds the corresponding codegen tests.

This patch is based on Clifford Wolf's proposal for the bit manipulation
extension of RISC-V:

Diff Detail

Event Timeline

PaoloS created this revision.May 13 2020, 8:29 AM
asb added a comment.May 28 2020, 7:19 AM

Sorry for the delay on this - the lockdown situation is really hurting my review time, though it looks like my childcare situation will improve from the week after next.

Quite a lot of comments below, but I think this actually is almost ready to go with a few extra patterns or at least reworked test cases. Thanks again for your work.

I've added a few comments inline, beyond that:

  • For the test files, I'd probably just have RV32IB and RV64IB check lines. The RV32I-NOT/RV64I-NOT lines are helpful defence in depth, but they're not very compatible with update_llc_test_checks.py, which we prefer to use whenever possible. This is like the float-*.ll test files. Alternatively, we just let update_llc_test_checks generate code for the non-bitmanip targets (more like the mul.ll test).
  • Use update_llc_test_checks.py to generate and maintain the check lines
  • Missing slliu.w?
  • No immediate variants? e.g. sloi, sroi
  • Missing some W instructions such as SLOW and SROW
  • I'd suggest having one file called something like rvZbb.ll which contains tests that are relevant for both RV32 and RV64. Note this likely does include both i64 and i32 test cases - we want to ensure reasonable codegen for i32 values on RV64 and for i64 values on RV32, in both cases using hardware instructions when possible. If there are tests that really would just be noise for RV32, then put those in rv64Zbb.ll.

I think we do have flexibility to commit something that falls short of handling codegen for all Zbb instructions, but if doing so it would be helpful to note in e.g. the test file (ideally even with tests!) cases that aren't yet handled, so it's easy to return to. If it's easy enough to add the missing cases, that would be preferred.


clz on a zero is a well defined operation that will return XLEN. So shouldn't this just lower to clz and ret?
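To make that concrete, here is an illustrative model of the Zbb semantics (my sketch, not code from the patch): because clz is total, with clz(0) defined to return XLEN, no zero-guard branch is needed before emitting it.

```python
XLEN = 64

def clz(x):
    # Zbb-style count-leading-zeros: a total function with clz(0) == XLEN,
    # so lowering llvm.ctlz needs no separate compare-and-branch for the
    # zero case when targeting this instruction.
    assert 0 <= x < (1 << XLEN)
    return XLEN - x.bit_length()
```

So a plain `clz a0, a0; ret` sequence covers all inputs, including zero.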


Same comment as for clz above

PaoloS added a comment.EditedJun 25 2020, 5:04 AM

Sorry for the late answer.
I'm catching up with this now.

I agree on the reorganization of the tests. I'm fixing that.
I notice that the tests of the 64-bit instructions on 32-bit targets are quite noisy (especially for clz, ctz and pcnt). I'll upload a revision soon so that you can all see.

The immediate shifts with ones (sloi, sroi), on the other hand, have the problem that LLVM canonicalizes them: instead of DAG nodes resembling the straightforward operation:


~(~x << shamt)

it prefers to use a mask:

(x << shamt) | (~(-1 << shamt))

That means the DAG pattern contains a constant (the right operand of the 'or') whose value depends on the value of shamt and which the resulting instruction (sloi/sroi) won't use.
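The two forms are indeed equivalent on unsigned XLEN-bit values; a quick illustrative check (my own, not part of the patch — `slo_direct`/`slo_masked` are hypothetical names):

```python
XLEN = 64
MASK = (1 << XLEN) - 1  # truncate Python's unbounded ints to XLEN bits

def slo_direct(x, shamt):
    # the straightforward shift-left-ones form: ~(~x << shamt)
    return ~(((~x) & MASK) << shamt) & MASK

def slo_masked(x, shamt):
    # the canonicalized form: (x << shamt) | ~(-1 << shamt)
    return ((x << shamt) | (~(-1 << shamt) & MASK)) & MASK

# both forms agree for representative values and shift amounts
for x in (0, 1, 0xDEADBEEF, MASK):
    for shamt in (0, 1, 31, 63):
        assert slo_direct(x, shamt) == slo_masked(x, shamt)
```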
Of course we could just drop it since it isn't used, like this:

def : Pat<(or (shl GPR:$rs1, simm12:$shamt), mask),
        (SLOI GPR:$rs1, simm12:$shamt)>;

But that introduces an ambiguity: to be sure the operand really is the mask derived from shamt, we need to check that the two are related.
I'm trying now to see if a ComplexPattern can do the trick and select sloi and sroi for me while checking that the mask is correct.
I'm not sure though if it is a neat enough solution for upstream.

About slliu.w, the issue is quite different. As the documentation says, slliu.w is identical to slli apart from the fact that it zeroes bits (XLEN-1):32 before shifting.
LLVM, though, optimizes out such a zero-extension before getting to instruction selection. That makes it impossible to distinguish from a normal slli. Considering that the result is a single instruction (slli) in any case, I think it's better to leave it like that and let the user use the slliu.w instruction directly if needed.
A similar thing happens with ctzw.
While for clzw and pcntw LLVM doesn't optimize out the truncation, since dropping it could affect the result, for ctz it doesn't matter: counting trailing zeroes only inspects bits up to 31, so once LLVM sees that the lower 32 bits of the number are nonzero it processes the original 64-bit value normally.
Otherwise it returns 32.
That makes it impractical to tell apart from an RV64 ctz.
The outcome is that for llvm.cttz.i32 on RV64 I still get ctz instead of ctzw.
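To illustrate (my sketch, not part of the patch, with hypothetical helper names): when the i32 operand sits zero-extended in a 64-bit register, the 64-bit count already gives the i32 answer for any nonzero input, and OR-ing in bit 32 pins the zero case to 32 — which is why the two are hard to tell apart at selection time.

```python
def ctz(x, width):
    # count trailing zeros, defined to return `width` when x == 0
    x &= (1 << width) - 1
    if x == 0:
        return width
    return (x & -x).bit_length() - 1

def ctz32_via_ctz64(x):
    # 32-bit cttz computed with a 64-bit count: setting bit 32 makes the
    # zero case yield 32 instead of 64, matching llvm.cttz.i32 semantics
    return ctz((x & 0xFFFFFFFF) | (1 << 32), 64)
```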

PaoloS marked 4 inline comments as done.Jun 25 2020, 5:11 AM
PaoloS added inline comments.

I agree; unfortunately the code gets split into multiple basic blocks before selection, and only the block with the a0 != 0 condition has the ctlz operation in it. Since pattern matching can only look at one block at a time, that's what I could do from the backend.
I based the pattern matching of clz on the LLVM intrinsic llvm.ctlz.i32, which already relies on its own idiom recognition in the middle end. A solution could be to turn off the intrinsic and try to pattern match it directly from the backend; maybe we could simplify it. But the scope is limited.


Same as above

PaoloS updated this revision to Diff 276249.Jul 7 2020, 4:04 PM
PaoloS marked 2 inline comments as done.

Added missing pattern-matching for *w instructions.
Added codegen tests.
Added ComplexPattern instances that are crucial to pattern-match SLOI, SROI, SLOIW, SROIW and SLLIUW.
Both the 32- and 64-bit test files contain both 32- and 64-bit test cases of the instructions (where they exist).

PaoloS added a comment.Jul 7 2020, 4:12 PM

Just a clarification. I decided to split the tests into 32-bit and 64-bit files because 32-bit code compiled for RV64 commonly produces sign-extended IR, and that's when many *w instructions are selected. Keeping the tests in a single file would mean, on the one hand, having 32-bit IR with sign extension compiled for RV32 (harmless but redundant), and on the other hand, having i32 code with no explicit sign extension compiled for RV64. That is correct, but it might lead to misleading selections, like matching the IR of a 32-bit SLOI on RV64 with an RV64 SLOI instead of a SLOIW (the difference being that SLOIW ignores the upper 32 bits of its operand while the RV64 SLOI doesn't).
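A sketch of that difference, as I read the draft B spec (an illustrative Python model with hypothetical names, not the patch's code): SLOIW works on the low 32 bits and sign-extends its 32-bit result, while the RV64 SLOI shifts the full 64-bit register.

```python
MASK64 = (1 << 64) - 1

def sloi64(x, shamt):
    # RV64 SLOI: full 64-bit shift left, shifting ones into the low bits
    return ((x << shamt) & MASK64) | ((1 << shamt) - 1)

def sloiw(x, shamt):
    # RV64 SLOIW: shift only the low 32 bits, fill with ones, then
    # sign-extend the 32-bit result to 64 bits (the usual *W semantics)
    r = (((x & 0xFFFFFFFF) << shamt) & 0xFFFFFFFF) | ((1 << shamt) - 1)
    if r & 0x80000000:
        r |= MASK64 ^ 0xFFFFFFFF  # set bits 63:32
    return r
```

The two diverge as soon as bits cross bit 31: sloi64(0x40000000, 2) keeps bit 32 set, while sloiw(0x40000000, 2) discards it, so selecting the RV64 SLOI for an i32 pattern would be wrong.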

lewis-revill added inline comments.Jul 9 2020, 6:51 AM

Indentation within these Select functions is messed up, presumably due to a mix of tabs and spaces.


I'm not sure what convention other Select functions for W instructions follow, but perhaps an assert for IsRV64 should be added for completeness?


Can these W selects be guarded for 64 bit only?

PaoloS marked 2 inline comments as done.Jul 9 2020, 8:38 AM
PaoloS added inline comments.

Yes, I was trying to use only spaces in the end; I must have missed these.


Well, SLOIW exists only on RV64. I could add the assert, but I think it would be a bit redundant if I also guard the selects for RV64 only.
But yes, for completeness I probably should.

PaoloS marked an inline comment as done.Jul 9 2020, 7:24 PM
PaoloS added inline comments.

Not sure how to do it, they can't be enclosed in Predicates like the instruction patterns.

PaoloS updated this revision to Diff 276897.Jul 9 2020, 7:33 PM

Fixed indentation.
Added architecture type control for complex pattern matching of sloiw, sroiw and slliuw.

PaoloS updated this revision to Diff 277526.Jul 13 2020, 12:23 PM

Updated the tests:

  • the tests have been updated on top of all the sub-patches together, so that they are exactly the same as they would be if updated with the whole final patch.
  • labels specific to each sub-extension have been added alongside the generic RISCVIB label (which activates all the sub-extensions), so that we can see how the patterns are matched with a specific sub-extension versus with all of them together.
  • the tests will probably fail if run by checking out the commit of a single sub-extension, and if updated they'll change. These tests are designed to work with the final squashed patch.

Corrected the order of the patterns.

PaoloS edited the summary of this revision. (Show Details)Jul 14 2020, 6:13 AM
lewis-revill added inline comments.Jul 14 2020, 8:04 AM

Looks like you fixed this with the operand to ComplexPattern. Only nitpick is to get the : characters aligned vertically here.


Nitpick: for these tests on RV32 where no bitmanip instructions are selected (e.g. slo_i64, sro_i64, min_i64, etc.), perhaps it's worth either omitting them or, if the goal is to eventually support them, adding a quick comment?

I noticed the same in the 3rd and 5th patches too, for rol_i64 and fshl_i64.

lewis-revill accepted this revision.Jul 14 2020, 8:09 AM

Thanks Paolo, tests are all passing and apart from the nitpicks this is a green light from me, as with the rest in this series.

This revision is now accepted and ready to land.Jul 14 2020, 8:09 AM
PaoloS marked an inline comment as done.Jul 14 2020, 8:28 AM
PaoloS added inline comments.

I see what you mean. I just like the idea of staying consistent with the other tests while showing cases where there's still room for improvement. Much of this codegen pattern-matching work was also about looking for cases that could be optimized.
Things could also still change, considering that a new sub-extension is being drafted.
On the other hand, I understand that these tests can look wrong, since they don't show any particular change.

PaoloS marked 7 inline comments as done.Jul 14 2020, 10:25 AM
PaoloS added inline comments.

On it.


I'm commenting those.

PaoloS marked an inline comment as done.Jul 14 2020, 11:19 AM
PaoloS updated this revision to Diff 277919.Jul 14 2020, 11:37 AM

Aligned the declarations of the complex patterns.
Added comments to inefficient tests.

Thank you very much @lewis-revill, much appreciated.
I don't have commit access; can you @asb or someone else commit it for me?
Unless of course there's something else that needs immediate correction.

Many thanks all.

Sure, I'll land these. Apologies for the delay.

No worries.
Thank you @lewis-revill

This revision was automatically updated to reflect the committed changes.

It's a shame this just missed the creation of the llvm 11.0 branch, do we think it's worth trying to get this backported since it only just missed?

asb added a comment.Jul 21 2020, 6:51 AM

> It's a shame this just missed the creation of the llvm 11.0 branch, do we think it's worth trying to get this backported since it only just missed?

I wouldn't be opposed. As they're all guarded by experimental flags, the risk of issues in cherry-picking these patches is pretty minimal.