Now the backend promotes mask vector to an i8 vector and insert element to that. We could bitcast to a widen element vector, and extract from it to GPR, then use I instruction to set the certain bit, and insert back to the widen element vector.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-i1.ll | ||
---|---|---|
109–118 | The upper xlen-1 bits of a0 have an unknown value. The value of %elt is only in the lower bit. You need to mask off the other bits. A mask would only not be needed if the i1 was passed with zeroext attribute. |
Address comment.
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-i1.ll | ||
---|---|---|
109–118 | Done, thanks. |
I may have an alternate suggestion for the bit manipulation
%old_bit = srl %wideelt, %bitindex shift old_bit to bit 0
%b = xor %old_bit, new_bit xor the old and new values together
%c = andi %b, 1 clear all bits except bit 0. This will clear all the extra bits from the shift and remove any extra bits that came with new_bit.
%shl = shl %c, %bitindex shift the xored value back to the correction position
%xor = xor %wideelt, shl // at bitindex this will be old_bit^(old_bit^new_bit) which cancels to new_bit. Every other index will xor with 0.
This might be a shorter sequence, but I'm not sure.
I'm sorry it took me so long to get round to this! I think it LGTM now other than questions/suggestions I've left. @craig.topper are you happy with it?
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
4283–4336 | It's a bit of a shame we have to indent twice. We could do if (VecVT.isFixedLengthVector() && VecVT.getVectorNumElements() >= 8) which might be clearer for readers, even if we later have to take NumElts later. Just a style suggestion though. | |
4285 | I think maybe isPowerOf2_32(NumElts) would be most correct here? I realise there's an assertion in one case below but I'm worried that something experimenting with differently-sized vectors may not hit this and get incorrect code. |
Something worth thinking about is that this may not be a good sequence for a core with decoupled scalar and vector units. This introduces a copy from vector to scalar which prevents the scalar core from running ahead of the vector unit in such a design.
I don't think we should land this patch. It involves moving a value from vector to scalar and then back again. The vector to scalar domain crossing is likely to be expensive on at least some real hardware.
We can probably do better by forming the bitmask on the vector side entirely. Given an index in a scalar register, we can produce a single bit mask by comparing a vid vector against that index. With that, we can construct a vmerge with a scalar operand to set or clear the desired bit in the mask.
I don't understand your last sentence well, I think that vmerge could not be used for mask type, so how to only merge the desired bit?
You're right on the vmerge point. I made this same mistake in another review, caught it, and then didn't realize I'd already posted this comment. Sorry for the confusion.
However, you can replace the vmerge in the above with vmandn and a vmor.
So, code sequence looks something like:
v2 = incoming masking x1 = incoming index x2 = incoming bool (elt) v1 = vid v0 = vseteq v1, <idx> v2 = vmandn v2, v0 // original mask with lane cleared v3 = vmv.v.x x2 v3 = vseteq v3, 1 v3 = vmand v3, v0 v2 = vmor v2, v3
OR
x1 = incoming index x2 = incoming bool (elt) v1 = vid v0 = vseteq v1, <idx> v2 = vmandn v2, v0 // original mask with lane cleared bneqz x2, skip v0 = vxor v3, v3 skip: v2 = vmor v2, v0
I'm not claiming this is optimal codegen; there may be better. This is just what occurs to me with a bit of thought.
I think maybe isPowerOf2_32(NumElts) would be most correct here? I realise there's an assertion in one case below but I'm worried that something experimenting with differently-sized vectors may not hit this and get incorrect code.