The motivation for this is clang's conformance to the OpenCL shift-operator rules
(https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift),
which make clang emit (<shift> a, (and b, <width> - 1)) for a <shift> b
in OpenCL, where a is an int of bit width <width>.
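For the 32-bit VALU case, the change boils down to a selection pattern of roughly this shape (a minimal sketch, not the exact code in this diff; note that V_LSHLREV_B32_e64 takes the shift amount as its first operand):

// The hardware shift reads only the low 5 bits of the shift amount,
// so the and can be dropped during selection.
def : GCNPat <
  (i32 (shl i32:$a, (and i32:$b, (i32 31)))),
  (V_LSHLREV_B32_e64 VSrc_b32:$b, VSrc_b32:$a)
>;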
Should also test GlobalISel.
llvm/lib/Target/AMDGPU/SIInstructions.td:2520 (On Diff #374181):
Should also handle scalar cases. Also consider handling 32-bit and 16-bit shifts.
llvm/test/CodeGen/AMDGPU/shift-opts.ll:13–14 (On Diff #374181):
I would prefer to split each case into a separate function. Should also test the scalar cases.
Would it be better to do this as some kind of DAGCombine, so that other patterns involving shifts (like those for the v_lshl_add etc. instructions) can take advantage of it too? I wonder how other targets do this. I see X86 has isUnneededShiftMask(), which is used in PatFrags for shift instruction selection.
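For reference, the X86 scheme looks roughly like this (paraphrased from X86ISelDAGToDAG.cpp and the X86 .td files, not verbatim):

// Matches (and x, imm) only when the immediate covers every bit a
// 32-bit shift actually reads, i.e. the low 5 bits are known to be set.
def shiftMask32 : PatFrag<(ops node:$lhs), (and node:$lhs, imm), [{
  return isUnneededShiftMask(N, 5);
}]>;

// The shift selection patterns then match through the frag, e.g.:
def : Pat<(shl GR32:$src1, (shiftMask32 CL)),
          (SHL32rCL GR32:$src1)>;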
llvm/lib/Target/AMDGPU/SIInstructions.td:2520 (On Diff #374181):
And 64-bit.
A combine would be painful, because then we would have to have custom nodes with these semantics. IIRC X86ISelDAGToDAG does the same basic thing.
Fair enough. How about some handy PatFrags that we can use in all isel patterns that involve shifts?
Rebased; did not address all review comments since I'm not sure about the
best approach.
I'm still torn between the generic fold approach and the PatFrags approach.
I think the PatFrags approach makes sense, but I'm a bit worried about the
large-scale substitution we would have to do for all the generic shift opcode
references across the Target/AMDGPU .td files.
As @foad mentioned, it would be neater to fold (<shift> a, (and b, <width> - 1))
-> (<shift> a, b) during AMDGPU-specific compilation while the <shift> is still
a generic node, so that other generic-shift-based pattern matches (like the
v_lshl_add one) can take advantage of it. The problem is that we would be
violating the generic shift's semantics and, as @arsenm mentioned, creating
custom nodes would be painful.
Is there a situation where the aforementioned fold can go wrong in AMDGPU
compilation? Also, is there an AMDGPU-specific tablegen way of performing
(<shift> a, (and b, <width> - 1)) -> (<shift> a, b) that works pre-isel and is
compatible with both GlobalISel and SelectionDAG isel?
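On the last question: a PatFrag predicate can carry both a SelectionDAG and a GlobalISel form via GISelPredicateCode, which is the tablegen-level way to share one fragment across both selectors. A sketch, with a hypothetical helper name:

def shift_mask_32 : PatFrag<(ops node:$src0), (and node:$src0, imm), [{
  return isUnneededShiftMask(N, 5);    // SelectionDAG form, sees SDNode *N
}]> {
  let GISelPredicateCode = [{
    return isUnneededShiftMask(MI, 5); // GlobalISel form, sees MachineInstr &MI
  }];
}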
llvm/lib/Target/AMDGPU/SIInstructions.td:2529 (On Diff #375823):
Why can't I see the OPC_MorphNodeTo1 in the MatcherTable array of the generated
llvm/lib/Target/AMDGPU/SIInstructions.td:2516 (On Diff #375823):
!sub(!shl(1, width), 1), surely?
> I think the PatFrags approach makes sense, but I'm a bit worried about the
> large-scale substitution we would have to do for all the generic shift opcode
> references across the Target/AMDGPU .td files.
I was hoping we could define something like this (I'm not sure of the exact syntax):
def ShiftAmount32 : PatFrags<(ops node:$src), [(i32 node:$src), (and node:$src, (i32 31))]>;
This would match either a raw shift amount or a shift amount ANDed with 31. Then we could change every pattern that currently contains (shl $a, $b) to (shl $a, (ShiftAmount32 $b)) instead. Yes, this will touch every pattern that mentions a generic shift opcode, but I think it's the right thing to do.
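For example, a rewritten pattern might look like this (hypothetical, just to show the shape; the shift amount is the first operand of the VALU shift):

// Before: only matches a bare shift amount.
def : GCNPat<(shl i32:$a, i32:$b), (V_LSHLREV_B32_e64 $b, $a)>;
// After: also matches a shift amount wrapped in (and ..., 31).
def : GCNPat<(shl i32:$a, (ShiftAmount32 i32:$b)), (V_LSHLREV_B32_e64 $b, $a)>;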
> As @foad mentioned, it would be neater to fold (<shift> a, (and b, <width> - 1))
> -> (<shift> a, b) during AMDGPU-specific compilation while the <shift> is still
> a generic node, so that other generic-shift-based pattern matches (like the
> v_lshl_add one) can take advantage of it. The problem is that we would be
> violating the generic shift's semantics and, as @arsenm mentioned, creating
> custom nodes would be painful.
I agree with Matt that this was not a good idea.
llvm/lib/Target/AMDGPU/SIInstructions.td:2516 (On Diff #375823):
This mask is used to reduce the shift amount modulo the width with an and, i.e. 'RHS % width'. See clang's ScalarExprEmitter::ConstrainShiftValue(), which is invoked by shift codegen when CGF.getLangOpts().OpenCL is set.
llvm/lib/Target/AMDGPU/SIInstructions.td:2516 (On Diff #375823):
You are right of course, sorry for the noise.
LGTM, thanks!
Possible future improvement: apply this to other instructions like s_bfm and v_alignbit, which have an operand that works like a shift amount.
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td:249:
Possible future improvement: instead of only ignoring an AND with the exact mask, ignore an AND with any value that has all of those bits set, and possibly more. For example, you could also ignore an AND with 0xFF here. X86's isUnneededShiftMask does this.
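A sketch of what that relaxed check could look like, with hypothetical names (this mirrors the X86 approach mentioned earlier):

// Accept (and x, M) for any constant M whose low 5 bits are all set;
// 31, 0xFF, 0xFFFF and -1 all qualify, since a 32-bit shift reads only
// those low 5 bits.
def relaxed_shift_mask_32 : PatFrag<(ops node:$src0),
                                    (and node:$src0, imm), [{
  const APInt &Mask = N->getConstantOperandAPInt(1);
  return Mask.countTrailingOnes() >= 5;
}]>;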