Details
- Reviewers: arsenm
- Commits: rG295d1fe7333c: [AMDGPU] Custom lowering of i64 umulo/smulo
Event Timeline
Can you also add cases with power of 2 constants that the default expansion handles? I assume we miss out on these as-is?
// mulo(X, 1 << S) -> { X << S, (X << S) >> S != X }
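
The identity quoted above can be checked on plain 64-bit integers. The following is a minimal sketch, not the LLVM code: `umulo_pow2` and `smulo_pow2` are hypothetical helper names, and the signed variant assumes `S < 63` (so that `1 << S` is a positive power of two) and an arithmetic right shift on signed values, which C++20 guarantees and mainstream compilers provide earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: umulo(X, 1 << S) -> { X << S, (X << S) >> S != X }.
// Uses a logical shift right; S must be below the bit width.
std::pair<uint64_t, bool> umulo_pow2(uint64_t x, unsigned s) {
  assert(s < 64 && "shift amount must be less than the bit width");
  uint64_t result = x << s;
  bool overflow = (result >> s) != x;  // bits shifted out were not all zero
  return {result, overflow};
}

// Hypothetical helper: smulo(X, 1 << S) with the same shape, but shifting
// back arithmetically. Restricted to S < 63 in this sketch.
std::pair<int64_t, bool> smulo_pow2(int64_t x, unsigned s) {
  assert(s < 63 && "sketch only covers positive power-of-two multipliers");
  // Shift in the unsigned domain to avoid signed-overflow undefined behaviour.
  int64_t result = static_cast<int64_t>(static_cast<uint64_t>(x) << s);
  bool overflow = (result >> s) != x;  // arithmetic shift restores x iff no overflow
  return {result, overflow};
}
```

For instance, `umulo_pow2(3, 2)` yields `{12, false}`, while `smulo_pow2(INT64_MAX, 1)` reports overflow because shifting back gives `-1` rather than the original operand.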
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
| Line | Comment |
|---|---|
| 5008 | I assume this is extracted from the default expansion? |
| 5011 | Shift amount should be i32 |
llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
| Line | Comment |
|---|---|
| 4 | Can you also add a pair that stresses the scalar path, and add a gfx9 run line? |
It is questionable whether we are missing anything here with umulo; we are probably missing quite a bit with smulo. The main difference is the avoidance of the 64-bit shifts we do after such lowering.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
| Line | Comment |
|---|---|
| 5008 | Right. A little simplified for what is legal for us; otherwise it is the default implementation. |
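
For readers unfamiliar with the general (non-constant) case, the usual way to detect 64-bit multiply overflow is to compare the high half of the full product against the expected value. The sketch below assumes that this is the shape being discussed; it is not taken from the patch. The helper names `umulo_i64` and `smulo_i64` are hypothetical, and the code relies on the `__int128` extension of GCC and Clang.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: unsigned overflow iff the high 64 bits of the
// 128-bit product are non-zero.
std::pair<uint64_t, bool> umulo_i64(uint64_t a, uint64_t b) {
  unsigned __int128 wide = static_cast<unsigned __int128>(a) * b;
  uint64_t lo = static_cast<uint64_t>(wide);
  uint64_t hi = static_cast<uint64_t>(wide >> 64);
  return {lo, hi != 0};
}

// Hypothetical helper: signed overflow iff the high 64 bits differ from
// the sign extension of the low 64 bits (lo >> 63, arithmetic shift).
std::pair<int64_t, bool> smulo_i64(int64_t a, int64_t b) {
  __int128 wide = static_cast<__int128>(a) * b;
  int64_t lo = static_cast<int64_t>(static_cast<uint64_t>(wide));
  int64_t hi = static_cast<int64_t>(wide >> 64);
  int64_t sign = lo >> 63;  // all ones if lo is negative, else zero
  return {lo, hi != sign};
}
```

In this formulation the only shift on the signed path is by the constant 63, used to materialize the sign word.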
Which one? smulo_i64_v_4? It is there. I thought all the tests were quite symmetrical.