This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Custom lowering of i64 umulo/smulo
ClosedPublic

Authored by rampitec on Jun 8 2020, 1:55 PM.

Diff Detail

Event Timeline

rampitec created this revision.Jun 8 2020, 1:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2020, 1:55 PM
arsenm added a comment.Jun 8 2020, 3:12 PM

Can you also add cases with power of 2 constants that the default expansion handles? I assume we miss out on these as-is?

// mulo(X, 1 << S) -> { X << S, (X << S) >> S != X }
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5008

I assume this is extracted from the default expansion?

5011

Shift amount should be i32

llvm/test/CodeGen/AMDGPU/llvm.mulo.ll
4

Can you also add a pair that stress the scalar path and add a gfx9 run line

rampitec updated this revision to Diff 269371.Jun 8 2020, 3:47 PM
rampitec marked 4 inline comments as done.

Addressed comments.

Can you also add cases with power of 2 constants that the default expansion handles? I assume we miss out on these as-is?

// mulo(X, 1 << S) -> { X << S, (X << S) >> S != X }

That is questionably if we are missing something here with umulo, we probably missing quite a bit with smulo. The main difference is the avoidance of 64 bit shifts we do after such lowering.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5008

Right. A little simplified for what is legal for us, otherwise it is a default implementation.

rampitec updated this revision to Diff 269373.Jun 8 2020, 4:00 PM

Copied power of two optimization as well.

arsenm added a comment.Jun 8 2020, 5:41 PM

Missing test for smulo case?

Missing test for smulo case?

Which one? smulo_i64_v_4? It is there. I thought all the tests are quite simmetrical.

arsenm accepted this revision.Jun 8 2020, 6:23 PM
This revision is now accepted and ready to land.Jun 8 2020, 6:23 PM
This revision was automatically updated to reflect the committed changes.