This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Simplify 64-bit division/remainder expansion
Closed · Public

Authored by foad on Nov 11 2021, 7:36 AM.

Details

Summary

The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.
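To illustrate the difference, here is a minimal C++ sketch of the two ways to add 64-bit values held as 32-bit halves. It is not taken from the patch: the `U64Parts` type and the function names are hypothetical, and the real expansion operates on AMDGPU instructions rather than C++ integers. Both variants produce the same result; the deferred form simply needs an extra add and keeps the carry live for longer.

```cpp
#include <cstdint>

// Hypothetical helper type: a 64-bit value split into 32-bit halves.
struct U64Parts {
  uint32_t lo;
  uint32_t hi;
};

inline U64Parts split(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

inline uint64_t join(U64Parts p) {
  return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// Straightforward form: add the low parts, then add the high parts
// with the carry-in from the low addition.
inline U64Parts addWithCarry(U64Parts a, U64Parts b) {
  uint32_t lo = a.lo + b.lo;             // wraps modulo 2^32
  uint32_t carry = lo < a.lo ? 1u : 0u;  // carry-out of the low add
  uint32_t hi = a.hi + b.hi + carry;
  return {lo, hi};
}

// Deferred form, roughly as the summary describes the old expansion:
// the high parts are added without carry-in, the carry is saved, and
// it is added back in a separate step later on.
inline U64Parts addDeferredCarry(U64Parts a, U64Parts b) {
  uint32_t lo = a.lo + b.lo;
  uint32_t savedCarry = lo < a.lo ? 1u : 0u;  // kept live until later
  uint32_t hiNoCarry = a.hi + b.hi;           // no carry-in here
  uint32_t hi = hiNoCarry + savedCarry;       // missing carry added back
  return {lo, hi};
}
```

For example, adding 1 to `0x00000001FFFFFFFF` carries out of the low half, and both functions return the halves of `0x0000000200000000`.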

Diff Detail

Unit Tests: Failed

Event Timeline

foad created this revision. Nov 11 2021, 7:36 AM
foad requested review of this revision. Nov 11 2021, 7:36 AM
Herald added a project: Restricted Project. Nov 11 2021, 7:36 AM
foad added inline comments.
llvm/test/CodeGen/AMDGPU/udiv64.ll
257–258

This is probably the clearest place to see the effect of the patch. Here, in the old code, we save the carry-out from one add into s[4:5] in order to use it again 20-odd instructions later...

279–280

... and here we recompute v5+v9 without carry-in from the corresponding low-part addition v4+v8, only to add the missing carry back in with the very next instruction!

arsenm accepted this revision. Nov 11 2021, 8:04 AM
This revision is now accepted and ready to land. Nov 11 2021, 8:04 AM
This revision was landed with ongoing or failed builds. Nov 12 2021, 7:51 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll