The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.
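
As a rough illustration of the difference (not the actual expansion code; the function names and the 32-bit-halves model are assumptions made for this sketch), the two ways of forming a 64-bit sum from 32-bit parts can be written as:

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with masked Python ints


def add64_old_style(a_lo, a_hi, b_lo, b_hi):
    """Hypothetical model of the old pattern: the high parts are added
    without carry-in, and the saved carry is added back separately."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                   # carry-out of the low add, kept live
    hi_no_carry = (a_hi + b_hi) & MASK32   # high add with no carry-in
    hi = (hi_no_carry + carry) & MASK32    # extra add to restore the carry
    return lo_sum & MASK32, hi


def add64_with_carry(a_lo, a_hi, b_lo, b_hi):
    """Hypothetical model of the straightforward pattern: the carry-out of
    the low add feeds directly into the high add."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    hi = (a_hi + b_hi + carry) & MASK32    # single add-with-carry
    return lo_sum & MASK32, hi
```

Both functions return the same (low, high) pair for every input; the second simply needs one fewer add and does not have to keep the carry alive across unrelated instructions.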
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test |
---|---|
990 ms | x64 debian > LLVM.CodeGen/AMDGPU::wave32.ll |
Event Timeline
llvm/test/CodeGen/AMDGPU/udiv64.ll | |
---|---|
257–258 | This is probably the clearest place to see the effect of the patch. Here, in the old code, we save the carry-out from one add into s[4:5] in order to use it again 20-odd instructions later... |
279–280 | ... and here we recompute v5+v9 without carry-in from the corresponding low-part addition v4+v8, only to add the missing carry back in the very next instruction! |