The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.
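
To make the difference concrete, here is a minimal sketch of the two shapes of a 64-bit addition built from 32-bit halves. It is an illustration only, not the actual AMDGPU expansion code: the function names `add64_old` and `add64_new` and the `MASK32` constant are hypothetical, and the 32-bit registers are modelled with masked Python integers.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit register (assumption for this sketch)


def add64_old(a_lo, a_hi, b_lo, b_hi):
    """Old shape: add the high parts without carry-in, fix up later."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                    # carry-out of the low add, kept live
    hi_no_carry = (a_hi + b_hi) & MASK32    # high add that ignores the carry
    hi = (hi_no_carry + carry) & MASK32     # extra add to put the carry back
    return lo_sum & MASK32, hi


def add64_new(a_lo, a_hi, b_lo, b_hi):
    """Straightforward shape: the high add consumes the carry directly."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    hi = (a_hi + b_hi + carry) & MASK32     # single add-with-carry
    return lo_sum & MASK32, hi
```

Both functions return the same `(lo, hi)` pair for every input. The second form simply needs one fewer addition and does not have to keep the carry alive across unrelated instructions.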
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
| Time | Test |
|---|---|
| 990 ms | x64 debian > LLVM.CodeGen/AMDGPU::wave32.ll |
Event Timeline
| llvm/test/CodeGen/AMDGPU/udiv64.ll | |
|---|---|
| 257–258 | This is probably the clearest place to see the effect of the patch. Here, in the old code, we save the carry-out from one add into s[4:5] in order to use it again 20-odd instructions later... |
| 279–280 | ... and here we recompute v5+v9 without carry-in from the corresponding low-part addition v4+v8, only to add the missing carry back in the very next instruction! |