There are differences in codegen between Linux and Windows due to: 1. Using std::sort which uses quicksort which is a non-stable sort. 2. Iterating over Set data structure where the iteration order is non deterministic.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Here is a snippet of the generated code on Linux and Windows ( I cannot post the entire code since it is proprietary).
On Linux:
4ec: e0882002 add r2, r8, r2 4f0: e0881001 add r1, r8, r1 4f4: f4e0083f vld1.32 {d16[0]}, [r0 :32] 4f8: e5980898 ldr r0, [r8, #2200] ; 0x898 4fc: e2800018 add r0, r0, #24 500: f3c80a30 vmovl.u8 q8, d16 504: f4602a8f vld1.32 {d18-d19}, [r0] 508: f3d00a30 vmovl.u16 q8, d16
On Windows:
4ec: e28e1044 add r1, lr, #68 ; 0x44 4f0: e28de901 add lr, sp, #16384 ; 0x4000 4f4: e0883003 add r3, r8, r3 4f8: e0882002 add r2, r8, r2 4fc: f4e0083f vld1.32 {d16[0]}, [r0 :32] 500: e5980898 ldr r0, [r8, #2200] ; 0x898 504: e2800018 add r0, r0, #24 508: f3c80a30 vmovl.u8 q8, d16
Seems sensible. What do you think about adding a tie-breaking rule in the comparator instead of switching to stable sorts?
struct MemOpInfo { ... bool operator<(const MemOpInfo&RHS) const { return std::tie(BaseReg, Offset, SU->NodeNum) < std::tie(RHS.BaseReg, RHS.Offset, RHS.SU->NodeNum); } }; class FrameRef { bool operator<(const FrameRef &RHS) const { return std::tie(LocalOffset, FrameIndex) < std::tie(RHS.LocalOffset, FrameIndex); } };
Thanks Matthias for your suggestion.
I liked the idea of adding a tie-breaking rule instead of using std::stable_sort (mergesort would use more memory).
I will implement your suggestion and push another patch.
The operator < changes LGTM. Is the RegionInfo change accidental?
include/llvm/Analysis/RegionInfo.h | ||
---|---|---|
682 | This was not part of your first commit, was it? I could not figure out where this typedef is used. In fact when I remove the whole line llvm still builds without a problem for me. Am I missing something here? | |
lib/CodeGen/LocalStackSlotAllocation.cpp | ||
322–324 | This can stay array_pod_sort I presume? |
include/llvm/Analysis/RegionInfo.h | ||
---|---|---|
682 | Yes this was part of my first commit. The RegionSet is used in Polly in a number of places. | |
lib/CodeGen/LocalStackSlotAllocation.cpp | ||
322–324 | No, it seems array_pod_sort uses its own comparator: inline int array_pod_sort_comparator(const void *P1, const void *P2) { if (std::less<T>()(*reinterpret_cast<const T*>(P1), *reinterpret_cast<const T*>(P2))) return -1; if (std::less<T>()(*reinterpret_cast<const T*>(P2), *reinterpret_cast<const T*>(P1))) return 1; return 0; } |
I just checked here and array_pod_sort definitely ends up calling operator < (I put a printf there). Can you check on your side? It would be very troubling if that was not the case (broken std::less implementation?).
I have a patch out for review which fixed this by tracking the program order and using that to tie break. The same frame index wit the same offset can appear multiple times
Some feedback on Polly:
llvm/trunk/include/llvm/Analysis/RegionInfo.h | ||
---|---|---|
682 ↗ | (On Diff #74936) | I just deleted this line and Polly still compiles fine. This seems to be unused to me, so we should probably just drop this entirely. (In general, in Polly we have been careful to not iterate over Maps/Sets. We may still have missed some by accident, tough). |
Matthias,
As Matt says he already has a patch (https://reviews.llvm.org/D25188) which fixes the non-deterministic behavior in LocalStackSlotAllocation.cpp, I can remove it from my patch.
Matt,
What are your thoughts on the tie-breaker for MachineScheduler.cpp. Are you fine with my implementation? Can there be multiple entries with the same BaseReg, Offset and NodeNum? Maybe I can add "Order" to this similar to your patch?
--Mandeep
llvm/trunk/include/llvm/Analysis/RegionInfo.h | ||
---|---|---|
682 ↗ | (On Diff #74936) | Removed here: https://reviews.llvm.org/D25744 |
This was not part of your first commit, was it?
I could not figure out where this typedef is used. In fact when I remove the whole line llvm still builds without a problem for me. Am I missing something here?