Page MenuHomePhabricator

Fix differences in codegen between Linux and Windows toolchains
ClosedPublic

Authored by mgrang on Oct 17 2016, 1:31 PM.

Details

Summary
There are differences in codegen between Linux and Windows due to:
1. Using std::sort which uses quicksort which is a non-stable sort.

2. Iterating over Set data structure where the iteration order is
   non deterministic.

Diff Detail

Repository
rL LLVM

Event Timeline

mgrang updated this revision to Diff 74893.Oct 17 2016, 1:31 PM
mgrang retitled this revision from to Fix differences in codegen between Linux and Windows toolchains.
mgrang updated this object.
mgrang added reviewers: arsenm, grosbach, junbuml, zinob.
mgrang set the repository for this revision to rL LLVM.

Here is a snippet of the generated code on Linux and Windows ( I cannot post the entire code since it is proprietary).

On Linux:

4ec:	e0882002 	add	r2, r8, r2
4f0:	e0881001 	add	r1, r8, r1
4f4:	f4e0083f 	vld1.32	{d16[0]}, [r0 :32]
4f8:	e5980898 	ldr	r0, [r8, #2200]	; 0x898
4fc:	e2800018 	add	r0, r0, #24
500:	f3c80a30 	vmovl.u8	q8, d16
504:	f4602a8f 	vld1.32	{d18-d19}, [r0]
508:	f3d00a30 	vmovl.u16	q8, d16

On Windows:

4ec:	e28e1044 	add	r1, lr, #68	; 0x44
4f0:	e28de901 	add	lr, sp, #16384	; 0x4000
4f4:	e0883003 	add	r3, r8, r3
4f8:	e0882002 	add	r2, r8, r2
4fc:	f4e0083f 	vld1.32	{d16[0]}, [r0 :32]
500:	e5980898 	ldr	r0, [r8, #2200]	; 0x898
504:	e2800018 	add	r0, r0, #24
508:	f3c80a30 	vmovl.u8	q8, d16

Seems sensible. What do you think about adding a tie-breaking rule in the comparator instead of switching to stable sorts?

struct MemOpInfo { ...
   bool operator<(const MemOpInfo&RHS) const {
     return std::tie(BaseReg, Offset, SU->NodeNum)
       < std::tie(RHS.BaseReg, RHS.Offset, RHS.SU->NodeNum);
   }
};

class FrameRef {
   bool operator<(const FrameRef &RHS) const {
     return std::tie(LocalOffset, FrameIndex) < std::tie(RHS.LocalOffset, FrameIndex);
   }
};

Thanks Matthias for your suggestion.
I liked the idea of adding a tie-breaking rule instead of using std::stable_sort (mergesort would use more memory).
I will implement your suggestion and push another patch.

mgrang updated this revision to Diff 74933.Oct 17 2016, 4:45 PM
mgrang updated this object.
MatzeB accepted this revision.Oct 17 2016, 5:04 PM
MatzeB added a reviewer: MatzeB.

The operator < changes LGTM. Is the RegionInfo change accidental?

include/llvm/Analysis/RegionInfo.h
682 ↗(On Diff #74933)

This was not part of your first commit, was it?

I could not figure out where this typedef is used. In fact when I remove the whole line llvm still builds without a problem for me. Am I missing something here?

lib/CodeGen/LocalStackSlotAllocation.cpp
322–324 ↗(On Diff #74933)

This can stay array_pod_sort I presume?

This revision is now accepted and ready to land.Oct 17 2016, 5:04 PM
mgrang added inline comments.Oct 17 2016, 5:12 PM
include/llvm/Analysis/RegionInfo.h
682 ↗(On Diff #74933)

Yes this was part of my first commit. The RegionSet is used in Polly in a number of places.

lib/CodeGen/LocalStackSlotAllocation.cpp
322–324 ↗(On Diff #74933)

No, it seems array_pod_sort uses its own comparator:

inline int array_pod_sort_comparator(const void *P1, const void *P2) {

if (std::less<T>()(*reinterpret_cast<const T*>(P1),
                   *reinterpret_cast<const T*>(P2)))
  return -1;
if (std::less<T>()(*reinterpret_cast<const T*>(P2),
                   *reinterpret_cast<const T*>(P1)))
  return 1;
return 0;

}

This revision was automatically updated to reflect the committed changes.
MatzeB added inline comments.
include/llvm/Analysis/RegionInfo.h
682 ↗(On Diff #74933)

Probably best to get a comment from someone familiar with polly for this part then.

lib/CodeGen/LocalStackSlotAllocation.cpp
322–324 ↗(On Diff #74933)

The default implementation of std::less<T> uses operator < so that should work, does it fail for you?

Yes, if I use array_pod_sort I still see the diffs in codegen in Linux vs Windows.

Yes, if I use array_pod_sort I still see the diffs in codegen in Linux vs Windows.

I just checked here and array_pod_sort definitely ends up calling operator < (I put a printf there). Can you check on your side? It would be very troubling if that was not the case (broken std::less implementation?).

arsenm edited edge metadata.Oct 17 2016, 7:26 PM

I have a patch out for review which fixed this by tracking the program order and using that to tie break. The same frame index wit the same offset can appear multiple times

grosser edited edge metadata.Oct 17 2016, 9:52 PM

Some feedback on Polly:

llvm/trunk/include/llvm/Analysis/RegionInfo.h
682

I just deleted this line and Polly still compiles fine. This seems to be unused to me, so we should probably just drop this entirely.

(In general, in Polly we have been careful to not iterate over Maps/Sets. We may still have missed some by accident, tough).

mgrang added a comment.EditedOct 18 2016, 11:06 AM

Matthias,

As Matt says he already has a patch (https://reviews.llvm.org/D25188) which fixes the non-deterministic behavior in LocalStackSlotAllocation.cpp, I can remove it from my patch.

Matt,
What are your thoughts on the tie-breaker for MachineScheduler.cpp. Are you fine with my implementation? Can there be multiple entries with the same BaseReg, Offset and NodeNum? Maybe I can add "Order" to this similar to your patch?

--Mandeep

mgrang added inline comments.Oct 18 2016, 12:58 PM
llvm/trunk/include/llvm/Analysis/RegionInfo.h
682