This is an archive of the discontinued LLVM Phabricator instance.

Use sorting instead of a lot of stable_partitions for `ICF::segregate` function
Needs ReviewPublic

Authored by alex.telishev on Aug 21 2017, 8:23 AM.

Details

Reviewers
zturner
ruiu
rnk
Summary

ICF::segregate function groups together equal chunks to merge them afterwards, and it's much faster faster to do it by sorting instead of using stable_partitions in a loop.

Diff Detail

Event Timeline

alex.telishev created this revision.Aug 21 2017, 8:23 AM
zturner requested changes to this revision.Aug 21 2017, 9:17 AM
zturner added reviewers: ruiu, rnk.

+ruiu and +rnk for the substance of the patch. Some style issues inline.

lld/COFF/ICF.cpp
106

We need to use LLVM naming conventions here, which is that all variables are CamelCase.

137–139

Same here regarding naming conventions. Also global functions should be marked static so they don't get external linkage.

153–154

This patch will need to be be clang-formatted. If clang\tools\clang-format\git-clang-format is in your PATH, then you can just run git clang-format from the git source root and it will will format the diff.

176

This is only ever called as lexicographical_compare_helper(X, X->Relocs.begin(), X->Relocs.end(), Y, Y->Relocs.begin(), Y->Relocs.end()) where X and Y are of type const SectionChunk *. Can you just make this function take const SectionChunk *A, const SectionChunk *B and have the function get the two iterators out?

This revision now requires changes to proceed.Aug 21 2017, 9:17 AM
alex.telishev edited edge metadata.

Fix style

ruiu added inline comments.Aug 22 2017, 11:03 AM
lld/COFF/ICF.cpp
103

This comment is not useful because it poses a question to a reader instead of answering a question for a reader.

You cannot actually use std::sort because we want a deterministic output in general. For a set of input files, there are a lot of valid link results for it, but we always want to output the same file, as it is important for build reproducibility.

106

auto -> size_t

Basically, you want to update the patch so that your new code looks like the existing code in this file. At the moment, your code looks different because of the usage of auto, coding style, variable naming style, parentheses, use of cast to void, etc.

163–171

Did you have to replace & with this?

168–169

It seems too smart. Dumb but straightforward code would probably be more readable.

if (R1.Type != R2.Type)
  return R1.Type < R2.Type;
if (R1.VirtualAddress != R2.VirtualAddress)
  return R1.VirtualAddress < R2.VirtualAddress;
174–175

I don't see a need to create tuples. I believe you could do

return D1->getValue() < D2->getValue ||
       D1->getChunk()->Class[Cnt % 2]) < D2->getChunk()->Class[Cnt % 2];
215–221

You can capture A and B, then you don't need to pass A and B to your lexicographical_compare_helper, no? Then, you can use std::lexicographical_compare instead of your own function.

Fix style

lld/COFF/ICF.cpp
174–175

This is not correct - the right way to compare is

return     (D1->getValue() < D2->getVale ())
             || ((D1->getValue() == D2->getValue ())
                  &&  (D1->getChunk()->Class[Cnt % 2]) < D2->getChunk()->Class[Cnt % 2]));

It's very easy to make this mistake. Because of it I always write comparisons using std::tie or std::make_tuple, but if it doesn't fit the code style, I'll rewrite it.

215–221

No, you can't capture A and B here, because you don't know which SectionChunk corresponds to left and right parts of Less function (if you look into source of std::lexicographical_compare, Comp function is called twice with different parameter order). Initially I did it the way you propose and got a lot of segfaults.

zturner added inline comments.Aug 22 2017, 1:10 PM
lld/COFF/ICF.cpp
103

Using std::sort should not affect determinism right? If you call std::sort two times with an identical input data set, you are guaranteed to get the same results. The only thing stable_sort does is preserve relative ordering. Even though std::sort doesn't preserve that ordering, ti still produces deterministic output.

ruiu edited edge metadata.Aug 22 2017, 1:35 PM

Can you upload a patch between SVN head and your patch instead of one between your last patch and your local head? On reviews.llvm.org, I'm seeing the diffs between your patches instead of the main repository and your patch, which is making code review hard.

lld/COFF/ICF.cpp
103

We want to guarantee determinism even across machines. If you are using two hosts whose std::sort are not the same, your outputs are not guaranteed to be the same, no?

Merge all patches into one

ruiu added inline comments.Aug 22 2017, 1:55 PM
lld/COFF/ICF.cpp
90–91

Could you keep this and the following comment in your new code somehow?

139–140

One definition on each line and do not use auto for consistency.

174–175

Please use clang-format, or if you do that by hand, format it as clang-format would do.

226

I think this comparison breaks determinism because it compares pointer values that change on every process invocation. You may be able to just return false?

alex.telishev added inline comments.Aug 22 2017, 2:09 PM
lld/COFF/ICF.cpp
226

Unfortunately, no - that would break strict ordering (A < B == false and B < A == false, like NaN) and stable_sort will break in this case. It's possible to return true and make them all be equal to each other for the purposes of sorting. But in this case loop after stable_sort in segregate function needs to use different comparison. It's possible to use equalsConstant from before this patch (as we only need equality there), but it'll be a lot of copy-paste.

alex.telishev added inline comments.Aug 22 2017, 2:16 PM
lld/COFF/ICF.cpp
226

Sorry, it's exact opposite to what I said, stable_sort will break if return true is used, and on return false; everything works, but we need to change comparison in segregate.

ruiu added inline comments.Aug 22 2017, 2:16 PM
lld/COFF/ICF.cpp
226

I'm not sure if I understand your point correctly. If a comparison function returns false for both (A, B) and (B, A), then it means that A == B, and std::stable_sort sort them accordingly (i.e. do not alter relative order of A and B), no?

On the other hand, I believe returning true for both (A, B) and (B, A) breaks strict weak ordering, as A<B && B<A cannot be true.

Fix non-deterministic behavior

ruiu added inline comments.Aug 22 2017, 3:06 PM
lld/COFF/ICF.cpp
265

It looks like this lexicographicalCompareRelocs function doesn't help code readability that much. You might want to just do things in-place instead of passing a callback function to another function. I'd do something like this:

for (size_t I = 0; I != A->NumRelocs; ++I) {
  const coff_relocation *R1 = A->Relocs.begin() + I;
  const coff_relocation *R2 = B->Relocs.begin() + I;

  if (R1.Type != R2.Type)
    return R1.Type < R2.Type;
  if (R1.VirtualAddress != R2.VirtualAddress)
    return R1.VirtualAddress < R2.VirtualAddress;

  SymbolBody *B1 = A->File->getSymbolBody(R1.SymbolTableIndex);
  SymbolBody *B2 = B->File->getSymbolBody(R2.SymbolTableIndex);
  if (B1 == B2)
    continue;

  auto *D1 = dyn_cast<DefinedRegular>(B1);
  auto *D2 = dyn_cast<DefinedRegular>(B2);
  if (!D1 || D2)
    continue;
  if (D1->getValue() != D2->getValue())
    return D1->getValue() < D2->getValue();

  uint32_t C1 = D1->getChunk()->Class[Cnt % 2];
  uint32_t C2 = D2->getChunk()->Class[Cnt % 2];
  if (C1 != C2)
    return C1 < C2;
}
return false;

Note that when the control reaches this loop, A->NumReloc == B->NumReloc is guaranteed because we do have a check for that at the beginning of this function.

273

Ditto

apply clang-format

ruiu added inline comments.Aug 22 2017, 3:26 PM
lld/COFF/ICF.cpp
93

No need to add a blank line at beginning of a function.

109

80 cols.

112

Since we only maintain beginning of the current group, drop Cur.

113

Ditto -- CurPos -> Pos

230–233

This is probably inefficient because it may scan the same vector twice, once for operator!= and once for std::lexicographical_compare. One way of avoiding it is this.

if (int X =
        toStringRef(A->getContents()).compare(toStringRef(B->getContents())))
  return X;

It is unfortuante that ArrayRef doesn't have compare.

Replace lexicographical_compare stuff to manually written loops

ruiu added inline comments.Aug 22 2017, 4:22 PM
lld/COFF/ICF.cpp
111

I believe you can use Less instead of Equal and remove equals{Constant,Variables} functions. What am I missing?

114

It is no longer curPro

230–233

Looks like

if (int X =
        toStringRef(A->getContents()).compare(toStringRef(B->getContents())))
  return X;

is better as it is just three lines.

262–265

I think this comment makes sense only when you know that you previously attempted D1<D2. Since it's gone from the code, I'd remove this comment.

alex.telishev added inline comments.Aug 23 2017, 1:01 AM
lld/COFF/ICF.cpp
111

We changed Less{Constant,Variables} functions so now every non-DefinedRegular relocations are equal to each other. As the result they will all receive the same Class if we use Less function in the loop.

226

Yes, you're right. But comparison after that still needs to be changed. Are you ok with using old version that do only equality comparison for this case?

265

Agreed, that way it even will be faster, because no need to call comparison function twice.

Fixed some style and a bug in lessVariable function

I wonder why COMDAT folding is performed until convergence, because msvc linker only do it 2 times by default (https://msdn.microsoft.com/en-us/library/bxwfs976.aspx). In my case it's performed 25 times, which is a bit too much I think.

ruiu added a comment.Aug 23 2017, 8:13 AM

As to the number of iterations, MSVC's and our algorithms are different, so it doesn't make much sense to compare the raw numbers. It's okay as long as our algorithm is faster and more accurate than the MSVC's.

lld/COFF/ICF.cpp
111

Can you make a change to less functions so that you can remove equals functions somehow?

alex.telishev added inline comments.Aug 23 2017, 9:57 AM
lld/COFF/ICF.cpp
111

less function treats all non-DefinedRegular relocations as equal, if they have the same Type and VirtualAddress. If we use the same function instead of equals, such COMDATS will be merged together. Are we allowed to do it?

alex.telishev added inline comments.Aug 23 2017, 10:10 AM
lld/COFF/ICF.cpp
111

Maybe we can compare them by name?

Use one compare function instead of equal and less.

ruiu added a comment.Aug 23 2017, 1:48 PM

less function treats all non-DefinedRegular relocations as equal, if they have the same Type and VirtualAddress. If we use the same function instead of equals, such COMDATS will be merged together. Are we allowed to do it?

I may be missing something, but why do we now have to handle relocations pointing to non-DefinedRegular symbols differently? I do not fully understand the new code, but such relocations have been handled correctly in the original code. As far as I understand this patch, this is purely a performance improvement change by using std::stable_sort instead of std::stable_partition, and I don't understand as to why a new issue wrt comparing two relocations arises with it.

alex.telishev added a comment.EditedAug 23 2017, 2:05 PM

The problem is that stable_partition only used operator==, but stable_sort needs to use operator<. The difference is that comparison of pointers is not deterministic (and equality is), and for non-DefinedRegular symbols we don't have anything to compare except for pointers. Without the need for deterministic build everything were ok, but now we have only two options - to provide some stable means for comparison (it can be done by adding new member to SymbolBody, or creating a map from SymbolBody * to something) or to treat all DefinedRegular symbols as equal. But if we treat them as equal, some non-identical COMDATS will be folded. Because of it we need to make additional checks after sorting to make sure we don't merge non-identical COMDATS. I could try to do the first approach. I guess adding new member to a SymbolBody is not very good for performance, so only aproach with a map can be used.

Another way to fix everything is to find something that we can reliably compare SymbolBody with each other. Do I understand correctly that each SymbolBody that could be in ICF is actyally Defined and each one of them returns different value for a getRVA function? If it's correct I can just use it for comparison and this strange NonRegularChunksAreEqual logic will go away

Ok, everything become much simpler after using map from non-DefinedRegular symbol pointers to some increasing indexes to compare them instead of pointers. Map is usually very small, so practically no efficiency loss here.

ruiu added a comment.Aug 23 2017, 3:57 PM

I took time to think about the algorithm and experimented your patch locally. First thing I notice is that this patch actually significantly improves the performance of ICF. Excellent job!

So, what I found is, I believe we can compare pointer values as you originally did, and that should produce fully reproducible outputs. That's because relative order in Chunks vector doesn't affect the result of our ICF algorithm. To see that, assume equivalence classes A, B, C and D. Also assume two Chunk vectors X and Y where X and Y contain sections whose equivalence classes are

A, A, B, C, C, D, D, D

and

B, A, A, D, D, D, C, C

respectively.

X will be processed in A, B, C, D order, and Y will be processed in B, A, D, C order. However, that order doesn't really matter because our algorithm only care about relative order in the same equivalence class.

Do you think my analysis is correct?

I guess you're right, chunks in the same equivalence class should remain in the same order.

Revert to comparing pointers

ruiu added inline comments.Aug 23 2017, 4:28 PM
lld/COFF/ICF.cpp
91–93

I don't think this extra check is needed.

104–122

I wonder this code is correct. If you directly translate the original code to use Less instead of Equals, it would be something like this:

while (Begin < End) {
  // Find a sequence of elements that belong to the same equivalence class.
  size_t I = Begin + 1;
  while (I < End && !Less(Chunks[Begin], Chunks[I]))
    ++I;

  // Split [Begin, End) into [Begin, I) and [I, End). We use I as an
  // equivalence class ID because every group ends with a unique index.
  for (size_t J = Begin; J < I; ++J)
    Chunks[J]->Class[(Cnt + 1) % 2] = I;

  // If we created a group, we need to iterate the main loop again.
  if (I != End)
    Repeat = true;
  Begin = I;
}

I think this is a bit more readable.

192

return false;

233–234

For consistency, I'd assign them C1 and C2 just like above.

237

return false;

Fix style.
I have a question - why opt:icf is the default option? msvc do it only for release builds and I think they're right about it - opt:icf make debugging very hard as different functions that have the same body will be merged. I can send a patch with a fix for it.

ruiu added inline comments.Aug 24 2017, 8:12 AM
lld/COFF/ICF.cpp
192

You missed this one.

214–222

I believe you don't need this because sections with different number of relocations are already bin into different equivalence classes by lessConstant.

ruiu added a comment.Aug 24 2017, 12:47 PM

As to why lld doesn't disable /icf even if /debug is given, it is just a bug which needs fixing. Thank you for pointing it out.

ruiu added a comment.Aug 24 2017, 1:25 PM

Code looks good to me. Could you update the commit message?

alex.telishev edited the summary of this revision. (Show Details)Aug 24 2017, 1:52 PM
ruiu added a comment.Aug 24 2017, 3:57 PM

I tried this patch locally again to self-link lld. Turned out that the output is not deterministic. For some reason, lld with this patch generates slightly different executable once in a while. There must be a logical bug in the code, but I don't know where it comes from. Can you take a look?

lld/COFF/ICF.cpp
98

Put a full stop at the end of the sentence.

ruiu added a comment.Aug 24 2017, 4:20 PM

So, I thought about the algorithm again for a while, and I believe I found a flaw in using sort in ICF.

To see that, assume chunks A, B, C, D where A = C but all the others are distinctive. In order to use sort, you have to define a < relation for the chunks, which at least satisfy A < C == false and C < A == false. For other chunks, < relation can be arbitrary as long as it is transitive and asymmetric.

Consider a relation A<B<C<D. This is not transitive because although A=C, both A<B and B<C are true. So, this < relation is not OK.

Thus, a valid < relation must respect chunk equality. For example, a relation B<A<C<D is valid. In other words, a valid < must know A=C before doing anything. But this is not doable because, if we do know that before defining a < relation, we could just simply use that fact to find all chunks that are equal to a given chunk. That means sort is not usable in ICF.

What do you think?

ruiu added a comment.Aug 24 2017, 4:22 PM

Thus, a valid < relation must respect chunk equality. For example, a relation B<A<C<D is valid. In other words, a valid < must know A=C before doing anything. But this is not doable because, if we do know that before defining a < relation, we could just simply use that fact to find all chunks that are equal to a given chunk. That means sort is not usable in ICF.

Correction: For example, a relation B<A=C<D is valid.

How exactly have you tested for non-determinism? I've tried to do that by comparing output binaries for my project, but they were different every time even before this patch. Could you try testing for determinism with and without this patch?

But this is not doable because, if we do know that before defining a < relation, we could just simply use that fact to find all chunks that are equal to a given chunk.

I don't get this part. Consider, for example, that chunks were just integer values. You're saying that we cannot use sort to segregate them. But it's obviously wrong. Similarly, everything works if chunks are collections of ints, and we're comparing them in lexicographical order. Note that chunk is basically a collection of ints, strings etc. The only hard thing is that dyn_cast, but as we first compare by type, if we get to there, it should either both fail, or both success. What I'm saying is that we know which chunks are equal when writing that comparison function, and can write correct comparison function.

How exactly have you tested for non-determinism? I've tried to do that by comparing output binaries for my project, but they were different every time even before this patch. Could you try testing for determinism with and without this patch?

Are you testing with /DEBUG or without? With /DEBUG we have to generate a random GUID into the executable for the PDB when doing a fresh build, but without /DEBUG I believe they should still be identical. Eventually we will need a way to create reproducible builds even with /DEBUG (for example by putting a placeholder GUID like {00000000-0000-0000-0000-000000000000} behind an option).

Try without /DEBUG and see.

alex.telishev added a comment.EditedAug 25 2017, 11:56 AM

I made some tests and found that without /DEBUG and with /opt:icf versions with and without this patch both produce different .exe once in a while (1 out of 5 approximately). With /opt:noref or /opt:noicf build seems to be reproducible. So the problem is definitely somewhere in the icf code, but it was not introduced in this patch.

ruiu added a comment.Aug 25 2017, 12:08 PM

I don't get this part. Consider, for example, that chunks were just integer values. You're saying that we cannot use sort to segregate them. But it's obviously wrong. Similarly, everything works if chunks are collections of ints, and we're comparing them in lexicographical order. Note that chunk is basically a collection of ints, strings etc. The only hard thing is that dyn_cast, but as we first compare by type, if we get to there, it should either both fail, or both success. What I'm saying is that we know which chunks are equal when writing that comparison function, and can write correct comparison function.

Okay, let's consider chunks [c1, c2, ..., c_n] as integers [1, 2, ..., n]. If we consider their addresses as integers, that makes sense. Now, assume that c2 equals to c3 in terms of ICF, but no other chunks equal to any other chunks.

What Less(c1, c2) and Less(c3, c2) should return? They should be [true, true] or [false, false]. [true, false] is not transitive and that breaks the rule that a comparison function for sort should satisfy. But, how do you return that consistent boolean values?

ruiu added a comment.Aug 25 2017, 12:10 PM

Now, assume that c2 equals to c3 in terms of ICF, but no other chunks equal to any other chunks.

Typo: assume that c2 equals to c3 in terms of ICF
-> assume that c1 equals to c3 in terms of ICF

ruiu added a comment.Aug 25 2017, 12:13 PM

Are you testing with /DEBUG or without? With /DEBUG we have to generate a random GUID into the executable for the PDB when doing a fresh build, but without /DEBUG I believe they should still be identical. Eventually we will need a way to create reproducible builds even with /DEBUG (for example by putting a placeholder GUID like {00000000-0000-0000-0000-000000000000} behind an option).

Results were different in size, so I think that's not the case.

Okay, let's consider chunks [c1, c2, ..., c_n] as integers [1, 2, ..., n]. If we consider their addresses as integers, that makes sense. Now, assume that c2 equals to c3 in terms of ICF, but no other chunks equal to any other chunks.

I'm not saying we should treat their addresses as integers, it's of course wrong. I mean that ICF equality function compares a bunch of values (basically integer) inside of two chunks. consider that chunk C_i is composed of integers [x_i1, xi2, ..., x_ik] and ICF comparison function is C_m = C_n iff x_m1== x_n1 && x_m2== x_n2, && ... && x_mk== x_nk. Then we can choose comparison function as C_m < C_n iff [x_m1, xm2, ..., x_mk] < [x_n1, xn2, ..., x_nk] where compasion is done lexicographically.

Are you testing with /DEBUG or without?

Last test was without /DEBUG and both version sometimes produced different results (not is size though)

ruiu added a comment.Aug 25 2017, 1:33 PM

I believe you missed a point that we use symbols differently in comparison if all the others are the same. In this case, the address. Let me describe it more concretely.

Let's say we have three chunks, C1, C2 and C3 and symbols B1, B2 and B3. B1 and B3 are DefinedRegular, but B2 is not. Their addresses are B1<B2<B3. Each chunk has only one relocation, and C1 has B1 as a relocation target, C2 has B2 as a relocation target, and so on. All other chunk attributes (size, contents, alignment, etc.) are all the same.

Now consider lessVariable(C1, C2). At line 191, lessVariable returns true because B1<B2. How about lessVariable(C2, C3)? At line 191, lessVaraible returns true because B2<B3. At this point, it concludes that C1<C2<C3. But that's wrong because both lessVaraible(C1, C3) and lessVariable(C3, C1) return false. That's what I was saying that the comparison function is not transitive.

ruiu added a comment.Aug 25 2017, 1:36 PM

[Re-sending as Phabricator automatically replaced B<number> with a URL.]

I believe you missed a point that we use symbols differently in comparison if all the others are the same. In this case, the address. Let me describe it more concretely.

Let's say we have three chunks, C1, C2 and C3 and symbols S1, S2 and S3. S1 and S3 are DefinedRegular, but S2 is not. Their addresses are S1<S2<S3. Each chunk has only one relocation, and C1 has S1 as a relocation target, C2 has S2 as a relocation target, and so on. All other chunk attributes (size, contents, alignment, etc.) are all the same.

Now consider lessVariable(C1, C2). At line 191, lessVariable returns true because S1<S2. How about lessVariable(C2, C3)? At line 191, lessVaraible returns true because S2<S3. At this point, it concludes that C1<C2<C3. But that's wrong because both lessVaraible(C1, C3) and lessVariable(C3, C1) return false. That's what I was saying that the comparison function is not transitive.

Now consider lessVariable(C1, C2). At line 191, lessVariable returns true because S1<S2. How about lessVariable(C2, C3)? At line 191, lessVaraible returns true because S2<S3. At this point, it concludes that C1<C2<C3. But that's wrong because both lessVaraible(C1, C3) and lessVariable(C3, C1) return false. That's what I was saying that the comparison function is not transitive.

I thought about that, this problem is taken care of because of type comparion - lessVariable(C1, C2) will return false at line 141. Actually, at line 191 we can only be both nullptr or both not.

Sorry, line 147.

ruiu added a comment.Aug 25 2017, 2:37 PM

You may be right, but I'm still trying to figure out the source of nondeterminism. I disabled /DEBUG and ran the same command with and without this patch, and I'm still seeing that only the case with this patch produces nondeterministic outputs (outputs are different in size).

If I replaced parallel::par with parallel:seq in this file, it became deterministic, so there must be a threading issue, but I can't figure out the cause. It is pretty tricky.

Intel Inspector indeed found a race here between findBoundary and stable_sort. findBoundary checks for if (Chunks[Begin]->Class[Cnt % 2] != Chunks[I]->Class[Cnt % 2]) and Chunks[I] or Chunks[Begin] can be located in the other thread equality range and is modified by stable_sort (and stable_partition too, btw). But I'm not sure if it really can affect results because all alements in the next range should have different Class from previous chunk.

Fix data race

ruiu added a comment.Aug 25 2017, 5:02 PM

Did you succeed to reproduce the nondeterminsm issue, and does this patch fix it?

Is lld with this patch still faster than lld without this patch?

When I build clang with this patch, the resulting binary size is 151,181,312 byte. Without this patch, it is 151,170,560 bytes. So, this patch produces slightly larger executable. Could you take a look?

Fix for_each_n policy to par instead of seq. This patch speed is the same as before (much faster than without).

About testing - I linked clang and my project with old version, older patch and newer patch without /DEBUG (with /DEBUG lld crashed). For clang all 3 versions were consistently getting binary the same result. For my project all 3 versions were getting sometimes different results.

I found another source of non-determinism. Function Writer::sortExceptionTable uses parallel_sort to sort exception table. Microsoft implementation of parallel_sort is unstable and non-deterministic. If ICF is used then exception table entries may have equal entries which results in randomness. After changing parallel_sort to stable_sort randomness for my project disappeared (launched link 50 times).

Fix non-determinism in sorting exception table (probably should be in a separate commit)

ruiu added a comment.Aug 28 2017, 8:39 AM

There is still a reproducible issue. When I run lld several times to link clang with this patch, it produces two clang executables that are differ in size:

151,173,632 clang.exe
151,181,312 clang.exe

while without this patch, lld consistently produces the following executable:

151,170,560 clang.exe

I'll try to reproduce it again. Could you please tell me which configuration do you use for building clang, as your size for clang executable (150Mb) differs a lot from mine (55MB)? I compile with MSVC 2015 in RelWithDebInfo configuration. Do you recompile clang from scratch every time or only remove clang.exe and relink it (I did the latter)?

ruiu added a comment.Aug 28 2017, 9:10 AM

I compile with MSVC 2015 in RelWithDebInfo configuration. Do you recompile clang from scratch every time or only remove clang.exe and relink it (I did the latter)?

I'm using MSVC 2017 in Debug configuration. I remove only clang.exe and re-link it, so I believe the procedure is the same as yours.

ruiu added a comment.Aug 28 2017, 9:21 AM

I may be able to write a test for reproducibility, but the test needs to be fairly large because it probably involves a multi-thread nondeterminism which you can't see with a small test.

Unfortunately, I was unable to reproduce this issue. Tried msvc 2015 and 2017 in Debug and RelWithDebugInfo for x86 and x64, with or without /DEBUG for several times and every time got identical clang.exe (except for small GUID part with /DEBUG). Maybe you could upload resulting different clang.exe files (with pdb if you can) so I can try to find where the diffs occur.

ruiu added a comment.Aug 30 2017, 9:53 AM

My Windows workstation is a massive one with 28 cores 56 threads. That may be a reason why that issue is visible only on my machine.

I'm compiling on 16-cores 32-threads workstation and I've tested on a 20-cores 20-threads one.

Another thing I noticed is that I can't link lld with lld - it crashes in markLive function if /OPT:ref is enabled and in SectionChunk::writeTo otherwise because of nullptr SymbolBody *Body.

ruiu added a comment.Sep 5 2017, 5:49 PM

Are you still not able to reproduce the issue? If not, I can send you a script and output files, but I need to try linking something different because generated clang executables are too large to share over network (more than 1 GiB).

No, I was unable to reproduce yet, cross-linking from linux with asan and tsan found nothing. Script to reproduce and executables will be very helpful if you could provide them.

ruiu added a comment.Sep 7 2017, 4:16 PM

I ran this batch file in an llvm build directory. (You want to replace ..\rel\bin\lld-link.exe with an actual path to your lld-link.exe.)

====
@echo OFF

del bin\clang.exe
ninja -d keeprsp bin\clang.exe

:loop

del bin\clang.exe
..\rel\bin\lld-link.exe /nologo @CMakeFiles/clang.rsp  /out:bin\clang.exe /implib:lib\clang.lib /version:0.0  /machine:x64 /STACK:10000000 /subsystem:console /opt:icf
dir bin\clang.exe

goto loop
====

When I ran the batch file, I got nondeterministic outputs as you can see below.

C:\src\b>run.bat
Could Not Find C:\src\b\bin\clang.exe
[1/1] Linking CXX executable bin\clang.exe
LINK : bin\clang.exe not found or not built by the last incremental link; performing full link
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:14 PM       152,404,992 clang.exe
               1 File(s)    152,404,992 bytes
               0 Dir(s)  759,003,766,784 bytes free
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:14 PM       152,394,240 clang.exe
               1 File(s)    152,394,240 bytes
               0 Dir(s)  759,003,725,824 bytes free
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:14 PM       152,397,312 clang.exe
               1 File(s)    152,397,312 bytes
               0 Dir(s)  759,003,721,728 bytes free
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:14 PM       152,397,312 clang.exe
               1 File(s)    152,397,312 bytes
               0 Dir(s)  759,003,721,728 bytes free
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:14 PM       152,404,992 clang.exe
               1 File(s)    152,404,992 bytes
               0 Dir(s)  759,003,713,536 bytes free
 Volume in drive C has no label.
 Volume Serial Number is 9E74-104F

 Directory of C:\src\b\bin

09/07/2017  04:15 PM       152,404,992 clang.exe
               1 File(s)    152,404,992 bytes
               0 Dir(s)  759,003,713,536 bytes free
^CTerminate batch job (Y/N)?
^CThe system cannot open the device or file specified.