This is an archive of the discontinued LLVM Phabricator instance.

[dsymutil] Remove non-determinism.
ClosedPublic

Authored by JDevlieghere on Aug 29 2018, 10:35 AM.

Details

Summary

Before this patch, analyzeContext called getCanonicalDIEOffset(), for which the result depends on the timings of the setCanonicalDIEOffset() calls in the cloneLambda. This can lead to slightly different output between runs.

This patch fixes this by moving the call to getCanonicalDIEOffset() into the cloneLambda.

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Aug 29 2018, 10:35 AM
aprantl added inline comments.Aug 29 2018, 12:57 PM
llvm/tools/dsymutil/DwarfLinker.cpp
287

// Only prune forward declarations inside a DW_TAG_module for which a definition exists elsewhere.

friss added a comment.Aug 29 2018, 1:00 PM

What's the perf impact?

What's the perf impact?

The performance impact seems to be around 5%.

Old

75.52 real 121.74 user 8.17 sys
75.40 real 121.69 user 8.19 sys
75.24 real 121.20 user 7.99 sys
75.32 real 121.56 user 8.08 sys
76.23 real 122.46 user 8.18 sys

New

80.03 real 126.08 user 8.30 sys
78.98 real 125.02 user 8.18 sys
79.02 real 125.08 user 8.28 sys
79.03 real 125.10 user 8.40 sys
79.96 real 126.08 user 8.22 sys

friss added inline comments.Aug 29 2018, 2:21 PM
llvm/tools/dsymutil/DwarfLinker.cpp
280

Can you try something different (which - I believe - should give the same result):

static void updatePruning(const CompileUnit::DIEInfo &Info, CompileUnit &CU) {
    if (!Info.Prune || !Info.Ctxt || Info.Ctxt->getCanonicalDIEOffset())
        return;
    Info.Prune = false;
    unsigned int ParentIdx = Info.ParentIdx;
    do {
        CompileUnit::DIEInfo &ParentInfo = CU.getInfo(ParentIdx);
        if (!ParentInfo.Prune)
            return;
        ParentInfo.Prune = false;
        ParentIdx = ParentInfo.ParentIdx;
    } while (ParentIdx != 0);
}

and then instead of doing another loop over all the DIEs, add this at the beginning of the wordlist iteration in lookForDIEsToKeep:

if (!flags)
    updatePrune(MyInfo, CU);

Not sure whether it's going to be better performance wise, but it's worth a shot.

Use Fred's suggestion as it's faster.

JDevlieghere planned changes to this revision.Aug 30 2018, 5:04 AM

I didn't have a modules build of clang yesterday. Both versions trigger the same assertion: Assertion failed: (Ref > InputDIE.getOffset()), function cloneDieReferenceAttribute. I'll have to investigate.

friss added inline comments.Aug 30 2018, 1:23 PM
llvm/tools/dsymutil/DwarfLinker.cpp
299

I think the last bit of condition is backwards. You want to set Prune to false if getCanonicalDIEOffset() returns 0.

  • Alternative approach with no performance penalty.
  • We only consider offsets that were emitted *before* the analyze/clone lambda runs.
aprantl added inline comments.Sep 4 2018, 8:35 AM
llvm/tools/dsymutil/DwarfLinker.cpp
294

This sounds like it would make the resulting .dSYM larger as it now may contain redundant definitions. Is that difference measurable?
Also, we should add a comment here that explains this new behavior.

xbolva00 added inline comments.
llvm/tools/dsymutil/DwarfLinker.cpp
2

Misstyped

Make comment more explicit & fix typo.

JDevlieghere marked 4 inline comments as done.Sep 5 2018, 2:46 AM
friss added a comment.Sep 5 2018, 10:27 AM

2 additional comments:

  • DefinitionOffset is a horrible name. What about ModulesEndOffset?
  • Did you answer Adrian's question about the size increase? Hopefully this is not a blocker, because I like the simplicity of this approach.
llvm/tools/dsymutil/DwarfLinker.cpp
293–294

as you checked that DefinitionOffset is not 0, the first part of your condition seems unnecessary.

JDevlieghere added inline comments.Sep 5 2018, 11:09 AM
llvm/tools/dsymutil/DwarfLinker.cpp
293–294

The size difference is marginal. I'll compute the actual value tomorrow, but it was less than a megabyte on an almost 1GB dSYM (for clang).

s/DefinitionOffset/ModulesEndOffset/

JDevlieghere added a comment.EditedSep 6 2018, 1:28 AM

dSYM for a clang modules build:

$ stat -f%z tot/Contents/Resources/DWARF/clang
962974260

$ stat -f%z differential/Contents/Resources/DWARF/clang
962974305

I understand the difference would be larger for an Objective-C project, so I'll try to benchmark that as well.

That size difference is not big. (That is with a -gmodules -fcxxmodules build, right?) Are we concerened that -j1 will create a different but semantically equivalent output?

friss added a comment.Sep 6 2018, 9:29 AM

dSYM for a clang modules build:

$ stat -f%z tot/Contents/Resources/DWARF/clang
962974260

$ stat -f%z differential/Contents/Resources/DWARF/clang
962974305

I understand the difference would be larger for an Objective-C project, so I'll try to benchmark that as well.

It doesn't hurt to check, but the difference should be smaller when using Obj-C modules. The reason is that for ObjC modules everything will be pruned in the skeleton CUs, everything gets uniqued to the module debug info at the beginning. With C++, template instantiations are not in the module debug info, so they could be uniqued afterwards (and won't with this change). I'm actually surprised that the difference is not bigger for clang.

That size difference is not big. (That is with a -gmodules -fcxxmodules build, right?) Are we concerened that -j1 will create a different but semantically equivalent output?Yu

Yup. Why would -j1 generate different output? With this patch it's actually the other way around, we used to generate different output without threading, but now we do the same thing in both cases because we use the offset which is computed before.

dSYM for a clang modules build:

$ stat -f%z tot/Contents/Resources/DWARF/clang
962974260

$ stat -f%z differential/Contents/Resources/DWARF/clang
962974305

I understand the difference would be larger for an Objective-C project, so I'll try to benchmark that as well.

It doesn't hurt to check, but the difference should be smaller when using Obj-C modules. The reason is that for ObjC modules everything will be pruned in the skeleton CUs, everything gets uniqued to the module debug info at the beginning. With C++, template instantiations are not in the module debug info, so they could be uniqued afterwards (and won't with this change). I'm actually surprised that the difference is not bigger for clang.

Ah, I must have misunderstood then when we talked. Anyway, even better then :-)

Yup. Why would -j1 generate different output? With this patch it's actually the other way around, we used to generate different output without threading, but now we do the same thing in both cases because we use the offset which is computed before.

I discussed this with Fred and that was a misunderstanding on my end. On the flipside this also means that it isn't clear that -j1 actually uses less memory. Did you confirm that that's the case?

Yup. Why would -j1 generate different output? With this patch it's actually the other way around, we used to generate different output without threading, but now we do the same thing in both cases because we use the offset which is computed before.

I discussed this with Fred and that was a misunderstanding on my end. On the flipside this also means that it isn't clear that -j1 actually uses less memory. Did you confirm that that's the case?

Jup, it definitely uses less memory (see the other differential for the actual values) but it’s not an order of magnitude. After talking with Fred we decided to remove the mention of it in the man page to avoid confusion.

friss accepted this revision.Sep 6 2018, 4:23 PM
This revision is now accepted and ready to land.Sep 6 2018, 4:23 PM
This revision was automatically updated to reflect the committed changes.