We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm having trouble coming up with a small test case due to not be super familiar with this area, if anybody could give pointers for constructing a test case that would be helpful
What kind of non-determinism are you seeing? Is the transformed IR not deterministic?
The promoteMergeNotInlinedContextSamples function basically merges the profile of a non-inlined callee function that is nested in its caller's profile into the callee function's own top-level outlined profile. When there aren't multiple callsites to the same callee, the order shouldn't matter. When there are, the merge operation basically does accumulation, and I don't see how the merging order matters either.
Some branch_weights are slightly different across multiple runs of ./build/rel/bin/opt -passes=sample-profile -S reduced.ll -sample-profile-file ~/tmp/default.cs.xfdo -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=false.
I'll need some time to reduce/clean them up to make them sharable, I'll get back to you soon
do you know how to reduce default.cs.xfdo to only include entries for relevant functions?
For text profile, you can edit/delete entries directly. For binary profile, you can convert it to text profile first and then edit. Conversion can be done with llvm-profdat: llvm-profdata merge --text <input_binary_profile> --output=<output_text_profile>
We usually use text profile for test cases.
thanks, that worked
how do I know which entries correspond to a function name? looks like everything is a hash
hmm, it looks like the text output of the profile is invalid, I'll ask around internally what's going on
I guess your binary profile is md5-based, where all function names are actually GUIDs.
does the non-deterministic iteration the denseMap (with pointer key) cause the outlineFS to have different profiles (line 1549)? It might be possible to come up with a manual test case.
I would argue that one doesn't need a test case for this. It is essentially dangerous to iterate a DenseMap and do transforms.
Is it possible to construct a test that fails in reverse iteration mode? Or, does an existing test fail in reverse iteration mode? If so, you can use that for test coverage instead.
llvm/lib/Transforms/IPO/SampleProfile.cpp | ||
---|---|---|
1512 | This looks like the non-deterministic iteration |
Agree with Reid. I am just curious that the 'reduction' operation should be immune to the iteration order (unless overflows are involved), and if it is that case.
Since non-deterministic-ness is definitely a good thing to fix, I will approve the patch and the test case can be added later if feasible.
Right, fixing non-determinism doesn't really need a test case. And the change itself is good.
The ask for example is more for us to understand whether there's other bug exposed by the non-determinism. Remarks printing aside, the use in the iteration should not be order sensitive, so it's not obvious how different iteration order would change branch weights and there could be another bug somewhere..
Looks good -- thanks for the fix. As mentioned above, we'd appreciate an example demonstrating the non-deterministic branch weights.
I'm having a very hard time reducing the profile since it's over 1GB. I tried extracting only the entries for the relevant functions, then ran into
error: /home/aeubanks/tmp/1.prof:166: Expected 'NUM[.NUM]: NUM[ mangled_name:NUM]*', found 3: 4734850658227013518:33988
when I deleted the lines that the reader didn't expect, the non-determinism didn't repro anymore.
llvm-profdata show --function foo --sample --text doesn't seem to output the profile in the proper text format, just a human readable format
The IR is fairly small:
define void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr %0, i32 %1, i32 %2) #0 !dbg !8 { br label %4 4: ; preds = %3 call void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr null, i32 0, i32 0), !dbg !11 ret void } define internal void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr %0, ptr %1, i32 %2) #0 !dbg !12 { br label %4 4: ; preds = %3 call void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr null, i32 0, i32 0), !dbg !13 br label %5 5: ; preds = %4 call void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr null, ptr null, i32 0), !dbg !14 ret void } define ptr @_ZN3re26Regexp5ParseENSt3__u17basic_string_viewIcNS1_11char_traitsIcEEEENS0_10ParseFlagsEPNS_12RegexpStatusE() #0 !dbg !15 { br label %1 1: ; preds = %0 call void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr null, ptr null, i32 0), !dbg !16 ret ptr null } attributes #0 = { "use-sample-profile" } !llvm.dbg.cu = !{!0} !llvm.module.flags = !{!2} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None) !1 = !DIFile(filename: "third_party/re2/parse.cc", directory: "") !2 = !{i32 2, !"Debug Info Version", i32 3} !8 = distinct !DISubprogram(name: "AddRangeFlags", linkageName: "_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE", scope: !1, file: !1, line: 1602, type: !9, scopeLine: 1603, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10) !9 = !DISubroutineType(types: !10) !10 = !{} !11 = !DILocation(line: 1610, column: 7, scope: !8) !12 = distinct !DISubprogram(name: "AddUGroup", linkageName: "_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE", scope: !1, file: !1, line: 1658, type: !9, scopeLine: 1659, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10) !13 = !DILocation(line: 1662, column: 11, scope: !12) !14 = !DILocation(line: 1675, column: 7, scope: !12) !15 = distinct !DISubprogram(name: "Parse", linkageName: "_ZN3re26Regexp5ParseENSt3__u17basic_string_viewIcNS1_11char_traitsIcEEEENS0_10ParseFlagsEPNS_12RegexpStatusE", scope: !1, file: !1, line: 2217, type: !9, scopeLine: 2218, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10) !16 = !DILocation(line: 2461, column: 11, scope: !17) !17 = !DILexicalBlockFile(scope: !15, file: !1, discriminator: 2)
the script to repro non-determinism
PROF=~/tmp/default.cs.xfdo ./build/rel/bin/opt -passes=sample-profile -S $@ -sample-profile-file $PROF -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=0 -o /tmp/1.ll || exit 1 for i in $(seq 0 20); do ./build/rel/bin/opt -passes=sample-profile -S $@ -sample-profile-file $PROF -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=0 -o /tmp/2.ll || exit 1 diff -q /tmp/1.ll /tmp/2.ll || exit 0 done exit 1
example diff
72c72 < !35 = !{!"function_entry_count", i64 15301} --- > !35 = !{!"function_entry_count", i64 15312} 74c74 < !37 = !{!"branch_weights", i32 15301} --- > !37 = !{!"branch_weights", i32 15312}
@rnk mentioned reverse iteration mode - have you tried that? Does it stabilize the nondeterminism at all? (nondeterminism due to hash ordering of pointers could be easily perturbed by unrelated changes causing different allocation profiles, locations, etc - but reverse iteration can stabilize that more/make the non-determinism... deterministic?)
It'd be good to have a test that fails under reverse or forward iteration and passes under the other, if possible.
I tried reverse iteration mode, but it wasn't any more useful than the script I used, the issue is still that I can't figure out how to reduced the profile.
Speaking of the profile, I believe the reason for the error is that we use md5 hashes in place of the function names, which breaks [1].
hmm, I modified llvm-profdata to only output entries for the three functions in the IR in binary format and that doesn't reproduce the issue
This looks like the non-deterministic iteration