This is an archive of the discontinued LLVM Phabricator instance.

[SampleProfile] Fix non-determinism in promoteMergeNotInlinedContextSamples()
ClosedPublic

Authored by aeubanks on Aug 10 2022, 9:38 AM.

Details

Summary

We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.

Diff Detail

Event Timeline

aeubanks created this revision.Aug 10 2022, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 9:38 AM
aeubanks requested review of this revision.Aug 10 2022, 9:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2022, 9:38 AM

I'm having trouble coming up with a small test case due to not be super familiar with this area, if anybody could give pointers for constructing a test case that would be helpful

hoy added a comment.Aug 10 2022, 11:09 AM

What kind of non-determinism are you seeing? Is the transformed IR not deterministic?

The promoteMergeNotInlinedContextSamples function basically merges the profile of a non-inlined callee function that is nested in its caller's profile into the callee function's own top-level outlined profile. When there aren't multiple callsites to the same callee, the order shouldn't matter. When there are, the merge operation basically does accumulation, and I don't see how the merging order matters either.

What kind of non-determinism are you seeing? Is the transformed IR not deterministic?

The promoteMergeNotInlinedContextSamples function basically merges the profile of a non-inlined callee function that is nested in its caller's profile into the callee function's own top-level outlined profile. When there aren't multiple callsites to the same callee, the order shouldn't matter. When there are, the merge operation basically does accumulation, and I don't see how the merging order matters either.

Some branch_weights are slightly different across multiple runs of ./build/rel/bin/opt -passes=sample-profile -S reduced.ll -sample-profile-file ~/tmp/default.cs.xfdo -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=false.

hoy added a comment.Aug 10 2022, 11:14 AM

What kind of non-determinism are you seeing? Is the transformed IR not deterministic?

The promoteMergeNotInlinedContextSamples function basically merges the profile of a non-inlined callee function that is nested in its caller's profile into the callee function's own top-level outlined profile. When there aren't multiple callsites to the same callee, the order shouldn't matter. When there are, the merge operation basically does accumulation, and I don't see how the merging order matters either.

Some branch_weights are slightly different across multiple runs of ./build/rel/bin/opt -passes=sample-profile -S reduced.ll -sample-profile-file ~/tmp/default.cs.xfdo -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=false.

Can you share reduced.ll and default.cs.xfdo?

I'll need some time to reduce/clean them up to make them sharable, I'll get back to you soon

do you know how to reduce default.cs.xfdo to only include entries for relevant functions?

do you know how to reduce default.cs.xfdo to only include entries for relevant functions?

For text profile, you can edit/delete entries directly. For binary profile, you can convert it to text profile first and then edit. Conversion can be done with llvm-profdat: llvm-profdata merge --text <input_binary_profile> --output=<output_text_profile>

We usually use text profile for test cases.

thanks, that worked
how do I know which entries correspond to a function name? looks like everything is a hash

hmm, it looks like the text output of the profile is invalid, I'll ask around internally what's going on

hoy added a comment.EditedAug 11 2022, 9:49 AM

thanks, that worked
how do I know which entries correspond to a function name? looks like everything is a hash

I guess your binary profile is md5-based, where all function names are actually GUIDs.

does the non-deterministic iteration the denseMap (with pointer key) cause the outlineFS to have different profiles (line 1549)? It might be possible to come up with a manual test case.

rnk added a subscriber: rnk.Aug 11 2022, 10:45 AM

I would argue that one doesn't need a test case for this. It is essentially dangerous to iterate a DenseMap and do transforms.

Is it possible to construct a test that fails in reverse iteration mode? Or, does an existing test fail in reverse iteration mode? If so, you can use that for test coverage instead.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1512

This looks like the non-deterministic iteration

Agree with Reid. I am just curious that the 'reduction' operation should be immune to the iteration order (unless overflows are involved), and if it is that case.

Since non-deterministic-ness is definitely a good thing to fix, I will approve the patch and the test case can be added later if feasible.

davidxl accepted this revision.Aug 11 2022, 3:02 PM

lgtm

This revision is now accepted and ready to land.Aug 11 2022, 3:02 PM

I would argue that one doesn't need a test case for this. It is essentially dangerous to iterate a DenseMap and do transforms.

Is it possible to construct a test that fails in reverse iteration mode? Or, does an existing test fail in reverse iteration mode? If so, you can use that for test coverage instead.

Right, fixing non-determinism doesn't really need a test case. And the change itself is good.

The ask for example is more for us to understand whether there's other bug exposed by the non-determinism. Remarks printing aside, the use in the iteration should not be order sensitive, so it's not obvious how different iteration order would change branch weights and there could be another bug somewhere..

wenlei accepted this revision.Aug 12 2022, 8:25 AM

Looks good -- thanks for the fix. As mentioned above, we'd appreciate an example demonstrating the non-deterministic branch weights.

aeubanks added a comment.EditedAug 12 2022, 10:11 AM

I'm having a very hard time reducing the profile since it's over 1GB. I tried extracting only the entries for the relevant functions, then ran into

error: /home/aeubanks/tmp/1.prof:166: Expected 'NUM[.NUM]: NUM[ mangled_name:NUM]*', found  3: 4734850658227013518:33988

when I deleted the lines that the reader didn't expect, the non-determinism didn't repro anymore.

llvm-profdata show --function foo --sample --text doesn't seem to output the profile in the proper text format, just a human readable format

The IR is fairly small:

define void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr %0, i32 %1, i32 %2) #0 !dbg !8 {                                                                               
  br label %4                                                                                                                                                                                      
                                                                                                                                                                                                   
4:                                                ; preds = %3                                                                                                                                     
  call void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr null, i32 0, i32 0), !dbg !11                                                                                  
  ret void                                                                                                                                                                                         
}                                                                                                                                                                                                  
                                                                                                                                                                                                   
define internal void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr %0, ptr %1, i32 %2) #0 !dbg !12 {                                                        
  br label %4                                                                                                                                                                                      
                                                                                                                                                                                                   
4:                                                ; preds = %3
  call void @_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE(ptr null, i32 0, i32 0), !dbg !13
  br label %5

5:                                                ; preds = %4
  call void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr null, ptr null, i32 0), !dbg !14
  ret void
}

define ptr @_ZN3re26Regexp5ParseENSt3__u17basic_string_viewIcNS1_11char_traitsIcEEEENS0_10ParseFlagsEPNS_12RegexpStatusE() #0 !dbg !15 {
  br label %1

1:                                                ; preds = %0
  call void @_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE(ptr null, ptr null, i32 0), !dbg !16
  ret ptr null
}

attributes #0 = { "use-sample-profile" }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
!1 = !DIFile(filename: "third_party/re2/parse.cc", directory: "")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!8 = distinct !DISubprogram(name: "AddRangeFlags", linkageName: "_ZN3re216CharClassBuilder13AddRangeFlagsEiiNS_6Regexp10ParseFlagsE", scope: !1, file: !1, line: 1602, type: !9, scopeLine: 1603, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10)
!9 = !DISubroutineType(types: !10)
!10 = !{}
!11 = !DILocation(line: 1610, column: 7, scope: !8)
!12 = distinct !DISubprogram(name: "AddUGroup", linkageName: "_ZN3re2L9AddUGroupEPNS_16CharClassBuilderEPKNS_6UGroupEiNS_6Regexp10ParseFlagsE", scope: !1, file: !1, line: 1658, type: !9, scopeLine: 1659, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10)
!13 = !DILocation(line: 1662, column: 11, scope: !12)
!14 = !DILocation(line: 1675, column: 7, scope: !12)
!15 = distinct !DISubprogram(name: "Parse", linkageName: "_ZN3re26Regexp5ParseENSt3__u17basic_string_viewIcNS1_11char_traitsIcEEEENS0_10ParseFlagsEPNS_12RegexpStatusE", scope: !1, file: !1, line: 2217, type: !9, scopeLine: 2218, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !10)
!16 = !DILocation(line: 2461, column: 11, scope: !17)
!17 = !DILexicalBlockFile(scope: !15, file: !1, discriminator: 2)

the script to repro non-determinism

PROF=~/tmp/default.cs.xfdo
./build/rel/bin/opt -passes=sample-profile -S $@ -sample-profile-file $PROF -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=0 -o /tmp/1.ll || exit 1
for i in $(seq 0 20); do
  ./build/rel/bin/opt -passes=sample-profile -S $@ -sample-profile-file $PROF -pgo-kind pgo-sample-use-pipeline -use-profiled-call-graph=0 -o /tmp/2.ll || exit 1
  diff -q /tmp/1.ll /tmp/2.ll || exit 0
done
exit 1

example diff

72c72
< !35 = !{!"function_entry_count", i64 15301}
---
> !35 = !{!"function_entry_count", i64 15312}
74c74
< !37 = !{!"branch_weights", i32 15301}
---
> !37 = !{!"branch_weights", i32 15312}
This revision was landed with ongoing or failed builds.Aug 12 2022, 10:18 AM
This revision was automatically updated to reflect the committed changes.

@rnk mentioned reverse iteration mode - have you tried that? Does it stabilize the nondeterminism at all? (nondeterminism due to hash ordering of pointers could be easily perturbed by unrelated changes causing different allocation profiles, locations, etc - but reverse iteration can stabilize that more/make the non-determinism... deterministic?)

It'd be good to have a test that fails under reverse or forward iteration and passes under the other, if possible.

@rnk mentioned reverse iteration mode - have you tried that? Does it stabilize the nondeterminism at all? (nondeterminism due to hash ordering of pointers could be easily perturbed by unrelated changes causing different allocation profiles, locations, etc - but reverse iteration can stabilize that more/make the non-determinism... deterministic?)

It'd be good to have a test that fails under reverse or forward iteration and passes under the other, if possible.

I tried reverse iteration mode, but it wasn't any more useful than the script I used, the issue is still that I can't figure out how to reduced the profile.

Speaking of the profile, I believe the reason for the error is that we use md5 hashes in place of the function names, which breaks [1].

[1] https://github.com/llvm/llvm-project/blob/3329cec2f79185bafd678f310fafadba2a8c76d2/llvm/lib/ProfileData/SampleProfReader.cpp#L342

hmm, I modified llvm-profdata to only output entries for the three functions in the IR in binary format and that doesn't reproduce the issue