Page MenuHomePhabricator

[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
ClosedPublic

Authored by xur on May 11 2021, 8:41 AM.

Details

Summary

This patch implements first part of Flow Sensitive SampleFDO (FSAFDO). It has
the following changes:
(1) disable current discriminator coding scheme,
(2) new hierarchical discriminator for FSAFDO.

For this patch, "-enable-fs-discriminator=true" turns on the new functionality.
"-enable-fs-discriminator=false" (the default) keeps the current SampleFDO
behavior. When the fs-discriminator is enabled, we insert a flag variable,
llvm_fs_discriminator, to the object.
This symbol will checked by create_llvm_prof tool, and used to generate a
profile with FS-AFDO discriminators enabled.
If this happens, for an extbinary format profile, create_llvm_prof tool will add a
flag to profile summary section. For other format profiles, the
users need to use an internal option (-profile-isfs) to tell the compiler that the
profile uses FS-AFDO discriminators.

Diff Detail

Unit TestsFailed

TimeTest
170 msx64 windows > LLVM.Other::new-pm-lto-defaults.ll
Script: -- : 'RUN: at line 5'; c:\ws\w32-1\llvm-project\premerge-checks\build\bin\opt.exe -disable-verify -verify-cfg-preserved=0 -debug-pass-manager -passes='lto<O1>' -S C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\Other\new-pm-lto-defaults.ll 2>&1 | c:\ws\w32-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\Other\new-pm-lto-defaults.ll --check-prefix=CHECK-O --check-prefix=CHECK-O1

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
xur added a comment.May 13 2021, 12:02 PM

Integrated Wi and Hongtao's review comments. Split changes in ProfileData and llvm-profdata to separated patches.

snehasish added inline comments.May 13 2021, 12:50 PM
llvm/include/llvm/Support/FSAFDODiscriminator.h
41

I don't think these helpers are used in this patch. If so can we remove them from this patch and add it when they are actually used?

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
24

This seems to be unused?

68

Why not just use the MF reference throughout? I don't see any uses in this function where a pointer is necessary.

90

nit: I think this can simply be written as
Location L{DIL->getFilename(), LineNo};

same for LocationDiscriminator LD below.

118

I think this should not be set unless we actually set the DILocation for the instruction (i.e. only when the else part is executed). Perhaps just add a continue on L110 inside the if() so that you can remove the else indentation block making it a bit easier to read as well.

124

Just use const char * since this string is immutable?
Also typo in FSDisriminatorVar -> FSDiscriminatorVar

llvm/lib/CodeGen/TargetPassConfig.cpp
169

Can you also augment the comment to indicate where this option might be useful? It wasn't clear to me why this is needed.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
579

Can you add a comment here and below why this loop is skipped when FSDiscriminators are requested?

llvm/test/CodeGen/X86/fsafdo_test2.ll
8

nit: I think ;; is used for comments. Adding a new line before the comment wouldn't hurt either.

23

Since the IR for this test is non-trivial, can you include the c source and the invocation used to generate this?
Probably good to do for the previous test as well.

xur marked an inline comment as done.May 13 2021, 1:50 PM

snehasish, thanks for the reviewing comments! I'll integrate your comments and update the patch.

llvm/include/llvm/Support/FSAFDODiscriminator.h
41

Yes. That are for the FS profile loader. I'll remove them and add later when they are used.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
24

This is left over from refactor. I'll remove this.

68

This code has been changed quite a bit over time. But you are right that now we can use reference directly. I will make the change.

90

Yes. will fix.

118

It's rarely that cloneWithDiscriminator return empty DIL. But you are right here -- I should only set in the else clause.
I will fix this.

124

Right. I'll fix.

llvm/lib/CodeGen/TargetPassConfig.cpp
169

This is for debuging and tuning purpose. I don't think this gonna hurt anything. If later we found this rarely used, we can remove by then.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
579

This is just to disable the existing multiply factor in discriminators.
I will add a comment here (and a few other places like this).

llvm/test/CodeGen/X86/fsafdo_test2.ll
8

will fix.

23

Yes. This is a good idea.

xur updated this revision to Diff 345287.May 13 2021, 2:25 PM
xur marked an inline comment as done.

Integrated Snehasish's review comments.

snehasish accepted this revision.May 13 2021, 2:32 PM

lgtm, however please wait a bit for any follow up comments from other reviewers. Thanks!

This revision is now accepted and ready to land.May 13 2021, 2:32 PM
wmi added a comment.May 13 2021, 9:52 PM

Thanks for splitting the patches.

llvm/include/llvm/IR/DebugInfoMetadata.h
2246

It will be slightly clearer if the cases of EnableFSDiscriminator being true and false are not interleaved, so that it is easy to know that there won't be undefined use for DF and CI.

if (EnableFSDiscriminator) {
  return ...;
}
...
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
56

Could you describe how FSDiscriminator is computed here?

61–62

How about using tuple for LocationDiscriminator?

llvm/lib/CodeGen/TargetPassConfig.cpp
169

typo: discriminators

173

typo: discriminators

llvm/test/Transforms/SampleProfile/Inputs/fsafdo.prof
2

fsafdo.prof and fsafdo.extbinary.afdo may only be used in the other splitted patch.

xur marked 4 inline comments as done.May 14 2021, 10:08 AM
xur added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
2246

agreed. I will change this.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
61–62

Yes. I think that's a better choice.

wenlei added inline comments.May 14 2021, 10:09 AM
llvm/include/llvm/CodeGen/FSAFDODiscriminator.h
2

IIUC, FSDiscriminator can be added to IR pass too, possibly with the existing AddDiscriminator IR pass plus tweak on discriminator encoding.

The current implementation makes it look like FSDiscriminator is tied to MIR pass alone.. Perhaps name the files/pass for MIR AddMIR[FS]Discriminator etc so it doesn't convey the message that FSDiscriminator is tied to MIR?

llvm/include/llvm/Support/FSAFDODiscriminator.h
2

I wondering if it also makes sense to pull all existing discriminator encoding stuff into a single header, for consistency. Then it can be Discriminator.h

encodeDiscriminator/decodeDiscriminator for today's discriminator encoding are static functions that does pure encoding and can live anywhere.

18

Is this needed?

37

Is N>31 actually expected? could assert?

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
48

Empty string may not be a big/noticeable problem as it just slightly increase the chance of hash collision. I think in many places, we fall back to getName when getLinkageName is empty.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
584

It's probably cleaner to just bail out inside cloneByMultiplyingDuplicationFactor by checking EnableFSDiscriminator, return this if that flag is on. That would avoid checks for the 3 loop passes. But for compile time, there's no reason to go through the IR if we aren't going to do anything.

If we skip these from the loop passes, how about also assert in cloneByMultiplyingDuplicationFactor that EnableFSDiscriminator if off as a safe guard.

xur updated this revision to Diff 345481.May 14 2021, 10:10 AM
xur marked an inline comment as done.

Thanks to Wei for the comments. Here is the version integrate his suggestions.

wenlei added inline comments.May 14 2021, 10:18 AM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
91

When discriminator passes run in order, do we actually expect anything to add bits outside of BitMaskBefore?

101

This seems different from the description in original RFC where you talked about using count of total discriminators and sequential id as seed for a fs-discriminator. Given that we still use inline stack as part of line info, and discriminator is on top of line stack, what extra benefit does hashing the entire inline stack bring us?

xur marked 3 inline comments as done.May 14 2021, 11:08 AM

Thanks to Wenlei for the comments!

llvm/include/llvm/CodeGen/FSAFDODiscriminator.h
2

Yes. This is true. The IR level FSDiscriminator could a revised version of current AddDiscriminator pass. I'll follow your suggestion and do the renaming.

llvm/include/llvm/Support/FSAFDODiscriminator.h
2

This is doable. Let me try.

18

This was needed by other functions that were split from this patch. Thanks for catching this. I'll remove it.

37

should not be greater than 31. Assert is good idea.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
48

For the case of empty name and linkagename, they do return the same hash and may result in a conflict. But I don't thing we can do much here - it just means we cannot use call stack to differentiate.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
584

I was thinking the same. But decided to do this way to favor the compile time. We might want to deprecate multiply factor later, so all this might be just a temporary thing.
I will add an assert in cloneByMultiplyingDuplicationFactor

xur marked 3 inline comments as done.May 14 2021, 11:16 AM
xur added inline comments.
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
91

I'm not fully understand the question.

This is to compute the bits that can be set in this pass. BitMaskBefore for the mask for the bits that have been set by previous passes -- it assumes the the previous discriminators only touch the lower bits.

101

yes. this part has gone through quite some change before and after RFC. Here we still uses a sequential id.
Add a callstack hash is just one more safety measure to catch the mismatches.

wenlei added a subscriber: wlei.May 14 2021, 11:57 AM

cc: @wlei

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
91

I was wondering whether (Discriminator &= BitMaskBefore) == Discriminator is expected here since we haven't set extra bits for current pass yet. If that's true, then the &= is not needed or could be an assert?

(I think somehow the comment got associated with the wrong line, it should about Discriminator &= BitMaskBefore;).

101

Not a big deal, but I think since we always have line stack, and discriminator always has paired line stack, it doesn't need to have that info captured. Could saving the walk over inlineAt chain?

xur added inline comments.May 14 2021, 1:46 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
91

Yes. you are right here. I'll remove that stmt.

101

when we insert the discriminator, we just just check the lineno (I meant the tuple of <filename, lineno, discriminator) -- this is not line stack.
I did some pretty large build and did not find this part consume much time.

I'll keep this in mind and gonna revisit in the tuning.

xur updated this revision to Diff 345549.May 14 2021, 1:51 PM

Integrated Wenlei's suggestions.

@wenlei: I only moved encoding / decoding static functions from DebugInfoMetadata.h to the new Support/Discriminator.h and left other discriminator codes in DebugInfoMetadata.h. Not sure if this is what you meant in the comments.

wenlei added inline comments.May 14 2021, 1:54 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
101

ok, it's fine for now. For existing sample loader, we use nested FunctionSamples to get profile for instruction based on its line stack (FunctionSamples::findFunctionSamples), so the use of line stack is implicit there. Not sure if FS sample loader is different.

Integrated Wenlei's suggestions.

@wenlei: I only moved encoding / decoding static functions from DebugInfoMetadata.h to the new Support/Discriminator.h and left other discriminator codes in DebugInfoMetadata.h. Not sure if this is what you meant in the comments.

Right, that's what I meant. Thanks for the refactoring.

xur added inline comments.May 14 2021, 2:17 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
101

ok, it's fine for now. For existing sample loader, we use nested FunctionSamples to get profile for instruction based on its line stack (FunctionSamples::findFunctionSamples), so the use of line stack is implicit there. Not sure if FS sample loader is different.

Oh. I see what you meant here now. FS sample loader uses the same mechanism. In this sense, yes, the location has the call stack info. I'll do some more experiments on this. For now, let's just keep it.

hoy added inline comments.May 14 2021, 6:12 PM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67 ↗(On Diff #345549)

Nit: not sure if std::unordered_multimap can be a bit faster by not constructing a BBSet if most instructions are not duplicated.

llvm/lib/CodeGen/TargetPassConfig.cpp
1180

Wondering why using PASS_LAST_DIS_BIT_BEG here. Will other bits be used in later patches?

xur added inline comments.May 14 2021, 10:29 PM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67 ↗(On Diff #345549)

Do you mean line line 67 using
std::unordered_multimap<LocationDiscriminator, const MachineBasicBlock *> ?

I'm not sure.
My impression is DenseMap performs closely to unordered_multimap. You are considering the cost of constructing BBSet. I agree with your on this. But for some cases, there are lot instruction duplicattion for some <line, discriminator> -- like some codes in stl header. In these cases, unordered_multimap might not perform well as it needs to store all the BBs.

Let me run an experiments to compare. I will report back.

llvm/lib/CodeGen/TargetPassConfig.cpp
1180

This is to catch all the clones in the optimization pipeline, so we could sum the samples instead of using max. We will have a FS profile loader here. For earlier pass, there will be an FS Profile loader for each AddFSDiscriminatorPass.

xur added inline comments.May 18 2021, 10:24 AM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67 ↗(On Diff #345549)

When I tried the multimap, I found it did not fit here well here. The problem is for the same <file, lineno, discriminator> tuple, if they are in the same BB, BBset will aggregate them together. For the multimap, they are of different entries. Without checking all the entries with the same key, I could not tell if they are with the same BB, or with different BB. In short, we need to check all the entries and find out (1) if they are from the same BB (2) if there are two distinct BBs in the map. The code is not as clean as using a set.

Anyway, I tested both implementations using SPEC2006/Xalancbmk and one large google benchmark. The compiler time for these two about the same (note that compiler time has a pretty large noise range).

I think I would like to keep DenseSet version.

wmi accepted this revision.May 18 2021, 11:35 AM

LGTM.

hoy accepted this revision.May 18 2021, 12:08 PM
hoy added inline comments.
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67 ↗(On Diff #345549)

Sounds good. Thanks for the analysis and experiment.

wenlei accepted this revision.May 18 2021, 12:14 PM

lgtm too, thanks. looking forward to complete fs-adfo support and we'd be happy to give it a try.

This revision was landed with ongoing or failed builds.May 18 2021, 4:27 PM
This revision was automatically updated to reflect the committed changes.