This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
ClosedPublic

Authored by xur on May 11 2021, 8:41 AM.

Details

Summary

This patch implements first part of Flow Sensitive SampleFDO (FSAFDO). It has
the following changes:
(1) disable current discriminator coding scheme,
(2) new hierarchical discriminator for FSAFDO.

For this patch, "-enable-fs-discriminator=true" turns on the new functionality.
"-enable-fs-discriminator=false" (the default) keeps the current SampleFDO
behavior. When the fs-discriminator is enabled, we insert a flag variable,
llvm_fs_discriminator, to the object.
This symbol will checked by create_llvm_prof tool, and used to generate a
profile with FS-AFDO discriminators enabled.
If this happens, for an extbinary format profile, create_llvm_prof tool will add a
flag to profile summary section. For other format profiles, the
users need to use an internal option (-profile-isfs) to tell the compiler that the
profile uses FS-AFDO discriminators.

Diff Detail

Event Timeline

xur created this revision.May 11 2021, 8:41 AM
xur requested review of this revision.May 11 2021, 8:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 8:41 AM
wmi added a comment.May 12 2021, 7:49 AM

Can you split the changes under ProfileData into a separate patch?

llvm/include/llvm/IR/DebugInfoMetadata.h
65

This may introduce undefined variable problem if llvm library user only use LLVMCore and not LLVMProfileData. The locations of definition and declaration need to be swapped.

1738

If the first bit is the 0th bit, the 1 in 0x1FF is in the 8th bit and why input B = 7 can clear it?

xur added a comment.May 12 2021, 9:48 AM

Can you split the changes under ProfileData into a separate patch?

OK. I will do that.

llvm/include/llvm/IR/DebugInfoMetadata.h
65

OK. This makes sense. I'll fix this.

1738

The comment is not accurate. It should be "Zero out the (B+1)-th and above bits for D.

hoy added inline comments.May 13 2021, 12:35 AM
llvm/include/llvm/CodeGen/FSAFDODiscriminator.h
11 ↗(On Diff #344417)

Nit: its

llvm/include/llvm/IR/DebugInfoMetadata.h
1738

Looks like B == 0 is a special case, can you please update the comment for that too?

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
47 ↗(On Diff #344417)

BB.getName() and getSubprogram()->getLinkageName() could be empty. Could it be a problem?

77 ↗(On Diff #344417)

The xor could result in 1 for higher bits above HighBit. Does it matter?

llvm/lib/ProfileData/SampleProfReader.cpp
260 ↗(On Diff #344417)

Set FunctionSamples::ProfileIsFS as well to be consistent with the extbin reader?

xur marked 2 inline comments as done.May 13 2021, 10:21 AM

Hoy, Thanks for comments!

llvm/include/llvm/IR/DebugInfoMetadata.h
1738

Thanks. Will update this.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
47 ↗(On Diff #344417)

MD5Hash can take empty string. I did not encounter issues for this.

77 ↗(On Diff #344417)

This should not happen. The bits higher than HighBit should be 0 for for variables.

llvm/lib/ProfileData/SampleProfReader.cpp
260 ↗(On Diff #344417)

Extbin format is a little different from other formats. We assume the format is encoded in summary section.
It's in function SampleProfileReaderExtBinaryBase::readOneSection().

I did not overwrite the flag in summary section with the internal flag -- this flag is mostly used in test and debug with text and binary format. I assume most users will be using extbinary format where flag variable in summary section should have been set properly.

xur marked an inline comment as done.May 13 2021, 11:12 AM
xur added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
1738

Think again, I'll remove the special case of B==0. Currently there is not caller with B==0.

xur updated this revision to Diff 345246.May 13 2021, 12:02 PM
xur added a comment.May 13 2021, 12:02 PM

Integrated Wi and Hongtao's review comments. Split changes in ProfileData and llvm-profdata to separated patches.

snehasish added inline comments.May 13 2021, 12:50 PM
llvm/include/llvm/Support/FSAFDODiscriminator.h
41 ↗(On Diff #345246)

I don't think these helpers are used in this patch. If so can we remove them from this patch and add it when they are actually used?

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
24 ↗(On Diff #345246)

This seems to be unused?

68 ↗(On Diff #345246)

Why not just use the MF reference throughout? I don't see any uses in this function where a pointer is necessary.

90 ↗(On Diff #345246)

nit: I think this can simply be written as
Location L{DIL->getFilename(), LineNo};

same for LocationDiscriminator LD below.

118 ↗(On Diff #345246)

I think this should not be set unless we actually set the DILocation for the instruction (i.e. only when the else part is executed). Perhaps just add a continue on L110 inside the if() so that you can remove the else indentation block making it a bit easier to read as well.

124 ↗(On Diff #345246)

Just use const char * since this string is immutable?
Also typo in FSDisriminatorVar -> FSDiscriminatorVar

llvm/lib/CodeGen/TargetPassConfig.cpp
169

Can you also augment the comment to indicate where this option might be useful? It wasn't clear to me why this is needed.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
573–575

Can you add a comment here and below why this loop is skipped when FSDiscriminators are requested?

llvm/test/CodeGen/X86/fsafdo_test2.ll
9

nit: I think ;; is used for comments. Adding a new line before the comment wouldn't hurt either.

24

Since the IR for this test is non-trivial, can you include the c source and the invocation used to generate this?
Probably good to do for the previous test as well.

xur marked an inline comment as done.May 13 2021, 1:50 PM

snehasish, thanks for the reviewing comments! I'll integrate your comments and update the patch.

llvm/include/llvm/Support/FSAFDODiscriminator.h
41 ↗(On Diff #345246)

Yes. That are for the FS profile loader. I'll remove them and add later when they are used.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
24 ↗(On Diff #345246)

This is left over from refactor. I'll remove this.

68 ↗(On Diff #345246)

This code has been changed quite a bit over time. But you are right that now we can use reference directly. I will make the change.

90 ↗(On Diff #345246)

Yes. will fix.

118 ↗(On Diff #345246)

It's rarely that cloneWithDiscriminator return empty DIL. But you are right here -- I should only set in the else clause.
I will fix this.

124 ↗(On Diff #345246)

Right. I'll fix.

llvm/lib/CodeGen/TargetPassConfig.cpp
169

This is for debuging and tuning purpose. I don't think this gonna hurt anything. If later we found this rarely used, we can remove by then.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
573–575

This is just to disable the existing multiply factor in discriminators.
I will add a comment here (and a few other places like this).

llvm/test/CodeGen/X86/fsafdo_test2.ll
9

will fix.

24

Yes. This is a good idea.

xur updated this revision to Diff 345287.May 13 2021, 2:25 PM
xur marked an inline comment as done.

Integrated Snehasish's review comments.

snehasish accepted this revision.May 13 2021, 2:32 PM

lgtm, however please wait a bit for any follow up comments from other reviewers. Thanks!

This revision is now accepted and ready to land.May 13 2021, 2:32 PM
wmi added a comment.May 13 2021, 9:52 PM

Thanks for splitting the patches.

llvm/include/llvm/IR/DebugInfoMetadata.h
2213–2221

It will be slightly clearer if the cases of EnableFSDiscriminator being true and false are not interleaved, so that it is easy to know that there won't be undefined use for DF and CI.

if (EnableFSDiscriminator) {
  return ...;
}
...
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
55 ↗(On Diff #345287)

Could you describe how FSDiscriminator is computed here?

60–61 ↗(On Diff #345287)

How about using tuple for LocationDiscriminator?

llvm/lib/CodeGen/TargetPassConfig.cpp
169

typo: discriminators

173

typo: discriminators

llvm/test/Transforms/SampleProfile/Inputs/fsafdo.prof
1 ↗(On Diff #345287)

fsafdo.prof and fsafdo.extbinary.afdo may only be used in the other splitted patch.

xur marked 4 inline comments as done.May 14 2021, 10:08 AM
xur added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
2213–2221

agreed. I will change this.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
60–61 ↗(On Diff #345287)

Yes. I think that's a better choice.

wenlei added inline comments.May 14 2021, 10:09 AM
llvm/include/llvm/CodeGen/FSAFDODiscriminator.h
1 ↗(On Diff #345287)

IIUC, FSDiscriminator can be added to IR pass too, possibly with the existing AddDiscriminator IR pass plus tweak on discriminator encoding.

The current implementation makes it look like FSDiscriminator is tied to MIR pass alone.. Perhaps name the files/pass for MIR AddMIR[FS]Discriminator etc so it doesn't convey the message that FSDiscriminator is tied to MIR?

llvm/include/llvm/Support/FSAFDODiscriminator.h
1 ↗(On Diff #345287)

I wondering if it also makes sense to pull all existing discriminator encoding stuff into a single header, for consistency. Then it can be Discriminator.h

encodeDiscriminator/decodeDiscriminator for today's discriminator encoding are static functions that does pure encoding and can live anywhere.

17 ↗(On Diff #345287)

Is this needed?

36 ↗(On Diff #345287)

Is N>31 actually expected? could assert?

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
47 ↗(On Diff #344417)

Empty string may not be a big/noticeable problem as it just slightly increase the chance of hash collision. I think in many places, we fall back to getName when getLinkageName is empty.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
580

It's probably cleaner to just bail out inside cloneByMultiplyingDuplicationFactor by checking EnableFSDiscriminator, return this if that flag is on. That would avoid checks for the 3 loop passes. But for compile time, there's no reason to go through the IR if we aren't going to do anything.

If we skip these from the loop passes, how about also assert in cloneByMultiplyingDuplicationFactor that EnableFSDiscriminator if off as a safe guard.

xur updated this revision to Diff 345481.May 14 2021, 10:10 AM
xur marked an inline comment as done.

Thanks to Wei for the comments. Here is the version integrate his suggestions.

wenlei added inline comments.May 14 2021, 10:18 AM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
90 ↗(On Diff #345287)

When discriminator passes run in order, do we actually expect anything to add bits outside of BitMaskBefore?

100 ↗(On Diff #345287)

This seems different from the description in original RFC where you talked about using count of total discriminators and sequential id as seed for a fs-discriminator. Given that we still use inline stack as part of line info, and discriminator is on top of line stack, what extra benefit does hashing the entire inline stack bring us?

xur marked 3 inline comments as done.May 14 2021, 11:08 AM

Thanks to Wenlei for the comments!

llvm/include/llvm/CodeGen/FSAFDODiscriminator.h
1 ↗(On Diff #345287)

Yes. This is true. The IR level FSDiscriminator could a revised version of current AddDiscriminator pass. I'll follow your suggestion and do the renaming.

llvm/include/llvm/Support/FSAFDODiscriminator.h
1 ↗(On Diff #345287)

This is doable. Let me try.

17 ↗(On Diff #345287)

This was needed by other functions that were split from this patch. Thanks for catching this. I'll remove it.

36 ↗(On Diff #345287)

should not be greater than 31. Assert is good idea.

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
47 ↗(On Diff #344417)

For the case of empty name and linkagename, they do return the same hash and may result in a conflict. But I don't thing we can do much here - it just means we cannot use call stack to differentiate.

llvm/lib/Transforms/Utils/LoopUnroll.cpp
580

I was thinking the same. But decided to do this way to favor the compile time. We might want to deprecate multiply factor later, so all this might be just a temporary thing.
I will add an assert in cloneByMultiplyingDuplicationFactor

xur marked 3 inline comments as done.May 14 2021, 11:16 AM
xur added inline comments.
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
90 ↗(On Diff #345287)

I'm not fully understand the question.

This is to compute the bits that can be set in this pass. BitMaskBefore for the mask for the bits that have been set by previous passes -- it assumes the the previous discriminators only touch the lower bits.

100 ↗(On Diff #345287)

yes. this part has gone through quite some change before and after RFC. Here we still uses a sequential id.
Add a callstack hash is just one more safety measure to catch the mismatches.

wenlei added a subscriber: wlei.May 14 2021, 11:57 AM

cc: @wlei

llvm/lib/CodeGen/FSAFDODiscriminator.cpp
90 ↗(On Diff #345287)

I was wondering whether (Discriminator &= BitMaskBefore) == Discriminator is expected here since we haven't set extra bits for current pass yet. If that's true, then the &= is not needed or could be an assert?

(I think somehow the comment got associated with the wrong line, it should about Discriminator &= BitMaskBefore;).

100 ↗(On Diff #345287)

Not a big deal, but I think since we always have line stack, and discriminator always has paired line stack, it doesn't need to have that info captured. Could saving the walk over inlineAt chain?

xur added inline comments.May 14 2021, 1:46 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
90 ↗(On Diff #345287)

Yes. you are right here. I'll remove that stmt.

100 ↗(On Diff #345287)

when we insert the discriminator, we just just check the lineno (I meant the tuple of <filename, lineno, discriminator) -- this is not line stack.
I did some pretty large build and did not find this part consume much time.

I'll keep this in mind and gonna revisit in the tuning.

xur updated this revision to Diff 345549.May 14 2021, 1:51 PM

Integrated Wenlei's suggestions.

@wenlei: I only moved encoding / decoding static functions from DebugInfoMetadata.h to the new Support/Discriminator.h and left other discriminator codes in DebugInfoMetadata.h. Not sure if this is what you meant in the comments.

wenlei added inline comments.May 14 2021, 1:54 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
100 ↗(On Diff #345287)

ok, it's fine for now. For existing sample loader, we use nested FunctionSamples to get profile for instruction based on its line stack (FunctionSamples::findFunctionSamples), so the use of line stack is implicit there. Not sure if FS sample loader is different.

Integrated Wenlei's suggestions.

@wenlei: I only moved encoding / decoding static functions from DebugInfoMetadata.h to the new Support/Discriminator.h and left other discriminator codes in DebugInfoMetadata.h. Not sure if this is what you meant in the comments.

Right, that's what I meant. Thanks for the refactoring.

xur added inline comments.May 14 2021, 2:17 PM
llvm/lib/CodeGen/FSAFDODiscriminator.cpp
100 ↗(On Diff #345287)

ok, it's fine for now. For existing sample loader, we use nested FunctionSamples to get profile for instruction based on its line stack (FunctionSamples::findFunctionSamples), so the use of line stack is implicit there. Not sure if FS sample loader is different.

Oh. I see what you meant here now. FS sample loader uses the same mechanism. In this sense, yes, the location has the call stack info. I'll do some more experiments on this. For now, let's just keep it.

hoy added inline comments.May 14 2021, 6:12 PM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
68

Nit: not sure if std::unordered_multimap can be a bit faster by not constructing a BBSet if most instructions are not duplicated.

llvm/lib/CodeGen/TargetPassConfig.cpp
1181

Wondering why using PASS_LAST_DIS_BIT_BEG here. Will other bits be used in later patches?

xur added inline comments.May 14 2021, 10:29 PM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
68

Do you mean line line 67 using
std::unordered_multimap<LocationDiscriminator, const MachineBasicBlock *> ?

I'm not sure.
My impression is DenseMap performs closely to unordered_multimap. You are considering the cost of constructing BBSet. I agree with your on this. But for some cases, there are lot instruction duplicattion for some <line, discriminator> -- like some codes in stl header. In these cases, unordered_multimap might not perform well as it needs to store all the BBs.

Let me run an experiments to compare. I will report back.

llvm/lib/CodeGen/TargetPassConfig.cpp
1181

This is to catch all the clones in the optimization pipeline, so we could sum the samples instead of using max. We will have a FS profile loader here. For earlier pass, there will be an FS Profile loader for each AddFSDiscriminatorPass.

xur added inline comments.May 18 2021, 10:24 AM
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
68

When I tried the multimap, I found it did not fit here well here. The problem is for the same <file, lineno, discriminator> tuple, if they are in the same BB, BBset will aggregate them together. For the multimap, they are of different entries. Without checking all the entries with the same key, I could not tell if they are with the same BB, or with different BB. In short, we need to check all the entries and find out (1) if they are from the same BB (2) if there are two distinct BBs in the map. The code is not as clean as using a set.

Anyway, I tested both implementations using SPEC2006/Xalancbmk and one large google benchmark. The compiler time for these two about the same (note that compiler time has a pretty large noise range).

I think I would like to keep DenseSet version.

wmi accepted this revision.May 18 2021, 11:35 AM

LGTM.

hoy accepted this revision.May 18 2021, 12:08 PM
hoy added inline comments.
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
68

Sounds good. Thanks for the analysis and experiment.

wenlei accepted this revision.May 18 2021, 12:14 PM

lgtm too, thanks. looking forward to complete fs-adfo support and we'd be happy to give it a try.

This revision was landed with ongoing or failed builds.May 18 2021, 4:27 PM
This revision was automatically updated to reflect the committed changes.