This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Add brackets for context id to support extended binary format
ClosedPublic

Authored by wlei on Feb 3 2021, 12:39 AM.

Details

Summary

To align with https://reviews.llvm.org/D95547, we need to add brackets for context id before initializing the SampleContext.

Also added test cases for extended binary format from llvm-profgen side.

Diff Detail

Event Timeline

wlei created this revision.Feb 3 2021, 12:39 AM
wlei requested review of this revision.Feb 3 2021, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2021, 12:39 AM
wlei edited the summary of this revision. (Show Details)Feb 3 2021, 10:06 AM
wlei added reviewers: wmi, davidxl, hoy, wenlei.
hoy added inline comments.Feb 3 2021, 11:01 AM
llvm/lib/ProfileData/SampleProfWriter.cpp
362–363

The whole if-else can be just replaced by OS << S.getNameWithContext(true) << ":" << S.getTotalSamples();

llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

Should this be run unconditionally?

wlei added inline comments.Feb 3 2021, 11:10 AM
llvm/lib/ProfileData/SampleProfWriter.cpp
362–363

I see, thanks, FunctionSamples::ProfileIsCS is already checked in getNameWithContext.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

in line273 ContextId.rsplit(" @ ").first.str(); has already had the '[', so only add for the else branch

wlei updated this revision to Diff 321170.Feb 3 2021, 11:13 AM

address Hongtao's feedback

hoy added inline comments.Feb 3 2021, 3:11 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

I see. I'm wondering if it's possible to find a place that can centralize the appending of brackets, say right before populating or looking up the function profiles?

wlei added inline comments.Feb 3 2021, 3:37 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

Yeah, I thought about that, the way in this diff seems very error-prone.

One solution is to modify the ProfileMap at the end (right before it call the write), this way is less intrusive. The trade-off is it could slow down the tool. Since we use the StringRef to save string copy, if we add brackets at the end, we have to create new string instance.

But since our tool is as fast, I‘m inclined to use a less intrusive way. what do you think?

hoy added inline comments.Feb 3 2021, 3:45 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

Agreed, a final wrapping-up pass sounds solid. It might hurt performance a little bit but I'm not expecting a lot since each context string is only reprocessed once.

wlei added inline comments.Feb 3 2021, 3:48 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

Cool, let me fix it!

wenlei added inline comments.Feb 3 2021, 3:51 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
278

+1 for final clean up pass. was thinking along the lines of that too.

wlei updated this revision to Diff 321522.Feb 4 2021, 11:11 AM

use a final clean up pass for the bracket adding

hoy added a comment.Feb 4 2021, 11:26 AM

use a final clean up pass for the bracket adding

Thanks! Looks a lot cleaner now.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
416

This should not be needed either. Perhaps in the future we'll also have the profile writer automatically detect pseudo-probe-based profiles and set FunctionSamples::ProfileIsProbeBased.

llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

This should be removed since now every context has brackets.

wlei added inline comments.Feb 4 2021, 1:08 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
416

How about in SampleProfWriter we use the checksum field to detect it's a probe-based profile then set the FunctionSamples::ProfileIsProbeBased?

llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

It seems it also depends on this variable, see the code below. I saw your previous patch set it in the SampleProfReader but llvm-profgen only use the SampleProfWriter.

/// Return function name with context.
StringRef getNameWithContext(bool WithBracket = false) const {
  return FunctionSamples::ProfileIsCS
             ? Context.getNameWithContext(WithBracket)
             : Name;
}

I can understand we should automatically detect this, so how about to change the Writer to add some detection, say if the name has the bracket then set the ProfileIsCS?

wlei added inline comments.Feb 4 2021, 2:20 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

Seems there aren't a central place in Writer to detect it, the write function is override by different format.
or how about code like this, we ensure the Context is always valid for CS profile, right?

StringRef getNameWithContext(bool WithBracket = false) const {
  return Context.getNameWithContext().size()?
          Context.getNameWithContext(WithBracket) : Name ;
hoy added inline comments.Feb 4 2021, 2:35 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

Yeah, we might end up doing the detection per writer. That should only be needed for ProfileIsPseudoProbe . For now we don't need to set ProfileIsCS because the contexts all have brackets and they can just be treated as regular function profiles by the existing writers.

wlei updated this revision to Diff 321615.Feb 4 2021, 4:43 PM

address Hongtao's feedback: remove ProfileIsCS assignment.

wlei added inline comments.Feb 4 2021, 4:48 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

I see, I guess you meant ProfileIsCS is false by default, so it will use the default Name variable like the regular function and the Name variable actually has the brackets.

Just fixed following this idea by setting the Name.

FProfile.setName(FContext.getNameWithContext(true));
hoy added inline comments.Feb 5 2021, 9:59 AM
llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

Exactly. Also we can avoid setting ProfileIsCS in llvm-profgen now. It's yet ready to do the same for ProfileIsProbeBased since work will be needed on the profile writer side.

wlei added inline comments.Feb 5 2021, 3:14 PM
llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

Yeah, have removed the ProfileIsCS, see the latest diff.

hoy added inline comments.Feb 5 2021, 3:47 PM
llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
10–11

What causes this diff? Looks like the output order changed.

llvm/tools/llvm-profgen/ProfileGenerator.h
68–69

Thanks!

wlei added inline comments.Feb 5 2021, 3:58 PM
llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
10–11

Exactly, because now it ordered in the context with bracket, then the ascii code of ']' is greater than other alphabet or ':', the following test changes are the same.

hoy accepted this revision.Feb 5 2021, 4:02 PM

LGTM.

This revision is now accepted and ready to land.Feb 5 2021, 4:02 PM
wenlei accepted this revision.Feb 5 2021, 5:47 PM

thanks for the cleanup, lgtm.

This revision was landed with ongoing or failed builds.Feb 12 2021, 1:15 AM
This revision was automatically updated to reflect the committed changes.
wenlei added inline comments.Apr 2 2021, 9:54 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

This seems problematic, the string could get destructed while we still keep the StringRef for it in FunctionSamples's name and context..

hoy added inline comments.Apr 2 2021, 10:00 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

Does the try_emplace below copy construct a new one?

wenlei added inline comments.Apr 2 2021, 10:01 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

CxtWithBracketPMap is a StringMap, so key is StringRef not std::string.

hoy added inline comments.Apr 2 2021, 10:03 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

Oh right, this is a problem. Good catch! How did you find it out?

wlei added inline comments.Apr 2 2021, 10:05 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

I did think this is a problematic but after I used "Ret.first->first()" to get the StringRef, it seems something extend the lifetime of the string, otherwise Writer->write(CxtWithBracketPMap); will never work. do you have any case to trigger this?

wenlei added inline comments.Apr 2 2021, 10:05 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

I was doing some refactoring and a seemingly NFC change tripped this bug. I will send up a fix together with the refactoring.

hoy added inline comments.Apr 2 2021, 10:08 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

The std::string object will be deallocated out once it goes out of the loop. But the underlying memory may not be overwritten depending on the heap usage in the system. It worked probably because no other memory allocation reclaims that piece of memory.

wlei added inline comments.Apr 2 2021, 10:19 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

Thanks for the explanation. If so, the global ProfileMap is also problematic, IIRC, it also used a local std::string used similar as here..

At that time, I tried to avoid to do memory copy for the string, so didn't copy to a global lifetime container.

hoy added inline comments.Apr 2 2021, 10:25 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
392

I think you are right. We are missing a global lifetime container. Probably ProfileMap should use std::string as keys.