Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

wenlei (Wenlei He)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 3 2019, 5:55 PM (234 w, 2 d)

Recent Activity

Mon, Sep 18

wenlei accepted D159322: LoopVectorize: Set branch_weight for conditional branches.

lgtm assuming comments for constants will be added. thanks.

Mon, Sep 18, 9:57 AM · Restricted Project, Restricted Project

Mon, Sep 11

wenlei accepted D158642: LoopUnrollRuntime: Add weights to all branches.

lgtm, thanks!

Mon, Sep 11, 1:58 PM · Restricted Project, Restricted Project
wenlei added inline comments to D158642: LoopUnrollRuntime: Add weights to all branches.
Mon, Sep 11, 1:24 PM · Restricted Project, Restricted Project

Tue, Sep 5

wenlei accepted D157518: Avoid running optimization passes in frontend test.

lgtm, thanks.

Tue, Sep 5, 11:35 AM · Restricted Project, Restricted Project
wenlei added a comment to D158680: RFC: Interpet {branch_weights 1, 0} as likely/unlikely.

Because as you point out later sample profile loaders etc. actually trying hard to avoid the ,0 case by adding +1 which feels like people maybe do not want it to be legal and then we could have re-used it :)

Tue, Sep 5, 11:25 AM · Restricted Project, Restricted Project
wenlei added a comment to D158668: RFC: Add getLikelyBranchWeight helper function.

And as another strawman / discussion-starter I put up D158680 where I use !{"branch_weights", i32 1, i32 0} to represent likely branches and the actual "LikelyWeight" mostly becomes an internal implementation detail of the BranchProbabilityInfo class... This seemed like a neat way to get an abstract "likely", "unlikely" notation, but not sure of the effects if we no longer would have "truly zero" weights because they would be interpreted differently now...

Tue, Sep 5, 10:07 AM · Restricted Project, Restricted Project, Restricted Project
wenlei added a comment to D158680: RFC: Interpet {branch_weights 1, 0} as likely/unlikely.

Meaning !{"branch_weights", i32 1, i32 0} being an abstract representation of a "likely" branch.

Tue, Sep 5, 9:51 AM · Restricted Project, Restricted Project
wenlei added inline comments to D157518: Avoid running optimization passes in frontend test.
Tue, Sep 5, 7:29 AM · Restricted Project, Restricted Project

Mon, Sep 4

wenlei accepted D157937: ProfDataUtils: Add extractFromBranchWeightMD function; NFC.

lgtm, thanks.

Mon, Sep 4, 2:44 PM · Restricted Project, Restricted Project
wenlei added inline comments to D158642: LoopUnrollRuntime: Add weights to all branches.
Mon, Sep 4, 2:35 PM · Restricted Project, Restricted Project
wenlei added a comment to D159322: LoopVectorize: Set branch_weight for conditional branches.

This is a good change. Thanks.

Mon, Sep 4, 2:00 PM · Restricted Project, Restricted Project

Fri, Sep 1

wenlei added a comment to D158889: [AsmPrinter][PGO] Adds optional dumping of branch probabilities for PGO metrics..

We're probably among the few people that will actually benefit from something like this, but honestly I'm still a bit unsure whether the use case is common enough to justify built-in support like this.

Micrea makes a good point about it being easy for a pass to corrupt PGO information the same way it can corrupt debug information. Having a tool to track or debug PGO info like branch_weights, even if something different than this patch, seems beneficial for upstream.

  1. Try to incorporate block counts/frequencies as well. Most of the researches on profile quality use a block overlap metric which relies on block counts rather than branch probabilities. Our internal version also uses block counts, as branch weights cannot represent the profile for branchless code.

I'll try to look into this.

  1. Instead of coupling this with a specific consumer, Pin tool in your case, and in the next patch, I suggest we build general support to decode such metadata section, so tools like llvm-objdump can be used to inspect its payload.

For the metrics, we need an execution branch trace for comparison, Pin tool we've found works for x86 and we have an tool for one of our internal targets. We hope to keep the tracing part minimally coupled to the compiler metadata.

Makes sense to add general support for extracting the section info. I can try to look into it.

Even if relatively few groups would build up on such information, arguably PGO (as a technique) is probably the most impactful tool we have for improving performance. Like I think we're all observing here, profiles aren't necessarily well maintained throughout passes - it's actually easy to write a pass that accidentally and silently drops it, or that corrupts it somehow. I think having primitives in llvm that can help anyone interested build validation tooling and detect and fix such bugs would end up helping the community way more than their cost.

Agreed that it is easy to corrupt. The team I'm with at MediaTek has found cases in LLVM passes and our backend where branch weights are mishandled or not updated which was a driving factor to developing metrics for it.

Maybe we can even, eventually, have a layer of defense doing this on a build bot with e.g. llvm-test-suite benchmarks. (I have a rfc for doing some even simpler validation transparently as part of opt/llc, would send it after the long weekend)

Internally we are planning to track some of our benchmarks using these metrics since we've found its helpful for this source of change/regression tracking for our PGO data.

Actually - @red1bluelost - sorry if this would creep up the scope of your work, but since we're talking design choices, would summarizing in a RFC be maybe a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

That makes sense. I can see about getting stuff written up. It will take some time. I'll also be giving a talk at the developer meeting in October on how we've been using the metrics and where we think it can help others.

Fri, Sep 1, 1:19 PM · Restricted Project, Restricted Project

Thu, Aug 31

wenlei added a comment to D158889: [AsmPrinter][PGO] Adds optional dumping of branch probabilities for PGO metrics..

We have something similar internally, but didn't upstream because we are not sure if the use case is too narrow to justify burdening the code base with all the related complexity. cc @hoy

Yeah, we internally serialize machine block execution counts to the binary. Would it be helpful to write those counts into some bb section similar to the llvm_bb_addr_map section?

This is interesting and similar but our metrics are only focusing on branch weights at the moment. We've focused on branch weights so far since they are a direct result of Profile Loading.

At least for the current state of the metrics here and consumed in D158890, there is not a use for block counts but it could be a future extension of the metrics.

Thu, Aug 31, 9:56 PM · Restricted Project, Restricted Project
wenlei accepted D159169: [CSSPGO] Refactoring findIRAnchors.

lgtm, thanks.

Thu, Aug 31, 4:07 PM · Restricted Project, Restricted Project
wenlei added a comment to D153486: [llvm-profdata] GUIDToFuncNameMap can be static.

We should consider having singleton instance and use LLVM's synchronization support to guard updating. Assuming updating the map is rare so the contention should have low runtime impact.

Essentially this is per-module map that needs to be accessed from FunctionSamples. For local (non-distributed) ThinLTO, each thin-backend thread will build part of the map (for symbols in that module) and access that part of the map, so write is not uncommon and contention could be common. I think TLS is the best option if it's available to use.

Is this map initialized once and used mostly readonly? if that is the case, serializing initialization from all threads (in the order of tens, not hundreds) is no different from one static initialization done globally. Anyway, the effect needs to be measured to verify.

Thu, Aug 31, 12:32 PM · Restricted Project, Restricted Project
wenlei added a comment to D159221: [RFC] Alterative to https://reviews.llvm.org/D159169.

Thanks for attempting the clean up. I think the original version is probably more readable. I left comments there. wdyt?

Thu, Aug 31, 12:18 PM · Restricted Project, Restricted Project
wenlei added inline comments to D159169: [CSSPGO] Refactoring findIRAnchors.
Thu, Aug 31, 12:15 PM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

This brings up to 8% speed up (31.4s vs 29.0s) when reading a large test profile, and 5% speedup (0.82s vs 0.78s) when reading the function offset table alone.

Thu, Aug 31, 9:55 AM · Restricted Project, Restricted Project

Aug 30 2023

wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.
  • 0.02% less instructions from the unordered_map version.
  • 1.2% less wall clock time from the unordered_map version.

We didn't measure the time spent within sample profile loader though, the above numbers are all e2e compilations.

Aug 30 2023, 10:35 AM · Restricted Project, Restricted Project
wenlei added inline comments to D158891: [CSSPGO] Retire FlattenProfileForMatching.
Aug 30 2023, 10:28 AM · Restricted Project, Restricted Project
wenlei added inline comments to D159169: [CSSPGO] Refactoring findIRAnchors.
Aug 30 2023, 10:12 AM · Restricted Project, Restricted Project

Aug 28 2023

wenlei accepted D159014: [llvm-profdata] Use std::unordered_map in SampleProfileMap.

lgtm with nits. thanks.

Aug 28 2023, 1:48 PM · Restricted Project, Restricted Project

Aug 25 2023

wenlei accepted D158900: [CSSPGO] Compute checksum mismatch recursively on nested profile.

lgtm, thanks.

Aug 25 2023, 7:28 PM · Restricted Project, Restricted Project
wenlei accepted D156725: [CSSPGO] Skip reporting staleness metrics for imported functions.

The stack with refactoring first makes the changes easier to review. Thanks!

Aug 25 2023, 3:39 PM · Restricted Project, Restricted Project
wenlei accepted D156722: [CSSPGO] Support stale profile matching for LTO.

lgtm, thanks.

Aug 25 2023, 3:36 PM · Restricted Project, Restricted Project
wenlei added inline comments to D158900: [CSSPGO] Compute checksum mismatch recursively on nested profile.
Aug 25 2023, 3:31 PM · Restricted Project, Restricted Project
wenlei accepted D158891: [CSSPGO] Retire FlattenProfileForMatching.

lgtm except nits, thanks.

Aug 25 2023, 2:24 PM · Restricted Project, Restricted Project
wenlei updated subscribers of D158889: [AsmPrinter][PGO] Adds optional dumping of branch probabilities for PGO metrics..

We have something similar internally, but didn't upstream because we are not sure if the use case is too narrow to justify burdening the code base with all the related complexity. cc @hoy

Aug 25 2023, 2:20 PM · Restricted Project, Restricted Project
wenlei accepted D158817: [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction.

lgtm, thanks.

Aug 25 2023, 1:31 PM · Restricted Project, Restricted Project
wenlei added inline comments to D158817: [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction.
Aug 25 2023, 1:29 PM · Restricted Project, Restricted Project
wenlei added a comment to D158817: [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction.

looks cleaner and more structured now, thanks for update!

Aug 25 2023, 1:18 PM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

@wenlei, may I propose having this patch in first (to unblock) @ayermolo while we are discussing longer term strategy? Another benefit is that we can wait to see if there are other incidence (of using long lifetime reference) can show up so that we have more cases to consider the redesign (if needed).

Aug 25 2023, 10:38 AM · Restricted Project, Restricted Project
wenlei added inline comments to D158817: [CSSPGO] Refactoring SampleProfileMatcher::runOnFunction.
Aug 25 2023, 10:20 AM · Restricted Project, Restricted Project
wenlei added inline comments to D156722: [CSSPGO] Support stale profile matching for LTO.
Aug 25 2023, 9:35 AM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

That said, even if we still want to attempt the change to use DenseMap, I suggest we try that in a separate patch, so the majority of the original changes can get in and be stabilized first, and others can be unblocked now without needing to revert the whole patch. Evidently attempts to remove reference stability is non-trivial and also needs more testing, which warrants a change of its own.

Aug 25 2023, 8:31 AM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

I have slightly different opinion on this -- I am on the camp for better performance over a small loss of convenience.

This is no different from iterator stability (lack of) for vector type when the loop adds new elements.

Aug 25 2023, 8:15 AM · Restricted Project, Restricted Project

Aug 24 2023

wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

What is the root cause of unordered_map being slower?

I think it is just the implementation in general, their algorithms are quite different.

Aug 24 2023, 9:23 PM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

DenseMap is significantly faster than std::unordered_map as previously used.

Benchmarking with an internal profile for production, when reading the function offset table alone, the original implementation (before D147740), DenseMap, and unordered_map takes 1.2, 0.78, 0.82 s respectively, and reading the full profiles takes 34.6, 29.0, 31.4 s respectively. That means using a std::unordered_map is 5-8% slower. (Meanwhile the refactoring itself has been very significant regardless of which container to use)

Aug 24 2023, 4:40 PM · Restricted Project, Restricted Project
wenlei added a comment to D156725: [CSSPGO] Skip reporting staleness metrics for imported functions.

For the current sample loader, if the root or parent function profile are lost or mismatched, all its profile and its children profile are dropped, we should use the non-flattened profile(the original nested profile) to count the total mismatched samples.

Aug 24 2023, 12:02 PM · Restricted Project, Restricted Project
wenlei added inline comments to D156722: [CSSPGO] Support stale profile matching for LTO.
Aug 24 2023, 10:59 AM · Restricted Project, Restricted Project
wenlei added inline comments to D156722: [CSSPGO] Support stale profile matching for LTO.
Aug 24 2023, 10:44 AM · Restricted Project, Restricted Project
wenlei added a comment to D156715: [CSSPGO] Fix issues with post-link function checksum check.

How does perf look like after disabling annotating imported functions?

So far from what I tested, there is no perf change if only disabling annotating imported functions. Note that this also disables the profile matching for the imported functions, so it could cause perf issue, depending on if the imported functions is stale.

Discussed offline. This effect of this change is potentially two-fold :

  1. positive impact of disabling unnecessary matching for non-stale profiles
  2. negative impact of disabling matching for to-be-inlined imported functions with a stale profile.

Currently they are not tested separately. With a solution of #1, allowing matching for #2 might be a win. Suggesting to hold on this change until we have a good solution for #1.

Thoughts?

Yeah, as we discussed offline, sounds good to hold on this, will work on a proper solution for #1 or #2 (alternatively we can pass the metadata or a mismatch flag for the imported function)

Aug 24 2023, 9:51 AM · Restricted Project, Restricted Project

Aug 23 2023

wenlei accepted D157462: LoopRotate: Add code to update branch weights.

I agree that hard coded probability is not great. But I also don't think there is a one-size-fit-all answer here. EH path can be unlikely, z-trip loop can be unlikely, [[unlikely]] hint can be unlikely, but they don't necessarily share the same probability.

Aug 23 2023, 10:27 PM · Restricted Project, Restricted Project
wenlei requested changes to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.
Aug 23 2023, 9:57 PM · Restricted Project, Restricted Project
wenlei added a comment to D158689: [llvm-profdata] Use llvm::DenseMap in SampleProfileMap.

Fix a dangling reference bug in ProfileConverter::flattenNestedProfile . This is an existing bug only revealed after D147740 changes the profile map container type.

Aug 23 2023, 9:57 PM · Restricted Project, Restricted Project

Aug 22 2023

wenlei added inline comments to D157462: LoopRotate: Add code to update branch weights.
Aug 22 2023, 3:56 PM · Restricted Project, Restricted Project

Aug 16 2023

wenlei accepted D157061: [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples.

thanks for the fix and following up with the comments, lgtm now.

Aug 16 2023, 1:16 PM · Restricted Project, Restricted Project

Aug 15 2023

wenlei added inline comments to D107534: Add a pass to garbage-collect empty basic blocks after code generation..
Aug 15 2023, 8:59 PM · Restricted Project, Restricted Project
wenlei added inline comments to D157061: [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples.
Aug 15 2023, 8:52 PM · Restricted Project, Restricted Project
wenlei added inline comments to D157061: [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples.
Aug 15 2023, 5:20 PM · Restricted Project, Restricted Project
wenlei added inline comments to D107534: Add a pass to garbage-collect empty basic blocks after code generation..
Aug 15 2023, 4:28 PM · Restricted Project, Restricted Project
wenlei added inline comments to D157462: LoopRotate: Add code to update branch weights.
Aug 15 2023, 12:07 PM · Restricted Project, Restricted Project
wenlei added a comment to D157462: LoopRotate: Add code to update branch weights.

IIRC, mcf in spec has a function sort_basket which is known to be problematic for maintaining loop rotate counts, if you want to play with it a bit, especially the guessing part of the heuristic.

Aug 15 2023, 12:53 AM · Restricted Project, Restricted Project
wenlei added a comment to D157462: LoopRotate: Add code to update branch weights.

I think the heuristics are better than doing nothing in theory. Though one other thing to note is that for AutoFDO, if the profiled build also has the loop rotated, the incoming counts won't be right to begin with -- if the two branches after rotate are executed A times and B times, the original branch should be executed A+B times, but AutoFDO will annotate it with max(A,B) times for the original branch.

Aug 15 2023, 12:50 AM · Restricted Project, Restricted Project

Aug 14 2023

wenlei added inline comments to D157061: [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples.
Aug 14 2023, 3:39 PM · Restricted Project, Restricted Project

Aug 8 2023

wenlei added inline comments to D157061: [SampleProfile] Potential use after move in SampleProfileLoader::promoteMergeNotInlinedContextSamples.
Aug 8 2023, 4:21 PM · Restricted Project, Restricted Project

Jul 17 2023

wenlei updated subscribers of D155257: [llvm-profdata] Changed SampleProfWriter to take a range of of NameFunctionSamples.

cc @hoy @wlei

Jul 17 2023, 7:48 AM · Restricted Project, Restricted Project

Jul 14 2023

wenlei accepted D155252: [PseudoProbe] Remove unnecessary asserts about non-zero discriminator..

In general if the effort to make sure probe has zero discriminator is too much, it makes sense to just rely on discriminator clean up instead during FS discriminator assignment for probes. I'm curious though what exposed this case? IIRC, we didn't hit this assert for a while now.

Yeah, it was good for a while until incrPGO was on. I suspect incrPGO opened more opportunity for optimizations and exposed this issue.

Jul 14 2023, 7:53 PM · Restricted Project, Restricted Project
wenlei accepted D155253: [CodeGen] Separate MachineFunctionSplitter logic for different profile types.

lgtm, thanks.

Jul 14 2023, 3:08 PM · Restricted Project, Restricted Project

Jul 13 2023

wenlei added a comment to D155252: [PseudoProbe] Remove unnecessary asserts about non-zero discriminator..

In general if the effort to make sure probe has zero discriminator is too much, it makes sense to just rely on discriminator clean up instead during FS discriminator assignment for probes. I'm curious though what exposed this case? IIRC, we didn't hit this assert for a while now.

Jul 13 2023, 7:14 PM · Restricted Project, Restricted Project
wenlei added a comment to D147740: [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map.

@huangjd any follow up on simplifying the implantation based on the assumption that collision is non-issue? Just want to make sure the comments on D153692 don't fall through the cracks.

Jul 13 2023, 6:53 PM · Restricted Project, Restricted Project
wenlei added inline comments to D155253: [CodeGen] Separate MachineFunctionSplitter logic for different profile types.
Jul 13 2023, 6:40 PM · Restricted Project, Restricted Project
wenlei added a comment to D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. .

It seems that we now need to specify "-mllvm -mfs-psi-cutoff=0" to enable the MFS for FSAFDO, if we want to get the same optimization behavior. This is not good for usage adoption.
I think if the users want to enable MFS in FSAFDO, they should only need to add -fsplit-machine-functions. They should not need to use ANY extra internal options.

Jul 13 2023, 6:39 PM · Restricted Project, Restricted Project

Jul 10 2023

wenlei accepted D152577: Part 2 of Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..

lgtm, thanks.

Jul 10 2023, 5:21 PM · Restricted Project, Restricted Project
wenlei accepted D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. .

lgtm, thanks.

Jul 10 2023, 10:14 AM · Restricted Project, Restricted Project

Jul 7 2023

wenlei accepted D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.

nit: update the change description too based on the new comment (describing it using push/pop isn't very intuitive). otherwise lgtm.

Jul 7 2023, 2:15 PM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 11:54 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 11:20 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 11:07 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 11:01 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 10:41 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 10:34 AM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 7 2023, 9:17 AM · Restricted Project, Restricted Project

Jul 6 2023

wenlei added inline comments to D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. .
Jul 6 2023, 8:00 PM · Restricted Project, Restricted Project
wenlei added inline comments to D154637: [SamplePGO] Fix ICE that callee samples returns null while finding import functions.
Jul 6 2023, 7:37 PM · Restricted Project, Restricted Project

Jul 5 2023

wenlei added inline comments to D152577: Part 2 of Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..
Jul 5 2023, 2:38 PM · Restricted Project, Restricted Project
wenlei added inline comments to D152577: Part 2 of Fine tune MachineFunctionSplitPass (MFS) for FSAFDO..
Jul 5 2023, 2:36 PM · Restricted Project, Restricted Project
wenlei added inline comments to D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. .
Jul 5 2023, 2:27 PM · Restricted Project, Restricted Project

Jun 28 2023

wenlei added inline comments to D154027: [CSSPGO] Enable stale profile matching by default for CSSPGO.
Jun 28 2023, 5:50 PM · Restricted Project, Restricted Project
wenlei accepted D154027: [CSSPGO] Enable stale profile matching by default for CSSPGO.

lgtm, thanks!

Jun 28 2023, 5:26 PM · Restricted Project, Restricted Project
wenlei committed rGf97eb1dd8f08: [NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples (authored by wenlei).
[NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples
Jun 28 2023, 1:44 PM · Restricted Project, Restricted Project
wenlei closed D153995: [NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples.
Jun 28 2023, 1:44 PM · Restricted Project, Restricted Project
wenlei added inline comments to D153995: [NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples.
Jun 28 2023, 1:42 PM · Restricted Project, Restricted Project
wenlei requested review of D153995: [NFC][Sample PGO] Avoid non-const accessor for CallsiteSamples.
Jun 28 2023, 12:03 PM · Restricted Project, Restricted Project
wenlei added a comment to D153927: Resubmit: [Analysis] Refactor MBB hotness/coldness into templated PSI functions.

Apologies but I am going to revert this patch. There is a layering violation (https://llvm.org/docs/CodingStandards.html#library-layering). LLVMAnalysis cannot depend on LLVMCodeGen. This is breaking our Bazel build.

llvm/include/llvm/Analysis/ProfileSummaryInfo.h:19:10: fatal error: 'llvm/CodeGen/MachineFunction.h' file not found
   19 | #include "llvm/CodeGen/MachineFunction.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jun 28 2023, 9:44 AM · Restricted Project, Restricted Project

Jun 27 2023

wenlei accepted D153820: [CSSPGO][Preinliner] Always inline zero-sized functions..

lgtm, thanks.

Jun 27 2023, 4:54 PM · Restricted Project, Restricted Project
wenlei accepted D153927: Resubmit: [Analysis] Refactor MBB hotness/coldness into templated PSI functions.

lgtm, thanks.

Jun 27 2023, 4:03 PM · Restricted Project, Restricted Project
wenlei added a comment to D153820: [CSSPGO][Preinliner] Always inline zero-sized functions..

Can this be achieved by simply tweaking ProfiledCandidateComparer so zero SizeCost candidate always comes before non-zero ones?

Jun 27 2023, 1:12 PM · Restricted Project, Restricted Project

Jun 26 2023

wenlei added a comment to D153692: [llvm-profdata] Remove MD5 collision check in D147740.

I don't think there's much we can do still, even if there's collision report. Maybe we can revert back to use full name instead of MD5, if that turns out to be a problem. But as you mentioned we already support MD5 in profile generation.

If a collision is rare while there is a strong demand for correctness by users, we can add some ad-hoc logic to deal with that, which is still faster than using a multimap, and much faster than using the full function name. If collision happens frequently, then we have no choice but to revert back to use full name, but this is extremely unlikely. I couldn't find any research on partial MD5 (we only use 64-bit of it) collision using ASCII strings, so that's why I am not ruling out this out of caution.

Jun 26 2023, 5:09 PM · Restricted Project, Restricted Project
wenlei added inline comments to D153797: [CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision..
Jun 26 2023, 4:55 PM · Restricted Project, Restricted Project
wenlei added inline comments to D153797: [CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision..
Jun 26 2023, 4:54 PM · Restricted Project, Restricted Project
wenlei accepted D153797: [CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision..

lgtm, thanks.

Jun 26 2023, 4:53 PM · Restricted Project, Restricted Project
wenlei added inline comments to D153797: [CSSPGO][Preinliner] Bump up the threshold to favor previous compiler inline decision..
Jun 26 2023, 12:01 PM · Restricted Project, Restricted Project
wenlei added a comment to D153692: [llvm-profdata] Remove MD5 collision check in D147740.

Copying comments from the original patch for continuation..

In D147740#4443233, @huangjd wrote:
Actually do we really care about MD5 collision? ExtBinary format already ignored MD5 collision for regular string names (and therefore regular function profiles), as only one of two functions with colliding MD5 get written to the name table (and the other is therefore lost). If we are using CS profiles, since different CS profiles have different serializations, their > hashes are distributed as expected. The most important thing is that, even if we detect a hash collision, we can't do anything about it except logging it (using a multi-map makes the reader much slower), so I think the MD5 collision check should be marked as LLVM_DEBUG. This does reduce 0.5 second out of ~30 seconds (1.67%) over the 1 GB profile read .

I don't think we care. Is the new type HashKeyMap and SampleProfileMap all for detecting and reporting collision? I'd avoid all that complexity and prefer a simple DenseMap + a SampleContext->hash_code converter and not even bother with debug prints for collision...

I will add a separate patch to remove the hash collision check. It would be noteworthy if users reports collision in actual use cases, and if that doesn't happen in a while the hash collision check can be removed. Even the probability of collision is very small for one single program but given that LLVM is used everywhere so we can't be so sure.

Jun 26 2023, 11:07 AM · Restricted Project, Restricted Project
wenlei committed rG1a0d23efe157: [NFC] Generalize llvm-profgen message to cover both AutoFDO and CSSPGO (authored by wenlei).
[NFC] Generalize llvm-profgen message to cover both AutoFDO and CSSPGO
Jun 26 2023, 9:49 AM · Restricted Project, Restricted Project
wenlei closed D153730: [NFC] Generalize llvm-profgen message to cover both AutoFDO and CSSPGO.
Jun 26 2023, 9:48 AM · Restricted Project, Restricted Project

Jun 25 2023

wenlei requested review of D153730: [NFC] Generalize llvm-profgen message to cover both AutoFDO and CSSPGO.
Jun 25 2023, 4:40 PM · Restricted Project, Restricted Project
wenlei added a comment to D153692: [llvm-profdata] Remove MD5 collision check in D147740.

Copying comments from the original patch for continuation..

Jun 25 2023, 4:34 PM · Restricted Project, Restricted Project
wenlei added inline comments to D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. .
Jun 25 2023, 4:32 PM · Restricted Project, Restricted Project
wenlei accepted D152758: [NFC] Refactor MBB hotness/coldness into templated PSI functions.

Thanks for the refactoring and cleanup. LGTM.

Jun 25 2023, 4:27 PM · Restricted Project, Restricted Project