Page MenuHomePhabricator

wenlei (Wenlei He)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 3 2019, 5:55 PM (94 w, 6 d)

Recent Activity

Today

wenlei accepted D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation.

lgtm, thanks.

Wed, Jan 27, 2:48 PM · Restricted Project
wenlei updated the diff for D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.

rebase

Wed, Jan 27, 12:55 PM · Restricted Project
wenlei updated the diff for D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

rebase

Wed, Jan 27, 12:54 PM · Restricted Project
wenlei accepted D95547: [CSSPGO] Support of CS profiles in extended binary format..

lgtm, with one nit.

Wed, Jan 27, 12:50 PM · Restricted Project
wenlei accepted D95480: [CSSPGO][llvm-profgen] Fix bug with parsing hybrid sample trace line.
Wed, Jan 27, 12:08 PM · Restricted Project
wenlei added inline comments to D95547: [CSSPGO] Support of CS profiles in extended binary format..
Wed, Jan 27, 12:05 PM · Restricted Project

Yesterday

wenlei added a comment to D95480: [CSSPGO][llvm-profgen] Fix bug with parsing hybrid sample trace line.

and btw.. bug fix isn't really NFC

Tue, Jan 26, 11:29 PM · Restricted Project
wenlei added inline comments to D95480: [CSSPGO][llvm-profgen] Fix bug with parsing hybrid sample trace line.
Tue, Jan 26, 11:28 PM · Restricted Project
wenlei added a comment to D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation.

I think the aggregation is still worth keeping as it's probably still cheaper than trie with hashing. But we will know for sure through experiments.

With the latest, do you see similar speed up for probe profile and dwarf profile?

have implemented the global trie for the virtual unwinder part and done the evaluation over the LBR trie against baseline and global trie on probe profile. See the chart below, here only use gobmk and sjeng because other benchmarks's run time of virtual unwinder are very small(less than 1s).

The virtual unwinding time(s):

No-TrieLBR-TrieGlobal-Trie
gobmk8.313.64.11
sjeng46.8219.1524.91

Speed up:

LBR-Trie vs No-TrieGlobal-Trie vs No-TrieGlobal-Trie vs LBR-Trie
gobmk2.32.020.88
sjeng2.441.870.77

Sum-up:

  • LBR-Trie and Global-Trie can have about 2x speed-up over the baseline
  • Global-Trie have slight regression(about 10%) against LBR-Trie as we discussed this might be caused by hash overhead.
  • Didn't see the similar speed-up as our internal prototype, I guess it's because of the refinement we did in the previous patches like removing the redundant string concatenation and reversal, switching to use probe based key.

Beside the time, LBR-trie seems good for the readability but Global-Trie can be easy to detect shared context. Haven't tried to defer the context generation(should have more speed-up for Global-Trie) and detect shared context, will try it.

CC: @wmi @hoy

Tue, Jan 26, 11:25 PM · Restricted Project
wenlei added inline comments to D95480: [CSSPGO][llvm-profgen] Fix bug with parsing hybrid sample trace line.
Tue, Jan 26, 10:35 PM · Restricted Project
wenlei retitled D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline from [NFC] Factor out common part for CSSPGO inline and AFDO inline to [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.
Tue, Jan 26, 8:20 AM · Restricted Project

Sun, Jan 24

wenlei accepted D95271: [CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker..

lgtm, thanks.

Sun, Jan 24, 10:50 PM · Restricted Project

Sat, Jan 23

wenlei added inline comments to D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.
Sat, Jan 23, 8:33 AM · Restricted Project
wenlei updated the diff for D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.

Update comment based on Wei's feedback.

Sat, Jan 23, 8:32 AM · Restricted Project
wenlei added inline comments to D95271: [CSSPGO] Passing the clang driver switch -fpseudo-probe-for-profiling to the linker..
Sat, Jan 23, 8:07 AM · Restricted Project
wenlei accepted D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks.

Looks great, thanks!

Sat, Jan 23, 7:44 AM · Restricted Project

Fri, Jan 22

wenlei accepted D95269: [SampleFDO] Report error when reading a bad/incompatible profile instead of turning off profile use silently..

LGTM. Thanks for tightening this up. though I think warning does give others more flexibility in how/when to respond to this - in the end, this is not a build break, but a performance issue.

Fri, Jan 22, 6:08 PM · Restricted Project

Thu, Jan 21

wenlei added inline comments to D92998: [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation.
Thu, Jan 21, 11:42 PM · Restricted Project
wenlei added a comment to D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation.

With the latest, do you see similar speed up for probe profile and dwarf profile?

Thu, Jan 21, 11:25 PM · Restricted Project
wenlei added a comment to D94110: [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation.
In D94110#2489390, @wmi wrote:

This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it.

5x speedup shows it is a really impressive improvement. I am wondering whether there is callstack overlap between different LBR samples so you can have further grouping of call frames -- by reusing unwindState. You may also save some cost by reusing the frame trie. IIUC although samples have been aggregated based on callstack, each LBR sample may have multiple callstacks inferred from unwindCall/unwindReturn. If there are callstack overlap between different LBR samples, you may be able to further group them.

Thu, Jan 21, 11:24 PM · Restricted Project
wenlei accepted D95056: [CSSPGO] LTO option for pseudo probe.

lgtm with minor comment.

Thu, Jan 21, 10:33 PM · Restricted Project
wenlei added inline comments to D95056: [CSSPGO] LTO option for pseudo probe.
Thu, Jan 21, 10:19 PM · Restricted Project
wenlei accepted D95186: [Inlining] Delete redundant optnone/alwaysinline check.

Ah, you're right.. sorry for confusion..

Thu, Jan 21, 5:12 PM · Restricted Project
wenlei added a comment to D95186: [Inlining] Delete redundant optnone/alwaysinline check.

On the other hand, I expect this to be captured as incompatible attributes (caller without it, but callee has it), so removing the check here should be fine.

Thu, Jan 21, 5:01 PM · Restricted Project
wenlei added a comment to D95186: [Inlining] Delete redundant optnone/alwaysinline check.

The same check is done in InlineCost: https://github.com/llvm/llvm-project/blob/8b0bd54d0ec968df28ccc58bbb537a7b7c074ef2/llvm/lib/Analysis/InlineCost.cpp#L2537-L2552

Thu, Jan 21, 5:00 PM · Restricted Project
wenlei added inline comments to D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks.
Thu, Jan 21, 4:24 PM · Restricted Project
wenlei accepted D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context.

lgtm, thanks!

Thu, Jan 21, 12:36 PM · Restricted Project
wenlei added inline comments to D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context.
Thu, Jan 21, 12:27 PM · Restricted Project

Wed, Jan 20

wenlei added inline comments to D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context.
Wed, Jan 20, 12:05 PM · Restricted Project
wenlei added inline comments to D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context.
Wed, Jan 20, 11:09 AM · Restricted Project
wenlei added inline comments to D94111: [CSSPGO][llvm-profgen] Merge and trim profile for cold context to reduce profile size.
Wed, Jan 20, 9:29 AM · Restricted Project
wenlei updated subscribers of D93747: Rename debug linkage name with -funique-internal-linkage-names.
In D93747#2470504, @hoy wrote:

Please remove the clang test change - if this is an LLVM change with LLVM test coverage, that's enough. (we don't generally test every LLVM change from both LLVM and Clang)

Sounds good.

I'd still be curious if you could look around to see whether there are other cases of function splitting/cloning/etc to see how they deal with updating the DISubprograms, to see if there's some prior art that should be used here.

To the best of my knowledge, existing code does not change the linkage name field of a DISubprogram once created. You can create a new DISubprogram record with any linkage name you want but bear in mind how the debugger will consume the record. Looks like we are the first one to change existing record which will confuse the debugger.

Sure enough - do any other transforms have similar requirements (of creating a record for something that looks like the same function but isn't quite) but take a different approach? Be good to know if other approaches were chosen/how/why they are applicable there but not here, etc. (conversely perhaps other passes worked around the issue in less than ideal ways and could benefit from using this new approach).

Looks like some places that could use this functionality aren't quite there yet -
The Attributor has an internalizing feature (added in 87a85f3d57f55f5652ec44f77816c7c9457545fa ) that produces invalid IR (ends up with two !dbg attachments of the same DISubprogram) if the function it's internalizing has a DISubprogram - but if it succeeded it'd have the same problem that the linkage name on the DISubprogram wouldn't match the actual symbol/function name. (@jdoerfert @bbn).
The IR Outliner (@AndrewLitteken ) seems to be only a few months old and appears to have no debug info handling - probably results in the same problem.
The Partial Inliner does clone a function into a new name & so would have an invalid DISubprogram, though it only uses the clone for inlining (this probably then goes on to produce the desired debug info, where the inlining appears to come from the original function)
ThinLTO does some function cloning within a module for aliases, but it then renames the clone to the aliasees name so I think the name works out to match again.

Wed, Jan 20, 9:02 AM · Restricted Project, Restricted Project
wenlei added inline comments to D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.
Wed, Jan 20, 12:12 AM · Restricted Project
wenlei added inline comments to D94001: [CSSPGO] Call site prioritized inlining for sample PGO.
Wed, Jan 20, 12:03 AM · Restricted Project

Tue, Jan 19

wenlei requested review of D95024: [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline.
Tue, Jan 19, 11:54 PM · Restricted Project
wenlei updated the diff for D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

default CallsitePrioritizedInline to true only for CSSPGO

Tue, Jan 19, 11:46 PM · Restricted Project
wenlei updated the diff for D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

Address Wei's feedback. Clean up ICPCallee from InlineCandidate now that we don't check InlineCost before ICP.

Tue, Jan 19, 6:01 PM · Restricted Project
wenlei added inline comments to D94001: [CSSPGO] Call site prioritized inlining for sample PGO.
Tue, Jan 19, 5:59 PM · Restricted Project
wenlei accepted D95009: [llvm-profgen][NFC] Fix the incorrect computation of callsite sample count.

lgtm, thanks!

Tue, Jan 19, 5:35 PM · Restricted Project
wenlei added inline comments to D94333: [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it.
Tue, Jan 19, 12:59 PM · Restricted Project, Restricted Project

Mon, Jan 18

wenlei added inline comments to D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks.
Mon, Jan 18, 11:34 PM · Restricted Project
wenlei accepted D94435: [SampleFDO] Add the support to split the function profiles with context into separate sections..
Mon, Jan 18, 11:09 PM · Restricted Project
wenlei added inline comments to D94435: [SampleFDO] Add the support to split the function profiles with context into separate sections..
Mon, Jan 18, 8:20 AM · Restricted Project

Thu, Jan 14

wenlei added inline comments to D94435: [SampleFDO] Add the support to split the function profiles with context into separate sections..
Thu, Jan 14, 11:01 PM · Restricted Project

Wed, Jan 13

wenlei accepted D94613: [NFC] Rename ThinLTOPhase to PhaseInAllLTO and move it from PassBuilder.h to Pass.h.

lgtm

Wed, Jan 13, 1:58 PM · Restricted Project
wenlei added a comment to D94613: [NFC] Rename ThinLTOPhase to PhaseInAllLTO and move it from PassBuilder.h to Pass.h.

Thanks for refactoring. We had internal change to add IsFullLTOPreLink alongside with IsThinLTOPreLink for a few places. And we always wanted clean it up too. :)

Wed, Jan 13, 10:12 AM · Restricted Project

Tue, Jan 12

wenlei added a comment to D94435: [SampleFDO] Add the support to split the function profiles with context into separate sections..

Thanks for the change.

Tue, Jan 12, 11:43 PM · Restricted Project
wenlei retitled D94001: [CSSPGO] Call site prioritized inlining for sample PGO from [CSSPGO] Call site prioritized BFS inlining for sample PGO to [CSSPGO] Call site prioritized inlining for sample PGO.
Tue, Jan 12, 11:01 PM · Restricted Project
wenlei updated the diff for D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

retitle, rename inlineCandidate.

Tue, Jan 12, 11:01 PM · Restricted Project
wenlei added a comment to D94001: [CSSPGO] Call site prioritized inlining for sample PGO.
In D94001#2494594, @wmi wrote:

Sorry for the delay in review. It makes a lot of sense to have a priority based inliner for CSSPGO since its profile annotation doesn't rely on replaying the inlining. But I don't understand why we rely on BFS/DFS strategy to expose the hottest callsite for priority based inliner. In my mind, CSSPGO can know the hotness of each callsite with full context information. Can we just sort all the callsites based on CSSPGO profile then try to inline the hottest callsite under a specific context first in a top down manner? We may actually need to inline some parent callsites before we can inline the hottest callsite, but it is all driven by the target to inline the hottest callsite first. If we worry some callsite is too deep and we need to inline through a deep path before we can inline the specific callsite, we may add the depth into priority computation. What do you think?

Tue, Jan 12, 10:58 PM · Restricted Project
wenlei added inline comments to D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks.
Tue, Jan 12, 9:07 PM · Restricted Project
wenlei added inline comments to D94333: [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it.
Tue, Jan 12, 9:07 PM · Restricted Project, Restricted Project

Fri, Jan 8

wenlei added inline comments to D94333: [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it.
Fri, Jan 8, 2:33 PM · Restricted Project, Restricted Project
wenlei added inline comments to D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks.
Fri, Jan 8, 2:17 PM · Restricted Project

Tue, Jan 5

wenlei added a comment to D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

A few high level questions:

  1. Can the size bloat problem be handled in llvm-profgen time ? Basically using hotness information to prune/merge profiles properly?
Tue, Jan 5, 12:49 PM · Restricted Project

Mon, Jan 4

wenlei added reviewers for D94001: [CSSPGO] Call site prioritized inlining for sample PGO: wmi, davidxl, hoy.
Mon, Jan 4, 10:24 AM · Restricted Project

Sun, Jan 3

wenlei updated the diff for D94001: [CSSPGO] Call site prioritized inlining for sample PGO.

Fix test and linter.

Sun, Jan 3, 11:50 PM · Restricted Project
wenlei updated the summary of D94001: [CSSPGO] Call site prioritized inlining for sample PGO.
Sun, Jan 3, 4:54 PM · Restricted Project
wenlei requested review of D94001: [CSSPGO] Call site prioritized inlining for sample PGO.
Sun, Jan 3, 4:45 PM · Restricted Project

Dec 16 2020

wenlei accepted D92584: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter.

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

Good question, Yes, my initial thought is to decouple this by using separated LBRSample when it comes to AutoFDO, perhaps this's good for readability. I guess your suggestion is for the performance since we won't have virtual call. That's good. If so and we don't have other kinds of perf sample, we might don't need the base class PerfSample and can just name HybridSample to more general name(like PerfSample).

Dec 16 2020, 1:57 PM · Restricted Project
wenlei added a comment to D92584: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter.

Thanks for the refactoring and clean up, looks great!

Dec 16 2020, 10:10 AM · Restricted Project

Dec 14 2020

wenlei added a comment to D93254: [NFC][SampleFDO] Preparation to support multiple sections with the same type in ExtBinary format..

To reduce the profile loading time during postlink phase when ThinLTO is enabled, we are going to split profile into two parts. One part contains profiles with inline instance and another part contains flattened profiles without inline instance in them.

Dec 14 2020, 9:08 PM · Restricted Project

Dec 7 2020

wenlei accepted D92816: [llvm-profgen][NFC] Fix test failure by making unwinder's output deterministic.

Thanks for quick fix!

Dec 7 2020, 10:15 PM · Restricted Project
wenlei accepted D92621: [SampleFDO] Store fixed length MD5 in NameTable instead of using ULEB128 if MD5 is used..
In D92621#2437658, @wmi wrote:

We find that after this change, the elapsed time to build a large application distributively is reduced by 5% and the accumulative cpu time used for building is also reduced by 5%.

This is great, thanks for the patch! Just to clarify, did you mean 5% of end to end ThinLTO+AutoFDO build? I'm curious what is the total profile reading time your saw in terms of %? and how different that % is between md5 profile vs non-md5 profile? Currently we mostly use non-md5 profile, assuming md5 profile is more about saving profile size.

Yes, it is 5% saving of end to end ThinLTO + AutoFDO build time.

Each module compiling has different % of profile reading time. The percentage is negligible for large source module but can be significant for small source module. I tried it on a very small file.

The build time using md5 profile was 0.55 second. 
The build time using non-md5 profile was 0.75 second. 
The build time using fixlenmd5 profile was 0.15 second. 
The build time not using any profile was 0.1 second.

Because for a large project, many source files are small or medium sizes so the building time for small files matters especially for build cpu resource. It also matters for end-to-end build time but usually not as significant. For the experiment I did (a 20M binary profile), the end-to-end build time saving was about the same as the aggregate build cpu resource saving (both were 5%), but from our experience using larger profile (like 300M profile), it will affect aggregate build cpu resource more than end-to-end build time.

Dec 7 2020, 8:54 PM · Restricted Project

Dec 6 2020

wenlei committed rG6b989a171073: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining (authored by wenlei).
[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
Dec 6 2020, 12:12 PM
wenlei closed D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Dec 6 2020, 12:12 PM · Restricted Project
wenlei updated the diff for D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

rebase

Dec 6 2020, 12:00 PM · Restricted Project
wenlei added a comment to D92706: [coroutine] should disable inline before calling coro split.

I believe it's not worth doing just to be able to use one constant string in one place.
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill.

Dec 6 2020, 11:22 AM · Restricted Project
wenlei added a comment to D92621: [SampleFDO] Store fixed length MD5 in NameTable instead of using ULEB128 if MD5 is used..

We find that after this change, the elapsed time to build a large application distributively is reduced by 5% and the accumulative cpu time used for building is also reduced by 5%.

Dec 6 2020, 6:58 AM · Restricted Project

Dec 3 2020

wenlei accepted D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.

This looks great. Thanks for working on this and making all the changes!

Dec 3 2020, 8:34 PM · Restricted Project

Dec 2 2020

wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Dec 2 2020, 6:53 PM · Restricted Project
wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Dec 2 2020, 1:38 PM · Restricted Project
wenlei added inline comments to D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.
Dec 2 2020, 9:56 AM · Restricted Project
wenlei updated the diff for D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

rebase

Dec 2 2020, 8:53 AM · Restricted Project

Dec 1 2020

wenlei updated the diff for D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

Address Wei's feedback.

Dec 1 2020, 11:40 PM · Restricted Project
wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Dec 1 2020, 11:34 PM · Restricted Project

Nov 20 2020

wenlei added inline comments to D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.
Nov 20 2020, 2:24 PM · Restricted Project
wenlei accepted D89715: [CSSPGO][llvm-profgen] Instruction symbolization.
Nov 20 2020, 10:59 AM · Restricted Project
wenlei added inline comments to D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.
Nov 20 2020, 10:55 AM · Restricted Project
wenlei accepted D89712: [CSSPGO][llvm-profgen] Disassemble text sections.
Nov 20 2020, 10:39 AM · Restricted Project
wenlei added inline comments to D89712: [CSSPGO][llvm-profgen] Disassemble text sections.
Nov 20 2020, 10:19 AM · Restricted Project
wenlei accepted D89707: [CSSPGO][llvm-profgen] Parse mmap events from perf script.

LGTM.

Nov 20 2020, 9:58 AM · Restricted Project
wenlei added a comment to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

Thanks for reviewing and the feedbacks. I think I've addressed all of the current ones. Please take a look, thanks!

Nov 20 2020, 9:53 AM · Restricted Project

Nov 19 2020

wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 19 2020, 2:59 PM · Restricted Project

Nov 18 2020

wenlei updated subscribers of D61540: [PGO] Use sum of count values to fix func entry count and add a check to verify BFI counts.

We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs

Nov 18 2020, 5:48 PM · Restricted Project

Nov 13 2020

wenlei added a comment to D90507: [Driver] Add DWARF64 flag: -gdwarf64.

This change covers non-LTO cases. For LTO, I think we would need to pass it from driver to LTO. Something like this: tools::addLTOOptions -> lld -> lto::Config (Config->TargetOptions->MCTargetOptions) ->LTO Backend.

Nov 13 2020, 8:21 AM · Restricted Project

Nov 11 2020

wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 11 2020, 3:30 PM · Restricted Project
wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 11 2020, 12:58 PM · Restricted Project
wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 11 2020, 9:08 AM · Restricted Project

Nov 10 2020

wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 10 2020, 6:17 PM · Restricted Project
wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 10 2020, 1:10 PM · Restricted Project

Nov 9 2020

wenlei added a comment to D87011: [DebugInfo] Add the -dwarf64 switch to llc and other internal tools (4/19)..

At least in LLD, it's not quite as simple as being added after the user's code: if a library appears on the link line it will be included in the output order as soon as it is determined it is needed. Thus if you have have three modules 1.o, 2.o, and 3.o, with 3.o in an archive 3.a and 1.o requiring 3.o, you end up with an output order of 1.o 3.o 2.o if the input order was 1.o 3.a 2.o or 3.a 1.o 2.o or an output order of 1.o 2.o 3.o if the input order was 1.o 2.o 3.a. In fact, with use of the --undefined linker switch, you can even force 3.o to appear first.

I accept using --undefined or rearranging the command-line order is less than ideal, but I'm really not convinced LLD should have any place in parsing the DWARF to determine output order. Furthermore, it's not even a reliable solution - if the objects built with DWARF32 (potentially all of which might have come from libraries) are large enough, no amount of reordering will fix the behaviour. I think users who need DWARF64 in their libraries are just going to have to request DWARF64 versions of the libraries, if the --undefined and reordering command line options are insufficient.

I'd guess that for a large-scale project the recommendation to use -u would be unrealistic. We are talking about projects where debugging information in a single section can easily go beyond the 4GiB limit; it is impossible for the developer to adjust the command line manually.

By the way, from a semantic point of view, I don't think it matters if the DWARF is in a different order to the data it represents - I'm just concerned about the maintenance and performance burden of having to parse the DWARF to achieve this reordering.

There is no need to parse the debug info sections. Reading only the first 4 bytes of .debug_info is enough to assess the format (there might be input files with format intermixing, but we can ignore them in the sack of simplicity). And we do not need any automatic sorting if the size of an output section is less than 4GiB.

Exactly. Not to mention, I think for users that actually worry about 4Gig limit they have pretty complex build system that will need to be modified to get build order right. Probably doable, but looking at overall compilation pipeline, is it really the best approach? Within lld we don't have to parse entire debug section, just read few bytes in each CU to determine if it's 32 or 64 bit.
Yes theoretically it is possible that there are just so many third party libraries that they will over flow 4gig by themselves, but I think common case is they will be under 4 gigs.

FWIW, this is probably a big enough discussion to deserve it's own review, probably even it's own llvm-dev thread. My personal take would be: Unless there's a specific user who needs this, probably not worth building it. If you personally have a need or support users who need it, that swings the discussion a fair bit into "what's the best way we can help these users".

Nov 9 2020, 10:38 AM · debug-info, Restricted Project

Nov 5 2020

wenlei added inline comments to D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.
Nov 5 2020, 3:06 PM · Restricted Project

Nov 4 2020

wenlei added inline comments to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
Nov 4 2020, 11:47 PM · Restricted Project

Nov 3 2020

wenlei added a comment to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

If CSSPGO has the priority based inliner, does it mean it will do most inliner work and CGSCC inliner will mostly be used as an iterative clean up pass?

Nov 3 2020, 10:03 AM · Restricted Project

Nov 2 2020

wenlei added a comment to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.
In D90125#2365948, @wmi wrote:

Another question. Have you ever evaluated the performance by comparing this patch "CSSPGO profile working with existing AFDO pipeline" with the default SPGO? I understand CSSPGO's benefit has not been used without the change of CGSCC inliner. The intention of the comparison is to understand how well existing AFDO pipeline work with CSSPGO profile. It may expose problem in existing SPGO profile or CSSPGO.

Nov 2 2020, 10:44 AM · Restricted Project
wenlei updated the diff for D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

Fix typo and linter, add getCanonicalFnName.

Nov 2 2020, 10:04 AM · Restricted Project
wenlei added a comment to D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

Thanks for quick review! I've update the patch, also see replies inline.

Nov 2 2020, 10:02 AM · Restricted Project
wenlei updated the diff for D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.

address feedback from Wei and David

Nov 2 2020, 8:38 AM · Restricted Project

Oct 30 2020

wenlei added inline comments to D89723: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.
Oct 30 2020, 4:42 PM · Restricted Project