eraman (Easwaran Raman)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 11 2015, 10:49 AM (132 w, 2 d)

Recent Activity

Tue, Sep 19

eraman added inline comments to D37198: [InlineCost] add visitSelectInst().
Tue, Sep 19, 5:48 PM

Wed, Sep 13

eraman added inline comments to D37198: [InlineCost] add visitSelectInst().
Wed, Sep 13, 3:51 PM
eraman accepted D37779: Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader..

LGTM

Wed, Sep 13, 2:21 PM
eraman committed rL313185: [Inliner] Add another way to compute full inline cost..
[Inliner] Add another way to compute full inline cost.
Wed, Sep 13, 1:17 PM
eraman closed D37819: [Inliner] Add another way to compute full inline cost. by committing rL313185: [Inliner] Add another way to compute full inline cost..
Wed, Sep 13, 1:17 PM
eraman created D37819: [Inliner] Add another way to compute full inline cost..
Wed, Sep 13, 12:01 PM
eraman added a comment to D37779: Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader..

Discussed this offline with Dehao. Instead of using an arbitrary threshold of 100000, it is better to piggyback on the the option in InlineCost to compute cost analysis in full. That way, if the analysis returns NeverInlineCost, Sample loader shouldn't inline. Anything else (doesn't matter if the cost >= threshold), it can do the inlining. I will refactor InlineCost.cpp to enable the feature to be used without requiring an option and Dehao will base this patch on top of that.

Wed, Sep 13, 11:07 AM

Tue, Sep 12

eraman accepted D37779: Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader..

LGTM

Tue, Sep 12, 5:36 PM
eraman added inline comments to D37779: Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader..
Tue, Sep 12, 4:42 PM
eraman accepted D37773: Refactor the code to pass down ACT to SampleProfileLoader correctly..

LGTM

Tue, Sep 12, 2:56 PM
eraman added inline comments to D37773: Refactor the code to pass down ACT to SampleProfileLoader correctly..
Tue, Sep 12, 2:43 PM

Fri, Aug 25

eraman added a comment to D36976: [Inliner] Tweak the condition for determining cold callsites..

Thinking about this a little more, this will end up treating non-cold callsites as cold. As an exmple, consider bar gets inlined to foo. Let entry_freq(foo) = 8, entry_freq(bar) = 8, callsite_freq(foo->bar) = 7. Now, any block in bar with freq == 1 will end up with a frequency of 0 after cloning and will be treated as cold even it's frequency relative to foo's entry is 1/7 which is not cold for the default value of cold-callsite-rel-freq (1/50). Rouding the freuenies while scaling will mitigate this a little bit, but still not sufficient. Need to think more.

I think you'll need to saturate at 1, and if necessary to preserve precision scale the caller up.

Scaling the caller up pretty much defeats the purpose of incremental BFI updates and we no longer have the property that cost of a single inlining is constant.

Fri, Aug 25, 2:30 PM

Aug 23 2017

eraman added a comment to D36976: [Inliner] Tweak the condition for determining cold callsites..

Thinking about this a little more, this will end up treating non-cold callsites as cold. As an exmple, consider bar gets inlined to foo. Let entry_freq(foo) = 8, entry_freq(bar) = 8, callsite_freq(foo->bar) = 7. Now, any block in bar with freq == 1 will end up with a frequency of 0 after cloning and will be treated as cold even it's frequency relative to foo's entry is 1/7 which is not cold for the default value of cold-callsite-rel-freq (1/50). Rouding the freuenies while scaling will mitigate this a little bit, but still not sufficient. Need to think more.

I think you'll need to saturate at 1, and if necessary to preserve precision scale the caller up.

Aug 23 2017, 12:18 PM
eraman added a comment to D36976: [Inliner] Tweak the condition for determining cold callsites..

Thinking about this a little more, this will end up treating non-cold callsites as cold. As an exmple, consider bar gets inlined to foo. Let entry_freq(foo) = 8, entry_freq(bar) = 8, callsite_freq(foo->bar) = 7. Now, any block in bar with freq == 1 will end up with a frequency of 0 after cloning and will be treated as cold even it's frequency relative to foo's entry is 1/7 which is not cold for the default value of cold-callsite-rel-freq (1/50). Rouding the freuenies while scaling will mitigate this a little bit, but still not sufficient. Need to think more.

Aug 23 2017, 11:49 AM

Aug 22 2017

eraman added a comment to D33946: [InlineCost] Find identical loads in the callee.

LGTM with the comments addressed, but please check with Chandler if he has more to add.

Aug 22 2017, 2:25 PM

Aug 21 2017

eraman created D36976: [Inliner] Tweak the condition for determining cold callsites..
Aug 21 2017, 12:21 PM

Aug 17 2017

eraman added a comment to D35850: [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes..

Change the output of optimization remarks to something like below if anything like recursive call, dynamic alloca is found

foz is recursive and allocates too much stack space. Cost is not fully computed
foz not inlined into bar because it should never be inlined (cost=never)

Is this okay? Thank you.

Aug 17 2017, 3:58 PM

Aug 14 2017

eraman added a comment to D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..

LGTM. Not handling the cold attribute was an oversight and when the callee is marked cold, it makes sense to treat all its callsites as cold (therefore reasonable to handle in isColdCallsite). I also agree with collapsing the two similar params with the same value until we see a need to have different default values for them.

Aug 14 2017, 7:19 PM
eraman added a comment to D36710: [InlineCost] NFC - refactor the checks for different analyses to be a bit more localized to the code that uses those analyses..

This is not strictly NFC since the entire hot/cold threshold update code used to be guarded by if (PSI) even though PSI is not needed for local hot/cold callsite updates, but practically all the callers within LLVM does pass a non-null PSI.

Aug 14 2017, 2:17 PM

Aug 9 2017

eraman added a comment to D35850: [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes..

analyzeBlock bails out when inlining is not feasible due to things like recursive call, dynamic alloca etc. Arguably it doesn't make sense to compute the full inline cost even with this new option (and that's the behavior with this patch). In that case, especially with the ORE hookup, there should be some indication that the cost is not fully computed because the callee can not get inlined even if cost < threshold.

Aug 9 2017, 11:30 AM

Aug 4 2017

eraman committed rL310073: [Inliner] Fix a typo in option description. NFC..
[Inliner] Fix a typo in option description. NFC.
Aug 4 2017, 10:16 AM

Aug 3 2017

eraman committed rL309994: [Inliner] Increase threshold for hot callsites without PGO..
[Inliner] Increase threshold for hot callsites without PGO.
Aug 3 2017, 3:24 PM
eraman closed D36199: [Inliner] Increase threshold for hot callsites without PGO. by committing rL309994: [Inliner] Increase threshold for hot callsites without PGO..
Aug 3 2017, 3:24 PM
eraman added inline comments to D36288: Use profile summary to disable peeling for huge working sets.
Aug 3 2017, 2:50 PM
eraman updated the diff for D36199: [Inliner] Increase threshold for hot callsites without PGO..
  • Apply this heuristic only at O3 and disable for optsize
Aug 3 2017, 2:07 PM

Aug 2 2017

eraman added a comment to D36199: [Inliner] Increase threshold for hot callsites without PGO..

While the total size increase doesn't concern me much, the >10% code size growth in some benchmarks is a bit concerning. Do these benchmarks also improve performance? If so, then this might be fine in general (once the -Os behavior is fixed, see below). However, if the benchmarks that are growing in size by a lot aren't also getting faster, then it seems a hard sell at O2 where we expect size increase to be at least generally associated with performance wins.

Naturally, this seems completely fine at O3 either way.

Aug 2 2017, 3:39 PM
eraman accepted D36025: Do not want to use BFI to get profile count for sample pgo.

LGTM based on offline discussion with Dehao on why this could happen. Please add comments to the effect that if any callsite has a non-zero count, it gets a metadata and conversely any callsite without metadata has no count.

Aug 2 2017, 2:08 PM
eraman added a comment to D36199: [Inliner] Increase threshold for hot callsites without PGO..

Hi Easwaran,

What if the callee of a hot callsite also has a inline hint?

(FWIW, my 2 cents here would be that this should win over inline hint, as it should be quite a bit stronger. Anyways, will let Easwaran actually respond as well as I'm curious what he thinks.)

Aug 2 2017, 11:58 AM
eraman added a comment to D36025: Do not want to use BFI to get profile count for sample pgo.

I still feel this behavior is confusing, but I'll let David and Teresa to weigh in. Do you have any cases in real code where a block containing a call with no metadata has a high profile count? Is this masking a bug in some other code (BFI or sample profile loader)?

Aug 2 2017, 10:29 AM

Aug 1 2017

eraman created D36199: [Inliner] Increase threshold for hot callsites without PGO..
Aug 1 2017, 4:17 PM

Jul 28 2017

eraman added a comment to D36025: Do not want to use BFI to get profile count for sample pgo.

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

We only want to do this on callsites, as we still rely on BFI to get other block frequencies.

Jul 28 2017, 4:36 PM
eraman added a comment to D36025: Do not want to use BFI to get profile count for sample pgo.

If the profile count based on BFI is completely reliable that you are discarding it, then shouldn't BFI->getProfileCount also return 0 for sampled profiling? I think this can end up in surprising behavior.

Jul 28 2017, 4:09 PM
eraman committed rL309441: [Inliner] Do not apply any bonus for cold callsites..
[Inliner] Do not apply any bonus for cold callsites.
Jul 28 2017, 2:48 PM
eraman closed D35823: [Inliner] Do not apply any bonus for cold callsites. by committing rL309441: [Inliner] Do not apply any bonus for cold callsites..
Jul 28 2017, 2:48 PM
eraman updated the diff for D35823: [Inliner] Do not apply any bonus for cold callsites..

Replace the two vector bonus variables by one.

Jul 28 2017, 1:52 PM

Jul 25 2017

eraman updated the diff for D35823: [Inliner] Do not apply any bonus for cold callsites..

Address Chandler's comments.

Jul 25 2017, 2:13 PM
eraman added a comment to D35823: [Inliner] Do not apply any bonus for cold callsites..
Jul 25 2017, 2:11 PM

Jul 24 2017

eraman created D35823: [Inliner] Do not apply any bonus for cold callsites..
Jul 24 2017, 4:39 PM

Jul 9 2017

eraman added inline comments to D33946: [InlineCost] Find identical loads in the callee.
Jul 9 2017, 6:13 AM

Jun 30 2017

eraman created D34919: [Inliner] Optimize shouldBeDeferred.
Jun 30 2017, 5:39 PM

Jun 28 2017

eraman committed rL306542: Create inliner params based on size and opt levels..
Create inliner params based on size and opt levels.
Jun 28 2017, 6:34 AM
eraman closed D34309: [NewPM/Inliner] Customize threshold based on optlevel and sizelevel by committing rL306542: Create inliner params based on size and opt levels..
Jun 28 2017, 6:34 AM
eraman updated the diff for D34309: [NewPM/Inliner] Customize threshold based on optlevel and sizelevel.

Address Chandler's comments.

Jun 28 2017, 6:30 AM

Jun 27 2017

eraman updated the diff for D34309: [NewPM/Inliner] Customize threshold based on optlevel and sizelevel.

Call getInlineParams in PassBuilder and pass the Params object to the existing Inliner constructor.

Jun 27 2017, 6:15 PM
eraman committed rL306484: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.
[NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case
Jun 27 2017, 4:11 PM
eraman closed D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case by committing rL306484: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.
Jun 27 2017, 4:11 PM
eraman added a comment to D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

Also LGTM generally. Feel free to submit with updated (or current) threshold based on what makes the most sense to you. Also a minor comment below, lemme know if you want another look at the code if caching that ends up useful but weird/complex.

Jun 27 2017, 4:05 PM
eraman updated the diff for D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

Increase the threshold to 2, fix a bug and add comments about caching the caller entry threshold.

Jun 27 2017, 4:04 PM
eraman added a comment to D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

I used BranchProbability as you suggested but due to BranchProbability's implementation I need to adjust the test case (essentitally BranchProbability(1,100) is < 1/100). This is not a big deal, but it does not treat a callsite guarded by a builtin_expect inside a loop as cold anymore.

Jun 27 2017, 9:29 AM
eraman updated the diff for D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

Address Chandler's comments.

Jun 27 2017, 9:21 AM

Jun 26 2017

eraman abandoned D34471: [Inliner] Boost inlining of an indirect call to always_inline function..
Jun 26 2017, 11:01 AM
eraman added a comment to D34309: [NewPM/Inliner] Customize threshold based on optlevel and sizelevel.

Ping.

Jun 26 2017, 9:33 AM

Jun 22 2017

eraman added a comment to D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

Ping.

Jun 22 2017, 10:13 PM
eraman added a comment to D34471: [Inliner] Boost inlining of an indirect call to always_inline function..

This seems likely to lead to surprising behavior. LastCallToStaticBonus is very large, so we'll end up duplicating a lot of code in some cases, which could be surprising. On the other hand, it isn't infinitely large, so you can't depend on always_inline to trigger the behavior reliably.

What's the motivation for this change?

Jun 22 2017, 1:42 PM
eraman added inline comments to D34471: [Inliner] Boost inlining of an indirect call to always_inline function..
Jun 22 2017, 10:15 AM

Jun 21 2017

eraman created D34471: [Inliner] Boost inlining of an indirect call to always_inline function..
Jun 21 2017, 12:11 PM

Jun 17 2017

eraman updated the diff for D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.

Address David's comments.

Jun 17 2017, 12:52 PM
eraman abandoned D34319: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.
Jun 17 2017, 12:49 PM
eraman created D34319: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.
Jun 17 2017, 12:49 PM

Jun 16 2017

eraman created D34312: [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case.
Jun 16 2017, 6:09 PM
eraman created D34309: [NewPM/Inliner] Customize threshold based on optlevel and sizelevel.
Jun 16 2017, 5:42 PM

Jun 6 2017

eraman added a comment to D33850: Inlining: Don't re-map simplified cloned instructions..

From the summary:

Jun 6 2017, 3:52 PM

May 22 2017

eraman accepted D33389: Fix update VP metadata after inlining for instrumentation PGO.

LGTM

May 22 2017, 1:21 PM

May 16 2017

eraman committed rL303210: [Inliner] Do not mix callsite and callee hotness based updates..
[Inliner] Do not mix callsite and callee hotness based updates.
May 16 2017, 2:31 PM
eraman closed D33157: [Inliner] Do not mix callsite and callee hotness based updates. by committing rL303210: [Inliner] Do not mix callsite and callee hotness based updates..
May 16 2017, 2:31 PM
eraman updated the diff for D33157: [Inliner] Do not mix callsite and callee hotness based updates..

Use callsite's hotness for sample PGO.

May 16 2017, 2:14 PM
eraman closed D33156: Add hasProfileSummary and has{Sample|Instrumentation}Profile methods..

Committed at r303204.

May 16 2017, 1:28 PM
eraman committed rL303204: Add hasProfileSummary and has{Sample|Instrumentation}Profile methods.
Add hasProfileSummary and has{Sample|Instrumentation}Profile methods
May 16 2017, 1:28 PM
eraman updated the diff for D33156: Add hasProfileSummary and has{Sample|Instrumentation}Profile methods..

Add has{Sample|Instrumentation}Profile methods and update the unit test.

May 16 2017, 11:53 AM
eraman retitled D33156: Add hasProfileSummary and has{Sample|Instrumentation}Profile methods. from Add hasProfileSummary method. to Add hasProfileSummary and has{Sample|Instrumentation}Profile methods..
May 16 2017, 11:51 AM
eraman added a comment to D33157: [Inliner] Do not mix callsite and callee hotness based updates..

Turns out that this change is incorrect. When sample profile is used, we use callsite hotness even without BFI. In a separate patch, I plan to add a method in ProfileSummaryInfo to determine if the module has sample profile. Then I can fix this patch by using callsite hotness either when sample profile is used or when BFI is available.

May 16 2017, 10:49 AM

May 12 2017

eraman created D33157: [Inliner] Do not mix callsite and callee hotness based updates..
May 12 2017, 5:20 PM
eraman created D33156: Add hasProfileSummary and has{Sample|Instrumentation}Profile methods..
May 12 2017, 3:40 PM
eraman accepted D32783: [PartialInlining] Add frequency based cost analysis.

LGTM

May 12 2017, 1:43 PM
eraman added a comment to D32783: [PartialInlining] Add frequency based cost analysis.

In terms of correctness and heuristics this looks good. I'v a few comments that are mostly stylistic and documentation issues.

May 12 2017, 12:48 PM

May 11 2017

eraman accepted D32877: Restrict call metadata based hotness detection to Sample PGO mode.

LGTM

May 11 2017, 4:29 PM
eraman committed rL302829: Decrease inlinecold-threshold to 45.
Decrease inlinecold-threshold to 45
May 11 2017, 2:49 PM
eraman closed D33106: Decrease inlinecold-threshold to 45 by committing rL302829: Decrease inlinecold-threshold to 45.
May 11 2017, 2:49 PM
eraman created D33106: Decrease inlinecold-threshold to 45.
May 11 2017, 11:59 AM
eraman added inline comments to D32877: Restrict call metadata based hotness detection to Sample PGO mode.
May 11 2017, 10:53 AM

May 10 2017

eraman added inline comments to D32877: Restrict call metadata based hotness detection to Sample PGO mode.
May 10 2017, 5:51 PM

May 9 2017

eraman committed rL302597: [ProfileSummary] Make getProfileCount a non-static member function..
[ProfileSummary] Make getProfileCount a non-static member function.
May 9 2017, 4:34 PM
eraman closed D33012: [ProfileSummary] Make getProfileCount a non-static member function. by committing rL302597: [ProfileSummary] Make getProfileCount a non-static member function..
May 9 2017, 4:34 PM
eraman added a comment to D33012: [ProfileSummary] Make getProfileCount a non-static member function..

In offline discussion, Dehao asked me to pass PSI when constructing InlineFunctionInfo from PartialInliner. Will submit this patch with that change.

May 9 2017, 4:21 PM
eraman created D33012: [ProfileSummary] Make getProfileCount a non-static member function..
May 9 2017, 11:36 AM

May 5 2017

eraman added inline comments to D32877: Restrict call metadata based hotness detection to Sample PGO mode.
May 5 2017, 4:28 PM
eraman added inline comments to D32877: Restrict call metadata based hotness detection to Sample PGO mode.
May 5 2017, 4:10 PM
eraman committed rL302308: Override invalidate of ProfileSummaryInfo to return false..
Override invalidate of ProfileSummaryInfo to return false.
May 5 2017, 3:28 PM
eraman closed D32775: Provide an invalidate method in ProfileSummaryInfo that returns false by committing rL302308: Override invalidate of ProfileSummaryInfo to return false..
May 5 2017, 3:28 PM
eraman added a comment to D32877: Restrict call metadata based hotness detection to Sample PGO mode.

I feel the underlying problem here is the use of VP metadata to get counts in sample PGO mode. Dehao, even when VP metadata is available, can we add branch_weights to the call? Then, only sample PGO will annotate branch weights to calls and that won't affect instrumented PGO.

May 5 2017, 2:42 PM
eraman added inline comments to D32783: [PartialInlining] Add frequency based cost analysis.
May 5 2017, 11:07 AM

May 4 2017

eraman added inline comments to D32783: [PartialInlining] Add frequency based cost analysis.
May 4 2017, 5:46 PM
eraman accepted D32773: Update VP prof metadata during inlining..

LGTM

May 4 2017, 5:06 PM
eraman added inline comments to D32773: Update VP prof metadata during inlining..
May 4 2017, 4:49 PM
eraman added a comment to D32773: Update VP prof metadata during inlining..

LGTM after hoisting the APInt outside the loop (see inlined comment).

May 4 2017, 1:45 PM
eraman added a comment to D32775: Provide an invalidate method in ProfileSummaryInfo that returns false.

Missed adding llvm-commits as a subscriber when I created this. I added soon after, but I suspect phabricator doesn't send an email then. Hopefully this commit triggers an email.

May 4 2017, 11:45 AM
eraman committed rL302170: [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline..
[PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline.
May 4 2017, 10:11 AM
eraman closed D32768: [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline. by committing rL302170: [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline..
May 4 2017, 10:11 AM

May 3 2017

eraman added a comment to D32783: [PartialInlining] Add frequency based cost analysis.

I am not fully sure if this cost model - first check if the function is a suitable candidate for outlining and then use inliner's cost analysis to inline the original function to its callsites - is the right one. Specifically, I'm thinking if both should be done on a per-callsite basis.

May 3 2017, 6:00 PM
eraman added a comment to D32768: [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline..

I've added a test case for this which verifies that hot callsite threshold is applied with the default O2 pipeline. I don't see any other similar instances (running with the full O2 pipeline and checking the transformation has happened) but I hope this is fine.

May 3 2017, 5:43 PM