davidxl (David Li)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 13 2015, 9:48 AM (122 w, 6 d)

Recent Activity

Yesterday

davidxl accepted D36919: [ThinLTO] Fix ThinLTO crash.

lgtm

Sat, Aug 19, 8:52 AM

Fri, Aug 18

davidxl added inline comments to D36919: [ThinLTO] Fix ThinLTO crash.
Fri, Aug 18, 10:53 PM
davidxl accepted D36903: [profile] Create a copy of profile filename inside of lprofCurFilename..

Please add a comment why copy is needed.

Fri, Aug 18, 4:51 PM
davidxl added inline comments to D36753: [SimplifyCFG] Do not perform tail sinking if there are extra moves introduced.
Fri, Aug 18, 4:25 PM
davidxl committed rL311209: Fix comment /NFC.
Fix comment /NFC
Fri, Aug 18, 4:09 PM
davidxl added a comment to D36903: [profile] Create a copy of profile filename inside of lprofCurFilename..

I am not sure what you try to achieve here. Basically, there is no need to to use environment variable.

Fri, Aug 18, 4:07 PM
davidxl committed rL311208: [Profile] backward propagate profile info in JumpThreading.
[Profile] backward propagate profile info in JumpThreading
Fri, Aug 18, 4:03 PM
davidxl updated the diff for D36864: [Profile] backward propagate profile data in jump-threading.

Addressed Chandler's review feedbacks

Fri, Aug 18, 10:34 AM
davidxl added inline comments to D36864: [Profile] backward propagate profile data in jump-threading.
Fri, Aug 18, 10:32 AM

Thu, Aug 17

davidxl created D36864: [Profile] backward propagate profile data in jump-threading.
Thu, Aug 17, 11:16 PM
davidxl accepted D36844: [PGO] Fixed assertion due to mismatched memcpy size type..

lgtm

Thu, Aug 17, 4:42 PM
davidxl added a comment to D34393: Adding code padding for performance stability - infrastructure.

Tried the two patches with our internal benchmark -- the alignment related performance issue is still there. The problem disappears with -mllvm -x86-experimental-pref-loop-alignment=5 is used.

Thu, Aug 17, 10:58 AM

Wed, Aug 16

davidxl added a comment to D34393: Adding code padding for performance stability - infrastructure.

Can you send the full patch so that we can help evaluate it?

Wed, Aug 16, 1:31 PM
davidxl updated subscribers of D34393: Adding code padding for performance stability - infrastructure.
Wed, Aug 16, 1:31 PM
davidxl added a comment to D36388: [X86][SandyBridge] Additional updates to the SNB instructions scheduling information .

We verified that the regression we saw internally is due to loop alignment.

Wed, Aug 16, 1:24 PM
davidxl accepted D36775: Increase tail dup threshold for -O3 from 3 to 4.

lgtm

Wed, Aug 16, 1:21 PM
davidxl added a comment to D36775: Increase tail dup threshold for -O3 from 3 to 4.

performance tests with large benchmarks showed no regression.

Wed, Aug 16, 1:21 PM
davidxl committed rL311025: Add more comment.
Add more comment
Wed, Aug 16, 10:35 AM
davidxl committed rL311023: [PGO] Fix ThinLTO crash .
[PGO] Fix ThinLTO crash
Wed, Aug 16, 10:19 AM

Tue, Aug 15

davidxl added a comment to D36775: Increase tail dup threshold for -O3 from 3 to 4.

Will test on some very large programs that are sensitive to icache pressure and get back.

Tue, Aug 15, 5:10 PM
davidxl added a reviewer for D36753: [SimplifyCFG] Do not perform tail sinking if there are extra moves introduced: jmolloy.
Tue, Aug 15, 10:24 AM

Mon, Aug 14

davidxl accepted D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..

convinced.

Mon, Aug 14, 8:26 PM
davidxl committed rL310907: Revert r310857 due to internal test failure.
Revert r310857 due to internal test failure
Mon, Aug 14, 8:15 PM
davidxl added inline comments to D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..
Mon, Aug 14, 6:36 PM
davidxl added inline comments to D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..
Mon, Aug 14, 6:26 PM
davidxl added a comment to D36726: [Inliner] Teach the inliner to propagate attributes that have specific effects on inlining thresholds when we happen to inline into the entry (extended) basic block..

I will provide more comments on propagating the inline hint later. However, I do think it is wrong to propagate the cold attribute in inliner -- there should be already an inter-procedural attribute propagation pass that does this.

Mon, Aug 14, 6:20 PM
davidxl added inline comments to D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..
Mon, Aug 14, 6:16 PM
davidxl added inline comments to D36722: [InlineCost] Simplify the cold attribute handling in inline-cost..
Mon, Aug 14, 5:31 PM
davidxl committed rL310857: [PGO] Add support for relocate profile dumping directory.
[PGO] Add support for relocate profile dumping directory
Mon, Aug 14, 9:52 AM
davidxl updated the diff for D36648: [PGO] Add support for profile dumping dir relocation.

Addressed review feedback.

Mon, Aug 14, 9:34 AM

Sun, Aug 13

davidxl added a comment to D36655: Move SampleProfileLoader pass before all simplification passes..

A follow up patch for a test is fine.

Sun, Aug 13, 8:31 PM
davidxl added a comment to D36655: Move SampleProfileLoader pass before all simplification passes..

Is there a functional test case (i.e., better matching with this change)?

Sun, Aug 13, 2:39 PM

Sat, Aug 12

davidxl created D36648: [PGO] Add support for profile dumping dir relocation.
Sat, Aug 12, 1:56 PM

Fri, Aug 11

davidxl updated the diff for D36440: [ThinLTO] Fix thinLTO crash .

Update the fix.

Fri, Aug 11, 10:50 AM
davidxl committed rL310737: Fix typo /NFC.
Fix typo /NFC
Fri, Aug 11, 10:50 AM

Thu, Aug 10

davidxl added a comment to D36440: [ThinLTO] Fix thinLTO crash .

got time to look at it more. The root cause of why the call resolves to a static variable in another module is from the following code in BitcodeWriter.cpp. The call to ::link was recorded to GUID: 14802282251123568682 which was correct. Since it does not have a value-id, the logic for sample PGO kicks in. 14802282251123568682 was treated as 'original guid' and the from the map, the PGO annotated GUID for the static var was found (note that one original guid maps to multiple GUIDs). After this, the call to :link now resolves to static var 'link's GUID.

Thu, Aug 10, 10:46 PM

Wed, Aug 9

davidxl accepted D36566: Revert part of r310296 to make it really NFC for instrumentation PGO..

lgtm

Wed, Aug 9, 10:08 PM

Tue, Aug 8

davidxl added a comment to D36440: [ThinLTO] Fix thinLTO crash .

I will debug this a little more when I find the time.

Tue, Aug 8, 2:19 PM
davidxl accepted D36341: Make ICP uses PSI to check for hotness..

lgtm

Tue, Aug 8, 11:27 AM
davidxl added a comment to D36440: [ThinLTO] Fix thinLTO crash .

The relevant stack trace of the crash when invoking llvm-lto with -thinlto-action=import:

Tue, Aug 8, 8:59 AM
davidxl added inline comments to D36440: [ThinLTO] Fix thinLTO crash .
Tue, Aug 8, 8:47 AM

Mon, Aug 7

davidxl added inline comments to D36440: [ThinLTO] Fix thinLTO crash .
Mon, Aug 7, 10:16 PM
davidxl created D36440: [ThinLTO] Fix thinLTO crash .
Mon, Aug 7, 9:02 PM

Sat, Aug 5

davidxl added a comment to D34796: upporting -f(no)-reorder-functions flag, clang side change.

The patch itself is fine. The meta question is whether we expect this option to be generally useful?

Sat, Aug 5, 10:28 AM
davidxl added a comment to D35804: [BPI] Detect branches in loops that make themselves not taken.

This code looks fine to me. My next question is, is the pattern common? The amount of code and possible compile increase added is non trivial.

Sat, Aug 5, 10:27 AM
davidxl added a comment to D36341: Make ICP uses PSI to check for hotness..

This change may affect instrumentation PGO. I suspect it will greatly increase count threshold which may negatively affect performance. For instance some not so hot targets have small function body which is always good candidate to inline (after ICP).

Sat, Aug 5, 10:20 AM

Fri, Aug 4

davidxl added a comment to D36338: BlockPlacement: Consider hotness of blocks relative to a loop iteration rather than relative to the loop as a whole.

This looks fine to me. Is there any performance data with PGO?

Fri, Aug 4, 2:31 PM
davidxl added inline comments to D36333: Move the SampleProfileLoader right after EarlyFPM..
Fri, Aug 4, 1:36 PM
davidxl added a comment to D36317: Adjust the hotness threshold from 99.9% to 99%..

Bottom up ininling can also create lots of cold inline instances. Other than the effect of blocking hotter callers from being inlined, current Machine Block Layout also has problems forming long hot traces leaving holes in code layout.

Fri, Aug 4, 10:14 AM

Thu, Aug 3

davidxl added a comment to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls..

Does it penalize cases on the border line? For instance, if a 7 byte memcpy requires a byte copy loop of 7 iterations which is above the threshold. In this case, the runtime check will purely add runtime overhead. This is unlike more general partial inlining of stringop operations where we may have internal APIs to by pass similar checks inside the library function.

Thu, Aug 3, 9:04 PM
davidxl accepted D36288: Use profile summary to disable peeling for huge working sets.

lgtm

Thu, Aug 3, 4:36 PM
davidxl added a comment to D36288: Use profile summary to disable peeling for huge working sets.

Have we considered limiting the max number of peeled iterations instead of disabling it?

Thu, Aug 3, 2:56 PM
davidxl accepted D36258: Disable loop peeling during full unrolling pass..

lgtm

Thu, Aug 3, 10:49 AM

Wed, Aug 2

davidxl added inline comments to D36258: Disable loop peeling during full unrolling pass..
Wed, Aug 2, 11:03 PM
davidxl added a comment to D36258: Disable loop peeling during full unrolling pass..

Perhaps it is cleaner to

  1. pass the information whether it is a full unroll to tryToUnroll
  2. add a internal option FullUnrollAllowPeeling and make it off by default? This will be similar to UnrollAllowPeeling flag, but takes precedence if it is full unroll.
Wed, Aug 2, 10:33 PM
davidxl accepted D36246: Fix the bug when SampleProfileWriter writes out number of callsites..

lgtm

Wed, Aug 2, 4:58 PM
davidxl added inline comments to D36246: Fix the bug when SampleProfileWriter writes out number of callsites..
Wed, Aug 2, 4:41 PM

Tue, Aug 1

davidxl added inline comments to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls..
Tue, Aug 1, 12:27 AM

Mon, Jul 31

davidxl added inline comments to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls..
Mon, Jul 31, 11:46 PM
davidxl added inline comments to D36059: [memops] Add a new pass to inject fast-path code for specific library function calls..
Mon, Jul 31, 11:14 PM
davidxl added inline comments to D34795: Supporting -f(no)-reorder-functions flag, llvm side change.
Mon, Jul 31, 9:45 AM
davidxl added a comment to D34796: upporting -f(no)-reorder-functions flag, clang side change.

The patch is missing documentation change.

Mon, Jul 31, 9:39 AM

Sat, Jul 29

davidxl added a comment to D36052: Update the new PM pipeline to make ICP aware if it is SamplePGO build..

Instr PGO has its own set of simplification passes, but still I think we should keep this patch NFC for instrPGO. If there are performance numbers that justify the InstrPGO pipeline changes, it can be discussed/done in a separate thread. This one should be left as SamplePGO only change.

Sat, Jul 29, 4:17 PM

Tue, Jul 25

davidxl added inline comments to D35804: [BPI] Detect branches in loops that make themselves not taken.
Tue, Jul 25, 11:13 AM
davidxl added a comment to D35804: [BPI] Detect branches in loops that make themselves not taken.

Sorry about the noise -- I was juggling around several different issues :)

Tue, Jul 25, 9:48 AM

Mon, Jul 24

davidxl added inline comments to D35811: A workaround for the bug caused by descrepancy between loop-unswitch and GVN about branch on undef.
Mon, Jul 24, 11:58 PM
davidxl added a comment to D35804: [BPI] Detect branches in loops that make themselves not taken.

Why not putting this change into BranchProbablityInfo.cpp, so that other components can benefit from the more precise BP info?

Mon, Jul 24, 9:34 AM

Sat, Jul 22

davidxl added a comment to D35750: [x86] Teach the x86 backend about general fast rep+movs and rep+stos features of modern x86 CPUs, and use this feature to drastically reduce the number of places we actually emit memset and memcpy library calls..

Do you have more benchmark numbers? For reference, here is GCC does (for sandybridge and above) for mempcy when size profile data is available:

  1. when the size is <= 24, use 8 byte copy loop or straightline code.
  2. when size is is between 24 and 128, use rep movsq
  3. when size is b above that, use libcall
Sat, Jul 22, 12:04 AM

Jul 21 2017

davidxl committed rL308785: [PGOInstr] Add a debug print.
[PGOInstr] Add a debug print
Jul 21 2017, 2:36 PM
davidxl added inline comments to D32249: [PartialInl] Enhance partial inliner to handle more complex conditions.
Jul 21 2017, 12:34 PM

Jul 19 2017

davidxl added inline comments to D35586: [ProfData] Detect if zlib is available .
Jul 19 2017, 2:51 PM

Jul 18 2017

davidxl accepted D35586: [ProfData] Detect if zlib is available .

lgtm

Jul 18 2017, 4:30 PM
davidxl accepted D34646: [opt-viewer] Handle file names that contain '#'.

lgtm

Jul 18 2017, 9:16 AM

Jul 13 2017

davidxl added a comment to D35384: PGOInstrumentation: Move profile matching warnings to remarks.

Blindly downgrade the warning into remarks can be bad -- profile mismatch problems can go undetected. People may spend more time diagnosing performance regressions due to missed warnings. Worse, we may regress in compiler without noticing.

Jul 13 2017, 2:03 PM

Jul 12 2017

davidxl committed rL307869: Fix broken test.
Fix broken test
Jul 12 2017, 5:22 PM
davidxl committed rL307864: [PGO] Add a test for 2-deep loop nest.
[PGO] Add a test for 2-deep loop nest
Jul 12 2017, 4:29 PM
davidxl committed rL307863: [PGO] Enhance pgo counter promotion.
[PGO] Enhance pgo counter promotion
Jul 12 2017, 4:28 PM

Jul 11 2017

davidxl committed rL307702: [ProfileData] Add new option to dump topn hottest functions.
[ProfileData] Add new option to dump topn hottest functions
Jul 11 2017, 1:31 PM
davidxl closed D35155: [ProfileData] Add new option to dump top N hottest functions by committing rL307702: [ProfileData] Add new option to dump topn hottest functions.
Jul 11 2017, 1:31 PM
davidxl added a comment to D35155: [ProfileData] Add new option to dump top N hottest functions.

Added documentation.

Jul 11 2017, 12:39 PM
davidxl updated the diff for D35155: [ProfileData] Add new option to dump top N hottest functions.

Addressed review feedback.

Jul 11 2017, 10:44 AM
davidxl added inline comments to D35155: [ProfileData] Add new option to dump top N hottest functions.
Jul 11 2017, 10:43 AM

Jul 9 2017

davidxl accepted D35149: llvm-profdata: Reduce memory usage by using Error callback rather than member.

lgtm

Jul 9 2017, 8:00 PM
davidxl added inline comments to D35149: llvm-profdata: Reduce memory usage by using Error callback rather than member.
Jul 9 2017, 7:05 PM
davidxl added a comment to D35149: llvm-profdata: Reduce memory usage by using Error callback rather than member.

It feels weird that the error handling is the first parameter of the interfaces. Also is it possible to provide a default error handling ?

Jul 9 2017, 6:13 AM
davidxl created D35155: [ProfileData] Add new option to dump top N hottest functions.
Jul 9 2017, 6:13 AM
davidxl accepted D34725: Add sample PGO integration test to cover profile annotation and inlining..

lgtm

Jul 9 2017, 6:13 AM

Jul 6 2017

davidxl accepted D35063: [ConstHoisting] Turn on consthoist-with-block-frequency by default.

lgtm

Jul 6 2017, 3:10 PM
davidxl accepted D35084: [ConstHoisting] choose to hoist when frequency is the same.

lgtm

Jul 6 2017, 1:58 PM
davidxl added a comment to D35043: [ADT] Enable reverse iteration for DenseMap.

The order is implementation defined which is deterministic -- and I am not sure about the annoyance part of it :) On the other hand, requiring everything related to output to have a fixed/well-defined order which is implementation independent can have its own cost in terms of compile time or memory usage.

Jul 6 2017, 12:14 PM
davidxl accepted D34838: Prototype: Reduce llvm-profdata merge memory usage further.

lgtm

Jul 6 2017, 11:24 AM
davidxl added a comment to D34838: Prototype: Reduce llvm-profdata merge memory usage further.

Limit NamedInstrProfRecord to the reader file seems ok -- but this class also looks like a useful convenience class. We can probably discuss that in a different thread if it is desirable.

Jul 6 2017, 11:04 AM
davidxl added inline comments to D35063: [ConstHoisting] Turn on consthoist-with-block-frequency by default.
Jul 6 2017, 10:28 AM
davidxl added inline comments to D35063: [ConstHoisting] Turn on consthoist-with-block-frequency by default.
Jul 6 2017, 9:53 AM
davidxl added inline comments to D35063: [ConstHoisting] Turn on consthoist-with-block-frequency by default.
Jul 6 2017, 9:31 AM
davidxl added a comment to D35043: [ADT] Enable reverse iteration for DenseMap.

The compilation output depends on 1) input source and 2) the compiler that produces it. When compiler changes, the output may change (though still semantically equivalent unless there is UB in the source). Changing the internal data structures such as hash functions basically changes the compiler, so the output is expected to have possible change. It is unrealistic to expect compiler behavior 'completely' fixed from version to version.

Jul 6 2017, 9:01 AM

Jun 29 2017

davidxl added a comment to D34838: Prototype: Reduce llvm-profdata merge memory usage further.

In summary, I think this patch should be minimized by

Jun 29 2017, 7:48 PM
davidxl added a comment to D34838: Prototype: Reduce llvm-profdata merge memory usage further.

Instead of exposing Counter members widely (and fixing tests), it is better to add wrapper methods in InstrProfRecord to forward the call to the nested member struct.

Jun 29 2017, 5:18 PM
davidxl added a comment to D34720: Hook the sample PGO machinery in the new PM.

No, I don't have other comments.

Jun 29 2017, 4:15 PM
davidxl added a comment to D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.

Looks like this thread has gone stale for a while.

Jun 29 2017, 3:31 PM