- User Since
- Apr 13 2015, 9:48 AM (122 w, 6 d)
Fri, Aug 18
Please add a comment why copy is needed.
I am not sure what you are trying to achieve here. Basically, there is no need to use an environment variable.
Addressed Chandler's review feedback.
Thu, Aug 17
Tried the two patches with our internal benchmark -- the alignment-related performance issue is still there. The problem disappears when -mllvm -x86-experimental-pref-loop-alignment=5 is used.
Wed, Aug 16
Can you send the full patch so that we can help evaluate it?
We verified that the regression we saw internally is due to loop alignment.
Performance tests with large benchmarks showed no regression.
Tue, Aug 15
Will test on some very large programs that are sensitive to icache pressure and get back.
Mon, Aug 14
I will provide more comments on propagating the inline hint later. However, I do think it is wrong to propagate the cold attribute in the inliner -- there should already be an inter-procedural attribute propagation pass that does this.
Addressed review feedback.
Sun, Aug 13
A follow up patch for a test is fine.
Is there a functional test case (i.e., one that demonstrates better matching with this change)?
Sat, Aug 12
Fri, Aug 11
Update the fix.
Thu, Aug 10
Got time to look at it more. The root cause of why the call resolves to a static variable in another module is the following code in BitcodeWriter.cpp. The call to ::link was recorded to GUID 14802282251123568682, which was correct. Since it does not have a value-id, the logic for sample PGO kicks in: 14802282251123568682 was treated as the 'original GUID', and from the map, the PGO-annotated GUID for the static var was found (note that one original GUID maps to multiple GUIDs). After this, the call to ::link resolves to the static var 'link's GUID.
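The ambiguity above can be sketched in a few lines. This is an illustration, not the actual BitcodeWriter code: the multimap, the `resolveFirstMatch` helper, and the annotated-GUID values are hypothetical, standing in for the original-GUID-to-annotated-GUID map where one key has multiple entries and a naive lookup picks the wrong one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using GUID = uint64_t;

// Hypothetical resolver: one "original" GUID can map to several PGO-annotated
// GUIDs, e.g. both a function ::link and a static variable 'link'. Taking the
// first entry in the equal range can resolve a call to the wrong symbol.
GUID resolveFirstMatch(const std::multimap<GUID, GUID> &OrigToAnnotated,
                       GUID Orig) {
  auto It = OrigToAnnotated.lower_bound(Orig); // first entry for this key
  assert(It != OrigToAnnotated.end() && It->first == Orig);
  return It->second;
}
```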
Wed, Aug 9
Tue, Aug 8
I will debug this a little more when I find the time.
The relevant stack trace of the crash when invoking llvm-lto with -thinlto-action=import:
Mon, Aug 7
Sat, Aug 5
The patch itself is fine. The meta question is whether we expect this option to be generally useful?
This code looks fine to me. My next question is: is the pattern common? The amount of code and possible compile-time increase added is non-trivial.
This change may affect instrumentation PGO. I suspect it will greatly increase the count threshold, which may negatively affect performance. For instance, some not-so-hot targets have small function bodies, which are always good candidates to inline (after ICP).
Fri, Aug 4
This looks fine to me. Is there any performance data with PGO?
Bottom-up inlining can also create lots of cold inline instances. Besides blocking hotter callers from being inlined, the current Machine Block Layout also has problems forming long hot traces, leaving holes in the code layout.
Thu, Aug 3
Does it penalize borderline cases? For instance, a 7-byte memcpy requires a byte-copy loop of 7 iterations, which is above the threshold; in this case, the runtime check purely adds runtime overhead. This is unlike more general partial inlining of stringop operations, where we may have internal APIs to bypass similar checks inside the library function.
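The borderline-case concern can be made concrete with a sketch of the expanded form. The threshold value, the function name, and its shape are all hypothetical -- the point is only that when the size is statically known to be below the threshold, the libcall branch is dead and the size check is pure overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical expansion of a partially inlined memcpy: sizes below the
// threshold take a byte-copy loop, larger sizes call the library. For a
// known 7-byte copy with threshold 8, the branch never reaches the libcall
// path, so the runtime size check only costs cycles.
constexpr std::size_t kByteLoopThreshold = 8; // illustrative value

void *partiallyInlinedMemcpy(void *Dst, const void *Src, std::size_t N) {
  if (N < kByteLoopThreshold) {           // runtime check added by the expansion
    char *D = static_cast<char *>(Dst);
    const char *S = static_cast<const char *>(Src);
    for (std::size_t I = 0; I < N; ++I)   // 7 iterations for N == 7
      D[I] = S[I];
    return Dst;
  }
  return std::memcpy(Dst, Src, N);        // never reached when N == 7
}
```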
Have we considered limiting the max number of peeled iterations instead of disabling it?
Wed, Aug 2
Perhaps it is cleaner to
- pass the information of whether it is a full unroll to tryToUnroll
- add an internal option FullUnrollAllowPeeling and make it off by default? This would be similar to the UnrollAllowPeeling flag, but take precedence when it is a full unroll.
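The suggested flag interaction could look like the following sketch. The `allowPeeling` helper is hypothetical; only the flag names FullUnrollAllowPeeling and UnrollAllowPeeling come from the suggestion above.

```cpp
#include <cassert>

// Hypothetical helper showing the suggested precedence: for a full unroll,
// the new off-by-default FullUnrollAllowPeeling flag decides; otherwise the
// existing UnrollAllowPeeling flag applies as before.
bool allowPeeling(bool IsFullUnroll, bool UnrollAllowPeeling,
                  bool FullUnrollAllowPeeling) {
  if (IsFullUnroll)
    return FullUnrollAllowPeeling; // takes precedence, off by default
  return UnrollAllowPeeling;
}
```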
Tue, Aug 1
Mon, Jul 31
The patch is missing a documentation change.
Sat, Jul 29
Instr PGO has its own set of simplification passes, but I still think we should keep this patch NFC for instrPGO. If there are performance numbers that justify the InstrPGO pipeline changes, that can be discussed/done in a separate thread. This one should be left as a SamplePGO-only change.
Tue, Jul 25
Sorry about the noise -- I was juggling around several different issues :)
Mon, Jul 24
Why not put this change into BranchProbabilityInfo.cpp, so that other components can benefit from the more precise BP info?
Sat, Jul 22
Do you have more benchmark numbers? For reference, here is what GCC does (for Sandy Bridge and above) for memcpy when size profile data is available:
- when the size is <= 24, use an 8-byte copy loop or straight-line code.
- when the size is between 24 and 128, use rep movsq.
- when the size is above that, use a libcall.
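The three-tier dispatch above can be summarized in a small sketch. The thresholds (24 and 128) come from the comment; the enum and function names are illustrative, not GCC's actual implementation.

```cpp
#include <cstddef>

// Size-based memcpy strategy selection as described for GCC on Sandy Bridge
// and above (enum/function names are hypothetical).
enum class CopyStrategy { SmallLoopOrStraightline, RepMovsq, LibCall };

CopyStrategy pickMemcpyStrategy(std::size_t Size) {
  if (Size <= 24)
    return CopyStrategy::SmallLoopOrStraightline; // 8-byte loop / straight-line
  if (Size <= 128)
    return CopyStrategy::RepMovsq;
  return CopyStrategy::LibCall;
}
```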
Jul 21 2017
Jul 19 2017
Jul 18 2017
Jul 13 2017
Blindly downgrading the warning into a remark can be bad -- profile mismatch problems can go undetected. People may spend more time diagnosing performance regressions due to missed warnings. Worse, we may introduce compiler regressions without noticing.
Jul 12 2017
Jul 11 2017
Addressed review feedback.
Jul 9 2017
It feels weird that the error handler is the first parameter of the interfaces. Also, is it possible to provide a default error handler?
Jul 6 2017
The order is implementation-defined but deterministic -- and I am not sure about the annoyance part of it :) On the other hand, requiring everything related to output to have a fixed/well-defined order that is implementation-independent can have its own cost in terms of compile time or memory usage.
Limiting NamedInstrProfRecord to the reader file seems OK -- but this class also looks like a useful convenience class. We can discuss in a different thread whether that is desirable.
The compilation output depends on 1) the input source and 2) the compiler that produces it. When the compiler changes, the output may change (though it remains semantically equivalent unless there is UB in the source). Changing internal data structures such as hash functions effectively changes the compiler, so the output may change as well. It is unrealistic to expect compiler behavior to be 'completely' fixed from version to version.
Jun 29 2017
In summary, I think this patch should be minimized:
Instead of exposing Counter members widely (and fixing tests), it is better to add wrapper methods in InstrProfRecord to forward the call to the nested member struct.
No, I don't have other comments.
Looks like this thread has gone stale for a while.