Thu, Feb 13
I have concerns about so many plumbing changes being made for such a small heuristic change.
Answering @davidxl 's comment on "The function entry count should have the information", line 73, HeatUtils.cpp:
This function counts the number of calls that one specific function makes to another specific function, whereas Function.entryCount() returns the number of entries into the function as a whole.
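The distinction can be illustrated with a toy model. This is a minimal sketch, not the code in HeatUtils.cpp: `CallSite`, `callsBetween`, and `totalEntries` are hypothetical names, and per-call-site counts are assumed to be available.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one profiled call site; not an LLVM type.
struct CallSite {
  std::string Caller;
  std::string Callee;
  uint64_t Count; // how many times this call site executed
};

// Calls made by one specific caller to one specific callee.
uint64_t callsBetween(const std::vector<CallSite> &Sites,
                      const std::string &Caller, const std::string &Callee) {
  uint64_t Total = 0;
  for (const CallSite &S : Sites)
    if (S.Caller == Caller && S.Callee == Callee)
      Total += S.Count;
  return Total;
}

// Entries into a callee from all callers (the entry-count analogue in this
// toy model).
uint64_t totalEntries(const std::vector<CallSite> &Sites,
                      const std::string &Callee) {
  uint64_t Total = 0;
  for (const CallSite &S : Sites)
    if (S.Callee == Callee)
      Total += S.Count;
  return Total;
}
```

With call sites main->foo (10), bar->foo (5), and main->bar (3), the per-edge count for main->foo is 10 while the entry-count analogue for foo is 15, so one value cannot stand in for the other.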
Answering @davidxl 's comment on "for the program's max count, max function count, get it from ProfileSummaryInfo.", line 99, HeatUtils.cpp:
ProfileSummaryInfo doesn't contain info on the maximum frequency in the module; it needs to be calculated.
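For illustration, the calculation amounts to a scan over every block of every function. The following is a minimal sketch under the assumption that per-block frequencies are already known; `BlockFreqs` and `maxModuleFreq` are hypothetical names and do not come from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: each function is a list of per-block frequencies.
using BlockFreqs = std::vector<uint64_t>;

// Scan every block of every function and keep the largest frequency seen.
// Returns 0 for a module that has no blocks.
uint64_t maxModuleFreq(const std::vector<BlockFreqs> &Module) {
  uint64_t Max = 0;
  for (const BlockFreqs &F : Module)
    for (uint64_t Freq : F)
      Max = std::max(Max, Freq);
  return Max;
}
```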
- Ran clang-format on all the files
- Deleted some functions that had no use (see the hasProfiling function)
- Deleted const unsigned variables
Wed, Feb 12
Can you add a test case that would fail under the validation option without the fix?
What is the use case of these options? Do hidden nodes create too much noise?
Tue, Feb 11
Mon, Feb 10
Fri, Feb 7
Here is a summary of today's discussion:
- this seems a straightforward implementation of your proposal
- this patch introduces a new notion of "Jump Relocations" to represent data related to rewriting jump instructions at the end of sections. Although Jump Relocations are somewhat similar to existing Relocations, they are maintained in different arrays, and they are handled and consumed by different code locations. So maybe we should avoid calling it a "relocation", because "jump relocations" aren't relocations in object files?
So you had a discussion about this:)
I attended an LLVM Social yesterday and asked some people about their feelings. We are still dubious whether doing all the heavy lifting on the linker side is the right approach. Disassembly on the linker side may give some short-term benefits, but the interaction with debug info/asynchronous unwind tables/symbol information is muddy. The improvement is fixed. Linkers don't really understand the semantics, so the available optimizations are rather limited. In the long term, this can cause some maintenance burden. I can think of several problems:
- Needs a psABI defined relocation type. The relocation type will probably have no use other than this very specific optimization.
- Needs non-trivial work to port to COFF.
- Another target may be a very different story.
- This duplicates some work of MachineBlockPlacement.
- The amount of code duplicated on the lld side may be large. emitNops can be reused with AsmPrinter, branch terminator optimization may be reused with BranchFolding, etc. If the current MachineFunction interface does not allow sharing MachineBasicBlocks with other MachineFunctions, fix it.
Wed, Feb 5
Tue, Feb 4
This is a sort of extension of the 'shouldBeDeferred' check in Inliner.cpp, except that here the caller may not actually be inlined. Blindly eliminating the bonus may prevent the callee from being inlined while not actually enabling the caller to be inlined.
Mon, Feb 3
Also missing a test case on cold basic blocks.
thanks, this is a very useful feature. LGTM
Sat, Feb 1
Fri, Jan 31
Thu, Jan 30
Can you add more test cases to cover things like bb labels and different bb section types (cold, EH, unique, etc.)? For cold section types, the merging should kick in, etc.
Wed, Jan 29
How about 'OnMissedSimplification' for simplicity?
Tue, Jan 28
Mon, Jan 27
Try to avoid monolithic patches like this. Please consider splitting it into a few smaller incremental patches with (possibly) independent testing. Logically, it can be split into 1) IR support; 2) machine BB level support; 3) debug support; 4) CFI support; 5) exception support; and 6) the 'main transformation' part, if there is one.
Fri, Jan 24
The MBFIWrapper change seems NFC; can it be extracted out first?
Thu, Jan 23
OK with me if Reid is OK with the Windows-specific logic.
Wed, Jan 22
Can you extract the Windows-specific code into its own helper function if possible?
There are checks for explicit occurrences of the inline-threshold option later, so the behavior seems unchanged.
OK, I see the intention. As long as the behavior of the --inline-threshold option is not changed (it still overrides the new option), the new option seems fine to me. Adding an individually controlled option for size opt seems a good idea too.
This change won't work. See
Tue, Jan 21
Mon, Jan 20
Sun, Jan 19
Jan 17 2020
As others commented, please extract the code into its own function, and also add support for the case when AA is available (as other parts of the function do).
Jan 16 2020
Jan 15 2020
thanks for the cleanup. The implicit conversion was not quite readable. LGTM
Jan 14 2020
InlineResult --> inlining-related result (viability, etc.). It captures two pieces of information: 1) the inline decision and 2) when the decision is 'no', the related inline analysis that led to the 'no' decision. The class name seems fine. The patch makes the 'decision' part more explicit and also fixes a bug where the right analysis message was missing.
The class name ResultWithMessage sounds too generic. Why not keep the InlineResult class name? The rest of the changes look reasonable.
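The two pieces of information described above can be sketched as a small value type. This is a toy illustration only: `InlineDecision` and its members are hypothetical names chosen for the example, and the sketch is not the class from the patch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy result type pairing an inline decision with the reason for a "no".
// Hypothetical names; not LLVM's InlineResult.
class InlineDecision {
  bool IsOk;
  std::string Reason; // empty when IsOk is true

  InlineDecision(bool Ok, std::string Why)
      : IsOk(Ok), Reason(std::move(Why)) {}

public:
  // The decision is explicit at the construction site: success or failure.
  static InlineDecision success() { return InlineDecision(true, ""); }
  static InlineDecision failure(std::string Why) {
    assert(!Why.empty() && "a 'no' decision should carry its reason");
    return InlineDecision(false, std::move(Why));
  }

  bool isSuccess() const { return IsOk; }
  const std::string &failureReason() const {
    assert(!IsOk && "only meaningful for a 'no' decision");
    return Reason;
  }
};
```

Making the constructor private and exposing only `success()` and `failure(...)` is one way to avoid an implicit conversion from a bare string or bool, so every call site states which decision it is returning.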
Jan 13 2020
Jan 10 2020
Jan 9 2020
Jan 8 2020
Jan 7 2020
The intention is to make the base CallAnalysis a reusable symbolic execution engine (virtual optimizations). The cost tracking is extracted into the derived class.
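The split can be pictured roughly as follows. This is a minimal sketch of the idea only, assuming a hook-based design: `AnalyzerBase`, `CostAnalyzer`, the `Inst` kinds, and the cost values 5 and 25 are all hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds for the toy analysis below.
enum class Inst { Simplifiable, Plain, Call };

// Base class: walks the instructions and decides what would be simplified
// away, but leaves all bookkeeping to overridable hooks.
class AnalyzerBase {
public:
  virtual ~AnalyzerBase() = default;

  void analyze(const std::vector<Inst> &Body) {
    for (Inst I : Body) {
      if (I == Inst::Simplifiable)
        onSimplified(I);
      else
        onKept(I);
    }
  }

protected:
  virtual void onSimplified(Inst) {}
  virtual void onKept(Inst) {}
};

// Derived class: the only place that knows about cost.
class CostAnalyzer : public AnalyzerBase {
  int Cost = 0;

  // Instructions that survive simplification add to the cost; calls are
  // charged more than plain instructions (values are illustrative).
  void onKept(Inst I) override { Cost += (I == Inst::Call) ? 25 : 5; }

public:
  int cost() const { return Cost; }
};
```

A different client could derive from the same base and override the hooks to collect something other than cost, which is the reuse the comment is after.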
This looks useful. Is it possible to add a test case?
Jan 6 2020
this looks good to me. Easwaran, do you have any comments on the refactoring?
Dec 23 2019
This patch will be part of mtrofin's https://reviews.llvm.org/D71733, so there is no need for it.
Dec 20 2019
Use AAManager (include/llvm/Analysis/AliasAnalysis.h) and register only BasicAA?
Dec 19 2019
The SROA handling code should also be abstracted away, letting the derived class handle cost accumulation. In particular, the common code pattern like:
BasicAA is stateless and should be available and used here to disambiguate.
There is a comment like this:
Dec 17 2019
Dec 10 2019
Dec 6 2019
Dec 5 2019
Dec 3 2019
Dec 2 2019
Nov 25 2019
What I mentioned should be complementary to the top-down method in this patch. It just allows the full top-down approach to be doable for the cross-module scenario as well.
One way to handle it is to 1) delay early inlining of call sites into a hot function if a large percentage of the calls to that function come from other modules (this still allows intra-module top-down inlining of it); or 2) keep a clone of the uninlined body of the original function and use that one during cross-module inlining.
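Option 1 boils down to a percentage check on where the calls come from. Here is a minimal sketch, assuming the cross-module and total call counts are known; `shouldDelayEarlyInlining` and its `ThresholdPercent` parameter are hypothetical and do not correspond to an existing option.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when at least ThresholdPercent of the calls to a function
// come from other modules, i.e. early inlining into it should be delayed.
bool shouldDelayEarlyInlining(uint64_t CrossModuleCalls, uint64_t TotalCalls,
                              unsigned ThresholdPercent) {
  assert(CrossModuleCalls <= TotalCalls && "cross-module calls are a subset");
  if (TotalCalls == 0)
    return false; // never called: nothing to protect
  // Integer form of CrossModuleCalls / TotalCalls >= ThresholdPercent / 100.
  // Note: this sketch ignores overflow for extremely large counts.
  return CrossModuleCalls * 100 >= TotalCalls * ThresholdPercent;
}
```

For example, with a threshold of 50, a function with 80 of 100 calls coming from other modules would be delayed, while one with 10 of 100 would not.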
This looks good. Can this be handled for the cross-module (ThinLTO) case somehow too?
Nov 22 2019
Nov 21 2019
How is the query type going to be checked?
Nov 20 2019
Can this be derived statically from the FuncT or BlockT?
Perhaps just add additional checks on the content of the dumped profile to make the test case more complete.
Nov 19 2019
thanks for the test case.
Nov 15 2019