Jul 21 2023
I missed that change. Yes, that should be sufficient to address the original issue.
I think a better structure would be:
- Don't emit the block name in HandleBasicBlock. Instead of printing the BB, just iterate through the instructions and print them 1 by 1.
- Unconditionally prepend the block name in CompleteNodeLabelString.
Jul 13 2023
We continue to rely on the normalization/denormalization round trip producing the original expression (e.g. D153004). Even if this patch doesn't fix all the cases where this property doesn't hold, it is an improvement.
May 25 2023
Right, one of the options would be to encode the information about GC pointer address spaces explicitly in the statepoint stack map.
May 22 2023
Looks good.
Why do we even care that an edge from a widenable condition branch terminates with a deopt? Widenable condition alone should be enough for widening.
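For context, here is a minimal sketch of the pattern in question (a hypothetical example following the canonical form from LangRef, not code from the patch):

define void @test_guard(i1 %cond) {
entry:
  %wc = call i1 @llvm.experimental.widenable.condition()
  %guard = and i1 %cond, %wc
  br i1 %guard, label %guarded, label %widenable_fail

guarded:
  ret void

widenable_fail:
  ; conventionally this edge ends in an @llvm.experimental.deoptimize call,
  ; but widening %cond only requires the widenable condition on the branch
  ret void
}

declare i1 @llvm.experimental.widenable.condition()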
This seems oddly specific to the runtime you are working on. Why do you need to put compressed pointers into deopt bundles? Is it just to communicate the set of live compressed pointers to the runtime? In that case, the coupling with deopt operand bundles seems incidental. We might be missing a more general mechanism for keeping track of different kinds of live GC pointers separately.
May 2 2023
We would like to have a way to take a snapshot of the IR at a specific point in the pipeline. Currently, you can either print the IR before/after a given pass (-print-before=<passname>) or print before/after all passes (-print-before-all). print-before prints the IR before every occurrence of the given pass. print-before-all will, well, print before all passes. While print-before-all can be used to achieve the same thing (print the whole pipeline and look at the IR at the point of interest), it's not very practical: for large compilations, print-before-all might take minutes to produce gigabytes of output.
Apr 6 2023
Given that this is the second time we've had to fix a similar issue, we should make a common widening utility that makes it impossible to emit a widened condition in an incorrect context.
Mar 20 2023
I don't think a separate analysis is justified here. If you want to improve the meta renamer, just make the necessary changes there.
Mar 17 2023
Looks good to me. I don't think we had a specific motivation in mind for that change. And now we know a clear reason not to LICM widenable conditions.
Jan 24 2023
It occurred to me that this is an optimization only if the injected condition is likely. In that case, we would likely execute the version of the loop with the loop-variant condition removed. Otherwise, we spend code size and add an extra check before entering the loop that still has both loop-variant conditions. I suggest using the profiling info on the branch to be eliminated to decide whether the transform is profitable.
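For illustration, this is the kind of profile metadata such a heuristic could consult (the value names and weights here are hypothetical):

  ; branch on the loop-variant condition the transform wants to eliminate;
  ; heavily biased weights suggest the injected-condition version will win
  br i1 %loop.variant.cond, label %guarded, label %fail, !prof !0

  !0 = !{!"branch_weights", i32 1048575, i32 1}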
Jan 2 2023
FYI, this change caused widespread performance regressions in our testing. We haven't investigated further since the change was reverted.
Nov 17 2022
Looks reasonable to me. I would give it a day or two for others to take a look, and if there are no objections, go ahead and land.
We also observe some regressions on internal benchmarks from this patch. We haven't yet analyzed why, but I will add details when we have something.
Nov 14 2022
A couple of high-level comments.
Nov 7 2022
Looks good to me.
Sep 23 2022
This structural constraint (gc.result being tied to a statepoint instruction) can be violated in dead code. We've recently fixed a similar issue for gc.relocates in D128904.
Aug 25 2022
LGTM with a small comment to address.
Is it true that every block which has successors will necessarily fall into this infinite loop?
No. What is special about this test is the fact that bb1 and bb2 have a branch on the same condition (undef).
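A hypothetical reduction of that shape (not the actual test case):

define void @two_branches_same_cond() {
entry:
  br label %bb1

bb1:
  ; bb1 and bb2 branch on the same (undef) condition, so the two
  ; transforms can keep rewriting the branches into each other
  br i1 undef, label %bb2, label %exit

bb2:
  br i1 undef, label %bb1, label %exit

exit:
  ret void
}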
Aug 22 2022
So, you are crippling this transformation to avoid an infinite loop with another transform in SimplifyCFG. This is probably OK, as we still handle the common case of a widenable condition going to a deoptimization block.
Jul 1 2022
Could you please give a brief outline of the approach?
Jun 22 2022
For other instructions, we seem to just duplicate the handling in findBaseDefiningValueOfVector and findBaseDefiningValue. Doing so for freeze would be more consistent with the existing code.
Jun 15 2022
The optimizer can insert new atomic memcpy/memmove calls; e.g., loop-idiom-recognize does that. Currently, when loop-idiom-recognize inserts such a call, it doesn't tag it with the element type, which makes the transform incorrect for us.
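For context, the intrinsic that loop-idiom-recognize emits; if I read LangRef correctly, the trailing argument is the element size in bytes, and nothing in the call encodes the element type:

declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8*, i8*, i64, i32)

  ; element size is 4 bytes here; whether the elements are pointers or
  ; plain i32s (what our GC needs to know) is not represented
  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src, i64 %len, i32 4)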
May 25 2022
Copying arrays of pointers in an environment with a moving garbage collector is different from copying arrays of 'primitive' values of the same size.
Well, I understand that upstream may not take that as a 'justification'.
Nov 24 2021
Do you mind if I put this change under an on-by-default cl::opt flag? That way we can temporarily turn it off downstream.
Nov 23 2021
This change broke the ability to discard checks in some cases. Consider this example:
define i1 @test1(i32 %a, i32* %b_ptr) {
  %b = load i32, i32* %b_ptr, align 8, !range !{i32 0, i32 2147483646}
  %c1 = icmp slt i32 %a, %b
  br i1 %c1, label %exit, label %cont.1
Nov 9 2021
I agree with Philip; the widening transformation doesn't rely on deoptimization.
Oct 14 2021
Add a simple test case using the -inline-cost-full cl::opt.
Oct 5 2021
I also noticed that the new memcmp intrinsic returns i1, which is different from libc memcmp: libc memcmp returns an int indicating which of the sides was greater. It looks like the new intrinsic matches bcmp semantics.
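For reference, the libc functions as IR declarations (the new intrinsic itself is not reproduced here):

; memcmp's int result encodes ordering (negative/zero/positive depending on
; which side is greater); bcmp's result is only meaningful as zero vs.
; non-zero, i.e. plain equality, which is all an i1 result can carry
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)
declare i32 @bcmp(i8* nocapture, i8* nocapture, i64)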
Since this is a change to LangRef, please post the proposal to llvm-dev.
Aug 4 2021
There is a target-specific hook to emit code for regular (non-atomic) memcpys: EmitTargetCodeForMemcpy. Maybe we should just implement a similar hook for element-atomic copies?
I think the name of the pass is misleading. What you are doing here is providing an inlined lowering for a memory builtin. Your current implementation is limited to inlining gc-leaf element-atomic memcpys, but I don't think the gc-leaf part is critical here. This transform can be extended to handle non-gc-leaf operations. It can also be extended to handle non-atomic operations.
Jul 1 2021
Looks good.
Jun 18 2021
FYI, I tried to take this patch for performance verification, but it crashed in smoke testing. I haven't yet looked into why.
Mar 24 2021
I looked at the GC part and it looks good to me.
Nov 9 2020
Can we split restructuring and generalization (possibly adding a test for the generalization part)? It's hard to assess correctness when you combine the two in one change.
Oct 26 2020
Accepting this change to unblock progress.