- User Since
- Dec 17 2012, 10:03 AM (322 w, 4 d)
LGTM, one last nitpick below.
Tue, Feb 19
Mon, Feb 18
Thu, Feb 14
The generic part looks sensible to me.
LGTM, one nitpick below.
Wed, Feb 13
Tue, Feb 12
Thu, Feb 7
Wed, Feb 6
Thanks for the link @jfb.
- Switch to listing valid ordering instead of invalid ones
- Don't apply the transformation on volatile atomicrmw
Tue, Feb 5
Though it's worth pointing out that none of those currently build a description of what it wants to do like the legalizer currently does, they just make the change straightaway when the rules match. Removing in LegalizationAction in favour of just making the change brings the legalizer closer to the others.
The legalizer is already done the way I'm talking about on the rule matching side and performance was marginally better on average than the old setAction() interface.
Should we revert this change to have the conversation going or am I the only one unhappy with the absence of guarantees for the other operands?
Personally, I'd like to replace the LegalizationAction with a function pointer that just performs the desired change and have a standard library of actions that can be called for the common changes.
Thu, Jan 24
Thanks for your patience Matt!
Jan 23 2019
Jan 14 2019
Jan 8 2019
Jan 7 2019
Dec 21 2018
Dec 20 2018
Could we just say that ExpandISelPseudo doesn't preserve the liveness and have it be recomputed when needed?
I still believe we shouldn't do that here and more generally O0 should be kept out of doing things on the code.
Should we emit an error if we request x86_64h with an arch older than haswell?
I thought about this, but figured since in this case it's most likely for not really copyable condition registers it might end up being problematic
Looks mostly good to me.
Dec 7 2018
I'm not sure I follow this. These aren't spilled with ordinary copies
Dec 6 2018
Nov 29 2018
With this form, the first three loads can be base + immediate offset, while the last can perform the post increment to create %base.iv.next. What do you think?
Nov 28 2018
LGTM with @gilr's comments addressed.
I meant we should handle G_BUILD_VECTOR to avoid regressions as we are able to legalize the same function with G_MERGE_VALUES.
Nov 27 2018
Nov 14 2018
Nov 13 2018
I see a few checks that weaken the check (e.g., -next replaced by regular checks).
Is this expected and why?
Nov 8 2018
Looks reasonable to me.
Could of comments/nitpicks inlined.
Nov 7 2018
Nov 5 2018
Nov 2 2018
Since testing for optimized or illegal code in these cases isn't an option in these cases, such a LIT could check that the incorrect code is not generated, that the incorrect formula is not generated, or that the correct formula is generated - which all seem fragile and overfitted to this specific bug, so I'm reluctant to add any of them (especially since the bug is covered by 4 runtime tests). What do you think?
I think the issue of sequences becoming illegal because of things like copies being introduced is still a deal breaker for that, and it would also introduce a code motion barrier since we can't select sequences spanning blocks. E.g. it wouldn't be safe to move the insn in the predecessor due to other uses.
Nov 1 2018
The failing runtime tests do (a) and (b) but not (c).
Oct 31 2018
I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.
Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).
Oct 30 2018
Thanks @aditya_nandakumar for the context.
This allows implicit truncation.
Bug fix: ScalarEvolution::getAddExpr() may modify the vector of SCEVs it is given as argument, causing 2nd formula to be created with only part of the SCEVs in Ops.
This patch fixes this by using a temporary copy of Ops for the 1st formula.
Oct 29 2018
- Use m_CombineOr instead of repeating the pattern to match
Oct 24 2018
If it doesn't look too hard, moving it over to SimplifyCFG would be great. That way, we can be sure that the empty block removal always happens without interference from some other pass.
Oct 23 2018
Thanks for the detailed numbers!
Oct 19 2018
However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.
What do you think?
Oct 17 2018
Sorry for the delay, I haven’t had time to review the patch but I have one high level comment.
I am not sure the enum approach is desirable as it basically hard code the kind of constraints we can report and will grow very quickly if we want to extend it.
For instance, let say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.
Oct 16 2018
Oct 9 2018
Oct 8 2018
One question, what is the impact on compile time?
Put differently are the heuristics to prune the search space able to still reduce the space to something manageable?
Sep 17 2018
But I still believe that trying to spot the identical case in subranges is pointless because we have already proved that the values are identical in main ranges. It is impossible for the values to be not identical in subranges; it's just a bit tricky to fix up the code to demonstrate that.
I finally tracked down why I couldn't get a mir test before: Compiling a ll file as usual, arriving at RegisterCoalescer, the numbering of the MBBs was not quite in order. Using -stop-before names the MBBs in the mir according to those out-of-order numbers, but reading the mir in again with -run-pass fails to preserve it. Instead it renumbers the MBBs in order. RegisterCoalescing processes MBBs with the same loop depth and critical edgeness and connectedness in numbering order, so the pass behaved differently.
Could you elaborate on the exact constraints?
Sep 14 2018
The fix here is to only detect an "identical erase" in the main range.
Could you elaborate on that part?
But it is not possible since can generate incorrect code in terms of exec mask.
Sep 13 2018
FWIW, I am still not convince this is a good thing to match SDISel behavior at O0, but this is the least bad solution of doing that IMHO.
Sep 11 2018
Sep 10 2018
Unfortunately I can only reproduce this in an ll test. Turning it into a mir test makes the problem go away.