- User Since
- Oct 23 2013, 6:32 PM (377 w, 3 d)
The performance impact of the microcode fix were highly variable depending on the exact details of each workload. Mostly barely moved, some really suffered. I know of a couple of organizations using the mitigation, but I agree with your general point about this being something that decays in value with time. Hopefully, in a few years, we can stop talking about this.
Thu, Jan 14
David, thanks for chiming in here. Reading your wording made me realize I was wrong in my take here. I had been thinking of this as a debug info quality problem, but you're right, that's not what is actually going on here.
I'm not convinced that disabling these optimizations is warranted. I'm not actively opposing the patch, just want to list my concerns for the record.
LGTM w/an MIR test case added before submission.
Tue, Jan 12
This doesn't get at the root cause though. Are those labels doing anything in the debug build? Why are they emitted? Can they be reasonably removed?
This review has been closed for nearly 10 months. Please move discussion of proposed changes to the bug or another relevant location.
Both the align and branch handling are "optimizations". I object to one being enabled and the other disabled. If you want them both on by default, fine. If you want them both off by default, fine. Having one off and one on is confusing.
Fixed in commit caafdf07b. This was a spurious assert.
rebase and ping
Side note: The -1/+1s appearing everywhere confused me until I looked into the getConstMetaVal function. Having the index returned be the second in the tag, value pair seems slightly confusing, but it fits the existing structure.
LGTM as a testing aid. Should be removed once invoke support is stabilized and fully enabled.
Also LGTM. Thanks.
Mon, Jan 11
I would strongly prefer that instead of simply asserting that we retie the registers. Doing so should be fairly simple - we just need to keep track of which register pairs we untied for the possible fold, and use a lambda to retie them.
Sun, Jan 10
I'm dropping my request which triggered this review. Feel free to close.
Context: This patch was reverted following failures in stage2 builders (only). Still investigating root cause. Reproduction is a bit of a pain.
rebase over landed tests
Add requested test.
Address stylistic updates. Test changes to be done separately in near future.
Mon, Jan 4
You can view this as a phase ordering issue, but really I see it as "this pass is supposed to delete dead loops" + "a case came up in discussion which we'd missed" + "easy no-cost fix".
Sun, Jan 3
No. I believe you're referring to optimizeLoopExits. That does not modify the CFG, it just folds conditions to constants. This will specifically modify the CFG and remove the loop.
Tue, Dec 29
Mon, Dec 28
This is InductiveRangeCheckElimination which implements a form of iteration set splitting. I'll warn that IRCE has been historically bug prone, and is not on by default upstream.
Ayal, Florian, thank you for the efforts on this review. I really appreciate the active review.
ping now that previous patch in series has landed
Sat, Dec 26
Wed, Dec 23
I can confirm the failure and will debug. Thank you again for finding a cornercase.
Tue, Dec 22
As implemented, this is a bit too specific to one particular pattern to be commitable, but I think you've got the seed of a good idea here.
Reading through the code, I had a couple of separate thoughts. If you're interested, feel free to follow up on these. If not, feel free to disregard.
Incorporate a fix for the issue Florian found. Essentially, we can't both treat single use exit conditions as dead, and allow predication to use them in a mask. Since we know they're evaluate to true for the entire vector body, we can simply ignore them when forming edge predicates.
Ok, I see what's going on w/Florian's example. It's an interaction with block predication and early exits. Essentially, block predication expects to be able to add uses of any condition in the loop, and the dead code elimination has eliminated the exit condition in the vector loop. I can restrict the legality slightly to avoid this interaction easily enough, but I want to think a bit first if there's a clean integration.
JFYI, multiple latch tests added in f106b2, but they don't impact the diff as SCEV can't prove exit counts and thus we don't vectorize.
Incorporate wording/style comments from Ayal, test rebase pending.
Ayal, thanks for all the great wordsmith comments!
Fri, Dec 18
Dec 15 2020
One further simplification enabled by previous rebase.
Rebase over a81db8b31. While that change is NFC, the code structure makes it much easier to disable vectorization when predication is demanded and we can't provide it.
Update tests to cover cases where we can't vectorize due to either a) size, or b) predication.
rebase over landed tests
Dec 14 2020
Dec 12 2020
Dec 8 2020
Dec 7 2020
Address review comments.
As a side note, I don't think you actually need to disallow conversion of the null pointer between two different NI address spaces. If you wanted to improve opt quality, folding your check into the previous one would be beneficial. I'm not going to require that as this a) fixes a correctness bug, and b) has been outstanding for way to long.
Hard reject. This patch will not be accepted. This is a major design change.
Add non-zero restriction. For now, restrict to constant case. I could have used isKnownNonZero, but that would require recursing through both operands which I'd rather not do from a compile time perspective.
Dec 5 2020
LGTM. I had to grab a sheet of paper and run some examples to convince myself that your code worked - even with the explanation - but I got there eventually. :)
Ok, you and I are clearly thinking about the same problems. :) I have a patch which I hadn't yet posted to handle this exact same case. I just needed to get prerequisites out of the way in terms of D92698 and D92694. I'm happy to defer to you and act as knowledgeable reviewer though!
Remove ptrtoint and inttoptr due to truncation issue pointed out in review. These can be handled in a follow up patch if needed, my motivating case doesn't require them.
Tests for the basic aa extension will be in the patch for basicaa. I could probably contrive a test case through the existing basic aa use of isKnownNonEqual, but it would take some thought. Are you okay with me landing this as is, and then including supplemental tests in the next patch?
Dec 4 2020
Rebase over landed tests