Please regenerate the tests in a separate commit, to make it clear what this patch is actually changing. Please change the description to make it clear what this is actually fixing; as far as I can tell, we aren't generating "wrong code", just inlining memcpy and memset calls too aggressively.
Can you confirm that this behavior is acceptable?
I would suggest switching the option (arm-promote-constant) to default to false rather than commenting out the code, so we don't have to XFAIL the unit tests.
Fri, May 19
Thu, May 18
Conservatively bail out if (min_mask_index * 2 > NumElems || max_mask_index * 2 < NumElems), which means we are only accessing elements from one half of the input vector.
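For example, in IR (a hypothetical sketch; the names and element count are illustrative): with NumElems = 8 and a mask whose largest index is 3, max_mask_index * 2 = 6 < 8, so only the low half of the input is read.

    ; only elements 0..3 of %v are referenced by the mask
    %low = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>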
Wed, May 17
Looks good! A few minor comments. Please test carefully before merging.
I don't know enough about the possible source languages to know with 100% certainty that it's not possible to mix, say, unordered loads with ordered stores.
Tue, May 16
Cleanup looks fine.
Do we get any practical benefit from separately specifying whether the source and destination require unordered operations?
Mon, May 15
We don't want to write out a bunch of patterns here which are completely redundant, but this pattern in particular is likely to come up in code involving bitfields, so it seems fine.
Typically, canonicalization wants things *hoisted*, not *sunk*, to make the values available for redundancy elimination.
Wed, May 10
Looks fine, but someone who knows PPC better should approve.
Please update the comment in PPCISelLowering.h
(I'm planning to look at this and add some comments; expect a reply next week.)
Err, oh, I meant to add one minor comment: please rename "method-bad-param.mm" to "interface-return-type.mm".
Err, I meant the one I wrote earlier in this thread: https://reviews.llvm.org/D32759#748007
Our behavior for your testcase is less than ideal (we should be generating an invalid declaration instead of just dropping it on the floor), but that's not something you need to fix here.
It's a canonicalization of sorts; it could help pick up more complicated patterns like ((a ^ c1) | c2) ^ c3. Please make sure we have a regression test like this for instcombine.
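A minimal instcombine regression test along those lines might look like the following (the constants and names here are just illustrative):

    define i32 @xor_or_xor(i32 %a) {
      %t1 = xor i32 %a, 5      ; a ^ c1
      %t2 = or i32 %t1, 10     ; (a ^ c1) | c2
      %t3 = xor i32 %t2, 3     ; ((a ^ c1) | c2) ^ c3
      ret i32 %t3
    }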
The testcase I wrote should change behavior, I think.
A testcase would be nice... but otherwise, yes.
Tue, May 9
Mon, May 8
This makes sense to me, but I'll wait to see if anyone else has a comment.
clang only emits wchar_size on ARM at the moment; that might need to change before this gets merged.
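For reference, the module flag in question looks like this in IR (the i32 4 here assumes a target with a 4-byte wchar_t):

    !llvm.module.flags = !{!0}
    !0 = !{i32 1, !"wchar_size", i32 4}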
On 5/8/2017 9:36 AM, Tim Northover wrote:
On 26 April 2017 at 16:33, Eli Friedman via Phabricator wrote:
If you're looking to save size, we could put the jump table into the text segment just after the branch, and compute the jump destination in three instructions: one to load the address of the table, one to load the offset from the table, and one to add the offset to the address of the table. Is there some reason we shouldn't do that?
To get that first load down to 1 instruction we'd need to guarantee the jump table was within 128MB of the code, and I think we technically support larger functions. So we'd have to add something like ARM's full Constant Island pass instead of the newly generic branch relaxation we currently do. I don't think that is worth the maintenance burden; it's a constant source of bugs on ARM.
For the second offset addition, again you're compromising distance. The jump table itself then has to be within 256 or 65536 bytes of every basic block used. I suspect that even *with* complex placement that would be tough to arrange.
Sat, May 6
The difference between returning true and false here is just the way error recovery works: when we return true, we know the type is invalid, so we suppress it and any subsequent errors involving the declaration. Example (Objective-C++) where we currently print two errors:
Fri, May 5
Thu, May 4
Oh, oops, you're right. Could you change it to check isVectorTy() instead, to make that more obvious?
The second testcase isn't getting folded because ConstantFoldCastOperand isn't getting called (you call ConstantExpr::getCast and never attempt to fold it). IIRC there was a thread recently about a similar problem with InstSimplify of GEPs.
Wed, May 3
This looks very similar to a change we made internally (down to the name; we call it -polly-use-llvm-names). We originally did it because we were having trouble with differences in generated code between Release and Release+Asserts builds, but the performance improvement is a nice side-effect.
Tue, May 2
I'm not sure we need a target hook for this; replacing one instruction with two might not always be a good idea.
I'm not quite sure about this as written. It doesn't reduce the total number of instructions on its own, and it seems like an overly complicated pattern. You could approach this from a different angle: fold xor(icmp(pred, x, y)) to icmp(!pred, x, y) even if the inner icmp has multiple uses. Or maybe this is okay as-is; my intuition about what's right here isn't strong.
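To sketch that alternative fold in IR (hypothetical values; the inverse-predicate rewrite is the point):

    %cmp = icmp slt i32 %x, %y
    %not = xor i1 %cmp, true
    ; even if %cmp has other uses, %not itself can be rewritten as
    %inv = icmp sge i32 %x, %y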
LGTM, with some minor comments.
Mon, May 1
Fri, Apr 28
Your reasoning makes sense.
Thu, Apr 27
Another data point: reassociate currently prefers to canonicalize "fadd %x, %x" to "fmul %x, 2". We don't want to fight back and forth in IR.
(I don't understand how this is supposed to work, so I don't feel comfortable approving it.)
It would be nice to have an ARMv5TE test to make sure that the intrinsics that are supposed to work for that target actually do.
I would guess we should prefer fmul as the canonical form, for the same reason we prefer "shl %x, 1" over "add %x, %x": we optimize instructions where hasOneUse() is true more aggressively. I guess it doesn't make a big difference either way, though.
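Written out in IR, the two canonical forms mentioned above would be (a sketch):

    %a = fmul float %x, 2.000000e+00   ; preferred over: fadd float %x, %x
    %b = shl i32 %x, 1                 ; preferred over: add i32 %x, %x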
Wed, Apr 26
If I'm following correctly, for PIC small-code-model code (which is what mostly matters these days), the comparison is this:
And use getAnalysisIfAvailable(). Yes, I think that's it.
Could you preserve the DT without requiring it? Not that it matters much for performance, but it would be clearer why this pass is requesting the domtree.
I don't think you can mark aliasing like that; see https://bugs.llvm.org/show_bug.cgi?id=11763 .
I think you've covered the interesting cases; LGTM.
Tue, Apr 25
I was hoping that ISD::isConstantSplatVector() would have worked here, but it doesn't.
This should work correctly with alignment attributes, I think? IIRC those just change the return value of getDeclAlign().
Err, that's not what I meant...
If the alignment isn't explicitly set, it is initialized to 0.
I'm not sure your description is the full story about the ARM code for cmpgt_sext_inc_vec; it looks like the following gets simplified for AVX2 (might want to include this as a testcase):
Looks fine, but someone who knows the MSP430 backend better should look at this.
EarlyCSE is supposed to preserve the CFG (calls setPreservesCFG, depends on DomTree/MemorySSA etc.), so you can't replace a conditional branch with an unconditional branch. Replacing the value of the condition with "true" or "false" is fine, though.
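Concretely (an IR sketch of the distinction):

    %cond = icmp eq i32 %x, %y
    br i1 %cond, label %then, label %else
    ; fine:    replace uses of %cond, giving  br i1 true, label %then, label %else
    ; not fine: rewrite the terminator to     br label %then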
Mon, Apr 24
Note: it would still not be able to catch the following case --
There's another possibility: you could make the register allocator prefer to spill some other register that isn't on the critical path. Not sure if that's practical for the loops you care about.