- User Since
- Aug 10 2016, 1:07 PM (223 w, 5 d)
Fri, Oct 30
Thu, Oct 29
This looks like it should be at least two separate changes; I think the AArch64ISelLowering change should have some impact on its own.
Wed, Oct 28
but for some reason the matching does not work for f16
Maybe add a testcase for -0 in case someone else makes the same mistake?
Maybe add a testcase for -0 in case someone else makes the same mistake? Otherwise LGTM
Tue, Oct 27
I don't think this is likely to significantly impact most out-of-tree users; no need to temporarily deprecate.
We could turn this into a more general combine, We use fmov from a GPR to materialize fp constants in other cases. But maybe just zero is fine to start.
We should have corresponding handling for va_arg. (clang doesn't produce va_arg instructions, but other frontends do.)
Oct 22 2020
I think getNodeIfExists() should have the same semantics with respect to flags as getNode().
Sorry I didn't spot this sooner, but the current version of the memcpy-to-temp still isn't quite right. The operations aren't in the right order: you have to do all the copies to temporaries before any of the copies to arguments.
I'm worried the usage of ANY_EXTEND and TRUNCATE in D89950 is going to cause a mess in the future. LegalizeDAG doesn't know how to legalize most vector operations; LegalizeVectorOps handles them. If someone tries to use "Promote" on a target where those operations aren't legal, it'll break. And then someone will try to "fix" LegalizeDAG, and cause other problems.
At a high level, it should be fine to assume an indirect call without the "convergent" attribute isn't convergent. If the language rules for some language say the callee of an indirect call might be convergent, the frontend can add the convergent attribute to that call. (Unlike most attributes, it's a negative attribute: it restricts optimizations, and attribute inference optimizations would remove it,)
Oct 21 2020
This is not PowerPC specific patch, though ppcfp128 could be expanded.
Almost all the new setDoesNotCapture calls are wrong. If a call returns a pointer base on an argument, it can't be marked nocapture.
Oct 20 2020
Rebase and clang-format.
A couple special cases might improve performance:
but the common instructions are moved to the preceding block, and the IfConversion feasibility is then re-evaluated before any predication is performed.
Phabricator doesn't track incremental patches the way you're uploading them; please upload the full diff against master.
First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang, the regression in the non-coroutine cases are likely unacceptable and they shouldn't pay for it if not using coroutines.
Oct 19 2020
but that would also mean we would never be able to optimize out redundant pthread_self() calls
For the first point, if the IR definition isn't consistent, I'd prefer to actually fix that, instead of work around it. There are a lot of places that assume alias analysis is accurate.
Oct 18 2020
Oct 17 2020
Oct 16 2020
I'm a little suspicious that the DAGCombine changes don't have any effect on x86 regression tests; are we missing test coverage?
I think this looks good. But I'd like a second opinion since this is a substantial change to SCEV modeling.
Assuming we have properly typed SCEVs, we can pretend the high bits of the pointer don't exist in a pointer SCEV. We can't pretend they don't exist in an integer SCEV: that leads to miscompiles. This is why SCEVPtrToIntExpr, specifically, needs special handling.
I'm not sure I completely understand the history from https://reviews.llvm.org/D42123. But I think the idea is that for pointer-type SCEVs, we want to do SCEV arithmetic using the index type because the other bits aren't relevant: from my understanding, they don't participate in pointer arithmetic.
Oct 15 2020
Also update the documentation? See https://clang.llvm.org/docs/UsersManual.html#differences-between-various-standard-modes .
This seems like the right approach. It feels weird, but I can't think of anything better.
Remove the behaviour that the calling-convention is non-C when there's a mismatch between the triple and the float-abi and do something different to make sure we end up with the right calling convention in the end (maybe have -mfloat-abi implicitly modify the triple like -march and -mcpu do).
Makes a lot more sense now. LGTM
Oh, also, are you planning to implement the SCEV simplifications in this patch, or a followup? I can see reasons to go either way, I guess. I'd weakly lean towards implementing them here, so we can avoid churn implementing handling for cases that can't happen after simplification.
Oct 14 2020
In general, if TargetLibraryInfo says a library function is available, that means it's available with the C calling convention. Copying the calling convention from some other call shouldn't be necessary. In general, there is no call to copy the calling convention from.
Oct 13 2020
Try running opt -analyze -scalar-evolution on the following:
Statistics changes for CTMark, this patch w/o maxobjsize deduction.
What does the third operand of an align bundle mean? It doesn't seem to be documented in LangRef.
Can you add a testcase to check we properly trigger the error? Otherwise LGTM.
Oct 12 2020
The title is still "WIP", but I don't see anything obviously missing.
LGTM with one minor comment.
I thought we kept it around intentionally because it has a different set of callee-save registers? If it turns out we don't care about that, sure, I guess we can drop it.
Oct 11 2020
Would it make sense to add target-independent SADDO_CARRY/SSUBO_CARRY? We already have SETCCCARRY.
Oct 9 2020
To clarify, the suggestion here is to add a boolean flag to callCapturesBefore() that determines whether arguments should always be considered capturing?
Cost modeling is target-specific yes, and also depends on the alignment of the copy.
Do you have commit access? If not, I can commit it for you; what should I put in the git "Author" line?
Oh, I see, the unmodified callCapturesBefore result is correct if you're trying to determine whether the the call might modify a value that can be consumed by a load, but not if you're trying to determine if a store clobbers memory that might be used in another thread.
this and https://reviews.llvm.org/D72365 ?
I would assume that the first pass of ResolveUndefsIn isn't that expensive relative to the other work SCCP does, so it would be more important to ensure we don't need additional passes. And probably we want to work towards removing the branch handling before we invest more effort into optimizing ResolveUndefsIn.
Okay, I wasn't sure how the builtin tests handle that sort of thing.