- User Since
- Aug 10 2016, 1:07 PM (243 w, 5 d)
Fri, Apr 9
Thu, Apr 8
Please add a test for (char*)0-(char*)0.
Tue, Apr 6
The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:
llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))
Fri, Apr 2
This feels scary: the C standard technically allows this, but we haven't done it in the past, and it could break otherwise functioning code. (We've only assumed alignment about pointers that are dereferenced/dereferenceable.)
Thu, Mar 18
Normally, I'd expect some register is naturally free in the prologue, but you could get into weird situations. On 32-bit specifically, consider compiling with -mregparm=3; I think there are no registers which are unconditionally safe in that case. One possibility is to always use EAX, and just save/restore it if necessary. See isEAXAlive in X86FrameLowering::emitPrologue.
Wed, Mar 17
To clarify, I've done some more reading now, and figured out where I went wrong. For a long time, LLVM did not emit accurate unwind info to describe the prologue/epilogue (and still doesn't on some targets), so I was under the impression it wasn't possible. Clearly, it is, and it's implemented on x86.
Err, sorry, please pretend I didn't write that. I have no idea what I just wrote.
Generally, unlike the formats used on Windows etc., DWARF unwind isn't accurate in the prologue. I mean, you could add separate unwind info for each relevant instruction, but that would be a lot of data, and we currently don't have any option to do that.
Mon, Mar 15
Mar 10 2021
Mar 3 2021
Feb 24 2021
I'd be satisfied with a description of the issue in Coroutines.ts. Having a description of the problem is the first step to fixing it.
Feb 18 2021
I want to see changes to LangRef and/or the coroutine documentation to describe the semantic restriction. If there's a correctness issue, it clearly isn't specific to LICM, so I want to see the rule described in general terms. And please ping llvm-dev when you have it written up.
Dec 18 2020
@rsmith, can you look at the changes to overloading? I haven't looked at that code in a very long time.
Dec 17 2020
Dec 15 2020
LGTM with one minor comment.
The approach looks reasonable. Please ensure this this documented in LangRef.
Dec 8 2020
I'm not sure this direction really makes sense.
Nov 30 2020
Oct 30 2020
Oct 29 2020
This looks like it should be at least two separate changes; I think the AArch64ISelLowering change should have some impact on its own.
Oct 28 2020
but for some reason the matching does not work for f16
Maybe add a testcase for -0 in case someone else makes the same mistake?
Maybe add a testcase for -0 in case someone else makes the same mistake? Otherwise LGTM
Oct 27 2020
I don't think this is likely to significantly impact most out-of-tree users; no need to temporarily deprecate.
We could turn this into a more general combine, We use fmov from a GPR to materialize fp constants in other cases. But maybe just zero is fine to start.
We should have corresponding handling for va_arg. (clang doesn't produce va_arg instructions, but other frontends do.)
Oct 22 2020
I think getNodeIfExists() should have the same semantics with respect to flags as getNode().
Sorry I didn't spot this sooner, but the current version of the memcpy-to-temp still isn't quite right. The operations aren't in the right order: you have to do all the copies to temporaries before any of the copies to arguments.
I'm worried the usage of ANY_EXTEND and TRUNCATE in D89950 is going to cause a mess in the future. LegalizeDAG doesn't know how to legalize most vector operations; LegalizeVectorOps handles them. If someone tries to use "Promote" on a target where those operations aren't legal, it'll break. And then someone will try to "fix" LegalizeDAG, and cause other problems.
At a high level, it should be fine to assume an indirect call without the "convergent" attribute isn't convergent. If the language rules for some language say the callee of an indirect call might be convergent, the frontend can add the convergent attribute to that call. (Unlike most attributes, it's a negative attribute: it restricts optimizations, and attribute inference optimizations would remove it,)
Oct 21 2020
This is not PowerPC specific patch, though ppcfp128 could be expanded.
Almost all the new setDoesNotCapture calls are wrong. If a call returns a pointer base on an argument, it can't be marked nocapture.
Oct 20 2020
Rebase and clang-format.
A couple special cases might improve performance:
but the common instructions are moved to the preceding block, and the IfConversion feasibility is then re-evaluated before any predication is performed.
Phabricator doesn't track incremental patches the way you're uploading them; please upload the full diff against master.
First of all, we cannot drop the readnone tag in the definition of pthread_self in Clang, the regression in the non-coroutine cases are likely unacceptable and they shouldn't pay for it if not using coroutines.
Oct 19 2020
but that would also mean we would never be able to optimize out redundant pthread_self() calls
For the first point, if the IR definition isn't consistent, I'd prefer to actually fix that, instead of work around it. There are a lot of places that assume alias analysis is accurate.
Oct 18 2020
Oct 17 2020
Oct 16 2020
I'm a little suspicious that the DAGCombine changes don't have any effect on x86 regression tests; are we missing test coverage?
I think this looks good. But I'd like a second opinion since this is a substantial change to SCEV modeling.
Assuming we have properly typed SCEVs, we can pretend the high bits of the pointer don't exist in a pointer SCEV. We can't pretend they don't exist in an integer SCEV: that leads to miscompiles. This is why SCEVPtrToIntExpr, specifically, needs special handling.
I'm not sure I completely understand the history from https://reviews.llvm.org/D42123. But I think the idea is that for pointer-type SCEVs, we want to do SCEV arithmetic using the index type because the other bits aren't relevant: from my understanding, they don't participate in pointer arithmetic.
Oct 15 2020
Also update the documentation? See https://clang.llvm.org/docs/UsersManual.html#differences-between-various-standard-modes .
This seems like the right approach. It feels weird, but I can't think of anything better.
Remove the behaviour that the calling-convention is non-C when there's a mismatch between the triple and the float-abi and do something different to make sure we end up with the right calling convention in the end (maybe have -mfloat-abi implicitly modify the triple like -march and -mcpu do).
Makes a lot more sense now. LGTM
Oh, also, are you planning to implement the SCEV simplifications in this patch, or a followup? I can see reasons to go either way, I guess. I'd weakly lean towards implementing them here, so we can avoid churn implementing handling for cases that can't happen after simplification.
Oct 14 2020
In general, if TargetLibraryInfo says a library function is available, that means it's available with the C calling convention. Copying the calling convention from some other call shouldn't be necessary. In general, there is no call to copy the calling convention from.
Oct 13 2020
Try running opt -analyze -scalar-evolution on the following:
Statistics changes for CTMark, this patch w/o maxobjsize deduction.