- User Since
- Aug 10 2016, 1:07 PM (70 w, 2 h)
Instead of scattering casts all over the place, can we just fix TargetLibraryInfoImpl::isValidProtoForLibFunc to reject libcalls which use integers with the wrong width?
Adding some reviewers to hopefully find someone comfortable reviewing the Linux multilib bits. (Not sure I'm adding the right people; this stuff hasn't been touched for a while, as far as I can tell.)
However, I do wonder, is the attribute propagation semantically correct?
How does this interact with https://reviews.llvm.org/D40863 ?
No, that ends up doing the wrong thing.
How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4
I guess I didn't explain my comment completely in the other review. The point is that you submit the new tests with CHECK lines that pass on master, then modify CHECK lines in the second patch, to make it clear how the patch modifies the generated code.
You still never answered the question of what happens when there's multiple indirectbrs whose edges can't be split.
Minor typo: I meant "one function attribute set by clang", not "one clang attribute".
If it finds any vector code, the pass would override the prefer-vector-width attribute to the length of the widest vector in that code.
Mon, Dec 11
After this patch, I plan to start using this subtarget feature in X86ISelLowering.cpp to tell the type legalizer and assorted lowering code not to use 512-bit vectors.
Generally looks fine, but probably needs a pass from a PPC reviewer.
Oh, I see. Yes, sign-extending should work: https://rise4fun.com/Alive/Ux9 . Please add a your explanation as a comment (and fix the code so it actually sign-extends).
Fri, Dec 8
This might be something we want to turn on automatically for -Og.
Any DAG transformation that change divergent pattern to not-divergent or vice versa is illegal.
It's interesting to me that these array-bound checks don't seem to use @llvm.objectsize in some form already.
I can't see how that DAGCombine transform is actually profitable if the type isn't legal... but I guess we might as well just fix the legalizer so we don't trip over it again.
Wed, Dec 6
I can reproduce the crash with the following:
LGTM. We clearly should be canonicalizing expressions like this, and DAGCombine can reverse the transform if necessary.
Okay, then this LGTM.
Does ReplaceAllUsesWith need to propagate changes to the "IsDivergent" bit?
Yes, that looks correct.
This change isn't a no-op: the code inside the if statement drops the calling convention/attributes/metadata/etc. on the call. Is that intentional?
Tue, Dec 5
As testcases with hundred of arguments are unwieldy
Mon, Dec 4
It's an improvement in that it can handle single indirectbr predecessor cases.
What is the best way to modify the code for this compiler change ?
The part which seems to be throwing you off is "or the class member access expression refers to a member of an unknown specialization". A "member of an unknown specialization" is defined in [temp.dep.type] in the standard.
Looking at LegalizeDAG, when we expand a TruncStore we do it as a store of a truncate which IIUC is strictly bitvector-level interpretation.
Fri, Dec 1
Looked a bit more; we're running out of registers because we don't consider r12 an allocatable register in Thumb1 mode. Changing that would be a project way outside the scope of this bugfix, so this is fine as-is for now. LGTM.
On a sort-of-related note, we currently miscompile the following:
Why are we running out of registers here? We should copy the function pointer to r12, like we do for Thumb2 or ARM.
Thu, Nov 30
Larger than 128 doesn't improve performance or increase unrolling -- at least on SPECcpu2017, libquantum and Google Protobufs, on T99.
The PGO gen/use passes currently fail with an assert failure if there's a critical edge whose source is an IndirectBr instruction.
128 is a really big number for LoopMicroOpBufferSize. You might want to consider modifying AArch64TTIImpl::getUnrollingPreferences with some more tailored heuristics.
Still worried about the effect on tools which parse clang diagnostics... please send a message to cfe-dev. Hopefully we'll get responses there.
_Float128 is only *sometimes* the same type as __float128.
I'm still kind of concerned that this is messier than it needs to be. I guess I'll try experimenting a bit.
Wed, Nov 29
You probably just need to pass a "-triple" flag so we don't use Windows mangling or something like that.
By many years of precedent, the "frem" instruction is supposed to match the C fmod(), as opposed to something else like the C99 remainder(); probably worth clarifying in LangRef.
Mon, Nov 27
Does this support any transforms which aren't supported by the memdep-based DSE?
I don't think we need any legality checking because the new insert is identical to the existing insert except that it may have a different constant insertion operand.
Do you know how much impact this has, in the sense of making this transform trigger more often in benchmarks?
Wed, Nov 22
Rather than trying to prevent the flattening from happening in the first place, it might make more sense to reverse it later in the pass pipeline. That would be more general, and avoid blocking other optimizations. The x86 backend already does this in some limited cases; see X86CmovConversion.cpp.
Please post patches with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface)
Tue, Nov 21
LGTM, but please test your patches more carefully in the future.
Mon, Nov 20
Oh, I see, for some silly reason there are actually *three* -mfloat-abi options: hard, soft, and softfp. hard means float instructions and a hard-float calling convention, soft means no floating-point and a soft-float convention, and softfp means float instructions and a soft-float convention. This is probably worth clarifying with a comment.
-mfpu controls what floating-point/vector instructions the compiler generates. -mfloat-abi controls whether floating-point arguments to functions are passed in floating-point registers. These are completely independent, and we need to support using both of them together to generate NEON code with a soft-float ABI. (Android is built like this.)
Fri, Nov 17
That said, this transform seems straightforward enough. LGTM.
From the perspective of canonicalization, it seems kind of weird to have "%x uno %y" rather than "(%x uno 0) | (%y uno 0)". I mean, the former is probably cheaper to lower, but it leads to a lot of special-case logic to handle the equivalency.
So now you make all returns (predicated or not) non-analyzable from analyzeBranch's perspective?
I think this is okay for now; we can clean it up when the backend is fixed.
Thu, Nov 16
This would be a lot simpler if we could disregard errno when we have relaxed FP math, but I'm assuming it's possible to do something like '-ffast-math -fmath-errno' and still expect errno to be set for domain errors?