I think IEEE-754 does define this
there is a possibility of a miscompile when threading an edge across the LH
Is this a general comment or referring to a change in Kevin's patch?
Maybe mention in the commit message that this is explicitly not addressing target-specific code, or calls to memcpy?
The problem with poison is that it eventually leads to UB, and then your program has no defined meaning. Practically, it might mean some codepath that involves a call to llvm.experimental.constrained.fptosi.i32.f64 could get folded away because it's provably UB, or something like that.
It looks like the arm64-neon-2velem.ll regressions are a shuffle lowering issue, yes; we're creating a DUPLANE where the operand is an extract_subvector, and it doesn't simplify.
What happens if the input float is out of range? fptosi/fptoui instructions produce poison; not sure if you want that here.
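To make the concern concrete: if poison is not wanted, the conversion has to be guarded. A minimal sketch (the helper name and the NaN policy are made up here, not part of the patch under review) of a saturating conversion that keeps the out-of-range input away from the raw conversion:

```c
#include <limits.h>
#include <math.h>

/* Hypothetical helper: clamp the double into i32 range before converting,
 * so the underlying fptosi never sees an out-of-range (poison-producing)
 * input. NaN is mapped to 0 as an arbitrary but defined choice. */
static int saturating_fptosi(double x) {
    if (isnan(x))
        return 0;
    if (x <= (double)INT_MIN)
        return INT_MIN;
    if (x >= (double)INT_MAX)
        return INT_MAX;
    return (int)x;  /* in range: conversion is well defined */
}
```

A target that wants the wrapping or trapping behavior instead would obviously make a different choice in each branch; the point is only that the out-of-range case is decided explicitly.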
Fix Thumb1FrameLowering::emitPrologue and Thumb1FrameLowering::emitEpilogue so they don't scavenge a register inside of frame setup/teardown. Instead, just pick a register we know is available.
The third instruction in the function looks like it is using r6 as the base pointer to save one of the argument registers on the stack, but r6 isn't set until further down.
The problem here is that the IR semantics don't allow this transform in general.
Sorry, I meant to reply to this earlier. If you've tested the performance on PowerPC, it's probably fine. ARM has fewer registers, but not so few that it's likely to cause problems here.
Mon, Jun 24
Improved comment on ExtraCSSpill.
Fri, Jun 21
It looks like you didn't change all the uses of isFast() in optimizePow?
It would be nice to use the exact necessary fast-math flags here, while we're thinking about it, instead of just "isFast()". From the discussion, it seems like we only need "afn"?
I believe it's the default strategy for O0 AArch64 codegen these days
Thu, Jun 20
Tests committed in r363991 (https://reviews.llvm.org/rL363991)
I think this is sound now.
The testcase I actually wanted, which incorrectly forms a tail call:
Wed, Jun 19
Should clang translate that call into "sqrtf" to be more accurate?
AVR uses a push instruction to store arguments; therefore, the order of the store (push) instructions can't be changed.
So we would need to mark every instruction that uses any part of DSPControl as having a post-isel hook, and there are more than 100 such instructions.
How do we test that?
Tue, Jun 18
I'm just laying out the basic requirements for getting this patch back in, because the current patch is invalid given LLVM's current requirements.
If we're going to insert emms instructions automatically, it doesn't really make sense to do it in the frontend; the backend could figure out the most efficient placement itself. (See lib/Target/X86/X86VZeroUpper.cpp, which implements similar logic for AVX.) The part I'd be worried about is the potential performance hit from calling emms in places where other compilers wouldn't, for code using MMX intrinsics.
AVR intentionally makes the C type "double" a single-precision float ("float" in IR). So we could fail to recognize "fabs" etc. on AVR with this patch, I think?
Are you saying that using MMX in LLVM requires source-level workarounds in some way, and so we can't lower portable code to use MMX because that code will (reasonably) lack those workarounds?
Now, we could theoretically use a different ABI rule for vectors defined with Clang-specific extensions, but that seems like it would cause quite a few problems of its own.
Mon, Jun 17
Switched to a single vector for mapping symbols. Switched isArmElf() to use getEMachine(). Updated title/commit message.
Thu, Jun 13
Also, it looks like you never cc'ed llvm-commits; please abandon this and post a new revision with llvm-commits properly cc'ed
Please fix the "summary" to include the full expected commit message.
Is the isLoopEntryGuardedByCond actually proving what you need it to prove? Even if Start-Stride is in the range [0, End), that doesn't necessarily imply Start-Stride doesn't overflow. For example, suppose Start is 0, End is -1, and Stride is 2.
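The counterexample can be checked directly with 32-bit unsigned arithmetic (a sketch, using `(uint32_t)-1` for the End value from the comment):

```c
#include <stdint.h>

/* Returns 1 when "Start - Stride is in [0, End)" holds even though the
 * subtraction wrapped, i.e. the range guard does not rule out overflow. */
static int guard_holds_despite_overflow(uint32_t Start, uint32_t End,
                                        uint32_t Stride) {
    uint32_t Pre = Start - Stride;   /* unsigned wraparound is defined in C */
    int in_range = Pre < End;        /* the fact the guard proves */
    int overflowed = Start < Stride; /* true mathematical result is negative */
    return in_range && overflowed;
}
```

With Start = 0, End = 0xFFFFFFFF, Stride = 2, the subtraction wraps to 0xFFFFFFFE, which still lies in [0, End), so the guard is satisfied even though the value overflowed.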
For format-strings.c, I'm not really happy suggesting #if defined(__sun) && !defined(__LP64__), but I don't think the alternative is better. We could restrict the test so it doesn't run using a Solaris target triple, but we actually want coverage here: the difference in wint_t affects the semantics of "%lc", so we want some coverage of that path.
Wed, Jun 12
Addressed review comments. Added release note. I think this patch contains all the changes necessary to reflect the change to IR semantics.
I'd prefer to take this soon, and deal with the performance later, given this is a serious bug. I wouldn't be surprised if this is actually causing issues in practice, but nobody realized it because it's non-deterministic.
What happens if findOptimalMemOpLowering fails? We have other ways of lowering memcpy/memset that could also violate the "one store per byte" rule.
Tue, Jun 11
The general idea of restricting the intrinsics to specific contexts makes sense. I'm not sure it makes sense to mark expressions, as opposed to types, though; can we really expect the user to know which expressions to apply this to?
Mon, Jun 10
Address review comments, minor code cleanup, fix ConstantExpr::getAsInstruction, fix regression tests.
We definitely need afn for this; powi performs multiple intermediate rounding steps, so it can be significantly less accurate than pow.
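For reference, powi is typically expanded by square-and-multiply, so every multiplication is a separate rounding step, whereas pow can be computed with a single final rounding. A sketch of that expansion (illustrative, not the exact code LLVM emits):

```c
/* Square-and-multiply expansion of x^n: note that each "*=" below is an
 * independent rounding step, which is why the result can drift from a
 * correctly-rounded pow(x, n). */
static double powi_expand(double x, unsigned n) {
    double result = 1.0;
    while (n) {
        if (n & 1)
            result *= x;  /* rounding step */
        x *= x;           /* rounding step */
        n >>= 1;
    }
    return result;
}
```

The results are exact only when every intermediate product is representable, e.g. for small integer bases.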
Fri, Jun 7
In terms of the correctness of this patch, I'm specifically concerned that this patch has no equivalent to the SCCNodes.size() != 1 check in FunctionAttrs, so it will derive norecurse in cases where FunctionAttrs would not. For example:
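The original example isn't reproduced here, but the shape of the problem can be sketched with a pair of mutually recursive functions (the names are made up for illustration): neither function calls itself directly, so a per-function check could wrongly derive norecurse, yet the two together form a multi-node SCC that does recurse, which is what the SCCNodes.size() != 1 check in FunctionAttrs guards against.

```c
/* Hypothetical illustration: even_fn and odd_fn form a two-node SCC.
 * Looked at in isolation, neither calls itself, but each can re-enter
 * the other, so marking either one norecurse would be wrong. */
static int odd_fn(unsigned n);

static int even_fn(unsigned n) {
    return n == 0 ? 1 : odd_fn(n - 1);
}

static int odd_fn(unsigned n) {
    return n == 0 ? 0 : even_fn(n - 1);
}
```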
Feel free to take it over, particularly since it looks like you have a testcase you can easily share.
Thu, Jun 6
We already fold llvm.is.constant to true early, in ConstantFoldScalarCall.
Instead of making LCSSA preserve StackProtector, could you run LCSSA before StackProtector, and make StackProtector preserve LCSSA?
This probably doesn't precisely match the generated code for all simple types, but it makes sense for at least i64/float/double. LGTM
Wed, Jun 5
Is it actually correct to derive norecurse like this?
I think fundamentally, the problem here is that the registers in question aren't being handled consistently.
Tue, Jun 4
It's legitimate for a target to reject addrspacecasts which don't make sense.
If you look at some targets, they provide multiple address-spaces which are essentially identical; see, for example, the implementation of isNoopAddrSpaceCast on x86. Not sure what the complete justification was, but it can be convenient if you need to dereference null.
For the cases involving sign bits, are we actually expecting the IR to look like this when we reach SelectionDAG? With some form of D62818, I would expect we end up with "icmp slt"...
Mon, Jun 3
Is it likely that we're ever going to be able to do anything useful with the fast-math flags?
It should be possible to write a testcase based on the simplification? Otherwise looks fine.
On the instcombine side, one thing worth noting which isn't called out in the commit message is the interaction with other instcombine patterns. In the testcase, note that the final IR actually doesn't contain any mask; instead, it checks icmp slt i32 [[SHL]], 0. Huihui, please update the commit message to make this clear.
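The mask-to-compare rewrite rests on a simple equivalence: testing the top bit with a mask is the same as a signed compare against zero. A small sketch of both forms (hypothetical helper names):

```c
#include <stdint.h>

/* (v & 0x80000000) != 0 is equivalent to (int32_t)v < 0, which is why
 * the final IR can carry an "icmp slt ..., 0" instead of an explicit mask. */
static int sign_bit_via_mask(uint32_t v) { return (v & 0x80000000u) != 0; }
static int sign_bit_via_slt(uint32_t v)  { return (int32_t)v < 0; }
```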
I would like to avoid having "undead" registers, which are marked dead but are semantically necessary, at any point. That means never calling setIsDead(false) as part of instruction selection, even if the flags are eventually correct after post-isel hooks run. Otherwise it's very confusing for anyone reading the code, or MIR dumps, to understand how it's supposed to work.
Fri, May 31
I looked more carefully just as I was about to commit this, and discovered the hashing for MO_RegisterMask is also wrong. I think I'll fix that separately, though.
Oh, I see...
If you don't want to continue working on this, that's okay, I think. D62450 looks like it's the more interesting fix.
Can you add test coverage for some other target, maybe x86?
LGTM. I'm okay with working on adding more test coverage and fixes for the ADDCARRY cases in a followup.
We could possibly use a custom inserter to generate the vins sequence, but it would probably involve some benchmarking to make sure there aren't any unexpected performance penalties due to the weird register usage. So I'm happy to put that off for now.
The following testcase still crashes with this patch. Given that, I don't think it makes sense to merge this patch; even if it lets some current version kernel build, someone else is likely to hit the same issue in the near future.
Eventually I think the code in IsEligibleForTailCallOptimization will disappear if we make ip an allocatable register, but this seems okay for now.
Something I ran into when reviewing https://reviews.llvm.org/D62639 is that on AArch64, for varargs functions, we emit floating-point stores when noimplicitfloat is specified. That seems fine for -mno-implicit-float, but maybe not for -mgeneral-regs-only?