- User Since
- Aug 10 2016, 1:07 PM (105 w, 18 h)
The C++ standard just says "An integer literal of type std::size_t whose value is the alignment guaranteed by a call to operator new(std::size_t) or operator new(std::size_t)." I would take that in the obvious sense, that any pointer returned by operator new must have alignment __STDCPP_DEFAULT_NEW_ALIGNMENT__.
Tue, Aug 14
Mon, Aug 13
LGTM, but I'd feel more comfortable if a second set of eyes looked at this.
LGTM. (Do you want me to commit this for you?)
Fri, Aug 10
Code change LGTM, but someone familiar with the MIPS backend should review the test changes.
It might be worth discussing on llvmdev the question of whether it makes sense to add a flag to clang to control general undefined-behavior diagnostics, as a sort of best-effort thing when the optimizer finds it. We have lib/Analysis/Lint.cpp, but there isn't any clang flag to turn it on. Undefined behavior diagnostics are not something that would ever be reliable in the sense of consistently printing the same diagnostics across compiler versions/optimization levels, and it would always have false positives, so I don't think we would ever turn it on by default. But it could be useful to help users figure out "why did my code disappear" when the compiler generates something unexpected.
Example of what I mean; build with clang --target=hexagon -S -O2:
There's nothing wrong with simplifying code that you've proven has undefined behavior; we do that all over the place in LLVM. We already have code that specifically detects invalid loads/stores in instcombine (although it currently doesn't try to check alignment). That said, the Hexagon backend still has to be prepared for the possibility that some later pass will introduce the sequence "r0 = ##74565; r0 = memw(r0+#0)", because each of those instructions is independently valid. So this is just papering over the underlying compiler crash.
Thu, Aug 9
I guess we could assume anyone using memory-mapped hardware will only do it with legal types, so reducing the number of operations for illegal types doesn't matter. But please add a comment here (and to the related store transform) explaining that.
Your patch miscompiles the following:
Somewhere around CompilerInvocation::CreateFromArgs is probably appropriate.
Seems fine to me.
Do we need to check for homogeneous aggregates of half vectors somewhere?
Fix the pattern to generate a load with the misaligned address? That already happens. The problem is that we have passes that make changes based on the assumption that the instructions are valid.
Converting the memory operation to an explicit trap seems overly complicated, as opposed to just fixing whatever isel pattern is assuming the immediate is divisible by 4. But I don't know the Hexagon backend that well.
For volatile loads/stores, we should probably try to preserve the number and addresses of memory operations. If the source and destination types would lower to equivalent memory operations, like i32 vs. f32 on x86, the transform is fine. If they wouldn't, like f64 vs. i64 on x86-32, the transform is dubious; memory-mapped I/O can respond in strange ways if you don't use the right access width.
Wed, Aug 8
The computation of %LHS.HI * %RHS.HI is only necessary to compute the overflow bit.
We generally prefer not to emit diagnostics for undefined behavior from the backend because the code might be dynamically unreachable (and the user might not be able to do anything about it if the code is unreachable because the compiler cloned the code).
Please choose a different name for the tests; "muloti2.ll" isn't usefully indicating what the files actually test.
Oh, oops, I should have spotted that. No, the point of this was to separate the timing infrastructure, not to change the behavior of -ftime-report.
Tue, Aug 7
Updated to use a pass after isel to convert a call in tail position to a tail-call, rather than try to do the opposite in frame lowering. I'm not completely happy with this; the pass is a lot more code than I'd really like. But I don't have any better ideas.
I'd like an explanation for why the generated code is changing for AArch64... generating extra copies clearly seems like a downside. And there isn't any obvious reason for this change to impact register allocation: on AArch64, all i32 register classes contain exactly the same set of allocatable registers.
We should handle f16 and vectors of f16 in a consistent manner, for the sake of maintaining the code in the future. (Either handle both in the backend, or handle both in clang.)
Mon, Aug 6
This version makes a lot more sense. :)
SelectionDAGBuilder::visitSelect can also generate FMINNAN etc.; please add testcases.
I mean, which of the callers of startFrontendTimer() is calling it with a pointer to std::declval()?
What happened to this patch? It looks like it's a fix for a miscompile?
I would very much love to add a test that actually executes some machine code with a number of test vectors, but I believe no such test family exists in LLVM.
Fri, Aug 3
i16_cmpz specifically is a little weird... TargetLowering::SimplifySetCC has a transform which specifically impacts that case, but only if the immediate isn't legal. Maybe I can come up with something else to exercise the uxtb codepath.
This fix the bug reported in https://reviews.llvm.org/rL337903
getSCEV is still recurses without a limit, so this still blows out the stack, and gets very slow for complex expressions.
I think you need to add something like "dereferenceable(4)" or "byval" to the parameter. so it's recognized as dereferenceable.
Please fix the tests so they don't strcmp against uninitialized memory. Probably the simplest way to do that is use an argument instead of an alloca.
SmallVector has a method set_size to change the size of the vector without initializing the elements; you don't need to switch to unique_ptr just for that.
Thu, Aug 2
"0.0040" is four milliseconds? You're probably crediting time incorrectly, somehow. Can you tell which FrontendTimeRAII the time is coming from?
Needs tests for hardfloat ABI, in addition to soft-float.
Add to BitCode/compatibility-6.0.ll test and just keep this codegen one too?
Perhaps I can create a MIR test case with the SEH opcodes, and use llvm-objdump to check the .xdata section?
So, why was it added?
I'd like to get this right... both the CheckRetType check, and the "infinite loop" check from optimizeUnaryDoubleFP are probably relevant. Can we share code between the two functions?
Tue, Jul 31
These clearly shouldn't be target-specific.
I'm not really a fan of the LegalTypes boolean; it's just asking for people to pass the wrong value. Can we just check whether the type is legal, or something like that?
Yes, that's the right list of functions. Like the discussion noted, we don't really want to match gcc's behavior here because it breaks legacy code for little practical benefit.
I don't think we can reasonably do this for the functions which take a pointer+size pair; see http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html etc. Maybe we can mark specific calls where we know the size is nonzero.
I think there's something weird happening with your timers if they're showing half a second spent parsing or instantiating std::declval: it doesn't have a definition, anywhere (it's just a placeholder for template metaprogramming tricks).
Can we make <4 x half> a legal type on all subtargets with NEON, and just mark all the operations "expand" for subtargets which don't support math on it?
Not sure how often that shows up, but yes, if you can prove the the pointer is dereferenceable, it's okay to transform to memcmp.
Sure, I can add a few more tests.
Please don't commit patches on behalf of someone else if the author hasn't specifically requested it. There might be some issue which the author forgot to note on the review.
Mon, Jul 30
Yes, that's right. (Well, except that ISD::ROTL specifically is defined to wrap in the obvious way.)
If you want to expand strcmp inline, probably the best way to do that is to extend the existing ExpandMemCmpPass pass.
Oh, okay, that makes more sense.
This is not correct: consider, for example, strcmp("xx", "x").