- User Since
- Aug 10 2016, 1:07 PM (132 w, 2 d)
Please read http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface and repost the patch.
Thu, Feb 21
Is that up for debate in case this is an edge case that is unlikely to be hit?
I think the general approach is still fine. Given we have funnel shifts, we might want to reassociate to form funnel shifts, rather than just rotates, on targets which have native funnel shift instructions. (We'd still want to prefer rotates where possible, I think.)
Do we need special handling for fp16 NEON instructions in Thumb mode? I think the compiler won't predicate them anyway (it's deprecated, and some CPUs have known errata; see ARMBaseInstrInfo::isPredicable), but it might affect the assembler.
So I guess overall, there are three fixes here:
Yes, we should error in the backend, regardless of what clang does, in case some other frontend is missing the appropriate diagnostics.
Wed, Feb 20
Sorry about the delay.
Are you sure that Bugzilla link is right?
Needs a testcase.
Tue, Feb 19
The atomics documentation is out of date; the implementation was changed to lower "unsupported" atomic operations to __atomic_*& calls.
What's the general state of the LTO pipeline at this point? PassManagerBuilder::addLTOOptimizationPasses is adding passes in a really weird order (in particular, the placement of the inliner is really strange). Has anyone looked at it recently? Would it be worth killing it off in favor of the ThinLTO pipeline just so we don't have to maintain it?
I can understand why tests that use -O1 or -O2 would produce different results with the new pass manager, but it looks like not all the tests are like that. Do you know why those tests are failing?
It's odd nobody has run into this before, but LGTM with minor nits.
Yes, we fold loads from alloca to undef. (In GVN like you mention, but also in mem2reg.) LangRef should state memory allocated with alloca is uninitialized, and that loading from uninitialized memory produces undef; if either of those is missing, patch welcome to fix it.
Fri, Feb 15
I'm concerned that "nocapture" is not sufficient to describe the necessary property here.
Thu, Feb 14
I think this is in good shape.
Generally we should prefer to perform combines in DAGCombine in cases where it's straightforward. (There are a few scattered optimizations in legalization, but mostly for cases that can't simplify after legalization, like libcalls.)
Wed, Feb 13
Formally, I don't think C11 is a normative reference for C++11 or C++14, only C++17 (see [intro.refs] in the standard).
Tue, Feb 12
The official documentation still says "Your app must perform runtime detection to confirm that NEON-capable machine code can be run on the target device" (https://developer.android.com/ndk/guides/cpu-arm-neon#runtime_detection). Is that wrong?
Even if you just want to trigger the patched code, the smaller test case still need to be at least 16KB.
This looks essentially fine, but I'd like to see some basic test coverage for the changes to warnings and constant evaluation.
Mon, Feb 11
Did you mean declare as a target feature in RISCV.td or I misunderstanding something?
(I think we can improve the generated sequences for some of these, but you obviously don't need to do that here.)
and+rev16/rev32 isn't really any better than rev+lsr; that's fine as far as it goes. But please make sure we have coverage for cases where the zero-extension is free (e.g. the operand is a load, or a zeroext value, or the result of i32 arithmetic).
Fri, Feb 8
But since this block itself shouldn't be reachable this is pointless.
Without this patch is the fneg instruction getting turned into a subtraction instruction? I hope not on any platform!
I guess this is fine. LGTM
Thu, Feb 7
LGTM. Do you want me to commit this for you?
Yes, it probably makes sense to high vector extracts in some cases. Actually, for some loops, it might even make sense to introduce an extra shuffle outside the loop to avoid a high-vector extract inside the loop.
I guess we can track inlining separately, if you want to merge this quickly to unblock the Chrome build. LGTM
I did some quick testing with MSVC; apparently it inlines the implementations of these functions when optimizations are on. We definitely want to support inlining these. Since these are commonly used in performance-sensitive code, I'd prefer to implement the required changes to CodeGenFunction::EmitMSVCBuiltinExpr now, rather than chase after weird performance regressions in the future.
I'd like to see testcases for all the possible add/sub opcodes, if it isn't too much work.
Should clang IR generation be lowering these to bswap calls anyway? Even if the function technically exists in the ucrt, it's going to be pretty slow to call it.
Wed, Feb 6
It's sort of weird to have an instruction that has a predicate operand, yet can't be predicated... but I guess if that's the way the ISA is constructed, we should respect it.
Tue, Feb 5
That case is specifically interesting because when it's split, each half is lowered to a different shuffle, and both shuffles are expensive.
Mon, Feb 4
In test/CodeGen/arm64-microsoft-status-reg.cpp, you can write something like // CHECK-IR: %[[VAR:.*]] = call i64 @llvm.read_register.i64(metadata ![[MD2:.*]]), then // CHECK-IR-NEXT: store i64 %[[VAR]] on the next line. See also http://llvm.org/docs/CommandGuide/FileCheck.html .
Yes, we should fix CodeGenFunction::EmitAArch64BuiltinExpr to eliminated the unnecessary calls to CreateZext/CreateTrunc. (With this patch, they're no-ops, but better to clean up the code.)
How do we actually want this to work for LTO? Would it make sense to encode this on global variables somehow, to allow different thresholds for different source files?
Missing testcase changes. (It should be possible to check that we aren't inserting incorrect truncation/extension operations in the IR.)
In terms of a testcase, can you check the memory operand in MIR?
Sun, Feb 3
In terms of the general approach, what problem are we really trying to solve? The ability to forward a variadic argument list on to another variadic function doesn't grant any semantic power if you control the implementation; you can always just use a va_list instead (e.g. convert printf to vprintf). In C++, you can also just use a variadic template. And if the goal is just to allow the compiler to emit fortified calls to known library functions, it would be more straightforward to add a flag to implicitly instrument code.
Sat, Feb 2
Maybe a silly question, but is this actually worth implementing, at this point? The new IR additions have unusual semantics that transforms are continually going to trip over; I'm not convinced you found all the places that currently need checks. And there isn't any existing code we need to be compatible with, I think. If we were designing a frontend extension from scratch, we would probably take a different approach.
Fri, Feb 1
Forgot about protected scopes...