- User Since
- May 24 2016, 8:35 AM (155 w, 6 d)
LGTM. Thanks for making the fix.
Thu, May 16
LGTM. With one comment that I will leave to you for what you think is best.
Wed, May 15
Sorry guys, I was distracted by some other work.
Tue, May 14
Mon, May 6
Wed, May 1
I was under the impression that the shift was by the bottom byte amount. i.e the mask is 255, and a shift of 256 is the same as a shift of 0. I have not tried it though.
This is true for NEON vectors, not scalars, as far as I can tell.
I was under the impression that the shift was by the bottom byte amount. i.e the mask is 255, and a shift of 256 is the same as a shift of 0. I hve not tried it though.
Tue, Apr 30
Hello. I added a constant address case in rL358114, and a couple of other tests for minsize in rL358128. I was trying to look into X86 tests, but the addressing modes being so extensive makes it difficult. There will possibly be some RISCV testcases out of D60294 too.
Mon, Apr 29
Sun, Apr 28
I claim that it's better to make it obvious that these checks are slow, and make developers either guard them with EXPENSIVE_CHECKS or opt-in with Fast.
Looks like there is similar code in DAGCombiner that may need a volatile check too, if you search for isMultiStoresCheaperThanBitsMerge.
That being said, I don't think that people should asserting correctness of domtree within transforms, unless there's a strong need. This is generally not cheap, even at the Fast level, and because of that I think that if someone really wants to have a fast domtree verification in an assertion, they should make it explicit. Historically, Fast had been the default, and the running time penalty started to pile up, as many places assumed it's very cheap to perform. It's much easier to realize these assertions are fishy if they are not usually-cheap-but-not-quite by default.
Yeah, sounds like a good idea to me. I almost sugested the same thing. There are a load of tests that should still be using Full (i.e. the domtree construction still needs to be checked with basic/full).
Hello. I feel like this splitting just shouldn't be done for volatile stores.
Hello. I think it's fine to always use DominatorTree::VerificationLevel::Fast, the others are more about checking that the DomTree construction code was correct. Fast will compare against a new tree, so for updates will check the the tree is correct.
Fri, Apr 26
Wed, Apr 24
Tue, Apr 23
Since there aren't that many jump tables, the increase in code size is negligible.
With rL358845 in, the remaining results I ran look good. Consider us out of your way.
Apr 21 2019
Apr 17 2019
I don't think we here care about auto-updating, not supporting bitcode/lto libraries.
Apr 16 2019
OK. Sounds good to me. We can always change it if necessary.
Apr 15 2019
What are you thinking is best here? I agree that lifetime.end shouldn't prohibit optimisations. Do you think it's best to fix asan and go with this as-is, or to try and prevent sinking if it will only sink a lifetime.time?
I agree with this. Does it have a test anywhere?
Apr 11 2019
Hello. This looks like an interesting patch. Thanks for working on it. I ran some numbers and on Thumb1 targets (where resources are generally very constrained) this looks like a nice improvement.
Apr 10 2019
Thanks. I've reverted in r358113 and will attempt to add some extra tests in.
The patch makes SelectionDAG see more context, so it provides more optimization opportunities.
Apr 8 2019
I'm seeing a bit of an annoying error after this commit. On a native armv7a build, using this command line:
cmake -GNinja /work/llvm/llvm-project/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLIBCXXABI_USE_LLVM_UNWINDER=ON -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi;libunwind" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ '-DCMAKE_C_FLAGS= -target armv7a-linux-gnueabihf -march=armv7a' '-DCMAKE_CXX_FLAGS= -target armv7a-linux-gnueabihf -march=armv7a' ninja cxx
The LIBCXXABI_USE_LLVM_UNWINDER seems to be important. I see an error like:
projects/libcxxabi/src/CMakeFiles/cxxabi_shared.dir/cxa_vector.cpp.o: In function `__cxa_vec_new': cxa_vector.cpp:(.text.__cxa_vec_new+0x28): undefined reference to `__aeabi_uidiv' projects/libcxxabi/src/CMakeFiles/cxxabi_shared.dir/cxa_vector.cpp.o: In function `__cxa_vec_new2': cxa_vector.cpp:(.text.__cxa_vec_new2+0x28): undefined reference to `__aeabi_uidiv' projects/libcxxabi/src/CMakeFiles/cxxabi_shared.dir/cxa_vector.cpp.o: In function `__cxa_vec_new3': cxa_vector.cpp:(.text.__cxa_vec_new3+0x28): undefined reference to `__aeabi_uidiv' clang: error: linker command failed with exit code 1 (use -v to see invocation)
On v7a, the hardware divide was an optional feature so we generate those __aeabi_uidiv calls. I don't know the usual libcxx build process well enough, a lot has changed around here recently. Should we be adding -lgcc somewhere, in order to provide references to these symbols? Or is there a better way to build this? We don't see the same error's elsewhere, I don't think.
What does this do for codesize?
We are seeing a lot of codesize increases and performance changes from this patch, unfortunately. Consider this code, which I believe is fairly common in an embedded context:
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "thumbv6m-arm-none-eabi"
Apr 6 2019
Apr 3 2019
We saw some good looking codesize decreases from this, by the way. Just to add to the "this patch does good" side of the argument :)
Apr 1 2019
Looks like a good improvement to me.
Mar 29 2019
Do these need tests? Or does that come later once they do something?
Mar 28 2019
Mar 26 2019
Thanks. I had another look through and think this is in a really good state. I'm very happy for it to go in.
Yeah, OK. This looks like a good patch to me. As I said, I'm not a clang expert, but the code looks sensible enough. (Perhaps wait a couple of days in case others have objections.)
Looks like a nice improvement to me.
I don't pretend to know a huge amount about the frame lowering code, but this looks fine to me.
Mar 24 2019
Hello. I also don't feel very familiar with clang, but had a poke around and I think it looks pretty good. I see unroll and jam is being awkward again.
Oh, forgot to mention, I am looking forward to getting this in tree, and think it's very close. If we can sort out the last couple of things here, I think it would be good to get it committed so we can have a play around and sort out some of the questions like where in the pass pipeline it fits.
Mar 23 2019
From my understanding of TTI, LGTM.
Mar 22 2019
LGTM, thanks for putting the fix together.
Mar 21 2019
Mar 17 2019
Mar 14 2019
Removed isARMLowRegister check, reversed the condition of DestOffset - BrOffset, and added some Kill flag handling and tests.
Now with some mir tests.