User Details
- User Since: Jun 10 2015, 2:25 AM
Jan 30 2021
LGTM, thanks @rsanthir.quic!
Jan 29 2021
Jan 5 2021
Jun 23 2020
LGTM with a nit. Can you also remove FPRegs16Pat from ARMInstrFormats.td now that it is no longer used?
Hi Simon, thanks for working on this. Looks good overall. A few remarks inline.
Jun 18 2020
Rebased
Addressed last round's review comments.
Jun 17 2020
Changes from last revision:
- the code generation relies on fullfp16 being present,
- the unit test also checks the codegen for the soft-float ABI (see the sketch after this list).
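A minimal C sketch of the kind of function such a test exercises (the function name is hypothetical; __bf16 is the ACLE storage-only type):

```
/* Hypothetical test subject: a __bf16 round-trip whose calling
 * convention differs between float ABIs. */
__bf16 copy_bf16(__bf16 x) {
    /* With -mfloat-abi=soft the value is passed and returned in a core
     * register; with the hard-float ABI and +bf16/+fullfp16 it stays in
     * an FP register, which the codegen patterns above rely on. */
    return x;
}
```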
Jun 16 2020
Jun 11 2020
- Separated the fp16-specific codegen patterns in https://reviews.llvm.org/D81505.
- Removed the bfloat type handling from isHomogeneousAggregate, since a bfloat aggregate is not considered homogeneous according to the AAPCS reference (see the sketch after this list).
- Rebased on top of https://reviews.llvm.org/D75169.
- Added some very basic tests.
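As a hedged illustration of the isHomogeneousAggregate point (struct names invented; the __bf16 classification follows the comment above, per the AAPCS revision referenced at the time):

```
/* Four identical FP members form a Homogeneous Floating-point
 * Aggregate (HFA) and are passed in FP registers under the
 * hard-float AAPCS. */
struct hfa4 { float f[4]; };

/* Per the change above, an aggregate of __bf16 is not classified as
 * an HFA, so it falls back to the ordinary argument-passing rules. */
struct bf4 { __bf16 b[4]; };
```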
Jun 9 2020
Jun 8 2020
Hey Oliver, thanks for looking at this.
I believe the codegen patterns for vmov and load/store half are incorrect for the bf16 type. Can someone suggest the right approach?
Jun 4 2020
Jun 3 2020
Jun 1 2020
May 28 2020
May 26 2020
Should poly128_t be available on AArch32 too? I don't see anything in the ACLE version you linked restricting it to AArch64 only, and the intrinsics reference has a number of intrinsics available for both ISAs using it.
It should, but it is not that simple. The reason it is not available is that __int128_t is not supported on AArch32. I think that is future work, since this patch unblocks the bfloat reinterpret_cast patch, which btw is annotated with TODO comments regarding the poly128_t type for AArch32.
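For context, a sketch of where poly128_t surfaces in ACLE code (vmull_p64 is the real intrinsic and additionally requires the crypto/AES extension; the wrapper name is made up):

```
#include <arm_neon.h>

/* vmull_p64 returns poly128_t, so the type has to exist before the
 * intrinsic can be exposed; on AArch32 that is blocked on __int128
 * support, as noted above. */
poly128_t pmull128(poly64_t a, poly64_t b) {
    return vmull_p64(a, b);
}
```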
May 20 2020
Jan 20 2020
Sep 29 2019
Sep 27 2019
However, passing the AArch64 architecture names in target-cpu isn't supported by LLVM.
The Clang documentation suggests that arch is used to override the CPU, not the Architecture (which is rather confusing if you ask me). GCC makes more sense, having separate target attributes for CPU and Architecture (see the equivalent GCC documentation). I think target-cpu should remain generic when it is not explicitly specified either on the command line (-mcpu) or as a function attribute (e.g. target("arch=cortex-a57")). However, if the function attribute specifies an Architecture (e.g. target("arch=armv8.4a")), I agree we should favor the subtarget features corresponding to armv8.4 over those of the command line. Similarly, if it specifies a CPU, we should favor the subtarget features corresponding to cortex-a57 (not sure if we do so atm - I think we don't). ARM and AArch64 have a way to list the implied target features using the TargetParser, but we can't directly use that in CodeGenModule because it's tied to the backend.
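A small C illustration of the two attribute spellings under discussion, using the examples from the paragraph above (the function bodies are placeholders):

```
/* Clang: "arch=" is (confusingly) used to name a CPU. */
__attribute__((target("arch=cortex-a57")))
void tuned_for_a57(void) { /* ... */ }

/* Clang: "arch=" naming an actual architecture; here the armv8.4
 * subtarget features should win over the command-line CPU's. */
__attribute__((target("arch=armv8.4a")))
void needs_v84(void) { /* ... */ }
```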
Sep 26 2019
Updated the FileCheck labels as suggested.
Sep 25 2019
Sep 23 2019
I am not sure I am following here. According to https://llvm.org/docs/Atomics.html, the AtomicExpandPass will translate atomic operations on data sizes above MaxAtomicSizeInBitsSupported into calls to atomic libcalls. The docs say that even though the libcalls share the same names with clang builtins, they are not directly related to them. Indeed, I hacked the AArch64 backend to disallow codegen for 128-bit atomics and as a result LLVM emitted calls to __atomic_store_16 and __atomic_load_16. Are those legacy names? I also tried emitting IR for the clang builtins and I saw atomic load/store IR instructions (like those in your tests), no libcalls. Anyhow, my concern here is that if sometime in the future we replace the broken CAS loop with a libcall, the current patch will break ABI compatibility between v8.4 objects with atomic ldp/stp and v8.X objects without the extension. Moreover, this ABI incompatibility already exists between objects built with LLVM and GCC. Any thoughts?
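A hedged sketch of the scenario described above: a 16-byte atomic access that, once the backend stops reporting native 128-bit support (MaxAtomicSizeInBitsSupported < 128), AtomicExpandPass lowers to the libcalls named earlier (type and function names invented):

```
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } pair128;

/* Without native 128-bit atomics these become calls to
 * __atomic_load_16 / __atomic_store_16 instead of inline code. */
pair128 load128(_Atomic pair128 *p) { return atomic_load(p); }
void store128(_Atomic pair128 *p, pair128 v) { atomic_store(p, v); }
```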
Sep 20 2019
Hi Tim, thanks for looking into this optimization opportunity. I have a few remarks regarding this change:
- First, it appears that the current codegen (CAS loop) for 128-bit atomic accesses is broken, based on this comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814#c3. There are two problematic cases as far as I understand: (1) const and (2) volatile atomic objects. Const objects disallow write access to the underlying memory; volatile objects mandate that each byte of the underlying memory shall be accessed exactly once, according to the AAPCS. The CAS loop violates both (see the sketch below).
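The two cases as a hedged C sketch (function names invented; assuming C17, where atomic_load accepts const- and volatile-qualified operands). A CAS-loop lowering of atomic_load writes back to the object, which is invalid for (1) and touches bytes more than once for (2):

```
#include <stdatomic.h>

typedef struct { long long a, b; } S; /* 16 bytes */

/* (1) const object: the memory may be read-only, so the CAS loop's
 * write-back is invalid even though it stores back the same value. */
long long read_const(const _Atomic S *p) { return atomic_load(p).a; }

/* (2) volatile object: the AAPCS requires each byte to be accessed
 * exactly once, which a retrying CAS loop cannot guarantee. */
long long read_volatile(volatile _Atomic S *p) { return atomic_load(p).a; }
```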
Jul 14 2019
Jul 4 2019
Added the dependency of mve on dsp and some missing tests to cover those cases.
Jul 3 2019
Jul 2 2019
Jul 1 2019
I've split the patch.
@simon_tatham, thanks for clarifying. I think my change is doing the right thing then: it favors the -mfpu option over the default CPU features. I will split the patch as @ostannard suggested.
Jun 28 2019
Dec 17 2018
Committed as https://reviews.llvm.org/rL349338
Dec 12 2018
I've tested the patch with native builds of the llvm-test-suite on an AArch64 Cortex-A72 and couldn't spot anything interesting in terms of compilation time.
Dec 11 2018
Ping
Dec 4 2018
Nov 30 2018
Looks fine. Thanks!
Nov 28 2018
Nov 6 2018
Nov 2 2018
Rebased and clang-formatted.
Oct 29 2018
I've autogenerated the FileCheck lines to show the diff compared to the trunk codegen. To make sure we never fall through to the next block having changed the CC but not swapped (N2, N3), I've moved all the preconditions to the beginning of the block (instead of moving the block into a helper function).
Oct 24 2018
Oct 15 2018
Oct 12 2018
Sep 26 2018
Not ready for review. Using this as a reference to an RFC on llvm-dev.
Sep 12 2018
Sep 7 2018
Aug 28 2018
Rebased rL338240, since the excessive memory usage observed when using GVNHoist with UBSan has been fixed by rL340818 (https://reviews.llvm.org/D50323).
Apologies for delaying this, I was out of office. I'll rebase and push it asap.
Aug 10 2018
Aug 9 2018
So, is everyone happy with this change?
Aug 7 2018
Aug 6 2018
This got reverted because of an out-of-memory error on an ubsan buildbot. Details and fix here -> https://reviews.llvm.org/D50323. I'll update the tests upon rebase.
Jul 30 2018
Did you test it with some benchmarks? Results?
I am running LNT, SPEC2000, and SPEC2006 on AArch64 at the moment. I'll post results soon.