LLVM on RISCV (LucidCircuit)
clang/LLVM on ARM64 (Marvell).
Binutils, GCC
Soccer, Weightlifting
Photography
User Details
- User Since: Sep 3 2013, 12:07 AM (497 w, 6 d)
May 28 2021
Jun 21 2020
Feb 27 2019
Feb 26 2019
Feb 19 2019
One minor update:
Feb 15 2019
Corrected manual sorting in TargetLibraryInfo.def as per comments.
Reverted change in TargetLibraryInfo.cpp re: exp10|exp10f|exp10l on Linux with GLIBC.
Feb 14 2019
Addressed comments from Renato:
Updated changeset to reflect comments/questions from Renato.
Added inline comments/answers to Renato's questions.
Feb 13 2019
Hi Renato,
Feb 6 2019
This is a small but significant update to the original changeset.
This is a significant update to the original version of the SLEEF changeset.
Jan 20 2019
Thank you Hal and Renato.
Jan 17 2019
Jan 10 2019
Yes, I know, everyone was away for the holidays. :-)
Jan 9 2019
Yes, I realize people were on holidays. :-)
Dec 19 2018
Removed spurious patch for an unrelated change.
Dec 17 2018
Updated version of this changeset/patch, as per Renato's latest comments from D53927 (https://reviews.llvm.org/D53927).
Updated version of the patch/changeset, as per comments from Renato:
Dec 11 2018
Hi Renato,
Dec 10 2018
Nov 24 2018
Nov 20 2018
Nov 16 2018
Ping!!!
Nov 12 2018
If there are no more comments directly related to this changeset, can we move this along?
- @steleman I don't understand some of the values in your benchmarks. In particular, sin and cos should have similar timings, not differ as much as they do in your report. I wonder whether the choice of CLOCK_PROCESS_CPUTIME_ID might have caused this. I think that CLOCK_PROCESS_CPUTIME_ID might translate into a syscall, and therefore add significant overhead to the measurement. I'd rather use CLOCK_MONOTONIC. Also, to make sure you are measuring just the function latency, I think you should run the benchmark on an array of smaller size, and invoke the call a couple of times before actually starting the time measurement, to reduce the noise caused by warm-up effects.
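The measurement approach suggested in the comment above (CLOCK_MONOTONIC, a small input array, a few untimed warm-up passes) could be sketched in C roughly as follows. This is only an illustration and not the test case linked in this thread: `elapsed_ns`, `bench_ns_per_call`, and `poly_kernel` are hypothetical names, and `poly_kernel` is a stand-in workload so that the sketch does not depend on libm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Nanoseconds elapsed between two timespec readings. */
static double elapsed_ns(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e9
         + (double)(end.tv_nsec - start.tv_nsec);
}

/* Hypothetical stand-in for the function under test (e.g. sin or cos). */
static double poly_kernel(double x)
{
    return x * x + 1.0;
}

/*
 * Hypothetical helper: average nanoseconds per call of fn.
 * Runs `warmup` untimed passes over the n inputs first, then times
 * `reps` passes with CLOCK_MONOTONIC.
 */
static double bench_ns_per_call(double (*fn)(double),
                                const double *in, double *out,
                                size_t n, int warmup, int reps)
{
    struct timespec t0, t1;

    assert(n > 0 && reps > 0);

    /* Untimed warm-up passes to reduce cold-cache and lazy-binding noise. */
    for (int w = 0; w < warmup; ++w)
        for (size_t i = 0; i < n; ++i)
            out[i] = fn(in[i]);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; ++r)
        for (size_t i = 0; i < n; ++i)
            out[i] = fn(in[i]);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_ns(t0, t1) / ((double)reps * (double)n);
}
```

To time a libm function instead of the stand-in, one would pass `sin` or `cos` as `fn`, include `<math.h>`, and link with `-lm`. Keeping `n` small enough for the arrays to stay in cache is an assumption about what "array of smaller size" was meant to achieve.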
Nov 9 2018
Ping!!
Nov 6 2018
- Reverted to using the non _u35 SLEEF function names as per comment from @shibatch.
Nov 5 2018
- changed the -fveclib=<X> argument value to 'sleefgnuabi'.
- added atan2 and pow.
- spreadsheet with comparison between libm and sleef is here:
https://docs.google.com/spreadsheets/d/1lcpESCnuzEoTl_XHBqE9FLL0tXJB_tZGR8yciCx1yjg/edit?usp=sharing
- comprehensive test case in C with timings is here:
https://drive.google.com/open?id=1PGKRUdL29_ANoYebOo3Q59syhKp_mNSj
Nov 1 2018
Oct 31 2018
The corresponding Clang changeset is: https://reviews.llvm.org/D53928.
Sep 27 2018
Added unit test case for Cavium processors - T99 and T88.
Sep 21 2018
Sep 19 2018
Le Ping!
Sep 11 2018
Jan 24 2018
Updated per latest comments.
Jan 23 2018
Updated per latest comments:
Jan 22 2018
I am not sure I follow. I think @MatzeB's issue was with using getProcFamily(), and it should be fine to just check the subtarget feature in enableAggressiveFMAFusion. @MatzeB summarized some benefits of making it a subtarget feature here: https://reviews.llvm.org/D40177#936974
Jan 11 2018
Ping!
Jan 8 2018
Ping!
Jan 7 2018
Updated diff with latest changes:
Dec 7 2017
Updated large test case to use --check-prefix={CHECK-FMA|CHECK-GENERIC}.
Included Florian Hahn's small test case for FMA.
Metadata in the large LLVM IR test case is required.
Dec 5 2017
Dec 4 2017
Dec 1 2017
Nov 30 2017
Superseded by:
Nov 28 2017
Nov 27 2017
Please avoid getProcFamily() checks outside of AArch64SubtargetInfo. Use target features and transfer the logic into SubtargetInfo!
Nov 22 2017
Also, if I understand correctly, this should have an impact on scheduling instructions that use WriteAtomic, like CASB.
Nov 21 2017
Added test case.
Nov 17 2017
Aug 4 2017
Restored the definition for defm atomic_load_nand in TargetSelectionDAG.td, which I had removed by mistake.
Aug 3 2017
Updated changeset to use LDOPregister_patterns.
Implements all the LD<OP> LSE Atomics with the exception of NAND.
Updated test in atomic-ops.lse to cover all memory ordering models.
Aug 1 2017
I have uploaded the LLVM patch that exhibits this problem here:
Jul 31 2017
It does not appear that the multiclass design you are advocating here does what we'd expect it to do.
Jul 26 2017
Progress update: updated with the latest changes.
Jul 13 2017
Updated and corrected AArch64DeadRegisterDefinitions::ShouldSkip.