User Details
- User Since: Dec 19 2022, 12:21 PM (23 w, 2 d)
Apr 28 2023
Apr 27 2023
(no change; re-trigger CI)
Apr 26 2023
(no change, re-trigger CI)
Remove unnecessary checks
Add test case for sret fallback and replace CHECK-NOT with -fast-isel-abort=3
Apr 21 2023
We care a lot about compile times, and in our application GlobalISel (when it succeeds; the fallback rate is about 10%) increases compile times by ~50% compared to FastISel. We do plan to look at GlobalISel performance at some point, but it seems like there's a lot of work to be done to bring it at least on par with FastISel with respect to compile times.
Add check whether subtarget actually supports CRC.
Gentle ping
Note that I'm not entirely happy with duplicating the test cases; changing the existing cases in arm64-crc32.ll could also work, but currently generates an extra uxtb/uxth with FastISel. Let me know what you think.
(format changes)
Apr 20 2023
ping?
Apr 14 2023
Apr 13 2023
Make Name const.
Apr 12 2023
Change test name.
Also test llvm.x86.sse42.crc32.64.8 auto-upgrade
Apr 11 2023
Add i686 tests. -fast-isel-abort=3 doesn't work, so I instead check that the call wasn't missed.
Format changes + re-trigger CI
Apr 6 2023
Formatting fixes
No changes, just to trigger rebuild from updated D145791.
Apr 5 2023
@RKSimon ping? We get a consistent 0.5%–1.5% compile-time improvement from these changes.
Mar 15 2023
@RKSimon Are you ok with these two patches or do you have any concerns/alternative ideas?
Mar 10 2023
I don't think there are "obviously missed" problems; IMHO the main problem is the abstraction itself, which for every write involves a function call (raw_ostream::write, unlikely to be inlined due to size and unlikely to be optimizable as the buffer pointers/mode are mutable members) and an indirect function call (write_impl, extremely unlikely to be devirtualized due to general code complexity). The impact of the latter could probably be reduced by buffering in raw_svector_ostream, but that'd be a substantial and massively breaking change with unknown (and hardly quantifiable) benefits. If you have any suggestions for more generally applicable approaches, I'd be happy to try them; but right now getting rid of the (apparently completely unneeded) abstraction appears to be the easiest, least intrusive, and most performant change.
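To make the overhead described above concrete, here is a minimal, self-contained sketch of the pattern. It does not use LLVM: ByteStream, VectorStream, emitLE32ViaStream, and emitLE32Direct are hypothetical stand-ins that only mimic the shape of the problem (a non-virtual write that forwards to a virtual write_impl-style hook, versus appending straight to the underlying vector). It makes no claim about how the actual patch is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stream base class; NOT LLVM's raw_ostream.
class ByteStream {
public:
  virtual ~ByteStream() = default;

  // Every write goes through this function and then through a virtual call.
  void write(const char *Ptr, std::size_t Size) { writeImpl(Ptr, Size); }

  void writeByte(std::uint8_t B) {
    char C = static_cast<char>(B);
    write(&C, 1);
  }

private:
  // Indirect call; hard to devirtualize once the call site is far from the
  // place where the concrete stream type is known.
  virtual void writeImpl(const char *Ptr, std::size_t Size) = 0;
};

// Hypothetical stream that appends to a caller-owned vector.
class VectorStream final : public ByteStream {
  std::vector<char> &Out;

public:
  explicit VectorStream(std::vector<char> &Out) : Out(Out) {}

private:
  void writeImpl(const char *Ptr, std::size_t Size) override {
    Out.insert(Out.end(), Ptr, Ptr + Size);
  }
};

// Emits a 32-bit little-endian value one byte at a time through the stream:
// four write calls plus four virtual calls.
void emitLE32ViaStream(ByteStream &OS, std::uint32_t V) {
  for (int I = 0; I < 4; ++I)
    OS.writeByte(static_cast<std::uint8_t>(V >> (8 * I)));
}

// Emits the same bytes directly into the vector; push_back is a small
// function the compiler can inline, with no indirect call involved.
void emitLE32Direct(std::vector<char> &Out, std::uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<char>(static_cast<std::uint8_t>(V >> (8 * I))));
}
```

Both functions produce identical bytes; the difference is only in how many non-inlinable calls sit on the path of each byte. Whether that difference is measurable depends on the workload, so this should be read as an illustration of the argument, not as a benchmark.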
Attempt to fix bolt
It essentially boils down to having function calls for every written byte, which adds up. I haven't tested with (Thin)LTO yet, but a quick glance at the disassembly of a Fedora-built LLVM (which uses ThinLTO) seems to indicate that such function calls are not eliminated.
Jan 3 2023
Thanks! I have no commit access; could you commit this for me (Alexis Engelke <engelke@in.tum.de>)?
Ping