DC GZVA can operate on multiple granules at a time (corresponding to
the CPU's cache line size), so we can generally expect it to be faster
than STZG in a loop.
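To make "multiple granules at a time" concrete, here is a minimal sketch of the arithmetic, not code from this patch. It assumes the DC GZVA block size is taken from DCZID_EL0 (BS in bits [3:0], DZP in bit 4) and that an MTE granule is 16 bytes. The helper names `dczidBlockBytes` and `granulesPerBlock` are hypothetical, and the register value is passed in as a plain integer so the sketch runs on any host.

```cpp
#include <cstddef>
#include <cstdint>

// MTE tag granule size in bytes; one STZG covers exactly one granule.
constexpr std::size_t kGranuleBytes = 16;

// Decodes a raw DCZID_EL0 value (hypothetical helper, not from the patch).
// Returns the DC ZVA / DC GZVA block size in bytes, or 0 if the DZP bit
// (bit 4) reports that these instructions are prohibited.
constexpr std::size_t dczidBlockBytes(std::uint64_t dczid) {
  if (dczid & (1u << 4))
    return 0;
  // BS (bits [3:0]) is log2 of the block size in 4-byte words.
  return std::size_t{4} << (dczid & 0xF);
}

// Number of 16-byte granules that a single DC GZVA would cover.
constexpr std::size_t granulesPerBlock(std::uint64_t dczid) {
  return dczidBlockBytes(dczid) / kGranuleBytes;
}
```

With BS = 4 (a 64-byte block), one DC GZVA covers the same range as four STZG instructions.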
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM
I wonder if doing the size check before the DCZID check could speed up small allocations, and maybe raising the threshold value could help.
But we can worry about that later.
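For reference, the reordering suggested above could look roughly like the sketch below. This is an illustration under assumptions, not the code in this diff: `chooseTagStore`, `TagStore`, and `kThresholdBytes` are hypothetical names, the threshold value is a placeholder, and the DCZID_EL0 read is modelled as a callable so the ordering can be checked on any host.

```cpp
#include <cstddef>
#include <cstdint>

enum class TagStore { StzgLoop, DcGzva };

// Placeholder threshold; the value used in the patch is not shown here.
constexpr std::size_t kThresholdBytes = 512;

// Picks a tagging strategy for an allocation of `size` bytes.
// `readDczid` stands in for reading DCZID_EL0 and is only invoked
// once the size check has passed.
template <typename ReadDczid>
TagStore chooseTagStore(std::size_t size, ReadDczid readDczid) {
  // Size check first: small allocations never pay for the DCZID check.
  if (size < kThresholdBytes)
    return TagStore::StzgLoop;
  const std::uint64_t dczid = readDczid();
  // DZP (bit 4) set means DC GZVA is prohibited; fall back to STZG.
  if (dczid & (1u << 4))
    return TagStore::StzgLoop;
  return TagStore::DcGzva;
}
```

Whether this ordering is a measurable win for small allocations would still need benchmarking.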
clang-format: please reformat the code