DC GZVA can operate on multiple granules at a time (a block that
typically corresponds to the CPU's cache line size), so we can
generally expect it to be faster than STZG in a loop.
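To illustrate the reasoning, here is a minimal sketch of how a granule-aligned range might be split between STZG and DC GZVA. It is not the code from this patch: `decodeDczid`, `planTagging`, and `TagPlan` are hypothetical names, and the sketch only does the arithmetic rather than issuing the instructions. It relies on the architectural layout of DCZID_EL0 (BS in bits [3:0] is log2 of the block size in 4-byte words, DZP in bit 4 prohibits the DC ZVA family) and on the 16-byte MTE granule.

```cpp
#include <cstdint>

// MTE tag granule size in bytes.
constexpr std::uint64_t kGranule = 16;

// Decoded view of DCZID_EL0.
struct DczidInfo {
  bool prohibited;          // DZP, bit 4: DC ZVA/GZVA not permitted
  std::uint64_t blockBytes; // 4 << BS, where BS is bits [3:0]
};

inline DczidInfo decodeDczid(std::uint64_t dczid) {
  return {((dczid >> 4) & 1) != 0, std::uint64_t{4} << (dczid & 0xf)};
}

// Number of operations needed to tag [begin, end) (hypothetical helper).
struct TagPlan {
  std::uint64_t headStzg;   // STZG granules before the first aligned block
  std::uint64_t gzvaBlocks; // whole blocks covered by DC GZVA
  std::uint64_t tailStzg;   // STZG granules after the last aligned block
};

// Assumes begin and end are multiples of kGranule and begin <= end.
inline TagPlan planTagging(std::uint64_t begin, std::uint64_t end,
                           DczidInfo info) {
  const std::uint64_t total = (end - begin) / kGranule;
  // Fall back to STZG only if DC GZVA is prohibited or the block is
  // smaller than one granule.
  if (info.prohibited || info.blockBytes < kGranule)
    return {total, 0, 0};

  const std::uint64_t mask = info.blockBytes - 1; // block size is a power of 2
  const std::uint64_t alignedBegin = (begin + mask) & ~mask;
  const std::uint64_t alignedEnd = end & ~mask;
  if (alignedBegin >= alignedEnd)
    return {total, 0, 0}; // range does not contain a whole block

  return {(alignedBegin - begin) / kGranule,
          (alignedEnd - alignedBegin) / info.blockBytes,
          (end - alignedEnd) / kGranule};
}
```

With a 64-byte block, each DC GZVA covers four granules, so a large range needs roughly a quarter as many instructions as a pure STZG loop, plus a few STZGs for the unaligned head and tail.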
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM
I wonder if doing the size check before the DCZID check could speed up small allocations; raising the threshold value might also help.
But we can worry about that later.
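For reference, the reordering suggested above could look roughly like the sketch below. This is an assumption-laden illustration, not the patch's code: `chooseMethod`, `TagMethod`, and `kAssumedThreshold` are hypothetical, the threshold value is a placeholder, and the DCZID_EL0 read is passed in as a callable so the ordering can be exercised without AArch64 hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

enum class TagMethod { StzgLoop, DcGzva };

// Placeholder value; the actual threshold in the patch is not shown here.
constexpr std::size_t kAssumedThreshold = 128;

// Picks a tagging strategy, checking the size first so that small
// allocations never pay for the DCZID_EL0 read.
inline TagMethod chooseMethod(std::size_t size,
                              const std::function<std::uint64_t()> &readDczid) {
  if (size < kAssumedThreshold)
    return TagMethod::StzgLoop; // small allocation: skip the DCZID check

  const std::uint64_t dczid = readDczid();
  const bool prohibited = ((dczid >> 4) & 1) != 0; // DZP bit
  return prohibited ? TagMethod::StzgLoop : TagMethod::DcGzva;
}
```

Whether this helps in practice would depend on how expensive the register read is relative to the rest of the small-allocation path, which would need measuring.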