This is a roll forward of D101895 with two additional fixes:
Original Patch description:
This is a follow up on D101524 which:
- simplifies cpu features detection and usage,
- flattens target dependent optimizations so it's obvious which implementations are generated,
- provides an implementation targeting the host (march/mtune=native) for the mem* functions,
- makes sure all implementations are unittested (provided the host can run them).
Additional fixes:
- Fix uninitialized ALL_CPU_FEATURES
- Use non pseudo microarch as it is only supported from Clang 12 on