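As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, which have no operator equivalent, the comparisons can already be written with the generic vector operators and work as expected, e.g.: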
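```c
typedef float v4sf __attribute__((__vector_size__(16)));

v4sf fcmpgt(v4sf a, v4sf b) {
  /* The comparison yields an int vector (-1/0 per lane); the explicit cast
     reinterprets those bits as v4sf, matching the builtin's return type. */
  return (v4sf)(a > b);
}
```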
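For ord/unord, one possible workaround is to build the predicates from self-comparisons, since `x == x` is false only when `x` is NaN. This is a minimal sketch assuming GCC/Clang vector extensions; `v4si`, `fcmpord` and `fcmpuno` are hypothetical names, and nothing here is claimed about the code the compiler actually generates for it:

```c
typedef float v4sf __attribute__((__vector_size__(16)));
typedef int   v4si __attribute__((__vector_size__(16)));

/* Hypothetical helper: a lane is "ordered" when neither input is NaN.
   x == x is false only for NaN, so AND the two self-comparisons. */
v4si fcmpord(v4sf a, v4sf b) {
  return (a == a) & (b == b);
}

/* Hypothetical helper: a lane is "unordered" when either input is NaN. */
v4si fcmpuno(v4sf a, v4sf b) {
  return (a != a) | (b != b);
}
```

Each lane is -1 (all bits set) when the predicate holds and 0 otherwise, the same convention the vector comparison operators use.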
I'll link a patch for the corresponding LLVM codegen tests next. A follow-on for that side would be to auto-upgrade and remove the LLVM intrinsics.