GCC provides these functions (e.g. __addtf3) in libgcc on x86_64.
Since Clang supports __float128, we can enable the existing code by using
__float128 for fp_t if either __FLOAT128__ or __SIZEOF_FLOAT128__ is defined,
instead of only supporting these builtins on platforms with a 128-bit IEEE
long double.
This change also replaces the CRT_LDBL_128BIT macro with CRT_HAS_F128 to
indicate that it doesn't depend on long double being a 128-bit IEEE float.
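
For reviewers, the type selection described above could look roughly like the sketch below. This is not the code from the patch: the SKETCH_/sketch_ names are hypothetical, and the __LDBL_MANT_DIG__ == 113 check is an assumed way of detecting a 128-bit IEEE long double.

```c
/* Illustrative sketch only; SKETCH_* and sketch_* are hypothetical names. */
#if defined(__FLOAT128__) || defined(__SIZEOF_FLOAT128__)
/* The compiler exposes __float128 directly (e.g. Clang or GCC on x86_64). */
typedef __float128 sketch_fp_t;
#define SKETCH_HAS_F128 1
#elif defined(__LDBL_MANT_DIG__) && __LDBL_MANT_DIG__ == 113
/* Assumption: a 113-bit mantissa means long double is IEEE binary128. */
typedef long double sketch_fp_t;
#define SKETCH_HAS_F128 1
#endif

#ifdef SKETCH_HAS_F128
/* Without hardware support, this addition is typically lowered to a
 * runtime library call such as __addtf3. */
static inline sketch_fp_t sketch_add(sketch_fp_t a, sketch_fp_t b) {
  return a + b;
}
#endif
```

Keying the feature macro on the availability of a 128-bit type, rather than on the width of long double, is what lets x86_64 use the same code path.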
The commit is rather large since it also updates all the tests. If this makes
it difficult to review, I'm happy to split the test changes into a separate