GCC provides these functions (e.g. __addtf3) in libgcc on x86_64.
Since Clang supports __float128, we can enable the existing code by using
__float128 for fp_t whenever either __FLOAT128__ or __SIZEOF_FLOAT128__
is defined, instead of only supporting these builtins on platforms with
128-bit IEEE long doubles.
This change also replaces the CRT_LDBL_128BIT macro with CRT_HAS_F128 to
make clear that the code no longer depends on long double being a 128-bit IEEE float.
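A minimal sketch of the fp_t selection described above. Only fp_t, CRT_HAS_F128, __FLOAT128__ and __SIZEOF_FLOAT128__ come from this description; the helpers add_via_fp_t and has_f128 are hypothetical, and checking long double first (LDBL_MANT_DIG == 113 is the binary128 significand width) is an assumption, not necessarily the exact logic in the patch. Unlike the old CRT_LDBL_128BIT, the macro is defined in both branches.

```c
#include <float.h>

/* Sketch only: choose a 128-bit IEEE type for fp_t.
   Assumption: prefer long double when it already is binary128. */
#if LDBL_MANT_DIG == 113
typedef long double fp_t;
#define CRT_HAS_F128 1
#elif defined(__FLOAT128__) || defined(__SIZEOF_FLOAT128__)
typedef __float128 fp_t;
#define CRT_HAS_F128 1
#endif

#ifdef CRT_HAS_F128
/* binary128 occupies 16 bytes whichever type was chosen. */
_Static_assert(sizeof(fp_t) == 16, "fp_t must be a 128-bit type");

/* Hypothetical helper: on x86_64 the compiler lowers the fp_t addition
   to a call to __addtf3 (plus conversion libcalls for the casts). */
double add_via_fp_t(double a, double b) {
    fp_t x = (fp_t)a;
    fp_t y = (fp_t)b;
    return (double)(x + y);
}
#endif

/* Hypothetical helper: reports whether a 128-bit fp_t was selected. */
int has_f128(void) {
#ifdef CRT_HAS_F128
    return 1;
#else
    return 0;
#endif
}
```

On x86_64 Linux, where long double is the 80-bit x87 format, the first branch is skipped and the __float128 branch is taken, which is the case this change enables.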
The commit is rather large since it also updates all the tests. If this makes
it difficult to review, I'm happy to split the test changes into a separate