This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] [cmake] Disable appending -msse4.2 flag implicitly
ClosedPublic

Authored by mgorny on Jan 4 2017, 10:41 AM.

Details

Summary

Disable the code appending -msse4.2 flag implicitly when the compiler
supports it. The compiler support for this flags do not indicate that
the underlying CPU will support SSE4.2, and passing it may result in
SSE4.2 code being emitted *implicitly*.

If the target platform supports SSE4.2 appropriately, the relevant bits
should be already enabled via -march= or equivalent. In this case
passing -msse4.2 is redundant.

If a runtime detection is desired (which seems to be a case with SCUDO),
then (as gcc manpage points out) the specific SSE4.2 needs to be
isolated into a separate file, the -msse4.2 flag can be forced only
for that file and the function defined in that file can only be called
when the CPU is determined to support SSE4.2.

This fixes SIGILL on SCUDO when it is compiled using gcc-5.4.

Diff Detail

Repository
rL LLVM

Event Timeline

mgorny updated this revision to Diff 83082.Jan 4 2017, 10:41 AM
mgorny retitled this revision from to [compiler-rt] [cmake] Disable appending -msse* flags implicitly.
mgorny updated this object.
mgorny added reviewers: cryptoad, kcc, samsonov.
mgorny added a subscriber: cfe-commits.
cryptoad edited edge metadata.Jan 4 2017, 10:54 AM

Hey Michal,
If I understand correctly, my next move is to move the CRC32 code into it's own file an only enable SSE 4.2 for this file?
Shouldn't COMPILER_RT_HAS_MSSE4_2_FLAG be kept for that purpose or is there an alternative way to do it?
Thanks

Hey Michal,
If I understand correctly, my next move is to move the CRC32 code into it's own file an only enable SSE 4.2 for this file?

Yes, like that. Have a SSE4.2 CRC32 function in a separate file, and call it only when SSE4.2 is determined to be supported. Note that for this you'd need to refactor things a bit since it seems that the hw CRC32 code is used commonly be SSE4.2 and ARM.

Furthermore, you might want to do the runtime check only when SSE4.2 was not enabled at build time, i.e. skip it if user specified -msse4.2 or -march= option enabling it explicitly. I guess you could use the #ifdef for that (on a file where -msse4.2 is not forced).

Shouldn't COMPILER_RT_HAS_MSSE4_2_FLAG be kept for that purpose or is there an alternative way to do it?

I don't really see a point in keeping it if you can equally easily re-add it in a later patch.

Understood.
I'll let someone else chime on on the TSAN aspect of the patch.

dvyukov edited edge metadata.Jan 5 2017, 1:36 AM

Yes, for tsan runtime detection does not make lots of sense -- we only have few SSE instructions on inlined fast path. The SSE code provides quite substantial speedup. User -march flag won't help as runtime is prebuilt. SCUDO is sse4.2, but tsan is sse3. Do you actually see any problems with sse3? It's like 10 years old. I agree with you that strictly speaking we do wrong thing for old processors. But if we disable it, we disable it for everybody. So if you don't hit cashes with it, I would prefer to keep sse3 in tsan.

mgorny added a comment.Jan 5 2017, 7:52 AM

User -march flag won't help as runtime is prebuilt.

I meant the flag used by the compiler vendor when compiler-rt is built.

SCUDO is sse4.2, but tsan is sse3. Do you actually see any problems with sse3? It's like 10 years old. I agree with you that strictly speaking we do wrong thing for old processors.

Well, it seems that SSE3 is supported since Pentium 4 / newer versions of Athlon64. I don't see a big issue enabling it for x86-64 but for 32-bit systems you're ruling out quite a lot of hardware. My only pure x86-32 box (which I use to test stuff) does not support SSE3.

But if we disable it, we disable it for everybody. So if you don't hit cashes with it, I would prefer to keep sse3 in tsan.

Well, I guess that depends on how you build compiler-rt. On Gentoo, we just pass user CFLAGS (I need to think more on that, tbh). I would rather see the compiler vendor deciding on which CPUs are to be supported by the compiler, rather than the build system silently bumping the minimum for one component.

That said, I think we need to work on improved portability of compiler runtimes shipped with clang. I will probably start an open discussion on that soonish. FWICS we pretty much 'officially' support building just one or two variants of compiler-rt for different ABIs of a single architecture. This doesn't scale well. While I'm not sure how significant the performance impact is but right now we either force compiler-rt to not use newer instructions (= make it slower) or randomly cause -march= not to be respected.

I don't see a big issue enabling it for x86-64 but for 32-bit systems you're ruling out quite a lot of hardware.

Tsan does not support 32-bits.

mgorny added a comment.Jan 5 2017, 1:33 PM

Oh, sorry that I didn't check that. Then I guess it's good enough for me, we're losing just few old Athlon 64 CPUs, I guess. I'll update the patch to remove 4.2 only.

mgorny updated this revision to Diff 83298.Jan 5 2017, 1:36 PM
mgorny retitled this revision from [compiler-rt] [cmake] Disable appending -msse* flags implicitly to [compiler-rt] [cmake] Disable appending -msse4.2 flag implicitly.
mgorny updated this object.
mgorny edited edge metadata.

@cryptoad good to go then?

cryptoad accepted this revision.Jan 5 2017, 2:24 PM
cryptoad edited edge metadata.
This revision is now accepted and ready to land.Jan 5 2017, 2:24 PM
This revision was automatically updated to reflect the committed changes.