This is an archive of the discontinued LLVM Phabricator instance.

tsan: switch from SSE3 to SSE4.2
ClosedPublic

Authored by dvyukov on Jul 28 2021, 4:51 AM.

Details

Summary

Switch x86_64 requirement for optimized code from SSE3 to SSE4.2.
The new tsan runtime will need few instructions that are only
supported by SSE4:

_mm_max_epu32
_mm_extract_epi8
_mm_insert_epi32

SSE3 was introcued in 2004 and SSE4 in 2006:
https://en.wikipedia.org/wiki/SSE3
https://en.wikipedia.org/wiki/SSE4

We are still providing non-optimized C++ version of the code,
so either way it's possible to build tsan runtime for any CPU.

But for Go this will bump strict requirement for -race because
Go contains prebuilt versions and these will be built with -msse4.2.
But requiring a CPU produced at least in 2006 looks reasonable for
a debugging tool (more reasonable than disabling optimizations
for everybody).

Diff Detail

Event Timeline

dvyukov created this revision.Jul 28 2021, 4:51 AM
dvyukov requested review of this revision.Jul 28 2021, 4:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2021, 4:51 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

+Keith, so that you know what's coming. Do you see any issues with this?

vitalybuka accepted this revision.Jul 28 2021, 9:41 AM
This revision is now accepted and ready to land.Jul 28 2021, 9:41 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Aug 3 2021, 3:54 AM

I'm getting oodles of build warnings [1], "clang: warning: argument unused during compilation: '-msse4.2' [-Wunused-command-line-argument]". I guess this is due to this change?

This is on macOS.

1: http://codepad.org/rJ7Y2BQE

I'm getting oodles of build warnings [1], "clang: warning: argument unused during compilation: '-msse4.2' [-Wunused-command-line-argument]". I guess this is due to this change?

This is on macOS.

1: http://codepad.org/rJ7Y2BQE

Weren't you getting the same warning for sse3 before?
This change only changes the version, it does not change anything in nature.
Addition of this flag is also guarded by COMPILER_RT_HAS_MSSE4_2_FLAG. Are these COMPILER_RT_HAS* broken on macOS? I see it's done here:

compiler-rt/cmake/config-ix.cmake:check_cxx_compiler_flag("-Werror -msse3" COMPILER_RT_HAS_MSSE3_FLAG)
compiler-rt/cmake/config-ix.cmake:check_cxx_compiler_flag("-Werror -msse4.2" COMPILER_RT_HAS_MSSE4_2_FLAG)

Are these warnings turned into errors by -Werror on macOS?

It is x86_64, right? Not arm64? If it's arm64, then COMPILER_RT_HAS* are definitely broken somehow.

I'm posting here, because this change broke LLVM tests on a machine only supporting SSE3.

The feature check for COMPILER_RT_HAS_MSSE4_2_FLAG does not really check, whether the current architecture supports SSE 4.2, but rather whether the compiler can target such architecture. Would it be reasonable to change the feature check to something like the following (assuming LLVM_TARGET_ARCH=host)?

clang -march=native -dM -E - < /dev/null | grep __SSE4_2__

otherwise, LLVM_TARGET_ARCH could be used for -march. This would avoid to generate illegal instructions for the target architecture.

Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 8:23 AM
Herald added a subscriber: Enna1. · View Herald Transcript

I'm posting here, because this change broke LLVM tests on a machine only supporting SSE3.

The feature check for COMPILER_RT_HAS_MSSE4_2_FLAG does not really check, whether the current architecture supports SSE 4.2, but rather whether the compiler can target such architecture. Would it be reasonable to change the feature check to something like the following (assuming LLVM_TARGET_ARCH=host)?

clang -march=native -dM -E - < /dev/null | grep __SSE4_2__

otherwise, LLVM_TARGET_ARCH could be used for -march. This would avoid to generate illegal instructions for the target architecture.

I dunno. Are there any similar precedents? I see COMPILER_RT_HAS_MSSE4_2_FLAG is also used in scudo. And I also see check_cxx_compiler_flag() macro is used to detect lots of other things
I think 'native' arch during build should also not be used to figure out target properties. Using LLVM_TARGET_ARCH looks more reasonable (at least based on the name). But is it really such detailed that it will prohibit -msse4.2 in your case (i.e. not just x86_64, but "some old cpu type")?