I found a cheap and cross-platform way of detecting whether the target supports AVX2.
Note that you need to call __builtin_cpu_init() before calling __builtin_cpu_supports().
I don't know which platforms this needs to support, but __builtin_cpu_supports only works when compiled with clang or gcc, and it requires compiler-rt or libgcc. I don't know if that's guaranteed to exist on Windows.
I doubt this test was ever passing on Windows, as our RegisterContextWindows does not even acknowledge the existence of SSE registers. If we wanted to be fancy, we could do some manual cpuid parsing here (the test contains inline assembly anyway), but that's probably not necessary.
The GCC manual says:
This built-in (__builtin_cpu_init) function needs to be invoked ..., only when used in a function that is executed before any constructors are called.
So calling it here should not be necessary.
However, I am still unable to get gcc (6.3) to return 1 here. Clang (since at least 3.8) seems to do fine, though, so that's probably enough for this test.
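For reference, the compiler-based check being discussed can be sketched roughly as follows. This is only an illustration, not the code in the patch: the helper name cpu_has_avx2 is made up, and the preprocessor guards are an assumption that simply follows what was said in this thread (the builtins exist only on gcc/clang for x86, and __builtin_cpu_init() may be missing on some Clang versions, so it is called on GCC only, where it is harmless).

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether the CPU running this code
   advertises AVX2, using the gcc/clang builtins discussed above.
   On other compilers or non-x86 targets it conservatively says no. */
static bool cpu_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#if !defined(__clang__)
  /* Only strictly required before constructors have run, but harmless
     otherwise. Skipped on Clang, which may not provide it. */
  __builtin_cpu_init();
#endif
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}
```

A test program could call cpu_has_avx2() once at startup and only execute AVX2 instructions when it returns true.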
FWIW, I'm working on upstreaming the zmm (avx512) patches that we have locally (there's one testsuite failure I still need to find time to fix). The TestZMMRegister.py test that ChrisB wrote for this is written as skip-unless-darwin, and there's a new skipUnlessFeature() method added to decorators.py which runs sysctl to detect hardware features (in this case, hw.optional.avx512f), which, I suspect, is an even more Mac-specific way of doing this. While Adrian's approach would be gcc/clang-specific, it would definitely be better than depending on a sysctl.
I suppose a possible alternative would be to figure out the avx2 / avx512 features manually based on the cpuid instead of letting the compiler do it for us. e.g. https://stackoverflow.com/questions/1666093/cpuid-implementations-in-c and then checking the bits as e.g. described in https://en.wikipedia.org/wiki/CPUID . Bummer to do it so low level if we can delegate this to the compiler though.
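A rough sketch of what that manual check could look like, assuming gcc or clang on x86 and their <cpuid.h> helpers (__get_cpuid, __get_cpuid_max, __cpuid_count). The function names are invented for illustration. Beyond the CPUID feature bits (leaf 7, subleaf 0: EBX bit 5 for AVX2, EBX bit 16 for AVX-512F), it also checks via XGETBV that the OS saves the wider register state, which the feature bits alone don't tell you.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1

/* Read XCR0, which records the register state the OS has enabled.
   Only safe to execute once CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void) {
  uint32_t lo, hi;
  __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
  return ((uint64_t)hi << 32) | lo;
}
#endif

/* Hypothetical helper: true if leaf 7 EBX has ebx_bit set and XCR0 has
   every bit of xcr0_mask enabled. Returns false on unsupported targets. */
static bool cpuid_has_feature(unsigned ebx_bit, uint64_t xcr0_mask) {
#ifdef HAVE_X86_CPUID
  unsigned eax, ebx, ecx, edx;
  /* Leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
    return false;
  if ((read_xcr0() & xcr0_mask) != xcr0_mask)
    return false;
  /* Leaf 7 must exist before its feature bits can be trusted. */
  if (__get_cpuid_max(0, 0) < 7)
    return false;
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  return (ebx & (1u << ebx_bit)) != 0;
#else
  (void)ebx_bit;
  (void)xcr0_mask;
  return false;
#endif
}

/* AVX2: EBX bit 5; needs XMM and YMM state (XCR0 bits 1 and 2). */
static bool cpuid_has_avx2(void) { return cpuid_has_feature(5, 0x6); }

/* AVX-512F: EBX bit 16; additionally needs opmask and ZMM state
   (XCR0 bits 5, 6 and 7). */
static bool cpuid_has_avx512f(void) { return cpuid_has_feature(16, 0xe6); }
```

This still depends on <cpuid.h> and GNU-style inline assembly, so it is no more portable across compilers than the builtin; it only removes the dependency on compiler-rt or libgcc.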
I don't know MSVC well enough and don't have access to one to test it but: This would also only work if there were a compiler-independent way of writing inline assembler. Is that possible?
Other fun facts: Clang doesn't even define __builtin_cpu_init().
Why not just look for the AVX registers by name? They are only available if they are correctly detected by the native lldb-server or debugserver, so we could avoid all of this. As long as we don't execute any instructions that crash the program, we can stop before any specialized AVX instructions are executed and kill the program if we don't see a register by name.
I considered doing something like this, but I want to avoid relying on the AVX2 support in LLDB to work in order to detect AVX2. If I use an LLDB mechanism for this then (exaggerating here!) someone could remove AVX support from LLDB and this test would still pass.
there's a new skipUnlessFeature() method added to decorators.py which runs sysctl to detect hardware features (in this case, hw.optional.avx512f)
How does one execute a program like sysctl on the remote? I have seen code in TestLldbGdbServer.py that uses platform get-file /proc/cpuinfo to achieve something similar for Linux, but that works without executing a new process.
This skipUnlessFeature sysctl check was all performed on the system running the testsuite. Checking whether the feature exists from within the program (the approach you're taking) is more correct. We usually do host != target testsuite runs for arm devices, but there's no reason why someone couldn't do a macOS-host, FreeBSD-target testsuite run, and the sysctl check would be invalid in that case.