This is an archive of the discontinued LLVM Phabricator instance.

Loop Vectorizer doesn't use %zmm registers on targets supporting AVX512.
AbandonedPublic

Authored by zinovy.nis on Mar 14 2014, 4:58 AM.

Details

Summary

This code

void foo(float * restrict a, float * restrict b, float * restrict c, unsigned n)
{
   for (unsigned i = 0; i < n; ++i)
       c[i] = a[i] * b[i];
}

was compiled with "-march=knl" but used only %ymm registers.

With my fix applied the generated code will use %zmm registers as it should be.

Diff Detail

Event Timeline

zinovy.nis updated this revision to Unknown Object (????).Mar 14 2014, 5:42 AM

Hi zinovy,

Could you please add a test for it?

You should be able to generate a test starting from your original example and building it with -emit-llvm. The resulting IR can then be used/adapted to create an 'opt' test that verifies that the body of the loop is correctly vectorized.
The idea is that your new opt test would RUN opt with flags '-mattr=+avx512f' (or -mcpu=knl) and '-loop-vectorize'.

You should be able to check that, with your change, the body of the vectorized loop now only contains fmul <16 x float> instructions. Before your change, it was producing instead a sequence of fmul <8 x float>.

I hope this make sense :-)

Andrea.

zinovy.nis updated this revision to Unknown Object (????).Mar 14 2014, 8:22 AM

Thanks. Done. Plz see my update.

2014-03-14 18:21 GMT+04:00 Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>:

Hi zinovy,

Could you please add a test for it?

You should be able to generate a test starting from your original

example and building it with -emit-llvm. The resulting IR can then be
used/adapted to create an 'opt' test that verifies that the body of the
loop is correctly vectorized.

The idea is that your new opt test would RUN opt with flags

'-mattr=+avx512f' (or -mcpu=knl) and '-loop-vectorize'.

You should be able to check that, with your change, the body of the

vectorized loop now only contains fmul <16 x float> instructions. Before
your change, it was producing instead a sequence of fmul <8 x float>.

I hope this make sense :-)

Andrea.

http://llvm-reviews.chandlerc.com/D3078

Curious, did you test whether the cost model works reasonably well across the test-suite with this turned on? Are there any/many major regressions with this patch applied vs without on an avx512 architecture?

Thanks,
Arnold

test/CodeGen/X86/avx512-vectorizer.ll
8

You could remove the no capture and readonly attributes. They are not needed for this test case.

I am also surprised that llvm accepts dangling function attributes ("#0”) but it does :).

I have no real avx512 architecture at hands.
But I believe that using wider vectors is better than using shorter ones in
this case.
BTW, the KNL cost model in LLVM was taken from the Haswell cost model (see
TODOs in x86.td), so it can't be very accurate for KNL.

-----Исходное сообщение-----
From: Arnold Schwaighofer
Sent: Friday, March 14, 2014 8:08 PM
To: elena.demikhovsky@intel.com ; rob.khasanov@gmail.com ;
avolkov.intel@gmail.com ; nrotem@apple.com ; zinovy.nis@gmail.com
Cc: llvm-commits@cs.uiuc.edu ; Andrea_DiBiagio@sn.scee.net ;
aschwaighofer@apple.com
Subject: Re: [PATCH] Loop Vectorizer doesn't use %zmm registers on targets
supporting AVX512.

Curious, did you test whether the cost model works reasonably well across

the test-suite with this turned on? Are there any/many major regressions
with this patch applied vs without on an avx512 architecture?

Thanks,
Arnold

Comment at: test/CodeGen/X86/avx512-vectorizer.ll:8
@@ +7,3 @@
+
+define void @foo(float* noalias nocapture readonly %a, float* noalias
nocapture readonly %b, float* noalias nocapture %c, i32 %n) #0 {

+entry:

You could remove the no capture and readonly attributes. They are not needed
for this test case.

I am also surprised that llvm accepts dangling function attributes ("#0”)
but it does :).

http://llvm-reviews.chandlerc.com/D3078

zinovy.nis updated this revision to Unknown Object (????).Mar 17 2014, 1:08 AM

"nocapture" & "readonly" attributes removed from the test.

Hi Zinovy,

A minor comment:
your new test should be probably moved to 'test/Transforms/LoopVectorize/X86' since it is clearly testing the loop vectorizer behavior on a specific x86 target.

test/CodeGen/X86/avx512-vectorizer.ll
9

You can remove the '#0' from the signature of @foo since it is not used.

zinovy.nis updated this revision to Unknown Object (????).Mar 17 2014, 4:44 AM

The test file was moved to 'test/Transforms/LoopVectorize/X86'.

zinovy.nis updated this revision to Unknown Object (????).Mar 17 2014, 6:31 AM

Fix number of vector registers for 64-bit mode in KNL to 32 instead of 16.

Was commited in rL212634 by Adam Nemet.

zinovy.nis abandoned this revision.Apr 7 2018, 10:26 AM