This is an archive of the discontinued LLVM Phabricator instance.

Clzero intrinsic and its addition under znver1
ClosedPublic

Authored by GGanesh on Feb 1 2017, 2:38 AM.

Details

Summary

This patch does the following.

  1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
  2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
  3. Adds the clzero feature under znver1 architecture.
  4. The custom inserter is added in Lowering.
  5. A testcase is added to check the intrinsic.
  6. The clzero instruction is added to assembler test.

Diff Detail

Repository
rL LLVM

Event Timeline

GGanesh created this revision.Feb 1 2017, 2:38 AM
RKSimon edited edge metadata.Feb 1 2017, 4:05 AM

Copyright notices in the test files are rare - do you need this?

test/CodeGen/X86/clzero.ll
36 ↗(On Diff #86597)

Please check on i686 target triples as well as x86_64

test/MC/X86/x86-64.s
1508 ↗(On Diff #86597)

Should we test for the clzero %rax / clzero %eax aliases as well?

GGanesh updated this revision to Diff 87385.Feb 7 2017, 2:45 AM

Updated for the review comments

One minor but no other comments from me. @craig.topper ?

test/MC/X86/x86-32.s
450 ↗(On Diff #87385)

Add clzero only test as well

// CHECK:       clzero
// CHECK:  encoding: [0x0f,0x01,0xfc]
                clzero
GGanesh updated this revision to Diff 87438.Feb 7 2017, 7:56 AM

Updated the test file "x86-32.s" for clzero only test!

RKSimon added inline comments.Feb 7 2017, 8:59 AM
lib/Target/X86/X86InstrInfo.td
2461 ↗(On Diff #87438)

Given that clzero writes to memory shouldn't it have a mayStore = 1 attribute? I'm not sure if WriteSystem alone creates the necessary barrier.

This needs testing - check if separate load+instruction and instruction+store don't get folded across the clzero?

I have a (probably dumb) question: should CLZERO be treated as a memory read/write barrier for the purpose of scheduling? Is it okay to reoder CLFLUSH based on the EAX/RAX register dependency only?

I am asking this because it looks to me that we don't model the 'flush/zero' behavior of cache line operations on x86. For what I can see (please correct me if I am wrong), nothing in the code suggests that CLZERO (or even CLFLUSH) is treated as a memory barrier for scheduling purpose. Is that behavior intended (i.e. do we care about it)? Am I reading those tablegen definitions incorrectly?

Ah, I see that Simon asked a similar question :-)

craig.topper edited edge metadata.Feb 7 2017, 8:59 PM

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

In which case we just need a test to check for it

I think it is okay even if we don't set the mayStore attribute.
I wrote a simple test to check the following

  1. Schedules based on the instruction attribute
  2. Side-effect handling

As Craig Topper says it falls back the UnmodeledSideEffects case by default.

Test:

#include "xmmintrin.h"
int a[2];
int b[2];
int c,d, *p;
int main ()
{
  b[0] += a[0];
  __builtin_ia32_clzero (p) ;
  d += c;
  b[1] += a[0];
  return 0;
}

We can notice that the schedule without the intrinsic has two mov scheduled for the first cycle. Without the intrinsic they go back to safe-order.
Without clzero intrinsic: $>clang -O3 ClzeroFoldTest.c -S -march=znver1

# BB#0:                                 # %entry
        movl    a(%rip), %eax
        movl    c(%rip), %ecx
        addl    %eax, b(%rip)
        addl    %eax, b+4(%rip)
        addl    %ecx, d(%rip)
        xorl    %eax, %eax
        retq

With clzero intrinsic: $>clang -O3 ClzeroFoldTest.c -S -march=znver1

# BB#0:
        movl    a(%rip), %eax
        addl    %eax, b(%rip)
        movq    p(%rip), %rax
        leaq    (%rax), %rax
        clzero
        movl    c(%rip), %eax
        addl    %eax, d(%rip)
        movl    a(%rip), %eax
        addl    %eax, b+4(%rip)
        xorl    %eax, %eax
        retq

The instruction will default to having MCID::UnmodeledSideEffects set. Is that sufficient to protect the scheduling?

Yes, that would work.
An instruction with unmodeled side effects is basically treated as a scheduling barrier. So we are safe.

@craig.topper, is it fair to say that flag UnmodeledSideEffects is conservatively implicitly set because the tablegen definition for CLZEROr does not provide a primary instruction pattern, and flag hasSideEffects_Unset is true (i.e. hasSideEffects is not explicitly mentioned)?

@craig.topper If you are okay, can you please commit the changes on my behalf?

This revision is now accepted and ready to land.Feb 8 2017, 8:07 PM

I'll commit shortly.

This revision was automatically updated to reflect the committed changes.