This is an archive of the discontinued LLVM Phabricator instance.

[X86][BtVer2][MCA] Recognize CMPEQ one-idioms
Abandoned, Public

Authored by lebedev.ri on Jul 3 2018, 6:35 AM.

Details

Summary

Commit message of rL334303 said:

As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and
Jaguar pipeline - Dependency-breaking instructions), these instructions
are dependency breaking and fast-path zero the destination register
(and appropriate EFLAGS bits).

That same section also lists PCMPEQx, right before PCMPGTx.
So these are also dependency-breaking, although they produce all ones and still consume resources.

Found accidentally while continuing to look into the bdver2 scheduling profile.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision. Jul 3 2018, 6:35 AM

I think you're confusing zero-idioms with dependency breaking instructions.

Zero idioms break dependencies, don't use any resources (the frontend just sets the PRF value to zero), and then retire.

PCMPEQ 'ones' patterns can break dependencies but still use resources to set the bits to all ones before retiring.

lebedev.ri planned changes to this revision. Jul 3 2018, 7:05 AM

Yes, indeed.

Would it be valuable to salvage those into one-idioms.s with a comment that these are dep-breaking?
I'm not sure any special handling is needed in the sched model.

lebedev.ri retitled this revision from [X86][BtVer2][MCA] Recognize CMPEQ zero-idioms too to [X86][BtVer2][MCA] Recognize CMPEQ one-idioms.
lebedev.ri edited the summary of this revision.

Hmm, so if they still consume resources, does this mean that the lack of latency is what makes them special?

FIXME: how do i test a custom snippet in llvm-exegesis? CC @courbet

$ perl -e 'print "pcmpgtb %xmm5,%xmm5\n"x10000; print "retq"' > /tmp/snippet-ones.txt

On Intel hardware, in register renaming it basically rewrites the input register to an internal all zeros register. The instruction still needs to execute and now compares zero with zero and gets all ones. But by rewriting the input register to all zeros instead of the real input, the instruction doesn't need to wait for the previous writer of the input register to execute.
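A minimal sketch of that renaming trick, as a toy Python model. The `Renamer` class, the `p_zero` register name and the opcode check are all hypothetical, chosen only for illustration; nothing here mirrors real hardware or an LLVM interface.

```python
# Toy register renamer. All names are hypothetical, for illustration only.
ZERO_PHYS = "p_zero"  # assumed internal always-zero physical register


class Renamer:
    def __init__(self):
        self.table = {}   # architectural register -> current physical register
        self.next_id = 0

    def _fresh(self):
        name = f"p{self.next_id}"
        self.next_id += 1
        return name

    def _lookup(self, arch):
        # Registers never written in this snippet get a physical register on first use.
        if arch not in self.table:
            self.table[arch] = self._fresh()
        return self.table[arch]

    def rename(self, opcode, srcs, dst):
        """Return (opcode, physical sources, physical destination)."""
        is_ones_idiom = opcode.startswith("pcmpeq") and len(set(srcs)) == 1
        if is_ones_idiom:
            # Both inputs are the same register, so the result is all ones
            # whatever it holds: read the zero register instead.
            phys_srcs = [ZERO_PHYS for _ in srcs]
        else:
            phys_srcs = [self._lookup(s) for s in srcs]
        phys_dst = self._fresh()
        self.table[dst] = phys_dst
        return opcode, phys_srcs, phys_dst


r = Renamer()
div = r.rename("divps", ["xmm1", "xmm0"], "xmm0")
plain = r.rename("pcmpeqb", ["xmm1", "xmm0"], "xmm0")  # reads the divps result
ones = r.rename("pcmpeqb", ["xmm0", "xmm0"], "xmm0")   # reads only p_zero
print(div, plain, ones, sep="\n")
```

The idiom still gets a fresh destination register and still has to execute; only its inputs no longer point at an earlier writer.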

For now, you can't, but it's very easy to add, so we should support it. I've created PR38048.

The PCMPEQ 'all ones' idiom is a regular instruction - it consumes resources and has a latency before its result is available for any instructions that depend on it.

What it doesn't have to do is wait for its source registers to be available:

VDIVPS %xmm1, %xmm0, %xmm0    <---- Big latency 
VPCMPEQB %xmm1, %xmm0, %xmm0  <---- Must wait a loooooong time until VDIVPS has completed

vs

VDIVPS %xmm1, %xmm0, %xmm0    <---- Big latency 
VPCMPEQB %xmm0, %xmm0, %xmm0  <---- 'Ones Idiom' - can execute immediately, doesn't wait for VDIVPS
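To put rough numbers on the two sequences above, here is a toy timing model. The latencies (19 cycles for VDIVPS, 1 for VPCMPEQB) are assumed for illustration and are not taken from any real scheduling model; `schedule` and its parameters are hypothetical names. Execution resources are treated as unlimited so that only the dependency effect shows.

```python
# Illustrative latencies only; assumed for this example.
LATENCY = {"vdivps": 19, "vpcmpeqb": 1}


def schedule(program, recognize_ones_idiom=True):
    """Return the cycle at which each instruction starts executing.

    program: list of (opcode, sources, destination) in program order.
    An instruction starts once all of its source registers are ready.
    """
    ready = {}   # register -> cycle at which its latest value is available
    starts = []
    for opcode, srcs, dst in program:
        same_source = len(set(srcs)) == 1
        if recognize_ones_idiom and opcode == "vpcmpeqb" and same_source:
            start = 0  # sources are ignored, so there is nothing to wait for
        else:
            start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dst] = start + LATENCY[opcode]  # the result still takes time
        starts.append(start)
    return starts


dependent = [("vdivps", ["xmm1", "xmm0"], "xmm0"),
             ("vpcmpeqb", ["xmm1", "xmm0"], "xmm0")]
ones_idiom = [("vdivps", ["xmm1", "xmm0"], "xmm0"),
              ("vpcmpeqb", ["xmm0", "xmm0"], "xmm0")]

print(schedule(dependent))                               # [0, 19]
print(schedule(ones_idiom))                              # [0, 0]
print(schedule(ones_idiom, recognize_ones_idiom=False))  # [0, 19]
```

With the idiom recognized, the compare starts at cycle 0 but its result is still only available one cycle later, which is the difference from a zero idiom.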

Ok, well, i guess what i was trying to ask/understand is: is that already properly represented (https://godbolt.org/g/9rYPYA), or not?

No, we don't properly model dependency-breaking instructions yet. Zero-idioms currently rely on a special case in llvm-mca that assumes an instruction is dependency-breaking if it uses no resources. IMO that special case should be removed once we come up with a better way to model this.

Simon is right on this.
We still don't model dependency breaking instructions. There is already a plan to teach llvm-mca how to identify those instructions, and that is next on my TODO list. Once we have that system in place, we can remove the "zero-latency implies dependency-breaking" hack in llvm-mca.

This patch doesn't do the right thing. The timeline clearly shows how dependencies are not broken.
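As a rough illustration of why the shortcut cannot cover this case, here is a toy classifier. `InstrDesc`, its fields and both predicate functions are hypothetical and are not llvm-mca types or APIs; the latency values are assumed.

```python
from dataclasses import dataclass


@dataclass
class InstrDesc:                      # hypothetical descriptor, not an llvm-mca type
    name: str
    latency: int                      # cycles until the result is available (assumed)
    same_source_idiom: bool           # both sources are the same register
    breaks_dependency: bool = False   # assumed explicit flag a better model could carry


def breaks_dep_heuristic(desc):
    # The shortcut described in this thread: zero latency implies dependency-breaking.
    return desc.latency == 0


def breaks_dep_explicit(desc):
    # An explicit marking, independent of latency and resource usage.
    return desc.same_source_idiom and desc.breaks_dependency


zero_idiom = InstrDesc("vpcmpgtb %xmm0, %xmm0, %xmm0", 0, True, True)
ones_idiom = InstrDesc("vpcmpeqb %xmm0, %xmm0, %xmm0", 1, True, True)

for d in (zero_idiom, ones_idiom):
    print(d.name, breaks_dep_heuristic(d), breaks_dep_explicit(d))
```

Under the latency-based shortcut the ones idiom is treated as an ordinary dependent instruction, because it has a real latency; only an explicit marking separates "breaks the dependency" from "costs nothing".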

-Andrea

Ok, that is actually good, i was starting to question my [rudimentary] understanding of all this.

Then, back to square one, are D48876 tests of any use? :)

You can commit those tests to show that we don't correctly model dependency breaking packed compare instructions on BtVer2. However, I would remove the padd from the tests.

lebedev.ri abandoned this revision. Jul 4 2018, 9:01 AM

Great, thank you all for pointing out that this is bogus :)