This is an archive of the discontinued LLVM Phabricator instance.

[MCA][NFC] Add generic XOP resource tests
ClosedPublic

Authored by lebedev.ri on Jun 17 2018, 1:31 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jun 17 2018, 1:31 AM

Fix one more typo from the refdocs: it's VPHADDUBW, not VPHADDUBWD.

RKSimon added inline comments.Jun 18 2018, 4:32 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
29 ↗(On Diff #151639)

Don't duplicate, just put all the memory folds together:

vpcmov %xmm0, %xmm1, %xmm2, %xmm4
vpcmov (%rax), %xmm1, %xmm2, %xmm4
vpcmov  %xmm0, (%rax),%xmm2, %xmm4
lebedev.ri marked an inline comment as done.

Remove duplication, put all the memory folds together.

RKSimon added inline comments.Jun 18 2018, 6:48 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
385 ↗(On Diff #151702)

I don't understand this comment?

393 ↗(On Diff #151702)

Is it useful to test rotl/rotr here? Isn't this more of a MC test?

29 ↗(On Diff #151639)

Sorry, I meant it like this:

vpcmov %xmm0, %xmm1, %xmm2, %xmm4
vpcmov (%rax), %xmm1, %xmm2, %xmm4
vpcmov  %xmm0, (%rax),%xmm2, %xmm4

vpcmov %ymm0, %ymm1, %ymm2, %ymm4
vpcmov (%rax), %ymm1, %ymm2, %ymm4
vpcmov  %ymm0, (%rax),%ymm2, %ymm4

That way more similar instruction are next to one another and any diffs (AGU usage, etc.) are more obvious.

lebedev.ri added inline comments.Jun 18 2018, 6:51 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
393 ↗(On Diff #151702)

AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions, page 239:

VPROTB Packed Rotate Bytes 
...
The VPROTB instruction is an XOP instruction.

So i think we want to test the form with immediate here?
Or is the question about testing multiple immediate?

lebedev.ri marked an inline comment as done.

Sort once more.

lebedev.ri added inline comments.Jun 18 2018, 9:37 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
385 ↗(On Diff #151702)

The same doc/page i linked in the \/ next comment.
It lists three patterns:

VPROTB xmm1, xmm2/mem128, xmm8 <- does this simply imply the %xmm8 register, which is only available in 64-bit more?
VPROTB xmm1, xmm2, xmm3/mem128
VPROTB xmm1, xmm2/mem128, imm8
RKSimon added inline comments.Jun 18 2018, 10:14 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
385 ↗(On Diff #151702)

Nope that is a classic AMD docs team copy+paste typo from copying the imm8 version :-) It takes any xmm[0-7]/xmm[0-15] on 32/64

lebedev.ri added inline comments.Jun 18 2018, 10:22 AM
test/tools/llvm-mca/X86/Generic/resources-xop.s
385 ↗(On Diff #151702)

Ok, thank you, that makes more sense, and resolves these fixme's :)

So what was meant is:

VPROTB xmm1, xmm2, xmm3
VPROTB xmm1, mem128, xmm3
VPROTB xmm1, xmm2, mem128

VPROTB xmm1, xmm2, imm8
VPROTB xmm1, mem128, imm8

So just 5 tests.

lebedev.ri marked 4 inline comments as done.

Address last FIXME's. Should be good now.

RKSimon added inline comments.Jun 18 2018, 1:18 PM
test/tools/llvm-mca/X86/Generic/resources-xop.s
393 ↗(On Diff #151702)

Testing different immediates seems over the top - we don't test every shuffle mask immediate etc.

Is there any known differences in behaviour? The only thing I can recall is that TRUE/FALSE in VPCOM breaks dependencies - but that kind of thing needs to wait for bdver specific tests like we do for zero idioms etc. and isn't that useful in resource tests.

lebedev.ri added inline comments.Jun 18 2018, 1:21 PM
test/tools/llvm-mca/X86/Generic/resources-xop.s
393 ↗(On Diff #151702)

Hmm, you're right, this is over the top.
Let's keep only one immediate variant.

lebedev.ri marked 4 inline comments as done.

Drop excessive variants with immediates - keep just one variant with immediate each.
It's easier to drop tests than to write them :)

Remove one last more duplication, and add one missing pattern.

Actually regenerate the diff :)

RKSimon accepted this revision.Jun 19 2018, 1:57 AM

LGTM - thanks

This revision is now accepted and ready to land.Jun 19 2018, 1:57 AM

LGTM - thanks

Thank you for the review!
Sorry that such seemingly simple change took so many iterations.

This revision was automatically updated to reflect the committed changes.