This is an archive of the discontinued LLVM Phabricator instance.

[X86] Adding vpopcntd and vpopcntq instructions
ClosedPublic

Authored by oren_ben_simhon on May 14 2017, 5:09 AM.

Details

Summary

VPOPCNTDQ is a new feature set that Intel published here.

The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).
Notice that this patch does not include pattern matching (auto generation) of the commands. This will be introduced in future review.

Diff Detail

Repository
rL LLVM

Event Timeline

oren_ben_simhon added a reviewer: m_zuckerman.
RKSimon added a subscriber: RKSimon.

Please can you rebase against trunk latest?

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

lib/Target/X86/X86InstrAVX512.td
8656 ↗(On Diff #98919)

You should be able to make 128/512 vector lowering legal as well using the same pattern technique as we do for VPABS and VPLZCNT on NoVLX targets

craig.topper added inline comments.May 14 2017, 9:08 AM
lib/Target/X86/X86InstrInfo.cpp
7017 ↗(On Diff #98919)

I don't think belongs here. These instructions are ones that update EFLAGS Z flag based on their output being 0. That's not ture of VPOPCNTD/Q.

lib/Target/X86/X86InstrInfo.td
812 ↗(On Diff #98919)

I don't see NoVPOPCNTDQ being used anywhere so we probably shouldn't add it.

craig.topper edited edge metadata.May 14 2017, 10:12 AM

Can you add vpopcnt command lines to test/CodeGen/X86/vector-tzcnt-*.ll as well. I believe we create ctpop nodes as part of cttz lowering that are currently expanding to a lookup table implementation.

RKSimon edited edge metadata.May 14 2017, 10:40 AM

Disassembler tests?

oren_ben_simhon marked 3 inline comments as done.

Implemented comments posted until 05/15 (Thanks Simon and Craig)

RKSimon added inline comments.May 15 2017, 12:44 PM
lib/Target/X86/X86InstrAVX512.td
8693 ↗(On Diff #99045)

The doc you reference doesn't refer to VLX versions of VPOPCNT - just zmm versions. So shouldn't the NoVLX predicate be dropped?

lib/Target/X86/X86InstrInfo.cpp
496 ↗(On Diff #99045)

Please revert these whitespace changes.

Please notice that clang-format reformatted some lists that I modified in the file lib/Target/X86/X86InstrInfo.cpp.
It caused major cosmetic changes which introduce many diffs in that file.

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required i will appreciate an example.

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

Since AVX512 is missing from the cost table and since i need some investigation on this subject, I prefer to make this changes in different patch.

lib/Target/X86/X86InstrInfo.cpp
496 ↗(On Diff #99045)

Shouldn't we follow clang-format formatting?

oren_ben_simhon marked 2 inline comments as done.

Reverted clang-format for lib/Target/X86/X86InstrInfo.cpp and removed NoVLX predicate (Thanks Simon)

Do we have a generic ctpop test like we do for tzcnt and lzcnt? If so should we just add command lines to that instead of a new intrinsic test?

lib/Support/Host.cpp
1401 ↗(On Diff #99054)

I think we're trying to keep the checks in order by bit position here. Can you move this up?

lib/Target/X86/X86InstrInfo.cpp
7042 ↗(On Diff #99054)

I think this also an EFLAGS related piece of code. So the vector pop br shouldn't be here.

Do we have a generic ctpop test like we do for tzcnt and lzcnt? If so should we just add command lines to that instead of a new intrinsic test?

llvm\test\CodeGen\X86\vector-popcnt-*.ll - there wasn't any need to add avx512 to anything less than 512 but we should probably add avx512vpopcntdq tests to all three.

A possible addition would be to custom lower i8/i16 vectors with a trunc(popcnt(zext))) pattern.

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required i will appreciate an example.

I don't think the MC tests actually use the disassembler code to try and get back to the instruction - @craig.topper can you confirm?

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

Since AVX512 is missing from the cost table and since i need some investigation on this subject, I prefer to make this changes in different patch.

OK.

lib/Target/X86/X86InstrInfo.cpp
7042 ↗(On Diff #99054)

+1 - I don't think this is going to work for vectors, but you can try later if you want.

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2 ↗(On Diff #99054)

Add --show-mc-encoding ?

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required i will appreciate an example.

I don't think the MC tests actually use the disassembler code to try and get back to the instruction - @craig.topper can you confirm?

Yeah the .s MC tests don't check the disassembler. That will have to be done from something like test/MC/Disassembler/X86/avx-512.txt

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2 ↗(On Diff #99054)

But should we even have this test file or just use the the popcount tests since these aren't x86 specific intrinsics anymore?

craig.topper added inline comments.May 15 2017, 2:33 PM
test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2 ↗(On Diff #99054)

Nevermind I guess we need this for mask testing? I assume the generic test doesn't cover it.

RKSimon added inline comments.May 17 2017, 6:04 AM
lib/Target/X86/X86InstrInfo.cpp
881 ↗(On Diff #99054)

Why TB_NO_REVERSE? This is typically only used for instructions where the mem size doesn't match the reg size to prevent out of bounds loads.

oren_ben_simhon marked 5 inline comments as done.May 17 2017, 6:56 AM

A possible addition would be to custom lower i8/i16 vectors with a trunc(popcnt(zext))) pattern.

I agree with you, Will it be OK to create a separate patch for it?

Disassembler tests?

Thanks for catching that, I added disassembly tests to avx-512.txt

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2 ↗(On Diff #99054)

I moved the tests to corresponding vector_popcnt_*.ll files.
Also the tests check for X86 instructions as such should reside in X86 directory,

2 ↗(On Diff #99054)

Add --show-mc-encoding ?

Encoding is not tested in this file. See file test/MC/X86/x86-64-avx512vpopcntdq.s.

oren_ben_simhon marked an inline comment as done.

Implemented comments posted until 05/16 (Thanks again Simon and Craig)

oren_ben_simhon marked an inline comment as done.

Removed TB_NO_REVERSE flag (Thanks Simon)

The test files need some attention.

test/CodeGen/X86/vector-popcnt-128.ll
8 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

Keep to the x86_64-unknown-unknown triple

test/CodeGen/X86/vector-popcnt-256.ll
4 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

Keep to the x86_64-unknown-unknown triple

test/CodeGen/X86/vector-popcnt-512.ll
4 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

198 ↗(On Diff #99296)

Don't include mask tests here - full coverage is what the -intrinsics test files are for - please re-add it.

test/CodeGen/X86/vector-tzcnt-128.ll
10 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

test/CodeGen/X86/vector-tzcnt-256.ll
6 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

test/CodeGen/X86/vector-tzcnt-512.ll
5 ↗(On Diff #99296)

Re-generate these files, don't manually edit them.

oren_ben_simhon marked 7 inline comments as done.

Updated the tests (Thanks Simon)

I will appreciate any additional comments.
Please help me finish the review.

Any idea why phabricator is showing so many unchanged lines from X86InstrInfo.cpp? Have you changed the line endings or something? They aren't appearing in the the downloaded diff FWIW.

A couple of minors, but I think you're almost there.

lib/Target/X86/X86InstrAVX512.td
8692 ↗(On Diff #99401)

Don't put Predicate on a new line.

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2 ↗(On Diff #99401)

Probably worth testing on i686-unknown-unknown triple as well.

I know its overlaps mc test coverage but adding --show-mc-encoding would be trivial since the filechecks are auto-generated, we do this on many other intrinsics test files.

craig.topper added inline comments.May 24 2017, 8:15 AM
lib/Target/X86/X86InstrInfo.cpp
881 ↗(On Diff #99054)

These should also be alphabetized with the rest of the instructions in this section.

2310 ↗(On Diff #99401)

Alphabetize

2933 ↗(On Diff #99401)

Alphabetize

oren_ben_simhon marked 6 inline comments as done.May 25 2017, 1:57 AM

Any idea why phabricator is showing so many unchanged lines from X86InstrInfo.cpp? Have you changed the line endings or something? They aren't appearing in the the downloaded diff FWIW.

Probably because I changed the indentation but reverted it back in an updated revision.
There are currently no indentation changes.

Implemented comments posted until 05/25 (Thanks Simon and Craig)

RKSimon accepted this revision.May 25 2017, 4:11 AM

LGTM

This revision is now accepted and ready to land.May 25 2017, 4:11 AM
This revision was automatically updated to reflect the committed changes.