This is an archive of the discontinued LLVM Phabricator instance.

Adding doxygen comments to the LLVM intrinsics (part 2, _wmmintrin_pclmul.h)
ClosedPublic

Authored by kromanova on Jan 8 2016, 10:58 AM.

Details

Summary

Hello,

Here is the patch with the doxygen comments for the intrinsincs in the header file __wmmintrin_pclmul.h.
The doxygen comments are automatically generated based on our internal intrinsics document.

I will submit more doxygen comments for the other intrinsic header files as soon as this patch is approved.

Here is the link to the general discussion about adding comments to x86 intrinsics headers.
http://permalink.gmane.org/gmane.comp.compilers.clang.devel/42032

Here is the link to the similar code review for the doxygen comments for the header file ammintrin.h "Adding doxygen comments to the LLVM intrinsics (part 1, ammintrin.h)"
http://reviews.llvm.org/D8762

Please review.

Katya.

Diff Detail

Repository
rL LLVM

Event Timeline

kromanova updated this revision to Diff 44338.Jan 8 2016, 10:58 AM
kromanova retitled this revision from to Adding doxygen comments to the LLVM intrinsics (part 2, _wmmintrin_pclmul.h).
kromanova updated this object.
kromanova added reviewers: gribozavr, jroelofs, ygao.
kromanova set the repository for this revision to rL LLVM.
kromanova added a subscriber: cfe-commits.
silvas added a subscriber: silvas.Jan 19 2016, 8:04 PM

This may sound stupid, but: can you benchmark the time it takes to build some project (that actually uses intrinsics in most translation units, e.g. a game) with the headers w/ and w/o the doxygen comments to check that all the extra comment skipping doesn't affect compilation time? I.e. run your script to add the comments for "all" the intrinsic headers (similar to what you expect the final state to be after all these patches) and test the build time of a game (and compare with the unmodified headers).

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

This may sound stupid, but: can you benchmark the time it takes to build some project (that actually uses intrinsics in most translation units, e.g. a game) with the headers w/ and w/o the doxygen comments to check that all the extra comment skipping doesn't affect compilation time? I.e. run your script to add the comments for "all" the intrinsic headers (similar to what you expect the final state to be after all these patches) and test the build time of a game (and compare with the unmodified headers).

This makes sense. I can do this, though it might take a couple of days, since I don't have a setup for a game build.
I suspect that that we shouldn't use an application/game that is using precompiled headers. Any other limitation should be imposed on the beanchmark?

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

Yes, I'll do this. However, there are a couple of things that the people should be aware:
(a) *only* intrinsics that are supported on PS4 will have doxygen comments. PS4 documentation is parsed to generate these comments, and it doesn't document the intrinsics that are not supported on PS4.
(b) the doxygen comments for the rest (not submitted upstream) headers files might have some problems (typos, incorrect parameter types, bugs in the description, etc). The tool that generates the comments might have some glitches too. Most of this things get fixed after the manual review that I do before I submit the Phabricator code review. So, please don't assume that this is the final version of the header files with doxygen comments.

This may sound stupid, but: can you benchmark the time it takes to build some project (that actually uses intrinsics in most translation units, e.g. a game) with the headers w/ and w/o the doxygen comments to check that all the extra comment skipping doesn't affect compilation time? I.e. run your script to add the comments for "all" the intrinsic headers (similar to what you expect the final state to be after all these patches) and test the build time of a game (and compare with the unmodified headers).

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

Out of curiosity, do you know if the impact to the build time for Eric's change of starting to use the target attributes instead of conditional inclusion was measured (r239883) on a large scale application. If so, what were the results? I should probably include Eric.

This may sound stupid, but: can you benchmark the time it takes to build some project (that actually uses intrinsics in most translation units, e.g. a game) with the headers w/ and w/o the doxygen comments to check that all the extra comment skipping doesn't affect compilation time? I.e. run your script to add the comments for "all" the intrinsic headers (similar to what you expect the final state to be after all these patches) and test the build time of a game (and compare with the unmodified headers).

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

Out of curiosity, do you know if the impact to the build time for Eric's change of starting to use the target attributes instead of conditional inclusion was measured (r239883) on a large scale application. If so, what were the results? I should probably include Eric.

I don't think we did any testing at SCE, but we probably should have. I don't think Google's primary codebases (nor Apple's, or anybody else pretty much) include the intrinsic headers in basically every TU so the impact is probably not a huge concern overall. However, games include these headers in basically every TU so we at SCE should take a look.

Also, the target attributes stuff had some correctness issues (e.g. crashes in backend) that took a while to iron out and those issues took the limelight.

This may sound stupid, but: can you benchmark the time it takes to build some project (that actually uses intrinsics in most translation units, e.g. a game) with the headers w/ and w/o the doxygen comments to check that all the extra comment skipping doesn't affect compilation time? I.e. run your script to add the comments for "all" the intrinsic headers (similar to what you expect the final state to be after all these patches) and test the build time of a game (and compare with the unmodified headers).

This makes sense. I can do this, though it might take a couple of days, since I don't have a setup for a game build.
I suspect that that we shouldn't use an application/game that is using precompiled headers. Any other limitation should be imposed on the beanchmark?

Just make sure it is realistic.

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

Yes, I'll do this. However, there are a couple of things that the people should be aware:
(a) *only* intrinsics that are supported on PS4 will have doxygen comments. PS4 documentation is parsed to generate these comments, and it doesn't document the intrinsics that are not supported on PS4.

If we don't have coverage of every intrinsic, we will need to add documentation for Clang developers about the comment format so that future intrinsics (and intrinsics not covered) can be documented.

(b) the doxygen comments for the rest (not submitted upstream) headers files might have some problems (typos, incorrect parameter types, bugs in the description, etc). The tool that generates the comments might have some glitches too. Most of this things get fixed after the manual review that I do before I submit the Phabricator code review. So, please don't assume that this is the final version of the header files with doxygen comments.

Sure. Makes sense.

I see. I was hoping that if you were, I could have used the same benchmark and compare what build time increase caused by using target attributes (might be much more substantial) and by adding doxygen comments.

From: Eric Christopher [mailto:echristo@gmail.com]
Sent: Wednesday, January 20, 2016 3:09 PM
To: reviews+D15999+public+772a9901b981d210@reviews.llvm.org; Romanova, Katya; gribozavr@gmail.com; jonathan@codesourcery.com; Gao, Yunzhong; Sean Silva
Cc: cfe-commits@lists.llvm.org
Subject: Re: [PATCH] D15999: Adding doxygen comments to the LLVM intrinsics (part 2, _wmmintrin_pclmul.h)

I have some benchmarks, but none of them moved when I added the support
that I could see. It wouldn't catch small regressions though.

-eric

I don't think we did any testing at SCE, but we probably should have. I don't think Google's primary codebases (nor Apple's, or anybody else pretty much) include the intrinsic headers in basically every TU so the impact is probably not a huge concern overall. However, games include these headers in basically every TU so we at SCE should take a look.

Yes, typical PS4 codebase will heavily include x86 intrinsics headers. At the same time, typical PS4 codebase will use precompiled headers or modules for most of the time, so I hope the real world impact of doxygen comment will not be huge in any case. BTW, in my experiment, I'm planning to measure worst case scenario (no modules, no PCH). I think t(ahe benefit of having (1) tooltips and autocomplication in XCode, (2) Intellisense tooltips in MS Visual Studio, (3) internal intrinsics documentation for clang, (4) ability to generate up-to-date internal intrinsic documentation based on the latest Clang headers, etc, should overweigh the drawback of slower compile time for the applications/games that are heavily using intrinsic headers, but opted not to use modules or PCH.
Still, I agree that we need to make this measurement. And I'm curious too :)

Also, can you post a patch that changes "all" the headers to have doxygen comments like you intend, so that others can test and verify?

Yes, I'll do this. However, there are a couple of things that the people should be aware:
(a) *only* intrinsics that are supported on PS4 will have doxygen comments. PS4 documentation is parsed to generate these comments, and it doesn't document the intrinsics that are not supported on PS4.

If we don't have coverage of every intrinsic, we will need to add documentation for Clang developers about the comment format so that future intrinsics (and intrinsics not covered) can be documented.

That was the plan... The consistency of the doxygen documentation will be especially important for the individuals/companies that decide to generate their up-to-date intrinsics guide based on the latest Clang headers. One doesn't want to see differently formatted intrinsics guide.

(b) the doxygen comments for the rest (not submitted upstream) headers files might have some problems (typos, incorrect parameter types, bugs in the description, etc). The tool that generates the comments might have some glitches too. Most of this things get fixed after the manual review that I do before I submit the Phabricator code review. So, please don't assume that this is the final version of the header files with doxygen comments.

Sure. Makes sense.

kromanova updated this revision to Diff 45637.Jan 21 2016, 7:11 PM

In this code review, Sean Silva asked me to post a patch with "all" the headers with doxygen comments, so that others can test and verify.
Here is the patch automatically generated by DCG tool (doxygen comment generator) that I wrote.

There are a couple of things that the people should be aware:
(a) *only* intrinsics that are supported on PS4 will have doxygen comments. PS4 documentation is parsed to generate the comments and it doesn't document the intrinsics that are *not* supported on PS4.
(b) the doxygen comments for all the headers (except ammintrin.h that was already accepted upstream and _wmmintrin_pclmul.h that is currently in the review) might have some known problems (typos, incorrect parameter types, bugs in the description, etc). DCG might have some glitches too (e.g. currently it doesn't parse HTML tables correctly). Most of these things will get fixed before I submit the Phabricator code review for a specific file.

So, please don't assume that this is the final version of the headers with doxygen comments. It's a rough dratf. Use with caution! :)
However, it should be perfectly acceptable for measuring build time performance impact.

For the preview of all the changes, can you please put that in a separate patch from this one?

For the preview of all the changes, can you please put that in a separate patch from this one?

Done.
See http://reviews.llvm.org/D16442

I did some build time measurement on a big game code where the the intrisics are heavily used (PHC was enabled). The presence of comments didn't make any noticeable difference.
Sean did similar measurements without PCH and didn't see any difference in performance neither.

kromanova updated this revision to Diff 45902.Jan 25 2016, 12:27 PM
kromanova removed rL LLVM as the repository for this revision.

SCE's techinical writer, Craig Flores, did the code review and suggested a few changes to the documentation.

Hello,
I have a lot of intrinsic header files with doxygen documentation that I will be submitting in the near future.

We have been reviewing this internally at SCE to make sure the patch it in the best shape before submitting and make the reviews easier. Every patch has been already reviewed by at least three people (me, one of SCE's compiler developer and SCE's technical writer). So, hopefully the quality should be pretty good.

Still, there are a lot of intrinsics (~800) to review and I don’t want to put the burden on only 1-2 open source reviewers.

Who else (except people on the 'reviewers' and 'subscribers' list) will be a good reviewer for X86 intrinsics?
Are there any intrinsics experts from MS, Intel, AMD, Apple on the mailing list who might be willing to help out?”

Thanks!
Katya.

Honestly if they've been reviewed like that internally I'm ok with you just committing them - especially if they look like this.

The only concerns I'd have are in the case of "This intrinsic corresponds to the <blank> instruction" (side note, use the "the"? I commented on a case inline). This isn't always the case with all of our intrinsics when the compiler lowers them to a shuffle intrinsic or some such, or it's optimized, etc. Personally I'd leave that line out, though I understand it exists in a lot of similar documentation.

-eric

lib/Headers/__wmmintrin_pclmul.h
37 ↗(On Diff #45902)

"corresponds to the" ?

Honestly if they've been reviewed like that internally I'm ok with you just committing them - especially if they look like this.

The only concerns I'd have are in the case of "This intrinsic corresponds to the <blank> instruction" (side note, use the "the"? I commented on a case inline). This isn't always the case with all of our intrinsics when the compiler lowers them to a shuffle intrinsic or some such, or it's optimized, etc. Personally I'd leave that line out, though I understand it exists in a lot of similar documentation.

Hi Eric,

I agree. Sometimes the instruction that corresponds to a specific intrinsic is optimized out, sometimes it will get lowered to something else, etc.

However, I think keeping the instruction name in the documentation is extremely useful. In general, intrinsic documentation (especially in the form of comments) is not very complete. When I need to know what a specific intrinsic is doing (and I very often have to look up intrinsics!), I find the corresponding instruction name and go dig in AMD's or Intel's Architecture Programmer's manuals, where I could find all the details I need. Programmer's manuals instruction descriptions are much more detailed and complete. However, it's too much information to add to the comments. :)

As you know, Intel's and MS's intrinsics guides are also specifying corresponding instruction names for the intrinsics. I suspect they had the same idea that I just described.

I briefly chatted with Paul Robinson and he suggested to say "This intrinsic is equivalent to the <blah> instruction" instead, because this sentence doesn't give a false impression that one will definitely see this particular instruction in the generated code. Intel's intrinsics documentation says something like that, e.g: "The corresponding Intel® AVX instruction is VBLENDPD"

What do you think/prefer?

And, yes, I will add "the" before "corresponds to". Thanks! Easy enough with the generator. :)

Honestly if they've been reviewed like that internally I'm ok with you just committing them - especially if they look like this.

The only concerns I'd have are in the case of "This intrinsic corresponds to the <blank> instruction" (side note, use the "the"? I commented on a case inline). This isn't always the
case with all of our intrinsics when the compiler lowers them to a shuffle intrinsic or some such, or it's optimized, etc. Personally I'd leave that line out, though I understand it exists > in a lot of similar documentation.

BTW, in some cases, our documentation won't be as specific and will say "This intrinsic (e.g. _mm_store_ps1 ) corresponds to the Shuffling + MOVSS instruction" or
"No AVX instruction corresponds to this intrinsic (e.g. _mm256_set_pd)" or "Composite SSE2 instruction corresponds to this intrinsic (e.g. _mm_set_sd).

Microsoft and Intel's documentation are very similar with this respect. See the description of _mm_set_sd intrinsic that I just mentioned.

https://software.intel.com/en-us/node/524261
https://msdn.microsoft.com/en-us/library/dksztbt9%28v=vs.90%29.aspx

echristo accepted this revision.Jan 28 2016, 11:56 AM
echristo added a reviewer: echristo.

Yep. As I said, I expect to be on the losing side of that argument - just too much prior art :)

-eric

This revision is now accepted and ready to land.Jan 28 2016, 11:56 AM
This revision was automatically updated to reflect the committed changes.