This is an archive of the discontinued LLVM Phabricator instance.

Add tips for generic IR vs architecture specific code.
Needs Revision, Public

Authored by asbirlea on Oct 28 2016, 3:54 PM.

Details

Reviewers
reames
Summary

The patch is a start for encouraging frontends to codegen generic IR, while tuning for specific architectures.
Starting off with two examples: strided loads and stores for ARM/AArch64.
I expect the doc to be expanded to cover more patterns, some of which are still under discussion.

The doc could also refer to something more specific, such as "codegen of vector code".
If the content gets too large, it could be moved as a subpage.

Please suggest who is best to review this.
Adding Philip as the doc owner, and Michael as fyi for future AVX512 doc.

Event Timeline

asbirlea updated this revision to Diff 76264. Oct 28 2016, 3:54 PM
asbirlea retitled this revision from to Add tips for generic IR vs architecture specific code..
asbirlea updated this object.
asbirlea added a reviewer: reames.
asbirlea added subscribers: llvm-commits, mkuper.
reames requested changes to this revision. Nov 30 2016, 6:02 PM
reames edited edge metadata.

After reading through the draft text a couple of times, I'm really not clear what your message is and why it belongs here. Having target specific lowering details in generic documentation seems strange?

docs/Frontend/PerformanceTips.rst
126

What is the take away from this piece of advice?

small edits:
e.g. intrinsics or inline asm.
define "generic IR" or use alternate phrase

130

From this sentence, I'm not sure what to expect. Are these patterns where generic IR *does* work, or does not work?

132

Why should this pass get special treatment in target neutral documentation? We don't talk about ISEL here for instance.

143

Lowered by whom, and why does a frontend author care?

205

This sentence does not parse for me.

This revision now requires changes to proceed. Nov 30 2016, 6:02 PM

I tend to agree with that; that's why I suggested this could go into a separate page. The generic documentation would point to each target specific subpage.
This is part of the feedback I was hoping for; there aren't currently any such pages, so as a draft I dropped the content in here, but I believe it would be better on its own.

docs/Frontend/PerformanceTips.rst
126

I tried to explain more in the last comment.

Is it better to replace "generic IR" with "architecture independent (generic) IR"?
All suggestions to make the doc clearer are more than welcome.

130

Patterns where generic IR *does* work. This draft certainly does not cover everything; I'd expect it to be expanded. I found no other documentation (other than the comments in ISEL code) that would help a frontend writer find these.

132

Agreed - separate page for each architecture where we do talk about ISEL?

143

In this example, by ISEL. The purpose is to have the frontend authors not generate custom intrinsics when generating generic IR should give the same asm in the end.
The example I've dealt with is Halide, which has special code generation for ARM/AArch64 in some particular cases. These used to generate intrinsics (some still do) for cases where the lowering would not get the right asm instruction. The interleaved access pass is a case where there's no reason for intrinsics to be generated.
Changing their code generation to use the right patterns makes the resulting IR architecture independent, gets the same performance on the ARM targets and at least the same on x86, and is easier to maintain.

The high-level idea I'm trying to convey is: if LLVM's lowering passes can get the same performance, try to rely on those and generate architecture independent IR; if not, use target specific IR (intrinsics, inline asm), but please let the LLVM community know about it, as it may be something that should be addressed.
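
To make the interleaved access case in this thread concrete, here is a minimal Python model of the generic pattern a frontend could emit instead of a target intrinsic: one wide load followed by shuffles whose masks pick every N-th element. The helper names (`strided_mask`, `shufflevector`, `deinterleave`, `interleave`) are hypothetical and exist only for this sketch. The IR in the docstring is an illustration based on my understanding of the pattern, not text taken from the patch under review.

```python
"""Model of the stride-N load/store pattern expressed as generic shuffles.

Illustrative IR for a stride-2 load (an assumption, not quoted from the patch):

    %wide = load <8 x i32>, <8 x i32>* %ptr
    %even = shufflevector <8 x i32> %wide, <8 x i32> undef,
                          <4 x i32> <i32 0, i32 2, i32 4, i32 6>
    %odd  = shufflevector <8 x i32> %wide, <8 x i32> undef,
                          <4 x i32> <i32 1, i32 3, i32 5, i32 7>
"""


def strided_mask(factor, lanes, offset):
    """Return the shuffle mask selecting every `factor`-th element from `offset`."""
    return [offset + factor * i for i in range(lanes)]


def shufflevector(vec, mask):
    """Model a single-input shuffle: result[i] = vec[mask[i]]."""
    return [vec[i] for i in mask]


def deinterleave(wide, factor):
    """Split one wide vector into `factor` strided vectors (the load side)."""
    lanes, remainder = divmod(len(wide), factor)
    if remainder:
        raise ValueError("wide vector length must be a multiple of the factor")
    return [shufflevector(wide, strided_mask(factor, lanes, k))
            for k in range(factor)]


def interleave(vectors):
    """Merge equally sized vectors back into one wide vector (the store side)."""
    lanes = len(vectors[0])
    if any(len(v) != lanes for v in vectors):
        raise ValueError("all vectors must have the same number of lanes")
    return [v[i] for i in range(lanes) for v in vectors]
```

For example, `deinterleave(list(range(8)), 2)` yields `[[0, 2, 4, 6], [1, 3, 5, 7]]`, and `interleave` applied to that result restores the original order. The point of the sketch is only that the masks are fully described by a factor and an offset, which is what lets a target recognize the pattern without the frontend naming a specific instruction.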

205

It was meant as a sort of disclaimer; if you could suggest how to make this clearer, that would be great.

The idea is that such patterns are lowered to a particular asm instruction on this arch, known to be effective there. It may not give the best performance on another architecture.

The aim is to give the frontend authors the info on existing patterns, encourage them to generate generic IR whenever possible, while still testing if they get the expected performance on *other* archs. Then get their feedback when they do see such performance regressions, or when lowering could do a better job.