This is an archive of the discontinued LLVM Phabricator instance.

[Tablegen/DAG Debug] - Instrumenting table gen DAGGenISelDAG to allow printing selected patterns(and their sources) during ISEL
ClosedPublic

Authored by aditya_nandakumar on Feb 7 2017, 1:07 PM.

Details

Summary

To help assist in debugging ISEL or to prioritize GlobalISel backend work, this patch adds two more tables to <Target>GenISelDAGISel.inc - one which contains the patterns that are used during selection (The text comment Src and Dst comment you see in MatcherTable for each of the MorphNode and EmitNode cases.
It adds another Tablegen OpcodeType called OPC_Coverage. We encode the offset of the pattern (in the table) with OPC_Coverage.
It also modifies the CheckComplexPattern to also print the ComplexPattern that returned true.
Sometimes, due to several layers of defm instantiations, to make it easier to identify where a particular pattern came from, the source locations have also been encoded in another table. This would get printed as

COMPLEX_PATTERN: SelectNegArithImmed
MORPHTO: (add:i32 GPR32:i32:$Rn, neg_addsub_shifted_imm32:i32:$imm) -> (SUBSWri:i32:i32 GPR32:i32:$Rn, neg_addsub_shifted_imm32:i32:$imm)
INCLUDED: /Volumes/Data/workspace/gpu/compiler_checkouts/open_source/llvm/lib/Target/AArch64/AArch64InstrInfo.td:659
.....

It's currently enabled through CMAKE variable DAG_PRINT_USED_ISEL_PATTERNS which passes a -instrument-coverage flag to table gen for emitting the *GenDAGISel.inc file.
This information can also be used (with a python script which greps/counts patterns) for building a test-suite(such as llvm-test-suite) and see which instructions/patterns need to be selected (say for global isel work prioritization) and can also print the tablegen emitted patterns that are unused during the compilation of the llvm-test-suite.

Looking forward to feedback.

Diff Detail

Repository
rL LLVM

Event Timeline

rengolin edited reviewers, added: rovka; removed: llvm-commits.Feb 8 2017, 3:38 AM
ab edited edge metadata.Feb 9 2017, 7:21 PM

A few minor nitpicks, but this seems like a nice feature!

CMakeLists.txt
177

I'd still prefix the name with 'LLVM_ENABLE_': it makes it easier to figure out where individual options come from in ccmake or CMakeCache.txt.

Maybe LLVM_ENABLE_DAGISEL_COV or something?

179

Maybe do this directly in TableGen.cmake, appending to LLVM_TABLEGEN_FLAGS? That way we don't need to change AArch64/CMakeLists.txt.

In theory, we should only add it to -gen-dagisel invocation, but in practice, there's only one binary, so the flag will just be useless on the other invocations.
You can fix that too by only adding -instrument-coverage to the flags inside tablegen(), only if ${ARGN} contains '-gen-dag-isel'.

utils/TableGen/DAGISelMatcherEmitter.cpp
64

If we never look into the components, maybe just concatenate the strings into one when you create them and avoid the std::pair?

70

While this isn't a huge deal (because it's a debug feature), the complexity of std::find makes me twitchy.

Maybe use a UniqueVector or std::lower_bound?

143

Print -> Emit

Also, can you declare it with the other Emit* functions above, and define the body elsewhere in the file?

149

Can you use the standard naming style? (index -> Index; here and elsewhere)

150

"char*" -> "char *" (here and elsewhere)

164

Since these tables are defined inside the functions, maybe remove the Prefix and give them a regular name?

701

NumCoverageBytes or something? Gets rid of the magic '* 3' later.

809

Give it a name? Maybe "Succeeded" or something?

dsanders accepted this revision.Feb 10 2017, 6:46 AM

It will LGTM with some nits fixed in addition to the ones Ahmed mentioned.

CMakeLists.txt
177

I had a similar comment typed but the name you suggest is better.

include/llvm/CodeGen/SelectionDAGISel.h
218–226

SelectionDAGISel is already abstract (because of Select()) so I think we should use make these pure virtual with '= 0'.

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
3500

'Morph' has a particular meaning so I'm not sure we should use it for both OPC_EmitNode* and OPC_MorphNodeTo*. I'd suggest 'COVERED: ' or similar.

utils/TableGen/DAGISelMatcherEmitter.cpp
64

If we do need to keep the std::pair<> then I think there ought to be a comment here indicating that the first string is the src pattern and the second is the dst patten.

70

Do we really need to de-dupe them? I wouldn't expect there to be duplicates since this indicates redundant patterns in the tablegen files and the MatcherTable entries for one of them will be unreachable.

145

isUInt<16>(VecPatternsCovered.size())

149

Could you also add the missing whitespace before the '{' and emit indentation to the tablegen-erated file?

155–156

I'd make this unconditional since trailing commas are ok

168–169

Similarly, I'd make this unconditional.

623

Personally I prefer underscored names but we should use CamelCase. Similarly below.

This revision is now accepted and ready to land.Feb 10 2017, 6:46 AM
ab added inline comments.Feb 10 2017, 10:00 AM
utils/TableGen/DAGISelMatcherEmitter.cpp
70

You're absolutely right, seems like this should just be a vector.

aditya_nandakumar marked 16 inline comments as done.Feb 10 2017, 1:35 PM

Will post the updated diff shortly.

CMakeLists.txt
177

Thanks. That sounds much better than what I picked.

179

Makes sense (I was initially hesitating to add to that function) . I'll go ahead and add this.

include/llvm/CodeGen/SelectionDAGISel.h
218–226

Currently Tablegen emits definitions for those two functions only when -instrument-coverage is passed in. This means if we make it pure virtual, then those two methods would be unimplemented by the <Target>DAGISels when -instrument-coverage is not passed in.

utils/TableGen/DAGISelMatcherEmitter.cpp
70

The problem I saw was that EmitMatcherList was called several times (since we need child and offset of failure code before emitting either of them, they buffer the output to a temporary string). I noticed w/o deduplication - the same patterns show up several times.
Not sure if lower_bound will work as ordering by '<' for strings seem weird?

There's no reason to use pair currently - I'll just create the concatenated strings directly and have a vector<string>

Updated based on feedback.

dsanders added inline comments.Feb 13 2017, 1:38 AM
include/llvm/CodeGen/SelectionDAGISel.h
218–226

Good point. Tablegen could emit the llvm_unreachable version when not given -instrument-coverage but there's no strong reason to to that over what you have here.

utils/TableGen/DAGISelMatcherEmitter.cpp
70

The problem I saw was that EmitMatcherList was called several times (since we need child
and offset of failure code before emitting either of them, they buffer the output to a
temporary string). I noticed w/o deduplication - the same patterns show up several times.

Ah ok, I see where the dupes come from now. In that case I'd suggest having some kind of map/set to make the dupe check less expensive for large numbers of patterns. MapVector might be a good choice given that we need to refer to the strings by their index. Using something else would mean we have to maintain the vector and map/set separately.

Not sure if lower_bound will work as ordering by '<' for strings seem weird?

The assembly parser uses lower_bound for the mnemonic but the catch is that the vector has to be sorted for it to work correctly.

Updated based on feedback. Used MapVector

aditya_nandakumar marked an inline comment as done.Feb 13 2017, 9:51 AM