This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][utils] Adding the init version of Instruction Select Testgen
AbandonedPublic

Authored by rtereshin on Mar 1 2018, 12:37 PM.

Details

Summary

This is the first version of the testgen - a tool, currently implemented as an
llc MIR-pass, that generates regression lit-tests for GlobalISel's Instruction
Selector. The generation is done on rule-by-rule basis and currently covers
selection rules automatically imported by TableGen from SelectionDAGISel.

This is a prerequisite for the tests generated for

  1. AArch64: https://reviews.llvm.org/D43976
  2. ARM: https://reviews.llvm.org/D43979
    1. updated in https://reviews.llvm.org/D43982 (this also demonstrates how the tests could be updated when GlobalISel changes)
  3. X86: https://reviews.llvm.org/D43994 (this also demonstrates the usefulness of ABI speculation done by Testgen)

What this tool is and isn't:

  1. This is not a fuzzer for Instruction Selector, it isn't trying to come up with a malicious input for it and break it, nor it's trying to discover bugs in it.
  1. This is a regression testing tool, it's main goal is to capture the current state of the GlobalISel's InstructionSelect pass providing the best test coverage for it with small and highly targeted tests that pass, and catch any regressions later due to changes in:
    1. *.td-definitions of the instructions and selection patterns;
    2. GlobalISel's emitter (the TableGen backend), including the ones that intend to change rules' priorities and the ones that don't;
    3. manually written parts of the Instruction Selector.
  1. Potentially this is also an analysis tool that may make it easier to see and control the actual effects of changes like listed above on the Selector, detect dead rules, etc.
  1. It may be extended in the future to generate tests for other passes of the GlobalISel's pipeline, and / or have a fuzzer mode of operation, but currently these aren't the goals.
  1. It also may be turned into a benchmark-gen for Instruction Selector relatively easily. That way we could benchmark the Selector on large inputs created in-memory, avoiding huge (~90%) overhead of parsing MIR off disk, and having the input with any desirable probabilities of any pattern supported, as well as having only the patterns supported, avoiding any input that is not selectable, thus getting more stable, targeted, and precise performance data.

Potential user stories:

  1. New backend development.
  2. Porting an existing backend from SelectionDAG ISel to GlobalISel.

While the first one is promising, it appears that the second one is more
prominent right now and therefore the main target of this tool.

Design goals:

  1. As we mostly care about providing regression testing of InstructionSelect pass of GlobalISel's implementations early in the development for pre-existing targets, we can not rely on any other GlobalISel passes being well-developed and fully functional, in particular, we expect InstructionSelect pass to be well ahead everything else due to the semi-automatic porting mechanism.

    See https://reviews.llvm.org/rL326396 as an example of breaking ties with the Legalizer, selectUnconstrainedRegBanks of this patch as an example of the same w.r.t. RegBankSelect.
  1. We want the testgen to be relatively robust and able to handle gracefully non-functional changes, for instance, changes in the typical order of the MatchTable opcodes for rules, or even presense of specific opcodes, like the number of operands check, or changes in concrete serialization format for MIR.
  1. We want the testgen to be as target-independent and generic as possible and impose as less maintainance burden on backend writers as possible.
  1. If it's not jeopardizing other goals and not too difficult to do, we want testgen to generate naturally-looking tests that are likely to come out the same if written by a human.

Design decisions made:

  1. Current implementation of testgen uses TableGen'erated MatchTable's to generate the tests. We could've branched off input data-wise earlier, but that would mean re-implementing too much of the GlobalISel's emitter.
  1. We're using only matching parts of the MatchTable to generate MIR and relying on the selector itself to generate FileCheck's for the expected output for a few major reasons:
    1. it simplifies the implementation;
    2. it reduces the number of tests failing as of time of their generation due to the MIR being selected not by a rule intended, which is desirable as we aren't fuzzing the selector, but trying to generate passing tests;

Usage:

llc -mtriple aarch64-- -run-pass instruction-select-testgen -simplify-mir input.mir -o output.mir

will add a number of Machine Function's, one per every imported *.td-defined
selection rule, into intput.mir and write the result as output.mir.

Command line options:

  1. -testgen-from-rule=N -testgen-until-rule=M - generate tests for a subrange of rules only;
  1. -testgen-exclude-rules=N{,N} - skip specific rules;
  1. -testgen-include-only=N{,N} - generate tests for explicitly listed rules only;
  1. -testgen-set-all-features - speculatively satisfy all target / module / and function features requirements to cover feature-specific rules;
  1. -testgen-no-abi - don't speculate on ABI boundaries tring to make the test look natural and test COPY's selection, but just IMPLICIT_DEF undefined vregs instead.

Note:
-testgen-no-abi=false tried to emit real RET opcodes at some point by using
CallLowering::lowerReturn and deriving IR Types from LLTs, but it proved
to be unreliable for most targets and created an extra dependency.

This patch also provides utils/update_instruction_select_testgen_tests.sh tool
that would generate a couple of lit-tests:

usage: ./utils/update_instruction_select_testgen_tests.sh <testgen'd file> <llc binary> <target triple> [extra llc args]

for instance, executing

../../utils/update_instruction_select_testgen_tests.sh ../../test/CodeGen/AArch64/GlobalISel/arm64-instruction-select-testgen-testgend.mir ./bin/llc aarch64--

from a build/obj directory would create 2 files:

../../test/CodeGen/AArch64/GlobalISel/arm64-instruction-select-testgen-testgend.mir
and
../../test/CodeGen/AArch64/GlobalISel/arm64-instruction-select-testgen-selected.mir

testing that the testgen outputs the same MIR and the selector selects that MIR
the same way respectively.

Coverage:

Target  | Rules    | Fail to | Tests     | Selected by the
        | Imported | Select  | Generated | Rule Intended
--------+----------+---------+-----------+----------------
AArch64 |  1654    |  0.0%   |  1449     |  85%
ARM     |  1055    |  0.2%   |   991     |  78%
x86     |   887    | 13.8%   |   765     |  68%

"Fail to Select" stands for "a generated test crashed / asserted the selector",
this is something to -testgen-exclude-rules in practice. The major reason
for this right now is a limited support of COPY_TO_REGCLASS in *.td-defined
patterns by the GlobalISel importing mechanism.

"Selected by the Rule Intended" basically means the target coverage provided by
the tool. A test could be selected by a rule different from the rule that was
used to generate it for a variety of reasons, approximately in order from most
prominent ones to the rarest ones:

  1. The test generated isn't specific enough due to:
    1. lack of support of complex patterns by the testgen;
    2. too basic support of immediate predicates by the testgen;
    3. rules genuinely intersecting with each other and local approach of the testgen not considering rules partially hiding each other.
  1. A rule is genuinely dead and
    1. it was rendered dead by GlobalISel;
    2. it was dead in SelectionDAG ISel to beging with;
    3. it is rendered dead by manually written parts of the selector executing before trying TableGen'erated selectImpl.

Known deficiencies:

  1. Testgen could not be currently easily extended by a target to support complex patterns, which should greatly improve coverage.
  1. Testgen's way of dealing with features is very sketchy at the moment and needs to improved.
  1. Testgen should probably be a separate from llc binary tool

approximately from the most important to fix soon to the least important.

A couple of dependencies for this patch as well as the tests generated
are coming soon in separate patches.

See also test/CodeGen/AArch64/GlobalISel/select-with-no-legality-check.mir
currently committed for an example output of the testgen.

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.Mar 1 2018, 12:37 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 12:41 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 1:27 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 1:43 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 3:31 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 4:35 PM
rtereshin updated this revision to Diff 136669.Mar 1 2018, 7:24 PM
rtereshin edited the summary of this revision. (Show Details)

Fixing a little non-determinism in picking live-in phys regs used to define input vregs.

rtereshin updated this revision to Diff 136675.Mar 1 2018, 9:07 PM
rtereshin edited the summary of this revision. (Show Details)Mar 1 2018, 9:07 PM

fixing little formatting issues, NFC

rtereshin added a comment.EditedMar 2 2018, 3:47 PM

Performance

or why all the tests are in a single file?

in short, with test cases in separate test-files it takes at least ~30 times (or at least 2 minutes just for AArch64 and only for the rules currently imported by GlobalISel emitter) longer to update the tests, and at least 15 times (or at least 40 seconds just for AArch64 and only for the rules currently imported by GlobalISel emitter) longer to run the tests on 4-cores SSD-only iMac running macOS depending on a build type. In some cases, the difference reaches 130x / 5 minutes.

With all the tests in a single file it takes about 2 seconds to update them for AArch64 at the moment, and about 1.33 seconds to run them on the same machine as described above (both tests ran in parallel, where the test testing the Testgen itself takes ~1/3 of a second, and the test testing the Instruction Selector takes the full 1.33 seconds).

rtereshin edited the summary of this revision. (Show Details)Mar 5 2018, 9:47 AM
rtereshin updated this revision to Diff 139188.Mar 20 2018, 1:55 PM
  1. Rebased against master.
  2. GIM_RecordInsn handler is made more tolerant to alternative orderings of GIM_RecordInsn's with respect to other opcodes while matching multi-instruction patterns.
  3. A use-after-free bug is fixed in emitReturnIntoTestCase caught by ASAN.

Hi Roman,

This is a very large patch with lots of complexity, and is hard to review. While your description outlines the general idea of the tool, there's little documentation in the code of how this actually works under the hood. Can you add more descriptions so someone can follow what's happening when it comes to maintenance later, e.g. what's LiveInRA supposed to do, what are the pre-conditions and expected outputs of each phase of this tool?

Without that I suspect this patch will continue to lie in the review queue for a long while.

Thanks,
Amara

Hi Roman,

This is a very large patch with lots of complexity, and is hard to review. While your description outlines the general idea of the tool, there's little documentation in the code of how this actually works under the hood. Can you add more descriptions so someone can follow what's happening when it comes to maintenance later, e.g. what's LiveInRA supposed to do, what are the pre-conditions and expected outputs of each phase of this tool?

Without that I suspect this patch will continue to lie in the review queue for a long while.

Thanks,
Amara

Hi Amara,

Thanks for taking a look at this! Will do.

Brining perf-data up as the comment is hidden due to the diff update: https://reviews.llvm.org/D43962#1026201

Aside from being a large patch with few function-level comments, some of the non-functional changes also make this difficult to review. It would be helpful to move things like the indentation correction on testImm*(), the introduction of buildTable and getMatchTable(), the changes to coverage, moving the emission of selectImpl() down, etc. into a separate patch(es).

I think the overall approach of parsing the match table, emitting something that matches, and constructing some scaffolding around it is a good plan. We'll need to find a good way of handling immediates, C++ predicates, ComplexPattern and similar but that should definitely be left for later patches.

Some targets have a rather large number of rules (e.g. X86 is around 17k IIRC). Do we have a mechanism for keeping the number of generated tests to a reasonable level?

include/llvm/CodeGen/GlobalISel/InstructionSelector.h
238

Is this comment accurate? GIR_AddRegister is listed as having 2 operands and I only see 2 in my build

include/llvm/CodeGen/GlobalISel/InstructionSelectorTestgen.h
27–29

This one probably isn't harmful since InstructionSelectorTestgen.h isn't going to be widely included but we ought to avoid anonymous namespaces in headers since each compilation unit that includes the header will get it's own version of the declaration. It's probably best to put it in the llvm namespace if we can't push it into the cpp entirely.

InstructionSelectorTestgen doesn't seem to have any state so it looks like generateTestCase() could be a static function outside the class

33–38

Are all of these really needed in the header? Most seem to only be used from InstructionSelectorTestgen.cpp

include/llvm/Support/CodeGenCoverage.h
26 ↗(On Diff #139188)

Just a small nit: we should probably have 'const' somewhere in the 'covered_iterator' name.

lib/CodeGen/GlobalISel/InstructionSelectTestgen.cpp
46–48

Could you add a comment indicating that verification is the only thing we do with these functions and why?

lib/CodeGen/GlobalISel/InstructionSelectorTestgen.cpp
59

If the function already exists somehow then this might not return an empty test case due to the getOrInsertFunction() we should probably fail in that case

103

LLT's scalars aren't always integers. So long as we inject bitcasts this will be fine though

124

If I understand correctly, this is a register allocator that is used to generate the live-ins for the test function. This needs some explaining in the comments and possibly also renaming (I don't have a good spelling for it, maybe InputRegAllocator)

274–309

This lambda is getting pretty big. I'd be inclined to make it a static function in its own right

283

I think a word is missing from "of the required by"

307

greadily -> greedily

344–346

As a general thing: Writes to dbgs() should generally be wrapped in DEBUG(...) or DEBUG_ONLY(...) so that we don't format strings we're not going to emit.

354

I'm not sure IMPLICIT_DEF is the right thing to use for these if-statements. IMPLICIT_DEF is a definition with an unknown value (much like UNDEF) so it wouldn't be wrong to propagate it in something like

%0 = IMPLICIT_DEF
%2 = G_ADD %0, %1

to:

%2 = IMPLICIT_DEF

a COPY of a live-in phys-reg would be safer but it's probably ok since constant propagation isn't ISel's job. I think this would only become an issue if we started porting this to combiners.

367

This table is going to be quite fragile. We should at put this somewhere near the GIM_*/GIR_* declarations or at least cross-reference them in the comments. Tablegen-erating them might be sensible if we get additional metadata.

422

If we have a rule with variadic instructions, what do we do about the number of defs?

469

Is this ever false? ensureNumOperands() looks like it adds operands until this is true

485

What is this for?

486–489

This comment explains the reasons behind something but doesn't really explain what that something is

498

I don't think I understand this variable. What does it represent?

503

This should probably indicate what is being skipped

504

In some ways it would be nice to support this (e.g. to check the tests are the same) but I agree it's way too big a task for a first patch.

505–511

Doesn't NDEBUG also disable the dbgs() stream?

That assert can't succeed (NestingLevel > 1 vs NestingLevel == 1). It looks like this ought to be report_fatal_error() or similar

597

I don't think I understand the Def.getReg() part of this. Both the Use and the Def must be the same register so I'd expect to always use one or the other for all cases.

681–682

Function-level features can be handled by listing them in the function attributes:

attributes #0 = { "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

The tricky bit will be mapping the enum back to the feature name.

Module-level features are harder since they're only read once per module. You'd have to separate them into multiple files

715

There's no guarantee that 1 will match. There's a few predicates that check they're multiples of N

761–784

One thing to mention here is that if something changes the register number of OtherInsnB.getOperand(OtherOpIdx) after this opcode is processed then these might diverge. I think we're ok on that since setReg() doesn't seem to be called from latePass()

796

As with the other immediate case, '1' might not match all predicates

849

I was confused by this function name at first. It sounded like the regbanks were unconstrained and that didn't make sense. I see it's about assigning regbanks to vregs that don't already check one. I haven't thought of a good spelling ('assignRegBanksToUnconstrainedVregs()' or 'ensureRegistersHaveBanks()' are a couple ideas) but we may be able to find a clearer name.

884

We should have a couple examples of how duplicates can happen (e.g. i32 and f32 both mapping to s32)

utils/TableGen/GlobalISelEmitter.cpp
4192–4196

It seems that this block has been moved down from the other side of the sort. Non-functional changes like this ought to be in separate patches

4214

This definition should probably be wrapped in a #ifdef

utils/update_instruction_select_testgen_tests.sh
40–43

The 'perl' commands might be an issue for the windows bots.

rtereshin marked 16 inline comments as done.Apr 25 2018, 5:25 PM
rtereshin added inline comments.
include/llvm/CodeGen/GlobalISel/InstructionSelector.h
238

The comments in this section describe an opcode below the comments, not above. So this is all about GIR_AddTempRegister and it appears to have 3 operands (https://github.com/llvm-mirror/llvm/blob/master/include/llvm/CodeGen/GlobalISel/InstructionSelectorImpl.h#L598-L601)

include/llvm/CodeGen/GlobalISel/InstructionSelectorTestgen.h
27–29

That's a good insight, thanks, I didn't think of that.

generateTestCase calls virtual member-functions and may do so more in the future and I'd rather keep it that way and keep it a class member.

I'm moving the helper class (LiveInRA or however we would (re)name it) in InstructionSelectorTestgenas a member-class instead, I hope that solves the problem well enough.

33–38

Only TestgenSetAllFeatures is required to be here as it's used by the instruction selector itself.

I'm not exactly sure why not to list all the options in a header, in a sense it's part of its interface, CLI in this case, but sure, I'll remove all of those that don't have to be here. After all, nothing would force this list to be complete, so maybe it's better if it's explicitly incomplete, and also this is rarely (if ever) done across other parts of LLVM.

include/llvm/Support/CodeGenCoverage.h
26 ↗(On Diff #139188)

Sure, good catch! I'm renaming it to const_covered_iterator

lib/CodeGen/GlobalISel/InstructionSelectTestgen.cpp
46–48

Sure, adding the following comment:

// Generally it is possible to run the Testgen over a non-empty module,
// practically probably a lit-test, and have it add additional test-cases
// to it. The easiest way to limit the number of verification failures on
// the final output caused by initially present machine functions rather
// than the ones added, is to run the verification on those already
// present functions first.
lib/CodeGen/GlobalISel/InstructionSelectorTestgen.cpp
59

True. Let's just MachineFunction::reset() it for now if such a thing happens, that would make llc update the existing test cases if ran over an already generated test-module. It's a sensible thing to do, I think.

Generally, if not reset, it would actually just add another machine basic block and repeat the instruction sequence derived from the selection rule, so we'll end up with a single machine-function test-case with multiple identical basic blocks testing the same pattern. W/ no bug fixes it will screw up the ABI lowering at the moment though, so it will only work for patterns w/ no input vregs, such as having only constants for operands. And that means it will generate a small test-module, it won't fail or crash, but rather just filter out the broken machine functions using machine verifier, as it always does. Or tries to, at least.

Technically, I'm still considering turning this into a sort of a benchmark-gen, not sure at the moment if that could be actually useful though. Theoretically, we could have a tool that will be able to generate long def-use chains of instruction sequences corresponding to a particular pattern or a subset of pattern in a type-driven manner and use them to benchmark the selector as well as other parts of the pipeline.

103

They aren't, but it appears to me that this is the best we could do w/o going to great lengths, as the information about the actual type is pretty much lost at match table level.

At some point, I tried to lower ABI, specifically to insert an appropriate return sequence, by re-using CallLoweringInfo::lowerReturn provided by the target. The method expects an LLVM Value, however, the targets only analyze (or were at the moment) the Value's type, so it seemed sufficient to provide it an IR constant of an appropriate IR type.

It didn't work our for a number of reasons, most notably not all the targets could handle all the types, especially vector types. So I started to use the PATCHABLE_RET instead for this purpose.

So currently deduceIRType is only used to build appropriate machine memory operands to make GIM_CheckAtomicOrdering happy, specifically, figure out a reasonable size and alignment. As the size is directly inferable from LLT, this is mostly to figure out the alignment.

124

Your understanding is correct.

I'm adding the following comment:

+/// A helper providing sensible phys regs to define patterns' input vregs.
+///
+/// It appears that the most portable and robust way to define all the input
+/// vregs (used, but not defined) of an instruction sequence being generated is
+/// to define them as COPY's from appropriate and preferably distinct physical
+/// registers, live-in to the basic containing the instruction sequence and
+/// preferably the entire function as well. The best pick is to have the same
+/// size in bits as the vreg, same register bank, to be allocatable, and from
+/// the beginning of the allocation order. The class also handles the following
+/// issues:
+///
+/// 1) RegisterBankInfo::getRegBankFromRegClass not being defined for
+///   tablegen'erated reg classes.
+/// 2) Register classes containing the same physical register and yet having
+///   different sizes, and therefore physical registers having only weekly
+///   defined size as the maximum of the sizes of all register classes they
+///   belong to.
+/// 3) Register banks not having a full list of register sizes available
+///   directly.
+///
+/// The latter is also used for picking sensible register banks for internal
+/// (defined and used both by the instruction sequence being generated) vregs.

and renaming the class from LiveInRA to InputRegAllocator (from (anonymous namespace)::LiveInRA to llvm::InstructionSelectorTestgen::InputRegAllocator to be exact).

274–309

Agreed, doing that.

I'm also changing Size2RegBanksTy from DenseMap to std::map, that gets rid of the extracting keys (Sizes) and sorting them every time, makes the implementation a little simpler and reduces the number of arguments (former closure captured variables) of the static function being extracted.

283

An example sentence would be "Didn't find a register bank containing allocatable registers of the required by LLT <2 x s16> size of 32 bits or larger for an unconstrained vreg %1".

Breaking it down as follows:
"Didn't find a register bank containing allocatable registers of a compatible size; vreg: %1(<2 x s16>), size: 32 bits (or larger)."

307

Oops, good catch, thanks!

354

COPY of a live-in phys-reg would be safer

It is, this is why I'm using IMPLICIT_DEF only if I couldn't find a phys-reg with the appropriate size and within the required register bank (or if this behavior is explicitly requested by a command line option).

367

Agreed. I'm making this less fragile by doing the following:

  1. Specifying a list of pairs implicitly defining a mapping from an opcode to its number of operands instead so
    1. there is no need to maintain the records in a specific order, matching the order of the opcode definitions
    2. opcodes are used directly, not just as a comment, thus making sure there are no non-existent opcodes mentioned in the list
  2. adding assertions that would make sure that there are no opcodes missing from the mapping
  3. putting the definition right next to the opcode definitions
469

Of course, the number of operands could be greater than OpIdx, ensureNumOperands doesn't add or remove operands in that case.

485

Often I need to create a generic virtual register w/o knowing which type is expected from it yet, and it's not exactly possible to create a virtual register w/o a type. This constant exists to be consistent with the type I use as a default / initial option.

utils/TableGen/GlobalISelEmitter.cpp
4192–4196

I'm extracting a number of patches from this one to break it down a little.

rtereshin updated this revision to Diff 144043.Apr 25 2018, 5:28 PM
rtereshin marked 16 inline comments as done.

Hi Daniel @dsanders,

Thank you for looking into this and the detailed review.

It would be helpful to move things like the indentation correction on testImm*(), the introduction of buildTable and getMatchTable(), the changes to coverage, moving the emission of selectImpl() down, etc. into a separate patch(es).

I believe I have this done, please take a look at the extracted patches:
https://reviews.llvm.org/D46095
https://reviews.llvm.org/D46096
https://reviews.llvm.org/D46097
https://reviews.llvm.org/D46098

I'm also half-through the inline comments.

rtereshin marked 31 inline comments as done.Apr 26 2018, 6:47 PM
rtereshin added inline comments.
lib/CodeGen/GlobalISel/InstructionSelectorTestgen.cpp
344–346

Done.

so that we don't format strings we're not going to emit.

w/o DEBUG macro we actually emit all of this in assert and release builds likewise.

422

That's a very good question, thanks!

I suppose, I will have to replace NumDefs = InsnB->getDesc().NumDefs; line below with NumDefs = std::max(InsnB->getDesc().NumDefs, InsnB->getNumExplicitDefs()); as soon as we get MachineInstr::getNumExplicitDefs() merged in ;-) (https://reviews.llvm.org/D45640).

It will help, but won't solve the problem.

It's not a problem for now, though, as InstructionSelector can't really handle those either. For instance, record instruction opcode clearly assumes that the definition is always the operand 0.

When it does support the case, however, most likely it won't be doing that by checking how many definitions an instruction has, it will most likely just rely on MIR being valid.

That naturally assumes machine verifier can check this stuff. So I guess at some point machine verifier will special case instructions like G_UNMERGE_VALUES. We can implement that check in the machine verifier as unsigned getNumExplicitDefsExpected(const MachineInstr &I) refactored out and then checking if the actual I has the number of defs expected.

And then we can reuse getNumExplicitDefsExpected right here in Testgen.

If the generic opcode has a very flexible number of defs, as in, it's not derivable from the number of operands and their types (do we even have these?), it will probably be still all right, we just see the highest operand index that the pattern explicitly requires to be a definition (via record instruction opcodes, for instance), and we say that that operand is the last def (and of course every operand with a lower index is also a def). And that should produce a) valid MIR, we started by saying this mysterious opcode is very flexible with defs b) MIR that could be matched by the pattern, and it's all we care about.

Not to mention, if we end up having generic opcodes like this - with non-derivable number of defs and the number of defs having an impact on the instructions' semantics - we will end up having a match table opcode checking the number of defs explicitly.

But again, it's not a problem for now.

486–489

I'm replacing the comment with the following:

// Raw representation of a single match table rule as an ordered union of
// several continuous regions of the match table. The representation tries its
// best to ignore parts of the table that don't affect semantics of a single
// rule in isolation, like labels and rule IDs.
//
// Assuming that all the parts of the MatchTable that don't affect the
// selection process but only identify a rule, like GIR_Coverage opcodes, come
// within a rule as a single continuous block, the meaningful parts of the
// rule could be represented as some prefix (starting from the first
// non-control flow opcode, in other words, skipping GIM_Try and its
// label-operand) and suffix:
using RuleBodyTy = std::pair<ArrayRef<int64_t>, ArrayRef<int64_t>>;
498

I'm renaming the variable from CoverageBlockPassed to ExcludedRegionPassed and adding a bunch of comments that should make it clearer, like this:

// Get a rule descriptor, containing the index of its GIR_Done opcode, RuleID,
// and a raw representation of the entire body.
//
// \pre MatchTable is a non-optimized linear match table.
// \pre CurrentIdx points to the first (and only) GIM_Try opcode of the rule
//   that has all its semantically meaningless opcodes that to be excluded from
//   the body as a single continuous subregion somewhere.
// \post the prefix of the body is a range from the first opcode after the
//   initial GIM_Try until GIR_Coverage (or, more generally, the first
//   semantically meaningless opcode), the suffix is a range from the first
//   opcode after the last semantically meaningless opcode until GIR_Done
//   (exclusive).
static std::tuple<uint64_t, int64_t, RuleBodyTy>
getDoneIdxRuleIDAndBody(const int64_t *MatchTable, uint64_t CurrentIdx) {
  // skipping GIM_Try
  const uint64_t BodyIdx = nextGIOpcodeIdx(MatchTable, CurrentIdx);
  uint64_t ExcludedFirst = BodyIdx;
  uint64_t ExcludedLast = ExcludedFirst;
  // RuleID we discovered so far. Or the first one if we have many O_o
  int64_t FirstRuleID = -1;
  // Did we already iterated over that continuous region of unimportant opcodes
  // we are going to exclude from the body?
  bool ExcludedRegionPassed = false;
  unsigned NestingLevel = 0;
  do {
503

I'm adding the following comment:

// Skipping the OnFail label operand
504

It will be hard to make sure that the tests are the same.

For instance, let's suppose that non-optimized table does some meaningless checks, for instance, checks register banks on internal vregs (the registers defined and used inside the pattern and not being the pattern's overall inputs or outputs), while optimized table doesn't. Semantically they are the same, but in the case of the latter Testgen will have to guess more regbanks, and it might guess it differently (from what is explicitly checked by the non-optimized table). It makes no difference in the selected code, and the *selected test will pass, however, the *testgend test (and that one only tests the Testgen itself, not the selector) will be technically different.

Also, if optimization reorders opcodes, the Testgen might easily end up with different virtual register names, while keeping the actual def-use chain the same, or schedule the def-use chain differently.

Not to mention, it will noticeably increase the maintenance burden, and I think it's best to keep that at a minimum.

505–511

Doesn't NDEBUG also disable the dbgs() stream?

No, the only difference is that NDEBUG resolves dbgs() directly to errs() and therefore sends the output straight to stderr while in assert builds there is a circular buffer in the middle (that smart enough to flush if killed).

That assert can't succeed (NestingLevel > 1 vs NestingLevel == 1). It looks like this ought to be report_fatal_error() or similar

True, good catch, I'm tidying this up.

597

Both the Use and the Def must be the same register so I'd expect to always use one or the other for all cases.

They must be the same register by the end of this case, but we know little in the beginning. We know that Def is a register and that's about it. Either (and we don't know which) or both Def and Use could be %noregs (0). If one of them is an actual vreg (not %noreg) it might have a meaningful type or / and a bank assigned, and we can't loose that assignment. Technically, to make it even more robust we need to intersect both of their constraints here (what if both of them are valid vregs already, one has a bank checked, but not a type yet, and the other has the type checked, but not the bank?), but so far it was working pretty well as is.

681–682

Thanks, I didn't dig into this deep enough yet to know that for myself!

And yes, the mapping is the hardest part. So this is why that ugly TestgenSetAllFeatures command line option for now, to overcome this the cheap and dirty way. Of course, that strips us off testing the checking features part of every pattern and the match table as a whole.

This is for future patches and improvements, though.

715

Absolutely no guarantee. However, I actually took a quick look and it appeared to me that 1 would match more often then any other fixed constant, did't measure it though.

Full support for immediate predicates is for future patches.

761–784

True, and that's the whole point of having more than one pass: satisfying dependencies.

If the number and complexity of the dependencies were much greater, I would probably process the table once, "box" (as in incapsulate) every opcode (with its operands) in an object, and then sort the objects in a way that satisfies the dependencies.

However, the dependencies we've got seem to be simple enough to get away with just a few passes over the table, so I find the "an object per opcode" solution greatly over-engineered and not needed.

782

@dsanders This situation, btw, is very similar to the one with record instruction opcode: we don't know which register is defined and which is not, and we can not afford loosing the definition. With record instruction we don't know which register might already have LLT and / or regbank "checked", and we can't afford loosing that info.

796

For future patches.

849

I'm renaming selectUnconstrainedRegBanks to assignRegBanksToUnconstrainedVRegs and selectRegBank helper function (former lambda) to assignRegBank for consistency.

884

I'm adding the following comment:

// The major source of literal duplicates is the fact that we map MVTs
// like i32 and f32 to the same s32 LLT, therefore 2 or more patterns
// originally written for SelectionDAG ISel get imported as the exact same
// sequence of semantically meaningful match table opcodes, matching and
// rendering opcodes both:
utils/TableGen/GlobalISelEmitter.cpp
4214

It is wrapped in GET_GLOBALISEL_IMPL along with the InstructionSelector implementation. Due to InstructionSelector::getTestgen method we need the class definition even if we aren't planning to use testgen, but why would we not?

utils/update_instruction_select_testgen_tests.sh
40–43

sed is horrible with multiline patterns, hm... This script isn't required to run tests, only to generate / update them, so maybe we could deal with it a bit later. Maybe with a little help from somebody running windows?

rtereshin marked 33 inline comments as done.Apr 26 2018, 6:49 PM
rtereshin updated this revision to Diff 144261.Apr 26 2018, 7:11 PM

Hi Daniel @dsanders

I've addressed all of the inline comments, I believe.

We'll need to find a good way of handling immediates, C++ predicates, ComplexPattern and similar but that should definitely be left for later patches.

Yes.

Some targets have a rather large number of rules (e.g. X86 is around 17k IIRC). Do we have a mechanism for keeping the number of generated tests to a reasonable level?

Not yet, but it's not a problem yet either as we import only a fraction of the rules as of now.

with few function-level comments

+ @aemerson (Hi Amara)

there's little documentation in the code of how this actually works under the hood. Can you add more descriptions so someone can follow what's happening when it comes to maintenance later, e.g. what's LiveInRA supposed to do, what are the pre-conditions and expected outputs of each phase of this tool?

All of this improved noticeably, I hope, but not everything is commented and I'll probably add on it later.

Thank you!
Roman

  1. Improved in-source comments
  2. Improved usability and scripts, in particular, added find_failing_instruction_select_rules.sh script that will automatically find the selection rules that will fail if executed (most of the time will crash the selector)
  3. Added support for extending loads / truncating stores MatchTable checks
  4. Rebased against master and re-solved namespace / visibility issues introduced by AMDGPU backend (the only backend that put its target-specific derived InstructionSelector's declaration in a header)
  5. Increased Testgen's tolerance towards register banks quirks: AMD GPU is the only backend that has banks "covering" register classes that span cross multiple banks.
  6. Decreased invasiveness of the patch in its surroundings, especially within the global isel emitter.
  7. Made sure that if the tests are updated they are updated in place as much as possible (w/o re-ordering machine functions representing selection rules within the test file) and identified by their much more stable Rule IDs rather than number / position or initial index in the MatchTable. This makes diffs much more manageable for a manual review.
  8. Other small improvements and bug fixes.
rtereshin updated this revision to Diff 147646.May 18 2018, 8:55 PM

I have:

  1. Fixed IR-building code that failed to update def-use chains properly until very end of the test generation, rendering GIM_CheckSameOperands fragile and dependent on which operand was already defined / checked explicitly and which one is not
  2. Started sorting machine instructions inserted in topological order explicitly (rather than implicity relying on GIM_RecordInsn's opcodes), thus a) reducing the size of the diffs in case of changes b) making GIM_CheckSameOperands less fragile (it could have created uses not dominated by defs previously)
  3. Stabilized resulting vreg numbers to reduce the diffs in case of changes
mgrang added inline comments.May 19 2018, 11:26 PM
lib/CodeGen/GlobalISel/InstructionSelectorTestgen.cpp
260

ping

lib/CodeGen/GlobalISel/InstructionSelectorTestgen.cpp
260

Good catch, thanks, will do!

aemerson resigned from this revision.Jan 30 2019, 1:36 PM

Closing as we decided not to pursue this.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2019, 9:11 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
rtereshin abandoned this revision.Feb 26 2019, 9:12 AM