This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Improve algorithms for processing template arguments; add type checking
ClosedPublic

Authored by Paul-C-Anagnostopoulos on Feb 10 2021, 7:04 AM.

Details

Summary

The initial goal of this revision was to improve the algorithms for resolving the template arguments of classes and multiclasses. However, along the way I discovered that the type checking of template arguments was inconsistent.

  • Template argument values passed to inherited classes were type-checked and cast if necessary.
  • Template argument values to anonymous class instantiations were type-checked but not cast.
  • Template argument values to multiclasses were not type-checked at all.

Adding type checking for multiclasses revealed many invalid template argument values. These have been corrected in the various target TableGen files. See https://reviews.llvm.org/D95874.

So this revision improves the algorithms and type-checks/casts all template arguments.

While this revision is being reviewed, I will add a large test for all the template argument checking.

Diff Detail

Event Timeline

Paul-C-Anagnostopoulos requested review of this revision.Feb 10 2021, 7:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2021, 7:04 AM
lattner accepted this revision.Feb 10 2021, 10:24 AM

I don't have technical competence to review the logic here (and, likely no one other than you do) but the code looks reasonable structurally.

Unrelated to the patch, you have looked into the vg_leak issues with tblgen tests?

This revision is now accepted and ready to land.Feb 10 2021, 10:24 AM

@lattner Can you give me a link to a bug report or other discussion?

I also have to look into this bug: https://bugs.llvm.org/show_bug.cgi?id=48254. I don't feel particularly competent to investigate either one, but I will spend some time on them.

I just noticed this in the testcase: "// XFAIL: vg_leak". Valgrind checks for memory leaks, so I assume this is required because tblgen itself is leaking memory internally.

-Chris

Ah, I wondered what that was. 86 tests specify that XFAIL. When I can, I will spend the time to figure out how to track these down, though I remember something about the leak tracker-downers not working with Visual Studio.

If I recall it is more basic than that - lots of tblgen data structures don't have destructors or something. It has been forever since I've looked at this stuff so I don't recall. It might not be worth fixing

I cleaned up the tidies. I also made two simple changes to the TableGen Programmer's Reference.

I added a test for template argument checking and casting.

I had to change the invalid template argument type message to fatal. Otherwise all hell can break loose downstream.

@foad Something interesting is going on with the AMDGPU TableGen files. When I print all the definitions under main, there are 18,195 anonymous records. When I print them under these changes, there are 18,152 anonymous records. This suggests that fewer casts of records from one class to another are done under these changes. I'm not sure how that is possible. I will investigate further.

foad added a comment.Feb 17 2021, 9:03 AM

@foad Something interesting is going on with the AMDGPU TableGen files. When I print all the definitions under main, there are 18,195 anonymous records. When I print them under these changes, there are 18,152 anonymous records. This suggests that fewer casts of records from one class to another are done under these changes. I'm not sure how that is possible. I will investigate further.

Interesting. Do the generated files lib/Target/AMDGPU/*.inc come out identical?

No, because the .inc files contain thousands of lines such as those below. I think it's rather strange that something like an enum has an enumerator with "anonymous_12703" in its name, but there might be good reasons for it.

enum {
  GIPFP_MI_Predicate_AMDGPUmul_i24_oneuse = GIPFP_MI_Invalid + 1,
  . . .
  GIPFP_MI_Predicate_and_oneuse,
  GIPFP_MI_Predicate_anonymous_12703,
  GIPFP_MI_Predicate_anonymous_12704,
  GIPFP_MI_Predicate_anonymous_12706,

I don't have any good ideas about how to approach this problem. However, while looking around, I'm finding things that confuse me even more. In AMDGPUSearchableTables.td:

class RsrcIntrinsic<AMDGPURsrcIntrinsic intr> {
  Intrinsic Intr = !cast<Intrinsic>(intr);
  bits<8> RsrcArg = intr.RsrcArg;
  bit IsImage = intr.IsImage;
}
. . .
foreach intr = !listconcat(AMDGPUBufferIntrinsics,
                         AMDGPUImageDimIntrinsics,
                         AMDGPUImageDimAtomicIntrinsics) in {
def : RsrcIntrinsic<!cast<AMDGPURsrcIntrinsic>(intr)>;

}

Notice that the class casts the 'intr' argument to type Intrinsic. But the lists that are concatenated in the foreach include AMDGPUBufferInstrinsics, which is a list of records of type AMDGPURsrcIntrinsic. That class is not a subclass of Intrinsic. So why aren't a pile of errors produced when trying to cast those record to type Intrinsic?

Ah, but now I see that one of the records in the AMDGPUBufferInstrinsics list is int_amdgcn_buffer_load_format, which is actually of type AMDGPUBufferLoad, which is a subclass of Instrinsic. Another record in the lits is int_amdgcn_s_buffer_load, which is of type Intrinsic. So probably all the record in the list are of type Intrinsic, which means there is no problem casting them.

This points out a problem with defset and/or !listconcat. I think it's !listconcat. But it probably has nothing to do with the issue at hand.

Okay, so the only trick I know to check the difference in anonymous record count is to take a copy of AMDGPUGenGlobalISel.inc generated under main and a copy generated under these changes and global replace all occurrences of 'anonymous_nnnnn' with 'xxx'. That normalizes them with respect to anonymous record names. Then I diff the two files. They are identical.

So I am going to somewhat rashly assume that the reduction in anonymous records is not a problem. I will push this revision on Thursday and stand by to see what happens. Meanwhile I will think about why this might happen.

foad added a comment.EditedFeb 18 2021, 6:24 AM

I'm seeing:

$ ninja -C ~/llvm-release/ check-llvm-codegen-amdgpu
ninja: Entering directory `/home/jayfoad2/llvm-release/'
[125/1115] Building RISCVGenAsmMatcher.inc...
FAILED: lib/Target/RISCV/RISCVGenAsmMatcher.inc 
cd /home/jayfoad2/llvm-release && /home/jayfoad2/llvm-release/bin/llvm-tblgen -gen-asm-matcher -I /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV -I/home/jayfoad2/llvm-release/include -I/home/jayfoad2/git/llvm-project/llvm/include -I /home/jayfoad2/git/llvm-project/llvm/lib/Target /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCV.td --write-if-changed -o lib/Target/RISCV/RISCVGenAsmMatcher.inc -d lib/Target/RISCV/RISCVGenAsmMatcher.inc.d
Included from /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCV.td:234:
Included from /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfo.td:1262:
Included from /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfoV.td:1166:
Included from /home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td:4439:
/home/jayfoad2/git/llvm-project/llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td:388:13: error: Value specified for template argument 'VPatUSLoadStoreWholeVRSDNode::type' (#0) is of type ValueType; expected type LLVMType: vti.Vector
  defm "" : VPatUSLoadStoreWholeVRSDNode<vti.Vector, vti.SEW, vti.LMul,
            ^
[150/1042] Building AMDGPUGenInstrInfo.inc...
ninja: build stopped: subcommand failed.

Could this be caused by your patch? Or maybe it's D96937?

Yes, that's due to this patch. As is this: http://lab.llvm.org:8011/#/builders/68/builds/7221/steps/4/logs/stdio

I've reverted the patch. I'll start with the RISCV problem and then ponder the Clang problem.

Paul-C-Anagnostopoulos reopened this revision.EditedFeb 18 2021, 7:12 AM

I am reopening this revision to pursue problems found when pushing the revision.

Please see the latest post here: https://reviews.llvm.org/D95874

This revision is now accepted and ready to land.Feb 18 2021, 7:12 AM

@t.p.northover

I think the Clang problem might be in this code in arm_mve.td. I've flagged 'top', which is a bit. However, EmitterBase::getCodeForDagArg (in MveEmiiter.cpp) does not handle a bit as argument #2 of this DAG. At least, that's what I gather. I have improved the error message that it produces, which is shown below the code. What I don't understand is why this revision suddenly produces an error where none was produced before. I'll continue to investigate.

multiclass vmovl<bit top> {
  let params = [s8, u8, s16, u16] in {
    def "": Intrinsic<DblVector, (args Vector:$a),
      (extend (unzip $a, top), DblVector, (unsignedflag Scalar))>;
    defm "": IntrinsicMX<DblVector, (args Vector:$a, DblPredicate:$pred),
        (IRInt<"vmovl_predicated", [DblVector, Vector, DblPredicate]>
         $a, (unsignedflag Scalar), **top**, $pred, $inactive)>;
  }
}

error: bad DAG argument type for code generation
note: DAG: (anonymous_336 ?:$a, (unsignedflag Scalar), 0, ?:$pred, ?:$inactive)
note: argument type: bit
note: argument number 2: 0
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

Ah, it's simple. 'top' is a template argument to the vmovl multiclass, with type bit. That multiclass is invoked:

defm vmovlbq: vmovl<0>;
defm vmovltq: vmovl<1>;

Before this revision, the code didn't check that 0 and 1 were integers, so they passed through to the DAG as integers, which is what EmitterBase::getCodeForDagArg wants. Now they are checked and cast to bits, which getCodeForDagArg does not handle.

Note that multiclass vshll_imm also has a top argument, but it is declared int.

Since I haven't heard a peep from anyone wanting to fix arm_mve.td, I'll try to fix it myself.

(I've been incommunicado for a week while moving to my new machine.)

Okay, I've fixed the arm_mve.td template argument problem with a change to MveEmitter.cpp. I will update this revision, let it cook for a couple of days, and then try pushing it again.

This update fixes the arm_mve.td problem with a change to MveEmitter.cpp.

@Paul-C-Anagnostopoulos Whats the status of this now?

I will push this again on Thursday. I expect it to fail with some new and interesting issue, so I need to be around. I have been in and out too much recently.

Sorry, but I have to wait until Friday to push this again.

Okay, I will push this revision within the next half hour.