This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add the !substr() bang operator
ClosedPublic

Authored by Paul-C-Anagnostopoulos on Dec 16 2020, 1:37 PM.

Details

Summary

This revision adds the !substr bang operator, which, as you might expect, extracts a substring of a string.

I update the documentation and added a test.

I'm not concerned about confusion between !subst and !substr because I plan on replacing !subst with !replace in the near future.

Diff Detail

Event Timeline

Paul-C-Anagnostopoulos requested review of this revision.Dec 16 2020, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 16 2020, 1:37 PM

Clean up the relevant clang tidies.

jansvoboda11 accepted this revision.Dec 18 2020, 7:25 AM

Thanks for working on this! This will LGTM with the nits fixed.

llvm/docs/TableGen/ProgRef.rst
1728

This should be indented with 4 spaces as the rest of the operators.

llvm/include/llvm/TableGen/Record.h
832

Move unrelated/stylistic changes to a separate patch.

llvm/lib/TableGen/Record.cpp
1388

Move unrelated/stylistic changes to a separate patch.

llvm/lib/TableGen/TGParser.cpp
1505

Move unrelated/stylistic changes to a separate patch.

1753

Move unrelated/stylistic changes to a separate patch.

This revision is now accepted and ready to land.Dec 18 2020, 7:25 AM

I'm afraid I may have gotten your hopes up. It turns out that it's quite difficult to add an assert statement given the structure of the TableGen parser. A couple of the issues:

  1. It's not clear at all when the assert expression should be evaluated, particularly when it appears in a def or class.
  1. There is no straightforward mechanism for carrying an assertion along with a multiclass definition, for use when the multiclass is instantiated.

I haven't given up, but it will take some time.

Paul-C-Anagnostopoulos marked 5 inline comments as done.Dec 18 2020, 8:45 AM
Paul-C-Anagnostopoulos added inline comments.
llvm/include/llvm/TableGen/Record.h
832

I wouldn't say this is unrelated, since it is the enum that contains SUBSTR. The previous sort order was apparently arbitrary.

llvm/lib/TableGen/Record.cpp
1388

Likewise, the previous sort order was apparently arbitrary.

Paul-C-Anagnostopoulos marked 2 inline comments as done.

Incorporated @jansvoboda11 comments.

llvm/docs/TableGen/ProgRef.rst
1728

I will rewrap the lines before I push this revision.

jansvoboda11 added inline comments.Dec 21 2020, 4:43 AM
llvm/include/llvm/TableGen/Record.h
832

I understand.

However, it's a good practice to put formatting changes (and changes that introduce no functionality change) into separate "NFC" commits.
This helps to keep the interesting patches short and easy to understand.

In this case, you can create new NFC patch that reorders the enum cases and rebase this patch on top of the new one.

Yeah, there's no easy way to implement it right now.

I think looking into how loops and if statements are handle might be useful. Their ASTs are stored in the Loops member and are interpreted later on during multiclass instantiation.

Generalizing that infrastructure might be a good start. If we could store different kinds of ASTs there and handle them in specific ways during the instantiation, we might be able to inject the assertion handling there.

I'm afraid I may have gotten your hopes up. It turns out that it's quite difficult to add an assert statement given the structure of the TableGen parser. A couple of the issues:

  1. It's not clear at all when the assert expression should be evaluated, particularly when it appears in a def or class.
  1. There is no straightforward mechanism for carrying an assertion along with a multiclass definition, for use when the multiclass is instantiated.

I haven't given up, but it will take some time.

llvm/include/llvm/TableGen/Record.h
832

But then if I revert to the previous order, where should I insert SUBSTR? New ones have not necessarily been inserted at the end of the list. Anyway, I'll revert to the original order.

Yes, that is what needs to be done: The assertions have to be stored somehow and checked later.

The same goes for assertions in classes and records. Just as with field values, they have to be evaluated after the record is built. But that may feel wrong when writing something like this:

int foo = 13;
assert ...foo...
let foo = 42;

This revision was automatically updated to reflect the committed changes.

Works like a charm when I run the TableGen tests on my machine. Hmm.

I reverted this revision.

Now I expect the directive2.td test to fail, because it does on my machine.

RKSimon added inline comments.
llvm/lib/TableGen/TGParser.cpp
1698

@Paul-C-Anagnostopoulos This might be the cause of the build failures - SIZE_MAX (unsigned ~0ULL) being seen as a -ve value when cast to int64_t

I'm sorry, what is a -ve value?

What is the correct way to specify the maximum size value?

I'm sorry, what is a -ve value?

Sorry - "-ve" is just shorthand for "negative".

What is the correct way to specify the maximum size value?

RHS = IntInit::get(std::numeric_limits<int64_t>::max())

is probably better - but I haven't tested it

Well now, that's new to me. Thanks!

The value of std::numeric_limits<int64_t>::max() is 0x7FFFFFFFFFFFFFFF.

What I wrote before seems like the perfect opportunity for a compiler warning.

Paul-C-Anagnostopoulos reopened this revision.EditedDec 21 2020, 9:55 AM

What is the correct way to copy the first !substr commit and then amend it with the fix for this build problem?

Never mind, I falsely assumed that -amend does not make a new commit, but of course it does.

This revision is now accepted and ready to land.Dec 21 2020, 9:55 AM

Change SIZE_MAX to std::numeric_limits<int64_t>::max() to fix a build failure.

Do you have a linux (or wsl?) machine that you can test this?

I'm afraid not. However, a search of the code base reveals that std::numeric_limits<int64_t>::max() is used all over the place, so I think we're okay.

RKSimon accepted this revision.Dec 22 2020, 8:11 AM

LGTM (with one minor style fix) - but please watch the buildbots after the commit

llvm/lib/TableGen/TGParser.cpp
1501

(style) remove unnecessary braces

Paul-C-Anagnostopoulos marked an inline comment as done.Dec 22 2020, 10:10 AM
This revision was automatically updated to reflect the committed changes.