This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add new getAllDerivedDefinitionsTwo function to RecordKeeper
ClosedPublic

Authored by Paul-C-Anagnostopoulos on Oct 5 2020, 6:30 AM.

Details

Summary

I added the new RecordKeeper::getAllDerivedDefinitionsTwo function, as documented in the "TableGen Backend Developer's Guide." This is used in the pseudo lowering backend.

I noticed that the RISC V Compress Inst emitter didn't use getAllDerivedDefinitions(), so I modified it to do so.

Diff Detail

Event Timeline

Paul-C-Anagnostopoulos requested review of this revision.Oct 5 2020, 6:30 AM
jrtc27 added a comment.Oct 5 2020, 6:41 AM

Why not make it general and take an array?

jrtc27 added a comment.Oct 5 2020, 6:42 AM

Also please update your review with a full context diff.

madhur13490 added inline comments.Oct 5 2020, 6:47 AM
llvm/include/llvm/TableGen/Record.h
1793

Can we please name it as getAllCommonDerivedDefinitions? I feel this more scalable, readable and intuitive. Suffixing "Two" is not scalable. If we have the word "common" in the name then we can overload this function for more than two parents in future to obtain common records. Also this function can take array and be more general.

madhur13490 added inline comments.Oct 5 2020, 6:49 AM
llvm/include/llvm/TableGen/Record.h
1793

In fact, if you want to generalize it then you should be able to use std::set because you're doing intersection here, anyway.

Is there a way to get full context selectively? I suspect you want full context only on the last two files.

I thought about naming it in a more scalable way. I also thought about simply overloading the existing getAllDerivedDefinitions. I decided not to because there is exactly one backend that looks for definitions with two specific parents. I really can't conceive that anyone would ever want to do three, though, of course, I could be wrong.

If people think it's important to allow an arbitrary number of parent classes, I think I will overload getAllDerivedDefinitions with one that takes an array.

jrtc27 added a comment.EditedOct 5 2020, 7:36 AM

Is there a way to get full context selectively? I suspect you want full context only on the last two files.

There's no harm in including the full context for everything given it's hidden by default. But no, if you wanted to do that you'd have to run git show -U10000000 <commit> <file> [file ..] (directories also work) and replace just the bit of the diff that corresponds to those files.

I thought about naming it in a more scalable way. I also thought about simply overloading the existing getAllDerivedDefinitions. I decided not to because there is exactly one backend that looks for definitions with two specific parents. I really can't conceive that anyone would ever want to do three, though, of course, I could be wrong.

If people think it's important to allow an arbitrary number of parent classes, I think I will overload getAllDerivedDefinitions with one that takes an array.

Well, (a) we can't predict the future; (b) if it's possible then maybe people will find ways to use it and make their code cleaner; (c) it allows you to implement the existing single-class function in terms of the general one, rather than having two copies of almost the same code.

Okay, I'm sold. I will overload the existing function with one that takes a set.

I didn't realize that the full context is hidden by default, so when I submit the revised patch I will include full context.

jrtc27 added a comment.Oct 5 2020, 7:42 AM

Okay, I'm sold. I will overload the existing function with one that takes a set.

I didn't realize that the full context is hidden by default, so when I submit the revised patch I will include full context.

It's not if you use arc, but if you're manually uploading diffs yourself then it only knows about the context you provide.

I replaced getAllDerivedDefinitionsTwo with an overload of getAllDerivedDefinitions and reimplemented the original function in terms of the new one.

Use an ArrayRef for the argument and put the one-class overload in a header so it can be inlined?

I will do both of those things.

I'm new to ArrayRef. Can you give me the reasons for using it in this situation?

@jrtc27

It turns out that ArrayRef has a constructor that will convert std::string to ArrayRef<std::string>, making a 1-element array. Therefore, these two overloads are ambiguous when passing a string:

std::vector<Record *>
getAllDerivedDefinitions(const ArrayRef<std::string> Classes) const;

--versus--

std::vector<Record *>
getAllDerivedDefinitions(StringRef ClassName) const;  // The original definition.

I can't think of a solution except to leave the first one as std::vector<std::string> or change it to SmallVectorImpl<std::string>.

jrtc27 added a comment.Oct 7 2020, 8:16 AM

@jrtc27

It turns out that ArrayRef has a constructor that will convert std::string to ArrayRef<std::string>, making a 1-element array. Therefore, these two overloads are ambiguous when passing a string:

std::vector<Record *>
getAllDerivedDefinitions(const ArrayRef<std::string> Classes) const;

--versus--

std::vector<Record *>
getAllDerivedDefinitions(StringRef ClassName) const;  // The original definition.

I can't think of a solution except to leave the first one as std::vector<std::string> or change it to SmallVectorImpl<std::string>.

Then you can just drop the single-string one and it'll automatically construct the ArrayRef for you and call the general one?

I tried that, but there is no conversion from a string literal to an ArrayRef<std::string>. Most calls to getAllDerivedDefinitions() pass a literal.

jrtc27 added a comment.Oct 7 2020, 8:50 AM

I tried that, but there is no conversion from a string literal to an ArrayRef<std::string>. Most calls to getAllDerivedDefinitions() pass a literal.

Does it work with ArrayRef<StringRef>?

No, there is no ArrayRef constructor that accepts a StringRef. You get the expected error "cannot convert argument 1 ..."

I changed the new definition to accept a SmallVector rather than a std::vector, which at least gets rid of the malloc.

No, there is no ArrayRef constructor that accepts a StringRef. You get the expected error "cannot convert argument 1 ..."

I changed the new definition to accept a SmallVector rather than a std::vector, which at least gets rid of the malloc.

Is the conversion error because a string literal isn't a StringRef so you have to go through two conversions? string literal -> StringRef -> ArrayRef<StringRef>
Could you change all callers that only pass one string to do getAllDerivedDefinitions({"string"}); That seems to be what's done in Attributes::emitTargetIndependentNames for example.

The problem is that there is no conversion from string literal to StringRef. But I really hesitate to modify every TableGen backend in the system. A backward incompatible change seems dangerous.

Now that I've changed to a SmallVector, I don't think we have to worry about the overhead too much. There are only a few calls to getAllDerivedDefinitions in each backend execution.

Edited to add: Another reason I hesitate to change the way string literals are passed is that there are dozens of other data structure functions that take class, record, and field names as string literals. Consistency is helpful.

Paul-C-Anagnostopoulos set the repository for this revision to rG LLVM Github Monorepo.

I changed the new getAllDerivedDefinitions() to use a SmallVector rather than a std::vector. Once people have had a look, then I will move the old definition into record.h so it can be inlined.

jrtc27 added inline comments.Oct 7 2020, 3:16 PM
llvm/lib/TableGen/Record.cpp
2322–2325

If you're not touching this function leave the formatting alone.

2477

This really should be ArrayRef, it should not be a specific array-like implementation, and it should have StringRef's in it not require std::string.

llvm/utils/TableGen/RISCVCompressInstEmitter.cpp
887 ↗(On Diff #296810)

Please split this out into a separate commit.

So why don't I make the new function take ArrayRef<StringRef> and simply restore the original function to its original form. Then there is no problem with the original function accepting a string literal.

Is that a good solution? Otherwise I really don't know what to do.

jrtc27 added a comment.Oct 7 2020, 3:40 PM

So why don't I make the new function take ArrayRef<StringRef> and simply restore the original function to its original form. Then there is no problem with the original function accepting a string literal.

Is that a good solution? Otherwise I really don't know what to do.

That's what I was originally suggesting and then you started saying you had issues with ambiguity when doing a different variant of that, no?

jrtc27 added a comment.Oct 7 2020, 3:41 PM

But yes, this whole time I've wanted a StringRef function and an ArrayRef<StringRef> function, with the former wrapping the latter.

It was suggested that the old function be implemented in terms of the new one.

Now I'm confused about my own findings. I'll do some more work on it.

Okay, I think I got it. Thanks for your patience.

The new function takes an ArrayRef<StringRef>. The old function takes a StringRef and simply invokes the new function with makeArrayRef(param).

I had to get used to StringRef and ArrayRef. New to me.

jrtc27 added a comment.Oct 7 2020, 4:15 PM

Okay, I think I got it. Thanks for your patience.

The new function takes an ArrayRef<StringRef>. The old function takes a StringRef and simply invokes the new function with makeArrayRef(param).

I had to get used to StringRef and ArrayRef. New to me.

Yes, though you shouldn't even need makeArrayRef, {param} should just work.

Now the two getAllDerivedDefinitions() functions use StringRef and ArrayRef.

I cleaned up the lints and removed RISCVCompressInstEmitter.cpp from this patch.

I could not use {param} to call the ArrayRef<StringRef> overload; it resulted in an "all paths are infinitely recursive" error.

jrtc27 added a comment.Oct 8 2020, 9:11 AM

Now the two getAllDerivedDefinitions() functions use StringRef and ArrayRef.

I cleaned up the lints and removed RISCVCompressInstEmitter.cpp from this patch.

I could not use {param} to call the ArrayRef<StringRef> overload; it resulted in an "all paths are infinitely recursive" error.

I guess the single-element initialiser list competes with one of StringRef's own constructors then.

jrtc27 added inline comments.Oct 8 2020, 9:20 AM
llvm/docs/TableGen/BackGuide.rst
565

classnames

llvm/lib/TableGen/Record.cpp
2322

Don't reformat this.

2472–2473

Don't duplicate what's in the header, it'll only get stale

2480–2481

This comment seems a bit pointless

2486

Why change the error message?

2490–2491

Also pointless IMO, the code is pretty simple to understand and comments should be for insight into anything of note rather than just repeating what you can easily read for yourself

llvm/utils/TableGen/PseudoLoweringEmitter.cpp
296–298

StringRef[] should work without needing to mess with classes

llvm/lib/TableGen/Record.cpp
2480–2481

I'm a fan of commenting on the major steps of a function. I was certainly appreciative of them while trying to understand code over the past month.

2486

I got rid of ERROR: because no other TableGen message includes it. I reworded the message because "Couldn't find ..." strikes me as too informal. I replaced the bang with a newline because that is what most TableGen messages have.

llvm/utils/TableGen/PseudoLoweringEmitter.cpp
296–298

Sorry, could you clarify?

jrtc27 added inline comments.Oct 8 2020, 9:37 AM
llvm/lib/TableGen/Record.cpp
2480–2481

It's a loop to call getClass and add the result to a vector. If we commented everything like that in LLVM it would be a complete nightmare.

2486

There are a few others like that in llvm/utils/TableGen (and there was already a newline)

llvm/utils/TableGen/PseudoLoweringEmitter.cpp
296–298
StringRef Classes[] = ...
Paul-C-Anagnostopoulos marked 3 inline comments as done.Oct 8 2020, 9:43 AM
Paul-C-Anagnostopoulos added inline comments.
llvm/lib/TableGen/Record.cpp
2480–2481

Okay, I'll remove them. The code with more comments has been easier for me to understand.

2486

Do you want me to restore the original comment?

jrtc27 added inline comments.Oct 8 2020, 9:48 AM
llvm/lib/TableGen/Record.cpp
2486

I prefer how the old message reads (though the ERROR: does seem not so useful) but not overly fussed.

2493–2500

This can avoid some boilerplate by using llvm::any_of

Paul-C-Anagnostopoulos marked 7 inline comments as done.Oct 8 2020, 9:52 AM
Paul-C-Anagnostopoulos added inline comments.
llvm/lib/TableGen/Record.cpp
2493–2500

Ah, something new to learn. Okay, I'll check it out.

llvm/lib/TableGen/Record.cpp
2493–2500

I'm sorry, please help me out. I coded:

for (const auto &OneDef : getDefs()) {

  if (!any_of(ClassRecs, [OneDef](Record *const Class) {
                           return !OneDef.second->isSubClassOf(Class);
                         }))
Defs.push_back(OneDef.second.get());

I get a sequence of messages I cannot figure out. Sorry, I can't copy and paste them. The first one is "... attempting to reference a deleted function." It looks like it's about the std::pair class, which would be the OneDef.

llvm/lib/TableGen/Record.cpp
2493–2500

A screen capture.

llvm/lib/TableGen/Record.cpp
2493–2500

Got it. I didn't realize that OneDef would be captured by copying.

Reworked getAllDerivedDefinitions() to use all_of(). Nice!

If people are satisfied with this patch now, may I get a "looks good"?

This revision is now accepted and ready to land.Oct 12 2020, 10:43 AM
jrtc27 requested changes to this revision.Oct 12 2020, 10:46 AM
jrtc27 added inline comments.
llvm/docs/TableGen/BackGuide.rst
55–62

What happened here? Why has this gone?

565

Not relevant to this diff, please just commit this separately as a trivial grammar fix.

704

What's this doing here? You're not touching error reporting.

This revision now requires changes to proceed.Oct 12 2020, 10:46 AM
llvm/docs/TableGen/BackGuide.rst
55–62

The RecordMap and GlobalMap types are not public. I was mistaken when I first wrote the document.

565

I'm correcting errors in this new document as I find them. Is it really necessary to patch each one separately?

704

I removed this because all 20 combinations will probably never be added.

jrtc27 added inline comments.Oct 12 2020, 10:54 AM
llvm/docs/TableGen/BackGuide.rst
565

No, but I think "correcting a bunch of errors" should be a separate commit (it can be a single commit though) from "adding this new function and documenting it".

Paul-C-Anagnostopoulos marked an inline comment as done.Oct 12 2020, 11:02 AM
Paul-C-Anagnostopoulos added inline comments.
llvm/docs/TableGen/BackGuide.rst
565

Okay, I will remove these two miscellaneous corrections.

lattner accepted this revision.Oct 12 2020, 11:28 AM

This LGTM

Paul-C-Anagnostopoulos updated this revision to Diff 297650.EditedOct 12 2020, 11:35 AM

Eliminated all the changes to the developer's guide that weren't pertinent to this patch.

I will submit them as a separate patch right away.

This patch has been pushed.

Please use arc patch next time to get the Differential Revision: header so the commit and the revision are linked in both directions, which will also automatically close the revision.

(and please also close the revision)

The option no longer appears in the Add Action dropdown. Weird. It was closed after the LGTM but then re-opened.

It was never closed, only accepted.

Oops, yes, you are correct.

The only two Revision Actions available now are Plan Changes and Abandon Revision. Is there something I'm missing?

jrtc27 accepted this revision.Oct 13 2020, 1:40 PM

Try now; depending on instance configuration it sometimes only lets you close accepted revisions (based on the assumption that the commit shouldn't have landed unless accepted).

This revision is now accepted and ready to land.Oct 13 2020, 1:40 PM

Much better. Thank you!