This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add StringRef consume_front_lower and consume_back_lower
ClosedPublic

Authored by mstorsjo on Jun 14 2021, 4:47 AM.

Details

Summary

These serve as a convenient combination of consume_front/back and
startswith_lower/endswith_lower.

Diff Detail

Event Timeline

mstorsjo created this revision.Jun 14 2021, 4:47 AM
mstorsjo requested review of this revision.Jun 14 2021, 4:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 4:47 AM

Can you show a patch which will benefit from this? Otherwise this looks good.

I don't think this is general enough requirement to add a specialized method for it. Did you consider adding a consumes_front/back overload that takes a lambda?

Can you show a patch which will benefit from this? Otherwise this looks good.

I have one existing use with a local helper function in llvm-rc here: https://github.com/llvm/llvm-project/blob/d8c5a4d6b6efad405c71ead8997276d8d3a7c5ad/llvm/tools/llvm-rc/llvm-rc.cpp#L259-L277 And I want to use it similarly in D104212. (llvm-ar does case insensitive switching between different modes of the same binary based on argv[0] this way, https://github.com/llvm/llvm-project/blob/d8c5a4d6b6efad405c71ead8997276d8d3a7c5ad/llvm/tools/llvm-ar/llvm-ar.cpp#L1273-L1291, so therefore I suppose it's good to keep the rest of the parsing of argv[0] similarly case insensitive.)

I don't think this is general enough requirement to add a specialized method for it. Did you consider adding a consumes_front/back overload that takes a lambda?

I didn't, do you mean something like this?

bool consume_front(function_ref<bool(StringRef)> F) {
  if (!F(*this))
    return false;

  *this = drop_front(/*how much?*/);
  return true;
}

Then the info about how much to drop has to be passed separately somehow, maybe by having the lambda return a std::pair<bool, size_t>? I guess that'd work and keep it generic, but it loses all the convenience of the function in such string parsing code - right now it's pretty straightforward to write compact and efficient parsing code with the other StringRef functions.

I was thinking that the closure would take a character, not a stringref. So you could do something like str.consume_back(islower).

I was thinking that the closure would take a character, not a stringref. So you could do something like str.consume_back(islower).

It's not about whether characters are lowercase or not, but whether the string ends with the given pattern, ignoring case. The _lower suffix is maybe a bit of a misnomer here, as it's not about lowercase itself but about case insensitivity, but it follows the existing pattern of endswith_lower() and startswith_lower().

Oh, I completely missed that. I would recommend using the word case_insensitive here. I don't think stringref has any other case insensitive support, and I don't think this is the obvious place to start. I would start with case insensitive equality comparison. Your client can use that as a primitive that it builds higher level functionality on top of.

-Chris

mstorsjo added a comment.EditedJun 15 2021, 10:13 PM

Oh, I completely missed that. I would recommend using the word case_insensitive here. I don't think stringref has any other case insensitive support, and I don't think this is the obvious place to start. I would start with case insensitive equality comparison. Your client can use that as a primitive that it builds higher level functionality on top of.

StringRef does already have a bunch of case insensitive comparisons, and those are already used as primitives that this new method builds on. If it didn't, then clearly this wouldn't be the place to start, but it already has almost complete coverage of case insensitive versions of methods. E.g. equals_lower, compare_lower, startswith_lower, endswith_lower, find_lower, rfind_lower, contains_lower. This patch mostly just fills in a gap in the coverage of the existing case insensitive methods. So therefore I'd prefer to add this method to the same naming scheme, even if it isn't entirely obvious (this isn't the place to start deviating from the established pattern).

As consume_back is a convenience wrapper over endswith and drop_back, I'm suggesting to add consume_back_lower which builds on endswith_lower (and drop_back).

mstorsjo updated this revision to Diff 352665.Jun 17 2021, 3:17 AM

Squashed in two changes to two tools that can get rid of their own private helper function if this is hoisted to StringRef.

@lattner Do you want to follow up on this one? StringRef already has a bunch of case insensitive operations, and this patch just fills a gap in the set of those operations.

lattner accepted this revision.Jun 21 2021, 9:53 PM

Aha, I see this is consistent with the existing naming convention for this, so I'm ok with it.

That said, xxx_lower() is a really terrible name here for insensitive comparisons. We have StringRef::lower() which returns a lower case string - it isn't insensitive. It seems that the whole family needs a rename. Could you take a crack at this as a follow-up? It should be a mechanical change.

This revision is now accepted and ready to land.Jun 21 2021, 9:53 PM

Aha, I see this is consistent with the existing naming convention for this, so I'm ok with it.

Thanks!

That said, xxx_lower() is a really terrible name here for insensitive comparisons. We have StringRef::lower() which returns a lower case string - it isn't insensitive. It seems that the whole family needs a rename. Could you take a crack at this as a follow-up? It should be a mechanical change.

Sure, I can give that a shot. Any suggestion on what the name pattern should be? xxx_caseinsensitive() is a bit of a mouthful, xxx_i() is terse but less understandable and discoverable.

I see that the discussion thread from last year regarding C++ API (lack of) stability didn't end up into something that was committed (https://lists.llvm.org/pipermail/llvm-dev/2020-July/143613.html); changing ADT APIs can probably cause a fair bit of churn for e.g. out of tree users. Then again, I guess these particular methods are a bit less frequently used than the rest of ADT, so it's maybe not a biggie? Or should we be kind and do it as a two-stage change, adding new methods with new names and flagging the old ones as deprecated, and removing the deprecated ones in a week or two?

foad added a subscriber: foad.Jun 22 2021, 1:51 AM

Or should we be kind and do it as a two-stage change, adding new methods with new names and flagging the old ones as deprecated, and removing the deprecated ones in a week or two?

Drive-by comment from a downstream user: I would appreciate you doing it as a two-stage change but there is no need to wait a week or two in between. All I want is a point in git history that I can sync to, that has both the old and new APIs available.

Or should we be kind and do it as a two-stage change, adding new methods with new names and flagging the old ones as deprecated, and removing the deprecated ones in a week or two?

Drive-by comment from a downstream user: I would appreciate you doing it as a two-stage change but there is no need to wait a week or two in between. All I want is a point in git history that I can sync to, that has both the old and new APIs available.

Thanks for the input! Btw, if you happen to be able to check easily in the diff between your downstream fork vs upstream, do you happen to have a lot use of these _lower() method calls, outside of code in upstream? (A quick grep in llvm-project shows about only half a dozen per target in llvm/lib/Targets, but a fair bit more in lldb/lld/clang.)

foad added a comment.Jun 22 2021, 2:48 AM

Btw, if you happen to be able to check easily in the diff between your downstream fork vs upstream, do you happen to have a lot use of these _lower() method calls, outside of code in upstream?

No, none actually! This was more of a general comment about how to do incompatible changes in a downstream-friendly way.

This revision was landed with ongoing or failed builds.Jun 22 2021, 2:57 AM
This revision was automatically updated to reflect the committed changes.

That said, xxx_lower() is a really terrible name here for insensitive comparisons. We have StringRef::lower() which returns a lower case string - it isn't insensitive. It seems that the whole family needs a rename. Could you take a crack at this as a follow-up? It should be a mechanical change.

Sure, I can give that a shot. Any suggestion on what the name pattern should be? xxx_caseinsensitive() is a bit of a mouthful, xxx_i() is terse but less understandable and discoverable.

A few ideas to consider:

  • X_casei()
  • X_nocase()
  • X_ignorecase()
  • X_foldcase() (term from text search)

(Looks like X_lower() has been with us for over a decade (68e4945c03eeeb04773587f3b265b0888bb22549 / r87020), but I think equal_lower and compare_lower were less ambiguous than the newer ones.)

Btw, if you happen to be able to check easily in the diff between your downstream fork vs upstream, do you happen to have a lot use of these _lower() method calls, outside of code in upstream?

No, none actually! This was more of a general comment about how to do incompatible changes in a downstream-friendly way.

Right, no real need for a buffer in time. Once the change is split in two, it's easy for downstreams to temporarily revert the second commit while they update out-of-tree code at their own pace.

I'd recommend you use "insensitive" instead of "lower", e.g. consume_front_insensitive.

That said, xxx_lower() is a really terrible name here for insensitive comparisons. We have StringRef::lower() which returns a lower case string - it isn't insensitive. It seems that the whole family needs a rename. Could you take a crack at this as a follow-up? It should be a mechanical change.

Sure, I can give that a shot. Any suggestion on what the name pattern should be? xxx_caseinsensitive() is a bit of a mouthful, xxx_i() is terse but less understandable and discoverable.

A few ideas to consider:

  • X_casei()
  • X_nocase()
  • X_ignorecase()
  • X_foldcase() (term from text search)

Thanks for the suggestions! What do you think of @lattner ‘s suggestion (X_insensitive)?

That said, xxx_lower() is a really terrible name here for insensitive comparisons. We have StringRef::lower() which returns a lower case string - it isn't insensitive. It seems that the whole family needs a rename. Could you take a crack at this as a follow-up? It should be a mechanical change.

Sure, I can give that a shot. Any suggestion on what the name pattern should be? xxx_caseinsensitive() is a bit of a mouthful, xxx_i() is terse but less understandable and discoverable.

A few ideas to consider:

  • X_casei()
  • X_nocase()
  • X_ignorecase()
  • X_foldcase() (term from text search)

Thanks for the suggestions! What do you think of @lattner ‘s suggestion (X_insensitive)?

SGTM!