This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add edit_distance_insensitive to StringRef
ClosedPublic

Authored by njames93 on May 22 2022, 1:23 AM.

Details

Summary

In some instances its advantageous to calculate edit distances without worrying about casing.
Currently to achieve this both strings need to be converted to the same case first, then edit distance can be calculated.

Diff Detail

Event Timeline

njames93 created this revision.May 22 2022, 1:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2022, 1:23 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
njames93 requested review of this revision.May 22 2022, 1:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2022, 1:23 AM
xgupta added a subscriber: xgupta.May 22 2022, 3:28 AM

Any chance of a template to avoid duplicating this code between case sensitive and case insensitive - looks like a long enough function with room for bugs/fixes/changes that it'd be worth avoiding duplication?

njames93 updated this revision to Diff 431570.May 23 2022, 8:19 PM

Remove code duplication by adding an extra Map parameter to llvm::ComputeEditDistance.

dblaikie added inline comments.May 24 2022, 1:14 PM
llvm/include/llvm/ADT/edit_distance.h
45–48

Do you need the default type argument here? The default (or explicit, in the other call) value below would allow template argument deduction, right?

49

I'm not sure this + is worthwhile - I'd say either make it a non-template entirely, and hardcode this parameter as a function pointer type (that'd work for the two callers here) or make the functor a template parameter and drop the + here (so that there's no call indirection overhead). (I guess a third option would be llvm::function_ref for functor-level generality without the template, but somewhat more runtime overhead, probably)

I don't have /super/ strong feelings either way.

njames93 added inline comments.May 25 2022, 3:32 AM
llvm/include/llvm/ADT/edit_distance.h
45–48

The default value doesn't seem to deduce the template argument, however explicitly passing an argument would deduce the type correctly.

49

You're right, the + isn't necessary

njames93 updated this revision to Diff 431931.May 25 2022, 3:47 AM

Remove unnecessary '+'

dblaikie added inline comments.May 25 2022, 10:03 AM
llvm/include/llvm/ADT/edit_distance.h
45–48

I think this default templtae argument is unused? (argument deduction kicks in for both uses, doesn't it?) & wrong anyway - the functor type won't be a function pointer type, it'll be the specific type of each lambda, I think?

Could you remove this default template argument?

85–91

is there any concern about the number of times the map operation is used? Looks like the algorithm visits elements more than once, so might be worth some caching?

Like an easy one would probably be to compute Map(FromArray[y-1]) outside the for x loop at least? (but maybe other caching should be done too, I don't know - I guess toLower is cheap enough that it's not worth much more involved caching? - I guess Map(ToArray[x]) could be computed and cached for the next round's references to Map(ToArray[x-1]) for instance?)

njames93 updated this revision to Diff 433682.Jun 2 2022, 1:30 AM

Create a new functino ComputeMappedEditDistance to avoid template argument deduction issues.

njames93 marked 4 inline comments as done.Jun 2 2022, 1:40 AM
njames93 added inline comments.
llvm/include/llvm/ADT/edit_distance.h
85–91

Caching the outer loop value makes sense, but for the inner loop, probably not so much. If its expensive then it'd be better off to just create new arrays with the mapped values, then call ComputeEditDistance with no map functor.

njames93 updated this revision to Diff 433684.Jun 2 2022, 1:48 AM
njames93 marked an inline comment as done.

Cache outer loop map item.

dblaikie accepted this revision.Jun 2 2022, 1:46 PM
dblaikie added inline comments.
llvm/include/llvm/ADT/edit_distance.h
81

I'm not sure this amounts to anything different than using auto?

But if the intent was to allow reference types here - maybe this could rely on reference lifetime extension?

const auto &CurItem = ...

If the map function returns by value, this'll do reference lifetime extension, and if it returns by reference it'll be a reference.

This revision is now accepted and ready to land.Jun 2 2022, 1:46 PM
njames93 added inline comments.Jun 3 2022, 4:40 PM
llvm/include/llvm/ADT/edit_distance.h
81

Reference lifetime extension is exactly what I needed, just one of those cases I always forget about

njames93 updated this revision to Diff 434311.Jun 5 2022, 2:06 AM

Use reference lifetime extension.

alexander-shaposhnikov added inline comments.
llvm/include/llvm/ADT/edit_distance.h
46

as a side note, wouldn't it be a bit clearer / more flexible to pass a comparator instead ?

This revision was landed with ongoing or failed builds.Jun 5 2022, 4:03 AM
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.Jun 5 2022, 12:21 PM
llvm/include/llvm/ADT/edit_distance.h
46

oh, fair - I'd be open to that, but wouldn't insist