In some instances its advantageous to calculate edit distances without worrying about casing.
Currently to achieve this both strings need to be converted to the same case first, then edit distance can be calculated.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Any chance of a template to avoid duplicating this code between case sensitive and case insensitive - looks like a long enough function with room for bugs/fixes/changes that it'd be worth avoiding duplication?
Remove code duplication by adding an extra Map parameter to llvm::ComputeEditDistance.
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
45 ↗ | (On Diff #431570) | Do you need the default type argument here? The default (or explicit, in the other call) value below would allow template argument deduction, right? |
49 ↗ | (On Diff #431570) | I'm not sure this + is worthwhile - I'd say either make it a non-template entirely, and hardcode this parameter as a function pointer type (that'd work for the two callers here) or make the functor a template parameter and drop the + here (so that there's no call indirection overhead). (I guess a third option would be llvm::function_ref for functor-level generality without the template, but somewhat more runtime overhead, probably) I don't have /super/ strong feelings either way. |
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
45 ↗ | (On Diff #431931) | I think this default templtae argument is unused? (argument deduction kicks in for both uses, doesn't it?) & wrong anyway - the functor type won't be a function pointer type, it'll be the specific type of each lambda, I think? Could you remove this default template argument? |
86–91 ↗ | (On Diff #431931) | is there any concern about the number of times the map operation is used? Looks like the algorithm visits elements more than once, so might be worth some caching? Like an easy one would probably be to compute Map(FromArray[y-1]) outside the for x loop at least? (but maybe other caching should be done too, I don't know - I guess toLower is cheap enough that it's not worth much more involved caching? - I guess Map(ToArray[x]) could be computed and cached for the next round's references to Map(ToArray[x-1]) for instance?) |
Create a new functino ComputeMappedEditDistance to avoid template argument deduction issues.
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
86–91 ↗ | (On Diff #431931) | Caching the outer loop value makes sense, but for the inner loop, probably not so much. If its expensive then it'd be better off to just create new arrays with the mapped values, then call ComputeEditDistance with no map functor. |
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
81 ↗ | (On Diff #433684) | I'm not sure this amounts to anything different than using auto? But if the intent was to allow reference types here - maybe this could rely on reference lifetime extension? const auto &CurItem = ... If the map function returns by value, this'll do reference lifetime extension, and if it returns by reference it'll be a reference. |
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
81 ↗ | (On Diff #433684) | Reference lifetime extension is exactly what I needed, just one of those cases I always forget about |
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
46 ↗ | (On Diff #434311) | as a side note, wouldn't it be a bit clearer / more flexible to pass a comparator instead ? |
llvm/include/llvm/ADT/edit_distance.h | ||
---|---|---|
46 ↗ | (On Diff #434311) | oh, fair - I'd be open to that, but wouldn't insist |