Motivation:
#include <string_view> size_t findFirst_ABCDEF(std::string_view sv) { return sv.find_first_of("ABCDEF"); }
memchr("ABCDEF", C, 6) != NULL -> (C == 'A' || C == 'B' || C == 'C' || C == 'D' || C == 'E' || C == 'F') != 0
Differential D128011
[SimplifyLibCalls] Transform memchr(STR, C, N) to chain of ORs xbolva00 on Jun 16 2022, 2:51 PM. Authored by
Details Motivation: #include <string_view> size_t findFirst_ABCDEF(std::string_view sv) { return sv.find_first_of("ABCDEF"); } memchr("ABCDEF", C, 6) != NULL -> (C == 'A' || C == 'B' || C == 'C' || C == 'D' || C == 'E' || C == 'F') != 0
Diff Detail
Event TimelineComment Actions This looks quite nice for the examples -- but can't this create a very large chain of comparisons in some cases? Like if you check every second character you'd end up with 128 comparisons? (Or maybe you need every third, I can imagine that we manage to fold every second character to am mask). Comment Actions Added PhaseOrdering test to ensure instcombine + simplifycfg produces optimal switch code. Comment Actions Maybe we could introduce some limit (feel free to suggest some), but on the other hand, we often end up with nice switch or very simple code. int g(char a) { if (__builtin_memchr ("abcdefghijklmnopqrstuvwxyz", a, 27) != 0) return foo(); else return -1; } define dso_local noundef i32 @_Z1gc(i8 noundef signext %a) local_unnamed_addr #1 { entry: %0 = icmp ne i8 %a, 0 %1 = add i8 %a, -123 %2 = icmp ult i8 %1, -26 %cmp.not = and i1 %0, %2 br i1 %cmp.not, label %return, label %if.then if.then: ; preds = %entry %call1 = tail call noundef i32 @_Z3foov() br label %return return: ; preds = %entry, %if.then %retval.0 = phi i32 [ %call1, %if.then ], [ -1, %entry ] ret i32 %retval.0 } or int g(char a) { if (__builtin_memchr ("acegikmoqsuwyz13456789", a, 22) != 0) return foo(); else return -1; } define dso_local noundef i32 @_Z1gc(i8 noundef signext %a) local_unnamed_addr #1 { entry: switch i8 %a, label %return [ i8 122, label %if.then i8 121, label %if.then i8 119, label %if.then i8 117, label %if.then i8 115, label %if.then i8 113, label %if.then i8 111, label %if.then i8 109, label %if.then i8 107, label %if.then i8 105, label %if.then i8 103, label %if.then i8 101, label %if.then i8 99, label %if.then i8 97, label %if.then i8 57, label %if.then i8 56, label %if.then i8 55, label %if.then i8 54, label %if.then i8 53, label %if.then i8 52, label %if.then i8 51, label %if.then i8 49, label %if.then ] ... Comment Actions An analogous optimization is applicable to other string functions besides memchr (and strchr), including for example strlen or even memcmp and strcmp (with one short constant array/string). With this one accepted as a precedent the other handlers can be enhanced as follow ups. It would be nice if the algorithm could be made general enough to let them all share it. It does seem that the expansion in all of them should be capped at some small number. For what it's worth, a prototype of the corresponding patch for GCC is attached to PR 103798. It looks like it has a hardwired limit of 8 bytes. Comment Actions
strchr is transformed to memchr, if possible. See test strchr_to_memchr_n_equals_len.
Very small, imho, We usually end up with one switch instruction, not a big deal. But I am fine with 32 I think.
I would not overcomplicate things for know, currently just 5 lines of code. Any possible followup could create create a new helper function.
Yes, I saw it. Oh, that expanded implementation in match.pd = D Comment Actions It seems important to put a limit on the length of the chain to avoid pathological expansion (that might come up in generated code, but not only there). I expect the optimal maximum will vary across different targets but a suitably low minimum should be target-independent. I don't have a good sense of what the ideal lower bound might be but I would hesitate to go over the proposed GCC limit of 8 unless the ORs could be assumed to collapse to something much simpler (like in memchr-7.ll). Instrumenting the compiler and building a few projects might help dial it in. Having said that, I don't think it's a guaranteed win to set the limit to zero (or to disable the subsequent folding) when optimizing for size. But whatever the effect of -Os might be here, it should be exercised by the test suite if only to make the decision explicit. Comment Actions I think for the limit, the important thing to consider is comparisons that collapse into range checks. We don't want to limit the total number of characters being checked, but rather the number of of non-contiguous ranges. The motivating cases reduce down to just one or two range checks, which is certainly profitable. The cases that would require a switch for decent lowering are less clearly profitable, especially when we take into account that such a switch is not being reliably produced (as your phase ordering test shows). Comment Actions
I believe there is no easy StringRef API, right? Not sure if I want to work on such new API. For now, I would go with GCC's limit of 8. Comment Actions Something like this seems to do the right thing for a sorted list (s is a container like StringRef): int last = 0; int n = std::count_if (s.begin(), s.end(), [&last](char c) { bool ret = (unsigned char)c > (unsigned char)last + 1; last = c; return ret; }); (It would be a bit more involved if it had to handle different charsets between the host and the target.) Comment Actions Why does this need a phase ordering test? What is the output after just InstCombine for it? Does InstCombine not reliably reduce the comparisons and require SimplifyCFG switch optimization? What if there is no actual branch involved and SimplifyCFG doesn't reduce it?
|
Drop "logical". The or is not logical in the sense we use the word.