This is an archive of the discontinued LLVM Phabricator instance.

[ExpandMemCmp] Improve non-equality comparisons with zero.
Changes PlannedPublic

Authored by courbet on Mar 9 2020, 3:53 AM.

Details

Reviewers
spatel
efriedma
Summary

This is one step towards fixing PR45147.

Diff Detail

Event Timeline

courbet created this revision.Mar 9 2020, 3:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2020, 3:53 AM
courbet updated this revision to Diff 249283.Mar 10 2020, 12:26 AM

clang-format

Benchmark results from test-suite. TBH I'm not exactly sure why only the "greater" case gets better, but overall the results are promising.

BM_MemCmp<1, EqZero, None>                        2.51µs ± 2%             2.47µs ± 2%   -1.48%         (p=0.017 n=9+10)
BM_MemCmp<1, EqZero, First>                       2.50µs ± 3%             2.49µs ± 2%     ~            (p=0.356 n=10+9)
BM_MemCmp<1, EqZero, Mid>                         2.49µs ± 3%             2.48µs ± 2%     ~            (p=0.447 n=10+9)
BM_MemCmp<1, EqZero, Last>                        2.51µs ± 2%             2.48µs ± 2%     ~             (p=0.113 n=9+9)
BM_MemCmp<1, LessThanZero, None>                  2.48µs ± 3%             2.48µs ± 2%     ~             (p=0.605 n=9+9)
BM_MemCmp<1, LessThanZero, First>                 2.49µs ± 3%             2.49µs ± 2%     ~             (p=1.000 n=9+9)
BM_MemCmp<1, LessThanZero, Mid>                   2.49µs ± 3%             2.49µs ± 2%     ~             (p=0.931 n=9+9)
BM_MemCmp<1, LessThanZero, Last>                  2.49µs ± 3%             2.48µs ± 2%     ~            (p=0.905 n=9+10)
BM_MemCmp<1, GreaterThanZero, None>               2.89µs ± 3%             2.89µs ± 2%     ~            (p=0.720 n=9+10)
BM_MemCmp<1, GreaterThanZero, First>              2.89µs ± 3%             2.90µs ± 2%     ~            (p=0.515 n=8+10)
BM_MemCmp<1, GreaterThanZero, Mid>                2.91µs ± 5%             2.91µs ± 1%     ~             (p=0.546 n=9+9)
BM_MemCmp<1, GreaterThanZero, Last>               2.89µs ± 2%             2.90µs ± 2%     ~             (p=0.606 n=8+9)
BM_MemCmp<2, EqZero, None>                        1.24µs ± 2%             1.24µs ± 2%     ~            (p=0.696 n=8+10)
BM_MemCmp<2, EqZero, First>                       1.24µs ± 2%             1.24µs ± 2%     ~            (p=0.460 n=8+10)
BM_MemCmp<2, EqZero, Mid>                         1.24µs ± 2%             1.24µs ± 2%     ~             (p=1.000 n=8+9)
BM_MemCmp<2, EqZero, Last>                        1.23µs ± 2%             1.24µs ± 2%     ~             (p=0.743 n=8+9)
BM_MemCmp<2, LessThanZero, None>                  1.70µs ± 4%             1.71µs ± 2%     ~             (p=0.481 n=9+8)
BM_MemCmp<2, LessThanZero, First>                 1.71µs ± 3%             1.83µs ±15%     ~           (p=0.247 n=10+10)
BM_MemCmp<2, LessThanZero, Mid>                   1.71µs ± 3%             1.83µs ±16%     ~            (p=0.497 n=9+10)
BM_MemCmp<2, LessThanZero, Last>                  1.71µs ± 3%             1.83µs ±16%     ~            (p=0.604 n=9+10)
BM_MemCmp<2, GreaterThanZero, None>               2.16µs ±15%             1.65µs ±16%  -23.59%        (p=0.000 n=10+10)
BM_MemCmp<2, GreaterThanZero, First>              2.16µs ±15%             1.55µs ± 2%  -28.48%         (p=0.000 n=10+8)
BM_MemCmp<2, GreaterThanZero, Mid>                2.17µs ±15%             1.55µs ± 4%  -28.40%         (p=0.000 n=10+8)
BM_MemCmp<2, GreaterThanZero, Last>               2.16µs ±15%             1.56µs ± 6%  -27.65%         (p=0.000 n=10+8)
BM_MemCmp<3, EqZero, None>                        1.14µs ± 3%             1.22µs ±15%     ~            (p=0.068 n=8+10)
BM_MemCmp<3, EqZero, First>                       1.15µs ± 3%             1.23µs ±15%     ~            (p=0.315 n=8+10)
BM_MemCmp<3, EqZero, Mid>                         1.15µs ± 5%             1.22µs ±15%     ~            (p=0.356 n=9+10)
BM_MemCmp<3, EqZero, Last>                        1.14µs ± 3%             1.22µs ±16%     ~            (p=0.315 n=9+10)
BM_MemCmp<3, LessThanZero, None>                  1.24µs ± 3%             1.25µs ± 3%     ~             (p=0.370 n=9+8)
BM_MemCmp<3, LessThanZero, First>                 1.53µs ± 3%             1.52µs ± 3%     ~             (p=0.796 n=9+9)
BM_MemCmp<3, LessThanZero, Mid>                   1.47µs ± 3%             1.47µs ± 2%     ~            (p=0.897 n=8+10)
BM_MemCmp<3, LessThanZero, Last>                  1.24µs ± 3%             1.24µs ± 2%     ~             (p=0.963 n=8+9)
BM_MemCmp<3, GreaterThanZero, None>               1.44µs ± 3%             1.38µs ± 2%   -4.56%          (p=0.000 n=9+9)
BM_MemCmp<3, GreaterThanZero, First>              1.66µs ± 3%             1.38µs ± 2%  -16.60%          (p=0.000 n=9+9)
BM_MemCmp<3, GreaterThanZero, Mid>                1.66µs ± 3%             0.93µs ± 2%  -43.92%          (p=0.000 n=9+9)
BM_MemCmp<3, GreaterThanZero, Last>               1.45µs ± 3%             1.37µs ± 2%   -5.07%          (p=0.000 n=9+9)
BM_MemCmp<4, EqZero, None>                         624ns ± 3%              621ns ± 2%     ~             (p=0.796 n=9+9)
BM_MemCmp<4, EqZero, First>                        625ns ± 3%              621ns ± 2%     ~            (p=0.549 n=9+10)
BM_MemCmp<4, EqZero, Mid>                          625ns ± 3%              620ns ± 2%     ~             (p=0.340 n=9+9)
BM_MemCmp<4, EqZero, Last>                         624ns ± 3%              620ns ± 2%     ~             (p=0.546 n=9+9)
BM_MemCmp<4, LessThanZero, None>                  1.01µs ± 3%             1.01µs ± 2%     ~             (p=0.605 n=9+9)
BM_MemCmp<4, LessThanZero, First>                 1.02µs ± 4%             1.01µs ± 2%     ~             (p=0.190 n=9+9)
BM_MemCmp<4, LessThanZero, Mid>                   1.01µs ± 3%             1.01µs ± 2%     ~             (p=0.963 n=9+8)
BM_MemCmp<4, LessThanZero, Last>                  1.01µs ± 3%             1.01µs ± 2%     ~             (p=0.815 n=9+8)
BM_MemCmp<4, GreaterThanZero, None>               1.17µs ± 3%             0.77µs ± 2%  -33.58%          (p=0.000 n=9+9)
BM_MemCmp<4, GreaterThanZero, First>              1.17µs ± 3%             0.77µs ± 2%  -33.64%          (p=0.000 n=9+9)
BM_MemCmp<4, GreaterThanZero, Mid>                1.16µs ± 4%             0.78µs ± 3%  -33.20%          (p=0.000 n=9+8)
BM_MemCmp<4, GreaterThanZero, Last>               1.16µs ± 3%             0.78µs ± 2%  -33.33%          (p=0.000 n=9+9)
BM_MemCmp<5, EqZero, None>                         687ns ± 4%              688ns ± 2%     ~            (p=0.549 n=10+9)
BM_MemCmp<5, EqZero, First>                        688ns ± 3%              688ns ± 2%     ~            (p=0.661 n=10+9)
BM_MemCmp<5, EqZero, Mid>                          748ns ± 3%              691ns ± 1%   -7.61%         (p=0.000 n=10+9)
BM_MemCmp<5, EqZero, Last>                         747ns ± 5%              691ns ± 1%   -7.61%        (p=0.000 n=10+10)
BM_MemCmp<5, LessThanZero, None>                   745ns ± 4%              744ns ± 2%     ~            (p=0.842 n=9+10)
BM_MemCmp<5, LessThanZero, First>                  875ns ± 4%              887ns ± 3%   +1.36%         (p=0.043 n=9+10)
BM_MemCmp<5, LessThanZero, Mid>                    884ns ± 4%              879ns ± 2%     ~            (p=0.720 n=9+10)
BM_MemCmp<5, LessThanZero, Last>                   751ns ± 6%              744ns ± 2%     ~            (p=0.842 n=9+10)
BM_MemCmp<5, GreaterThanZero, None>                921ns ±17%              826ns ± 2%  -10.37%        (p=0.000 n=10+10)
BM_MemCmp<5, GreaterThanZero, First>               995ns ± 3%              822ns ± 2%  -17.41%          (p=0.000 n=8+9)
BM_MemCmp<5, GreaterThanZero, Mid>                 990ns ± 2%              558ns ± 2%  -43.67%          (p=0.000 n=8+8)
BM_MemCmp<5, GreaterThanZero, Last>                864ns ± 2%              824ns ± 2%   -4.63%          (p=0.000 n=8+8)
BM_MemCmp<6, EqZero, None>                         569ns ± 2%              611ns ±16%     ~            (p=0.274 n=8+10)
BM_MemCmp<6, EqZero, First>                        576ns ± 6%              610ns ±16%     ~            (p=0.780 n=9+10)
BM_MemCmp<6, EqZero, Mid>                          576ns ± 3%              654ns ±18%  +13.51%         (p=0.000 n=8+10)
BM_MemCmp<6, EqZero, Last>                         577ns ± 3%              664ns ±16%  +15.10%         (p=0.000 n=8+10)
BM_MemCmp<6, LessThanZero, None>                  1.00µs ±15%             1.00µs ±15%     ~           (p=0.631 n=10+10)
BM_MemCmp<6, LessThanZero, First>                  785ns ±15%              780ns ±16%     ~           (p=0.853 n=10+10)
BM_MemCmp<6, LessThanZero, Mid>                    745ns ±11%              781ns ±16%     ~            (p=0.497 n=9+10)
BM_MemCmp<6, LessThanZero, Last>                  1.14µs ± 3%             1.21µs ±17%     ~            (p=0.633 n=8+10)
BM_MemCmp<6, GreaterThanZero, None>               1.08µs ± 4%             0.77µs ± 1%  -28.73%          (p=0.000 n=8+8)
BM_MemCmp<6, GreaterThanZero, First>               832ns ± 3%              772ns ± 2%   -7.24%         (p=0.000 n=9+10)
BM_MemCmp<6, GreaterThanZero, Mid>                 831ns ± 3%              464ns ± 1%  -44.12%          (p=0.000 n=8+9)
BM_MemCmp<6, GreaterThanZero, Last>               1.24µs ± 3%             0.77µs ± 1%  -37.93%          (p=0.000 n=8+9)
BM_MemCmp<7, EqZero, None>                         452ns ± 3%              446ns ± 1%   -1.50%          (p=0.027 n=8+9)
BM_MemCmp<7, EqZero, First>                        451ns ± 3%              447ns ± 2%     ~             (p=0.222 n=9+9)
BM_MemCmp<7, EqZero, Mid>                          452ns ± 3%              446ns ± 2%     ~             (p=0.094 n=9+9)
BM_MemCmp<7, EqZero, Last>                         451ns ± 3%              444ns ± 1%   -1.38%         (p=0.022 n=10+9)
BM_MemCmp<7, LessThanZero, None>                  1.90µs ± 3%             1.89µs ± 3%     ~             (p=0.546 n=9+9)
BM_MemCmp<7, LessThanZero, First>                 1.59µs ± 3%             1.58µs ± 2%     ~             (p=0.489 n=9+9)
BM_MemCmp<7, LessThanZero, Mid>                   1.59µs ± 3%             1.40µs ± 2%  -11.88%          (p=0.000 n=9+9)
BM_MemCmp<7, LessThanZero, Last>                  1.90µs ± 3%             1.89µs ± 2%     ~             (p=0.863 n=9+9)
BM_MemCmp<7, GreaterThanZero, None>               1.99µs ± 3%             1.98µs ± 4%     ~             (p=0.931 n=9+9)
BM_MemCmp<7, GreaterThanZero, First>              1.59µs ± 6%             1.58µs ± 1%     ~             (p=0.606 n=9+8)
BM_MemCmp<7, GreaterThanZero, Mid>                1.59µs ± 3%             1.40µs ± 2%  -11.46%          (p=0.000 n=9+8)
BM_MemCmp<7, GreaterThanZero, Last>               1.98µs ± 4%             1.98µs ± 3%     ~             (p=0.863 n=9+9)
BM_MemCmp<8, EqZero, None>                         313ns ± 4%              311ns ± 2%     ~             (p=0.863 n=9+9)
BM_MemCmp<8, EqZero, First>                        313ns ± 4%              312ns ± 4%     ~             (p=1.000 n=9+9)
BM_MemCmp<8, EqZero, Mid>                          313ns ± 4%              312ns ± 2%     ~             (p=0.888 n=9+8)
BM_MemCmp<8, EqZero, Last>                         313ns ± 4%              312ns ± 1%     ~             (p=0.796 n=9+9)
BM_MemCmp<8, LessThanZero, None>                   586ns ± 3%              584ns ± 2%     ~            (p=0.842 n=10+9)
BM_MemCmp<8, LessThanZero, First>                  585ns ± 2%              584ns ± 2%     ~             (p=0.931 n=9+9)
BM_MemCmp<8, LessThanZero, Mid>                    584ns ± 2%              583ns ± 1%     ~             (p=0.931 n=9+9)
BM_MemCmp<8, LessThanZero, Last>                   584ns ± 3%              582ns ± 1%     ~             (p=0.673 n=9+8)
BM_MemCmp<8, GreaterThanZero, None>                655ns ± 1%              464ns ± 1%  -29.07%         (p=0.000 n=8+10)
BM_MemCmp<8, GreaterThanZero, First>               657ns ± 2%              464ns ± 1%  -29.39%          (p=0.000 n=9+9)
BM_MemCmp<8, GreaterThanZero, Mid>                 661ns ± 3%              465ns ± 2%  -29.73%         (p=0.000 n=9+10)
BM_MemCmp<8, GreaterThanZero, Last>                655ns ± 1%              465ns ± 2%  -29.08%         (p=0.000 n=8+10)
BM_MemCmp<15, EqZero, None>                        213ns ± 1%              213ns ± 1%     ~            (p=0.897 n=8+10)
BM_MemCmp<15, EqZero, First>                       214ns ± 3%              214ns ± 2%     ~            (p=0.356 n=9+10)
BM_MemCmp<15, EqZero, Mid>                         214ns ± 1%              214ns ± 2%     ~            (p=0.897 n=8+10)
BM_MemCmp<15, EqZero, Last>                        227ns ±17%              214ns ± 1%     ~           (p=0.218 n=10+10)
BM_MemCmp<15, LessThanZero, None>                  856ns ± 2%              859ns ± 3%     ~             (p=0.606 n=8+9)
BM_MemCmp<15, LessThanZero, First>                 578ns ± 1%              576ns ± 2%     ~             (p=0.574 n=8+8)
BM_MemCmp<15, LessThanZero, Mid>                   656ns ± 2%              617ns ±17%     ~            (p=0.173 n=8+10)
BM_MemCmp<15, LessThanZero, Last>                  854ns ± 2%              920ns ±16%     ~            (p=0.203 n=8+10)
BM_MemCmp<15, GreaterThanZero, None>               962ns ±17%              969ns ±16%     ~           (p=0.796 n=10+10)
BM_MemCmp<15, GreaterThanZero, First>              656ns ± 3%              720ns ±13%     ~            (p=0.101 n=8+10)
BM_MemCmp<15, GreaterThanZero, Mid>                704ns ±16%              724ns ±14%     ~           (p=0.218 n=10+10)
BM_MemCmp<15, GreaterThanZero, Last>               973ns ±16%              980ns ±15%     ~           (p=0.436 n=10+10)
BM_MemCmp<16, EqZero, None>                        212ns ±16%              212ns ±16%     ~           (p=0.739 n=10+10)
BM_MemCmp<16, EqZero, First>                       200ns ± 8%              212ns ±16%     ~            (p=1.000 n=9+10)
BM_MemCmp<16, EqZero, Mid>                         197ns ± 3%              206ns ±19%     ~            (p=0.515 n=8+10)
BM_MemCmp<16, EqZero, Last>                        198ns ± 3%              196ns ± 2%     ~             (p=0.277 n=8+9)
BM_MemCmp<16, LessThanZero, None>                  390ns ± 3%              388ns ± 2%     ~            (p=0.447 n=9+10)
BM_MemCmp<16, LessThanZero, First>                 315ns ± 3%              313ns ± 1%     ~             (p=0.321 n=8+9)
BM_MemCmp<16, LessThanZero, Mid>                   467ns ± 3%              465ns ± 1%     ~             (p=0.481 n=8+9)
BM_MemCmp<16, LessThanZero, Last>                  471ns ± 4%              465ns ± 2%     ~             (p=0.258 n=9+9)
BM_MemCmp<16, GreaterThanZero, None>               427ns ± 3%              369ns ± 1%  -13.68%          (p=0.000 n=9+8)
BM_MemCmp<16, GreaterThanZero, First>              355ns ± 3%              369ns ± 1%   +3.86%          (p=0.000 n=9+8)
BM_MemCmp<16, GreaterThanZero, Mid>                503ns ± 2%              371ns ± 3%  -26.12%          (p=0.000 n=9+9)
BM_MemCmp<16, GreaterThanZero, Last>               505ns ± 3%              371ns ± 2%  -26.56%         (p=0.000 n=9+10)
BM_MemCmp<31, EqZero, None>                        146ns ± 3%              145ns ± 2%     ~             (p=0.796 n=9+9)
BM_MemCmp<31, EqZero, First>                       145ns ± 4%              145ns ± 2%     ~             (p=0.546 n=9+9)
BM_MemCmp<31, EqZero, Mid>                         146ns ± 4%              145ns ± 1%     ~             (p=0.666 n=9+9)
BM_MemCmp<31, EqZero, Last>                        146ns ± 3%              146ns ± 1%     ~             (p=0.666 n=9+9)
BM_MemCmp<31, LessThanZero, None>                  465ns ± 8%              463ns ± 4%     ~            (p=0.762 n=10+8)
BM_MemCmp<31, LessThanZero, First>                 434ns ± 3%              435ns ± 2%     ~             (p=0.888 n=9+8)
BM_MemCmp<31, LessThanZero, Mid>                   480ns ± 3%              480ns ± 2%     ~             (p=0.666 n=9+9)
BM_MemCmp<31, LessThanZero, Last>                  549ns ± 1%              552ns ± 2%     ~             (p=0.525 n=8+9)
BM_MemCmp<31, GreaterThanZero, None>               446ns ± 4%              447ns ± 2%     ~             (p=0.743 n=9+8)
BM_MemCmp<31, GreaterThanZero, First>              433ns ± 3%              435ns ± 2%     ~             (p=0.222 n=9+9)
BM_MemCmp<31, GreaterThanZero, Mid>                474ns ± 3%              475ns ± 2%     ~            (p=0.447 n=10+9)
BM_MemCmp<31, GreaterThanZero, Last>               552ns ± 2%              551ns ± 1%     ~            (p=0.965 n=10+8)
BM_MemCmp<32, EqZero, None>                        140ns ± 2%              141ns ± 3%     ~            (p=0.447 n=10+9)
BM_MemCmp<32, EqZero, First>                       139ns ± 1%              140ns ± 1%     ~             (p=0.297 n=9+9)
BM_MemCmp<32, EqZero, Mid>                         139ns ± 2%              140ns ± 1%     ~            (p=0.356 n=9+10)
BM_MemCmp<32, EqZero, Last>                        139ns ± 2%              139ns ± 2%     ~            (p=0.447 n=9+10)
BM_MemCmp<32, LessThanZero, None>                  519ns ± 4%              513ns ± 3%     ~            (p=0.133 n=10+9)
BM_MemCmp<32, LessThanZero, First>                 418ns ± 1%              418ns ± 3%     ~            (p=0.573 n=8+10)
BM_MemCmp<32, LessThanZero, Mid>                   494ns ± 1%              493ns ± 1%     ~             (p=0.815 n=8+9)
BM_MemCmp<32, LessThanZero, Last>                  535ns ± 3%              534ns ± 2%     ~            (p=0.720 n=9+10)
BM_MemCmp<32, GreaterThanZero, None>               544ns ± 3%              545ns ± 2%     ~            (p=0.661 n=9+10)
BM_MemCmp<32, GreaterThanZero, First>              419ns ± 3%              419ns ± 1%     ~            (p=0.842 n=9+10)
BM_MemCmp<32, GreaterThanZero, Mid>                508ns ± 2%              510ns ± 2%     ~            (p=0.356 n=9+10)
BM_MemCmp<32, GreaterThanZero, Last>               549ns ± 2%              552ns ± 1%     ~             (p=0.321 n=8+9)
BM_MemCmp<63, EqZero, None>                        364ns ± 1%              363ns ± 2%     ~             (p=0.234 n=8+8)
BM_MemCmp<63, EqZero, First>                       161ns ± 2%              161ns ± 2%     ~             (p=0.878 n=8+8)
BM_MemCmp<63, EqZero, Mid>                         315ns ± 1%              316ns ± 2%     ~             (p=0.574 n=8+8)
BM_MemCmp<63, EqZero, Last>                        161ns ± 2%              173ns ±16%     ~            (p=0.101 n=8+10)
BM_MemCmp<63, LessThanZero, None>                  350ns ± 2%              375ns ±17%     ~            (p=0.515 n=8+10)
BM_MemCmp<63, LessThanZero, First>                 214ns ± 2%              215ns ± 3%     ~             (p=0.798 n=8+8)
BM_MemCmp<63, LessThanZero, Mid>                   288ns ±18%              293ns ±16%     ~           (p=0.853 n=10+10)
BM_MemCmp<63, LessThanZero, Last>                  376ns ±16%              384ns ±14%     ~           (p=0.912 n=10+10)
BM_MemCmp<63, GreaterThanZero, None>               326ns ±16%              328ns ±16%     ~           (p=0.631 n=10+10)
BM_MemCmp<63, GreaterThanZero, First>              228ns ±18%              229ns ±17%     ~           (p=0.393 n=10+10)
BM_MemCmp<63, GreaterThanZero, Mid>                279ns ± 3%              299ns ±17%     ~            (p=0.573 n=8+10)
BM_MemCmp<63, GreaterThanZero, Last>               350ns ± 4%              352ns ± 3%     ~             (p=0.423 n=8+9)
BM_MemCmp<64, EqZero, None>                        407ns ± 2%              406ns ± 2%     ~            (p=0.720 n=9+10)
BM_MemCmp<64, EqZero, First>                       160ns ± 2%              158ns ± 1%     ~            (p=0.068 n=8+10)
BM_MemCmp<64, EqZero, Mid>                         321ns ± 2%              316ns ± 2%   -1.78%          (p=0.006 n=8+9)
BM_MemCmp<64, EqZero, Last>                        159ns ± 2%              158ns ± 1%     ~             (p=0.606 n=8+9)
BM_MemCmp<64, LessThanZero, None>                  401ns ± 2%              396ns ± 2%     ~             (p=0.063 n=9+9)
BM_MemCmp<64, LessThanZero, First>                 210ns ± 1%              209ns ± 1%     ~             (p=0.142 n=7+9)
BM_MemCmp<64, LessThanZero, Mid>                   285ns ± 1%              285ns ± 1%     ~             (p=0.606 n=8+9)
BM_MemCmp<64, LessThanZero, Last>                  402ns ± 3%              407ns ± 1%     ~            (p=0.053 n=10+9)
BM_MemCmp<64, GreaterThanZero, None>               375ns ± 4%              377ns ± 4%     ~            (p=0.497 n=9+10)
BM_MemCmp<64, GreaterThanZero, First>              210ns ± 2%              210ns ± 1%     ~             (p=0.340 n=9+9)
BM_MemCmp<64, GreaterThanZero, Mid>                306ns ± 2%              306ns ± 1%     ~             (p=0.863 n=9+9)
BM_MemCmp<64, GreaterThanZero, Last>               406ns ± 3%              407ns ± 1%     ~             (p=0.546 n=9+9)
spatel added inline comments.Mar 11 2020, 9:04 AM
llvm/lib/CodeGen/ExpandMemCmp.cpp
682–686

Is there a need to check for multiple users that are identical? Ie, doesn't CSE make this unlikely?

courbet marked an inline comment as done.Mar 11 2020, 9:06 AM
courbet added inline comments.
llvm/lib/CodeGen/ExpandMemCmp.cpp
682–686

It might be unlikely, but it's not guaranteed. I'm not sure how I would handle diverging users, did yo have anything specific in mind ?

spatel added inline comments.Mar 11 2020, 10:08 AM
llvm/lib/CodeGen/ExpandMemCmp.cpp
371–375

This is not correct. memcmp() compares as unsigned bytes:
http://www.cplusplus.com/reference/cstring/memcmp/

This program will show a failure compiled with -O1 vs -O0 (do we have some kind of correctness test like this in the test-suite?):

#include <string.h>
#include <stdio.h>

void wrapper(char *c, char *d) {
  if (memcmp(c, d, 4) > 0)
    printf(">0\n");
  else
    printf("<=0\n");
}

int main() {
  int A = 0x00000000;
  int B = 0x00000080;
  wrapper(&A, &B);
  return 0;
}
llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll
128

I think that's a miscompile...should be "setb"?

courbet updated this revision to Diff 249872.Mar 12 2020, 2:58 AM

Use the right unsigned equivalent for comparing loads instead of the signed predicate from the zero relational compare.

courbet marked an inline comment as done.Mar 12 2020, 3:01 AM
courbet added inline comments.
llvm/lib/CodeGen/ExpandMemCmp.cpp
371–375

Thank for the catch. I've added more tests in 4edd050c7e97 to make the issue more obvious to the eye than the setl/setb issue.

I though the test-suite benchmark was validating the result, but it's only benchmarking and throwing the result away. I'll fix the benchmarks.

spatel added inline comments.Mar 12 2020, 6:07 AM
llvm/lib/CodeGen/ExpandMemCmp.cpp
371–375

Great - thanks!
If we want another option that's easier to spot diffs in than x86 asm - add to the IR tests in "llvm/test/Transforms/ExpandMemCmp/".

682–686

We could just give up if the memcmp() call !hasOneUse(). It simplifies the code, but it might miss some strange case...but if there's no test coverage/evidence that the multi-user case exists, it should be fine?

Side note: I was wondering what these patterns look like in DAG/IR, and why we're missing them, so:
D75961

We could argue that it's worth solving these more generally in DAGCombiner, but if we can produce the optimal code here without too much effort, that's good too.

llvm/lib/CodeGen/ExpandMemCmp.cpp
397

That looks like the right fix, but it could use a comment so we'll remember why. Something like this:

// The result of memcmp is a signed value, so the relational 
// compare kind predicate is always signed. But the byte
// comparison used within memcmp is always unsigned.
spatel added inline comments.Mar 12 2020, 6:41 AM
llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll
455

Wait - this is another miscompile. Can we split the equality compare part of this into its own patch?

If the 1st load pair compares equal, we must check the 2nd load pair, not go to the result block.

courbet planned changes to this revision.Mar 13 2020, 5:12 AM

Actually I'm actually now convinced that the whole approach is incorrect when there are several blocks. I'm going to take some time to rethink this.