This is one step towards fixing PR45147.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Benchmark results from test-suite. TBH I'm not exactly sure why only the "greater" case gets better, but overall the results are promising.
BM_MemCmp<1, EqZero, None> 2.51µs ± 2% 2.47µs ± 2% -1.48% (p=0.017 n=9+10) BM_MemCmp<1, EqZero, First> 2.50µs ± 3% 2.49µs ± 2% ~ (p=0.356 n=10+9) BM_MemCmp<1, EqZero, Mid> 2.49µs ± 3% 2.48µs ± 2% ~ (p=0.447 n=10+9) BM_MemCmp<1, EqZero, Last> 2.51µs ± 2% 2.48µs ± 2% ~ (p=0.113 n=9+9) BM_MemCmp<1, LessThanZero, None> 2.48µs ± 3% 2.48µs ± 2% ~ (p=0.605 n=9+9) BM_MemCmp<1, LessThanZero, First> 2.49µs ± 3% 2.49µs ± 2% ~ (p=1.000 n=9+9) BM_MemCmp<1, LessThanZero, Mid> 2.49µs ± 3% 2.49µs ± 2% ~ (p=0.931 n=9+9) BM_MemCmp<1, LessThanZero, Last> 2.49µs ± 3% 2.48µs ± 2% ~ (p=0.905 n=9+10) BM_MemCmp<1, GreaterThanZero, None> 2.89µs ± 3% 2.89µs ± 2% ~ (p=0.720 n=9+10) BM_MemCmp<1, GreaterThanZero, First> 2.89µs ± 3% 2.90µs ± 2% ~ (p=0.515 n=8+10) BM_MemCmp<1, GreaterThanZero, Mid> 2.91µs ± 5% 2.91µs ± 1% ~ (p=0.546 n=9+9) BM_MemCmp<1, GreaterThanZero, Last> 2.89µs ± 2% 2.90µs ± 2% ~ (p=0.606 n=8+9) BM_MemCmp<2, EqZero, None> 1.24µs ± 2% 1.24µs ± 2% ~ (p=0.696 n=8+10) BM_MemCmp<2, EqZero, First> 1.24µs ± 2% 1.24µs ± 2% ~ (p=0.460 n=8+10) BM_MemCmp<2, EqZero, Mid> 1.24µs ± 2% 1.24µs ± 2% ~ (p=1.000 n=8+9) BM_MemCmp<2, EqZero, Last> 1.23µs ± 2% 1.24µs ± 2% ~ (p=0.743 n=8+9) BM_MemCmp<2, LessThanZero, None> 1.70µs ± 4% 1.71µs ± 2% ~ (p=0.481 n=9+8) BM_MemCmp<2, LessThanZero, First> 1.71µs ± 3% 1.83µs ±15% ~ (p=0.247 n=10+10) BM_MemCmp<2, LessThanZero, Mid> 1.71µs ± 3% 1.83µs ±16% ~ (p=0.497 n=9+10) BM_MemCmp<2, LessThanZero, Last> 1.71µs ± 3% 1.83µs ±16% ~ (p=0.604 n=9+10) BM_MemCmp<2, GreaterThanZero, None> 2.16µs ±15% 1.65µs ±16% -23.59% (p=0.000 n=10+10) BM_MemCmp<2, GreaterThanZero, First> 2.16µs ±15% 1.55µs ± 2% -28.48% (p=0.000 n=10+8) BM_MemCmp<2, GreaterThanZero, Mid> 2.17µs ±15% 1.55µs ± 4% -28.40% (p=0.000 n=10+8) BM_MemCmp<2, GreaterThanZero, Last> 2.16µs ±15% 1.56µs ± 6% -27.65% (p=0.000 n=10+8) BM_MemCmp<3, EqZero, None> 1.14µs ± 3% 1.22µs ±15% ~ (p=0.068 n=8+10) BM_MemCmp<3, EqZero, First> 1.15µs ± 3% 1.23µs ±15% ~ (p=0.315 n=8+10) BM_MemCmp<3, EqZero, Mid> 1.15µs ± 5% 1.22µs ±15% ~ (p=0.356 n=9+10) BM_MemCmp<3, EqZero, Last> 1.14µs ± 3% 1.22µs ±16% ~ (p=0.315 n=9+10) BM_MemCmp<3, LessThanZero, None> 1.24µs ± 3% 1.25µs ± 3% ~ (p=0.370 n=9+8) BM_MemCmp<3, LessThanZero, First> 1.53µs ± 3% 1.52µs ± 3% ~ (p=0.796 n=9+9) BM_MemCmp<3, LessThanZero, Mid> 1.47µs ± 3% 1.47µs ± 2% ~ (p=0.897 n=8+10) BM_MemCmp<3, LessThanZero, Last> 1.24µs ± 3% 1.24µs ± 2% ~ (p=0.963 n=8+9) BM_MemCmp<3, GreaterThanZero, None> 1.44µs ± 3% 1.38µs ± 2% -4.56% (p=0.000 n=9+9) BM_MemCmp<3, GreaterThanZero, First> 1.66µs ± 3% 1.38µs ± 2% -16.60% (p=0.000 n=9+9) BM_MemCmp<3, GreaterThanZero, Mid> 1.66µs ± 3% 0.93µs ± 2% -43.92% (p=0.000 n=9+9) BM_MemCmp<3, GreaterThanZero, Last> 1.45µs ± 3% 1.37µs ± 2% -5.07% (p=0.000 n=9+9) BM_MemCmp<4, EqZero, None> 624ns ± 3% 621ns ± 2% ~ (p=0.796 n=9+9) BM_MemCmp<4, EqZero, First> 625ns ± 3% 621ns ± 2% ~ (p=0.549 n=9+10) BM_MemCmp<4, EqZero, Mid> 625ns ± 3% 620ns ± 2% ~ (p=0.340 n=9+9) BM_MemCmp<4, EqZero, Last> 624ns ± 3% 620ns ± 2% ~ (p=0.546 n=9+9) BM_MemCmp<4, LessThanZero, None> 1.01µs ± 3% 1.01µs ± 2% ~ (p=0.605 n=9+9) BM_MemCmp<4, LessThanZero, First> 1.02µs ± 4% 1.01µs ± 2% ~ (p=0.190 n=9+9) BM_MemCmp<4, LessThanZero, Mid> 1.01µs ± 3% 1.01µs ± 2% ~ (p=0.963 n=9+8) BM_MemCmp<4, LessThanZero, Last> 1.01µs ± 3% 1.01µs ± 2% ~ (p=0.815 n=9+8) BM_MemCmp<4, GreaterThanZero, None> 1.17µs ± 3% 0.77µs ± 2% -33.58% (p=0.000 n=9+9) BM_MemCmp<4, GreaterThanZero, First> 1.17µs ± 3% 0.77µs ± 2% -33.64% (p=0.000 n=9+9) BM_MemCmp<4, GreaterThanZero, Mid> 1.16µs ± 4% 0.78µs ± 3% -33.20% (p=0.000 n=9+8) BM_MemCmp<4, GreaterThanZero, Last> 1.16µs ± 3% 0.78µs ± 2% -33.33% (p=0.000 n=9+9) BM_MemCmp<5, EqZero, None> 687ns ± 4% 688ns ± 2% ~ (p=0.549 n=10+9) BM_MemCmp<5, EqZero, First> 688ns ± 3% 688ns ± 2% ~ (p=0.661 n=10+9) BM_MemCmp<5, EqZero, Mid> 748ns ± 3% 691ns ± 1% -7.61% (p=0.000 n=10+9) BM_MemCmp<5, EqZero, Last> 747ns ± 5% 691ns ± 1% -7.61% (p=0.000 n=10+10) BM_MemCmp<5, LessThanZero, None> 745ns ± 4% 744ns ± 2% ~ (p=0.842 n=9+10) BM_MemCmp<5, LessThanZero, First> 875ns ± 4% 887ns ± 3% +1.36% (p=0.043 n=9+10) BM_MemCmp<5, LessThanZero, Mid> 884ns ± 4% 879ns ± 2% ~ (p=0.720 n=9+10) BM_MemCmp<5, LessThanZero, Last> 751ns ± 6% 744ns ± 2% ~ (p=0.842 n=9+10) BM_MemCmp<5, GreaterThanZero, None> 921ns ±17% 826ns ± 2% -10.37% (p=0.000 n=10+10) BM_MemCmp<5, GreaterThanZero, First> 995ns ± 3% 822ns ± 2% -17.41% (p=0.000 n=8+9) BM_MemCmp<5, GreaterThanZero, Mid> 990ns ± 2% 558ns ± 2% -43.67% (p=0.000 n=8+8) BM_MemCmp<5, GreaterThanZero, Last> 864ns ± 2% 824ns ± 2% -4.63% (p=0.000 n=8+8) BM_MemCmp<6, EqZero, None> 569ns ± 2% 611ns ±16% ~ (p=0.274 n=8+10) BM_MemCmp<6, EqZero, First> 576ns ± 6% 610ns ±16% ~ (p=0.780 n=9+10) BM_MemCmp<6, EqZero, Mid> 576ns ± 3% 654ns ±18% +13.51% (p=0.000 n=8+10) BM_MemCmp<6, EqZero, Last> 577ns ± 3% 664ns ±16% +15.10% (p=0.000 n=8+10) BM_MemCmp<6, LessThanZero, None> 1.00µs ±15% 1.00µs ±15% ~ (p=0.631 n=10+10) BM_MemCmp<6, LessThanZero, First> 785ns ±15% 780ns ±16% ~ (p=0.853 n=10+10) BM_MemCmp<6, LessThanZero, Mid> 745ns ±11% 781ns ±16% ~ (p=0.497 n=9+10) BM_MemCmp<6, LessThanZero, Last> 1.14µs ± 3% 1.21µs ±17% ~ (p=0.633 n=8+10) BM_MemCmp<6, GreaterThanZero, None> 1.08µs ± 4% 0.77µs ± 1% -28.73% (p=0.000 n=8+8) BM_MemCmp<6, GreaterThanZero, First> 832ns ± 3% 772ns ± 2% -7.24% (p=0.000 n=9+10) BM_MemCmp<6, GreaterThanZero, Mid> 831ns ± 3% 464ns ± 1% -44.12% (p=0.000 n=8+9) BM_MemCmp<6, GreaterThanZero, Last> 1.24µs ± 3% 0.77µs ± 1% -37.93% (p=0.000 n=8+9) BM_MemCmp<7, EqZero, None> 452ns ± 3% 446ns ± 1% -1.50% (p=0.027 n=8+9) BM_MemCmp<7, EqZero, First> 451ns ± 3% 447ns ± 2% ~ (p=0.222 n=9+9) BM_MemCmp<7, EqZero, Mid> 452ns ± 3% 446ns ± 2% ~ (p=0.094 n=9+9) BM_MemCmp<7, EqZero, Last> 451ns ± 3% 444ns ± 1% -1.38% (p=0.022 n=10+9) BM_MemCmp<7, LessThanZero, None> 1.90µs ± 3% 1.89µs ± 3% ~ (p=0.546 n=9+9) BM_MemCmp<7, LessThanZero, First> 1.59µs ± 3% 1.58µs ± 2% ~ (p=0.489 n=9+9) BM_MemCmp<7, LessThanZero, Mid> 1.59µs ± 3% 1.40µs ± 2% -11.88% (p=0.000 n=9+9) BM_MemCmp<7, LessThanZero, Last> 1.90µs ± 3% 1.89µs ± 2% ~ (p=0.863 n=9+9) BM_MemCmp<7, GreaterThanZero, None> 1.99µs ± 3% 1.98µs ± 4% ~ (p=0.931 n=9+9) BM_MemCmp<7, GreaterThanZero, First> 1.59µs ± 6% 1.58µs ± 1% ~ (p=0.606 n=9+8) BM_MemCmp<7, GreaterThanZero, Mid> 1.59µs ± 3% 1.40µs ± 2% -11.46% (p=0.000 n=9+8) BM_MemCmp<7, GreaterThanZero, Last> 1.98µs ± 4% 1.98µs ± 3% ~ (p=0.863 n=9+9) BM_MemCmp<8, EqZero, None> 313ns ± 4% 311ns ± 2% ~ (p=0.863 n=9+9) BM_MemCmp<8, EqZero, First> 313ns ± 4% 312ns ± 4% ~ (p=1.000 n=9+9) BM_MemCmp<8, EqZero, Mid> 313ns ± 4% 312ns ± 2% ~ (p=0.888 n=9+8) BM_MemCmp<8, EqZero, Last> 313ns ± 4% 312ns ± 1% ~ (p=0.796 n=9+9) BM_MemCmp<8, LessThanZero, None> 586ns ± 3% 584ns ± 2% ~ (p=0.842 n=10+9) BM_MemCmp<8, LessThanZero, First> 585ns ± 2% 584ns ± 2% ~ (p=0.931 n=9+9) BM_MemCmp<8, LessThanZero, Mid> 584ns ± 2% 583ns ± 1% ~ (p=0.931 n=9+9) BM_MemCmp<8, LessThanZero, Last> 584ns ± 3% 582ns ± 1% ~ (p=0.673 n=9+8) BM_MemCmp<8, GreaterThanZero, None> 655ns ± 1% 464ns ± 1% -29.07% (p=0.000 n=8+10) BM_MemCmp<8, GreaterThanZero, First> 657ns ± 2% 464ns ± 1% -29.39% (p=0.000 n=9+9) BM_MemCmp<8, GreaterThanZero, Mid> 661ns ± 3% 465ns ± 2% -29.73% (p=0.000 n=9+10) BM_MemCmp<8, GreaterThanZero, Last> 655ns ± 1% 465ns ± 2% -29.08% (p=0.000 n=8+10) BM_MemCmp<15, EqZero, None> 213ns ± 1% 213ns ± 1% ~ (p=0.897 n=8+10) BM_MemCmp<15, EqZero, First> 214ns ± 3% 214ns ± 2% ~ (p=0.356 n=9+10) BM_MemCmp<15, EqZero, Mid> 214ns ± 1% 214ns ± 2% ~ (p=0.897 n=8+10) BM_MemCmp<15, EqZero, Last> 227ns ±17% 214ns ± 1% ~ (p=0.218 n=10+10) BM_MemCmp<15, LessThanZero, None> 856ns ± 2% 859ns ± 3% ~ (p=0.606 n=8+9) BM_MemCmp<15, LessThanZero, First> 578ns ± 1% 576ns ± 2% ~ (p=0.574 n=8+8) BM_MemCmp<15, LessThanZero, Mid> 656ns ± 2% 617ns ±17% ~ (p=0.173 n=8+10) BM_MemCmp<15, LessThanZero, Last> 854ns ± 2% 920ns ±16% ~ (p=0.203 n=8+10) BM_MemCmp<15, GreaterThanZero, None> 962ns ±17% 969ns ±16% ~ (p=0.796 n=10+10) BM_MemCmp<15, GreaterThanZero, First> 656ns ± 3% 720ns ±13% ~ (p=0.101 n=8+10) BM_MemCmp<15, GreaterThanZero, Mid> 704ns ±16% 724ns ±14% ~ (p=0.218 n=10+10) BM_MemCmp<15, GreaterThanZero, Last> 973ns ±16% 980ns ±15% ~ (p=0.436 n=10+10) BM_MemCmp<16, EqZero, None> 212ns ±16% 212ns ±16% ~ (p=0.739 n=10+10) BM_MemCmp<16, EqZero, First> 200ns ± 8% 212ns ±16% ~ (p=1.000 n=9+10) BM_MemCmp<16, EqZero, Mid> 197ns ± 3% 206ns ±19% ~ (p=0.515 n=8+10) BM_MemCmp<16, EqZero, Last> 198ns ± 3% 196ns ± 2% ~ (p=0.277 n=8+9) BM_MemCmp<16, LessThanZero, None> 390ns ± 3% 388ns ± 2% ~ (p=0.447 n=9+10) BM_MemCmp<16, LessThanZero, First> 315ns ± 3% 313ns ± 1% ~ (p=0.321 n=8+9) BM_MemCmp<16, LessThanZero, Mid> 467ns ± 3% 465ns ± 1% ~ (p=0.481 n=8+9) BM_MemCmp<16, LessThanZero, Last> 471ns ± 4% 465ns ± 2% ~ (p=0.258 n=9+9) BM_MemCmp<16, GreaterThanZero, None> 427ns ± 3% 369ns ± 1% -13.68% (p=0.000 n=9+8) BM_MemCmp<16, GreaterThanZero, First> 355ns ± 3% 369ns ± 1% +3.86% (p=0.000 n=9+8) BM_MemCmp<16, GreaterThanZero, Mid> 503ns ± 2% 371ns ± 3% -26.12% (p=0.000 n=9+9) BM_MemCmp<16, GreaterThanZero, Last> 505ns ± 3% 371ns ± 2% -26.56% (p=0.000 n=9+10) BM_MemCmp<31, EqZero, None> 146ns ± 3% 145ns ± 2% ~ (p=0.796 n=9+9) BM_MemCmp<31, EqZero, First> 145ns ± 4% 145ns ± 2% ~ (p=0.546 n=9+9) BM_MemCmp<31, EqZero, Mid> 146ns ± 4% 145ns ± 1% ~ (p=0.666 n=9+9) BM_MemCmp<31, EqZero, Last> 146ns ± 3% 146ns ± 1% ~ (p=0.666 n=9+9) BM_MemCmp<31, LessThanZero, None> 465ns ± 8% 463ns ± 4% ~ (p=0.762 n=10+8) BM_MemCmp<31, LessThanZero, First> 434ns ± 3% 435ns ± 2% ~ (p=0.888 n=9+8) BM_MemCmp<31, LessThanZero, Mid> 480ns ± 3% 480ns ± 2% ~ (p=0.666 n=9+9) BM_MemCmp<31, LessThanZero, Last> 549ns ± 1% 552ns ± 2% ~ (p=0.525 n=8+9) BM_MemCmp<31, GreaterThanZero, None> 446ns ± 4% 447ns ± 2% ~ (p=0.743 n=9+8) BM_MemCmp<31, GreaterThanZero, First> 433ns ± 3% 435ns ± 2% ~ (p=0.222 n=9+9) BM_MemCmp<31, GreaterThanZero, Mid> 474ns ± 3% 475ns ± 2% ~ (p=0.447 n=10+9) BM_MemCmp<31, GreaterThanZero, Last> 552ns ± 2% 551ns ± 1% ~ (p=0.965 n=10+8) BM_MemCmp<32, EqZero, None> 140ns ± 2% 141ns ± 3% ~ (p=0.447 n=10+9) BM_MemCmp<32, EqZero, First> 139ns ± 1% 140ns ± 1% ~ (p=0.297 n=9+9) BM_MemCmp<32, EqZero, Mid> 139ns ± 2% 140ns ± 1% ~ (p=0.356 n=9+10) BM_MemCmp<32, EqZero, Last> 139ns ± 2% 139ns ± 2% ~ (p=0.447 n=9+10) BM_MemCmp<32, LessThanZero, None> 519ns ± 4% 513ns ± 3% ~ (p=0.133 n=10+9) BM_MemCmp<32, LessThanZero, First> 418ns ± 1% 418ns ± 3% ~ (p=0.573 n=8+10) BM_MemCmp<32, LessThanZero, Mid> 494ns ± 1% 493ns ± 1% ~ (p=0.815 n=8+9) BM_MemCmp<32, LessThanZero, Last> 535ns ± 3% 534ns ± 2% ~ (p=0.720 n=9+10) BM_MemCmp<32, GreaterThanZero, None> 544ns ± 3% 545ns ± 2% ~ (p=0.661 n=9+10) BM_MemCmp<32, GreaterThanZero, First> 419ns ± 3% 419ns ± 1% ~ (p=0.842 n=9+10) BM_MemCmp<32, GreaterThanZero, Mid> 508ns ± 2% 510ns ± 2% ~ (p=0.356 n=9+10) BM_MemCmp<32, GreaterThanZero, Last> 549ns ± 2% 552ns ± 1% ~ (p=0.321 n=8+9) BM_MemCmp<63, EqZero, None> 364ns ± 1% 363ns ± 2% ~ (p=0.234 n=8+8) BM_MemCmp<63, EqZero, First> 161ns ± 2% 161ns ± 2% ~ (p=0.878 n=8+8) BM_MemCmp<63, EqZero, Mid> 315ns ± 1% 316ns ± 2% ~ (p=0.574 n=8+8) BM_MemCmp<63, EqZero, Last> 161ns ± 2% 173ns ±16% ~ (p=0.101 n=8+10) BM_MemCmp<63, LessThanZero, None> 350ns ± 2% 375ns ±17% ~ (p=0.515 n=8+10) BM_MemCmp<63, LessThanZero, First> 214ns ± 2% 215ns ± 3% ~ (p=0.798 n=8+8) BM_MemCmp<63, LessThanZero, Mid> 288ns ±18% 293ns ±16% ~ (p=0.853 n=10+10) BM_MemCmp<63, LessThanZero, Last> 376ns ±16% 384ns ±14% ~ (p=0.912 n=10+10) BM_MemCmp<63, GreaterThanZero, None> 326ns ±16% 328ns ±16% ~ (p=0.631 n=10+10) BM_MemCmp<63, GreaterThanZero, First> 228ns ±18% 229ns ±17% ~ (p=0.393 n=10+10) BM_MemCmp<63, GreaterThanZero, Mid> 279ns ± 3% 299ns ±17% ~ (p=0.573 n=8+10) BM_MemCmp<63, GreaterThanZero, Last> 350ns ± 4% 352ns ± 3% ~ (p=0.423 n=8+9) BM_MemCmp<64, EqZero, None> 407ns ± 2% 406ns ± 2% ~ (p=0.720 n=9+10) BM_MemCmp<64, EqZero, First> 160ns ± 2% 158ns ± 1% ~ (p=0.068 n=8+10) BM_MemCmp<64, EqZero, Mid> 321ns ± 2% 316ns ± 2% -1.78% (p=0.006 n=8+9) BM_MemCmp<64, EqZero, Last> 159ns ± 2% 158ns ± 1% ~ (p=0.606 n=8+9) BM_MemCmp<64, LessThanZero, None> 401ns ± 2% 396ns ± 2% ~ (p=0.063 n=9+9) BM_MemCmp<64, LessThanZero, First> 210ns ± 1% 209ns ± 1% ~ (p=0.142 n=7+9) BM_MemCmp<64, LessThanZero, Mid> 285ns ± 1% 285ns ± 1% ~ (p=0.606 n=8+9) BM_MemCmp<64, LessThanZero, Last> 402ns ± 3% 407ns ± 1% ~ (p=0.053 n=10+9) BM_MemCmp<64, GreaterThanZero, None> 375ns ± 4% 377ns ± 4% ~ (p=0.497 n=9+10) BM_MemCmp<64, GreaterThanZero, First> 210ns ± 2% 210ns ± 1% ~ (p=0.340 n=9+9) BM_MemCmp<64, GreaterThanZero, Mid> 306ns ± 2% 306ns ± 1% ~ (p=0.863 n=9+9) BM_MemCmp<64, GreaterThanZero, Last> 406ns ± 3% 407ns ± 1% ~ (p=0.546 n=9+9)
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
682–686 | Is there a need to check for multiple users that are identical? Ie, doesn't CSE make this unlikely? |
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
682–686 | It might be unlikely, but it's not guaranteed. I'm not sure how I would handle diverging users, did yo have anything specific in mind ? |
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
371–375 | This is not correct. memcmp() compares as unsigned bytes: This program will show a failure compiled with -O1 vs -O0 (do we have some kind of correctness test like this in the test-suite?): #include <string.h> #include <stdio.h> void wrapper(char *c, char *d) { if (memcmp(c, d, 4) > 0) printf(">0\n"); else printf("<=0\n"); } int main() { int A = 0x00000000; int B = 0x00000080; wrapper(&A, &B); return 0; } | |
llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll | ||
128 | I think that's a miscompile...should be "setb"? |
Use the right unsigned equivalent for comparing loads instead of the signed predicate from the zero relational compare.
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
371–375 | Thank for the catch. I've added more tests in 4edd050c7e97 to make the issue more obvious to the eye than the setl/setb issue. I though the test-suite benchmark was validating the result, but it's only benchmarking and throwing the result away. I'll fix the benchmarks. |
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
371–375 | Great - thanks! | |
682–686 | We could just give up if the memcmp() call !hasOneUse(). It simplifies the code, but it might miss some strange case...but if there's no test coverage/evidence that the multi-user case exists, it should be fine? |
Side note: I was wondering what these patterns look like in DAG/IR, and why we're missing them, so:
D75961
We could argue that it's worth solving these more generally in DAGCombiner, but if we can produce the optimal code here without too much effort, that's good too.
llvm/lib/CodeGen/ExpandMemCmp.cpp | ||
---|---|---|
397 | That looks like the right fix, but it could use a comment so we'll remember why. Something like this: // The result of memcmp is a signed value, so the relational // compare kind predicate is always signed. But the byte // comparison used within memcmp is always unsigned. |
llvm/test/Transforms/ExpandMemCmp/X86/memcmp.ll | ||
---|---|---|
455 | Wait - this is another miscompile. Can we split the equality compare part of this into its own patch? If the 1st load pair compares equal, we must check the 2nd load pair, not go to the result block. |
Actually I'm actually now convinced that the whole approach is incorrect when there are several blocks. I'm going to take some time to rethink this.
This is not correct. memcmp() compares as unsigned bytes:
http://www.cplusplus.com/reference/cstring/memcmp/
This program will show a failure compiled with -O1 vs -O0 (do we have some kind of correctness test like this in the test-suite?):