This patch introduces mechanism for std::visit backed by switch.
The switch is structured such that it's a flattened manual vtable (an n-ary array).
The switch mechanism is enabled if (1 * ... * vs.size()) < 256.
The following are performance numbers from the benchmarks added in D85419, tested on my 2017 Macbook Pro.
Before: using only the manual vtable mechanism.
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out 2020-08-09 23:55:14 Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 2.03, 2.36, 2.43 ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ BM_Visit<1, 1> 0.260 ns 0.260 ns 1000000000 BM_Visit<1, 2> 1.56 ns 1.56 ns 435925220 BM_Visit<1, 3> 1.55 ns 1.55 ns 444416228 BM_Visit<1, 4> 1.57 ns 1.57 ns 427951336 BM_Visit<1, 5> 1.57 ns 1.56 ns 444766371 BM_Visit<1, 6> 1.70 ns 1.68 ns 446639358 BM_Visit<1, 7> 1.64 ns 1.64 ns 400441630 BM_Visit<1, 8> 1.56 ns 1.56 ns 430729471 BM_Visit<1, 9> 1.58 ns 1.58 ns 449894596 BM_Visit<1, 10> 1.54 ns 1.54 ns 449660506 BM_Visit<1, 20> 1.56 ns 1.56 ns 450813074 BM_Visit<1, 30> 1.59 ns 1.59 ns 440032940 BM_Visit<1, 40> 1.59 ns 1.59 ns 443731656 BM_Visit<1, 50> 1.56 ns 1.56 ns 444709859 BM_Visit<1, 60> 1.59 ns 1.58 ns 439527320 BM_Visit<1, 70> 1.57 ns 1.57 ns 438450890 BM_Visit<1, 80> 1.58 ns 1.58 ns 443001525 BM_Visit<1, 90> 1.63 ns 1.62 ns 448456349 BM_Visit<1, 100> 1.57 ns 1.57 ns 445740630 $ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out 2020-08-09 23:59:35 Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 1.40, 1.94, 2.22 ----------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------- BM_Visit<2, 1> 0.261 ns 0.260 ns 1000000000 BM_Visit<2, 2> 1.55 ns 1.54 ns 432844219 BM_Visit<2, 3> 1.30 ns 1.30 ns 532529974 BM_Visit<2, 4> 1.54 ns 1.54 ns 446055910 BM_Visit<2, 5> 1.31 ns 1.31 ns 531099680 BM_Visit<2, 6> 1.56 ns 1.56 ns 443203475 BM_Visit<2, 7> 1.29 ns 1.29 ns 526478087 BM_Visit<2, 8> 1.56 ns 1.56 ns 439000834 BM_Visit<2, 9> 1.30 ns 1.30 ns 528756817 BM_Visit<2, 10> 1.56 ns 1.55 ns 442923039 BM_Visit<2, 20> 1.35 ns 1.35 ns 517021072 BM_Visit<2, 30> 1.60 ns 1.59 ns 419724661 BM_Visit<2, 40> 1.45 ns 1.44 ns 472137163 BM_Visit<2, 50> 1.65 ns 1.65 ns 421389743 $ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out 2020-08-10 00:01:32 Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 2.20, 2.01, 2.21 ----------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------- BM_Visit<3, 1> 0.272 ns 0.271 ns 1000000000 BM_Visit<3, 2> 1.87 ns 1.86 ns 361858090 BM_Visit<3, 3> 1.77 ns 1.77 ns 391192579 BM_Visit<3, 4> 1.84 ns 1.84 ns 374694223 BM_Visit<3, 5> 1.75 ns 1.75 ns 408270392 BM_Visit<3, 6> 1.88 ns 1.88 ns 378759185 BM_Visit<3, 7> 1.79 ns 1.79 ns 395498102 BM_Visit<3, 8> 1.85 ns 1.85 ns 371660366 BM_Visit<3, 9> 1.80 ns 1.80 ns 386872851 BM_Visit<3, 10> 1.84 ns 1.84 ns 362367606 BM_Visit<3, 15> 1.77 ns 1.77 ns 392060220 BM_Visit<3, 20> 1.85 ns 1.85 ns 379157188
After: using the switch-based mechanism.
$ ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out 2020-08-12 04:17:33 Running ./projects/libcxx/benchmarks/variant_visit_1.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 1.96, 2.26, 3.49 ------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------ BM_Visit<1, 1> 0.280 ns 0.279 ns 1000000000 BM_Visit<1, 2> 0.285 ns 0.282 ns 1000000000 BM_Visit<1, 3> 0.281 ns 0.279 ns 1000000000 BM_Visit<1, 4> 0.278 ns 0.277 ns 1000000000 BM_Visit<1, 5> 0.283 ns 0.282 ns 1000000000 BM_Visit<1, 6> 0.279 ns 0.278 ns 1000000000 BM_Visit<1, 7> 0.279 ns 0.278 ns 1000000000 BM_Visit<1, 8> 0.273 ns 0.272 ns 1000000000 BM_Visit<1, 9> 0.275 ns 0.274 ns 1000000000 BM_Visit<1, 10> 0.274 ns 0.273 ns 1000000000 BM_Visit<1, 20> 0.277 ns 0.276 ns 1000000000 BM_Visit<1, 30> 0.278 ns 0.277 ns 1000000000 BM_Visit<1, 40> 0.269 ns 0.268 ns 1000000000 BM_Visit<1, 50> 0.271 ns 0.271 ns 1000000000 BM_Visit<1, 60> 0.272 ns 0.270 ns 1000000000 BM_Visit<1, 70> 0.275 ns 0.274 ns 1000000000 BM_Visit<1, 80> 0.278 ns 0.276 ns 1000000000 BM_Visit<1, 90> 0.272 ns 0.271 ns 1000000000 BM_Visit<1, 100> 0.269 ns 0.268 ns 1000000000 $ ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out 2020-08-12 04:09:59 Running ./projects/libcxx/benchmarks/variant_visit_2.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 2.17, 4.20, 4.78 ----------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------- BM_Visit<2, 1> 0.302 ns 0.301 ns 1000000000 BM_Visit<2, 2> 0.297 ns 0.295 ns 1000000000 BM_Visit<2, 3> 0.353 ns 0.351 ns 1000000000 BM_Visit<2, 4> 0.276 ns 0.276 ns 1000000000 BM_Visit<2, 5> 0.285 ns 0.283 ns 1000000000 BM_Visit<2, 6> 0.290 ns 0.287 ns 1000000000 BM_Visit<2, 7> 0.282 ns 0.280 ns 1000000000 BM_Visit<2, 8> 0.290 ns 0.287 ns 1000000000 BM_Visit<2, 9> 0.291 ns 0.285 ns 1000000000 BM_Visit<2, 10> 0.293 ns 0.287 ns 1000000000 BM_Visit<2, 20> 1.70 ns 1.68 ns 391400375 BM_Visit<2, 30> 1.64 ns 1.63 ns 418925874 BM_Visit<2, 40> 1.63 ns 1.62 ns 423623677 BM_Visit<2, 50> 1.68 ns 1.67 ns 411687212 $ ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out 2020-08-12 04:10:43 Running ./projects/libcxx/benchmarks/variant_visit_3.libcxx.out Run on (8 X 3100 MHz CPU s) CPU Caches: L1 Data 32K (x4) L1 Instruction 32K (x4) L2 Unified 262K (x4) L3 Unified 8388K (x1) Load Average: 1.57, 3.76, 4.59 ----------------------------------------------------------- Benchmark Time CPU Iterations ----------------------------------------------------------- BM_Visit<3, 1> 0.271 ns 0.270 ns 1000000000 BM_Visit<3, 2> 0.344 ns 0.334 ns 1000000000 BM_Visit<3, 3> 0.347 ns 0.336 ns 1000000000 BM_Visit<3, 4> 0.300 ns 0.296 ns 1000000000 BM_Visit<3, 5> 0.290 ns 0.286 ns 1000000000 BM_Visit<3, 6> 0.272 ns 0.271 ns 1000000000 BM_Visit<3, 7> 1.72 ns 1.71 ns 415765841 BM_Visit<3, 8> 1.73 ns 1.72 ns 408909555 BM_Visit<3, 9> 2.16 ns 2.04 ns 380898485 BM_Visit<3, 10> 2.45 ns 2.40 ns 295714256 BM_Visit<3, 15> 1.92 ns 1.85 ns 375990332 BM_Visit<3, 20> 1.66 ns 1.65 ns 414456233
_LIBCPP_VARIANT_DISPATCH(__index);
?