Adds the -fast-16-labels flag, which enables more efficient DFSan
instrumentation when the user needs at most 16 labels. In this mode the
instrumentation eliminates most branches and most calls to __dfsan_union
and __dfsan_union_load.
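A minimal C sketch of why a 16-label limit makes this cheap, assuming each label is encoded as one bit of a 16-bit shadow value. All names here (shadow16_t, label_from_index, union_fast16, union_load_fast16) are hypothetical illustrations, not DFSan's API. Under that encoding, a union of two label sets is a single bitwise OR, and a union over a multi-byte load is an OR across the per-byte shadows. Neither needs a branch or a runtime call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit shadow type: in this sketch each of the (at most) 16
   labels is one bit, so a shadow value is a set of labels. */
typedef uint16_t shadow16_t;

/* Map a label index in [0, 16) to its single-bit representation. */
static shadow16_t label_from_index(unsigned index) {
  assert(index < 16);
  return (shadow16_t)(1u << index);
}

/* Union of two label sets: one OR, no branch, no runtime call. */
static shadow16_t union_fast16(shadow16_t a, shadow16_t b) {
  return (shadow16_t)(a | b);
}

/* Union over the shadows of an n-byte load: OR every byte's shadow. */
static shadow16_t union_load_fast16(const shadow16_t *shadow, size_t n) {
  shadow16_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc = (shadow16_t)(acc | shadow[i]);
  return acc;
}
```

Because OR is idempotent, commutative and associative, the combined shadow is the same regardless of evaluation order, which is what lets the instrumentation emit it inline.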
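We also add a call into the runtime during preinit that enables
fast16labels mode in the runtime, since instrumented code may still call
__dfsan_union or __dfsan_union_load in rare cases, and those calls must
then combine labels using the same 16-label encoding.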
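A rough C sketch of why the runtime needs to know about the mode, under stated assumptions: fast16labels_mode, runtime_enable_fast16labels, runtime_union and legacy_union_stub are hypothetical names, and the legacy path is a crude stand-in that hands out a fresh label id rather than a faithful model of DFSan's union table. If a rare runtime call took the legacy path, it would treat bit-encoded labels as opaque ids and return an unrelated id; once the mode is enabled, the runtime falls back to the same OR the instrumentation uses. In a real binary the enable hook would be arranged to run at preinit (for example via an ELF .preinit_array entry); this sketch leaves that call to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t shadow16_t;

/* Hypothetical runtime flag: false until the init hook runs. */
static bool fast16labels_mode = false;

/* Hypothetical init hook; a real runtime would run this during preinit,
   before any instrumented code can call into the union functions. */
static void runtime_enable_fast16labels(void) { fast16labels_mode = true; }

/* Crude stand-in for a general (legacy) union: trivial cases return an
   existing label, otherwise a fresh id is handed out. */
static shadow16_t next_label = 1;
static shadow16_t legacy_union_stub(shadow16_t a, shadow16_t b) {
  if (a == 0 || a == b) return b;
  if (b == 0) return a;
  return next_label++;
}

/* Runtime entry point: must agree with the instrumentation's encoding. */
static shadow16_t runtime_union(shadow16_t a, shadow16_t b) {
  if (fast16labels_mode)
    return (shadow16_t)(a | b); /* same bitwise union as inline code */
  return legacy_union_stub(a, b);
}
```

Calling runtime_union(0x1, 0x4) before the hook runs yields a fresh id instead of the bit set 0x5; after the hook runs it yields 0x5, matching what the inline instrumentation would compute.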
Naming question: should the flag be called -dataflow-fast-16-labels instead?