During initialisation, check whether the llvm.arm.sel intrinsic is used in the current module. If it has, we conservatively choose that it would be unsafe to generate the DSP add/sub instructions that will write to the GE flags - the same flags which sel will read.
This isn't really correct, in any useful sense... we can't assume the current translation unit reflects the entire program.
I mean, if you really think this optimization is sufficiently useful, we can add a flag for it. Presumably user code could enable it if they want to, but it would have to be disabled for libraries shipped with the compiler (like compiler-rt).
I don't intend to remove the -arm-enable-scalar-dsp flag or change the default from false, meaning that the user will have to enable it and even then the pass can decide against it. From your comments in D48832 and John's in D49239, this seems like the most conservative approach to me. By the ACLE standard I don't have to worry about the whole program and I could just look at the current function. I will want to use the same approach in the parallel dsp pass but I want to be as safe as possible.
Because the ABI says that the GE bits are undefined on entry / return from public interfaces (i.e. non-static functions) I think that it is possible to just look at the current translation unit and make a decision based on that. I'm not sure if just looking at whether the sel intrinsic is used will work though - the GE bits could be read by __arm_rsr("APSR"), or by inline assembly, and maybe there are other situations that I haven't thought of.
Yes, it's certainly complicating that the ABI and ACLE don't talk about inline assembly or other intrinsics and this is why I still want to have the option guarded by a command line option. I will update the patch to look at the function only.