Details
- Reviewers
hans
Diff Detail
Event Timeline
lib/Headers/x86intrin.h | ||
---|---|---|
49 | I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a faster encoding for BSF, but now we're putting a conditional in the way, so this will be slower actually. On the other hand, the alternative is weird too :-/ |
lib/Headers/x86intrin.h | ||
---|---|---|
49 | I thought there was a peephole to notice a guard like this and do the right thing? In which case having the guard is fine. |
lib/Headers/x86intrin.h | ||
---|---|---|
49 | But for non-BMI targets it won't be able to remove the guard (unless it can prove the argument is zero). MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg wants. |
lib/Headers/x86intrin.h | ||
---|---|---|
49 | I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) is well defined on zero input. Changing that semantic for non-BMI targets sounds a bit odd/confusing to me. I guess ffmpeg is reliant on the fact that other compilers would either preserve the intrinsic call until instruction selection, or fold it to a meaningful value (i.e. 32 in this case) when the input is known to be zero. If we remove the zero check from tzcnt_u32 (for non BMI targets), we let the optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when the input is zero. As a side note: I noticed that gcc provides intrinsic __bsfd in ia32intrin.h. Maybe we should do something similar? |
lib/Headers/x86intrin.h | ||
---|---|---|
49 | Yes, ffmpeg relies on the compiler to emit the raw TZCNT instruction even for non-BMI targets and don't call it with zero inputs. They just want a faster BSF. It's all pretty weird, but maybe Nico's original patch is the way to do this after all. At least then, the intrinsic will always do what it says in the doc. |
I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a faster encoding for BSF, but now we're putting a conditional in the way, so this will be slower actually. On the other hand, the alternative is weird too :-/