This is an archive of the discontinued LLVM Phabricator instance.

[ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291
Needs ReviewPublic

Authored by thakis on Nov 6 2016, 1:06 PM.

Details

Reviewers
hans

Diff Detail

Event Timeline

thakis updated this revision to Diff 76992.Nov 6 2016, 1:06 PM
thakis retitled this revision from to [ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291.
thakis updated this object.
thakis added a reviewer: hans.
thakis added a subscriber: cfe-commits.
hans added inline comments.Nov 7 2016, 11:00 AM
lib/Headers/x86intrin.h
49

I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a faster encoding for BSF, but now we're putting a conditional in the way, so this will be slower actually. On the other hand, the alternative is weird too :-/

probinson added inline comments.
lib/Headers/x86intrin.h
49

I thought there was a peephole to notice a guard like this and do the right thing? In which case having the guard is fine.

hans added inline comments.Nov 7 2016, 1:18 PM
lib/Headers/x86intrin.h
49

But for non-BMI targets it won't be able to remove the guard (unless it can prove the argument is zero).

MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg wants.

andreadb added inline comments.Nov 9 2016, 3:27 AM
lib/Headers/x86intrin.h
49

I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) is well defined on zero input. Changing that semantic for non-BMI targets sounds a bit odd/confusing to me.

I guess ffmpeg is reliant on the fact that other compilers would either preserve the intrinsic call until instruction selection, or fold it to a meaningful value (i.e. 32 in this case) when the input is known to be zero. If we remove the zero check from tzcnt_u32 (for non BMI targets), we let the optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when the input is zero.

As a side note: I noticed that gcc provides intrinsic __bsfd in ia32intrin.h. Maybe we should do something similar?

hans added inline comments.Nov 28 2016, 9:29 AM
lib/Headers/x86intrin.h
49

Yes, ffmpeg relies on the compiler to emit the raw TZCNT instruction even for non-BMI targets and don't call it with zero inputs. They just want a faster BSF.

It's all pretty weird, but maybe Nico's original patch is the way to do this after all. At least then, the intrinsic will always do what it says in the doc.