This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (from "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
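
For reference, the CTLZ-via-CTPOP identity from "Hacker's Delight" can be sketched in scalar form as below. This is only an illustration of the arithmetic, assuming 32-bit elements. The function names `popcount32` and `ctlz32` are hypothetical and are not taken from the patch, which performs the equivalent expansion on vector operations during lowering.

```cpp
#include <cstdint>

// Bit-parallel population count (the classic "Hacker's Delight" reduction).
constexpr uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    return (x * 0x01010101u) >> 24;                   // add the four bytes
}

// Count leading zeros: smear the highest set bit into every lower
// position, then count the bits that are still zero.
constexpr uint32_t ctlz32(uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return popcount32(~x);
}
```

The sequence uses only shifts, ORs, a NOT and a population count, all of which map directly onto vector operations, so no per-element extraction is required. A zero input yields the full bit width (32 here).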