This patch avoids the initial memset at the cost of making iterators
slightly more complex. This should be beneficial as most SmallPtrSets
hold no elements or only a few, while iterating over them is less
common.
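
For illustration only, here is a minimal sketch of the trade-off described above. It is not the code in this patch: `TinySmallSet`, `Slots` and `NumUsed` are hypothetical names, and the sketch covers only the small (linear-scan) mode. The storage is never cleared up front; a counter records how many leading slots are valid, and the iterator's end is derived from that counter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy set, NOT LLVM's SmallPtrSet. Small mode only.
template <std::size_t N> class TinySmallSet {
  const void *Slots[N];    // deliberately left uninitialized: no memset
  std::size_t NumUsed = 0; // only Slots[0..NumUsed) hold valid pointers

public:
  // Linear scan over the valid prefix; append if the pointer is new.
  bool insert(const void *P) {
    for (std::size_t I = 0; I != NumUsed; ++I)
      if (Slots[I] == P)
        return false;
    assert(NumUsed < N && "a real set would grow into a hash table here");
    Slots[NumUsed++] = P;
    return true;
  }

  // The extra iterator complexity: end() depends on the element count
  // rather than on the fixed capacity of the array.
  const void *const *begin() const { return Slots; }
  const void *const *end() const { return Slots + NumUsed; }
  std::size_t size() const { return NumUsed; }
};
```

Constructing an empty set here touches only the counter, which is where the saving would come from when most sets stay empty or tiny.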
It's not a measurable change in compile time, but valgrind/callgrind shows a saving of ~0.25% of instructions in my tests.
If we're going to iterate specially in the small case, why not completely specialize the iteration? Notably, can't you skip the tombstone test in the loop when small?
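
To make the question concrete, a rough sketch of what a fully specialized loop could look like. This assumes small mode keeps its entries densely packed with no empty or tombstone markers; if small-mode erase writes tombstones, the check cannot be dropped. `forEachElement`, `emptyMarker` and `tombstoneMarker` are hypothetical names for this sketch, not the actual SmallPtrSet interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel values, chosen only for this sketch.
inline const void *emptyMarker() {
  return reinterpret_cast<const void *>(std::intptr_t(-1));
}
inline const void *tombstoneMarker() {
  return reinterpret_cast<const void *>(std::intptr_t(-2));
}

// Visits every live element, choosing the loop once instead of
// re-testing for markers on every step in small mode.
template <typename Fn>
void forEachElement(const void *const *Buckets, std::size_t NumBuckets,
                    std::size_t NumSmallEntries, bool IsSmall, Fn Visit) {
  if (IsSmall) {
    // Assumption: the first NumSmallEntries slots are all live, so no
    // empty/tombstone test is needed here.
    assert(NumSmallEntries <= NumBuckets);
    for (std::size_t I = 0; I != NumSmallEntries; ++I)
      Visit(Buckets[I]);
    return;
  }
  // Hashed mode: buckets may be empty or tombstoned and must be skipped.
  for (std::size_t I = 0; I != NumBuckets; ++I) {
    const void *P = Buckets[I];
    if (P != emptyMarker() && P != tombstoneMarker())
      Visit(P);
  }
}
```

An iterator-based version would need to carry the small/large distinction in its state or type, which is the extra complexity being weighed here.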