SmallVectorTemplateCommon wants to know the address of the first element so it can detect whether it's in "small size" mode.
The old implementation split the small array: storage for the first element lived in SmallVectorTemplateCommon, and the rest was pulled into SmallVectorStorage, where we know the size of the array. This needlessly bloats a SmallVector with small size 0 by the larger of sizeof(void*) and sizeof(T).
The new implementation leaves the full small storage to SmallVectorStorage. To find the first element from SmallVectorTemplateCommon, we just need to know how far to jump from the start of the object, which we can calculate out-of-band. One subtlety is that SmallVectorStorage must be properly aligned even when the size is 0, so that (for large alignments) the padding actually exists and the pointer math is well defined.
This assumes that base classes are laid out more or less the same as fields, but we were already assuming that there wouldn't be padding between FirstEl and InlineElts, so this seems like an improvement.
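As a rough illustration of the out-of-band offset calculation described above, here is a minimal, self-contained sketch. It is not the code from this patch: every name ending in "Sketch" is hypothetical, the header fields are a simplified stand-in, and it relies on the same assumption that base classes are laid out like fields.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the size-independent vector header.
struct HeaderSketch {
  void *BeginX = nullptr;
  unsigned Size = 0;
  unsigned Capacity = 0;
};

// Out-of-band layout probe: a header-sized field followed by one T-aligned
// field. offsetof(..., FirstEl) is how far to jump from the start of the
// vector to reach the inline storage (assumes bases lay out like fields).
template <typename T> struct LayoutProbeSketch {
  alignas(HeaderSketch) char Base[sizeof(HeaderSketch)];
  alignas(T) char FirstEl[sizeof(T)];
};

template <typename T> struct CommonSketch : HeaderSketch {
  // Address where inline storage would begin, computed without a member.
  void *getFirstEl() const {
    return const_cast<char *>(reinterpret_cast<const char *>(this)) +
           offsetof(LayoutProbeSketch<T>, FirstEl);
  }
  // "Small" mode means the begin pointer still points at the inline storage.
  bool isSmall() const { return BeginX == getFirstEl(); }
};

// The full inline array lives here, where N is known.
template <typename T, unsigned N> struct StorageSketch {
  alignas(T) char InlineElts[N * sizeof(T)];
};
// Keep the alignment when N == 0 so the object is padded out to the offset
// computed above, even though there are no inline elements.
template <typename T> struct alignas(T) StorageSketch<T, 0> {};

template <typename T, unsigned N>
struct VecSketch : CommonSketch<T>, StorageSketch<T, N> {
  VecSketch() {
    this->BeginX = this->getFirstEl();
    this->Capacity = N;
    assert(this->isSmall());
  }
};

// Hypothetical over-aligned element type to exercise the padding case.
struct alignas(64) WideSketch {
  char Byte;
};

// With no inline elements, the vector costs only the header.
static_assert(sizeof(VecSketch<int, 0>) == sizeof(HeaderSketch),
              "wasted space in the size-0 case");
// For large alignments, the computed offset stays within the object.
static_assert(sizeof(VecSketch<WideSketch, 0>) >=
                  offsetof(LayoutProbeSketch<WideSketch>, FirstEl),
              "missing padding in front of the inline storage");

// Checks that the computed address matches the real inline array, and that
// small mode is no longer reported once BeginX points elsewhere.
bool checkSmallModeSketch() {
  VecSketch<int, 4> V;
  void *Inline = static_cast<StorageSketch<int, 4> &>(V).InlineElts;
  bool StartsSmall = V.isSmall() && V.getFirstEl() == Inline;
  int OutOfLine[8];
  V.BeginX = OutOfLine; // pretend the elements moved out of line
  return StartsSmall && !V.isSmall();
}
```

The first static_assert captures the size win for the size-0 case, and the second captures the alignment subtlety: without alignas(T) on the empty specialization, the computed address could fall beyond the end of the object.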