This patch fixes PR19657. As Arnold pointed out, the reason we fail to vectorize these loads is that we process the smaller subtree before the larger one. This patch calculates the depth of both subtrees before calling buildTree_rec, and then calls buildTree_rec on the larger subtree before the smaller one.
This seems to fix the problem and there are no regressions.
However, I'm not sure whether there is an easier or more efficient way to do this than traversing the subtrees to compute their depths before calling buildTree_rec. Suggestions for improving this patch would be very welcome.