Allows the result to be more dynamically-sized than the source.
Since srcTensorTp is always more specific than the dest (but they are always congruent), passing in srcTensorTp as well would actually make the dense alloc more static than it is now. Of course, you would eventually have to cast from srcTensorTp to dstTensorTp if they are not the same.
Perhaps not worth it, since the big savings (avoiding runtime calls) already come from your change in L627, and this would only make the alloc take fewer arguments...
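To make the trade-off concrete, here is a toy C++ model of the idea. It is only a sketch and does not use the MLIR API: `Shape`, `kDynamic`, `isCongruent`, `numDynamicSizeOperands`, and `needsCast` are all hypothetical names invented for illustration, and it assumes that a dense alloc needs one runtime size operand per dynamic dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ranked tensor shape; kDynamic marks a size
// that is unknown at compile time.
constexpr int64_t kDynamic = -1;
using Shape = std::vector<int64_t>;

// Two shapes are congruent if the ranks match and every pair of static
// sizes agrees (a dynamic size is compatible with anything).
bool isCongruent(const Shape &a, const Shape &b) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != kDynamic && b[i] != kDynamic && a[i] != b[i])
      return false;
  return true;
}

// Assumption: the alloc takes one runtime size operand per dynamic dimension.
std::size_t numDynamicSizeOperands(const Shape &s) {
  std::size_t n = 0;
  for (int64_t d : s)
    if (d == kDynamic)
      ++n;
  return n;
}

// Allocating with the source type requires a cast back to the destination
// type whenever the two shapes are not identical.
bool needsCast(const Shape &src, const Shape &dst) {
  assert(isCongruent(src, dst) && "source and destination must be congruent");
  return src != dst;
}
```

In this model, a source shape of `{8, kDynamic}` and a destination of `{kDynamic, kDynamic}` are congruent; allocating with the source shape needs one size operand instead of two, at the cost of a cast afterwards. That is the whole gain, which is why it looks marginal next to avoiding the runtime calls.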
You had removed this in a previous revision, but somehow it came back.
Please remove the two lines as requested, and make sure we build green before submitting, but no need for another review. LGTM.
This seems a bit redundant, since this applies to all branches (and all ops we generate anyway).
Per our offline discussion: not worth it. The big savings are already done, and constant propagation will do the rest.