Attempts to fix https://bugs.llvm.org/show_bug.cgi?id=49322
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/SystemZ/args-11.ll | ||
---|---|---|
14 | Not sure why 24 extra bytes are allocated and not 20... Alignment? |
I'm not sure if "getTypeToTransformTo(*DAG.getContext(), OrigArgVT)" results in the same type that is used by common code in all cases.
If you look at code in TargetLowering::LowerCallTo and getCopyToParts, the logic seems to be different, and explicitly goes via a set of full "registers":
MVT RegisterVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); unsigned NumRegs = getNumRegistersForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT);
and then
if (NumParts * PartBits > ValueVT.getSizeInBits()) { [...] ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ExtendKind, DL, ValueVT, Val);
so I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)
Also, I think in any case it would be good to add an assertion when emitting the Store nodes that the target memory range of the store is actually within the allocated space, so that if we do get something wrong in the size calcuation the result will be an internal compiler error rather than wrong code.
llvm/test/CodeGen/SystemZ/args-11.ll | ||
---|---|---|
14 | Yes, the stack always needs to be 8-byte aligned, and therefore all stack allocations are rounded up to the next multiple of 8. |
Updated per review.
I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)
I think you are right - reading the comment yesterday for getTypeToTransformTo() seemed to make sense to me for the promoted integer types, but I see now that the second (new) test I added got with that call an i136 -> i256 promotion (space for 4 parts) . The updated patch now only creates space for 3 parts, which is also what is expected...
The rest of the cases are expected to be "regular", and the added assert will catch them if they are not...
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
1594 | This was easier than calling CreateStackTemporary with a TypeSize and an alignment value. |
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp | ||
---|---|---|
1588 | I'm not sure this is the correct check, it doesn't appear to match anything that is done in common code ... I think it would be better to explicitly check for the case of multiple parts (e.g. via if (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex)). If it's just a single part, I think the current approach to just use Outs[I].ArgVT is the best; if it is multiple parts, then we should do the NumParts * PartSize allocation as you do below. | |
1594 | Ah, OK. Makes sense as well. |
Just a heads-up about a possible pitfall of this patch, though I'm not sure it actually affects SystemZ.
In D99068 I basically ported this patch to the RISC-V target, as it was affected by the same problem that this patch fixes.
While comparing that patch with an alternative implementation (D99087) it became clear that this approach can produce stack slots with incorrect alignment.
The relevant part of the patch is SlotVT = EVT::getIntegerVT(Ctx, PartVT.getSizeInBits() * N);.
An example where it produces wrong results is when the original type fp128, which has 16-byte alignment, becomes i128, which for riscv32 has a lower alignment.
Perhaps more relevant for SystemZ, vectors can also suffer from similar (but even more dramatic) changes in alignment.
Although I had abandoned D99068 in favour of D99087, I've updated D99068 to address the alignment issues, in case it's useful.
Thanks for the heads-up, @luismarques ! However, this is not really an issue on SystemZ as all arguments in question have an alignment requirement of 8 bytes on our platform.
I'm not sure this is the correct check, it doesn't appear to match anything that is done in common code ...
I think it would be better to explicitly check for the case of multiple parts (e.g. via if (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex)). If it's just a single part, I think the current approach to just use Outs[I].ArgVT is the best; if it is multiple parts, then we should do the NumParts * PartSize allocation as you do below.