This is an archive of the discontinued LLVM Phabricator instance.

[TypeSize] Allow returning scalable size in implicit conversion to uint64_t
ClosedPublic

Authored by sdesmalen on Feb 27 2020, 1:04 PM.

Details

Summary

This patch removes compiler runtime assertions that ensure the implicit
conversion are only guaranteed to work for fixed-width vectors.

With the assert it would be impossible to get _anything_ to build until the
entire codebase has been upgraded, even when the indiscriminate uses of
the size as uint64_t would work fine for both scalable and fixed-width
types.

This issue will need to be addressed differently, with build-time errors
rather than assertion failures, but that effort falls beyond the scope
of this patch.

Returning the scalable size and avoiding the assert in getFixedSize()
is a temporary stop-gap in order to use LLVM for compiling and using
the SVE ACLE intrinsics.

Diff Detail

Event Timeline

sdesmalen created this revision.Feb 27 2020, 1:04 PM
Herald added a project: Restricted Project. · View Herald Transcript

how pervasive is this issue? I worry that adding this stopgap today will allow a bunch of dangerous code to get committed, and will make future efforts to correct the issue more difficult.

Can the places where this currently blows up just call getKnownMinSize()?

how pervasive is this issue? I worry that adding this stopgap today will allow a bunch of dangerous code to get committed, and will make future efforts to correct the issue more difficult.

We should probably move to a different approach soon (as we discussed in our last sync-up call), but until then I think there is little down-side to removing this assert. Removing the assert would allow us to use the SVE ACLE intrinsics which don't rely on common optimisations.
Until we get auto-vec support there are currently no other uses of scalable types. Moving to auto-vec will require a systematic approach to upgrading the code-base to distinguish scalable types, so we may have to fix up a couple more cases for new code that's added until we do so, but even with the assert nothing would break unless someone adds new code and/or tests that uses scalable types.

Can the places where this currently blows up just call getKnownMinSize()?

I'd rather not change all places to use getKnownMinSize() as this suggests the code is successfully ported.

Removing the assert would allow us to use the SVE ACLE intrinsics which don't rely on common optimisations.

Even without any optimizations that are specifically SVE-aware, we're going to generate a lot of instructions with SVE types that aren't SVE intrinsics. In particular, I assume plain C assignment operations are going to lower to "load" and "store" instructions. So really, this is trading assertions for subtle miscompiles in a lot of cases.

If you're trying to hit a certain deadline, maybe being unable to build any SVE code at all is worse than a bunch of subtle miscompiles? That seems like a terrible tradeoff either way, though.

We should probably move to a different approach soon (as we discussed in our last sync-up call), but until then I think there is little down-side to removing this assert. Removing the assert would allow us to use the SVE ACLE intrinsics which don't rely on common optimisations.
Until we get auto-vec support there are currently no other uses of scalable types. Moving to auto-vec will require a systematic approach to upgrading the code-base to distinguish scalable types, so we may have to fix up a couple more cases for new code that's added until we do so, but even with the assert nothing would break unless someone adds new code and/or tests that uses scalable types.

I have concerns about introducing technical debt that we plan to pay off "soon". Life happens and things get deemed "good enough". If this patch is necessary for work to proceed on ACLE, perhaps those working on it should locally cherry pick this patch and do their work?

Currently, this implicit conversion is an alias for getFixedSize(). Lots of code is written using this assumption, and changing the meaning of the conversion is only going to add bugs. In my opinion, the conversion should be removed; if this had been done previously then we wouldn't be in this position today. Failing that, we should fix the breakages, not hide them.

Even without any optimizations that are specifically SVE-aware, we're going to generate a lot of instructions with SVE types that aren't SVE intrinsics. In particular, I assume plain C assignment operations are going to lower to "load" and "store" instructions. So really, this is trading assertions for subtle miscompiles in a lot of cases.

That's right. Plain assignment operations will use the unpredicated load/store instructions. There are definitely optimiziations for those that we'd need to disable for scalable vectors.

If you're trying to hit a certain deadline, maybe being unable to build any SVE code at all is worse than a bunch of subtle miscompiles? That seems like a terrible tradeoff either way, though.

It's also about making progress on getting any support for SVE/SVE2 in Clang/LLVM whilst in the meantime addressing the fixed-width assumptions more structurally. The one doesn't necessarily have to block the other, we can do this in parallel. At the moment scalable vectors are not yet actually supported in LLVM, so it is not unexpected that things are broken while we work towards adding support for it.
I agree though that the trade-off is not the most comfortable one, but by minimising the use of common parts of LLVM by implementing most functionality with custom intrinsics, a lot of that risk can be mitigated for the ACLE.

I have concerns about introducing technical debt that we plan to pay off "soon". Life happens and things get deemed "good enough". If this patch is necessary for work to proceed on ACLE, perhaps those working on it should locally cherry pick this patch and do their work?

It depends on what you mean with proceeding. Keeping the assert means that LLVM can't be used for scalable vectors until we've fully upgraded the code-base to use the proper interfaces to TypeSize (and ElementCount, given D75478), which is a really significant migration effort. The upgrading also becomes harder than it needs to be because tests need to be written that don't trigger the assert. In the worst case, we may not be able to write any tests until significant upgrading has occurred (I think we've already seen this in some cases).

Currently, this implicit conversion is an alias for getFixedSize(). Lots of code is written using this assumption, and changing the meaning of the conversion is only going to add bugs. In my opinion, the conversion should be removed; if this had been done previously then we wouldn't be in this position today. Failing that, we should fix the breakages, not hide them.

Yes, I agree we should remove the conversion, the question is more how we go about doing that. It is a trade-off between working until we've fixed up all uses of getSize() before we can build a simple example, vs gradually making an implementation we have more stable.

We've found that having the assertion makes it much easier to track down specific callers. That means we don't get some weird crash/verifier failure/miscompile later on, we can be reasonably sure that we aren't hitting an issue with this if the assertion doesn't trigger, and we accumulate testcases for various codepaths. So I'd like to try to keep the assertion if possible.

If it's blocking your work on the intrinsics, though, that's worse.

I've said my piece; I'm fairly strongly opposed to this change because I feel that it will turn obvious bugs into subtle bugs. That said, if you can convince Eli that this is fine then I'll defer to his judgement.

dancgr added a subscriber: dancgr.Mar 4 2020, 2:39 PM
sdesmalen updated this revision to Diff 248521.Mar 5 2020, 9:51 AM
  • Added cmake option to enable use of getFixedSize() in implicit conversion of TypeSize to uint64_t, which triggers an assertion failure when the size is scalable.
  • Added deprecation warning for the implicit conversion of TypeSize->uint64_t.

I updated the patch to reflect the discussion in the sync-up call earlier. The LLVM_ATTRIBUTE_DEPRECATED causes a lot of deprecated warnings though. These warnings are of course easy to silence, but I'm not sure how palatable that is for others?

efriedma added inline comments.Mar 5 2020, 9:59 AM
llvm/include/llvm/Support/TypeSize.h
165

Do deprecated warnings trigger on the LLVM codebase itself? I think producing a ton of warnings is going to cause issues.

Yes, producing warnings by default is going to be a problem. One, it's going to really annoy everyone else, and two, some bots build with -Werror.

sdesmalen updated this revision to Diff 248763.Mar 6 2020, 9:30 AM
  • Replaced deprecated warning (at LLVM compile time) with runtime warning message.
miyuki added a subscriber: miyuki.Mar 9 2020, 10:01 AM
efriedma accepted this revision.Mar 11 2020, 6:57 PM

LGTM

llvm/CMakeLists.txt
426

80 columns

llvm/lib/Target/AMDGPU/AMDGPULibFunc.cpp
608 ↗(On Diff #248763)

Unrelated change?

This revision is now accepted and ready to land.Mar 11 2020, 6:57 PM
sdesmalen updated this revision to Diff 250168.Mar 13 2020, 3:13 AM
sdesmalen marked 3 inline comments as done.
  • Removed isDigit from AMDGPULibFunc.cpp to use the one defined in StringExtras.h instead.
llvm/CMakeLists.txt
426

The convention in this file doesn't seem to honour 80 chars. I've put the message on a separate line similar to how this is done for other options.

llvm/lib/Target/AMDGPU/AMDGPULibFunc.cpp
608 ↗(On Diff #248763)

llvm/Support/WithColor.h indirectly includes llvm/ADT/StringExtras.h which also defines a isDigit function. That makes this call ambiguous with the isDigit defined in this file.
The better fix is probably to remove the local isDigit and include StringExtras.h in this file directly.

ctetreau added inline comments.Mar 13 2020, 8:35 AM
llvm/CMakeLists.txt
426

No need to change it unless you need to upload a patch for some other reason, but I believe the policy is to run clang-format on lines that you touch. The fact that this policy creates local inconsistency is a separate flame war we need not get into here.

I often get a bot that complains when I mess this up elsewhere in the codebase, but it doesn't seem to run consistently for some reason...

This revision was automatically updated to reflect the committed changes.
sdesmalen marked an inline comment as done.
sdesmalen added inline comments.Mar 15 2020, 7:38 AM
llvm/CMakeLists.txt
426

Okay thanks, I wasn't aware that was the official policy. For this specific case though, clang-format doesn't seem to apply to CMake files.