This is an archive of the discontinued LLVM Phabricator instance.

[llvm] Use pointer index type for more GEP offsets (pre-codegen)
ClosedPublic

Authored by krzysz00 on Feb 6 2023, 3:14 PM.

Details

Summary

Many uses of getIntPtrType() were using that type to calculate the
neened type for GEP offset arguments. However, some time ago,
DataLayout was extended to support pointers where the size of the
pointer is not equal to the size of the values used to index it.

Much code was already migrated to, for example, use getIndexSizeInBits
instead of getPtrSizeInBits, but some rewrites still used
getIntPtrType() to get the type for GEP offsets.

This commit changes uses of getIntPtrType() to getIndexType() where
they are involved in a GEP-related calculation.

In at least one case (bounds check insertion) this resolves a compiler
crash that the new test added here would previously trigger.

This commit does not impact

  • C library-related rewriting (memcpy()), which are operating under

the assumption that intptr_t == size_t. While all the mechanisms for
breaking this assumption now exist, doing so is outside the scope of
this commit.

  • Code generation and below. Note that the use of getIntPtrType() in

CodeGenPrepare will be changed in a future commit.

  • Usage of getIntPtrType() in any backend

Depends on D143435

Diff Detail

Event Timeline

krzysz00 created this revision.Feb 6 2023, 3:14 PM
Herald added a reviewer: ributzka. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
krzysz00 requested review of this revision.Feb 6 2023, 3:14 PM
krzysz00 added a reviewer: Restricted Project.Feb 6 2023, 3:14 PM

Thanks for this patch! In the CHERI fork I added a rather ugly workaround in getIngPtrType() for this instead of updating all callers (https://github.com/CTSRD-CHERI/llvm-project/blob/19d402e23fcaa197e1d40547da403dc17e13c7ae/llvm/lib/IR/DataLayout.cpp#L862). I will try applying the patch to CHERI LLVM to see if there are any further places that need updating.

arichardson added inline comments.Feb 7 2023, 3:38 PM
llvm/include/llvm/IR/IRBuilder.h
572

I'd suggest removing the default zero here to ensure that callers always pass the address space explicitly.

foad added a subscriber: foad.Feb 8 2023, 3:22 AM
foad added inline comments.
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
454

Not necessarily related to your patch, but the "OrTrunc" makes me nervous (here and in a couple of other places in this patch). My understanding of the index type in datalayout is that it is a preferred type to use GEP indexes, but shouldn't actually have any semantic effect, so it would be wrong to truncate an index to the index type unless you can prove that it won't change the value.

@arichardson I'd like to also draw your attention to https://reviews.llvm.org/D143526, its global isel equivalent.

It'd be nice to fix up SelectionDAG to also generate ADD nodes using the index width and not the data width, but that's not what I'm trying first.

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
454

That's not my understanding of that field. The index field is maximal size of an offset into a pointer in address space N. For example, CHERI's capability pointers are (to my understanding) 128:128:128:64 meaning that the pointer value is 128 bits long, but only 64 bits of offset can be applied to it.

That is, in C terms, the pointer type is intptr_t and the index field is size_t. This patch fixes up a bunch of code that normalized pointer offsets to intptr_t instead of size_t, since LLVM historically assumed they were the same.

krzysz00 updated this revision to Diff 496564.Feb 10 2023, 11:17 AM
krzysz00 edited the summary of this revision. (Show Details)

... turns out I just wanted getIndexType() this whole time

arsenm added inline comments.Feb 10 2023, 11:38 AM
llvm/include/llvm-c/Target.h
249 ↗(On Diff #495283)

Adding new stuff to the C API should be done separately and should get unit tests

llvm/lib/IR/DataLayout.cpp
878

Can this just be inline in the header?

llvm/lib/Transforms/Scalar/NaryReassociate.cpp
355

nary reassociate not tested

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
694

slsr not tested

llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
327

Vectorizer not tested

krzysz00 updated this revision to Diff 497038.Feb 13 2023, 10:21 AM

While I was there, found another case where the constant offset extractor was called with pointer bitwidth APInt and not an index bitwidth APInt.

arsenm added inline comments.Feb 13 2023, 11:08 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
283–287

Don't think this one is tested

llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
414

Don't think this one is tested

llvm/lib/Transforms/Scalar/MergeICmps.cpp
160

Don't think this one is tested

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
930

Doesn't creating a ptrtoint from a non-integral pointer break horribly?

llvm/lib/Transforms/Utils/FunctionComparator.cpp
752

Don't think this one is tested

krzysz00 updated this revision to Diff 497111.Feb 13 2023, 2:20 PM
krzysz00 marked 4 inline comments as done.

Add test for icmp change, guard the inttoptr based GEP lowering so it doesn't reach fat pointers or non-integeral ones.

krzysz00 marked an inline comment as done.Feb 13 2023, 2:21 PM
krzysz00 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
283–287

You're right, added a test.

llvm/lib/Transforms/Scalar/ConstantHoisting.cpp
414

GEPOperator::accumulateConstantOffset asserts that its input bitwidth has bitwidth equal to the index size of the GEP base, so this was a straight-up bug.

llvm/lib/Transforms/Scalar/MergeICmps.cpp
160

Same thing with accumulateConstantOffset

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
930

Good catch, I've added guards to make sure non-integral address spaces don't get sent this direction. And that fat pointers don't enter this code path either.

llvm/lib/Transforms/Utils/FunctionComparator.cpp
752

Same note re accumulateConstantOffset()

jrtc27 added a subscriber: jrtc27.Feb 16 2023, 10:34 AM
jrtc27 added inline comments.
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
454

This is correct; see SelectionDAGBuilder::visitGetElementPtr for example where the use of the index type is explicit (and there's some commentary, too). It is legal to use an oversized type in the IR, but it will eventually be legalised. Undersized types are implicitly sign-extended.

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
794

Arguably this is now a slight misnomer

921

IR is not C, plus the assert documents that anyway?

926

Why not assert they're the same type?

930

Hm, weirdly I don't see any changes to this function in CHERI LLVM, which makes me wonder how we don't hit issues here on CHERI-RISC-V given the RISC-V backend doesn't return true for STI->useAA()...

1063

Comment probably needs updating to match

1064–1072

Why is AddrSpace before the comment but IsFatPointer after?

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
677–678

If you're changing the indentation below please also fix the curlies (and break) to match

arsenm added inline comments.Feb 16 2023, 10:38 AM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1064–1072

Don't know what FatPointer is meaning here, but isNonIntegralAddressSpace should cover it?

jrtc27 added inline comments.Feb 16 2023, 10:40 AM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1064–1072

Non-integral is overly strong, it also implies instability of the value if converted to an integer multiple times, which is not true of CHERI, nor AMDGPU AFAIK

krzysz00 updated this revision to Diff 498143.Feb 16 2023, 2:40 PM
krzysz00 marked 6 inline comments as done.

Address review comments

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
794

... yeah, I should fix that

1064–1072

Yeah, this is safeguarding against the possibility of having the buffer descriptor be an integral value, where you need to be very careful when adding things (to avoid carries into the metadata).

llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
677–678

I don't think this is changing the indentation?

jrtc27 added inline comments.Feb 16 2023, 2:59 PM
llvm/lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
677–678

Phabricator seems to think you've de-indented by 2 spaces

jrtc27 added inline comments.Feb 16 2023, 3:27 PM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1066

As discussed on discourse, this isn't really what that means, at least not for CHERI, which was one of the influences for how the current pointer index type came to be (D42123)

arichardson added inline comments.Feb 16 2023, 4:28 PM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
930

Probably due to the DL->getIntPtrType hack that we have (which been hopefully be deleted soon after merging up to this change).

jrtc27 added inline comments.Feb 16 2023, 4:38 PM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
930

It would still generate a metadata-stripping ptrtoint+...+inttoptr sequence though, that would only stop it asserting on using the wrong type for ptrtoint

foad added inline comments.Feb 17 2023, 2:00 AM
llvm/lib/Transforms/Scalar/NaryReassociate.cpp
454

If it has a semantic effect on IR then that needs to be clearly documented in the LangRef. Literally the only mention of it in the document is: "The fourth parameter <idx> is a size of index that used for address calculation."

krzysz00 updated this revision to Diff 498418.Feb 17 2023, 9:44 AM

Update documentation on GEP since it didn't get fully updated

Allen added a subscriber: Allen.Feb 20 2023, 8:00 AM

What's missing here that prevents a landing? I feel like I missed something in the sea of comments

What's missing here that prevents a landing? I feel like I missed something in the sea of comments

I'm being a stickler for testing every single piece here and there are a few missing (e.g. MergeICmps.cpp)

@arsenm, I can try to put together tests for some of the remaining stuff (like the ICmp merge). However, unless I'm forgetting something, the remaaining untested bits are the ones that call accumulateConstantOffset(), which, per https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Operator.cpp#L94-L96 , will crash if you mix up pointer bitwidth and index bitwidth. I'm not sure that fixing up those crashes (when most of the other code in LLVM has been fixed for this) justifies a test?

@arsenm, I can try to put together tests for some of the remaining stuff (like the ICmp merge). However, unless I'm forgetting something, the remaaining untested bits are the ones that call accumulateConstantOffset(), which, per https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Operator.cpp#L94-L96 , will crash if you mix up pointer bitwidth and index bitwidth. I'm not sure that fixing up those crashes (when most of the other code in LLVM has been fixed for this) justifies a test?

You could always pre-commit a not --crash test that is fixed by this commit?

I didn't know not --crash was a Thing, thanks!

I'll try and put those together tomorrow

krzysz00 updated this revision to Diff 499987.Feb 23 2023, 2:51 PM

Use precommited tests for accumulateConstantOffset() calls

arichardson accepted this revision.Feb 23 2023, 3:44 PM

I would check the output of the no longer crashing tests but otherwise LGTM. Please wait for @arsenm to confirm as well before committing though.

llvm/test/Transforms/MergeICmps/X86/distinct-index-width-crash.ll
1

Maybe also check the output here to make sure it's somewhat sensible?

This revision is now accepted and ready to land.Feb 23 2023, 3:44 PM
krzysz00 updated this revision to Diff 500235.Feb 24 2023, 9:15 AM
krzysz00 marked an inline comment as done.

Update tests to include non-trivial CHECK lines

krzysz00 updated this revision to Diff 500295.Feb 24 2023, 1:58 PM

Missed a bad test, whoops

krzysz00 updated this revision to Diff 502160.Mar 3 2023, 9:28 AM

Update for planned new address spaces (p7 = descriptor + offset, p8 = descriptor)

@arsenm Any objections to landing this? I think I've added all the needed "don't crash" tests

krzysz00 updated this revision to Diff 507773.Mar 23 2023, 9:46 AM

Rebase updates

krzysz00 updated this revision to Diff 507848.Mar 23 2023, 12:32 PM

Fix buffer tests