This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibCalls] handle subobjects of constant aggregates
ClosedPublic

Authored by msebor on May 6 2022, 11:45 AM.

Details

Summary

This change removes the known limitation of the library function call folders to only work with top-level arrays of characters (as per the TODO comment in the code) and allows them to also fold calls involving subobjects of constant aggregates such as member arrays. It does that by

  1. adding a new function, ReadByteArrayFromGlobal that returns the byte representation of the initializer of a global constant, as a ConstantArray, and
  2. removing the restriction from getConstantDataArrayInfo to only consider simple strings and having it use ReadByteArrayFromGlobal to convert other aggregates to arrays of bytes.

This initial change only handles constant offsets into these aggregates.

I have tested the change by building LLVM on x86_64 and running check-all. In full disclosure, the patch seems to cause a few surprising regressions in the test suite that I'm yet to investigate:

LLVM :: CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll
LLVM :: CodeGen/ARM/constantpool-promote-ldrh.ll
LLVM :: CodeGen/BPF/remove_truncate_5.ll
LLVM :: CodeGen/BPF/rodata_2.ll
LLVM :: CodeGen/PowerPC/aix-vec-arg-spills-mir.ll
LLVM :: CodeGen/PowerPC/aix-vec-arg-spills.ll
LLVM :: DebugInfo/COFF/types-array.ll

Diff Detail

Event Timeline

msebor created this revision.May 6 2022, 11:45 AM
msebor requested review of this revision.May 6 2022, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 11:45 AM
nikic added a comment.May 6 2022, 12:30 PM

In terms of high-level design, converting the initializer to a ConstantDataArray is not great: This is a very inefficient representation, especially if you take into account that these "temporary" ConstantDataArrays are globally interned and will probably never get freed.

Now that the ConstantDataArraySlice no longer corresponds to an already existing ConstantDataArray, it should probably be switched to store an array of bytes instead. From a cursory look, I don't think anything fundamentally depends on the current representation.

nikic added a comment.May 6 2022, 1:14 PM

Now that the ConstantDataArraySlice no longer corresponds to an already existing ConstantDataArray, it should probably be switched to store an array of bytes instead. From a cursory look, I don't think anything fundamentally depends on the current representation.

Hm, I guess the need to represent slices of different element types could make things somewhat awkward. ConstantDataArray represents everything as an array of bytes internally, and then type-puns to the actual element type on access.

nikic added inline comments.May 17 2022, 6:32 AM
llvm/lib/Analysis/ConstantFolding.cpp
660

There is no need to convert the values to ConstantInt, you should be able to use the ConstantDataArray::get() constructor with the SmallVector directly.

llvm/lib/Analysis/ValueTracking.cpp
4109

You can replace this with stripAndAccumulateOffsets(), in which case you don't need explicit handling for GEPs.

In terms of high-level design, converting the initializer to a ConstantDataArray is not great: This is a very inefficient representation, especially if you take into account that these "temporary" ConstantDataArrays are globally interned and will probably never get freed.

I agree that the conversion is less than optimal. The proposed approach is the least intrusive one I could come up with. Everything else I considered would have required more extensive changes. I'm interested in improving it but I'd prefer to do it incrementally rather than in the initial revision (or in a patch series).
If I missed something please let me know. If we want to improve this I'm not sure that starting by rearchitecting the existing design of these classes is a better approach than starting with this simple enhancement. But I also don't know enough LLVM memory management to judge the costs of the overhead.

Now that the ConstantDataArraySlice no longer corresponds to an already existing ConstantDataArray, it should probably be switched to store an array of bytes instead. From a cursory look, I don't think anything fundamentally depends on the current representation.

Hm, I guess the need to represent slices of different element types could make things somewhat awkward. ConstantDataArray represents everything as an array of bytes internally, and then type-puns to the actual element type on access.

My initial thought was to store the byte representation but I ended up giving up on the idea because of that, at least for the initial patch.

msebor added inline comments.May 18 2022, 1:14 PM
llvm/lib/Analysis/ConstantFolding.cpp
660

Nice, thanks!

llvm/lib/Analysis/ValueTracking.cpp
4109

You mean replace the line above and the whole if statement below with a call to stripAndAccumulateOffsets() plus the handling of the resulting offset? To do that I need to get a DataLayout reference from the GlobalVariable that V refers to, and I don't see how to do that without first stripping the intervening pointer casts.

msebor updated this revision to Diff 431781.May 24 2022, 1:48 PM

Use ConstantDataArray::get() ctor with SmallVector instead of looping.
Updated the failing tests below:

The revised patch updates the failing tests below. I made some effort to verify the correctness of the changes to the first batch:

LLVM :: CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll
LLVM :: CodeGen/ARM/constantpool-promote-ldrh.ll
LLVM :: CodeGen/BPF/remove_truncate_5.ll
LLVM :: CodeGen/BPF/rodata_2.ll

but none for these:

LLVM :: CodeGen/PowerPC/aix-vec-arg-spills-mir.ll
LLVM :: CodeGen/PowerPC/aix-vec-arg-spills.ll
LLVM :: DebugInfo/COFF/types-array.ll
nikic added a comment.May 31 2022, 5:41 AM

In terms of high-level design, converting the initializer to a ConstantDataArray is not great: This is a very inefficient representation, especially if you take into account that these "temporary" ConstantDataArrays are globally interned and will probably never get freed.

I agree that the conversion is less than optimal. The proposed approach is the least intrusive one I could come up with. Everything else I considered would have required more extensive changes. I'm interested in improving it but I'd prefer to do it incrementally rather than in the initial revision (or in a patch series).
If I missed something please let me know. If we want to improve this I'm not sure that starting by rearchitecting the existing design of these classes is a better approach than starting with this simple enhancement. But I also don't know enough LLVM memory management to judge the costs of the overhead.

Yeah, I think we can start with this patch.

llvm/lib/Analysis/ConstantFolding.cpp
650

I think this is going to assert for a scalable vector global (unusual, but legal). You should probably just bail out in that case.

651

Do we need to protect against large globals here? I'm thinking about something like { i8 1, [100000000 x i8] zeroinitializer }, which has a compact representation, but will be materialized in its full size here.

llvm/lib/Analysis/ValueTracking.cpp
4109

You'd normally just pass in const DataLayout &DL as an argument. Happy to have that done as a followup though, to reduce API churn in this patch.

4171

Is the type store size really what you want to check here? This means a) for ElementSize==32 this would accept a float array and b) for ElementSize==8 it would accept an i4 array.

Now, this will actually get rejected by the isIntegerTy() check below afterwards, but it seems odd to first do a different check here. (This also means we should miss the ReadByteArrayFromGlobal code path for cases where it disagrees).

msebor updated this revision to Diff 436928.Jun 14 2022, 2:40 PM

Update patch per feedback, add a new test. Retested by running make check-all on x86_64-linux.

llvm/lib/Analysis/ConstantFolding.cpp
650

I found the following in llvm/test/Verifier/scalable-global-vars.ll while looking for a way to define such a vector:

;; Global variables cannot be scalable vectors, since we don't
;; know the size at compile time.

; CHECK: Globals cannot contain scalable vectors
; CHECK-NEXT: <vscale x 4 x i32>* @ScalableVecGlobal
@ScalableVecGlobal = global <vscale x 4 x i32> zeroinitializer

Are there other scalable vectors that can be defined at global scope?

651

It might make sense to cap the size (at a minimum to avoid the 32 vs 64-bit size_t problem). Otherwise the SmallVectorctor aborts when malloc fails. I couldn't find code to follow as an example to prevent this so I sort of arbitrarily picked INT_MAX as the maximum. I've added memchr-7.ll to exercise this although I imagine the test could fail on memory constrained systems. Let me know if there's a guideline for how to choose a more suitable maximum in these situations or how to write a more robust test.

llvm/lib/Analysis/ValueTracking.cpp
4171

I'm not sure I understand the question but matching sizes is the purpose of the test and of the ElementSize argument. The point of the isIntegerTy() check seems to be to prevent folding wcslen calls with zero-initialized arrays with a smaller size. Since such calls are undefined the only benefit of going out of our way and expanding them to the library call that I can think of is to get them caught by the sanitizer. I've replaced the check for type with one for size.

nikic added inline comments.Jun 15 2022, 7:28 AM
llvm/lib/Analysis/ConstantFolding.cpp
650

Ah, I was just wrong about this, sorry. We have scalable Constants, but if they can't be used as a GV initializer, then there's no problem as far as this code is concerned.

651

The closest thing I could find is instcombine-maxarray-size with value 1024, which might be a bit low for this usage. I'd probably pick 65536 as a nice round value, but I don't care strongly about this limit -- it's a theoretical concern at this point, and we can always adjust it.

llvm/lib/Analysis/ValueTracking.cpp
4171

Okay, I see now that ConstantDataArray only accepts power-of-2, multiple of 8 types, so arrays of i4 and similar don't take this codepath. An example for the float issue is this:

declare i64 @wcslen(ptr)

@g = constant [3 x float] [float 1.0, float 2.0, float 0.0]

define i64 @test() {
  %res = call i64 @wcslen(ptr @g)
  ret i64 %res
}

!llvm.module.flags = !{!0}
!0 = !{i32 1, !"wchar_size", i32 4}

This will assert with the current patch.

The initializer can only be directly reused if isIntegerTy(ElementSize), in other cases we should go through the ReadByteArrayFromGlobal() code to convert the elements into integers (even if they already have the correct size -- the consuming code expects that they are integers).

We could change the API contract to allow non-integer elements to be returned, but I don't think that would be useful for any current users of this API.

4257

Something is broken with the indentation here and below.

4274

Just ArrayTy = cast<ArrayType>(Init->getType()) should be sufficient for both cases -- ReadByteArrayFromGlobal() should always return a constant with array type.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1297

Broken indentation here as well.

msebor updated this revision to Diff 437302.Jun 15 2022, 12:57 PM
msebor marked 3 inline comments as done.

Revision 3 with the following changes:

  • Only extract data from constants of integral types.
  • Simplify computation of array type.
  • Replace stray tabs with spaces.
  • Add a test.
llvm/lib/Analysis/ValueTracking.cpp
4171

I see what you mean now, thanks for the test case! This is a limitation of the current approach of restricting conversions to arrays of i8. The only existing use case would be the wcslen folder so that alone is probably not worth an API change. But as we discussed the conversion from the underlying byte representation to a byte array is inefficient and avoiding it would also make it possible to extend the API to support arbitrary conversions. I'll keep that in mind for a possible follow up.

4257

Tabs (required by GCC) vs spaces (LLVM).

nikic added a comment.Jun 16 2022, 2:07 AM

Can you please upload the patch with full context (-U99999) again? Hard to check the tests otherwise, as the global declarations are missing. The patch generally looks good to me though.

llvm/include/llvm/Analysis/ConstantFolding.h
179

nit: Constant *

llvm/lib/Analysis/ValueTracking.cpp
4242

I don't really get the purpose of this special case. InstCombine tests pass for me if I drop the if() part and always go into the else branch.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1203

This change is kind of unrelated, but it will go away with your other patch anyway.

1297

Still broken :)

msebor updated this revision to Diff 438050.Jun 17 2022, 3:47 PM
msebor marked an inline comment as done.

Revision 4 of the patch includes the following changes:

  • Keep optimizeMemCmpVarSize from folding calls with excessive constant sizes (exposed by rebasing on top of latest trunk).
  • Remove code in getConstantDataArrayInfo made redundant by other changes in the same function.
  • Remove tests from memcmp-3.ll that are duplicated in memcmp-4.ll.
  • Let clang-format adjust formatting.

Retested by running make check-all.

msebor added inline comments.Jun 17 2022, 3:48 PM
llvm/lib/Analysis/ValueTracking.cpp
4242

The code predates this change (see line 4217). The only difference dropping the if has on trunk is on the assertion in wcslen-3.ll:192 which has this comment:

; This could in principle be simplified, but the current implementation bails on
; type mismatches.

This patch simplifies the emitted code as noted in the comment (and adjusts the test) so it doesn't seem like the special case ever served a good purpose and can be removed.

nikic accepted this revision.Jun 21 2022, 12:48 AM

LGTM

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1150

I assume this is done to preserve some existing test behavior?

llvm/test/Transforms/InstCombine/strlen-8.ll
40

This is actually folding strlen(&a5_4[2][1]) -- probably not intentional?

This revision is now accepted and ready to land.Jun 21 2022, 12:48 AM
msebor added inline comments.Jun 21 2022, 10:56 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1150

Yes, sort of: memcmp-4.ll, added in preparation for this change.

llvm/test/Transforms/InstCombine/strlen-8.ll
40

Yes, that's a typo, thanks! None of the cases in this test is expected to be folded yet. There's a TODO: Handle subobjects comment in optimizeStringLength but let me add a comment to the test too. It might be helpful to start tracking these outstanding gaps more formally.