This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion] Use common alignment for loads in caller
AbandonedPublic

Authored by psamolysov on May 5 2022, 6:01 AM.

Details

Summary

In byval promotion, the generated in callers 'load' instructions have
the common alignment with the corresponding offset and alignment of
the promoted byval argument:

define internal void @callee_load_first_element(%struct.ss* byval(%struct.ss) align 16 %b) {
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  ...
}

will be optimized into:

define internal void @callee_load_from_aligned_1(i32 %b.0.val) {
  %temp2 = add i32 %b.0.val, 1
  ret void
}


define i32 @caller_load_first_element() {
  %S = alloca %struct.ss, align 16
...
  %S.0 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
  %S.0.val = load i32, i32* %S.0, align 16
  %S.1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
  %S.1.val = load i64, i64* %S.1, align 4
  call void @callee_load_first_element(i32 %S.0.val, i64 %S.1.val)
...
}

(So, '%S.0.val' has 'align 16', the same align as the initial
'%struct.ss' argument has.)

But the "usual" promotion doesn't follow this rule, the corresponding
'load' instruction generated in the caller will have just the maximum
alignment of all the loads for the argument's part in the callee, and
the possible alignment for the argument itself is just ignored:

define internal void @callee_load_from_aligned(%struct.ss* align 16 %b) {
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  %temp1 = load i32, i32* %temp, align 4
  %temp2 = add i32 %temp1, 1
  %temp3 = load i32, i32* %temp, align 8
  ret void
}

will be optimized into:

define internal void @callee_load_from_aligned(i32 %b.0.val) {
  %temp2 = add i32 %b.0.val, 1
  ret void
}

define i32 @caller_load_from_aligned() {
  %S = alloca %struct.ss, align 16
...
  %1 = getelementptr %struct.ss, %struct.ss* %S, i64 0, i32 0
  %S.val = load i32, i32* %1, align 8
  call void @callee_load_from_aligned_1(i32 %S.val)
...
}

(So, '%S.val' has 'align 8' while the pointer, argument '%b' in the
callee, is aligned by 16.)

The intent of the patch is to align the behavior of both propagation
schemes: byval and non-byval. However, if there is a load with a larger
value of the 'align' attribute than the argument has, the non-byval
promotion will use this alignment while the byval one doesn't.

Diff Detail

Event Timeline

psamolysov created this revision.May 5 2022, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 6:01 AM
psamolysov requested review of this revision.May 5 2022, 6:01 AM

I've opened this review to get a more common behavior for byval and non-byval promotion schemes before replacing the byval scheme with tweaked non-byval (with allowed stores) one. I think having some more ifs in the source code to check whether this is promotion from byval or not and emulate the old behavior of alignments in each scheme is not a good idea and makes the code more difficult. I understand that this is not a common thing to have non-byval pointers in function arguments with the align attribute, I've actually seen a few cases in the align.ll LIT test only (@caller_guaranteed_aligned_1(i1 %c, i32* align 16 dereferenceable(4) %p) for example).

nikic added inline comments.May 6 2022, 2:40 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521

Isn't this incorrect for non-zero offsets?

Overall, I'm not sure I really understand what you're trying to achieve here. What we're interested here is whether loads are speculatable, which is the case when they are defereferenceable and aligned. This can be either because the load is guaranteed to execute, or because we have known dereferenceability/alignment knowledge. Byval deref/align will be taken into account for the latter check (allCallersPassValidPointerForArgument).

psamolysov added inline comments.May 6 2022, 3:01 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521

@nikic Yes, you are absolutely right, this is incorrect for non-zero offsets. I'm going to fix it ASAP.

Initially, my intent was to copy the byval promotion behavior where the alignment of the byval argument is taken into account because if the argument is aligned by 32, for example, we should not use less alignment for the part with zero offset loading. And because the byval argument is just a copy of the original value from the caller, it's very easy to get it's alignment: just use the value from the corresponding argument definition in the callee.

But after rethinking, I see that for non-byval pointers, we should take into account not the argument alignment from the callee definition (as I did here) but actual alignment value from the caller in which we generate new load instructions. It can be considered as an error in the frontend of course, but if we have loads in the callee with less alignment for the part with zero offset than the whole structure was allocated in the caller with, and we use this less alignment in the new generated load instructions, it might be a source of errors in the compiled application. What do you think?

nikic added inline comments.May 6 2022, 4:01 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521

I think you're mixing up two things here: Legality and optimization. I believe the current code fully handles the legality aspect for both byval and non-byval arguments. If I understand right, your proposed change here is related to optimization: If you have an align 32 argument that is used in an align 4 load, you want to make the promoted load use align 32 instead.

Doing that would be fine, but I also think that this is not really the place to do it: We already have transforms that will increase load/store alignment based on known alignment (e.g. from parameter align attributes). For example, InstCombine does this (see getOrEnforceKnownAlignment). I think the alignment handling code here is already tricky enough without also trying to optimize the alignments at the same time.

psamolysov added inline comments.May 6 2022, 4:27 AM
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521

@nikic Thank you very much for the good explanation. My concern was about legality, so I though this is illegal to use align 4 loads for align 32 arguments.

If we replace the byval promotion approach with the currently used non-byval one as is, the non-byval rules will be used for alignments of the generated load instructions too: just the maximum align of the present in the callee load instructions will be used and align of the byval argument will just be ignored. Example, the byval.ll test, function @g:

define internal void @g(%struct.ss* byval(%struct.ss) align 32 %b) nounwind {
...
}

after promotion by non-byval scheme, in caller:

define i32 @main() nounwind  {
  %S = alloca %struct.ss, align 32
...
  %S01 = getelementptr %struct.ss, %struct.ss* , i64 0, i32 0
  %S01_VAL = load i32, i32* %S01, align 4
  call void @g(i32 %S01_VAL)
...
}

Will this %S01_VAL = load i32, i32* %S01, align 4, not align 32 as it is now, be correct?

jdoerfert edited the summary of this revision. (Show Details)May 6 2022, 9:06 AM

I though this is illegal to use align 4 loads for align 32 arguments.

It is not. Any lower alignment (for the load) is perfectly legal.

I though this is illegal to use align 4 loads for align 32 arguments.

It is not. Any lower alignment (for the load) is perfectly legal.

@jdoerfert thank you for the answer.
If so, I see no sense for these changes and the revision can be abandoned. From another point of view, this is an opportunity from the optimization perspective to use for the promoted 'load' instructions the alignment from the argument's allocation if this won't make the code of the pass significant harder but as @nikic has already mentioned, this pass looks as it's not the right place to do it.

psamolysov abandoned this revision.May 11 2022, 6:50 AM