Page MenuHomePhabricator

[CodeGen] Using ZExt for extractelement indices.
ClosedPublic

Authored by Peter on Aug 30 2022, 2:56 PM.

Details

Summary

In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating i1 true into i32 -1.
This is because IRTranslator uses SExt for indices.
In this fix, we move from SExt to ZExt. We also included a test for AMDGPU, an updated test for AArch64

This patch fixes this issue.

Diff Detail

Event Timeline

Peter created this revision.Aug 30 2022, 2:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2022, 2:56 PM
Peter requested review of this revision.Aug 30 2022, 2:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2022, 2:56 PM
foad added a comment.Sep 1 2022, 2:11 AM

Can you add some text to the LangRef to clarify that insert/extractelement indices are treated as unsigned?

By inspection, SelectionDAG seems to have the same bug. See SelectionDAGBuilder::visitInsertElement and SelectionDAGBuilder::visitExtractElement. But perhaps it is harmless there for some reason.

Peter added a comment.Sep 1 2022, 12:53 PM

If you can point me where the doc is located, I will make the change.

As of SelectionDAGBuilder::visitInsertElement and SelectionDAGBuilder::visitExtractElement, we will change them so its compliant with the documentation.

Please allow me a few days before I submit another patch, it's been busy days for me.

foad added a comment.Sep 2 2022, 2:33 AM

If you can point me where the doc is located, I will make the change.

docs/LangRef.rst in the LLVM repo.

Please allow me a few days before I submit another patch, it's been busy days for me.

No problem!

Peter updated this revision to Diff 459203.Sep 9 2022, 2:32 PM

updated the documentation so it is complient with our code

foad added inline comments.Sep 12 2022, 1:10 AM
llvm/docs/LangRef.rst
9702–9703

From a semantics point of view, I think all we need to say is that the value is "treated as unsigned" or words to that effect. I don't think we should mention extending here (and definitely not truncating which sounds like it could change the value!).

Peter added inline comments.Sep 12 2022, 11:44 AM
llvm/docs/LangRef.rst
9702–9703

https://llvm.godbolt.org/z/Mszb7EvPe

The (unfortunate) fact that we are using zextOrTrunc already implies that we can change the value given a large index, even thou no one is doing it.
I see two ways out of this:

  1. Explain in detail in the doc "when" truncation will happen or
  2. Further change the doc so the index can only be i64, which would involve even more code change.
foad added inline comments.Sep 13 2022, 12:16 AM
llvm/docs/LangRef.rst
9702–9703

The document should describe the intended semantics of the IR, not the details of the current GlobalISel implementation.

(You're correct that GlobalISel does odd things if the index type is wider than 64 bits, but I don't think it's really *wrong*. The only effect it can have is to change an out-of-range index into an in-range index. And if the index was out of range in the first place then the result was undefined anyway (or poison) so it doesn't really matter what GlobalISel does with it.)

Currently, you say what to do: zext or truncate. I thought that you should say what it is: an unsigned integer with up to 64 bit.

What about we say: "The index may be a variable of any integer type, and will be treated as an unsigned integer."

If that looks good I think I will update it and fix all the testcases sometime next week.

foad added a comment.Sep 13 2022, 11:59 PM

What about we say: "The index may be a variable of any integer type, and will be treated as an unsigned integer."

That works for me!

Peter updated this revision to Diff 463017.Sep 26 2022, 2:06 PM

Update tests. All passing.

Peter updated this revision to Diff 463018.Sep 26 2022, 2:08 PM

update doc

Peter added a comment.Sep 26 2022, 2:10 PM

Many archs' tests were failing due to this change. I have updated all the test. Most of them can be done using a script.
Still, do we want to include some arch experts to review the test changes?

[IRTranslator] Using ZExt for extractelement indices.

Maybe change this to [CodeGen] now that SelectionDAG changes are included?

Many archs' tests were failing due to this change. I have updated all the test. Most of them can be done using a script.
Still, do we want to include some arch experts to review the test changes?

Yes I think this needs some arch-specific review, especially since the code looks worse (longer) in some cases. @craig.topper @RKSimon @t.p.northover @nemanjai @kaz7 @sunfish @theraven

I checked a few code size regressions and they look correct. Combines would replace extract/insert with undef because index was i32 -1 or i64 -1 (from i1 true). Now they extract/insert element at index 1.
My best guess how this bug was not discovered sooner is that frontends use i32/i64 for index type.
Other type of regression is when i32 (marked as signed argument) has to be zero-extended to i64 (and with mask), before it used to be a copy. Considering langref changes it is probably more correct to mark index arguments as zeroext (index will be copied again no code size regression). But this should be checked with people who know more about specific targets.

RKSimon added inline comments.Sep 27 2022, 5:29 AM
llvm/test/CodeGen/X86/var-permute-128.ll
131–132

These are mildly annoying, on some targets movd is notably quicker than pextr* - I think we're just missing a SimplifyDemandedBits handler though.

pengfei added inline comments.Sep 27 2022, 5:56 AM
llvm/test/CodeGen/X86/vec_extract.ll
114

This seems redundant.

RKSimon added inline comments.Sep 27 2022, 6:08 AM
llvm/test/CodeGen/X86/vec_extract.ll
114

D123394 /might/ address this - it handles something similar in a couple of places.

pengfei added inline comments.Sep 27 2022, 6:10 AM
llvm/test/CodeGen/X86/vec_extract.ll
114

That's great :)

Peter retitled this revision from [IRTranslator] Using ZExt for extractelement indices. to [CodeGen] Using ZExt for extractelement indices..Sep 27 2022, 10:50 AM
Peter added inline comments.Sep 27 2022, 11:27 AM
llvm/test/CodeGen/X86/var-permute-128.ll
131–132

I just checked the documentation of both instructions. It seems you are right.
Is there anything you hoped we could've done differently, or it is another issue?

Other type of regression is when i32 (marked as signed argument) has to be zero-extended to i64 (and with mask), before it used to be a copy. Considering langref changes it is probably more correct to mark index arguments as zeroext (index will be copied again no code size regression). But this should be checked with people who know more about specific targets.

For RISCV I think that the indices should indeed be marked zext. Testing sext doesn't get us a whole lot IMO. @craig.topper, @reames?

As a data point in support of the unsigned semantics, InstComine already uses zext to canonicalize index type (see getPreferredVectorIndex). Several of the transforms use unsigned comparison. So, at IR level, we seem to uniformly be expecting unsigned behavior.

On the RISCV changes, several of these look like regressions, but it comes down to the fact that arguments in the tests are already sign extended, so the sext is cheaper than the zext. One thing I note - which may be generally useful - is that anything above i32 (i.e. where the zext vs sext difference is observable in these tests) is definitely out of bounds. Since out of bounds returns poison, I believe we can use an any extend here. This might reduce the impact.

arsenm added a subscriber: arsenm.Sep 28 2022, 8:02 AM
arsenm added inline comments.
llvm/test/CodeGen/PowerPC/aix-vec_extract_p9.ll
104

Are we missing a SimplifyDemandedBits combine on the extract index?

Since out of bounds returns poison, I believe we can use an any extend here.

I think any_extend is too aggressive. But we could check isSExtCheaperThanZExt(), sure.

Peter added a comment.Sep 30 2022, 1:16 PM

Should we merge this or is there anything else that needs to be addressed?

Should we merge this or is there anything else that needs to be addressed?

The regressions don't look particularly important, but there may be a missing SimplifyDemandedBits style optimization on the index

RKSimon added inline comments.Oct 12 2022, 3:58 AM
llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll
2

Please can you pre-commit the autogen change and then rebase so the patch shows the codegen diff?

Peter updated this revision to Diff 467255.Oct 12 2022, 1:58 PM

Rebase with upstream and changed tests in llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll

Peter marked an inline comment as done.Oct 12 2022, 1:58 PM
Peter marked an inline comment as done.Oct 12 2022, 2:03 PM
Peter added inline comments.
llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll
9

I am wondering which part of the code, if any, in the front end is responsible for generating these attributes. I feel like we should generate zeroext for indices instead of signext due to the document change.

Peter updated this revision to Diff 467294.Oct 12 2022, 4:23 PM

rebase to upstream.

Peter marked 5 inline comments as done.Oct 12 2022, 5:23 PM
pengfei added inline comments.Oct 12 2022, 7:47 PM
llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll
9

I think it's CodeGenModule::ConstructAttributeList

craig.topper added inline comments.Oct 12 2022, 8:43 PM
llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll
9

The attributes tend to follow the types of the arguments in the C source code. So if it was a unsigned type it would be zeroext, if it was a signed type it would be signext. It's not based on how it is used in the function.

Peter marked 2 inline comments as done.Oct 13 2022, 10:57 AM
Peter added inline comments.
llvm/test/CodeGen/PowerPC/variable_elem_vec_extracts.ll
9

In that case I don't have to change anything. BYW thanks pengfei for pointing out, I just reviewed that code.

arsenm accepted this revision.Oct 13 2022, 6:55 PM
This revision is now accepted and ready to land.Oct 13 2022, 6:55 PM
Peter marked an inline comment as done.Oct 13 2022, 7:28 PM

Thanks for approving the diff, I'll let it sit for another day to see if anyone has more comments and merging it tomorrow.

This revision was automatically updated to reflect the committed changes.