This is an archive of the discontinued LLVM Phabricator instance.

[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
ClosedPublic

Authored by reames on Dec 8 2021, 1:03 PM.

Details

Summary

The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

A couple process notes:

  • Sorry for the lack of context, there's so many test changes that phabricator choked on a full context diff.
  • This isn't all of the test diffs, only the autogened ones. If we're happy with the direction, I'll autogen a bunch of files, and then update the rest before submit.

Diff Detail

Event Timeline

reames created this revision.Dec 8 2021, 1:03 PM
reames requested review of this revision.Dec 8 2021, 1:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 1:03 PM
Herald added a subscriber: aheejin. · View Herald Transcript
nikic added a comment.Dec 8 2021, 1:27 PM

Canonicalizing this sounds generally reasonable, but why towards i64 rather than i32? We require i32 for shuffle masks, and it's the canonical type for struct indices, so it seems like the more natural choice, and would probably result in less test diffs.

nikic added a comment.Dec 8 2021, 1:36 PM

Hm, I see now that IRBuilder will use an i64 by default, which is presumably the reason for the choice.

reames added a comment.Dec 8 2021, 1:39 PM

Hm, I see now that IRBuilder will use an i64 by default, which is presumably the reason for the choice.

Actually, the choice was much more arbitrary than that. I tried both, and saw huge diffs with each. Given that, I went with what I tend to write.

I could argue that this matches the canonicalization for GEP indices on most 64 bit targets, but really, I think this is pretty arbitrary. If we were doing non-constants, we'd probably care a lot more about the choice, but with only constants we really shouldn't need to care as all the actual values are going to be extremely small (and thus representable with anything the target chooses pretty easily).

nikic added a comment.Dec 9 2021, 12:14 AM

Okay, in that case this looks good to me, we don't seem to have any clearly preferred type here. Given the test churn, waiting for a second opinion would be good.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
382

Something like auto *NewIdx = ConstantInt::get(IndexC->getContext(), IndexC->getValue().zextOrTrunc(64)) might be more elegant?

Whatever we choose will be wrong anyway so let's go for i32 unless we actually need those uppper 32 bits to be usable.

spatel added a comment.Dec 9 2021, 8:59 AM

Whatever we choose will be wrong anyway so let's go for i32 unless we actually need those uppper 32 bits to be usable.

If no target has a chance of codegen'ing anything close to that limit, I'd go for i32. For example, this may not finish in any practical timeframe with llc?

define void @f(<1048576 x i8>* %x) {
  %v = load <1048576 x i8>, <1048576 x i8>* %x
  %add = add <1048576 x i8> %v, %v
  store <1048576 x i8> %add, <1048576 x i8>* %x
  ret void
}

No matter what value is chosen, we should make a helper function and/or give the bitwidth a name, so we're not using a magic constant in multiple places.

reames added a comment.Dec 9 2021, 9:50 AM

Sounds like the consensus is i32? I'll give it another day for further discussion, and will then update the patch if that's the direction we want to go in.

Sounds like the consensus is i32? I'll give it another day for further discussion, and will then update the patch if that's the direction we want to go in.

Would we update IRBuilder to use i32 and perhaps change the interface from uint64_t to unsigned?

Sounds like the consensus is i32? I'll give it another day for further discussion, and will then update the patch if that's the direction we want to go in.

Would we update IRBuilder to use i32 and perhaps change the interface from uint64_t to unsigned?

Can we not increase the scope of the patch? This is purely a minor cleanup as far as I'm concerned, and if it grows too much, I'll simply abandon it as not worth the effort.

I am particularly concerns about changing argument types of the public API given possibility for silent downcast.

Sounds like the consensus is i32? I'll give it another day for further discussion, and will then update the patch if that's the direction we want to go in.

Would we update IRBuilder to use i32 and perhaps change the interface from uint64_t to unsigned?

Can we not increase the scope of the patch? This is purely a minor cleanup as far as I'm concerned, and if it grows too much, I'll simply abandon it as not worth the effort.

I am particularly concerns about changing argument types of the public API given possibility for silent downcast.

I guess my point was that it seems silly to have IRBuilder make something that InstCombine will always change. So maybe that's a vote for using i64 and not i32?

Can we go with I64 since some canonicalization is better than nothing, and if someone cares about making it I32, do that in a separate patch? Aside from some large test churn, I don't really see any downside to changing this (arbitrary) choice later.

Can we go with I64 since some canonicalization is better than nothing, and if someone cares about making it I32, do that in a separate patch? Aside from some large test churn, I don't really see any downside to changing this (arbitrary) choice later.

No objection to i64 from me.

No objection to i64 from me.

reames updated this revision to Diff 393946.Dec 13 2021, 9:50 AM

Incorporate style suggestion. Once this is LGTMed, I'll autogen all the relevant tests and rebase once more before commit.

This revision is now accepted and ready to land.Dec 13 2021, 9:54 AM
lebedev.ri accepted this revision.Dec 13 2021, 9:55 AM

lg

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
370

Pull out the repeated 64 constant?

reames abandoned this revision.Dec 13 2021, 12:44 PM

I'm abandoning work on this. Sorry for doing that for an already approved patch, but the work required to finish the last update on the test turns out to be significantly more than I'd thought, and just isn't worth it.

The clang tests effected by this seem to be primarily intrinsic tests. These tests don't autogenerate correctly, despite claiming to be autogened. A single attempt at autogenning them takes more than an hour of runtime. The tests check both IR and assembly. Each test contains hundreds of sub-tests, so updating them by hand is not really an option.

IMHO, these intrinsic tests should not be in tree. I'm frustrated enough at the moment to just delete them, but that hardly seems like a constructive response. Given that, I'm just going to walk away since this patch was solely motivated by a desire to cleanup from the beginning.

I'm abandoning work on this. Sorry for doing that for an already approved patch, but the work required to finish the last update on the test turns out to be significantly more than I'd thought, and just isn't worth it.

The clang tests effected by this seem to be primarily intrinsic tests. These tests don't autogenerate correctly, despite claiming to be autogened. A single attempt at autogenning them takes more than an hour of runtime. The tests check both IR and assembly. Each test contains hundreds of sub-tests, so updating them by hand is not really an option.

IMHO, these intrinsic tests should not be in tree. I'm frustrated enough at the moment to just delete them, but that hardly seems like a constructive response. Given that, I'm just going to walk away since this patch was solely motivated by a desire to cleanup from the beginning.

Can you point to some examples of the affected tests?

Can you point to some examples of the affected tests?

Here are the ones which still fail after using update_cc_test_checks.py

Clang :: CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
Clang :: CodeGen/SystemZ/builtins-systemz-zvector.c
Clang :: CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
Clang :: CodeGen/SystemZ/builtins-systemz-zvector2.c
Clang :: CodeGen/X86/avx-shuffle-builtins.c
Clang :: CodeGen/aarch64-bf16-ldst-intrinsics.c
Clang :: CodeGen/aarch64-neon-vcmla.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_abdlb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_abdlt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_adclb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_adclt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addhnb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addhnt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addlb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addlbt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addlt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addwb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_addwt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bcax.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bdep.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bext.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bgrp.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bsl.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bsl1n.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_bsl2n.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_eor3.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_eorbt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_eortb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_hadd.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_hsub.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_hsubr.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mlalb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mlalt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mlslb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mlslt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mullb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_mullt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_nbsl.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_pmul.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_pmullb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_pmullb_128.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_pmullt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_pmullt_128.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qadd.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlalb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlalbt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlalt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlslb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlslbt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmlslt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmulh.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmullb.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qdmullt.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qrdmlah.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qrdmlsh.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qrdmulh.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qrshl.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qshl.c
Clang :: CodeGen/aarch64-sve2-intrinsics/acle_sve2_qsub.c

Ignore my rant above. Had lunch and rechecked history, and noticed user error on my part. The tests are still very slow, but it's possible the auto-update will work when run properly. We'll see.

reames reclaimed this revision.Dec 13 2021, 4:30 PM

Reopening as I've gotten all the tests worked through.

This revision is now accepted and ready to land.Dec 13 2021, 4:30 PM
This revision was landed with ongoing or failed builds.Dec 13 2021, 4:57 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2021, 4:57 PM