Page MenuHomePhabricator

[Bitcode] Support expanding constant expressions into instructions

Authored by nikic on Jun 14 2022, 2:32 AM.



This implements an autoupgrade from constant expressions to instructions, which is needed for

The basic approach is that constant expressions (CST_CODE_CE_* records) now initially only create a BitcodeConstant value that holds opcode, flags and operands IDs. Then, when the value actually gets used, it can be converted either into a constant expression (if that expression type is still supported) or into a sequence of instructions. As currently all expressions are still supported, -expand-constant-exprs is added for testing purposes, to force expansion.

PHI nodes require special handling, because the constant expression needs to be evaluated on the incoming edge. We do this by putting it into a temporary block and then wiring it up appropriately afterwards (for non-critical edges, we could also move the instructions into the predecessor).

This also removes the need for the forward referenced constants machinery, as the BitcodeConstants only hold value IDs. At the point where the value is actually materialized, no forward references are needed anymore.

Diff Detail

Event Timeline

nikic created this revision.Jun 14 2022, 2:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 2:32 AM
nikic requested review of this revision.Jun 14 2022, 2:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2022, 2:32 AM
nikic updated this revision to Diff 437860.Jun 17 2022, 5:44 AM
nikic edited the summary of this revision. (Show Details)

Fix one of the uselistorder issues, improve error message.

@dexonsmith I'm really struggling to get the uselistorder support working with this patch. The key change here is that constant expressions are materialized on first use, rather than all at once when parsing the constant block. I think I fixed the case where constant expressions are referenced by instructions correctly, but despite trying different variants for a few hours, I just can't figure out how to handle expressions in global initializers correctly. Do you have any pointers?

This seems challenging indeed. (To clarify, there’s never a need to worry about use-list-order when an upgrade is happening, since use-lists can be dropped when there’s a mismatch, but since all constants have changed materialization order that affects the non-upgrade case too.)

The key is to understand the order that constant expressions will be created, and therefore what the “natural” use-list order would be. What might help to debug this would be to sort use-lists before serialization, and dump them after deserialization, to track the mutations. Once you understand what’s happening it should be easier to add/change the logic to predict the order.

It might be okay to add an extra pass to serialization (only when use-lists are being written), as long as it it’s still relatively cheap. Not sure if that’s helpful or not.

BTW, it’s not easy for me to dig in myself for the next couple of months (don’t have ready access to a development machine) but happy to iterate on ideas / think things through.

nikic updated this revision to Diff 438364.Jun 20 2022, 6:06 AM
nikic edited the summary of this revision. (Show Details)

Fix uselistorder preservation. In the end, this largely amounted to removing a lot of the special handling around global values we previously had. In orderModule(), we now visit globals in the same order they get initialized (which is reverse order) and also visit the initializers while doing so. Additionally, we now go through BitcodeConstant for all non-leaf constants, even those that wouldn't strictly require it (no_cfi, dso_local and blockaddress). This makes sure that we don't get a different uselistorder for just these special cases, which would get really awkward to handle.

There is still a remaining issue in this patch, because there are some getValueFwdRef() calls from MetadataLoader, which also need to materialize constant expressions.

Nice; glad this was a simplification in the end. I had a quick look and the new logic looks right to me.

I’m curious: is the metadata loader issue related to use-list order, or independent?

nikic updated this revision to Diff 438380.Jun 20 2022, 7:08 AM

I’m curious: is the metadata loader issue related to use-list order, or independent?

Both :) I just fixed the general issue, and am now left with the use-list order issue only. I think we should be ordering MetadataAsValue uses as part of functions now (rather than globally) and also predict use lists for them, but I haven't figured out the details yet.

nikic updated this revision to Diff 438414.Jun 20 2022, 8:33 AM
nikic edited the summary of this revision. (Show Details)
nikic added reviewers: aeubanks, dexonsmith.

Okay, the metadata use-list order issue is fixed now as well, so this patch should be ready. The extra subtlety there is that function metadata is decoded in one go before the function instructions, rather than interleaved as it gets used. Additionally it's necessary to explicitly visit these operands in predictValueUseListOrder(), otherwise we might not emit a uselistorder directive at all.

I’ll leave the main review to @aeubanks (SGTM but I don’t have time right now to think through in detail), but I looked at the metadata and use-list order changes and they LGTM. Thanks for working through the issues!

Most of this looks good to me, I just have a few nits and one bigger question about phi handling inline.


Suggest to make it easier to understand for somebody who lacks context.


Shouldn't these be private?


Could be an if/else without a break so that all required values are pushed to the worklist immediately, which is slightly more convenient for the common case where the constant expression overall is a tree. But I don't feel strongly about it.


I don't understand what Args is trying to achieve here. Each predecessor basic block should only appear once, so the Args.find(BB) should never be successful -- right?

I *think* perhaps what you're trying to do is optimize the case where multiple phi nodes have the same incoming value for the same basic block? But then the Args map would have to have bigger scope and look slightly different.

Speaking of, shouldn't the PhiConstExprBB be cached somehow so that multiple phi nodes use the same edge basic blocks?

nikic updated this revision to Diff 438673.Jun 21 2022, 6:10 AM
nikic marked 2 inline comments as done.

Address review comments, and in particular fix handling of multiple phis in the same block.

nikic marked an inline comment as done.Jun 21 2022, 6:11 AM
nikic added inline comments.

These are directly used in the materializeValue() code. It didn't seem worthwhile to make these private and add getters for something that is private to the bitcode reader anyway.


Good point, done. A subtlety here is that we now need to iterate in reverse order (and then reverse Ops) to make sure the worklist entries are processed in correct order. Otherwise we'd expand the last operand first.


A predecessor can appear multiple times in a phi block, but must have the same incoming value each time. This most commonly happens with switch terminators (see @test_phi_multiple_identical_predecessors). I've made the comment for this a bit more detailed.

And you're absolutely right, I wasn't handling the case of multiple phis correctly. I've changes this to reuse the same block for all expressions on one edge (@test_phi_multiple_nodes).

Thanks. This looks pretty good to me, but I'm not familiar with the bitcode reader so perhaps somebody else can still take a look.


Huh, I didn't realize duplicate predecessors were allowed. Weird, but okay :)

nikic marked an inline comment as done.
nikic added a subscriber: efriedma.

Thanks. This looks pretty good to me, but I'm not familiar with the bitcode reader so perhaps somebody else can still take a look.

Thanks for the review!

Don't think we have a lot of people familiar with the bitcode reader, maybe @aeubanks or @efriedma can take a look?

dexonsmith accepted this revision.Jun 27 2022, 6:02 AM

BitcodeReader LGTM.

This revision is now accepted and ready to land.Jun 27 2022, 6:02 AM
MaskRay added inline comments.

Omit cl::init(false),

/home/maskray/llvm/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:3235:12: error: non-constant-expression cannot be narrowed from type 'llvm::Instruction::OtherOps' to 'uint8_t' (aka 'unsigned char') in initializer list [-Wc++11-narrowing]
          {OpTy->isFPOrFPVectorTy() ? Instruction::FCmp : Instruction::ICmp,

Add a uint8_t cast

MaskRay added inline comments.Jun 27 2022, 11:52 PM

Unused now? warning: private field 'Context' is not used [-Wunused-private-field]

nikic updated this revision to Diff 440510.Jun 28 2022, 12:43 AM
nikic marked 3 inline comments as done.

Drop cl::init, fix warnings.

This revision was landed with ongoing or failed builds.Jun 28 2022, 2:10 AM
This revision was automatically updated to reflect the committed changes.
bkramer added inline comments.

I found a 2013 vintage CUDA bitcode file that relies on this upgrader. It does

@unrollpragma = private addrspace(1) constant [17 x i8] c"#pragma unroll 1\00"
bitcast @unrollpragma to i8*

Bitcasting away addrspace has been disallowed since the advent of addrspacecast. Was this backwards compat dropped accidentally or is it time to drop this upgrader?

If we want the latter we can also remove it from AutoUpgrade.cpp.

nikic added inline comments.Jun 29 2022, 5:17 AM

This wasn't intentional. Can you please check whether works with your bitcode file?

bkramer added inline comments.Jun 29 2022, 5:30 AM

That fixes the assert failure I was seeeing. Thanks.

nikic added inline comments.Jun 29 2022, 5:39 AM

Thanks for the quick confirmation, I've landed this as

You might want to add your file as a bitcode test, if that makes sense from a license / size perspective. That would make sure it doesn't break in the future :)