This is an archive of the discontinued LLVM Phabricator instance.

[Bitcode, Type] Assign deterministic IDs to unnamed types at creation time.
AbandonedPublic

Authored by fhahn on Jun 25 2018, 4:01 AM.

Details

Summary

Currently unnamed types cause problems for overloaded intrinsics like
ssa_copy, because different unnamed types get mangled to the same
string.

This patch introduces an additional map to LLVMContextImpl, which keeps
track of IDs to use when mangling for unnamed types. The IDs are
assigned at type creation time. For textual IR, those types are created
by order of appearance in the LL file. This change also changes the
bitcode writer to write unnamed types in the order they were created,
before other types. This ensures IDs are assigned in the same fashion in
both cases.

Diff Detail

Event Timeline

fhahn created this revision.Jun 25 2018, 4:01 AM

Please add a testcase to make sure we can correctly serialize/deserialize IR containing calls to ssa_copy. I don't think your current approach will work in that case; assigning the identifiers on demand means we won't choose the same identifiers the next time the IR is loaded.

fhahn planned changes to this revision.Jun 26 2018, 2:56 PM

Thanks Eli! If we serialize a declaration generated in such a way and then de-serialize it again a different pass might generate names in a different order, causing problems. I'll try to address that tomorrow.

fhahn added a comment.Jun 27 2018, 9:19 AM

Hm when serializing to bitcode, the order the types are written is based on their uses in functions. That means when reading a bitcode file, the type objects may get created in a different order than defined in the LL file and this is a problem for generating IDs deterministically "on the side".

Currently I am not sure what a good approach/solution would be. Any thoughts/suggestions would be greatly appreciated!

Maybe we could set names for the types in question? It's kind of awkward, but it should work, I think...

Alternatively, you could avoid generating ssa_copy calls which require mangling non-literal structs; assuming you don't need to ssa_copy first-class aggregates, struct types only show up in pointers, and you can always bitcast pointers to i8*.

fhahn added a comment.Jun 28 2018, 8:09 AM

Maybe we could set names for the types in question? It's kind of awkward, but it should work, I think...

Alternatively, you could avoid generating ssa_copy calls which require mangling non-literal structs; assuming you don't need to ssa_copy first-class aggregates, struct types only show up in pointers, and you can always bitcast pointers to i8*.

Thanks! I think I'll try to go with that approach for now, as it seems to have a much smaller impact. "All" that should be needed is teaching PredicateInfo's users to look through bitcasts in such cases I think.

fhahn updated this revision to Diff 154211.Jul 5 2018, 5:47 AM
fhahn retitled this revision from [Function] Use deterministic IDs when mangling unnamed types. to [Bitcode, Type] Assign deterministic IDs to unnamed types at creation time..
fhahn edited the summary of this revision. (Show Details)
fhahn added a reviewer: reames.

I've updated the patch to generate the IDs for unnamed types at creation time and also updated the bitcode writer to write the unamed types first, in the order they were created. This way, the IDs matche between serializing and de-serializing.

This change changes the order types are written to the bytecode file, but that should be a backwards compatible change, because the order only affects mangling with this patch.

efriedma added inline comments.Jul 5 2018, 12:42 PM
lib/Bitcode/Writer/ValueEnumerator.cpp
330

Changing the order we emit the types should be fine. I'm a little concerned this will make us emit unused types in bitcode.

fhahn added inline comments.Jul 6 2018, 11:28 AM
lib/Bitcode/Writer/ValueEnumerator.cpp
330

Ah yes, thanks! I'll update the patch, I am just looking for the best way to order the unnamed types as needed while also preserving the required order for type references, which seems slightly tricky.

chandlerc requested changes to this revision.Jul 6 2018, 6:43 PM

I think this approach is fundamentally flawed. It doesn't achieve its stated goal in full generality.

If I load two modules into the same context and write them to bitcode, this will produce different bitcode than if each module were loaded into a fresh context and written to bitcode. I don't think we want that.

I actually think the whole idea of *any* of the canonical storage for this coming from the context is fundamentally and deeply flawed.

Either the order in which types are created *must not* be observable (IE, we should make it a hard error to mangle them into a function type, which would have prevented the issue that led to this patch in the first place) or we must move the types to be owned by the *module*, not the context. While I'm actually a fan of this (I really dislike the context owning *anything* serialized and deserialized in bitcode) I think it is pretty disruptive change. I think it would be much simpler to just firmly block this from mattering.

This revision now requires changes to proceed.Jul 6 2018, 6:43 PM
fhahn abandoned this revision.Jul 9 2018, 11:51 AM

Thanks @chandlerc, I think it is too fragile and I want to avoid unnecessary changes here. To get PredicateInfo working with unnamed types, it is probably easier to use a custom mangling for ssa_copy there; we can rely on the order we create ssa_copy calls and clean up all declarations we created after the predicateinfo is destroyed.

fhahn added a comment.Jul 10 2018, 3:55 AM

I have created D49126 to deal with this problem locally in PredicateInfo, which seems the only place where this is currently causing any problems. It is easier to do there, as we can clean up any ssa_copy calls and declarations when destroying PredicateInfo. I also raised https://bugs.llvm.org/show_bug.cgi?id=38117 for a general solution.