This is an archive of the discontinued LLVM Phabricator instance.

[Bitcode, Type] Assign deterministic IDs to unnamed types at creation time.
Abandoned · Public

Authored by fhahn on Jun 25 2018, 4:01 AM.

Details

Reviewers

davide
chandlerc
craig.topper
rnk
reames

Summary

Currently unnamed types cause problems for overloaded intrinsics like
ssa_copy, because different unnamed types get mangled to the same
string.

This patch introduces an additional map to LLVMContextImpl, which keeps
track of IDs to use when mangling for unnamed types. The IDs are
assigned at type creation time. For textual IR, those types are created
by order of appearance in the LL file. This change also changes the
bitcode writer to write unnamed types in the order they were created,
before other types. This ensures IDs are assigned in the same fashion in
both cases.
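The collision and the proposed fix can be illustrated with a small standalone model. This is only a sketch: `ToyStructType`, `ToyContext`, `mangleOld` and `mangleNew` are hypothetical names, not LLVM APIs, and the model assumes IDs are handed out from a single counter when an unnamed type is created, as the summary describes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a non-literal struct type (NOT llvm::StructType).
struct ToyStructType {
  std::string Name; // empty means "unnamed"
};

// Toy stand-in for the per-context state the patch extends (hypothetical).
class ToyContext {
public:
  // Create a type; an unnamed type receives the next ID at creation time.
  ToyStructType *create(std::string Name = "") {
    auto Owned = std::make_unique<ToyStructType>();
    Owned->Name = std::move(Name);
    ToyStructType *T = Owned.get();
    Types.push_back(std::move(Owned));
    if (T->Name.empty())
      AnonIDs.emplace(T, NextAnonID++);
    return T;
  }

  // Scheme without IDs: every unnamed type mangles to "s_s".
  std::string mangleOld(const ToyStructType *T) const {
    return "s_" + T->Name + "s";
  }

  // Scheme with IDs: unnamed types contribute their creation-order ID.
  std::string mangleNew(const ToyStructType *T) const {
    if (!T->Name.empty())
      return "s_" + T->Name + "s";
    auto It = AnonIDs.find(T);
    assert(It != AnonIDs.end() && "type was not created by this context");
    return "s_" + std::to_string(It->second) + "s";
  }

private:
  std::vector<std::unique_ptr<ToyStructType>> Types;
  std::unordered_map<const ToyStructType *, unsigned> AnonIDs;
  unsigned NextAnonID = 0;
};
```

In this model two distinct unnamed types produce the same string under `mangleOld`, while `mangleNew` tells them apart by the order in which they were created.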

Event Timeline

fhahn created this revision. · Jun 25 2018, 4:01 AM

Herald added a subscriber: mgrang. · Jun 25 2018, 4:01 AM

fhahn mentioned this in rL333740: Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp…. · Jun 25 2018, 4:02 AM

fhahn mentioned this in D45330: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions..

Please add a testcase to make sure we can correctly serialize/deserialize IR containing calls to ssa_copy. I don't think your current approach will work in that case; assigning the identifiers on demand means we won't choose the same identifiers the next time the IR is loaded.

Thanks Eli! If we serialize a declaration generated in such a way and then de-serialize it again a different pass might generate names in a different order, causing problems. I'll try to address that tomorrow.

Hm, when serializing to bitcode, the order in which the types are written is based on their uses in functions. That means that when reading a bitcode file, the type objects may get created in a different order than they are defined in the LL file, which is a problem for generating IDs deterministically "on the side".
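A minimal sketch of why first-seen numbering is sensitive to order. `OnDemandIDs` and `idAfterVisiting` are hypothetical helpers, and string keys stand in for type objects; the model only assumes that an ID is assigned the first time a type is encountered.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Assigns an ID the first time a key is seen (first-seen numbering).
class OnDemandIDs {
  std::unordered_map<std::string, unsigned> IDs;

public:
  unsigned get(const std::string &Key) {
    // The size is read before insertion, so a new key gets the next free ID;
    // an existing key keeps the ID it already has.
    auto Res = IDs.emplace(Key, static_cast<unsigned>(IDs.size()));
    return Res.first->second;
  }
};

// Returns the ID of Query after the types in Order were encountered in turn.
unsigned idAfterVisiting(const std::vector<std::string> &Order,
                         const std::string &Query) {
  assert(!Order.empty() && "expected at least one type");
  OnDemandIDs Map;
  for (const std::string &T : Order)
    Map.get(T);
  return Map.get(Query);
}
```

If a textual file defines the types in the order A, B but a reader encounters them as B, A, the same type ends up with a different ID, and therefore with a different mangled name.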

Currently I am not sure what a good approach/solution would be. Any thoughts/suggestions would be greatly appreciated!

Maybe we could set names for the types in question? It's kind of awkward, but it should work, I think...

Alternatively, you could avoid generating ssa_copy calls which require mangling non-literal structs; assuming you don't need to ssa_copy first-class aggregates, struct types only show up in pointers, and you can always bitcast pointers to i8*.

In D48541#1145433, @efriedma wrote:

Maybe we could set names for the types in question? It's kind of awkward, but it should work, I think...

Alternatively, you could avoid generating ssa_copy calls which require mangling non-literal structs; assuming you don't need to ssa_copy first-class aggregates, struct types only show up in pointers, and you can always bitcast pointers to i8*.

Thanks! I think I'll try to go with that approach for now, as it seems to have a much smaller impact. "All" that should be needed is teaching PredicateInfo's users to look through bitcasts in such cases I think.

I've updated the patch to generate the IDs for unnamed types at creation time and also updated the bitcode writer to write the unnamed types first, in the order they were created. This way, the IDs match between serializing and de-serializing.

This changes the order in which types are written to the bitcode file, but that should be backwards compatible, because with this patch the order only affects mangling.

efriedma added inline comments. · Jul 5 2018, 12:42 PM

lib/Bitcode/Writer/ValueEnumerator.cpp
330 ↗	(On Diff #154211)	Changing the order we emit the types should be fine. I'm a little concerned this will make us emit unused types in bitcode.

fhahn added inline comments. · Jul 6 2018, 11:28 AM

lib/Bitcode/Writer/ValueEnumerator.cpp
330 ↗	(On Diff #154211)	Ah yes, thanks! I'll update the patch, I am just looking for the best way to order the unnamed types as needed while also preserving the required order for type references, which seems slightly tricky.

I think this approach is fundamentally flawed. It doesn't achieve its stated goal in full generality.

If I load two modules into the same context and write them to bitcode, this will produce different bitcode than if each module were loaded into a fresh context and written to bitcode. I don't think we want that.
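This objection can be modelled in a few lines. The sketch below is hypothetical (`SharedCounterContext` and `loadModule` are not LLVM names) and assumes only that unnamed-type IDs come from one counter owned by the context.

```cpp
#include <cassert>
#include <vector>

// Toy context: one context-wide counter for unnamed-type IDs.
struct SharedCounterContext {
  unsigned NextAnonID = 0;
};

// "Loads" a module that defines NumUnnamedTypes unnamed types and returns
// the IDs those types received from the context.
std::vector<unsigned> loadModule(SharedCounterContext &Ctx,
                                 unsigned NumUnnamedTypes) {
  std::vector<unsigned> IDs;
  for (unsigned I = 0; I < NumUnnamedTypes; ++I)
    IDs.push_back(Ctx.NextAnonID++);
  assert(IDs.size() == NumUnnamedTypes);
  return IDs;
}
```

A module loaded into a fresh context gets IDs starting at 0, whereas the same module loaded after another one into a shared context gets IDs offset by whatever was created before it. Anything derived from those IDs, such as mangled names or the order types are written, then depends on what else lives in the context.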

I actually think the whole idea of *any* of the canonical storage for this coming from the context is fundamentally and deeply flawed.

Either the order in which types are created *must not* be observable (i.e., we should make it a hard error to mangle them into a function type, which would have prevented the issue that led to this patch in the first place) or we must move the types to be owned by the *module*, not the context. While I'm actually a fan of this (I really dislike the context owning *anything* serialized and deserialized in bitcode), I think it is a pretty disruptive change. I think it would be much simpler to just firmly block this from mattering.

This revision now requires changes to proceed. · Jul 6 2018, 6:43 PM

Thanks @chandlerc, I think it is too fragile and I want to avoid unnecessary changes here. To get PredicateInfo working with unnamed types, it is probably easier to use a custom mangling for ssa_copy there; we can rely on the order we create ssa_copy calls and clean up all declarations we created after the predicateinfo is destroyed.

fhahn mentioned this in D49126: [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.. · Jul 10 2018, 3:53 AM

I have created D49126 to deal with this problem locally in PredicateInfo, which seems the only place where this is currently causing any problems. It is easier to do there, as we can clean up any ssa_copy calls and declarations when destroying PredicateInfo. I also raised https://bugs.llvm.org/show_bug.cgi?id=38117 for a general solution.
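The idea of keeping the numbering local and cleaning up afterwards can be sketched as follows. This is not the code from D49126: `ToyModule`, `ToyPredicateInfo`, `getOrCreateCopyDecl` and the `toy.ssa.copy.N` naming are all hypothetical, and opaque pointers stand in for unnamed types.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>

// Toy module: just the set of declared function names.
using ToyModule = std::set<std::string>;

// Toy analysis that numbers unnamed types locally and removes every
// declaration it created when it is destroyed.
class ToyPredicateInfo {
  ToyModule &M;
  std::unordered_map<const void *, unsigned> LocalIDs;
  std::set<std::string> Created;

public:
  explicit ToyPredicateInfo(ToyModule &Mod) : M(Mod) {}

  // Clean up, so locally mangled names never outlive the analysis.
  ~ToyPredicateInfo() {
    for (const std::string &Name : Created)
      M.erase(Name);
  }

  // Returns the name of the copy declaration for the given unnamed type,
  // creating the declaration on first use.
  std::string getOrCreateCopyDecl(const void *UnnamedType) {
    assert(UnnamedType && "expected a type");
    unsigned ID =
        LocalIDs
            .emplace(UnnamedType, static_cast<unsigned>(LocalIDs.size()))
            .first->second;
    std::string Name = "toy.ssa.copy." + std::to_string(ID);
    if (M.insert(Name).second)
      Created.insert(Name);
    return Name;
  }
};
```

Because the declarations are gone once the object is destroyed, the locally chosen numbers never reach serialized IR in this model, which is why the order in which they were assigned does not need to be reproducible.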

fhahn mentioned this in rL337828: [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.. · Jul 24 2018, 7:50 AM

fhahn mentioned this in D91250: Support intrinsic overloading on unnamed types. · Nov 11 2020, 4:24 AM

jeroen.dobbelaere mentioned this in D91661: Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.. · Nov 17 2020, 2:28 PM

jeroen.dobbelaere mentioned this in rG77080a1eb606: Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with…. · Mar 20 2021, 3:39 AM

Revision Contents

Path	Size
lib/IR/Function.cpp	17 lines
lib/IR/LLVMContextImpl.h	7 lines
test/Transforms/Util/PredicateInfo/unnamed-types.ll	36 lines

Diff 152653

lib/IR/Function.cpp

//===- Function.cpp - Implement the Global object classes -----------------===//		//===- Function.cpp - Implement the Global object classes -----------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the Function class for the IR library.		// This file implements the Function class for the IR library.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "LLVMContextImpl.h"
#include "SymbolTableListTraitsImpl.h"		#include "SymbolTableListTraitsImpl.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
if (PointerType* PTyp = dyn_cast<PointerType>(Ty)) {		if (PointerType* PTyp = dyn_cast<PointerType>(Ty)) {
Result += "p" + utostr(PTyp->getAddressSpace()) +		Result += "p" + utostr(PTyp->getAddressSpace()) +
getMangledTypeStr(PTyp->getElementType());		getMangledTypeStr(PTyp->getElementType());
} else if (ArrayType* ATyp = dyn_cast<ArrayType>(Ty)) {		} else if (ArrayType* ATyp = dyn_cast<ArrayType>(Ty)) {
Result += "a" + utostr(ATyp->getNumElements()) +		Result += "a" + utostr(ATyp->getNumElements()) +
getMangledTypeStr(ATyp->getElementType());		getMangledTypeStr(ATyp->getElementType());
} else if (StructType *STyp = dyn_cast<StructType>(Ty)) {		} else if (StructType *STyp = dyn_cast<StructType>(Ty)) {
if (!STyp->isLiteral()) {		if (!STyp->isLiteral()) {
Result += "s_";		Result += "s_";
		if (STyp->hasName())
Result += STyp->getName();		Result += STyp->getName();
		else {
		// Different unnamed struct types should get mangled to different names,
		// so we keep a mapping of types to ids to get deterministic naming.
		// The mapping is populated on demand here.
		DenseMap<StructType *, unsigned> &AnonIDs =
		STyp->getContext().pImpl->AnonStructTypesIDs;
		unsigned &AnonID = STyp->getContext().pImpl->AnonStructTypesUniqueID;
		auto I = AnonIDs.insert({STyp, AnonID});
		if (I.second)
		AnonID++;

		Result += Twine(I.first->second).str();
		}
} else {		} else {
Result += "sl_";		Result += "sl_";
for (auto Elem : STyp->elements())		for (auto Elem : STyp->elements())
Result += getMangledTypeStr(Elem);		Result += getMangledTypeStr(Elem);
}		}
// Ensure nested structs are distinguishable.		// Ensure nested structs are distinguishable.
Result += "s";		Result += "s";
} else if (FunctionType *FT = dyn_cast<FunctionType>(Ty)) {		} else if (FunctionType *FT = dyn_cast<FunctionType>(Ty)) {

lib/IR/LLVMContextImpl.h

#include "llvm/IR/Metadata.def"		#include "llvm/IR/Metadata.def"
DenseMap<unsigned, IntegerType*> IntegerTypes;		DenseMap<unsigned, IntegerType*> IntegerTypes;

using FunctionTypeSet = DenseSet<FunctionType *, FunctionTypeKeyInfo>;		using FunctionTypeSet = DenseSet<FunctionType *, FunctionTypeKeyInfo>;
FunctionTypeSet FunctionTypes;		FunctionTypeSet FunctionTypes;
using StructTypeSet = DenseSet<StructType *, AnonStructTypeKeyInfo>;		using StructTypeSet = DenseSet<StructType *, AnonStructTypeKeyInfo>;
StructTypeSet AnonStructTypes;		StructTypeSet AnonStructTypes;
StringMap<StructType*> NamedStructTypes;		StringMap<StructType*> NamedStructTypes;
unsigned NamedStructTypesUniqueID = 0;		unsigned NamedStructTypesUniqueID = 0;

		/// Mapping from unnamed struct types to a deterministic ID for mangling.
		/// It is populated on demand.
		DenseMap<StructType *, unsigned> AnonStructTypesIDs;
		unsigned AnonStructTypesUniqueID = 0;

DenseMap<std::pair<Type *, uint64_t>, ArrayType*> ArrayTypes;		DenseMap<std::pair<Type *, uint64_t>, ArrayType*> ArrayTypes;
DenseMap<std::pair<Type *, unsigned>, VectorType*> VectorTypes;		DenseMap<std::pair<Type *, unsigned>, VectorType*> VectorTypes;
DenseMap<Type*, PointerType*> PointerTypes; // Pointers in AddrSpace = 0		DenseMap<Type*, PointerType*> PointerTypes; // Pointers in AddrSpace = 0
DenseMap<std::pair<Type*, unsigned>, PointerType*> ASPointerTypes;		DenseMap<std::pair<Type*, unsigned>, PointerType*> ASPointerTypes;

/// ValueHandles - This map keeps track of all of the value handles that are		/// ValueHandles - This map keeps track of all of the value handles that are
/// watching a Value*. The Value::HasValueHandle bit is used to know		/// watching a Value*. The Value::HasValueHandle bit is used to know
/// whether or not a value has an entry in this map.		/// whether or not a value has an entry in this map.

test/Transforms/Util/PredicateInfo/unnamed-types.ll

This file was added.

				; RUN: opt < %s -print-predicateinfo 2>&1 | FileCheck %s

				; Check we can use ssa.copy with unnamed types.

				; CHECK-LABEL: bb:
				; CHECK: Has predicate info
				; CHECK: branch predicate info { TrueEdge: 1 Comparison: %cmp1 = icmp ne %1* %arg, null Edge: [label %bb,label %bb1] }
				; CHECK-NEXT: %arg.0 = call %1* @llvm.ssa.copy.p0s_{{.+}}s(%1* %arg)

				; CHECK-LABEL: bb1:
				; CHECK: Has predicate info
				; CHECK-NEXT: branch predicate info { TrueEdge: 0 Comparison: %cmp2 = icmp ne %0* null, %tmp Edge: [label %bb1,label %bb3] }
				; CHECK-NEXT: %tmp.0 = call %0* @llvm.ssa.copy.p0s_{{.+}}s(%0* %tmp)

				%0 = type opaque
				%1 = type opaque

				declare i8* @fun(%1*)

				define void @f0(%0* %arg, %1* %tmp) {
				bb:
				%cmp1 = icmp ne %0* %arg, null
				br i1 %cmp1, label %bb1, label %bb2

				bb1: ; preds = %bb
				%cmp2 = icmp ne %1* null, %tmp
				br i1 %cmp2, label %bb2, label %bb3

				bb2: ; preds = %bb
				ret void

				bb3: ; preds = %bb
				call i8* @fun(%1* %tmp)
				%tmp2 = bitcast %0* %arg to i8*
				ret void
				}