This is an archive of the discontinued LLVM Phabricator instance.

Always show template parameters in IR type names
AbandonedPublic

Authored by sepavloff on Nov 28 2017, 10:10 AM.

Details

Summary

If a module contains opaque type and another one contains definition
of equivalent type in sense of C++, llvm-link merges these types.
In this procedure it can rely only on type name. An issue arises if
the type is a class template specialization. See the example.

File 1

template<typename T> struct ABC;
ABC<int> *var_1;

File 2

template<typename T> struct ABC {
  T *f;
};
extern ABC<int> var_1;
ABC<int*> var_2;

llvm-link produces module:

%struct.ABC = type { i32** }
@var_1 = global %struct.ABC* null, align 8
@var_2 = global %struct.ABC zeroinitializer, align 8

Incomplete type ABC<int> from the first module becomes ABC<int*>. It
happens because structure types obtained from template specialization
share the same type name and differ from each other only by version
suffixes, in the example the types are named as ABC and ABC.0.
llvm-link cannot correctly merge these types.

This change enables template arguments in class template specializations.
Clang already prints them in class context, now they will appear in the
class name itself.

Diff Detail

Event Timeline

sepavloff created this revision.Nov 28 2017, 10:10 AM
rsmith edited edge metadata.Nov 29 2017, 5:54 PM

What actual problem is this solving? These IR type names exist only for the convenience of humans reading the IR, and making them long (potentially extremely long) by including template arguments seems likely to hinder that more than it helps.

What actual problem is this solving? These IR type names exist only for the convenience of humans reading the IR, and making them long (potentially extremely long) by including template arguments seems likely to hinder that more than it helps.

There is a case when IR type names are essential. If llvm-link sees opaque type in one TU, it tries to merge it with its definition in another TU. The type name is used to identify proper definition. All template specializations share the same name, so llvm-link has to choose random type, which results in incorrect IR. Such IR prevents from some whole-program analyses. This problem cannot be solved by llvm-link, as some information is already lost.

Possible IR bloating due to long type names is indeed an issue. D40508 tries to solve it by replacing long type name or part of it by SHA1 hash. Type names become shorter and still are usable for type identification across different TUs. Template arguments are added to the type names only if compilation produces IR (including embedded IR in LTO), machine code generation is not affected.

If adding template arguments in all cases is not appropriate, probably it makes sense to have an option, something like '--precise-ir', that would turn this generation on? It could be used in the cases when accurate type information in IR is required.

Consider a bit more complicated example.

File common.h

#ifndef COMMON_H
#define COMMON_H
struct Alarm { };
struct Info { };
template<typename T> struct Handler;
#endif

File use-1.cpp

#include "common.h"
Handler<Info> *info;
void set(Handler<Info> *e) { info = e; }

File use-2.cpp

#include "common.h"
Handler<Alarm> *alarm;
void set(Handler<Alarm> *e) { alarm = e; }

File defs.h

#ifndef DEFS_H
#define DEFS_H
template<> struct Handler<Info> { int val; };
template<> struct Handler<Alarm> { char val; };
void set(Handler<Info> *);
void set(Handler<Alarm> *);
#endif

File main.cpp

#include "common.h"
#include "defs.h"
void init() {
  Handler<Info> i;
  set(&i);
  Handler<Alarm> a;
  set(&a);
}

Process these files with commands:

clang -c -emit-llvm use-1.cpp -o use-1.cpp.bc
clang -c -emit-llvm use-2.cpp -o use-2.cpp.bc
clang -c -emit-llvm main.cpp -o main.cpp.bc
llvm-link use-1.cpp.bc use-2.cpp.bc main.cpp.bc -S -o test.ll

The resulting IR file has deficiencies:

  • The types of the global variables info and alarm are the same:
%struct.Event = type { i32 }
%struct.Event.0 = type { i8 }
@info = global %struct.Event* null, align 8
@alarm = global %struct.Event* null, align 8
  • The signatures of the set functions are the same:
define void @_Z4fireP5EventI4InfoE(%struct.Event* %e) #0 {
…
}

define void @_Z4fireP5EventI5AlarmE(%struct.Event* %e) #0 {
…
}
  • The call to set(Handler<Alarm> *) is augmented by a bitcast to 'fix' its type:
define void @_Z4fireP5EventI5AlarmE(%struct.Event* %e) #0 {
…
call void bitcast (void (%struct.Event*)* @_Z4fireP5EventI5AlarmE to void (%struct.Event.0*)*)(%struct.Event.0* %a)
…
}

The IR created by llvm-link is distorted. Although code generation (in LTO compilation) might be unaffected by this distortions, other applications of IR linking suffer from it. It does not allow to implement some checks, validation techniques and optimizations.

sepavloff updated this revision to Diff 125761.Dec 6 2017, 10:11 AM

Updated patch

  • old type names are generated always if --ir-type-names is not specified,
  • added new value of --ir-type-names, none, to suppress type names,
  • value of --ir-type-names is stored in module properties.

Although code generation (in LTO compilation) might be unaffected by this distortions, other applications of IR linking suffer from it. It does not allow to implement some checks, validation techniques and optimizations.

Any system trying to use IR type names to deduce information about source-level types is simply wrong. We simply don't provide the sort of guarantees you seem to be looking for here. We don't even guarantee to consistently use the same IR type for the same source type within a single translation unit. IR type names exist only for the benefit of humans reading the IR.

sepavloff abandoned this revision.Feb 26 2018, 8:37 PM

The feedback here, in D40508 and in the mail list (http://lists.llvm.org/pipermail/llvm-dev/2017-December/119585.html) demonstrates that this is a wrong direction.
Part of this patch is used in D43805, which implements in some sense an opposite approach, - using nameless llvm types.