This is an archive of the discontinued LLVM Phabricator instance.

Avoid using of DataLayout::getPointerSize() without address space argument in DWARF info generation
AbandonedPublic

Authored by rampitec on Aug 11 2015, 5:20 PM.

Details

Summary

DataLayout::getPointerSize() supposed to be used with the address space argument, though it has default value 0 for the transition period. There are the cases when some kind of a general pointer size is desirable, such as with debug info. In some places DataLayout is used though. This creates problem if target is 64 bit, but some address spaces are 32 bit, and notably if address space 0 has 32 bit pointers in 64 bit mode. An example of such target is HSAIL.

In this change calls to DataLayout::getPointerSize() are replaced with recently added AsmPrinter::getPointerSize() in DWARF debug code. The AsmPrinter::getPointerSize() itself is modified to call MAI->getPointerSize(). Originally it was calling TargetMachine::getPointerSize() which ends up in the DataLayout::getPointerSize() with the default argument. Since this call is not virtual in both AsmPrinter and TargetMachine that makes a little sense to override it in the target implementation for DWARF info purposes. The MCAsmInfo implementation has the benefit to query member variable PointerSize which can be previously initialized by the target, thus target will have control over the returned value.

In LLVM 3.6 many (but not all) of the getPointerSize() references in DWARF info were pointing to the MAI implementation, but recently that was changed to the new AsmPrinter and TargetMachine functions.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec updated this revision to Diff 31886.Aug 11 2015, 5:20 PM
rampitec retitled this revision from to Avoid using of DataLayout::getPointerSize() without address space argument in DWARF info generation.
rampitec updated this object.
rampitec set the repository for this revision to rL LLVM.
rampitec added a subscriber: llvm-commits.

I don't have enough context and the description not clear enough for me to understand what is going on. The fact that you claim that the default DataLayout getPointerSize() does not return the expected size sounds fishy to me.
Can you elaborate?

In D11969#222450, @joker.eph wrote:

I don't have enough context and the description not clear enough for me to understand what is going on. The fact that you claim that the default DataLayout getPointerSize() does not return the expected size sounds fishy to me.
Can you elaborate?

Sure. In our architecture there are several distinct memory segments physically residing in different memory. These segments even have separate instructions for memory access and separate capabilities. Notably, some of the segments use 32 bit pointers and some 64 bit. Then different address spaces are used to represent these memory segments. In particular private per-thread memory which is usually used for alloca reside in address space 0, while shared global memory is in the address space 1. Private memory has 32 bit addressing and global 64 bit. That is represented by the llvm's data layout string fragment "p0:32:32-p1:64:64". In essence this is a 64 bit architecture, with just some memory types being 32 bit.

Now, if you look at the DataLayout::getPointerSize() it has an argument AS, which stands for the address space. That is the declaration: "unsigned getPointerSize (unsigned AS=0) const". Note the AS has a default 0, which is usually all one need on x86 CPU. There is also a comment related to declaration: "FIXME: The defaults need to be removed once all of the backends/clients are updated." It is here: http://llvm.org/docs/doxygen/html/classllvm_1_1DataLayout.html#a9e653935f1d7ff84ff4a14667f4ca567

So basically this call is never supposed to be used without an argument, default is only for transition period. That is not possible to tell pointer size without knowing its address space on all architectures. Over the time the default shall be removed and the only showstopper are backends which need the update. No common code supposed to call it without an argument.

Yet, as I said, that is common to say an architecture is 64 bit even if not all segments in it are 64 bit. That is like medium or compact model in early x86 with some pointers 32 bit and some 16 bit. One application where you practically need to use general pointer size is DWARF and essentially anything where you would like to refer to a largest type which can hold any target pointer. So a call with the default argument 0 may work for this purpose even for the architectures with mixed pointer sizes, but unfortunately does not work if pointer in a zero address space is smaller than in some other.

BTW, there can be another approach here. We can create two different versions of getPointerSize() in DataLayout (and then 2 versions of getPointerSizeInBits() and all versions of alignment queries etc), one with the argument and another without. The one without may return a maximum size of pointers in all address spaces. It looks very easy to do, but generally error prone because encourages to use the version without an argument, just as the default =0 does now.

Thanks for the details. I think I get this, let me ask a more specific question. What I don't understand right now is how MAI->getPointerSize() get the right size since it does not get an address space either?
It seems to me that any place that is using a single "getPointerSize()" would have the address space problem.
From what I see, MAI is created by the TargetMachine, that itself gets it from the Subtarget, using a Target specific factory function. So in the end it is just a struct initialized by the Target.
At this point I'm not even sure why MAI has a pointer size in the first place, I'm missing a piece of the puzzle or it does not seem right.

In D11969#222468, @joker.eph wrote:

Thanks for the details. I think I get this, let me ask a more specific question. What I don't understand right now is how MAI->getPointerSize() get the right size since it does not get an address space either?
It seems to me that any place that is using a single "getPointerSize()" would have the address space problem.
From what I see, MAI is created by the TargetMachine, that itself gets it from the Subtarget, using a Target specific factory function. So in the end it is just a struct initialized by the Target.
At this point I'm not even sure why MAI has a pointer size in the first place, I'm missing a piece of the puzzle or it does not seem right.

Well, for most applications the DataLayout method with a specific address space shall be used. Though there are some rare instances where pointer size is needed with no pointer available. Like in the debug info where you have to generally specify target bitness, or if you generally need to know a biggest data type to hold any pointer. That is how MAI version of this function works, without a specific pointer or data type. In the debug code where this is used no pointer is actually available, but that is fine to support debugging.

Then MAI can be specialized by the target, but so the AsmPrinter and TargetMachine. One difference is that MAI has data field PointerSize which is initialized by the target. Then getPointerSize simply returns this value. It does not hurt to refer to it with a base class pointer. What is currently in AsmPrinter and TargetMachine can be reimplemented by the target as well, but that does not help when dwarf support code calls it through a base class. There must be a data field specified by target, either variable or virtual function pointer. I suppose that is undesirable to virtualize common classes from the performance perspective, thus MAI implementation is handy here.

What is the semantic of this pointer size? Especially if you have multiple pointer sizes depending on the address space?
Also how does the value returned by MAI::getPointerSize() relates to the DataLayout and/or the TargetMachine getPointerSize()?
The fact that they differs sounds fishy to me.
The fact that the semantic of MAI::getPointerSize() is not well defined is scary to me as well. This looks like something that needs to go away to me, and I'm scared that you want to use it for anything at this point.

In D11969#222471, @joker.eph wrote:

What is the semantic of this pointer size? Especially if you have multiple pointer sizes depending on the address space?
Also how does the value returned by MAI::getPointerSize() relates to the DataLayout and/or the TargetMachine getPointerSize()?
The fact that they differs sounds fishy to me.
The fact that the semantic of MAI::getPointerSize() is not well defined is scary to me as well. This looks like something that needs to go away to me, and I'm scared that you want to use it for anything at this point.

That is really target specific and target can specify it as it wants. In our case we say this is the maximum possible size of the pointer. I would agree, that would be nice if dwarf could differentiate between pointers of different types, but that does not seem to be case now. For sure where we can differentiate this should not be used at all.

I understand that the target defines this field, but it does not say what is the semantic of this field. Where is intended to be used and for what usage?
The target also controls what TargetMachine::getPointerSize() returns. Admittedly this hook is a transitional one, but the MCAsmInfo does not seem right either to me at this point.

I did a quick archeology and this was introduced in a commit 4 years ago that aknowledged that it does not seem right:

Move some parts of TargetAsmInfo down to MCAsmInfo. This is not the greatest solution but it is a small step towards removing the horror that is TargetAsmInfo.

I need to bug some expert tomorrow to figure this out.

In D11969#222474, @joker.eph wrote:

I understand that the target defines this field, but it does not say what is the semantic of this field. Where is intended to be used and for what usage?

I agree that is not well defined and I would prefer not to have it. Though I do not see how to support dwarf now without it. If the evil necessary I would personally prefer to have in one place only, removing the calls from TargetMachine and AsmPrinter. Anyhow, what is in the code now clearly does not work for a configuration we have, a MAI implementation does work. So it is bad, but better that nothing.

The target also controls what TargetMachine::getPointerSize() returns.

That is not really true when a base pointer is used like in this case. The control will never go to target in a way it is done now by dwarf code.

The target also controls what TargetMachine::getPointerSize() returns.

That is not really true when a base pointer is used like in this case. The control will never go to target in a way it is done now by dwarf code.

I don't understand what you mean by "a base pointer"?

In D11969#222886, @joker.eph wrote:

I don't understand what you mean by "a base pointer"?

Base class really. I mean calling TM.getPointerSize() will call base implementation from TargetMachine and not target code.

mehdi_amini edited edge metadata.
mehdi_amini removed a subscriber: mehdi_amini.

Oh I thought you were talking about defining the result of getPointerSize() as being a "base pointer size" or something like that. Got what you meant now.

I discussed with a few people around and my impression so far is that

  • The DataLayout does not defined anything about which address space the code resides in
  • The code for a single module can only be generated in one address space
  • Everything is fine as long as the DataLayout is specified using address space 0 for the code
  • MCAsmInfo::PointerSize is intended to describe the size of a pointer to the memory space where the code resides.

If this is correct, it seems to me that on the short term we should to solve your problem:

  • rename MCAsmInfo::PointerSize to something more explicit like MCAsmInfo::CodePointerSize
  • remove the hook AsmPrinter::getPointerSize() and redirect client to use explicitly MCAsmInfo::getCodePointerSize or DataLayout::getPointerSize

On a longer term, I wonder if this information shouldn't be part of the DataLayout as well. I'm not sure what the implication on the IR level optimizations can be at this point.

I hope this makes sense?

In D11969#223472, @joker.eph wrote:

Oh I thought you were talking about defining the result of getPointerSize() as being a "base pointer size" or something like that. Got what you meant now.

I discussed with a few people around and my impression so far is that

  • The DataLayout does not defined anything about which address space the code resides in
  • The code for a single module can only be generated in one address space
  • Everything is fine as long as the DataLayout is specified using address space 0 for the code
  • MCAsmInfo::PointerSize is intended to describe the size of a pointer to the memory space where the code resides.

If this is correct, it seems to me that on the short term we should to solve your problem:

  • rename MCAsmInfo::PointerSize to something more explicit like MCAsmInfo::CodePointerSize
  • remove the hook AsmPrinter::getPointerSize() and redirect client to use explicitly MCAsmInfo::getCodePointerSize or DataLayout::getPointerSize

On a longer term, I wonder if this information shouldn't be part of the DataLayout as well. I'm not sure what the implication on the IR level optimizations can be at this point.

I hope this makes sense?

Yes, it makes a lot of sense for me. I will change it to the first solution if there are no objections.

In D11969#223472, @joker.eph wrote:
  • The DataLayout does not defined anything about which address space the code resides in

I have checked with our engineer doing debug info, and it he confirmed these uses are not related to code addresses, but generally to variable addresses. In fact in our case all code addresses are redirected, so we should not even hit it. Then a bigger type holding pointers is an inefficiency, but not a bug.

I.e. it looks like the name "CodePointerSize" is not correct, a "PointerSize" makes more sense. Maybe a "getLargestPointerSize()" and a "LargestPointerSize" would be better here.
It can even be calculated from the DataLayout itself.

Yes, I agree. A better longer term solution will require to dig in into each use of it.
At the very least I think we need to remove TargetMachine.getPointerSize() and AsmPrinter.getPointerSize() not to increase a number of such functions without an argument.
I'm testing this conservative patch now.

Stas

rampitec abandoned this revision.Nov 13 2016, 12:47 PM
In D11969#223472, @joker.eph wrote:
  • The DataLayout does not defined anything about which address space the code resides in

I have checked with our engineer doing debug info, and it he confirmed these uses are not related to code addresses, but generally to variable addresses. In fact in our case all code addresses are redirected, so we should not even hit it. Then a bigger type holding pointers is an inefficiency, but not a bug.

I.e. it looks like the name "CodePointerSize" is not correct, a "PointerSize" makes more sense. Maybe a "getLargestPointerSize()" and a "LargestPointerSize" would be better here.
It can even be calculated from the DataLayout itself.

We need to emit correct size for code. I think @mehdi_amini's suggestions are on point. See D30879.