This is patch [3/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.
Details
Diff Detail
Event Timeline
First the easy part: Extending DISubrange to support DIExpressions is fine, if we do this, I think that it would be better to also use a DIExpression for constants so we can avoid having two ways of representing the same thing. Writing an autoupgrade for the metadata loader should be trivial. This part LGTM, thanks!
It is unclear to me what the intent of allowing a DIExpression here is: Are you planning to combine a DIExpression with a "count"-variable? DIExpressions can't reference a DIVariable at the moment, DIVariables are an implicit first element of the expression and for this reason a DIExpression can also only reference a single DIVariable. Are you planning to extend DIExpressions to take DIVariables as operands to support your use-case? I'm asking because it would be generally useful to be able to do something like:
!1 = !DILocalVariable(name: "basepointer") !2 = !DILocalVariable(name: "offset") !3 = !DIExpression(!1, !2, DW_OP_plus, DW_OP_deref)
Hi @aprantl, the reason for extending the count field with DIExpression was originally to express the size of an SVE vector, which is expressed as a scaling of a Dwarf pseudo register that represents the size of a vector with so-called 'Vector Granules'. This pseudo register is detailed in section 3.1 of the DWARF for the ARM® 64-bit Architecture (AArch64) with SVE support Documentation (https://developer.arm.com/docs/100985/0000).
There may also be reasons to combine a DIVariable with a DIExpression, but that's not required for C99 VLAs. I think it would be needed for Fortran arrays to e.g. access a field of an array descriptor object, but I'm not sure if something as advanced is required like you propose here (perhaps something similar to DIGlobalVariableExpression that combines a DI(Local)Variable and a DIExpression would be sufficient?). Do you have any other use-case in mind that would require something like that?
If we are going down this route,
How are you planning to reference the pseudo-register? By generating a pseudo-DIVariable and putting it into the DISubrange(count: ), or were you intending to hard-code a DW_OP_reg46 in the DIExpression?
There may also be reasons to combine a DIVariable with a DIExpression, but that's not required for C99 VLAs. I think it would be needed for Fortran arrays to e.g. access a field of an array descriptor object, but I'm not sure if something as advanced is required like you propose here (perhaps something similar to DIGlobalVariableExpression that combines a DI(Local)Variable and a DIExpression would be sufficient?). Do you have any other use-case in mind that would require something like that?
The use-case I had in mind was for describing generic Swift type metadata in optimized code where it might be useful to combine two variables in the same expression.
Yes, we can reference reg46 directly (assuming the provided debugger correctly interprets it) and can hard-code that register into the expression.
The use-case I had in mind was for describing generic Swift type metadata in optimized code where it might be useful to combine two variables in the same expression.
Right, if there are more than one uses for this it may be worth putting in the extra effort. My only concern is how to generate DWARF code for a generic expression with multiple references, as I don't think there is an easy way to reference a DIE in a DWARF expression. This would require generating an instance of the expression for each possible location of either value, basically propagating the locations of each variable into the expression along with a DW_OP_stack_value, and wrapping that into a location list for each live range. Is that what you had in mind?
If we decide to go that route, I do wonder if the current approach could be an intermediate step until we have further extended DIExpression?
First the easy part: Extending DISubrange to support DIExpressions is fine, if we do this, I think that it would be better to also use a DIExpression for constants so we can avoid having two ways of representing the same thing. Writing an autoupgrade for the metadata loader should be trivial. This part LGTM, thanks!
If we allow various Metadata nodes as values to 'count', I don't yet see the immediate benefit of encoding constants into DIExpression (although I'm fine either way), although the case becomes more compelling when we extend DIExpression as well and use only that.
My motivation for disallowing constants after introducing DIExpressions is that I want to avoid having more than one way to represent the same thing, since it complicates the code and brings potential for bugs.
How are you planning to reference the pseudo-register? By generating a pseudo-DIVariable and putting it into the DISubrange(count: ), or were you intending to hard-code a DW_OP_reg46 in the DIExpression?
Yes, we can reference reg46 directly (assuming the provided debugger correctly interprets it) and can hard-code that register into the expression.
Were you thinking of generating the DW_OP_breg46 right in the frontend? That would be certainly doable, but it is a departure from how we currently deal with DIExpressions, where DW_OP_reg operands are only generated in the backend.
I guess I'm still slightly confused about what your actual use-cases are. One thing I know is that you need to generate subranges that contain expressions referencing the pseudo-register 46. But will these expressions consist of a single DW_OP_reg46 or do you need to generate more complex expressions? If you use DIExpressions for the pseudo-register, why do you still need to reference other metadata nodes in DISubranges, is that orthogonal? I think it would help me wrap my head around your requirements if you could post some examples of DWARF you want to generate and you that would map into LLVM IR.
thanks!
Hi @aprantl, I understand the confusion now, sorry for not making this more clear earlier on! Let me clarify the use-cases by answering your questions.
If you use DIExpressions for the pseudo-register, why do you still need to reference other metadata nodes in DISubranges, is that orthogonal?
That is indeed orthogonal. I basically see two separate reasons to extend DISubrange:
- Make 'count' more flexible to allow specifying with a DIExpression.
- Make 'count' more flexible to be able to reference another metadata node.
Reason 1 is to express the type of an SVE vector, we can generate (in Clang) a DISubrange that expresses the number of elements in the vector (i.e. 'count') as a function of Reg46, defined as the number of 64bit 'granules' in a scalable vector. So for a 128 bit vector reg46 will be '2', for a 256bit vector '4', and so on.
So, for a type '<n x 2 x i64>' (meaning "a scalable vector containing 'n x 2' i64 elements"), the resulting DWARF expression would be (DW_OP_reg46). For type '<n x 4 x i32>' the DWARF expression would be twice the amount of elements, so (DW_OP_reg46, DW_OP_constu, 2, DW_OP_mul).
Reason 2 is to express the type/size of a variable length array by referencing a size expression that may reside in memory or in a register.
As you pointed out, there is also a third case that ties together a Metadata reference and a DIExpression (e.g. DIExpression(!1, !2, DW_OP_plus)), which would be required for more complicated cases. I have not tried to address something like that in this patch-series since it is not required to implement the C99 VLA support, but we confirmed that there are cases where this is useful/required.
Were you thinking of generating the DW_OP_breg46 right in the frontend? That would be certainly doable, but it is a departure from how we currently deal with DIExpressions, where DW_OP_reg operands are only generated in the backend.
Not sure if there is a better way, but we indeed have a downstream implementation to do this in Clang, since this is the place where the type Metadata for vectors/arrays is created.
But will these expressions consist of a single DW_OP_reg46 or do you need to generate more complex expressions?
These expressions will be slightly more involved than just DW_OP_reg46, as shown above.
Thanks, that helped!
Let's discuss reason 1 first. The two approaches to expressing Reg46 that I see are:
- hard-coding a DW_OP_breg46 DW_OP_constu, 2, DW_OP_mul, DW_OP_stack_value in the frontend Note that you can't combine DW_OP_regX with other operators, you have to use a combination of DW_OP_bregX DW_OP_stack_value instead (cf. DWARF5 section 2.6.1ff). You will have to extend the backend (DwarfExpression.cpp) and the Verifier to accept a DW_OP_(b)reg inside a DIExpression.
- Use an approach similar to llvm.dbg.value to bind the register and the DIExpression. But since Reg46 isn't actually used in the program, you would have to, e.g., create an intrinsic that returns Reg46, which is probably a lot more work for no clear benefit.
So I think your plan to generate the DW_OP_(b)reg in the frontend is good.
Now for reason 2. I also agree that this is an interesting use-case to support, and other language frontends, such as Fortran will benefit from this, too.
The interesting question here is again how to best bind the location to the rest of the DIExpression. For regular variables we use the dbg.value intrinsic to do this:
%ptr = ... ; in our example, a pointer to the variable call @llvm.dbg.value(metadata %ptr, metadata !DIVariable(name: "x", ...), metadata !DIExpression(DW_OP_deref))
This way metadata never points back to IR, and DIExpressions can be shared by multiple intrinsics.
As you proposed, we can create a pseudo-variable to inject a location into a DIRange:
%length = load i32 %array... call @llvm.dbg.value(metadata %length, metadata !1, metadata !DIExpression()) ... !1 = DIVariable(name: "$count", ...) !2 = !DIRange(count: metadata !1)
A couple of questions now pop up:
- It looks like we are dropping the DIExpression because we link the DIVariable to the DIRange, not the dbg.value. Note that many LLVM optimizations augment DIExpressions.
- I think this is a problem you brought up earlier, too: What happens when optimizations cause multiple dbg.value's to appear throughout the function? We only have one array type — which dbg.value holds the "right" location?
Okay, dug through the DWARF specification some more:
DW_AT_count/lower_bound/upper_bound can have a "reference" as value, which may point to another DIE, as usual the spec is vague on how exactly this should be used, but it looks like it could point to a DW_TAG_variable. If that is the case, then there is no reason for us to allow a combination of DIVariable and DIExpression in DIRange, because if you need both then the DIExpression should apply to the variable and be bound via llvm.dbg.values.
This doesn't solve the problem of how to combine two variables in one expression, but to me it looks like this question can be resolved independently by providing a mechanism for doing so in llvm.dbg.value.
So long story short, I think your proposal is good.
I just had an idea:
We could represent Reg46 like this:
call @llvm.dbg.declare(undef, !0, !DIExpression(DW_OP_uconst, 2, DW_OP_mul, DW_OP_stack_value)) ... !0 = !DIVariable(name: "$LLVM_ARM_GRANULE_REG_EXPR1") !1 = !DISubrange(count: !0)
and the backend recognized name.startsWith("$LLVM_ARM_GRANULE_REG_") and replaces its location with DW_OP_breg46. This way we don't need to allow DIExpressions in DISubrange and keep the representation a little simpler. Let me know what you think!
Are you sure the DW_OP_stack_value really needed? I thought this was only needed when the expression is used as a location expression.
You will have to extend the backend (DwarfExpression.cpp) and the Verifier to accept a DW_OP_(b)reg inside a DIExpression.
You're right, I forgot that I added that support downstream a while back, I'll see if I can share a separate patch for that as well!
Use an approach similar to llvm.dbg.value to bind the register and the DIExpression. But since Reg46 isn't actually used in the program, you would have to, e.g., create an intrinsic that returns Reg46, which is probably a lot more work for no clear benefit.
The SVE types used in the front-end come from our language extensions (ACLE), so we actually have an intrinsic available that we could optimize into a 'reg46' dwarf expression by matching the intrinsic, but then we'd still need to emit some target-specific intrinsic. An alternative would be to use a generic intrinsic for that... I know @huntergr has some patches downstream for a generic 'scalable vector width' intrinsic (https://reviews.llvm.org/D32530#912649) that we could use in the future.
The interesting question here is again how to best bind the location to the rest of the DIExpression. For regular variables we use the dbg.value intrinsic to do this:
%ptr = ... ; in our example, a pointer to the variable call @llvm.dbg.value(metadata %ptr, metadata !DIVariable(name: "x", ...), metadata !DIExpression(DW_OP_deref))This way metadata never points back to IR, and DIExpressions can be shared by multiple intrinsics.
As you proposed, we can create a pseudo-variable to inject a location into a DIRange:
%length = load i32 %array... call @llvm.dbg.value(metadata %length, metadata !1, metadata !DIExpression()) ... !1 = DIVariable(name: "$count", ...) !2 = !DIRange(count: metadata !1)A couple of questions now pop up:
- It looks like we are dropping the DIExpression because we link the DIVariable to the DIRange, not the dbg.value. Note that many LLVM optimizations augment DIExpressions.
I'm wondering why you're saying that the DIExpression is dropped? If we create a pseudo-variable like above, and the DIExpression is not empty, will it not just create a location list with the added expression to each location expression in the list? We can then reference the variable (with its own location list) from our count field?
This doesn't solve the problem of how to combine two variables in one expression, but to me it looks like this question can be resolved independently by providing a mechanism for doing so in llvm.dbg.value.
Agreed!
So long story short, I think your proposal is good.
Good to hear, thanks for digging into this and helping me understand the right way forward!
This looks like a mistake on my side. I wrote this before I understood that the idea is to reference the variable DIE with its own location list from the DW_TAG_subrange.
This doesn't solve the problem of how to combine two variables in one expression, but to me it looks like this question can be resolved independently by providing a mechanism for doing so in llvm.dbg.value.
Agreed!
So long story short, I think your proposal is good.
Good to hear, thanks for digging into this and helping me understand the right way forward!
@aprantl I think you convinced me that specifying the Dwarf register explicitly in the front-end is not the right way to go. We have some patches downstream to represent the size of a scalable vector with an intrinsic. We could reference that intrinsic with a dbg.value() and then match the intrinsic in the backend, similar to matching a specific name like you suggest. If we do that, the benefit of not allowing a DIExpression for the 'count' field is that a constant can only be expressed in a single way and there is no need for the auto-upgrade to DIExpression and the corresponding encoding/decoding of the constant value.
So, I think I can abandon this change...