Using 64-bit offsets is required to fully implement 64-bit DWARF.
As these classes are used in many different libraries they should temporarily support both 32- and 64-bit offsets.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
| include/llvm/Support/DataExtractor.h | ||
|---|---|---|
| 82 ↗ | (On Diff #207284) | Why do we need both variants? Would the 64-bit variant alone be sufficient? | 
| include/llvm/Support/DataExtractor.h | ||
|---|---|---|
| 82 ↗ | (On Diff #207284) | In theory, yes. But the class is widely used in LLVM, and not only to parse DWARF sections. Trying to switch to uint64_t everywhere will result in a huge patch which will be hard to review. Hence, I decided to move gradually. | 
| include/llvm/Support/DataExtractor.h | ||
|---|---|---|
| 82 ↗ | (On Diff #207284) | I think I'd prefer to review one large (but mostly mechanical) patch over having two overloads that could potentially introduce subtle and hard-to-debug-bugs when, e.g., the 32-bit version is accidentally invoked where the 64-bit version was needed. | 
| include/llvm/Support/DataExtractor.h | ||
|---|---|---|
| 82 ↗ | (On Diff #207284) | I would expect a mismatch like that would be rare - since the transition for one user would probably be fairly pervasive (since you pass around uint32_t* all over the place, so most of those will be forced to change when you change the root of that usage). But yeah, I'd at least want this 32+64 support to be /very/ short lived through a few patches over weeks, not months/years. | 
Initially, I tried to replace uint32_t offsets to uint64_t. Unfortunately, that is not as mechanical, as you might expect. Please look at D64059, which is my current progress with that approach. It is far to be complete and it is already overwhelmingly complex.
I identified several problems with that approach:
- You have to find all the places where offsets are printed and decide the correct specifiers. It is not hard to overlook some.
- Sometimes the values are not passed by pointers, so the compiler helps little to find that places.
- As sometimes the offsets are calculated from several values, you have to dig deeper to decide which should be switched to uint64_t and which should not. That is really not a mechanical work, too.
- Some clients just don't need 64-bit offsets, at least for now. But with that approach, they have to be updated as well, further increasing the complexity.
Comparing all that with a subtle chance to misuse an overloaded method, I'd prefer to move at a slow pace, but with patches which at least might be understandable.
Thank you for demonstrating this! I agree that the patch also introduces lots of opportunities for subtle bugs. It also increases the memory footprint in places where it likely won't be needed any time soon, such as 64-bit DWARF parsing support. This patch looks much more palatable to me now and I'd be okay with taking it if @dblaikie is okay with it. Is there danger in using the same function name or should we call the 64-bit variants something more explicit?
I don't feel /super/ strongly that it be done in one go. But it sounds like @ikudrin is suggesting maybe it's not their goal to migrate from 32 to 64 entirely, and that maybe it's OK to have both APIs? I really rather that not be the case. I think if there are negative size impacts for this migration to users of the DataExtractor API those can be worked around by certain clients continuing to store 32 bit offsets in data structures (but mostly these offsets aren't intended to be updated - so their addresses won't be passed directly to DataExtractor) and copying those into 64 bit values before using them as cursors into DataExtractor.
I'm OK with this patch direction - but only so long as there's a plan in weeks (probably not months, definitely not years) to migrate all callers over to 64 bit cursors.
(I could be convinced otherwise - but I think the bar would/should be pretty high to justify keeping both of these)
In fact, I am going to implement 64-bit DWARF support. I started from DataExtractor because changing it seems inevitable for that anyway.
In the beginning, I also thought that having only one possible type for offsets, uint64_t, is preferable. But thinking of that for some time, I cannot find any strong reason not to keep both variants, apart from some kind of aesthetic sense, which is debatable. I really do not see any new harm which may result from having overloads for both 32- and 64-bit offsets. The only issue I can guess is a potential risk of wrapping a 32-bit offset value, but the current implementation is also affected by that, so nothing new is here.
Moreover, this is just a utility class. It should provide its users with functionality, not force them to satisfy its whims, especially if that complicates things for them without any noticeable value; I mean, storing 32-bit offsets and creating a temporary 64-bit variable just to call the utility class does not seem aesthetic, too. It is the callers who know how big their data is and which data type for offsets reflects their needs better.
Anyway, as I said, my main intent is to add support for 64-bit DWARF. While there are other users, DataExtractor is mainly used in the DebugInfo/DWARF library and I expect to change it to use 64-bit offsets while implementing that support. Hopefully, that will be done with a bunch of relatively small patches. After that, we can decide what to do with the remaining callers. Unfortunately, I cannot promise that this transition may be done in several weeks. It seems that two to four months is a more realistic estimation.
I think the main one is code duplication - write once/fix once/etc is valuable.
Moreover, this is just a utility class. It should provide its users with functionality, not force them to satisfy its whims, especially if that complicates things for them without any noticeable value; I mean, storing 32-bit offsets and creating a temporary 64-bit variable just to call the utility class does not seem aesthetic, too. It is the callers who know how big their data is and which data type for offsets reflects their needs better.
I think if we knew then what we know now we'd have built it with 64 bit offsets & it wouldn't've been that much imposition to clients that can use 32 bit offsets.
Did you come across cases where you would need to insert new 64 bit temporaries? My understanding would be that anything storing a lot of offsets in a data structure would be storing fixed offsets that were not intended to be mutated (eg: the DIEs in a DIE tree might contain an offset to the start of the attributes in that DIE) - so you wouldn't want to pass the address of that offset directly to DataExtractor (because it would mutate it, then the DIE in the DIE tree would have an offset that no longer points to the start of the attributes - but somewhere in the middle/end - making it unusable from then on) - so such code would likely copy the offset into a local intended for mutation. The difference now is that local would be 64 bit. That doesn't seem like an imposition to me.
Anyway, as I said, my main intent is to add support for 64-bit DWARF. While there are other users, DataExtractor is mainly used in the DebugInfo/DWARF library and I expect to change it to use 64-bit offsets while implementing that support. Hopefully, that will be done with a bunch of relatively small patches. After that, we can decide what to do with the remaining callers. Unfortunately, I cannot promise that this transition may be done in several weeks. It seems that two to four months is a more realistic estimation.
Fair enough - though would it be possible to prioritize finishing the DataExtractor migration (or demonstrating it is not desirable) before necessarily fleshing out the rest of the DWARF 64 support? I'd be concerned it might be left lingering otherwise.
In that case, what do you think about templates?
Moreover, this is just a utility class. It should provide its users with functionality, not force them to satisfy its whims, especially if that complicates things for them without any noticeable value; I mean, storing 32-bit offsets and creating a temporary 64-bit variable just to call the utility class does not seem aesthetic, too. It is the callers who know how big their data is and which data type for offsets reflects their needs better.
I think if we knew then what we know now we'd have built it with 64 bit offsets & it wouldn't've been that much imposition to clients that can use 32 bit offsets.
Did you come across cases where you would need to insert new 64 bit temporaries? My understanding would be that anything storing a lot of offsets in a data structure would be storing fixed offsets that were not intended to be mutated (eg: the DIEs in a DIE tree might contain an offset to the start of the attributes in that DIE) - so you wouldn't want to pass the address of that offset directly to DataExtractor (because it would mutate it, then the DIE in the DIE tree would have an offset that no longer points to the start of the attributes - but somewhere in the middle/end - making it unusable from then on) - so such code would likely copy the offset into a local intended for mutation. The difference now is that local would be 64 bit. That doesn't seem like an imposition to me.
Seems you are right. We may come across some unusual cases during the transition, but we can solve them in the corresponding patches.
Anyway, as I said, my main intent is to add support for 64-bit DWARF. While there are other users, DataExtractor is mainly used in the DebugInfo/DWARF library and I expect to change it to use 64-bit offsets while implementing that support. Hopefully, that will be done with a bunch of relatively small patches. After that, we can decide what to do with the remaining callers. Unfortunately, I cannot promise that this transition may be done in several weeks. It seems that two to four months is a more realistic estimation.
Fair enough - though would it be possible to prioritize finishing the DataExtractor migration (or demonstrating it is not desirable) before necessarily fleshing out the rest of the DWARF 64 support? I'd be concerned it might be left lingering otherwise.
Well, in many cases they are connected. You need 64-bit offsets because they can be found in 64-bit DWARF sections. Thus, migrating to 64-bit offsets is a half-way to implement 64-bit DWARF in the DebugInfo/DWARF library, so, I think it is better to migrate class-by-class, adding DWARF64 support consciously, rather than just mechanical replacing 32-bit offsets with 64-bit ones. We have already seen that the mechanical approach does not work well.
Maybe we can postpone applying this patch until we have a whole set for the migration. Surely, it will require some efforts to keep all patches in the actual state, and I would be happy to avoid that. But this might be a reasonable way if you want the migration to be as fast as possible.
They help, for sure - though I I'm not sure it's probably necessary to have even that complexity (having DataExtractor templated on an integer type for the cursor), at least I don't see it yet, maybe with better understanding I might - but for now it really sounds like we would've built this with uint64_t only if we started from scratch.
Moreover, this is just a utility class. It should provide its users with functionality, not force them to satisfy its whims, especially if that complicates things for them without any noticeable value; I mean, storing 32-bit offsets and creating a temporary 64-bit variable just to call the utility class does not seem aesthetic, too. It is the callers who know how big their data is and which data type for offsets reflects their needs better.
I think if we knew then what we know now we'd have built it with 64 bit offsets & it wouldn't've been that much imposition to clients that can use 32 bit offsets.
Did you come across cases where you would need to insert new 64 bit temporaries? My understanding would be that anything storing a lot of offsets in a data structure would be storing fixed offsets that were not intended to be mutated (eg: the DIEs in a DIE tree might contain an offset to the start of the attributes in that DIE) - so you wouldn't want to pass the address of that offset directly to DataExtractor (because it would mutate it, then the DIE in the DIE tree would have an offset that no longer points to the start of the attributes - but somewhere in the middle/end - making it unusable from then on) - so such code would likely copy the offset into a local intended for mutation. The difference now is that local would be 64 bit. That doesn't seem like an imposition to me.
Seems you are right. We may come across some unusual cases during the transition, but we can solve them in the corresponding patches.
Anyway, as I said, my main intent is to add support for 64-bit DWARF. While there are other users, DataExtractor is mainly used in the DebugInfo/DWARF library and I expect to change it to use 64-bit offsets while implementing that support. Hopefully, that will be done with a bunch of relatively small patches. After that, we can decide what to do with the remaining callers. Unfortunately, I cannot promise that this transition may be done in several weeks. It seems that two to four months is a more realistic estimation.
Fair enough - though would it be possible to prioritize finishing the DataExtractor migration (or demonstrating it is not desirable) before necessarily fleshing out the rest of the DWARF 64 support? I'd be concerned it might be left lingering otherwise.
Well, in many cases they are connected. You need 64-bit offsets because they can be found in 64-bit DWARF sections. Thus, migrating to 64-bit offsets is a half-way to implement 64-bit DWARF in the DebugInfo/DWARF library, so, I think it is better to migrate class-by-class, adding DWARF64 support consciously, rather than just mechanical replacing 32-bit offsets with 64-bit ones. We have already seen that the mechanical approach does not work well.
Maybe we can postpone applying this patch until we have a whole set for the migration. Surely, it will require some efforts to keep all patches in the actual state, and I would be happy to avoid that. But this might be a reasonable way if you want the migration to be as fast as possible.
Nah, it's fine - happy to let you & @aprantl carry on here - holding the patches out of tree doesn't buy us anything. The main concern is just that things aren't cleaned up & left in a hybrid state, and that risk exists even if we delay this going in-tree until some slightly later point.
To answer one of @aprantl 's later questions - nah, I don't think these need different names. The overloads are distinct, a pointer to uint32_t can't implicitly convert to/from a pointer to uint64_t - so there doesn't seem to be any great risk of confusion there.
D64209 demonstrates how I see using templates for that task. Which way for these temporary patches do you prefer?
I am very keen for this to go ahead, but unfortunately don't have the personal bandwidth to take on any more reviewing of this sort of thing, sorry!
I have prepared a set of patches to switch LLVM and the sub-projects to use 64-bit offsets with DataExtractor and all other places connected to it directly or indirectly. Please, take a look.