Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[NFC][AsmPrinter] Refactor DbgVariable as a std::variant

Authored by scott.linder on Aug 23 2023, 3:40 PM.



Only a subset of the fields of DbgVariable are meaningful at any time,
and some fields are re-used for multiple purposes (for example
FrameIndexExprs is used with a throw-away frame-index of 0 to hold a
single DIExpression without needing to add another member). The exact
invariants must be reverse-engineered by inspecting the actual use of
the class, its imprecise/outdated doc-comment, and some asserts.

Refactor DbgVariable into a sum type by inheriting from std::variant.
This makes the active fields for any given state explicit and removes
the need to re-use fields in disparate contexts. As a bonus, it seems to
reduce the size on my x86_64 linux box from 144 bytes to 96 bytes.

There is some potential cost to std::get as it must check the active
alternative even when context or an assert obviates it. To try to help
ensure the compiler can optimize out the checks the patch also adds a
helper get method which uses the noexcept std::get_if.

Some of the extra cost would also be avoided more cleanly with a
refactor that exposes the alternative types in the public interface,
which will come in another patch.

Diff Detail

Event Timeline

scott.linder created this revision.Aug 23 2023, 3:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 3:40 PM
scott.linder requested review of this revision.Aug 23 2023, 3:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 3:40 PM

I've had this patch series sitting around for a little bit, and when I rebased I noted that there was now a comment from @fdeazeve explicitly pointing out that something like this refactoring would be nice, so I finally got around to cleaning it up and pushing it. Let me know what you think!

I broke things up to give a few different options for where to "stop", in case there was aversion to changing the interface fundamentally. I think the final result of the series is both easier to read and has potential performance benefits, but I'd love to hear feedback on either part.

Thank you so much for doing this! I was about to start working on this, so I'm thankful you posted this when you did! I'll have a look now

fdeazeve accepted this revision.Aug 25 2023, 10:59 AM

Everything looks good to me, this is much cleaner! On my first encounter with this class, it took me a long time to decipher what was going on, and the variant design is way more expressive.


Not for this patch, but it really bothers me that this is mutable. The original rationale must have been to allow sorting when we call the "get" method, but still...

This revision is now accepted and ready to land.Aug 25 2023, 10:59 AM
scott.linder added a comment.EditedAug 25 2023, 12:47 PM

Everything looks good to me, this is much cleaner! On my first encounter with this class, it took me a long time to decipher what was going on, and the variant design is way more expressive.

I know exactly what you mean! I originally started the change as just a way to prove my own understanding to myself. Without having done it I would never have understood the code to begin with.

I'm glad to hear that was a more universal reaction. I think the class slowly accreted many roles/paths, and was sort of "boiled alive". The fact that the only reasonable remedy before std::variant was inheritance or a lot of strife related to using a union member for non-POD types probably didn't help either.

Edit: Also, thank you for taking a look so quickly!


It is almost spooky how you've pointed out the same things I toyed with while working on the patches originally!

I had a patch in the middle somewhere which just exposed the need to sort in the interface, but scrapped it early on. I think your set idea makes a lot of sense and I'll follow up with a patch for that as well.

fdeazeve added inline comments.Aug 29 2023, 5:08 AM

One thing to consider here: sorting (either in the current vector approach or through the internal impl of std::set) require calling Expr->getFragmentInfo()->OffsetInBits as many time as comparison operations are performed, and this is possibly done multiple times for the same expression without any caching.

Something to keep in mind as a possible improvement as well (perhaps computing & storing the offset in advance)

Hey Scott, do you think it will be possible to merge these patches soon-ish? I would like to make some further changes to the entry value side of things, and it would be nice to get this in before.

scott.linder added inline comments.Sep 6 2023, 11:55 AM

I tried to do a bit of (non-statistically-rigorous) benchmarking of a few approaches.

The versions are:

  • main: unmodified main branch
  • patched+uncached: most straightforward switch to std::set, without any caching of the fragment info
  • patched+eager_cached: above, but assume the OffsetInBits will be needed, and cache it in the constructor of FrameIndexExpr
  • patched+lazy_cached: above, but re-introduce a mutable member so the comparison operator can cache the OffsetInBits on first use

The input is a precodegen dump of llvm-tblgen from a FullLTO build of main. Not sure if this is really representative, but it at least contains multi-fragment MMI entries.

If there is a better way to do the benchmark I am happy to redo it, this is just the best I could come up with that would take a reasonable amount of time.

llc -O3 build-lto/bin/llvm-tblgen.0.5.precodegen.bc -o /dev/null -filetype=obj -time-passes

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---

22.3635 ( 28.8%)  14.2489 ( 71.7%)  36.6124 ( 37.5%)  36.6142 ( 37.5%)  X86 Assembly Printer
21.9819 ( 28.1%)  14.6388 ( 71.8%)  36.6207 ( 37.2%)  36.6238 ( 37.2%)  X86 Assembly Printer

21.3484 ( 28.0%)  14.4208 ( 72.4%)  35.7693 ( 37.2%)  35.7706 ( 37.2%)  X86 Assembly Printer
21.4563 ( 28.2%)  14.4606 ( 72.2%)  35.9169 ( 37.4%)  35.9191 ( 37.4%)  X86 Assembly Printer

21.2332 ( 32.0%)  14.3204 ( 76.2%)  35.5536 ( 41.8%)  35.5555 ( 41.8%)  X86 Assembly Printer
21.6691 ( 31.7%)  14.5448 ( 74.4%)  36.2139 ( 41.2%)  36.2149 ( 41.2%)  X86 Assembly Printer

22.4542 ( 27.9%)  14.9983 ( 71.0%)  37.4525 ( 36.9%)  37.4537 ( 36.9%)  X86 Assembly Printer
22.9115 ( 28.3%)  14.7169 ( 69.3%)  37.6284 ( 36.8%)  37.6299 ( 36.8%)  X86 Assembly Printer

I only ran 2 trials so there is a lot of room for variance, but it seems like everything other than patched+eager_cached is more-or-less comparable when comparing User Time as a percent of compilation time. This makes some sense to me, as I suspect the typical case is to not have multiple fragments anyway, so we would never need to compare the offsets, turning the eager caching into pure overhead.

I think for the sake of simplicity in the code it is not worth having the mutable cache at all. If you agree I can post a patch! If this doesn't seem convincing I can do a more rigorous comparison.

fdeazeve added inline comments.Sep 6 2023, 12:04 PM

Oh wow, thanks for doing all of this! Indeed, I agree with you that the complexity is not worth it based on these numbers. I'd say we can start with the simplest implementation and work from there if this becomes a bottleneck.

By the way, we don't need to do this as part of the existing patch series, it seems reasonable to merge the refactors first!

This revision was landed with ongoing or failed builds.Sep 11 2023, 10:33 AM
This revision was automatically updated to reflect the committed changes.