This patch reduces file size by dropping variable locations a debugger user
will not see.
Background
PR45889 [1] describes a bug where we get a variable location for an instruction
range that exists outside of the instruction range of the scope that it is
declared in.
We discovered the root cause of that particular problem lies in the handling of
parameters and argument values through SelectionDAG. Locations like these are
superfluous. Test llvm/test/DebugInfo/COFF/register-variables.ll shows the same
problem occuring as a result of a seemingly incorrect scope range (more on this
later). In cases like this the variable locations appear to be sane and it is
that the scope ranges are wrong or misleading.
A debugger user will not get to see these out of scope locations. Not only do
they increase the size of the binary in and of themselves, but having more than
one location will prevent us from emitting locations that cover their entire
scope as single locations. Single locations are desirable because they take up
less space than location lists with single entries.
Solution
This patch drops variable locations which exist entirely outside of the
variable's scope. The way it is implemented is simple: after building the debug
entity history map we loop through it. For each variable we look at each entry.
If the entry opens a location range which does not intersect any of the
variable's scope's ranges then we mark it for removal. After visiting the
entries for each variable we also mark any clobbering entries which will no
longer be referenced for removal, and then finally erase the marked entries.
This all requires the ability to query the order of instructions, so before
this runs we number them.
Results
Building CTMark with CMAKE_BUILD_TYPE=RelWithDebInfo without (base) and with
(patched) this patch yields a geomean binary size difference of -1.9%, with the
.debug_loc sections up to nearly 14% smaller in some cases. For a clang-3.4
build I see similar percentage savings to sqlite3 in the suite below.
Program base patched diff test-suite :: CTMark/SPASS/SPASS.test 3971320 3814544 -3.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 14357584 13812000 -3.8% test-suite...TMark/7zip/7zip-benchmark.test 9190760 8885512 -3.3% test-suite :: CTMark/Bullet/bullet.test 7403096 7182752 -3.0% test-suite...:: CTMark/sqlite3/sqlite3.test 4077552 4003072 -1.8% test-suite :: CTMark/kimwitu++/kc.test 5117728 5038336 -1.6% test-suite...:: CTMark/ClamAV/clamscan.test 2545312 2526184 -0.8% test-suite :: CTMark/lencod/lencod.test 2643960 2631568 -0.5% test-suite...Mark/mafft/pairlocalalign.test 1530904 1526760 -0.3% test-suite...-typeset/consumer-typeset.test 1527120 1524040 -0.2% Geomean difference -1.9%
My machine is not set up for precise performance measurements, but I observed
no significant change in compile time (0.0x% change in either direction).
The function validThroughout in DwarfDebug.cpp returns true when a location can
be considered a single location for a variable. Earlier I mentioned that single
locations are desirable. Here are some numbers from a clang-3.4 build that show
the patch improving our abillity to detect them:
base patched times validThroughout is called 3676550 3638269 times validThroughout returns true 1470517 1541733 percentage of calls returning true 40.0% 42.4%
For these benchmarks 'base' is clang at 772349de887, and 'patched' is that
commit with the patch applied on top.
Tests
Added llvm/test/DebugInfo/X86/trim-var-locs.mir
Modified llvm/test/DebugInfo/X86/live-debug-variables.ll
Modified llvm/test/DebugInfo/ARM/PR26163.ll
In both tests an out of scope location is now removed. The remaining location
covers the entire scope of the variable allowing us to emit it as a single
location.
Modified llvm/test/DebugInfo/COFF/register-variables.ll
Branch folding merges the tails of if.then and if.else into if.else. Each
blocks' debug-locations point to different scopes so when they're merged we
can't use either. Because of this the variable 'c' ends up with a location
range which doesn't cover any instructions in its scope; with the patch applied
the location range is dropped and its flag changes to IsOptimizedOut.
Related future work?
The simple instruction numbering added in this patch can be used to improve how
we detect single locations in validThroughout (DwarfDebug.cpp). With a small
change on top of this patch we can reduce binary size even further, and
potentially by a similar magnitude. In addition it allows us to replaces a
linear scan in validThroughout with a map lookup.
Summary
This patch reduces the binary size of RelWithDebInfo builds by an average of
1.9%, and by nearly 4% in one case, across 11 applications by dropping variable
locations which a debugger user will never see. Some of these locations exist
in the first place as a result of bugs in clang so there's an argument that
this patch should not land, and instead we should make a verifer pass (either
in clang and llc, or in llvm-dwarfdump), and focus on fixing the issues they
reveal. OTOH this patch has immediate and tangible wins, and doesn't preclude
work on fixing those fundamental issues.
What do you think?
Thank you for taking the time to read this,
Orlando
Nit: It's nice to use Doxygen-style /// comments here, since IDEs can format it specially and integrate it in help browsers etc.