This is an archive of the discontinued LLVM Phabricator instance.

[Support] Implement zlib independent crc32 computation
ClosedPublic

Authored by evgeny777 on Mar 26 2019, 5:21 AM.

Details

Summary

The .gnu_debuglink section contains CRC32 checksum of debug information file. When LLVM is compiled without zlib support this checksum is completely ignored, which is error prone. This patch implements simple CRC32 calculation algorithm.

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 created this revision.Mar 26 2019, 5:21 AM
lebedev.ri added inline comments.
include/llvm/Support/CRC.h
22 ↗(On Diff #192260)

I'm not sure if oss-fuzz builds llvm with zlib or not, but if it does,
this is a *great* fuzzing target, to cross-check with the zlib output.

grimar added inline comments.Mar 26 2019, 5:30 AM
lib/DebugInfo/Symbolize/Symbolize.cpp
167 ↗(On Diff #192260)

What about always using non-zlib implementation?

evgeny777 marked 2 inline comments as done.Mar 26 2019, 6:01 AM
evgeny777 added inline comments.
include/llvm/Support/CRC.h
22 ↗(On Diff #192260)

LLVM_ENABLE_ZLIB is ON by default, so whether LLVM is built with zlib support or not depends on HAVE_ZLIB_H.
Are you suggesting something like llvm-crc32-fuzzer?

lib/DebugInfo/Symbolize/Symbolize.cpp
167 ↗(On Diff #192260)

zlib implementation is most likely faster, especially with -DBYFOUR. Also somebody may be using llvm::zlib::crc32

lebedev.ri marked an inline comment as done.Mar 26 2019, 6:16 AM
lebedev.ri added inline comments.
include/llvm/Support/CRC.h
22 ↗(On Diff #192260)

I guess this is a question of naming. I'd say:
llvm-crc32-fuzzer would be one to simply run this code only.
llvm-llvmcrc32-vs-zlibcrc32-fuzzer would run this code, the equivalent zlib code, and assert that they returned the same.

joerg added a subscriber: joerg.Mar 26 2019, 6:19 AM

I'd recomment copying the version from libarchive (https://github.com/libarchive/libarchive/blob/master/libarchive/archive_crc32.h):

  • don't bother with the 1KB table in both binary and source, just recompute it on the first use.
  • at least in the past unrolling the inner loop somewhat helped a lot

I wouldn't really fuzz test this, just build a set of test vector and be done.

don't bother with the 1KB table in both binary and source, just recompute it on the first use

Is single threaded usage guaranteed?

Is there really any good case to build LLVM without zlib? Maybe it'd be better to just require zlib unconditionally (or mark building it really unsupported) rather than bundling extra algorithms that will be used only conditionally (until someone mistakenly uses it unconditionally, and LLVM would start using different CRC32 impls in different places) like this.

Is there really any good case to build LLVM without zlib?

ZLib is still unavailable for windows builds afaik.

joerg added a comment.Mar 26 2019, 8:55 AM

don't bother with the 1KB table in both binary and source, just recompute it on the first use

Is single threaded usage guaranteed?

It doesn't matter, at most a single memory fence is necessary.

aprantl added inline comments.Mar 26 2019, 8:58 AM
lib/Support/CMakeLists.txt
78 ↗(On Diff #192260)

Shouldn't this be only conditionally compiled if zlib isn't available? It seems wrong to have the lookup table twice.

It doesn't matter, at most a single memory fence is necessary.

Yep. However, you don't have it in your code. The volatile int crc_tbl_inited prevents compiler optimizations but doesn't prevent out-of-order execution as far as I know.

@aprantl

Shouldn't this be only conditionally compiled if zlib isn't available?

How about putting the call to zlib::crc32 inside getCRC32 and using getCRC32 from Symbolize.cpp and other places?

evgeny777 updated this revision to Diff 192423.Mar 27 2019, 3:50 AM

New implementation taken from libarchive (dynamic crc table)

joerg added inline comments.Mar 27 2019, 6:39 AM
lib/Support/CRC.cpp
21 ↗(On Diff #192423)

A reference in the commit message is good enough.

23 ↗(On Diff #192423)

You can just use llvm::call_once for the init.

27 ↗(On Diff #192423)

I'd prefer to add the U for the constant here and below as they don't fit into the normal integer range.

44 ↗(On Diff #192423)

uint8_t or at least unsigned char would still be better here.

evgeny777 updated this revision to Diff 192575.Mar 28 2019, 1:03 AM

Addressed review comments

I still feel this code should be in an #ifndef HAVE_ZLIB with the #else branch forwarding the call into zlib.

evgeny777 updated this revision to Diff 193854.Apr 5 2019, 4:34 AM

Addressed comments

aprantl accepted this revision.Apr 5 2019, 8:46 AM
aprantl added inline comments.
include/llvm/Support/CRC.h
21 ↗(On Diff #193854)

///

This revision is now accepted and ready to land.Apr 5 2019, 8:46 AM
aprantl added inline comments.Apr 5 2019, 8:47 AM
include/llvm/Support/CRC.h
22 ↗(On Diff #193854)

Why not just call this llvm::crc32()?

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2019, 4:25 AM
Herald added a subscriber: kristina. · View Herald Transcript