This is an archive of the discontinued LLVM Phabricator instance.

Unify the two CRC implementations
ClosedPublic

Authored by hans on Oct 7 2019, 4:48 AM.

Details

Summary

David added the JamCRC implementation in r246590. More recently, Eugene added a CRC-32 implementation in r357901, which falls back to zlib's crc32 function if present.

These checksums are essentially the same, so having multiple implementations seems unnecessary. This replaces the CRC-32 implementation with the simpler one from JamCRC, and implements the JamCRC interface in terms of CRC-32 since this means it can use zlib's implementation when available, saving a few bytes and potentially making it faster.

JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef. This patch changes it to ArrayRef<uint8_t> which I think is the best choice, and simplifies a few of the callers nicely.

Please take a look!

Diff Detail

Event Timeline

hans created this revision.Oct 7 2019, 4:48 AM
thakis accepted this revision.Oct 7 2019, 5:06 AM

Nice!

This revision is now accepted and ready to land.Oct 7 2019, 5:06 AM
hans updated this revision to Diff 223518.Oct 7 2019, 5:17 AM

Found a call in LLDB too.

joerg added a subscriber: joerg.Oct 7 2019, 7:24 AM

Why go back to the large tables for crc32? Just because JamCRC had that bug doesn't mean it should persist.

hans added a comment.Oct 7 2019, 7:43 AM

Why go back to the large tables for crc32? Just because JamCRC had that bug doesn't mean it should persist.

Because just using the table is much simpler and we already have it: no need for any run-time initialization and fancy code like call_once. Why do you consider it a bug? Generating a constant table like this at run-time -- again and again for each invocation of the program -- seems less than ideal to me.

Why go back to the large tables for crc32? Just because JamCRC had that bug doesn't mean it should persist.

Because just using the table is much simpler and we already have it: no need for any run-time initialization and fancy code like call_once. Why do you consider it a bug? Generating a constant table like this at run-time -- again and again for each invocation of the program -- seems less than ideal to me.

Do you have any benchmarks?

A table is simpler in some regards, but also less readable in another sense (what are these random hex values?). Having benchmark results helps settle that debate.

That said, general +1 to removing code complexity/duplication

llvm/lib/Support/CRC.cpp
26

Can you leave a comment how this table was generated/how it could be regenerated if needed in the future? And/or a unit test to assert the values are correct?

Also, in practice most clients will build against zlib and not see the tables. +1 to the current approach :)

I'd personally prefer either the non-table approach or having the tables generated at build time. Given this is only going to be used rarely, I don't think we should clutter the code with big tables.

hiraditya added inline comments.Oct 7 2019, 12:16 PM
llvm/lib/Support/CRC.cpp
26

+1

ruiu added a subscriber: ruiu.Oct 7 2019, 11:01 PM
ruiu added inline comments.
llvm/include/llvm/Support/JamCRC.h
45

This is the only place where you pass non-zero value as the first argument, and the way how that value is handled is a little irregular. So how about moving this class to CRC.h and define llvm::crc32 as llvm::crc32(ArrayRef<uint8_t>)?

hans marked 5 inline comments as done.Oct 8 2019, 1:32 AM

Why go back to the large tables for crc32? Just because JamCRC had that bug doesn't mean it should persist.

Because just using the table is much simpler and we already have it: no need for any run-time initialization and fancy code like call_once. Why do you consider it a bug? Generating a constant table like this at run-time -- again and again for each invocation of the program -- seems less than ideal to me.

Do you have any benchmarks?

Of what? That generating the table is slower than just using it directly? I'd say no benchmark is needed to conclude that :-)

A table is simpler in some regards, but also less readable in another sense (what are these random hex values?). Having benchmark results helps settle that debate.

It's the standard table for the CRC-32 polynomial. I'll add a comment with some references.

Also, in practice most clients will build against zlib and not see the tables. +1 to the current approach :)

Windows builds will typically not use zlib though, so this code does get used :-)

I'd personally prefer either the non-table approach or having the tables generated at build time. Given this is only going to be used rarely, I don't think we should clutter the code with big tables.

It's not used rarely, it's used all the time on Windows. Generating the table at build time would be nice in theory, but writing a tablegen and hooking it up in the build system seems like overkill for this.

llvm/include/llvm/Support/JamCRC.h
45

Actually, lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp also passes a non-zero value (from ObjectFileELF::CalculateELFNotesSegmentsCRC32).

But that's not the common case. Let's make an overload for the common case, and moving JamCRC into CRC.h makes sense too.

llvm/lib/Support/CRC.cpp
26

I'm adding a comment with references for how the algorithm and the table works. There is already a unit test in llvm/unittests/Support/CRCTest.cpp

hans updated this revision to Diff 223798.Oct 8 2019, 1:33 AM
hans marked an inline comment as done.
hans edited the summary of this revision. (Show Details)

Adding comments, moving JamCRC into CRC.h, and adding a one-parameter crc32 overload.

mgorny added a comment.Oct 8 2019, 2:33 AM

Doesn't winapi provide some method of computing crc32?

hans added a comment.Oct 8 2019, 3:29 AM

Doesn't winapi provide some method of computing crc32?

Not that I'm aware.

Do you have any benchmarks?

Of what? That generating the table is slower than just using it directly? I'd say no benchmark is needed to conclude that :-)

There is also the cost of code complexity to consider. Sure, the table generation goes away, but does that even show up on any benchmark/real world usage of this? Nobody's exercising the table generation code. People are either check-summing once (in which case, it doesn't matter as much if it's slow, because it happens once) or check-summing 1000 times (in which case the one-time table generation is probably not the CRC bottleneck). If there's no performance win, is it worth the potential code complexity of an opaque hex array over constructing it with an algorithm that can be inspected?

FWIW I'm still in favor of this patch, and the objcopy part that Herald added me for LGTM. Addressing my comments would help me understand this code but feel free to ignore if I'm the only one that feels this way.

llvm/lib/Support/CRC.cpp
26

From what I can tell:

  • This algorithm makes use of one byte from this table per byte in the input array
  • The table has 256 entries
  • The test only includes "The quick brown fox jumps over the lazy dog" and "123456789" (43 total values if I'm counting correctly)

There's no way that test sufficiently tests that all 256 values here are correct. Maybe check summing a very large (>100k) buffer would give confidence that -- probably -- all values are being used.

I think the unit test is sufficient for testing a generated table because it's hard to mess up the generation in a way that only affects some values. However with a hard-coded table, it's much more likely that a single value here could be wrong due to mechanical issues (e.g. accidentally changing a character in the process of formatting it).

29–37

I'd personally find keeping this as a comment to be more useful in understanding how the table is generated than digging up papers

rnk added a comment.Oct 8 2019, 10:50 AM

Maybe a dumb idea: can we compute the table with constexpr evaluation? You could set up a constexpr function that returns a struct that wraps the array, and then the body of the function would construct the array imperatively as the old initialization code did. Set up a constexpr global with an initializer that calls the function.

If not all compilers support this, we could instead use static_assert to check that the table is correct under an #ifdef __clang__, or whatever conditions apply.

One possible drawback of this approach is compile time. If that ends up mattering, it could be ifdef EXPENSIVE_CHECKS or something awful like that.

Another drawback is that I have seen MSVC accept globals marked constexpr, but then sometimes they get dynamic initializers anyway. The static_assert approach might be safer.

hans marked 4 inline comments as done.Oct 9 2019, 1:32 AM

Do you have any benchmarks?

Of what? That generating the table is slower than just using it directly? I'd say no benchmark is needed to conclude that :-)

There is also the cost of code complexity to consider. Sure, the table generation goes away, but does that even show up on any benchmark/real world usage of this? Nobody's exercising the table generation code. People are either check-summing once (in which case, it doesn't matter as much if it's slow, because it happens once) or check-summing 1000 times (in which case the one-time table generation is probably not the CRC bottleneck). If there's no performance win, is it worth the potential code complexity of an opaque hex array over constructing it with an algorithm that can be inspected?

You're right that it probably doesn't matter much, it's more the principle that it's wasteful to compute it at runtime.

I think maybe our disagreement comes from that I actually see including the table directly as *less* code complexity than the way we currently generate it. It's a code pattern for computing CRC-32. If you search the web for how to compute CRC-32, this table shows up. Some specs even define CRC-32 in terms of the table (e.g. https://docs.microsoft.com/en-us/openspecs/office_protocols/ms-abs/06966aa2-70da-4bf9-8448-3355f277cd77).

I think the code to generate the table (and to use it) is just as opaque. Where did the current code come from? It's from https://github.com/libarchive/libarchive/blob/master/libarchive/archive_crc32.h How did it work? There are no comments.

It's a fancy algorithm and there's no way around reading some external sources to figure out how it works. Using the pre-computed table directly removes some unnecessary work and in my opinion it's clearer than the previous code.

FWIW I'm still in favor of this patch, and the objcopy part that Herald added me for LGTM. Addressing my comments would help me understand this code but feel free to ignore if I'm the only one that feels this way.

I appreciate your comments and I hope I've addressed them satisfactorily.

In D68570#1700080, @rnk wrote:

Maybe a dumb idea: can we compute the table with constexpr evaluation? You could set up a constexpr function that returns a struct that wraps the array, and then the body of the function would construct the array imperatively as the old initialization code did. Set up a constexpr global with an initializer that calls the function.

This sounds fun, but I don't think it would be a step towards less code complexity :-)

If not all compilers support this, we could instead use static_assert to check that the table is correct under an #ifdef __clang__, or whatever conditions apply.

One possible drawback of this approach is compile time. If that ends up mattering, it could be ifdef EXPENSIVE_CHECKS or something awful like that.

Another drawback is that I have seen MSVC accept globals marked constexpr, but then sometimes they get dynamic initializers anyway. The static_assert approach might be safer.

I've extended the unit test to exercise the whole table. That should give us the benefit of static_assert'ing that it's correct, without the code complexity or compile time.

llvm/lib/Support/CRC.cpp
26

That's a fair point. I've added a test that exercises the whole table.

29–37

But to understand this code, one would need to dig up papers too.

hans updated this revision to Diff 223999.Oct 9 2019, 1:34 AM
hans marked 2 inline comments as done.

Extend unit test; clang-format.

This revision was automatically updated to reflect the committed changes.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 9 2019, 2:11 AM
llvm/lib/Support/CRC.cpp