This is an archive of the discontinued LLVM Phabricator instance.

[Support/Hash functions] Change the `final()` and `result()` of the hashing functions to return an array of bytes
ClosedPublic

Authored by akyrtzi on Apr 4 2022, 5:05 PM.

Details

Summary

Returning std::array<uint8_t, N> is better ergonomics for the hashing functions usage, instead of a StringRef:

  • When returning StringRef, client code is "jumping through hoops" to do string manipulations instead of dealing with array of bytes directly, which is more natural
  • Returning std::array<uint8_t, N> avoids the need for the hasher classes to keep a field just for the purpose of wrapping it and returning it as a StringRef

As part of this patch also change the BLAKE3 class API to match HashBuilder's generic hash API.

Diff Detail

Event Timeline

akyrtzi created this revision.Apr 4 2022, 5:05 PM
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a reviewer: rafauler. · View Herald Transcript
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
akyrtzi requested review of this revision.Apr 4 2022, 5:05 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptApr 4 2022, 5:05 PM
MaskRay accepted this revision.Apr 4 2022, 5:21 PM
This revision is now accepted and ready to land.Apr 4 2022, 5:21 PM
akyrtzi updated this revision to Diff 420359.Apr 4 2022, 5:44 PM

Adjust ASTFileSignature to accept only the array hash bytes and directly from the final() call.

dexonsmith added inline comments.Apr 4 2022, 7:23 PM
llvm/include/llvm/Support/BLAKE3.h
39

The visual noise of BLAKE3<> everywhere is a bit unfortunate.

Another option:

// Hardcoded to 32B. OR same implementation as before, with optional template
// parameters to choose a size when accessing the hash.
class BLAKE3 { /* ... */ };

// Wrap final(), result(), and hash() to truncate the hash.
template <size_t TruncatedNumBytes> class TruncatedBLAKE3 : public BLAKE3 { ... };

WDYT?

llvm/include/llvm/Support/MD5.h
75

Should this (and result() below) be MD5Result? It has an implicit cast to std::array<uint8_t, 16>. Maybe it's better as you have it... not sure...

llvm/lib/Support/SHA1.cpp
289

Should this be HASH_LENGTH instead of 20?

akyrtzi updated this revision to Diff 420377.Apr 4 2022, 9:38 PM
  • Move BLAKE3 back to templated sizes for final() and result() and add TruncatedBLAKE3 that has the size parameter on the class.
  • Make MD5Result inherit from std::array<uint8_t, 16> which both simplifies its API and makes it a better fit for replacing StringRef.
  • Use HASH_LENGTH in implementations of SHA1::final() and SHA256::final()
llvm/include/llvm/Support/BLAKE3.h
39

Good idea! I moved BLAKE3 back to templated sizes for final() and result() and added TruncatedBLAKE3 that has the size parameter on the class.
TruncatedBLAKE3 is useful for using BLAKE3 as the hasher type for HashBuilder with non-default hash sizes, otherwise one can use BLAKE3 for all other uses.

llvm/include/llvm/Support/MD5.h
75

I avoided MD5Result due to not being as good as std::array for a "drop-in replacement" for StringRef due to lack of data() and size().
But it occurred to me that MD5Result could just inherit from std::array<uint8_t, 16> which would make it better fit to replace StringRef and improves & simplifies its API, see updated patch.

llvm/lib/Support/SHA1.cpp
289

Updated 👍

akyrtzi updated this revision to Diff 420378.Apr 4 2022, 9:45 PM

Also revert the README changes to the previous version of BLAKE3 class.

Amir added inline comments.Apr 4 2022, 10:10 PM
bolt/lib/Core/DebugData.cpp
823–824

I think it would be more in line with LLVM coding standards to expand auto in this case (and others) – see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable.
My understanding is that auto is fine where the type is obvious from the context (is explicitly available on the same line e.g. with casts), is abstract (T::iterator types), or doesn't matter (e.g. lambdas).

jhenderson added inline comments.Apr 4 2022, 11:10 PM
bolt/lib/Core/DebugData.cpp
823–824

+1 (also relevant in other places in this patch)

yota9 removed a subscriber: yota9.Apr 4 2022, 11:21 PM
akyrtzi updated this revision to Diff 420545.Apr 5 2022, 9:19 AM

Expand type instead of using auto

akyrtzi added inline comments.Apr 5 2022, 9:21 AM
bolt/lib/Core/DebugData.cpp
823–824

Updated 👍

akyrtzi marked 5 inline comments as done.Apr 5 2022, 9:28 AM
Amir accepted this revision.Apr 5 2022, 10:25 AM

BOLT changes LGTM

This revision was landed with ongoing or failed builds.Apr 5 2022, 9:38 PM
This revision was automatically updated to reflect the committed changes.