This is an archive of the discontinued LLVM Phabricator instance.

Undefined behaviour in Support/DJB.h
Needs RevisionPublic

Authored by troelsfr on Jun 23 2022, 1:25 AM.

Details

Reviewers
nikic
Summary

When LLVM is compiled with undefined behaviour sanitation, the hash function in DJB.h causes following error:

sh
external/llvm-project/llvm/include/llvm/Support/DJB.h:23:12: runtime error: left shift of 4086722279 by 5 places cannot be represented in type 'uint32_t' (aka 'unsigned int')
ISSUE: UndefinedBehaviorSanitizer: undefined-behavior external/llvm-project/llvm/include/llvm/Support/DJB.h:23:12

The issue can be reproduced (and solution verified) using a standalone version of the hash function:

c++
#include <iostream>
#include <cstdint>
#include <string>
  
 inline uint32_t djbHash(std::string Buffer, uint32_t H = 5381) {
   for (unsigned char C : Buffer)
     H = (H  << 5ull) + H + C;
   return H;
 }
  
inline uint32_t djbHashFixed(std::string Buffer, uint64_t H = 5381) {
  for (unsigned char C : Buffer)
    H = ((H & 0xffffffff) << 5ull) + H + C;
  return static_cast<uint32_t>(H);
}

int main()
{
    std::cout << djbHashFixed("Hello world") << std::endl;;
    std::cout << djbHash("Hello world") << std::endl;;
    return 0;
}

Compiling with:

sh
clang++ --std=c++17 -fsanitize=address,integer,undefined -fno-omit-frame-pointer -fno-sanitize-recover=all -g -O0 test.cpp -o test

shows the error occurring in the original hash function, but not in the fixed one:

% ./test
2310742177
test.cpp:7:14: runtime error: left shift of 193458846 by 5 places cannot be represented in type 'uint32_t' (aka 'unsigned int')
ISSUE: UndefinedBehaviorSanitizer: undefined-behavior test.cpp:7:14 in
zsh: abort      ./test

Diff Detail

Event Timeline

troelsfr created this revision.Jun 23 2022, 1:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 1:25 AM
troelsfr requested review of this revision.Jun 23 2022, 1:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 1:25 AM
nikic requested changes to this revision.Jun 23 2022, 5:48 AM
nikic added a subscriber: nikic.

This is not undefined behavior. -fsanitize=integer also enables some instrumentation for well-defined behaviors, e.g. to drive fuzzing. I don't think we should address failures for anything out side the -fsanitize=undefined set in LLVM.

This revision now requires changes to proceed.Jun 23 2022, 5:48 AM

Thanks for your comment.

You are completely right. I got misled by the output
SUMMARY: UndefinedbehaviorSanitizer, but after googling the message I
realised that this was a check added to integer in 2020 as you correctly
pointed out.

I am happy for this suggestion to be closed.

Thanks for your comment.

You are completely right. I got misled by the output
SUMMARY: UndefinedbehaviorSanitizer, but after googling the message I
realised that this was a check added to integer in 2020 as you correctly
pointed out.

I am happy for this suggestion to be closed.

This is something which should be changed, to be less misleading.