An upcoming patch to LLDB will require the ability to decode base64. This patch adds support for decoding base64 and adds tests.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
58 | https://godbolt.org/z/1rqe19MPM vs https://godbolt.org/z/sa1z6M8xf (although this for GCC) |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
58 | Is there a suffix I need to add to the "-1"? |
Don't use -1 as the invalid table value, use 64 to avoid triggering narrowing conversion errors.
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
58 | unfortunately i don't know off the top of my head |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
58 | if declared static, this will go in RO data and lead to more efficient code, see https://godbolt.org/z/z4zsqKqEn | |
66 | Shouldn't DecodeTable be part of the capture list? | |
77 | nit: multiple | |
85 | I wonder if the code would be simpler to read if the handling of the trailing '=' would be in a separate loop, that way we would have a first loop that does the fast and straight-forward processing of the stream, then a slightly more complex code to handle the trailing bytes? | |
100 | this if could be nested in above else |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
68 | I think declaring constexpr char Invalid = 64 in the function prologue and then using it everywhere would also helps readability ;-) |
Address feedback:
- Handle any bad '=' characters separately before the main loop.
- Remember how many bytes to strip after decoding all bytes.
- Use "Inv" constexpr instead of the magic 64 number to make things more readable
- Update tests for changes
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
85 | Actually this would be a good idea to verify that the string ends with at most two '=' characters, and this would help to detect '=' characters in the middle of the string which would be a different error. |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
69 | Would that make sens to set '=' as Inv here? That would prevent you from the initial find |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
69 | We can't, we need it to decode to zero so we can always decode 3 bytes worth of output. We then drop any extra zero bytes based on "NumBytesToStrip" at the end. It also keeps the logic much cleaner in the main decode loop since we don't have to check. |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
69 | I understand your point. Doing two pass over the sequence just for the sake of error checking bothers me a bit, I'd rather have a different handler for the tail. Do you have any idea of the length of a typical sequence? |
Updated the code to not pass over the input twice. We now check for invalid '=' characters in the decoding loop. Also changed how we strip extra bytes from the end of the decoded input by looking at the back of the Input string since we already detect invalid '=' characters in the main string and would have returned an error already.
Sorry for the back and forth: I find the logic difficult to follow. What would you think of something along these lines (untested) https://godbolt.org/z/vh356fhaP
I found that makes the code a bit harder to read and there is a few dups of the "Invalid Base64 character" error producing code sites.
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
56 | Then any collection class can be used, including SmallVectorImpl<uint8_t>. | |
57 | I can see if that would work. I couldn't remember if you can specialized on return value... | |
69 | I fixed the loop to check for equals in the main loop and got rid of the two loops through the input string. |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
61 | Could we rename this to Base64DecodeTable? It sits in the top-level llvm namespace after all. |
llvm/include/llvm/Support/Base64.h | ||
---|---|---|
61 | this is a static inside of the decodeBase64 function, so it isn't in the top level llvm namespace. |
Your change is failing to build on the PS4 Windows bot, can you take a look and revert if it takes time to fix?
https://lab.llvm.org/buildbot/#/builders/216/builds/6340
FAILED: unittests/Support/CMakeFiles/SupportTests.dir/Base64Test.cpp.obj C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe /nologo /TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iunittests\Support -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support -Iinclude -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\utils\unittest\googletest\include -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\utils\unittest\googlemock\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd4324 -w14062 -we4238 /Gw /MD /O2 /Ob2 /EHs-c- /GR- -UNDEBUG -std:c++14 /showIncludes /Founittests\Support\CMakeFiles\SupportTests.dir\Base64Test.cpp.obj /Fdunittests\Support\CMakeFiles\SupportTests.dir\ /FS -c C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support\Base64Test.cpp C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(81): error C3493: 'Inv' cannot be implicitly captured because no default capture mode has been specified C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support\Base64Test.cpp(32): note: see reference to function template instantiation 'llvm::Error llvm::decodeBase64<std::vector<char,std::allocator<char>>>(llvm::StringRef,OutputBytes &)' being compiled with [ OutputBytes=std::vector<char,std::allocator<char>> ] C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(100): error C2064: term does not evaluate to a function taking 1 arguments C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(100): error C2737: 'DecodedByte': const object must be initialized
Why the template? Would something like SmallVectorImpl<uint8_t> work here?