This is an archive of the discontinued LLVM Phabricator instance.

Add support for decoding base64.
ClosedPublic

Authored by clayborg on May 23 2022, 4:03 PM.

Details

Summary

An upcoming patch to LLDB will require the ability to decode base64. This patch adds support for decoding base64 and adds tests.

Diff Detail

Event Timeline

clayborg created this revision.May 23 2022, 4:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 4:03 PM
clayborg requested review of this revision.May 23 2022, 4:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 4:03 PM
clayborg updated this revision to Diff 431515.May 23 2022, 4:05 PM

Removed an extra string that was added to Base64Test::Base64 during testing.

alexander-shaposhnikov added inline comments.
llvm/include/llvm/Support/Base64.h
58

https://godbolt.org/z/1rqe19MPM vs https://godbolt.org/z/sa1z6M8xf (although this for GCC)
-1 -> char - i guess this might trigger -Wnarrowing

clayborg added inline comments.May 23 2022, 4:41 PM
llvm/include/llvm/Support/Base64.h
58

Is there a suffix I need to add to the "-1"?

clayborg updated this revision to Diff 431526.May 23 2022, 5:00 PM

Don't use -1 as the invalid table value, use 64 to avoid triggering narrowing conversion errors.

llvm/include/llvm/Support/Base64.h
58

unfortunately i don't know off the top of my head

llvm/include/llvm/Support/Base64.h
58

if declared static, this will go in RO data and lead to more efficient code, see https://godbolt.org/z/z4zsqKqEn

66

Shouldn't DecodeTable be part of the capture list?

77

nit: multiple

85

I wonder if the code would be simpler to read if the handling of the trailing '=' would be in a separate loop, that way we would have a first loop that does the fast and straight-forward processing of the stream, then a slightly more complex code to handle the trailing bytes?

100

this if could be nested in above else

llvm/include/llvm/Support/Base64.h
68

I think declaring constexpr char Invalid = 64 in the function prologue and then using it everywhere would also helps readability ;-)

clayborg updated this revision to Diff 431794.May 24 2022, 2:23 PM

Address feedback:

  • Handle any bad '=' characters separately before the main loop.
  • Remember how many bytes to strip after decoding all bytes.
  • Use "Inv" constexpr instead of the magic 64 number to make things more readable
  • Update tests for changes
clayborg marked 4 inline comments as done.May 24 2022, 2:24 PM
clayborg added inline comments.
llvm/include/llvm/Support/Base64.h
85

Actually this would be a good idea to verify that the string ends with at most two '=' characters, and this would help to detect '=' characters in the middle of the string which would be a different error.

llvm/include/llvm/Support/Base64.h
69

Would that make sens to set '=' as Inv here? That would prevent you from the initial find

clayborg added inline comments.May 25 2022, 4:25 PM
llvm/include/llvm/Support/Base64.h
69

We can't, we need it to decode to zero so we can always decode 3 bytes worth of output. We then drop any extra zero bytes based on "NumBytesToStrip" at the end. It also keeps the logic much cleaner in the main decode loop since we don't have to check.

llvm/include/llvm/Support/Base64.h
69

I understand your point. Doing two pass over the sequence just for the sake of error checking bothers me a bit, I'd rather have a different handler for the tail. Do you have any idea of the length of a typical sequence?

clayborg updated this revision to Diff 433282.May 31 2022, 10:39 PM

Updated the code to not pass over the input twice. We now check for invalid '=' characters in the decoding loop. Also changed how we strip extra bytes from the end of the decoded input by looking at the back of the Input string since we already detect invalid '=' characters in the main string and would have returned an error already.

Sorry for the back and forth: I find the logic difficult to follow. What would you think of something along these lines (untested) https://godbolt.org/z/vh356fhaP

aprantl added inline comments.Jun 7 2022, 1:56 PM
llvm/include/llvm/Support/Base64.h
56

Why the template? Would something like SmallVectorImpl<uint8_t> work here?

57

I think an ErrorOr<OutputBytes> or Optional<OutputBytes> would be more idiomatic here?

Sorry for the back and forth: I find the logic difficult to follow. What would you think of something along these lines (untested) https://godbolt.org/z/vh356fhaP

I found that makes the code a bit harder to read and there is a few dups of the "Invalid Base64 character" error producing code sites.

llvm/include/llvm/Support/Base64.h
56

Then any collection class can be used, including SmallVectorImpl<uint8_t>.

57

I can see if that would work. I couldn't remember if you can specialized on return value...

69

I fixed the loop to check for equals in the main loop and got rid of the two loops through the input string.

aprantl accepted this revision.Jun 17 2022, 2:18 PM
aprantl added inline comments.
llvm/include/llvm/Support/Base64.h
61

Could we rename this to Base64DecodeTable? It sits in the top-level llvm namespace after all.

This revision is now accepted and ready to land.Jun 17 2022, 2:18 PM
This revision was landed with ongoing or failed builds.Jun 23 2022, 4:13 PM
This revision was automatically updated to reflect the committed changes.
clayborg added inline comments.Jun 23 2022, 4:14 PM
llvm/include/llvm/Support/Base64.h
61

this is a static inside of the decodeBase64 function, so it isn't in the top level llvm namespace.

dyung added a subscriber: dyung.Jun 23 2022, 5:48 PM

Your change is failing to build on the PS4 Windows bot, can you take a look and revert if it takes time to fix?

https://lab.llvm.org/buildbot/#/builders/216/builds/6340

FAILED: unittests/Support/CMakeFiles/SupportTests.dir/Base64Test.cpp.obj 
C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe  /nologo /TP -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iunittests\Support -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support -Iinclude -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\utils\unittest\googletest\include -IC:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\utils\unittest\googlemock\include /DWIN32 /D_WINDOWS   /Zc:inline /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd4324 -w14062 -we4238 /Gw /MD /O2 /Ob2  /EHs-c- /GR- -UNDEBUG -std:c++14 /showIncludes /Founittests\Support\CMakeFiles\SupportTests.dir\Base64Test.cpp.obj /Fdunittests\Support\CMakeFiles\SupportTests.dir\ /FS -c C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support\Base64Test.cpp
C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(81): error C3493: 'Inv' cannot be implicitly captured because no default capture mode has been specified
C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\unittests\Support\Base64Test.cpp(32): note: see reference to function template instantiation 'llvm::Error llvm::decodeBase64<std::vector<char,std::allocator<char>>>(llvm::StringRef,OutputBytes &)' being compiled
        with
        [
            OutputBytes=std::vector<char,std::allocator<char>>
        ]
C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(100): error C2064: term does not evaluate to a function taking 1 arguments
C:\buildbot-root\llvm-clang-x86_64-sie-win\llvm-project\llvm\include\llvm/Support/Base64.h(100): error C2737: 'DecodedByte': const object must be initialized