This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add Relooper
ClosedPublic

Authored by azakai on Jul 31 2015, 1:06 PM.

Details

Summary

This is just an initial checkin of an implementation of the Relooper algorithm, in preparation for WebAssembly codegen to utilize. It doesn't do anything yet by itself.

The Relooper algorithm takes an arbitrary control flow graph and generates structured control flow from that, utilizing a helper variable when necessary to handle irreducibility. The WebAssembly backend will be able to use this in order to generate an AST for its binary format.

Diff Detail

Repository
rL LLVM

Event Timeline

azakai updated this revision to Diff 31155.Jul 31 2015, 1:06 PM
azakai retitled this revision from to [WebAssembly] Add Relooper.
azakai updated this object.
azakai added a reviewer: sunfish.
azakai set the repository for this revision to rL LLVM.
azakai added a subscriber: llvm-commits.
jfb added a reviewer: jfb.Jul 31 2015, 1:23 PM
jfb removed a subscriber: jfb.
jfb edited edge metadata.Jul 31 2015, 1:35 PM

Could you also add the .cpp file to CMakeLists.txt? The Makefile picks it up automatically...

lib/Target/WebAssembly/Relooper.h
20

It's more common to include the C++ version:

#include <cassert>
#include <cstdio>
#include <cstdarg>
28

You should put everything in a sub-namespace of the llvm namespace.

42

nullptr is the new hotness, here and elsewhere.

61

Could you use llvm/ADT/SetVector.h instead?

105

Could you use llvm/ADT/SetVector.h instead?

jfb added a comment.Jul 31 2015, 1:56 PM

I did a first quick LLVM-style-only pass. I stopped midway, because my comments would be the same, so I'll take another look after a code update.

lib/Target/WebAssembly/Relooper.cpp
26

Same here on C++ versions.

52

Could you just use a std::string here? Or do the const char * even need to be copied? If they don't then you could use a StringRef.

82

Can you instead use a std::unique_ptr in the container?

96

Same here on std::unique_ptr in the container for the above two.

142

With LLVM's containers you should be able to use range-based for loops here and in other places below.

144

std::string or StringRef won't require O(n) strlen :-)

148

Remove or uncomment. Same in a few other places below.

157

Could you make this a command-line option instead, so it can be tuned? See http://llvm.org/docs/CommandLine.html

lib/Target/WebAssembly/Relooper.h
198
276

Instead use the DEBUG macro and LLVM's DEBUG_TYPE:
http://llvm.org/docs/ProgrammersManual.html#the-debug-macro-and-debug-option

Would be good to clang-format this before it goes in... I see a lot of places where '{}'s are used when there's a single statement in the body of the if/for.

lib/Target/WebAssembly/Relooper.cpp
468

I suspect ToInvalidate should be a SmallPtrSet, not a std::list.

jroelofs added inline comments.Jul 31 2015, 2:31 PM
lib/Target/WebAssembly/Relooper.cpp
118

I suspect ToInvestigate should be a SmallPtrSet, not a std::list.

372

I suspect ToInvalidate should be a SmallPtrSet, not a std::list.

414

I suspect Queue should be a SmallPtrSet, not a std::list.

756

ewww... macros. Mind turning these into functions? In the case of SHAPE_SWITCH, that would mean passing it lambdas for simple, multiple and loop.

799

looks like this comment got formatted poorly.

824

looks like this block of comment got formatted poorly.

924

If the object it self has ownership of this stack instead of just the outermost call of this function, then the memory management would be much simpler.

Thanks for the feedback!

About clang-format, I did run it, but I didn't do so with -style=LLVM, I think that's why it didn't fix all the things you expected it to. However, it doesn't look like clang-format fixes '{}'s when there's a single statement in the body of the if/for, so I have to do that manually.

I'll upload an updated diff.

lib/Target/WebAssembly/Relooper.cpp
52

This is actually code that will definitely be changed - we don't want a string of code, but instead some pointer to an LLVM structure. But I thought that removing it at this early time would make it harder to continue to work on this code. In other words, I left it here in order to make it clear what is to be replaced with proper LLVM content, when we get to that point.

82

The unique pointer will be destroyed before we want the object to be destroyed, however (the object lives before and after it is used in this object).

The lifetime of all those is kept by the Relooper instance, when it is released, it frees everything inside it.

118

I basically need a fifo list here, where I constantly take the first from the front and add to the back. Looking at SmallPtrSet, it seems unsuitable. Am I missing something?

756

I worry about performance, since we do traverse the AST in optimization passes quite a bit.

Is there a guarantee that the LLVM optimizer would get rid of the lambda overhead here, if this were written that way?

azakai updated this revision to Diff 31182.Jul 31 2015, 5:32 PM
azakai edited edge metadata.

This should address all the review comments, except for few ones I replied to with further questions or comments.

arsenm added a subscriber: arsenm.Jul 31 2015, 5:50 PM
arsenm added inline comments.
lib/Target/WebAssembly/CMakeLists.txt
12

Would it be possible to move this into lib/Transforms/Scalar and set up as a target independent pass? I would be interested in evaluating this as an alternative to StructurizeCFG

arsenm added inline comments.Jul 31 2015, 6:03 PM
lib/Target/WebAssembly/CMakeLists.txt
12

This would also have the added advantage of being able to write IR->IR tests

Will this be eventually structuring the IR or on MachineBasicBlocks?

lib/Target/WebAssembly/Relooper.cpp
526

.empti() would be preferrable

jfb added inline comments.Jul 31 2015, 8:07 PM
lib/Target/WebAssembly/Relooper.cpp
53

You mean in this patch, or a later one? It would be good to have a FIXME here.

I'm mostly asking because ew strdup and free :-)

83

Hmm, so the lifetime starts in BranchesOut below, but ends elsewhere (ToInvestigate?) while BranchesOut doesn't live for as long?

It's just unusual / undesirable to have manual memory management for algorithms like this. I get confused. Could it maybe instead use a BumpPtrAllocator, which allows you to move the objects around but bounds the lifetime properly? Or do you actually need to delete things during relooping otherwise it would consume too much memory?

I'm asking without understanding the algorithm yet (or rather, I haven't read the code in a while). So I've got my style hat on, feel free to shoo it off if it looks silly.

119

If it usually stay small then LLVM's Small* datastructures can just fit on the stack, and grow to the heap if needed. Nicer than std::list because of this. SmallVector can be a good fifo, but it's only suited to small things than don't usually grow bigger than your stack allocated quantity.

jfb added inline comments.Aug 1 2015, 11:00 AM
lib/Target/WebAssembly/Relooper.cpp
2

.cpp

32

You should put llvm/ before system #includes:
http://llvm.org/docs/CodingStandards.html#include-style
Same for Relooper.h.

75

In many cases you'll want to use for (const auto && or for (const auto &. In this case it's just copying a std::pair of pointers so it's not that bad, but in general you can get some pretty unexpected results with just auto.

96

LLVM style capitalizes the I. I know...
But this could be a range-based for loop anyways :-)
Or even better, some std::unique_ptr.

330

New is OK.

332

You should probably move some of these comments tot he line above, since it looks cramped with 80-column.

483

Should this also be configurable?

655

switch (var->getKind()) with default: llvm_unreachable(...); ?

757

No guarantees, but we're within yelling distance of optimizer people who can fix LLVM's performance :)

jroelofs added inline comments.Aug 2 2015, 8:07 AM
lib/Target/WebAssembly/Relooper.cpp
119

Oh, sorry, I missed that it needed to be a fifo, and not just a bag of things to work on.

139

weirdly formatted comment block.

216

weirdly formatted comment block.

364

weirdly formatted comment block.

407

weirdly formatted comment block.

494

weirdly formatted comment block.

537

weirdly formatted comment block.

573

weirdly formatted comment block.

642

thanks!

azakai added inline comments.Aug 3 2015, 11:45 AM
lib/Target/WebAssembly/Relooper.cpp
53

Adding FIXMEs.

I think in a later patch makes sense - we still don't have a final plan for how to do it, and also it might end up being someone other than me that writes it.

83

Every branch starts out in BranchesOut. Then the algorithm step by step creates the AST, and while doing so, moves them one at a time to ProcessedBranchesOut. This is nice since we often need to iterate over the not-yet-processed ones (so marking them with a flag and ignoring them would be less pleasant).

In theory a shared_ptr could be used in both of those (and in BranchesIn/ProcessedBranchesIn) but the overhead seems pointless to me.

757

Could we leave this for future investigation, then? In my experience with lambdas, the overhead is non-trivial (although I do love them for the code clarity, it's just that here, perf is quite important).

sunfish added inline comments.Aug 3 2015, 11:45 AM
lib/Target/WebAssembly/CMakeLists.txt
12

In reply to these two comments, and "Will this be eventually structuring the IR or on MachineBasicBlocks?",

This pass is closer in function to AMDILCFGStructurizer in the AMDGPU backend, and is meant to be run in a very late codegen pass after most everything else is done. At the same time, WebAssembly's control structures are more expressive than AMDGPU's, and this motivates different algorithms.

Also, there are people working to support serialization of MachineFunctions, which will enable the development of fine-grained unit tests.

We're open to generalizing this code so that other backends can share it. At the same time, the big picture here is that that we're adapting some pre-existing code that we developed for other purposes here, and instead of doing all the work out-of-tree and checking it in as a completed artifact, we're doing much of the work in-tree. The code you see here will likely change significantly before we're really ready to think about this.

azakai updated this revision to Diff 31248.Aug 3 2015, 11:47 AM

Thanks again for the comments. This addresses all the latest batch, except for the ones being discussed in inline comments.

arsenm added inline comments.Aug 3 2015, 11:54 AM
lib/Target/WebAssembly/CMakeLists.txt
12

Will this pass purely be doing the structurizing, or will it also be responsible for insertion of the higher level control flow instructions? I think one of the many failings of AMDILCFGStructurizer is that it attempts to do both at the same time / in the same pass.

jfb added inline comments.Aug 3 2015, 12:50 PM
lib/Target/WebAssembly/CMakeLists.txt
12

It identifies the structure, but acts like an AnalysisPass in that it just passes that info onto later code.

lib/Target/WebAssembly/Relooper.cpp
84

Yeah I'd like to avoid shared_ptr overheads.

It does seem like either unique_ptr or a BumpPtrAllocator would work though.

758

Fine with me, with a FIXME explaining that.

jfb added inline comments.Aug 3 2015, 12:59 PM
lib/Target/WebAssembly/Relooper.cpp
526

Agreed on using .empty() instead of .size() == 0 in a few places.

657

break; here and below.

sunfish added inline comments.Aug 3 2015, 1:11 PM
lib/Target/WebAssembly/CMakeLists.txt
12

The current Relooper algorithm works on arbitrary CFGs, so it doesn't need to do any structurizing.

azakai added inline comments.Aug 3 2015, 1:28 PM
lib/Target/WebAssembly/Relooper.cpp
84

unique_ptr can't work for BranchesIn because the blocks are used in more places, the branches also refer to blocks, for example.

[Processed]BranchesOut could use a unique_ptr+swapping, but it would seem a little inconsistent to me to use it only there. What do you think?

The relooper has a list of blocks. That could be a bump allocator in theory, but the API lets people create blocks and then takes ownership of them. These could alternatively be a unique_ptr, but other things will need to point to those same blocks (branches), so all those would be forced to be simple dumb pointers, and the mix of those seems ugly to me personally. But if you prefer it I can do it.

657

yikes! thanks!

azakai updated this revision to Diff 31258.Aug 3 2015, 1:29 PM

Latest updates (.empty(), breaks, a TODO on perf)

jfb added inline comments.Aug 3 2015, 2:03 PM
lib/Target/WebAssembly/Relooper.cpp
85

unique_ptr can't work for BranchesIn because the blocks are used in more places, the branches also refer to blocks, for example.

Is there one datastructure that always owns the blocks, and others that have references to them (that don't outlive the owner)? For that you could own with unique_ptr, and have non-owning pointers in the other datastructures.

I just find it hard / unusual to track lifetime manually, but I don't want to pay the cost of shared_ptr. :-)

134

!.empty()

159

!.empty()

280

!.empty()

292

!.empty()

341

!.empty()

382

!.empty()

433

!.empty()

512

.empty()

543

!.empty()

580

This and two lines down should probably be size_t (auto would get the right ::size_type).

613

!.empty()

726

Another customization point?

841

!.empty()

azakai updated this revision to Diff 31272.Aug 3 2015, 2:56 PM

More .empty() usage, unique_ptr for block branches, add nesting limit customization.

jfb added inline comments.Aug 3 2015, 3:22 PM
lib/Target/WebAssembly/Relooper.cpp
99

STLExtras.h has the sorely-missing make_unique for this. Same in a few places below.

lib/Target/WebAssembly/Relooper.h
29

Sort includes alphabetically.

azakai marked 57 inline comments as done.Aug 3 2015, 3:25 PM
azakai updated this revision to Diff 31284.Aug 3 2015, 4:25 PM

Use make_unique, fix include ordering.

azakai marked 2 inline comments as done.Aug 3 2015, 4:26 PM

Fixed the last two comments.

jfb added a comment.Aug 7 2015, 3:39 PM

I think this looks pretty good (besides the few other comments). @sunfish and I intend to iterate on this code (he had ideas on how to generate better ASTs), and:

  • Add extensive testing using WebAssembly's textual format.
  • Tune the code (so I'd ignore std::deque versus std::list discussions for now).
  • Make sure to clarify object ownership where possible, instead of using explicit memory management.

I'd still leave the patch open until next week to make sure anyone who wants to comment has a change to. Otherwise, I look forward to having control flow :-)

lib/Target/WebAssembly/Relooper.cpp
758

I discussed this with a few LLVM folks, and they'd rather start off with lambdas instead. We'll make sure to measure perf and ensure inlining isn't an issue.

lib/Target/WebAssembly/Relooper.h
116

It would be nice to move this to the Shape class, and use doxygen-style comments (same for other classes).

140

Could you make this one pure virtual, and implement an override dtor in derived classes?

azakai updated this revision to Diff 31700.Aug 10 2015, 11:24 AM

Now with pure c++11 lambda goodness. Also addresses the other latest comments.

jfb accepted this revision.Aug 11 2015, 5:11 PM
jfb edited edge metadata.

lgtm, with the intent of polishing, testing and performance-tuning the code as we start using it for the WebAssembly backend. We know this code works and does what it's intended to do in the context of Emscripten, so it makes sense to start from there and evolve the code incrementally in-tree for LLVM's/WebAssembly's purpose.

I'll leave this open for now if others want to comment, and I'll commit if it's uncontended.

Thanks!

This revision is now accepted and ready to land.Aug 11 2015, 5:11 PM
This revision was automatically updated to reflect the committed changes.

Hi, I have a project to rewrite llvm code as OpenCL https://github.com/hughperkins/cuda-on-cl It currently outputs lots of conditional branches, is labels and gotos, in the output .I want to 'reloop' this into fors/whiles/ etc. To what extent can I use the code in this PR to achieve this? What are the preferred option(s) for me to use this code from my own code? (eg drop the sourcecod into my repo? build as a shared object somehow? some other approach?)

Hugh

kripken added a subscriber: kripken.Nov 6 2016, 9:29 AM

Hi, I have a project to rewrite llvm code as OpenCL https://github.com/hughperkins/cuda-on-cl It currently outputs lots of conditional branches, is labels and gotos, in the output .I want to 'reloop' this into fors/whiles/ etc. To what extent can I use the code in this PR to achieve this? What are the preferred option(s) for me to use this code from my own code? (eg drop the sourcecod into my repo? build as a shared object somehow? some other approach?)

The best option is probably to take the Relooper implementation from Binaryen,

https://github.com/WebAssembly/binaryen/tree/master/src/cfg

In LLVM this PR landed but I'm not sure how much it was used or tested, and it seems to have been removed at some point. In Binaryen the Relooper is in active use, heavily tested and fuzzed, and also has a bunch of optimizations and improvements since the LLVM version.

I would just grab the Relooper.h and Relooper.cpp files from Binaryen and use those (might have some minor dependencies on headers that you can grab too). Happy to help with any questions you have in that process.

Awesome, thanks! Will take a look.