Dec 19 2016
Dec 2 2016
Dec 1 2016
- Switch to more specific error
- "Support" ASAN in CudaToolChain
I am not sure this is going to work.
You essentially break out at the beginning of the loop body on the first iteration.
Why not exit the function before the loop?
Also, the loop "claims" the args and by breaking early you leave the args unclaimed.
I don't remember this code well enough to argue about it w/o tests. :(
Before this patch, the following command would fail:
Oct 31 2016
Hi grosser. Sorry for not including the motivation in the commit message. During the review (https://reviews.llvm.org/D25701) I added a comment at the beginning about the motivation for the change in response to the same question from hfinkel. I won't repeat the details here. Instead, I'll just provide the previous link and summarize by saying that the old kernel launch model didn't work with templated CUDA kernels, so I decided not to keep it, but it could return later (hopefully in a more general form).
Oct 27 2016
- Default DeviceIndex for getSymbolMemory
Oct 25 2016
I'm contacting Tanya Lattner to make sure I have set the hooks up correctly for this documentation to be generated and published automatically by the standard LLVM doc scripts. I will wait to check this in until I hear back.
- Add ctors for Expected(Expected<U>)
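For context, a converting constructor of that shape can be sketched on a toy Expected-like type. This is a simplified illustration only: it omits the error ownership and checked-state machinery of the real llvm::Expected, uses a plain string for the error, and requires T to be default-constructible.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy Expected-like type demonstrating a converting constructor
// Expected<T>(Expected<U>&&). Simplified sketch, not llvm::Expected.
template <typename T> class Expected {
public:
  Expected(T Val) : HasValue(true), Value(std::move(Val)) {}

  static Expected makeError(std::string Message) {
    Expected E;
    E.ErrorMessage = std::move(Message);
    return E;
  }

  // Converting constructor: forwards either the value (converted from U
  // to T) or the error message from an Expected<U>.
  template <typename U>
  Expected(Expected<U> &&Other) : HasValue(Other.hasValue()) {
    if (HasValue)
      Value = T(std::move(Other.get()));
    else
      ErrorMessage = Other.getErrorMessage();
  }

  bool hasValue() const { return HasValue; }
  T &get() { return Value; }
  const std::string &getErrorMessage() const { return ErrorMessage; }

private:
  Expected() : HasValue(false) {}
  bool HasValue;
  T Value{};
  std::string ErrorMessage;
};
```

With this, an Expected<int> can seed an Expected<long>, and an error propagates through the conversion unchanged.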
Would it be worth updating (some of?) the examples to the fluent interface?
- Respond to jlebar's comments 2016-10-24
Oct 24 2016
In addition to responding to jlebar's posted comments, I also removed the acxxel::getPlatform function and replaced it with two functions, acxxel::getCUDAPlatform and acxxel::getOpenCLPlatform. I also added a comment explaining that the CUDA and OpenCL platforms are available out of the box with Acxxel, but that other platforms can be created as well. The old acxxel::getPlatform function made it confusing to think about how to add a new platform, because it seemed like a new platform would also have to be registered somehow to put it on equal footing with CUDA and OpenCL. I hope this new design is clearer in this respect.
- Remove asserts in OpenCL example
- Respond to jlebar's OpenCL, util comments
Thanks for the review!
Oct 21 2016
- Early exit if not Failure.ShouldFix
Oct 19 2016
- Fix deleted Span container constructor
- Respond to jlebar's comments on cuda_acxxel.cpp
Oct 18 2016
- Remove fixed_vector.h
Latest patch responds to jlebar's comments on acxxel.cpp and does a couple of other things.
- Removes old Platform::getContext function. It used to be used for launching OpenCL kernels, but is not needed now.
- Cleans up a bunch of minor documentation stuff.
- Remove unused Platform::getContext function
- Documentation fixes
- Respond to jlebar's comments on acxxel.cpp
In my latest patch I responded to jlebar's comments about error handling. The new model in this patch is to have each Stream own its own error state, as was done in StreamExecutor. There is now a function to query the state of the Stream, and all the enqueuing functions that used to return Status now return Stream& instead. This means the fluent Stream launching interface is back as it was in StreamExecutor. In the end we may keep the name StreamExecutor for this new thing rather than Acxxel, but for now it stays Acxxel, at least to distinguish it from the old StreamExecutor code.
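A minimal sketch of the stream-owned error model described above. The names here (thenCopy, thenLaunch, isOK) are illustrative placeholders, not Acxxel's actual API; the point is only the shape: enqueue calls return Stream&, the first error sticks, and the state is queried once at the end.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of a fluent Stream with a sticky, stream-owned
// error state. Names are illustrative only.
class Stream {
public:
  // Enqueue operations return Stream& so calls can be chained fluently.
  // Once an error is recorded, later operations become no-ops.
  Stream &thenCopy(bool Succeeds) { return recordIfFailed(Succeeds, "copy"); }
  Stream &thenLaunch(bool Succeeds) {
    return recordIfFailed(Succeeds, "launch");
  }

  // Query the error state once, after a whole chain of calls.
  bool isOK() const { return ErrorMessage.empty(); }
  const std::string &getErrorMessage() const { return ErrorMessage; }

private:
  Stream &recordIfFailed(bool Succeeds, const char *Op) {
    if (isOK() && !Succeeds)
      ErrorMessage = std::string(Op) + " failed"; // first error sticks
    return *this;
  }
  std::string ErrorMessage; // empty means no error yet
};
```

A chain like S.thenCopy(true).thenLaunch(false).thenCopy(true) leaves S.isOK() false, with the first failure recorded.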
- New error handling in stream
- Reorganize kernel launch code
- Move enqueueEvent to Stream
- Respond to jlebar's comments 2
You were right that it did require checking the error at every line. To address this, I've added a thread_local variable to keep track of the first error status. With this, users can make as many calls as they want without checking the error, and then do a final check that nothing went wrong.
Should it be thread_local, or local to the stream itself? (Are streams even thread-safe? I would have assumed not, but if they are, we should comment that.)
The one thing I really miss from the old SE is not checking the error at every line. I wonder if we could say that errors carry forward just like they used to? Or maybe they do actually carry forward and I don't need to have an error check on every line -- I haven't gotten to the implementation yet. :)
- Respond to jlebar's comments
- Keep track of first error status per thread
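The per-thread first-error idea can be sketched as follows. The names here are hypothetical, not Acxxel's actual API; the sketch just shows a thread_local slot that records only the first failure so a sequence of calls needs one check at the end.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of per-thread first-error tracking; names are
// illustrative only.
namespace acxxel_sketch {

// Each thread remembers only the FIRST error it saw; empty means no error.
thread_local std::string FirstError;

void setError(const std::string &Message) {
  if (FirstError.empty()) // later errors do not overwrite the first
    FirstError = Message;
}

// API calls record an error instead of forcing a check at every line.
void enqueueOp(bool Succeeds, const char *Name) {
  if (!Succeeds)
    setError(std::string("failed: ") + Name);
}

// One final check after a sequence of calls.
bool threadOK() { return FirstError.empty(); }
const std::string &threadError() { return FirstError; }

} // namespace acxxel_sketch
```

The trade-off raised in the review still applies: thread_local state assumes the caller's thread is the unit of error tracking, whereas tying the state to the stream itself (as in the later patches) keeps it with the object being operated on.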
Oct 17 2016
Due to the shift in emphasis away from supporting type-safe kernel launches and the move away from streams as the central programming entities,
Can you please discuss the motivation for this?
We've decided to come at this problem from a different angle, so I'm abandoning this revision.
Adding aaron.ballman as a reviewer, as alexfh seems to be on leave for a few weeks.
Oct 10 2016
I just found and fixed another bug in this patch. Before, I wasn't using the spelling location for the fixit hint. This meant that a macro argument that was expanded to two locations, for example, would have the same fixit hint applied to it twice. My new test case verifies that this does not happen anymore.
- Prevent multiple fixes for macro expansion
I found a bug in my first patch that I have fixed now. I was trying to iterate over the source range by using SourceLocation::getLocWithOffset, but I realized that doesn't work, so I removed it and went back to the original method of checking SourceRange.getBegin().isMacroID() and SourceRange.getEnd().isMacroID().
- Return to original checking for macro in range
alexfh, sorry if you are not the right person to review this change. I based my choice on the history of this file.
Sep 27 2016
Sep 26 2016
Sep 15 2016
- Comment on dyn-shared-memory arg efficiency
I've just looked at the CMake source code for this and that's almost exactly what they do :-)
- Convert framework library names
Sep 14 2016
Alas, this still doesn't work for OS X. It seems that CUDA_DRIVER_LIBRARY is set to the full path to the framework (e.g. /Library/Frameworks/cuda.framework) whereas the flag that needs to be passed is -framework cuda. CMake automagically sorts this out when passing the framework path to target_link_libraries, but I'm not sure there's a simple way of getting it to spit out the correct flag for streamexecutor-config in a platform independent manner.
Does this also need the CUDA library to be added to streamexecutor-config output for --libs?
At the moment (with this patch), if I try to link against SE using streamexecutor-config, I get undefined references to all of the CUDA driver functions. If I manually add -framework cuda (OS X), it all works. The CUDA SAXPY example works for me, but I can see from the verbose ninja output that the example is already building with -framework cuda.
- streamexecutor-config report CUDA lib
- Use CMake's standard FindCUDA
- Respond to review comments