This is an archive of the discontinued LLVM Phabricator instance.

opt-viewer parallelized
AbandonedPublic

Authored by bcain on Nov 16 2016, 9:39 PM.

Details

Summary

Attempt to put opt-viewer critical items in parallel

Requires features from Python 2.7

Cruft left behind: timing printouts

TBD: Not identical output with reference, doesn't look like obvious bug but bears study before results can be satisfactory.

Performance
Below are performance results across various configurations. These were taken on an i5-5200U (dual core + HT). They were taken with a small subset of the YAML output of building Python 3.6.0b3 with LTO+PGO. 60 YAML files.

"multiprocessing" is the current submission contents. "baseline" is as of 544f14c6b2a07a94168df31833dba9dc35fd8289 (I think this is aka r287505).

"ImportError" vs "class<...CLoader>" below are just confirming the expected configuration (with/without CLoader).

**************************************** MULTIPROCESSING ****************************************
PyYAML:
	Traceback (most recent call last):
	  File "<string>", line 1, in <module>
	ImportError: cannot import name CLoader
	Python 2.7.12
processed input files 55.1229741573 secs
mapped remarks 0.0160808563232 secs
file rendering 1.31177091599 sec
index rendering 0.514272928238 sec
total time 56.9831430912 secs
160.52user 1.44system 0:57.39elapsed 282%CPU (0avgtext+0avgdata 157940maxresident)k
0inputs+7880outputs (0major+204566minor)pagefaults 0swaps

PyYAML+libYAML:
	<class 'yaml.cyaml.CLoader'>
	Python 2.7.12
processed input files 7.43243384361 secs
mapped remarks 0.0163559913635 secs
file rendering 1.27356004715 sec
index rendering 0.503628969193 sec
total time 9.24395084381 secs
21.52user 1.39system 0:09.66elapsed 237%CPU (0avgtext+0avgdata 158652maxresident)k
0inputs+7880outputs (0major+207754minor)pagefaults 0swaps

PyPy/PyYAML:
	Traceback (most recent call last):
	  File "<module>", line 1, in <module>
	ImportError: cannot import name 'CLoader'
	Python 2.7.12 (aff251e54385, Nov 09 2016, 18:02:49)
	[PyPy 5.6.0 with GCC 4.8.2]
processed input files 17.7338540554 secs
mapped remarks 0.0163650512695 secs
file rendering 2.23697400093 sec
index rendering 0.571249961853 sec
total time 20.5806679726 secs
57.87user 1.95system 0:21.16elapsed 282%CPU (0avgtext+0avgdata 282260maxresident)k
0inputs+7328outputs (0major+286354minor)pagefaults 0swaps
**************************************** BASELINE        ****************************************
PyYAML:
	Traceback (most recent call last):
	  File "<string>", line 1, in <module>
	ImportError: cannot import name CLoader
	Python 2.7.12
94.07user 0.86system 1:35.82elapsed 99%CPU (0avgtext+0avgdata 112204maxresident)k
0inputs+7880outputs (0major+30906minor)pagefaults 0swaps

PyYAML+libYAML:
	<class 'yaml.cyaml.CLoader'>
	Python 2.7.12
11.52user 0.64system 0:12.57elapsed 96%CPU (0avgtext+0avgdata 109588maxresident)k
0inputs+7880outputs (0major+29965minor)pagefaults 0swaps

PyPy/PyYAML:
	Traceback (most recent call last):
	  File "<module>", line 1, in <module>
	ImportError: cannot import name 'CLoader'
	Python 2.7.12 (aff251e54385, Nov 09 2016, 18:02:49)
	[PyPy 5.6.0 with GCC 4.8.2]
18.86user 0.97system 0:20.52elapsed 96%CPU (0avgtext+0avgdata 209160maxresident)k
0inputs+7880outputs (0major+48344minor)pagefaults 0swaps

Diff Detail

Event Timeline

bcain updated this revision to Diff 78319.Nov 16 2016, 9:39 PM
bcain retitled this revision from to opt-viewer parallelized.
bcain updated this object.
bcain added a reviewer: anemet.
bcain added a comment.Nov 16 2016, 9:49 PM

Other notes:

  • IIRC llvm depends on Python 2.7 so leveraging features from there shouldn't be a big deal.
  • In the multiprocessing case we probably have a concurrency bug with the c++filt file descriptor, so we'd either need a multiprocessing.Lock() block in demangle() or have each process spawn their own.
  • htmlformatter encoding was necessary to correctly handle input
  • os.path.exists()/os.mkdir() was a TOCTOU bug.
  • processes should probably have a heuristic derived from info like nproc

\

utils/opt-viewer/opt-viewer.py
329

Now that I take a look at it this logic might not be right to cascade that merged defaultdict[defaultdict[list]], I will investigate.

bcain updated this revision to Diff 78323.Nov 16 2016, 10:33 PM

Fixed file remarks dictionary merge.

bcain marked an inline comment as done.Nov 16 2016, 10:35 PM

Fixed dict merge implementation.

anemet edited edge metadata.Nov 17 2016, 9:15 AM

Other notes:

  • IIRC llvm depends on Python 2.7 so leveraging features from there shouldn't be a big deal.
  • In the multiprocessing case we probably have a concurrency bug with the c++filt file descriptor, so we'd either need a multiprocessing.Lock() block in demangle() or have each process spawn their own.

I don't think so. Demangle is not used during parsing.

  • htmlformatter encoding was necessary to correctly handle input

Is there a way to detect if the input is UTF-8? This can probably go in irrespective of this review.

Added a few Python specific comments.

utils/opt-viewer/opt-viewer.py
22

I'd move that right before import os.path.

31

I think PEP-8 suggests to group standard library imports together.

288

Is there a reason for having the imports here and not having them with the other imports?

337

It looks like hotnesses isn't used except for calculating the max_hotness. If that's the case I think you could use a generator expression instead of the list comprehension, which creates a new list object in memory.

  • In the multiprocessing case we probably have a concurrency bug with the c++filt file descriptor, so we'd either need a multiprocessing.Lock() block in demangle() or have each process spawn their own.

I don't think so. Demangle is not used during parsing.

Sorry, you're right, I missed the fact that you also parallelized file rendering.

bcain marked an inline comment as not done.Nov 17 2016, 8:52 PM

Per discussion on the list I will clean this up and make a new review.

Florian, I'll make the suggested changes.

bcain updated this revision to Diff 78761.Nov 21 2016, 6:05 PM
bcain edited edge metadata.
bcain set the repository for this revision to rL LLVM.

Rebased for submission given review feedback
omitted timing printouts
deleted multithreading code (strictly multiprocessing now)

bcain added a comment.Nov 21 2016, 6:06 PM

The 'hotness floor' content was commented out but should've been entirely omitted.

utils/opt-viewer/opt-viewer.py
386

whoops, I'd meant to exclude this from the submission. I'll purge it and consider it for a future submission.

bcain updated this revision to Diff 78817.Nov 21 2016, 6:07 PM
bcain added a reviewer: llvm-commits.

omitted "hotness floor" content

bcain added a comment.Nov 21 2016, 6:16 PM

Below are performance results across various configurations. These were taken on an i5-5200U (dual core + HT). They were taken with a small subset of the YAML output of building Python 3.6.0b3 with LTO+PGO. 60 YAML files.

"multiprocessing" is the current submission contents. "baseline" is as of 544f14c6b2a07a94168df31833dba9dc35fd8289 (I think this is aka r287505).

"ImportError" vs "class<...CLoader>" below are just confirming the expected configuration (with/without CLoader).

**************************************** MULTIPROCESSING ****************************************
PyYAML:
	Traceback (most recent call last):
	  File "<string>", line 1, in <module>
	ImportError: cannot import name CLoader
	Python 2.7.12
processed input files 55.1229741573 secs
mapped remarks 0.0160808563232 secs
file rendering 1.31177091599 sec
index rendering 0.514272928238 sec
total time 56.9831430912 secs
160.52user 1.44system 0:57.39elapsed 282%CPU (0avgtext+0avgdata 157940maxresident)k
0inputs+7880outputs (0major+204566minor)pagefaults 0swaps

PyYAML+libYAML:
	<class 'yaml.cyaml.CLoader'>
	Python 2.7.12
processed input files 7.43243384361 secs
mapped remarks 0.0163559913635 secs
file rendering 1.27356004715 sec
index rendering 0.503628969193 sec
total time 9.24395084381 secs
21.52user 1.39system 0:09.66elapsed 237%CPU (0avgtext+0avgdata 158652maxresident)k
0inputs+7880outputs (0major+207754minor)pagefaults 0swaps

PyPy/PyYAML:
	Traceback (most recent call last):
	  File "<module>", line 1, in <module>
	ImportError: cannot import name 'CLoader'
	Python 2.7.12 (aff251e54385, Nov 09 2016, 18:02:49)
	[PyPy 5.6.0 with GCC 4.8.2]
processed input files 17.7338540554 secs
mapped remarks 0.0163650512695 secs
file rendering 2.23697400093 sec
index rendering 0.571249961853 sec
total time 20.5806679726 secs
57.87user 1.95system 0:21.16elapsed 282%CPU (0avgtext+0avgdata 282260maxresident)k
0inputs+7328outputs (0major+286354minor)pagefaults 0swaps
**************************************** BASELINE        ****************************************
PyYAML:
	Traceback (most recent call last):
	  File "<string>", line 1, in <module>
	ImportError: cannot import name CLoader
	Python 2.7.12
94.07user 0.86system 1:35.82elapsed 99%CPU (0avgtext+0avgdata 112204maxresident)k
0inputs+7880outputs (0major+30906minor)pagefaults 0swaps

PyYAML+libYAML:
	<class 'yaml.cyaml.CLoader'>
	Python 2.7.12
11.52user 0.64system 0:12.57elapsed 96%CPU (0avgtext+0avgdata 109588maxresident)k
0inputs+7880outputs (0major+29965minor)pagefaults 0swaps

PyPy/PyYAML:
	Traceback (most recent call last):
	  File "<module>", line 1, in <module>
	ImportError: cannot import name 'CLoader'
	Python 2.7.12 (aff251e54385, Nov 09 2016, 18:02:49)
	[PyPy 5.6.0 with GCC 4.8.2]
18.86user 0.97system 0:20.52elapsed 96%CPU (0avgtext+0avgdata 209160maxresident)k
0inputs+7880outputs (0major+48344minor)pagefaults 0swaps
bcain marked 5 inline comments as done.Nov 21 2016, 6:19 PM

Corrected per review comments.

bcain updated this revision to Diff 78818.Nov 21 2016, 6:20 PM
bcain removed rL LLVM as the repository for this revision.

Moved 'os' module import per review comments.

bcain marked an inline comment as done.Nov 21 2016, 6:23 PM

Also note that some of the wrapping was changed. Florian suggested I try to maintain PEP8 so I just ran autopep8 and it wrapped some lines. If preferred I can revert these changes in order to simplify the diff.

bcain updated this object.Nov 21 2016, 6:26 PM
bcain set the repository for this revision to rL LLVM.Nov 21 2016, 7:34 PM

Brian,

I was probably not completely clear but you can't add llvm-commits to an existing review. I does not appear in llvm-commits properly. That is why I said that you need to create a new one with it as one of the original subscribers.

Thanks,
Adam

bcain abandoned this revision.Nov 22 2016, 5:33 AM

obsoleted by D26967