Download Raw Diff

Details

Reviewers

spatel
RKSimon
zvi
chandlerc
t.p.northover
bryant
efriedma

Commits

rGfe4e7d0c94ef: Update update_test_checks to work properly with phi nodes and other fun things.
rL291357: Update update_test_checks to work properly with phi nodes and other fun things.

Summary

Prior to this change, phi nodes were never considered defs, and so we ended up with undefined variables for any loop. Now, instead of trying to find just defs, we iterate over each actual IR value in the line, and replace them one by one with either a definition or a use. (This is n^2 due to python's mutable strings, but not a problem in practice so far).

We also don't try to match anything in the comment portions of the line.

I've tested it even on things like function pointer calls, etc, and against existing test cases uses update_test_checks
With this change, we are able to use update_tests on the cyclic cases in newgvn.

The only case i'm aware of that will misfire is if you have a string with which contains a valid token.
However, this is the same as it is now, with a slightly larger set of strings that may misfire.
Prior to this change, a test with the string " %a =" would be replaced.

Diff Detail

Build Status

Buildable 2690
Build 2690: arc lint + arc unit

Event Timeline

• dberlin updated this revision to Diff 83316.Jan 5 2017, 3:31 PM

• dberlin retitled this revision from to Update update_test_checks to work properly with phi nodes and other fun things..

• dberlin updated this object.

• dberlin added reviewers: spatel, chandlerc.

• dberlin added a subscriber: llvm-commits.

• dberlin added inline comments.Jan 5 2017, 3:32 PM

utils/update_test_checks.py
73	Note that the ugly \| is necessary because \Z (or $) does nothing in the character class :(

Thanks for working on this!

I made a note to myself that the script didn't work with loops, but I wasn't sure how to fix it. I'm highly unqualified to review because hacking together update_llc_test_checks.py with Tim's 'FileCheck for clang' script is my only experience with python...ever.

One thing I would like to note is that I was hoping to unify this script and update_llc_test_checks.py since they're probably still 95% the same thing, but maybe it's best to just specialize this script for 'opt' and let the 'llc' script evolve separately?

In D28384#637887, @spatel wrote:

One thing I would like to note is that I was hoping to unify this script and update_llc_test_checks.py since they're probably still 95% the same thing, but maybe it's best to just specialize this script for 'opt' and let the 'llc' script evolve separately?

I think we're better off enforcing the split between these now - if we're going to get better asm support in for different arch, update_llc_test_checks.py is going to diverge pretty quickly. Its not out of the question to share common code between them if we want to.

drive-by comment.

utils/update_test_checks.py
185	You're limiting the number of replaced uses to 1. Can a line have more than one use?

bryant added inline comments.Jan 6 2017, 9:26 AM

utils/update_test_checks.py

185

Just to clarify, what I meant is that you can do a lot better than manually replacing one at a time:

def genericize_check_lines(lines):
    def ssaify(match):
        v = match.group(1)
        if v in var_seen:
            rv = get_value_use(v)
        else:
            vars_seen.add('%' + v)
            rv = get_value_definition(v)
        return rv
    for line in lines:
        scrubbed_line = ...
        new_ine = IRVALUE_RE.sub(ssaify, scrubbed_line)
    ...

I suspect that underneath, _sre is building the replacement string incrementally, so it won't be quadratic.

bryant added inline comments.Jan 6 2017, 10:38 AM

utils/update_test_checks.py
185	Yes, but you want to replace one at a time anyway. because it could be a use of "bb", and the line could be phi [a, bb], [b, bb17], [c, bb]. If you replace all uses of bb, you'll mistakenly replace bb17. We can do better, but not in this way (see my other email, this will generate wrong answers) , and it's IMHO not worth it. You can avoid doing the replacement directly by using the match start and end, but if you do, we either have to mess with the loop quite a bit. Even if we stop using finditer (because we are replacing the string it is using as it goes, and it does not keep up to date) and use match directly, we have to offset where it starts looking for matches by the amount of text we've added by converting to var defs/uses, etc, or risk false matches. None of this is worth it compared to keeping it simple like I did, imho. 13:34:06 ~> cat faster.py import re xs = "PHI [a, bb], [b, bb17], [c, bb] " * 901 varz = {"bb": "nop", "bb17": "movzx"} myre = re.compile(r"[a-z0-9]+") vrz = {str(i) : str(i) + "a" for i in xrange(4000)} vrz["sd"] = "g" def pythonic(): rv = xs def f(match): return varz.get(match.group(0), match.group(0)) return myre.sub(f, rv) def manual(): rv = xs for match in myre.finditer(xs): rv = rv.replace(match.group(0), varz.get(match.group(0), match.group(0)), 1) return rv print pythonic()[:50] print pythonic() == manual() from timeit import timeit print timeit("pythonic()", "from __main__ import ", number=100) print timeit("manual()", "from __main__ import ", number=100) 13:35:55 ~> python faster.py PHI [a, nop], [b, movzx], [c, nop] PHI [a, nop], [ True <timeit-src>:2: SyntaxWarning: import * only allowed at module level 1.07926893234 <timeit-src>:2: SyntaxWarning: import * only allowed at module level 12.0660321712 https://docs.python.org/2/library/re.html#re.sub

Make this faster

The benchmark is invalid since it can't possibly work doing it that way:

There is no way to do it the way you want entirely without rewriting a lot more of this code (because we work on the scrubbed line, but can't output it), so we'd have to rewrite the function calling it

Also, this requires we capture more of the group. You are just taking group(1), which is ... not right, because it will delete all the spacing, the parens, the commas, etc, because sub expects a replacement for the entire match.
We then have to reconstitute a correct replacement by capturing all of the line.

So i think that boils down to "i can give you wrong answers much faster" :)

A closer test would also be to just replace only starting at the match point.
The reason replace is slower is mainly because it's starting the entire string.

If you look, something like

line[:match.start(1)] + line.replace(match.group(1), whatever) + line[match.end(1):] is 6-10 times faster.

But since you seem to really want the .sub approach, i've taken that.

bryant added inline comments.Jan 6 2017, 12:21 PM

utils/update_test_checks.py
73	How come you added more capture groups? Isn't this the syntax that you're expecting? FILE ::== LINE* LINE ::== LINE_ENT+ LINE_ENT ::== IR_VALUE \| INTERVENING_STUFF And the purpose of all this is to capture and rename IR_VALUEs, right?
163	Is there a reason for moving vars_seen to the top level scope? I don't think that `global` is needed. def generic_check(lines): vars_seen = set() def ssaify(match): v = match.group(1) if v in vars_seen: rv = get_value_use(v) else: vars_seen.add(v) # this works just fine. rv = get_value_definition(v) return the new match which is match.group(1) in the previous IR_VALUE_RE Global would only be needed if you're reassigning the variable vars_seen, whereas the update here is done with a method call.

When a callable is passed to sub, sub will pass the match the callable and expect the replacement string to be returned. Then sub will sub the match with the return value.

Also, this requires we capture more of the group. You are just taking group(1), which is ... not right, because it will delete all the spacing, the parens, the commas, etc, because sub expects a replacement for the entire match.
We then have to reconstitute a correct replacement by capturing all of the line.

Capture only that which needs to be replaced. Why do you need to capture the spacing, parens, commas, INTERVENING_STUFF? I think I'm missing something critically obvious.

My strong suggestion would be for you to try your idea out :)
If you tried your original ssaify code, you'd immediately notice it deletes spaces and parens :)

This is one of the reasons i said the original code was simpler.

In any case, let me try to expand on "sub expects to replace the entire match":

The entire match is group 0. Group 0 includes the match of the entire regexp, not just the capture groups (which start at 1).

Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> def foo(match):
...     return "12345"
... 
>>> matcher = re.compile("\s+(aaaaaa)\s+")
>>> matcher.sub(foo,"                    aaaaaa             ")
'12345'
>>>

If you want to use sub, you have to capture the other groups so you can put the entire match back together (or call replace on group 0, which is equivalent to what i was doing before, just more verbose)

I'm happy to go a little further, but at some point, this code is just not worth the time of using this script.
If you want to spend time to fix all of these issues, and have the energy to take this over and make it what you want, that would be awesome.
I'm just trying to make it work better than it did.

utils/update_test_checks.py
73	Because they are needed to make sub work, see below.
163	I'm aware you can do this, i just tend to find nested functions much uglier to follow in simple scripts like this one. Again, i don't care about this code enough, so if you wanted a nested function, i'll make one.

If you want to use sub, you have to capture the other groups so you can put the entire match back together (or call replace on group 0, which is equivalent to what i was doing before, just more verbose)

Sorry, you can also use non-capturing groups i believe (using ?:), but this makes the regexp even uglier

(I could also use backreferences in the replacement string, but i like being explicit)

In D28384#638258, @dberlin wrote:

If you want to use sub, you have to capture the other groups so you can put the entire match back together (or call replace on group 0, which is equivalent to what i was doing before, just more verbose)

Sorry, you can also use non-capturing groups i believe (using ?:), but this makes the regexp even uglier

(I could also use backreferences in the replacement string, but i like being explicit)

Unfortunately, it looks like sub has non-intuitive behavior here.
The non-capturing groups definitely cause the match group to change, but sub still replaces the entire regex:

In [10]: def foo(match):
    ...:     print(match.groups())
    ...:     return "12345"
    ...:

In [11]: matcher = re.compile('(?:\s+)(aaaaaa)(?:\s+)')
In [12]: matcher.sub(foo, "                aaaaaa            ")
('aaaaaa',)
Out[12]: '12345'

Use a nested function

OK, I'm still catching up here, but it seems like you are all converging on an implementation strategy you like?

I'm happy to make the REs and script faster ... *as long as it adds no complexity*. The nested function seems fine. But if we have to write more code in order to speed things up, that seems really bad, because this script should be run *very* infrequently. (I think this is what Danny was saying at one point, I'm mostly trying to echo it.)

I'd like to get some helpful comments to a future person (perhaps myself) about what all of this machinery does and how it works. The 'ssaify' nested function seems to definitely warrant some comments (and the non-nested-function variant would have too).

I believe this resolves all outstanding comments.

LGTM, with one nit.

Thank you for working on this.

utils/update_test_checks.py
177	for i, line in enumerate(lines): line = line.replace scrubbed_line = lines[i] = IR_VALUE_RE.sub(..., scrubbed_line)

This revision is now accepted and ready to land.Jan 7 2017, 10:41 AM

Closed by commit rL291357: Update update_test_checks to work properly with phi nodes and other fun things. (authored by dannyb). · Explain WhyJan 7 2017, 11:15 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D56185: [BDCE] Remove instructions without demanded bits.Jan 1 2019, 7:39 AM

Diff 83455

utils/update_test_checks.py

Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	LLC_FUNCTION_RE = re.compile(
r'^\s(?:[^:\n]+?:\s\n\s*\.size\|\.cfi_endproc\|\.globl\|\.comm\|\.(?:sub)?section)',		r'^\s(?:[^:\n]+?:\s\n\s*\.size\|\.cfi_endproc\|\.globl\|\.comm\|\.(?:sub)?section)',
flags=(re.M \| re.S))		flags=(re.M \| re.S))
OPT_FUNCTION_RE = re.compile(		OPT_FUNCTION_RE = re.compile(
r'^\sdefine\s+(?:internal\s+)?[^@]@(?P<func>[\w-]+?)\s*\('		r'^\sdefine\s+(?:internal\s+)?[^@]@(?P<func>[\w-]+?)\s*\('
r'(\s+)?[^{]\{\n(?P<body>.?)^\}$',		r'(\s+)?[^{]\{\n(?P<body>.?)^\}$',
flags=(re.M \| re.S))		flags=(re.M \| re.S))
CHECK_PREFIX_RE = re.compile('--check-prefix=(\S+)')		CHECK_PREFIX_RE = re.compile('--check-prefix=(\S+)')
CHECK_RE = re.compile(r'^\s;\s([^:]+?)(?:-NEXT\|-NOT\|-DAG\|-LABEL)?:')		CHECK_RE = re.compile(r'^\s;\s([^:]+?)(?:-NEXT\|-NOT\|-DAG\|-LABEL)?:')
IR_VALUE_DEF_RE = re.compile(r'\s+%(.*) =')		# Match things that look at identifiers, but only if they are followed by
		# spaces, commas, paren, or end of the string
		IR_VALUE_RE = re.compile(r'(\s+)%(.+?)([,\s]\|\Z)')
		dberlinAuthorUnsubmitted Not Done Reply Inline Actions Note that the ugly \| is necessary because \Z (or $) does nothing in the character class :( dberlin: Note that the ugly \| is necessary because \Z (or $) does nothing in the character class :(
		bryantUnsubmitted Done Reply Inline Actions How come you added more capture groups? Isn't this the syntax that you're expecting? FILE ::== LINE* LINE ::== LINE_ENT+ LINE_ENT ::== IR_VALUE \| INTERVENING_STUFF And the purpose of all this is to capture and rename IR_VALUEs, right? bryant: How come you added more capture groups? Isn't this the syntax that you're expecting? FILE ::==…
		dberlinAuthorUnsubmitted Not Done Reply Inline Actions Because they are needed to make sub work, see below. dberlin: Because they are needed to make sub work, see below.


# Invoke the tool that is being tested.		# Invoke the tool that is being tested.
def invoke_tool(args, cmd_args, ir):		def invoke_tool(args, cmd_args, ir):
with open(ir) as ir_file:		with open(ir) as ir_file:
stdout = subprocess.check_output(args.tool_binary + ' ' + cmd_args,		stdout = subprocess.check_output(args.tool_binary + ' ' + cmd_args,
shell=True, stdin=ir_file)		shell=True, stdin=ir_file)
# Fix line endings to unix CR style.		# Fix line endings to unix CR style.
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
def get_value_definition(var):		def get_value_definition(var):
return '[[' + get_value_name(var) + ':%.*]]'		return '[[' + get_value_name(var) + ':%.*]]'


# Use a FileCheck variable.		# Use a FileCheck variable.
def get_value_use(var):		def get_value_use(var):
return '[[' + get_value_name(var) + ']]'		return '[[' + get_value_name(var) + ']]'


# Replace IR value defs and uses with FileCheck variables.		# Replace IR value defs and uses with FileCheck variables.
def genericize_check_lines(lines):		def genericize_check_lines(lines):
		def ssaify(match):
		bryantUnsubmitted Done Reply Inline Actions Is there a reason for moving vars_seen to the top level scope? I don't think that `global` is needed. def generic_check(lines): vars_seen = set() def ssaify(match): v = match.group(1) if v in vars_seen: rv = get_value_use(v) else: vars_seen.add(v) # this works just fine. rv = get_value_definition(v) return the new match which is match.group(1) in the previous IR_VALUE_RE Global would only be needed if you're reassigning the variable vars_seen, whereas the update here is done with a method call. bryant: Is there a reason for moving vars_seen to the top level scope? I don't think that `global` is…
		dberlinAuthorUnsubmitted Not Done Reply Inline Actions I'm aware you can do this, i just tend to find nested functions much uglier to follow in simple scripts like this one. Again, i don't care about this code enough, so if you wanted a nested function, i'll make one. dberlin: I'm aware you can do this, i just tend to find nested functions much uglier to follow in simple…
		v = match.group(2)
		if v in vars_seen:
		rv = get_value_use(v)
		else:
		vars_seen.add(v)
		rv = get_value_definition(v)
		return match.group(1) + rv + match.group(3)

		vars_seen = set()
lines_with_def = []		lines_with_def = []
vars_seen = []
for line in lines:		for line in lines:
# An IR variable named '%.' matches the FileCheck regex string.		# An IR variable named '%.' matches the FileCheck regex string.
line = line.replace('%.', '%dot')		line = line.replace('%.', '%dot')
		bryantUnsubmitted Not Done Reply Inline Actions for i, line in enumerate(lines): line = line.replace scrubbed_line = lines[i] = IR_VALUE_RE.sub(..., scrubbed_line) bryant: for i, line in enumerate(lines): line = line.replace scrubbed_line = lines[i] =…
m = IR_VALUE_DEF_RE.match(line)		# Ignore any comments, since the check lines will too. We don't
if m:		# delete them however.
vars_seen.append(m.group(1))		scrubbed_line = SCRUB_IR_COMMENT_RE.sub(r'', line)
line = line.replace('%' + m.group(1), get_value_definition(m.group(1)))		new_scrubbed_line = IR_VALUE_RE.sub(ssaify, scrubbed_line)
		line = line.replace(scrubbed_line, new_scrubbed_line)
lines_with_def.append(line)		lines_with_def.append(line)

# A single def isn't worth replacing?		return lines_with_def
		bryantUnsubmitted Not Done Reply Inline Actions You're limiting the number of replaced uses to 1. Can a line have more than one use? bryant: You're limiting the number of replaced uses to 1. Can a line have more than one use?
		bryantUnsubmitted Not Done Reply Inline Actions Just to clarify, what I meant is that you can do a lot better than manually replacing one at a time: def genericize_check_lines(lines): def ssaify(match): v = match.group(1) if v in var_seen: rv = get_value_use(v) else: vars_seen.add('%' + v) rv = get_value_definition(v) return rv for line in lines: scrubbed_line = ... new_ine = IRVALUE_RE.sub(ssaify, scrubbed_line) ... I suspect that underneath, `_sre` is building the replacement string incrementally, so it won't be quadratic. bryant: Just to clarify, what I meant is that you can do a lot better than manually replacing one at a…
		bryantUnsubmitted Not Done Reply Inline Actions Yes, but you want to replace one at a time anyway. because it could be a use of "bb", and the line could be phi [a, bb], [b, bb17], [c, bb]. If you replace all uses of bb, you'll mistakenly replace bb17. We can do better, but not in this way (see my other email, this will generate wrong answers) , and it's IMHO not worth it. You can avoid doing the replacement directly by using the match start and end, but if you do, we either have to mess with the loop quite a bit. Even if we stop using finditer (because we are replacing the string it is using as it goes, and it does not keep up to date) and use match directly, we have to offset where it starts looking for matches by the amount of text we've added by converting to var defs/uses, etc, or risk false matches. None of this is worth it compared to keeping it simple like I did, imho. 13:34:06 ~> cat faster.py import re xs = "PHI [a, bb], [b, bb17], [c, bb] " * 901 varz = {"bb": "nop", "bb17": "movzx"} myre = re.compile(r"[a-z0-9]+") vrz = {str(i) : str(i) + "a" for i in xrange(4000)} vrz["sd"] = "g" def pythonic(): rv = xs def f(match): return varz.get(match.group(0), match.group(0)) return myre.sub(f, rv) def manual(): rv = xs for match in myre.finditer(xs): rv = rv.replace(match.group(0), varz.get(match.group(0), match.group(0)), 1) return rv print pythonic()[:50] print pythonic() == manual() from timeit import timeit print timeit("pythonic()", "from __main__ import ", number=100) print timeit("manual()", "from __main__ import ", number=100) 13:35:55 ~> python faster.py PHI [a, nop], [b, movzx], [c, nop] PHI [a, nop], [ True <timeit-src>:2: SyntaxWarning: import * only allowed at module level 1.07926893234 <timeit-src>:2: SyntaxWarning: import * only allowed at module level 12.0660321712 https://docs.python.org/2/library/re.html#re.sub bryant: > Yes, but you want to replace one at a time anyway. > > > because it could be a use of "bb"…
#if len(vars_seen) < 2:
# return lines

output_lines = []
vars_seen.sort(key=len, reverse=True)
for line in lines_with_def:
for var in vars_seen:
line = line.replace('%' + var, get_value_use(var))
output_lines.append(line)

return output_lines


def add_checks(output_lines, prefix_list, func_dict, func_name, tool_basename):		def add_checks(output_lines, prefix_list, func_dict, func_name, tool_basename):
# Select a label format based on the whether we're checking asm or IR.		# Select a label format based on the whether we're checking asm or IR.
if tool_basename == "llc":		if tool_basename == "llc":
check_label_format = "; %s-LABEL: %s:"		check_label_format = "; %s-LABEL: %s:"
else:		else:
check_label_format = "; %s-LABEL: @%s("		check_label_format = "; %s-LABEL: @%s("
▲ Show 20 Lines • Show All 211 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Update update_test_checks to work properly with phi nodes and other fun things.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 83455

utils/update_test_checks.py

This is an archive of the discontinued LLVM Phabricator instance.

Update update_test_checks to work properly with phi nodes and other fun things.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 83455

utils/update_test_checks.py

Update update_test_checks to work properly with phi nodes and other fun things.
ClosedPublic