This is an archive of the discontinued LLVM Phabricator instance.

Differential D83814

[clangd] Add Random Forest runtime for code completion.
ClosedPublic

Authored by usaxena95 on Jul 14 2020, 2:32 PM.

Details

Reviewers

sammccall
adamcz

Commits

rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion.

Summary

We intend to replace heuristics-based code completion ranking with a Decision Forest model.

This patch introduces a format for representing the model and an inference runtime that is code-generated at build time.

- Forest.json contains all the trees as an array of trees.
- Features.json describes the features to be used.
- The codegen file takes the above two files and generates CompletionModel, containing a Feature struct and a corresponding Evaluate function. The Evaluate function maps a set of feature values to a real number describing the relevance of the candidate.
- The codegen is part of the build system, and these files are generated at build time.
- Proposes a way to test the generated runtime using a test model.
  - Replicates the model structure in unittests.
  - The unittest tests both the test model (for correct tree traversal) and the real model (for sanity).
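
To make the model format concrete, here is a minimal sketch of evaluating such a forest by direct interpretation, rather than through the generated C++ runtime. The node keys `operation`, `score`, `feature`, `threshold`, `set` and `else` appear in the codegen in this revision; the `then` key, the feature names, the `>=` comparison, and the rule "take `then` when the condition holds" are assumptions made for illustration.

```python
def evaluate_tree(node, features):
    """Walk one tree and return the score stored at the leaf that is reached."""
    while node["operation"] != "boost":
        value = features[node["feature"]]
        if node["operation"] == "if_greater":
            # Assumed semantics: condition holds when value >= threshold.
            taken = value >= node["threshold"]
        elif node["operation"] == "if_member":
            taken = value in node["set"]
        else:
            raise ValueError(f"Unknown operation: {node['operation']}")
        # "then" is an assumed key name; "else" is used by the codegen.
        node = node["then"] if taken else node["else"]
    return node["score"]


def evaluate_forest(forest, features):
    """Sum the leaf scores of every tree to get the overall relevance."""
    return sum(evaluate_tree(tree, features) for tree in forest)


# Hypothetical two-tree forest; feature names are made up for this example.
FOREST = [
    {
        "operation": "if_greater", "feature": "NumReferences", "threshold": 10,
        "then": {"operation": "boost", "score": 2.0},
        "else": {"operation": "boost", "score": -1.0},
    },
    {
        "operation": "if_member", "feature": "SymbolKind",
        "set": ["Function", "Method"],
        "then": {"operation": "boost", "score": 0.5},
        "else": {"operation": "boost", "score": 0.0},
    },
]
```

Calling `evaluate_forest(FOREST, {"NumReferences": 25, "SymbolKind": "Function"})` follows the `then` branch in both trees and adds the two leaf scores.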

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

usaxena95 created this revision. Jul 14 2020, 2:32 PM

Herald added a project: Restricted Project. Jul 14 2020, 2:32 PM

Herald added subscribers: cfe-commits, kadircet, arphaman and 4 others.

Harbormaster failed remote builds in B64229: Diff 277978! Jul 14 2020, 2:33 PM

usaxena95 edited the summary of this revision. Jul 15 2020, 1:58 AM

usaxena95 edited the summary of this revision. Jul 15 2020, 9:09 AM

usaxena95 edited the summary of this revision.

nridge added a subscriber: nridge. Jul 21 2020, 4:24 PM

Is there some additional context for what this patch is aiming to do? The description sounds interesting but I don't really understand it.

What is a "feature" in this context?

The features refer to the code completion signals in https://github.com/llvm/llvm-project/blob/master/clang-tools-extra/clangd/Quality.h
These signals are currently used to map the code completion candidates to a relevance score using hand-coded heuristics.
We intend to replace the heuristics with a Decision Forest model. This patch introduces a dummy model and the corresponding runtime that will be used to run inference on this model.

This is still WIP and I will provide more details in the description once this is finalized.

Addressed offline comments.

Harbormaster failed remote builds in B65534: Diff 280406! Jul 24 2020, 4:57 AM

usaxena95 edited the summary of this revision. Jul 24 2020, 7:11 AM

usaxena95 added a reviewer: sammccall.

Better formatting in generated files.

Harbormaster failed remote builds in B65559: Diff 280455! Jul 24 2020, 7:13 AM

usaxena95 edited the summary of this revision. Jul 24 2020, 7:23 AM

In D83814#2166349, @usaxena95 wrote:

This is still WIP and I will provide more details in the description once this is finalized.

This really should have high level documentation - I don't think Nathan will be the only one to ask these questions.
We need a description of what problem this solves, what the concepts are (trees, features, scores), inference, training, data sources, generated API.

Given that the implementation is necessarily split across CMake, generated code, generator, and data (but not a lot of hand-written C++) I think it's probably best documented in a README.md or so in the model/ dir.

Seems fine to do that in this patch, or ahead of time (before the implementation)... I wouldn't wait to do it after, it'll help with the review.

One question we haven't discussed is how workspace/symbol should work. (This uses the hand-tuned score function with some different logic). This is relevant to naming: it's not the "completion model" if it also has other features.

clang-tools-extra/clangd/CMakeLists.txt
30	if you want the compiler script to be a parameter, make it an argument to the function rather than a magic variable. But really, I think the CompletionModel.cmake is tightly coupled to the python script, I think it can be hardcoded there.
clang-tools-extra/clangd/CompletionModel.cmake
2	I think there's some confusion in the naming. You've got the code split into two locations: the generic generator and the specific code completion model. However the directory named just "model" contains the specific stuff, and the generic parts are named "completionmodel!". I'd suggest either: (a) don't generalize, and put everything in clangd/quality or so; or (b) split into clangd/quality/ and clangd/forest/ for the specific/generic parts.
6	what does the class do?
8	df is cryptic. decision_forest or gen_decision_forest?
18	I'd suggest passing the component filenames explicitly here since you're computing them anyway

sammccall added inline comments. Jul 24 2020, 2:15 PM

clang-tools-extra/clangd/CompletionModel.cmake
20	fname -> filename
30	this needs to be guarded based on the compiler - other compilers use different flags. I'd suggest just -Wno-unused
32	It'd be nice to avoid passing data out by setting magic variables with generic names. The caller is already passing in the filename they want, so they know what it is.
clang-tools-extra/clangd/CompletionModelCodegen.py
2	this needs documentation throughout
10	Hmm, do these classes really pay for themselves compared to just using the JSON-decoded data structures directly and writing functions? e.g. `def setter(feature): if (feature['kind'] == KIND_CATEGORICAL) return "void Set{feature}(float V) {{ {feature} = OrderEncode(V); }}".format(feature['name'])` ...
11	These labels seem a bit abstract, what do you think about "number"/"enum"?
22	Assert failures print the expression, so no need to say "header not found" etc. And in fact the next line will throw a reasonable exception in this case... could drop these
29	nit: `set{Feature}`, following LLVM style
34	raise instead? I'm not sure we want this error handling to be turned off... assert is for programming errors (applies to the rest of the usage of assert too)
46	i think this is `set(f.header for f in features.values() if f.type == Feature.Type.CATEGORICAL)` with no need for lambda but my python is rusty
53	why not assert this on self.ns so that "::Foo" will work fine?
61	join here rather than returning an array?
68	gaurd -> guard
71	`''.join('_' if x in '-' else x.upper() for x in self.fname)` ?
75	(again, not clear the classes pay for themselves in terms of complexity. We do a lot of parsing and validation, but it's not clear it's important)
104	would be nice to structure to reduce the indentation inside here. Also 'codegen' is a pretty generic name for this particular piece. I'd consider `def node(n): return { 'boost': boost_node, 'if_greater': if_greater_node, 'if_member': if_member_node, }[n['operation']](n)` and then define each case separately. (That snippet does actually check errors!)
130	rather than passing features around to get access to the enum type, what about adding `using {feature}_type = ...` to the top of the generated file, and using that here?
283	putting all the trees in a single huge json file seems like it makes them hard to inspect, did you consider one-per-file?
clang-tools-extra/clangd/for-review-only/CompletionModel.cpp
29	nit: consistently either: `tree0`, `tree0_node1`; or `tree_0`, `tree_0_node_1`; or `t0`, `t0_n1`; etc
clang-tools-extra/clangd/for-review-only/CompletionModel.h
11	this shouldn't be part of the public interface. Can we make it private static in the model class?
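
The dispatch-table structure suggested for line 104 of CompletionModelCodegen.py can be sketched as follows. This is only an illustration of the shape of the idea: the handler names come from the review comment, but their bodies, the `then` key, the `E.` accessor prefix and the nested-ternary C++ output are assumptions, and the real codegen emits different code.

```python
def boost_node(n):
    # Leaf: emit the score as a C++ float literal.
    return f"{float(n['score'])}f"


def if_greater_node(n):
    # Internal node comparing a numeric feature against a threshold.
    return (f"(E.{n['feature']} >= {float(n['threshold'])}f ? "
            f"{node(n['then'])} : {node(n['else'])})")


def if_member_node(n):
    # Internal node testing whether a categorical feature is in a set.
    members = " || ".join(f"E.{n['feature']} == {m}" for m in n['set'])
    return f"(({members}) ? {node(n['then'])} : {node(n['else'])})"


def node(n):
    # An unknown operation fails with KeyError, so errors are still checked.
    return {
        'boost': boost_node,
        'if_greater': if_greater_node,
        'if_member': if_member_node,
    }[n['operation']](n)


# Hypothetical one-split tree.
TREE = {
    'operation': 'if_greater', 'feature': 'NumReferences', 'threshold': 10,
    'then': {'operation': 'boost', 'score': 2},
    'else': {'operation': 'boost', 'score': -1},
}
```

Each case lives in its own small function, which keeps the indentation shallow, and the dictionary lookup doubles as validation of the `operation` field.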

usaxena95 added a reviewer: adamcz. Sep 8 2020, 5:01 AM

Hi @usaxena95 and @sammccall,

I am wondering about a couple of high-level things.

Do you guys intend to open-source also the training part of the model pipeline or publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

Do you still intend to support the heuristic that is currently powering clangd in the future?

Thanks!

Addressed comments.

Harbormaster completed remote builds in B71261: Diff 291024. Sep 10 2020, 10:31 AM

usaxena95 added inline comments. Sep 10 2020, 10:37 AM

clang-tools-extra/clangd/CMakeLists.txt
30	Hardcoded it in .cmake file.
clang-tools-extra/clangd/CompletionModel.cmake
2	Moved .cmake, codegen and model in quality dir.
6	The class specifies the name and scope of the Feature class. `clang::clangd::Example` in this case.
18	This allows the generated cc file to include the header using "filename.h". If we give the filepath as input, we would have to strip out the filename from it. Although I like the current notion of being explicit that the output_dir contains the two files. We need to add output_dir to include path to use this library.
30	only MSVC needed a different flag. `-Wno-unused` works with Clang and GCC. https://godbolt.org/z/Gvdne7
32	We can avoid `GENERATED_CC`. But I still wanted to keep the output directory as a detail in this function itself and not as an input parameter. Changed the name to more specific name `DECISION_FOREST_OUTPUT_DIR`.
clang-tools-extra/clangd/CompletionModelCodegen.py
10	Removed the Feature class and Tree. CppClass calculates and holds the namespaces which I felt convenient.
53	Allowed fully qualified names of classes.
71	Sorry for making this complicated. The filename was assumed to be in PascalCase (and not contain '-' at all). I wanted to convert it to UPPER_SNAKE_CASE. To avoid unnecessary complexity, let's simply convert it to upper case.

Hi @jkorous

Do you guys intend to open-source also the training part of the model pipeline ?

Open sourcing the training part (both dataset generation and using an open-source DecisionForest-based framework for training) has been on our radar. However, finding capacity for this task has been difficult lately.

Publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

The current model has not been trained on a generic codebase, but since the features involved don't capture code style, conventions or variable names, it is likely to perform well on generic codebases as well. This remains to be tested.

Do you still intend to support the heuristic that is currently powering clangd in the future?

Currently we are planning to use this model behind a flag. Initially we would be focusing on comparing the two. Since maintaining and developing signals is easier for an ML model, we might end up deprecating the heuristics.

Thanks,
Utkarsh.

Added README.md for the code completion model.

Harbormaster completed remote builds in B71363: Diff 291205. Sep 11 2020, 7:26 AM

adamcz added inline comments. Sep 14 2020, 8:05 AM

clang-tools-extra/clangd/unittests/DecisionForestTests.cpp
6	This is supposed to be "namespace clang", right?

Fixed namespace.

usaxena95 marked an inline comment as done. Sep 15 2020, 12:16 AM

Harbormaster completed remote builds in B71690: Diff 291813. Sep 15 2020, 12:30 AM

Fixed namespace ending.

Harbormaster completed remote builds in B71734: Diff 291909. Sep 15 2020, 7:38 AM

Looks good to me overall, some minor style comments included ;-)

Do we expect this to be a generic model code generator that gets reused for other things? If not, maybe we can hardcode more (like the namespace, class name, etc), but if you think there's other use cases for this then this LGTM.

clang-tools-extra/clangd/quality/CompletionModelCodegen.py
39 ↗	(On Diff #291813)	Why GENERATED_DECISON_FOREST_MODEL instead of output_dir, to be consistent with header guards for other files? Doesn't matter much for generated code, but if someone opens this in vim they'll see warnings.
57 ↗	(On Diff #291813)	nit: add space after if for readability (also below)
93 ↗	(On Diff #291813)	Please extend the comment to mention the second return value (size of the tree)
105 ↗	(On Diff #291813)	This is a good place to use a Python f-string. Also in a few places below.
126 ↗	(On Diff #291813)	style nit: be consistent with spaces around +
177 ↗	(On Diff #291813)	Is there a reason to make this a friend free-function instead of a static method on the Example class? The problem is that you now end up with clang::clangd::Evaluate, so if we ever re-use this codegen for another model we'll have a name collision.
271 ↗	(On Diff #291813)	nit: be consistent about putting a "." at the end of the help text or not.
clang-tools-extra/clangd/unittests/model/CategoricalFeature.h
2	Can we rename this directory? quality/model makes some sense (although it would be better to include something about code completion there), but unittests/model is not very descriptive - what model? How about unittests/decision_forest_model/ or something like that? Or go with the Input/TEST_NAME pattern.
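
The f-string suggestion for CompletionModelCodegen.py can be taken one step further with a triple-quoted f-string that renders a whole class in one expression. The sketch below is not the code from this revision: `render_class` and its parameters are hypothetical names. One detail worth noting is that before Python 3.12 a backslash (such as `"\n"`) cannot appear inside the expression part of an f-string, so the joins are done beforehand.

```python
def render_class(name, setters, members):
    """Render a C++ class declaration from lists of setter and member lines."""
    # Join outside the f-string: Python versions before 3.12 reject a
    # backslash inside the {...} expression part of an f-string.
    setter_block = "\n  ".join(setters)
    member_block = "\n  ".join(members)
    # Doubled braces {{ and }} produce literal braces in the output.
    return f"""class {name} {{
public:
  {setter_block}

private:
  {member_block}
}};
"""


# Hypothetical inputs for illustration.
EXAMPLE = render_class(
    "Example",
    ["void setNumReferences(float V);"],
    ["uint32_t NumReferences = 0;"],
)
```

Compared with repeated `code += ...` statements, the template keeps the shape of the generated C++ visible in the Python source.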

Addressed comments.

clang-tools-extra/clangd/quality/CompletionModelCodegen.py
39 ↗	(On Diff #291813)	The output_dir is the absolute path and not a project-relative path. I tried to stick with a special prefix for the header guard, as done in other generated headers (e.g. protos). If someone opens this in vim, there would be many other warnings that they would see, like "unused_label" ;) I don't think that would be a concern since it would be opened for inspection and not for editing.
177 ↗	(On Diff #291813)	The class name ("Example" currently) would be different for a different model and therefore there would be another overload for `Evaluate(const AnotherClass&)` even if the namespaces are same (`clang::clangd`).
clang-tools-extra/clangd/unittests/model/CategoricalFeature.h
2	You are right "quality" wasn't indicative of code completion here but we decided to be consistent with the current naming. The current heuristics for the ranking are in Quality.h and Quality.cpp ;-) changed the dir name in unittests.

Harbormaster completed remote builds in B71866: Diff 292188. Sep 16 2020, 5:13 AM

Just some naming and doc nits. This looks really solid now, nice job!

In D83814#2261458, @jkorous wrote:

Hi @usaxena95 and @sammccall,

I am wondering about a couple of high-level things.

Do you guys intend to open-source also the training part of the model pipeline or publish a model trained on generic-enough training set so it could be reasonably used on "any" codebase?

@adamcz and I were talking about this too... I think it's important we do as much of this as possible. I was the one not finding time to do it though, and I think Adam may do better :-)

- The existing training stuff is using internal tech, but AFAIK it's nothing LightGBM can't do (it trains on a single machine). So we should be able to open-source the training setup and actually use that.
- Training data generation is harder to open, because it involves sampling a large diverse body of code and parsing lots of it. The core code that embeds clangd and extracts completion candidates should be very doable, so one could run over LLVM on one machine. The framework to run at a larger scale is coupled to internal infra though, and we're currently training on both public and non-public code.

clang-tools-extra/clangd/quality/CompletionModel.cmake
10 ↗	(On Diff #292188)	these vars are used only once, I'd suggest inlining them for readability
13 ↗	(On Diff #292188)	/generated/decision_forest seems redundant considering ${CMAKE_BINARY_DIR} is already the generated-files tree for the directory of the calling CMakeLists. Can't we just use ${CMAKE_BINARY_DIR} directly and avoid the DECISION_FOREST_OUTPUT_DIR variable?
clang-tools-extra/clangd/quality/CompletionModelCodegen.py
150 ↗	(On Diff #292188)	you can use triple-quoted f-strings, which I think would be more readable than blocks of "code +=", e.g. `code += f"""class {cpp_class.name} {{ public: {"\n ".join(setters)} private: {"\n ".join(class_members)} """` etc. You may even do the whole thing in one go.
217 ↗	(On Diff #292188)	again, making this one big triple-quoted f-string may be nicer, up to you
clang-tools-extra/clangd/quality/README.md
4 ↗	(On Diff #292188)	The second half of this sentence simply restates the first. Maybe we can combine this with the second paragraph: "A decision tree is a full binary tree that provides a quality prediction for an input (code completion item). Internal nodes represent a binary decision based on the input data, and leaf nodes represent a prediction."
8 ↗	(On Diff #292188)	Nit: I think it's worth separating out defining features vs conditions. e.g. "An input (code completion candidate) is characterized as a set of features, such as the type of symbol or the number of existing references. At every non-leaf node, we evaluate the condition to decide whether to go left or right. The condition compares one feature of the input against a constant. It is either: ...".
16 ↗	(On Diff #292188)	nit: rather than alternating between describing traversing all trees and one tree, I'd just say "To compute an overall quality score, we traverse each tree in this way and add up the scores".
26 ↗	(On Diff #292188)	This is a little prone to confusion with C++ type. Consider "kind" instead?
34 ↗	(On Diff #292188)	this might be "type"?
38 ↗	(On Diff #292188)	The max numeric value may not exceed 32 (is that right?)
76 ↗	(On Diff #292188)	nit: order should match order of the actual paragraphs. This is short enough though that you might not want the mini-TOC here.
98 ↗	(On Diff #292188)	This seems like it might be a pain to maintain. Maybe just include the json files and the public interface from DecisionForestRuntime.h?
98 ↗	(On Diff #292188)	may want to add the cmake invocation to generate the files.
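
The question about the maximum numeric value of an enum feature follows from how the first diff stores categorical features: the setter writes `1<<V` into a `uint32_t` member, so only values 0 through 31 have a bit available. The sketch below illustrates that encoding in Python. The membership test via a bitwise AND against a mask is an assumption about how an `if_member` condition could be checked, and all function names are hypothetical.

```python
BITS = 32  # width of the uint32_t member used in the generated class


def encode_categorical(value):
    """One-hot encode an enum value as a single bit, mirroring `1<<V`."""
    if not 0 <= value < BITS:
        raise ValueError(f"enum value {value} does not fit in {BITS} bits")
    return 1 << value


def membership_mask(values):
    """Combine the bits of every enum value in the set into one mask."""
    mask = 0
    for v in values:
        mask |= encode_categorical(v)
    return mask


def is_member(encoded, mask):
    """Assumed check: the feature is in the set if its bit is in the mask."""
    return (encoded & mask) != 0
```

With this encoding a set-membership condition costs a single AND regardless of the set size, which is one plausible reason for the 32-value limit.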

Addressed comments.

usaxena95 marked an inline comment as done. Sep 18 2020, 12:52 AM

usaxena95 added inline comments.

clang-tools-extra/clangd/quality/CompletionModel.cmake
13 ↗	(On Diff #292188)	Changed `CMAKE_BINARY_DIR` to `CMAKE_CURRENT_BINARY_DIR` and removed /generated/decision_forest to avoid the `DECISION_FOREST_OUTPUT_DIR` variable.

Harbormaster completed remote builds in B72140: Diff 292717. Sep 18 2020, 12:54 AM

LG from my side!

This revision is now accepted and ready to land. Sep 18 2020, 1:06 AM

adamcz accepted this revision. Sep 18 2020, 6:27 AM

Removed generated (for review) files.

Harbormaster completed remote builds in B72188: Diff 292818. Sep 18 2020, 9:16 AM

Fixed output_dir cmake variable. Clean build succeeds now.
Ready to land.

Harbormaster completed remote builds in B72196: Diff 292828. Sep 18 2020, 10:15 AM

This revision was landed with ongoing or failed builds. Sep 18 2020, 10:27 AM

Closed by commit rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion. (authored by usaxena95).

This revision was automatically updated to reflect the committed changes.

usaxena95 added a commit: rG9b6765e784b3: [clangd] Add Random Forest runtime for code completion..

echristo added a reverting change: rG549e55b3d563: Temporarily Revert "[clangd] Add Random Forest runtime for code completion.". Sep 18 2020, 2:50 PM

usaxena95 edited the summary of this revision. Sep 19 2020, 12:37 AM

Revision Contents

Path	Size
clang-tools-extra/
clangd/
CMakeLists.txt	9 lines
CompletionModel.cmake	31 lines
CompletionModelCodegen.py	295 lines
for-review-only/
CompletionModel.h	23 lines
CompletionModel.cpp	37 lines
DecisionForestRuntimeTest.h	29 lines
DecisionForestRuntimeTest.cpp	48 lines
model/
features.json	8 lines
forest.json	18 lines
unittests/
CMakeLists.txt	10 lines
CodeCompleteTests.cpp	12 lines
DecisionForestTests.cpp	30 lines
model/
CategoricalFeature.h	5 lines
features.json	16 lines
forest.json	52 lines

Diff 280406

clang-tools-extra/clangd/CMakeLists.txt

[20 unchanged lines not shown; enclosing context: configure_file(]
    ${CMAKE_CURRENT_BINARY_DIR}/Features.inc
  )

  set(LLVM_LINK_COMPONENTS
    Support
    AllTargetsInfos
    FrontendOpenMP
  )

+ set(DF_COMPILER ${CMAKE_CURRENT_SOURCE_DIR}/CompletionModelCodegen.py)
  # [review] sammccall: if you want the compiler script to be a parameter, make it an argument to the function rather than a magic variable. But really, I think the CompletionModel.cmake is tightly coupled to the python script, I think it can be hardcoded there.
  # [review] usaxena95 (author): Hardcoded it in .cmake file.
+ include(${CMAKE_CURRENT_SOURCE_DIR}/CompletionModel.cmake)
+ df_compile(${CMAKE_CURRENT_SOURCE_DIR}/model CompletionModel clang::clangd::Example)

  add_clang_library(clangDaemon
    AST.cpp
    ClangdLSPServer.cpp
    ClangdServer.cpp
    CodeComplete.cpp
    CodeCompletionStrings.cpp
    CollectMacros.cpp
    CompileCommands.cpp
[26 unchanged lines not shown; enclosing context: add_clang_library(clangDaemon]
    Selection.cpp
    SemanticHighlighting.cpp
    SemanticSelection.cpp
    SourceCode.cpp
    QueryDriverDatabase.cpp
    TUScheduler.cpp
    URI.cpp
    XRefs.cpp
+   ${GENERATED_CC}

    index/Background.cpp
    index/BackgroundIndexLoader.cpp
    index/BackgroundIndexStorage.cpp
    index/BackgroundQueue.cpp
    index/BackgroundRebuild.cpp
    index/CanonicalIncludes.cpp
    index/FileIndex.cpp
[24 unchanged lines not shown; enclosing context: add_clang_library(clangDaemon]
    clangTidy
    ${LLVM_PTHREAD_LIB}
    ${ALL_CLANG_TIDY_CHECKS}

    DEPENDS
    omp_gen
  )

+ target_include_directories(clangDaemon PUBLIC
+   $<BUILD_INTERFACE:${DF_INCLUDE}>
+ )

  clang_target_link_libraries(clangDaemon
    PRIVATE
    clangAST
    clangASTMatchers
    clangBasic
    clangDriver
    clangFormat
    clangFrontend
[40 unchanged lines not shown]

clang-tools-extra/clangd/CompletionModel.cmake

This file was added.

# Run the Completion Model Codegenerator on the model in the
# ${model} directory.
# [review] sammccall: I think there's some confusion in the naming. You've got the code split into two locations: the generic generator and the specific code completion model. However the directory named just "model" contains the specific stuff, and the generic parts are named "completionmodel!". I'd suggest either: (a) don't generalize, and put everything in clangd/quality or so; or (b) split into clangd/quality/ and clangd/forest/ for the specific/generic parts.
# [review] usaxena95 (author): Moved .cmake, codegen and model in quality dir.
# Produces a pair of files called ${fname}.h and ${fname}.cc in the
# ${CMAKE_BINARY_DIR}/generated. The generated header will define a C++ class
# called ${cpp_class} - which may be a namespace-qualified class name.
function(df_compile model fname cpp_class)
  # [review] sammccall: what does the class do?
  # [review] usaxena95 (author): The class specifies the name and scope of the Feature class. `clang::clangd::Example` in this case.
  set(model_json ${model}/forest.json)
  set(model_features ${model}/features.json)
  # [review] sammccall: df is cryptic. decision_forest or gen_decision_forest?

  set(output_dir ${CMAKE_BINARY_DIR}/generated/decision_forest)
  set(df_header ${output_dir}/${fname}.h)
  set(df_cpp ${output_dir}/${fname}.cpp)

  add_custom_command(OUTPUT ${df_header} ${df_cpp}
    COMMAND "${Python3_EXECUTABLE}" ${DF_COMPILER}
      --model ${model}
      --output_dir ${output_dir}
      --fname ${fname}
      # [review] sammccall: I'd suggest passing the component filenames explicitly here since you're computing them anyway
      # [review] usaxena95 (author): This allows the generated cc file to include the header using "filename.h". If we give the filepath as input, we would have to strip out the filename from it. Although I like the current notion of being explicit that the output_dir contains the two files. We need to add output_dir to include path to use this library.
      --cpp_class ${cpp_class}
    COMMENT "Generating code completion model runtime..."
    # [review] sammccall: fname -> filename
    DEPENDS ${DF_COMPILER} ${model_json} ${model_features}
    VERBATIM )

  set_source_files_properties(${df_header} PROPERTIES
    GENERATED 1)
  set_source_files_properties(${df_cpp} PROPERTIES
    GENERATED 1
    COMPILE_FLAGS -Wno-unused-label)
  set(GENERATED_CC ${df_cpp} PARENT_SCOPE)
  set(DF_INCLUDE ${output_dir} PARENT_SCOPE)
  # [review] sammccall: this needs to be guarded based on the compiler - other compilers use different flags. I'd suggest just -Wno-unused
  # [review] usaxena95 (author): only MSVC needed a different flag. `-Wno-unused` works with Clang and GCC. https://godbolt.org/z/Gvdne7
endfunction()
No newline at end of file
# [review] sammccall: It'd be nice to avoid passing data out by setting magic variables with generic names. The caller is already passing in the filename they want, so they know what it is.
# [review] usaxena95 (author): We can avoid `GENERATED_CC`. But I still wanted to keep the output directory as a detail in this function itself and not as an input parameter. Changed the name to more specific name `DECISION_FOREST_OUTPUT_DIR`.

clang-tools-extra/clangd/CompletionModelCodegen.py

This file was added.

				import argparse
				import json
				sammccallUnsubmitted Not Done Reply Inline Actions this needs documentation throughout sammccall: this needs documentation throughout
				import struct
				from enum import Enum
				from dataclasses import dataclass
				from functools import reduce


				class Feature:
				class Type(Enum):
				sammccallUnsubmitted Done Reply Inline Actions Hmm, do these classes really pay for themselves compared to just using the JSON-decoded data structures directly and writing functions? e.g. def setter(feature): if (feature['kind'] == KIND_CATEGORICAL) return "void Set{feature}(float V) {{ {feature} = OrderEncode(V); }}".format(feature['name']) ... sammccall: Hmm, do these classes really pay for themselves compared to just using the JSON-decoded data…
				usaxena95AuthorUnsubmitted Done Reply Inline Actions Removed the Feature class and Tree. CppClass calculates and holds the namespaces which I felt convenient. usaxena95: Removed the Feature class and Tree. CppClass calculates and holds the namespaces which I felt…
				NUMERICAL = 1
				sammccallUnsubmitted Done Reply Inline Actions These labels seem a bit abstract, what do you think about "number"/"enum"? sammccall: These labels seem a bit abstract, what do you think about "number"/"enum"?
				CATEGORICAL = 2

				def __init__(self, feature_json):
				self.name = feature_json['name']
				assert feature_json['type'] in [t.name for t in Feature.Type
				], "Unknown feature type."
				self.type = Feature.Type[feature_json['type']]

				if self.type == Feature.Type.CATEGORICAL:
				assert 'header' in feature_json, "Header not found in categorical feature."
				assert 'enum' in feature_json, "Enum not found in categorical feature."
				sammccallUnsubmitted Done Reply Inline Actions Assert failures print the expression, so no need to say "header not found" etc. And in fact the next line will throw a reasonable exception in this case... could drop these sammccall: Assert failures print the expression, so no need to say "header not found" etc. And in fact the…
				self.header = feature_json['header']
				self.enum = feature_json['enum']

				def setter(self):
				if self.type == Feature.Type.NUMERICAL:
				return "void Set{feature}(float V) {{ {feature} = OrderEncode(V); }}".format(
				feature=self.name)
				sammccallUnsubmitted Done Reply Inline Actions nit: `set{Feature}`, following LLVM style sammccall: nit: `set{Feature}`, following LLVM style
				if self.type == Feature.Type.CATEGORICAL:
				return "void Set{feature}(unsigned V) {{ {feature} = 1<<V; }}".format(
				feature=self.name)
				assert False, "Unhandled feature type."

				sammccallUnsubmitted Done Reply Inline Actions raise instead? I'm not sure we want this error handling to be turned off... assert is for programming errors (applies to the rest of the usage of assert too) sammccall: raise instead? I'm not sure we want this error handling to be turned off... assert is for…
				def member(self):
				return "uint32_t {feature} = 0;".format(feature=self.name)


				def ReadFeatures(features_json: list):
				return {f['name']: Feature(f) for f in features_json}


				def CollectHeaders(features: dict):
				return list(
				set(f.header for f in filter(
				lambda f: f.type == Feature.Type.CATEGORICAL, features.values())))
				sammccallUnsubmitted Done Reply Inline Actions i think this is `set(f.header for f in features.values() if f.type == Feature.Type.CATEGORICAL)` with no need for lambda but my python is rusty sammccall: i think this is `set(f.header for f in features.values() if f.type == Feature.Type.


				class CppClass:
				def __init__(self, fname, cpp_class):
				self.fname = fname
				assert not cpp_class.startswith(
				"::"), "Do not fully qualify class name."
				sammccallUnsubmitted Done Reply Inline Actions why not assert this on self.ns so that "::Foo" will work fine? sammccall: why not assert this on self.ns so that "::Foo" will work fine?
				usaxena95AuthorUnsubmitted Done Reply Inline Actions Allowed fully qualified names of classes. usaxena95: Allowed fully qualified names of classes.
				assert "::" in cpp_class, "Class cannot be defined in global namespace."
				ns_and_class = cpp_class.split("::")
				self.ns = ns_and_class[0:-1]
				self.name = ns_and_class[-1]

				def ns_begin(self):
				return ["namespace {ns} {{".format(ns=ns) for ns in self.ns]

				sammccallUnsubmitted Done Reply Inline Actions join here rather than returning an array? sammccall: join here rather than returning an array?
				def ns_end(self):
				return ["}} // namespace {ns}.".format(ns=ns) for ns in self.ns]

				def header_gaurd(self):
				# Convert fname to Upper Snake case.
				return "GENERATED_CODE_COMPLETION_MODEL_{}_H".format(
				reduce(lambda x, y: x + ('_' if y.isupper() else '') + y,
				sammccall (Done): gaurd -> guard
				self.fname).upper())


				sammccall (Done): `''.join('_' if x in '-' else x.upper() for x in self.fname)` ?
				usaxena95 (Author, Done): Sorry for making this complicated. The filename was assumed to be in PascalCase (and not contain '-' at all). I wanted to convert it to UPPER_SNAKE_CASE. To avoid unnecessary complexity, let's simply convert it to upper case.
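The review points on `CppClass` (accepting a leading `::`, joining namespace lines, and simply upper-casing the file name for the guard) could be combined roughly as below. This is a sketch with hypothetical free functions, not the revised patch. Closing namespaces innermost-first is an assumption of this sketch; the generated files in this diff emit the closing comments in declaration order.

```python
def split_qualified(cpp_class: str):
    """Split 'ns1::ns2::Name' (optionally '::'-prefixed) into (namespaces, name)."""
    if cpp_class.startswith("::"):
        cpp_class = cpp_class[2:]
    parts = cpp_class.split("::")
    ns, name = parts[:-1], parts[-1]
    if not ns:
        raise ValueError("Class cannot be defined in global namespace.")
    return ns, name


def ns_begin(ns) -> str:
    return "\n".join("namespace {} {{".format(n) for n in ns)


def ns_end(ns) -> str:
    # Close the innermost namespace first so each comment matches its brace.
    return "\n".join("}} // namespace {}".format(n) for n in reversed(ns))


def header_guard(fname: str) -> str:
    # Plain upper-casing, as the author proposes in the reply above.
    return "GENERATED_CODE_COMPLETION_MODEL_{}_H".format(fname.upper())
```

With this version, `CompletionModel` yields a guard ending in `COMPLETIONMODEL_H` rather than the `COMPLETION_MODEL_H` seen in the generated files of this diff.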
				class Tree:
				def __init__(self, json_tree, tree_num: int, node_num: int):
				self.operation = json_tree['operation']
				self.tree_num = tree_num
				sammccall (Done): (again, not clear the classes pay for themselves in terms of complexity. We do a lot of parsing and validation, but it's not clear it's important)
				self.node_num = node_num
				self.label = "t{0}_n{1}".format(tree_num, node_num)

				if self.operation == 'boost':
				self.score = json_tree['score']
				self.size = 1
				return

				self.feature = json_tree['feature']
				if self.operation == 'if_greater':
				self.threshold = json_tree['threshold']
				elif self.operation == 'if_member':
				self.members = json_tree['set']
				assert isinstance(self.members, list)
				else:
				raise ValueError("Unknown value for operation: ", self.operation)

				self.false = Tree(json_tree['else'],
				tree_num=tree_num,
				node_num=node_num + 1)
				self.true = Tree(json_tree['then'],
				tree_num=tree_num,
				node_num=node_num + self.false.size + 1)
				self.size = 1 + self.true.size + self.false.size

				def codegen(self, features):
				code = []
				if self.node_num == 0:
				code.append("tree_{0}:".format(self.tree_num))
				sammccall (Done): would be nice to structure to reduce the indentation inside here. Also 'codegen' is a pretty generic name for this particular piece. I'd consider
				    def node(n):
				        return {
				            'boost': boost_node,
				            'if_greater': if_greater_node,
				            'if_member': if_member_node,
				        }[n['operation']](n)
				and then define each case separately. (That snippet does actually check errors!)

				if self.operation == "boost":
				code.append(
				"{label}: Score += {score}; goto tree_{next_tree};".format(
				label=self.label,
				score=self.score,
				next_tree=self.tree_num + 1))
				return code

				assert self.feature in features, "Unknown feature {}".format(
				self.feature)
				if self.operation == "if_greater":
				code.append(
				"{label}: if(E.{feature} >= {encoded} /*{threshold}*/) goto {true_label};"
				.format(label=self.label,
				feature=self.feature,
				encoded=order_encode(self.threshold),
				threshold=self.threshold,
				true_label=self.true.label))
				if self.operation == "if_member":
				members = '|'.join([
				"BIT({enum}::{member})".format(
				enum=features[self.feature].enum, member=member)
				for member in self.members
				])
				code.append(
				sammccall (Done): rather than passing features around to get access to the enum type, what about adding `using {feature}_type = ...` to the top of the generated file, and using that here?
				"{label}: if(E.{feature} & ({members})) goto {true_label};".
				format(label=self.label,
				feature=self.feature,
				members=members,
				true_label=self.true.label))
				return code + self.false.codegen(features) + self.true.codegen(
				features)
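The dictionary dispatch suggested in the review comment inside `codegen` might look like the sketch below. The handler names and the simplified output are illustrative assumptions: labels, order encoding of thresholds, and enum qualification are deliberately left out, so this only shows the dispatch and its built-in error check.

```python
def boost_stmt(n: dict) -> str:
    return "Score += {};".format(n["score"])


def if_greater_stmt(n: dict) -> str:
    return "if (E.{} >= {})".format(n["feature"], n["threshold"])


def if_member_stmt(n: dict) -> str:
    members = "|".join("BIT({})".format(m) for m in n["set"])
    return "if (E.{} & ({}))".format(n["feature"], members)


HANDLERS = {
    "boost": boost_stmt,
    "if_greater": if_greater_stmt,
    "if_member": if_member_stmt,
}


def node_stmt(n: dict) -> str:
    # An unknown operation fails with KeyError, so no separate check is needed.
    return HANDLERS[n["operation"]](n)
```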


				def ReadDecisionForest(forest_json: list):
				forest = []
				tree_num = 0
				for tree_json in forest_json:
				forest.append(Tree(tree_json, tree_num=tree_num, node_num=0))
				tree_num += 1
				return forest
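The node numbering used by `Tree` is a pre-order walk that visits the `else` subtree before the `then` subtree, which is why a false condition can simply fall through to the next label. The sketch below reproduces only that numbering over plain JSON dictionaries; `label_nodes` is a hypothetical helper, not part of the patch.

```python
def label_nodes(tree: dict, tree_num: int, node_num: int = 0):
    """Return (labels, size) for a tree, numbering else-branch nodes first."""
    labels = ["t{}_n{}".format(tree_num, node_num)]
    if tree["operation"] == "boost":
        return labels, 1  # Leaf: occupies exactly one slot.
    else_labels, else_size = label_nodes(tree["else"], tree_num, node_num + 1)
    # The then-branch starts right after the whole else-subtree.
    then_labels, then_size = label_nodes(tree["then"], tree_num,
                                         node_num + 1 + else_size)
    return labels + else_labels + then_labels, 1 + else_size + then_size


def label_forest(forest: list) -> list:
    # enumerate() replaces the manual tree_num counter.
    return [label_nodes(t, i)[0] for i, t in enumerate(forest)]
```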


				def gen_header_code(features, cpp_class):
				# Header gaurd
				code = """#ifndef {gaurd}
				#define {gaurd}
				#include <cstdint>

				""".format(gaurd=cpp_class.header_gaurd())

				# Namespace begin
				code += "\n".join(cpp_class.ns_begin()) + "\n"

				# Float order encoding.
				code += """
				// Produces an integer that sorts in the same order as F.
				// That is: a < b <==> orderEncode(a) < orderEncode(b).
				uint32_t OrderEncode(float F);

				"""
				setters = [f.setter() for f in features.values()]
				class_members = [f.member() for f in features.values()]

				# Class.
				code += "class {class_name} {{\n".format(class_name=cpp_class.name)
				code += "public:\n"
				code += " " + "\n ".join(setters) + "\n"
				code += "\n"
				code += "private:\n"
				code += " " + "\n ".join(class_members) + "\n"

				code += " friend float Evaluate(const Example&);\n"
				code += "};\n"
				code += "float Evaluate(const Example&);" + "\n"

				# Namespace end.
				code += "\n".join(cpp_class.ns_end()) + "\n"
				code += "#endif // {gaurd}".format(gaurd=cpp_class.header_gaurd())
				return code


				def order_encode(v: float):
				i = struct.unpack('<I', struct.pack('<f', v))[0]
				TopBit = 1 << 31
				# IEEE 754 floats compare like sign-magnitude integers.
				if (i & TopBit): # Negative float
				return (1 << 32) - i # low half of integers, order reversed.
				return TopBit + i # top half of integers
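The ordering property claimed for the encoding can be checked directly. The sketch below restates `order_encode` so it runs on its own and adds a hypothetical `is_order_preserving` helper; the two constants in the usage note are the ones that appear in the generated `DecisionForestRuntimeTest.cpp` in this diff.

```python
import struct


def order_encode(v: float) -> int:
    # Reinterpret the 32-bit float as an unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', v))[0]
    top_bit = 1 << 31
    if i & top_bit:            # Negative float.
        return (1 << 32) - i   # Low half of integers, order reversed.
    return top_bit + i         # Top half of integers.


def is_order_preserving(values) -> bool:
    """True if sorting by value and sorting by encoding agree."""
    ordered = sorted(values)
    encoded = [order_encode(v) for v in ordered]
    return encoded == sorted(encoded)
```

For example, `order_encode(200.0)` gives 3276275712 and `order_encode(-1)` gives 1082130432, matching the thresholds in the generated test runtime.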


				def gen_evaluate_func(forest: list, features: dict):
				# Generate code for Random Forest.
				code = "float Evaluate(const Example& E) {\n"
				lines = []
				lines.append("float Score = 0;")
				for tree in forest:
				lines += tree.codegen(features)
				lines.append("")
				lines.append("tree_{}: // No such tree.".format(len(forest)))
				lines.append("return Score;")

				code += " " + "\n ".join(lines)
				code += "\n}"
				return code


				def gen_cpp_code(forest: list, features: dict, fname: str,
				cpp_class: CppClass):
				code = ""
				# Headers
				angled_include = [
				"cstring",
				"limits",
				]
				quoted_include = [
				"{}.h".format(fname),
				"llvm/ADT/bit.h",
				] + CollectHeaders(features)
				code = "\n".join('#include <{h}>'.format(h=h)
				for h in angled_include) + "\n\n"
				code += "\n".join('#include "{h}"'.format(h=h)
				for h in quoted_include) + "\n\n"
				code += "#define BIT(X) (1<<X)\n\n"

				# Namespaces Begin
				code += "\n".join(cpp_class.ns_begin()) + "\n"

				# Float order encoding.
				code += """
				uint32_t OrderEncode(float F) {
				static_assert(std::numeric_limits<float>::is_iec559, "");
				constexpr uint32_t TopBit = ~(~uint32_t{0} >> 1);

				// Get the bits of the float. Endianness is the same as for integers.
				uint32_t U = llvm::bit_cast<uint32_t>(F);
				std::memcpy(&U, &F, sizeof(U));
				// IEEE 754 floats compare like sign-magnitude integers.
				if (U & TopBit) // Negative float.
				return 0 - U; // Map onto the low half of integers, order reversed.
				return U + TopBit; // Positive floats map onto the high half of integers.
				}

				"""
				code += gen_evaluate_func(forest, features) + "\n"
				# Namespaces End
				code += "\n".join(cpp_class.ns_end()) + "\n"
				return code


				def main():
				parser = argparse.ArgumentParser('DecisionForestCodegen')
				parser.add_argument('--fname', help='output file name.')
				parser.add_argument('--output_dir', help='output directory')
				parser.add_argument('--model', help='path to model directory')
				parser.add_argument(
				'--cpp_class',
				help=
				'The name of the class (which may be a namespace-qualified) created in generated header.'
				)
				ns = parser.parse_args()

				output_dir = ns.output_dir
				fname = ns.fname
				header_file = "{dir}/{name}.h".format(dir=output_dir, name=fname)
				cpp_file = "{dir}/{name}.cpp".format(dir=output_dir, name=fname)
				cpp_class = CppClass(fname=fname, cpp_class=ns.cpp_class)

				model_json = "{dir}/forest.json".format(dir=ns.model)
				features_json = "{dir}/features.json".format(dir=ns.model)

				with open(features_json) as features_file:
				features = Feature.ReadFeatures(json.load(features_file))

				with open(model_json) as model_file:
				forest = Tree.ReadDecisionForest(json.load(model_file))

				with open(cpp_file, 'w+t') as output_cc:
				sammccall (Not Done): putting all the trees in a single huge json file seems like it makes them hard to inspect, did you consider one-per-file?
				output_cc.write(
				gen_cpp_code(forest=forest,
				features=features,
				fname=fname,
				cpp_class=cpp_class))

				with open(header_file, 'w+t') as output_h:
				output_h.write(gen_header_code(features=features, cpp_class=cpp_class))


				if __name__ == '__main__':
				main()

clang-tools-extra/clangd/for-review-only/CompletionModel.h

This file was added.

				#ifndef GENERATED_CODE_COMPLETION_MODEL_COMPLETION_MODEL_H
				Lint: clang-format suggested style edits found.
				#define GENERATED_CODE_COMPLETION_MODEL_COMPLETION_MODEL_H
				#include <cstdint>

				namespace clang {
				namespace clangd {

				// Produces an integer that sorts in the same order as F.
				// That is: a < b <==> orderEncode(a) < orderEncode(b).
				uint32_t OrderEncode(float F);

				sammccall (Done): this shouldn't be part of the public interface. Can we make it private static in the model class?
				class Example {
				public:
				void SetContextKind(unsigned V) { ContextKind = 1<<V; }

				private:
				uint32_t ContextKind = 0;
				friend float Evaluate(const Example&);
				};
				float Evaluate(const Example&);
				} // namespace clang.
				} // namespace clangd.
				#endif // GENERATED_CODE_COMPLETION_MODEL_COMPLETION_MODEL_H
				No newline at end of file

clang-tools-extra/clangd/for-review-only/CompletionModel.cpp

This file was added.

				#include <cstring>
				Lint: clang-format suggested style edits found.
				#include <limits>

				#include "CompletionModel.h"
				#include "llvm/ADT/bit.h"
				#include "clang/Sema/CodeCompleteConsumer.h"

				#define BIT(X) (1<<X)

				namespace clang {
				namespace clangd {

				uint32_t OrderEncode(float F) {
				static_assert(std::numeric_limits<float>::is_iec559, "");
				constexpr uint32_t TopBit = ~(~uint32_t{0} >> 1);

				// Get the bits of the float. Endianness is the same as for integers.
				uint32_t U = llvm::bit_cast<uint32_t>(F);
				std::memcpy(&U, &F, sizeof(U));
				// IEEE 754 floats compare like sign-magnitude integers.
				if (U & TopBit) // Negative float.
				return 0 - U; // Map onto the low half of integers, order reversed.
				return U + TopBit; // Positive floats map onto the high half of integers.
				}

				float Evaluate(const Example& E) {
				float Score = 0;
				tree_0:
				t0_n0: if(E.ContextKind & (BIT(clang::CodeCompletionContext::Kind::CCC_DotMemberAccess)|BIT(clang::CodeCompletionContext::Kind::CCC_ArrowMemberAccess))) goto t0_n2;
				sammccall (Done): nit: consistently either:
				- tree0, tree0_node1
				- tree_0, tree_0_node_1
				- t0, t0_n1
				etc
				t0_n1: Score += 1.0; goto tree_1;
				t0_n2: Score += 3.0; goto tree_1;

				tree_1: // No such tree.
				return Score;
				}
				} // namespace clang.
				} // namespace clangd.

clang-tools-extra/clangd/for-review-only/DecisionForestRuntimeTest.h

This file was added.

				#ifndef GENERATED_CODE_COMPLETION_MODEL_DECISION_FOREST_RUNTIME_TEST_H
				Lint: clang-format suggested style edits found.
				#define GENERATED_CODE_COMPLETION_MODEL_DECISION_FOREST_RUNTIME_TEST_H
				#include <cstdint>

				namespace ns1 {
				namespace ns2 {
				namespace test {

				// Produces an integer that sorts in the same order as F.
				// That is: a < b <==> orderEncode(a) < orderEncode(b).
				uint32_t OrderEncode(float F);

				class Example {
				public:
				void SetANumber(float V) { ANumber = OrderEncode(V); }
				void SetAFloat(float V) { AFloat = OrderEncode(V); }
				void SetACategorical(unsigned V) { ACategorical = 1<<V; }

				private:
				uint32_t ANumber = 0;
				uint32_t AFloat = 0;
				uint32_t ACategorical = 0;
				friend float Evaluate(const Example&);
				};
				float Evaluate(const Example&);
				} // namespace ns1.
				} // namespace ns2.
				} // namespace test.
				#endif // GENERATED_CODE_COMPLETION_MODEL_DECISION_FOREST_RUNTIME_TEST_H
				No newline at end of file

clang-tools-extra/clangd/for-review-only/DecisionForestRuntimeTest.cpp

This file was added.

				#include <cstring>
				Lint: clang-format suggested style edits found.
				#include <limits>

				#include "DecisionForestRuntimeTest.h"
				#include "llvm/ADT/bit.h"
				#include "model/CategoricalFeature.h"

				#define BIT(X) (1<<X)

				namespace ns1 {
				namespace ns2 {
				namespace test {

				uint32_t OrderEncode(float F) {
				static_assert(std::numeric_limits<float>::is_iec559, "");
				constexpr uint32_t TopBit = ~(~uint32_t{0} >> 1);

				// Get the bits of the float. Endianness is the same as for integers.
				uint32_t U = llvm::bit_cast<uint32_t>(F);
				std::memcpy(&U, &F, sizeof(U));
				// IEEE 754 floats compare like sign-magnitude integers.
				if (U & TopBit) // Negative float.
				return 0 - U; // Map onto the low half of integers, order reversed.
				return U + TopBit; // Positive floats map onto the high half of integers.
				}

				float Evaluate(const Example& E) {
				float Score = 0;
				tree_0:
				t0_n0: if(E.ANumber >= 3276275712 /*200.0*/) goto t0_n4;
				t0_n1: if(E.ACategorical & (BIT(ns1::ns2::TestEnum::A)|BIT(ns1::ns2::TestEnum::C))) goto t0_n3;
				t0_n2: Score += -4.0; goto tree_1;
				t0_n3: Score += 3.0; goto tree_1;
				t0_n4: if(E.AFloat >= 1082130432 /*-1*/) goto t0_n6;
				t0_n5: Score += -20.0; goto tree_1;
				t0_n6: Score += 10.0; goto tree_1;

				tree_1:
				t1_n0: if(E.ACategorical & (BIT(ns1::ns2::TestEnum::A)|BIT(ns1::ns2::TestEnum::B))) goto t1_n2;
				t1_n1: Score += -6.0; goto tree_2;
				t1_n2: Score += 5.0; goto tree_2;

				tree_2: // No such tree.
				return Score;
				}
				} // namespace ns1.
				} // namespace ns2.
				} // namespace test.
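The categorical checks in the generated runtime rely on a one-hot encoding: the setter stores `1<<V`, and an `if_member` node tests that value against the OR of the member bits. The Python sketch below mirrors that idea using the `TestEnum` values from the unit-test model; the function names are hypothetical. As far as I can tell, a 32-bit field limits this scheme to enumerators with values below 32.

```python
from enum import IntEnum


class TestEnum(IntEnum):
    A = 0
    B = 1
    C = 2
    D = 3


def bit(v: int) -> int:
    return 1 << int(v)


def encode_categorical(v: int) -> int:
    # Mirrors the generated setter: the field holds a single set bit.
    return bit(v)


def is_member(encoded: int, members) -> bool:
    mask = 0
    for m in members:
        mask |= bit(m)
    # Mirrors `E.Feature & (BIT(A)|BIT(C))` in the generated code.
    return (encoded & mask) != 0
```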

clang-tools-extra/clangd/model/features.json

This file was added.

				[
				{
				"name": "ContextKind",
				"type": "CATEGORICAL",
				"enum": "clang::CodeCompletionContext::Kind",
				"header": "clang/Sema/CodeCompleteConsumer.h"
				}
				]
				No newline at end of file

clang-tools-extra/clangd/model/forest.json

This file was added.

				[
				{
				"operation": "if_member",
				"feature": "ContextKind",
				"set": [
				"CCC_DotMemberAccess",
				"CCC_ArrowMemberAccess"
				],
				"then": {
				"operation": "boost",
				"score": 3.0
				},
				"else": {
				"operation": "boost",
				"score": 1.0
				}
				}
				]
				No newline at end of file

clang-tools-extra/clangd/unittests/CMakeLists.txt

(22 unchanged lines not shown)
endif()		endif()

if (CLANGD_ENABLE_REMOTE)		if (CLANGD_ENABLE_REMOTE)
include_directories(${CMAKE_CURRENT_BINARY_DIR}/../index/remote)		include_directories(${CMAKE_CURRENT_BINARY_DIR}/../index/remote)
add_definitions(-DGOOGLE_PROTOBUF_NO_RTTI=1)		add_definitions(-DGOOGLE_PROTOBUF_NO_RTTI=1)
set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)		set(REMOTE_TEST_SOURCES remote/MarshallingTests.cpp)
endif()		endif()

		set(DF_COMPILER ${CMAKE_CURRENT_SOURCE_DIR}/../CompletionModelCodegen.py)
		include(${CMAKE_CURRENT_SOURCE_DIR}/../CompletionModel.cmake)
		df_compile(${CMAKE_CURRENT_SOURCE_DIR}/model DecisionForestRuntimeTest ns1::ns2::test::Example)

add_custom_target(ClangdUnitTests)		add_custom_target(ClangdUnitTests)
add_unittest(ClangdUnitTests ClangdTests		add_unittest(ClangdUnitTests ClangdTests
Annotations.cpp		Annotations.cpp
ASTTests.cpp		ASTTests.cpp
BackgroundIndexTests.cpp		BackgroundIndexTests.cpp
CanonicalIncludesTests.cpp		CanonicalIncludesTests.cpp
ClangdTests.cpp		ClangdTests.cpp
ClangdLSPServerTests.cpp		ClangdLSPServerTests.cpp
CodeCompleteTests.cpp		CodeCompleteTests.cpp
CodeCompletionStringsTests.cpp		CodeCompletionStringsTests.cpp
CollectMacrosTests.cpp		CollectMacrosTests.cpp
CompileCommandsTests.cpp		CompileCommandsTests.cpp
CompilerTests.cpp		CompilerTests.cpp
ConfigCompileTests.cpp		ConfigCompileTests.cpp
ConfigProviderTests.cpp		ConfigProviderTests.cpp
ConfigYAMLTests.cpp		ConfigYAMLTests.cpp
		DecisionForestTests.cpp
DexTests.cpp		DexTests.cpp
DiagnosticsTests.cpp		DiagnosticsTests.cpp
DraftStoreTests.cpp		DraftStoreTests.cpp
ExpectedTypeTest.cpp		ExpectedTypeTest.cpp
FileDistanceTests.cpp		FileDistanceTests.cpp
FileIndexTests.cpp		FileIndexTests.cpp
FindSymbolsTests.cpp		FindSymbolsTests.cpp
FindTargetTests.cpp		FindTargetTests.cpp
(27 unchanged lines not shown)
TestFS.cpp		TestFS.cpp
TestIndex.cpp		TestIndex.cpp
TestTU.cpp		TestTU.cpp
TypeHierarchyTests.cpp		TypeHierarchyTests.cpp
TweakTests.cpp		TweakTests.cpp
TweakTesting.cpp		TweakTesting.cpp
URITests.cpp		URITests.cpp
XRefsTests.cpp		XRefsTests.cpp
		${GENERATED_CC}

support/CancellationTests.cpp		support/CancellationTests.cpp
support/ContextTests.cpp		support/ContextTests.cpp
support/FunctionTests.cpp		support/FunctionTests.cpp
support/MarkupTests.cpp		support/MarkupTests.cpp
support/ThreadingTests.cpp		support/ThreadingTests.cpp
support/TestTracer.cpp		support/TestTracer.cpp
support/TraceTests.cpp		support/TraceTests.cpp

${REMOTE_TEST_SOURCES}		${REMOTE_TEST_SOURCES}

$<TARGET_OBJECTS:obj.clangDaemonTweaks>		$<TARGET_OBJECTS:obj.clangDaemonTweaks>
)		)

		target_include_directories(ClangdTests PUBLIC
		$<BUILD_INTERFACE:${DF_INCLUDE}>
		)

clang_target_link_libraries(ClangdTests		clang_target_link_libraries(ClangdTests
PRIVATE		PRIVATE
clangAST		clangAST
clangASTMatchers		clangASTMatchers
clangBasic		clangBasic
clangFormat		clangFormat
clangFrontend		clangFrontend
clangIndex		clangIndex
(30 unchanged lines not shown)

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp

//===-- CodeCompleteTests.cpp ------------------------------------ C++ --===//		//===-- CodeCompleteTests.cpp ------------------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Annotations.h"		#include "Annotations.h"
#include "ClangdServer.h"		#include "ClangdServer.h"
#include "CodeComplete.h"		#include "CodeComplete.h"
#include "Compiler.h"		#include "Compiler.h"
		#include "CompletionModel.h"
#include "Matchers.h"		#include "Matchers.h"
#include "Protocol.h"		#include "Protocol.h"
#include "Quality.h"		#include "Quality.h"
#include "SourceCode.h"		#include "SourceCode.h"
#include "SyncAPI.h"		#include "SyncAPI.h"
#include "TestFS.h"		#include "TestFS.h"
#include "TestIndex.h"		#include "TestIndex.h"
#include "TestTU.h"		#include "TestTU.h"
(21 unchanged lines not shown)
using ::testing::AllOf;		using ::testing::AllOf;
using ::testing::Contains;		using ::testing::Contains;
using ::testing::ElementsAre;		using ::testing::ElementsAre;
using ::testing::Field;		using ::testing::Field;
using ::testing::HasSubstr;		using ::testing::HasSubstr;
using ::testing::IsEmpty;		using ::testing::IsEmpty;
using ::testing::Not;		using ::testing::Not;
using ::testing::UnorderedElementsAre;		using ::testing::UnorderedElementsAre;
		using ContextKind = CodeCompletionContext::Kind;

// GMock helpers for matching completion items.		// GMock helpers for matching completion items.
MATCHER_P(Named, Name, "") { return arg.Name == Name; }		MATCHER_P(Named, Name, "") { return arg.Name == Name; }
MATCHER_P(NameStartsWith, Prefix, "") {		MATCHER_P(NameStartsWith, Prefix, "") {
return llvm::StringRef(arg.Name).startswith(Prefix);		return llvm::StringRef(arg.Name).startswith(Prefix);
}		}
MATCHER_P(Scope, S, "") { return arg.Scope == S; }		MATCHER_P(Scope, S, "") { return arg.Scope == S; }
MATCHER_P(Qualifier, Q, "") { return arg.RequiredQualifier == Q; }		MATCHER_P(Qualifier, Q, "") { return arg.RequiredQualifier == Q; }
(98 unchanged lines not shown)
return codeComplete(FilePath, Test.point(), /*Preamble=*/nullptr, ParseInput,		return codeComplete(FilePath, Test.point(), /*Preamble=*/nullptr, ParseInput,
Opts);		Opts);
}		}

Symbol withReferences(int N, Symbol S) {		Symbol withReferences(int N, Symbol S) {
S.References = N;		S.References = N;
return S;		return S;
}		}

		TEST(DecisionForestRuntime, SanityTest) {
		using Example = clangd::Example;
		using clangd::Evaluate;
		Example E1;
		E1.SetContextKind(ContextKind::CCC_ArrowMemberAccess);
		Example E2;
		E2.SetContextKind(ContextKind::CCC_SymbolOrNewName);
		EXPECT_GT(Evaluate(E1), Evaluate(E2));
		}

TEST(CompletionTest, Limit) {		TEST(CompletionTest, Limit) {
clangd::CodeCompleteOptions Opts;		clangd::CodeCompleteOptions Opts;
Opts.Limit = 2;		Opts.Limit = 2;
auto Results = completions(R"cpp(		auto Results = completions(R"cpp(
struct ClassWithMembers {		struct ClassWithMembers {
int AAA();		int AAA();
int BBB();		int BBB();
int CCC();		int CCC();
(2,761 unchanged lines not shown)

clang-tools-extra/clangd/unittests/DecisionForestTests.cpp

This file was added.


				#include "DecisionForestRuntimeTest.h"
				#include "model/CategoricalFeature.h"
				#include "gtest/gtest.h"

				namespace clangd {
				adamcz (Done): This is supposed to be "namespace clang", right?
				namespace clangd {

				TEST(DecisionForestRuntime, Evaluate) {
				using Example = ::ns1::ns2::test::Example;
				using Cat = ::ns1::ns2::TestEnum;
				using ::ns1::ns2::test::Evaluate;

				Example E;
				E.SetANumber(200); // True
				E.SetAFloat(0); // True: +10.0
				E.SetACategorical(Cat::A); // True: +5.0
				EXPECT_EQ(Evaluate(E), 15.0);

				E.SetANumber(200); // True
				E.SetAFloat(-2.5); // False: -20.0
				E.SetACategorical(Cat::B); // True: +5.0
				EXPECT_EQ(Evaluate(E), -15.0);

				E.SetANumber(100); // False
				E.SetACategorical(Cat::C); // True: +3.0, False: -6.0
				EXPECT_EQ(Evaluate(E), -3.0);
				}
				} // namespace clangd
				} // namespace clangd
				No newline at end of file

clang-tools-extra/clangd/unittests/model/CategoricalFeature.h

This file was added.

				namespace ns1 {
				namespace ns2 {
				adamcz (Done): Can we rename this directory? quality/model makes some sense (although it would be better to include something about code completion there), but unittests/model is not very descriptive - what model? How about unittests/decision_forest_model/ or something like that? Or go with the Input/TEST_NAME pattern.
				usaxena95 (Author, Done): You are right, "quality" wasn't indicative of code completion here, but we decided to be consistent with the current naming. The current heuristics for the ranking are in Quality.h and Quality.cpp ;-) Changed the directory name in unittests.
				enum TestEnum { A, B, C, D };
				} // namespace ns2
				} // namespace ns1
				No newline at end of file

clang-tools-extra/clangd/unittests/model/features.json

This file was added.

				[
				{
				"name": "ANumber",
				"type": "NUMERICAL"
				},
				{
				"name": "AFloat",
				"type": "NUMERICAL"
				},
				{
				"name": "ACategorical",
				"type": "CATEGORICAL",
				"enum": "ns1::ns2::TestEnum",
				"header": "model/CategoricalFeature.h"
				}
				]
				No newline at end of file

clang-tools-extra/clangd/unittests/model/forest.json

This file was added.

				[
				{
				"operation": "if_greater",
				"feature": "ANumber",
				"threshold": 200.0,
				"then": {
				"operation": "if_greater",
				"feature": "AFloat",
				"threshold": -1,
				"then": {
				"operation": "boost",
				"score": 10.0
				},
				"else": {
				"operation": "boost",
				"score": -20.0
				}
				},
				"else": {
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"C"
				],
				"then": {
				"operation": "boost",
				"score": 3.0
				},
				"else": {
				"operation": "boost",
				"score": -4.0
				}
				}
				},
				{
				"operation": "if_member",
				"feature": "ACategorical",
				"set": [
				"A",
				"B"
				],
				"then": {
				"operation": "boost",
				"score": 5.0
				},
				"else": {
				"operation": "boost",
				"score": -6.0
				}
				}
				]
				No newline at end of file
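To make the test expectations in `DecisionForestTests.cpp` easier to follow, the test forest can be evaluated with a direct tree walk. This is an interpretive sketch over the JSON structure, not the generated goto-based runtime. It treats `if_greater` as `>=` because that is the comparison the generated code emits, and it represents examples as plain dictionaries, which is an assumption of this sketch.

```python
def leaf(score: float) -> dict:
    return {"operation": "boost", "score": score}


# Same content as unittests/model/forest.json above.
TEST_FOREST = [
    {"operation": "if_greater", "feature": "ANumber", "threshold": 200.0,
     "then": {"operation": "if_greater", "feature": "AFloat", "threshold": -1,
              "then": leaf(10.0), "else": leaf(-20.0)},
     "else": {"operation": "if_member", "feature": "ACategorical",
              "set": ["A", "C"], "then": leaf(3.0), "else": leaf(-4.0)}},
    {"operation": "if_member", "feature": "ACategorical", "set": ["A", "B"],
     "then": leaf(5.0), "else": leaf(-6.0)},
]


def evaluate(forest: list, example: dict) -> float:
    """Sum the leaf scores reached by `example` in every tree."""
    score = 0.0
    for tree in forest:
        node = tree
        while node["operation"] != "boost":
            value = example[node["feature"]]
            if node["operation"] == "if_greater":
                taken = value >= node["threshold"]  # Generated code uses >=.
            elif node["operation"] == "if_member":
                taken = value in node["set"]
            else:
                raise ValueError("Unknown operation: " + node["operation"])
            node = node["then"] if taken else node["else"]
        score += node["score"]
    return score
```

The three cases in the unit test correspond to `evaluate` returning 15.0, -15.0 and -3.0 for the same feature values.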


Revision Contents

Diff 280406
