Describe the bug The Pants Rust

With <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-i

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

[Python] The Pants Rust inference parser exhibits some dependency misinterpretation issues when python-infer.string_imports is enabled.,about pantsbuild/pants

thejcannon commented on May 25, 2024 2

1 seems like a bug. We'll likely have to find the right tree-sitter incantation for that.

2 and 3 are more of a "quirk" than a bug. However, since they aren't valid dotted module names (as far as I know) I think it's safe to also "fix" this quirk.

from pants.

cognifloyd commented on May 25, 2024 2

With #20472, only dotted strings that are made up of valid python identifiers will be be considered as possible imports. So, points 2 and 3 should be resolved in pants 2.20 or the 2.19.1 (once released).

To fix the first issue, someone needs to figure out how to deal with concatenated strings with the tree sitter in the dependency parser.

from pants.

cognifloyd commented on May 25, 2024

Similar to point 2: the rust parser also causes a warning for every file that has either of these strings, neither of which is a valid python module:

.
..

I get that using pants 2.18.2 with this in pants.toml (from the StackStorm project):

[python-infer]
string_imports = true
string_imports_min_dots = 1  # tools/config_gen.py has import strings with only one dot.
unowned_dependency_behavior = "ignore"
ambiguity_resolution = "by_source_root"
use_rust_parser = true

The old parser (in 2.18.x) explicitly required alphanumeric characters before the dot:

pants/src/python/pants/backend/python/dependency_inference/scripts/general_dependency_visitor.py

Lines 26 to 31 in 3da5ae5

    
           self._string_import_regex = re.compile( 
        
               r"^([a-z_][a-z_\d]*\.){" 
        
               + os.environ["STRING_IMPORTS_MIN_DOTS"] 
        
               + r",}[a-zA-Z_]\w*$", 
        
               re.UNICODE, 
        
           )

The new rust parser, however, seems to only exclude strings with a whitespace or a \ char (assuming I'm reading this rust code correctly):

pants/src/rust/engine/dep_inference/src/python/mod.rs

Lines 325 to 328 in 3632a66

    
           if !text.contains(|c: char| c.is_ascii_whitespace() || c == '\\') { 
        
               self.string_candidates 
        
                   .insert(text.to_string(), (range.start_point.row + 1) as u64); 
        
           }

Can we make the rust parser be more selective on which strings might be dependencies? In particular, a string that only consists of . chars, or maybe any string that ends in . should be excluded (as that would also catch points 2 and 3: 'a.b.' and 'retrying...').

Maybe with something like this:

        if !text.ends_with(".") && !text.contains(|c: char| c.is_ascii_whitespace() || c == '\\') {
            self.string_candidates
                .insert(text.to_string(), (range.start_point.row + 1) as u64);
        }

Oh. That wouldn't work because the detected strings are used for both imports and assets.
So, the additional logic to exclude irrelevant strings would need to go in the python side of things:

pants/src/python/pants/backend/python/dependency_inference/parse_python_dependencies.py

Lines 117 to 130 in 8eb5557

    
           if python_infer_subsystem.string_imports or python_infer_subsystem.assets: 
        
               for string, line in native_result.string_candidates.items(): 
        
                   slash_count = string.count("/") 
        
                   if ( 
        
                       python_infer_subsystem.string_imports 
        
                       and not slash_count 
        
                       and string.count(".") >= python_infer_subsystem.string_imports_min_dots 
        
                   ): 
        
                       imports.setdefault(string, (line, True)) 
        
                   if ( 
        
                       python_infer_subsystem.assets 
        
                       and slash_count >= python_infer_subsystem.assets_min_slashes 
        
                   ): 
        
                       assets.add(string)

So, maybe:

    if python_infer_subsystem.string_imports or python_infer_subsystem.assets:
        for string, line in native_result.string_candidates.items():
            slash_count = string.count("/")
            if (
                python_infer_subsystem.string_imports
                and not slash_count
                and not string.endswith(".")
                and string.count(".") >= python_infer_subsystem.string_imports_min_dots
            ):
                imports.setdefault(string, (line, True))
            if (
                python_infer_subsystem.assets
                and slash_count >= python_infer_subsystem.assets_min_slashes
            ):
                assets.add(string)

from pants.

cognifloyd commented on May 25, 2024

Another point (maybe it should be a separate issue, but I'll put it here for now):
The rust parser does not ignore strings on lines like this, even though I put the no-infer-dep directive on the line:

            "runner_module": "tests.test_runner",  # pants: no-infer-dep

Maybe the rust parser should check for the pragma before recording each string, from this:

pants/src/rust/engine/dep_inference/src/python/mod.rs

Lines 322 to 328 in 3632a66

    
           fn visit_string(&mut self, node: tree_sitter::Node) -> ChildBehavior { 
        
               let range = node.range(); 
        
               let text: &str = self.string_at(range); 
        
               if !text.contains(|c: char| c.is_ascii_whitespace() || c == '\\') { 
        
                   self.string_candidates 
        
                       .insert(text.to_string(), (range.start_point.row + 1) as u64); 
        
               }

to this:

    fn visit_string(&mut self, node: tree_sitter::Node) -> ChildBehavior {
        let range = node.range();
        let text: &str = self.string_at(range);
        if !!text.contains(|c: char| c.is_ascii_whitespace() || c == '\\') && !self.is_pragma_ignored(node) {
            self.string_candidates
                .insert(text.to_string(), (range.start_point.row + 1) as u64);
        }

from pants.

cognifloyd commented on May 25, 2024

@liudonggalaxy Do you get any warnings or errors for your last two points:

For the string BASE_PATH = 'a.b.', the Rust parser mistakenly treats it as a dependency, unlike the classic parser.

In print('retrying...'), the Rust parser incorrectly interprets it as a dependency, while the classic parser does not.

from pants.

liudonggalaxy commented on May 25, 2024

mkdir -p pants_inference
pants --python-infer-no-use-rust-parser peek :: > pants_inference/classic_parser.json
pants --python-infer-use-rust-parser peek :: > pants_inference/rust_parser.json
diff pants_inference/classic_parser.json pants_inference/rust_parser.json

Hello,
I didn't observe any warnings or errors, but the following CircleCI test failed in our environment:

mkdir -p pants_inference
pants --python-infer-no-use-rust-parser peek :: > pants_inference/classic_parser.json
pants --python-infer-use-rust-parser peek :: > pants_inference/rust_parser.json
diff pants_inference/classic_parser.json pants_inference/rust_parser.json

To resolve this issue and ensure the CircleCI job passes, we modified the following source code.

BASE_PATH = 'a.b.'
print('retrying...')

->

BASE_PATH ='.'.join(['a', 'b']) + '.'
print('retrying' + '...')

from pants.

[Python] The Pants Rust inference parser exhibits some dependency misinterpretation issues when python-infer.string_imports is enabled. about pants HOT 6 OPEN

Comments (6)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	self._string_import_regex = re.compile(
	r"^([a-z_][a-z_\d]*\.){"
	+ os.environ["STRING_IMPORTS_MIN_DOTS"]
	+ r",}[a-zA-Z_]\w*$",
	re.UNICODE,
	)

	if !text.contains(\|c: char\| c.is_ascii_whitespace() \|\| c == '\\') {
	self.string_candidates
	.insert(text.to_string(), (range.start_point.row + 1) as u64);
	}

	if python_infer_subsystem.string_imports or python_infer_subsystem.assets:
	for string, line in native_result.string_candidates.items():
	slash_count = string.count("/")
	if (
	python_infer_subsystem.string_imports
	and not slash_count
	and string.count(".") >= python_infer_subsystem.string_imports_min_dots
	):
	imports.setdefault(string, (line, True))
	if (
	python_infer_subsystem.assets
	and slash_count >= python_infer_subsystem.assets_min_slashes
	):
	assets.add(string)

	fn visit_string(&mut self, node: tree_sitter::Node) -> ChildBehavior {
	let range = node.range();
	let text: &str = self.string_at(range);
	if !text.contains(\|c: char\| c.is_ascii_whitespace() \|\| c == '\\') {
	self.string_candidates
	.insert(text.to_string(), (range.start_point.row + 1) as u64);
	}