Giter Site home page Giter Site logo

Comments (20)

ghjklw avatar ghjklw commented on May 28, 2024 1

A bit hacky, but can probably be improved and certainly more robust/generic than just ignoring spaces:

ast.dump(ast.parse("str|None")) == ast.dump(ast.parse("str | None"))

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Hi @ghjklw,

I understand your point. However, I am a bit concerned that this could become a slippery slope.

For example, if pydoclint allows str, optional to be equivalent to Optional[str], shall it also allow :obj:`str`, optional (what the Sphinx doc uses)? And what about other variations, such as str or None, str or `None` , str or ``None`` ?

It is possible that, with lots of bespoke logic, pydoclint can support all these variations, but I'm also a bit concerned that parsing these variations will make pydoclint a lot slower.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Another example to show that things could become intractable is a type hint like this:

Optional[Union[int, float, Optional[bool], List[str]]]

To "translate" this to pseudo natural language, it would have to be:

"int or float or (bool, optional) or list of str, optional"

😅

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024
Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Which could be written in pseudo natural language like this:

int, float, bool or list of str, optional

While I think pydoclint should support the syntaxes for optional arguments as defined in the official specifications, I don't think it should bother to support things like list of str. However, one could discuss supporting natural language lists like int, float or bool in another issue.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

First of all, I checked the official Google Python style guide, and I didn't see Google recommending/enforcing any guidance for writing type hints in docstrings.

The example shown in the Sphinx documentation is Sphinx's style ("str, optional"). It's a fine style, but not quite practical for a linter.

There is only one way to specify type before Python 3.10, and a new way since Python 3.10. This standardization simplifies people's lives. At least, I personally only need to remember that "docstring's type hint should match the signature's type hint".

On the contrary, allowing natural languages can introduce ambiguities, variations, and people forgetting what the rules are.

Let me give you another example:

int | float | bool | str

Shall we allow int, float, bool, or str or int, float, bool or str or both? The former contains the Oxford comma and the latter does not.

from pydoclint.

ghjklw avatar ghjklw commented on May 28, 2024

Hi!

I now realise my initial comment was a bit misleading! I actually didn't understand str, optional as pseudo natural language nor as an equivalent to Optional[str] or str | None. I understood it as a special keyword to indicate that a keyword argument is not required / has a default value.

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:

copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested

I couldn't however find a good example to determine whether the type in the docstring should be var1 (str, optional) or var1 (str | None, optional). I personally think the second one makes more sense (and works fine as-is with Pydoclint) and is less ambiguous (it's not obvious with the first option that None is a valid value for var1)

from pydoclint.

ghjklw avatar ghjklw commented on May 28, 2024
Optional[Union[int, float, Optional[bool], List[str]]]

Well, this could be simplified to

int | float | bool | list[str] | None

Putting aside the question of the natural language representation of types (which I think, if desirable, should be tracked as an independent issue), this raises another interesting point:

from typing import Optional


def example1(var: str | None) -> None:
    """Test function 1.

    Args:
        var (str | None): Variable description.
    """


def example2(var: str | None) -> None:
    """Test function 2.

    Args:
        var (str|None): Variable description.
    """


def example3(var: Optional[str]) -> None:
    """Test function 3.

    Args:
        var (str | None): Variable description.
    """

Raises the following validation errors:

    12: DOC105: Function `example2`: Argument names match, but type hints do not match 
    20: DOC105: Function `example3`: Argument names match, but type hints do not match 

I guess that's because types are compared as strings, but it is also possible to do:

>>> (str | None) == (str|None)
True
>>> (str | None) == Optional[str]
True
>>> Optional[Union[int, float, Optional[bool], List[str]]] == (int | float | bool | List[str] | None)
True

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024

Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?

It definitely be a desirable feature but I imagine it could be quite difficult to implement.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Interesting!

I didn't know that we can directly compare type hints (str | None == str|None). TIL.

@ghjklw you are correct that the comparison of types in pydoclint is currently done via string comparison.

Here's a code snippet:

def __eq__(self, o: 'Arg') -> bool:
if not isinstance(o, Arg):
return False
return self.name == o.name and self._eq(self.typeHint, o.typeHint)

pydoclint only does static syntax analysis (more in what I mentioned in this comment: #35 (comment)). Doing "dynamic" code analysis on the full source code is most likely why darglint is slow.

That said, dynamically parsing type hint strings may not be that slow.

This is something I can do right now:

exec('left = str | None')
exec('right = str|None')
assert left == right

However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024

For example, the argument type is a custom class that needs to be quoted.

What do you mean by that? No type hints have to be quoted and why would that be a problem?

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

No type hints have to be quoted and why would that be a problem?

See this example:

def __lt__(self, other: 'Arg') -> bool:

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024

from __future__ import annotations doesn't fix that?

from pydoclint.

ghjklw avatar ghjklw commented on May 28, 2024
exec('left = str | None')
exec('right = str|None')
assert left == right

However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?

Putting aside the security risk of using exec (probably acceptable in this case, but I might be wrong), I think there's possibly an even greater challenge than quoted type hints: the types could come from a large number of places: standard libraries, imports, wildcard imports, objects defined in the same script, libraries specific to the environment you're using (e.g. Spark when working in an environment like Databricks or Azure Synapse), dyamically created types or typevars... The more I think about it, the more I'm afraid this again comes back to the ast vs inspect approach...

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Oh wow, I didn't know about from __future__ import annotations. TIL.

Naturally, there will be people like me who don't know about from __future__ import annotations and thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

I agree with @ghjklw 's comment above; there may be too many edge cases that we can't possibly think of.

That said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024

Thinking about it I only see the following cases where it might make sense to have a docstring type that doesn't match the type hint.

  • Type Aliases (In case of a library exposing an API, they make sense to be used as a type hint but might not make sense to be use in docstrings read by the users of the API)
  • Type Variables, ParamSpec, TypeVarTuple (I am very unsure about the best way to document those in the context of an API)
  • White spaces
  • Optional[] (However you could use from __futures__ import annotations instead)
  • Union[] (same as above)
  • Dict[], Set[], List[], ... (same as above)
  • Literal (to improve readability)
  • Annotated
  • TypeGuard (basically an alias for bool)
  • Imported types (If I import a type with from foo import T, I would want to document it as foo.T)

from pydoclint.

real-yfprojects avatar real-yfprojects commented on May 28, 2024

thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.

The type system will use typing.ForwardRef to represent those. For string to string comparisons you can simply strip any quotes (outside Literal[]).

hat said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?

I don't know of a type hint that doesn't separate two types by either , or |. So that should be possible.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

I already have a method to remove quotes from the type hint:

@classmethod
def _stripQuotes(cls, string: str) -> str:
return string.replace('"', '').replace("'", '')

And I think using ast.dump(ast.parse(...)) is a good idea.

With _stripQuotes() and ast.dump(ast.parse(...)), we can at least tackle the following cases:

  • str | int | None vs str|int|None
  • 'MyClass' | None vs 'MyClass'|None
  • Union[int, bool] vs Union[int,bool]
  • ...

But ast.dump(ast.parse('str | None')) == ast.dump(ast.parse('Optional[str]')) is still False, simply because ast doesn't actually know what | does and what Optional means -- it's a static syntax parser and that's why it's fast.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Hi @ghjklw , I forgot to address your previous point above

I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:

If it is not necessary to specify a keyword argument, use optional:

x : int, optional

Optional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:

Description of parameter `x` (the default is -1, which implies summation
over all axes).

or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:

copy : bool, default True
copy : bool, default=True
copy : bool, default: True

In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional or , default=XXX. The opposite should also be tested>

I used to write , default=True after bool, but darglint reported it as a style violation ("type hints don't match"), so I trained myself to put "default = True` in the argument description.

And now in pydoclint, I'm actually "inheriting" that behavior from darglint.

I think there are 2 new features that I can potentially add:

  1. For numpy-style docstrings, parse the string "bool, default: True" to ignore , default: True, so that people don't have to move it away from the type hint
  2. Parse the default value from the docstring and compare that with the default value in the signature

No. 2 is a slippery slope, because rule-based natural language parsers are never easy to implement. Too many edge cases. I can already think of some:

  • arg1 : int, default is 1
  • arg1 : int, default value: 1
  • arg1 : int (default = 1)
  • arg1 : bool = True

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Update: I haven't got a chance to get to it yet. But this issue is still on my mind.

from pydoclint.

jsh9 avatar jsh9 commented on May 28, 2024

Hi @ghjklw , I found that this PR (#58) actually fixed the issue of comparing str|int|bool with str | int | bool (only spaces are different).

However, I still think that to consider str | int and Union[str, int] equal, the implementation may be too cumbersome and fragile.

Therefore, I'll label str | int vs Union[str, int] as "may tackle in the future", and close this issue for now.

But if you have any additional comments or suggestions, please feel free to add comments below, or re-open this issue.

Thanks!

from pydoclint.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.