Comments (20)
A bit hacky, but can probably be improved and certainly more robust/generic than just ignoring spaces:
ast.dump(ast.parse("str|None")) == ast.dump(ast.parse("str | None"))
from pydoclint.
Hi @ghjklw,
I understand your point. However, I am a bit concerned that this could become a slippery slope.
For example, if pydoclint allows str, optional
to be equivalent to Optional[str]
, shall it also allow :obj:`str`, optional
(what the Sphinx doc uses)? And what about other variations, such as str or None
, str or `None`
, str or ``None``
?
It is possible that, with lots of bespoke logic, pydoclint can support all these variations, but I'm also a bit concerned that parsing these variations will make pydoclint a lot slower.
from pydoclint.
Another example to show that things could become intractable is a type hint like this:
Optional[Union[int, float, Optional[bool], List[str]]]
To "translate" this to pseudo natural language, it would have to be:
"int or float or (bool, optional) or list of str, optional"
😅
from pydoclint.
Optional[Union[int, float, Optional[bool], List[str]]]
Well, this could be simplified to
int | float | bool | list[str] | None
Which could be written in pseudo natural language like this:
int, float, bool or list of str, optional
While I think pydoclint should support the syntaxes for optional arguments as defined in the official specifications, I don't think it should bother to support things like list of str
. However, one could discuss supporting natural language lists like int, float or bool
in another issue.
from pydoclint.
First of all, I checked the official Google Python style guide, and I didn't see Google recommending/enforcing any guidance for writing type hints in docstrings.
The example shown in the Sphinx documentation is Sphinx's style ("str, optional"). It's a fine style, but not quite practical for a linter.
There is only one way to specify type before Python 3.10, and a new way since Python 3.10. This standardization simplifies people's lives. At least, I personally only need to remember that "docstring's type hint should match the signature's type hint".
On the contrary, allowing natural languages can introduce ambiguities, variations, and people forgetting what the rules are.
Let me give you another example:
int | float | bool | str
Shall we allow int, float, bool, or str
or int, float, bool or str
or both? The former contains the Oxford comma and the latter does not.
from pydoclint.
Hi!
I now realise my initial comment was a bit misleading! I actually didn't understand str, optional
as pseudo natural language nor as an equivalent to Optional[str]
or str | None
. I understood it as a special keyword to indicate that a keyword argument is not required / has a default value.
I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:
If it is not necessary to specify a keyword argument, use optional:
x : int, optionalOptional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:
Description of parameter `x` (the default is -1, which implies summation over all axes).or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:
copy : bool, default True copy : bool, default=True copy : bool, default: True
In other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain , optional
or , default=XXX
. The opposite should also be tested
I couldn't however find a good example to determine whether the type in the docstring should be var1 (str, optional)
or var1 (str | None, optional)
. I personally think the second one makes more sense (and works fine as-is with Pydoclint) and is less ambiguous (it's not obvious with the first option that None
is a valid value for var1
)
from pydoclint.
Optional[Union[int, float, Optional[bool], List[str]]]
Well, this could be simplified to
int | float | bool | list[str] | None
Putting aside the question of the natural language representation of types (which I think, if desirable, should be tracked as an independent issue), this raises another interesting point:
from typing import Optional
def example1(var: str | None) -> None:
"""Test function 1.
Args:
var (str | None): Variable description.
"""
def example2(var: str | None) -> None:
"""Test function 2.
Args:
var (str|None): Variable description.
"""
def example3(var: Optional[str]) -> None:
"""Test function 3.
Args:
var (str | None): Variable description.
"""
Raises the following validation errors:
12: DOC105: Function `example2`: Argument names match, but type hints do not match
20: DOC105: Function `example3`: Argument names match, but type hints do not match
I guess that's because types are compared as strings, but it is also possible to do:
>>> (str | None) == (str|None)
True
>>> (str | None) == Optional[str]
True
>>> Optional[Union[int, float, Optional[bool], List[str]]] == (int | float | bool | List[str] | None)
True
Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?
from pydoclint.
Wouldn't that be a better way of comparing types? Do you want to create another feature request to discuss it?
It definitely be a desirable feature but I imagine it could be quite difficult to implement.
from pydoclint.
Interesting!
I didn't know that we can directly compare type hints (str | None == str|None
). TIL.
@ghjklw you are correct that the comparison of types in pydoclint is currently done via string comparison.
Here's a code snippet:
pydoclint/pydoclint/utils/arg.py
Lines 31 to 35 in 1a3596f
pydoclint only does static syntax analysis (more in what I mentioned in this comment: #35 (comment)). Doing "dynamic" code analysis on the full source code is most likely why darglint is slow.
That said, dynamically parsing type hint strings may not be that slow.
This is something I can do right now:
exec('left = str | None')
exec('right = str|None')
assert left == right
However, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?
from pydoclint.
For example, the argument type is a custom class that needs to be quoted.
What do you mean by that? No type hints have to be quoted and why would that be a problem?
from pydoclint.
No type hints have to be quoted and why would that be a problem?
See this example:
pydoclint/pydoclint/utils/arg.py
Line 37 in 1a3596f
from pydoclint.
from __future__ import annotations
doesn't fix that?
from pydoclint.
exec('left = str | None') exec('right = str|None') assert left == rightHowever, I don't know how to handle cases where we need quotes in type hints. For example, the argument type is a custom class that needs to be quoted. Any suggestions or ideas?
Putting aside the security risk of using exec
(probably acceptable in this case, but I might be wrong), I think there's possibly an even greater challenge than quoted type hints: the types could come from a large number of places: standard libraries, imports, wildcard imports, objects defined in the same script, libraries specific to the environment you're using (e.g. Spark when working in an environment like Databricks or Azure Synapse), dyamically created types or typevars... The more I think about it, the more I'm afraid this again comes back to the ast
vs inspect
approach...
from pydoclint.
Oh wow, I didn't know about from __future__ import annotations
. TIL.
Naturally, there will be people like me who don't know about from __future__ import annotations
and thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.
I agree with @ghjklw 's comment above; there may be too many edge cases that we can't possibly think of.
That said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?
from pydoclint.
Thinking about it I only see the following cases where it might make sense to have a docstring type that doesn't match the type hint.
- Type Aliases (In case of a library exposing an API, they make sense to be used as a type hint but might not make sense to be use in docstrings read by the users of the API)
- Type Variables, ParamSpec, TypeVarTuple (I am very unsure about the best way to document those in the context of an API)
- White spaces
- Optional[] (However you could use
from __futures__ import annotations
instead) - Union[] (same as above)
- Dict[], Set[], List[], ... (same as above)
- Literal (to improve readability)
- Annotated
- TypeGuard (basically an alias for bool)
- Imported types (If I import a type with
from foo import T
, I would want to document it asfoo.T
)
from pydoclint.
thus still use quotes around not-yet-defined types. When designing a linter, we have to consider this possibility.
The type system will use typing.ForwardRef
to represent those. For string to string comparisons you can simply strip any quotes (outside Literal[]).
hat said, I wonder whether we can compare 2 string when ignoring any spaces? If we remove any spaces from both strings and they are the same, can we consider them to represent the same type?
I don't know of a type hint that doesn't separate two types by either ,
or |
. So that should be possible.
from pydoclint.
I already have a method to remove quotes from the type hint:
pydoclint/pydoclint/utils/arg.py
Lines 96 to 98 in 1a3596f
And I think using ast.dump(ast.parse(...))
is a good idea.
With _stripQuotes()
and ast.dump(ast.parse(...))
, we can at least tackle the following cases:
str | int | None
vsstr|int|None
'MyClass' | None
vs'MyClass'|None
Union[int, bool]
vsUnion[int,bool]
- ...
But ast.dump(ast.parse('str | None')) == ast.dump(ast.parse('Optional[str]'))
is still False
, simply because ast
doesn't actually know what |
does and what Optional
means -- it's a static syntax parser and that's why it's fast.
from pydoclint.
Hi @ghjklw , I forgot to address your previous point above
I thought this was a part of the Google format, but you're right, it isn't. However, it is part of Numpy's style guide:
If it is not necessary to specify a keyword argument, use optional:
x : int, optionalOptional keyword parameters have default values, which are displayed as part of the function signature. They can also be detailed in the description:
Description of parameter `x` (the default is -1, which implies summation over all axes).or as part of the type, instead of optional. If the default value would not be used as a value, optional is preferred. These are all equivalent:
copy : bool, default True copy : bool, default=True copy : bool, default: TrueIn other words, the logic here could be quite simple: if the function signature defines a default value, then the type should either contain
, optional
or, default=XXX
. The opposite should also be tested>
I used to write , default=True
after bool
, but darglint reported it as a style violation ("type hints don't match"), so I trained myself to put "default = True` in the argument description.
And now in pydoclint, I'm actually "inheriting" that behavior from darglint.
I think there are 2 new features that I can potentially add:
- For numpy-style docstrings, parse the string
"bool, default: True"
to ignore, default: True
, so that people don't have to move it away from the type hint - Parse the default value from the docstring and compare that with the default value in the signature
No. 2 is a slippery slope, because rule-based natural language parsers are never easy to implement. Too many edge cases. I can already think of some:
arg1 : int, default is 1
arg1 : int, default value: 1
arg1 : int (default = 1)
arg1 : bool = True
from pydoclint.
Update: I haven't got a chance to get to it yet. But this issue is still on my mind.
from pydoclint.
Hi @ghjklw , I found that this PR (#58) actually fixed the issue of comparing str|int|bool
with str | int | bool
(only spaces are different).
However, I still think that to consider str | int
and Union[str, int]
equal, the implementation may be too cumbersome and fragile.
Therefore, I'll label str | int
vs Union[str, int]
as "may tackle in the future", and close this issue for now.
But if you have any additional comments or suggestions, please feel free to add comments below, or re-open this issue.
Thanks!
from pydoclint.
Related Issues (20)
- Request: not throwing DOC501 for `abstractmethod` HOT 2
- Docs request: `pre-commit` config for `flake8` HOT 1
- Parameters not processed correctly for numpy style with incorrect underscoring HOT 1
- Add shortcuts for long typing.Annotated type hints HOT 3
- BUG: Return literal with double quotes incorrectly raises v203 HOT 3
- Request: Add configuration option that allows for type hint in signature _or_ docstring, but not both HOT 2
- **kwargs only arguments throw DOC106 and DOC109 HOT 2
- Failure to detect the `import typing as tt` convention HOT 2
- Failure to detect `noqa` statement HOT 5
- Google style: Allow omitting only the `Returns`/`Yields` if summary starts with `Return(s)/Yield(s)` HOT 16
- Google style: Allow documenting `@property`s as a member HOT 5
- Allow more control over`--skip-checking-short-docstrings` HOT 4
- More config control for type hinting in signature HOT 2
- Option to only check for non-existent params in pydocs, rather than params missing in pydocs HOT 2
- Deprecated command line argument `--src` leads to errors or missing proper documentation HOT 2
- Google style: Allow omitting `*args` and `**kwargs` from doc string HOT 3
- Add class attributes checking support HOT 1
- False positive DOC201 HOT 1
- False positive DOC405 HOT 2
- False positive 403 with "yield from" statement
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pydoclint.