IMPORTANT: The code was developed and tested only in Visual Studio Code with Pylance/Pyright.
Mutability is something I think about all the time when I program and I always feel frustrated by not being able to express what I want through the type system.
Even though Python's community believes mutability should be limited, Python doesn't have good support for pure programming. In particular, it's neither expression-oriented nor does it have native support for persistent collections.
I don't think mutability should be avoided at all costs, but I'd like to have some way to control it through static types.
For instance, let's say I have a list and want to initialize an object with it:
class A[T]:
xs: list[T]
def __init__(self, xs: list[T]) -> None:
self.xs = xs
...
That code is problematic as both A
and the caller might modify the list afterward.
The recommended alternative is probably the following:
class A[T]:
xs: list[T]
def __init__(self, xs: Iterable[T]) -> None:
self.xs = list(xs)
...
That doesn't solve the problem, though:
- copying the list or keeping the copy might be too expensive
- the caller and
A
might actually want to share the same list
One could also use standardized ABCs such as Sequence
, but they widen the types too much as they are abstract not only technically but also conceptually. To "cover" a concrete type, one would need to employ several traits, which tends to complicate the code, in my opinion.
I implemented a simple mechanism that works at type-checking time and has a negligible runtime cost, just like a cast
call. Here's what the previous code looks like using my approach:
class A[T]:
xs: list_[T, RK]
def __init__(self, xs: list_[T, RK]) -> None:
self.xs = xs
...
list_
is just a regular list
at run-time. The parameter RK
means "read - keep", that is, A
can keep xs
in read-only mode.
The "keep" part is not enforced, but the read-only part is.
If A
also desires write access, it needs to use WK
("write - keep") instead.
Here's another option:
class A[T]:
xs: list[T, WK]
def __init__(self, xs: list_r[T]) -> None:
self.xs = xs.copy()
...
list_r[T]
is a type alias for list_[T, R]
, which grants temporary read-only access. Temporary access only lasts until the function (__init__
, in this case) returns. This means that NO {reference to the list} obtained through that function can outlive the function execution.
There are 4 access modes:
- R: temporary read-only access:
def f(xs: list_[T, R]) -> T: xs.sort() # type error del xs[0] # type error return xs[0]
- W: temporary read-write access:
def f(xs: list_[T, W]) -> None: xs.sort() del xs[0]
- RK: indefinite read-only access
class A[T]: xs: list_[T, RK] def __init__(self, xs: list_[T, RK]) -> None: self.xs = xs def __getitem__(self, idx: int) -> T: return self.xs[idx] def __delitem__(self, idx: int): del self.xs[idx] # type error
- WK: indefinite read-write access
class A[T]: xs: list_[T, WK] def __init__(self, xs: list_[T, WK]) -> None: self.xs = xs def __getitem__(self, idx: int) -> T: return self.xs[idx] def __delitem__(self, idx: int): del self.xs[idx]
The modes have the following relationships:
classDiagram
R <|-- W
R <|-- RK
W <|-- WK
RK <|-- WK
Since list_[T, M]
is covariant in M
, a function that takes a list_[T, W]
will also accept a list_[T, WK]
, for example.
There are 6 + ε useful functions:
-
lift(., M = WK) lifts an
X
into anX_[..., M]
(e.g. alist
into alist_[..., WK]
). IfM
is not given, it defaults toWK
. -
r(.) restricts an
X_[..., M]
withM
in{R, W, RK, WK}
to anX_[..., R]
. -
w(.) restricts an
X_[..., M]
withM
in{W, WK}
to anX_[..., W]
. -
rk(.) restricts an
X_[..., M]
withM
in{RK, WK}
to anX_[..., RK]
. -
wk(.) restricts an
X_[..., WK]
to anX_[..., WK]
. -
restrict(., M) would be equivalent to
r
,w
,rk
, andwk
, if not for the fact that it doesn't give explicit permission. This will be explained later. -
ε: By ε, I mean that these functions are just marginal ones. They are:
lift_and_w(.)
=w(lift(.))
lift_and_rk(.)
=rk(lift(.))
lift_and_wk(.)
=wk(lift(.))
They're somewhat useful because they let us save 1 call. This is a further attempt to keep the runtime to a minimum just like with
cast
.There's no
lift_and_r
because that's just equivalent tolift(., R)
. Again, the reason for this will become clear later.
When a restriction is invalid, a type error is raised. For instance, consider this:
def f(primes: set_[int, R]):
_ = r(primes)
_ = w(primes) # type error
_ = rk(primes) # type error
_ = wk(primes) # type error
def g(primes: set_[int, W]):
_ = r(primes)
_ = w(primes)
_ = rk(primes) # type error
_ = wk(primes) # type error
This is a more interesting example:
class A[T]:
xs: list_[T, RK]
def __init__(self, xs: list_[T, RK]) -> None:
self.xs = xs
...
def f(xs: list_[T, R]) -> A:
return A(xs) # type error
How should we apply restrictions?
Let's consider this:
def f(xs: list_[int, WK]):
...
g(xs)
...
Since xs
"is a WK
", we can pass it around without any problem.
What happens if we believe that g
asks for an R
while it actually asks for a W
? xs
might be modified without our knowledge.
Here's one way to prevent that:
def f(xs: list_[int, WK]):
...
g(r(xs))
...
The problem with this is that it's
- tedious,
- slightly inefficient since
r(.)
is still an additional call (just likecast
), and - easy to forget and thus unsafe
A possible approach would be to get into the habit of applying restrictions at the beginning of the function:
def f(xs: list_[int, WK]):
xs_r = r(xs)
...
g(xs_r)
...
This is a little safer, but shouldn't it be this way by default?
Ideally,
def f(xs: list_[int, WK]):
...
g(xs)
...
should indicate that g
takes an R
. This should be enforced by the type checker.
For all other access modes, explicit permission should be required:
def f(xs: list_[int, WK]):
...
g1(xs) # read-only access
g2(rk(xs)) # at most RK access
g3(w(xs)) # at most W access
...
This is almost impossible to do in Python, as far as I can tell.
I thought of a few ways:
-
overload
:@overload def f(xs: list_[int, W, W]) -> None: ... # type: ignore def f(xs: list_[int, R, W]) -> None: xs.sort() # type error w(xs).sort()
X_[..., M, M_MAX]
means that the mode is currentlyM
but can be raised up toM_MAX
.I like to say that
f
above has two signatures: one external and one internal.While we could just use a flag instead of
M_MAX
, what's important is that a single signature is not enough:def g(xs: list_[int, R, W]) -> None: xs_w = w(xs) def f(xs: list_[int, R, W]) -> None: g(xs)
The problem is that
g
would accept and be able to do whateverf
does.(Note that the type ignore is required, at least in Pyright, because there should be at least 2 overloads. That's a minor thing that can be solved by making Pyright's error codes more granular so that one can use a global ignore for this case.)
The main problem with this approach is that writing and keeping two signatures in sync is quite bothersome.
-
decorator:
@mut def f(xs: list_[int, W]) -> None: xs.sort() # type error w(xs).sort()
The decorator would basically automate the
overload
technique.Unfortunately, this is not currently possible in Python.
-
active lock:
from typing import Literal as L def g1(xs: list_[int, W, L[946655]]) -> None: ... def g2(xs: list_[int, W, L['g2']]) -> None: ... def f(xs: list_[int, W, L[825417]]) -> None: g1(xs) # type error g1(w(xs, 946655)) g2(w(cs, 'g2'))
Whether one decides to use integers or strings, the code is noisy and not very safe because of possible "lock collisions".
-
passive lock:
class _L1: ... # lock def g(xs: list_[int, W, _L1]) -> None: ... class _L2: ... # lock def f(xs: list_[int, W, _L2]) -> None: g(xs) # type error g(w(xs))
This is bearable, I think.
The idea is that
r
,w
,rk
, andwk
replace the lock withAny
so the correct lock will be automatically "acquired". Simple but effective.Note that a "lock collision" between two functions just means that there's no protection (no explicit permission needed) between those two functions.
Also note that locks can be made private to classes:
class A: class _L: ... # class lock def __init__(self, xs: list_[int, W, _L]): ... def f(self, s: set_[int, W, _L])
One can also use different locks if one wants intra-class protection:
class A: class _L1: ... def __init__(self, xs: list_[int, W, _L1]): ... class _L2: ... def f(self, s: set_[int, W, _L2])
This is almost as good as the decorator.
Note that locks have a minor flaw: they don't protect a function from itself.
It's important to note that R
doesn't need any locks:
def f(xs: list_[int, R, Any]): ...
Indeed, one should use the predefined list_r
:
def f(xs: list_r[int, R]): ...
If you recall, I said that restrict
is like r
, w
, rk
, and wk
, but without the explicit permission. Indeed, restrict
is useful to convert, say, a WK
into a W
, without lifting the requirement of giving explicit W
permission to modify it.
Under the hood, restrict
sets the lock to None
, while r
, w
, rk
, and wk
to Any
.
I've been using list_
and set_
nonchalantly, but where do they come from?
How easy is it to add support for types, may they be user-created or built-in/preexisting?
There are two simple steps.
If we want to support a type we control, we may want to create the "lifted version" right away:
from mutability import *
type A_out[M: (R, W, RK, WK)] = A_[M, None]
type A_r = A_[R, Any]
class A_(Generic[Mut_M, Mut_L]):
class _L: ... # lock
_val: int
def __init__(self, val: int) -> None:
self._val = val
@property
def val(self) -> int:
return self._val
@val.setter
def val(self: A_[W, _L], val: int) -> None:
self._val = val
def get_val(self) -> int:
return self._val
def set_val(self: A_[W, _L], val: int) -> None:
self._val = val
def copy(self) -> A_out[WK]:
return A_(self._val)
def return_self(self) -> A_[Mut_M, Mut_L]:
return self
def __iadd__(self: A_[W, _L], a2: A_r) -> A_[Mut_M, Mut_L]:
self._val += a2._val
return cast(A_[Mut_M, Mut_L], self)
Note that I'm using Generic
instead of the new notation. There are two reasons for that:
- it gives control over variance
- it's safer: the less the user has to write the fewer the mistakes.
Here are the definitions of Mut_M
and Mut_L
:
Mut_M = TypeVar('Mut_M', 'R', 'W', 'RK', 'WK', covariant=True)
Mut_L = TypeVar('Mut_L')
Note that A_
has a single internal lock. The lock being single, the methods of A_
are not protected one from the other.
It doesn't take much to add individual locks, if one so desires:
from mutability import *
type A_out[M: (R, W, RK, WK)] = A_[M, None]
type A_r = A_[R, Any]
class A_(Generic[Mut_M, Mut_L]):
_val: int
def __init__(self, val: int) -> None:
self._val = val
@property
def val(self) -> int:
return self._val
class _L1: ... # lock
@val.setter
def val(self: A_[W, _L1], val: int) -> None:
self._val = val
def get_val(self) -> int:
return self._val
class _L2: ... # lock
def set_val(self: A_[W, _L2], val: int) -> None:
self._val = val
def copy(self) -> A_out[WK]:
return A_(self._val)
def return_self(self) -> A_[Mut_M, Mut_L]:
return self
class _L3: ... # lock
def __iadd__(self: A_[W, _L3], a2: A_r) -> A_[Mut_M, Mut_L]:
self._val += a2._val
return cast(A_[Mut_M, Mut_L], self)
Note that, in this case, a single lock is enough since the methods don't call one another. Of course, the situation might change or be misjudged, so adding individual locks a priori is safer.
In general, one should use the following pattern:
class _L:
def f(..., a1: X_[..., _L], ..., a2: X_[..., _L]) -> X_[..., None]:
...
Notice the None
in the return type. If the return type had the lock, it'd be possible to pass it to f
again without explicit permission.
The general rule is that things should be locked when they enter and unlocked when they leave.
I admit that "lock" might not be the most appropriate name, but it came naturally to me.
A few observations are in order:
- Since
A_out
andA_r
don't depend on a lock, they can be exported and reused by the users ofA_
.- Note that
A_r
hasAny
in place of the lock because read-only access doesn't require explicit permission.
- Note that
- One might also want to introduce
A_in
as an alias forA_
, for symmetry reasons:def f(..., a1: A_in[..., _L], ..., a2: A_in[..., _L]) -> A_out[...]: ...
- By annotating
self
, we can divide the methods into read-only and read-write (read-keep and write-keep are also possible):- read-only methods promise not to modify the object
- read-write methods can modify the object
- New instances returned by methods such as
copy
should have modeWK
as the caller should be allowed to do whatever they want with their copy. - There's no need to annotate
self
for read-only methods.
If one doesn't want to have to give explicit permission for a method, one can just use Any
in place of the lock:
def __iadd__(self: A_[W, Any], a2: A_r) -> A_[Mut_M, Mut_L]:
...
Access is still restricted because of the W
but there's no need for an explicit w(.)
thanks to that Any
.
If the type we want to add support for is built-in, preexisting, out of our control, or we don't want to touch it, then we may want to create a lifted version that will coexist with the un-lifted (i.e. regular) one.
In this case, the lifted version will just be an interface:
_T = TypeVar('_T')
# NOTE: Don't depend on a lock => can be used anywhere
type set_out[T, M: (R, W, RK, WK)] = set_[T, M, None]
type set_r[T] = set_[T, R, Any]
# NOTE: Self or depend on `_L` => can be used only here
class _L: ... # lock
type _set_in[T, M: (R, W, RK, WK)] = set_[T, M, _L]
type _Self_W[T] = set_[T, W, _L]
class set_(Generic[_T, Mut_M, Mut_L]):
@overload
def __init__(self) -> None: ...
@overload
def __init__(self, iterable: Iterable[_T], /) -> None: ...
def __init__(self, *args) -> None: ...
def add(self: _Self_W[_T], element: _T, /) -> None: ...
def copy(self) -> set_out[_T, WK]: ...
def difference(self, *s: Iterable[Any]) -> set_out[_T, WK]: ...
def difference_update(self: _Self_W[_T], *s: Iterable[Any]) -> None: ...
def discard(self: _Self_W[_T], element: _T, /) -> None: ...
def intersection(self, *s: Iterable[Any]) -> set_out[_T, WK]: ...
def intersection_update(self: _Self_W[_T], *s: Iterable[Any]) -> None: ...
def isdisjoint(self, s: Iterable[Any], /) -> bool: ...
def issubset(self, s: Iterable[Any], /) -> bool: ...
def issuperset(self, s: Iterable[Any], /) -> bool: ...
def remove(self: _Self_W[_T], element: _T, /) -> None: ...
def symmetric_difference(self, s: Iterable[_T], /) -> set_out[_T, WK]: ...
def symmetric_difference_update(self: _Self_W[_T], s: Iterable[_T], /
) -> None: ...
...
As I mentioned before, one may create lifted interfaces even for types one controls, although that's not necessary.
For the special functions (i.e. lift
, r
, etc...) to work, one needs to register a type. To do that, one just needs to add a few lines to "mutability_reg.py". Here's an example with 5 types:
_T1 = TypeVar('_T1')
_T2 = TypeVar('_T2')
class Liftable[T]: ...
@overload
def do_conv(obj: Liftable[dict[_T1, _T2]] | dict_[_T1, _T2, Mut_M, Mut_L],
m2: Mut_M2, d2: Mut_L2
) -> tuple[Mut_M, dict_[_T1, _T2, Mut_M2, Mut_L2]]: ...
@overload
def do_conv(obj: Liftable[set[_T1]] | set_[_T1, Mut_M, Mut_L],
m2: Mut_M2, d2: Mut_L2
) -> tuple[Mut_M, set_[_T1, Mut_M2, Mut_L2]]: ...
@overload
def do_conv(obj: Liftable[list[_T1]] | list_[_T1, Mut_M, Mut_L],
m2: Mut_M2, d2: Mut_L2
) -> tuple[Mut_M, list_[_T1, Mut_M2, Mut_L2]]: ...
@overload
def do_conv(obj: A2_[Mut_M, Mut_L], m2: Mut_M2, d2: Mut_L2
) -> tuple[Mut_M, A2_[Mut_M2, Mut_L2]]: ...
@overload
def do_conv(obj: A_[Mut_M, Mut_L], m2: Mut_M2, d2: Mut_L2
) -> tuple[Mut_M, A_[Mut_M2, Mut_L2]]: ...
def do_conv(*args, **kwargs) -> Any: ...
Notice that the overload for A_
, being a lifted-only type, has no reference to A
as A
doesn't exist, while the overload for dict
, being a built-in type, has an additional Liftable[dict[_T1, _T2]]
.
The qualifier Liftable
is important to me as I don't want lift
to be idempotent, i.e. it shouldn't work on already-lifted types. The reason is that lift
is all-powerful, so one might, for example, re-lift an R
to get a W
by accident.
My first idea was to use a protocol instead of a function like do_conv
. The users had to do the following:
class _A_Shapes[M, D](MutShapes[M]):
_mut_shape_r: A_[Mut.R, None]
def _mut_shape_w[D2](self, _: D2) -> A_[Mut.W, D2]: ...
def _mut_shape_rk[D2](self, _: D2) -> A_[Mut.RK, D2]: ...
def _mut_shape_wk[D2](self, _: D2) -> A_[Mut.WK, D2]: ...
class A_(Generic[Mut_M, Mut_D], _A_Shapes[Mut_M, Mut_D]):
_val: int
...
(The simplification from 3-4 functions to 1 happened later.)
I wasn't satisfied with the protocol approach because I still had to handle lift
separately since lift
operates over types one shouldn't touch.
It was also more invasive than the actual approach.
My goal has always been to have as little runtime as possible, which is also why I chose a functional approach (r(xs)
, w(xs)
, etc...) instead of an OOP one (xs.r
, xs.w
, etc...): at runtime, r
, w
, etc... just return their argument and list_
is just list
.
One issue with do_conv
is that it might become too long with time and maybe slow down type-checking (I'm not sure), but there are ways to split it, should the need arise.
Because of the registration, the current system has a circular dependence:
graph
mut[mutability.py] --> tvars[mutability_tvars.py]
mut --> |"(at type-\nchecking time)"|reg[mutability_reg.py]
reg --> tvars
reg --> X[X_.py]
X --> mut
A --> |A imports B|B
This isn't a problem at runtime since "mutability.py" imports "mutability_reg.py" behind an if TYPE_CHECKING
.