tc39 / proposal-weakrefs
WeakRefs
Home Page: https://tc39.github.io/proposal-weakrefs/
Lots of information is missing from, or explicitly marked as outdated in, the README. It would be great to have an up-to-date explainer in the README itself. Might I suggest the following sections (based on the Stage 1 entrance criteria) with at least one paragraph each?
From Tc39 presentation:
When a weak reference is retrieved, the WeakRef points strongly at its target until the end of the turn. Can that create a side channel?
See tc39/ecma262#1333 (comment)
The WeakRef spec itself needs to be updated to clearly specify root realm rather than realm (which would include compartments).
I missed that I never updated the reference section, so it's basically missing. It needs to be added and expanded.
How are the semantics (and collection algorithm) for weakrefs different from Ephemerons as described in the Hayes paper?
Is the only difference that you don't treat the executor and holdings as "value fields" (using Barry's terminology)? If that is the case, you have presumably eliminated the need for the three-phase Ephemeron algorithm, but at the cost of introducing the possibility of leaks via circular references originating from the holdings or executor.
As best as I can tell from the current spec, there is no "System" object in ECMAScript. So this section needs to be worked over in some way.
More generally I'm concerned that this API may be behind some privilege wall that prevents its routine use without user prompts in libraries and so on. Can this be clarified?
Wasm is starting to need weak references. This issue is for discussion of how this proposal can work there.
Assume the WeakFactory's cleanup function does not retrieve all dirty WeakCells from the iterator. When will the cleanup function be scheduled to be called again?
Like this:
key = fg.register(target, holdings);
fg.unregister(key);
This would allow different implementations, e.g., key could be an object (like WeakCell), or it could be an integer which is used for accessing an array, or a hash map key.
In the current new API:
fg.register(target, key, holdings);
fg.unregister(key); // unregisters all!
we force the internal implementation to be a multihashmap / hashmap where the value is a list. (Actually, we'd need two multihashmaps, one for the active "WeakCells" and one for "WeakCells" already scheduled for cleanup...)
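To make the implementation trade-off concrete, here is a minimal sketch of the bookkeeping each API shape implies. Both classes are hypothetical stand-ins (they are not the proposal's FinalizationGroup), they hold targets strongly, and they only model registration, not garbage collection.

```javascript
// Hypothetical models of the two API shapes; neither interacts with the GC.

// Shape 1: register() hands back an opaque key chosen by the implementation.
class ReturnedKeyGroup {
  constructor() {
    this.nextKey = 0;
    this.cells = new Map(); // key -> { target, holdings }
  }
  register(target, holdings) {
    const key = this.nextKey++; // could equally be an object or an array index
    this.cells.set(key, { target, holdings });
    return key;
  }
  unregister(key) {
    return this.cells.delete(key); // removes exactly one registration
  }
}

// Shape 2: the caller supplies the key, so one key may map to many cells.
class CallerKeyGroup {
  constructor() {
    this.cells = new Map(); // key -> array of { target, holdings }
  }
  register(target, key, holdings) {
    if (!this.cells.has(key)) this.cells.set(key, []);
    this.cells.get(key).push({ target, holdings });
  }
  unregister(key) {
    const removed = this.cells.get(key) || [];
    this.cells.delete(key);
    return removed.length; // unregisters every cell sharing the key
  }
}
```

In the first shape each key identifies one registration, so a plain map or array suffices; in the second, the value side has to be a list because nothing stops callers from reusing a key.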
From TC39.
In workers with long-running turns, how does this work? (given that after access, weak pointers are strong till the end of the turn and finalization is only between turns).
assert(buff.isReleased);
Did you mean buf?
The spec says that the iterator does not produce, so what does it do? Options
(from @bmeck)
In the section "extended example", in the call to makeWeakRef, the executor argument should probably not be this.executor, but this.dropRef?
Typo in the paragraph below the code: "need to do perform", the "do" is wild.
Currently the next method gets set on the iterator object itself inside WeakFactoryCleanupJob:
5. Let next be a new built-in function defined in WeakFactoryCleanupIterator next.
6. Perform CreateMethodProperty(iterator, next, next).
It seems like the next method should be on the prototype, since all objects want to share the same method. In fact, this is what map and set iterators do: https://tc39.github.io/ecma262/#sec-%mapiteratorprototype%.next
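The Map iterator behaviour mentioned above can be checked directly in any ES2015+ engine. The snippet is only an illustration of the shared-prototype pattern; it does not use any WeakFactory API.

```javascript
const first = new Map([[1, "a"]]).entries();
const second = new Map([[2, "b"]]).entries();

// Neither iterator has an own `next`; both inherit it from %MapIteratorPrototype%.
const ownNext = Object.prototype.hasOwnProperty.call(first, "next");
const sharedProto = Object.getPrototypeOf(first) === Object.getPrototypeOf(second);
const sharedNext = first.next === second.next;

console.log(ownNext, sharedProto, sharedNext); // false true true
```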
The current text as it stands doesn't make it clear in which realm the DoAgentFinalization is supposed to run, which then leaves it open which realm is used by the EnqueueJob, which in turn means that it's unclear in which realm the WeakFactoryCleanupJob has to run.
Since WeakFactoryCleanupJob creates JavaScript objects and even calls into arbitrary user code - aka [Cleanup] - which might not even be callable, it has to have some current realm.
One obvious solution here would be to remember either the realm or the current execution context on the WeakFactory and enter that realm in DoAgentFinalization for that particular factory.
For long-running turns, it would be nice to be able to allow GC to happen by scheduling finalization on a separate thread/agent. Since a number of use cases can use primitives for the holdings reference used for cleanup, it seems safe to pass this between agents. Roughly, this would require a new API to communicate that finalization should be done on a different agent.
5. If factory is not undefined.
i. Remove O from factory.[[ActiveCells]].
ii. Set O.[[Target]] to undefined.
iii. Set O.[[Factory]] to undefined.
iv. Set O.[[Holdings]] to undefined.
First, it seems that there should be some sort of assertion/check that factory is a WeakFactory object. Yes, it seems it will always be such an object if it isn't undefined, but I think things could be a bit more clear in the spec text here.
Second, there should probably be some sort of check for if factory.[[ActiveCells]] is a list and is not undefined, since WeakFactory.prototype.shutdown can set it to undefined. Again, seems pretty obvious what is the intended behavior, but the spec could be made a bit more clear.
That is, does the spec have the guarantee (in its own terminology) that the GC condemns all unreachable objects in finite time?
I'm working with a number of people interested in writing documentation for JavaScript feature proposals. Writing documentation could be useful for the development of the WeakRefs proposal, even before it's done, because:
Would the champions of this proposal be interested in getting in touch with people who would be interested in helping with TC39 proposal documentation generally?
... when do we detect it?
I have a definite use case for weak references in one of my projects, jsdom. It runs primarily in Node, but also in the browser. Of note, this case isn't related to resource management.
jsdom implements, among other things, the DOM Standard. In particular, there is one line that concerns us here: when removing a node:
- For each NodeIterator object iterator whose root’s node document is node’s node document, run the NodeIterator pre-removing steps given node and iterator.
To correctly implement this step, you need a reference to every NodeIterator object that's ever been created, so you can update it correctly (by running the pre-removing steps).
This really requires weak references, as otherwise you end up holding a strong reference to every NodeIterator ever created, never releasing their memory, or the memory of the other objects that the NodeIterator references (e.g. the two DOM nodes it points to, root and reference node).
Of note:
Finally, people may be amused by how we currently work around this. We have a user-configurable number, "max working NodeIterators", which we default to 10. We hold strong references to the last 10 created NodeIterators. If an 11th NodeIterator is created, we remove it from the list, then set an internal bit on it that says "sorry, this one's broken", and any methods you try to call on that NodeIterator will throw an exception, helpfully telling you to up the max working NodeIterators configuration.
We then loop over those 10 strongly-held NodeIterators whenever a node is removed from the document, and update them appropriately per the spec.
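The workaround described above can be sketched roughly as follows. This is a hypothetical model, not jsdom's actual code: the names IteratorTracker and FakeNodeIterator are invented for illustration, and it assumes the oldest iterator is the one evicted when the limit is exceeded.

```javascript
// Hypothetical model of the workaround; not jsdom's implementation.
class IteratorTracker {
  constructor(maxWorking = 10) {
    this.maxWorking = maxWorking;
    this.working = []; // strongly held, oldest first
  }
  track(iterator) {
    this.working.push(iterator);
    if (this.working.length > this.maxWorking) {
      // Assumption: the oldest iterator is evicted and marked broken.
      const evicted = this.working.shift();
      evicted.broken = true;
    }
  }
  // Called whenever a node is removed from the document.
  runPreRemovingSteps(node) {
    for (const iterator of this.working) iterator.preRemove(node);
  }
}

// Stand-in for a NodeIterator that only records what it was told.
class FakeNodeIterator {
  constructor() {
    this.broken = false;
    this.seen = [];
  }
  preRemove(node) {
    this.seen.push(node);
  }
  nextNode() {
    if (this.broken) {
      throw new Error("NodeIterator is broken; raise the max working NodeIterators setting");
    }
    return null;
  }
}
```

With weak references, the working list could hold its iterators weakly instead, so no iterator would ever need to be marked broken.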
3. Let iterator be ObjectCreate(%ReclaimedIteratorPrototype%, « [[Factory]] »).
Does this prototype object have anything interesting on it?
Sorry if I'm missing something obvious because of my lack of understanding, but why can't we have a simpler API like this?
let obj = new Something();
let strong = obj;
let weak = Object.getWeakReference(obj);
obj; // Something
strong; // Something
weak; // Something
obj = undefined;
obj; // undefined
strong; // Something
weak; // Something
strong = undefined;
obj; // undefined
strong; // undefined
weak; // null
Raised at TC39.
Can we support cross-realm weak references?
The spec currently states that if the weakref becomes condemned before or at the same time as the target object, then no finalizer is executed. I can see why this is a desirable choice for the implementation, but it seems to me that it violates reasonable assumptions about encapsulation.
Suppose a wasm application maintains many objects within its flat heap, which it reifies to JS as JS objects. These JS objects require finalization, so that the finalizer can destruct the wasm resource properly. Suppose that that's all there's to it (ie we don't have to map an address to the JS object representing it so there's no natural external data structure that will reference the weakrefs).
With the current design, we must retain an additional table of weakrefs in the JS heap to ensure that the JS finalizers will be run when the JS objects that are held weakly are condemned, and when an object is finalized the finalizer must ensure that the weakref that caused the finalizer to run is removed from the table, or it must ensure that the table is eventually swept for dead weakrefs.
Furthermore, libraries will themselves need to have their own tables or data structures for ensuring that weakrefs are kept alive long enough (they wouldn't likely share a single table for this), so this problem becomes somewhat non-local.
With a design that does not require the weakref to stay alive, the weakref could be attached to the JS object, or, I suppose, simply dropped on the floor, creating and dropping the weakref thus being equivalent to registering the object for finalization. This puts a larger burden on the implementation but the ergonomics seem to me to be better.
Some parts of the spec say or imply that get/deref on a weakref whose target is collected returns null. Other parts say or imply void 0, i.e., undefined.
There may also be inconsistencies with the holdings, but I did not look as carefully. In any case, we need to look and resolve any such inconsistencies.
I can't see any way to observe whether an executor is null or undefined, but we should be sure we're consistent here anyway.
Has anyone written down any widely accepted criteria for choosing between null and undefined?
WeakRef instances currently come with get() and clear() accessors. In addition I think there should be an isClear() predicate. The reason is that, given existing semantics at least (see #19), it may be a design choice to have a table of weakrefs representing objects that need finalization, and to sweep this table occasionally looking for cleared weakrefs to remove. (That's not the only possible design but it's a reasonable design.) In this case, we won't want to use get() to access each weakref because that makes the weakref's pointer strong for the duration of the turn. So we should have a predicate to ask whether the weakref has been cleared, and can be removed from the table.
(I also think clear() should return a boolean representing whether or not the reference was cleared (true) or already was clear (false) but if we have isClear() the justification for changing clear() is scant.)
It would be good to give a clear example why holdings is necessary (how it allows you to do things you could not otherwise) compared to having the executor simply close over the relevant variable, or using bind to store it alongside the executor. As far as I can see, every program that uses holdings can be easily refactored to a more intuitive one that does not. What am I missing?
/cc @Sebmaster
What if it was done like this instead?
const weakObjs = new WeakSet([], objectAboutToBeCollected => {
// here do something with the object right before it is going to be collected,e.g.
objectAboutToBeCollected.dispose();
// or if objectAboutToBeCollected is attached to a live object then collection is cancelled,
// this method will be called again when it is ready to be collected once more
});
same for WeakMap
if listening for an object finalization is no longer needed then it can just be removed from the set / the set can be collected
makeCell and makeRef rely on the sameRealm operation:
"If SameRealm(target, this), then ..."
Why does it make sense to treat the reference strongly? What happens then - the cleanup function just never gets called, and the WeakCell stays alive forever (unless it's explicitly cleared)?
In addition, we don't have the SameRealm operation in V8 and when I talked with the engineers responsible for this area, they said "we don't want the objects to know which Realm they come from", that is, they think the SameRealm operation should not be possible and should not end up in the spec.
I can think of good reasons to want weak refs, but closing file handles is not one of them, so I question its use as a motivating example.
With file handles, you are managing an external non-memory resource with GC. That's a very good way to exhaust your system allotment of file handles, since there is no guarantee that any of them will ever be closed.
Within SpiderMonkey, for example, we partition the GC heap and can collect partitions independently. If one of these partitions were to only allocate file handles, you could quite easily run out of file descriptors despite massive ongoing allocations in other partitions. Worse: in actual fact, we currently do a lot of whole-heap collections for dumb reasons, which is a bug that we're working on fixing. I really would rather not be prevented from fixing that bug because people start depending on the GC to clean up non-memory resources. (And the obvious workarounds are unpalatable, eg informing the GC whenever allocating a file handle so that it can operate in a degraded mode in order to be prompt about cleaning up these other resources.)
It's called finalization, which suggests the thing that does this should be a finalizer.
Alternatively it should be called execution, done by an executor. But both of those terms are highly overloaded.
The spec says it just looks for WeakRef fields on the instance, which are the same as the WeakCell fields. Therefore it would just work on a WeakCell. Is that acceptable? If not, we need to change how the parent and child classes are defined. (from @bmeck)
.. the "key" in this case should be the WeakRef object itself.
From @gsathya 's notes:
// new API
let wr = new WeakRef(target, holdings);
let fg = new FinalizationGroup(() => {});
fg.register(wr, key); // register WeakRef
fg.register(target, holdings, key); // create and register objectless WeakCell
I don't think it's relevant to pass a key here, since we can just fg.unregister(wr); instead of fg.unregister(key).
Also the API won't be uniform with the "objectless WeakCell" case, since for it we pass "target, holdings, key", and for the WeakRef one we pass "wr, key". So we should just pass "wr".
(In practice, the users would probably reuse the WeakRef as the key, and do fg.register(wr, wr); fg.unregister(wr); since there's no need to come up with any other key.)
I'm also not super convinced about the "objectless WeakCell" API, e.g., the user has to come up with unique keys. Maybe the API should give back the key? key = fg.register(target, holdings); ?
Can it be cleared between microtasks?
The current spec text talks about associating the set with a Job (i.e., microtask?). If we implement it like this, the WeakRef can be cleared between microtasks.
On the other hand, the slides (slide 17) say "A program cannot observe a target getting reclaimed within the execution of a job (turn of the event loop)" (not microtask queue).
E.g.,
Promise.resolve().then(() => { wr = wf.makeRef(...); }).then(() => { wr.deref(); });
... can the deref() now return undefined?
I think we should rename WeakFactory, for a couple of reasons:
Realistically speaking, most developers won't have strong intuitions about any of the terms in this domain. Nevertheless, I think we should try to use names that at least guide them in the right direction. In particular, the concept of grouping lifetimes of a set of WeakRefs is important, and I think the solution we settled on is excellent. Now we also need to make it as easy as possible for developers to understand this solution :)
I'm not saying that FinalizationGroup is a fantastic name—in particular it emphasizes only one half of what is provided—, but I do think that it at least points in the right direction, providing a good starting point for learning about the concepts involved via sources like MDN.
WeakCell provides the means to do finalization without dereferencing, solving the long-turn use cases posed by Workers and maybe WebAssembly. However, WeakCells are relatively costly in terms of memory usage.
I'd like to propose dropping the concept of WeakCell, and instead providing methods on WeakFactory (or FinalizationGroup) to register a [target, holdings] pair for finalization. That'd reduce the memory usage to two words, which I think is the best we can possibly do. Unregistering can be done by providing the same pair to an accompanying unregisterFromFinalization method.
If we do #49, these could just be called register and unregister, or add and remove, which seems nice and reasonably clear to me.
I see some notes about a new API for WeakRefs at #54 , but it's somewhat underspecified. Is someone planning on writing up the new API in detail? cc @tschneidereit
Consider this example:
function cleanupOne(refs) {
// Process only one ref for cleanup.
console.log(refs.next().value.holdings);
}
const factory = new WeakFactory();
function makeGarbage(tag) {
const obj = {};
return factory.makeRef(obj, tag);
}
const a = makeGarbage("a");
const b = makeGarbage("b");
// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();
// JS observes that the targets for a and b were reclaimed.
assertEq(a.deref(), null);
assertEq(b.deref(), null);
// Cleanup one reference.
//
// (This could also use the WeakCellJobs queue, but for this example it does not
// because it is easier to write this way.)
factory.cleanupSome(cleanupOne);
const c = makeGarbage("c");
// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();
// JS observes that c has been reclaimed.
assertEq(c.deref(), null);
// Cleanup one more reference.
factory.cleanupSome(cleanupOne);
What are the permissible orderings of log messages?
Must it always be either a, b or b, a? Is either of a, c or b, c allowed as well?
If we want to ensure that only the former are allowed, then I think WeakFactory objects should have a [[Dirty]] internal property that is an ordered queue that dirty cells get moved into from [[ActiveCells]] when they become dirty.
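The suggested [[Dirty]] queue can be sketched with a small model. Everything here is hypothetical: ModelFactory and markDirty are invented names, and cells are marked dirty by hand rather than by a real collector. The point is only that a FIFO queue keeps "c" from overtaking an earlier dirty cell that a previous cleanup left unprocessed.

```javascript
// Hypothetical model: cells are marked dirty by hand instead of by a real GC.
class ModelFactory {
  constructor() {
    this.activeCells = new Set(); // stands in for [[ActiveCells]]
    this.dirty = [];              // stands in for the suggested [[Dirty]] queue
  }
  makeCell(holdings) {
    const cell = { holdings };
    this.activeCells.add(cell);
    return cell;
  }
  // Simulates the collector noticing that a cell's target was reclaimed.
  markDirty(cell) {
    if (this.activeCells.delete(cell)) this.dirty.push(cell);
  }
  // The callback receives an iterator; cells it does not consume stay queued.
  cleanupSome(callback) {
    const queue = this.dirty;
    callback({
      next() {
        return queue.length > 0
          ? { value: queue.shift(), done: false }
          : { value: undefined, done: true };
      },
    });
  }
}

const log = [];
const cleanupOne = (refs) => log.push(refs.next().value.holdings);

const factory = new ModelFactory();
const a = factory.makeCell("a");
const b = factory.makeCell("b");
factory.markDirty(a);
factory.markDirty(b);
factory.cleanupSome(cleanupOne); // processes only "a"
const c = factory.makeCell("c");
factory.markDirty(c);
factory.cleanupSome(cleanupOne); // "b" is still ahead of "c"
console.log(log.join(", ")); // a, b
```

In this model the relative order of a and b is fixed by the order in which they are marked dirty; a real collector could discover them in either order, which is why both a, b and b, a would remain permissible.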
If there is concern that weak references can create side-channels and/or implementation-reliance, consider intentionally adding non-determinism to mask those information leaks or reliances.
In short: If an object is weakly held by a WeakMap or WeakRef (or a weak-value-map, etc), then add a random delay to garbage collection for that object.
All engines would then behave similarly: weak objects would take some random amount of time to be truly cleaned up. That eliminates implementation-reliance or cross-engine incompatibility. As for side-channels, communicating via the GC would only be possible if you can predict the random number generator.
For prior art here, see Go's map iteration or channel multiplexing. Go will choose a random offset from which to begin iteration. This prevents users from relying on map iteration order. Similarly, channel multiplexing introduces extra randomness to prevent masking race conditions or introducing bias.
This issue was prompted by a discussion on Twitter between myself and Sam Tobin-Hochstadt.
The current spec doesn't do anything to prevent calling WeakFactory.makeCell or WeakFactory.makeRef after calling WeakFactory.shutdown.
But shutdown has set [[ActiveCells]] to undefined, and then makeCell / makeRef tries to add to it, so that sounds wrong.
If the intention of shutdown is to make the WeakFactory forever unusable, the spec needs to prevent calling makeCell / makeRef somehow (e.g., set a flag in WeakFactory and throw an error in makeCell / makeRef if the flag is set).
If the intention is to just get rid of the current WeakCells without doing cleanup for them, the spec needs to reset the members in such a way that it remains usable. (The function could be called e.g., "clear()" in that case.)
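The two possible intentions can be sketched side by side. Both classes are hypothetical models (the names ShutdownFactory, ClearableFactory and the isShutdown flag are invented here), they hold targets strongly, and no real garbage collection is involved.

```javascript
// Hypothetical models of the two possible intentions; no real GC involved.

// Option 1: shutdown() makes the factory permanently unusable.
class ShutdownFactory {
  constructor() {
    this.activeCells = new Set();
    this.isShutdown = false; // the flag suggested above
  }
  makeCell(target, holdings) {
    if (this.isShutdown) throw new TypeError("factory has been shut down");
    const cell = { target, holdings };
    this.activeCells.add(cell);
    return cell;
  }
  shutdown() {
    this.isShutdown = true;
    this.activeCells = undefined;
  }
}

// Option 2: clear() drops existing cells but leaves the factory usable.
class ClearableFactory {
  constructor() {
    this.activeCells = new Set();
  }
  makeCell(target, holdings) {
    const cell = { target, holdings };
    this.activeCells.add(cell);
    return cell;
  }
  clear() {
    this.activeCells = new Set(); // reset rather than set to undefined
  }
}
```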
ES prefers using classes to encapsulate methods that are reused, instead of re-creating new closures every time a given type of object is created. So, it should be new WeakRef(...) with WeakRef.prototype.get and WeakRef.prototype.clear().
This proposal hasn't really gotten much action lately, so I'm just curious how it stands right now. Driven by this inquiry, but thought I'd file an issue over here instead.
Currently, the weakref proposal specifies an individual turn (i.e., "job" in ECMAScript terminology) as the atomicity unit. A turn/job is everything that happens from one empty user stack state to the next empty user stack state. Any implementation that uses a coarser unit, consisting of several turns, will satisfy the current spec as long as the boundaries between these units are also turn boundaries, i.e., empty stack states.
At the November tc39 meeting @tschneidereit suggested that the atomicity unit be coarsened in the spec to match what implementations are expected to do anyway, which is to always service a non-empty promise queue before servicing any other turns, and to only deal with WeakMap atomicity issues in states where the stack and promise queue were both empty.
If this assumption about implementations held, then this would be a possible choice, with pros and cons to argue through. (I've changed my opinion several times.) But nodejs/promise-use-cases#25 indicates that the premise is false. Implementations might not always service the promise queue at strictly higher priority. The reasons for not doing so are well motivated and plausible. However this particular bug is settled, we would have to preclude this possibility for all future implementations if we coarsened the atomicity unit this way.
whatwg/streams#932 by @ricea brings up an interesting situation. I'm curious what the champions have to say about this, although I don't anticipate any changes to the weak refs proposal because of it.
The summary is:
- y holds on to a user-provided object x indefinitely.
- After y.close(), y will never use x.
- Browsers would like x to be GCed before y.
With weak refs, this of course becomes observable. Thus once weak refs are introduced, the spec as-written mandates that x never be GCed before y.
In reaction to this, the spec author should probably change y.close() to null out the internal reference to x explicitly. Then browsers will again be allowed to collect x before y and free up some memory, while still conforming to the spec.
Generalizing, the tricky parts of the situation are:
ES6 moved/added some methods like parseInt to the respective namespaces (Number), so I thought it would be logical to support this style for all new APIs, even if there is only 1 function to add, so:
makeWeakRef(target, executor, holdings);
should be at least
WeakRef.makeWeakRef(target, executor, holdings);
or even
WeakRef.make(target, executor, holdings);
where WeakRef is a Math-like namespace object.
I'm not proposing making WeakRef constructible via class, but I consider it a better alternative to just a plain global function.
Several times on the web platform, we've denied feature requests due to not wanting to expose garbage collection behavior. The most prominent in my recent memory is detecting MessagePort closing: see
I think as part of exploring the cross-cutting concerns of this feature, this proposal and its champions need to answer at least the following questions:
I'd also hope that the champions can collaborate with others to dig up other cases where web platform features have been omitted due to the desire not to expose GC, and help us evaluate whether we should revisit those decisions and APIs if weakrefs are shipped. I'll try to direct folks here to remind us of other cases.
Do you need to keep a reference to the weak reference for the executor to fire?
(function(){
var a = ["important object"];
makeWeakRef(a, function() { console.log("Important object collected!") });
})()
Will the console.log ever happen?
Assuming that this may take a while to make it into the browsers, are there some official recommendations on manual memory management libraries for javascript?
Recently I've been using a library that rolls its own as an implementation of this paper.
It seems like even though there are some standard academic papers/references, there aren't many (or any?) standard libs to fill in the gaps for now.
If there are some generic libraries or guidelines to implement in javascript, it might be nice to list them in the Readme here as a stopgap measure