proposal-weakrefs's Issues

Add explainer to README

Lots of information is missing from, or explicitly marked as outdated in, the README. It would be great to have an up-to-date explainer in the README itself. Might I suggest the following sections (based on the Stage 1 entrance criteria) with at least one paragraph each?

  • Motivation
  • Proposed solution
  • High-level API
  • Illustrative examples
  • FAQ

Fix the reference section

I missed that I never updated the reference section, so it's basically missing. It needs to be added and expanded.

Different from Ephemerons?

How are the semantics (and collection algorithm) for weakrefs different from Ephemerons as described in the Hayes paper?

Is the only difference that you don't treat the executor and holdings as "value fields" (using Barry's terminology)? If that is the case, you have presumably eliminated the need for the three-phase Ephemeron algorithm, but at the cost of introducing the possibility of leaks via circular references originating from the holdings or executor.

Privileged API and the System object

As best as I can tell from the current spec, there is no "System" object in ECMAScript. So this section needs to be worked over in some way.

More generally I'm concerned that this API may be behind some privilege wall that prevents its routine use without user prompts in libraries and so on. Can this be clarified?

Needs discussion / clarification: if cleanup doesn't iterate all weak cells, when will it be called again?

Assume the WeakFactory's cleanup function does not retrieve all dirty WeakCells from the iterator. When will the cleanup function be scheduled to be called again?

  1. Immediately (i.e., post another cleanup task immediately to the microtask queue, i.e., let other already scheduled microtasks run and call cleanup again)
  2. Whenever GC finds another dirty WeakCell belonging to the same WeakFactory
  3. Whenever GC finds another dirty WeakCell belonging to any WeakFactory
  4. Something else?

Should the FinalizationGroup create & return the key?

Like this:

key = fg.register(target, holdings);
fg.unregister(key);

This would allow different implementations, e.g., key could be an object (like WeakCell), or it could be an integer which is used for accessing an array, or a hash map key.

In the current new API:
fg.register(target, key, holdings);
fg.unregister(key); // unregisters all!

we force the internal implementation to be a multihashmap / hashmap where the value is a list. (Actually, we'd need two multihashmaps, one for the active "WeakCells" and one for "WeakCells" already scheduled for cleanup...)
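
To make the implementation trade-off concrete, here is a minimal sketch of the bookkeeping each shape implies. It models only the registration tables, not garbage collection, and it holds targets strongly for simplicity. The class names `KeyedRegistrations` and `TokenRegistrations` are hypothetical and do not appear in any proposal text.

```javascript
// Model of the "caller supplies the key" shape: one key may map to many
// registrations, so the table has to be a multimap.
class KeyedRegistrations {
  constructor() {
    this.byKey = new Map(); // key -> array of { target, holdings }
  }
  register(target, key, holdings) {
    const list = this.byKey.get(key) ?? [];
    list.push({ target, holdings });
    this.byKey.set(key, list);
  }
  // Removes every registration that shares this key; returns how many.
  unregister(key) {
    const removed = this.byKey.get(key)?.length ?? 0;
    this.byKey.delete(key);
    return removed;
  }
}

// Model of the "register returns the key" shape: the implementation picks
// the key, so a plain map with one entry per registration suffices.
class TokenRegistrations {
  constructor() {
    this.nextToken = 0;
    this.byToken = new Map(); // token -> { target, holdings }
  }
  register(target, holdings) {
    const token = this.nextToken++;
    this.byToken.set(token, { target, holdings });
    return token;
  }
  // Removes exactly one registration; returns whether it existed.
  unregister(token) {
    return this.byToken.delete(token);
  }
}
```

An integer token is only one of the options mentioned above; an object token would work the same way in this model.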

How does this work in long-running turns?

From TC39.

In workers with long-running turns, how does this work? (given that after access, weak pointers are strong till the end of the turn and finalization is only between turns).

Misc bugs in writeup

In the section "extended example", in the call to makeWeakRef, the executor argument should probably not be this.executor, but this.dropRef?

Typo in the paragraph below the code: "need to do perform", the "do" is wild.

Should the next method for ReclaimedIterator be on the prototype instead of the object itself?

Currently the next method gets set on the iterator object itself inside WeakFactoryCleanupJob:

5. Let next be a new built-in function defined in WeakFactoryCleanupIterator next.
6. Perform CreateMethodProperty(iterator, next, next).

It seems like the next method should be on the prototype, since all objects want to share the same method. In fact, this is what map and set iterators do: https://tc39.github.io/ecma262/#sec-%mapiteratorprototype%.next
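
The Map iterator behavior referred to above can be checked directly. This snippet only observes how Map iterators behave in current engines; it says nothing about how ReclaimedIterator is or will be specified.

```javascript
const first = new Map().entries();
const second = new Map().entries();

// `next` is not an own property of each iterator object...
const ownNext = Object.prototype.hasOwnProperty.call(first, "next");

// ...it lives on the shared %MapIteratorPrototype% object instead.
const proto = Object.getPrototypeOf(first);
const protoHasNext = Object.prototype.hasOwnProperty.call(proto, "next");
const sharedMethod = first.next === second.next;

console.log(ownNext, protoHasNext, sharedMethod); // false true true
```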

Current spec text unclear wrt. realm

The current text as it stands doesn't make it clear in which realm the DoAgentFinalization is supposed to run, which then leaves it open which realm is used by the EnqueueJob, which in turn means that it's unclear in which realm the WeakFactoryCleanupJob has to run.

Since WeakFactoryCleanupJob creates JavaScript objects and even calls into arbitrary user code - aka [Cleanup] - which might not even be callable, it has to have some current realm.

One obvious solution here would be remember either the realm or the current execution context on the WeakFactory and enter that realm in DoAgentFinalization for that particular factory.

Using separate agent to perform finalization.

For long-running turns, it would be nice to be able to allow GC to happen by scheduling finalization on a separate thread/agent. Since a number of use cases can use primitives for the holdings reference used for cleanup, it seems safe to pass this between agents. Roughly, this would require a new API to communicate that finalization should be done on a different agent.

[Figure: drawing of a thread with an infinite-length turn allowing GC of WeakRef targets]

WeakCell.prototype.clear(): step 5 could be a little more clear

5. If factory is not undefined.
     i.   Remove O from factory.[[ActiveCells]].
     ii.  Set O.[[Target]] to undefined.
     iii. Set O.[[Factory]] to undefined.
     iv.  Set O.[[Holdings]] to undefined.

First, it seems that there should be some sort of assertion/check that factory is a WeakFactory object. Yes, it seems it will always be such an object if it isn't undefined, but I think things could be a bit more clear in the spec text here.

Second, there should probably be some sort of check for if factory.[[ActiveCells]] is a list and is not undefined, since WeakFactory.prototype.shutdown can set it to undefined. Again, seems pretty obvious what is the intended behavior, but the spec could be made a bit more clear.

Documentation

I'm working with a number of people interested in writing documentation for JavaScript feature proposals. Writing documentation could be useful for the development of the WeakRefs proposal, even before it's done, because:

  • The act of writing documentation, even if it's not published, and thinking through the learning process, could help expose API design issues
  • Publishing documentation can improve outreach and help collect feedback, which could also lead to refinements in the proposal

Would the champions of this proposal be interested in getting in touch with people who would be interested in helping with TC39 proposal documentation generally?

Example use case (not resource management; convenient via iterable weak set)

I have a definite use case for weak references in one of my projects, jsdom. It runs primarily in Node, but also in the browser. Of note, this case isn't related to resource management.

jsdom implements, among other things, the DOM Standard. In particular, there is one line that concerns us here: when removing a node:

  1. For each NodeIterator object iterator whose root’s node document is node’s node document, run the NodeIterator pre-removing steps given node and iterator.

To correctly implement this step, you need a reference to every NodeIterator object that's ever been created, so you can update it correctly (by running the pre-removing steps).

This really requires weak references, as otherwise you end up holding a strong reference to every NodeIterator ever created, never releasing their memory, or the memory of the other objects that the NodeIterator references (e.g. the two DOM nodes it points to, root and reference node).


Of note:

  • This is not being used as any sort of backstop for cleaning things up. It's just plain required for correctness without massive memory leaks.
  • This doesn't require finalization at all.
  • What we really need is an "iterable weak set"; weak references of course can be used to build one of these (and an iterable weak set can be used to build a weak reference).

Finally, people may be amused by how we currently work around this. We have a user-configurable number, "max working NodeIterators", which we default to 10. We hold strong references to the last 10 created NodeIterators. If an 11th NodeIterator is created, we remove the oldest one from the list, then set an internal bit on it that says "sorry, this one's broken", and any methods you try to call on that NodeIterator will throw an exception, helpfully telling you to up the max working NodeIterators configuration.

We then loop over those 10 strongly-held NodeIterators whenever a node is removed from the document, and update them appropriately per the spec.
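
As a rough illustration of the "iterable weak set" point, such a set can be layered over weak references along these lines. This is a sketch that assumes a global `WeakRef` constructor with a `deref()` method (the shape that later shipped in engines, which differs from the drafts discussed in some of these issues). The `IterableWeakSet` class is hypothetical and is not jsdom's code.

```javascript
class IterableWeakSet {
  #refs = new Set();       // Set of WeakRef objects; does not keep targets alive
  #refFor = new WeakMap(); // target -> its WeakRef, used by has() and delete()

  add(value) {
    if (!this.#refFor.has(value)) {
      const ref = new WeakRef(value);
      this.#refFor.set(value, ref);
      this.#refs.add(ref);
    }
    return this;
  }

  has(value) {
    return this.#refFor.has(value);
  }

  delete(value) {
    const ref = this.#refFor.get(value);
    if (ref === undefined) return false;
    this.#refFor.delete(value);
    return this.#refs.delete(ref);
  }

  // Yields the members that are still alive and prunes dead refs on the way.
  *[Symbol.iterator]() {
    for (const ref of this.#refs) {
      const value = ref.deref();
      if (value === undefined) this.#refs.delete(ref);
      else yield value;
    }
  }
}

// Usage: both objects are strongly held by local variables, so both are seen.
const live = new IterableWeakSet();
const iteratorA = { name: "A" };
const iteratorB = { name: "B" };
live.add(iteratorA).add(iteratorB);
const seen = [...live].map((it) => it.name);
console.log(seen); // [ 'A', 'B' ]
```

No finalization is involved: dead entries are simply skipped and pruned the next time the set is iterated.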

A simpler API?

Sorry if I'm missing something obvious because of my lack of understanding, but why can't we have a simpler API like this?

let obj = new Something();
let strong = obj;
let weak = Object.getWeakReference(obj);

obj; // Something
strong; // Something
weak; // Something

obj = undefined;

obj; // undefined
strong; // Something
weak; // Something

strong = undefined;

obj; // undefined
strong; // undefined
weak; // null

WeakRef collection semantics violate encapsulation

The spec currently states that if the weakref becomes condemned before or at the same time as the target object, then no finalizer is executed. I can see why this is desirable choice for the implementation, but it seems to me that it violates reasonable assumptions about encapsulation.

Suppose a wasm application maintains many objects within its flat heap, which it reifies to JS as JS objects. These JS objects require finalization, so that the finalizer can destruct the wasm resource properly. Suppose that that's all there's to it (ie we don't have to map an address to the JS object representing it so there's no natural external data structure that will reference the weakrefs).

With the current design, we must retain an additional table of weakrefs in the JS heap to ensure that the JS finalizers will be run when the JS objects that are held weakly are condemned, and when an object is finalized the finalizer must ensure that the weakref that caused the finalizer to run is removed from the table, or it must ensure that the table is eventually swept for dead weakrefs.

Furthermore, libraries will themselves need to have their own tables or data structures for ensuring that weakrefs are kept alive long enough (they wouldn't likely share a single table for this), so this problem becomes somewhat non-local.

With a design that does not require the weakref to stay alive, the weakref could be attached to the JS object, or, I suppose, simply dropped on the floor, creating and dropping the weakref thus being equivalent to registering the object for finalization. This puts a larger burden on the implementation but the ergonomics seem to me to be better.

Resolve inconsistencies between null and undefined

Some parts of the spec say or imply that get/deref on a weakref whose target is collected returns null. Other parts say or imply void 0, i.e., undefined.

There may also be inconsistencies with the holdings, but I did not look as carefully. In any case, we need to look and resolve any such inconsistencies.

I can't see any way to observe whether an executor is null or undefined, but we should be sure we're consistent here anyway.

Has anyone written down any widely accepted criteria for choosing between null and undefined?

WeakRef needs a query interface

WeakRef instances currently come with get() and clear() accessors. In addition, I think there should be an isClear() predicate. The reason is that, given existing semantics at least (see #19), it may be a design choice to have a table of weakrefs representing objects that need finalization, and to sweep this table occasionally looking for cleared weakrefs to remove. (That's not the only possible design but it's a reasonable design.) In this case, we won't want to use get() to access each weakref because that makes the weakref's pointer strong for the duration of the turn. So we should have a predicate to ask whether the weakref has been cleared and can be removed from the table.

(I also think clear() should return a boolean representing whether or not the reference was cleared (true) or already was clear (false) but if we have isClear() the justification for changing clear() is scant.)
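
The table-sweeping design described above might look like the following sketch. `isClear()` is the predicate proposed in this issue, not an existing method, so the example uses stand-in objects (created by the hypothetical `makeFakeRef`) that model a weak ref exposing it.

```javascript
// Stand-in for a weak ref that offers the proposed isClear() predicate.
// A real weak ref would not hold `target` strongly; this only models the API.
function makeFakeRef(target) {
  let current = target;
  return {
    get: () => current,
    clear: () => { current = undefined; },
    isClear: () => current === undefined,
  };
}

// Drops cleared refs from the table without calling get(), so the sweep
// itself never makes a surviving target strong for the rest of the turn.
function sweep(table) {
  let removed = 0;
  for (const ref of table) {
    if (ref.isClear()) {
      table.delete(ref);
      removed++;
    }
  }
  return removed;
}

const refA = makeFakeRef({ id: 1 });
const refB = makeFakeRef({ id: 2 });
const refC = makeFakeRef({ id: 3 });
const table = new Set([refA, refB, refC]);
refB.clear(); // simulate the target of refB having been collected
const removedCount = sweep(table);
console.log(removedCount, table.size); // 1 2
```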

Why is holdings needed?

It would be good to give a clear example why holdings is necessary (how it allows you to do things you could not otherwise) compared to having the executor simply close over the relevant variable, or using bind to store it alongside the executor. As far as I can see, every program that uses holdings can be easily refactored to a more intuitive one that does not. What am I missing?
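
For reference, the two styles being compared look roughly like this. The sketch uses `FinalizationRegistry`, the API that later shipped in engines, as a stand-in for the executor/holdings design under discussion; `releaseHandle` and the handle numbers are hypothetical. Whether either callback ever runs depends on the garbage collector, so nothing here relies on it.

```javascript
const released = [];
function releaseHandle(handle) {
  released.push(handle);
}

// Style 1: one shared callback; the per-object data travels as the held value.
const registry = new FinalizationRegistry(releaseHandle);
function trackWithHoldings(obj, handle) {
  registry.register(obj, handle);
}

// Style 2: one closure (and here one registry) per object. The closure must
// capture only `handle`: if it captured `obj`, the object would stay reachable
// for as long as the registry lives, and the callback could never fire.
function trackWithClosure(obj, handle) {
  const perObject = new FinalizationRegistry(() => releaseHandle(handle));
  perObject.register(obj);
  return perObject; // the caller must keep this registry alive itself
}

trackWithHoldings({}, 42);
const keepAlive = trackWithClosure({}, 43);
```

In the shipped API, passing the target itself as the held value is rejected with a TypeError, which guards against one way of accidentally pinning the target; a closure offers no comparable check.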

/cc @Sebmaster

FAQ: Why not just add a callback to WeakMap/WeakSet?

What if it was done like this instead?

const weakObjs = new WeakSet([], objectAboutToBeCollected => {
  // here do something with the object right before it is going to be collected, e.g.
  objectAboutToBeCollected.dispose();
  // or if objectAboutToBeCollected is attached to a live object then collection is cancelled,
  // this method will be called again when it is ready to be collected once more
});

same for WeakMap

if listening for an object finalization is no longer needed then it can just be removed from the set / the set can be collected

Document the implications of the absence of a same-Realm check for WeakRefs

makeCell and makeRef rely on the sameRealm operation:

"If SameRealm(target, this), then ..."

Why does it make sense to treat the reference strongly? What happens then - the cleanup function just never gets called, and the WeakCell stays alive forever (unless it's explicitly cleared)?

In addition, we don't have the SameRealm operation in V8 and when I talked with the engineers responsible for this area, they said "we don't want the objects to know which Realm they come from", that is, they think the SameRealm operation should not be possible and should not end up in the spec.

The file handle example is an antipattern

I can think of good reasons to want weak refs, but closing file handles is not one of them, so I question its use as a motivating example.

With file handles, you are managing an external non-memory resource with GC. That's a very good way to exhaust your system allotment of file handles, since there is no guarantee that any of them will ever be closed.

Within SpiderMonkey, for example, we partition the GC heap and can collect partitions independently. If one of these partitions were to only allocate file handles, you could quite easily run out of file descriptors despite massive ongoing allocations in other partitions. Worse: in actual fact, we currently do a lot of whole-heap collections for dumb reasons, which is a bug that we're working on fixing. I really would rather not be prevented from fixing that bug because people start depending on the GC to clean up non-memory resources. (And the obvious workarounds are unpalatable, eg informing the GC whenever allocating a file handle so that it can operate in a degraded mode in order to be prompt about cleaning up these other resources.)

'Executor' term concern

It's called finalization, which suggests the thing that does this should be a finalizer.

Alternatively it should be called execution, done by an executor. But both of those terms are highly overloaded.

New API: registering a WeakRef into FinalizationGroup shouldn't have a key

.. the "key" in this case should be the WeakRef object itself.


From @gsathya 's notes:

// new API
let wr = new WeakRef(target, holdings);
let fg = new FinalizationGroup(() => {});
fg.register(wr, key); // register WeakRef
fg.register(target, holdings, key); // create and register objectless WeakCell

I don't think it's relevant to pass a key here, since we can just fg.unregister(wr); instead of fg.unregister(key).

Also the API won't be uniform with the "objectless WeakCell" case, since for it we pass "target, holdings, key", and for the WeakRef one we pass "wr, key". So we should just pass "wr".

(In practice, the users would probably reuse the WeakRef as the key, and do fg.register(wr, wr); fg.unregister(wr); since there's no need to come up with any other key.)

I'm also not super convinced about the "objectless WeakCell" API, e.g., the user has to come up with unique keys. Maybe the API should give back the key? key = fg.register(target, holdings); ?

Clarification / spec text correction wrt when is the "KeepDuringJob" Set cleared?

Can it be cleared between microtasks?

The current spec text talks about associating the set with Job (i.e., microtask?). If we implement it like this, the WeakRef can be cleared between microtasks.

On the other hand, the slides (slide 17) say "A program cannot observe a target getting reclaimed within the execution of a job (turn of the event loop)" (not microtask queue).

E.g.,

Promise.resolve().then(() => { wr = wf.makeRef(...); }).then(() => { wr.deref(); });

... can the deref() now return undefined?

Rename WeakFactory to FinalizationGroup

I think we should rename WeakFactory, for a couple of reasons:

  1. Grouping lifetimes is the function of this class that's more immediately relevant to developers
  2. Its function as a mechanism for removing the ability to create WeakRefs (while keeping the ability to hand them out) is a very advanced concept that we don't need to put focus on

Realistically speaking, most developers won't have strong intuitions about any of the terms in this domain. Nevertheless, I think we should try to use names that at least guide them in the right direction. In particular, the concept of grouping lifetimes of a set of WeakRefs is important, and I think the solution we settled on is excellent. Now we also need to make it as easy as possible for developers to understand this solution :)

I'm not saying that FinalizationGroup is a fantastic name (in particular, it emphasizes only one half of what is provided), but I do think that it at least points in the right direction, providing a good starting point for learning about the concepts involved via sources like MDN.

@erights, @dtribble, WDYT?

Replace getCell with registerForFinalization (or just register)

WeakCell provides the means to do finalization without dereferencing, solving the long-turn use cases posed by Workers and maybe WebAssembly. However, they are relatively costly in terms of memory usage.

I'd like to propose dropping the concept of WeakCell, and instead providing methods on WeakFactory (or FinalizationGroup) to register a [target, holdings] pair for finalization. That'd reduce the memory usage to two words, which I think is the best we can possibly do. Unregistering can be done by providing the same pair to an accompanying unregisterFromFinalization method.

If we do #49, these could just be called register and unregister, or add and remove, which seems nice and reasonably clear to me.

@erights, @dtribble, does this seem reasonable to you?

What is the new API?

I see some notes about a new API for WeakRefs at #54 , but it's somewhat underspecified. Is someone planning on writing up the new API in detail? cc @tschneidereit

Question about ordering of dirty cells/refs yielded by ReclaimedIterator objects

Consider this example:

function cleanupOne(refs) {
  // Process only one ref for cleanup.
  console.log(refs.next().value.holdings);
}

const factory = new WeakFactory();

function makeGarbage(tag) {
    const obj = {};
    return factory.makeRef(obj, tag);
}

const a = makeGarbage("a");
const b = makeGarbage("b");

// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();

// JS observes that the targets for a and b were reclaimed.
assertEq(a.deref(), null);
assertEq(b.deref(), null);

// Cleanup one reference.
//
// (This could also use the WeakCellJobs queue, but for this example it does not
// because it is easier to write this way.)
factory.cleanupSome(cleanupOne);

const c = makeGarbage("c");

// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();

// JS observes that c has been reclaimed.
assertEq(c.deref(), null);

// Cleanup one more reference.
factory.cleanupSome(cleanupOne);

What are the permissible orderings of log messages?

Must it always be either a, b or b, a? Are either of a, c or b, c allowed as well?

It seems if we want to ensure that only the former are allowed, then I think WeakFactory objects should have a [[Dirty]] internal property that is an ordered queue that dirty cells get moved into from [[ActiveCells]] when they become dirty.
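
A small model of that suggestion follows, with garbage collection simulated by an explicit `markReclaimed` call. `ModelFactory` and its method names are hypothetical; only the idea of an ordered [[Dirty]] queue fed from [[ActiveCells]] comes from the text above.

```javascript
class ModelFactory {
  constructor() {
    this.activeCells = new Set(); // models [[ActiveCells]]
    this.dirty = [];              // models the suggested ordered [[Dirty]] queue
  }
  makeRef(holdings) {
    const cell = { holdings };
    this.activeCells.add(cell);
    return cell;
  }
  // Stands in for the collector noticing that a cell's target died.
  markReclaimed(cell) {
    if (this.activeCells.delete(cell)) this.dirty.push(cell);
  }
  // Hands the callback an iterator that drains the queue in FIFO order.
  cleanupSome(callback) {
    const dirty = this.dirty;
    callback({
      next() {
        return dirty.length > 0
          ? { value: dirty.shift(), done: false }
          : { value: undefined, done: true };
      },
    });
  }
}

const log = [];
const cleanupOne = (refs) => {
  // Process only one ref per cleanup call, as in the example above.
  const step = refs.next();
  if (!step.done) log.push(step.value.holdings);
};

const factory = new ModelFactory();
const a = factory.makeRef("a");
const b = factory.makeRef("b");
factory.markReclaimed(a);
factory.markReclaimed(b);
factory.cleanupSome(cleanupOne);

const c = factory.makeRef("c");
factory.markReclaimed(c);
factory.cleanupSome(cleanupOne);

console.log(log); // [ 'a', 'b' ]
```

In this model "c" can never be logged before both "a" and "b" have been, because it enters the queue after them. The relative order of "a" and "b" depends only on the order in which the simulated collector marks them, so b, a remains possible.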

Consider adding additional non-determinism on purpose

If there is concern that weak references can create side-channels and/or implementation-reliance, consider intentionally adding non-determinism to mask those information leaks or reliances.

In short: If an object is weakly held by a WeakMap or WeakRef (or a weak-value-map, etc), then add a random delay to garbage collection for that object.

All engines would then behave similarly: weak objects would take some random amount of time to be truly cleaned up. That eliminates implementation-reliance or cross-engine incompatibility. As for side-channels, communicating via the GC would only be possible if you can predict the random number generator.

For prior art here, see Go's map iteration or channel multiplexing. Go will choose a random offset from which to begin iteration. This prevents users from relying on map iteration order. Similarly, channel multiplexing introduces extra randomness to prevent masking race conditions or introducing bias.

This issue was prompted by a discussion on Twitter between myself and Sam Tobin-Hochstadt.

What should happen if WeakFactory.makeCell / makeRef is called after WeakFactory.shutdown?

The current spec doesn't do anything to prevent calling WeakFactory.makeCell or WeakFactory.makeRef after calling WeakFactory.shutdown.

But shutdown has set [[ActiveCells]] to undefined, and then makeCell / makeRef tries to add to it, so that sounds wrong.

If the intention of shutdown is to make the WeakFactory forever unusable, the spec needs to prevent calling makeCell / makeRef somehow (e.g., set a flag in WeakFactory and throw an error in makeCell / makeRef if the flag is set).

If the intention is to just get rid of the current WeakCells without doing cleanup for them, the spec needs to reset the members in such a way that it remains usable. (The function could be called e.g., "clear()" in that case.)
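
Both readings can be modeled in a few lines. This is a sketch only: `ModelWeakFactory` is hypothetical, cells hold their targets strongly, and the choice of TypeError is an assumption rather than anything the spec text says.

```javascript
class ModelWeakFactory {
  #activeCells = new Set(); // models [[ActiveCells]]
  #isShutDown = false;      // the flag suggested for the first reading

  makeCell(target, holdings) {
    if (this.#isShutDown) {
      // Assumed error type; the spec would have to pick one.
      throw new TypeError("WeakFactory has been shut down");
    }
    const cell = { target, holdings };
    this.#activeCells.add(cell);
    return cell;
  }

  // First reading: the factory becomes permanently unusable.
  shutdown() {
    this.#isShutDown = true;
    this.#activeCells = undefined;
  }

  // Second reading: drop the current cells without cleanup, stay usable.
  clear() {
    if (this.#isShutDown) {
      throw new TypeError("WeakFactory has been shut down");
    }
    this.#activeCells = new Set();
  }

  get activeCount() {
    return this.#activeCells?.size ?? 0;
  }
}

const reusable = new ModelWeakFactory();
reusable.makeCell({}, "first");
reusable.clear();
reusable.makeCell({}, "second"); // still works after clear()

const finished = new ModelWeakFactory();
finished.shutdown();
// finished.makeCell({}, "late") would now throw a TypeError.
```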

Status update?

This proposal hasn't really gotten much action lately, so I'm just curious how it stands right now. Driven by this inquiry, but thought I'd file an issue over here instead.

Should we coarsen the atomicity unit?

Currently, the weakref proposal specifies an individual turn (i.e., "job" in ECMAScript terminology) as the atomicity unit. A turn/job is everything that happens from one empty user stack state to the next empty user stack state. Any implementation that uses a coarser unit, consisting of several turns, will satisfy the current spec as long as the boundaries between these units are also turn boundaries, i.e., empty stack states.

At the November TC39 meeting @tschneidereit suggested that the atomicity unit be coarsened in the spec to match what implementations are expected to do anyway, which is to always service a non-empty promise queue before servicing any other turns, and to only deal with WeakMap atomicity issues in states where the stack and promise queue were both empty.

If this assumption about implementations held, then this would be a possible choice, with pros and cons to argue through. (I've changed my opinion several times.) But nodejs/promise-use-cases#25 indicates that the premise is false. Implementations might not always service the promise queue at strictly higher priority. The reasons for not doing so are well motivated and plausible. However this particular bug is settled, we would have to preclude this possibility for all future implementations if we coarsened the atomicity unit this way.

Interesting impact on other specifications

whatwg/streams#932 by @ricea brings up an interesting situation. I'm curious what the champions have to say about this, although I don't anticipate any changes to the weak refs proposal because of it.

The summary is:

  • A spec is written so that spec-created object y holds on to a user-provided object x indefinitely
  • However, after calling y.close(), y will never use x.
  • Thus, it is currently unobservable whether or not an implementation allows x to be GCed before y.

With weak refs, this of course becomes observable. Thus once weak refs are introduced, the spec as-written mandates that x never be GCed before y.

In reaction to this, the spec author should probably change y.close() to null out the internal reference to x explicitly. Then browsers will again be allowed to collect x before y and free up some memory, while still conforming to the spec.
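
In code, the suggested change amounts to something like the following sketch. `SpecObject`, `read()`, and `holdsUnderlying()` are hypothetical names chosen for illustration and are not taken from the Streams spec.

```javascript
// y in the summary above: a spec-created object wrapping a user-provided x.
class SpecObject {
  #underlying; // internal reference to the user-provided object x

  constructor(x) {
    this.#underlying = x;
  }

  use() {
    if (this.#underlying === undefined) {
      throw new TypeError("already closed");
    }
    return this.#underlying.read();
  }

  close() {
    // Explicitly drop the reference so that x may be collected before y.
    this.#underlying = undefined;
  }

  // Inspection helper for this example only.
  holdsUnderlying() {
    return this.#underlying !== undefined;
  }
}

const y = new SpecObject({ read: () => "chunk" });
const firstRead = y.use();
y.close();
console.log(firstRead, y.holdsUnderlying()); // chunk false
```

Without the assignment in `close()`, y would keep x reachable for y's whole lifetime, which is exactly what becomes observable once weak refs exist.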

Generalizing, the tricky parts of the situation are:

  • Spec authors have to be more aggressive about nulling out internal references than they were before, when such references were unobservable. (Assuming they want to avoid mandating unnecessary memory usage.)
  • Implementers now have to be exact in following specs' recommendations on nulling out (or not) an internal reference, since doing so is now observable.
  • There are probably many divergences today in memory management strategies between browsers on various APIs of this sort, which may need to be audited now that the strategy is becoming observable.

Namespace for the API

ES6 moved/added some methods like parseInt to the respective namespaces (Number), so I thought it would be logical to support this style for all new APIs, even if there is only 1 function to add, so:

makeWeakRef(target, executor, holdings);

should be at least

WeakRef.makeWeakRef(target, executor, holdings);

or even

WeakRef.make(target, executor, holdings);

where WeakRef is a Math-like namespace object.

I'm not proposing making WeakRef constructible as a class, but I consider a namespace object a better alternative to a plain global function.

How does this impact the web platform? (Cross-cutting concerns)

Several times on the web platform, we've denied feature requests due to not wanting to expose garbage collection behavior. The most prominent in my recent memory is detecting MessagePort closing: see

I think as part of exploring the cross-cutting concerns of this feature, this proposal and its champions need to answer at least the following questions:

  • Would this proposal allow you to create your own "close" event for MessagePort, of the type that these threads are asking for? If so, please show the example code.
  • If this proposal does do so, should we give developers what they've been asking for since at least 2013, and just add the event to browsers, now that the cat is out of the bag via weakrefs? (Needs consulting with browser teams.)

I'd also hope that the champions can collaborate with others to dig up other cases where web platform features have been omitted due to the desire not to expose GC, and help us evaluate whether we should revisit those decisions and APIs if weakrefs are shipped. I'll try to direct folks here to remind us of other cases.

stopgap solutions?

Assuming that this may take a while to make it into the browsers, are there some official recommendations on manual memory management libraries for javascript?

Recently I've been using a library that rolls its own as an implementation of this paper.

It seems like even though there are some standard academic papers/references, there aren't many (or any?) standard libs to fill in the gaps for now.

If there are some generic libraries or guidelines to implement in JavaScript, it might be nice to list them in the README here as a stopgap measure.
