
WeakRefs TC39 proposal

Introduction

The WeakRef proposal encompasses two major new pieces of functionality:

  1. creating weak references to objects with the WeakRef class
  2. running user-defined finalizers after objects are garbage-collected, with the FinalizationRegistry class

These interfaces can be used independently or together, depending on the use case.

For developer reference documentation, see reference.md.

A note of caution

This proposal contains two advanced features, WeakRef and FinalizationRegistry. Their correct use takes careful thought, and they are best avoided if possible.

Garbage collectors are complicated. If an application or library depends on GC cleaning up a WeakRef or calling a finalizer in a timely, predictable manner, it's likely to be disappointed: the cleanup may happen much later than expected, or not at all. Sources of variability include:

  • One object might be garbage-collected much sooner than another object, even if they become unreachable at the same time, e.g., due to generational collection.
  • Garbage collection work can be split up over time using incremental and concurrent techniques.
  • Various runtime heuristics can be used to balance memory usage and responsiveness.
  • The JavaScript engine may hold references to things which look like they are unreachable (e.g., in closures, or inline caches).
  • Different JavaScript engines may do these things differently, or the same engine may change its algorithms across versions.
  • Complex factors may lead to objects being held alive for unexpected amounts of time, such as use with certain APIs.

Important logic should not be placed in the code path of a finalizer. Doing so could create user-facing issues triggered by memory management bugs, or even differences between JavaScript garbage collector implementations. For example, if data is saved persistently solely from a finalizer, then a bug which accidentally keeps an additional reference around could lead to data loss.

For this reason, the W3C TAG Design Principles recommend against creating APIs that expose garbage collection. It's best if WeakRef objects and FinalizationRegistry objects are used as a way to avoid excess memory usage, or as a backstop against certain bugs, rather than as a normal way to clean up external resources or observe what's allocated.

Weak references

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent (i.e. an object which is referred to by a weak reference) are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed, the weak reference may return the object even if there are no strong references to it.

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object is not kept alive solely because it appears in a cache or mapping.

For example, if you have a number of large binary image objects (e.g. represented as ArrayBuffers), you may wish to associate a name with each image. Existing data structures just don't do what's needed here:

  • If you used a Map to map names to images, or images to names, the image objects would remain alive just because they appeared as values or keys in the map.
  • WeakMaps are not suitable for this purpose either: they are weak over their keys, but in this case, we need a structure which is weak over its values.

Instead, we can use a Map whose values are WeakRef objects pointing to the ArrayBuffers. This way, we avoid holding the ArrayBuffer objects in memory any longer than they would be otherwise: the map lets us find an image object if it's still around, and if it has been garbage collected, we regenerate it. The result is lower memory usage in some situations.

// This technique is incomplete; see below.
function makeWeakCached(f) {
  const cache = new Map();
  return key => {
    const ref = cache.get(key);
    if (ref) {
      const cached = ref.deref();
      if (cached !== undefined) return cached;
    }

    const fresh = f(key);
    cache.set(key, new WeakRef(fresh));
    return fresh;
  };
}

var getImageCached = makeWeakCached(getImage);

This technique can help avoid spending a lot of memory on ArrayBuffers that nobody is looking at anymore, but it still has the problem that, over time, the Map will fill up with strings which point to a WeakRef whose referent has already been collected. One way to address this is to periodically scavenge the cache and clear out dead entries. Another way is with finalizers, which we’ll come back to at the end of the article.
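A periodic scavenging pass is easy to sketch, assuming `cache` is the Map of WeakRef values from makeWeakCached above:

```javascript
// Sweep a Map whose values are WeakRefs, deleting entries whose
// referents have already been garbage-collected. Deleting the
// current entry during Map iteration is safe in JavaScript.
function scavenge(cache) {
  for (const [key, ref] of cache) {
    if (ref.deref() === undefined) {
      cache.delete(key);
    }
  }
}

// Hypothetical usage: run occasionally, e.g. on a timer.
// setInterval(() => scavenge(cache), 60 * 1000);
```

How often to run such a pass is a tuning question; doing it on a timer, or when the map crosses a size threshold, are both reasonable choices.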

A few elements of the API are visible in the makeWeakCached example:

  • The WeakRef constructor takes an argument, which has to be an object, and returns a weak reference to it.
  • WeakRef instances have a deref method that returns one of two values:
    • The object passed into the constructor, if it’s still available.
    • undefined, if nothing else was pointing to the object and it was already garbage-collected.
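A minimal sketch of that surface (the collected state can only be observed after an unspecified, engine-controlled delay, so it appears only as a comment):

```javascript
const target = { payload: 'some large data' };
const ref = new WeakRef(target);

// While `target` is strongly reachable, deref() returns it.
console.log(ref.deref() === target); // true

// Passing a non-object primitive to the constructor throws.
try {
  new WeakRef(42);
} catch (e) {
  console.log(e instanceof TypeError); // true
}

// Once the last strong reference is gone *and* the engine has
// collected the object (at some unspecified later point),
// ref.deref() returns undefined. There is no way to force or
// observe that moment on demand.
```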

Finalizers

Finalization is the execution of code to clean up after an object that has become unreachable to program execution. User-defined finalizers enable several new use cases, and can help prevent memory leaks when managing resources that the garbage collector doesn't know about.

Another note of caution

Finalizers are tricky business and it is best to avoid them. They can be invoked at unexpected times, or not at all; for example, they are not invoked when closing a browser tab or on process exit. They don't help the garbage collector do its job; rather, they are a hindrance. Furthermore, they perturb the garbage collector's internal accounting. The GC decides to scan the heap when it thinks that it is necessary, after some amount of allocation. Finalizable objects almost always represent an amount of allocation that is invisible to the garbage collector. The effect can be that the actual resource usage of a system with finalizable objects is higher than what the GC thinks it should be.

The proposed specification allows conforming implementations to skip calling finalization callbacks for any reason or no reason. Some reasons why many JS environments and implementations may omit finalization callbacks:

  • If the program shuts down (e.g., process exit, closing a tab, navigating away from a page), finalization callbacks typically don't run on the way out. (Discussion: #125)
  • If the FinalizationRegistry becomes "dead" (approximately, unreachable), then finalization callbacks registered against it might not run. (Discussion: #66)

All that said, sometimes finalizers are the right answer to a problem. The following examples show a few important problems that would be difficult to solve without finalizers.

Locating and responding to external resource leaks

Finalizers can locate external resource leaks. For example, if an open file is garbage collected, the underlying operating system resource could be leaked. Although the OS will likely free the resources when the process exits, this sort of leak could make long-running processes eventually exhaust the number of file handles available. To catch these bugs, a FinalizationRegistry can be used to log the existence of file objects which are garbage collected before being closed.

The FinalizationRegistry class represents a group of objects registered with a common finalizer callback. This construct can be used to inform the developer about the never-closed files.

class FileStream {
  static #cleanUp(heldValue) {
    console.error(`File leaked: ${heldValue}!`);
  }

  static #finalizationGroup = new FinalizationRegistry(FileStream.#cleanUp);

  #file;

  constructor(fileName) {
    this.#file = new File(fileName);
    FileStream.#finalizationGroup.register(this, this.#file, this);
    // eagerly trigger async read of file contents into this.data
  }

  close() {
    FileStream.#finalizationGroup.unregister(this);
    File.close(this.#file);
    // other cleanup
  }

  async *[Symbol.asyncIterator]() {
    // read data from this.#file
  }
}

const fs = new FileStream('path/to/some/file');

for await (const data of fs) {
  // do something
}
fs.close();

Note that it's not a good idea to close files automatically through a finalizer, as this technique is unreliable and may lead to resource exhaustion. Instead, explicit release of resources (e.g., through try/finally) is recommended. For this reason, this example logs errors rather than transparently closing the file.

This example shows usage of the whole FinalizationRegistry API:

  • An object can have a finalizer referenced by calling the register method of FinalizationRegistry. In this case, three arguments are passed to the register method:
    • The object whose lifetime we're concerned with. Here, that's this, the FileStream object.
    • A held value, which is used to represent that object when cleaning it up in the finalizer. Here, the held value is the underlying File object. (Note: the held value should not have a reference to the weak target, as that would prevent the target from being collected.)
    • An unregistration token, which is passed to the unregister method when the finalizer is no longer needed. Here we use this, the FileStream object itself, since FinalizationRegistry doesn't hold a strong reference to the unregister token.
  • The FinalizationRegistry constructor is called with a callback as an argument. This callback is called with a held value.

The finalizer callback is called after the object is garbage collected, a pattern which is sometimes called "post-mortem". For this reason, the FinalizationRegistry callback is called with a separate held value, rather than the original object: the object is already gone, so it can't be used.

In the above code sample, the fs object will be unregistered as part of the close method, which will mean that the finalizer will not be called, and there will be no error log statement. Unregistration can be useful to avoid other sorts of "double free" scenarios.

Exposing WebAssembly memory to JavaScript

Whenever you have a JavaScript object that is backed by something in WebAssembly, you might want to run custom cleanup code (in WebAssembly or JavaScript) when the object goes away. A previous proposal exposed a collection of weak references, with the idea that finalization actions could be taken by periodically checking if they are still alive. This proposal includes a first-class concept of finalizers in order to give developers a way to avoid that repeated scanning.

For example, imagine if you have a big WebAssembly.Memory object, and you want to create an allocator to give fixed-size portions of it to JavaScript. In some cases, it may be practical to explicitly free this memory, but typically, JavaScript code passes around references freely, without thinking about ownership. So it's helpful to be able to rely on the garbage collector to release this memory. A FinalizationRegistry can be used to free the memory.

function makeAllocator(size, length) {
  const freeList = Array.from({length}, (v, i) => size * i);
  const memory = new ArrayBuffer(size * length);
  const finalizationGroup = new FinalizationRegistry(
    held => freeList.unshift(held));
  return { memory, size, freeList, finalizationGroup };
}

function allocate(allocator) {
  const { memory, size, freeList, finalizationGroup } = allocator;
  if (freeList.length === 0) throw new RangeError('out of memory');
  const index = freeList.shift();
  const buffer = new Uint8Array(memory, index, size);
  finalizationGroup.register(buffer, index);
  return buffer;
}

This code uses a few features of the FinalizationRegistry API:

  • An object can have a finalizer referenced by calling the register method of FinalizationRegistry. In this case, two arguments are passed to the register method:
    • The object whose lifetime we're concerned with. Here, that's the Uint8Array
    • A held value, which is used to represent that object when cleaning it up in the finalizer. In this case, the held value is an integer corresponding to the offset of the allocation within the backing buffer.
  • The FinalizationRegistry constructor is called with a callback as an argument. This callback is called with a held value.

The FinalizationRegistry callback is called potentially multiple times, once for each registered object that becomes dead, with a relevant held value. The callback is not called during execution of other JavaScript code, but rather "in between turns". The engine is free to batch calls, and a batch of calls only runs after all of the Promises have been processed. How the engine batches callbacks is implementation-dependent, and how those callbacks intersperse with Promise work should not be depended upon.

Avoid memory leaks for cross-worker proxies

In a browser with web workers, a programmer can create a system with multiple JavaScript processes, and thus multiple isolated heaps and multiple garbage collectors. Developers often want to be able to address a "remote" object from some other process, for example to be able to manipulate the DOM from a worker. A common solution to this problem is to implement a proxy library; two examples are Comlink and via.js.

In a system with proxies and processes, remote proxies need to keep local objects alive, and vice versa. Usually this is implemented by having each process keep a table mapping remote descriptors to each local object that has been proxied. However, these entries should be removed from the table when there are no more remote proxies. With the finalization functionality in the WeakRef proposal, libraries like via.js can send a message when a proxy becomes collectable, to inform the object's process that the object is no longer referenced remotely. Without finalization, via.js and other remote-proxy systems have to fall back to leaking memory, or to manual resource management.
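The shape of such a scheme can be sketched as follows. This is a hypothetical illustration: `makeProxySystem`, the `port` protocol, and the 'release' message format are all invented here, and are not part of Comlink or via.js:

```javascript
// Hypothetical sketch of a remote-proxy table cleanup protocol.
function makeProxySystem(port) {
  // When a local proxy object is garbage-collected, tell the remote
  // side that its table entry for `remoteId` can be dropped.
  const registry = new FinalizationRegistry(remoteId => {
    port.postMessage({ type: 'release', id: remoteId });
  });

  return function makeProxy(remoteId) {
    const proxy = {
      // A real library would forward property accesses, method
      // calls, etc.; this sketch forwards a single operation.
      call(method, args) {
        port.postMessage({ type: 'call', id: remoteId, method, args });
      }
    };
    registry.register(proxy, remoteId);
    return proxy;
  };
}
```

The held value here is the remote descriptor (`remoteId`), a primitive, so it never keeps the proxy itself alive.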

Note: This kind of setup cannot collect cycles across workers. If in each worker the local object holds a reference to a proxy for the remote object, then the remote descriptor for the local object prevents the collection of the proxy for the remote object. None of the objects can be collected automatically when code outside the proxy library no longer references them. To avoid leaking, cycles across isolated heaps must be explicitly broken.

Using WeakRef objects and FinalizationRegistry objects together

It sometimes makes sense to use WeakRef and FinalizationRegistry together. There are several kinds of data structures that want to weakly point to a value, and do some kind of cleanup when that value goes away. Note however that weak refs are cleared when their object is collected, but their associated FinalizationRegistry cleanup handler only runs in a later task; programming idioms that use weak refs and finalizers on the same object need to mind the gap.
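For example, a lookup helper over such a structure has to treat an entry whose WeakRef is already cleared as absent, even though the registry callback that will remove it hasn't run yet. A sketch, assuming `map` is a Map from keys to WeakRef instances:

```javascript
// Return the live value for `key`, or undefined. An entry whose
// WeakRef has been cleared counts as missing, even though the
// FinalizationRegistry callback that deletes it from the map may
// not have run yet (that only happens in a later task).
function getLive(map, key) {
  const ref = map.get(key);
  return ref && ref.deref();
}
```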

Weak caches

In the initial example from this README, makeWeakCached used a Map whose values were wrapped in WeakRef instances. This allowed the cached values to be collected, but leaked memory in the form of the entries in the map. A more complete version of makeWeakCached uses finalizers to fix this memory leak.

// Fixed version that doesn't leak memory.
function makeWeakCached(f) {
  const cache = new Map();
  const cleanup = new FinalizationRegistry(key => {
    // See note below on concurrency considerations.
    const ref = cache.get(key);
    if (ref && !ref.deref()) cache.delete(key);
  });

  return key => {
    const ref = cache.get(key);
    if (ref) {
      const cached = ref.deref();
      // See note below on concurrency considerations.
      if (cached !== undefined) return cached;
    }

    const fresh = f(key);
    cache.set(key, new WeakRef(fresh));
    cleanup.register(fresh, key);
    return fresh;
  };
}

var getImageCached = makeWeakCached(getImage);

This example illustrates two important considerations about finalizers:

  1. Finalizers introduce concurrency between the "main" program and the cleanup callbacks. The weak cache cleanup function has to check if the "main" program re-added an entry to the map between the time that a cached value was collected and the time the cleanup function runs, to avoid deleting live entries. Likewise when looking up a key in the ref map, it's possible that the value has been collected but the cleanup callback hasn't run yet.
  2. Given that finalizers can behave in surprising ways, they are best deployed behind careful abstractions that prevent misuse, like makeWeakCached above. A profusion of FinalizationRegistry uses spread throughout a code-base is a code smell.

Iterable WeakMaps

In certain advanced cases, WeakRef objects and FinalizationRegistry objects can be very effective complements. For example, WeakMaps have the limitation that they cannot be iterated over or cleared. The WeakRefs proposal enables creating an “iterable + clearable WeakMap”:

Such “iterable WeakMaps” are already used in existing DOM APIs such as document.getElementsByClassName or document.getElementsByTagName, which return live HTMLCollections. As such, the WeakRef proposal adds missing functionality that helps explain existing web platform features. Issue #17 describes a similar use case.

class IterableWeakMap {
  #weakMap = new WeakMap();
  #refSet = new Set();
  #finalizationGroup = new FinalizationRegistry(IterableWeakMap.#cleanup);

  static #cleanup({ set, ref }) {
    set.delete(ref);
  }

  constructor(iterable) {
    for (const [key, value] of iterable) {
      this.set(key, value);
    }
  }

  set(key, value) {
    const ref = new WeakRef(key);

    this.#weakMap.set(key, { value, ref });
    this.#refSet.add(ref);
    this.#finalizationGroup.register(key, {
      set: this.#refSet,
      ref
    }, ref);
  }

  get(key) {
    const entry = this.#weakMap.get(key);
    return entry && entry.value;
  }

  delete(key) {
    const entry = this.#weakMap.get(key);
    if (!entry) {
      return false;
    }

    this.#weakMap.delete(key);
    this.#refSet.delete(entry.ref);
    this.#finalizationGroup.unregister(entry.ref);
    return true;
  }

  *[Symbol.iterator]() {
    for (const ref of this.#refSet) {
      const key = ref.deref();
      if (!key) continue;
      const { value } = this.#weakMap.get(key);
      yield [key, value];
    }
  }

  entries() {
    return this[Symbol.iterator]();
  }

  *keys() {
    for (const [key, value] of this) {
      yield key;
    }
  }

  *values() {
    for (const [key, value] of this) {
      yield value;
    }
  }
}


const key1 = { a: 1 };
const key2 = { b: 2 };
const keyValuePairs = [[key1, 'foo'], [key2, 'bar']];
const map = new IterableWeakMap(keyValuePairs);

for (const [key, value] of map) {
  console.log(`key: ${JSON.stringify(key)}, value: ${value}`);
}
// key: {"a":1}, value: foo
// key: {"b":2}, value: bar

for (const key of map.keys()) {
  console.log(`key: ${JSON.stringify(key)}`);
}
// key: {"a":1}
// key: {"b":2}

for (const value of map.values()) {
  console.log(`value: ${value}`);
}
// value: foo
// value: bar

map.get(key1);
// → foo

map.delete(key1);
// → true

for (const key of map.keys()) {
  console.log(`key: ${JSON.stringify(key)}`);
}
// key: {"b":2}

Remember to be cautious with use of powerful constructs like this iterable WeakMap. Web APIs designed with semantics analogous to these are widely considered to be legacy mistakes. It’s best to avoid exposing garbage collection timing in your applications, and to use weak references and finalizers only where a problem cannot be reasonably solved in other ways.

WeakMaps remain fundamental

It is not possible to re-create a WeakMap simply by using a Map with WeakRef objects as keys: if the value in such a map references its key, the entry cannot be collected. A real WeakMap implementation uses ephemerons to allow the garbage collector to handle such cycles.
This is the reason the IterableWeakMap example keeps the value in a WeakMap and only puts the WeakRef in a Set for iteration. If the value had instead been added to a Map such as this.#refMap.set(ref, value), then the following would have leaked:

let key = { foo: 'bar' };
const map = new IterableWeakMap([[key, { data: 123, key }]]);
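For contrast, the same cyclic structure stored in a real WeakMap does not leak. A minimal sketch (collection itself can't be observed deterministically, so the comments note what the GC is allowed to do):

```javascript
// Ephemeron semantics: a WeakMap entry keeps its value alive only
// while the key is otherwise reachable, even though the value holds
// a reference back to the key.
const weakMap = new WeakMap();
let key = { foo: 'bar' };
weakMap.set(key, { data: 123, key });

console.log(weakMap.get(key).data); // 123

key = null;
// No other references remain: the key, the value, and the entry
// can now all be collected together, despite the value -> key cycle.
```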

Scheduling of finalizers and consistency of multiple .deref() calls

There are several conditions where implementations may call finalization callbacks later or not at all. The WeakRefs proposal works with host environments (e.g., HTML, Node.js) to define exactly how the FinalizationRegistry callback is scheduled. The intention is to coarsen the granularity of observability of garbage collection, making it less likely that programs will depend too closely on the details of any particular implementation.

In the definition for HTML, the callback is scheduled in a task queued on the event loop. What this means is that, on the web, finalizers will never interrupt synchronous JavaScript, and that they also won't be interleaved with Promise reactions. Instead, they run only after JavaScript yields to the event loop.

The WeakRefs proposal guarantees that multiple calls to WeakRef.prototype.deref() return the same result within a certain timespan: either all return undefined, or all return the object. In HTML, this timespan lasts until the next microtask checkpoint; HTML performs a microtask checkpoint when the JavaScript execution stack becomes empty, after all Promise reactions have run.
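A small sketch of what that guarantee means in practice (the helper name `derefConsistently` is invented for illustration):

```javascript
// Within one job (up to the next microtask checkpoint), repeated
// deref() calls on the same WeakRef must agree: either all return
// the target, or all return undefined.
function derefConsistently(ref) {
  const first = ref.deref();
  const second = ref.deref();
  // The spec guarantees this assertion never fires within one job.
  console.assert(first === second);
  return first;
}
```

Across jobs, no such guarantee holds: a deref() that returned the object in one task may return undefined in a later one.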

Historical documents

Champions

  • Dean Tribble
  • Mark Miller
  • Till Schneidereit
  • Sathya Gunasekaran
  • Daniel Ehrenberg

Issues

WeakRef collection semantics violate encapsulation

The spec currently states that if the weakref becomes condemned before or at the same time as the target object, then no finalizer is executed. I can see why this is a desirable choice for the implementation, but it seems to me that it violates reasonable assumptions about encapsulation.

Suppose a wasm application maintains many objects within its flat heap, which it reifies to JS as JS objects. These JS objects require finalization, so that the finalizer can destruct the wasm resource properly. Suppose that that's all there is to it (i.e., we don't have to map an address to the JS object representing it, so there's no natural external data structure that will reference the weakrefs).

With the current design, we must retain an additional table of weakrefs in the JS heap to ensure that the JS finalizers will be run when the JS objects that are held weakly are condemned, and when an object is finalized the finalizer must ensure that the weakref that caused the finalizer to run is removed from the table, or it must ensure that the table is eventually swept for dead weakrefs.

Furthermore, libraries will themselves need to have their own tables or data structures for ensuring that weakrefs are kept alive long enough (they wouldn't likely share a single table for this), so this problem becomes somewhat non-local.

With a design that does not require the weakref to stay alive, the weakref could be attached to the JS object, or, I suppose, simply dropped on the floor, creating and dropping the weakref thus being equivalent to registering the object for finalization. This puts a larger burden on the implementation but the ergonomics seem to me to be better.

New API: registering a WeakRef into FinalizationGroup shouldn't have a key

...the "key" in this case should be the WeakRef object itself.


From @gsathya 's notes:

// new API
let wr = new WeakRef(target, holdings);
let fg = new FinalizationGroup(() => {});
fg.register(wr, key); // register WeakRef
fg.register(target, holdings, key); // create and register objectless WeakCell

I don't think it's relevant to pass a key here, since we can just fg.unregister(wr); instead of fg.unregister(key).

Also the API won't be uniform with the "objectless WeakCell" case, since for it we pass "target, holdings, key", and for the WeakRef one we pass "wr, key". So we should just pass "wr".

(In practice, the users would probably reuse the WeakRef as the key, and do fg.register(wr, wr); fg.unregister(wr); since there's no need to come up with any other key.)

I'm also not super convinced about the "objectless WeakCell" API, e.g., the user has to come up with unique keys. Maybe the API should give back the key? key = fg.register(target, holdings); ?

The file handle example is an antipattern

I can think of good reasons to want weak refs, but closing file handles is not one of them, so I question its use as a motivating example.

With file handles, you are managing an external non-memory resource with GC. That's a very good way to exhaust your system allotment of file handles, since there is no guarantee that any of them will ever be closed.

Within SpiderMonkey, for example, we partition the GC heap and can collect partitions independently. If one of these partitions were to only allocate file handles, you could quite easily run out of file descriptors despite massive ongoing allocations in other partitions. Worse: in actual fact, we currently do a lot of whole-heap collections for dumb reasons, which is a bug that we're working on fixing. I really would rather not be prevented from fixing that bug because people start depending on the GC to clean up non-memory resources. (And the obvious workarounds are unpalatable, e.g., informing the GC whenever allocating a file handle so that it can operate in a degraded mode in order to be prompt about cleaning up these other resources.)

Privileged API and the System object

As best as I can tell from the current spec, there is no "System" object in ECMAScript. So this section needs to be worked over in some way.

More generally I'm concerned that this API may be behind some privilege wall that prevents its routine use without user prompts in libraries and so on. Can this be clarified?

Status update?

This proposal hasn't really gotten much action lately, so I'm just curious how it stands right now. Driven by this inquiry, but thought I'd file an issue over here instead.

Clarification / spec text correction wrt when is the "KeepDuringJob" Set cleared?

Can it be cleared between microtasks?

The current spec text talks about associating the set with Job (i.e., microtask?). If we implement it like this, the WeakRef can be cleared between microtasks.

On the other hand, the slides (slide 17) say "A program cannot observe a target getting reclaimed within the execution of a job (turn of the event loop)" (not microtask queue).

E.g.,

Promise.resolve().then(() => { wr = wf.makeRef(...); }).then(() => { wr.deref(); });

... can the deref() now return undefined?

Should the next method for ReclaimedIterator be on the prototype instead of the object itself?

Currently the next method gets set on the iterator object itself inside WeakFactoryCleanupJob:

5. Let next be a new built-in function defined in WeakFactoryCleanupIterator next.
6. Perform CreateMethodProperty(iterator, next, next).

It seems like the next method should be on the prototype, since all objects want to share the same method. In fact, this is what map and set iterators do: https://tc39.github.io/ecma262/#sec-%mapiteratorprototype%.next

Using separate agent to perform finalization.

For long running turns, it would be nice to be able to allow GC to happen by scheduling finalization on a separate thread/agent. Since a number of use cases can use primitives for the holdings reference used for cleanup, it seems safe to pass this between agents. Roughly, this would require a new API to communicate that finalization should be done on a different agent.

[Image: drawing of a thread with an infinite-length turn allowing GC of WeakRef targets]

Namespace for the API

ES6 moved/added some methods like parseInt to the respective namespaces (Number), so I thought it would be logical to support this style for all new APIs, even if there is only 1 function to add, so:

makeWeakRef(target, executor, holdings);

should be at least

WeakRef.makeWeakRef(target, executor, holdings);

or even

WeakRef.make(target, executor, holdings);

where WeakRef is a Math-like namespace object.

I'm not proposing making WeakRef constructable via class, but I consider it a better alternative over just plain global function.

Should the FinalizationGroup create & return the key?

Like this:

key = fg.register(target, holdings);
fg.unregister(key);

This would allow different implementations, e.g., key could be an object (like WeakCell), or it could be an integer which is used for accessing an array, or a hash map key.

In the current new API:
fg.register(target, key, holdings);
fg.unregister(key); // unregisters all!

we force the internal implementation to be a multihashmap / hashmap where the value is a list. (Actually, we'd need two multihashmaps, one for the active "WeakCells" and one for "WeakCells" already scheduled for cleanup...)

'Executor' term concern

It's called finalization, which suggests the thing that does this should be a finalizer.

Alternatively it should be called execution, done by an executor. But both of those terms are highly overloaded.

Current spec text unclear wrt. realm

The current text as it stands doesn't make it clear in which realm the DoAgentFinalization is supposed to run, which then leaves it open which realm is used by the EnqueueJob, which in turn means that it's unclear in which realm the WeakFactoryCleanupJob has to run.

Since WeakFactoryCleanupJob creates JavaScript objects and even calls into arbitrary user code - aka [Cleanup] - which might not even be callable, it has to have some current realm.

One obvious solution here would be to remember either the realm or the current execution context on the WeakFactory and enter that realm in DoAgentFinalization for that particular factory.

WeakRef needs a query interface

WeakRef instances currently come with get() and clear() accessors. In addition, I think there should be an isClear() predicate. The reason is that, given existing semantics at least (see #19), it may be a design choice to have a table of weakrefs representing objects that need finalization, and to sweep this table occasionally looking for cleared weakrefs to remove. (That's not the only possible design, but it's a reasonable design.) In this case, we won't want to use get() to access each weakref, because that makes the weakref's pointer strong for the duration of the turn. So we should have a predicate to ask whether the weakref has been cleared and can be removed from the table.

(I also think clear() should return a boolean representing whether or not the reference was cleared (true) or already was clear (false) but if we have isClear() the justification for changing clear() is scant.)
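For comparison, here is how the table-sweeping design reads when written against the deref() method that eventually shipped, with no isClear() available; the comment marks exactly the cost the proposed predicate would avoid:

```javascript
// Sweep a table of WeakRefs, keeping only those whose targets are alive.
// Note: each deref() call on a still-live target makes that target strong
// for the remainder of the job -- the overhead isClear() would avoid.
function sweepClearedRefs(table /* Array<WeakRef> */) {
  return table.filter(ref => ref.deref() !== undefined);
}
```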

Documentation

I'm working with a number of people interested in writing documentation for JavaScript feature proposals. Writing documentation could be useful for the development of the WeakRefs proposal, even before it's done, because:

  • The act of writing documentation, even if it's not published, and thinking through the learning process, could help expose API design issues
  • Publishing documentation can improve outreach and help collect feedback, which could also lead to refinements in the proposal

Would the champions of this proposal be interested in getting in touch with people who would be interested in helping with TC39 proposal documentation generally?

How does this work in long-running turns?

From TC39.

In workers with long-running turns, how does this work? (given that after access, weak pointers are strong till the end of the turn and finalization is only between turns).

Why is holdings needed?

It would be good to give a clear example of why holdings is necessary (how it allows you to do things you could not otherwise do), compared to having the executor simply close over the relevant variable, or using bind to store it alongside the executor. As far as I can see, every program that uses holdings can be easily refactored into a more intuitive one that does not. What am I missing?

/cc @Sebmaster
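One answer, sketched against the single-callback FinalizationRegistry API that later shipped (this issue predates that naming, when the callback was an "executor"): because the registry has one callback shared by every registration, per-target data cannot be closed over and has to be passed alongside the target.

```javascript
// One registry, one callback; per-target data travels as the holdings.
// With a closure-per-target design, every registration would instead
// allocate a fresh callback function.
const handles = new FinalizationRegistry(held => {
  // `held` is whatever was passed as holdings at registration time.
  console.log(`freeing native handle #${held}`);
});

function wrapHandle(nativeId) {
  const obj = { nativeId };
  handles.register(obj, nativeId); // nativeId is the holdings
  return obj;
}
```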

Different from Ephemerons?

How are the semantics (and collection algorithm) for weakrefs different from Ephemerons as described in the Hayes paper?

Is the only difference that you don't treat the executor and holdings as "value fields" (using Barry's terminology)? If that is the case, you have presumably eliminated the need for the three-phase Ephemeron algorithm, but at the cost of introducing the possibility of leaks via circular references originating from the holdings or executor.

Interesting impact on other specifications

whatwg/streams#932 by @ricea brings up an interesting situation. I'm curious what the champions have to say about this, although I don't anticipate any changes to the weak refs proposal because of it.

The summary is:

  • A spec is written so that spec-created object y holds on to a user-provided object x indefinitely
  • However, after calling y.close(), y will never use x.
  • Thus, it is currently unobservable whether or not an implementation allows x to be GCed before y.

With weak refs, this of course becomes observable. Thus once weak refs are introduced, the spec as-written mandates that x never be GCed before y.

In reaction to this, the spec author should probably change y.close() to null out the internal reference to x explicitly. Then browsers will again be allowed to collect x before y and free up some memory, while still conforming to the spec.

Generalizing, the tricky parts of the situation are:

  • Spec authors have to be more aggressive about nulling out internal references than they were before, when such references were unobservable. (Assuming they want to avoid mandating unnecessary memory usage.)
  • Implementers now have to be exact in following specs' recommendations on nulling out (or not) an internal reference, since doing so is now observable.
  • There are probably many divergences today in memory management strategies between browsers on various APIs of this sort, which may need to be audited now that the strategy is becoming observable.
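A hypothetical sketch of the spec fix described above (class and method names invented for illustration):

```javascript
// Before: y retains x forever, so weak refs make "x outlives y" observable.
// After: close() drops the reference, so x may be collected while y lives.
class Y {
  #x;
  constructor(x) { this.#x = x; }
  use() { if (this.#x) this.#x.doWork?.(); }
  close() { this.#x = null; } // explicit null-out, per the fix above
}
```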

Should we coarsen the atomicity unit?

Currently, the weakref proposal specifies an individual turn (i.e., "job" in EcmaScript terminology) as the atomicity unit. A turn/job is everything that happens from one empty user stack state to the next empty user stack state. Any implementation that uses a coarser unit, consisting of several turns, will satisfy the current spec as long as the boundaries between these units are also turn boundaries, i.e., empty stack states.

At the November TC39 meeting @tschneidereit suggested that the atomicity unit be coarsened in the spec to match what implementations are expected to do anyway, which is to always service a non-empty promise queue before servicing any other turns, and to only deal with WeakMap atomicity issues in states where the stack and promise queue were both empty.

If this assumption about implementations held, then this would be a possible choice, with pros and cons to argue through. (I've changed my opinion several times.) But nodejs/promise-use-cases#25 indicates that the premise is false. Implementations might not always service the promise queue at strictly higher priority. The reasons for not doing so are well motivated and plausible. However this particular bug is settled, we would have to preclude this possibility for all future implementations if we coarsened the atomicity unit this way.

Example use case (not resource management; convenient via iterable weak set)

I have a definite use case for weak references in one of my projects, jsdom. It runs primarily in Node, but also in the browser. Of note, this case isn't related to resource management.

jsdom implements, among other things, the DOM Standard. In particular, there is one line that concerns us here: when removing a node:

  1. For each NodeIterator object iterator whose root’s node document is node’s node document, run the NodeIterator pre-removing steps given node and iterator.

To correctly implement this step, you need a reference to every NodeIterator object that's ever been created, so you can update it correctly (by running the pre-removing steps).

This really requires weak references, as otherwise you end up holding a strong reference to every NodeIterator ever created, never releasing their memory, or the memory of the other objects that the NodeIterator references (e.g. the two DOM nodes it points to, root and reference node).


Of note:

  • This is not being used as any sort of backstop for cleaning things up. It's just plain required for correctness without massive memory leaks.
  • This doesn't require finalization at all.
  • What we really need is an "iterable weak set"; weak references of course can be used to build one of these (and an iterable weak set can be used to build a weak reference).

Finally, people may be amused by how we currently work around this. We have a user-configurable number, "max working NodeIterators", which we default to 10. We hold strong references to the last 10 created NodeIterators. If an 11th NodeIterator is created, we remove it from the list, then set an internal bit on it that says "sorry, this one's broken", and any methods you try to call on that NodeIterator will throw an exception, helpfully telling you to up the max working NodeIterators configuration.

We then loop over those 10 strongly-held NodeIterators whenever a node is removed from the document, and update them appropriately per the spec.
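For illustration, an iterable weak set can be sketched with the WeakRef and FinalizationRegistry APIs that eventually shipped; jsdom could hold every NodeIterator in one of these and sweep it on node removal (a minimal sketch, not production code):

```javascript
// A minimal iterable weak set: values are held via WeakRef, and a
// FinalizationRegistry prunes dead refs after their targets are collected.
class IterableWeakSet {
  #refs = new Set();
  #registry = new FinalizationRegistry(ref => this.#refs.delete(ref));
  add(value) {
    const ref = new WeakRef(value);
    this.#refs.add(ref);
    this.#registry.register(value, ref, value);
  }
  delete(value) {
    for (const ref of this.#refs) {
      if (ref.deref() === value) {
        this.#refs.delete(ref);
        this.#registry.unregister(value);
        return true;
      }
    }
    return false;
  }
  *[Symbol.iterator]() {
    for (const ref of this.#refs) {
      const value = ref.deref();
      if (value !== undefined) yield value;
    }
  }
}
```

Note the symmetry mentioned in the bullet above: this builds an iterable weak set from weak references, and a weak reference is trivially recoverable from a one-element iterable weak set.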

Resolve inconsistencies between null and undefined

Some parts of the spec say or imply that get/deref on a weakref whose target is collected returns null. Other parts say or imply void 0, i.e., undefined.

There may also be inconsistencies with the holdings, but I did not look as carefully. In any case, we need to look and resolve any such inconsistencies.

I can't see any way to observe whether an executor is null or undefined, but we should be sure we're consistent here anyway.

Has anyone written down any widely accepted criteria for choosing between null and undefined?

Document the implications of the absence of a same-Realm check for WeakRefs

makeCell and makeRef rely on the sameRealm operation:

"If SameRealm(target, this), then ..."

Why does it make sense to treat the reference strongly? What happens then - the cleanup function just never gets called, and the WeakCell stays alive forever (unless it's explicitly cleared)?

In addition, we don't have the SameRealm operation in V8 and when I talked with the engineers responsible for this area, they said "we don't want the objects to know which Realm they come from", that is, they think the SameRealm operation should not be possible and should not end up in the spec.

Consider adding additional non-determinism on purpose

If there is concern that weak references can create side channels and/or implementation-reliance, consider intentionally adding non-determinism to mask those information leaks or reliances.

In short: If an object is weakly held by a WeakMap or WeakRef (or a weak-value-map, etc), then add a random delay to garbage collection for that object.

All engines would then behave similarly: weak objects would take some random amount of time to be truly cleaned up. That eliminates implementation-reliance or cross-engine incompatibility. As for side-channels, communicating via the GC would only be possible if you can predict the random number generator.
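A toy model of the suggestion (purely illustrative; a real engine would randomize internally, and `maxDelayMs` is an invented knob):

```javascript
// Wrap a finalization callback so notifications arrive after a random
// delay, preventing code from relying on precise GC timing.
function withRandomDelay(callback, maxDelayMs = 1000) {
  return holdings => {
    setTimeout(() => callback(holdings), Math.random() * maxDelayMs);
  };
}
```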

For prior art here, see Go's map iteration or channel multiplexing. Go will choose a random offset from which to begin iteration. This prevents users from relying on map iteration order. Similarly, channel multiplexing introduces extra randomness to prevent masking race conditions or introducing bias.

This issue was prompted by a discussion on Twitter between myself and Sam Tobin-Hochstadt.

What is the new API?

I see some notes about a new API for WeakRefs at #54 , but it's somewhat underspecified. Is someone planning on writing up the new API in detail? cc @tschneidereit

FAQ: Why not just add a callback to WeakMap/WeakSet?

What if it was done like this instead?

const weakObjs = new WeakSet([], objectAboutToBeCollected => {
  // Here, do something with the object right before it is collected, e.g.:
  objectAboutToBeCollected.dispose();
  // If objectAboutToBeCollected gets attached to a live object again,
  // collection is cancelled and this callback will be called again when
  // the object is once more ready to be collected.
});

The same would apply to WeakMap.

If listening for an object's finalization is no longer needed, the object can just be removed from the set, or the set itself can be collected.

How does this impact the web platform? (Cross-cutting concerns)

Several times on the web platform, we've denied feature requests due to not wanting to expose garbage collection behavior. The most prominent in my recent memory is detecting MessagePort closing: see

I think as part of exploring the cross-cutting concerns of this feature, this proposal and its champions need to answer at least the following questions:

  • Would this proposal allow you to create your own "close" event for MessagePort, of the type that these threads are asking for? If so, please show the example code.
  • If this proposal does do so, should we give developers what they've been asking for since at least 2013, and just add the event to browsers, now that the cat is out of the bag via weakrefs? (Needs consulting with browser teams.)

I'd also hope that the champions can collaborate with others to dig up other cases where web platform features have been omitted due to the desire not to expose GC, and help us evaluate whether we should revisit those decisions and APIs if weakrefs become shipped. I'll try to direct folks here to remind of us other cases.
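As a strawman answer to the first question, a heavily hedged sketch (helper names invented for illustration; whether this truly reproduces the requested "close" event depends on when, and whether, an engine collects an entangled port):

```javascript
// Strawman: fire a callback when a MessagePort wrapper is collected.
// This signals local GC of the port object, which is at best an
// approximation of the "other side closed" event the linked threads
// ask for -- and it may fire very late, or never.
const portWatchers = new FinalizationRegistry(onCollected => onCollected());

function watchPortCollection(port, onCollected) {
  portWatchers.register(port, onCollected, port);
}
```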

Rename WeakFactory to FinalizationGroup

I think we should rename WeakFactory, for a couple of reasons:

  1. Grouping lifetimes is the function of this class that's more immediately relevant to developers
  2. Its function as a mechanism for removing the ability to create WeakRefs (while keeping the ability to hand them out) is a very advanced concept that we don't need to put focus on

Realistically speaking, most developers won't have strong intuitions about any of the terms in this domain. Nevertheless, I think we should try to use names that at least guide them in the right direction. In particular, the concept of grouping lifetimes of a set of WeakRefs is important, and I think the solution we settled on is excellent. Now we also need to make it as easy as possible for developers to understand this solution :)

I'm not saying that FinalizationGroup is a fantastic name (in particular, it emphasizes only one half of what is provided), but I do think that it at least points in the right direction, providing a good starting point for learning about the concepts involved via sources like MDN.

@erights, @dtribble, WDYT?

stopgap solutions?

Assuming that this may take a while to make it into the browsers, are there some official recommendations on manual memory management libraries for JavaScript?

Recently I've been using a library that rolls its own as an implementation of this paper.

It seems like even though there are some standard academic papers/references, there aren't many (or any?) standard libs to fill in the gaps for now.

If there are some generic libraries or guidelines to implement in JavaScript, it might be nice to list them in the README here as a stopgap measure.

Needs discussion / clarification: if cleanup doesn't iterate all weak cells, when will it be called again?

Assume the WeakFactory's cleanup function does not retrieve all dirty WeakCells from the iterator. When will the cleanup function be scheduled to be called again?

  1. Immediately (i.e., post another cleanup task immediately to the microtask queue, i.e., let other already scheduled microtasks run and call cleanup again)
  2. Whenever GC finds another dirty WeakCell belonging to the same WeakFactory
  3. Whenever GC finds another dirty WeakCell belonging to any WeakFactory
  4. Something else?

Replace getCell with registerForFinalization (or just register)

WeakCell provides the means to do finalization without dereferencing, solving the long-turn use cases posed by Workers and maybe WebAssembly. However, they are relatively costly in terms of memory usage.

I'd like to propose dropping the concept of WeakCell, and instead providing methods on WeakFactory (or FinalizationGroup) to register a [target, holdings] pair for finalization. That'd reduce the memory usage to two words, which I think is the best we can possibly do. Unregistering can be done by providing the same pair to an accompanying unregisterFromFinalization method.

If we do #49, these could just be called register and unregister, or add and remove, which seems nice and reasonably clear to me.

@erights, @dtribble, does this seem reasonable to you?
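A data-structure model of the proposed surface, to show that two words per entry suffice (FinalizationGroup and the method names above are proposal sketches, not shipped API; this model just demonstrates the pair-based bookkeeping):

```javascript
// Each registration is just a [target, holdings] pair, removed by
// presenting the same pair again to unregister.
class RegistrationList {
  #entries = []; // array of [WeakRef<target>, holdings] pairs
  register(target, holdings) {
    this.#entries.push([new WeakRef(target), holdings]);
  }
  unregister(target, holdings) {
    const i = this.#entries.findIndex(
      ([ref, h]) => ref.deref() === target && h === holdings
    );
    if (i < 0) return false;
    this.#entries.splice(i, 1);
    return true;
  }
  get size() { return this.#entries.length; }
}
```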

Question about ordering of dirty cells/refs yielded by ReclaimedIterator objects

Consider this example:

function cleanupOne(refs) {
  // Process only one ref for cleanup.
  console.log(refs.next().value.holdings);
}

const factory = new WeakFactory();

function makeGarbage(tag) {
    const obj = {};
    return factory.makeRef(obj, tag);
}

const a = makeGarbage("a");
const b = makeGarbage("b");

// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();

// JS observes that the targets for a and b were reclaimed.
assertEq(a.deref(), null);
assertEq(b.deref(), null);

// Cleanup one reference.
//
// (This could also use the WeakCellJobs queue, but for this example it does not
// because it is easier to write this way.)
factory.cleanupSome(cleanupOne);

const c = makeGarbage("c");

// Wait for `KeepDuringJob`'s effects to expire.
await nextJob();

// JS observes that c has been reclaimed.
assertEq(c.deref(), null);

// Cleanup one more reference.
factory.cleanupSome(cleanupOne);

What are the permissible orderings of log messages?

Must it always be either a, b or b, a? Are either of a, c or b, c allowed as well?

If we want to ensure that only the former are allowed, then I think WeakFactory objects should have a [[Dirty]] internal property: an ordered queue that dirty cells get moved into from [[ActiveCells]] when they become dirty.
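A model of that [[Dirty]]-queue idea (class and method names invented for illustration, with "GC reclaimed the target" simulated by an explicit call):

```javascript
// Cells move from the active set into an ordered FIFO queue when their
// targets are reclaimed, so cleanup observes them in reclamation order
// (a before b; c only after both).
class FactoryModel {
  #active = new Set();
  #dirty = []; // ordered queue of reclaimed cells
  makeCell(holdings) {
    const cell = { holdings };
    this.#active.add(cell);
    return cell;
  }
  markReclaimed(cell) { // stands in for the GC noticing a dead target
    if (this.#active.delete(cell)) this.#dirty.push(cell);
  }
  takeDirty() {
    return this.#dirty.shift(); // undefined if nothing is dirty
  }
}
```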

A simpler API?

Sorry if I'm missing something obvious because of my lack of understanding, but why can't we have a simpler API like this?

let obj = new Something();
let strong = obj;
let weak = Object.getWeakReference(obj);

obj; // Something
strong; // Something
weak; // Something

obj = undefined;

obj; // undefined
strong; // Something
weak; // Something

strong = undefined;

obj; // undefined
strong; // undefined
weak; // null

WeakCell.prototype.clear(): step 5 could be a little more clear

5. If factory is not undefined, then:
     i.   Remove O from factory.[[ActiveCells]].
     ii.  Set O.[[Target]] to undefined.
     iii. Set O.[[Factory]] to undefined.
     iv.  Set O.[[Holdings]] to undefined.

First, it seems that there should be some sort of assertion/check that factory is a WeakFactory object. Yes, it seems it will always be such an object if it isn't undefined, but I think things could be a bit more clear in the spec text here.

Second, there should probably be some sort of check for if factory.[[ActiveCells]] is a list and is not undefined, since WeakFactory.prototype.shutdown can set it to undefined. Again, seems pretty obvious what is the intended behavior, but the spec could be made a bit more clear.

What should happen if WeakFactory.makeCell / makeRef is called after WeakFactory.shutdown?

The current spec doesn't do anything to prevent calling WeakFactory.makeCell or WeakFactory.makeRef after calling WeakFactory.shutdown.

But shutdown has set [[ActiveCells]] to undefined, and then makeCell / makeRef tries to add to it, so that sounds wrong.

If the intention of shutdown is to make the WeakFactory forever unusable, the spec needs to prevent calling makeCell / makeRef somehow (e.g,. set a flag in WeakFactory and throw an error in makeCell / makeRef if the flag is set).

If the intention is to just get rid of the current WeakCells without doing cleanup for them, the spec needs to reset the members in such a way that it remains usable. (The function could be called e.g., "clear()" in that case.)

Fix the reference section

I missed that I never updated the reference section, so it's basically missing. It needs to be added and expanded.

Misc bugs in writeup

In the section "extended example", in the call to makeWeakRef, the executor argument should probably not be this.executor, but this.dropRef?

Typo in the paragraph below the code: "need to do perform", the "do" is spurious.

Add explainer to README

Lots of information is missing from, or explicitly marked as outdated in, the README. It would be great to have an up-to-date explainer in the README itself. Might I suggest the following sections (based on the Stage 1 entrance criteria) with at least one paragraph each?

  • Motivation
  • Proposed solution
  • High-level API
  • Illustrative examples
  • FAQ
