Giter Site home page Giter Site logo

bnaya / objectbuffer Goto Github PK

View Code? Open in Web Editor NEW
167.0 167.0 5.0 6.99 MB

JavaScript Object like api, backed by an arraybuffer

License: MIT License

JavaScript 4.42% TypeScript 89.18% HTML 0.11% WebAssembly 6.29%
arraybuffer javascript sharedarraybuffer sharedmemory

objectbuffer's People

Contributors

bnaya avatar dimshik100 avatar github-actions[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

objectbuffer's Issues

Figure-out public api

The current api is very ad-hoc.

We need to brainstorm and decide on a public api that we can stand for.

Impl hashCodeExternalValue without auxiliary arraybuffer

Important note: need to be system endienss aware

As part of the hashmap impl, we have flow (lookup) that we need to hash value that is yet saved inside our main ArrayBuffer.
But out current hash function is working only on arraybuffer, so we have to first save the key in an intermediate arraybuffer so we can hash it.

That's very wasteful, and better be avoided.

export function hashCodeExternalValue(
capacity: number,
value: string | number
): number {
const ab = new ArrayBuffer(typeof value === "string" ? value.length * 3 : 8);
const uint8 = new Uint8Array(ab);
let keyBytesLength = ab.byteLength;
if (typeof value === "string") {
keyBytesLength = stringEncodeInto(new Uint8Array(ab), 0, value);
} else {
new Float64Array(ab)[0] = value;
}
return hashCodeInPlace(uint8, capacity, 0, keyBytesLength);
}

Consider fixed-shape objects optimization / API. Objects are so wasteful!

Currently, each and every object is backed by It's own instance of hashmap.
That's maybe the most idiomatic but wasteful way to implement javascript objects.

A 100 items array of object as {keyProp: "myPropValue"} will allocate 100 times keyProp string for the key,
And also expensive hashmap nodes (internal data structure).

JavaScript engines in general tries to optimize it in few ways.
Most noticeably, to have try and share structure of objects.
Saving the structure of objects in a central registry, where each object shape can be a Set,
and each actual object will be a array on the length of the key's count and a pointer to the shape.

To perform a lookup on that object, we look the key in the object shape Set to get the index O(log N),
And access the array to get the value pointer O(log N)

Every new object with the same shape will reuse the same
object shape hashmap from the global registery.

What are the challenges

  • What happen if a user wants to change shape of an object that is already pointing a shape in the registry?
  • How to efficiently detect the shape of an object and find the correct shape in the registry
  • A LOT OF CODE COMPLEXITY

Possible solution

  • I need to write this one on another time

[optimization] Research and configure the minifier to inline more values

Notes:
We are using terser.
There are static/const values as:
l.NUMBER, l.BIGINT_POSITIVE, l.BIGINT_NEGATIVE, l.STRING;
Or Uint32Array.BYTES_PER_ELEMENT that although statically known, not are not beaning inlined by terser.
To see the the minified code:
https://unpkg.com/@bnaya/[email protected]/dist/objectbuffer.esm.js
And also run yarn build and look at the dist folder

See if https://github.com/terser/terser#annotations can help

Support typed arrays

Should be simple

  • Copy on save/set into the main arraybuffer
  • return typed array with offset
    ARC & cache as objects

Reuse strings and maintain objects references on save

Currently, when we save a value, even when there are identical strings, numbers, and objects that are reference to the same object, they will be saved as new values and take additional memory.
That's true for also object keys.

We may add map that let us save the value once and use the same saved value pointer for all of the occurrences.
But we will need to have reference counting and "copy and write" behaviour as we have now for objects. due to the overhead of the reference counting, it might be better to not apply it to numbers and bigint

examples:

Instead of creating 2 string entries
["a", "a"]
We will create one, with reference count 2.
When you'll have an array of objects, that can save a lot of memory by not re-saving the objects keys for each object instance

[{
  "postId": 1,
  "id": 1,
  "name": "id labore ex et quam laborum",
  "email": "[email protected]",
  "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
},
{
  "postId": 1,
  "id": 1,
  "name": "id labore ex et quam laborum",
  "email": "[email protected]",
  "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
}.
...
]

Another optimisation would be: preserving this kind of cache for the lifetime of the objectbuffer,
That can open many interesting things, to be elaborated on another issue

Find a replacement for intermediate `Entry` objects allocations [Or not]

As part of saving / reading values from the AB, there's intermediate objects allocations.
Most noticeably "entries"
Created by primitiveValueToEntry and passed to appendEntry / writeEntry / readEntry

We need to get rid of that mechanism, or to recycle these objects, as they have very similar shape,
And i think only up to-2 of them can exits in the same time.

But, maybe that's also no so bad, as they are all very short lived. TBD

Handling WebAssembly (AssemblyScript) objects from JS.

This seems like it might be the tool for the job. I saw you comments in the WebAssembly Discord #assemblyscript channel. Btw there's an official AssemblyScript Discord server now!

I'm working to port Three.js to AssemblyScript.

For people who will write in AssemblyScript, they'll just import the code and use it like normal. But there will be people who will (and for some of my projects I will) want to or need to write JavaScript and interface with things in WebAssembly. As an example, I'm working on custom elements that represent objects to draw on the screen with WebGL. It currently uses Three.js under the hood, in plain JS. But once I have ported enough of it to AssemblyScript, I'd like to run the WebGL engine in WebAssembly but I will need to interface with the custom elements which are JavaScript.

I've no idea how it will work yet, but something like what you are working seems like a possible way to create interfaces to the same objects that live in a WebAssembly module.

For example, if I were writing in pure AssemblyScript, then I could make a new Mesh and set some properties on it. But if I'm in JS, I would also like to do the same (but still have it be in the WASM module).

Any thoughts on this?

Consider Optional string cache by pointer address as key

Instead of decoding strings on each access, one might consider to cache the strings with the pointer as the key. when we free that pointer, remove that cache entry.

What's the problem?
Like depending on any auxiliary memory,
Another process might free that memory or set a new value to it.
Need to think of a way around it (cache busting increments on the shared memory?)

Fix Set/Map delete on iteration

Currently deleting the current iterated item on Map/Set will lead to skip the next item.

To fix that we need:

  • Allocate iterator inside the ArrayBuffer
  • Update the iterator on delete
  • add deallocation of that iterator on iteration end, manual, and on finalizationGroup
    For now its a known limitation

External values cache/map

What if instead of decoding strings, we keep reference to the original one behind a key that is arraybuffer with the bytes of the string, and compare this arraybuffer and reuse the already decoded string?
The js engine will be able to do many cool things with it.
But it can memory leak, and add memory overhead.
How can we known when we can remove the decoded string from the cache?
How can we lookup inside the cache?

Worth researching!

Fix strByteLength with emojis

strByteLength retunes wrong size for string with emojis (too big)

/**
* Incorrect length (too big) for emojis
* @param str
*/
export function strByteLength(str: string) {
let s = str.length;
for (let i = str.length - 1; i >= 0; i--) {
const code = str.charCodeAt(i);
if (code > 0x7f && code <= 0x7ff) s++;
else if (code > 0x7ff && code <= 0xffff) s += 2;
}
return s;
}

The output length needs to be compatible with the output of:

export function stringEncodeInto(uint8: Uint8Array, from: number, str: string) {

Example for failing test:
#85

Make compatible TextEncoder/TextDecoder typescript declarations

TextEncoder & TextDecoder are not part of the ECMA spec,
But Node and the Browser has their own almost identical impl.
One of the differences if that in the browser the constructors are available globally,
While in node, you need to import them from the util built-in package.

Our library strives to be isomorphic and portable as possible,
So we don't want out declarations to dependes on symbols from the dom lib, nor @types/node.

Currently we just accept any and not specific type for TextEncoder/TextDecoder.
What we can do is to copy most of the types of them from dom or node types,
And typescript structural checking will do the rest

Awesome job ๐Ÿ‘

Just wanted to say thank you for putting out this library, really great work! It's a really cool idea.

As an open source developer myself I know the feeling if someone appreciates the work you invested and too many people take it for granted that people build these projects in their spare time without much appreciation or gratitude.

So, this is not a feature request, bug report or anything of that kind, just a thank you for building this.

๐Ÿ‘๐Ÿ‘๐Ÿ‘

Keep it up and have a great day!

Improve speed

Hello @Bnaya! Thanks for awesome project!
I play with it and investigate, that speed so low compare to classic approach:
serialize object -> patch it -> deserialize object
Would be good to try to increase speed to have competition to serialisation/deserialization way

Refactor code flows that creating intermediate objects

We have several functions that returning separate values inside an intermediate object.
These functions might be called very frequently internally.
These code flows needs to be refactored to no depend of that.

Ideas for how:

  • Maybe one of the values is not in use
  • Maybe one of the values can be deduced from the other one (ptr can be from ptrToPtr)
  • When arrays are involved: maybe pass array from the caller and not return new one

Example:

// @todo avoid intermediate object
return {
pointer,
pointerToThePointer,
};

// @todo avoid intermediate object
return {
pointers,
pointersToValuePointers,
};
}

Search for // @todo avoid intermediate object in the code

Figure out sizeof

It's difficult to pre-calc the size of hashmap due to dynamic capacity of buckets, that depends on a factor but also on the inserted keys.
Creating a hashmap to calc the size of a hashmap doesn't makes sense

Transactional values saves are not hermetic, needs audit and tests

Every operation that requires new allocations,
We first to the allocations, and only then mutating the old state.
Example:
give this code:

const ob = createObjectBuffer({
a: 1
});
ob.a = "new string value";

When setting new value into the string value, we first save the string value,
And only then we reassign the pointer to point to the data, and update the needed ref counts.

Transactional values saves means that if we oom during save, stop the operation, free the memory allocated until that point, and as we didn't mutate anything else,
our object buffer is not stuck in any "broken" state

Re-use entries memory where possible

When we write a new primitive entry on array/object,
try to use the current entry memory instead of appending new one.
With numbers it will be easy.
other values, that have variable size, can be tricky.

Possible optimizations umbrella ๐Ÿš€

Possible optimizations umbrella ๐Ÿš€

  • #154 Iterative, non-blocking memory reclaim
  • #94 Refactor code flows that creating intermediate objects
  • #92 On the minified code, figure out why expressions like Uint32Array.BYTES_PER_ELEMENT or .BIGINT_POSITIVE are not inlined to their value, even though they are statically known
  • #103 External values cache/map
  • #117 hashmap creation, don't check if key already exists in bucket
  • #101 Refactor linkedListItemInsert to return pointer to pointer, and not save value
  • #123 Global string registry - avoid strings duplications in memory
  • #88 Implement hashCodeExternalValue without auxiliary arraybuffer
  • #100 Transactional values saves are not hermetic
  • #102 Reuse primitives and maintain objects references on save
  • #105 Fix strByteLength with emojis

Refactor linkedListItemInsert to return pointer to pointer, and not save value

The linkedListItemInsert api accepts nodeValuePointer, and saves it.
A better api would be to simply return pointer to pointer and let the caller to do the actual write when he see fits

export function linkedListItemInsert(
carrier: GlobalCarrier,
linkedListPointer: number,
nodeValuePointer: number
) {
const { allocator, heap } = carrier;
const newEndMarker = allocator.calloc(linkedListItem_size);
linkedListItem_set_all(heap, newEndMarker, 0, 0);
const wasEndMarker = linkedList_END_POINTER_get(heap, linkedListPointer);
linkedList_END_POINTER_set(heap, linkedListPointer, newEndMarker);
linkedListItem_set_all(heap, wasEndMarker, newEndMarker, nodeValuePointer);
return wasEndMarker;
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.