bnaya / objectbuffer
JavaScript Object like api, backed by an arraybuffer
License: MIT License
The current api is very ad-hoc.
We need to brainstorm and decide on a public API that we can stand behind.
Basically implemented.
Hard to test automatically; manual tests seem to work.
https://github.com/Bnaya/objectbuffer/blob/453549e09be4ea846e9028192df7a03fdc4fa126/src/externalPerspectiveTests/memoryGCTest.test.ts
https://github.com/tc39/proposal-weakrefs
https://chromium-review.googlesource.com/c/v8/v8/+/1986392
Figure out how to test it
Now there are tests using --expose-gc --harmony-weak-refs flags
Record the endianness of the stuff we put inside the ArrayBuffer somewhere
Important note: we need to be aware of the system endianness.
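A minimal sketch of how the endianness could be detected and recorded. The helper names and the idea of reserving byte 0 of the buffer for the flag are assumptions for illustration, not part of the objectbuffer layout.

```typescript
// Hypothetical helpers, not part of objectbuffer's API.

// Detects the byte order the current platform uses for typed arrays.
function isLittleEndianPlatform(): boolean {
  const probe = new Uint16Array([0x0102]);
  const bytes = new Uint8Array(probe.buffer);
  // On little-endian systems the least significant byte (0x02) is stored first.
  return bytes[0] === 0x02;
}

// Assumed layout: the first byte of the buffer holds the endianness flag.
const ENDIANNESS_FLAG_OFFSET = 0;
const LITTLE_ENDIAN_FLAG = 1;
const BIG_ENDIAN_FLAG = 2;

// Writes the platform endianness into the buffer when it is created.
function recordEndianness(buffer: ArrayBuffer): void {
  new Uint8Array(buffer)[ENDIANNESS_FLAG_OFFSET] = isLittleEndianPlatform()
    ? LITTLE_ENDIAN_FLAG
    : BIG_ENDIAN_FLAG;
}

// Checks that a buffer (possibly received from elsewhere) was written
// with the same byte order as the current platform.
function matchesPlatformEndianness(buffer: ArrayBuffer): boolean {
  const flag = new Uint8Array(buffer)[ENDIANNESS_FLAG_OFFSET];
  return flag === (isLittleEndianPlatform() ? LITTLE_ENDIAN_FLAG : BIG_ENDIAN_FLAG);
}
```

A buffer whose flag does not match could then be rejected, or read through a DataView with an explicit littleEndian argument.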
As part of the hashmap impl, we have a flow (lookup) in which we need to hash a value that is not yet saved inside our main ArrayBuffer.
But our current hash function works only on an ArrayBuffer, so we first have to save the key in an intermediate ArrayBuffer so we can hash it.
That's very wasteful, and better avoided.
objectbuffer/src/internal/hashmap/hashmapUtils.ts
Lines 24 to 39 in 4a8438f
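One possible direction, sketched under assumptions: hash the string's UTF-8 bytes on the fly, without materialising them in an ArrayBuffer. FNV-1a is used here only as a stand-in hash (it is not necessarily the hash objectbuffer uses), and all function names are hypothetical. The point is that both functions produce the same value, so a key that is not yet saved can be looked up against keys hashed from the buffer.

```typescript
// Sketch only: FNV-1a (32 bit) is a stand-in, not necessarily the library's hash.
const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;

function fnv1aStep(hash: number, byte: number): number {
  return Math.imul(hash ^ byte, FNV_PRIME) >>> 0;
}

// Hashes bytes that already live in an ArrayBuffer.
function hashBytes(bytes: Uint8Array): number {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash = fnv1aStep(hash, bytes[i]);
  }
  return hash;
}

// Hashes the UTF-8 encoding of a string without allocating a buffer for it.
function hashStringAsUtf8(str: string): number {
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < str.length; i++) {
    let codePoint = str.codePointAt(i) as number;
    if (codePoint > 0xffff) {
      i++; // a surrogate pair occupies two UTF-16 code units
    } else if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
      codePoint = 0xfffd; // lone surrogate: encode U+FFFD, as TextEncoder does
    }
    if (codePoint < 0x80) {
      hash = fnv1aStep(hash, codePoint);
    } else if (codePoint < 0x800) {
      hash = fnv1aStep(hash, 0xc0 | (codePoint >> 6));
      hash = fnv1aStep(hash, 0x80 | (codePoint & 0x3f));
    } else if (codePoint < 0x10000) {
      hash = fnv1aStep(hash, 0xe0 | (codePoint >> 12));
      hash = fnv1aStep(hash, 0x80 | ((codePoint >> 6) & 0x3f));
      hash = fnv1aStep(hash, 0x80 | (codePoint & 0x3f));
    } else {
      hash = fnv1aStep(hash, 0xf0 | (codePoint >> 18));
      hash = fnv1aStep(hash, 0x80 | ((codePoint >> 12) & 0x3f));
      hash = fnv1aStep(hash, 0x80 | ((codePoint >> 6) & 0x3f));
      hash = fnv1aStep(hash, 0x80 | (codePoint & 0x3f));
    }
  }
  return hash;
}
```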
We can save that info as a flag in our ArrayBuffer,
but proxy doesn't have traps for it (?)
See:
https://esdiscuss.org/topic/object-freezing-proxies-should-freeze-or-throw
Currently, each and every object is backed by its own hashmap instance.
That is maybe the most idiomatic, but also a wasteful, way to implement JavaScript objects.
A 100-item array of objects shaped as {keyProp: "myPropValue"} will allocate the keyProp string for the key 100 times,
along with expensive hashmap nodes (internal data structure).
JavaScript engines generally try to optimize this in a few ways.
Most notably, they try to share the structure of objects:
saving the structure of objects in a central registry, where each object shape can be a Set,
and each actual object is an array whose length is the key count, plus a pointer to the shape.
To perform a lookup on that object, we look up the key in the object shape Set to get the index, O(log N),
and then access the array to get the value pointer, O(1).
Every new object with the same shape will reuse the same object shape hashmap from the global registry.
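A rough sketch of the shape-registry idea, written against plain JS values rather than pointers in the ArrayBuffer. Everything here is hypothetical (ShapeRegistry, createShaped, getProp); the shape is modelled as a Map from key to slot index, and the registry is keyed by the ordered key list.

```typescript
// A shape maps each key to its slot index in the per-object values array.
type Shape = Map<string, number>;

class ShapeRegistry {
  private shapes = new Map<string, Shape>();

  // Returns the shared shape for this ordered key list, creating it once.
  getShape(keys: readonly string[]): Shape {
    const id = JSON.stringify(keys); // assumed registry key, for illustration
    let shape = this.shapes.get(id);
    if (shape === undefined) {
      shape = new Map(keys.map((key, index): [string, number] => [key, index]));
      this.shapes.set(id, shape);
    }
    return shape;
  }

  get size(): number {
    return this.shapes.size;
  }
}

// Each object stores only its values plus a reference to the shared shape.
interface ShapedObject {
  shape: Shape;
  values: unknown[];
}

function createShaped(registry: ShapeRegistry, source: Record<string, unknown>): ShapedObject {
  const keys = Object.keys(source);
  return { shape: registry.getShape(keys), values: keys.map((key) => source[key]) };
}

function getProp(obj: ShapedObject, key: string): unknown {
  const index = obj.shape.get(key); // shape lookup gives the slot index
  return index === undefined ? undefined : obj.values[index]; // then a direct array access
}
```

With this layout, 100 objects of the form {keyProp: "myPropValue"} would share a single shape, so the key string is stored once.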
What are the challenges
Possible solution
Notes:
We are using terser.
There are static/const values as:
l.NUMBER, l.BIGINT_POSITIVE, l.BIGINT_NEGATIVE, l.STRING;
Or Uint32Array.BYTES_PER_ELEMENT
that, although statically known, are not being inlined by terser.
To see the minified code:
https://unpkg.com/@bnaya/[email protected]/dist/objectbuffer.esm.js
And also run yarn build
and look at the dist folder
See if https://github.com/terser/terser#annotations can help
Should be simple
Currently, when we save a value, even identical strings, numbers, and objects that are references to the same object are saved as new values and take additional memory.
That's also true for object keys.
We may add a map that lets us save the value once and use the same saved value pointer for all of the occurrences.
But we will need reference counting and "copy on write" behaviour, as we have now for objects. Due to the overhead of the reference counting, it might be better not to apply it to numbers and bigints.
examples:
Instead of creating 2 string entries
["a", "a"]
We will create one, with reference count 2.
When you have an array of objects, that can save a lot of memory by not re-saving the object keys for each object instance
[{
"postId": 1,
"id": 1,
"name": "id labore ex et quam laborum",
"email": "[email protected]",
"body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
},
{
"postId": 1,
"id": 1,
"name": "id labore ex et quam laborum",
"email": "[email protected]",
"body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
},
...
]
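A small sketch of the "save once, count references" idea for strings. InternTable and its methods are hypothetical, and the pointer values come from a toy counter that stands in for the real allocator.

```typescript
// Hypothetical interning table: one saved copy per distinct string,
// with a reference count so the memory can be freed when unused.
class InternTable {
  private entries = new Map<string, { pointer: number; refCount: number }>();
  private nextPointer = 8; // toy pointer source standing in for a real allocator

  // Returns the pointer of the saved string, saving it on first use.
  acquire(value: string): number {
    const existing = this.entries.get(value);
    if (existing !== undefined) {
      existing.refCount += 1;
      return existing.pointer;
    }
    const pointer = this.nextPointer;
    this.nextPointer += 8;
    this.entries.set(value, { pointer, refCount: 1 });
    return pointer;
  }

  // Drops one reference; the entry is removed when nobody uses it anymore.
  release(value: string): void {
    const entry = this.entries.get(value);
    if (entry === undefined) return;
    entry.refCount -= 1;
    if (entry.refCount === 0) {
      this.entries.delete(value); // a real implementation would free the memory here
    }
  }

  refCountOf(value: string): number {
    return this.entries.get(value)?.refCount ?? 0;
  }
}
```

Saving ["a", "a"] through acquire would produce one entry with reference count 2, and the repeated keys "postId", "id", "name", "email" and "body" from the array above would each be stored once.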
Another optimisation would be preserving this kind of cache for the lifetime of the objectbuffer.
That can open up many interesting possibilities, to be elaborated on in another issue.
As part of saving / reading values from the AB, there are intermediate object allocations.
Most notably "entries",
created by primitiveValueToEntry and passed to appendEntry / writeEntry / readEntry.
We need to get rid of that mechanism, or recycle these objects, as they have very similar shapes,
and I think at most 2 of them can exist at the same time.
But maybe that's also not so bad, as they are all very short-lived. TBD
This seems like it might be the tool for the job. I saw your comments in the WebAssembly Discord #assemblyscript channel. Btw there's an official AssemblyScript Discord server now!
I'm working to port Three.js to AssemblyScript.
For people who will write in AssemblyScript, they'll just import the code and use it like normal. But there will be people who will (and for some of my projects I will) want to or need to write JavaScript and interface with things in WebAssembly. As an example, I'm working on custom elements that represent objects to draw on the screen with WebGL. It currently uses Three.js under the hood, in plain JS. But once I have ported enough of it to AssemblyScript, I'd like to run the WebGL engine in WebAssembly but I will need to interface with the custom elements which are JavaScript.
I've no idea how it will work yet, but something like what you are working on seems like a possible way to create interfaces to the same objects that live in a WebAssembly module.
For example, if I were writing in pure AssemblyScript, then I could make a new Mesh
and set some properties on it. But if I'm in JS, I would also like to do the same (but still have it be in the WASM module).
Any thoughts on this?
Instead of making null and undefined point to an actual entry,
Use well-known numbers that are not valid pointers.
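A minimal sketch of the sentinel idea. The concrete values 0 and 1 are assumptions: they only work if the allocator never hands out those addresses, which seems plausible with aligned allocations but is not verified here. The names are hypothetical.

```typescript
// Assumed reserved values; a real allocator must never return them as addresses.
const UNDEFINED_POINTER = 0;
const NULL_POINTER = 1;

function isRealPointer(pointer: number): boolean {
  return pointer !== UNDEFINED_POINTER && pointer !== NULL_POINTER;
}

// Reads the value a slot refers to, short-circuiting the two sentinels
// so that no entry has to be allocated or decoded for null / undefined.
function readSlot(
  heap: Uint32Array,
  slotIndex: number,
  readEntry: (pointer: number) => unknown
): unknown {
  const pointer = heap[slotIndex];
  if (pointer === UNDEFINED_POINTER) return undefined;
  if (pointer === NULL_POINTER) return null;
  return readEntry(pointer);
}
```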
Instead of decoding strings on each access, we might consider caching the decoded strings with the pointer as the key. When we free that pointer, we remove that cache entry.
What's the problem?
As with depending on any auxiliary memory,
another process might free that memory or set a new value to it.
We need to think of a way around it (cache-busting increments on the shared memory?)
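The single-process part of the idea could look roughly like this. DecodedStringCache is a hypothetical name, and the sketch deliberately does not address the shared-memory problem described above: another thread changing the memory would leave a stale entry.

```typescript
// Hypothetical cache of decoded strings, keyed by the string's pointer.
class DecodedStringCache {
  private cache = new Map<number, string>();

  // Returns the cached string, decoding it only on the first access.
  get(pointer: number, decode: (pointer: number) => string): string {
    let value = this.cache.get(pointer);
    if (value === undefined) {
      value = decode(pointer);
      this.cache.set(pointer, value);
    }
    return value;
  }

  // Must be called whenever the pointer is freed, otherwise a later
  // allocation at the same address would read a stale string.
  onFree(pointer: number): void {
    this.cache.delete(pointer);
  }
}
```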
Currently, deleting the item being iterated on a Map/Set causes the next item to be skipped.
To fix that we need:
What if, instead of decoding strings, we keep a reference to the original one behind a key that is an ArrayBuffer with the bytes of the string, then compare this ArrayBuffer and reuse the already decoded string?
The JS engine will be able to do many cool things with it.
But it can leak memory, and add memory overhead.
How can we know when we can remove the decoded string from the cache?
How can we look up inside the cache?
Worth researching!
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
Can be similar to the object impl
strByteLength returns the wrong size for strings with emojis (too big)
objectbuffer/src/internal/utils.ts
Lines 69 to 82 in ab8ba96
The output length needs to be compatible with the output of:
Example for failing test:
#85
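For reference, a sketch of a UTF-8 byte length calculation that treats a surrogate pair (which is how emojis outside the BMP appear in a JS string) as a single 4-byte code point. utf8ByteLength is a hypothetical name, not the library's strByteLength; lone surrogates are counted as 3 bytes because TextEncoder replaces them with U+FFFD.

```typescript
// Counts the bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(str: string): number {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point, four bytes
        i++; // skip the low surrogate, it was already counted
      } else {
        bytes += 3; // lone high surrogate becomes U+FFFD (3 bytes)
      }
    } else {
      bytes += 3; // rest of the BMP, including lone surrogates
    }
  }
  return bytes;
}
```

Counting each half of a surrogate pair as 3 bytes gives 6 instead of 4, which would explain a "too big" result for emojis.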
https://github.com/Bnaya/objectbuffer/blob/develop/src/internal/memoryMachinery.ts
The memoryMachinery code adds a few layers of indirection on top of TypedArray to provide something similar to C structs.
Most of these indirections can be replaced with flat code, which will probably be more performant.
prepack / closure compiler might be able to reduce some of the indirections, maybe after some changes.
TextEncoder & TextDecoder are not part of the ECMAScript spec,
but Node and the browser each have their own, almost identical, implementation.
One of the differences is that in the browser the constructors are available globally,
while in Node you need to import them from the util built-in package.
Our library strives to be as isomorphic and portable as possible,
so we don't want our declarations to depend on symbols from the dom lib, nor from @types/node.
Currently we just accept any, and not a specific type, for TextEncoder/TextDecoder.
What we can do is copy most of their types from the dom or node types,
and TypeScript's structural checking will do the rest.
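A sketch of what such copied-down types could look like. The interface names are hypothetical and the signatures are a trimmed approximation of the dom / node declarations, so they should be checked against both before being adopted.

```typescript
// Minimal structural types, declared locally so the library's .d.ts
// does not need the dom lib or @types/node.
interface TextEncoderLike {
  encode(input?: string): Uint8Array;
}

interface TextDecoderLike {
  decode(input?: ArrayBufferView | ArrayBuffer): string;
}

// Hypothetical options shape: the caller injects whichever implementation it has.
interface TextCodec {
  textEncoder: TextEncoderLike;
  textDecoder: TextDecoderLike;
}

// Example consumer that only relies on the structural contract.
function roundTrip(codec: TextCodec, value: string): string {
  return codec.textDecoder.decode(codec.textEncoder.encode(value));
}
```

A caller in the browser would pass new TextEncoder() / new TextDecoder() from the globals, while a Node caller would pass the ones imported from util; neither should need a cast if the shapes line up.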
I'm using "@bnaya/objectbuffer": "^0.10.0", and I have to copy assertNonNull.js to ...dist/internal manually
Just wanted to say thank you for putting out this library, really great work! It's a really cool idea.
As an open source developer myself, I know how it feels when someone appreciates the work you invested; too many people take it for granted that people build these projects in their spare time, without much appreciation or gratitude.
So, this is not a feature request, bug report or anything of that kind, just a thank you for building this.
Keep it up and have a great day!
Hello @Bnaya! Thanks for the awesome project!
I played with it and found that it is much slower than the classic approach:
serialize object -> patch it -> deserialize object
It would be good to try to increase the speed, to compete with the serialization/deserialization approach.
Objects are implemented using a plain linked list.
Evaluate, and maybe replace it with a hashmap-based implementation.
We have several functions that return separate values inside an intermediate object.
These functions might be called very frequently internally.
These code flows need to be refactored to not depend on that.
Ideas for how:
Example:
objectbuffer/src/internal/arrayHelpers.ts
Lines 35 to 39 in ab8ba96
objectbuffer/src/internal/hashmap/hashmap.ts
Lines 422 to 427 in da30950
Search for // @todo avoid intermediate object
in the code
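One common way to avoid such intermediate objects, sketched with made-up names (ArrayMeta, readArrayMetaInto) and a made-up two-word layout: let the caller pass a reusable output object, so the hot path performs no allocation.

```typescript
// Hypothetical metadata shape, standing in for the library's internal results.
interface ArrayMeta {
  dataPointer: number;
  length: number;
}

// Before: a fresh object is allocated on every call.
function readArrayMetaAllocating(heap: Uint32Array, index: number): ArrayMeta {
  return { dataPointer: heap[index], length: heap[index + 1] };
}

// After: the caller owns a scratch object and the function only fills it in.
function readArrayMetaInto(heap: Uint32Array, index: number, out: ArrayMeta): ArrayMeta {
  out.dataPointer = heap[index];
  out.length = heap[index + 1];
  return out;
}

// A module-level scratch object can be reused, as long as callers copy out
// the fields they need before the next call overwrites them.
const scratchMeta: ArrayMeta = { dataPointer: 0, length: 0 };
```

The trade-off is that the returned object is only valid until the next call, which has to be kept in mind wherever the result is stored.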
Maybe something like:
https://github.com/CCareaga/heap_allocator
It's difficult to pre-calculate the size of a hashmap
due to the dynamic capacity of the buckets, which depends on a factor but also on the inserted keys.
Creating a hashmap to calculate the size of a hashmap doesn't make sense.
For every operation that requires new allocations,
we first do the allocations, and only then mutate the old state.
Example:
Given this code:
const ob = createObjectBuffer({
a: 1
});
ob.a = "new string value";
When setting the new string value on a, we first save the string value,
and only then reassign the pointer to point to the new data and update the needed ref counts.
Transactional value saves mean that if we run out of memory during a save, we stop the operation and free the memory allocated up to that point; as we didn't mutate anything else,
our object buffer is not stuck in any "broken" state.
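The allocate-first, mutate-last order can be sketched with a toy allocator. ToyAllocator and transactionalSet are hypothetical and much simpler than the real allocator and save path; reference counting is reduced to freeing the old value.

```typescript
// Toy allocator with a fixed capacity, standing in for the real one.
class ToyAllocator {
  readonly live = new Map<number, number>(); // pointer -> size
  private used = 0;
  private next = 8;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  malloc(size: number): number {
    if (this.used + size > this.capacity) throw new Error("out of memory");
    const pointer = this.next;
    this.next += size;
    this.used += size;
    this.live.set(pointer, size);
    return pointer;
  }

  free(pointer: number): void {
    const size = this.live.get(pointer);
    if (size === undefined) return;
    this.used -= size;
    this.live.delete(pointer);
  }
}

// Saves a new value that needs several allocations (given by `sizes`; the
// first allocation is the one the slot will point to). Existing state is
// touched only after every allocation has succeeded.
function transactionalSet(
  allocator: ToyAllocator,
  slots: Map<string, number>,
  key: string,
  sizes: number[]
): void {
  const allocated: number[] = [];
  try {
    for (const size of sizes) allocated.push(allocator.malloc(size));
  } catch (error) {
    for (const pointer of allocated) allocator.free(pointer); // roll back
    throw error; // slots were never touched
  }
  const oldPointer = slots.get(key);
  slots.set(key, allocated[0]);
  if (oldPointer !== undefined) allocator.free(oldPointer);
}
```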
https://github.com/Bnaya/objectbuffer/blob/4a8438f3a0f8f5d7d5c6757e43f0dfa3f7839f7b/src/internal/memoryMachinery.ts
also get rid of typed arrays on carrier, use only carrier.heap
Make sure toString works on our wrappers
Add tests
When we write a new primitive entry on array/object,
try to use the current entry memory instead of appending new one.
With numbers it will be easy.
other values, that have variable size, can be tricky.
Possible optimizations umbrella
Uint32Array.BYTES_PER_ELEMENT or .BIGINT_POSITIVE are not inlined to their value, even though they are statically known
Consider also making them yieldable to be scheduler-friendly
The linkedListItemInsert API accepts nodeValuePointer, and saves it.
A better API would be to simply return a pointer to the pointer and let the caller do the actual write when it sees fit.
objectbuffer/src/internal/linkedList/linkedList.ts
Lines 45 to 60 in da30950
We use https://github.com/thi-ng/umbrella/tree/master/packages/malloc, with default memory alignment of 8.
That makes every allocation at least 8 bytes, rounded up to the nearest multiple of 8 (overhead!)
We also need to be 8-byte aligned to be able to read/write 8-byte data types (Float64, BigInt64) using a TypedArray and not a DataView.
But maybe we can use 4 bytes alignment, and pad to 8 bytes only when needed?
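To make the trade-off concrete, a small sketch with hypothetical helpers: rounding a size up to a power-of-two alignment, and the offset rule that forces 8-byte alignment for Float64 / BigInt64 typed array views.

```typescript
// Rounds size up to the next multiple of alignment (alignment must be a power of two).
function alignUp(size: number, alignment: number): number {
  return (size + alignment - 1) & ~(alignment - 1);
}

// A Float64Array / BigInt64Array view can only start at a byte offset that is
// a multiple of its element size; other offsets throw a RangeError.
function isValidFloat64Offset(byteOffset: number): boolean {
  return byteOffset % Float64Array.BYTES_PER_ELEMENT === 0;
}

// Bytes lost to padding for a given requested size and alignment.
function paddingOverhead(size: number, alignment: number): number {
  return alignUp(size, alignment) - size;
}
```

For example, a 9-byte allocation occupies 16 bytes with 8-byte alignment but 12 bytes with 4-byte alignment; the cost is that 8-byte values would then need extra padding (or a DataView) whenever their address is not a multiple of 8.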