The objectbuffer from bnaya

The current api is very ad-hoc.

We need to brainstorm and decide on a public api that we can stand for.

weakrefs umbrella

Basically implemented.
Hard to test automatically, manual tests seems working
https://github.com/Bnaya/objectbuffer/blob/453549e09be4ea846e9028192df7a03fdc4fa126/src/externalPerspectiveTests/memoryGCTest.test.ts

https://github.com/tc39/proposal-weakrefs
https://chromium-review.googlesource.com/c/v8/v8/+/1986392

~~Figure out how to test it~~
Now there are tests using --expose-gc --harmony-weak-refs flags

Record Endianness somewhere as metadata

Record the endianness of the stuff we put inside the ArrayBuffer somewhere

thi-ng/umbrella#196

Impl hashCodeExternalValue without auxiliary arraybuffer

Important note: need to be system endienss aware

As part of the hashmap impl, we have flow (lookup) that we need to hash value that is yet saved inside our main ArrayBuffer.
But out current hash function is working only on arraybuffer, so we have to first save the key in an intermediate arraybuffer so we can hash it.

That's very wasteful, and better be avoided.

objectbuffer/src/internal/hashmap/hashmapUtils.ts

Lines 24 to 39 in 4a8438f

    
           export function hashCodeExternalValue( 
        
             capacity: number, 
        
             value: string | number 
        
           ): number { 
        
             const ab = new ArrayBuffer(typeof value === "string" ? value.length * 3 : 8); 
        
             const uint8 = new Uint8Array(ab); 
        
             let keyBytesLength = ab.byteLength; 
        
             if (typeof value === "string") { 
        
               keyBytesLength = stringEncodeInto(new Uint8Array(ab), 0, value); 
        
             } else { 
        
               new Float64Array(ab)[0] = value; 
        
             } 
        
             return hashCodeInPlace(uint8, capacity, 0, keyBytesLength); 
        
           }

Write string decoder & encoder

Use thi-ng/malloc to manage memory

https://github.com/thi-ng/umbrella/tree/master/packages/malloc

Figure out freeze/seal support

We can save that info as a flag in our ArrayBuffer,
but proxy doesn't have traps for it (?)
See:
https://esdiscuss.org/topic/object-freezing-proxies-should-freeze-or-throw

refactor getAllLinkedAddresses to be iterative and not recursive

Configurable string min size, to allow space re-use

related #12

To make #12 more possible on strings, we can always allocate a min configurable string size,
And only if we exceed it, append to entry.

Consider fixed-shape objects optimization / API. Objects are so wasteful!

Currently, each and every object is backed by It's own instance of hashmap.
That's maybe the most idiomatic but wasteful way to implement javascript objects.

A 100 items array of object as {keyProp: "myPropValue"} will allocate 100 times keyProp string for the key,
And also expensive hashmap nodes (internal data structure).

JavaScript engines in general tries to optimize it in few ways.
Most noticeably, to have try and share structure of objects.
Saving the structure of objects in a central registry, where each object shape can be a Set,
and each actual object will be a array on the length of the key's count and a pointer to the shape.

To perform a lookup on that object, we look the key in the object shape Set to get the index O(log N),
And access the array to get the value pointer O(log N)

Every new object with the same shape will reuse the same
object shape hashmap from the global registery.

What are the challenges

What happen if a user wants to change shape of an object that is already pointing a shape in the registry?
How to efficiently detect the shape of an object and find the correct shape in the registry
A LOT OF CODE COMPLEXITY

Possible solution

I need to write this one on another time

[optimization] Research and configure the minifier to inline more values

Notes:
We are using terser.
There are static/const values as:
l.NUMBER, l.BIGINT_POSITIVE, l.BIGINT_NEGATIVE, l.STRING;
Or Uint32Array.BYTES_PER_ELEMENT that although statically known, not are not beaning inlined by terser.
To see the the minified code:
https://unpkg.com/@bnaya/[email protected]/dist/objectbuffer.esm.js
And also run yarn build and look at the dist folder

See if https://github.com/terser/terser#annotations can help

Support typed arrays

Should be simple

Copy on save/set into the main arraybuffer
return typed array with offset
ARC & cache as objects

"Rollback" partial operation when got OOM during partial

Reuse strings and maintain objects references on save

Currently, when we save a value, even when there are identical strings, numbers, and objects that are reference to the same object, they will be saved as new values and take additional memory.
That's true for also object keys.

We may add map that let us save the value once and use the same saved value pointer for all of the occurrences.
But we will need to have reference counting and "copy and write" behaviour as we have now for objects. due to the overhead of the reference counting, it might be better to not apply it to numbers and bigint

examples:

Instead of creating 2 string entries
["a", "a"]
We will create one, with reference count 2.
When you'll have an array of objects, that can save a lot of memory by not re-saving the objects keys for each object instance

[{
  "postId": 1,
  "id": 1,
  "name": "id labore ex et quam laborum",
  "email": "[email protected]",
  "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
},
{
  "postId": 1,
  "id": 1,
  "name": "id labore ex et quam laborum",
  "email": "[email protected]",
  "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
}.
...
]

Another optimisation would be: preserving this kind of cache for the lifetime of the objectbuffer,
That can open many interesting things, to be elaborated on another issue

Find a replacement for intermediate `Entry` objects allocations [Or not]

As part of saving / reading values from the AB, there's intermediate objects allocations.
Most noticeably "entries"
Created by primitiveValueToEntry and passed to appendEntry / writeEntry / readEntry

We need to get rid of that mechanism, or to recycle these objects, as they have very similar shape,
And i think only up to-2 of them can exits in the same time.

But, maybe that's also no so bad, as they are all very short lived. TBD

Handling WebAssembly (AssemblyScript) objects from JS.

This seems like it might be the tool for the job. I saw you comments in the WebAssembly Discord #assemblyscript channel. Btw there's an official AssemblyScript Discord server now!

I'm working to port Three.js to AssemblyScript.

For people who will write in AssemblyScript, they'll just import the code and use it like normal. But there will be people who will (and for some of my projects I will) want to or need to write JavaScript and interface with things in WebAssembly. As an example, I'm working on custom elements that represent objects to draw on the screen with WebGL. It currently uses Three.js under the hood, in plain JS. But once I have ported enough of it to AssemblyScript, I'd like to run the WebGL engine in WebAssembly but I will need to interface with the custom elements which are JavaScript.

I've no idea how it will work yet, but something like what you are working seems like a possible way to create interfaces to the same objects that live in a WebAssembly module.

For example, if I were writing in pure AssemblyScript, then I could make a new Mesh and set some properties on it. But if I'm in JS, I would also like to do the same (but still have it be in the WASM module).

Any thoughts on this?

Make null, undefined, true, false point to a known static location, and not entry

Instead of making null and undefined point to an actuall entry,
Use well-known numbers that are not valid pointers.

Consider Optional string cache by pointer address as key

Instead of decoding strings on each access, one might consider to cache the strings with the pointer as the key. when we free that pointer, remove that cache entry.

What's the problem?
Like depending on any auxiliary memory,
Another process might free that memory or set a new value to it.
Need to think of a way around it (cache busting increments on the shared memory?)

Fix Set/Map delete on iteration

Currently deleting the current iterated item on Map/Set will lead to skip the next item.

To fix that we need:

Allocate iterator inside the ArrayBuffer
Update the iterator on delete
add deallocation of that iterator on iteration end, manual, and on finalizationGroup
For now its a known limitation

External values cache/map

What if instead of decoding strings, we keep reference to the original one behind a key that is arraybuffer with the bytes of the string, and compare this arraybuffer and reuse the already decoded string?
The js engine will be able to do many cool things with it.
But it can memory leak, and add memory overhead.
How can we known when we can remove the decoded string from the cache?
How can we lookup inside the cache?

Worth researching!

Create performance benchmark infra

Implement Map

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
Can be similar to the object impl

Fix strByteLength with emojis

strByteLength retunes wrong size for string with emojis (too big)

objectbuffer/src/internal/utils.ts

Lines 69 to 82 in ab8ba96

    
           /** 
        
            * Incorrect length (too big) for emojis 
        
            * @param str 
        
            */ 
        
           export function strByteLength(str: string) { 
        
             let s = str.length; 
        
             for (let i = str.length - 1; i >= 0; i--) { 
        
               const code = str.charCodeAt(i); 
        
               if (code > 0x7f && code <= 0x7ff) s++; 
        
               else if (code > 0x7ff && code <= 0xffff) s += 2; 
        
             } 
        
             return s; 
        
           }

The output length needs to be compatible with the output of:

objectbuffer/src/internal/stringEncodeInto.ts

Line 4 in 4dc8259

    
           export function stringEncodeInto(uint8: Uint8Array, from: number, str: string) {

Example for failing test:
#85

investigate memory machinery optimisation with prepack/closure compiler

https://github.com/Bnaya/objectbuffer/blob/develop/src/internal/memoryMachinery.ts

The memoryMachinery stuff adding few layers of indirection on top of typedArray to provide something that is similar to C structs

Most of these indirections can be replaced with flat code, that probably gonna be more performant.

prepack / closure compiler might we able to reduce some of the indirections, maybe after some changes

Make compatible TextEncoder/TextDecoder typescript declarations

TextEncoder & TextDecoder are not part of the ECMA spec,
But Node and the Browser has their own almost identical impl.
One of the differences if that in the browser the constructors are available globally,
While in node, you need to import them from the util built-in package.

Our library strives to be isomorphic and portable as possible,
So we don't want out declarations to dependes on symbols from the dom lib, nor @types/node.

Currently we just accept any and not specific type for TextEncoder/TextDecoder.
What we can do is to copy most of the types of them from dom or node types,
And typescript structural checking will do the rest

Can't resolve './assertNonNull

I'm using "@bnaya/objectbuffer": "^0.10.0", have to copy assertNonNull.js to ...dist/internal manually

Awesome job 👏

Just wanted to say thank you for putting out this library, really great work! It's a really cool idea.

As an open source developer myself I know the feeling if someone appreciates the work you invested and too many people take it for granted that people build these projects in their spare time without much appreciation or gratitude.

So, this is not a feature request, bug report or anything of that kind, just a thank you for building this.

👏👏👏

Keep it up and have a great day!

Improve speed

Hello @Bnaya! Thanks for awesome project!
I play with it and investigate, that speed so low compare to classic approach:
serialize object -> patch it -> deserialize object
Would be good to try to increase speed to have competition to serialisation/deserialization way

Refactor hashmap & linkedlist to use generated structs and not memoryMachine

Evaluate other malloc impl vs thing/malloc

https://github.com/codemix/malloc

Replace linked list with hashmap [?]

Objects are implemented using plain linked list.
evaluate & maybe replace it with hashmap based implementation

Refactor code flows that creating intermediate objects

We have several functions that returning separate values inside an intermediate object.
These functions might be called very frequently internally.
These code flows needs to be refactored to no depend of that.

Ideas for how:

Maybe one of the values is not in use
Maybe one of the values can be deduced from the other one (ptr can be from ptrToPtr)
When arrays are involved: maybe pass array from the caller and not return new one

Example:

objectbuffer/src/internal/arrayHelpers.ts

Lines 35 to 39 in ab8ba96

    
           // @todo avoid intermediate object 
        
           return { 
        
             pointer, 
        
             pointerToThePointer, 
        
           };

objectbuffer/src/internal/hashmap/hashmap.ts

Lines 422 to 427 in da30950

    
             // @todo avoid intermediate object 
        
             return { 
        
               pointers, 
        
               pointersToValuePointers, 
        
             }; 
        
           }

Search for // @todo avoid intermediate object in the code

Wanted: memory allocator that saves state on it's own heap

Maybe something like:
https://github.com/CCareaga/heap_allocator

re-Implement date without bigint

Figure out sizeof

It's difficult to pre-calc the size of hashmap due to dynamic capacity of buckets, that depends on a factor but also on the inserted keys.
Creating a hashmap to calc the size of a hashmap doesn't makes sense

Evaluate off main thread build with built in atomics-sync locks

Evaluate compatibility with assemblyscript

https://assemblyscript.org/memory.html#importing-memory

Transactional values saves are not hermetic, needs audit and tests

Every operation that requires new allocations,
We first to the allocations, and only then mutating the old state.
Example:
give this code:

const ob = createObjectBuffer({
a: 1
});
ob.a = "new string value";

When setting new value into the string value, we first save the string value,
And only then we reassign the pointer to point to the data, and update the needed ref counts.

Transactional values saves means that if we oom during save, stop the operation, free the memory allocated until that point, and as we didn't mutate anything else,
our object buffer is not stuck in any "broken" state

Refactor all memoryMachinery code to generated structs

https://github.com/Bnaya/objectbuffer/blob/4a8438f3a0f8f5d7d5c6757e43f0dfa3f7839f7b/src/internal/memoryMachinery.ts
also get rid of typed arrays on carrier, use only carrier.heap

obj.toString() support 🤫

Make sure toString works on our wrappers
Add tests

Re-use entries memory where possible

When we write a new primitive entry on array/object,
try to use the current entry memory instead of appending new one.
With numbers it will be easy.
other values, that have variable size, can be tricky.

Possible optimizations umbrella 🚀

Write browsers tests with karma [?]

Replace recursive operations with iterative ones

getAllLinkedAddresses
Also make sure not circular. use Set?
saveValue
sizeof

Consider also making them yieldable to be scheduler-friendly

Refactor linkedListItemInsert to return pointer to pointer, and not save value

The linkedListItemInsert api accepts nodeValuePointer, and saves it.
A better api would be to simply return pointer to pointer and let the caller to do the actual write when he see fits

objectbuffer/src/internal/linkedList/linkedList.ts

Lines 45 to 60 in da30950

    
           export function linkedListItemInsert( 
        
             carrier: GlobalCarrier, 
        
             linkedListPointer: number, 
        
             nodeValuePointer: number 
        
           ) { 
        
             const { allocator, heap } = carrier; 
        
             const newEndMarker = allocator.calloc(linkedListItem_size); 
        
             linkedListItem_set_all(heap, newEndMarker, 0, 0); 
        
             const wasEndMarker = linkedList_END_POINTER_get(heap, linkedListPointer); 
        
             linkedList_END_POINTER_set(heap, linkedListPointer, newEndMarker); 
        
             linkedListItem_set_all(heap, wasEndMarker, newEndMarker, nodeValuePointer); 
        
             return wasEndMarker; 
        
           }

Re-consider allocator alignment of 8 bytes.

We use https://github.com/thi-ng/umbrella/tree/master/packages/malloc, with default memory alignment of 8.
That's makes every allocation of minimum 8 bytes, and rounded to the closet multiplication of 8 (overhead!)
We also need do be 8 aligned to be able to read/write 8 bytes data types (Float64, BigInt64) using TypedArray and not DataView.

But maybe we can use 4 bytes alignment, and pad to 8 bytes only when needed?

	export function hashCodeExternalValue(
	capacity: number,
	value: string \| number
	): number {
	const ab = new ArrayBuffer(typeof value === "string" ? value.length * 3 : 8);
	const uint8 = new Uint8Array(ab);
	let keyBytesLength = ab.byteLength;

	if (typeof value === "string") {
	keyBytesLength = stringEncodeInto(new Uint8Array(ab), 0, value);
	} else {
	new Float64Array(ab)[0] = value;
	}

	return hashCodeInPlace(uint8, capacity, 0, keyBytesLength);
	}

	/**
	* Incorrect length (too big) for emojis
	* @param str
	*/
	export function strByteLength(str: string) {
	let s = str.length;
	for (let i = str.length - 1; i >= 0; i--) {
	const code = str.charCodeAt(i);
	if (code > 0x7f && code <= 0x7ff) s++;
	else if (code > 0x7ff && code <= 0xffff) s += 2;
	}

	return s;
	}

	// @todo avoid intermediate object
	return {
	pointer,
	pointerToThePointer,
	};

	// @todo avoid intermediate object
	return {
	pointers,
	pointersToValuePointers,
	};
	}

	export function linkedListItemInsert(
	carrier: GlobalCarrier,
	linkedListPointer: number,
	nodeValuePointer: number
	) {
	const { allocator, heap } = carrier;
	const newEndMarker = allocator.calloc(linkedListItem_size);

	linkedListItem_set_all(heap, newEndMarker, 0, 0);
	const wasEndMarker = linkedList_END_POINTER_get(heap, linkedListPointer);
	linkedList_END_POINTER_set(heap, linkedListPointer, newEndMarker);

	linkedListItem_set_all(heap, wasEndMarker, newEndMarker, nodeValuePointer);

	return wasEndMarker;
	}

bnaya / objectbuffer Goto Github PK

objectbuffer's People

Contributors

Stargazers

Watchers

Forkers

objectbuffer's Issues

Recommend Projects

Recommend Topics

Recommend Org