
Comments (11)

electrum commented on August 11, 2024

Doing that would mean you can no longer implement the method using a lambda (it's no longer a @FunctionalInterface). One simple option is to add a new interface with an adapter method:

@FunctionalInterface
public interface ExportFunction {
    Value[] apply(Value... args) throws ChicoryException;

    // Adapter: lets users implement the single-result case with a lambda.
    static ExportFunction of(SimpleExportFunction function) {
        return args -> new Value[]{function.apply(args)};
    }

    @FunctionalInterface
    interface SimpleExportFunction {
        Value apply(Value... args) throws ChicoryException;
    }
}
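To make the adapter concrete, here's a minimal self-contained sketch of how a caller would use it (the `Value` class below is a stand-in for illustration; the real Chicory `Value` API may differ):

```java
// Stand-in for Chicory's Value type (illustrative only).
final class Value {
    final long raw;
    Value(long raw) { this.raw = raw; }
    static Value i32(int v) { return new Value(v); }
    int asInt() { return (int) raw; }
}

@FunctionalInterface
interface ExportFunction {
    Value[] apply(Value... args);

    // Adapter: wraps a single-result lambda into the array-returning interface.
    static ExportFunction of(SimpleExportFunction f) {
        return args -> new Value[] { f.apply(args) };
    }

    @FunctionalInterface
    interface SimpleExportFunction {
        Value apply(Value... args);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // The user writes a lambda returning one Value...
        ExportFunction add1 = ExportFunction.of(a -> Value.i32(a[0].asInt() + 1));
        // ...but callers still see the general Value[] signature.
        System.out.println(add1.apply(Value.i32(41))[0].asInt()); // prints 42
    }
}
```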

But the real issue is that this interface is both inefficient and inconvenient. It's bad for performance because we box everything (AOT doesn't need boxing for arguments or single return), and it's annoying to implement because we have to adapt the arguments and return types.

We really need a low level interface based on MethodHandle that can be efficiently implemented by both users and the engine. Then we can have adapters for higher level interfaces based on annotations. For example:

@ExportFunction(args = {I32, I64, I32}, returns = {I32})
public static int myFunction(int a, long b, int c) {
    return (int) (a + b + c);
}
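As a sketch of what the MethodHandle-based low level could look like (names and signatures here are illustrative, not the actual Chicory API): the engine would hand out a handle with the export's exact primitive signature, so calls need no `Value[]` boxing.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {
    // Stand-in for an AOT-compiled export with signature (i32, i64, i32) -> i32.
    static int myFunction(int a, long b, int c) {
        return (int) (a + b + c);
    }

    public static void main(String[] args) throws Throwable {
        // The engine would produce this handle; here we look it up directly.
        MethodHandle export = MethodHandles.lookup().findStatic(
                MethodHandleSketch.class, "myFunction",
                MethodType.methodType(int.class, int.class, long.class, int.class));

        // invokeExact makes a direct, signature-checked call: no arrays, no boxing.
        int result = (int) export.invokeExact(1, 2L, 3);
        System.out.println(result); // prints 6
    }
}
```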

The JMH factorial benchmark shows a ~180x improvement for AOT vs interpreter on 1000 iterations, but only a ~60x improvement for 5 iterations. I haven't profiled it yet, but I suspect that the inefficiency of this interface is to blame.

(As I write this, I'm realizing that it would be helpful for the benchmark to have a Java implementation of the factorial function as a baseline for comparison.)

from chicory.

bhelx commented on August 11, 2024

Thanks for the suggestions on fixing the broader problem. Initially I was thinking about addressing this as an API boundary problem and wasn't thinking so much about performance, but if we can create both a performant and ergonomic interface that works externally and internally then that's probably the way to go.


electrum commented on August 11, 2024

My example there is more appropriate for a host function, as the implementation of an export function will come from the engine. For export functions, one option is for the user to provide a custom interface:

public interface Factorial {
    int execute(int n);
}

Or use any appropriately typed existing interface such as IntUnaryOperator.

For the interpreter, these interfaces could be implemented using MethodHandleProxies. For AOT, we could implement these interfaces directly if desired for better performance.
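For reference, a minimal sketch of the MethodHandleProxies approach (the factorial handle below is a local stand-in for what the engine would provide):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleProxies;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

class ProxyDemo {
    // Stand-in for the engine's implementation of the export.
    static int factorial(int n) {
        int r = 1;
        for (int i = 2; i <= n; i++) r *= i;
        return r;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle handle = MethodHandles.lookup().findStatic(
                ProxyDemo.class, "factorial", MethodType.methodType(int.class, int.class));

        // Any single-method interface with a matching signature works,
        // whether user-defined (Factorial) or existing (IntUnaryOperator).
        IntUnaryOperator fac = MethodHandleProxies.asInterfaceInstance(IntUnaryOperator.class, handle);
        System.out.println(fac.applyAsInt(5)); // prints 120
    }
}
```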

We could also support interfaces with multiple methods, which could make it easier to implement larger APIs. These could be implemented using JDK proxies or code generation.
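A rough sketch of the JDK proxy variant for multi-method APIs, dispatching each interface method to a per-export MethodHandle (all names here are illustrative assumptions):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Proxy;
import java.util.Map;

class MultiMethodSketch {
    interface MathApi {            // hypothetical user-defined WASM API
        int square(int n);
        int negate(int n);
    }

    // Stand-ins for two engine-provided exports.
    static int square(int n) { return n * n; }
    static int negate(int n) { return -n; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(int.class, int.class);
        Map<String, MethodHandle> exports = Map.of(
                "square", lookup.findStatic(MultiMethodSketch.class, "square", type),
                "negate", lookup.findStatic(MultiMethodSketch.class, "negate", type));

        // Each interface method dispatches to the handle with the same name.
        MathApi api = (MathApi) Proxy.newProxyInstance(
                MathApi.class.getClassLoader(), new Class<?>[] { MathApi.class },
                (proxy, method, a) -> exports.get(method.getName()).invoke((int) a[0]));

        System.out.println(api.square(7));  // prints 49
        System.out.println(api.negate(7));  // prints -7
    }
}
```

Code generation would avoid the per-call reflection overhead of the proxy, but the proxy version is much simpler to start with.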

I extended the factorial JMH benchmark with a baseline and found that AOT has identical performance to Java on 1000 iterations, but is ~6x slower for 5 iterations due to boxing. This means that the export function interface is the current blocker for low-overhead calls into WASM (perhaps important for certain types of WASM-as-a-library use cases).


andreaTP commented on August 11, 2024

Do we have an alternative to the usage of MethodHandle?

I might be missing something, but maybe generating the code directly for the expected use cases is an option?
E.g. generating code for up to 20 params and 10 return values.
It's a low-level interface anyhow.

All in all, I'm curious to see a strawman to better understand the invariants here!


electrum commented on August 11, 2024

Do you have a specific objection to the usage of MethodHandle? It has been a standard part of the Java platform since Java 7 and is used by the Java compiler for lambdas, string concatenation, etc. It's even available on Android.

Generating specific interfaces is combinatorial: (4^N)*5 interfaces for a parameter arity of N with a result count of 0 or 1 (assuming we box for multi-return). So while that might be feasible for a small arity, ignoring the bloat to the distribution size, it doesn't seem like a good solution.

I can put together a prototype and see if we can get to zero overhead for AOT exports. This will also improve AOT performance for CALL_INDIRECT, which currently has to go through the boxed Machine.call() interface, even for other AOT methods.


andreaTP commented on August 11, 2024

So far, we have been very conscious about adding complexity to the interpreter codebase, and have attempted to have 0 usage of reflection (even if modern).

It might be time to re-evaluate this decision when it provides a huge performance improvement; but let me play the devil's advocate a little more to be sure that's the best thing we can do to this codebase.

Please note that we have no reservations about using this technique in the AOT.

Thinking about it a little more, would a generative approach only for the most common use cases (up to 4 params and 0/1 returns), with a fallback for the remaining cases, be a viable option?


electrum commented on August 11, 2024

The only complexity is in the adapter layer, which is used to create a nicer API for users. The interpreter could keep the same call() interface. The complexity right now is all on the user side (for both export and import functions). So this is both about performance and better APIs for users.

Up to 4 arity would be (4^4)*5 + (4^3)*5 + (4^2)*5 + (4^1)*5 + (4^0)*5 = 1705 combinations. But now that I think about it, I don't see how this would help, as you wouldn't have any way to call these from the interpreter (without generating that many call sites and a big switch or something).
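For what it's worth, the arithmetic above checks out; a quick sketch:

```java
// Count the distinct signatures for arities 0..maxArity: each of the N
// parameters has 4 possible types, and the result is one of 5 options
// (void or one of the 4 types).
class ComboCount {
    static long count(int maxArity) {
        long total = 0;
        for (int n = 0; n <= maxArity; n++) {
            total += (long) Math.pow(4, n) * 5;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(4)); // prints 1705
    }
}
```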

and attempted to have 0 usage of reflection (even if modern)

I'd like to understand this more. Do you think it is somehow bad, or causes a problem for adoption or another goal? I don't understand why this would itself be a goal.


bhelx commented on August 11, 2024

Let's pick this up in the meeting tomorrow. It might be easier to talk about in person. I don't want to speak for Andrea, but I think the primary concern is keeping the interpreter codebase straightforward and, if possible, pushing more complexity into the AOT layer. I think that should be achievable while satisfying Andrea's concerns.


andreaTP commented on August 11, 2024

Sorry for the delay!

Do you think it is somehow bad, or causes a problem for adoption or another goal? I don't understand why this would itself be a goal.

Great question! It's probably a good time to challenge those assumptions, but let me recap the reasons we have been using so far:

  • complexity: at the moment the interpreter codebase is straightforward and it's easy to onboard people in just a few hours; I value simplicity. We have long known that we need to relax this constraint when performance kicks in, although I would feel more comfortable doing so after we have 100% of the V1 spec passing (validation included, but we are not far!).
  • native-image compilation: currently, the interpreter runs with 0 configuration after being compiled with GraalVM native-image. This is great because reflection configuration is hard to set up and horrific to maintain. That said, frameworks like Quarkus might help here, and in many cases this part is handled automagically.
  • Android compatibility: we don't have a real way to test this so far ... we have just "been told" about issues. It would be great to make some progress on this front, so I fully understand that this argument doesn't really hold as it stands.

All in all I'm in favor of:

push more complexity into the AOT layer.

when appropriate, but if the performance gain is huge (especially if backed by real-world examples), I'm happy to keep the conversation going and iterate on those points.


andreaTP commented on August 11, 2024

If we exclude the performance discussion from this thread and look only at the final user API: I decided to dump my thoughts into a demo repository (since code is worth a thousand words).

This is my opinionated take on how I'm envisioning the user interaction with Chicory:
https://github.com/andreaTP/chicory-bindgen-poc

Happy to hear feedback!


bhelx commented on August 11, 2024

We should probably pick up this discussion in #441

