
Comments (9)

dessaya commented on July 29, 2024

I think this only happens with wasm contracts. @BugFreeSoftware is executing view calls in parallel supported by the wasm interpreter?

The following cluster test reproduces the issue. Replacing go func with plain func sends the view calls in sequence, and the test then passes.

func TestSpamCallViewWasm(t *testing.T) {
	testutil.SkipHeavy(t)
	clu := newCluster(t)
	committee := []int{0}
	quorum := uint16(1)
	addr, err := clu.RunDKG(committee, quorum)
	require.NoError(t, err)
	chain, err := clu.DeployChain("chain", clu.Config.AllNodes(), committee, quorum, addr)
	require.NoError(t, err)

	e := &env{t: t, clu: clu}
	chEnv := &chainEnv{
		env:   e,
		chain: chain,
	}

	chEnv.requestFunds(scOwnerAddr, "client")
	chEnv.deployContract("inccounter", "inccounter sc", nil)

	{
		// increment counter once
		entryPoint := iscp.Hn("increment")
		tx, err := chEnv.chainClient().Post1Request(incHname, entryPoint)
		require.NoError(t, err)
		err = chain.CommitteeMultiClient().WaitUntilAllRequestsProcessed(chain.ChainID, tx, 30*time.Second)
		require.NoError(t, err)
	}

	const n = 1000
	ch := make(chan error, n)

	for i := 0; i < n; i++ {
		go func(i int) {
			cl := chain.SCClient(iscp.Hn("inccounter"), nil)
			r, err := cl.CallView("getCounter", nil)
			if err != nil {
				ch <- err
				return
			}

			v, ok, err := codec.DecodeInt64(r.MustGet(inccounter.VarCounter))
		switch {
		case err != nil:
			// keep the decode error
		case !ok:
			err = errors.New("!ok")
		case v != 1:
			err = errors.New("v != 1")
		}
			ch <- err
		}(i)
	}

	for i := 0; i < n; i++ {
		err := <-ch
		if err != nil {
			t.Error(err)
		}
	}
}

from wasp.

dessaya commented on July 29, 2024

Adding a mutex in WasmProcessor::call() fixes the panics:

type WasmProcessor struct {
	...
	mu        sync.Mutex
}

func (host *WasmProcessor) call(ctx iscp.Sandbox, ctxView iscp.SandboxView) (dict.Dict, error) {
	host.mu.Lock()
	defer host.mu.Unlock()
	...
}

(But of course this is not the correct solution; we need to be able to run view calls in parallel.)

BugFreeSoftware commented on July 29, 2024

I've been thinking about this and I don't think this is possible with the current Wasm VM. The current assumption with Wasm is: no threading, because it runs deterministic code. We load our Wasm SC code in an instance of the Wasm runtime, where it runs in isolation from other Wasm SCs. These separate instances can run completely in parallel. They have no way to interfere with each other at the Wasm level, and this parallelization is supported by the Wasm runtimes (Wasmtime/Wasmer/etc.).
But how does this work when a state update comes in while view calls are already running? That could seriously affect the results of the view calls if the state is updated halfway through a call. Or do we have a separate state copy from which the views take their data, which cannot change until all views are done? Also, even though the Wasm SCs run in separate VM instances, they ultimately share state, because the entire chain has only a single state. That means calls to core contracts can be made in parallel from those SCs and can potentially impact each other unless that is properly synchronized.

lmoe commented on July 29, 2024

I don't have much knowledge about the deeper logic there, but one thing that comes to mind is that events inside the contract don't get emitted right at the call, but rather after the SC function has ended.

Maybe it's enough to copy the state into some other location, which the callview API call then accesses? Then these values would only change after each function has ended, which would be fine, at least for my use case.

Kind of like the same behavior as with the events. But that doesn't solve the last issue Eric raised.

Maybe we could add something like an "uncertain state"?

dessaya commented on July 29, 2024

@BugFreeSoftware the protection from state updates is already in place, in the form of an optimistic reader. Whenever state is read during a view call, the optimistic reader panics if the state has changed, and the whole view call is then retried. See packages/webapi/webapiutil/callview.go.
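The retry-on-invalidation flow described above can be sketched roughly as follows. This is a minimal stand-alone sketch, assuming a sentinel error for invalidated state; ErrStateInvalidated and callViewWithRetry are illustrative names, not the actual wasp API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStateInvalidated stands in for the panic the optimistic reader
// raises when the state changes mid-read (hypothetical name).
var ErrStateInvalidated = errors.New("state was invalidated during read")

// callViewWithRetry re-runs the view call whenever the optimistic
// reader reports an invalidated state, up to maxRetries attempts.
func callViewWithRetry(call func() (string, error), maxRetries int) (string, error) {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		res, err := call()
		if err == nil {
			return res, nil
		}
		if !errors.Is(err, ErrStateInvalidated) {
			return "", err // real error: do not retry
		}
		lastErr = err // state changed mid-call: retry
	}
	return "", lastErr
}

func main() {
	attempts := 0
	res, err := callViewWithRetry(func() (string, error) {
		attempts++
		if attempts < 3 {
			return "", ErrStateInvalidated // simulate two invalidations
		}
		return "counter=1", nil
	}, 5)
	fmt.Println(res, err, attempts) // counter=1 <nil> 3
}
```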

To make it work with wasm, one idea is to have a pool of wasm interpreter instances: in WasmProcessor::Call(), take a free interpreter from the pool, use it to run the code, then return it to the pool. Would something like this work?
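The pooling idea could look roughly like this. This is a hedged sketch with made-up names (vmInstance, instancePool); the real WasmProcessor and interpreter types will differ, and a buffered channel is just one way to implement the free list.

```go
package main

import (
	"fmt"
	"sync"
)

// vmInstance stands in for one wasm interpreter instance (hypothetical).
type vmInstance struct{ id int }

// instancePool hands out free interpreters; a buffered channel doubles
// as the free list and the synchronization primitive.
type instancePool struct {
	free chan *vmInstance
}

func newInstancePool(n int) *instancePool {
	p := &instancePool{free: make(chan *vmInstance, n)}
	for i := 0; i < n; i++ {
		p.free <- &vmInstance{id: i}
	}
	return p
}

func (p *instancePool) acquire() *vmInstance  { return <-p.free }
func (p *instancePool) release(v *vmInstance) { p.free <- v }

// call takes a free interpreter from the pool, runs the view on it,
// and returns the interpreter to the pool when done.
func call(p *instancePool, view func(*vmInstance) int) int {
	vm := p.acquire()
	defer p.release(vm)
	return view(vm)
}

func main() {
	p := newInstancePool(4)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = call(p, func(vm *vmInstance) int { return i * 2 })
		}(i)
	}
	wg.Wait()
	fmt.Println("all view calls completed; free instances:", len(p.free))
}
```

Acquiring blocks when all instances are busy, so at most n view calls run concurrently while the rest queue up.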

BugFreeSoftware commented on July 29, 2024

I have been thinking about that, and we can probably do better. It's not the code that is the problem but its re-entrancy. We already save the memory state after initialization is done and restore it before every function call, because each call needs the data from initialization but should be completely deterministic after that. I'm going to figure out whether it is possible to call functions in parallel on the Wasm VM by giving each function call its own memory. It could be that we are reusing the same memory block for every call, in which case parallel calls become an issue; if each call has its own memory, that should not be a problem. After that, it's only a matter of verifying that the interface to ISCP is re-entrant as well and does not rely on any state. I thought I removed all of those dependencies, but I may have missed one. Will investigate.
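The per-call memory idea can be illustrated in miniature: save a snapshot of memory after initialization, and give every call a private copy so parallel calls cannot interfere. Purely illustrative types, assuming the runtime lets us copy linear memory; the real Wasm VM manages this itself.

```go
package main

import "fmt"

// vm stands in for a Wasm VM instance (hypothetical type).
type vm struct {
	snapshot []byte // memory state saved right after initialization
}

func newVM(initMem []byte) *vm {
	snap := make([]byte, len(initMem))
	copy(snap, initMem)
	return &vm{snapshot: snap}
}

// callWithOwnMemory runs f on a private copy of the snapshot, so
// concurrent calls never share (or corrupt) a memory block.
func (v *vm) callWithOwnMemory(f func(mem []byte) byte) byte {
	mem := make([]byte, len(v.snapshot))
	copy(mem, v.snapshot)
	return f(mem)
}

func main() {
	v := newVM([]byte{1, 2, 3})
	r := v.callWithOwnMemory(func(mem []byte) byte {
		mem[0] = 42 // mutation stays private to this call
		return mem[0]
	})
	fmt.Println(r, v.snapshot[0]) // 42 1
}
```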

BugFreeSoftware commented on July 29, 2024

> Adding a mutex in WasmProcessor::call() fixes the panics:

No it doesn't. It will block any recursive call. Several tests will hang indefinitely.

lmoe commented on July 29, 2024

What did work was to put the mutex here: https://github.com/iotaledger/wasp/blob/roulette_poc/packages/webapi/webapiutil/callview.go#L17

It's not great, but at least for the web API it seems to make things work in the end.
I believe NewFromChain is not thread-safe at some point, since putting the mutex anywhere below brings up issues again.
Maybe we are referencing some data structure instead of copying it?

BugFreeSoftware commented on July 29, 2024

Fixed the problem for now: a mutex per SC processor, which can handle nested calls. We can have multiple different SCs running in parallel because each uses a different processor. This should fix our problem until I can get multiple instances per processor running.
@lmoe can you please try to reproduce the WasmTime crash again with this code, to see whether this fixes that one too?
