Implement thread-safe ECS (amethyst, closed)

amethyst commented on August 24, 2024
Implement thread-safe ECS


Comments (84)

JeanMertz commented on August 24, 2024

Just wanted to note that this was a super fascinating discussion. Lots of information here that's useful for someone figuring out what patterns there are, and how to handle them. Thank you all for spending your time here ❤️

kvark commented on August 24, 2024

Heads up everyone - I was able to implement an ECS design that fulfils my requirements:
https://github.com/kvark/parsec

Thread-safe parallel processing is there, with not even a char of unsafe code. Still got to figure out entity/component recycling though, so I will follow the thread.

[deleted] commented on August 24, 2024

@White-Oak I have mixed feelings on Unique ID vs (Generation, index).

I think unique IDs are less error-prone and much simpler to manage; allocation is just incrementing a number. You are guaranteed that an ID will never get reused, you don't have to track dead indexes, and you don't need to figure out ways to allocate older generations before new generations (to avoid overflowing the generation counter).

That being said, it is really hard to figure out a great way to store UID => Component efficiently. The fastest strategy is going to be a hashmap if get/set are your primary modes of operation. Hashmaps are slow when it comes to joining two sets of components.

use std::collections::HashMap;

fn join(a: &HashMap<u64, Foo>, b: &HashMap<u64, Bar>) {
  // Walk one map and probe the other; every probe costs a hash lookup.
  for (uid, foo) in a {
    if let Some(bar) = b.get(uid) {
       /* Do something with foo and bar */
    }
  }
}

If you use an ordered map like a BTreeMap you can get around this. I experimented with this for a little while; it resulted in the ordered_iter crate (it got migrated, losing the commits where I wrote it :/ ).

Using it you can do better:

use std::collections::BTreeMap;

// inner_join_map is provided by the ordered_iter crate mentioned above.
fn join(a: &BTreeMap<u64, Foo>, b: &BTreeMap<u64, Bar>) {
  for (uid, (foo, bar)) in a.iter().inner_join_map(b.iter()) {
     /* Do something with foo and bar */
  }
}

But a BTreeMap is about 3-4 times slower than a HashMap for fetching/insertion/deletion, at least the one included in the standard library. It's at the point where I have given up on it as a possible solution; anything that requires log(n) lookups seems too slow in my opinion.

Using an index/generation approach, get(idx)/set(idx) can be fast since it is a vector index operation. Vectors are also naturally ordered, meaning joining two vectors is a matter of incrementing two pointers, which is very cache friendly.

fn join(a: &Vec<Option<Foo>>, b: &Vec<Option<Bar>>) {
  // Both vectors are indexed by entity, so a zip lines the slots up.
  for (foo, bar) in a.iter().zip(b) {
    if let (&Some(ref foo), &Some(ref bar)) = (foo, bar) {
       /* Do something with foo and bar */
    }
  }
}

Of course, it should be obvious that the Vec join could be very slow if a and b are big vectors with hardly any components actually assigned. You can get around this with fancier data structures (I have been playing with radix trees).

The generation need not be short either. It could be 40 bits, with 24 bits used for the index. A 40-bit generation takes 12 days of being incremented a million times a second to wrap. This gives you both a unique ID and an index. But of course, you still need to deal with the problem of how to efficiently reuse indexes.
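To make the packing concrete, here is a minimal sketch of that 40/24 split (all names are illustrative, not from any existing crate):

const INDEX_BITS: u64 = 24;
const INDEX_MASK: u64 = (1 << INDEX_BITS) - 1;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Entity(u64);

impl Entity {
    fn new(generation: u64, index: u32) -> Entity {
        // Generation in the high 40 bits, index in the low 24.
        Entity((generation << INDEX_BITS) | (index as u64 & INDEX_MASK))
    }
    fn index(&self) -> usize {
        (self.0 & INDEX_MASK) as usize
    }
    fn generation(&self) -> u64 {
        self.0 >> INDEX_BITS
    }
}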

The other problem with the index approach is that you need an efficient way to deal with deletions. Having a component attached to your new entity because it was left over from the previous generation is probably what @White-Oak was getting at when talking about non-determinism. The design of the ECS allows for use-after-delete bugs. You can get around this by storing the generation with each component, so you can skip dead generations. But all in all, it's a non-trivial design if you want to cover all the little corner cases.

There could be other methods. SQLite, for example, makes a distinction between the Primary Key and the Row ID: the Primary Key is universally unique, while the Row ID can be modified to better compact the data. Using a design like this, components are stored as row id => value, so the lookup would be entity => row id => component. I haven't thought such a design through; it just came up when I was researching.
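A minimal sketch of that two-level lookup, with invented names (the row table can be compacted without invalidating entity UIDs):

use std::collections::HashMap;

struct Table<T> {
    row_of: HashMap<u64, usize>, // entity UID => row id
    rows: Vec<T>,                // densely packed component data
}

impl<T> Table<T> {
    fn get(&self, uid: u64) -> Option<&T> {
        // Two-step lookup: entity => row id => component.
        self.row_of.get(&uid).map(|&row| &self.rows[row])
    }
}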

((Sorry for the wall of text, I got ahead of myself))

Edit:

I forgot to ask: in the case of a generation mismatch, is the preference to panic!, or to just get None back? I tend to see this case as a bug, which is why panicking may make sense, but I'd love to hear other opinions.
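For illustration, a hypothetical storage implementing the None option, reusing the Entity sketch above (names invented):

struct Storage<T> {
    slots: Vec<Option<(u64, T)>>, // (generation that wrote the slot, component)
}

impl<T> Storage<T> {
    fn get(&self, e: Entity) -> Option<&T> {
        match self.slots.get(e.index()) {
            // Hand the component out only if the generations agree;
            // a stale handle silently yields None instead of panicking.
            Some(Some((gen, value))) if *gen == e.generation() => Some(value),
            _ => None,
        }
    }
}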

kvark commented on August 24, 2024

I tried to systematize some of the topics we discussed here. See https://gist.github.com/kvark/168b132818aa6f6ef4db

Please let me know if I missed something. Note: the HashMap vs Vec vs RadixTree choice is not listed, since a good ECS is supposed to be abstracted away from the particular component storage implementation.

kvark commented on August 24, 2024

Found this GDC talk in a Vulkan session: https://youtu.be/xXyZ4YaktyU?t=4983
It's about the Nitrous engine, and they are using message-based communication between jobs. This doesn't really have much to do with ECS, but it does support @OvermindDL1's approach.

White-Oak commented on August 24, 2024

@kvark once again, I don't believe in ID recycling in a deterministic engine.

kvark commented on August 24, 2024

Please remind me, what does ID recycling have to do with determinism? From the system's point of view a recycled ID is as new as a non-recycled one. The semantics are the same; the difference is only in the implementation.


kvark commented on August 24, 2024

@OvermindDL1

Oh please be an opponent, any bugs or design flaws must be talked about or they may not be resolved. :-)

Now we are talking!

Honestly I never experienced such an issue, but that may just be due to the way I designed things on top of the ECS, so it never came about. Could you give an example?

  1. The input system receives a key press, and applies some changes to the speed vector of an object.
  2. The physics system moves this object, processes collisions, and adjusts its position and orientation.
  3. The rendering system updates the transformation matrix and draws the object.

It's a trivial example, but you can see how common this can be. It takes 3 update cycles for this ECS to propagate changes from the input to the screen. Optimally, there should be only 1.

If I have hard dependencies, so that Component A depends on a Component B always existing, then when something adds Component A an event is immediately fired

Efficiently adding/deleting components or entities is also a hard problem, but that's not what I initially expressed my concern about. Changeset-based adding/deletion seems very convenient, but again has the same latency problem. Think of a bullet fired from a gun - you don't want to delay it even for one frame.

I use events. I am not really happy about events, but they solve a lot of issues very easily, regardless of my bit of hate for the design...

Are events being used internally in your ECS implementation? Do you send an event for every component modification? This seems rather heavy.

rphmeier commented on August 24, 2024

You may be interested in my (very early stages, very experimental) similar project here:
https://github.com/rphmeier/rust_toy_engine/tree/master/src/ecs

I've done quite a bit of thinking about parallelizing processors (as I call them) over the past few months, and I've arrived at the following conclusions:

Processors will often have a hierarchy of priority, where certain processors must be run before others.
Processors which do not conflict (i.e. do not access any of the same data) can be run concurrently with no issue. Most general processing can be split up into a "read" phase and a "write" phase.

From this I came up with the following design:

  • Internal game state (entity manager, component data blobs) is wrapped in an RwLock.
  • Every processor is passed a WorldHandle, which consists of a reference to the locked state as well as a handle to a thread pool.
  • The WorldHandle can be used to iterate over all entities with a given set of components either asynchronously or synchronously, with only read access to their data.
  • This iteration will internally produce a queue of actions, which can be consumed during the processor's write phase.

I am still in the middle of implementing the parallel iteration, as well as the MPSC queue, but the intent
is to have a standard processor look like this:

// The particle component this uses.
struct Particle {
    pos: Position,
    effect: Effect,
    frames_left: u32,
}

impl Component for Particle {}

struct ParticleProcessor {
    graphics: Graphics,
}
impl Processor for ParticleProcessor {
    fn process<'a, 'b, C: Components + 'a>(&'a mut self, world: WorldHandle<'a, 'b, C>) {
        enum Action {
            // draw the effect and decrement the frame count
            Decrement(Entity),
            // destroy the entity.
            Destroy(Entity)
        }

        // read phase is here -- no writes will occur while this is processing.
        let actions = world.all_with::<Particle>().for_each_async(|e| {
            // entities here are guaranteed to have the particle component.
            let part = world.get_component::<Particle>(e).unwrap();
            if part.frames_left != 0 {
                Action::Decrement(e)
            } else {
                Action::Destroy(e)
            }
        });

        // write phase is here. get exclusive access to the world.
        let mut write = world.write();

        for action in actions {
            match action {
                Action::Decrement(e) => {
                    let mut part = write.get_mut_component::<Particle>(e).unwrap();
                    part.frames_left -= 1;
                    draw_particle(&mut self.graphics, part.pos, part.effect);
                }
                Action::Destroy(e) => {
                    write.destroy_entity(e);
                }
            }
        }
    }
}

From this example we can see that there are a few different degrees of parallelism to processors.
Some can perform their reading phase asynchronously and then are required to do some work on a specific thread (graphics processors generally come to mind). Some will be able to do neither phase in parallel. Hopefully most will be capable of running purely in parallel with other processors, locking only to get exclusive access to the state.

Although the example here is somewhat contrived, the main idea is that as much work should be done
during the read phase as possible, deferring updates until the later write phase. Once a processor is ready to enter its write phase, it should be very quick to simply apply its updates to the state and finish up.
Actual execution of processors looks something like this:

let mut my_world = ...;

my_world.process(|ctxt| {
    // all processors executed in this group will execute purely asynchronously, except for
    // contention during their write phases.
    ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });
    // execution returns here only once all grouped processors have completely finished,
    // guaranteeing an order of events.

    // these processors may use some parallelism internally,
    // but are not able to be sent to other threads to execute.
    ctxt.process_sync(&mut D);
    ctxt.process_sync(&mut E);

    // E is only run once D is fully complete.
});

All this being said, it seems like your initial reaction has elements similar to mine, but it focuses on a priority of events rather than a priority of processors.

ebkalderon commented on August 24, 2024

Great information! Seems like our approaches are indeed pretty similar, for the most part. For further comparison, here is an academic paper called A Concurrent Component-Based Entity Architecture for Game Development (2015). It arrives at a similar conclusion to ours:

  1. Process: Equivalent to our "Processors".
  2. Past State: Refers to the read-only world state from the previous frame read by Processes.
  3. Future State: Refers to the modified world state being generated by the Processes during the current frame.

If possible, I prefer a lockless approach to parallelism, taking advantage of the Rust ownership system to catch data races at compile time rather than at runtime with a mutex or similar locking data structure. However, I am open to using locks if concrete benchmarks prove better performance.
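As a minimal sketch of that idea (the World type and systems here are invented, not Amethyst's API; thread::scope requires Rust 1.63+): if the world is split into disjoint &mut borrows, two processors cannot race by construction, with no locks involved.

use std::thread;

struct World {
    positions: Vec<f32>,
    colors: Vec<u32>,
}

fn run_parallel(world: &mut World) {
    // Destructuring yields disjoint &mut borrows; capturing both
    // fields mutably in one closure would be a compile error.
    let World { positions, colors } = world;
    thread::scope(|s| {
        s.spawn(move || positions.iter_mut().for_each(|p| *p += 1.0));
        s.spawn(move || colors.iter_mut().for_each(|c| *c ^= 0xFF));
    });
}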

rphmeier commented on August 24, 2024

@ebkalderon Thanks for the paper. I hadn't seen it before, but it seems promising. I am wary of the idea of using N "past" states and a "future" state -- this necessarily means copying the game state N times per frame to different points in memory. Of course, there are situations like networking where copying the game state, or part of it, is necessary.

One issue I have with the paper is the scheduling aspect of processors. The author chooses to assign each processor purely to one thread, using a task-parallel model rather than a data-parallel one. A large portion of the scheduling section is concessions that processor time is unpredictable, and there is likely to be a decent amount of wasted CPU time due to that unpredictability. I favor a work-stealing approach, where parallelizable processors simply decide which data they need, how to process it, and let all the threads work as fast as they can in order to do that. There is no excess time spent on scheduling or waiting for overloaded threads to finish.

I also am not completely convinced that purely lock-free is even viable here: using your model, at some point the threads must contend for access to the MPSC queue which produces component updates. It might be possible to implement such a queue in a lock-free way, but consider that the cost of locking an uncontended mutex is often vastly overestimated (it is frequently not much more than an atomic compare-and-swap).

The key to any multithreaded ECS is determinism: the same state S0 passed through the same processors should always lead to an identical output state S1. My initial approach was similar to yours, where every processor produced some amount of changes to the state, to be applied later. I toyed with a concept of "commutative" and "non-commutative" changes which could be made to component data -- these are strangely similar to Rust's borrowing rules:

  • you could either make N commutative updates to an instance of a component, or
  • you could have one single non-commutative update to the instance.

To break these rules would lead to a runtime error. The implementation of this design led to a few glaring issues:

  • defining updates to components at the type level was unreasonably verbose
  • it was difficult to defer changes until post-processing efficiently

One data model I toyed with for complete concurrency was giving every component type its own data buffer, rather than storing all of an entity's component data together in a block. If you protect each of these buffers with a mutex of its own, processors which do not access the same component data can interleave with zero contention. The caveat is that the entity manager is a resource which many processors will contend for access to, so it became the bottleneck.
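A minimal sketch of that model with invented component types; a processor touching only positions never contends with one touching only velocities:

use std::sync::Mutex;

struct Position { x: f32, y: f32 }
struct Velocity { dx: f32, dy: f32 }

struct World {
    positions: Mutex<Vec<Option<Position>>>,
    velocities: Mutex<Vec<Option<Velocity>>>,
    // The entity manager (not shown) remains the shared bottleneck.
}

fn nudge_positions(world: &World) {
    // Locks only the position buffer; velocity-only processors proceed freely.
    let mut positions = world.positions.lock().unwrap();
    for slot in positions.iter_mut() {
        if let Some(p) = slot.as_mut() {
            p.x += 1.0;
        }
    }
}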

I am curious what your mental model for the types of updates is currently. How does the user specify which component types the world manages and what order updates should be applied in? What type do updates have? Where are they stored? How are they iterated over to be applied?

ebkalderon commented on August 24, 2024

Thanks for the response! Just to clarify, there is no copying of the world being done. There is only the existing world state and the future world state being computed by the processors. The reason I contend that my proposal could be made lock-free is that it is task-parallel (AFAIK, a performance disadvantage) rather than data-parallel, with discrete parallel-read and serial-write stages, where the processors' write priority is determined by the order in which they were passed into the Application through a builder pattern.

That said, I appreciate you explaining your position. The data-parallel design you described seems sound and probably more efficient (concrete benchmarks would have to be made to compare both to be sure). I welcome more feedback on both approaches, or other alternatives, from other participants before the final decision is made.

HeroesGrave commented on August 24, 2024

Avoid trying to handle process scheduling. If any kind of scheduling is required by the final design, provide the tools to allow the user of the library to do it.

They'll be the ones that are able to properly benchmark different schedules in order to get the best performance for their particular game.

kvark commented on August 24, 2024

@HeroesGrave nice to see you here!
I don't think your advice applies 100% in this context. I understand why you'd want to avoid scheduling in ecs-rs, since it's purely an ECS library. Amethyst, on the other hand, is a game engine, so it might as well take the responsibility of scheduling the systems.

mangecoeur commented on August 24, 2024

Some thoughts from my attempts to parallelize a Flocking simulation:

  • If two systems affect the same entity properties (e.g. position) they must run in a deterministic order, and so cannot run in parallel on the same entity. If one system needs access to all entities to do its thing, that precludes any other system from running in parallel, and it limits the amount of parallelism the current system itself can use, because it must avoid changing the world that it is using as an input to calculate its next update.
  • In the flocking example, the acceleration vector for each entity depends on the positions and velocities of its neighbours - if you iterate-and-write, you change positions mid-update, which messes with the simulation. You need access to all entities in order to calculate the update for each entity. One crude solution is to keep two copies of the entities and switch between them - one is read-only, the other write-only, and you 'blit' them on each update (see the sketch after this list). You can then parallelize as much as you like, since you know you only ever write to one boid at a time and shared reads are fine. You could even simulate several frames in advance if you like and invalidate-on-input. You can optimize using spatial data structures, which could also use parallel builders.
  • From the flocking example, I feel that parallelism should be left up to the System implementations, because they know much more about what they are trying to do (flocking, physics, etc.) and therefore can be much smarter about how (or whether) to run it in parallel - including doing fancy things like pushing physics work to the GPU.
  • The job of the ECS would be to make writing parallel systems easier and more memory efficient. For example, for flocking you only need access to the position and velocity vectors. The ECS could let you "borrow" these from the World, with a much lower memory footprint than copies of the whole world.
  • In addition, this would enable parallelism for Systems that are guaranteed not to affect the same entity components. For instance, one system could affect only the "Color" components of the World's Entities, while another affects the "Position" components. Using Rust's borrowing system, it could guarantee that nothing tries to write to both in parallel.
  • End result: the ECS runs in parallel those Systems that can be statically guaranteed not to tread on each other's toes. Systems that cannot run in parallel run one after the other, but internally can use memory-efficient access to borrowed properties to parallelize their own workings.
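A minimal sketch of the double-buffering from the second bullet, with an invented Boid type: read from the front buffer, write to the back buffer, then swap.

#[derive(Clone, Copy)]
struct Boid { pos: f32, vel: f32 }

fn step(front: &[Boid], back: &mut [Boid]) {
    // Computed purely from the read-only copy, so shared reads are safe.
    let avg_vel: f32 = front.iter().map(|b| b.vel).sum::<f32>() / front.len() as f32;
    for (i, out) in back.iter_mut().enumerate() {
        // Each iteration writes only its own slot in `back`.
        let me = front[i];
        *out = Boid { pos: me.pos + me.vel, vel: 0.5 * (me.vel + avg_vel) };
    }
}

fn simulate(buffers: &mut (Vec<Boid>, Vec<Boid>)) {
    step(&buffers.0, &mut buffers.1);               // read old, write new
    std::mem::swap(&mut buffers.0, &mut buffers.1); // 'blit' by swapping
}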

Oflor commented on August 24, 2024

Isn't there a way to implement systems which do not affect (write to) the same components?

ebkalderon commented on August 24, 2024

@Oflor Honestly, that is going to be difficult. Most notably, the physics processor, the particle processor, and the animation processor all modify entities' transform data. One could break the components down into ever more granular and specific pieces, but this introduces a host of unnecessary complexities (multiple transform components, syncing duplicate data between them, etc.).

I personally prefer to establish and enforce strict order dependencies between processors, either similar to @mangecoeur's or @rphmeier's approaches, or according to my original proposal.

ebkalderon commented on August 24, 2024

The user-facing API design of the ECS has been drafted on the ECS Design page on the wiki. Please leave your feedback here or on Gitter!

Edit: Updated link.

mangecoeur commented on August 24, 2024

Just had a look at the proposed usage. One thing I'm not keen on is that the proposed way of adding Processors doesn't make it sufficiently explicit that they will run in the order they are added, so that something like this:

.with(Input::new())
.with(Physics::new())

will not necessarily have the same result as this:

.with(Physics::new())
.with(Input::new())

Since there was some emphasis on making everything predictable and deterministic, I think it's important that this is really clear.

Some alternative suggestions:

  • More explicit naming might help, like .append_processor(Physics::new()).
  • Create a vector of processors or processor references and add them, something like:
let processors = vec![Physics::new(), Input::new()];
world.add_processors(processors);
  • Create a list of processors using a macro (since vec! might not work as desired):
let processors = processors_group!{Physics::new(), Input::new()};
world.add_processors(processors);

mangecoeur commented on August 24, 2024

Also a question: how are Processors associated with the Components that they need to process?

lschmierer commented on August 24, 2024

Since there was some emphasis on making everything predictable and deterministic, I think it's important that this is really clear.

Since we want to run our ECS multi-threaded, will this order be used for sequential writes to the world, while reads always happen simultaneously?

Or do we want to control which systems are run in parallel?
Therefore I would suggest something like this:

SimulationBuilder::new()
    .append(Rendering::new())
    .append_parallel(Physics::new(), Ui::new())
    .finalize()

Also a question: how are Processors associated with the Components that they need to process?

I think processors simply get a reference to the world and query the components they need to access.

Proposal for the API of the world:

impl World {
    /// Creates a new empty world.
    pub fn new() -> World;

    /// Creates a new entity in the world and returns a handle to it.
    pub fn create_entity(&mut self) -> Entity;

    /// Destroys a given entity and removes its components.
    pub fn destroy_entity(&mut self, entity: Entity);

    /// Attaches a component to an entity.
    pub fn insert_component<T: Any>(&mut self, entity: Entity, comp: T);

    /// Remove a component from an entity.
    pub fn remove_component<T: Any>(&mut self, entity: Entity);

    /// Get a component by entity.
    pub fn component<T: Any>(&self, entity: Entity) -> Option<&T>;

    pub fn component_mut<T: Any>(&mut self, entity: Entity) -> Option<&mut T>;

    /// Iterate over components by type.
    pub fn components1<A: Any>(&self) -> impl Iterator<Item = (Entity, &A)>;
    pub fn components2<A: Any, B: Any>(&self) -> impl Iterator<Item = (Entity, &A, &B)>;
    pub fn components3<A: Any, B: Any, C: Any>(&self) -> impl Iterator<Item = (Entity, &A, &B, &C)>;
    // some more...

    pub fn components1_mut<A: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A)>;
    pub fn components2_mut<A: Any, B: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A, &mut B)>;
    pub fn components3_mut<A: Any, B: Any, C: Any>(&mut self) -> impl Iterator<Item = (Entity, &mut A, &mut B, &mut C)>;
    // some more...

}
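A short usage sketch against the proposed signatures (assuming Position and Velocity are plain structs implementing Any):

let mut world = World::new();
let e = world.create_entity();
world.insert_component(e, Position { x: 0.0, y: 0.0 });
world.insert_component(e, Velocity { dx: 1.0, dy: 0.0 });

// A movement "processor" queries exactly the pairs it needs:
for (_entity, pos, vel) in world.components2_mut::<Position, Velocity>() {
    pos.x += vel.dx;
    pos.y += vel.dy;
}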

kvark commented on August 24, 2024

(sorry if I missed something in this thread, I haven't studied it all the way through)

ctxt.process_group(|group| {
        group.process(&mut A);
        group.process(&mut B);
        group.process(&mut C);
    });

and:

    .append_parallel(Physics::new(), Ui::new())

Both of these approaches enforce a fork-join model at the API level, which imposes a tight upper bound on the performance you can squeeze out of a parallel system, since it chooses a rather tiny subset of all possible execution graphs. For example, what if Ui doesn't need to wait for the previous operation while Physics does?

Optimally, each system just specifies the dependencies it needs to start, and then some sort of scheduler figures out when to start it. We did some experiments with @csherratt on fibe-rs; you could use its example for inspiration:

    let ha = task(move |_| {print!("Hello, ")}).start(&mut front);
    task(move |_| {println!("World!")}).after(ha.signal()).start(&mut front);

I think processors simply get a reference to the world and query the components they need to access.

If you don't explicitly expose which components are being read and which are modified per system, you don't give the scheduler an opportunity to figure out the proper timeline for their execution.

One could specify the systems directly as dependencies, or build the dependency graph judging by which components are modified and read. The latter would work well if you could actually enforce that these and only these components are what each system receives for processing.
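One possible (purely illustrative) shape for exposing this information at run-time via TypeId; a real design might enforce it at compile time instead:

use std::any::TypeId;

trait System {
    fn reads(&self) -> Vec<TypeId>;
    fn writes(&self) -> Vec<TypeId>;
    fn run(&mut self /*, a world view limited to the declared components */);
}

// Two systems conflict (must be ordered) if one writes what the other
// reads or writes; otherwise the scheduler may overlap them.
fn conflicts(a: &dyn System, b: &dyn System) -> bool {
    let hits = |xs: &[TypeId], ys: &[TypeId]| xs.iter().any(|x| ys.contains(x));
    hits(&a.writes(), &b.reads())
        || hits(&a.writes(), &b.writes())
        || hits(&b.writes(), &a.reads())
}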

Oflor commented on August 24, 2024

Parallelizing over systems might be overkill if we already parallelize over entities. There are going to be more entities than systems (if not, we should be able to run it in a single thread anyway).

Specifying accessed components requires dealing with trait objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

mangecoeur commented on August 24, 2024

Specifying accessed components requires dealing with trait objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

Wouldn't this require that Components be defined by implementing a trait? That would make it difficult to create game objects by loading their definitions from declarative data files (we would be forced to define all entities in Rust source code).

If you don't explicitly expose which components are being read and which are modified per system, you don't give the scheduler an opportunity to figure out the proper timeline for their execution.

I think this is important - even if it does mean messing with TypeIds, it opens up a lot of doors for speeding up execution, which should offset the extra complexity.

Another thought: it may be possible to perform these checks at compile time using compiler plugins. Declarative game object definitions could be read in at compile time (instead of at runtime) to allow these checks to be made. It may even be possible to build reusable machinery that can perform the checks either at runtime or at compile time, so that while you prototype you can edit entities dynamically without waiting for a recompile every time, but when you move to release you can bake everything in. Just an idea...

kvark commented on August 24, 2024

@Oflor

Parallelizing over systems might be overkill if we already parallelize over entities. There are going to be more entities than systems (if not, we should be able to run it in a single thread anyway).

I hadn't considered that. At first sight, parallel entity processing would involve a lot of synchronization overhead. Like, you'd need some kind of dynamic sync before accessing each entity, as opposed to just starting a system. Another major problem would be cache contention - if components are close to each other in memory, and one of them gets modified by one thread, access to all other components in this cache line will be delayed significantly for other threads... In other words, I'm not convinced it's worth it. Parallelizing systems just seems like an easier task, and it gives rather good performance - that's what Bungie and Naughty Dog are doing, IIRC.

Specifying accessed components requires dealing with trait objects and TypeIds (I think), and that is already less efficient than compile-time system managers.

I'm pretty sure it should be possible to specify at compile time too. @mangecoeur proposed one way, via compiler plugins, but I'd also like to explore non-plugin approaches.

[deleted] commented on August 24, 2024
// Create the entity eid with the components Foo{}/Bar{}
let eid = ENTITY.with(Foo{}).with(Bar{}).create(&mut world);

// Update the component Baz
eid.with(Baz{}).update(&mut world);

The advantage of a builder API like the one shown above is that it does not lock the world object while you are building the list of components that you are modifying. This lets functions like this exist:

// do some update 
fn update(eid: Entity) -> EntityWith<(Foo,)> { eid.with(Foo{}) }

// in parallel update all entities and write the results into `world`
// not sure if this would actually work this elegantly (haven't got to trying it yet)
entities.parallel_map(update).write(&mut world);

Accessing the data could be done with queries on the world object.

// select every entity from world that has a Foo attached to it
for (eid, foo) in world.select::<(Foo,)>() {

}

// select every entity from world where Foo & Bar are attached to it
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>() {

}

// select every entity from world where Foo & Bar are attached and 
// it is a member of the entity set
for (eid, (foo, bar)) in world.select::<(Foo, Bar)>().only(&entity_set) {

}

The biggest problem with these queries (or any iterator for that matter) is that you can't actually modify the world object since it is immutably borrowed by the select.

The solution I have always reverted to is making it cheap to clone world and then modify the cloned copy. I did this through copy-on-write, but I'm not sure it is a great general solution as it has performance penalties.

Alternatively, systems could work in parallel on past copies and write their updates to a changelist. This can be annoying, since you can't see the writes you have made until they are committed to the world, but I know it can be made to work.
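A minimal sketch of such a changelist, reusing the select/with API from above (the Change enum, commit, and world.destroy are invented for illustration):

enum Change {
    AddBar(Entity, Bar),
    Destroy(Entity),
}

// Systems only read; every intended write is recorded instead.
fn run_system(world: &World, changes: &mut Vec<Change>) {
    for (eid, _foo) in world.select::<(Foo,)>() {
        changes.push(Change::AddBar(eid, Bar {}));
    }
}

// Applied later, once no reader is borrowing the world.
fn commit(world: &mut World, changes: Vec<Change>) {
    for change in changes {
        match change {
            Change::AddBar(eid, bar) => { eid.with(bar).update(world); }
            Change::Destroy(eid) => { world.destroy(eid); }
        }
    }
}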

ebkalderon commented on August 24, 2024

@csherratt We were initially going to go with the change list model, but eventually decided against it because of suspected latency. However, we haven't done any concrete profiling to confirm this. Have you tested this and/or the locking method before? If so, what were your conclusions?

rphmeier commented on August 24, 2024

@csherratt's approach sounds really similar to the basic "queries" I've been imagining while experimenting. I can try and work a little on the implementation, to see how feasible they really are. If user-specified storage of component data is allowed (e.g. positional data structures), they can also compose with custom filters associated with that storage to get the entities desired with maximum efficiency.

The main issue with cloning the world is that it can get extremely expensive when you've got a lot of entities, and a lot of data to go with them. Even if it were as simple as a raw memcpy, you'd probably end up with each thread making one copy for itself. With a lot of data, you'd end up wasting a lot of time just copying it each frame. This would probably dwarf the cost of having a change list, walking it, and applying changes. I would strongly advocate taking a lock-based approach, since mutexes without contention are practically free. This does mean that whatever way we schedule processors, it should prioritize minimizing contention over data.

@kvark I do think you're right that having an implicitly fork-join execution model limits the space of execution graphs drastically. However, it also works simply to minimize contention over components, and the subset of execution graphs which it does allow are generally efficient. The alternate approach is what you said above: have a signal-based approach where processors wait to be "kickstarted" by their dependencies. This does mean that each processor needs to be aware of not only its dependencies, but also its successors. Can we easily encode this information without much boilerplate?

[deleted] commented on August 24, 2024

You could define a system based on its inputs and what it modifies. That knowledge is useful to the scheduler, as @kvark has said.

So, where does this lead? If a system is defined by what its inputs are, the system is pending on those inputs becoming stable before it can process. If a system modifies one or more components, it is both waiting on those components to be stable and producing a new version of them.

A Future is an exact match for this model. A system is run when all of its component futures are ready, and when it completes, a number of futures are marked as ready and the other systems that depend on those components are run. If two systems that don't depend on each other's output are run back to back, they run in parallel. If a system does depend on a component from another, it implicitly waits until the upstream system has finished before running itself.

The ordering follows the order in software in the case of a conflict.

let mut world = World{};
loop {
  // run is a magical function that in practice would probably take more boilerplate than this.
  // I think it is possible to write, but have not attempted to yet.
  world = run(world /* world is moved here */, system_a);
  world = run(world, system_b);
  world = run(world, system_c);
  world = run(world, system_d);
}

@rphmeier about the copying, I said it would be copy-on-write, meaning that when you do an insert or delete, the data structure only copies the nodes that are shared with a read-only copy. It's not free, as the overhead is pushed to the writer (in the form of checks), but it is very convenient to work with, since cloning is just bumping an atomic reference count.

I'm not a fan of locking. It's not that the locking itself is expensive. There are two problems:

  1. Multiple writers wanting to update the same component have non-deterministic ordering.
  2. Writers wait behind readers, stalling their progress. If Foo is shared with 10 systems and one system needs to modify Foo, it has to wait for every other system holding a read lock on Foo to release it before it can make progress. It takes just one slow system to stall everything. With the futures, only writers can stall readers, which is still a problem, but there are fewer writers than readers.

kvark commented on August 24, 2024

@csherratt Are you thinking about the Future as a concept (based on Pulse), or usable as is from the standard library? So somewhere we'd have each component storage associated with a Future, which gets passed to the next system that reads/writes this component. If a system modifies it, it replaces the Future with the new one, so that the next system will need to wait. Do I understand your idea correctly?

I also realized that the scheduling problem can be simplified considerably. Sorry if this is obvious to someone, but I feel it's important to mention explicitly: we don't have to implement a full-featured scheduler for the ECS systems. Instead, we follow the order in which the user specifies the systems and the components they modify. Each system either waits or executes in parallel, depending on whether or not its input components are locked for writing and its output components are still being read.

This is very simple for the user: they don't have to think about it, but they can still fine-tune the execution order according to their specific requirements that the scheduler wouldn't know about anyway ;)
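As a toy illustration of "each system either waits or executes in parallel", one could guard each component storage with an RwLock (this sketch deliberately ignores the deterministic ordering guarantee and thread pooling; all names invented):

use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let positions = Arc::new(RwLock::new(vec![0.0f32; 1024]));
    let velocities = Arc::new(RwLock::new(vec![1.0f32; 1024]));

    let (p, v) = (positions.clone(), velocities.clone());
    let physics = thread::spawn(move || {
        let vel = v.read().unwrap();      // input: read lock
        let mut pos = p.write().unwrap(); // output: write lock
        for (p, v) in pos.iter_mut().zip(vel.iter()) {
            *p += *v;
        }
    });

    let v = velocities.clone();
    let ai = thread::spawn(move || {
        // Reads only velocities, so it can overlap with the physics read.
        let vel = v.read().unwrap();
        let _avg: f32 = vel.iter().sum::<f32>() / vel.len() as f32;
    });

    physics.join().unwrap();
    ai.join().unwrap();
}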

Going back to @csherratt's run function: why does it need to receive the world by move? Also, I'd expect it not to be a freestanding function, but to be called as scheduler.run(...).

So, given that the scheduling problem is addressed by relaxing the order and using a Future for synchronization, I see only one big fat problem left: how do we specify the in/out components for systems such that this information is 1) passed at run-time to the scheduler, and 2) directly enforced on the system implementation so it can only work with these components (at compile time, going back to the trait System topic)?

[deleted] commented on August 24, 2024

Yes, I was thinking of something like a Future via Pulse, specifically something I called a SharedFuture: basically a future that gives a read-only view of the data once it is ready, so multiple consumers can wait on it.

@kvark there is no problem with passing in &mut World and having the system modify it by swapping the pieces it cares about. I just posted an example of the first thing that came to mind.

The one tricky area I foresee is the enumeration of entities that are being created. It might be good enough to just use a mutex-protected object. We could do the same thing with futures, of course.

OvermindDL1 commented on August 24, 2024

Just coming into this while looking at Rust game ECS engines, but I thought it might be interesting to describe the multi-threaded ECS design I used in C++.

The 'world' (called a Context in mine) consists of the following pieces.

An Entity is just a 64-bit integer, where the low-order 32 bits are the index and the high-order 32 bits are a counter (incremented every time an index is reused, to 'nullify' old entities and make them invalid).

A Component is conceptually POD, although it was defined as something more like:

class PositionComponent : public Component<PositionComponent> {
    Position& getPosition(Entity entity);
    // etc...
};

Internally it could store its data in, say, a packed array if it is an often-used component (like Position above would be), or in a treemap, hashmap, or set (if it's just an empty component), etc., whatever is efficient for its use. I even have a component that changes how it stores data based on how many instances exist. This does require a bit of extra coding over a usual POD component, but it is not bad, and there are plenty of abstractions that can shorten it. I've found a Rust ECS someone made at https://github.com/HeroesGrave/ecs-rs and it has an interesting style of making components (defining them as either 'hot' or 'cold' to choose between a packed array and a hashmap; not quite as flexible, but an interesting style). However, these component stores also have other functionality for holding 'pending' changes (hence the required subclassing in this design). This will be explained more shortly.

And Systems: these are fairly normal, except they are non-mutating. Systems are run completely in parallel (every system can run on a different thread, and a system can even branch further and let batches of entities be processed on different threads; most systems can do this, and the few that require checking all watched entities generally cannot, but those are few). If a change needs to be made to a component - say an entity has had a DamageComponent added to it, and there is a DamageSystem that watches for entities that have it together with a HealthComponent - then instead of applying the change directly, the system does:

// Do tests to see if the DamageComponent will usefully apply damage to this HealthComponent based on other things; if not, it will do a changeset that just removes it
context.changeset(HealthComponent::And<DamageComponent>(), [=]() {
    const auto damage = DamageComponent.getDamage(entity); // Think of this as a Result type return in Rust, good to test that here since some other changeset before this one could have pre-deleted it before this changeset applies, even though it existed during system run.
    if(damage) {
        HealthComponent.alterHealth(damage.value); 
        DamageComponent.remove(entity);
    }
});

So while the systems are all running in parallel, the changesets are accumulated per thread, organized into a tree based on the first argument passed to context.changeset, so that, for example, anything that accesses HealthComponent will be run serially. After the systems are finished running, the tree-per-thread is merged into a single tree, which is then branched out across threads to apply the changesets to the components. I currently do not start running the changesets until all systems are finished, but I left it open so that I could later start running changesets for components that have no remaining systems touching them. Like https://github.com/HeroesGrave/ecs-rs I also have something Aspect'ish to advise the context of required components and 'potentially touched' components for a given system; this is also what builds up the first argument to the context.changeset function.

I have not really optimized it at all and there are obvious places in the code where I could do so, but I have yet to have a need; it saturates the threads pretty well as it is, since each thread just churns through the system processing lists and, when empty, grabs another from a non-locking/non-blocking atomic queue. It does take some programming discipline (i.e. do not mutate components within systems, only within changesets - though a better design could have prevented such possibilities, and it would be easier to do so from what I see of Rust). And thus far it far exceeds entityx in speed in any of the simple benchmarks I've done.

kvark commented on August 24, 2024

@OvermindDL1 thank you for sharing! So basically you got an ECS with:

  • generational recyclable entity IDs
  • changeset-based component updates, produced by fully parallel systems

I don't know if you read the previous discussion here, where I'm being an asshole opponent of changeset-based system processing. My main gripe is: how do you deal with the increased latency? E.g. if component A depends on B, and B depends on C, and C on D, your system would only propagate changes from A to D in 3 update cycles.

OvermindDL1 commented on August 24, 2024

@kvark Oh please be an opponent, any bugs or design flaws must be talked about or they may not be resolved. :-)

Honestly I never experienced such an issue, but that may just be due to the way I designed things on top of the ECS, so it never came about. Could you give an example? If I have hard dependencies, so that Component A depends on a Component B always existing, then when something adds Component A an event is immediately fired (only in the transaction processing step) that tells stored Aspects about the added Component A, but also notifies anything else listening to the event that Component A has been added to something - and there are often passive systems that will add such dependencies immediately, like so:

context.registerSystem(new DependencySystem<AComponent, BComponent>());

I use events. I am not really happy about events, but they solve a lot of issues very easily, regardless of my bit of hate for the design... I try to minimize their use in the 'game code', but they are used pretty heavily internally, especially with Aspects receiving entity list updates (although there are optimizations on them to prevent indirect function calls, unlike more generic events).

Oflor commented on August 24, 2024

@OvermindDL1 Why do you need the recycling part of the entity ID? If we make the ID a plain u64 which increments with every created entity, we don't need to store dead IDs for recycling, and ID creation becomes simpler.

kvark commented on August 24, 2024

@Oflor you are thinking too much in terms of HashMap-style access. Having the generational ID allows one to have linear access (as in a Vec) to components, which is a big fat performance win for processing (as I already mentioned here).

Oflor commented on August 24, 2024

@kvark the (generation, id) pair is a simplified Entity { id: u32, vec_idx: u32 } struct, which doesn't really follow the "Entity = ID" approach. There are other ways to implement fast linear access than this gen+id.

White-Oak commented on August 24, 2024

@kvark yeah, you're right, sorry. I was thinking about network races, but it is the same even w/o recycling, so it is not the problem.

kvark commented on August 24, 2024

@Oflor Don't take the "Entity = ID" approach as dogma. The idea is that an entity can be just a simple identifier of a collection of components, as opposed to actually containing the component data, or even storing separate identifiers for its components (which is the approach I took initially; I still haven't figured out why it's bad). In that sense, (Id, Generation) is a perfect identifier - it's small, hashable, comparable, etc.

OvermindDL1 commented on August 24, 2024

@Oflor Because I like to tightly pack things. For, say, a PositionComponent that uses a normal array indexed by entity, recycling keeps IDs low, so it can remain tightly packed with minimal wasted memory.

@White-Oak Not 'required', sure, but it is a fantastic optimization opportunity for both speed (cache compression) and memory (fewer wasted entries in component arrays).

@kvark :

1) The input system receives a key press, and applies some changes to the speed vector of an object.
2) The physics system moves this object, processes collisions, and adjusts its position and orientation.
3) The rendering system updates the transformation matrix and draws the object.
It's a trivial example but you can see how this can be common. It takes 3 update cycles for this ECS to propagate changes from the input to the screen. Optimally, there should be only 1.

In my system, (1) input is passed via Events, and as such systems can listen to them. For example, when the player is holding a certain item, it adds a component (empty in some cases, a callback in others) to the player Entity, and a System listens to those inputs (it stops listening when there are no Entities with the component). So when the player has that item and they right-click, that right-click is heard by that system, which then matches the keymapping, and if it matches, it runs the Action function on it if applicable, or whatever the system wants to do. Inputs are handled by one of the first systems to run, thus before all other systems, and it passes the events around as wanted, so when you click to place something down, it happens in the same frame. If it is movement, the system will set a changeset as appropriate at a higher priority level so it is run first and other changesets can do other things as necessary. This does mean that if something is stopped in the physics engine it may take an extra frame to wake up, but any 'changes' to an active object happen instantly.

For (2), there is a System that listens for the motion states from Bullet and applies the transforms as necessary in a higher-priority changeset. In the games I've made, such immediate motion is not really an issue, so I've not noticed a problem yet.

For (3), the rendering system renders in an event callback, not in normal system processing, so it gets the state of everything after the transactions are complete (technically everything is triggered by an event - the initial systems, the transaction processing, after-transaction processing, etc. I do not like events in this, but using them everywhere behind the scenes has solved a lot of issues for me).

Are events being used internally in your ECS implementation? Do you send an event for every component modification? This seems rather heavy.

I do use them internally. Events are in general fired at a few key points per frame (pre-start, start, processing, post-processing, frame-end) as well as on component addition and removal (heavily used by Aspects for keeping their internal lists synced; they are sorted so Aspects that want certain components do not hear of others, and no function pointers are actually used for them, to keep speed up), but not on modification, unless a system specifically sends one out - like a DamageSystem sending an event on Damage to something, which allows the event listeners to intercept, modify, or even cancel the Damage event. Aspects are the only place where I've customized the event system to be so specific, to remove the indirect function calls; the rest is an event system that is non-threaded for quick calls that do not need to be thread-safe, and another that is thread-safe but slower (technically boost::signals and boost::signals2 respectively). I always worry about speed issues with events - I dislike events in these contexts - but it has not been an issue for me in practice yet.

@csherratt Using a recycled index and a 'generation' part as mentioned above (just a counter for me, all packed into a single u64) simplifies a lot, and the counter in my system is per index (just an array of u32's in the Context/World; though honestly I'd probably just use a u32 as an entity ID, with the index being 24 bits and the counter 8, if I were to remake it, though I guess u64 with u32/u32 is safer). I always worried about the speed of invalidating the handles, but as of yet it has not been an issue.

As for joining, I would not recommend that; when I have 200 components and 500 systems with a few tens of thousands of entities, it sounds horribly slow. That is why I use this Aspect registration that gets updated in each individual system, so each keeps its own list. Basically, in the Context/World there is a Map (I think I currently have it as an RB tree; a hashmap may work better in my case, but eh) of (shared-weak) pointers to an object that holds a dynamic array (std::vector in C++) as well as another Map. The outer Map's keys are component IDs; the inner Map's keys are also component IDs, and its values are (shared-weak) pointers to more of that same object. Basically, if an Aspect is registered and it wants to listen for entities that get a HealthComponent added or removed, then it gets an existing object if one exists, or creates a new one if not, and holds a pointer to it; when the system updates, it accesses its Aspect-linked array for the list of entities that have it. If an Aspect wants to listen for entities that get a HealthComponent and a DamageComponent added or removed, then it looks in the map for the object under the highest-comparing name (DamageComponent in this case, though not for the reason you may think), gets its object, then gets/creates another object in its map for the HealthComponent, and holds a shared pointer to that. It is a little heavy to generate at first, but it is only done at Context/System initial loading, acting as a shared cache after that. I could optimize it more - have it build from a pool of memory or something - but again, it has not been an issue for me yet, so I have not. When a component gets added/removed, the Context checks down those lists for what needs updating (hence skipping the event and function calls to update each Aspect individually, which is how an old version of mine worked; that did have a bit of an overhead cost due to repeated branch-prediction failures and indirect function calls).

The names of the components that I mentioned above are not strings; rather they are something I call an Atom, which I use like this:

// These all compile to a u64 at compile time, not a string
"DMG"_atom64;
"Health"_atom64;
"Blah"_atom64;

// In C++ I can even do this and it will choose the correct branch:
switch(something) { // 'something' is an Atom64, which is just a typedef from a u64
    case "Testing"_atom64: break; // Compiles to a case of a u64
    default: break;
}

I use Atom64's all over the place, including in my events, which require a function signature like:

void callback(Atom64 eventID, EventData eventData);

Where EventData is just an RB tree, since it tends to have fewer than 10 entries, where the key is an Atom64 and the value is a Variant. The engine also has a Lua(JIT) binding layer, which is why I do this (scripts can also define components and systems, as well as receive and send events), so the Variant map is very convenient (I use them everywhere as well).

My code for handling Atom64's is this (I stripped the highly compressed Atom32 code and other stuff):

namespace OverLib {

namespace StringAtom {

namespace detail {

//*
constexpr uint64_t atomize_table64[] = {
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  0,  0,  0,  0,  0,  0,
    0,  11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
    26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 0,  0,  0,  0,  37,
    0,  38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
    53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 0,  0,  0,  0,  0
};

constexpr char deatomize_table[] = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";
/*/
constexpr uint64_t atomize_table64[] = {
     0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
     0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
     0,  1,  0,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
    31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
    47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 0,  59, 60, 61,
     0, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
    31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 62,  0, 63,  0,  0
};

constexpr char deatomize_table[] = " !#$%&'()*+,-./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ:;<=>?@[]^_{}";
//*/

constexpr size_t atom_charwidth = 6;
constexpr size_t atom_charwidthmask64 = 63;
constexpr size_t atom_charlen64 = 10;

}

typedef uint64_t Atom64;



constexpr
static Atom64 atomize64(const char* input, size_t offset = 0)
{
    return ((*input) == 0 || offset > detail::atom_charwidthmask64)
           ? 0
           : atomize64(input + 1, offset + detail::atom_charwidth) ^ (detail::atomize_table64[*input] << offset);
}

constexpr Atom64 operator "" _atom64(const char* str, long unsigned int what)
{
    return atomize64(str);
}

static String deatomize64(Atom64 atom)
{
    String ret(' ', detail::atom_charlen64);
    for (size_t i = 0; i < detail::atom_charlen64; ++i) {
        ret[i] = detail::deatomize_table[
                     (atom & (detail::atom_charwidthmask64 << (i * detail::atom_charwidth))) >> (i * detail::atom_charwidth)
                 ];
    }
    return ret;
}

}
}

And it can be used in all the usual ways:

constexpr uint64_t ATOM_ATOM = OverLib::StringAtom::atomize64("ATOM");

static int testing()
{
    using namespace OverLib::StringAtom;
    TightAtom32 tightatom;
    Atom64 atom;
    String s;
    tightatom = tightatomize32("0");
    atom = atomize64("0");
    s = tightdeatomize32(tightatom);
    s = deatomize64(atom);
    tightatom = tightatomize32("00");
    atom = atomize64("00");
    s = tightdeatomize32(tightatom);
    s = deatomize64(atom);
    tightatom = tightatomize32("BTOM");
    atom = atomize64("BTOM");
    s = tightdeatomize32(tightatom);
    s = tightdeatomize32(TIGHTATOM_ATOM);
    s = deatomize64(atom);
    s = deatomize64(ATOM_ATOM);
    tightatom = tightatomize32("Way too long");
    atom = atomize64("Another that is way way too long");
    Atom64 a = atomize64("Anoth  ");
    Atom64 b = atomize64("      e");
    Atom64 c = atomize64("Anoth e");
    s = tightdeatomize32(tightatom);
    s = deatomize64(atom);
    s = deatomize64(a);
    s = deatomize64(b);
    s = deatomize64(c);
    Atom64 d0 = "test"_atom64;
    Atom64 d1 = atomize64("test");
    assert(d0 == d1);
    return 0; // testing() must return a value since its result seeds 'blah' below
}

static int blah = testing();

Basically it just packs 6-bit characters into a maximum of 60 bits, thus 10 characters (any unknown character is converted to a known one, anything past the limit is ignored, and shorter strings are padded with spaces at the end; I do not trim those naturally, as I use them purposefully at times). It is definitely a hack, but it is a hack I love; been using it for years. ^.^
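
The same trick at runtime in Rust might look something like this (a minimal sketch of the alphanumeric-table variant above; the names mirror the C++ but nothing here is the engine's actual code):

// 64-entry decode table: index 0 is the space/padding character.
const DEATOMIZE_TABLE: &[u8; 64] =
    b" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

const CHAR_WIDTH: u64 = 6; // bits per character
const MAX_CHARS: usize = 10; // 10 * 6 = 60 bits used of the u64

fn atomize_char(c: u8) -> u64 {
    match c {
        b'0'..=b'9' => (c - b'0') as u64 + 1,
        b'A'..=b'Z' => (c - b'A') as u64 + 11,
        b'_' => 37,
        b'a'..=b'z' => (c - b'a') as u64 + 38,
        _ => 0, // unknown characters become the space/padding value
    }
}

fn atomize64(s: &str) -> u64 {
    s.bytes()
        .take(MAX_CHARS) // longer strings are silently truncated
        .enumerate()
        .fold(0u64, |acc, (i, c)| acc | (atomize_char(c) << (i as u64 * CHAR_WIDTH)))
}

fn deatomize64(atom: u64) -> String {
    // Missing trailing characters decode to 0, i.e. padded spaces.
    (0..MAX_CHARS)
        .map(|i| DEATOMIZE_TABLE[((atom >> (i as u64 * CHAR_WIDTH)) & 63) as usize] as char)
        .collect()
}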

EDIT: @csherratt

I forgot to ask: in the case of a generation mismatch, is the preference to panic!, or would it be to just get a None back? I tend to see this case as a bug, which is why panicking may make sense, but I'd love to hear other opinions.

Get None back; that is what I do, and it is the expected case when a component holds an EntityID that you want to check is still valid after it is serialized back in later.
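
A minimal sketch of that behavior (hypothetical types, not any particular crate's API): the lookup validates the handle's generation and quietly yields None on a mismatch.

struct Entity { index: u32, generation: u32 }

struct Storage<T> {
    generations: Vec<u32>, // current generation of each slot
    data: Vec<Option<T>>,  // component data per slot
}

impl<T> Storage<T> {
    fn get(&self, e: Entity) -> Option<&T> {
        if self.generations.get(e.index as usize) == Some(&e.generation) {
            self.data[e.index as usize].as_ref()
        } else {
            None // stale handle: the slot was freed (and maybe reused)
        }
    }
}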

kvark commented on August 24, 2024

@OvermindDL1 You've got a couple of interesting tricks up your sleeve ;)

Your solution to the latency problem seems to be not generic at all, but rather a number of custom decisions that avoid the problem. Nothing wrong with that at all; I was just hoping for something simple and robust.

(1)

The inputs are handled as one of the first systems to run, thus before all other systems, and it passes the events around as wanted,

Ok, but this contradicts your earlier statement:

During processing of systems, which are run completely parallel (every system can be run on a different thread, a system can even branch further and let batches of entities be processed on different threads, most systems can do this, few that require checking all watched entities generally cannot, but these are few).

(2)

In (2) there is a System that listens for the motion states from Bullet will apply the transforms as necessary in a higher priority changeset. In the games I've made such immediate motion is not really an issue though so I've not noticed an issue yet.

So does the renderer receive the resulting motion in this cycle, or does it only show up in the next one, after this high-priority changeset is applied?

(3)

In (3) the rendering system renders at an event callback, not normal system processing, so it gets the state of everything after the transactions are complete

So this is a hack...

I was just thinking... if you at some point stop the world in order to process all the deferred transactions, this effectively prevents any ability to process multiple frame data segments in flight. Like, I want to be able to process physics for frame N while the renderer is working on frame N-1.

OvermindDL1 commented on August 24, 2024

On my phone do please excuse any grammatical oddities. :)

And yep, not generic; very specifically worked around by design. I would love to have a generic version, but I am at the point where I honestly believe it to be very difficult if not impossible; plus, it is my thought that anyone using a high-performance game engine should be acutely aware of how they are processing things and should do it right.

(1) That is because SDL events are handled and pumped into the event system before the Systems start their processing, just an artifact of the system design that happens to work in my favor.

(2) The renderer theoretically processes the state to render after the transactions have been processed, but in reality there is code in the PositionComponent and various model and rendering-detail components that updates the renderer state, octree, and other things during the transaction step; the component ordering ensures there are no threading issues there. As another optimization I probably could have the physics and rendering systems start running concurrently at this point, and even continue running while the game state is processed pre-changeset-commit for the next frame, but I have not done that yet.

(3) The entire thing is very much a hack, yes. It was not designed this way from the beginning but rather grew: first by incorporating entityx and a game engine together, editing both, then eventually tossing out most of entityx and rewriting it to add less-crashy multithreading. I really would love to either find or help make a new engine of a similar design to replace it, hence my looking around here in Rust, as my engine seems to be the only one like it in C++. Java has a few that are kind of close but sans the multithreaded systems, ditto C#; plus I really, really prefer natively compiled languages over a JIT, especially on Android with its horrible memory allocation system for Java, and Rust seems like it might be even better at that than C++, especially as I have been coding in C++ for years in a seemingly very Rust-like way already. :)

commented on August 24, 2024

The atom thing is neat; I don't think it is possible outside of a compiler plugin with current Rust.

@OvermindDL1 Am I understanding you correctly that every system is getting an event as components are added or removed? They track the components that meet their requirements, and act upon them each frame. The component data is stored in a shared data structure outside of the Aspect manager?

OvermindDL1 commented on August 24, 2024

And note, I have considered an ECS paradigm that holds 2-3 immutable states and just writes to a new one each time. That would be perfectly dataflow (the parent of ECS), wonderful, and almost trivially multithreadable; however, it would require copying at best the changed state and at worst all state every frame. That is fine for games with few entities, but as I have exceeded well over a hundred thousand active entities at once on more than one occasion, all that memory churn sounds horrendously inefficient on modern systems, so I have not pursued that thought yet; it cannot be ruled out without testing, though.
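
A minimal sketch of what that multi-buffered idea might look like, assuming a cloneable world state (`WorldState` and the system signature are invented for illustration); the per-frame clone is exactly the memory churn in question:

#[derive(Clone, Default)]
struct WorldState {
    positions: Vec<[f32; 3]>, // ...and every other component list
}

struct DoubleBuffered {
    read: WorldState,  // frame N-1, immutable as far as systems are concerned
    write: WorldState, // frame N, being built
}

impl DoubleBuffered {
    fn run_frame(&mut self, systems: &[fn(&WorldState, &mut WorldState)]) {
        // The copy whose cost the paragraph above worries about:
        self.write = self.read.clone();
        for system in systems {
            system(&self.read, &mut self.write); // read old state, write new
        }
        std::mem::swap(&mut self.read, &mut self.write); // publish the frame
    }
}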

OvermindDL1 commented on August 24, 2024

@csherratt C++ constexpr and string-literal suffixes are nice, but compile-time atomizing is just an optimization boon, not required at all; Rust can do it at runtime with no issue.

And it is not the Systems themselves (unless they register for those events for some reason) but rather the Aspects that listen for them, and they are optimized to be handled straight in the Context/World without the event overhead. I can give more details when I get back home, if you want?

OvermindDL1 commented on August 24, 2024

@kvark Side note, Erlang/Elixir and other Actor (message-based) languages are a rather major love of mine beyond C++ (perhaps Rust soon), and do influence how I program in C++ a great deal. :-)

For the "Systems are scheduled in the same order they are added/registered" note in your gist: perhaps let them register with an arbitrary integer priority so they can be sorted on adding, with any at the same priority kept in registration order. That is something I wish I had done in mine instead of manual placement.

And yes, Entity ID 0 should always be the 'null'/empty Entity, used as the empty placeholder. You can either disallow components on it, or you could allow them, say for global settings or the like.

LucioFranco commented on August 24, 2024

@OvermindDL1 something like a PriorityQueue for processing the systems?

OvermindDL1 commented on August 24, 2024

@LucioFranco Systems should only ever be added once, so nothing that fancy is needed: just a dynamic vector that sorts upon insertion. Slower insert, but fastest ordered iteration.
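
A minimal sketch of that stable, priority-sorted vector (types invented for illustration; `partition_point` inserts after the last entry of equal priority, preserving registration order for ties):

struct SystemEntry {
    priority: i32,
    system: Box<dyn FnMut()>, // stand-in for a real System trait object
}

struct Scheduler {
    systems: Vec<SystemEntry>,
}

impl Scheduler {
    fn register(&mut self, priority: i32, system: Box<dyn FnMut()>) {
        // First index whose priority exceeds ours, i.e. after all equal
        // priorities, so ties stay in the order they were registered.
        let pos = self.systems.partition_point(|e| e.priority <= priority);
        self.systems.insert(pos, SystemEntry { priority, system });
    }

    fn run(&mut self) {
        // Iteration is a plain ordered walk over the vector.
        for entry in &mut self.systems {
            (entry.system)();
        }
    }
}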

doppioslash commented on August 24, 2024

@OvermindDL1 Have you watched this talk by Robert Virding? Synchronizing Game Components

LucioFranco commented on August 24, 2024

@OvermindDL1 Yup, that is what a PriorityQueue is. I like that idea, but if two systems have the same priority, will the ECS run the one that was added/registered first? This means we have an added layer of prioritization, which could be useful for optimizing the processing from the user's perspective.

OvermindDL1 commented on August 24, 2024

@doppioslash Oh I love RVirding, I use so much of his code!

But nope, I have not seen 'that' one, though I had seen a much older one he spoke in before; added to my watch list, thanks. :-)

But yes, I've used Erlang (pre-Elixir) as a game engine before; it pushed Lua to the client to handle an ECS on that side. Just a play project, but it was quite fun.

@LucioFranco Not always, I often see them made as a tree for whatever reason. And yep. :-)

doppioslash commented on August 24, 2024

@OvermindDL1 cool project! :)
Is it open source anywhere?

OvermindDL1 commented on August 24, 2024

@doppioslash On my old svn server (pre-git-even-existing), which is probably still running now that I think about it... though sadly not public; perhaps once I take the time to get everything off of it someday. It was a fun project though. Not really ECS as I would define it 'now', and the server side was not at all.

Basically, on the Erlang server side each 'entity' was an Erlang process. (For those not familiar with Actor design, a 'process' is not an OS process but can be thought of as a very, very lightweight green thread; it has its own polling loop and spends most of its time frozen until it receives a message. Even if it wants to be awoken in, say, 50 milliseconds, or every 50 milliseconds, it just sends a message to the Time system to have it send back a 'timeout' message (with a custom data ID so you can have multiple timeouts pending and distinguish them).) An 'Entity' process was built from a little data description that listed other Erlang modules with data for them to start with (kind of a Unity-ish fake ECS instead of an actual ECS, now that I think about it...); the process would call back into those modules, and they could register their own message handlers for incoming messages delegated out from the main loop. They would send messages out to other things that were listening (such as an AI, events, a player socket, etc.) to tell them about updates, so nothing knew anything up-to-date about anything else without registering to receive updates (or sending a message asking for some piece of the active state to be sent back).

Theoretically such a server could scale 'infinitely', so would be good for an MMO or so I guess, if you had more servers to distribute the load.

The client, however, would connect to a server and receive a bulk load of versioned Lua; that Lua would load, register message handlers (such as for anything near enough to the player to start sending updates), and handle interpolation, local GUI, etc. It was not really fully featured, but it was fun to basically just program/mod the server side while the client was always up-to-date; I would see changes in real time as I hot-loaded code into the Erlang system, so it was fun to show off with. :-)

I did everything in Erlang on the server, even things that I should have sent off to a C++ process (Erlang is great at binding with other languages in a variety of ways; a fantastic glue language), so things like the procedural level generation could be slowish, but that would be easily fixable by just farming the work out to the appropriate languages if I continued such a project.

doppioslash commented on August 24, 2024

@OvermindDL1 hot loading always makes for a good party trick :)
Sounds very interesting, if you dig up the code do give me a shout.
I'm doing FRP with Elm, and some Erlang as well; I think it would be very interesting to figure out a way to have an FRP, actor-based Entity Component system.
I think Rust could be very helpful in interop with Erlang, or even a BEAM reimplementation, maybe.

OvermindDL1 commented on August 24, 2024

Honestly, I really wish the BEAM were rebuilt on LLVM. There was a little project started by a couple of people once, but it never went anywhere. BEAM is blazing fast for message passing, faster than any other language I have ever tested given the reliability BEAM allows for; the thing it is slow at is math and similar, which could easily be fixed by an LLVM backend, especially with how memory is typed in the Erlang backend and how much type information you can infer at compilation. Adding types to Erlang (not just Dialyzer) would be awesome too; Haskell-ish perhaps, but not so pure. Rust interop would work well with Erlang, though really Erlang can interop with any language well; however Rust, like C and C++, would be capable of embedding code 'inside' the BEAM for max speed if you state your code is uncrashable enough for that (as embedded code can crash BEAM and everything in it; it may be fast, but it is not safe).

I've been looking at Elm; nice web language. I wish it bound well with Elixir: they are two sides of the same coin and would work so fantastically together!

doppioslash commented on August 24, 2024

Yeah, NIF use is certainly possible.
Apparently that was merged into OTP at one point: erlang/otp@9d46875
At the SF Erlang Factory there was a joint talk about Elm & Elixir with Elm's author, and the community survey showed that Elixir is the favourite backend language used with Elm.
Elm targets only JS right now, but the idea is to keep it a language that can be compiled to other VMs.

OvermindDL1 commented on August 24, 2024

@doppioslash Huh, fascinating, did not know that they ever actually put anything like that in, though it is a bit past my time (my big Erlang learning heyday was during version 13-14, so a while ago, ever since it has just been programming in it or Elixir once it came out instead of staying more up-to-date as I should be...).

Heh, I could see that, Elixir and Elm would work very well together, it would be very possible to have them compiled via the same mix ecosystem too! Or maybe combine Elm and Elixir into some super TypedElixir? ^.^

/me loves types, that is the big irritant with Erlang/Elixir...

ebkalderon commented on August 24, 2024

@doppioslash @OvermindDL1 Just a heads-up, please remain on topic. This thread is crowded enough as it is. I agree with @kvark that something needs to be done so everyone's words don't get drowned out by all the noise.

OvermindDL1 commented on August 24, 2024

@ebkalderon I think it is on topic actually; another idea popped up based on the prior one. What about an ECS running inside Actor-programming Actors/Processes? They have tightly packed memory, and everything communicates via message passing; you could even go far enough that the rendering parts are disjoint from them, and they are not aware of even their own location without asking something else, like real-life objects. It could be fascinating... and scale very, very well.

kvark commented on August 24, 2024

@OvermindDL1 Since I am not familiar with the Actor/Process model, could you give some pointers, like existing solutions with this architecture or some introductory material, so I can see what you have in mind?

OvermindDL1 commented on August 24, 2024

The link @doppioslash gave earlier of https://www.youtube.com/watch?v=eHTePEtLV-c is a game system built on Actor programming (pure message passing and such). But Actor Oriented Programming, a brief history:

Long ago a style of programming was thought up, and it was called Object Oriented Programming, as it completely hid all data within Objects, only exposing communication points: message areas. Shortly after, C++, LISP, and others made a programming model that used objects exposing data via methods and called it Object Oriented Programming, and the original definition was mostly forgotten for a decade before being revived under the new name of Actor Oriented Programming. There were a few languages built on it, none truly high-performing except maybe the LISP version, but Erlang is the model of it in modern times (that language is only about 25 years old, give or take a few years).

Actor Oriented Programming is modeled as a set of distinct, no-share 'processes', each with its own event/message loop. In a mostly functional pseudo-language, see this:

def doSomething(blueprintID) 
  doSomething(blueprintID, load(blueprintID))
end

def doSomething(blueprintID, state)
    receive
        {:save, pid} -> pid ! {:state, self(), blueprintID, state};
        # Handle other specific messages, usually would have these in a parent 'behaviour'
        msg -> handleMsgToSelfECS(state, msg)
    after # Can omit this if not wanted, '10' can also be a variable
        10 -> doSomethingOnTimeout(state) # if no message received in this period then do something else
    end
    doSomething(blueprintID, state)
end

You can have more than one receive block too: for example, while handling a message, a function can 'receive' a very specific message that it is waiting on (maybe more information or something), ignoring any others in the 'mailbox' for now, until the outer receive runs again and catches all messages via the generic 'msg' capture.

You could spawn, say, many of these actors via:

1..10 |> Enum.map(fn n -> spawn(fn -> doSomething(n) end) end)

spawn is a very low-level thing in Erlang/Elixir; you usually build up supervision trees instead, which should always be done. That is the 'OTP Principles' part of Elixir/Erlang design, and it makes for an amazingly reliable, fault-tolerant, and scalable system.

But in general, imagine an Actor as being like an OS process: completely distinct memory (it technically has mutable memory internal to it, but the language does not expose that except in a couple of ways; immutable otherwise), with messages copied between them (sounds slow, but pay that cost and you can scale arbitrarily).

It was designed after real life: each Actor is an Object that is distinct and cannot access the memory or methods of anything else; they communicate via messages (like sound taking a second to reach your ears; even light takes time in real life), and the messages are not immediate. They can take a few milliseconds, or a few seconds if the receiving actor is on the other side of the world, and a message may not even reach it (though Erlang makes assurances that a message will reach the receiver as long as the receiver is alive; the receiver could still die between when you sent the message and when it would have received it). Internally you could easily have optimizations (as Erlang has many of) to enhance various things, but the programming layer exposed on top for 'users' should always be Actors.
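
As a rough analogue in Rust (a toy illustration only, nothing like the BEAM's scheduler), an 'actor' can be sketched as a thread that owns its state, with an mpsc channel as its mailbox:

use std::sync::mpsc;
use std::thread;

enum Msg {
    Get(mpsc::Sender<i64>), // reply channel stands in for the sender pid
    Set(i64),
    Stop,
}

fn spawn_actor(initial: i64) -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut num = initial; // state is private to the actor
        while let Ok(msg) = rx.recv() { // blocks, like a frozen receive loop
            match msg {
                Msg::Get(reply) => { let _ = reply.send(num); }
                Msg::Set(n) => num = n,
                Msg::Stop => break,
            }
        }
    });
    tx // the Sender plays the role of the actor's pid
}

fn main() {
    let actor = spawn_actor(5);
    let (reply_tx, reply_rx) = mpsc::channel();
    actor.send(Msg::Get(reply_tx)).unwrap();
    println!("actor holds {}", reply_rx.recv().unwrap());
    actor.send(Msg::Set(10)).unwrap();
    actor.send(Msg::Stop).unwrap();
}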

This is a wonderful programming pattern that Erlang/Elixir have quite well solved at this point; take everything from Erlang/Elixir that you can, as they have powered rock-solid systems for decades now. I can go into great detail about the OTP patterns, Actor abstractions, efficient messaging, and reliability if you are curious, but it did make for a very nice game world to program in.

OvermindDL1 commented on August 24, 2024

For note, Actor programming, though nice, is not terribly useful on its own; the OTP patterns that have built up around it are what make it amazing (hence why I linked to Erlang's OTP Principles; I can link to far more if you are curious, especially using a better language than the Prolog-y Erlang).

kvark commented on August 24, 2024

@OvermindDL1 and @HeroesGrave - thank you for the feedback on the gist!
@OvermindDL1 thanks for describing the actor paradigm. What I struggle to understand is how it may relate to ECS. Actors seem to be a fairly high-level abstraction with a non-trivial implementation; the ECS we are targeting should be lightweight and close to the metal.

doppioslash commented on August 24, 2024

@kvark Actors are about getting as parallel and distributed as possible. The principles for achieving maximum parallelisation are foundational to Erlang and OTP.

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Basically, following those principles is a way to get code to naturally run on as many CPUs as you have available. Some of them are already compatible with Rust's principles, like immutability by default.
It may not be trivial to devise an Entity Component system compatible with those principles, but doing so is likely to give whoever manages it big parallelisation advantages compared to everyone else.
The difficulty after you have Actors is synchronising them, which is what the Virding talk I linked is about.

kvark commented on August 24, 2024

@doppioslash so if we try to categorize the actors approach into the ECS wiki, will it go as "message passing" solution for the component updates? Or somewhere else?

doppioslash commented on August 24, 2024

@kvark I think "message passing" is correct.
What do you think, @OvermindDL1 ?

OvermindDL1 commented on August 24, 2024

@kvark @doppioslash Correct, ECS and Actor are two different paradigms; however, they can be used in tandem, as they are not incompatible. Specifically, what I would do is have an Actor system for all Entities, but have the Entities themselves manage their own components. By keeping the components internal, you can have the actors process things themselves, fully async, and you can pack them with the entity definition in memory as well. (This is what Erlang does for an Actor, and it allows the Erlang garbage collector to be blazing fast: it either collects within a single Actor, which is very rare since the Actor holds everything on its 'stack', so as functions are popped or TCO is used things just become unseen even though they still exist (if the GC runs it is generally only because the Actor grew a very deep stack that it no longer uses, so it frees up some of the tail memory, which you can force via the hibernate command to fully compact the Actor); or the GC 'runs' when an Actor dies, by simply freeing the entire Actor's memory (technically holding on to it to allocate another Actor within).) The Erlang VM (BEAM) is blazing fast for Actors due to its design (which I know quite well down to the JIT/HiPE level, so I can describe anything if you need), and I've yet to see another language implement it so it runs as fast or as safe. However, the BEAM is fully dynamic, i.e. not typed, so that harms its efficiency (about on par with Lua for math: not great, but not bad, just not 'C' speed, although its interaction points of Ports, NIFs, and others make up for that where needed). Still, the overall programming design is a great thing to model, and based on what I have been learning about Rust, I think an Actor model could actually be implemented in it properly, unlike in Java and C++ where the holes utterly cause it to fail in the corner cases.

'Within' the Actor you would allocate the component data on its 'stack', and it could have a cache on that same Actor 'stack' of which systems need to run within it. Sadly this means the function calls are inverted if the Actor itself runs them, though you could invert it internally so that the systems run on the internal Actor data. If the Actors remain tightly packed (say an on-creation component allocation internally with externally linked dynamic components, or rebuilt on the fly, whichever makes more sense), then you could still get most of the cache efficiency as well; though honestly I would say do not even bother with this pre-optimization, since the point of Actors is to run comparatively rarely, only responding to messages, so just have them send messages. Basically, say you have an Actor: it does not know where it is; it does not hold its own position. The Actor model is modeled after real life; if you are floating in space you have no clue where you are, not even on Earth, and you have to use reference points: you see the light reflected from the planetary surface of Earth to work out your location. Same with Actors: they do not hold their position themselves; they would query (send a message to) the 'World' they are in (an octree perhaps, or however it partitions them internally; maybe it queries the physics engine for where it itself is). They only hold state data that is internal to them. Their Position is not internal data, just as it is not for you; it is all relative, and you figure out your location based on prior knowledge, looking at your surroundings, etc. The nice thing about this design is that, say you have a game server, your Actors could be running across a whole bank of servers. You do have to design the game a little differently: there is no such thing as lockstep (which I personally think is an anti-design anyway), and messages take non-trivial time (usually microseconds, practically immeasurable, but the message is received when the receiver is next scheduled, which might be immediate or might be a second away if it is on a busy server across the world). It is very easy to program for once you learn it, and it has such great advantages.

So for this, 'message passing' is probably the most accurate, but still not entirely accurate, as the internal ECS could still be any of the other approaches running 'internally to' an Actor; it is the Actors that communicate via messages.

I just got home after a long day so I hope the above is understandable, I think I need a nap. ^.^

OvermindDL1 commented on August 24, 2024

I do want to make clear, imagine an Actor being implemented underneath via something kind of like this (in pseudo-C++):

struct ActorData {
    ActorPid id;
    AtomicQueue messageStack;
    GrowableStack stack;
};

Erlang's has some (a lot of) extra fields for optimization purposes, but that is the basics of it.

The ActorPid would be, say, a struct of two 64-bit integers: the 'node' id and the 'self' id, the node being the machine ID that it is running on and the self id being the actual ID of the actor. In Erlang they work kind of like the Handles I mentioned previously: the index IDs are reused, but there is a use counter, so a stale handle cannot re-access the slot.

If you send a message to an Actor pid, the message is appended onto the end of the AtomicQueue and that is that. Every Erlang function takes an internal hidden pointer to its position in its stack, which is just untyped memory. The variables in Erlang are untyped, but the memory itself is typed: doing something like anInt = 42 makes a name (not really a variable; it is immutable and just points to a spot in the stack, at the end of the 'active' part when the name was defined, unless the JIT optimizes it out, which it often does), but the memory it points to is typed. An integer like the above will be a 32-bit/64-bit value (depending on platform width) with the first few bits defining the type (only a few built-in types, arranged for efficiency; a 0 integer, for example, is actually a full 32/64-bit 0 in memory). Obviously in Rust I would not recommend that, as Rust is actually typed and that provides a great boon.

So a scheduler is running (one per CPU core) and it sees that the Actor that received the message has a non-empty message queue, so it calls the frozen function (in Erlang, a receive call), and the Actor grabs the message at the front of the queue and processes it until the Actor pauses again by calling receive. The OTP framework wraps the raw receive calls (which the JIT splits up into distinct function calls internally anyway) into a thing with callbacks, the most basic of which is the GenServer. It defines an 'init' callback (on Actor creation, so it can set up initial state or load itself from a database or whatever), a set of handle_call callbacks (called when a message is received that wants a return message), a set of handle_cast callbacks (called with a message that needs no reply, though the Actor can reply later if it wants), and a set of handle_info callbacks (which handle messages that were passed to the Actor not via the OTP framework; an OTP message is just a wrapper around the base message, and the OTP style is very simple yet very powerful), as well as a terminate callback (called when one of the other functions requests its death). As Erlang/BEAM/Elixir is functional and immutable, the state is carried around in an argument, and each callback returns the mutated state (which is then passed to other callbacks later); if the Actor wants to die, it returns a value to stop itself, which then calls terminate.

If an Actor terminates uncleanly (crashes, divides by zero, whatever), terminate will not be called; the Actor will be outright killed (all others keep going), and the Supervisor it was created under will see the reason it died and usually recreate it, if you have it do so. If too many die too fast, then the Supervisor will (if your settings tell it to) kill itself too, also killing all Actors under its supervision, so that 'its' Supervisor can recreate it and it can recreate the others with fresh, good state from the DB or whatever. That tree of Supervisors, going down to basic GenServers (which have other things built on them, like GenEvents, GenFSMs, etc.), defines the Supervision Tree of the entire Application (and there can be multiple running Applications within a project). When a Supervisor creates an Actor it can create it locally on the same Node (the default), on another Node, ephemerally, or in a few other ways, but it knows about them, watches them, and sets up Monitors with the Schedulers so it knows when they die (or when an entire Node dies, say from hardware failure), etc.

Basically, a simple Elixir Actor without OTP would be like the following (:something is an Atom in Elixir, like my Atoms in C++ though a bit more powerful; think of them as a global Enum for quick integral matching):

def getOtherThing() do
  receive do
    {:wait, anything} -> # wait for anything and return it!
      anything
  after 5000 -> # Nothing received after 5 seconds, how sad...  I could leave this out to wait forever...
    :nothing
  end
end

def myLoop(num) do
  receive do
    {:get, toPid} -> # Someone requested what my number is, lets tell them!
      toPid ! {:theNum, num} # Sending them a message of a tuple with the atom :theNum and my number
      myLoop(num) # Tail call, no stack growth

    {:set, newNum} -> # Someone wants to set my number, sure!
      myLoop(newNum) # Tail call, no stack growth

    {:set_get_old, toPid, newNum} -> # Someone wants to set my number and get the old one sent back to them, sure!
      toPid ! {:theNum, num}
      myLoop(newNum) # Tail call, no stack growth

    :wait_for_another_thing -> # Ooo, someone says they will send me something else, then ignore it
      getOtherThing() # This has an embedded receive that will return whatever the next message receive of a :wait type
      myLoop(num) # Tail call, no stack growth

    :stop -> # Aww, someone wants me to die...  Oh well...  :-(
      # just return, do nothing... you have to return 'something' in Elixir, so I'll just do the atom :stop
      :stop # Tail, exiting, this Actor will die

  after 1000 ->
    IO.puts "Nothing happened for a second, I'm bored..." # The IO system can even redirect output if you want to another Node, say you connected a shell into the system to introspect it
    myLoop(num) # Tail Call, no stack growth
  end
end

You could use it like:

boop = spawn(fn -> myLoop(5) end) # Spawn a new Actor with an initial arg of 5; returns the PID
boop ! {:get, self()} # Sends a message to boop, boop should respond fairly immediately, can dump all pending messages on self() in the shell by doing:
flush() # returns the number that boop sent back to us
boop ! {:set, 10} # Have boop use the number 10 now!
boop ! :erghblergh # boop will now have this message stuck in its message inbox since there is never a receive that handles any message, should have added a case to receive to handle any message and log and dump it, OTP does for you!
boop ! :stop # boop will now die when it processes this message a few microseconds later
boop ! :anything # This goes nowhere; you get no response, no confirmation. If I had set up a Monitor on boop, I would have received a Monitor :down message when boop died

With OTP, the above as a GenServer would be more like this:

defmodule Boop do
  use GenServer # this brings in a set of methods and such that you can override below, technically optional if you define them all, but good for documentation purposes anyway

  def start_link(startNum) do # The 'start_link' function can be named whatever, this is just normal OTP style
    GenServer.start_link(__MODULE__, startNum) # This calls the start_link on GenServer, what it does is that whatever process called this method will get an automatic monitor
      # setup for the process that GenServer is also creating (and returning the pid of), it will use this module 'Boop' as the callback module
      # When the GenServer code of this Actor (it has its own receive loop) receives a message then it will call this callback module
      # The callback functions are resolved at JIT/Actor creation time so no indirect calls are done that would waste time
  end

  def init(startNum) do # The argument is the second argument given to GenServer.start_link, can be whatever you want
    {:ok, startNum} # Return a tuple of {:ok, state} to start your GenServer actor, or return an {:error, msg} to die, few other options too
  end

  def handle_call(:get, _from, num) do
    {:reply, num, num} # return the number to the caller (second tuple entry) and as the state again (third)
  end

  def handle_call(handle more of these...) do
    # etc...
  end

  def handle_cast({:set, newNum}, _num) do
    {:noreply, newNum} # change the state to the new number
  end

  def handle_cast(more of these...) do
    # etc...
  end

  def handle_cast(:stop, _num) do
    {:stop, :normal, terminationStateIfAny} # Stop this GenServer with a :normal reason (can give anything else for a failure message) and pass in any data as the state for a terminate function if it exists
  end
end

And more. (For simple Actor data containers like the above, Elixir has a GenServer sub-type called an Agent; the above is similar to how an Agent is implemented.) The raw BEAM 'receive' stuff would not be so easy in Rust, as I've not seen a way for Rust to break a single function up into multiple callable parts, but the OTP style is perfectly doable in Rust, which is good, as raw receive should almost never be used anyway.

And as an example of a BEAM webserver, take Phoenix: they've got it to over 2 million simultaneous websocket connections, and sending a message to all of those connections (a Wikipedia article, used due to its size) took 2-4 seconds on a single machine. BEAM is, for example, what WhatsApp uses to host their messaging system on a single server for all users. This is not really a show-off of BEAM itself, as BEAM still has plenty more optimizations that can be done to it, but rather shows what the OTP style is capable of. Even if they did not have a single 24-core machine with 96 GB of RAM (which is still 40% unloaded at their busiest times), they could have scaled out onto more computers: spool up more Phoenix nodes and create a mesh where everything can communicate transparently. BEAM scales not only up onto more cores but also out onto more machines (great in case of hardware failure!), not because the VM is so good (although it is very well made), but because of the OTP programming principles.

If you want arbitrary scaling across an arbitrary number of connections, with an arbitrary number of running data containers (like a Game Entity) that can each be running different code, the OTP style is really the way to go.

Think of it this way: each Actor in an Actor system is like its own little running OS process (just a lot more lightweight); you can program using OOP or ECS or whatever 'inside' of it, but all communication between Actors is done via messaging. Actors are just a higher-level system view; you can still use ECS within them, or whatever is appropriate for the problem domain.

Also sorry for another long post. I get a bit rambly when tired. If you have specific questions I am much shorter at answering those. ^.^

HeroesGrave commented on August 24, 2024

I've been messing around with an approach using clone-on-write and simple DAGs for scheduling.

First of all, data is clone-on-write. Component state is not mutated. This allows all components from the previous cycle to be visible in order to create the components for the next cycle.

Second of all, each system outputs one component (or intermediate calculation; see below).

The main problem with this approach, of course, is the number of frames it may take for a change that depends on multiple components to propagate. The way I've solved this is by allowing systems to depend on certain components or intermediate calculations already being processed, and to use those up-to-date values. This way, all necessary changes can be propagated in 1 frame, as well as allowing systems to reuse intermediate calculations (I suspect this will be quite rare, but it's a useful feature nonetheless).

Therefore the signature for a system that requires 3 systems to be completed will look like this:

fn process(&mut self, input: (&ComponentList<A>, &ComponentList<B>, &ComponentList<C>), previous_cycle: &Components) -> ComponentList<Out>;

There is then a macro that builds up a sort-of DAG representing the systems that need to be completed before a certain system can be run. Then, using a scoped thread pool and some RwLocks, all the systems are queued up (or as many as the thread pool allows at once) and executed as their dependencies are met. When everything is finished, a new Components structure is returned with the updated components, and intermediate calculations are discarded.
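
A sequential toy version of that dependency-driven dispatch might look as follows (all names invented; the real thing queues ready systems on a scoped thread pool behind RwLocks rather than running them inline):

use std::collections::HashSet;

struct Node {
    name: &'static str,
    deps: Vec<usize>, // indices of systems that must complete first
}

// Repeatedly run every system whose dependencies have all completed.
// Assumes the graph is acyclic, as a DAG is by definition.
fn run_dag(nodes: &[Node]) {
    let mut done: HashSet<usize> = HashSet::new();
    while done.len() < nodes.len() {
        for (i, node) in nodes.iter().enumerate() {
            if !done.contains(&i) && node.deps.iter().all(|d| done.contains(d)) {
                println!("running {}", node.name); // real version: submit to thread pool
                done.insert(i);
            }
        }
    }
}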

It may even be possible to use critical path analysis to optimise the scheduling of systems at runtime based on some simple profiling, but that's not something I have the time to look into right now.

kvark commented on August 24, 2024

@HeroesGrave very interesting, thanks for sharing! Is this something you got working, or just toying around? I'd peek into the source ;)

Also, @csherratt mentioned that the cost of copy-on-write appears to be high. Cloning the world on each iteration can be expensive.

commented on August 24, 2024

@kvark there is always overhead in doing copy-on-write. The worst part about it is that a bunch of extra code gets generated to handle the aliased cases. But I didn't find it to be significantly higher than a non-COW data structure.

The nice thing about copy-on-write is that it makes it really easy to get around the borrow checker. It's essentially free for the readers to clone data-structures, so if you want to do an update and need a copy of some of the old data you just call clone and you don't have to worry about who owns the tree. You end up with almost no RefCell<T>, Mutex<T>, Arc<T> because the data-structures behave well when shared by readers.

The big disadvantage of COW is that you limit yourself in which data structures you can select. Trees map to copy-on-write extremely well, so BTrees and radix trees work well with it. HashMaps and VecMaps do not, since they require cloning the entire data structure (not just the parts that changed).
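
In Rust, the reader-friendly half of this often falls out of Arc::make_mut; a sketch (note that the flat Vec here exhibits exactly the wholesale-clone drawback just described, where a tree could instead share its unchanged subtrees):

use std::sync::Arc;

#[derive(Clone)]
struct Components {
    positions: Arc<Vec<[f32; 3]>>,
}

fn step(prev: &Components) -> Components {
    let mut next = prev.clone(); // cheap for readers: just bumps a refcount
    // Deep-clones the Vec only if it is still shared with readers.
    let positions = Arc::make_mut(&mut next.positions);
    for p in positions.iter_mut() {
        p[1] -= 9.81 * 0.016; // e.g. one 16 ms gravity step
    }
    next
}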

HeroesGrave commented on August 24, 2024

@kvark: It's pretty much working, but I want to clean up the code a little bit more and test some more complex workloads.

Also, due to the lack of HKTs, it's not possible to be generic across different component storage types, which is a bit of a pain. Might throw everything in an enum as a workaround.

liaud commented on August 24, 2024

I would like to give my opinion on this topic, since I have already worked on implementing a thread-safe ECS. So I will give it a try.

In my implementation, entities are only composed of four things:

  • an index, for accessing the components' containers
  • a version, to prevent confusion when an entity index is recycled
  • a bitset, to identify which components are attached to the entity
  • a uuid, to synchronise entities across the network

Most of the time the entity index is used for accessing everything; the handle (index + version) is useful only when you need to keep a reference to an entity across updates, for example a target component that contains the targeted entity. If the entity dies for some reason, the component won't be able to upgrade the handle to a raw index, and thus avoids accessing stale data.

Components are just pure data. They need to implement a trait Component with two associated types: one defines the container of the component, and the second defines how the component is updated.

In this implementation, I always have two states: the past state and the future one.

The past state is read-only: all processes read this state and compute modifications accordingly.
The future state is write-only, and it does not even need to exist separately, since we can modify the past state in place at the sync points of the process scheduler.

Every process writes its modifications through a StateWriter that puts the updates in a buffer (an insert, a remove, or the update enum specified by the component trait). There is one StateWriter per worker thread, and we can reuse them across frames to avoid allocations.

The idea is that I can have a job system where a bunch of jobs are kicked off to worker threads; each job can spawn other jobs, so the processes can create finer-grained units of computation. This allows kicking off a lot of processes on worker threads and letting the work stealing do the load balancing.

I can also use this job system to batch updates of components. Each job takes care of one component type and applies all the updates buffered for that component.

Finally, the process execution graph will at best be only one level deep: no dependencies, only a sync/update point.
If a process depends on other processes' computations, we can define precise sync points between their executions and update the past state before continuing. If you use COW structures, you can even start updating a component as soon as no more systems are modifying it; it is just a matter of adding a job that waits on the execution of other jobs before being kicked off.

There is an incomplete draft of the implementation available here. Note that it is incomplete and does not necessarily match what is described above.

OvermindDL1 commented on August 24, 2024

I've always questioned the point of the bitset that defines which components are set; I've just let the Component Containers handle that themselves. If something wants to know whether an entity has some components, it can just query them. If something wants to know all the entities that have certain components, then it makes an Aspect and that gets kept up to date. If something wants all entities and all components, it can make its own Aspect handler that listens for everything and holds that data however it wants.

The reason I hate the bitsets is that I use a LOT of components; many hundreds can even be a low number for a fairly small game, and carrying around that data about what is used and what is not is not really useful.

liaud commented on August 24, 2024

@OvermindDL1 I agree with you on that point. It is only useful when you don't have many entities to iterate over and you don't cache them for future updates. However, if a process wants to filter entities conditionally, it allows a simple bit-mask check of whether the entity matches the filter, though this might be rare/nonexistent.
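
That check is just a mask comparison, e.g. (the component bits are invented for illustration):

const POSITION: u64 = 1 << 0;
const VELOCITY: u64 = 1 << 1;

// An entity passes the filter when every required component bit is set.
fn matches(entity_components: u64, required: u64) -> bool {
    entity_components & required == required
}

// e.g. matches(POSITION | VELOCITY, VELOCITY) == true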

kvark commented on August 24, 2024

@Liotitch
Thank you for describing your approach!

@OvermindDL1
You assume there is an Aspect concept, which solves the problem. When would two different systems use the same aspect though? I'm thinking that each system would have its own inputs/outputs, and thus its own aspect.

OvermindDL1 commented on August 24, 2024

@kvark In every setup I've used so far I have many cases where multiple systems work on the same aspects, systems that have no aspects, and systems that have multiple aspects. It is a good style to follow.

kvark commented on August 24, 2024

@OvermindDL1 could you drop in some examples?

OvermindDL1 commented on August 24, 2024

@kvark Sure, from a small project:

*) Multiple systems with the same Aspects:
SoundComponent -> SoundManager (Play sounds), SoundBroadcaster (Broadcasts sounds to an AI system based on nearby things in an octree or so)

*) Systems with no Aspects
Anything that needs to process per tick that is not entities, like working through the network packet queue to distribute incoming packets where they belong

*) Systems with multiple different kinds of Aspects
SimpleCollision -> Has one Aspect list of entities with CollisionBodies and another Aspect with a list of entities that have CollisionAreas (many things often have both)

OvermindDL1 commented on August 24, 2024

@kvark As I often like to do, I ran this ECS through a few of the stress tests that my own system handles with ease, to see if I have found a replacement so I can stop dealing with my own. So, time for parsec. ^.^

I used your example.rs as the base and created 1 million entities (not unheard of in some of my mini games; I often have ten thousand active entities with bursts of up to a million that can be rapidly created and destroyed, hence why I had to learn how to optimize an ECS), with 99% having a parsec::VecStorage component and 2% having a parsec::HashMapStorage component, with 1% overlap. First issue found: these two calls take a substantial amount of time (almost a full second):

    scheduler.run0w1r(|b: &CompBool| { // The rare component
        println!("Entity {}", b.0);
    });
    scheduler.wait();

An equivalent Aspect in my system takes <1ms (even in a debug run). As I am still learning Rust, I wondered if it was caused by println buffering output, so I removed that line (making the system a complete no-op; still running in debug mode, for note, though I was worried it might be optimized out regardless, as it often would be in C++) and got the same results, so output buffering was not the issue.

At this point I delved into the code, and it looked like it was iterating over every single entity for every single system callback even if not wanted (the entity is just skipped for the callback, but it is still iterated over), which to me does not seem scalable in any sense. What is the plan to work around that?

kvark commented on August 24, 2024

@OvermindDL1 thank you for having a look at parsec!

You might be interested in the existing project called ecs_bench (by @lschmierer) that aims to profile different ECS implementations in Rust. Its first test case is very similar to yours, where only a fraction of entities actually satisfy the system's requirements for running.

What you found is a known issue: the runXwYr variants always check all components and don't attempt an early exit. We could check for the components to exist one by one, but we'd have to settle for the user-given order, which puts writables first, so it's limited. We'll do this anyway; there is no downside to early-exiting, at least like that.

One way to address this, if you know that some of the components are rarer than others, is to use the custom run function and query those rare components first, early-exiting or otherwise proceeding.
Edit: just realized your test case only uses a single component, so this optimization wouldn't help you.

Another, more radical, way is to use tables by @csherratt. He's already got a prototype implemented on top of parsec's storage interface, and it performed 6 times faster on ecs_bench's test case. AFAIK, it doesn't require any extensions to parsec itself and works somewhat automatically (@csherratt, please correct me if that's wrong), which makes it a much nicer alternative (in my eyes) to rolling any sort of Aspect support. I like to keep things simple and separate complexity into layers, and that's what parsec succeeds at so far.

Please feel free to hop on our gitter and/or create issues in our project (or contribute!), to avoid off-topic in this thread.

OvermindDL1 commented on August 24, 2024

@kvark Fascinating links and information, thanks.

Yeah, my basic test involves a 2D screen with a set of tiny floating squares (a movement system floats them around, bouncing them off screen edges); when two intersect, a collision system destroys both and emits (via event) a shower of new entities for each (basically particles), and a spawn system spawns replacements back in within about 10ms. I generally have 1000 squares, or 10k for a stress test; each collision creates about a hundred more entities that zoom off until they fade out and are removed, and when 10k exist the collisions are... consistently constant, to put it mildly. I can only achieve real-time speeds in my system thanks to my aspect cache (and shader trickery on the rendering side). I should probably get a gitter account... done. ^.^

kvark commented on August 24, 2024

@OvermindDL1 that sounds like a very good testcase for ECS. I'm currently porting yasteroids, and it's going to take some time, but I'd love to try out your scenario with parsec to see where we stand.

kvark commented on August 24, 2024

Closed by #44
