
Lamellar - Rust HPC runtime

Lamellar is an investigation of the applicability of the Rust systems programming language for HPC as an alternative to C and C++, with a focus on PGAS approaches.

Some Nomenclature

Throughout this README and the API documentation (https://docs.rs/lamellar/latest/lamellar/) a few terms come up repeatedly; those terms and brief descriptions are provided below:

  • PE - a processing element, typically a multi-threaded process; for those familiar with MPI, it corresponds to a rank.
    • Commonly you will create one PE per physical CPU socket on your system, but it is just as valid to have multiple PEs per CPU
    • There may be some instances where Node (meaning a compute node) is used instead of PE; in these cases they are interchangeable
  • World - an abstraction representing your distributed computing system
    • consists of N PEs all capable of communicating with one another
  • Team - A subset of the PEs that exist in the world
  • AM - short for Active Message
  • Collective Operation - Generally means that all PEs (associated with a given distributed object) must explicitly participate in the operation; otherwise deadlock will occur.
    • e.g. barriers, construction of new distributed objects
  • One-sided Operation - Generally means that only the calling PE is required for the operation to successfully complete.
    • e.g. accessing local data, waiting for local work to complete (see the sketch after this list)
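
To make the distinction concrete, the following minimal sketch (using only calls that appear in the examples later in this README) contrasts one-sided operations with a collective one:

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    // one-sided: only the calling PE is involved
    let my_pe = world.my_pe(); // local query, returns immediately
    world.wait_all();          // waits only for this PE's outstanding work
    // collective: every PE in the world must make this call,
    // otherwise the PEs that did reach it will deadlock
    world.barrier();
    println!("PE {} passed the barrier", my_pe);
}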

Features

Lamellar provides several communication patterns and programming models for distributed applications, briefly highlighted below.

Active Messages

Lamellar allows for sending and executing user-defined active messages on remote PEs in a distributed environment. Users first implement the runtime-exported trait (LamellarAM) for their data structures and then call the procedural macro #[lamellar::am] on the implementation. The procedural macro produces all the necessary code to enable remote execution of the active message. More details can be found in the Active Messaging module documentation.

Darcs (Distributed Arcs)

Lamellar provides a distributed extension of an Arc called a Darc. Darcs provide safe shared access to inner objects in a distributed environment, ensuring lifetimes and read/write accesses are enforced properly. More details can be found in the Darc module documentation.

PGAS abstractions

Lamellar also provides PGAS capabilities through multiple interfaces.

LamellarArrays (Distributed Arrays)

The first is a high-level abstraction of distributed arrays, allowing for distributed iteration and data parallel processing of elements. More details can be found in the LamellarArray module documentation.

Low-level Memory Regions

The second is a low-level (unsafe) interface for constructing memory regions which are readable and writable from remote PEs. Note that unless you are very comfortable/confident in low-level distributed memory (and even then), it is highly recommended you use the LamellarArrays interface. More details can be found in the Memory Region module documentation.
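
As a brief illustration, the sketch below allocates a one-sided memory region and initializes it locally. The prelude path and the alloc_one_sided_mem_region/as_mut_slice names follow the Memory Region module documentation, but treat them as assumptions and check the docs for the exact signatures in your version:

use lamellar::memregion::prelude::*;

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    // allocate a region whose data lives on this PE but is remotely accessible
    let mem_region: OneSidedMemoryRegion<usize> = world.alloc_one_sided_mem_region(10);
    // all access to the underlying data is unsafe: the runtime cannot rule out
    // concurrent remote reads and writes
    unsafe {
        for elem in mem_region.as_mut_slice().expect("data exists on this PE") {
            *elem = my_pe;
        }
    }
}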

Network Backends

Lamellar relies on network providers called Lamellae to perform the transfer of data throughout the system. Currently three such Lamellae exist:

  • local - used for single-PE (single system, single process) development (this is the default),
  • shmem - used for multi-PE (single system, multi-process) development, useful for emulating distributed environments (communicates through shared memory)
  • rofi - used for multi-PE (multi system, multi-process) distributed development, based on the Rust OpenFabrics Interface Transport Layer (ROFI) (https://github.com/pnnl/rofi).
    • By default support for rofi is disabled, as using it relies on both the ROFI C library and the libfabric library, which may not be installed on your system.
    • It can be enabled by adding features = ["enable-rofi"] to the lamellar entry in your Cargo.toml file

The long-term goal for Lamellar is that you can develop using the local backend and then, when you are ready to run distributed, switch to the rofi backend with no changes to your code. Currently the reverse is true: if an application compiles and runs using rofi, it will compile and run using local and shmem with no changes.

Additional information on using each of the lamellae backends can be found below in the Running Lamellar Applications section.

Examples

Our repository also provides numerous examples highlighting various features of the runtime: https://github.com/pnnl/lamellar-runtime/tree/master/examples

Additionally, we are compiling a set of benchmarks (some with multiple implementations) that may be helpful to look at as well: https://github.com/pnnl/lamellar-benchmarks/

Below are a few small examples highlighting some of the features of lamellar, more in-depth examples can be found in the documentation for the various features.

Selecting a Lamellae and constructing a lamellar world instance

You can select which backend to use at runtime as shown below:

use lamellar::Backend;
fn main(){
    let mut world = lamellar::LamellarWorldBuilder::new()
        .with_lamellae( Default::default() ) //if the "enable-rofi" feature is active the default is rofi, otherwise the default is local
        //.with_lamellae( Backend::Rofi ) //explicitly set the lamellae backend to rofi
        //.with_lamellae( Backend::Local ) //explicitly set the lamellae backend to local
        //.with_lamellae( Backend::Shmem ) //explicitly set the lamellae backend to use shared memory
        .build();
}

or by setting the following environment variable: LAMELLAE_BACKEND="lamellae", where lamellae is one of local, shmem, or rofi.
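
For example, to force the shared-memory backend without recompiling:

LAMELLAE_BACKEND="shmem" cargo run --release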

Creating and executing a Registered Active Message

Please refer to the Active Messaging documentation for more details and examples

use lamellar::active_messaging::prelude::*;

#[AmData(Debug, Clone)] // `AmData` is a macro used in place of `derive` 
struct HelloWorld { //the "input data" we are sending with our active message
    my_pe: usize, // "pe" is processing element == a node
}

#[lamellar::am] // at a high level, registers this LamellarAM implementation with the runtime for remote execution
impl LamellarAM for HelloWorld {
    async fn exec(&self) {
        println!(
            "Hello pe {:?} of {:?}, I'm pe {:?}",
            lamellar::current_pe, 
            lamellar::num_pes,
            self.my_pe
        );
    }
}

fn main(){
    let mut world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    let num_pes = world.num_pes();
    let am = HelloWorld { my_pe: my_pe };
    for pe in 0..num_pes{
        world.exec_am_pe(pe,am.clone()); // explicitly launch on each PE
    }
    world.wait_all(); // wait for all active messages to finish
    world.barrier();  // synchronize with other PEs
    let request = world.exec_am_all(am.clone()); //also possible to execute on every PE with a single call
    world.block_on(request); //both exec_am_all and exec_am_pe return futures that can be used to wait for completion and access any returned result
}
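
Active messages can also return data to the caller: if exec has a return type, the future returned by exec_am_pe resolves to that value. Below is a minimal sketch; the Add type is hypothetical, and the Active Messaging docs describe the full rules for which return types are supported:

use lamellar::active_messaging::prelude::*;

#[AmData(Debug, Clone)]
struct Add {
    a: usize,
    b: usize,
}

#[lamellar::am]
impl LamellarAM for Add {
    async fn exec(&self) -> usize { // the return value is shipped back to the caller
        self.a + self.b
    }
}

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    let sum = world.block_on(world.exec_am_pe(0, Add { a: 1, b: 2 }));
    assert_eq!(sum, 3);
}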

Creating, initializing, and iterating through a distributed array

Please refer to the LamellarArray documentation for more details and examples

use lamellar::array::prelude::*;

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    let block_array = AtomicArray::<usize>::new(&world, 1000, Distribution::Block); //we also support Cyclic distribution.
    block_array.dist_iter_mut().enumerate().for_each(move |(i,elem)| elem.store(i) ); //simultaneously initialize the array across all PEs, each PE only updates its local data
    block_array.wait_all();
    block_array.barrier();
    if my_pe == 0{
        for (i,elem) in block_array.onesided_iter().into_iter().enumerate(){ //iterate through the entire array on PE 0 (automatically transferring remote data)
            println!("i: {} = {}",i,elem);
        }
    }
}
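
LamellarArrays also expose one-sided element-wise operations: any PE can update any element, and the runtime routes the operation to the PE that owns it. A small sketch follows; add and its siblings are part of the array ops API (the HISTORY section below lists several of them), but verify availability for your element type:

use lamellar::array::prelude::*;

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    let array = AtomicArray::<usize>::new(&world, 1000, Distribution::Block);
    let req = array.add(17, 1); // atomically add 1 to element 17, wherever it lives
    array.block_on(req);        // ops return futures; block_on waits for completion
    array.barrier();
}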

Utilizing a Darc within an active message

Please refer to the Darc documentation for more details and examples

use lamellar::active_messaging::prelude::*;
use lamellar::darc::prelude::*; // for Darc
use std::sync::atomic::{AtomicUsize,Ordering};

#[AmData(Debug, Clone)] // `AmData` is a macro used in place of `derive` 
struct DarcAm { //the "input data" we are sending with our active message
    cnt: Darc<AtomicUsize>, // count how many times each PE executes an active message
}

#[lamellar::am] // at a high level, registers this LamellarAM implementation with the runtime for remote execution
impl LamellarAM for DarcAm {
    async fn exec(&self) {
        self.cnt.fetch_add(1,Ordering::SeqCst);
    }
}

fn main(){
    let mut world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    let num_pes = world.num_pes();
    let cnt = Darc::new(&world, AtomicUsize::new(0)).expect("calling PE should be in the world team");
    for pe in 0..num_pes{
        world.exec_am_pe(pe,DarcAm{cnt: cnt.clone()}); // explicitly launch on each PE
    }
    world.exec_am_all(DarcAm{cnt: cnt.clone()}); //also possible to execute on every PE with a single call
    cnt.fetch_add(1,Ordering::SeqCst); //this is valid as well!
    world.wait_all(); // wait for all active messages to finish
    world.barrier();  // synchronize with other PEs
    assert_eq!(cnt.load(Ordering::SeqCst),num_pes*2 + 1);
}

Using Lamellar

Lamellar is capable of running on single-node workstations as well as distributed HPC systems. For a workstation, simply copy the following to the dependency section of your Cargo.toml file:

lamellar = "0.6.1"

If you plan to run within a distributed HPC system, copy the following to your Cargo.toml file:

lamellar = { version = "0.6.1", features = ["enable-rofi"]}

NOTE: as of Lamellar 0.6.1 it is no longer necessary to manually install libfabric; the build process will now try to build it for you automatically. If this process fails, it is still possible to point the build at a manual libfabric installation via the OFI_DIR environment variable.

For both environments, build your application as normal

cargo build (--release)

Running Lamellar Applications

There are a number of ways to run Lamellar applications, mostly dictated by the lamellae you want to use.

local (single-process, single system)

  1. directly launch the executable
    • cargo run --release

shmem (multi-process, single system)

  1. grab the lamellar_run.sh script
  2. Use lamellar_run.sh to launch your application
    • ./lamellar_run.sh -N=2 -T=10 <appname>
      • N number of PEs (processes) to launch (Default=1)
      • T number of threads per PE (Default = number of cores / number of PEs)
      • assumes the <appname> executable is located at ./target/release/<appname>

rofi (multi-process, multi-system)

  1. allocate compute nodes on the cluster:
    • salloc -N 2
  2. launch application using cluster launcher
    • srun -N 2 --mpi=pmi2 ./target/release/<appname>
      • the pmi2 library is required to obtain information about the allocated nodes and to set up the initial handshakes

Environment Variables

Lamellar exposes a number of environment variables that can be used to control application execution at runtime:

  • LAMELLAR_THREADS - The number of worker threads used within a lamellar PE
    • export LAMELLAR_THREADS=10
  • LAMELLAE_BACKEND - the backend used during execution. Note that if a backend is explicitly set in the world builder, this variable is ignored.
    • possible values
      • local
      • shmem
      • rofi
  • LAMELLAR_MEM_SIZE - Specify the initial size of the runtime's "RDMAable" memory pool. Defaults to 1GB
    • export LAMELLAR_MEM_SIZE=$((20*1024*1024*1024)) # 20GB memory pool
    • Internally, Lamellar utilizes memory pools of RDMAable memory for runtime data structures (e.g. Darcs, OneSidedMemoryRegions, etc.), aggregation buffers, and message queues. Additional memory pools are dynamically allocated across the system as needed. This can be a fairly expensive operation (as the operation is synchronous across all PEs), so the runtime will print a message at the end of execution reporting how many additional pools were allocated.
      • if you find you are dynamically allocating new memory pools, try setting LAMELLAR_MEM_SIZE to a larger value
    • Note: when running multiple PEs on a single system, the total allocated memory for the pools will be equal to LAMELLAR_MEM_SIZE multiplied by the number of processes

NEWS

  • February 2024: Alpha release -- v0.6.1
  • November 2023: Alpha release -- v0.6
  • January 2023: Alpha release -- v0.5
  • March 2022: Alpha release -- v0.4
  • April 2021: Alpha release -- v0.3
  • September 2020: Add support for "local" lamellae, prep for crates.io release -- v0.2.1
  • July 2020: Second alpha release -- v0.2
  • Feb 2020: First alpha release -- v0.1

BUILD REQUIREMENTS

  • Crates listed in Cargo.toml

Optional: Lamellar requires the following dependencies if you want to run in a distributed HPC environment. The rofi lamellae is enabled by adding "enable-rofi" to features, either in Cargo.toml or on the command line when building, i.e. cargo build --features enable-rofi. ROFI can either be built from source (then set the ROFI_DIR environment variable to the ROFI install directory) or built automatically by the rofi-sys crate.

At the time of release, Lamellar has been tested with the following external packages:

GCC     CLANG   ROFI    OFI     IB VERBS   MPI             SLURM
7.1.0   8.0.1   0.1.0   1.20    1.13       mvapich2/2.3a   17.02.7

BUILDING PACKAGE

In the following, assume a root directory ${ROOT}

  1. download Lamellar to ${ROOT}/lamellar-runtime

cd ${ROOT} && git clone https://github.com/pnnl/lamellar-runtime

  2. Select the lamellae to use:

    • In Cargo.toml add the "enable-rofi" feature if you want to use rofi (or pass --features enable-rofi to your cargo build command); otherwise only support for the local and shmem backends will be built.
  3. Compile the Lamellar library and test executable (feature flags can be passed on the command line instead of being specified in Cargo.toml)

cargo build (--release) (--features enable-rofi)

executables located at ./target/debug(release)/test
  4. Compile the examples

cargo build --examples (--release) (--features enable-rofi)

executables located at ./target/debug(release)/examples/

Note: we do an explicit build instead of cargo run --examples as they are intended to run in a distributed environment (see TEST section below).

HISTORY

  • version 0.6.1
    • Clean up apis for lock based data structures
    • N-way dissemination barrier
    • Fixes for AM visibility issues
    • Better error messages
    • Update Rofi lamellae to utilize rofi v0.3
    • Various fixes for Darcs
  • version 0.6
    • LamellarArrays
      • additional iterator methods
        • count
        • sum
        • reduce
      • additional element-wise operations
        • remainder
        • xor
        • shl, shr
      • Backend operation batching improvements
      • variable sized array indices
      • initial implementation of GlobalLockArray
      • 'ArrayOps' trait for enabling user defined element types
    • AM Groups - Runtime provided aggregation of AMs
      • Generic 'AmGroup'
      • 'TypedAmGroup'
        • 'static' group members
    • Miscellaneous
      • added LAMELLAR_DEADLOCK_TIMEOUT to help with stalled applications
      • better error handling and exiting on panic and critical failure detection
      • backend threading improvements
      • LamellarEnv trait for accessing various info about the current lamellar environment
      • additional examples
      • updated documentation
  • version 0.5
    • Vastly improved documentation (i.e. it exists now ;))
    • 'Asyncified' the API - most remote operations now return Futures
    • LamellarArrays
      • Additional OneSidedIterators, LocalIterators, DistributedIterators
      • Additional element-wise operations
      • For Each "schedulers"
      • Backend optimizations
    • AM task groups
    • AM backend updates
    • Hooks for tracing
  • version 0.4
    • Distributed Arcs (Darcs: distributed atomically reference counted objects)
    • LamellarArrays
      • UnsafeArray, AtomicArray, LocalLockArray, ReadOnlyArray, LocalOnlyArray
      • Distributed Iteration
      • Local Iteration
    • SHMEM backend
    • dynamic internal RDMA memory pools
  • version 0.3.0
    • recursive active messages
    • subteam support
    • support for custom team architectures (Examples/team_examples/custom_team_arch.rs)
    • initial support of LamellarArray (Am based collectives on distributed arrays)
    • integration with Rofi 0.2
    • revamped examples
  • version 0.2.2:
    • Provide examples in readme
  • version 0.2.1:
    • Provide the local lamellae as the default lamellae
    • feature guard rofi lamellae so that lamellar can build on systems without libfabrics and ROFI
    • added an example proxy app for doing a distributed DFT
  • version 0.2:
    • New user facing API
    • Registered Active Messages (enabling stable rust)
    • Remote Closures feature guarded for use with nightly rust
    • redesigned internal lamellae organization
    • initial support for world and teams (sub groups of PE)
  • version 0.1:
    • Basic init/finit functionalities
    • Remote Closure Execution
    • Basic memory management (heap and data section)
    • Basic Remote Memory Region Support (put/get)
    • ROFI Lamellae (Remote Closure Execution, Remote Memory Regions)
    • Sockets Lamellae (Remote Closure Execution, limited support for Remote Memory Regions)
    • simple examples

NOTES

STATUS

Lamellar is still under development, thus not all intended features are yet implemented.

CONTACTS

Current Team Members

Ryan Friese - [email protected]
Roberto Gioiosa - [email protected]
Erdal Mutlu - [email protected]
Joseph Cottam - [email protected]
Greg Roek - [email protected]

Past Team Members

Mark Raugas - [email protected]

License

This project is licensed under the BSD License - see the LICENSE.md file for details.

Acknowledgments

This work was supported by the High Performance Data Analytics (HPDA) Program at Pacific Northwest National Laboratory (PNNL), a multi-program DOE laboratory operated by Battelle.

lamellar-runtime's Issues

Example does not compile with Rust 1.68.1

The following example on the page does not compile.

Code Block

use lamellar::array::prelude::*;

fn main(){
    let world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    let block_array = AtomicArray::<usize>::new(&world, 1000, Distribution::Block); //we also support Cyclic distribution.
    block_array.dist_iter_mut().enumerate().for_each(move |elem| *elem = my_pe); //simultaneosuly initialize array accross all pes, each pe only updates its local data
    block_array.wait_all();
    block_array.barrier();
    if my_pe == 0{
        for (i,elem) in block_array.onesided_iter().into_iter().enumerate(){ //iterate through entire array on pe 0 (automatically transfering remote data)
            println!("i: {} = {})",i,elem);
        }
    }
}

Error Message

error[E0614]: type `(usize, lamellar::array::atomic::AtomicElement<usize>)` cannot be dereferenced

Version

rustc 1.68.1
Lamellar 0.5

Failed to build Lamellar with rofi

Hi,

I tried to build Lamellar on a Linux cluster. I first downloaded Lamellar with git clone https://github.com/pnnl/lamellar-runtime, and then built it using cd lamellar-runtime && cargo build --features enable-rofi. I got the following error message:

  --- stderr
  autoreconf: Entering directory `.'
  autoreconf: configure.ac: not using Gettext
  autoreconf: running: aclocal --force -I config
  autoreconf: configure.ac: tracing
  autoreconf: running: libtoolize --copy --force
  autoreconf: running: /usr/bin/autoconf --force
  autoreconf: running: /usr/bin/autoheader --force
  autoreconf: running: automake --add-missing --copy --force-missing
  configure.ac:66: installing 'config/compile'
  configure.ac:17: installing 'config/missing'
  Makefile.am: installing 'config/depcomp'
  autoreconf: Leaving directory `.'
  ln: failed to create hard link 'src/.libs/libfabric.lax/lt1-libfabric_la-osd.o' => 'src/linux/libfabric_la-osd.o': Operation not permitted
  autoreconf: Entering directory `.'
  autoreconf: configure.ac: not using Gettext
  autoreconf: running: aclocal --force --warnings=none -I m4
  autoreconf: configure.ac: tracing
  autoreconf: running: libtoolize --copy --force
  autoreconf: running: /usr/bin/autoconf --force --warnings=none
  autoreconf: running: /usr/bin/autoheader --force --warnings=none
  autoreconf: running: automake --add-missing --copy --force-missing --warnings=none
  configure.ac:5: installing './compile'
  configure.ac:2: installing './missing'
  src/Makefile.am: installing './depcomp'
  autoreconf: Leaving directory `.'
  configure: WARNING: rdma/fabric.h: accepted by the compiler, rejected by the preprocessor!
  configure: WARNING: rdma/fabric.h: proceeding with the compiler's result
  configure: error: 
  thread 'main' panicked at /cluster/home/gaobin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/autotools-0.2.6/src/lib.rs:781:5:

  command did not execute successfully, got: exit status: 1

  build script failed, must exit now
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It seems to be an error creating a hard link. How can I solve it?

Moreover, I also tried to build rofi-sys manually as:

git clone https://github.com/pnnl/rofi-sys.git
cd rofi-sys
cargo build

It still failed with the error message:

error: failed to run custom build command for `rofisys v0.3.0 (/cluster/home/gaobin/rofi-sys)`

Caused by:
  process didn't exit successfully: `/cluster/home/gaobin/rofi-sys/target/debug/build/rofisys-54d7b9fa7736e779/build-script-build` (exit status: 101)
  --- stdout
  running: cd "/cluster/home/gaobin/rofi-sys/target/debug/build/rofisys-0104480a2efda797/out/ofi_src" && "sh" "-c" "exec \"$0\" \"$@\"" "autoreconf" "-ivf"

  --- stderr
  autoreconf: 'configure.ac' or 'configure.in' is required
  thread 'main' panicked at /cluster/home/gaobin/.cargo/registry/src/index.crates.io-6f17d22bba15001f/autotools-0.2.6/src/lib.rs:781:5:

  command did not execute successfully, got: exit status: 1

  build script failed, must exit now
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I am not sure how this error could happen; is it related to the error when building Lamellar?

Thanks.

Derive macro for 'Dist' in simple enums and structs

It would be nice to make a distributed array of structs. It seems reasonable to support #[derive(Dist)] if a struct or enum is made entirely of things that already implement Dist. (I say 'reasonable' having never made a macro in rust, so feel free to tell me this is not a reasonable request.)

Desired uses:

use lamellar::array::{Distribution, UnsafeArray};
use lamellar::{Dist, LamellarWorld};

// An enum with no associated values
#[derive(Dist)]
enum Colors {
   Red,   White,   Blue,   Green,   Black,   Pink
}

// A struct with some fields that are already Dist
#[derive(Dist)]
struct Size {
    length: usize,
    width: usize,
    height: usize
}

// An enum like Option (some entries have values, some do not)
#[derive(Dist)]
enum Message {
    Letter,
    Package(Size)
}

struct ArrayWrapper<T> {
    array: UnsafeArray<T>,
}

impl<T: Dist> ArrayWrapper<T> {
    fn new(world: LamellarWorld, len: usize) -> Self {
        ArrayWrapper {
            array: UnsafeArray::<T>::new(world, len, Distribution::Block),
        }
    }
}

fn main() {
    let world = lamellar::LamellarWorldBuilder::new().build();
    let _my_pe = world.my_pe();
    let num_pes = world.num_pes();
    let colors = ArrayWrapper::<Colors>::new(world.clone(), 10*num_pes);
    let sizes = ArrayWrapper::<Size>::new(world.clone(), 10*num_pes);
    let messages = ArrayWrapper::<Message>::new(world.clone(), 10*num_pes);
    colors.array.print();
    sizes.array.print();
    messages.array.print();
}

The immediate use-case is to provide a mailbox-style array for doing some inter-process communication, but it would make a nice general capability.

Expose lamellar running environment (like my_pe) in more general way

my_pe and num_pes are not consistently accessible. Making them consistently accessible enables instrumentation (like a generic lamellar_trace that will print the current PE if known).

Options include:

  • Making my_pe and num_pes part of a lamellar_environment trait (or something similar... names are hard) that Lamellar array types, teams, and world all implement.
  • Crate-level functions like lamellar_env::my_pe() that can introspect the current running environment.

Candidates for this type of accessibility include the current world identifier, the current team identifier, the current pe identifier and the number of worlds/teams/pes.

shmem backend thread panic when running examples

Hi,

I wasn't sure if this project was to a point where issues would be appropriate, but figured it couldn't hurt. When I run many of the examples using the shmem backend without rofi on 2 or more PEs on a single node I currently get the following error. I am able to run some of the array examples without issue and generally speaking 1 PE works.

RUST_BACKTRACE=full ./lamellar_run.sh -N=2 -T=10 target/debug/examples/am_return_am

thread '' panicked at 'attempt to add with overflow', /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/traits/accum.rs:141:1
stack backtrace:
0: 0x55f174ef41dc - std::backtrace_rs::backtrace::libunwind::trace::h91c465e73bf6c785
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x55f174ef41dc - std::backtrace_rs::backtrace::trace_unsynchronized::hae9da36f5d58b5f3
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55f174ef41dc - std::sys_common::backtrace::_print_fmt::h7f499fa126a7effb
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:67:5
3: 0x55f174ef41dc - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3e2b509ce2ce6007
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:46:22
4: 0x55f174f1631c - core::fmt::write::h753c7571fa063ecb
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/fmt/mod.rs:1168:17
5: 0x55f174ef0cd3 - std::io::Write::write_fmt::h2815c0519c99ba09
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/io/mod.rs:1660:15
6: 0x55f174ef6612 - std::sys_common::backtrace::_print::h64941a6fc8b0ed9b
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:49:5
7: 0x55f174ef6612 - std::sys_common::backtrace::print::hcf25e43e1a9b0766
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:36:9
8: 0x55f174ef6612 - std::panicking::default_hook::{{closure}}::h78d3e6cf97fc623d
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:211:50
9: 0x55f174ef61f5 - std::panicking::default_hook::hda898f8d3ad1a5ae
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:228:9
10: 0x55f17471ded3 - <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call::h432af52895dbab9c
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1868:9
11: 0x55f1746ec55c - lamellar::scheduler::work_stealing::WorkStealingInner::init::{{closure}}::h200072a14ace5090
at /home/nixes/dev_work/lamellar-runtime/src/scheduler/work_stealing.rs:362:13
12: 0x55f174ef6c85 - std::panicking::rust_panic_with_hook::h1a5ea2d6c23051aa
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:610:17
13: 0x55f174ef6952 - std::panicking::begin_panic_handler::{{closure}}::h07f549390938b73f
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:500:13
14: 0x55f174ef4684 - std::sys_common::backtrace::__rust_end_short_backtrace::h5ec3758a92cfb00d
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:139:18
15: 0x55f174ef66b9 - rust_begin_unwind
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
16: 0x55f1742d98e1 - core::panicking::panic_fmt::h3a79a6a99affe1d5
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
17: 0x55f1742d982d - core::panicking::panic::h97167cd315d19cd4
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:48:5
18: 0x55f17477f7c8 - ::sum::{{closure}}::h6f36b1ba0736544d
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/traits/accum.rs:45:28
19: 0x55f1744507d1 - core::iter::adapters::map::map_fold::{{closure}}::h5ebf1fb0cac52e67
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/adapters/map.rs:84:21
20: 0x55f1744ee609 - core::iter::traits::iterator::Iterator::fold::h2015bcf3735fbf56
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/traits/iterator.rs:2171:21
21: 0x55f174457b63 - <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::h138ac22ff629d74b
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/adapters/map.rs:124:9
22: 0x55f17477f718 - ::sum::hea1767c71216a7d1
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/traits/accum.rs:42:17
23: 0x55f174465ca4 - core::iter::traits::iterator::Iterator::sum::hf0e6c90242aea12b
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/iter/traits/iterator.rs:3088:9
24: 0x55f17476db58 - lamellar::lamellae::command_queues::calc_hash::h4e6ac3bbf895c966
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/command_queues.rs:60:5
25: 0x55f17476ef87 - lamellar::lamellae::command_queues::CmdMsgBuffer::flush_buffer::hd391a19071b8fce6
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/command_queues.rs:251:28
26: 0x55f174772026 - lamellar::lamellae::command_queues::InnerCQ::try_sending_buffer::hde60e6bdb568879e
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/command_queues.rs:561:17
27: 0x55f174772e13 - lamellar::lamellae::command_queues::InnerCQ::send::{{closure}}::ha9c4ef14e1c03bd7
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/command_queues.rs:640:24
28: 0x55f17493df7d - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::hb9698704e3ece047
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
29: 0x55f1747795ba - lamellar::lamellae::command_queues::CommandQueue::send_data::{{closure}}::h0431491961bae8f1
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/command_queues.rs:1074:70
30: 0x55f174943b5d - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::hdf6642be1199e606
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
31: 0x55f1745a5eef - <lamellar::lamellae::shmem_lamellae::Shmem as lamellar::lamellae::LamellaeAM>::send_to_pes_async::{{closure}}::h7450ba257cc90dfc
at /home/nixes/dev_work/lamellar-runtime/src/lamellae/shmem_lamellae.rs:149:40
32: 0x55f17493613c - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::h7cf96167d09b9d9e
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
33: 0x55f174592875 - <core::pin::Pin as core::future::future::Future>::poll::h6bd2cc5f3b69113e
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/future.rs:123:9
34: 0x55f17452baad - lamellar::active_messaging::registered_active_message::RegisteredActiveMessages::add_req_to_batch::{{closure}}::h4771c74641be2b30
at /home/nixes/dev_work/lamellar-runtime/src/active_messaging/registered_active_message.rs:413:91
35: 0x55f17493e5ac - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::hbb4472c25f073069
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
36: 0x55f1746e9192 - <lamellar::scheduler::work_stealing::WorkStealingInner as lamellar::scheduler::AmeSchedulerQueue>::submit_task::{{closure}}::h245eed26e28f7c85
at /home/nixes/dev_work/lamellar-runtime/src/scheduler/work_stealing.rs:222:19
37: 0x55f17493688c - <core::future::from_generator::GenFuture as core::future::future::Future>::poll::h80ea87dbd92d3eaa
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/future/mod.rs:84:19
38: 0x55f174633eab - async_task::raw::RawTask<F,T,S>::run::h835262f5039256d8
at /home/nixes/.cargo/registry/src/github.com-1ecc6299db9ec823/async-task-4.2.0/src/raw.rs:489:20
39: 0x55f174e08061 - async_task::runnable::Runnable::run::hadecd10fa8c50bbf
at /home/nixes/.cargo/registry/src/github.com-1ecc6299db9ec823/async-task-4.2.0/src/runnable.rs:309:18
40: 0x55f1746e51a2 - lamellar::scheduler::work_stealing::WorkStealingThread::run::{{closure}}::hd510133d8eb64179
at /home/nixes/dev_work/lamellar-runtime/src/scheduler/work_stealing.rs:89:21
41: 0x55f1744169e3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h0da881cbb3dcf4a8
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys_common/backtrace.rs:123:18
42: 0x55f1743cc28d - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::h9b75cd87d68c1c6a
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/thread/mod.rs:477:17
43: 0x55f174415f31 - <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once::hbebabbb798283b2c
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panic/unwind_safe.rs:271:9
44: 0x55f17491d94c - std::panicking::try::do_call::h27158e6086ce710b
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:406:40
45: 0x55f17495c13b - __rust_try
46: 0x55f17491d887 - std::panicking::try::h629ae4a22cb5191d
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:370:19
47: 0x55f1745774a1 - std::panic::catch_unwind::h0e1d859b2965093d
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panic.rs:133:14
48: 0x55f1743cc0ad - std::thread::Builder::spawn_unchecked::{{closure}}::hbfd6028abfbdc549
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/thread/mod.rs:476:30
49: 0x55f17495c1bf - core::ops::function::FnOnce::call_once{{vtable.shim}}::hb59033300323e990
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5
50: 0x55f174efa973 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once::h49b6c7c5155a2296
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
51: 0x55f174efa973 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once::ha8b5234bfeb15105
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854:9
52: 0x55f174efa973 - std::sys::unix::thread::Thread::new::thread_start::h6f207dd842d64859
at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/sys/unix/thread.rs:108:17
53: 0x7f26d8e175c2 - start_thread
54: 0x7f26d8e9c584 - __clone
55: 0x0 -

Runtime differences with different backends

Here is the code that the problem was reproduced with, but I think it has to do with the backends.

use lamellar::array::prelude::*;


fn main() {
    let world = lamellar::LamellarWorldBuilder::new().build();
    let my_pe = world.my_pe();
    let array_size: usize = 5000;
    let arr = AtomicArray::<usize>::new(&world, array_size, Distribution::Block);

    arr.block_on(arr.store(my_pe, my_pe + 1));
    if my_pe == 0 {
        println!("starting collect");
        let v: Vec<usize> = arr.onesided_iter().into_iter().map(|&x| x).collect();
        if v[my_pe] == my_pe + 1 {
            println!("passed check on pe {}", my_pe);
        }
        else {
            println!("failed check on pe {}", my_pe);
        }
    }
    arr.wait_all();
    println!("done on pe {}", my_pe);
}

Lamellar Version

dev branch
0.5.0

Rust version
rustc 1.68.1 (8460ca823 2023-03-20)

Local

cargo run: 0.1s

Shared memory

with -N=1 -T=10: 0.4s
with -N=2 -T=5: 27s
with -N=2 -T=10: 17s
with -N=10 -T=1: 2:08m

ROFI

-N 1 --mpi=pmi2 : 2:30m
-N 2 --mpi=pmi2 : 5:15m
-n 10 : 8:34m
-n 20 : 9:50m

Storing struct in AtomicArray out of bounds error

Summary of Problem

When storing a struct into an AtomicArray, Lamellar/Rust gives an out of bounds error.

There may be some syntax errors in the code, I can check later if you cannot get it to compile and run.

use lamellar::array::prelude::*;
use serde::{Serialize, Deserialize};
use lamellar::memregion;

#[derive(Copy, Clone, Debug, Serialize, Deserialize, Default)]
pub struct Text { 
    pub val: u32,
}

impl Text {
    pub fn new() -> Self {
        Self { val: 0 }
    }
}

impl Dist for Text {}

fn main() {
    let world = lamellar::LamellarWorldBuilder::new().build();
    let struct_arr = AtomicArray::<Text>::new(&world, 10, Distribution::Block);
    let usize_arr = AtomicArray::<usize>::new(&world, 10, Distribution::Block);
    
    struct_arr.block_on(
        struct_arr.dist_iter().
        for_each(|elem| elem.store(Text{ val: 24}))
    );
    struct_arr.print();
    
    
    usize_arr.print();
    // Store works!
    usize_arr.block_on(usize_arr.store(0,8));
    usize_arr.print();
    
    let t = Text { val: 42 };
    // Causes panic: index out of bounds for 0 or 5;
    struct_arr.store(5, t);
    struct_arr.print();
}

Rust Version -- 1.67.0
Lamellar Version -- https://github.com/pnnl/lamellar-runtime/tree/dev

Implement count for local_iter

It would be nice to be able to do things like this:

let table = LocalLockArray::<u128>::new(&world, 10000, Distribution::Block);
let local_count = table
        .local_iter()
        .filter(|e| e==0)
        .count();

let local_sum = table.local_iter().sum();

Multiple worlds

Novice users might not be aware that there are pitfalls in creating multiple worlds. Is there a way to help steer them clear?

Data structure design/implementation help

Hi,

I am trying to implement a distributed data structure using Lamellar that has some desirable properties for a problem I am working on and have hit a bit of a wall. I just wanted to check in and see if something like it was possible with what Lamellar has or plans to have in the future. No worries at all if it is out of scope or otherwise.

Essentially I would like to have a distributed array of Vec<(i64,f64)> where PEs can grab/put Vec<(i64,f64)> values from/to other PEs. It is also highly desirable to pass (i64,f64) values to be pushed onto a specific Vec within the array, potentially on another PE. I was able to do this with UPC++ in another code base using serialization and remote procedure calls (RPC), but I am unsure how to mimic it with Lamellar.

I have attempted to use active messages to do this, but I typically have gotten stuck attempting to pass (i64,f64) to be pushed to a given Vec in the distributed array. This is likely just due to my unfamiliarity with active messages vs rpc.

Best

Add API call to get number of PE threads

I'm working on some code that will only work if the PE only has one thread to execute with (some values are not Send). I would like to check that it was started with only one thread per PE when validating the program flags.

Error: cannot find function `rofi_sub_alloc` in crate `rofisys`

cargo build --release --features enable-rofi will error out (log below) despite having the dependencies loaded. I'm sure it is something on my end but I have no idea where to look next. Thanks

Error:

error[E0425]: cannot find function 
`rofi_sub_alloc` in crate `rofisys`
   --> src/lamellae/rofi/rofi_api.rs:46:50
    |
46  |               AllocationType::Sub(pes) => rofisys::rofi_sub_alloc(
    |                                                    ^^^^^^^^^^^^^^ help: a function with a similar name exists: `rofi_alloc`
    |
   ::: /scratch/shared/apps/lamellar-runtime/build/lamellar-runtime/target/release/build/rofisys-0de24f72639c8fdf/out/bindings.rs:113:5
    |
113 | /     pub fn rofi_alloc(
114 | |         arg1: usize,
115 | |         arg2: ::std::os::raw::c_ulong,
116 | |         arg3: *mut *mut ::std::os::raw::c_void,
117 | |     ) -> ::std::os::raw::c_int;
    | |______________________________- similarly named function `rofi_alloc` defined here

For more information about this error, try `rustc --explain E0425`.

Lint check for array-length-get-in-loop-init-leads-to-deadlock issue

Issue: Use of local_data().len()
Fix: Replace with num_elems_local()
Reason: They return the same value, but local_data (sometimes) grabs a lock that might not be released as quickly as expected (e.g., during loop initialization the lock is kept for the lifetime of the loop, not just the lifetime of the initialization).

While developing a benchmark using a LocalLockArray, the "issue" version was used in a loop init that also included an array put (e.g., for i in 0..array.local_data().len() {array.put(i, ...)}). Switching the loop init to 0..array.num_elems_local() fixed the problem. The issue only comes up for LocalLockArray, but the change is valid on all array types.

Can we make a clippy check (or similar) for this?

Add pointer to docs for deadlock from into_read_only()

The following code

let world = lamellar::LamellarWorldBuilder::new().build();
let array = UnsafeArray::<usize>::new(world.team(), 1, Distribution::Cyclic);
let ro = array.clone().into_read_only();
let ro2 = array.into_read_only();

seems to produce a deadlock in which the following message repeats continually

waiting for outstanding 1 [0/1] lc: 2 dc: 0 wc: 1
ref_cnt: [0]
 am_cnt (0,0)
mode [UnsafeArray]
waiting for outstanding 1 [0/1] lc: 2 dc: 0 wc: 1
ref_cnt: [0]
 am_cnt (0,0)
mode [UnsafeArray]
waiting for outstanding 1 [0/1] lc: 2 dc: 0 wc: 1
ref_cnt: [0]
 am_cnt (0,0)
mode [UnsafeArray]
waiting for outstanding 1 [0/1] lc: 2 dc: 0 wc: 1
ref_cnt: [0]
 am_cnt (0,0)
mode [UnsafeArray]
waiting for outstanding 1 [0/1] lc: 2 dc: 0 wc: 1
ref_cnt: [0]
 am_cnt (0,0)
mode [UnsafeArray]

Is it possible to expand this message / help point the reader to the issue?

ofi_monitor_cleanup: Assertion error

When running a Lamellar program in parallel, there is an error when the program attempts to finish.
I believe it is somewhere between ROFI and libfabric; if you want to track this in the ROFI repo, feel free to say so and close this.

The example rust program is running to completion and then the following error occurs:

test_app: prov/util/src/util_mem_monitor.c:84: ofi_monitor_cleanup: Assertion `dlist_empty(&monitor->list)' failed.

libfabric version : 1.15.0

I can provide more details, just let me know.

Functions created with lamellar::am macro are not (all) public

I have an AM defined in file1 and I'm trying to use it in file2. This results in a compiler error like

error[E0624]: associated function `create_am_group` is private
  --> src/variants/file2.rs:
   |
   |         let mut msgs = typed_am_group!(Messages, lamellar::world);
   |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ private associated function
   |
  ::: src/variants/file1.rs:
   |
   | #[lamellar::am]
   | --------------- private associated function defined here
   |
   = note: this error originates in the macro `typed_am_group` (in Nightly builds, run with -Z macro-backtrace for more info)

I get the error for typed_am_group and add_am_pe (but there may be others latent).

Automated PE selection for active messages

Hi,

This is probably a super basic question, but is it currently possible to submit an active message and allow the runtime to intelligently select a PE to accept/run the message, rather than selecting the PE yourself? I ran into a case with a pretty sporadic load where it isn't simple to pick a suitable PE, and it would be nice to avoid doing so.

Check for lock-acquiring-temporaries unused across function call

A potential lamellar-lint: try to detect when a temporary holds a lock but that temporary is never used again.

async fn do_stuff(table: LocalLockArray<usize>) {
   let values = table.read_local_data().await;
   let v = values[0];
   do_other_stuff(table);
}

If do_other_stuff wants to write to table, it will be blocked because the read_lock is held until do_stuff finishes. It would be nice to have a lint that (1) knows what functions grab locks and (2) indicates when those locks probably aren't needed after a certain point (and suggests the temporary be explicitly dropped).

How to run tests of Lamellar

Hi,

I would like to know how to run tests of Lamellar correctly. Should I simply run cargo test, or do I also need to set up some environment variables? Thank you.

Moreover, I guess the TEST section is missing in README.md. It is written "see TEST section below" before the section "HISTORY".

LamellarWorldBuilder::new().build() and assert failure

Given the following code:

// lib.rs
#[cfg(test)]
pub mod test {
    #[test]
    fn lamellar_fails() {
        let world = lamellar::LamellarWorldBuilder::new().build();
        assert_eq!(0,2);
    }
   
    #[test]
    fn tautology() {
        assert_eq!(0,0);
    }
}

// main.rs
fn main() { }

One gets the following errors with cargo test lamellar_fails

running 1 test
fi_domain(): /path/to/project/target/debug/build/rofisys-<hash>/out/rofi_src/transport.c:607, ret=-61 (No data available) [0]
error: test failed, to rerun pass `--lib`

Caused by:
    process didn't exit successfully: `/path/to/project/target/debug/deps/test-<hash>` (signal: 11, SIGSEGV: invalid memory reference)

and with srun -N <num_pes> --mpi=pmi2 /path/to/project/target/debug/deps/test-<hash> lamellar_fails

Running 1 test
...
<project>-<hash>: prov/util/src/util_mem_monitor.c:84: ofi_monitor_cleanup: Assertion `dlist_empty(&monitor->list)`  failed.
...
srun: Terminating job step...

Is this expected? I'm trying to create some tests for an application and this issue is corrupting cargo's test framework. When running other tests individually the output from cargo test is as one would expect. However, if a test similar to tautology above is included in the suite then output is truncated as shown above such that RUST_BACKTRACE=1 cargo test provides no backtrace for this test or any other that may fail.

When running with local a similar issue exists but only states that the process did not exit correctly.
