
RAJA Performance Portability Layer (C++)

License: BSD 3-Clause "New" or "Revised" License



RAJA


RAJA is a library of C++ software abstractions, primarily developed at Lawrence Livermore National Laboratory (LLNL), that enables architecture and programming model portability for HPC applications. RAJA has two main goals:

  • To enable application portability with manageable disruption to existing algorithms and programming styles.
  • To achieve performance comparable to that of common programming models, such as OpenMP and CUDA, used directly.

RAJA offers portable, parallel loop execution by providing building blocks that extend the generally accepted parallel-for idiom. RAJA relies on standard C++14 features.

RAJA's design is rooted in decades of experience working on production mesh-based multiphysics applications. Based on the diversity of algorithms and software engineering styles used in such applications, RAJA is designed to enable application developers to adapt RAJA concepts and specialize them for different code implementation patterns and C++ usage.

RAJA shares goals and concepts with other C++ portability abstraction approaches, such as Kokkos and Thrust. However, it includes concepts and capabilities, fundamental to the applications we work with, that are absent from other models.

It is important to note that, although RAJA is used in a diversity of production applications, it is very much a work-in-progress. The community of researchers and application developers at LLNL that actively contribute to it is growing. Versions available as GitHub releases contain mostly well-used and well-tested features. Our core interfaces are fairly stable while underlying implementations are being refined. Additional features will appear in future releases.

Quick Start

The RAJA code lives in a GitHub repository. To clone the repo, use the command:

git clone --recursive https://github.com/llnl/raja.git

Then, you can build RAJA like any other CMake project, provided you have a C++ compiler that supports the C++14 standard. The simplest way to build the code, using your system default compiler, is to run the following sequence of commands in the top-level RAJA directory (in-source builds are not allowed!):

mkdir build
cd build
cmake ../
make

More details about RAJA configuration options are located in the RAJA User Guide (linked below).
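Options are passed as CMake cache variables at configure time. A sketch of a typical invocation follows; the option names (`ENABLE_OPENMP`, `ENABLE_CUDA`) are assumptions based on common BLT-style builds, so check the User Guide for the authoritative list:

```shell
# Example out-of-source configure with a few common options.
# NOTE: ENABLE_OPENMP / ENABLE_CUDA are assumed names; consult the
# RAJA User Guide for the exact configuration variables.
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DENABLE_OPENMP=On \
      -DENABLE_CUDA=Off \
      ../
make -j
```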

We also maintain a RAJA Template Project that shows how to use RAJA in a CMake project, either as a Git submodule or as an installed library.

User Documentation

The RAJA User Guide is the best place to start learning about RAJA and how to use it.

The most recent version of the User Guide (RAJA develop branch): https://raja.readthedocs.io

To access docs for other RAJA released versions: https://readthedocs.org/projects/raja/

To cite RAJA, please use the following references:

  • RAJA Performance Portability Layer. https://github.com/LLNL/RAJA

  • D. A. Beckingsale, J. Burmark, R. Hornung, H. Jones, W. Killian, A. J. Kunen, O. Pearce, P. Robinson, B. S. Ryujin, T. R. W. Scogland, "RAJA: Portable Performance for Large-Scale Scientific Applications", 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). Download here

Related Software

The RAJA Performance Suite contains a collection of loop kernels implemented in multiple RAJA and non-RAJA variants. We use it to monitor and assess RAJA performance on different platforms using a variety of compilers. Many major compiler vendors use the Suite to improve their support of abstractions like RAJA.

The RAJA Proxies repository contains RAJA versions of several important HPC proxy applications.

CHAI provides a managed array abstraction that works with RAJA to automatically copy data used in RAJA kernels to the appropriate space for execution. It was developed as a complement to RAJA.

Communicate with Us

The most effective way to communicate with the core RAJA development team is via our mailing list: [email protected]

You are also welcome to join our RAJA Google Group.

If you have questions, find a bug, or have ideas about expanding the functionality or applicability of RAJA and are interested in contributing to its development, please do not hesitate to contact us. We are very interested in improving RAJA and exploring new ways to use it.

Contributions

The RAJA team follows the GitFlow development model. Folks wishing to contribute to RAJA should include their work in a feature branch created from the RAJA develop branch, which contains the latest work in RAJA. Then, create a pull request with the develop branch as the destination. Periodically, we merge the develop branch into the main branch and tag a new release.

Authors

Please see the RAJA Contributors Page for the full list of contributors to the project.

License

RAJA is licensed under the BSD 3-Clause license.

Copyrights and patents in the RAJA project are retained by contributors. No copyright assignment is required to contribute to RAJA.

Unlimited Open Source - BSD 3-clause Distribution LLNL-CODE-689114 OCEC-16-063

For release details and restrictions, please see the information in the following:

SPDX usage

Individual files contain SPDX tags instead of the full license text. This enables machine processing of license information based on the SPDX License Identifiers that are available here: https://spdx.org/licenses/

Files that are licensed as BSD 3-Clause contain the following text in the license header:

SPDX-License-Identifier: (BSD-3-Clause)

External Packages

RAJA bundles its external dependencies as submodules in the git repository. These packages are covered by various permissive licenses. A summary listing follows. See the license included with each package for full details.

PackageName: BLT
PackageHomePage: https://github.com/LLNL/blt
PackageLicenseDeclared: BSD-3-Clause

PackageName: camp
PackageHomePage: https://github.com/LLNL/camp
PackageLicenseDeclared: BSD-3-Clause

PackageName: CUB
PackageHomePage: https://github.com/NVlabs/cub
PackageLicenseDeclared: BSD-3-Clause

PackageName: rocPRIM
PackageHomePage: https://github.com/ROCmSoftwarePlatform/rocPRIM.git
PackageLicenseDeclared: MIT License


raja's Issues

Collect requirements and use cases for scan support

Several users have asked if RAJA will eventually support scan operations.

Meet with potential users and document use cases and requirements.

Based on this input, add issues, prioritize, and schedule work.

Improve RAJA::forallN documentation

It would be really helpful to have more documentation about the different policies for RAJA::forallN. What's the semantics of z, y, and x in cuda_block_z_exec, cuda_threadblock_y_exec, and cuda_threadblock_x_exec? Is one preferable for the stride-1 dimension in an iteration?

For OpenMP policies, can I get collapse without getting nowait? What are all the available options, and what do they mean?

Gather needs/requirements for RAJA "external" library

Such a library would contain things based on RAJA constructs that would help users build specific capabilities to meet their needs. Items mentioned for inclusion: stencil support, etc.

Document needs/reqs and design on Confluence.

Fix IndexSet implementation to work in GPU memory space

This mainly involves removal of polymorphism via virtual functions. There are several potential solutions to this. It may require some exploration/prototyping to figure out what's best.

Also, build unit tests for this.

./scripts/config-build.py AssertionError: Could not find cmake cache file

Hi, I was following instructions on http://software.llnl.gov/RAJA/config_build.html .
I got the following error:

[liao6@tux385:~/workspace/ASC-project/RAJA.git]./scripts/config-build.py
Traceback (most recent call last):
File "./scripts/config-build.py", line 85, in
assert os.path.exists( cachefile ), "Could not find cmake cache file '%s'." % cachefile
AssertionError: Could not find cmake cache file '/home/liao6/workspace/ASC-project/RAJA.git/host-configs/other/tux385.llnl.gov.cmake'.

Is this expected?

1D Reductions

We need to support more than 1D reductions both in block dimension and in grid dimension. To start I have created a branch that uses generic thread and block IDs like so:

int blockId = blockIdx.x
            + blockIdx.y * gridDim.x
            + gridDim.x * gridDim.y * blockIdx.z;

int threadId = threadIdx.x
             + blockDim.x * threadIdx.y
             + (blockDim.x * blockDim.y) * threadIdx.z;

Current unit tests all pass, but this really needs to be exercised with 3D kernel launches.

BTW the branch will be 3d-generic-reduction

Develop wrappers for atomic operations

Some app developers have requested wrappers for atomic operations (like we have defined for CUDA reductions) so that they can use these in kernels in a portable way. For example, what we have now is a limited set and only works with CUDA. If you try to switch the execution policy to run the same code with OpenMP, for example, it won't work.

Interface abstraction hardening tracking issue

This is meant to be a tracking issue for the general discussion around establishing the interface contracts RAJA presents to users, and requires internally. @rhornung67, @Keasler, @ajkunen, @davidbeckingsale, @DavidPoliakoff and I have had a number of discussions on these topics, and I would be grateful for any clarifications or extensions you all would like to add. Not all of these things will necessarily be done, but they're things we've talked about enough that I think they deserve to at least be considered, and this will help us keep track of them.

  • Define the interface elements required of:
    • Iterables:
      • Segments: PR #39 makes a first pass at this, allowing any random access iterable to be passed as an Iterable, this may be a first step
      • IndexSets (if different from segments)
      • others: are forward iterators sufficient for sequential policies? This might be useful for implementing IndexSet style generators.
    • Execution policies
    • Backend implementations (like cilk/openmp/...)
  • Clarify what is the job of the execution policy and what is the job of the iterable, these sometimes get conflated a bit and it may help for clarity of the interface to normalize it where possible.
  • Define what interfaces are user-facing and what interfaces are "internal." This is important for defining how things can be extended, and may allow us greater internal flexibility while maintaining user-facing API compatibility.
  • Explore the possibility of normalizing the interfaces used for forall and forallN policies (highly related to the required interface of an execution policy, but not necessarily the same)
  • Investigate what will be required to support:
    • Plugins, such as CHAI, profiling etc.
    • Asynchronous regions, including dependencies between them
    • Generalized, and possibly more efficient, reductions (option note, if the lambda is copied by value into each thread or task context, this may have some nice and interesting possibilities)
    • NOTE: the two above may be approachable as a common item, we've been calling one option for this a "context," somewhat in the style of CHAI's context mechanism but more generalized
  • Wherever possible establish formal contracts for all of the above, in the style of those discussed in c interfaces and implementations or the flux RFC process, with what is an error checked at compile time, error checked at runtime, and what is an unchecked error. This is hard and much more like working on a standard, but the more we keep it in mind the easier it would be to do later.

Fix "vanilla cmake" build system

Sorry if this comes off as blunt...

@rhornung67 and I were trying to get RAJA to build on BG/Q with no success.

I didn't review any of the vanilla cmake PR, but now that I'm looking at it.... well, it's worrisome.
Using host-config files was pretty rock solid, and using raw cmake seems terrible.

Now instead of baking in the compiler+architecture flags in a way that makes sense, using a host-config file, we have a compiler cmake module with a large switch-yard in it. Is this really better?

My understanding was that if we used a tool like Spack, it would have been able to invoke cmake directly and bypass the host-config script. This allowed us to either use the HC files or not. But now we are forced into using an odd combination of shell scripts and cmake modules that take the place of the HC files.

I think this was a major regression, both on platform support and usability.

forallN development tasks

@trws

  • Eliminate Code Gen (#4)
  • Intel compiler issues (slowness in compiling, performance issues) (#5)
  • Clang OpenMP (backed out in commit ceca2c6 ??? need to hunt down specific commit?)
  • nvcc lambda "introspection???" (need to generate a branch for this)
  • add static_assert's to help the user out!
  • add sequential tests for CPU nested and CPU nested reduce tests so the forallN machinery gets tested for all configurations (i.e., without OpenMP enabled)

BG/Q timer output issues

Bad timer output running RAJA version of LULESH-v1.0 with OpenMP on BG/Q. I'm running the code in @trws branch trws:chrono-timer-fix.

Here's what I see when compiled with gcc-4.8.4:

"chrono" option (BAD)
Total Cycle Time (sec) = -100876817.742690
Total main Time (sec) = 1351950087.947958

"gettime" option (BAD)
Total Cycle Time (sec) = -127775258991.775274
Total main Time (sec) = 127775260511.756386

Here's what I see with with clang-3.9.0:

"chrono" option (GOOD)
Total Cycle Time (sec) = 79.276949
Total main Time (sec) = 79.339523

"gettime" option (BAD)
Total Cycle Time (sec) = 20221439.999974
Total main Time (sec) = -9688.000000

I believe the output is good with either timer option when running with gcc compilation on LLNL Linux clusters.

Improve error handling for adding segments to IndexSet

Currently, if one attempts to add an invalid segment type to an IndexSet, the code silently ignores the segment and doesn't add it. This should instead trigger a warning, halt the code, or use whatever mechanism we choose to help users understand anomalous usage.

Note: the whole 'valid segment' concept may go away in the future when we improve the implementation of IndexSet. Nevertheless, it is an issue now.

Add support for array reductions

We may want to support reduction operations for data beyond scalars.

First, meet with application users and collect use cases from different codes to see what's really needed. For example, do users need reducer objects that take a pointer and a length in their constructors, take a pointer in their reduction operators, and return a pointer to the reduced array in the () operator? The reduced array is equivalent to applying the associated scalar reduction to each element in the array.

Second, document user input. Then, based on what's needed, add issues, prioritize, and schedule work.

Note that we don't need "loc" reductions for this because they don't really make sense for arrays.

Broken links in README.md

Hey guys,

Out today, but noticed the links in the readme on github are broken. Oof. Will fix tomorrow if one of you enterprising souls doesn't do it first

Develop more flexible, general IndexSet implementation

Issues with current implementation:

  • switch statements in routines
  • supported segment types are hard-coded
  • doesn't support user-defined segment types without code modifications or writing a new IndexSet class
  • doesn't limit templated code instantiation to only the segment types that are used in the index set

Develop OpenMP 4.5 backend

Now that we have a viable clang compiler for this, build an OpenMP 4.5 backend to be able to compare performance, etc. Tasks include:

  • traversals and policies
  • reductions
  • unit tests

Intel compiler 6x OpenMP performance degradation for feature/keasler/BoxSegment

The LULESH v1.0 application runs 6x slower on the feature/keasler/BoxSegment branch due to an Intel compiler bug. gcc-4.9.3p runs at the same speed on the develop branch and the feature/keasler/BoxSegment branch. That said, the Intel compiler produces code that is over 30% faster on the develop branch, so this is not a dis on the Intel compiler.

Decorate functions consistently with constexpr/const/noexcept

The merge of #7 causes clang to complain about the use of constexpr without const for member functions.

include/RAJA/LegacyCompatibility.hxx(127): warning #3699: constexpr non-static member function will not be implicitly 'const' in C++14

In general, we should ensure that functions are decorated properly with constexpr/const/noexcept where appropriate (noexcept in particular with more efficient codegen). C++14 requires const to be added to constexpr non-member functions. noexcept should be added to any function which cannot throw. This could be particularly useful with IndexSet/Segment and ensuring std::vector can call std::move_if_noexcept when resizing.

Reorganize header files for flexible policy selection

Currently, we encourage users to include a single RAJA header file in their applications, which pulls in all of RAJA functionality that is enabled. We need to break this apart into a sensible scheme that enables users to clearly see how to include only the parts of RAJA that they need. For example, if they only want to use CUDA, they should be able to only include RAJA-CUDA support via a single header file and ignore all the rest (e.g., OpenMP, etc.).

We've started to discuss this on our Confluence page, but it requires group discussion to fully flesh out.

Also, this should be done after the execution policy PR is merged.

Add reset methods to reductions

Several users have asked for the ability to re-initialize reduction objects so they can reuse them, without having to construct new ones.

Document needs for dependent tasks in RAJA

  • Prototype scope operation
  • Need to understand which programming model it is applied to (e.g., track execution context/dependency information). Need handle to interact with.
  • Develop abstraction for CUDA stream scope

How to do these sorts of things without severe code disruption or violating encapsulation??

Document or remove prototypes with unclear purpose or usage

I've noticed a couple of outward-facing functions that puzzle me. Thoughts/history @Keasler, @rhornung67?

  • forall_segments: this has definitions in a couple of places, but is not used by anything in raja itself. Is this a deprecated usage, or something in active use?
  • void forall(const INDEXSET_T& iset, LOOP_BODY loop_body): This takes any first argument at all and treats it as an indexset. Is there an assumption someplace that custom IndexSet types do not need to inherit from IndexSet?

CUDA 8.0RC1 build issue

SM60 declares an atomicAdd for doubles which conflicts with the CAS variants we currently have. We will likely need some gencode or CUDA_ARCH range checks and/or enforcement.

Build fails with CUDA v8.0

I tried to build RAJA on a POWER8 system, but it failed with the following error messages.

[ 15%] Built target lulesh-OMP.exe
[ 16%] Building NVCC (Device) object test/LULESH-v1.0/LULESH-v1.0_RAJA-variants/CMakeFiles/lulesh-RAJA-parallel.exe.dir/lulesh-RAJA-parallel.exe_generated_luleshRAJA-parallel.cxx.o
/home/mstars/RAJA-develop/include/RAJA/exec-cuda/reduce_cuda.hxx(269): error: cannot overload functions distinguished by return type alone

1 error detected in the compilation of "/tmp/tmpxft_0000fe77_00000000-5_luleshRAJA-parallel.cpp4.ii".
CMake Error at lulesh-RAJA-parallel.exe_generated_luleshRAJA-parallel.cxx.o.cmake:260 (message):
Error generating file
/home/mstars/RAJA-develop/build/test/LULESH-v1.0/LULESH-v1.0_RAJA-variants/CMakeFiles/lulesh-RAJA-parallel.exe.dir//./lulesh-RAJA-parallel.exe_generated_luleshRAJA-parallel.cxx.o

make[2]: *** [test/LULESH-v1.0/LULESH-v1.0_RAJA-variants/CMakeFiles/lulesh-RAJA-parallel.exe.dir/lulesh-RAJA-parallel.exe_generated_luleshRAJA-parallel.cxx.o] Error 1
make[1]: *** [test/LULESH-v1.0/LULESH-v1.0_RAJA-variants/CMakeFiles/lulesh-RAJA-parallel.exe.dir/all] Error 2
make: *** [all] Error 2

Compiling for shared libraries

The rest of my code is compiled with -fPIC -DPIC, and RAJA won't compile out of the box with -fPIC -DPIC. I wish there were a way to do something like --enable-shared when configuring RAJA. Similarly, it would be nice if I could just --enable-cuda or --disable-cuda instead of having to reach in and pick a config file.

Regards,

Tom

Write introductory user documentation

Make RAJA easier to build and use. Documentation needs to include some basic examples of the simpler forall constructs, as well as more advanced use cases. Just having the included version of LULESH and Kripke is insufficient for this purpose.

Other things to address in user documentation:

  • recommended pattern for developing app-specific execution policy

XL compiler failing to build with errors in real_datatypes.hxx

When building RAJA code with the XL compiler targeting Power 8, the following error occurs in real_datatypes.hxx

error: unknown linkage language "builtin"

This happens when it gets to building LULESH, or if you skip building these, on your own code. RAJA itself builds OK.

A possible fix is to comment out, as below:

#elif defined(RAJA_COMPILER_XLC12)
/*
#ifndef RAJA_COMPILER_XLC_POWER8
extern
#ifdef __cplusplus
"builtin"
#endif
void __alignx(int n, const void* adds);
#endif
*/

omp_get_max_threads in forall_segments

Our use of omp_get_max_threads in forall_segments is questionable. For example, it may do the wrong thing if environment variables are not properly set. Look into this more carefully to determine a more robust solution.

RAJA assumes linux

There are a few places in the repository that use "sched_*" or linux-specific timer interfaces. It would be good to shift, or at least add a fallback, to C++11 standard alternatives to these. Examples:

  • sched_yield() -> std::this_thread::yield()
  • clock/gettime/etc. -> std::chrono::...
