nekbone's Introduction

Nek5000

Nek5000 is a fast and scalable open source CFD solver.

Release Notes

Make sure to read the release notes before using the code.

Getting Started

See here.

Troubleshooting

If you run into problems compiling, installing, or running Nek5000, please send a message to the User's Group mailing list.

Reporting Bugs

Nek5000 is hosted on GitHub and all bugs are reported and tracked through the Issues feature on GitHub. However, GitHub Issues should not be used for common troubleshooting purposes. If you are having trouble installing the code or getting your model to run properly, you should first send a message to the User's Group mailing list. If it turns out your issue really is a bug in the code, an issue will then be created on GitHub. If you want to request that a feature be added to the code, you may create an Issue on GitHub.

Contributing

Our project is hosted on GitHub. If you are planning a large contribution, we encourage you to discuss the concept here on GitHub and interact with us frequently to ensure that your effort is well-directed.

nekbone's Issues

Non-reproducibility with np=1

Ron's email to Paul and me on 4/14/2017:

One of the Nekbone examples (Nekbone/test/example1) runs a range of nelt (1, 2, ..., 50) with nx1=10. For each nelt, Nekbone creates the "best" possible brick geometry (e.g., nelt=6 is a 3x2x1 geometry, nelt=7 is 7x1x1, nelt=8 is 2x2x2, etc.) and runs 100 CG iterations.
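
To make the "best" brick idea concrete, here is a minimal sketch that picks the three-factor decomposition of nelt closest to a cube. The selection rule (smallest brick surface) and the function name `best_brick` are assumptions made for illustration; this is not taken from the Nekbone source, although it does reproduce the three geometries quoted above.

```python
def best_brick(nelt):
    """Return (a, b, c) with a*b*c == nelt and a >= b >= c, as cubic as possible.

    Assumption: "best" means the factorization with the smallest surface
    measure a*b + b*c + c*a. Nekbone's actual rule may differ.
    """
    best = None
    c = 1
    while c * c * c <= nelt:
        if nelt % c == 0:
            rest = nelt // c
            b = c
            while b * b <= rest:
                if rest % b == 0:
                    a = rest // b
                    surface = a * b + b * c + c * a
                    if best is None or surface < best[0]:
                        best = (surface, (a, b, c))
                b += 1
        c += 1
    return best[1]


for nelt in (6, 7, 8):
    print(nelt, best_brick(nelt))
```

A prime nelt such as 7 has no choice but a single row of elements, which is why the geometry changes so irregularly across the 1 to 50 sweep.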

For nelt >= 2, the results are fully reproducible within 5 significant digits on all the implementations I've tried (CPU, GPU w/OpenACC, GPU w/CUDA). However, the nelt=1 case is not reproducible in general. It is not reproducible even when using only the "master" branch with np=1 on different CPU models.

For example, here are the results for the one-element case with np=1 on two CPUs: Intel Xeon E5-2650 and Intel Core i5-6400, respectively.

cg: 0 1.6117E+01
cg: 101 1.3185E-29 3.7009E-01 1.9081E-01 2.3652E-57

cg: 0 1.6117E+01
cg: 101 1.0378E-29 3.9533E-01 1.5498E-01 2.1325E-57
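
The two final lines can be compared field by field. The sketch below parses them and checks agreement to 5 significant digits. The helper names and the exact tolerance (half a unit in the fifth digit, relative to the larger value) are assumptions chosen for illustration; the column labels are the ones given in the paragraph that follows.

```python
FIELDS = ("iter", "rnorm", "alpha", "beta", "pap")


def parse_cg_line(line):
    """Turn 'cg: 101 1.3185E-29 ...' into a dict keyed by FIELDS."""
    tokens = line.split()[1:]  # drop the leading "cg:" tag
    values = [int(tokens[0])] + [float(t) for t in tokens[1:]]
    return dict(zip(FIELDS, values))


def agree(a, b, digits=5):
    """True if a and b match to `digits` significant digits (assumed definition)."""
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        return True
    return abs(a - b) <= 0.5 * 10.0 ** (1 - digits) * scale


xeon = parse_cg_line("cg: 101 1.3185E-29 3.7009E-01 1.9081E-01 2.3652E-57")
core_i5 = parse_cg_line("cg: 101 1.0378E-29 3.9533E-01 1.5498E-01 2.1325E-57")

# Collect every floating-point field that fails the 5-digit check.
mismatches = [name for name in FIELDS[1:]
              if not agree(xeon[name], core_i5[name])]
print(mismatches)
```

With the numbers above, every floating-point field at iteration 101 fails the check, while the initial residual 1.6117E+01 is identical on both CPUs.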

The values at the beginning of the CG sequence are (iter, rnorm), and the values at the end are (iter, rnorm, alpha, beta, pap). I'm not sure what alpha, beta, and pap represent; the README does not mention them.
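
In the textbook conjugate gradient method, names like these usually denote the step length (alpha), the search-direction update coefficient (beta), and the inner product of the search direction p with A*p (pap). The sketch below shows where each quantity appears in an unpreconditioned CG loop. It assumes Nekbone follows this standard notation, which has not been verified against the Nekbone source here; the small test matrix and the function names are made up for illustration.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def cg(A, f, niter):
    """Plain CG for a symmetric positive definite A, starting from x = 0."""
    n = len(f)
    x = [0.0] * n
    r = list(f)          # residual r = f - A*x, with x = 0
    p = [0.0] * n
    rtz1 = 1.0
    history = []
    for it in range(1, niter + 1):
        rtz2 = rtz1
        rtz1 = dot(r, r)                 # no preconditioner, so z = r
        beta = 0.0 if it == 1 else rtz1 / rtz2
        p = [ri + beta * pi for ri, pi in zip(r, p)]   # new search direction
        w = matvec(A, p)
        pap = dot(p, w)                  # p^T A p
        alpha = rtz1 / pap               # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rnorm = math.sqrt(dot(r, r))
        history.append((it, rnorm, alpha, beta, pap))
    return x, history


# Hypothetical 2x2 SPD system; its exact solution is (1/11, 7/11).
x, history = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], niter=2)
for it, rnorm, alpha, beta, pap in history:
    print(f"cg: {it} {rnorm:.4E} {alpha:.4E} {beta:.4E} {pap:.4E}")
```

Both alpha and beta are ratios of inner products of the residual and search direction, so once the residual has shrunk to rounding-error level these ratios are formed from values that are themselves dominated by rounding.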

Technical Report

Content of Document

  • Current on-node results:
    • Time-to-solution vs nelt
    • GFLOPS vs nelt
    • Latency study (see Issue #1)
  • Multi-GPU results on Summitdev
    • GPU-direct

Target GPU systems

  • Tesla (OLCF Titan)
  • Maxwell (MCS tesla)
  • Pascal (JLSE neddy)
  • Volta (OLCF Summitdev)

Error check with exact solutions

We will discuss with Paul whether to support testing against exact solutions. The current setup is intended for performance tests only, so the solution varies with the number of MPI ranks.

MPI_WTIME related bug with GCC 7.1 and OpenMPI

I've been building Nekbone on Arm (Aarch64) hardware with both the Arm compiler and GCC.
The Arm build appears to have worked, but the timing stats in the GCC build fail - the call to MPI_WTIME is not returning a meaningful number, leading to a total runtime recorded of 0.0s!

I've also observed the same issue on x86 hardware with the same compiler (GCC 7.1) and MPI (OpenMPI 3.0.0).

The behaviour can be worked around, either by calling an alternative timing routine or replacing "include 'mpif.h'" with "use mpi" in "real*8 function dnekclock()".

...is this a known issue?

Restructuring plan

Format Restructuring in Nekbone

  • Separate ax ACC and CUDA routines (example: ax_acc and ax_cuda)

Applying Nekbone Restructuring to Nek5000

  • (low priority) Apply CUDA kernels to Nek5000 for demonstrative purposes

Profiling Studies on GPU

Sources of latency as inferred from nvprof

  • stall_memory_dependency
  • stall_pipe_busy
  • stall_sync

FLOPS (current vs. Jing)

  • run Jing's version
  • identify FLOPS difference

Regression tests

  • Install Jenkins on Tesla workstation at ANL
  • Implement test scripts in Jenkins
  • Remove test scripts from /test/nek_gpu1/scripts
