berkeleylab / inference-engine

A deep learning library for use in high-performance computing applications in modern Fortran

License: Other

Fortran 94.83% Shell 4.74% Gnuplot 0.43%
deep-learning fortran2018 inference artificial-intelligence machine-learning neural-networks ann artificial-neural-networks feedforward-neural-network neural-network

inference-engine's Introduction

  _        __                                                     _            
 (_)      / _|                                                   (_)           
  _ _ __ | |_ ___ _ __ ___ _ __   ___ ___         ___ _ __   __ _ _ _ __   ___ 
 | | '_ \|  _/ _ \ '__/ _ \ '_ \ / __/ _ \  __   / _ \ '_ \ / _` | | '_ \ / _ \
 | | | | | ||  __/ | |  __/ | | | (_|  __/ |__| |  __/ | | | (_| | | | | |  __/
 |_|_| |_|_| \___|_|  \___|_| |_|\___\___|       \___|_| |_|\__, |_|_| |_|\___|
                                                             __/ |             
                                                            |___/              


Inference-Engine

Table of contents

  • Overview
  • Build and Test
  • Examples
  • Documentation

Overview

Inference-Engine supports research in concurrent, large-batch inference and training of deep, feed-forward neural networks. Inference-Engine targets high-performance computing (HPC) applications with performance-critical inference and training needs. The initial target application is in situ training of a cloud microphysics model proxy for the Intermediate Complexity Atmospheric Research (ICAR) model. Such a proxy must support concurrent inference at every grid point at every time step of an ICAR run. For validation purposes, Inference-Engine also supports the export and import of neural networks to and from Python by the companion package nexport.

The features of Inference-Engine that make it suitable for use in HPC applications include

  1. Implementation in Fortran 2018.
  2. Exposing concurrency via
  • Elemental, implicitly pure inference procedures,
  • An elemental and implicitly pure activation strategy, and
  • A pure training subroutine.
  3. Gathering network weights and biases into contiguous arrays for efficient memory access patterns.
  4. User-controlled mini-batch size facilitating in situ training at application runtime.

Making Inference-Engine's infer functions and train subroutines pure facilitates invoking those procedures inside Fortran do concurrent constructs, which some compilers can offload automatically to graphics processing units (GPUs). The use of contiguous arrays facilitates spatial locality in memory access patterns. User control of mini-batch size facilitates in situ training at application runtime.
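
As a minimal sketch of this pattern (the names network, inputs, and outputs are hypothetical stand-ins, not the library's documented API):

do concurrent(i = 1:size(inputs))
  ! a pure type-bound inference function may legally be referenced here,
  ! so a compiler is free to parallelize or offload the iterations
  outputs(i) = network%infer(inputs(i))
end do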

The available optimizers for training neural networks are

  1. Stochastic gradient descent
  2. Adam (recommended)

Build and Test

With the Fortran Package Manager (fpm) and a recent version of a Fortran compiler installed, enter one of the commands below to build the Inference-Engine library and run the test suite:

GNU (gfortran) 13 or higher required

fpm test --profile release

Intel (ifx)

fpm test --compiler ifx --profile release --flag -O3

Experimental: Automatic offloading of do concurrent to GPUs

This capability is under development with the goal to facilitate automatic GPU offloading via the following command:

fpm test --compiler ifx --profile release --flag "-fopenmp-target-do-concurrent -qopenmp -fopenmp-targets=spir64 -O3"

LLVM (flang-new)

Building with flang-new requires passing flags to enable the compiler's experimental support for assumed-rank entities:

fpm test --compiler flang-new --flag "-mmlir -allow-assumed-rank -O3"

A script that might help with building flang-new from source is in the handy-dandy repository.

NAG (nagfor)

fpm test --compiler nagfor --flag -fpp --profile release

HPE (crayftn.sh) -- under development

Support for the Cray Compiler Environment (CCE) Fortran compiler is under development. Building with the CCE ftn compiler wrapper requires an additional trivial wrapper shell script. For example, create a file crayftn.sh with the following contents and place this file's location in your PATH:

#!/bin/bash

ftn "$@"

Then execute

fpm test --compiler crayftn.sh

Examples

The example subdirectory contains demonstrations of several intended use cases.

Configuring a Training Run

To see the format of the JSON configuration file that defines the hyperparameters and a new network configuration for a training run, execute the provided print-training-configuration example program:

% ./build/run-fpm.sh run --example print-training-configuration
Project is up to date
 {
     "hyperparameters": {
         "mini-batches" : 10,
         "learning rate" : 1.50000000,
         "optimizer" : "adam"
     }
 ,
     "network configuration": {
         "skip connections" : false,
         "nodes per layer" : [2,72,2],
         "activation function" : "sigmoid"
     }
 }

As of this writing, the JSON file format is fragile. Because an Intel ifx compiler bug prevents using our preferred JSON interface, rojff, Inference-Engine currently uses a very restricted JSON subset written and read by the sourcery utility's string_t type-bound procedures. For this to work, keep input files as close as possible to the exact form shown above. In particular, do not split, combine, or reorder lines. Adding or removing whitespace should be OK.

Documentation

Please see the Inference-Engine GitHub Pages site for HTML documentation generated by ford.

inference-engine's People

Contributors

davytorres, everythingfunctional, federicavil, jordanwelsman, kareem-weaver, ktras, rouson


inference-engine's Issues

Feature: concurrent multi-inference

PR #7 contains a skeletal demonstration of concurrent inference using multiple networks.

To Do

  • Finish the space_delimited_strings_to_array() internal function.
  • Define an array of inputs for each network
  • Write a test

The new test could evaluate the XOR truth table concurrently, with each do concurrent iteration using an independent copy of the XOR neural network. This could be faster than the sequential truth-table evaluation in the existing tests.
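
A hypothetical sketch of such a test, assuming an array xor_net(4) holding independent copies of the trained network and an inputs array covering the four truth-table rows (illustrative names only):

do concurrent(i = 1:4)
  ! each iteration evaluates one truth-table row with its own network copy
  outputs(i) = xor_net(i)%infer(inputs(i))
end do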

Remove nagfor workaround

The src directory contains a workaround for a nagfor bug that the latest nagfor release should have fixed, so the workaround can be removed.

  • Remove the macro and test to make sure the build still works
  • Rename the .F90 file to .f90

Support LLVM Flang

Currently, the command

fpm test --compiler flang-new --flag "-mmlir -allow-assumed-rank"

yields the trailing output

[ 57%]        inference_engine_m_.f90  done.

error: Semantic errors in ././src/inference_engine/inference_engine_m_.f90
./././src/inference_engine/inference_engine_m_.f90:72:32: error: Result of pure function may not have polymorphic ALLOCATABLE ultimate component '%activation_strategy_'
        type(inference_engine_t) inference_engine
                                 ^^^^^^^^^^^^^^^^
./././src/inference_engine/inference_engine_m_.f90:30:50: Declaration of 'activation_strategy_'
      class(activation_strategy_t), allocatable :: activation_strategy_ ! Strategy Pattern facilitates elemental activation
                                                   ^^^^^^^^^^^^^^^^^^^^
./././src/inference_engine/inference_engine_m_.f90:104:24: error: Result of pure function may not have polymorphic ALLOCATABLE ultimate component '%activation_strategy_'
        type(exchange_t) exchange
                         ^^^^^^^^
./././src/inference_engine/inference_engine_m_.f90:52:50: Declaration of 'activation_strategy_'
      class(activation_strategy_t), allocatable :: activation_strategy_ ! Strategy Pattern facilitates elemental activation
                                                   ^^^^^^^^^^^^^^^^^^^^
<ERROR> Compilation failed for object " src_inference_engine_inference_engine_m_.f90.o "
<ERROR> stopping due to failed compilation
STOP 1
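
A minimal reproducer of the rejected pattern, distilled from the error messages above (module and type names are hypothetical):

module repro_m
  implicit none
  type strategy_t
  end type
  type engine_t
    class(strategy_t), allocatable :: strategy_  ! polymorphic allocatable ultimate component
  end type
contains
  pure function make_engine() result(engine)
    type(engine_t) engine  ! flang-new rejects this pure-function result
  end function
end module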

Develop alternative "infer" method(s)

Each of the three-line do concurrent/dot_product blocks in the infer type-bound procedure can be collapsed into a one-line invocation of matmul. It seems likely that most compilers would generate faster code with matmul by default, but it is best to compare the two approaches with multiple compilers on multiple platforms to determine whether matmul is always superior. Scenarios to consider:

  1. Using the compiler's default matmul implementation.
  2. Using a compiler option, if available, to switch to an optimized library version of matmul.
  3. Using do concurrent with various optimization flags (-O...) set.
  4. Using a compiler option, if available, to offload do concurrent to a GPU.
  5. Using a compiler option, if available, to offload matmul to a GPU.

Option 4 and possibly option 5 are available with the Intel ifx compiler as of the 2022.3 version of oneAPI released two weeks ago. Option 4 and possibly option 5 have also been available with the NVIDIA nvfortran compiler for about two years, but nvfortran has limited support for Fortran 2008 and extremely limited support for Fortran 2018. I believe our only 2008 features are do concurrent (which nvfortran supports), module function/module subroutine interface bodies, and submodule, which nvfortran might or might not support. Working around the latter two features would require a fair amount of code revision but would not be too painful.

Let's develop an alternative implementation of infer that does this and enable switching between the two with a C-preprocessor macro, something like

#ifdef DO_CONCURRENT_INFER
  module procedure infer
     ! (concurrent infer implementation)
  end procedure
#else
  module procedure infer
     ! (matmul implementation)
  end procedure
#endif
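
For reference, here is a sketch of the two candidate kernels with hypothetical names: a holds one layer's activations, w and b hold the next layer's weights and biases, and z receives the pre-activation result.

! three-line do concurrent/dot_product form
do concurrent(j = 1:size(b))
  z(j) = dot_product(w(j,:), a) + b(j)
end do

! equivalent one-line matmul form
z = matmul(w, a) + b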

Prep for opening the source

  • Add LICENSE.txt file with copyright notice and license agreement
  • Add statement referring to the license at the top of each source file
  • Add build instructions to the README.md
  • Add a basic ford project file
  • Set up the CI to post the ford documentation to GitHub Pages

Issue Compiling with ifx

Here's the error I get trying to compile with ifx:

layer_s.f90                            failed.
[ 57%] Compiling...
././src/inference_engine/layer_s.f90(22): error #6197: An assignment of different structure types is invalid.   [CONSTRUCT]
    layer%neuron = neuron_t(layer_lines, start+1)
-------------------^

Isolate and report NAG compiler bug

git checkout add-file-reader
fpm test --compiler nagfor --flag -fpp

...

NAG Fortran Compiler Release 7.1(Hanzomon) Build 7113
Questionable: ./src/inference_engine_s.f90, line 229: Variable C set but never referenced
Panic: ./src/inference_engine_s.f90: free_TBF_item: Invalid item?
Internal Error -- please report this bug
Abort
<ERROR> Compilation failed for object " src_inference_engine_s.f90.o "
<ERROR>stopping due to failed compilation
STOP 1

Potential performance optimizations in the train subroutine

  • Hoist the if (.not. allocated(...)) ... blocks into an initialization procedure and instead assert allocated status (see the sketch after this list)
  • Have caller provide a cost array that is of size num_mini_batches (make cost intent(inout))
  • Allocate pair_cost array of size maxval(size(mini_batches))
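
A sketch of the first item, using hypothetical names (trainable_engine_t and workspace are illustrative, not the library's actual identifiers):

subroutine init_workspace(self, n)
  ! allocate the training workspace once, before the first call to train
  class(trainable_engine_t), intent(inout) :: self
  integer, intent(in) :: n
  if (.not. allocated(self%workspace)) allocate(self%workspace(n))
end subroutine

! inside train, replace the allocation check with an assertion:
call assert(allocated(self%workspace), "train: workspace allocated")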

Suggestion: short description on repo page

We should add a short description to the repo. Something along the lines of:
"A Fortran 2018 <package/module> which enables inference using trained dense neural networks."

Train additional Thompson microphysics procedure proxies

The learn-microphysics-procedures.f90 example demonstrates how to train a neural network to model functions in the mp_thompson.f90 module. A possible next step is modeling the rest of the procedures in the same module. The full set of procedures is below.

Functions

  • REAL FUNCTION GAMMLN(XX)
  • REAL FUNCTION GAMMP(A,X)
  • REAL FUNCTION WGAMMA(y)
  • REAL FUNCTION RSLF(P,T)
  • REAL FUNCTION RSIF(P,T)

Subroutines

  • SUBROUTINE mp_gt_driver(
    - arguments:
    qv, qc, qr, qi, qs, qg, ni, nr, &
    th, pii, p, dz, dt_in, itimestep, &
    RAINNC, RAINNCV, &
    SNOWNC, SNOWNCV, &
    GRAUPELNC, GRAUPELNCV, SR, &
    ids,ide, jds,jde, kds,kde, &             ! domain dims
    ims,ime, jms,jme, kms,kme, &             ! memory dims
    its,ite, jts,jte, kts,kte               ! tile dims
  • subroutine mp_thompson
    - arguments:
     qv1d, qc1d, qi1d, qr1d, qs1d, qg1d, ni1d, &
     nr1d, t1d, p1d, dz1d, &
     pptrain, pptsnow, pptgraul, pptice, &
     kts, kte, dt, i, j
  • subroutine qr_acr_qg
    - arguments: none
    - reads/writes: qr_acr_qg_mpt.dat
    - module variables accessed: ____
  • subroutine qr_acr_qs
    - arguments: none
    - reads/writes: qr_acr_qs_mpt.dat
    - module variables accessed: ____
  • subroutine freezeH2O
    - arguments: none
    - reads/writes: freezeH2O_mpt.dat
    - module variables accessed: ____
  • subroutine qi_aut_qs
    - arguments: none
    - no reads or writes
    - module variables accessed: ___
  • subroutine table_Efrw
    - arguments: none
    - reads/writes: none
    - module variables accessed: ___
  • subroutine table_Efsw
    - arguments: none
    - reads/writes: none
    - module variables accessed: ___
  • subroutine table_dropEvap
    - arguments: none
    - reads or writes: none
    - module variables accessed: ___
  • SUBROUTINE GCF
    - arguments: GAMMCF, A, X , GLN
    - reads or writes: none
    - module variables accessed: ___
  • SUBROUTINE GSER
    - arguments: GAMSER, A, X, GLN
    - reads or writes: none
    - module variables accessed: ___

Segmentation fault when running examples on main branch

For some reason, when I clone the main branch, I get a segmentation fault when I run the examples (e.g., learn-addition) using gfortran version 14.1.0:

fpm run --example learn-addition --profile release --flag "-fopenmp" -- --output-file "test-addition-output"
