
3dem / relion

425 stars · 52 watchers · 194 forks · 59.93 MB

Image-processing software for cryo-electron microscopy

Home Page: https://relion.readthedocs.io/en/latest/

License: GNU General Public License v2.0

CMake 0.55% C 2.36% C++ 83.85% Shell 0.04% Cuda 12.58% Python 0.56% Makefile 0.01% HTML 0.07%
microscopy cryo-em hpc cuda image-processing biomolecular-dynamics molecular-biology molecular-structures atomistic-models ctf high-performance-computing graphics-programming gui microscope optimization-algorithms maximum-likelihood likelihood relion regularization nvidia

relion's Introduction

RELION 4.0.1

RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy. It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology.

The underlying theory of MAP refinement is given in a scientific publication. If RELION is useful in your work, please cite this paper.

More comprehensive documentation of RELION is available here.

Installation

More extensive options and configurations are described here, but for typical use, cloning and installing RELION is straightforward with cmake.

On Debian or Ubuntu machines, installing cmake, the compiler, and the additional dependencies (MPI, FFTW, etc.) is as easy as:

sudo apt install cmake git build-essential mpi-default-bin mpi-default-dev libfftw3-dev libtiff-dev libpng-dev ghostscript libxft-dev

On RedHat-like systems (CentOS, RHEL, Scientific Linux, etc.), which use the yum package manager:

sudo yum install cmake git gcc gcc-c++ openmpi-devel fftw-devel libtiff-devel libpng-devel ghostscript libXft-devel libX11-devel

Once git and cmake are installed, RELION can be built as follows:

git clone https://github.com/3dem/relion.git
cd relion
git checkout master # or ver4.0; see below
mkdir build
cd build
cmake ..
make

By performing git checkout ver4.0 instead of git checkout master, you can access the latest (developmental) updates for RELION 4.0.x. The code there is not tested as thoroughly as that in the master branch and is not generally recommended.

The binaries will be produced in the build/bin directory. If you want to install the binaries somewhere else, run cmake with -DCMAKE_INSTALL_PREFIX=/where/to/install/ and perform make install as the final step. Do not specify the build directory itself as CMAKE_INSTALL_PREFIX! This will not work.
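
For example, a minimal sketch of such an install (the prefix path below is purely illustrative):

cd relion/build
cmake -DCMAKE_INSTALL_PREFIX=/opt/relion ..   # hypothetical prefix; pick any directory you can write to
make -j 4
make install   # copies the binaries into /opt/relion/bin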

Also note that the MPI library used for compilation must be the one you intend to run RELION with. Compiling RELION with one version of MPI and running the resulting binary with mpirun from another version can cause crashes. See our wiki below for details.
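
As a quick sanity check (a sketch only; exact paths depend on your system), confirm that the compile-time and run-time MPI come from the same installation:

which mpicc mpirun   # both should point into the same MPI installation
mpirun --version     # compare with the MPI version that was used when compiling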

In any case, make sure your PATH environment variable points to the directory containing the relion binaries. Launching RELION as /path/to/relion is NOT the right way; this starts the correct GUI, but the GUI might invoke other versions of RELION found in the PATH.
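
For example, assuming the binaries were installed into /where/to/install/bin (the placeholder path from above), you could add something like the following to your shell startup file and then launch the GUI simply as relion:

export PATH=/where/to/install/bin:$PATH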

If FLTK-related errors are reported, add -DFORCE_OWN_FLTK=ON to the cmake command. For FFTW-related errors, try -DFORCE_OWN_FFTW=ON.
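
A sketch of a clean re-configuration with both fallback options enabled (clearing the build directory first so stale CMake cache entries do not persist):

cd build
rm -rf *
cmake -DFORCE_OWN_FLTK=ON -DFORCE_OWN_FFTW=ON ..
make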

RELION also requires libtiff. Most Linux distributions provide packages such as libtiff-dev or libtiff-devel. Note that you need the developer package, and that version 4.0.x is required to read BigTIFF files. If you installed libtiff in a non-standard location, specify it with -DTIFF_INCLUDE_DIR=/path/to/include -DTIFF_LIBRARY=/path/to/libtiff.so.5.
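
For instance, if libtiff were installed under a custom prefix such as /opt/libtiff (a hypothetical location), the cmake invocation might look like:

cmake -DTIFF_INCLUDE_DIR=/opt/libtiff/include -DTIFF_LIBRARY=/opt/libtiff/lib/libtiff.so.5 ..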

Updating

RELION is updated intermittently with both minor and major features. To update an existing installation, simply run:

cd relion
git pull
cd build
make
make install # Only when you have specified CMAKE_INSTALL_PREFIX in the cmake step

If something goes wrong, remove the build directory and try again from the cmake step.

Class Ranker

The default model for the class ranker has been trained and tested with Python 3.9.12, PyTorch 1.10.0 and NumPy 1.20.0. If you wish to retrain the class ranker model with your own data, please refer to this repo.
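
As a rough sketch (assuming a conda-based setup; the environment name is arbitrary), a matching Python environment could be created with:

conda create -n class_ranker python=3.9
conda activate class_ranker
pip install torch==1.10.0 numpy==1.20.0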

relion's People

Contributors

andschenk, arom4github, aromnvidia, bbockelm, bforsbe, biochem-fan, charlescongdon, colinpalmer, dkimanius, do-jason, dtegunov, huwjenkins, joton, js947, kiyotune, kiyotune-ipr, lqhuang, martin-g, martinsalinas98, mattiadanza, mcianfrocco, pmargara, scheres, smsaladi, tjragan


relion's Issues

Error in 2D classification using GPU not CPU

Originally reported by: Marcus Fislage (Bitbucket: mfislage, GitHub: mfislage)


I was able to successfully import a micrograph STAR file with CTF information, and autopick star and box files. I then was able to extract both unbinned and binned particles. During 2D classification with either set, however, when GPU is enabled I receive the following error after "Estimating initial noise spectra" is completed:
"Projector::get2DSlice%%ERROR: Dimension of the data array should be 2 or 3
File: Relion2/install/directory/src/projectorh.h line:224"
With CPU only, the job runs fine.
The graphics card is a GTX 970, CUDA is 7.5, and the NVIDIA driver is 352.39.


Symbolic links confuse Import particle coordinates function

Originally reported by: Matt Iadanza (Bitbucket: attamatti, GitHub: attamatti)


I made a symbolic link to an old micrographs folder that contains my micrographs and particle coordinate files.

When I imported the coordinate stars rather than putting the files in:
Import/job001/micrographs/

it put them in:
Import/job001/path/to/symbolic/link/micrographsmicrographs/

although the input coordinates file still lists them as being in Import/job001/micrographs/

I don't imagine people would normally be doing this for coordinate files, but we do routinely use symbolic links to avoid making multiple copies of our micrographs.


librelion_lib.so error

Originally reported by: Matt Iadanza (Bitbucket: attamatti, GitHub: attamatti)


Everything appeared to go as expected with the install but I'm getting the following error when trying to run:

relion2-beta/build/lib/librelion_lib.so: undefined symbol: _ZN6Fl_BoxC2EiiiiPKc

cmake appeared to run without errors
make ran with only this warning:

configure: WARNING: Ignoring libraries " -lSM -lICE" requested by configure.

system info:
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.7 (Final)
Release: 6.7
Codename: Final


Compilation issues under Centos 7

Originally reported by: AndrewPurkiss (Bitbucket: AndrewPurkiss, GitHub: Unknown)


I'm attempting to install, running under CentOS 7. So far, I have come across a couple of issues.

  1. I've installed cmake 2.8.12 from source, as CentOS 7 only has 2.8.11. I also tried the following with cmake3.
  2. The MPI compilers are not found with cmake, as below:

$ cmake ../
-- BUILD TYPE set to the default type: 'Release'
-- Setting fallback CUDA_ARCH=35
-- Setting cpu precision to double
-- Setting gpu precision to single
-- Using cuda wrapper to compile....
-- Cuda version is >= 7.5 and single-precision build, enable double usage warning.
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find MPI_C (missing: MPI_C_LIBRARIES MPI_C_INCLUDE_PATH)
Call Stack (most recent call first):
/usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake/Modules/FindMPI.cmake:587 (find_package_handle_standard_args)
CMakeLists.txt:168 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/purkis01/2016/Software/Relion2/relion2-beta/build/CMakeFiles/CMakeOutput.log".

I have tried setting the environment variables
MPI_C_COMPILER /usr/lib64/openmpi/bin/mpicc
MPI_CXX_COMPILER /usr/lib64/openmpi/bin/mpicxx

but the CMakeLists.txt file seems to ignore these and I get
//Path to a program.
MPI_C_COMPILER:FILEPATH=MPI_C_COMPILER-NOTFOUND

in the CMakeCache.txt file

When I enter the mpicc and mpicxx full paths into the CMakeCache.txt file, then
cmake ../
completes fine.

  3. Running make -j 8 then fails with a linking error with librelion_gpu_util.so:

$ make -j 8
[ 1%] [ 2%] [ 3%] [ 5%] [ 6%] [ 6%] [ 7%] Scanning dependencies of target copy_scripts
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/cuda_kernels/./relion_gpu_util_generated_helper.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_autopicker.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_benchmark_utils.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_backprojector.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_helper_functions.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_projector.cu.o
Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_ml_optimiser.cu.o
[ 7%] Built target copy_scripts
[ 8%] Building NVCC (Device) object src/apps/CMakeFiles/relion_gpu_util.dir/__/gpu_utils/./relion_gpu_util_generated_cuda_projector_plan.cu.o
Scanning dependencies of target relion_gpu_util
Linking CXX shared library ../../lib/librelion_gpu_util.so
Error running link command: No such file or directory
make[2]: *** [lib/librelion_gpu_util.so] Error 2
make[1]: *** [src/apps/CMakeFiles/relion_gpu_util.dir/all] Error 2
make: *** [all] Error 2


continue run crash

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


If you forget giving an optimiser starfile when continuing a 3D classification job, relion crashes with the following error in the terminal:
Warning: invalid optimiser.star filename provided for continuation run:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted (core dumped)

An error popup in RELION would be better.


Helical autopicking failure

Originally reported by: yjin004 (Bitbucket: yjin004, GitHub: Unknown)


I manually picked 10 micrographs and got approximately 120 filaments. I used particle extraction, ending up with ~1200 particles.
I ran 2D classification and tried to use the classes.mrcs from iteration 25 as the autopicking template.

The RELION run crashes without giving a reason.

*** The command is:
which relion_autopick_mpi --i CtfFind/job003/micrographs_ctf.star --ref Class2D/job007/run_it025_classes.mrcs --odir AutoPick/job012/ --pickname autopick --invert --ctf --ang 5 --shrink 1 --lowpass 20 --angpix 1.31 --angpix_ref 1.31 --threshold 0.4 --min_distance 150 --max_stddev_noise 1.1 --helix --helical_tube_outer_diameter 96 --helical_tube_kappa_max 0.1 --helical_tube_length_min -1
echo CtfFind/job003/micrographs_ctf.star > AutoPick/job012/coords_suffix_autopick.star

the run.err

[localhost.localdomain:04415] 3 more processes have sent help message help-orte-odls-base.txt / orte-odls-base:could-not-kill
[localhost.localdomain:04415] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


ctffind crash

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


Not sure if this is a RELION issue or how often it would happen.
I ran ctffind4 through relion on the betagal dataset. One ctffind run crashed.

relion command is:
which relion_run_ctffind_mpi --i MotionCorr/movie_correction/corrected_micrographs.star --o CtfFind/job006/ --CS 2 --HT 300 --AmpCnst 0.1 --XMAG 10000 --DStep 3.54 --Box 512 --ResMin 30 --ResMax 7.1 --dFMin 5000 --dFMax 50000 --FStep 500 --dAst 100 --ctffind_exe "/usr/bin/ctffind4 --omp-num-threads 1 --old-school-input" --ctfWin -1 --only_do_unfinished

run standard error is:
WARNING: there was an error in executing: csh CtfFind/job006/Micrographs/Falcon_2012_06_12-15_53_09_0_ctffind3.com
WARNING: cannot find line with Final values in CtfFind/job006/Micrographs/Falcon_2012_06_12-15_53_09_0_ctffind3.log
WARNING: skipping, since cannot get CTF values for MotionCorr/job004/Micrographs/Falcon_2012_06_12-15_53_09_0.mrc

this faulty process logfile is:
** Welcome to CTFFind **
Version - 4.0.16

               Input Mode: Batch
              Date & Time: 2016-07-01 13:40:00

Copyright 2015 Howard Hughes Medical Institute. All rights reserved.
Use is subject to Janelia Farm Research Campus Software Copyright 1.1
license terms ( http://license.janelia.org/license/jfrc_copyright_1_1.html )

CS[mm], HT[kV], AmpCnst, XMAG, DStep[um]
2.0 300.0 0.10 10000.0 3.540

Total execution time : 0 seconds

2016-07-01 13:40:00: Fatal error (FileCopyRaw): Source file does not exist: .CTFFind_ikIiBXMT8XU7pCXT

this process com file is:
#!/usr/bin/env csh
/usr/bin/ctffind4 --omp-num-threads 1 --old-school-input > CtfFind/job006/Micrographs/Falcon_2012_06_12-15_53_09_0_ctffind3.log << EOF
CtfFind/job006/Micrographs/Falcon_2012_06_12-15_53_09_0.mrc
CtfFind/job006/Micrographs/Falcon_2012_06_12-15_53_09_0.ctf
2, 300, 0.1, 10000, 3.54
512, 30, 7.1, 5000, 50000, 500, 100
EOF

I grepped for the weird file name from the logfile (.CTFFind_ikIiBXMT8XU7pCXT) in all files in the directory. Three log files in addition to the one above contained that name, but only had a warning and no fatal error. Here is one of them:

              **  Welcome to CTFFind  **
                   Version - 4.0.16

               Input Mode: Batch
              Date & Time: 2016-07-01 13:40:00

Copyright 2015 Howard Hughes Medical Institute. All rights reserved.
Use is subject to Janelia Farm Research Campus Software Copyright 1.1
license terms ( http://license.janelia.org/license/jfrc_copyright_1_1.html )

CS[mm], HT[kV], AmpCnst, XMAG, DStep[um]
2.0 300.0 0.10 10000.0 3.540

**warning(FileDelete): attempt to delete file which does not exist: .CTFFind_ikIiBXMT8XU7pCXT

Summary information for file CtfFind/job006/Micrographs/Falcon_2012_06_12-15_56_10_0.mrc
Number of columns, rows, sections: 1950 1950 1
MRC data mode: 2
Bit depth: 32
Pixel size: .000 .000 .000

Working on micrograph 1 of 1

SEARCHING CTF PARAMETERS...

100% [==============================] done!

  DFMID1      DFMID2      ANGAST          CC
30000.00    30000.00      -15.00    -0.01708  

REFINING CTF PARAMETERS...
DFMID1 DFMID2 ANGAST CC
30212.82 30000.00 -21.18 -0.01676 Final Values

Estimated defocus values : 30212.82 , 30000.00 Angstroms
Estimated azimuth of astigmatism: -21.18 degrees
Score : -.01676
Thon rings with good fit up to : 19.9 Angstroms

Summary of results : CtfFind/job006/Micrographs/Falcon_2012_06_12-15_56_10_0.txt
Diagnostic images : CtfFind/job006/Micrographs/Falcon_2012_06_12-15_56_10_0.ctf
Detailled results, including 1D fit profiles: CtfFind/job006/Micrographs/Falcon_2012_06_12-15_56_10_0_avrot.txt
Use this command to plot 1D fit profiles : ctffind_plot_results.sh CtfFind/job006/Micrographs/Falcon_2012_06_12-15_56_10_0_avrot.txt

Total execution time : 4 minutes and 7 seconds
2016-07-01 13:44:08 : CTFFind finished cleanly.

Cheers
Ben


Refinement crashes with min_diff2=nan on GPU, not CPU

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


I'm executing

#!bash
mpirun -n 3 `which relion_refine_mpi` --o test/run1 --auto_refine --split_random_halves --i particles.star --ref ref.mrc --ini_high 5 --dont_combine_weights_via_disc --preread_images --pool 1  --ctf --ctf_corrected_ref --particle_diameter 190 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 5 --auto_local_healpix_order 5 --offset_range 1 --offset_step 0.5 --sym D7 --low_resol_join_halves 40 --norm --scale --j 4 --gpu

on this sample data set. It crashes saying

#!bash
ipart= 0 adaptive_fraction= 0.999
 min_diff2= nan
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0

If I run with --firstiter_cc, it crashes on the second iteration, otherwise on the first. If I run without --gpu, everything is fine.


Relion MPI setup of GPUs in a multi node GPU cluster is inefficient

Originally reported by: Bharat Reddy (Bitbucket: barureddy, GitHub: barureddy)


We have a few gpu nodes on a cluster where they range from 2-4 GPUs/node. On a single 2 GPU node it is easy to request 3 mpi processes for 3D classification and have it allocate a master process to a CPU and then a slave process to each GPU. This is not the case when I experimented with two 2 GPU nodes. In order to test this, I tried two different options where I request 2 full nodes and either followed your recommendations and ran 5 MPI processes or tried something different and ran 6 mpi process. With the 5 MPI process option, I get the first node running the master process and then a GPU running two mpi processes with the other GPU idle. The second node has 2 mpi processes running on one GPU and the other GPU running idle. With the 6 MPI process option, I get the first node running the master process and then a GPU running 1 mpi processes with the other GPU idle. The second node has 1 mpi process running on one GPU while the other has 3 mpi processes.

Could it be I am not requesting the right number of mpi processes? How do you deal with requesting an odd number of mpi processes on multiple nodes?

I have not tried to manually assign the GPUs yet.

Attached are the log files for the 5 and 6 mpi jobs. I did not let the jobs run to completion and stopped them after several iterations. Also attached is the sbatch script I used to submit my jobs to the node. The only thing I change between the 5 and 6 mpi jobs is the numerical value of X in mpirun -n X ... .
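
One possible workaround, sketched here only as an illustration, is to pin devices explicitly with the colon-separated --gpu syntax (one group per slave rank) instead of relying on automatic mapping; the exact mpirun and scheduler options depend on the site:

mpirun -n 5 `which relion_refine_mpi` ... --gpu 0:1:0:1   # slave ranks 1-4 pinned to devices 0,1,0,1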


Submit particle subtraction to queue causes segmentation fault

Originally reported by: Callum Smits (Bitbucket: c_smits, GitHub: Unknown)


Hi,

When I try to submit a particle subtraction job to queue RELION crashes with a segmentation fault (login node process limit is 2gb, hence trying to submit to queue). The same job run on an interactive qsub session works fine.

Relevant info:
Cent OS 6.7, Intel compilers 15.0.3.187, cuda 7.5, openmpi 1.8.7, fftw 3.3.4, forced own FLTK


ERROR: Adding an empty nodename

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


I forgot to give an input micrograph starfile when running CTFFIND4. RELION crashes when I click Run now. The error in the terminal (see below) is not very helpful; RELION should detect that an input starfile is missing. I got the error both when trying to run a single process and when running several MPI processes on my local workstation. When I then gave an input starfile, it ran fine.

PipeLine::addNode ERROR: Adding an empty nodename. Did you fill in all Node names correctly?
File: /usr/local/relion2-beta/src/pipeliner.cpp line: 30

The command is :
which relion_run_ctffind_mpi --i --o CtfFind/job007/ --CS 2 --HT 300 --AmpCnst 0.1 --XMAG 10000 --DStep 3.54 --Box 512 --ResMin 30 --ResMax 7.1 --dFMin 5000 --dFMax 50000 --FStep 500 --dAst 100 --ctffind_exe "/usr/bin/ctffind4 --omp-num-threads 1 --old-school-input" --ctfWin -1


kill job

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


Would adding a "kill job" button make sense? Of course one can do it from the terminal but maybe doing it within the gui would allow to associate some tasks with it, like delete temp files for example.


Segfault in relion_refine_mpi with --firstiter_cc and --gpu

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


I hope it's an actual bug this time ;-)

I'm running 3D refinement using

#!bash
mpirun -n 3 `which relion_refine_mpi` --o RefineInitial/run1 --auto_refine --split_random_halves --i particles.star --ref emd_2984_280.mrc --firstiter_cc --ini_high 30 --dont_combine_weights_via_disc --pool 3 --ctf --ctf_corrected_ref --particle_diameter 200 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 10 --offset_step 2 --sym D2 --low_resol_join_halves 40 --norm --scale  --j 1 --gpu

(template created in GUI, names modified and launched in terminal), and it crashes saying

#!bash
KERNEL_ERROR: invalid argument in /home/dtegunov/Desktop/relion2beta/src/gpu_utils/cuda_helper_functions.cu at line 598 (error-code 11)
[dtegunov:09959] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10)[0x7f72177d9d10]
[dtegunov:09959] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(raise+0x29)[0x7f72177d9bd9]
[dtegunov:09959] [ 2] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_Z20runDiff2KernelCoarseR19CudaProjectorKernelPfS1_S1_S1_S1_S1_S1_R21OptimisationParamtersP11MlOptimisermiiiiiP11CUstream_stb+0x9ce)[0x7f7216a4e5ae]
[dtegunov:09959] [ 3] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_Z30getAllSquaredDifferencesCoarsejR21OptimisationParamtersR18SamplingParametersP11MlOptimiserP15MlOptimiserCudaR13CudaGlobalPtrIfLb1EE+0x13d0)[0x7f7216a583e0]
[dtegunov:09959] [ 4] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_gpu_util.so(_ZN15MlOptimiserCuda32doThreadExpectationSomeParticlesEi+0x2ea9)[0x7f7216a68779]
[dtegunov:09959] [ 5] /home/dtegunov/Desktop/relion2beta/build/lib/librelion_lib.so(_Z11_threadMainPv+0x1d)[0x7f721867639d]
[dtegunov:09959] [ 6] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa)[0x7f72177d06aa]
[dtegunov:09959] [ 7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f7217505e9d]

It doesn't crash on the GPU if I remove --firstiter_cc, and the CPU version runs fine with --firstiter_cc. Not sure if I can provide my test data due to its size, but maybe there are some debug flags I can set that will give you more information to work with?


Trouble with installing on a PC

Originally reported by: yjin004 (Bitbucket: yjin004, GitHub: Unknown)


I am having trouble installing RELION 2.0 on a PC. I have installed RELION 2.0 on a workstation without trouble.

It always shows an error at this step:

[ 52%] Linking CXX shared library ../../lib/librelion_lib.so
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3.a: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [lib/librelion_lib.so] Error 1
make[1]: *** [src/apps/CMakeFiles/relion_lib.dir/all] Error 2
make: *** [all] Error 2


Using non-default compilers

Originally reported by: Jon Diprose (Bitbucket: well-jon, GitHub: Unknown)


cmake appears to determinedly ignore both the PATH and CC environment variables when deciding what compiler to use. It might be useful to others if you could add a section to your installation documentation that suggests using something like:

cmake -DCMAKE_C_COMPILER=`which gcc` -DCMAKE_CXX_COMPILER=`which g++` ...

to get the behaviour that I was naively expecting.


Non-CUDA build failure

Originally reported by: Jon Diprose (Bitbucket: well-jon, GitHub: Unknown)


I think this is similar to issue #3 except for a non-CUDA build. Firstly, src/ml_model.cpp won't build without CUDA - see the attached rescomp-make.log and relion-2.0b-ml_model-no-cuda.patch that makes the problem go away by hiding the cuda_skunks.cuh-requiring code in #ifdef CUDA blocks.
However, it then fails to link because it fails to find -lrelion_gpu_util - see the attached rescomp-make-2.log. Suggested fix is relion-2.0b-CMakeLists-no-cuda.patch, which checks CUDA_FOUND to see if relion_gpu_util should be on the list of target_link_libraries.
After that it needs re-cmake-ing, so a complete rebuild - see rescomp-configure-3.log and rescomp-make-3.log - and builds for me! Hurrah.


Build failure with -DFORCE_OWN_FLTK=ON

Originally reported by: Jon Diprose (Bitbucket: well-jon, GitHub: Unknown)


Scanning dependencies of target FLTK
[ 1%] Creating directories for 'FLTK'
[ 2%] Performing download step (download, verify and extract) for 'FLTK'
-- downloading...
src='https://drive.google.com/uc?export=download&id=0B942d76zVnSeazZWcExRaXIyVDg'
dst='/apps/well/relion/2.0b-20160622-gcc4.9.3/external/fltk/fltk-1.3.3-source.tar.gz'
timeout='none'
CMake Error at /apps/well/relion/2.0b-20160622-gcc4.9.3/build/FLTK-prefix/src/FLTK-stamp/download-FLTK.cmake:9 (file):
file DOWNLOAD HASH mismatch

for file: [/apps/well/relion/2.0b-20160622-gcc4.9.3/external/fltk/fltk-1.3.3-source.tar.gz]
  expected hash: [9ccdb0d19dc104b87179bd9fd10822e3]
    actual hash: [d41d8cd98f00b204e9800998ecf8427e]

make[2]: *** [FLTK-prefix/src/FLTK-stamp/FLTK-download] Error 1
make[1]: *** [CMakeFiles/FLTK.dir/all] Error 2
make: *** [all] Error 2


Automatic switch from GPU to CPU when out of memory

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


Having just crashed during a final refinement iteration due to insufficient GPU memory (on a 980, not Titan X), I'm wondering if implementing an automatic switch could save some users from the trouble of continuing from the last successful iteration with --gpu off. Since optimiser.star doesn't have rlnHasConverged = 1 yet, doing so adds one unnecessary refinement iteration before the final iteration is run, taking quite a bit of time on a CPU.


CUDA 8.0 RC "thrust" bug for GTX 1070/1080 users

Originally reported by: Lu Gan (Bitbucket: lu_gan, GitHub: Unknown)


Dear RELION 2 beta testers,

This issue might only apply to those who built a system similar to ours:
Ubuntu 14.04.4 LTS
2x GTX 1070
CUDA 8.0 release candidate

You might get an error that looks like this during the make step:

error: no default constructor exists for class "thrust::detail::execute_with_allocator<AllocatorThrustWrapper, thrust::system::cuda::detail::execute_on_stream_base>"

It turns out there's a problem with the thrust library of the CUDA 8 release candidate, which surfaced a few days ago:

NVIDIA/thrust#800

You can download the patched thrust files here:

https://github.com/thrust/thrust

After doing this library surgery, I was able to build RELION 2 normally and run 3D classification & auto-refine successfully.

Cheers,
Lu


--pool automatically set to 1 in combination with --preread_images

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


No matter what I set in the GUI, when I press 'print command', it sets --pool 1 if --preread_images is set. I understand that --pool is meant to reduce the number of disk accesses, and that aspect becomes irrelevant when all images are already in memory. However, doesn't it also provide better load balancing within each MPI process? I'm getting better performance if I set it to 30, despite --preread_images. Maybe the default shouldn't be 1?


lock or crash in 2D classification

Originally reported by: craigyk (Bitbucket: craigyk, GitHub: craigyk)


I was running a larger 2D classification job (170k particles) over 4 nodes (each node has a K40) and it looks like it hit some errors 8 iterations in. I've attached the files it wrote out, and the output to stderr is below:

In thread 0
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0
File: /eppec/storage/sw/relion/2.0-beta/src/gpu_utils/cuda_ml_optimiser.cu line: 1552

In thread 0

exp_fn_img= 000001@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000002@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000003@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000004@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000005@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000006@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000007@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000008@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000009@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs
000010@Extract/job018/frames/15jul18a_b_00007gr_00008sq_v01_00002hl_00002en.frames_b_a.mrcs

ipart= 0 adaptive_fraction= 0.999
min_diff2= 3124.7
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0
File: /eppec/storage/sw/relion/2.0-beta/src/gpu_utils/cuda_ml_optimiser.cu line: 1552

In thread 0


Unnecessary warning using motioncorr via relion

Originally reported by: Marcus Fislage (Bitbucket: mfislage, GitHub: mfislage)


When I run MOTIONCORR via RELION, I use, under 'Other motioncorr arguments': -hgr 0 -pbx 150 -fod 10 -fgr /catalina.F30/Image/16may08_ref_a/rawdata/16jun13a_13105011_07_7676x7420_norm_1.mrc

This results in a warning
WARNING: Option -hgr 0 -pbx 150 -fod 10 -fgr /catalina.F30/Image/16may08_ref_a/rawdata/16jun13a_13105011_07_7676x7420_norm_1.mrc is not a valid RELION argument

However, the output seems fine and the logfile of each micrograph shows that MOTIONCORR used my additional arguments.


CTFFIND4 compatibility

Originally reported by: craigyk (Bitbucket: craigyk, GitHub: craigyk)


Hi,

I don't know if this is a CTFFIND4 version-compatibility thing, but I've run into some problems with run_ctffind on our version of CTFFIND4, 4.0.16.

I've fixed these problems and could submit a pull request. It essentially boiled down to using bash instead of csh in the wrapper, and tweaking the delimiter and word counts used for parsing the output file.


Relion_star_scripts incorrect permissions

Originally reported by: AndrewPurkiss (Bitbucket: AndrewPurkiss, GitHub: Unknown)


Dear All,

Just a minor one, reported by a user.

The following scripts, in the bin directory, only get -rw-r--r-- permissions set by the installation:

relion_star_datablock_ctfdat
relion_star_datablock_singlefiles
relion_star_datablock_stack
relion_star_loopheader
relion_star_plottable
relion_star_printtable

as well as
relion_qsub.csh

I would have expected them to have execute permission as well, especially as some of them call others in the list.

Thanks,

Andrew
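
A likely workaround, sketched under the assumption that the scripts live in the installed bin directory (the path is illustrative), is to add execute permission by hand:

chmod +x /path/to/install/bin/relion_star_* /path/to/install/bin/relion_qsub.csh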


Enable avx in local fftw build

Originally reported by: Jon Diprose (Bitbucket: well-jon, GitHub: Unknown)


The build framework currently doesn't enable avx for the local build of fftw. This is done via fftw's configure's '--enable-avx' argument. Whilst I'm sure cmake can be told to figure it out for itself, the attached patch adds a cmake variable FFTW_ENABLE_AVX and uses it to add '--enable-avx' to the configure arguments when set to ON. To use:

cmake -DFORCE_OWN_FFTW=ON -DFFTW_ENABLE_AVX=ON ...

I use the analyze_x86.pl script to check what instructions have been generated. Before the change, libfftw3.so.3.4.4 contained no avx instructions and afterwards I see 55657.


GPU Relion hangs - 2D classification on 380K particles.

Originally reported by: Harry Kao (Bitbucket: yk2385, GitHub: Unknown)


Hi,

This could be the same as or related to issue #24. Our hardware: 512 GB system RAM, 4 x Titan X (12 GB RAM each).

Our RELION 2D classification ran fine for about 8 iterations, then started to hang at the beginning of a new iteration. A continuation run would work for one or two more iterations, then hang again. If 'continue run' works, does this mean the data in memory differs from the data read at the beginning of each run?

Here is part of the screen error output:

exp_fn_img= 000005@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000006@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000007@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000008@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs

ipart= 0 adaptive_fraction= 0.999
min_diff2= 7772.56

exp_fn_img= 000005@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000006@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000007@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000008@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs

ipart= 0 adaptive_fraction= 0.999
min_diff2= 7879.3

exp_fn_img= 000005@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000006@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000007@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs
000008@Particles/mics/June13/Jun13_15.46.01.tif_dosefilter_noise_june13_autopick.mrcs

ipart= 0 adaptive_fraction= 0.999
min_diff2= 7766.62
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0
File: /software/relion-2.0b/src/gpu_utils/cuda_ml_optimiser.cu line: 1552

In thread 0
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0
File: /software/relion-2.0b/src/gpu_utils/cuda_ml_optimiser.cu line: 1552

In thread 2
Dumped data: error_dump_pdf_orientation, error_dump_pdf_orientation and error_dump_unsorted.
filteredSize == 0
File: /software/relion-2.0b/src/gpu_utils/cuda_ml_optimiser.cu line: 1552

Here is the log output when Relion hangs:
Expectation iteration 17 of 50
000/??? sec ~~(,_,"> [oo]Abort is in progress...hit ctrl-c again within 5 seconds to forcibly terminate


opening logfile.pdf

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


This is probably more an issue of how my system is configured (default application for pdf viewing, or I don't have the right pdf viewer installed). I cannot open logfile.pdf after polishing through the GUI. The following error gets displayed in the terminal:
sh: 1: ������: not found

What is the command that relion sends to open the pdf file?


Ignoring non-existent motion-corrected images in gCTF

Originally reported by: Mazbit (Bitbucket: Mazbit, GitHub: Unknown)


Hi guys,

I've come across a tiny issue. My test data deliberately contains awful movies of broken ice, for which MOTIONCORR fails without producing a corrected image --> good!

Nevertheless, corrected_micrographs.star includes the non-existent images and produces an error with gCTF. Perhaps corrected_micrographs.star could be generated based on the actual MOTIONCORR outputs.

Thanks
Mazdak


GPU job from RELION not accepted on older card (sm 3.5)

Originally reported by: AndreHeuer (Bitbucket: Xenoprime, GitHub: Xenoprime)


We have an older graphics card in a workstation which should still be able to do a good job for non-titan data-sets.

Card:

  • Name / compute version / memory
  • GeForce GTX 780 Ti / 3.5 / 3020

Despite no problems during the build, we were unable to run any GPU jobs from RELION.

Problem:

  • Job is submitted but hangs after GPU initialization
  • 2Dclassification job does not crash but just "hangs" after "Estimating initial noise spectra"
  • 2Dclassification job does not continue into "Estimating accuracies in the orientational assignment ... " (2D classif)

Note:

  • other GPU calculations (e.g. GCTF) work on this machine/setup (also when submitted from inside RELION)
  • we are able to run RELION-2-beta using GPUs on other cards with compute version 5.2

We tried to corner the problem - with no positive result:

  • enforced "cmake -DCUDA_ARCH=35"
  • small amount of particles with small box (1000 particles 140 pix)
  • classical CPU 2D-classification works (so data is fine)
  • "waiting" ... job not received by GPU after 12h+
  • submitting to 2 or 1 of the present GPUs did not make any difference
  • nvidia-smi does not show any RELION process running or changes in memory usage

Question:

  • have there been other issues with older cards?
  • is there something we can do to find the problem?

Suggestion:

  • It would be nice to make RELION detect when a job is not accepted/starting on the GPU and raise an error / abort the job.

More detailed description of where the GPU job hangs / system:

  • === RELION MPI setup ===
    • Number of MPI processes = 2
    • Master (0) runs on host = cryosun
    • Slave 1 runs on host = cryosun
  • =================
  • Running CPU instructions in double precision.
    • WARNING: Changing psi sampling rate (before oversampling) to 5.625 degrees, for more efficient GPU calculations
  • Estimating initial noise spectra
  • 000/??? sec ~~(,_,"> [oo]
  • 1/ 1 sec ...........~~(,_,">
  • uniqueHost cryosun has 1 ranks.
  • Using explicit indexing on slave 0 to assign devices 0
  • Thread 0 on slave 1 mapped to device 0
  • __ no further error or feedback from here on __

Top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13901 heuer 20 0 327m 22m 7856 R 99.9 0.0 0:41.34 relion_refine_mpi --o Class2D/j..
13903 heuer 20 0 72.3g 18m 9224 R 99.9 0.0 0:41.34 relion_refine_mpi --o Class2D/j..
13902 heuer 20 0 72.3g 20m 9312 R 99.5 0.0 0:41.31 relion_refine_mpi --o Class2D/j..

Simple GPU info query:

You have 2 nVidia GPGPU.

  • DeviceID Name Version Memory(Mb)
  • 0 GeForce GTX 780 Ti 3.5 3020
  • 1 GeForce GTX 780 Ti 3.5 3020 Has Monitor

Import browse button not returning correct relative directories

Originally reported by: Callum Smits (Bitbucket: c_smits, GitHub: Unknown)


Hi,

I setup a directory structure:
betagalTest/
betagalTest_r2/
betagal_r2/

The betagalTest contained files from the 1.4 tutorial and I was importing files into RELION2 projects (the two directories ending _r2). Using the import task, clicking browse and navigating to betagalTest/ generated the correct path from the betagal_r2 directory (../betagalTest/whateverfile) but an incorrect relative path when run from the betagalTest_r2 directory (../whateverfile).

For relevant system info, see issue #20


Problem with linux workstation install

Originally reported by: David James Gill (Bitbucket: davjgill, GitHub: Unknown)


I am having trouble installing onto a Linux workstation. It crashes at the make step. The workstation already has RELION 1.4 installed.

I have attached the bash shell cmake and make log outputs. This error is consistently appearing in multiple attempts to install.

Skipping the GUI install and forcing FFTW and FLTK install does not overcome the problem.


Crash during 3D refinement with --gpu

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


I referred to this first in #24, sorry it took so long to provide data.

Using this minimal test case, I execute

#!bash
mpirun -n 3 `which relion_refine_mpi` --o run2_ct4 --continue run1_it004_optimiser.star --j 4 --gpu

to make it crash saying

#!bash
 exp_fn_img= 000029@20S_028_Mar28_16.17.32_particles.mrcs
000030@20S_028_Mar28_16.17.32_particles.mrcs
000031@20S_028_Mar28_16.17.32_particles.mrcs
000032@20S_028_Mar28_16.17.32_particles.mrcs

 ipart= 0 adaptive_fraction= 0.999
 min_diff2= 15705.8

Meanwhile, executing without --gpu produces no errors. I can remove the problematic particles and finish this iteration, but during the next one different particles will cause the same problem, and so on.


Check for GCC <= 4.9 before compiling

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


I think a few help requests could be avoided in the future if cmake made sure GCC's version (though there are probably similar issues with other compilers?) is <= the maximum supported by the CUDA SDK, before make is executed. Especially when running the latter with many threads, NVCC's complaints about GCC can be hard to find.


Noise estimation accesses disk with --preread_images & --no_parallel_disc_io

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


At the beginning of a refinement, after all particles have been pre-read into RAM, the initial noise spectrum estimation fetches all the particles once again from the disk, despite having them in memory already. This occurs only when both --preread_images and --no_parallel_disc_io are set, due to line 1547 in ml_optimiser.cpp. I think it's a useful parameter combination for systems with a lot of GPU power but little RAM. Thus, it would be great if that extra time could be saved.


Non-deterministic relion_autopick on GPU

Originally reported by: Dimitry Tegunov (Bitbucket: DTegunov, GitHub: DTegunov)


Running

#!bash
`which relion_autopick` --i micrographs_mini.star --ref templates.star --odir average/ --pickname autopick --invert  --ctf  --ang 5 --shrink 1 --lowpass 20 --angpix 0.8 --angpix_ref 0.8 --particle_diameter 200  --threshold 0.3 --min_distance 100 --max_stddev_noise 1.1 --gpu

4 times yields 4 different results with 154, 159, 160, or 161 particles. micrographs_mini.star contains one 2954x3056 px micrograph; templates.star contains 7 280x280 px templates.

System:
Ubuntu 14.04,
Titan X cards,
352.79 drivers,
7.5 SDK,
Kernels compiled with sm_35


CMAKE warnings during configuration for 2.0b6

Originally reported by: monash_lancew (Bitbucket: monash_lancew, GitHub: Unknown)


Upgrading from 2.0b3 to 2.0b6 introduced the following:

CMake Warning (dev) at src/apps/CMakeLists.txt:39 (add_dependencies):
Policy CMP0046 is not set: Error on non-existent dependency in
add_dependencies. Run "cmake --help-policy CMP0046" for policy details.
Use the cmake_policy command to set the policy and suppress this warning.

The dependency target "FFTW3" of target "relion_lib" does not exist.
This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) at src/apps/CMakeLists.txt:43 (add_dependencies):
Policy CMP0046 is not set: Error on non-existent dependency in
add_dependencies. Run "cmake --help-policy CMP0046" for policy details.
Use the cmake_policy command to set the policy and suppress this warning.

The dependency target "FLTK" of target "relion_lib" does not exist.
This warning is for project developers. Use -Wno-dev to suppress it.


Best MPI setting for 2D classification steps

Originally reported by: yjin004 (Bitbucket: yjin004, GitHub: Unknown)


Our lab has recently installed a single GTX 1080 card in our 24-core workstation.
We have successfully set up a GPU run.
We are currently running 2D classifications with 2 MPI procs and 2 threads together with 1 GPU. It ran faster than using 24 cores alone.
It is clear that for 3D steps, assigning more MPI procs would not help the run greatly.

However, we would like to utilise both the CPU and GPU during 2D runs, which may improve speed.

What would be the recommended MPI procs/threads settings for GPU 2D runs with single GPU? Will additional MPI procs speed up the run?


Save by rlnCtfFigureOfMerit

Originally reported by: Mazbit (Bitbucket: Mazbit, GitHub: Unknown)


Hi guys,

I can sort micrographs after ctf estimation by rlnCtfFigureOfMerit through the "old school display". But the "save star with selected images" option is greyed out. I really enjoyed this feature in version 1.4 for discarding images with drift, astigmatism, strong, etc. Could you please enable this feature?

Thanks
Mazdak


2D classification helical outer diameter option warning

Originally reported by: yjin004 (Bitbucket: yjin004, GitHub: Unknown)


I am executing the following command through GUI.

which relion_refine_mpi --continue Class2D/job012/run_it005_optimiser.star --o Class2D/job012/run_ct5 --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --iter 25 --tau2_fudge 2 --particle_diameter 120 --oversampling 1 --psi_step 10 --offset_range 5 --offset_step 2 --helical_outer_diameter 90 --bimodal_psi --sigma_psi 5 --j 1

on a helical dataset

The run continues normally, but the error box shows in red:
The following warnings were encountered upon command-line parsing:
WARNING: Option --helical_outer_diameter is not a valid RELION argument

--helical_outer_diameter corresponds to the tube diameter in the GUI window under the Helix tab.


Local install issues

Originally reported by: Robert McLeod (Bitbucket: robbmcleod, GitHub: robbmcleod)


My GPU nodes are on a cluster where I don't have sudo. The default 'make install' tries to install globally, which of course doesn't work. This is a change from how Relion has worked in the past. Here is what I changed to install it locally in my home directory:

from ~/relion2-beta/build/

cmake -DCMAKE_INSTALL_PREFIX="~/relion2-beta/" ..

make all install

Yields the executables in ~/relion2-beta/bin/ and the libraries in ~/relion2-beta/lib/

Following bugs are related to said local install:

Bug: However, the relion main executable isn't copied with the new make prefix. It can be copied manually.

Bug: All of the relion_star* bash scripts need to be made executable with chmod after installation.

Bug: ~/relion2-beta/bin/relion_refine_mpi: error while loading shared libraries: libfltk_images.so.1.3: cannot open shared object file: No such file or directory

Work-around: The fltk and fftw libraries weren't created on install. I copied them over from a relion-1.4 install and then added ~/relion2-beta/lib to my LD_LIBRARY_PATH environment variable in my submission script.

After resolution of these problems I appear to have successfully started a 2D classification on a known-good dataset on one of our GPU servers (4 Tesla K80s). It does claim to be running in double-precision mode.
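
A sketch of the described workaround, using the reporter's directory layout (adjust the paths to your own install):

chmod +x ~/relion2-beta/bin/relion_star_*
export LD_LIBRARY_PATH=~/relion2-beta/lib:$LD_LIBRARY_PATH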


pixel size setting in manual picking

Originally reported by: bzuber (Bitbucket: bzuber, GitHub: Unknown)


When doing manual picking, the popup help for the pixel size parameter says that if a CTF-containing starfile is given as input, this parameter will be ignored and the pixel size will be computed from the values in the starfile. This seems to be the case when the value is positive. But when it is left at -1, the micrographs appear too strongly low-pass filtered, so I guess a pixel size of 1 is assumed. Wouldn't it make more sense that if the pixel size is set to -1, the pixel size is set correctly (in this case 3.54)?
Ben


Test

Originally reported by: Sjors Scheres (Bitbucket: scheres, GitHub: scheres)


This is where issues will be reported, questions may be asked, or proposals for change may be made. Please check that your question isn't already answered in this list of issues, on the Wiki, or in the new tutorial.


GPU out of memory/ segmentation fault on Titan X

Originally reported by: Marcus Fislage (Bitbucket: mfislage, GitHub: mfislage)


Our workstation has 16 cores, 512 GB RAM, and 4 Titan X cards (12 GB RAM each).

The job (45K particles of 400x400, 3D auto-refine) is submitted directly as follows:

mpirun -np 8 -machinefile machinefile /computer.raid/software/relion-2.0b/bin/relion_refine_mpi --o GRefine3D/grun1 --auto_refine --split_random_halves --i 45k-particles.star --pool 30 --ref TcLSU.mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --particle_diameter 386 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 3 --offset_range 10 --offset_step 4 --sym C1 --low_resol_join_halves 40 --norm --scale --j 2 --gpu 0,1,2,3 >> r2.log &

We tried different variations of the number of MPI processes, threads, and --free_gpu_memory (128 to 2048), all resulting in the same segmentation fault after about 20 iterations.


Estimated memory for expectation step > 9.80245 Gb.
Estimated memory for maximization step > 17.3816 Gb.

WARNING: Ignoring required free GPU memory amount of 400 MB, due to space insufficiency.
WARNING: The available space on the GPU (170 MB) might be insufficient for the expectation step.
WARNING: Ignoring required free GPU memory amount of 400 MB, due to space insufficiency.
WARNING: The available space on the GPU (170 MB) might be insufficient for the expectation step.

mpirun noticed that process rank 4 with PID 6672 on node computer exited on signal 11 (Segmentation fault).

ERROR: out of memory in /computer.raid/software/relion-2.0b/src/gpu_utils/cuda_projector.cu at line 115 (error-code 2)
[computer:03635] *** Process received signal ***
[computer:03635] Signal: Segmentation fault (11)
[computer:03635] Signal code: (-6)
[computer:03635] Failing at address: 0x22b800000e33
[computer:03635] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f1ab9fff870]
[computer:03635] [ 1] /lib64/libpthread.so.0(raise+0x2b)[0x7f1ab9fff73b]
[computer:03635] [ 2] /computer.raid/software/relion-2.0b/lib/librelion_gpu_util.so(_ZN13CudaProjector9setMdlDimEiiiiiii+0x50b)[0x7f1ac373d7bb]
[computer:03635] [ 3] /computer.raid/software/relion-2.0b/lib/librelion_gpu_util.so(_ZN14MlDeviceBundle22setupFixedSizedObjectsEv+0x2c3)[0x7f1ac371eba3]
[computer:03635] [ 4] /computer.raid/software/relion-2.0b/lib/librelion_lib.so(_ZN14MlOptimiserMpi11expectationEv+0x13f9)[0x7f1ac3beea89]
[computer:03635] [ 5] /computer.raid/software/relion-2.0b/lib/librelion_lib.so(_ZN14MlOptimiserMpi7iterateEv+0x5d)[0x7f1ac3bf70ed]
[computer:03635] [ 6] /computer.raid/software/relion-2.0b/bin/relion_refine_mpi(main+0x69)[0x40b2d9]
[computer:03635] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f1ab9c69b05]
[computer:03635] [ 8] /computer.raid/software/relion-2.0b/bin/relion_refine_mpi[0x40b44f]
[computer:03635] *** End of error message ***
ERROR: out of memory in /computer.raid/software/relion-2.0b/src/gpu_utils/cuda_projector.cu at line 116 (error-code 2)
ERROR: out of memory in /computer.raid/software/relion-2.0b/src/gpu_utils/cuda_projector.cu at line 115 (error-code 2)
[computer:03643] *** Process received signal ***
[computer:03643] Signal: Segmentation fault (11)
[computer:03643] Signal code: (-6)
[computer:03643] Failing at address: 0x22b800000e3b
[computer:03643] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f36921c9870]
[computer:03643] [ 1] /lib64/libpthread.so.0(raise+0x2b)[0x7f36921c973b]
[computer:03643] [ 2] /computer.raid/software/relion-2.0b/lib/librelion_gpu_util.so(_ZN13CudaProjector9setMdlDimEiiiiiii+0x50b)[0x7f369b9077bb]
[computer:03643] [ 3] /computer.raid/software/relion-2.0b/lib/librelion_gpu_util.so(_ZN14MlDeviceBundle22setupFixedSizedObjectsEv+0x2c3)[0x7f369b8e8ba3]
[computer:03643] [ 4] /computer.raid/software/relion-2.0b/lib/librelion_lib.so(_ZN14MlOptimiserMpi11expectationEv+0x13f9)[0x7f369bdb8a89]
[computer:03643] [ 5] /computer.raid/software/relion-2.0b/lib/librelion_lib.so(_ZN14MlOptimiserMpi7iterateEv+0x5d)[0x7f369bdc10ed]
[computer:03643] [ 6] /computer.raid/software/relion-2.0b/bin/relion_refine_mpi(main+0x69)[0x40b2d9]
[computer:03643] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f3691e33b05]
[computer:03643] [ 8] /computer.raid/software/relion-2.0b/bin/relion_refine_mpi[0x40b44f]
[computer:03643] *** End of error message ***
ERROR: out of memory in /computer.raid/software/relion-2.0b/src/gpu_utils/cuda_projector.cu at line 116 (error-code 2)


Slaves are not split on GPU with version 2.0_b7

Originally reported by: Özkan Yildiz (Bitbucket: oeyildiz, GitHub: Unknown)


After recompiling for v2.0_b7, a 2D classification that ran with version v2.0.0 (with exactly the same command line) did not continue after mapping the slaves to the GPU devices, as seen below. The CPU processes keep running, but the output stops. No error messages are shown at all.

We compiled it with
cmake3 -DCMAKE_INSTALL_PREFIX=/usr/local/relion_2.0beta/ -DCUDA_ARCH=52 ..
make
make install

Our GPU set up is:
2 TITAN X running on DELL PE T630 (2x 18 core, E5-2699, 768 GB RAM)

nvidia-smi gives back:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:04:00.0 Off | N/A |
| 22% 33C P8 15W / 250W | 2MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:83:00.0 Off | N/A |
| 22% 29C P8 16W / 250W | 2MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+

Relion command line and output:

mpirun -np 5 relion_refine_mpi --o Class2D/job062/run --i Select/Particles2D/particles.star --dont_combine_weights_via_disc --pool 100 --ctf --iter 30 --tau2_fudge 2 --particle_diameter 190 --K 15 --flatten_solvent --zero_mask -strict_highres_exp 15 --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 4 --gpu --sigma_ang 5

WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: x36b
Registerable memory: 524288 MiB
Total memory: 720802 MiB

Your MPI job will continue, but may be behave poorly and/or hang.

=== RELION MPI setup ===

  • Number of MPI processes = 5
  • Number of threads per MPI process = 4
  • Total number of threads therefore = 20
  • Master (0) runs on host = x36b
  • Slave 1 runs on host = x36b
  • Slave 2 runs on host = x36b
  • Slave 3 runs on host = x36b
    =================
  • Slave 4 runs on host = x36b
    Running CPU instructions in double precision.
    [x36b:28897] 4 more processes have sent help message help-mpi-btl-openib.txt / reg mem limit low
    [x36b:28897] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
  • WARNING: Changing psi sampling rate (before oversampling) to 11.25 degrees, for more efficient GPU calculations
    Estimating initial noise spectra
    13.62/13.62 min ............................................................~~(,_,">
    uniqueHost x36b has 4 ranks.
    GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
    Thread 0 on slave 1 mapped to device 0
    Thread 1 on slave 1 mapped to device 0
    Thread 2 on slave 1 mapped to device 0
    Thread 3 on slave 1 mapped to device 0
    GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
    Thread 0 on slave 2 mapped to device 0
    Thread 1 on slave 2 mapped to device 0
    Thread 2 on slave 2 mapped to device 0
    Thread 3 on slave 2 mapped to device 0
    GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
    Thread 0 on slave 3 mapped to device 1
    Thread 1 on slave 3 mapped to device 1
    Thread 2 on slave 3 mapped to device 1
    Thread 3 on slave 3 mapped to device 1
    GPU-ids not specified for this rank, threads will automatically be mapped to available devices.
    Thread 0 on slave 4 mapped to device 1
    Thread 1 on slave 4 mapped to device 1
    Thread 2 on slave 4 mapped to device 1
    Thread 3 on slave 4 mapped to device 1

KERNEL_ERROR: invalid device function in cuda_ml_optimiser.cu at line 258

Originally reported by: michaelcianfrocco (Bitbucket: michaelcianfrocco, GitHub: Unknown)


Hello!

I apologize if this is a very basic problem, but I've run into this error message when I try to do a test run of relion_refine:

KERNEL_ERROR: invalid device function in /home/ubuntu/relion2-beta/src/gpu_utils/cuda_ml_optimiser.cu at line 258 (error-code 8)

The command that I am trying to run:

relion_refine --o Class2D/job001/run --i 16may23a_untiltstack.mrcs --dont_combine_weights_via_disc --no_parallel_disc_io --pool 3 --iter 25 --tau2_fudge 2 --particle_diameter 200 --K 1 --flatten_solvent --zero_mask --oversampling 1 --psi_step 10 --offset_range 5 --offset_step 2 --norm --scale --j 1 --gpu 0 --angpix 3

I'm not sure if it's related to the GPU card that I'm trying to use, or if it's something else (I'm new at compiling GPU code). Or, if I should be specifying double precision or not.

Info about my system:
OS: Ubuntu 14.10 LTS on Amazon Web Services (g2.2xlarge instance)
GPU: GRID K520

I also compiled it using this cmake command (although I had the same error if I used the default):

cmake -DCUDA_ARCH=50

Thank you!

