deepmodeling / deepmd-kit

1.4K 47.0 499.0 60.92 MB

A deep learning package for many-body potential energy representation and molecular dynamics

Home Page: https://docs.deepmodeling.com/projects/deepmd/

License: GNU Lesser General Public License v3.0

Python 46.45% Shell 0.35% CMake 1.06% C++ 48.32% C 1.17% Cuda 2.60% Dockerfile 0.01% JavaScript 0.02% SWIG 0.01%

deep-learning molecular-dynamics deepmd lammps potential-energy python tensorflow cpp cuda rocm

deepmd-kit's People

thunderdruid jiangmoting 0ut0fcontrol z5476t4508 nuaajsh frankhan91 niclare-mao flower0226 cationly lorenzo-rovigatti plin1112 ruanyangry clarissading njzjz lingtikong quanshengwu jessicawang720 njustcodingjs forgotten zezhong-zhang hewg2008 mdsyn2019 liyibaipnu muwenyang001 fkxie aditikhot amcadmus junchiehwang chenxingqiang angusezhang chc273 salinelake qshao iceplussss jimbo994 hxtp wsyxbcl longforyou ljzhou86 liming-liu waldonchen sss9054 eipgen rex881026 captaindasheng kuan-louis goldernretriever bravesnowman benjaminchen22 yabeiwu gaosilagelangri aixuan1995py ixsluo junfanxia amartyabose int-zero qzhu2017 liupeng66 whybit9222 mikhjones jameswind jwz360 haidi-ustc asahixx nitin0301 zhangyongsdu felix5572 jushinpon xian9ji songsiwei maruf001 gzerze zhu-liu hnlab mqaisran dongdawn jasongyy jiaminghu121 cuihongqian siddarthachar marian-code genshen denghuilu xuxrishandsome sailfish009 dingye18 bwang-ecnu lizhen-dlut walkjoker-c bayerl daniel1991zy rogerxujiang cuidachao beyond117 chemshift superxiang mohuangrui org-mars zhenglz davidetisi

deepmd-kit's Issues

[BUG] zero result of test_descrpt_smooth

Summary

In GPU mode, test_descrpt_smooth fails with a zero result, while in CPU mode it works well.
Steps to Reproduce

Further Information, Files, and Links

ntypes error

I set the type_map in param.json to ["Co","Ni"] and used a dataset of bulk Ni. Then the following ntypes error occurred:

Traceback (most recent call last):
  File "/nfs-share/home/1800011848/miniconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/nfs-share/home/1800011848/miniconda3/lib/python3.8/site-packages/deepmd/main.py", line 66, in main
    train(args)
  File "/nfs-share/home/1800011848/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 81, in train
    _do_work(jdata, run_opt)
  File "/nfs-share/home/1800011848/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 134, in _do_work
    model.build (data, stop_batch)
  File "/nfs-share/home/1800011848/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 209, in build
    assert (self.ntypes == data.get_ntypes()), "ntypes should match that found in data"
AssertionError: ntypes should match that found in data

The same parameters work well with an alloy dataset (where every type.raw contains both elements).
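A minimal sketch of why a single-element dataset can trip this assertion, under the assumption (not confirmed by the traceback) that the number of types found in the data is inferred as the largest index in type.raw plus one. With type_map = ["Co","Ni"], a bulk-Ni system whose atoms are all labelled 0 then reports one type, while labelling Ni with its type_map index 1 reports two. The helper name is hypothetical.

```python
def ntypes_in_data(type_raw_text):
    """Hypothetical helper: infer ntypes as (largest index in type.raw) + 1."""
    indices = [int(tok) for tok in type_raw_text.split()]
    return max(indices) + 1


type_map = ["Co", "Ni"]

bulk_ni_as_0 = "0\n0\n0\n0\n"  # Ni atoms labelled with index 0 (i.e. "Co" under type_map)
bulk_ni_as_1 = "1\n1\n1\n1\n"  # Ni atoms labelled with their type_map index
alloy = "0\n1\n0\n1\n"         # both elements present

for name, text in [("bulk Ni (0)", bulk_ni_as_0),
                   ("bulk Ni (1)", bulk_ni_as_1),
                   ("alloy", alloy)]:
    n = ntypes_in_data(text)
    print(f"{name}: ntypes in data = {n}, len(type_map) = {len(type_map)}, "
          f"match = {n == len(type_map)}")
```

Under this assumption, only the labelling changes between the failing and passing cases; the type_map itself stays the same.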

The phonon spectrum calculated using the GPU

How to convert data from other software to the format required for deepmodeling training?

doc for data format conversion.
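As a rough illustration of the target layout, the sketch below writes one system in the NumPy format commonly used for DeePMD-kit training: type.raw and type_map.raw at the top level, plus a set.000 directory holding coord.npy, box.npy, energy.npy and force.npy, with each per-frame array flattened to one row per frame. The function name and signature are hypothetical, and the units in the docstring are assumptions; in practice the dpdata package is the usual tool for converting output from other codes.

```python
import os

import numpy as np


def write_deepmd_npy(out_dir, type_map, atom_types, coords, boxes, energies, forces):
    """Write one system in a set.000/*.npy layout (hypothetical sketch).

    atom_types: (natoms,) integer indices into type_map
    coords, forces: (nframes, natoms, 3), assumed Angstrom and eV/Angstrom
    boxes: (nframes, 3, 3) cell vectors, assumed Angstrom
    energies: (nframes,), assumed eV
    """
    nframes, natoms, _ = np.shape(coords)
    set_dir = os.path.join(out_dir, "set.000")
    os.makedirs(set_dir, exist_ok=True)

    # One integer type index per atom, one element name per type
    np.savetxt(os.path.join(out_dir, "type.raw"),
               np.asarray(atom_types, dtype=int), fmt="%d")
    with open(os.path.join(out_dir, "type_map.raw"), "w") as fh:
        fh.write("\n".join(type_map) + "\n")

    # Per-frame arrays are flattened to one row per frame
    np.save(os.path.join(set_dir, "coord.npy"), np.reshape(coords, (nframes, natoms * 3)))
    np.save(os.path.join(set_dir, "box.npy"), np.reshape(boxes, (nframes, 9)))
    np.save(os.path.join(set_dir, "energy.npy"), np.reshape(energies, (nframes,)))
    np.save(os.path.join(set_dir, "force.npy"), np.reshape(forces, (nframes, natoms * 3)))
```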

New compiler error after enabling KSPACE and KOKKOS: Force has no member named 'bounds'

../pair_nnp.cpp: In member function ‘virtual void LAMMPS_NS::PairNNP::coeff(int, char**)’:
../pair_nnp.cpp:819:12: error: ‘class LAMMPS_NS::Force’ has no member named ‘bounds’
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
^
../pair_nnp.cpp:820:12: error: ‘class LAMMPS_NS::Force’ has no member named ‘bounds’
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);

Segmentation fault (core dumped) when using compute function of NNPInter.h

Dear all, I am trying to write a C++ interface that computes the energy and forces of a given structure with a DeePMD potential. According to NNPInter.h, I can simply pass the values to the compute function to get the energy and forces:

  void compute (ENERGYTYPE &			ener,
		vector<VALUETYPE> &		force,
		vector<VALUETYPE> &		virial,
		vector<VALUETYPE> &		atom_energy,
		vector<VALUETYPE> &		atom_virial,
		const vector<VALUETYPE> &	coord,
		const vector<int> &		atype,
		const vector<VALUETYPE> &	box,
		const vector<VALUETYPE>	&	fparam = vector<VALUETYPE>(),
		const vector<VALUETYPE>	&	aparam = vector<VALUETYPE>());
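compute takes flat vectors. Assuming DeePMD-kit's usual convention, coord holds 3*natoms Cartesian components in atom order (x0, y0, z0, x1, ...), atype holds natoms type indices, and box holds the 9 components of the cell matrix. The hypothetical helper below checks that buffers built on the Python side have consistent sizes before they are handed to C++; the float64 choice assumes a double-precision build, and int32 assumes a platform where C++ int is 32 bits. A consistent layout is necessary but does not by itself rule out a GPU kernel fault.

```python
import numpy as np


def flatten_for_compute(coords, atom_types, cell):
    """Hypothetical helper: build flat buffers matching compute()'s inputs.

    coords: (natoms, 3) positions; atom_types: (natoms,) ints; cell: (3, 3) box.
    Returns (coord, atype, box) as contiguous 1-D arrays.
    """
    coords = np.asarray(coords, dtype=np.float64)      # assumes VALUETYPE == double
    atype = np.ascontiguousarray(atom_types, dtype=np.int32)
    cell = np.asarray(cell, dtype=np.float64)

    natoms = atype.shape[0]
    if coords.shape != (natoms, 3):
        raise ValueError(f"coords must be ({natoms}, 3), got {coords.shape}")
    if cell.shape != (3, 3):
        raise ValueError(f"cell must be (3, 3), got {cell.shape}")

    # Row-major flattening: x0, y0, z0, x1, y1, z1, ...
    coord = np.ascontiguousarray(coords.reshape(-1))
    box = np.ascontiguousarray(cell.reshape(-1))
    return coord, atype, box
```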

However, when I execute the compute function from NNPInter.h on the GPU, it raises the following error:

2021-04-28 17:27:07.360527: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-28 17:27:07.437428: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-28 17:27:07.441541: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-04-28 17:27:07.442388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:61:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-04-28 17:27:07.442490: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-28 17:27:07.445258: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-04-28 17:27:07.447935: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-04-28 17:27:07.448858: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-04-28 17:27:07.451637: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-04-28 17:27:07.453269: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-04-28 17:27:07.459345: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-04-28 17:27:07.460596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-04-28 17:27:07.460629: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-28 17:27:08.326936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-28 17:27:08.326977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-04-28 17:27:08.326993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-04-28 17:27:08.328718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 29259 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:61:00.0, compute capability: 7.0)
2021-04-28 17:27:08.849037: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
cuda assert: misaligned address /data/share/soft/deepmd-kit/source/op/prod_virial_se_a_gpu.cc 88
2021-04-28 17:27:09.332565: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address
2021-04-28 17:27:09.332653: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
Segmentation fault (core dumped)

As shown, a CUDA_ERROR_MISALIGNED_ADDRESS error is raised. Printing each variable passed to the function shows that the coordinates and box information are correct. Debugging with cuda-gdb gives the following backtrace:

CUDA Exception: Warp Misaligned Address
The exception was triggered at PC 0x2aac30047090

Thread 60 "call" received signal CUDA_EXCEPTION_6, Warp Misaligned Address.
[Switching focus to CUDA kernel 0, grid 4, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
0x00002aac300470a0 in get_i_idx_se_a(int, int const*, int*)<<<(1,1,1),(256,1,1)>>> ()

So where might the issue come from and how could I try to fix it?
Thanks!

DeepMD installation error

I tried to install deepmd using conda, but I got this error:

conda install deepmd-kit=*=cpu lammps-dp==*cpu -c deepmodeling
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

lammps-dp[build=*cpu]
deepmd-kit[build=*cpu]

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

MPI communicator change suggestion

deepmd-kit/source/lmp/pair_nnp.cpp

Line 44 in 2111ebb

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

I think the communicator in this line will not work when LAMMPS is executed with the -partition option. The correct communicator here is the one LAMMPS calls world. You can check how it's done here.

When the -partition option is used, each partition has its own communicator called "world". MPI_COMM_WORLD includes all MPI processes of all partitions, while world only contains the MPI processes of a single partition. I assume that deepmd should only be parallelized at the partition level and that communication among partitions is not relevant. Thus I suggest changing MPI_COMM_WORLD to world.
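The difference between the two communicators can be modelled with plain rank arithmetic, assuming equal-size partitions built from consecutive global ranks, as with lmp -partition 2x4 (2 partitions of 4 processes). The helper is hypothetical and does not call MPI. A check such as rank == 0 on MPI_COMM_WORLD is true for exactly one process across all partitions, while the same check on world is true once per partition.

```python
def partition_ranks(global_rank, procs_per_partition):
    """Hypothetical model of the -partition NxM layout.

    Returns (partition index, rank inside that partition's 'world').
    """
    return divmod(global_rank, procs_per_partition)


n_partitions, procs_per_partition = 2, 4  # like "-partition 2x4"
n_total = n_partitions * procs_per_partition

for g in range(n_total):
    part, local = partition_ranks(g, procs_per_partition)
    print(f"MPI_COMM_WORLD rank {g} -> partition {part}, world rank {local}")

# Processes that see "rank 0" under each communicator
roots_global = [g for g in range(n_total) if g == 0]
roots_world = [g for g in range(n_total)
               if partition_ranks(g, procs_per_partition)[1] == 0]
print(roots_global, roots_world)  # [0] [0, 4]
```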

Let me know if I'm wrong. Thanks!

Pablo

error duing compilation of user-deepmd module in lammps

hello

I compiled deepmd-kit (using tensorflow-1.5.0 and gnu-4.9.3) and built the LAMMPS module. The serial version of LAMMPS without the deepmd module compiles fine.
However, when I try to compile LAMMPS with the deepmd module, I see the following errors:

g++ -g -O3 -DLAMMPS_GZIP -DLAMMPS_MEMALIGN=64 -std=c++11 -DHIGH_PREC -I/g/g99/hamel2/build_tf/1.5.0/gpu/include -I/g/g99/hamel2/deepmd-kit/include/deepmd -std=c++11 -DHIGH_PREC -I/g/g99/hamel2/build_tf/1.5.0/gpu/include -I/g/g99/hamel2/deepmd-kit/include/deepmd -std=c++11 -DHIGH_PREC -I/g/g99/hamel2/build_tf/1.5.0/gpu/include -I/g/g99/hamel2/deepmd-kit/include/deepmd -std=c++11 -DHIGH_PREC -I/g/g99/hamel2/build_tf/1.5.0/gpu/include -I/g/g99/hamel2/deepmd-kit/include/deepmd -I../STUBS -c ../pair_nnp.cpp
../pair_nnp.cpp: In member function 'virtual void LAMMPS_NS::PairNNP::compute(int, int)':
../pair_nnp.cpp:33:20: error: invalid use of incomplete type 'class LAMMPS_NS::Atom'
double **x = atom->x;
^
In file included from ../pointers.h:26:0,
from ../memory.h:18,
from ../pair_nnp.cpp:3:
../lammps.h:29:9: error: forward declaration of 'class LAMMPS_NS::Atom'
class Atom *atom; // atom-based quantities
^
...

Any suggestions?

How to convert data from other software into the format required by DeePMD training?

doc for data format conversion.

TensorFlow 1.13.1 warning

deepmd may need some changes if it is to support later TensorFlow versions. The warnings are:

# DEEPMD: computed energy bias
WARNING:tensorflow:From /home/jzzeng/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
# DEEPMD: built lr
WARNING:tensorflow:From /home/jzzeng/deepmd_root/bin/../lib/deepmd/Model.py:890: to_double (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
# DEEPMD: built network
WARNING:tensorflow:From /home/jzzeng/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

Error run dp on mac

Hi,

I installed deepmd on macOS using
conda install -c conda-forge deepmd-kit
and I can confirm that it is installed using

conda list deepmd
# packages in environment at /Users/Zezhong/anaconda3/envs/atomate:
#
# Name                    Version                   Build  Channel
deepmd-kit                1.1.1            py37h2af55cb_0    conda-forge

TensorFlow version 2.0.0 is installed:

conda list tensorflow
# packages in environment at /Users/Zezhong/anaconda3/envs/atomate:
#
# Name                    Version                   Build  Channel
tensorflow                2.0.0           mkl_py37hda344b4_0
tensorflow-base           2.0.0           mkl_py37h66b1bf0_0
tensorflow-estimator      2.0.0              pyh2649769_0

When I run dp, I get the following error. Is it due to the TensorFlow version? Many thanks for your help.

WARNING:tensorflow:From /Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/tensorflow_core/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
  File "/Users/Zezhong/anaconda3/envs/atomate/bin/dp", line 5, in <module>
    from deepmd.main import main
  File "/Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/deepmd/__init__.py", line 2, in <module>
    from .DeepEval   import DeepEval
  File "/Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/deepmd/DeepEval.py", line 18, in <module>
    op_module = tf.load_op_library(os.path.join(module_path, "libop_abi.{}".format(ext)))
  File "/Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/deepmd/libop_abi.dylib, 6): Library not loaded: @rpath/libtensorflow_framework.so
  Referenced from: /Users/Zezhong/anaconda3/envs/atomate/lib/python3.7/site-packages/deepmd/libop_abi.dylib
  Reason: image not found
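The message says libop_abi.dylib was linked against libtensorflow_framework.so, a file the dynamic loader cannot find for the installed TensorFlow. One way to see what the installed TensorFlow actually ships is to list the framework libraries in its library directory, which in a real environment is returned by tensorflow.sysconfig.get_lib(). The helpers below are a hypothetical sketch that take the directory as an argument so they run without TensorFlow.

```python
import glob
import os


def list_tf_framework_libs(tf_lib_dir):
    """List libtensorflow_framework* files (any suffix) in a TensorFlow lib dir."""
    pattern = os.path.join(tf_lib_dir, "libtensorflow_framework*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_exact_lib(tf_lib_dir, name="libtensorflow_framework.so"):
    """True if the exact file name requested by the loader is present."""
    return name in list_tf_framework_libs(tf_lib_dir)


# In an environment with TensorFlow installed:
#   import tensorflow as tf
#   print(list_tf_framework_libs(tf.sysconfig.get_lib()))
```

If the listing shows only a versioned name, the op library and the TensorFlow package were most likely built against different TensorFlow versions.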

The phonon spectra calculated with the CPU and GPU versions are different

The CPU or GPU version is installed using conda:
conda install deepmd-kit=*=cpu lammps-dp==cpu -c deepmodeling or
conda install deepmd-kit==gpu lammps-dp==*gpu -c deepmodeling,
then I installed the lammps-dp Python interface with: conda install -c deepmodeling pylammps-dp;
then I installed phonolammps with: pip install phonolammps

However, the calculated phonon spectra are very different! It seems that the CPU version gives the correct result, while the GPU version does not.

Attached are the files used for the calculations.

Did I make a mistake in my calculations, or is it a bug?
DP.zip

model compression

Add documentation for model compression.

bug in log level

I want to discuss how log_level is determined by argparse. The code is as follows:

deepmd-kit/source/train/main.py

Lines 42 to 50 in 18ba5b1

parser_log.add_argument(
    "-v",
    "--verbose",
    default=2,
    action="count",
    dest="log_level",
    help="set verbosity level 0 - 3, 0=ERROR, 1(-v)=WARNING, 2(-vv)=INFO "
    "and 3(-vvv)=DEBUG",
)

Surprisingly, the "count" action ADDS the number of flags to the default value of log_level. So if one passes -vvv, the log_level will be 5 rather than 3. To reproduce this behavior:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-v", default=2, action="count", dest="level")
args = parser.parse_args(['-vvv'])
print(args.level)
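One way to make the flag behave as the help string describes is to leave the argparse default as None, so that "count" starts from 0, and apply the default afterwards. This is a sketch, not the fix adopted upstream; DEFAULT_COUNT and the helper names are hypothetical. Note the trade-off: with counting flags, level 0 (ERROR) cannot be selected by passing fewer -v flags than the default.

```python
import argparse
import logging

# Mapping from the help string: 0=ERROR, 1(-v)=WARNING, 2(-vv)=INFO, 3(-vvv)=DEBUG
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]
DEFAULT_COUNT = 2  # assumed default: INFO when no -v flag is given


def build_parser():
    parser = argparse.ArgumentParser()
    # default=None: "count" then starts from 0 instead of adding to the default
    parser.add_argument("-v", "--verbose", action="count", default=None, dest="log_level")
    return parser


def resolve_level(count):
    """Turn the raw flag count into a logging level, clamping at DEBUG."""
    if count is None:
        count = DEFAULT_COUNT
    return LEVELS[min(count, len(LEVELS) - 1)]


parser = build_parser()
for argv in ([], ["-v"], ["-vv"], ["-vvv"]):
    args = parser.parse_args(argv)
    print(argv, logging.getLevelName(resolve_level(args.log_level)))
```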

Model compression error: descriptor type must be se_a!

A model trained with the se_e2_a descriptor type cannot be compressed with the dp compress command in the latest devel version of deepmd-kit.
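In DeePMD-kit v2 the descriptor formerly called se_a goes by the name se_e2_a, so a check that only accepts the literal string "se_a" would reject a model that uses the new name. The sketch below shows alias normalization before such a check. The alias table and function names are hypothetical and are not DeePMD-kit's code.

```python
# Hypothetical alias table: new descriptor name -> name expected by the check
DESCRIPTOR_ALIASES = {"se_e2_a": "se_a"}


def canonical_descriptor_type(name):
    """Map a descriptor type to its canonical name (unknown names pass through)."""
    return DESCRIPTOR_ALIASES.get(name, name)


def check_compressible(descriptor_type):
    """Raise the error from the report, but only after normalizing the name."""
    if canonical_descriptor_type(descriptor_type) != "se_a":
        raise RuntimeError("descriptor type must be se_a!")


check_compressible("se_e2_a")  # accepted after normalization
```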

The newest pip-installed deepmd-kit (1.3.3) requires libtensorflow_framework.so.2

The recommended tensorflow-1.14.0 does not seem to provide libtensorflow_framework.so.2, only libtensorflow_framework.so.1.
dp -h
returns

tensorflow.python.framework.errors_impl.NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory

Error dp test

The error is as follows:
Traceback (most recent call last):
  File "/soft/anaconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/soft/anaconda3/lib/python3.7/site-packages/deepmd/main.py", line 69, in main
    test(args)
  File "/soft/anaconda3/lib/python3.7/site-packages/deepmd/test.py", line 20, in test
    test_ener(args)
  File "/soft/anaconda3/lib/python3.7/site-packages/deepmd/test.py", line 64, in test_ener
    l2e = (l2err (energy - test_data["energy"][:numb_test].reshape([-1,1])))
KeyError: 'energy'
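The traceback shows dp test indexing test_data["energy"], so the KeyError suggests that the tested system provides no energy labels. Assuming labelled systems store them as set.*/energy.npy, a small hypothetical helper (not part of dp) can list the sets that lack the file before dp test is run.

```python
import glob
import os


def sets_missing_label(system_dir, label="energy"):
    """Return the set.* directories under system_dir that lack <label>.npy."""
    set_dirs = sorted(glob.glob(os.path.join(system_dir, "set.*")))
    return [d for d in set_dirs
            if not os.path.isfile(os.path.join(d, f"{label}.npy"))]
```

An empty list means every set carries the label; any returned path points at a set that dp test cannot score for that quantity.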

Documentation on descriptors

I am interested in your work on building descriptors for molecules, but when I go to the op source directory of your repo, there are many different suffixes for the same file. Are there any comments or docs explaining what the different suffixes mean, or which version of the descriptor op code is the latest? I would like to take a deep look into your descriptor code and do some additional work based on it, but I am really confused by the code suffixes. Looking forward to your response!

Originally posted by @kleinZhf in #450

[BUG] Single precision training error

Summary

Deepmd-kit version, installation way, input file, running commands, error log, etc.
version: latest version of devel branch;
installation way: python interface with single precision, set cmake_args:

    cmake_args=[
        f"-DTENSORFLOW_ROOT:STRING={tf_install_dir}",
        "-DBUILD_PY_IF:BOOL=TRUE",
        "-DBUILD_CPP_IF:BOOL=FALSE",
        "-DFLOAT_PREC:STRING=low",
    ]

Installation works fine.

pip install .

Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Processing /root/denghui/dp-api/deepmd-kit
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Requirement already satisfied: pyyaml in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages/PyYAML-5.4.1-py3.6-linux-x86_64.egg (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (5.4.1)
Requirement already satisfied: scipy in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (1.5.4)
Requirement already satisfied: typing-extensions in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (3.7.4.3)
Requirement already satisfied: dargs>=0.2.2 in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages/dargs-0.2.2-py3.6.egg (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (0.2.2)
Requirement already satisfied: numpy in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (1.19.2)
Requirement already satisfied: tqdm in /root/denghui/dp-api/tensorflow_venv/lib/python3.6/site-packages (from deepmd-kit==1.2.3.dev627+g59c6fde.d20210425) (4.59.0)
Building wheels for collected packages: deepmd-kit
  Building wheel for deepmd-kit (PEP 517) ... done
  Created wheel for deepmd-kit: filename=deepmd_kit-1.2.3.dev627+g59c6fde.d20210425-cp36-cp36m-linux_x86_64.whl size=1499796 sha256=4f3dec01afef1c8617cb4e5dafa9bbaeb5e98e2e7b139dd0198215f7dac3f35f
  Stored in directory: /root/.cache/pip/wheels/21/f4/ed/167c943f5247a0b258bf59868ff9e8028e9cf4bd783233c161
Successfully built deepmd-kit
Installing collected packages: deepmd-kit
Successfully installed deepmd-kit-1.2.3.dev627+g59c6fde.d20210425

Steps to Reproduce

cd $deepmd_source_dir/examples/water/se_e2_a
dp train input.json

The following error occurs:

DEEPMD INFO    ---Summary of DataSystem: training     -----------------------------------------------
DEEPMD INFO    found 3 system(s):
DEEPMD INFO                                        system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO                               ../data/data_0/     192       1      80  0.250    T
DEEPMD INFO                               ../data/data_1/     192       1     160  0.500    T
DEEPMD INFO                               ../data/data_2/     192       1      80  0.250    T
DEEPMD INFO    --------------------------------------------------------------------------------------
DEEPMD INFO    ---Summary of DataSystem: validation   -----------------------------------------------
DEEPMD INFO    found 1 system(s):
DEEPMD INFO                                        system  natoms  bch_sz   n_bch   prob  pbc
DEEPMD INFO                                ../data/data_3     192       1      80  1.000    T
DEEPMD INFO    --------------------------------------------------------------------------------------
DEEPMD INFO    training without frame parameter
2021-04-25 17:31:15.020943: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-04-25 17:31:15.021616: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500000000 Hz
2021-04-25 17:31:15.366409: F tensorflow/core/framework/tensor.cc:665] Check failed: dtype() == expected_dtype (2 vs. 1) float expected, got double
Aborted
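In TensorFlow's DataType enum, 1 is DT_FLOAT and 2 is DT_DOUBLE, so "(2 vs. 1) float expected, got double" means a float64 tensor reached an op that, in this low-precision build, expects float32. A sketch of the kind of cast that avoids the mismatch on the NumPy side follows; the mapping from the FLOAT_PREC build option to a dtype is an assumption for illustration, and the helper name is hypothetical.

```python
import numpy as np

# Assumed correspondence between the FLOAT_PREC build option and NumPy dtypes
FLOAT_PREC_DTYPES = {"high": np.float64, "low": np.float32}


def to_build_precision(array, float_prec):
    """Return `array` as the floating dtype implied by FLOAT_PREC (copy only if needed)."""
    return np.asarray(array, dtype=FLOAT_PREC_DTYPES[float_prec])


coords = np.random.default_rng(0).random((192, 3))  # float64 by default
print(coords.dtype, to_build_precision(coords, "low").dtype)  # float64 float32
```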

Further Information, Files, and Links

unable to import deepmd package in jupyter notebook

I installed deepmd-kit with the conda command "conda install deepmd-kit=*=cpu lammps-dp==*cpu -c deepmodeling". When I import deepmd in the Python interactive shell, the following messages appear:

>>> import deepmd
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/linazhao/anaconda3/envs/dp/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:61: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term

However, when I try to import deepmd in a Jupyter notebook, the error "ModuleNotFoundError: No module named 'deepmd'" occurs. I examined the packages installed in the conda environment and confirmed that deepmd-kit from the deepmodeling channel is installed. Why can't it be imported in the Jupyter notebook?
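A common cause of this symptom is that the notebook kernel runs a different Python interpreter from the conda environment where deepmd-kit was installed. Running the sketch below (standard library only) both in the notebook and in a terminal with the environment activated shows which interpreter is active and whether it can locate deepmd.

```python
import importlib.util
import sys


def describe_environment(module_name="deepmd"):
    """Report the running interpreter and where (if anywhere) it finds a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_found": spec is not None,
        "module_origin": spec.origin if spec is not None else None,
    }


print(describe_environment())
```

If the two executables differ, registering the conda environment as a Jupyter kernel (for example with python -m ipykernel install --user --name dp, run inside that environment) is the usual remedy.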

[BUG] Lammps compilation fails when using low-precision mode USER-DEEPMD

Summary

The low-precision mode deepmd module cannot be compiled by LAMMPS

Deepmd-kit version, installation way, input file, running commands, error log, etc.

version: 2.0.0 devel branch
installation way: C++ interface
key commands:
cmake -DUSE_CUDA_TOOLKIT=true -DTENSORFLOW_ROOT=$tensorflow_root -DCMAKE_INSTALL_PREFIX=$deepmd_root -DFLOAT_PREC=low ..
After compiling deepmd in low precision, an error occurs when compiling LAMMPS.

Steps to Reproduce

Follow the steps after the above cmake command in https://deepmd.readthedocs.io/en/master/install.html#install-the-c-interface; the error occurs at the last step of "Install LAMMPS’s DeePMD-kit module".

Further Information, Files, and Links

[Feature Request] Automatic conversion of models for DeePMD-kit v2.0 compatibility

Summary

Automatic conversion of frozen model file from v1 compatible to v2 compatible.

Detailed Description

After upgrading to DeePMD-kit v2, frozen models trained with v1 can no longer be used. It would be useful to provide users with a tool that converts v1 frozen models to v2.

Further Information, Files, and Links

error: Not found: No attr named 'T' in NodeDef when running lammps

I tried the water data in the example folder of the deepmd-kit repository and obtained the frozen model, graph.pb. But when running LAMMPS with "lmp -i in.lammps", errors occurred. The error message was:

lmp: Relink `/home/linazhao/anaconda3/envs/dpmd/bin/../lib/./libgfortran.so.4' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
2021-03-15 13:27:39.025473: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
LAMMPS (29 Oct 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:94)
Warning: please export TSAN_OPTIONS='ignore_noninstrumented_modules=1' to avoid false positive reports from the OpenMP runtime!
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0.0000000 0.0000000 0.0000000) to (12.444700 12.444700 12.444700) with tilt (0.0000000 0.0000000 0.0000000)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  192 atoms
  read_data CPU = 0.001 seconds
Summary of lammps deepmd module ...
  >>> Info of deepmd-kit:
  installed to:       /home/linazhao/anaconda3/envs/dpmd
  source:             v1.3.2
  source brach:       HEAD
  source commit:      2644cca
  source commit at:   2021-03-02 06:30:13 +0800
  build float prec:   double
  build with tf inc:  /home/linazhao/anaconda3/envs/dpmd/include;/home/linazhao/anaconda3/envs/dpmd/include
  build with tf lib:  /home/linazhao/anaconda3/envs/dpmd/lib/libtensorflow_cc.so;/home/linazhao/anaconda3/envs/dpmd/lib/libtensorflow_framework.so
  set tf intra_op_parallelism_threads: 0
  set tf inter_op_parallelism_threads: 0
  >>> Info of lammps module:
  use deepmd-kit at:  /home/linazhao/anaconda3/envs/dpmd
  source:             v1.3.2
  source branch:      HEAD
  source commit:      2644cca
  source commit at:   2021-03-02 06:30:13 +0800
  build float prec:   double
  build with tf inc:  /home/linazhao/anaconda3/envs/dpmd/include;/home/linazhao/anaconda3/envs/dpmd/include
  build with tf lib:  /home/linazhao/anaconda3/envs/dpmd/lib/libtensorflow_cc.so;/home/linazhao/anaconda3/envs/dpmd/lib/libtensorflow_framework.so
2021-03-15 13:27:39.160914: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-15 13:27:39.161764: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-03-15 13:27:39.476767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.8GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-03-15 13:27:39.477166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.8GHz coreCount: 40 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2021-03-15 13:27:39.477186: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-15 13:27:39.478532: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-03-15 13:27:39.479767: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-03-15 13:27:39.480076: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-03-15 13:27:39.481273: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-03-15 13:27:39.481847: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-03-15 13:27:39.484181: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-03-15 13:27:39.485599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1
2021-03-15 13:27:39.485626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-03-15 13:27:40.234693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-15 13:27:40.234741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 1
2021-03-15 13:27:40.234747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N N
2021-03-15 13:27:40.234751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 1:   N N
2021-03-15 13:27:40.236264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7184 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:17:00.0, compute capability: 7.5)
2021-03-15 13:27:40.237286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7181 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2070 SUPER, pci bus id: 0000:65:00.0, compute capability: 7.5)
Not found: No attr named 'T' in NodeDef:
         [[{{node DescrptSeA}}]]
         [[DescrptSeA]]

The deepmd version is v1.3.2.

Documentation on descriptors

There are several types of descriptors implemented in DeePMD-kit, and detailed documentation on them is needed.

Pair style restartinfo set but has no restart support


Per https://lammps.sandia.gov/doc/Errors_warnings.html

This pair style has a bug: it does not support reading and writing information to a restart file, but it also does not set the member variable “restartinfo” to 0 as required in that case.

Does deepmd-kit have any plans to add ROCm support, as TensorFlow has?

Error in `lmp': free(): invalid size: 0x00002b478c4dd040

2021-04-08 11:15:55.436167: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at concat_op.cc:161 : Resource exhausted: OOM when allocating tensor with shape[173107136,50] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
2021-04-08 11:15:55.483135: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at matmul_op.cc:481 : Resource exhausted: OOM when allocating tensor with shape[173107136,50] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
*** Error in `lmp': free(): invalid size: 0x00002b478c4dd040 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x2b445499a499]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(+0xe39421)[0x2b4452424421]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(_ZN10tensorflow6TensorD1Ev+0x46)[0x2b4452426886]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(+0xf1e8c2)[0x2b44525098c2]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(+0xfcbd81)[0x2b44525b6d81]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x57)[0x2b44525b2e47]
/public/home/users/caep003/DP_ZHANG/deepmd-kit/bin/../lib/libtensorflow_framework.so.2(+0xfb915c)[0x2b44525a415c]
/lib64/libpthread.so.0(+0x7e25)[0x2b4454cede25]
/lib64/libc.so.6(clone+0x6d)[0x2b4454a17bad]
======= Memory map: ========
2b4444509000-2b444452b000 r-xp 00000000 08:03 794042 /usr/lib64/ld-2.17.so
2b444452b000-2b4444540000 rw-p 00000000 00:00 0
2b444455f000-2b4444562000 rw-p 00000000 00:00 0
2b4444562000-2b4444604000 r--p 00000000 00:27 12741564168 /public/home/users/caep003/DP_ZHANG/deepmd-kit/lib/libstdc++.so.6.0.26
2b4444604000-2b4444683000 r-xp 000a2000 00:27 12741564168 /public/home/users/caep003/DP_ZHANG/deepmd-kit/lib/libstdc++.so.6.0.26
2b4444683000-2b44446c4000 r--p 00121000 00:27 12741564168 /public/home/users/caep003/DP_ZHANG/deepmd-kit/lib/libstdc++.so.6.0.26
2b44446c4000-2b44446cf000 r--p 00161000 00:27 12741564168 /public/home/users/caep003/DP_ZHANG/deepmd-kit/lib/libstdc++.so.6.0.26
2b44446cf000-2b44446d3000 rw-p 0016c000 00:27 12741564168 /public/home/users/caep003/DP_ZHANG/deepmd-kit/lib/libstdc++.so.6.0.26
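For scale, the failed allocation in the log above is a single double-precision tensor of shape [173107136,50]. A quick calculation shows that this one tensor needs roughly 64.5 GiB, which would exhaust the RAM of many nodes on its own. A minimal Python check; the helper name tensor_bytes and the dtype table are our own, not part of deepmd-kit or TensorFlow:

```python
import math

# Bytes per element for the TensorFlow dtype names that appear in the log.
DTYPE_BYTES = {"float": 4, "double": 8}


def tensor_bytes(shape, dtype):
    """Memory needed for one dense tensor of the given shape and dtype, in bytes."""
    return math.prod(shape) * DTYPE_BYTES[dtype]


n = tensor_bytes((173107136, 50), "double")
print(n, f"{n / 2**30:.1f} GiB")  # 69242854400 64.5 GiB
```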

Issue while running LAMMPS

Hi,

I trained a model on AIMD data with deepmd and froze it using dp freeze.
When running LAMMPS, I get the following error:

Invalid argument: NodeDef mentions attr 'T' not in Op<name=DescrptSeA; signature=coord:double, type:int32, natoms:int32, box:double, mesh:int32, davg:double, dstd:double -> descrpt:double, descrpt_deriv:double, rij:double, nlist:int32; attr=rcut_a:float; attr=rcut_r:float; attr=rcut_r_smth:float; attr=sel_a:list(int); attr=sel_r:list(int)>; NodeDef: {{node DescrptSeA}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[DescrptSeA]]

I updated the software using conda but still get the same error.
Thanks,
Mayank

How to train a single element with structures that have different numbers of atoms using deepmd-kit?

I am training a model for the element Pd. I have many structures with different numbers of atoms, and many OUTCARs. Is it possible to use these OUTCARs (each from a single-step self-consistent calculation) to train an initial *pd.file? I ran the following script:

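from glob import glob

from dpdata import LabeledSystem, MultiSystems

"""
process multi systems
"""
fs = glob('./*/OUTCAR')  # remember to change this pattern!
ms = MultiSystems()
for f in fs:
    try:
        ls = LabeledSystem(f)
    except Exception:
        # print OUTCARs that cannot be parsed and skip them, so that a stale
        # `ls` from the previous iteration is not appended a second time
        print(f)
        continue
    if len(ls) > 0:
        ms.append(ls)

ms.to_deepmd_raw('deepmd')
ms.to_deepmd_npy('deepmd')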

My second question: if this is possible, how should I modify my Li.json file before training?
My Li.json:
{
"_comment": " model parameters",
"model": {
    "type_map": ["Li"],
    "descriptor" :{
        "type": "se_a",
        "sel": [],
        "rcut_smth": 0.50,
        "rcut": 6.00,
        "neuron": [25, 50, 100],
        "resnet_dt": false,
        "axis_neuron": 16,
        "seed": 1,
        "_comment": " that's all"
    },
    "fitting_net" : {
        "n_neuron": [240, 240, 240],
        "resnet_dt": true,
        "seed": 1,
        "_comment": " that's all"
    },
    "_comment": " that's all"
},

"learning_rate" :{
    "type":         "exp",
    "start_lr":     0.001,
    "decay_steps":  500,
    "decay_rate":   0.95,
    "_comment":     "that's all"
},

"loss" :{
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0,
    "_comment":     " that's all"
},

"_comment": " traing controls",
"training" : {
    "systems":      ["./"],     
    "set_prefix":   "set",
    "stop_batch":   500000,
    "batch_size":   1,

    "seed":         1,

    "_comment": " display and restart",
    "_comment": " frequencies counted in batch",
    "disp_file":    "lcurve.out",
    "disp_freq":    100,
    "numb_test":    10,
    "save_freq":    1000,
    "save_ckpt":    "model.ckpt",
    "load_ckpt":    "model.ckpt",
    "disp_training":true,
    "time_training":true,
    "profiling":    false,
    "profiling_file":"timeline.json",
    "_comment":     "that's all"
},

"_comment":         "that's all"
}

How should I modify the "systems": ["./"] part?

Thanks.
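One way to fill the "systems" list is to point it at the per-system directories that MultiSystems writes under deepmd/, rather than at "./". A minimal Python sketch, assuming that each system directory produced by ms.to_deepmd_npy('deepmd') contains a type.raw file and that the input uses the old-style "training": {"systems": ...} layout shown above. The helper names find_systems and set_training_systems are hypothetical, not part of deepmd-kit or dpdata:

```python
import json
from pathlib import Path


def find_systems(root):
    """Return the sorted directories under `root` that contain a type.raw file."""
    return sorted(str(p.parent) for p in Path(root).rglob("type.raw"))


def set_training_systems(config_path, root):
    """Overwrite training.systems in a JSON input file with the systems under `root`.

    Note: json.load keeps only the last of any duplicated keys (e.g. "_comment"),
    so repeated comment entries are lost when the file is written back.
    """
    with open(config_path) as f:
        jdata = json.load(f)
    jdata["training"]["systems"] = find_systems(root)
    with open(config_path, "w") as f:
        json.dump(jdata, f, indent=4)
    return jdata["training"]["systems"]
```

For example, set_training_systems("Li.json", "deepmd") would replace ["./"] with one entry per formula directory found under deepmd/.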

[BUG] _Replace With Suitable Title_

Summary

Deepmd-kit version, installation way, input file, running commands, error log, etc.

Steps to Reproduce

Further Information, Files, and Links

Specifying pair style twice leads to a memory leak

deepmd-kit/source/lmp/pair_nnp.cpp

Line 455 in 2111ebb

nnp_inter.init (arg[0], get_node_rank());

If we specify the pair style twice, nnp_inter is also initialized twice, which is not allowed and leads to a memory leak in release mode. The pair styles provided by LAMMPS, by contrast, seem to be safe to specify several times.

Let me know if I'm wrong. Thank you.

[Feature Request] Auto conversion of training input script

Summary

On the devel branch, a validation dataset has been introduced and the input script has changed accordingly. An automatic conversion from the old input script to the new one is needed so that the code remains compatible with old-style input scripts.

Detailed Description

An example of the old-style input script is

"training" : {
    "systems" : ["foo"],
    "batch_size" : "auto"
}

An example of the new-style input script is

"training": {
    "training_data": {
        "systems" : ["foo"],
        "batch_size" : "auto"
    },
    "validation_data": {
        "systems" : ["bar"],
        "batch_size" : 2
    }
}

The old script could be converted to the new style by moving "systems" and "batch_size" into "training_data" and leaving "validation_data" unset.
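The conversion described above can be sketched in Python on the parsed JSON (a dict). This is a minimal sketch, not deepmd-kit's implementation: the function name convert_training_input is hypothetical, and only the two keys named in this request are moved.

```python
import copy


def convert_training_input(jdata):
    """Return a copy of `jdata` with an old-style "training" section converted.

    "systems" and "batch_size" are moved into training["training_data"];
    "validation_data" is left unset. New-style input is returned unchanged.
    """
    new = copy.deepcopy(jdata)  # do not mutate the caller's dict
    training = new.get("training")
    if training is None or "training_data" in training:
        return new
    training_data = {}
    for key in ("systems", "batch_size"):
        if key in training:
            training_data[key] = training.pop(key)
    training["training_data"] = training_data
    return new


old = {"training": {"systems": ["foo"], "batch_size": "auto"}}
print(convert_training_input(old))
# {'training': {'training_data': {'systems': ['foo'], 'batch_size': 'auto'}}}
```

Other keys in "training" (stop_batch, disp_freq, ...) are left where they are, and already-converted input passes through untouched, so the conversion is safe to apply unconditionally.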

Further Information, Files, and Links

An illegal memory access is encountered when fix npt is used together with a compute command in LAMMPS

For recent versions of deepmd-kit (from dp1.3.1 to the latest master), there may be a bug in the GPU version caused by an illegal memory access. It only occurs when a compute that calculates the potential energy (or any compute that relies on the potential energy, such as stress) is invoked together with a fix npt command. An example is given below:

fix 2 mobile npt temp ${initTemp} ${initTemp} 0.1 x 1 1 2 y 1 1 2 z 1 1 2
compute potential all pe/atom
thermo 100
thermo_style custom step pe ke temp pxx pyy pzz pxy pyz pxz vol
dump 2 all custom 1000 ${file}-equal id type x y z c_potential

I guess this is because both the compute and the fix npt commands call the GPU to calculate the potential energy from the deep neural network potential. All available memory may be allocated to the fix npt command, so the compute command's access to GPU memory is treated as illegal.

The full error message is:

2021-04-05 12:21:07.174540: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-04-05 12:21:07.174566: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:220] Unexpected Event status: 1
[gadi-gpu-v100-0128:925339] *** Process received signal ***
[gadi-gpu-v100-0128:925339] Signal: Aborted (6)
[gadi-gpu-v100-0128:925339] Signal code: (-6)
[gadi-gpu-v100-0128:925339] [ 0] /lib64/libpthread.so.0(+0x12b20)[0x14a2fec2cb20]
[gadi-gpu-v100-0128:925339] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x14a2fe88e7ff]
[gadi-gpu-v100-0128:925339] [ 2] /lib64/libc.so.6(abort+0x127)[0x14a2fe878c35]
[gadi-gpu-v100-0128:925339] [ 3] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_cc.so.2(+0xc94a2b7)[0x14a30e4142b7]
[gadi-gpu-v100-0128:925339] [ 4] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_cc.so.2(_ZN10tensorflow8EventMgr10PollEventsEbPN4absl14lts_2020_02_2513InlinedVectorINS0_5InUseELm4ESaIS4_EEE+0x161)[0x14a30de4bf81]
[gadi-gpu-v100-0128:925339] [ 5] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_cc.so.2(_ZN10tensorflow8EventMgr8PollLoopEv+0xa4)[0x14a30de4c374]
[gadi-gpu-v100-0128:925339] [ 6] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_framework.so.2(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x4b1)[0x14a300d9bb71]
[gadi-gpu-v100-0128:925339] [ 7] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_framework.so.2(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x43)[0x14a300d99263]
[gadi-gpu-v100-0128:925339] [ 8] /scratch/qf9/yxz565/softwares/tensorflow2.3.0_root/lib/libtensorflow_framework.so.2(+0x1103547)[0x14a300d8a547]
[gadi-gpu-v100-0128:925339] [ 9] /lib64/libpthread.so.0(+0x814a)[0x14a2fec2214a]
[gadi-gpu-v100-0128:925339] [10] /lib64/libc.so.6(clone+0x43)[0x14a2fe953f23]
[gadi-gpu-v100-0128:925339] *** End of error message ***

cudatoolkit 11.0 doesn't support compute_86

deepmd-kit/source/op/cuda/CMakeLists.txt

Line 32 in b3b2662

-gencode arch=compute_86,code=sm_86; # Anpere - RTX 3090

compute_86 is only supported in cudatoolkit >= 11.1

related PRs: #353, #354
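The constraint amounts to a version check that only emits -gencode flags the installed toolkit understands. A minimal Python sketch of that logic (the real build is configured by CMake; this is only an illustration). The table and the name gencode_flags are hypothetical, and apart from compute_86 requiring CUDA >= 11.1 (stated above), the minimum versions are assumptions to double-check against the NVCC documentation:

```python
# Minimum CUDA toolkit version (major, minor) able to target each compute capability.
# Only 86 -> (11, 1) comes from this issue; the other entries are assumptions.
MIN_CUDA_FOR_ARCH = {
    70: (9, 0),   # Volta
    75: (10, 0),  # Turing
    80: (11, 0),  # Ampere (A100)
    86: (11, 1),  # Ampere (RTX 3090)
}


def gencode_flags(cuda_version, archs=tuple(sorted(MIN_CUDA_FOR_ARCH))):
    """Return -gencode flags for the archs supported by the (major, minor) CUDA version."""
    flags = []
    for arch in archs:
        # Tuples compare element-wise, so (11, 0) < (11, 1) as intended.
        if cuda_version >= MIN_CUDA_FOR_ARCH[arch]:
            flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    return flags
```

With cudatoolkit 11.0, gencode_flags((11, 0)) omits compute_86, which is exactly the flag that breaks the build here.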

How to install an older version of deepmd using conda

Hi,

I upgraded deepmd-kit using conda, but the new version does not support my previously trained model, so I want to switch back to version 1.2. Please let me know how to do that.

Thanks,
Mayank

[BUG] libop_abi.so undefined symbol

Hi guys, I cloned the latest version of deepmd-kit (both the master and devel branches).
My TF environment is tensorflow-gpu 2.4.1 (installed from pip) with Python 3.6, gcc/g++ 7.3.1 and CUDA 10.1.
When I run 'pip install .', I get the error message "libop_abi.so : undefined symbol: _ZN10tensorflow8OpKernel11TraceStringEPNS_15OpKernelContextEb".

Request to specify the element order and the unit of the virial matrix.

Some people order the virial matrix elements as {xx yy zz xy xz yz ...}, while others prefer {xx xy xz yx yy yz ...}, so users may not be sure which order this kit uses.
Likewise, both Å^3·bar and eV are in common use as units for the virial, so I think the unit should be stated clearly.
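To make the two conventions in this request concrete, the sketch below expands the six-component order {xx yy zz xy xz yz} into the nine-component row-major order {xx xy xz yx yy yz zx zy zz}, and converts Å^3·bar to eV. It deliberately does not say which order or unit deepmd-kit uses; the helper names are hypothetical, and the six-component form assumes a symmetric virial.

```python
# Position of each component in a flattened 3x3 matrix, row-major.
ROW_MAJOR = {"xx": 0, "xy": 1, "xz": 2, "yx": 3, "yy": 4, "yz": 5, "zx": 6, "zy": 7, "zz": 8}
SIX_COMPONENT_ORDER = ("xx", "yy", "zz", "xy", "xz", "yz")


def six_to_row_major(v6):
    """Expand {xx yy zz xy xz yz} into the full 3x3 matrix flattened row-major."""
    comp = dict(zip(SIX_COMPONENT_ORDER, v6))
    # The six-component form only makes sense for a symmetric tensor.
    comp.update(yx=comp["xy"], zx=comp["xz"], zy=comp["yz"])
    out = [0.0] * 9
    for name, idx in ROW_MAJOR.items():
        out[idx] = comp[name]
    return out


# 1 bar * 1 A^3 = 1e5 Pa * 1e-30 m^3 = 1e-25 J, and 1 eV = 1.602176634e-19 J (exact SI value).
BAR_A3_IN_EV = 1e-25 / 1.602176634e-19  # about 6.2415e-7 eV


def bar_a3_to_ev(x):
    """Convert a virial component from A^3*bar to eV."""
    return x * BAR_A3_IN_EV
```

For example, six_to_row_major([1, 2, 3, 4, 5, 6]) gives [1, 4, 5, 4, 2, 6, 5, 6, 3].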

Compilation or conda package for IBM Power9 AC922 (ppc64le) systems

Given the growing prevalence of IBM's AC922 nodes (à la Summit) in the ML space, would you consider providing compilation instructions or releasing a conda package for the ppc64le architecture?

Based on conversations with a few collaborators, compiling a customizable version of DeePMD against TensorFlow's C++ interface is a major roadblock for in-depth research projects.

Training on my own methane data fails with "Floating point exception (core dumped)"

We are training on a data set of a methane molecule system, but training always fails as soon as it starts. DeePMD-kit only reports "Floating point exception (core dumped)", without any further information. Could you tell us what mistakes we might be making?

An error when compressing a model: "The name 'DescrptSeA' refers to an Operation not in the graph"

I trained a model with the latest deepmd-kit api branch. More details about the error can be found in the output log below.

DEEPMD INFO stage 1: train or refine the model with tabulation
DEEPMD INFO installed to: /tmp/pip-req-build-x5wk7g0h/_skbuild/linux-x86_64-3.7/cmake-install
DEEPMD INFO source : v1.2.2-462-gcb8f04f
DEEPMD INFO source brach: api
DEEPMD INFO source commit: cb8f04f
DEEPMD INFO source commit at: 2021-03-23 14:22:23 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build with tf inc: /projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/tensorflow/include;/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/tensorflow/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: m3a002
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
2021-03-29 01:37:51.500692: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-29 01:37:51.501443: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/1.10.7-mlx/lib:/usr/local/python/3.7.3-system/lib:/usr/local/gcc/8.1.0/lib64:/opt/munge-0.5.11/lib:/opt/slurm-19.05.4/lib:/opt/slurm-19.05.4/lib/slurm:/opt/munge-0.5.11/lib:/opt/slurm-19.05.4/lib:/opt/slurm-19.05.4/lib/slurm:
2021-03-29 01:37:51.501502: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-03-29 01:37:51.501551: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (m3a002): /proc/driver/nvidia/version does not exist
DEEPMD INFO ---Summary of DataSystem--------------------------------------------------------------
DEEPMD INFO found 85 system(s):

2021-03-29 01:38:05.561486: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-03-29 01:38:05.562771: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2494210000 Hz
2021-03-29 01:38:08.133517: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
DEEPMD INFO training data with min nbor dist: 0.9349186607997055
DEEPMD INFO training data with max nbor size: [34, 16]
2021-03-29 01:40:50.247771: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-29 01:40:50.248306: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-29 01:40:50.249084: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Traceback (most recent call last):
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/bin/dp", line 8, in
sys.exit(main())
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/main.py", line 352, in main
compress(**dict_args)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/entrypoints/compress.py", line 102, in compress
log_path=log_path,
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/entrypoints/train.py", line 211, in train
_do_work(jdata, run_opt)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/entrypoints/train.py", line 291, in _do_work
model.build(data, stop_batch)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/trainer.py", line 275, in build
self.descrpt.enable_compression(self.min_nbor_dist, self.model_param['compress']['model_file'], self.model_param['compress']['table_config'][0], self.model_param['compress']['table_config'][1], self.model_param['compress']['table_config'][2], self.model_param['compress']['table_config'][3])
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/descriptor/se_a.py", line 261, in enable_compression
self.table = DeepTabulate(self.model_file, self.type_one_side)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/deepmd/utils/tabulate.py", line 44, in init
self.sel_a = self.graph.get_operation_by_name('DescrptSeA').get_attr('sel_a')
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3854, in get_operation_by_name
return self.as_graph_element(name, allow_tensor=False, allow_operation=True)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3726, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/projects/lj94/softwares/tensorflow2.4.0_venv_GPU_api/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3786, in _as_graph_element_locked
"graph." % repr(name))
KeyError: "The name 'DescrptSeA' refers to an Operation not in the graph."

example files for raw data

I am wondering if there are example files for data preparation, such as box.raw, coord.raw, force.raw, energy.raw and virial.raw. Apart from type.raw, these files are currently not available in the /example folder. Could you kindly provide some example files? Thanks very much.
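The expected layout can be illustrated with a small generator. A minimal Python sketch, assuming the raw format in which every file holds one frame per line (box.raw and virial.raw: 9 numbers; coord.raw and force.raw: 3 × natoms numbers; energy.raw: 1 number) and type.raw lists one type index per atom. Coordinates and box are meant in Å, energy in eV and forces in eV/Å; check the documentation for the virial convention. The values are random placeholders, not physical data, and write_example_raw is a hypothetical name:

```python
import random
from pathlib import Path


def _write_rows(path, rows, fmt="{:.6f}"):
    """Write one row per line, numbers separated by spaces."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(fmt.format(x) for x in row) + "\n")


def write_example_raw(out_dir, types=(0, 1, 1), n_frames=2, box_len=10.0, seed=0):
    """Write placeholder box/coord/force/energy/virial/type raw files into out_dir."""
    rng = random.Random(seed)
    natoms = len(types)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Cubic cell flattened row-major: 9 numbers per frame.
    box = [[box_len, 0, 0, 0, box_len, 0, 0, 0, box_len]] * n_frames
    coord = [[rng.uniform(0, box_len) for _ in range(3 * natoms)] for _ in range(n_frames)]
    force = [[rng.gauss(0, 1) for _ in range(3 * natoms)] for _ in range(n_frames)]
    energy = [[rng.gauss(-10, 1)] for _ in range(n_frames)]
    virial = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(n_frames)]
    for name, rows in [("box", box), ("coord", coord), ("force", force),
                       ("energy", energy), ("virial", virial)]:
        _write_rows(out / f"{name}.raw", rows)
    # type.raw describes atoms, not frames: one integer type index per atom.
    _write_rows(out / "type.raw", [[t] for t in types], fmt="{:d}")
```

Running write_example_raw("example_sys") writes the six files into example_sys/, which makes it easy to compare their shapes with your own data.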

Runtime error in gpu_cuda.h

I got an error when running an MD simulation with the API branch of deepmd-kit. The error looks like:
cuda assert: invalid argument /scratch/qf9/yxz565/softwares/deepmd-kit-api-20210417/source/lib/include/gpu_cuda.h 48.

I used 4 V100 GPUs (mpirun -np 4) with cuda/10.1, cudnn/7.6.5-cuda10.1, nccl/2.6.4-1+cuda10.1 and openmpi/4.0.1. The error also occurs with CUDA 11 and cuDNN 8, but not with the API branch from before 20 March 2021.

[Bug] Error when running unit tests in /source/tests on the latest devel branch

Running the unit tests in /source/tests on the latest devel branch gives:
cuda assert: an illegal memory access was encountered /tmp/pip-req-build-1dcl1ksu/source/lib/include/gpu_cuda.h 108

Originally posted by @jxxiaoshaoye in #533 (comment)

Error while running dp_ipi

I am using deepmd-kit v1.1.2 installed with conda.

When using i-PI with dp_ipi, running 'dp_ipi water.json &' prints the following message and then stops:

Not found: Op type not registered 'Descrpt' in binary running on ###. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

Can anyone help? Thanks.

incompatible with tensorflow 2.3

When I built the C++ interface with TensorFlow 2.3, I got the following error:

2020-10-11T06:00:43.0379869Z -- The C compiler identification is GNU 5.4.0
2020-10-11T06:00:43.1268040Z -- The CXX compiler identification is GNU 5.4.0
2020-10-11T06:00:43.1388043Z -- Detecting C compiler ABI info
2020-10-11T06:00:43.2199903Z -- Detecting C compiler ABI info - done
2020-10-11T06:00:43.2427331Z -- Check for working C compiler: $BUILD_PREFIX/bin/x86_64-conda_cos6-linux-gnu-cc - skipped
2020-10-11T06:00:43.2433092Z -- Detecting C compile features
2020-10-11T06:00:43.2438336Z -- Detecting C compile features - done
2020-10-11T06:00:43.2503479Z -- Detecting CXX compiler ABI info
2020-10-11T06:00:43.3555135Z -- Detecting CXX compiler ABI info - done
2020-10-11T06:00:43.3765219Z -- Check for working CXX compiler: $BUILD_PREFIX/bin/x86_64-conda_cos6-linux-gnu-c++ - skipped
2020-10-11T06:00:43.3771610Z -- Detecting CXX compile features
2020-10-11T06:00:43.3781996Z -- Detecting CXX compile features - done
2020-10-11T06:00:43.3828492Z -- Found Git: $BUILD_PREFIX/bin/git (found version "2.23.0") 
2020-10-11T06:00:43.3973163Z -- Enabled cpp interface build, looking for tensorflow_cc and tensorflow_framework
2020-10-11T06:00:43.3988778Z -- Found TensorFlow: $PREFIX/include;$PREFIX/include, $PREFIX/lib/libtensorflow_cc.so;$PREFIX/lib/libtensorflow_framework.so, $PREFIX/lib/libtensorflow_framework.so  in $PREFIX;$PREFIX/../tensorflow_core;$PREFIX;$PREFIX/../tensorflow_core;/usr/;/usr/local/
2020-10-11T06:00:43.4001209Z -- Looking for pthread.h
2020-10-11T06:00:43.4771856Z -- Looking for pthread.h - found
2020-10-11T06:00:43.4772494Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
2020-10-11T06:00:43.5583059Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
2020-10-11T06:00:43.5588795Z -- Looking for pthread_create in pthreads
2020-10-11T06:00:43.6227628Z -- Looking for pthread_create in pthreads - not found
2020-10-11T06:00:43.6228865Z -- Looking for pthread_create in pthread
2020-10-11T06:00:43.7007362Z -- Looking for pthread_create in pthread - found
2020-10-11T06:00:43.7021906Z -- Found Threads: TRUE  
2020-10-11T06:00:44.2099532Z -- Automatically determined OP_CXX_ABI=1 
2020-10-11T06:00:44.2104652Z -- Set GLIBCXX_USE_CXX_ABI=1 when compiling ops
2020-10-11T06:00:44.2352511Z -- Found CUDA in /usr/local/cuda, build nv GPU support
2020-10-11T06:00:44.5278638Z -- Found OpenMP_C: -fopenmp (found version "4.0") 
2020-10-11T06:00:44.6294240Z -- Found OpenMP_CXX: -fopenmp (found version "4.0") 
2020-10-11T06:00:44.6304501Z -- Found OpenMP: TRUE (found version "4.0")  
2020-10-11T06:00:44.6495682Z -- Found CUDA: /usr/local/cuda (found version "10.1") 
2020-10-11T06:00:44.6516270Z -- CUDA major version is 10
2020-10-11T06:00:44.6923192Z -- Configuring done
2020-10-11T06:00:44.7279016Z -- Generating done
2020-10-11T06:00:44.7307389Z -- Build files have been written to: $SRC_DIR/source/build
2020-10-11T06:00:44.7576359Z [  2%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_gelu.cu.o
2020-10-11T06:00:45.0771365Z Scanning dependencies of target deepmd
2020-10-11T06:00:45.1164159Z [  5%] Building CXX object lib/CMakeFiles/deepmd.dir/src/DataModifier.cc.o
2020-10-11T06:00:49.5552365Z In file included from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h:26:0,
2020-10-11T06:00:49.5553436Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/public/session.h:24,
2020-10-11T06:00:49.5554143Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/common.h:3,
2020-10-11T06:00:49.5554783Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/NNPInter.h:3,
2020-10-11T06:00:49.5555305Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/DataModifier.h:3,
2020-10-11T06:00:49.5555757Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:1:
2020-10-11T06:00:49.5557735Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/types.h: In instantiation of 'struct tensorflow::DataTypeToEnum<std::__cxx11::basic_string<char> >':
2020-10-11T06:00:49.5561720Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h:824:45:   required from 'typename tensorflow::TTypes<T, NDIMS>::Tensor tensorflow::Tensor::shaped(tensorflow::gtl::ArraySlice<long long int>) [with T = std::__cxx11::basic_string<char>; long unsigned int NDIMS = 1ul; typename tensorflow::TTypes<T, NDIMS>::Tensor = Eigen::TensorMap<Eigen::Tensor<std::__cxx11::basic_string<char>, 1, 1, long int>, 16, Eigen::MakePointer>; tensorflow::gtl::ArraySlice<long long int> = absl::lts_2020_02_25::Span<const long long int>]'
2020-10-11T06:00:49.5565815Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h:455:24:   required from 'typename tensorflow::TTypes<T>::Flat tensorflow::Tensor::flat() [with T = std::__cxx11::basic_string<char>; typename tensorflow::TTypes<T>::Flat = Eigen::TensorMap<Eigen::Tensor<std::__cxx11::basic_string<char>, 1, 1, long int>, 16, Eigen::MakePointer>]'
2020-10-11T06:00:49.5569248Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/common.h:167:35:   required from 'VT session_get_scalar(tensorflow::Session*, std::__cxx11::string, std::__cxx11::string) [with VT = std::__cxx11::basic_string<char>; std::__cxx11::string = std::__cxx11::basic_string<char>]'
2020-10-11T06:00:49.5571217Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:51:32:   required from 'VT DataModifier::get_scalar(const string&) const [with VT = std::__cxx11::basic_string<char>; std::__cxx11::string = std::__cxx11::basic_string<char>]'
2020-10-11T06:00:49.5572009Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:40:58:   required from here
2020-10-11T06:00:49.5572930Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/types.h:361:3: error: static assertion failed: Specified Data Type not supported
2020-10-11T06:00:49.5573658Z    static_assert(IsValidDataType<T>::value, "Specified Data Type not supported");
2020-10-11T06:00:49.5574037Z    ^
2020-10-11T06:00:49.5575486Z In file included from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/public/session.h:24:0,
2020-10-11T06:00:49.5576494Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/common.h:3,
2020-10-11T06:00:49.5576998Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/NNPInter.h:3,
2020-10-11T06:00:49.5577505Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/DataModifier.h:3,
2020-10-11T06:00:49.5578381Z                  from /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:1:
2020-10-11T06:00:49.5580941Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h: In instantiation of 'typename tensorflow::TTypes<T, NDIMS>::Tensor tensorflow::Tensor::shaped(tensorflow::gtl::ArraySlice<long long int>) [with T = std::__cxx11::basic_string<char>; long unsigned int NDIMS = 1ul; typename tensorflow::TTypes<T, NDIMS>::Tensor = Eigen::TensorMap<Eigen::Tensor<std::__cxx11::basic_string<char>, 1, 1, long int>, 16, Eigen::MakePointer>; tensorflow::gtl::ArraySlice<long long int> = absl::lts_2020_02_25::Span<const long long int>]':
2020-10-11T06:00:49.5584999Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h:455:24:   required from 'typename tensorflow::TTypes<T>::Flat tensorflow::Tensor::flat() [with T = std::__cxx11::basic_string<char>; typename tensorflow::TTypes<T>::Flat = Eigen::TensorMap<Eigen::Tensor<std::__cxx11::basic_string<char>, 1, 1, long int>, 16, Eigen::MakePointer>]'
2020-10-11T06:00:49.5587220Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/include/common.h:167:35:   required from 'VT session_get_scalar(tensorflow::Session*, std::__cxx11::string, std::__cxx11::string) [with VT = std::__cxx11::basic_string<char>; std::__cxx11::string = std::__cxx11::basic_string<char>]'
2020-10-11T06:00:49.5588729Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:51:32:   required from 'VT DataModifier::get_scalar(const string&) const [with VT = std::__cxx11::basic_string<char>; std::__cxx11::string = std::__cxx11::basic_string<char>]'
2020-10-11T06:00:49.5589712Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/work/source/lib/src/DataModifier.cc:40:58:   required from here
2020-10-11T06:00:49.5591066Z /home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/tensorflow/core/framework/tensor.h:824:45: error: 'v' is not a member of 'tensorflow::DataTypeToEnum<std::__cxx11::basic_string<char> >'
2020-10-11T06:00:49.5591829Z    CheckTypeAndIsAligned(DataTypeToEnum<T>::v());
2020-10-11T06:00:49.5592086Z                                              ^
2020-10-11T06:00:49.8474306Z [  8%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_descrpt_se_a.cu.o
2020-10-11T06:00:49.9961549Z cc1plus: warning: unrecognized command line option '-Wno-ignored-attributes'
2020-10-11T06:00:50.0164366Z make[2]: *** [lib/CMakeFiles/deepmd.dir/src/DataModifier.cc.o] Error 1
2020-10-11T06:00:50.0178100Z make[1]: *** [lib/CMakeFiles/deepmd.dir/all] Error 2
2020-10-11T06:00:50.0188577Z make[1]: *** Waiting for unfinished jobs....
2020-10-11T06:00:50.0265472Z [ 10%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_descrpt_se_r.cu.o
2020-10-11T06:01:04.1798940Z [ 13%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_prod_force_se_a.cu.o
2020-10-11T06:01:07.9069802Z [ 16%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_prod_force_se_r.cu.o
2020-10-11T06:01:11.6161647Z [ 18%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_prod_virial_se_a.cu.o
2020-10-11T06:01:12.1142464Z [ 21%] Building NVCC (Device) object op/cuda/CMakeFiles/deepmd_op_cuda.dir/deepmd_op_cuda_generated_prod_virial_se_r.cu.o
2020-10-11T06:01:19.5066005Z Scanning dependencies of target deepmd_op_cuda
2020-10-11T06:01:19.5148350Z [ 24%] Linking CXX shared library libdeepmd_op_cuda.so
2020-10-11T06:01:19.6043820Z Warning: Unused direct dependencies:
2020-10-11T06:01:19.6044114Z 	
2020-10-11T06:01:19.6044272Z 	/lib64/libm.so.6
2020-10-11T06:01:19.6044589Z 	/home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_build_env/lib/libgomp.so.1
2020-10-11T06:01:19.6045093Z 	/home/conda/feedstock_root/build_artifacts/libdeepmd_1602395736026/_build_env/lib/libgcc_s.so.1
2020-10-11T06:01:19.6128182Z [ 24%] Built target deepmd_op_cuda
2020-10-11T06:01:19.6134783Z make: *** [all] Error 2

See here if you want to ask questions

Please go to https://github.com/deepmodeling/deepmd-kit/discussions to ask questions. Before asking, you can search the previous discussions and check the documentation, especially the training parameters.

Please provide the necessary information, including the software version and installation method, input files, the commands you ran, and the error log, AS DETAILED AS POSSIBLE, to help locate and reproduce your problem.

dp freeze failed on the api branch

DeepMD-kit: branch api
Python: 3.6
TF: 1.8

To reproduce, first change stop_batch in /path/to/deepmd-kit/deepmd-kit/examples/water/train/water_se_a.json to 1000, then run:

cd /path/to/deepmd-kit/examples/water/train
dp train water_se_a.json
dp freeze

I got the following error:

Traceback (most recent call last):
  File "/xxxxx/venvs/py3.6-tf1.8/bin/dp", line 11, in <module>
    load_entry_point('deepmd-kit==1.2.3.dev295+g2d9c702.d20210222', 'console_scripts', 'dp')()
  File "/xxxxx/venvs/py3.6-tf1.8/lib/python3.6/site-packages/deepmd_kit-1.2.3.dev295+g2d9c702.d20210222-py3.6-linux-x86_64.egg/deepmd/main.py", line 267, in main
    set_log_handles(args.log_level, Path(args.log_path))
  File "/usr/lib64/python3.6/pathlib.py", line 999, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib64/python3.6/pathlib.py", line 656, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib64/python3.6/pathlib.py", line 640, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

The bug is likely caused by improper handling of the default log file: set_log_handles(args.log_level, Path(args.log_path)) does not accept args.log_path == None, because Path(None) raises the TypeError shown above.
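A minimal sketch of one possible guard, assuming that a log_path of None simply means "no log file". The helper name resolve_log_path is hypothetical and not part of deepmd-kit. It only wraps the value in Path when a path was actually given, so the None case never reaches pathlib.

```python
from pathlib import Path
from typing import Optional


def resolve_log_path(log_path: Optional[str]) -> Optional[Path]:
    """Hypothetical helper: convert the CLI log path to a Path, keeping None.

    Path(None) raises TypeError, so the "no log file" case must be
    handled before the value is wrapped.
    """
    return Path(log_path) if log_path is not None else None


# The failing call in deepmd/main.py could then become (sketch only):
# set_log_handles(args.log_level, resolve_log_path(args.log_path))
```

Whether set_log_handles itself accepts None for the file argument is an assumption here; if it does not, the same None check would also be needed inside that function.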

coord.xyz and deepmd-kit on macOS

Can you provide an example coord.xyz for the i-PI example, such as the water model used in your paper?
Also, deepmd-kit does not support running on macOS, does it?
I checked Model.py and CMakeLists: the code only loads *.so files for op_library, but on macOS the build produces *.dylib files rather than .so files.
Will you provide a macOS version of deepmd-kit, even if only for testing?
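A minimal sketch of how the expected library filename could depend on the platform, assuming the op library is built as a CMake SHARED library. CMake then typically uses the "lib" prefix with a .dylib suffix on macOS and a .so suffix on Linux. The function op_library_name and the base name "deepmd_op" are hypothetical. The resulting path could then be passed to TensorFlow's tf.load_op_library.

```python
import sys


def op_library_name(base: str, platform: str = sys.platform) -> str:
    """Hypothetical helper: shared-library filename for a CMake SHARED target.

    Assumption: CMake's default suffix is .dylib on macOS ("darwin")
    and .so on Linux; other platforms are not handled here.
    """
    suffix = ".dylib" if platform == "darwin" else ".so"
    return f"lib{base}{suffix}"


print(op_library_name("deepmd_op", "darwin"))  # libdeepmd_op.dylib
print(op_library_name("deepmd_op", "linux"))   # libdeepmd_op.so
```

Note that CMake MODULE libraries keep the .so suffix on macOS by default, so the right choice depends on how the target is declared in CMakeLists.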

Could PLUMED be built into LAMMPS when installing deepmd-kit?

Hi,
I want to use PLUMED to run metadynamics in LAMMPS with a Deep Potential model. However, the LAMMPS build compiled by deepmd-kit currently does not include PLUMED. It would be very kind if the developers could add PLUMED to that LAMMPS build.
With my best regards,
Junbo

parser_log.add_argument(
    "-v",
    "--verbose",
    default=2,
    action="count",
    dest="log_level",
    help="set verbosity level 0 - 3, 0=ERROR, 1(-v)=WARNING, 2(-vv)=INFO "
    "and 3(-vvv)=DEBUG",
)
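With argparse's "count" action, counting starts from the default value. So default=2 makes no flag give 2, -v give 3 and -vv give 4. That does not match the mapping in the help text above. The sketch below reproduces this. It also shows one possible alternative, which is an assumption and not the project's fix: count from None, fall back to INFO when no -v is given, and clamp at DEBUG. The names parse_verbosity, parse_verbosity_fixed and LEVELS are hypothetical.

```python
import argparse
import logging

# Verbosity index -> logging level, following the help text:
# 0=ERROR, 1(-v)=WARNING, 2(-vv)=INFO, 3(-vvv)=DEBUG
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def parse_verbosity(argv, default=2):
    """Parse -v flags like the snippet does: the count starts at `default`."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-v", "--verbose", default=default, action="count", dest="log_level"
    )
    return parser.parse_args(argv).log_level


def parse_verbosity_fixed(argv):
    """Possible alternative: count -v from zero, use INFO when none is given."""
    count = parse_verbosity(argv, default=None)  # None if no -v was passed
    index = 2 if count is None else min(count, 3)  # clamp at DEBUG
    return LEVELS[index]


print(parse_verbosity([]), parse_verbosity(["-v"]), parse_verbosity(["-vv"]))  # 2 3 4
print(parse_verbosity_fixed(["-v"]) == logging.WARNING)  # True
```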
