
Comments (6)

Gword commented on August 23, 2024

There may be a problem with a path in the compile command. Could you post a more complete screenshot?

from jnerf.

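ZhangXiaoXuan2019 commented on August 23, 2024

Hello, thank you for the prompt reply. Below is a more complete version of the error message from the screenshot above. We inserted line breaks into the text after [Reason] where needed for readability.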

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py", line 127, in local_model_render
img, img_tar = self.render_img(client_s_idx) # in dataset, the image index is the client index

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py", line 167, in render_img
pos, dir = self.sampler.sample(img_ids, rays_o, rays_d)

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/density_grid_sampler.py", line 137, in sample
coords, rays_index, rays_numsteps, rays_numsteps_counter = self.rays_sampler.execute(

File "/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/ray_sampler.py", line 34, in execute
coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code(

RuntimeError: [f 0702 11:29:38.938348 28 executor.cc:665]

Execute fused operator(1/2) failed.

[Input]: float32[1024,3,], float32[1024,3,1,], uint8[1310720,], float32[150,11,], int32[640000,], float32[150,4,3,], float32[1048576,7,], int32[1024,1,], int32[1024,2,], int32[2,],

[Output]: float32[1048576,7,], int32[1024,1,], int32[1024,2,], int32[2,],

tools/run_fednerf.py:48 <>
tools/run_fednerf.py:41


/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/fedavg_api.py:116
/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/client.py:28
/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py:127 <local_model_render>
/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/FedML/fedml_api/standalone/fedavg/my_model_trainer.py:167 <render_img>
/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/density_grid_sampler.py:137
/home/xiaoxuan/PythonWorks/JNeRF_Fed_KD/python/jnerf/models/samplers/density_grid_sampler/ray_sampler.py:34

[Reason]: [f 0702 11:29:38.938163 28 cache_compile.cc:295] Check failed: found Something wrong... Could you please report this issue?
Include file pcg32.h not found in [
/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/src,/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8,
/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8,/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include,
/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc,/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70,
/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc,]
Commands:
"/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc" "/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70/jit/code__IN_SIZE_6__in0_dim_2__in0_type_float32__in1_dim_3__in1_type_float32__in2_dim_1__in2____hash_a7c7342d82088594_op.cc"
-std=c++14
-Xcompiler
-fPIC
-Xcompiler -march=native
-Xcompiler -fdiagnostics-color=always
-lstdc++ -ldl -shared
-I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/src"
-I/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8
-I/home/xiaoxuan/miniconda3/envs/JNeRF/include/python3.8 -DHAS_CUDA -DIS_CUDA
-I"/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include"
-I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc"
-lcudart -L"/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64"
-Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64"
-I"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70"
-L"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70"
-Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70"
-L"/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default"
-Xlinker -rpath="/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default"
-l:"jit_utils_core.cpython-38-x86_64-linux-gnu".so
-l:"jittor_core.cpython-38-x86_64-linux-gnu".so
-x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math -w
-I"/home/xiaoxuan/miniconda3/envs/JNeRF/lib/python3.8/site-packages/jittor/extern/cuda/inc"
-arch=compute_70 -code=sm_70 -o "/home/xiaoxuan/.cache/jittor/jt1.3.4/g++9.4.0/py3.8.13/Linux-5.8.0-50x17/IntelRXeonRGolxda/default/cu11.2.152_sm_70/jit/code__IN_SIZE_6__in0_dim_2__in0_type_float32__in1_dim_3__in1_type_float32__in2_dim_1__in2____hash_a7c7342d82088594_op.so"


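Gword commented on August 23, 2024

Looking at the compile command, the path to pcg32.h is indeed missing. That compile option is set by the statement coords_out.compile_options = proj_options. You could print coords_out.compile_options right after that statement to check whether it contains the pcg32 path.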
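The same check can also be run against the compile command printed in the error message. The following is a minimal sketch, not part of JNeRF or Jittor: the helper names `include_dirs_from_command` and `find_header` are hypothetical, and it assumes the command uses ordinary `-I` flags as in the log above.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def include_dirs_from_command(command: str) -> List[str]:
    """Collect the directories passed through -I flags in a compiler command."""
    tokens = shlex.split(command)  # also strips the quotes in -I"/some/dir"
    dirs = []
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])   # form: -I /some/dir
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])         # form: -I/some/dir
    return dirs


def find_header(header: str, include_dirs: Iterable[str]) -> List[str]:
    """Return every include directory that actually contains the header file."""
    return [d for d in include_dirs if (Path(d) / header).is_file()]
```

Calling `find_header("pcg32.h", include_dirs_from_command(cmd))` with the logged command as `cmd` returns an empty list when no listed directory contains the header, which is the situation the error message describes.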


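ZhangXiaoXuan2019 commented on August 23, 2024

Hello! Thank you for the suggestion.
The runtime error we reported occurs if and only if the render_test function is called. When the sampler is called during model training, no runtime error occurs.
The failing statement,
coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code(
comes before coords_out.compile_options = proj_options, so when the error is raised that assignment has not yet run, and coords_out.compile_options is naturally an empty dict.
We also tried executing coords_out.compile_options = proj_options both before and after the failing statement, but the error persists.
Do you have any other suggestions? :)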


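Gword commented on August 23, 2024

Jittor executes lazily, so normally nothing is compiled immediately after jt.code is called. Something you changed may have disabled lazy execution. Could you paste the code of ray_sampler.py for me to look at?
You could also try adding rays_o.compile_options = proj_options before the coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code statement, so that the compile options are set through an input.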
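To illustrate why the order matters, here is a toy model in plain Python. It is only a sketch of the reasoning above and does not use or reproduce Jittor's internals: `LazyOp`, `TOY_HEADERS`, the `include_dirs` key, and the `eager` switch are all hypothetical names invented for the illustration.

```python
# Toy model only: none of these names are Jittor APIs.
TOY_HEADERS = {"/toy/pcg32/include": {"pcg32.h"}}  # hypothetical include dir -> headers


class LazyOp:
    """Stand-in for an operator that is compiled when it is first synced."""

    def __init__(self, required_header, eager=False):
        self.required_header = required_header
        self.compile_options = {}  # empty until the caller assigns options
        self.compiled = False
        if eager:
            # Eager mode compiles inside the constructor, before any caller
            # has had a chance to assign compile_options.
            self.sync()

    def sync(self):
        dirs = self.compile_options.get("include_dirs", [])
        found = any(self.required_header in TOY_HEADERS.get(d, set()) for d in dirs)
        if not found:
            raise RuntimeError(
                f"Include file {self.required_header} not found in {dirs}"
            )
        self.compiled = True


def demo():
    """Return (lazy op compiled?, error message from the eager op)."""
    options = {"include_dirs": ["/toy/pcg32/include"]}
    lazy_op = LazyOp("pcg32.h")       # nothing is compiled yet
    lazy_op.compile_options = options  # options attached after creation
    lazy_op.sync()                     # compilation now sees the options
    try:
        LazyOp("pcg32.h", eager=True)  # compiles before options can be set
        eager_error = None
    except RuntimeError as exc:
        eager_error = str(exc)
    return lazy_op.compiled, eager_error
```

In the lazy case, assigning the options after creating the operator is early enough because compilation waits for `sync()`. In the eager case the operator fails at creation, which matches the symptom of an error raised on the jt.code line itself.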


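ZhangXiaoXuan2019 commented on August 23, 2024

Hello, thank you for your patient answers. Adding rays_o.compile_options = proj_options before the jt.code statement does not solve the problem.
The ray_sampler.py file is shown below. In fact, we have not yet modified ray_sampler.py or any other file under python/jnerf/. The sampler raises no error when sampling during training; the error occurs if and only if sampling is called from the render_test function.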
import os
import jittor as jt
from jittor import Function, exp, log
import numpy as np
import sys
from jnerf.ops.code_ops.global_vars import global_headers, proj_options
jt.flags.use_cuda = 1

class RaySampler(Function):
    def __init__(self, density_grad_header, near_distance, cone_angle_constant, aabb_range=(-1.5, 2.5), n_rays_per_batch=4096, n_rays_step=1024):
        self.density_grad_header = density_grad_header
        self.aabb_range = aabb_range
        self.near_distance = near_distance
        self.n_rays_per_batch = n_rays_per_batch
        self.num_elements = n_rays_per_batch*n_rays_step
        self.cone_angle_constant = cone_angle_constant
        self.path = os.path.join(os.path.dirname(__file__), '..', 'op_include')
        self.ray_numstep_counter = jt.zeros([2], 'int32')

    def execute(self, rays_o, rays_d, density_grid_bitfield, metadata, imgs_id, xforms):
        # input
        # rays_o n_rays_per_batch x 3
        # rays_d n_rays_per_batch x 3
        # bitfield 128 x 128 x 128 x 5 / 8
        # return
        # coords_out=[self.num_elements,7]
        # rays index : store rays is used ( not for -1)
        # rays_numsteps [0:step,1:base]
        jt.init.zero_(self.ray_numstep_counter)
        coords_out = jt.empty((self.num_elements, 7), 'float32')
        self.n_rays_per_batch=rays_o.shape[0]
        rays_index = jt.empty((self.n_rays_per_batch, 1), 'int32')
        rays_numsteps = jt.empty((self.n_rays_per_batch, 2), 'int32')
        coords_out, rays_index, rays_numsteps,self.ray_numstep_counter = jt.code(
            inputs=[rays_o, rays_d, density_grid_bitfield, metadata, imgs_id, xforms], outputs=[coords_out,rays_index,rays_numsteps,self.ray_numstep_counter],
            cuda_header=global_headers+self.density_grad_header+'#include "ray_sampler.h"',  cuda_src=f"""

        @alias(rays_o, in0)
        @alias(rays_d, in1)
        @alias(density_grid_bitfield,in2)
        @alias(metadata,in3)
        @alias(imgs_index,in4)
        @alias(xforms_input,in5)
        @alias(ray_numstep_counter,out3)
        @alias(coords_out,out0)
        @alias(rays_index,out1)
        @alias(rays_numsteps,out2)

        cudaStream_t stream=0;
        cudaMemsetAsync(coords_out_p, 0, coords_out->size);

        const unsigned int num_elements=coords_out_shape0;
        const uint32_t n_rays=rays_o_shape0;
        BoundingBox m_aabb = BoundingBox(Eigen::Vector3f::Constant({self.aabb_range[0]}), Eigen::Vector3f::Constant({self.aabb_range[1]}));
        float near_distance = {self.near_distance};
        float cone_angle_constant={self.cone_angle_constant};
        linear_kernel(rays_sampler,0,stream,
            n_rays, m_aabb, num_elements,(Vector3f*)rays_o_p,(Vector3f*)rays_d_p, (uint8_t*)density_grid_bitfield_p,cone_angle_constant,(TrainingImageMetadata *)metadata_p,(uint32_t*)imgs_index_p,
            (uint32_t*)ray_numstep_counter_p,((uint32_t*)ray_numstep_counter_p)+1,(uint32_t*)rays_index_p,(uint32_t*)rays_numsteps_p,PitchedPtr<NerfCoordinate>((NerfCoordinate*)coords_out_p, 1, 0, 0),(Eigen::Matrix<float, 3, 4>*) xforms_input_p,near_distance,rng);

        rng.advance();

        """)

        coords_out.compile_options = proj_options
        # print(coords_out.compile_options)
        coords_out.sync()
        coords_out = coords_out.detach()
        rays_index = rays_index.detach()
        rays_numsteps = rays_numsteps.detach()
        self.ray_numstep_counter = self.ray_numstep_counter.detach()
        samples=self.ray_numstep_counter[1].item()
        coords_out=coords_out[:samples]
        return coords_out, rays_index, rays_numsteps, self.ray_numstep_counter

    def grad(self, grad_x):
        ##should not reach here
        assert(grad_x == None)
        assert(False)
        return None

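In addition, is there some way to trace what causes the jt.code statement not to be executed lazily, as you mentioned?
If so, we would like to learn it, because we may make further changes on top of the JNeRF implementation in the future, and it is not realistic to trouble the JNeRF team for advice every time a problem comes up.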
