whuluojiateam / luojianet

Home Page: http://58.48.42.237/luojiaNet/

License: Apache License 2.0

CMake 0.67% Shell 0.67% Python 29.22% C++ 58.84% Cuda 2.07% C 6.51% Batchfile 0.02% Assembly 1.05% Java 0.80% Objective-C 0.01% Objective-C++ 0.01% QML 0.02% Terra 0.01% HTML 0.01% Dockerfile 0.11% Julia 0.01%

luojianet's Introduction

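Introduction to LuoJiaNET

LuoJiaNET is a machine learning framework dedicated to remote sensing, built jointly by the LuoJiaNET framework team at Wuhan University and Huawei's MindSpore framework research group. It is presented as the first domestically developed, independently controllable machine learning framework dedicated to remote sensing. Targeting the large image sizes, many data channels, and wide scale variation of remote sensing data, it offers scalable memory, flexible creation of scales and channels, automatic selection of data channels, and coordinated processing of framework and data. It is compatible with existing deep learning frameworks and provides a user-friendly, drag-and-drop interactive interface for building network structures. It hides the differences between hardware devices and also manages LuoJiaSET, a diverse remote sensing image sample library, enabling efficient storage and management of multi-source remote sensing image samples.

LuoJiaNET is also deeply integrated with domestic NPU AI hardware and supports CPU, GPU, and NPU resources at the same time, so that intelligent computing software and hardware work closely together. The result is a new-generation intelligent remote sensing interpretation framework that combines a unified computational graph representation fusing detection mechanisms and geoscience knowledge, compilation optimization, graph-kernel fusion, and automatic hybrid parallelism. It can automatically purify and augment remote sensing samples, fully integrating detection mechanisms and geoscience knowledge.

Ascend full stack

  1. Follow the LuoJiaNet installation manual to install the whl package or to build and install from source.

  2. Run the following commands to verify the installation.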

    import numpy as np
    import luojianet_ms.context as context
    import luojianet_ms.nn as nn
    from luojianet_ms import Tensor
    from luojianet_ms.ops import operations as P
    
    context.set_context(mode=context.GRAPH_MODE, device_target="CPU")
    
    class Mul(nn.Module):
        def __init__(self):
            super(Mul, self).__init__()
            self.mul = P.Mul()
    
        def forward(self, x, y):
            return self.mul(x, y)
    
    x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))
    y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))
    
    mul = Mul()
    print(mul(x, y))
    # Expected output:
    # [ 4. 10. 18.]
    
  3. The source code that accompanies the LuoJiaNET installation manual can be found in tutorial.

Release Notes

See RELEASE for the release notes.

License

Apache License 2.0

luojianet's People

Contributors

adhuan, luojiateam, mizhangwhuer, zhangzhan-whu

luojianet's Issues

Mnist dataset test error

After compiling with CUDA 11.1 and cuDNN 8 on Ubuntu 18.04, I ran

example/quick_start.py

got error of

RuntimeError: Unexpected error. Invalid mnist file, the image number of /home/work/luojia/luojianet/datasets/MNIST_Data/train/train-images-idx3-ubyte should be 2051, but got 1010792557
Line of code : 119
File         : /home/work/luojia/luojianet/luojianet_ms/ccsrc/minddata/dataset/engine/datasetops/source/mnist_op.cc

It looks like a dataset loading problem.
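
One observation that may help narrow this down, offered as a sketch rather than a confirmed diagnosis: an IDX image file starts with the big-endian 32-bit magic number 2051, and the value reported in the error, 1010792557, is exactly what the four ASCII bytes `<?xm` decode to. That would be consistent with the file being an XML or HTML document (for example, a download error page) saved under the dataset's file name instead of the real MNIST data. The helper names below are illustrative and are not part of luojianet.

```python
import struct

IDX3_IMAGE_MAGIC = 2051  # 0x00000803, the value the error message expects


def decode_magic(value):
    """Return the four bytes that a big-endian uint32 header value came from."""
    return struct.pack(">I", value)


def read_magic(path):
    """Read the first four bytes of a file as a big-endian uint32."""
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        raise ValueError("file is shorter than an IDX header")
    return struct.unpack(">I", header)[0]


print(decode_magic(1010792557))        # b'<?xm'
print(decode_magic(IDX3_IMAGE_MAGIC))  # b'\x00\x00\x08\x03'
```

Calling `read_magic` on the local train-images-idx3-ubyte file and comparing the result with 2051 shows whether the file on disk is a genuine IDX image file.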

[BUG] Training ResNet from model_zoo fails with RuntimeError: Unexpected error. Failed to open file: ./WHU-RS19/Parking/parking_28.jpg

============== Starting Training ==============
[WARNING] ME(19208:139772197278208,MainProcess):2022-10-27-18:32:37.851.63 [luojianet_ms/train/model.py:573] The CPU cannot support dataset sink mode currently.So the training process will be performed with dataset not sink.
[WARNING] DEBUG(19208,7f1f402b1200,python):2022-10-27-18:32:37.123.286 [luojianet_ms/ccsrc/debug/debugger/debugger.cc:96] Debugger] Not enabling debugger. Debugger does not support CPU.
[ERROR] MD(19208,7f1e8fbba700,python):2022-10-27-18:32:37.156.100 [luojianet_ms/ccsrc/minddata/dataset/util/task_manager.cc:219] InterruptMaster] Task is terminated with err msg(more detail in info level log):Unexpected error. Failed to open file: ./WHU-RS19/Parking/parking_28.jpg
Line of code : 277
File : /home/vhr/tll/download/luojianet/luojianet_ms/ccsrc/minddata/dataset/core/tensor.cc

[WARNING] ME(19709:140164741772224,MainProcess):2022-10-27-18:32:38.345.906 [luojianet_ms/common/_decorator.py:34] 'DropoutDoMask' is deprecated from version 1.5 and will be removed in a future version, use 'ops.Dropout' instead.
The new bprop mindir files has been generated in the path "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/ops/_grad/../bprop_mindir", copy the *.mindir to your luojianet_ms path or PYTHONPATH if necessary.
epoch: 1 step: 1, loss is 2.9637277126312256
Traceback (most recent call last):
File "/home/y00572966/luojianet_test/luojianet/model_zoo/rs_scene_classification/ResNet/train.py", line 77, in <module>
model.train(config.epoch_size, dataset, callbacks=[time_cb,ckpoint_cb,LossMonitor()])
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/train/model.py", line 808, in train
self._train(epoch,
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/train/model.py", line 112, in wrapper
func(self, *args, **kwargs)
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/train/model.py", line 575, in _train
self._train_process(epoch, train_dataset, list_callback, cb_params)
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/train/model.py", line 693, in _train_process
for next_element in dataset_helper:
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/train/dataset_helper.py", line 523, in next
data = self.iter.next()
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/dataset/engine/iterators.py", line 144, in next
data = self._get_next()
File "/home/riemann/anaconda3/envs/luojianet/lib/python3.9/site-packages/luojianet_ms/dataset/engine/iterators.py", line 222, in _get_next
return [self._transform_tensor(t) for t in self._iterator.GetNextAsList()]
RuntimeError: Unexpected error. Failed to open file: ./WHU-RS19/Parking/parking_28.jpg
Line of code : 277
File : /home/vhr/tll/download/luojianet/luojianet_ms/ccsrc/minddata/dataset/core/tensor.cc
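
The path in the error is relative (`./WHU-RS19/...`), so it is resolved against the directory the training script is launched from. A small stand-alone check along the following lines lists image files under a dataset root that cannot be opened or are empty. This is only a sketch and not part of luojianet; `find_unreadable_files` and the extension list are hypothetical.

```python
import os


def find_unreadable_files(root, extensions=(".jpg", ".jpeg", ".png", ".tif")):
    """Return paths under root that cannot be opened for reading or are empty."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if not f.read(1):
                        problems.append(path)  # zero-byte file
            except OSError:
                problems.append(path)  # broken link, permissions, etc.
    return problems


if __name__ == "__main__":
    # Resolve the relative dataset path the same way the training script would.
    dataset_root = os.path.abspath("./WHU-RS19")
    print("checking", dataset_root)
    for bad in find_unreadable_files(dataset_root):
        print("cannot read:", bad)
```

If the printed dataset root is not where the images actually live, launching the script from a different directory or passing an absolute dataset path is the first thing to try.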

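How exactly is luojianet's module for interpreting large-format remote sensing imagery used?

Hello luojianet team,
I have had the chance to work with luojianet, a machine learning framework dedicated to remote sensing, and I am particularly interested in its interpretation of large-format remote sensing imagery. I have gone through the related material but am still unclear about what to do in practice. My specific questions are:
During model training, is the entire image fed in for training, so that cropping it into small tiles is no longer necessary?
Once the image is fed in, what processing differs from model training in a conventional deep learning framework?
I hope you can help clear up my confusion.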
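
For reference, the conventional cropping step that the first question refers to can be sketched as follows. This shows the generic tiling approach used with ordinary frameworks, not LuoJiaNET's large-format mechanism; `tile_windows` and its parameters are hypothetical names chosen for the example.

```python
def tile_windows(height, width, tile, stride):
    """Return (row, col, h, w) windows over a height x width image.

    With stride <= tile the windows cover the whole image.
    """
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")

    def starts(length):
        # Regular grid of start offsets, plus one final offset so the last
        # window ends exactly at the image border.
        offsets = list(range(0, max(length - tile, 0) + 1, stride))
        if offsets[-1] + tile < length:
            offsets.append(length - tile)
        return offsets

    return [
        (r, c, min(tile, height - r), min(tile, width - c))
        for r in starts(height)
        for c in starts(width)
    ]


windows = tile_windows(1000, 1000, tile=512, stride=512)
print(len(windows))   # 4
print(windows[-1])    # (488, 488, 512, 512)
```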

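Building GDAL fails with recipe for target 'autotest/cpp/libvsipreload.so' failed. How can this be resolved?

System environment:
Ubuntu 18.04, CUDA 11.1, cuDNN 8.8.0, Python 3.7.6, GCC 7.5.0
I followed the installation procedure on the luojianet official site to build and install.
The installation got as far as the following step.
Building GDAL by running bash build_gdal.sh produces the following error: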
[ 93%] Built target gnmanalyse
[ 93%] Building CXX object apps/CMakeFiles/gnmmanage.dir/gnmmanage.cpp.o
[ 93%] Linking CXX executable gnmmanage
[ 93%] Built target gnmmanage
[ 93%] Building CXX object apps/CMakeFiles/test_ogrsf.dir/test_ogrsf.cpp.o
[ 93%] Linking CXX executable test_ogrsf
[ 93%] Built target test_ogrsf
[ 93%] Building CXX object autotest/cpp/CMakeFiles/vsipreload.dir///port/vsipreload.cpp.o
[ 93%] Linking CXX shared module libvsipreload.so
CMakeFiles/vsipreload.dir///port/vsipreload.cpp.o: In function `myinit()':
vsipreload.cpp:(.text+0x92): undefined reference to `dlsym'
vsipreload.cpp:(.text+0xac): undefined reference to `dlsym'
vsipreload.cpp:(.text+0xc6): undefined reference to `dlsym'
vsipreload.cpp:(.text+0xe0): undefined reference to `dlsym'
vsipreload.cpp:(.text+0xfa): undefined reference to `dlsym'
CMakeFiles/vsipreload.dir///port/vsipreload.cpp.o:vsipreload.cpp:(.text+0x114): more undefined references to `dlsym' follow
collect2: error: ld returned 1 exit status
autotest/cpp/CMakeFiles/vsipreload.dir/build.make:97: recipe for target 'autotest/cpp/libvsipreload.so' failed
make[2]: *** [autotest/cpp/libvsipreload.so] Error 1
CMakeFiles/Makefile2:9406: recipe for target 'autotest/cpp/CMakeFiles/vsipreload.dir/all' failed
make[1]: *** [autotest/cpp/CMakeFiles/vsipreload.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2

I do not know what is causing this. I would appreciate help from the luojianet team. Thank you!

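Errors encountered when compiling from source

I followed the official tutorial step by step and tried both 1.0.6 and the master branch. Both fail with the same error at the step below, which appears to be related to OpenMPI.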

......
......
checking for netinet/tcp.h... yes
checking for struct sockaddr_in... yes
checking if --with-cuda is set... not set (--with-cuda=)
./configure: line 13028: syntax error near unexpected token `)'
./configure: line 13028: `)'
CMake Error at cmake/utils.cmake:179 (message):
error! when ./configure;CXXFLAGS=-D_FORTIFY_SOURCE=2
-O2;--prefix=/home/rs/Codes/luojianet/build/luojianet_ms/.mslib/ompi_4.0.3_653b828d1fcd585d80aa35947b9dfeae
in /home/rs/Codes/luojianet/build/luojianet_ms/_deps/ompi-src
Call Stack (most recent call first):
cmake/utils.cmake:393 (__exec_cmd)
cmake/external_libs/ompi.cmake:10 (luojianet_ms_add_pkg)
cmake/mind_expression.cmake:43 (include)
CMakeLists.txt:56 (include)

-- Configuring incomplete, errors occurred!
See also "/home/rs/Codes/luojianet/build/luojianet_ms/CMakeFiles/CMakeOutput.log".
See also "/home/rs/Codes/luojianet/build/luojianet_ms/CMakeFiles/CMakeError.log".

[BUG] Training deeplabv3 from the model zoo fails with RuntimeError: luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:199 ImportBpropFromMindIR] The bprop mindir files are not up to date.

The following error occurs when training the deeplabv3 model:
[WARNING] ME(2199:140703017723712,MainProcess):2022-10-28-16:29:18.609.789 [luojianet_ms/common/_decorator.py:38] 'TensorAdd' is deprecated from version 1.1 and will be removed in a future version, use 'Add' instead.
Total Epoch:200 Training num:10000 Validation num:1000
INFO:log:Total Epoch:200 Training num:10000 Validation num:1000
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.388 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:172] CheckBpropHash] The bprop mindir files are not up to date. Please run the /usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/ops/_grad/../bprop_mindir/generate_mindir.py to generate new mindir files.
bprop_fg hash: 3d4ca3af3054d32fe54a557e457674558c4179705eccb4c3dae775993ba1a76a
bprop hash list:

[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.420 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 5dbd9e9d72c7b3227bfe7cc41cc4311526259c3297cb24ab2d4d7aa122c901e6
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.430 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] d15b0d6de0f996dcacce657aaec10a3a526e8567314332c8628d9cbc7bc26a03
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.438 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] c252daaf204cd7d19bad3054c4734cd874f00b9a0ad6d8b3b01ffe56bf1f0b2f
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.448 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 15cf11818e324bb31ab387f78edcce13902e54eb561b8406f4aac743e626cc2a
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.455 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] da9b8c2c77b9335db3c701e0a27fc1b3514c7abf3614b2b477cf94d2420d770e
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.461 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] dcbcbe9c73bbeb292f97cefef13d1e66d4df58dc4d0f4d61b3b3c0c48e61f014
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.466 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 300e3a12184504bf922c740d3c92af822c2517aef91dd32c009b58266169f93d
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.473 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 8a2f1d24f72f3e2821cdd873682fda4b8322905ab83ebe12885e2c9aec2957d0
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.478 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 9c4e884cbc35fd59fd028a19bc7f9c9510d0b05d90ad3de2939e2498a66a6ade
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.484 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] fd36050ed51ca754e75ba8ab0e59cf7aefc9aaf4425e607aa1d6115a69d64933
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.488 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 55e08f68aa4e972c4ebb7fc191119b682d6347830d383020968203382a8b59cc
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.493 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] ec3beb004dc8c56197a40e8b91c092539a103b8e7e7be62edaf0e83671e1966e
[ERROR] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.499 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:177] CheckBpropHash] 2930c5739d67414f42123aec886e845ed9e11e82b7a792124f3a04f57e17b5c4
[CRITICAL] OPTIMIZER(2199,7ff7f963e740,python3):2022-10-28-16:29:25.897.507 [luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:199] ImportBpropFromMindIR] The bprop mindir files are not up to date.
[WARNING] VM(2199,7ff768fb9700,python3):2022-10-28-16:29:25.897.759 [luojianet_ms/ccsrc/runtime/pynative/op_task.h:106] Run] Op build failed, no need to launch.
Traceback (most recent call last):
File "train.py", line 67, in <module>
train_net()
File "train.py", line 63, in train_net
train(param=param, model=model, train_dataset=train_dataset, valid_dataset=val_dataset)
File "/code/utils/deeplearning_dp.py", line 121, in train
train_net_step(data, label)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/code/utils/deeplearning_dp.py", line 50, in forward
loss = self.network(data, label)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/code/utils/deeplearning_dp.py", line 25, in forward
out = self.backbone(data)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/code/nets/init.py", line 17, in forward
x =self.model(x)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/code/nets/deeplabv3.py", line 204, in forward
out = self.resnet(x)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/code/nets/deeplabv3.py", line 64, in forward
out = self.relu(out)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 613, in call
raise err
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 610, in call
output = self._run_construct(cast_inputs, kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/cell.py", line 430, in _run_construct
output = self.forward(*cast_inputs, **kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/nn/layer/activation.py", line 299, in forward
return self.relu(x)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/ops/primitive.py", line 295, in call
return _run_op(self, self.name, args)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/common/api.py", line 91, in wrapper
results = fn(*arg, **kwargs)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/luojianet_ms/ops/primitive.py", line 755, in _run_op
output = real_run_op(obj, op_name, args)
RuntimeError: luojianet_ms/ccsrc/frontend/optimizer/ad/kprim.cc:199 ImportBpropFromMindIR] The bprop mindir files are not up to date.
What is causing the error above? Any help would be greatly appreciated. Thank you!
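
The `CheckBpropHash` message in the log names the script it wants run. As a small aid, and assuming nothing beyond what the log prints, the path it gives contains a `..` segment; normalizing it shows where `generate_mindir.py` sits inside the installed package. Whether regenerating the files resolves this particular failure is not confirmed anywhere in this report.

```python
import posixpath

# Path taken from the CheckBpropHash log line in the report.
logged = (
    "/usr/local/python3.7.5/lib/python3.7/site-packages/"
    "luojianet_ms/ops/_grad/../bprop_mindir/generate_mindir.py"
)

# Collapse the "_grad/.." segment to get the script's real location.
script = posixpath.normpath(logged)
print(script)
```

The printed path ends in `luojianet_ms/ops/bprop_mindir/generate_mindir.py`, which is the file the log asks to be run with the same Python interpreter that runs the training script.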

Building LuoJiaNET V1.0.5 on an NPU (Ascend) server fails

Hello LuoJiaTeam,

Installing LuoJiaNET V1.0.5 from source on a Huawei Ascend server failed, and I hope LuoJiaTeam can help analyze why.
Installation steps: I followed the GPU source build guide. I first installed the environment dependencies, then installed gdal (bash build_gdal_npu.sh), and finally installed LuoJiaNET (bash build_npu.sh -e ascend -j 4).

I ran into the following problems while installing LuoJiaNET.

Problem 1: The NPU script is used, but the build still looks for CUDA.
CMake Error at /usr/local/share/cmake-3.21/Modules/FindCUDA.cmake:859 (message): Specify CUDA_TOOLKIT_ROOT_DIR
Workaround: in cmake/options.cmake, change option(ENABLE_GPU "Enable gpu" ON) to option(ENABLE_GPU "Enable gpu" OFF).

Problem 2: Header files cannot be found. On inspection, the specified headers do not exist in the corresponding directories.
/root/ai_framework/luojianet/graphengine/ge/ge_runtime/model_runner.cc:18:10: fatal error: framework/ge_runtime/model_runner.h: No such file or directory #include "framework/ge_runtime/model_runner.h
/root/ai_framework/luojianet/graphengine/metadef/inc/common/opskernel/ops_kernel_builder.h:25:10: fatal error: proto/task.pb.h: No such file or directory #include "proto/task.pb.h

Problem 3: An identifier is used without being declared.
/root/ai_framework/luojianet/graphengine/ge/common/dump/dump_properties.cc:146:7: error: ‘mmIsDir’ was not declared in this scope
Workaround: add the header #include "mmap/mmpa_api.h".

Thank you for your reply.

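Question about large-format image processing

The official tutorial at http://58.48.42.237/luojiaNet/tutorial/experts mentions methods for processing large-format imagery.
How are quadtree processing of large-format imagery and operator decomposition for large-format imagery related?
In the code I could only find the overall QuadSearch_GID workflow, not the actual code for large-format image operator decomposition. Is operator decomposition a part of the quadtree processing, or is there no concrete code for it yet? (The tutorial contains only code fragments that cannot be run directly.)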

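[BUG] Memory leak in dataset iterators

Dataset iterators should not be created inside a for loop. Doing so causes memory to be reclaimed incorrectly, so host memory usage keeps growing.
The problem was found in rs_semantic_segmentation/DeepLabv3_ISPRS_Vaihingen in the LuojiaNet modelzoo. That code does not use the model.train() wrapper. Instead it writes its own epoch loop and trains the network with TrainOneStepCell, so a dataset iterator has to be created manually to read the data.
However, if the iterator is created on every pass of the epoch for loop, as shown below, memory leaks and host memory grows as the number of epochs increases.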

for epoch in range(epochs):
    for batch_samples in train_loader.create_dict_iterator():
        pass

For now, the problem is worked around by defining the iterator outside the for loop, as shown below.

train_iterator = train_loader.create_dict_iterator()
for epoch in range(epochs):
    for batch_samples in train_iterator:
        pass
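
The difference between the two patterns can be illustrated without the framework. The sketch below does not reproduce the leak; it only counts how many iterator objects each pattern creates. `CountingLoader` is a hypothetical stand-in for the dataset object, and it assumes, as the workaround in this report implies, that one iterator can be traversed again on every epoch.

```python
class _ReusableIterator:
    """Iterable that restarts from the first sample on every pass."""

    def __init__(self, samples):
        self._samples = samples

    def __iter__(self):
        return iter(self._samples)


class CountingLoader:
    """Hypothetical stand-in for a dataset; counts iterators it hands out."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.iterators_created = 0

    def create_dict_iterator(self):
        self.iterators_created += 1
        return _ReusableIterator(self.samples)


def run_epochs(loader, epochs, reuse_iterator):
    """Iterate the data for several epochs; return (samples seen, iterators)."""
    seen = 0
    shared = loader.create_dict_iterator() if reuse_iterator else None
    for _ in range(epochs):
        iterator = shared if reuse_iterator else loader.create_dict_iterator()
        for _sample in iterator:
            seen += 1
    return seen, loader.iterators_created


data = [{"image": 1}, {"image": 2}]
print(run_epochs(CountingLoader(data), 3, reuse_iterator=False))  # (6, 3)
print(run_epochs(CountingLoader(data), 3, reuse_iterator=True))   # (6, 1)
```

Both patterns visit the same samples, but the first creates one iterator per epoch, so whatever resources each iterator holds are allocated once per epoch.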

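Improving luojianet parallel training efficiency

Hello luojianet team,
While learning luojianet, we became interested in distributed parallel training. We tried distributed parallel training for semantic segmentation of remote sensing imagery on a single machine with multiple GPUs, using the UNET semantic segmentation model. We also ran a comparison experiment without parallel mode, training directly on a single GPU. The two runs took almost the same time, and we do not know why. Is it because communication between the GPUs also takes time?

Also, does the parallel training provided by luojianet support multi-machine, multi-GPU mode?

In addition, the PyTorch framework also ships a distributed parallel training library. How does luojianet parallel training compare with PyTorch parallel training in terms of efficiency and accuracy?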
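
When comparing the two runs, it can help to express the result as speedup and parallel efficiency over the same total amount of work, for example the time to process one full epoch of the same dataset. The sketch below uses made-up timings purely for illustration; none of the numbers come from this report, and the function names are hypothetical.

```python
def speedup(single_gpu_seconds, multi_gpu_seconds):
    """How many times faster the multi-GPU run finished the same work."""
    return single_gpu_seconds / multi_gpu_seconds


def parallel_efficiency(single_gpu_seconds, multi_gpu_seconds, num_gpus):
    """Speedup divided by GPU count; 1.0 would be ideal linear scaling."""
    return speedup(single_gpu_seconds, multi_gpu_seconds) / num_gpus


# Hypothetical timings for one epoch over the same dataset.
t_single, t_multi, gpus = 1200.0, 1000.0, 4
print(round(speedup(t_single, t_multi), 2))                    # 1.2
print(round(parallel_efficiency(t_single, t_multi, gpus), 2))  # 0.3
```

An efficiency far below 1.0 means that something other than computation, such as communication, data loading, or an unchanged global batch configuration, is dominating the run time.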

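How does LuojiaNET handle large-format remote sensing images?

Hello LuojiaNET team,

I first learned about the LuojiaNET framework from Academician Gong Jianya's report 《龚健雅院士:遥感大数据与智能解译》 (Academician Gong Jianya: Remote Sensing Big Data and Intelligent Interpretation). LuojiaNET includes framework-level optimizations for loading whole large-format remote sensing images for training and inference. According to the report, these comprise two strategies: global and local image association, and operator parallelism. I still have a number of questions about how a model trains and runs inference on large-format images, and I hope the LuojiaNET team can answer them.

  • Large-format training relies on the large-format loading feature provided by the framework. Could you provide a usage example for reference?
  • Does the global and local image association strategy take effect only when the model loads a large-format image, without taking part in large-format processing in the hidden layers?
  • Does the operator parallelism strategy split the features automatically, similar to data parallelism, except that ordinary data parallelism splits the batch whereas here the features are split?
  • In which source directories are global and local image association and operator parallelism implemented?

Thank you for your answers.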
