Comments (11)
> To dig into the root cause, I recommend narrowing down the case with the following suggestions:
> - Does it crash on the first iteration?
> - Could you try CPU sampling?
> - Try a small fanout and a single layer.
Thank you for your patient response. I have now resolved the issue, and the code for link prediction and node classification on heterogeneous graphs is running correctly. The previous bug might have been due to inconsistent devices.
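A quick way to test the device-mismatch hypothesis is to list the device of every tensor that feeds the pipeline before iterating. The helper below is a minimal sketch, not part of DGL: `find_device_mismatches` is a hypothetical name, and it only assumes each object exposes a `.device` attribute, as `torch.Tensor` does.

```python
from types import SimpleNamespace


def find_device_mismatches(named_objects, expected_device):
    """Return {name: device} for every object not on `expected_device`.

    Hypothetical helper: it assumes each value has a `.device` attribute
    (true for torch.Tensor). Devices are compared by their string form,
    so "cuda" and "cuda:0" are treated as different.
    """
    expected = str(expected_device)
    return {
        name: str(obj.device)
        for name, obj in named_objects.items()
        if str(obj.device) != expected
    }


# Stand-in objects so the sketch runs without torch installed.
objects = {
    "csc_indptr": SimpleNamespace(device="cuda:0"),
    "seeds": SimpleNamespace(device="cpu"),
}
mismatches = find_device_mismatches(objects, "cuda:0")
print(mismatches)
```

With real data, the dictionary would hold actual tensors, for example the graph's `csc_indptr` and one batch of seeds.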
from dgl.
This is the code that generates the dataset:
import os
import numpy as np
import dgl
import dgl.graphbolt as gb

# Note: now_dir is not defined in this snippet.
base_dir = os.path.join(now_dir, 'HGBl_base_dir')
# Construct the OnDiskDataset from an existing DGLGraph.
graph_file_path = '/data/zzh/TEST_DIR/HGBl_dir/HGBl-amazon_DGLGraph.bin'
HGBl_Graph = dgl.load_graphs(filename=graph_file_path)[0][0]
# Save the node features of the "product" nodes.
feature = HGBl_Graph.ndata['h']
product_feat_np = feature.numpy()
product_feat_file = os.path.join(base_dir, 'product_feat_file.npy')
np.save(file=product_feat_file, arr=product_feat_np)
# Save the edges of type product-product-0 as a (2, N) array.
src, dst = HGBl_Graph.edges(etype=('product', 'product-product-0', 'product'))
src = src.numpy()
dst = dst.numpy()
P0P_npy = np.stack((src, dst))
P0P_npy_file = os.path.join(base_dir, 'P0P.npy')
np.save(file=P0P_npy_file, arr=P0P_npy)
# Save the edges of type product-product-1 as a (2, N) array.
src, dst = HGBl_Graph.edges(etype=('product', 'product-product-1', 'product'))
src = src.numpy()
dst = dst.numpy()
P1P_npy = np.stack((src, dst))
P1P_npy_file = os.path.join(base_dir, 'P1P.npy')
np.save(file=P1P_npy_file, arr=P1P_npy)
# The edge NumPy files for train_set, val_set, and test_set are already stored locally.
# Each set holds the source and target node IDs for the two edge types, P-0-P and P-1-P.
# Train set
train_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/train_set_P0P.npy"
train_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/train_set_P1P.npy"
# Validation set
val_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/val_set_P0P.npy"
val_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/val_set_P1P.npy"
# Test set
test_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/test_set_P0P.npy"
test_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/test_set_P1P.npy"
yaml_content = f"""
dataset_name: HGBl_amazon_GB
graph:
  nodes:
    - type: product
      num: 10099
  edges:
    - type: "product:product-product-0:product"
      format: numpy
      path: {os.path.basename(P0P_npy_file)}
    - type: "product:product-product-1:product"
      format: numpy
      path: {os.path.basename(P1P_npy_file)}
feature_data:
  - domain: node
    type: product
    name: feat
    format: numpy
    in_memory: false
    path: {os.path.basename(product_feat_file)}
tasks:
  - name: link_prediction
    num_classes: 100
    train_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_P1P_path)}
    validation_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(val_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(val_set_P1P_path)}
    test_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(test_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(test_set_P1P_path)}
"""
metadata_path = os.path.join(base_dir, "metadata.yaml")
with open(metadata_path, "w") as f:
    f.write(yaml_content)
dataset = gb.OnDiskDataset(base_dir).load()
graph = dataset.graph.to(device)
feature = dataset.feature.to(device)
tasks = dataset.tasks
link_pred_task = tasks[0]
datapipe = gb.ItemSampler(link_pred_task.train_set, batch_size=16, shuffle=True)
datapipe = datapipe.copy_to(device)
datapipe = datapipe.sample_uniform_negative(graph, 1)
datapipe = datapipe.sample_neighbor(graph, [-1, -1, -1])
datapipe = datapipe.fetch_feature(
    feature,
    node_feature_keys={"product": ["feat"]},
)
dataloader = gb.DataLoader(datapipe, num_workers=0)
from dgl.
Hello @Ying-1106, it'd be helpful if you could provide the error message. You can also try print(train_set) to examine the training set and check whether the data is correct.
from dgl.
And please share which DGL version you're using.
from dgl.
> And please share which DGL version you're using.
My DGL version is 2.2.1+cu118.
from dgl.
> Hello @Ying-1106, it'd be helpful if you can provide the error message. And you can try print(train_set) to examine the training set and check if data is correct.
When I print the train set:
print(link_pred_task.train_set)
ItemSetDict(
    itemsets={'product:product-product-0:product': ItemSet(
                  items=(tensor([[ 552, 7161],
                                 [8166, 9154],
                                 [2310, 2945],
                                 ...,
                                 [1367, 4038],
                                 [ 728, 7947],
                                 [5994, 5039]], dtype=torch.int32),),
                  names=('seeds',),
              ),
              'product:product-product-1:product': ItemSet(
                  items=(tensor([[ 454, 8906],
                                 [7462, 9232],
                                 [8126,  359],
                                 ...,
                                 [4892,  731],
                                 [6761, 3064],
                                 [8407, 9684]], dtype=torch.int32),),
                  names=('seeds',),
              )},
    names=('seeds',),
)
The error:
Whenever I step through the line `for step, data in enumerate(dataloader):`, the program terminates abruptly and the terminal prints one of three messages: "free(): invalid size", "munmap_chunk(): invalid pointer", or "double free or corruption (out)".
from dgl.
> Hello @Ying-1106, it'd be helpful if you can provide the error message. And you can try print(train_set) to examine the training set and check if data is correct.
Here is the error message:
RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of Bufferer(datapipe=FeatureFetcher)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 125, in __iter__
yield self._apply_fn(data)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 90, in _apply_fn
return self.fn(data)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/minibatch_transformer.py", line 38, in _transformer
minibatch = self.transformer(minibatch)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/subgraph_sampler.py", line 65, in _preprocess
) = SubgraphSampler._seeds_preprocess(minibatch)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/subgraph_sampler.py", line 166, in _seeds_preprocess
unique_seeds, compacted = unique_and_compact(nodes)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/internal/sample_utils.py", line 56, in unique_and_compact
unique[ntype], compacted[ntype] = unique_and_compact_per_type(
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/internal/sample_utils.py", line 47, in unique_and_compact_per_type
unique, compacted, _ = torch.ops.graphbolt.unique_and_compact(
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_ops.py", line 854, in __call__
return self._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of MiniBatchTransformer(datapipe=UniformNegativeSampler, transformer=_preprocess)
During handling of the above exception, another exception occurred:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
result.append((name, _simplify_obj_name(value)))
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
return repr(obj)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
csc_indptr_str = str(self.csc_indptr)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)
During handling of the above exception, another exception occurred:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
result.append((name, _simplify_obj_name(value)))
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
return repr(obj)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
csc_indptr_str = str(self.csc_indptr)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)
During handling of the above exception, another exception occurred:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/dataloader.py", line 68, in __iter__
yield from self.dataloader
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
data = next(self.dataset_iter)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 152, in __next__
return self._get_next()
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 140, in _get_next
result = next(self.iterator)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 224, in wrap_next
result = next_func(*args, **kwargs)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
return next(self._datapipe_iter)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
result.append((name, _simplify_obj_name(value)))
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
return repr(obj)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
csc_indptr_str = str(self.csc_indptr)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)
During handling of the above exception, another exception occurred:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
return torch.cat(
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 385, in <listcomp>
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 385, in get_summarized_data
return torch.stack([get_summarized_data(x) for x in (start + end)])
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
tensor_str = _tensor_str(self, indent)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
return _str_intern(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/torch_based_feature_store.py", line 225, in __repr__
str(self._tensor), " " * len(" feature=")
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/torch_based_feature_store.py", line 432, in __repr__
features_str = textwrap.indent(str(self._features), " ").strip()
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
return repr(obj)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
result.append((name, _simplify_obj_name(value)))
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 306, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 325, in __iter__
for data in self.datapipe:
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 280, in __iter__
yield from self.datapipe
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
response = gen.send(None)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
return next(self._datapipe_iter)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 224, in wrap_next
result = next_func(*args, **kwargs)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 140, in _get_next
result = next(self.iterator)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 152, in __next__
return self._get_next()
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
data = next(self.dataset_iter)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/data/zzh/TEST_DIR/GraphBolt_异质图(链接预测有BUG).py", line 696, in get_HGBl_amazon_GB
for step, data in enumerate(dataloader):
File "/data/zzh/TEST_DIR/GraphBolt_异质图(链接预测有BUG).py", line 750, in <module>
get_HGBl_amazon_GB()
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/data/zzh/anaconda3/envs/YING/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This exception is thrown by __iter__ of Bufferer(datapipe=FeatureFetcher)
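The error text suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of setting it from Python follows. The assumption here is that the variable must be set before CUDA is initialized, typically at the very top of the script before `import torch`, so that kernel launches run synchronously and the reported stack frame is more likely to be the one that actually failed.

```python
import os

# Set before importing torch / initializing CUDA so the setting takes effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # would follow here in the real script

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

The same effect can be had from the shell by prefixing the launch command with `CUDA_LAUNCH_BLOCKING=1`.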
from dgl.
How do you generate the train_set? Are the node IDs in each seed edge-type-wise?
from dgl.
> How do you generate the train_set? Are the node IDs in each seed edge-type-wise?
I generate the train_set from two NumPy files, one for edge type P0P and one for edge type P1P, as below:
tasks:
  - name: link_prediction
    num_classes: 2
    train_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_P1P_path)}
These are the NumPy arrays in the train_set:
train_set_POP = np.load(train_set_POP_path)
train_set_P1P = np.load(train_set_P1P_path)
print(train_set_POP):
train_set_POP
array([[ 552, 7161],
       [8166, 9154],
       [2310, 2945],
       ...,
       [1367, 4038],
       [ 728, 7947],
       [5994, 5039]])
print(train_set_P1P):
train_set_P1P
array([[ 454, 8906],
       [7462, 9232],
       [8126,  359],
       ...,
       [4892,  731],
       [6761, 3064],
       [8407, 9684]])
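Since an illegal memory access can come from an out-of-range index, one quick sanity check on arrays like these is that every node ID lies in [0, num_nodes), where num_nodes is the 10099 product nodes declared in the metadata. The function below is a sketch under that assumption. `check_seed_range` is a hypothetical helper, not a DGL function, and it works on plain nested lists (for a NumPy array, pass `arr.tolist()`).

```python
def check_seed_range(seeds, num_nodes):
    """Return the rows of `seeds` containing an ID outside [0, num_nodes).

    `seeds` is a sequence of [src, dst] pairs of per-type node IDs.
    """
    return [
        list(pair)
        for pair in seeds
        if any(node_id < 0 or node_id >= num_nodes for node_id in pair)
    ]


# Rows taken from the printed train set, plus one deliberately bad row.
sample = [[552, 7161], [8166, 9154], [5994, 5039], [454, 10099]]
bad_rows = check_seed_range(sample, num_nodes=10099)
print(bad_rows)
```

An empty result on the real train, validation, and test arrays would rule out this particular cause. It would not rule out others, such as a device mismatch.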
from dgl.
To dig into the root cause, I recommend narrowing down the case with the following suggestions:
- Does it crash on the first iteration?
- Could you try CPU sampling?
- Try a small fanout and a single layer.
from dgl.
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
from dgl.