Ying-1106 commented on August 15, 2024

In order to dive deep into the root cause, I recommend narrowing down the case with the following suggestions:

  1. Does it crash on the first iteration?
  2. Could you try CPU sampling?
  3. Try a small fanout with a single layer.

Thank you for your patient responses. I have now resolved the issue, and the link prediction and node classification code for heterogeneous graphs runs correctly. The earlier bug may have been caused by inconsistent device placement.
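Since the fix came down to device placement, a quick check before iterating can catch such mismatches early. The helper below is a minimal, hypothetical sketch (find_device_mismatches is not a DGL API): it compares the .device of whatever objects you pass against the device you expect, for example the tensor graph.csc_indptr that appears in the traceback further down. It only compares device types (cuda vs. cpu), not GPU indices.

```python
def device_type(device):
    """Return the device type, e.g. 'cuda' for 'cuda:0' and 'cpu' for 'cpu'."""
    return str(device).split(":")[0]


def find_device_mismatches(named_objects, expected_device):
    """List the names whose device type differs from expected_device.

    Objects without a .device attribute (for example NumPy arrays) are
    treated as living on the CPU.
    """
    expected = device_type(expected_device)
    mismatched = []
    for name, obj in named_objects.items():
        device = getattr(obj, "device", "cpu")
        if device_type(device) != expected:
            mismatched.append(name)
    return mismatched


# Hypothetical usage:
# find_device_mismatches({"csc_indptr": graph.csc_indptr}, device)
```

An empty list means everything passed in sits on the expected device type.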

Ying-1106 commented on August 15, 2024

This is the code that generates the dataset:

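import os

import dgl
import dgl.graphbolt as gb
import numpy as np
import torch

# Assumed device choice; the original snippet uses `device` without defining it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# now_dir is defined elsewhere in the original script.
base_dir = os.path.join(now_dir, 'HGBl_base_dir')

# Construct the OnDiskDataset from an existing DGLGraph.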

graph_file_path = '/data/zzh/TEST_DIR/HGBl_dir/HGBl-amazon_DGLGraph.bin'
HGBl_Graph = dgl.load_graphs(filename=graph_file_path)[0][0]

feature = HGBl_Graph.ndata['h']
product_feat_np = feature.numpy()
product_feat_file = os.path.join(base_dir,'product_feat_file.npy')
np.save(file=product_feat_file,arr=product_feat_np)

src,dst = HGBl_Graph.edges(etype=('product','product-product-0','product') )
src = src.numpy()
dst = dst.numpy()
P0P_npy = np.stack((src, dst))
P0P_npy_file = os.path.join(base_dir,'P0P.npy')
np.save(file=P0P_npy_file,arr=P0P_npy)

src,dst = HGBl_Graph.edges(etype=('product','product-product-1','product') )
src = src.numpy()
dst = dst.numpy()
P1P_npy = np.stack((src, dst))
P1P_npy_file = os.path.join(base_dir,'P1P.npy')
np.save(file=P1P_npy_file,arr=P1P_npy)

# The train/validation/test edge files were saved locally beforehand. Each split
# contains (source, destination) node-ID pairs for both edge types, P-0-P and P-1-P.

# Train set
train_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/train_set_P0P.npy"
train_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/train_set_P1P.npy"

# Validation set
val_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/val_set_P0P.npy"
val_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/val_set_P1P.npy"

# Test set
test_set_POP_path = "/data/zzh/TEST_DIR/HGBl_base_dir/test_set_P0P.npy"
test_set_P1P_path = "/data/zzh/TEST_DIR/HGBl_base_dir/test_set_P1P.npy"

yaml_content = f"""
dataset_name: HGBl_amazon_GB
graph:
  nodes:
    - type: product
      num: 10099
  edges:
    - type: "product:product-product-0:product"
      format: numpy
      path: {os.path.basename(P0P_npy_file)}
    - type: "product:product-product-1:product"
      format: numpy
      path: {os.path.basename(P1P_npy_file)}
feature_data:
  - domain: node
    type: product
    name: feat
    format: numpy
    in_memory: false
    path: {os.path.basename(product_feat_file)}
tasks:
  - name: link_prediction
    num_classes: 100
    train_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_P1P_path)}
    validation_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(val_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(val_set_P1P_path)}
    test_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(test_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(test_set_P1P_path)}
"""

metadata_path = os.path.join(base_dir, "metadata.yaml")
with open(metadata_path, "w") as f:
    f.write(yaml_content)

dataset = gb.OnDiskDataset(base_dir).load()
graph = dataset.graph.to(device)
feature = dataset.feature.to(device)
tasks = dataset.tasks
link_pred_task = tasks[0]

datapipe = gb.ItemSampler(link_pred_task.train_set, batch_size=16, shuffle=True)
datapipe = datapipe.copy_to(device)
# One uniformly sampled negative pair per positive seed pair.
datapipe = datapipe.sample_uniform_negative(graph, 1)
# -1 means "take all neighbors"; one entry per layer, so three layers here.
datapipe = datapipe.sample_neighbor(graph, [-1, -1, -1])
datapipe = datapipe.fetch_feature(
    feature,
    node_feature_keys={"product": ["feat"]},
)

dataloader = gb.DataLoader(datapipe, num_workers=0)
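The graph edges in this script are written with np.stack((src, dst)), i.e. with shape (2, num_edges). An illegal memory access during sampling can be caused by out-of-range node IDs, so it may be worth validating each edge array before building the dataset. A minimal sketch, assuming that (2, num_edges) layout and the 10099 product nodes declared in the metadata; check_edge_array is a hypothetical helper, not part of DGL.

```python
import numpy as np


def check_edge_array(edges, num_src_nodes, num_dst_nodes=None):
    """Validate a (2, num_edges) edge array and return the number of edges."""
    if num_dst_nodes is None:  # same node type on both ends, e.g. product -> product
        num_dst_nodes = num_src_nodes
    edges = np.asarray(edges)
    if edges.ndim != 2 or edges.shape[0] != 2:
        raise ValueError(f"expected shape (2, num_edges), got {edges.shape}")
    if not np.issubdtype(edges.dtype, np.integer):
        raise ValueError(f"expected integer node IDs, got {edges.dtype}")
    src, dst = edges
    if edges.shape[1] > 0:
        if src.min() < 0 or src.max() >= num_src_nodes:
            raise ValueError("source node ID out of range")
        if dst.min() < 0 or dst.max() >= num_dst_nodes:
            raise ValueError("destination node ID out of range")
    return edges.shape[1]


# Hypothetical usage: check_edge_array(np.load(P0P_npy_file), 10099)
```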

Skeleton003 commented on August 15, 2024

Hello @Ying-1106, it'd be helpful if you can provide the error message. And you can try print(train_set) to examine the training set and check if data is correct.

Rhett-Ying commented on August 15, 2024

And please share which DGL version you're using.
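A small, hypothetical helper for collecting the requested versions, using only the standard library's importlib.metadata. It assumes the packages are installed under the distribution names dgl, torch and numpy; CUDA builds of DGL usually carry the CUDA tag in the version string, as in the 2.2.1+cu118 reported in this thread.

```python
from importlib import metadata


def package_versions(names=("dgl", "torch", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(package_versions())
```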

Ying-1106 commented on August 15, 2024

And please share which DGL version you're using.

My DGL version is 2.2.1+cu118.

Ying-1106 commented on August 15, 2024

Hello @Ying-1106, it'd be helpful if you can provide the error message. And you can try print(train_set) to examine the training set and check if data is correct.

When I print train_set:

print(link_pred_task.train_set)
ItemSetDict(
    itemsets={'product:product-product-0:product': ItemSet(
                  items=(tensor([[ 552, 7161],
                                 [8166, 9154],
                                 [2310, 2945],
                                 ...,
                                 [1367, 4038],
                                 [ 728, 7947],
                                 [5994, 5039]], dtype=torch.int32),),
                  names=('seeds',),
              ), 'product:product-product-1:product': ItemSet(
                  items=(tensor([[ 454, 8906],
                                 [7462, 9232],
                                 [8126,  359],
                                 ...,
                                 [4892,  731],
                                 [6761, 3064],
                                 [8407, 9684]], dtype=torch.int32),),
                  names=('seeds',),
              )},
    names=('seeds',),
)

The error:

Whenever I step over the line "for step, data in enumerate(dataloader):", the program terminates abruptly and the terminal prints one of three errors: "free(): invalid size", "munmap_chunk(): invalid pointer", or "double free or corruption (out)".
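Errors such as "free(): invalid size" come from the C allocator and abort the process without a Python traceback. Python's standard faulthandler module can still dump the Python stack when the process receives a fatal signal such as SIGABRT or SIGSEGV, which would show which pipeline stage was running at the time. A minimal sketch (enable_crash_tracebacks is a hypothetical name); running the script with python -X faulthandler has the same effect.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump every thread's Python stack if the process dies on a fatal signal.

    Call once at the top of the script, before building the dataloader.
    The stream must be a real file with a file descriptor.
    """
    if stream is None:
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


enable_crash_tracebacks()
```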

Ying-1106 commented on August 15, 2024

Hello @Ying-1106, it'd be helpful if you can provide the error message. And you can try print(train_set) to examine the training set and check if data is correct.

Here is the error message:

RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of Bufferer(datapipe=FeatureFetcher)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 125, in __iter__
    yield self._apply_fn(data)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 90, in _apply_fn
    return self.fn(data)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/minibatch_transformer.py", line 38, in _transformer
    minibatch = self.transformer(minibatch)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/subgraph_sampler.py", line 65, in _preprocess
    ) = SubgraphSampler._seeds_preprocess(minibatch)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/subgraph_sampler.py", line 166, in _seeds_preprocess
    unique_seeds, compacted = unique_and_compact(nodes)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/internal/sample_utils.py", line 56, in unique_and_compact
    unique[ntype], compacted[ntype] = unique_and_compact_per_type(
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/internal/sample_utils.py", line 47, in unique_and_compact_per_type
    unique, compacted, _ = torch.ops.graphbolt.unique_and_compact(
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_ops.py", line 854, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of MiniBatchTransformer(datapipe=UniformNegativeSampler, transformer=_preprocess)

During handling of the above exception, another exception occurred:

  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
    full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
    result.append((name, _simplify_obj_name(value)))
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
    return repr(obj)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
    csc_indptr_str = str(self.csc_indptr)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
    return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)

During handling of the above exception, another exception occurred:

  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
    full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
    result.append((name, _simplify_obj_name(value)))
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
    return repr(obj)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
    csc_indptr_str = str(self.csc_indptr)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
    return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)

During handling of the above exception, another exception occurred:

  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/dataloader.py", line 68, in __iter__
    yield from self.dataloader
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
    data = next(self.dataset_iter)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 152, in __next__
    return self._get_next()
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 140, in _get_next
    result = next(self.iterator)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 224, in wrap_next
    result = next_func(*args, **kwargs)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
    return next(self._datapipe_iter)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
    full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
    result.append((name, _simplify_obj_name(value)))
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
    return repr(obj)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 39, in __repr__
    csc_indptr_str = str(self.csc_indptr)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
    return torch.cat(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of CompactPerLayer(datapipe=SamplePerLayer, deduplicate=True)

During handling of the above exception, another exception occurred:

  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 375, in get_summarized_data
    return torch.cat(
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 385, in <listcomp>
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 385, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 617, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor_str.py", line 697, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/torch_based_feature_store.py", line 225, in __repr__
    str(self._tensor), " " * len(" feature=")
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/impl/torch_based_feature_store.py", line 432, in __repr__
    features_str = textwrap.indent(str(self._features), " ").strip()
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 27, in _simplify_obj_name
    return repr(obj)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 43, in _generate_input_args_string
    result.append((name, _simplify_obj_name(value)))
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 203, in wrap_generator
    full_msg = f"{msg} {datapipe.__class__.__name__}({_generate_input_args_string(datapipe)})"
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 306, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 325, in __iter__
    for data in self.datapipe:
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/dgl/graphbolt/base.py", line 280, in __iter__
    yield from self.datapipe
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 181, in wrap_generator
    response = gen.send(None)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
    return next(self._datapipe_iter)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 224, in wrap_next
    result = next_func(*args, **kwargs)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 140, in _get_next
    result = next(self.iterator)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 152, in __next__
    return self._get_next()
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
    data = next(self.dataset_iter)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/data/zzh/TEST_DIR/GraphBolt_异质图(链接预测有BUG).py", line 696, in get_HGBl_amazon_GB
    for step, data in enumerate(dataloader):
  File "/data/zzh/TEST_DIR/GraphBolt_异质图(链接预测有BUG).py", line 750, in <module>
    get_HGBl_amazon_GB()
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/zzh/anaconda3/envs/YING/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This exception is thrown by __iter__ of Bufferer(datapipe=FeatureFetcher)
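The message itself suggests CUDA_LAUNCH_BLOCKING=1, which makes CUDA kernel launches synchronous so the reported stack is more likely to point at the kernel that actually failed, rather than at a later call such as the tensor repr calls seen above. The simplest route is the shell (CUDA_LAUNCH_BLOCKING=1 python script.py). From inside Python the variable has to be set before PyTorch initializes CUDA, so this minimal sketch sets it before any torch or dgl import. TORCH_USE_CUDA_DSA, by contrast, is a build-time option and only helps with a PyTorch build compiled with it.

```python
import os

# Must take effect before torch initializes CUDA; setting it at the very top of
# the script, before importing torch or dgl, is the safest choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch
# import dgl.graphbolt as gb
```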

Rhett-Ying commented on August 15, 2024

How do you generate the train_set? Are the node IDs in each seed assigned per type (type-wise)?
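For context on this question: in a heterogeneous graph, a node can be identified either by a homogeneous (global) ID or by an ID local to its node type, and, as far as I understand GraphBolt, seeds listed under a typed key are expected to use per-type IDs. This dataset has only the product node type, so the two coincide here. With several node types, a conversion like the following hypothetical sketch would be needed; it assumes global IDs are assigned in contiguous blocks per node type, described by type_offsets.

```python
import numpy as np


def homogeneous_to_per_type(global_ids, type_offsets, type_names):
    """Split homogeneous node IDs into (node type, per-type ID).

    Assumes type_names[i] owns global IDs in
    [type_offsets[i], type_offsets[i + 1]); the last offset is the total
    number of nodes.
    """
    global_ids = np.asarray(global_ids)
    offsets = np.asarray(type_offsets)
    if global_ids.size and (global_ids.min() < 0 or global_ids.max() >= offsets[-1]):
        raise ValueError("global node ID out of range")
    # side="right" assigns an ID equal to an offset to the block it starts.
    type_idx = np.searchsorted(offsets, global_ids, side="right") - 1
    local_ids = global_ids - offsets[type_idx]
    return [type_names[i] for i in type_idx], local_ids
```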

Ying-1106 commented on August 15, 2024

How do you generate the train_set? Are the node IDs in each seed assigned per type (type-wise)?

I generate train_set from two NumPy files, one for edge type P0P and one for edge type P1P, as shown below:

tasks:
  - name: link_prediction
    num_classes: 2
    train_set:
      - type: "product:product-product-0:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_POP_path)}
      - type: "product:product-product-1:product"
        data:
          - name: seeds
            format: numpy
            path: {os.path.basename(train_set_P1P_path)}
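Because YAML is indentation-sensitive, hand-writing it inside an f-string is easy to get wrong. One alternative, sketched here under the assumption that the metadata layout is the one used in this thread, is to build it as a Python dict and serialize it with PyYAML's yaml.safe_dump. build_metadata is a hypothetical helper; extra task keys such as num_classes can be added to the task dict if needed.

```python
import os


def build_metadata(dataset_name, num_products, edge_files, feat_file, split_files):
    """Return the metadata.yaml content used in this thread as a plain dict.

    edge_files:  {edge_type: path}
    split_files: {"train_set" | "validation_set" | "test_set": {edge_type: path}}
    Paths are stored relative to the dataset directory (basename only).
    """
    def split_entries(files):
        return [
            {"type": etype,
             "data": [{"name": "seeds", "format": "numpy",
                       "path": os.path.basename(path)}]}
            for etype, path in files.items()
        ]

    task = {"name": "link_prediction"}
    for split_name, files in split_files.items():
        task[split_name] = split_entries(files)

    return {
        "dataset_name": dataset_name,
        "graph": {
            "nodes": [{"type": "product", "num": num_products}],
            "edges": [
                {"type": etype, "format": "numpy", "path": os.path.basename(path)}
                for etype, path in edge_files.items()
            ],
        },
        "feature_data": [{
            "domain": "node", "type": "product", "name": "feat",
            "format": "numpy", "in_memory": False,
            "path": os.path.basename(feat_file),
        }],
        "tasks": [task],
    }


# Writing it out, assuming PyYAML is installed:
# import yaml
# with open(os.path.join(base_dir, "metadata.yaml"), "w") as f:
#     yaml.safe_dump(metadata, f, sort_keys=False)
```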

These are the NumPy arrays in train_set:

train_set_POP = np.load(train_set_POP_path)
train_set_P1P = np.load(train_set_P1P_path)

train_set_POP:
array([[ 552, 7161],
       [8166, 9154],
       [2310, 2945],
       ...,
       [1367, 4038],
       [ 728, 7947],
       [5994, 5039]])

train_set_P1P:
array([[ 454, 8906],
       [7462, 9232],
       [8126,  359],
       ...,
       [4892,  731],
       [6761, 3064],
       [8407, 9684]])
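Both arrays hold one (src, dst) pair per row, which matches the (num_pairs, 2) seeds tensor in the printed ItemSet. Since the only node type has 10099 nodes, every ID should fall in [0, 10099). A minimal sketch of that check (check_seed_pairs is a hypothetical helper):

```python
import numpy as np


def check_seed_pairs(seeds, num_nodes):
    """Validate link-prediction seeds of shape (num_pairs, 2); return num_pairs."""
    seeds = np.asarray(seeds)
    if seeds.ndim != 2 or seeds.shape[1] != 2:
        raise ValueError(f"expected shape (num_pairs, 2), got {seeds.shape}")
    if seeds.size and (seeds.min() < 0 or seeds.max() >= num_nodes):
        bad = np.unique(seeds[(seeds < 0) | (seeds >= num_nodes)])
        raise ValueError(f"node IDs out of range [0, {num_nodes}): {bad[:10]}")
    return seeds.shape[0]


# Hypothetical usage:
# check_seed_pairs(train_set_POP, 10099)
# check_seed_pairs(train_set_P1P, 10099)
```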

Rhett-Ying commented on August 15, 2024

In order to dive deep into the root cause, I recommend narrowing down the case with the following suggestions:

  1. Does it crash on the first iteration?
  2. Could you try CPU sampling?
  3. Try a small fanout with a single layer.
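Suggestion 1 can be checked without running a whole epoch by pulling a single minibatch. The helper below is a hypothetical sketch that works with any iterable, including gb.DataLoader. It can only report Python exceptions such as the RuntimeError above; a native abort like "free(): invalid size" still kills the process. For suggestions 2 and 3, the same probe can be repeated after, for example, keeping the graph and features on the CPU with copy_to(device) moved to the end of the pipeline, and replacing [-1, -1, -1] with a single small fanout such as [5] in sample_neighbor.

```python
def probe_first_batch(dataloader):
    """Fetch only the first minibatch.

    Returns ("ok", batch), ("empty", None) or ("failed", exception).
    """
    iterator = iter(dataloader)
    try:
        batch = next(iterator)
    except StopIteration:
        return "empty", None
    except Exception as exc:  # e.g. RuntimeError: CUDA error: illegal memory access
        return "failed", exc
    return "ok", batch


# Hypothetical usage: status, result = probe_first_batch(dataloader)
```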

github-actions commented on August 15, 2024

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
