qihoo360 / dgl-operator
The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes
License: Apache License 2.0
Deploying examples/v1alpha1/GraphSAGE_dist.yaml fails with:
Phase 3/5: dispatch partitions
----------
Traceback (most recent call last):
  File "tools/dispatch.py", line 102, in <module>
    main()
  File "tools/dispatch.py", line 44, in main
    with open(args.ip_config) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/dgl/hostfile'
----------
Phase 3/5 error raised
2021-08-14T09:45:48.716Z INFO controllers.DGLJob Finished reconciling job {"dgljob": "dgl-operator/dgl-graphsage", "dgl-operator/dgl-graphsage": "80.81µs"}
2021-08-14T09:45:48.722Z ERROR controllers.DGLJob unable to fetch DGLJob {"dgljob": "dgl-operator/dgl-graphsage", "error": "DGLJob.qihoo.net \"dgl-graphsage\" not found"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
github.com/Qihoo360/dgl-operator/controllers.(*DGLJobReconciler).Reconcile
/workspace/controllers/dgljob_controller.go:115
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:297
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:252
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:215
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
Phase 3/5 error raised
Phase 1/5 error raised
Phase 2/5 error raised
I changed partitionMode to ParMETIS, but it seems to have no effect.
Phase 2/5 error raised
Sometimes a different error occurs:
Launch arguments: Namespace(cmd_type='copy_batch_container', container='watcher-loop-partitioner', ip_config='/etc/dgl/leadfile', num_parts=None, num_samplers=0, num_server_threads=1, num_servers=None, num_trainers=None, part_config=None, source_file_paths='/dgl_workspace/dataset', target_dir='/dgl_workspace', worker_chief_index=0, workspace='/dgl_workspace'), []
30050 dgl-graphsage-launcher
['30050', 'dgl-graphsage-launcher']
Traceback (most recent call last):
  File "tools/launch.py", line 280, in <module>
    main()
  File "tools/launch.py", line 252, in main
    run_cp_container(args)
  File "tools/launch.py", line 100, in run_cp_container
    for pod_info in get_ip_host_pairs(args.ip_config):
  File "tools/launch.py", line 64, in get_ip_host_pairs
    raise RuntimeError("Format error of ip_config.")
RuntimeError: Format error of ip_config.
The entries in /etc/dgl/leadfile may be missing the IP address.
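To confirm this before the launcher runs, the file can be checked line by line. The sketch below is only an illustration: `check_ip_config_lines` is a hypothetical helper, not part of dgl-operator, and the three-field layout (IP, port, pod name) is an assumption inferred from the log above, where the parsed line `['30050', 'dgl-graphsage-launcher']` has only a port and a pod name.

```python
import ipaddress


def check_ip_config_lines(lines, expected_fields=3):
    """Return (line_number, text, reason) for every line that looks malformed.

    Assumption: each entry is "<ip> <port> <pod name>". The real format used
    by tools/launch.py may differ, so adjust expected_fields as needed.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        fields = text.split()
        try:
            # Raises ValueError when the first field is not an IPv4/IPv6 address
            ipaddress.ip_address(fields[0])
        except ValueError:
            problems.append((number, text, "first field is not an IP address"))
            continue
        if len(fields) != expected_fields:
            reason = f"expected {expected_fields} fields, got {len(fields)}"
            problems.append((number, text, reason))
    return problems


if __name__ == "__main__":
    # The line taken from the log output above, where the IP appears to be missing
    sample = ["30050 dgl-graphsage-launcher"]
    for number, text, reason in check_ip_config_lines(sample):
        print(f"line {number}: {reason}: {text!r}")
```

Running the same check against the contents of /etc/dgl/leadfile inside the launcher pod would show whether the IP column is empty at the time the copy step starts.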