Comments (3)
I don't believe this has anything to do with resource requirements; this behavior was there before. If the global scheduler has no local schedulers connected, it can't assign the task to anyone. One option is to set the task's state back to WAITING. Otherwise we will have to start queueing tasks in the global scheduler and give up the current bufferless design...
from ray.
Good point. This was already happening whenever the global scheduler received a task before it had registered any local schedulers. Now it fails whenever the global scheduler receives a task before it has registered any local schedulers or before it has received any heartbeats.
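
To make the two options discussed above concrete, here is a minimal toy sketch in Python. It is not Ray's scheduler code: `GlobalScheduler`, `Task`, `TaskState`, `register`, `on_heartbeat`, and the `buffer` flag are all hypothetical names invented for illustration. It assumes a local scheduler only becomes eligible once it has both registered and sent a heartbeat, and it shows what happens to a task that arrives before then.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto


class TaskState(Enum):
    WAITING = auto()
    SCHEDULED = auto()


@dataclass
class Task:
    task_id: str
    state: TaskState = TaskState.WAITING


@dataclass
class GlobalScheduler:
    """Toy model only; all names here are hypothetical, not Ray's API."""
    registered: set = field(default_factory=set)   # local schedulers that registered
    heartbeats: set = field(default_factory=set)   # those that also sent a heartbeat
    pending: deque = field(default_factory=deque)  # used only by the buffered option

    def register(self, scheduler_id):
        self.registered.add(scheduler_id)

    def ready_schedulers(self):
        # Eligible only if registered AND at least one heartbeat was received.
        return sorted(self.registered & self.heartbeats)

    def submit(self, task, buffer=False):
        ready = self.ready_schedulers()
        if ready:
            task.state = TaskState.SCHEDULED
            return ready[0]
        # Nobody to assign to: put the task back in WAITING instead of failing.
        task.state = TaskState.WAITING
        if buffer:
            # Alternative: hold the task here, giving up the bufferless design.
            self.pending.append(task)
        return None

    def on_heartbeat(self, scheduler_id):
        if scheduler_id not in self.registered:
            return []
        self.heartbeats.add(scheduler_id)
        assigned = []
        # Drain any buffered tasks now that a local scheduler is ready.
        while self.pending:
            task = self.pending.popleft()
            task.state = TaskState.SCHEDULED
            assigned.append((task.task_id, scheduler_id))
        return assigned
```

With `buffer=False` the global scheduler stays bufferless and something else has to resubmit the WAITING task later. With `buffer=True` the global scheduler takes on that responsibility itself.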
This was fixed by #306.