
kubernetes-retired / poseidon


[EOL] A Firmament-based Kubernetes scheduler

Home Page: http://www.firmament.io

License: Apache License 2.0

Shell 30.32% Go 57.95% Makefile 6.19% Python 1.57% Dockerfile 1.36% Starlark 2.60%
k8s-sig-scheduling

poseidon's Introduction


Introduction

The Poseidon/Firmament scheduler incubation project brings the integration of the Firmament scheduler (OSDI paper) to Kubernetes. At a very high level, Poseidon/Firmament augments the current Kubernetes scheduling capabilities by adding novel flow-network-graph-based scheduling alongside the default Kubernetes scheduler. Firmament models workloads on a cluster as flow networks and runs min-cost flow optimizations over these networks to make scheduling decisions.
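For intuition, the optimization Firmament solves in each scheduling round is an instance of the standard minimum-cost flow problem. The graph construction and cost model are Firmament-specific; the generic formulation, given here only for orientation, is:

    \min_{f} \sum_{(u,v) \in E} c_{uv} \, f_{uv}
    \quad \text{s.t.} \quad 0 \le f_{uv} \le \mathrm{cap}_{uv} \;\; \forall (u,v) \in E,
    \qquad \sum_{v} f_{uv} - \sum_{v} f_{vu} = b_u \;\; \forall u \in V

where tasks act as flow sources (b_u > 0), machines and a sink node absorb the flow, arc costs c_{uv} encode placement preferences, and the arcs carrying flow in the optimal solution determine the task placements.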

Thanks to its inherent rescheduling capabilities, the new scheduler enables globally optimal scheduling for a given policy and keeps refining the dynamic placement of the workload.

As part of Kubernetes' support for multiple schedulers, each new pod is typically scheduled by the default scheduler. However, Kubernetes can be instructed to use another scheduler by specifying the name of a custom scheduler at pod deployment time (in our case, by setting 'schedulerName' to poseidon in the pod template). In that case, the default scheduler ignores the pod and lets the Poseidon scheduler place it on a suitable node. A minimal example is shown below.
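To illustrate, a minimal pod spec that opts into Poseidon could look like the following (a sketch only; the pod name, image, and resource requests are placeholders, and it assumes the scheduler is deployed under the name 'poseidon'):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-via-poseidon      # hypothetical example pod
spec:
  schedulerName: poseidon       # hand this pod to Poseidon; the default scheduler will ignore it
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 100m
        memory: 128Mi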

Key Advantages

  • Flow graph scheduling provides the following:

    • Support for high-volume workload placement.
    • Complex rule constraints.
    • Globally optimal scheduling for a given policy.
    • Extremely high scalability.

    NOTE: It is also important to highlight that Firmament scales much better than the default scheduler as the number of nodes in a cluster increases.

Current Project Stage

Alpha Release

Design

Poseidon/Firmament Integration architecture

For more details about the design of this project see the design document.

Installation

For in-cluster installation of Poseidon, please start here.

Development

For development instructions, please refer here.

Release Process

For details of the coordinated release process between the Firmament and Poseidon repos, refer here.

Latest Benchmarking Results

Please refer to this link for detailed throughput performance comparison results between the Poseidon/Firmament scheduler and the Kubernetes default scheduler.

Roadmap

  • Release 0.9 onwards:
    • Provide High Availability/Failover for in-memory Firmament/Poseidon processes.
    • Scheduling support for “Dynamic Persistence Volume Provisioning”.
    • Optimizations to reduce the number of arcs by limiting the number of eligible nodes in a cluster.
    • CPU/Memory combination optimizations.
    • Transitioning to the Metrics Server API – upstreaming our new Heapster sink is no longer an option, as Heapster is being deprecated.
    • Continuous running scheduling loop versus scheduling intervals mechanism.
    • Priority preemption support.
    • Priority based scheduling.
  • Release 0.8 – Target Date 15th February, 2019:
    • Pod Affinity/Anti-Affinity optimization in 'Firmament' code.
  • Release 0.7 – Target Date 19th November, 2018:
    • Support for Max. Pods per Node.
    • Co-Existence with Default Scheduler.
    • Node Prefer/Avoid pods priority function.
  • Release 0.6 – Target Date 12th November, 2018:
    • Gang Scheduling.
  • Release 0.5 – Released on 25th October 2018:
    • Support for Ephemeral Storage, in addition to CPU/Memory.
    • Implementation for Success/Failure of scheduling events.
    • Scheduling support for “Pre-bound Persistence Volume Provisioning”.
  • Release 0.4 – Released on 18th August, 2018:
    • Taints & Tolerations.
    • Support for Pod anti-affinity symmetry.
    • Throughput Performance Optimizations.
  • Release 0.3 – Released on 21st June, 2018:
    • Pod level Affinity and Anti-Affinity implementation using multi-round scheduling based affinity and anti-affinity.
  • Release 0.2 – Released on 27th May, 2018:
    • Node level Affinity and Anti-Affinity implementation.
  • Release 0.1 – Released on 3rd May, 2018:
    • Baseline Poseidon/Firmament scheduling capabilities using the new multi-dimensional CPU/Memory cost model are part of this release. This does not yet include node- and pod-level affinity/anti-affinity capabilities; those are being built out in the releases listed above.
    • All test-infra bot automation jobs are in place as part of this release.

poseidon's People

Contributors

agilebot1, ant-caichu, asifdxtreme, deepak-vij, eissana, hanxiaoshuai, icgog, islinwb, jiaxuanzhou, k8s-ci-robot, karunchennuri, kevin-wangzefeng, ms705, nikita15p, shivramsrivastava, spiffxp, stewart-yu, timothysc, wgliang


poseidon's Issues

dev list about e2e

There is one piece of work about e2e that needs to be done:

  1. As PR #97 mentioned, we should generate the $HOME/.kube/config content rather than copy it.
    ....

Local Cluster E2E test error

When I use test/e2e-poseidon-local.sh to run the local cluster E2E test, it returns a panic. I want to know what should be defined in the kubeconfig file /root/.kube/config?

May 16 20:29:12.000: INFO: Location of the kubeconfig file /root/.kube/config
=== RUN   TestPoseidon
Running Suite: Poseidon Suite
=============================
Random Seed: 1526473751
Will run 7 of 7 specs

Panic [0.001 seconds]
[BeforeSuite] BeforeSuite
/home/gopath/src/github.com/kubernetes-sigs/poseidon/test/e2e/framework/framework.go:87

  Test Panicked
  invalid configuration: no configuration has been provided
  /home/go/src/runtime/panic.go:502

  Full Stack Trace
        /home/go/src/runtime/panic.go:502 +0x229
  github.com/kubernetes-sigs/poseidon/test/e2e/framework.(*Framework).BeforeEach(0xc4202b84b0)
        /home/gopath/src/github.com/kubernetes-sigs/poseidon/test/e2e/framework/framework.go:100 +0x509
  github.com/kubernetes-sigs/poseidon/test/e2e/framework.(*Framework).BeforeEach-fm()
        /home/gopath/src/github.com/kubernetes-sigs/poseidon/test/e2e/framework/framework.go:87 +0x2a
  github.com/kubernetes-sigs/poseidon/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc4200935c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/gopath/src/github.com/kubernetes-sigs/poseidon/test/e2e/poseidon_suite_test.go:28 +0x64
  testing.tRunner(0xc4204d40f0, 0x1103040)
        /home/go/src/testing/testing.go:777 +0xd0
  created by testing.(*T).Run
        /home/go/src/testing/testing.go:824 +0x2e0

k8s_api_client.cc: Exception while waiting for pod list: not an object

Hello, I pulled your camsas/poseidon:dev docker image and wanted to give it a try. I used the following command:

$ docker run camsas/poseidon:dev /usr/bin/poseidon \
    --logtostderr \
    --k8s_apiserver_host=192.168.3.89 \
    --k8s_apiserver_port=8080 \
    --cs2_binary=/usr/bin/cs2.exe \
    --max_tasks_per_pu=50

But then an exception occurred; here are the logs:

I0521 05:59:45.096031     1 scheduler_bridge.cc:44] Firmament scheduler instantiated: 0x1b56650
I0521 05:59:45.096866     1 k8s_api_client.cc:63] Starting K8sApiClient for API server at http://192.168.3.89:8080/
I0521 05:59:45.187544     1 scheduler_bridge.cc:86] Adding new node's resource with RID f4673222-7577-ed4f-a0ec-d5a5c1da1c5d
E0521 05:59:45.262907     1 k8s_api_client.cc:313] Exception while waiting for pod list: not an object
...

And here are my pods in my k8s cluster:

$ kubectl get pod 
NAME                      READY     STATUS    RESTARTS   AGE
frontend-0d71m            1/1       Running   0          12d
frontend-75xjr            1/1       Running   0          12d
frontend-b54c8            1/1       Running   0          12d
redis-master-3wtcq        1/1       Running   0          12d
redis-slave-3fctv         1/1       Running   0          12d
redis-slave-gm9kk         1/1       Running   0          12d

I looked into Poseidon's source code, and it seems that the APIClient couldn't get the cluster's pods:
Poll pods
Also, my OS is CentOS 7.2, not the suggested Ubuntu 14.04... Does this matter?
What should I do to solve this problem?
Thanks~

poseidon should cache the node of the cluster

1. Cache the node objects to generate a resource view of the nodes in the cluster.
2. Optional but useful: provide an API to get the overall resource usage of the cluster.

Create the official release for poseidon

With the rapid development of the project, it is necessary for us to periodically publish Poseidon releases. So I propose:

  • release cycle: monthly?

  • release content:

    • zip and tar packages of source code.
    • a docker image link, e.g. gcr.io/google_containers/poseidon-{ARCH}:{Version}. Supported ARCHs are amd64, arm, arm64, ppc64le, s390x, and the release version is in the format x.y.z.
    • a release tar package which consists of poseidon deployment manifests for kubernetes and a saved docker image file.
    • a brief description of notable changes.

@ms705 @ICGog @deepak-vij @shashidharatd @shivramsrivastava @dhilipkumars

Firmament scheduler throws error when run with 1.7 kubernetes

I am facing a strange issue when using the firmament container.

Observations:
Kubernetes version : 1.7

Firmament created using:

docker run  --net=host  camsas/firmament:dev /firmament/build/src/firmament_scheduler --flagfile=/firmament/default.conf 

Poseidon is able to connect to Firmament and Kubernetes.

Poseidon created using:

./poseidon --logtostderr --kubeConfig=<kubeconfig.cfg> --firmamentAddress=localhost:9090.

In standard out, I can see that both nodes and pods are being successfully watched and that tasks are being submitted to the Firmament scheduler:

I0903 22:37:33.695642   20401 nodewatcher.go:132] enqueueNodeAdition: Added node sample-node-2gjww

I0903 22:37:33.700437   20401 podwatcher.go:176] enqueuePodAddition: Added pod {podname4 test}
I0903 22:37:33.701334   20401 podwatcher.go:176] enqueuePodAddition: Added pod {podname5 test}

In firmament stdout, I see this error:

I0904 02:37:42.510321    13 utils.cc:341] External execution of command: build/third_party/flowlessly/src/flowlessly-build/flow_scheduler --graph_has_node_types=true --algorithm=successive_shortest_path --print_assignments=true --daemon=false caling_factor= 
W0904 02:37:42.510821    15 solver_dispatcher.cc:143] STDERR from solver: E0904 02:37:42.510514    14 utils.cc:381] execvp failed for task command 'build/third_party/flowlessly/src/flowlessly-build/flow_scheduler --graph_has_node_types=true --algorithm=successive_shortest_path --print_assignments=true --daemon=false caling_factor= ': No such file or directory [2]

Result:
Pods are not being scheduled at all.

Just wanted to know if I missed anything.

Road map for E2E test in Poseidon

I have created a Google doc here for discussing the road map for the Poseidon E2E tests.
Please comment on the Google doc or in this issue directly.
If you feel it needs more additions, please add those sections to the Google doc.
This list is not exhaustive, but it covers the things I feel we need to add as part of the E2E tests.
We can refine it and come up with a final task list.

@deepak-vij @shashidharatd @m1093782566

Support for Gang scheduling

Enable support for gang scheduling within the Firmament scheduler. Some jobs cannot make progress unless all their tasks are running (for example, a synchronized iterative graph computation), while others can begin processing even as tasks are scheduled incrementally (e.g., a MapReduce job).

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Poseidon pod stuck in "Init:0/1"

I followed the guide in https://github.com/kubernetes-sigs/poseidon/blob/master/docs/install/README.md and tried it in two VMs. The poseidon pod is stuck in "Init:0/1" in one of the VMs.

docker logs <init-firmamentservice_container_ID>:

waiting for firmamentservice
nslookup: can't resolve 'firmament-service.kube-system'
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

docker exec -it <init-firmamentservice_container_ID> sh

[ root@poseidon-6d45696849-gxtpg:/ ]$ nslookup firmament-service.kube-system
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'firmament-service.kube-system'

[ root@poseidon-6d45696849-gxtpg:/ ]$ cat /etc/resolv.conf
nameserver 10.0.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local huawei.com
options ndots:5

[ root@poseidon-6d45696849-gxtpg:/ ]$ nslookup firmament-service.kube-system.svc.cluster.local
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      firmament-service.kube-system.svc.cluster.local
Address 1: 10.0.0.160 firmament-service.kube-system.svc.cluster.local

After changing firmament-service.kube-system to firmament-service.kube-system.svc.cluster.local in https://raw.githubusercontent.com/kubernetes-sigs/poseidon/master/deploy/poseidon-deployment.yaml, it works.

initContainers:
      - name: init-firmamentservice
        image: radial/busyboxplus:curl
        command: ['sh', '-c', 'until nslookup firmament-service.kube-system; do echo waiting for firmamentservice; sleep 1; done;']

Earlier I suspected it was caused by the huawei.com entry in /etc/resolv.conf, which comes from the node's /etc/resolv.conf. But even after commenting out 'search huawei.com' in the node's /etc/resolv.conf, nslookup of firmament-service.kube-system still failed.
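For reference, a sketch of the reporter's workaround applied to the deployment's init container (switching to the fully qualified service name; everything else unchanged):

initContainers:
      - name: init-firmamentservice
        image: radial/busyboxplus:curl
        # fully qualified name, so resolution does not depend on the DNS search path
        command: ['sh', '-c', 'until nslookup firmament-service.kube-system.svc.cluster.local; do echo waiting for firmamentservice; sleep 1; done;']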

The heapster-poseidon image is only available for x86 arch

I am trying to run poseidon on PowerPC machines. However, I could not find the source code of the shivramsrivastava/heapster-poseidon image, so I cannot create an image for the PowerPC arch.

In this case, could you please share the modified heapster source (heapster-poseidon), or generate the image for ppc64le arch?

What is the difference between the shivramsrivastava/heapster-poseidon and a regular heapster image?

containerized poseidon: memory corruption

@ms705 Hi there!
Thank you so much for your amazing work (again). I've tried to bring up poseidon in docker, yet it seems to suffer a memory issue after running for 2-3 minutes.
Please kindly take a look at the log if possible. Thanks so much.

root@ubuntu:~# docker logs 0f0e6ab6837e
I1205 02:11:00.477520     1 k8s_api_client.cc:47] Starting K8sApiClient for API server at http://10.10.103.67:8080/
I1205 02:11:00.481966     1 scheduler_integration.cc:134] Firmament scheduler instantiated: <FlowScheduler for coordinator >
I1205 02:11:00.526311     1 scheduler_integration.cc:145] Adding new node's resource with RID cf742c5b-2ef6-4b18-a93d-3785f5444d22
I1205 02:11:00.547468     1 scheduler_integration.cc:171] New unscheduled pod: busybox
I1205 02:11:00.548526     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:00.662428     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:00.663835     1 scheduler_integration.cc:191] Delta: task_id: 6569851536870938357
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:00.672199    40 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":409,"details":{"kind":"pods/binding","name":"busybox"},"kind":"Status","message":"Operation cannot be fulfilled on pods/binding \"busybox\": pod busybox is already assigned to node \"10.10.103.69\"","metadata":{},"reason":"Conflict","status":"Failure"}
I1205 02:11:00.673027     1 k8s_api_client.cc:201] Bound busybox to 10.10.103.67
I1205 02:11:10.695752     1 scheduler_integration.cc:171] New unscheduled pod: k8s-master-10.10.103.67
I1205 02:11:10.696748     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:10.802476     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:10.803570     1 scheduler_integration.cc:191] Delta: task_id: 8144958691656263322
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:10.812801    46 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"k8s-master-10.10.103.67"},"kind":"Status","message":"pods \"k8s-master-10.10.103.67\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:11:10.814126     1 k8s_api_client.cc:201] Bound k8s-master-10.10.103.67 to 10.10.103.67
I1205 02:11:20.839632     1 scheduler_integration.cc:171] New unscheduled pod: k8s-proxy-v1-7kvf0
I1205 02:11:20.840548     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:20.952191     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:20.952741     1 scheduler_integration.cc:191] Delta: task_id: 7995058044728560469
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:20.964922    31 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"k8s-proxy-v1-7kvf0"},"kind":"Status","message":"pods \"k8s-proxy-v1-7kvf0\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:11:20.965868     1 k8s_api_client.cc:201] Bound k8s-proxy-v1-7kvf0 to 10.10.103.67
I1205 02:11:30.994323     1 scheduler_integration.cc:171] New unscheduled pod: k8s-proxy-v1-eahvt
I1205 02:11:30.995163     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:31.108824     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:31.109328     1 scheduler_integration.cc:191] Delta: task_id: 13663010032745704128
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:31.118139    34 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"k8s-proxy-v1-eahvt"},"kind":"Status","message":"pods \"k8s-proxy-v1-eahvt\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:11:31.118768     1 k8s_api_client.cc:201] Bound k8s-proxy-v1-eahvt to 10.10.103.67
I1205 02:11:41.136282     1 scheduler_integration.cc:171] New unscheduled pod: k8s-proxy-v1-qce6o
I1205 02:11:41.136919     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:41.192404     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:41.193066     1 scheduler_integration.cc:191] Delta: task_id: 18392520999753559797
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:41.199827    34 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"k8s-proxy-v1-qce6o"},"kind":"Status","message":"pods \"k8s-proxy-v1-qce6o\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:11:41.199950     1 k8s_api_client.cc:201] Bound k8s-proxy-v1-qce6o to 10.10.103.67
I1205 02:11:51.220187     1 scheduler_integration.cc:171] New unscheduled pod: kube-addon-manager-10.10.103.67
I1205 02:11:51.220731     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:11:51.331187     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:11:51.331946     1 scheduler_integration.cc:191] Delta: task_id: 7867583654750718473
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:11:51.339476    41 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"kube-addon-manager-10.10.103.67"},"kind":"Status","message":"pods \"kube-addon-manager-10.10.103.67\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:11:51.339663     1 k8s_api_client.cc:201] Bound kube-addon-manager-10.10.103.67 to 10.10.103.67
I1205 02:12:01.368042     1 scheduler_integration.cc:171] New unscheduled pod: kube-dns-v20-o4p9l
I1205 02:12:01.369421     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
I1205 02:12:01.494426     1 scheduler_integration.cc:189] Got 1 scheduling deltas
I1205 02:12:01.496582     1 scheduler_integration.cc:191] Delta: task_id: 829239843783619468
resource_id: "cf742c5b-2ef6-4b18-a93d-3785f5444d22"
type: PLACE
I1205 02:12:01.509184    45 k8s_api_client.cc:72] Parsing binding response: {"apiVersion":"v1","code":404,"details":{"kind":"pods","name":"kube-dns-v20-o4p9l"},"kind":"Status","message":"pods \"kube-dns-v20-o4p9l\" not found","metadata":{},"reason":"NotFound","status":"Failure"}
I1205 02:12:01.509935     1 k8s_api_client.cc:201] Bound kube-dns-v20-o4p9l to 10.10.103.67
I1205 02:12:11.530864     1 scheduler_integration.cc:171] New unscheduled pod: kubernetes-dashboard-v1.4.0-cblag
I1205 02:12:11.531860     1 utils.cc:324] External execution of command: /usr/bin/cs2.exe 
*** Error in `/usr/bin/poseidon': malloc(): memory corruption: 0x00007f5a578bd010 ***

Add support for max pod number hard requirement

User story:

If node A can only run 10 pods and already has 10 pods running on it, then no new pod can be scheduled to node A until the number of pods running on it drops below 10.

NOTE: can we resolve this via the CPU cost model?
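For context, the per-node pod limit this user story refers to is normally configured on the kubelet; a minimal sketch of the corresponding KubeletConfiguration fragment (the value 10 matches the example above and is illustrative only):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# the node will not admit more than 10 pods; a scheduler must treat this as a hard constraint
maxPods: 10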

State where we are and revise the roadmap to help the project move forward better

As discussed yesterday, to help the project move forward better, we shall:

1. State where we are (with details)

  • Add a description of the project's development stage -- I'd say alpha or pre-alpha,
    since we do not yet pass all scheduling conformance tests.

  • Figure out how far behind the default scheduler we are, functionality-wise.
    The best way to do this is to run the scheduling e2e tests from the core repo (with arg --ginkgo.focus="sig-scheduling")
    and see how many tests fail.

  • Add a CI status indicator.
    Just a basic thing every project should do, to better track project health.

2. Revise roadmap and better backlog grooming

  • It's almost mid-May now, while some of the items that were supposed to be done per the initial roadmap are still not implemented.
  • There are no milestones set, nor any priority labels on issues; we should add them.

3. Do other things... that help improve the contributor experience

Comments are always welcome.

[k8s Incubation] Requesting project transfer

Hi @ms705 and @ICGog,

As discussed and agreed over email, could you please initiate the transfer to this new GitHub organization and to Tim? This will be the first step of incubating Poseidon into the Kubernetes ecosystem.

Approvers (people with commit access): @ms705 @ICGog @shivramsrivastava
Reviewers (people who can lgtm PRs): @dhilipkumars [two more to be added post-incubation]

cc: @timothysc @shivramsrivastava @deepak-vij

Hi @timothysc,

Shouldn't all of us be added to the kubernetes-sigs org so we can effectively manage the project? Could we do anything to make that happen, such as sending an email to someone with the list of members to be added?

Also, do you think it would make sense to create two GitHub groups in the kubernetes-sigs organization, such that:

  1. poseidon-approvers
  2. poseidon-reviewers

Regards,
Dhilip

PS: We could keep this issue open for discussing other logistics-related problems to bring this task to a close. WDYT?

Unfriendly import packages of github.com/kubernetes-sigs/poseidon/pkg

Just like:
https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md

Developers need to create a new k8s.io directory under $GOPATH, but usually developers will directly fork poseidon to their $GOPATH/src/github.com/username/poseidon. If there is no k8s.io directory, go build will fail. And if a developer modifies the code under github.com/kubernetes-sigs/poseidon/pkg, the changed code is not the one imported.

I suggest importing kubernetes-sigs/poseidon/pkg/k8sclient directly, or using something like Kubernetes' staging mechanism.

Cannot build poseidon

I have tried to build poseidon in several different ways and none of them has worked for me.

First, I tried "go build ." and got the following error:
can't load package: package k8s.io/poseidon: no Go files in /home/go/src/k8s.io/poseidon
I tried to google that, but couldn't fix it. Do you know what is missing?

Second, I just tried a simple "make"; then it does not find the Kubernetes packages:
can't load package: package k8s.io/kubernetes/hack/cmd/teststale: cannot find package "k8s.io/kubernetes/hack/cmd/teststale" in any of:
/home/go-src/src/k8s.io/kubernetes/hack/cmd/teststale (from $GOROOT)
/home/go/src/k8s.io/poseidon/_output/local/go/src/k8s.io/kubernetes/hack/cmd/teststale (from $GOPATH)
I found out that /home/go/src/k8s.io/poseidon/_output/local/go/src/k8s.io/kubernetes/ points to /home/go/src/k8s.io/poseidon/, not to the real kubernetes folder in the GOPATH.

Third, using Bazel: make bazel-build.
Then I get ERROR: infinite symlink expansion detected

Fourth, I manually created the symbolic link to the kubernetes folder in the GOPATH inside /home/go/src/k8s.io/poseidon/_output/local/go/src/k8s.io/.
It fixes the infinite symlink issue, but fails with:
ERROR: error loading package '_output/local/go/src/k8s.io/kubernetes/pkg/generated/openapi': Extension file not found. Unable to load package for '//pkg/generated/openapi:def.bzl': BUILD file not found on package path
ERROR: error loading package '_output/local/go/src/k8s.io/kubernetes/pkg/generated/openapi': Extension file not found. Unable to load package for '//pkg/generated/openapi:def.bzl': BUILD file not found on package path
INFO: Elapsed time: 1.077s
FAILED: Build did NOT complete successfully (418 packages loaded)
currently loading: _output/local/go/src/k8s.io/kubernetes/pkg/printers/internalversion ... (7 packages)

Could you please help me build Poseidon?

[Suggestion] add comment to more functions and critical code blocks

Recently I've been planning to do some secondary development on the Poseidon project, and I find that the project has few comments, which makes it harder to understand some complex functions or code blocks and can cause misunderstandings. Good comments make reading the source code more efficient, which could attract more community contributions. I would appreciate it if adding more complete comments could be put on the agenda.

Unexpected dense task placement for k8s cluster

Hi, I was running a containerized poseidon with camsas/poseidon:dev on the master node of a k8s cluster. Yet ever since it was properly brought up (or at least that's what I thought), it has been scheduling every pod onto the master node (10.10.103.67 in this case, as you can see in the log) while the other two worker nodes are left completely without workload. The three machines involved are homogeneous.

Note that the first busybox is scheduled by kube-scheduler.

I was wondering whether I should bring up another two poseidon instances to make things right (because it seems we should bring up a Firmament coordinator on every worker node of the cluster), or whether it's just some kind of coincidence and everything is already taken care of?

cc @ms705 Appreciate your help in advance.

root@ubuntu:~# kubectl get nodes
NAME           STATUS    AGE
10.10.103.67   Ready     4d
10.10.103.68   Ready     4d
10.10.103.69   Ready     4d
root@ubuntu:~# kubectl get pods -o wide
NAME                            READY     STATUS    RESTARTS   AGE       IP           NODE
busybox                         1/1       Running   96         4d        10.1.82.2    10.10.103.69
busybox-trident-basic           1/1       Running   25         1d        10.1.34.5    10.10.103.67
busybox-trident-basic-1         1/1       Running   0          38m       10.1.34.6    10.10.103.67
frontend-3223876880-1rl4q       1/1       Running   0          9m        10.1.34.7    10.10.103.67
frontend-3223876880-fof79       1/1       Running   0          9m        10.1.34.8    10.10.103.67
frontend-3223876880-rg3mi       1/1       Running   0          9m        10.1.34.9    10.10.103.67
redis-master-3804387969-ot2z9   1/1       Running   0          9m        10.1.34.10   10.10.103.67
redis-slave-368277221-nwkye     1/1       Running   0          9m        10.1.34.11   10.10.103.67
redis-slave-368277221-pgh1p     1/1       Running   0          9m        10.1.34.12   10.10.103.67

e2e failed in local-up-cluster

The E2E tests fail with local-up-cluster: I bring up a cluster with local-up-cluster and use test/e2e-poseidon-local.sh to run the E2E tests.

NAMESPACE       NAME                                   READY     STATUS    RESTARTS   AGE
kube-system     heapster-85994cc757-z7bcs              1/1       Running   0          24m
kube-system     kube-dns-659bc9899c-st6qz              3/3       Running   0          26m
poseidon-test   firmament-scheduler-7788b7d89b-g5hvx   1/1       Running   0          4m
poseidon-test   poseidon-5c4db977b7-555xh              1/1       Running   0          4m
poseidon-test   test-nginx-pod-2596996162              0/1       Pending   0          4m
Add Pod using Poseidon scheduler
 /var/paas/kaiyuan/src/github.com/kubernetes-sigs/poseidon/test/e2e/poseidon_integration.go:60
   using firmament for configuring pod
   /var/paas/kaiyuan/src/github.com/kubernetes-sigs/poseidon/test/e2e/poseidon_integration.go:62
     should succeed deploying pod using firmament scheduler [It]
     /var/paas/kaiyuan/src/github.com/kubernetes-sigs/poseidon/test/e2e/poseidon_integration.go:65

     Expected
         <string>: Pending
     to equal
         <string>: Running

There are some errors in kube-controller-manager.log:

I0517 14:13:33.385386   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
I0517 14:13:33.391838   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
I0517 14:13:33.403792   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
I0517 14:13:33.426011   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
I0517 14:13:33.467916   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)
I0517 14:13:33.552074   97129 endpoints_controller.go:375] Error syncing endpoints for service "poseidon-test/firmament-service", retrying. Error: Endpoints "firmament-service" is invalid: subsets[0].notReadyAddresses[0].ip: Invalid value: "127.0.0.1": may not be in the loopback range (127.0.0.0/8)

Question about flag content in Docker run command

Hello @ICGog @ms705 ,
I want to integrate Firmament with Kubernetes running locally; however, I got confused when I tried to run this command:
$ docker run camsas/poseidon:dev poseidon
--logtostderr
--kubeConfig=<path_kubeconfig_file>
--firmamentAddress=:
--statsServerAddress=:
--kubeVersion=<Major.Minor>

I don't know how to fill in path_kubeconfig_file and I don't know what the statsServerAddress means.
I will be grateful if somebody can help me!

when an e2e case finished, the pod was not deleted in time

kubectl get po --all-namespaces
NAMESPACE       NAME                                            READY     STATUS    RESTARTS   AGE
kube-system     kube-dns-659bc9899c-q2vdh                       3/3       Running   0          3h
poseidon-test   firmament-scheduler-7788b7d89b-9cwnx            1/1       Running   0          6m
poseidon-test   poseidon-86796db96-chslx                        1/1       Running   0          6m
poseidon-test   restricted-pod                                  0/1       Pending   0          28s
poseidon-test   test-nginx-deploy-4039455774-7bbd79f9c6-p2tx2   1/1       Running   0          6m
poseidon-test   test-nginx-deploy-4039455774-7bbd79f9c6-svn8x   1/1       Running   0          6m
poseidon-test   test-nginx-job-1879968118-dv7j6                 1/1       Running   0          5m
poseidon-test   test-nginx-job-1879968118-srtpq                 1/1       Running   0          5m
poseidon-test   test-nginx-rs-2854263694-czx47                  1/1       Running   0          6m
poseidon-test   test-nginx-rs-2854263694-ppfnw                  1/1       Running   0          6m
poseidon-test   test-nginx-rs-2854263694-v9rw5                  1/1       Running   0          6m
I0524 17:24:23.474194  107687 deployment_controller.go:573] Deployment poseidon-test/test-nginx-deploy-4039455774 has been deleted
I0524 17:24:23.479394  107687 deployment_controller.go:573] Deployment poseidon-test/test-nginx-deploy-4039455774 has been deleted
I0524 17:24:23.496520  107687 replica_set.go:477] Too few replicas for ReplicaSet poseidon-test/test-nginx-rs-2854263694, need 3, creating 3

Should add design document for node affinity

As we have almost finished the review work on the node selector and node affinity design for the Firmament/Poseidon scheduler, should we add the markdown document to this project? E.g., create a docs/design folder and drop the document in.

BTW, contributors should send a design document PR before starting feature development.
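For context, a standard node affinity stanza that such a design document needs to cover looks like this (a sketch; the label key/value and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity      # hypothetical example pod
spec:
  schedulerName: poseidon
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx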
