unionai-oss / deploy-flyte
A set of IaC artifacts to automatically configure the infrastructure resources needed by a Flyte deployment
License: Apache License 2.0
The base assumption for this issue is that a reference implementation should follow the least-privilege approach, both to showcase a more secure deployment out of the box and to inform users who want or need to relax security controls about the minimum set of permissions that Flyte requires.
The current GCP implementation is more permissive than necessary, specifically:
- flyte-worker and flyte-binary: the admin role.
- The legacyBucketReader role.
Previous versions of the documentation and recent experiments by Flyte users indicate that it's possible to use a set of more granular permissions for Flyte services.
The working combination that implements the least privilege approach should be used.
Observations from revisiting the scripts for AWS:
- The eks.tf module has more content than it requires.
Hey guys,
After fixing a few barriers, I've managed to get the cluster up and running, with flyte-binary running and healthy.
But I cannot get the ingress address:
My helm config:
configuration:
  inlineSecretRef: flyte-binary-inline-config-secret
  database:
    username: flyteadmin
    host: 'example'
    dbname: flyteadmin
  storage:
    metadataContainer: flyte-staging-data
    userDataContainer: flyte-staging-data
    provider: s3
    providerConfig:
      s3:
        region: 'eu-central-1'
        authType: 'iam'
  logging:
    level: 5
    plugins:
      cloudwatch:
        enabled: true
        templateUri: |-
          https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/eks/opta-development/cluster;stream=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log
  auth:
    enabled: true
    oidc:
      baseUrl: https://accounts.google.com
      clientId: itsmysecret
      clientSecret: mysecretasswell
    internal:
      clientSecret: example
      clientSecretHash: has example
    authorizedUris:
      - https://flyte.dev.example
  inline:
    cluster_resources:
      customData:
        - production:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
        - staging:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
        - development:
            - defaultIamRole:
                value: arn:aws:iam::example:role/flyte-staging-flyte-worker
    flyteadmin:
      roleNameKey: 'iam.amazonaws.com/role'
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - AWS_METADATA_SERVICE_TIMEOUT: 5
          - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
    storage:
      cache:
        max_size_mbs: 10
        target_gc_percent: 100
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - K8S-ARRAY
        default-for-task-types:
          - container: container
          - container_array: K8S-ARRAY
    task_resources:
      defaults:
        cpu: 1
        memory: 1Gi
        storage: 100Mi
clusterResourceTemplates:
  inline:
    001_namespace.yaml: |
      apiVersion: v1
      kind: Namespace
      metadata:
        name: '{{ namespace }}'
    002_serviceaccount.yaml: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: default
        namespace: '{{ namespace }}'
        annotations:
          eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
ingress:
  create: true
  commonAnnotations:
    kubernetes.io/ingress.class: nginx
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:eu-central-1:example...'
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
  host: flyte.dev.example
rbac:
  extraRules:
    - apiGroups:
        - ''
      resources:
        - pods
        - services
        - configmaps
      verbs:
        - '*'
    - apiGroups:
        - ''
      resources:
        - serviceaccounts
      verbs:
        - create
        - get
        - list
        - patch
        - update
    - apiGroups:
        - rbac.authorization.k8s.io
      resources:
        - rolebindings
        - roles
      verbs:
        - create
        - get
        - list
        - patch
        - update
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::example
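One thing that may be worth checking in the ingress section above: it sets the nginx ingress class together with an alb.ingress.kubernetes.io annotation, and an Ingress usually only receives an address from a controller that claims its class. The fragment below is a minimal sketch of an ALB-only variant. It assumes the cluster runs the AWS Load Balancer Controller rather than ingress-nginx, which the report does not confirm, and the scheme, target type, and listen ports are illustrative choices rather than values taken from the original config.

```yaml
# Sketch only: assumes the AWS Load Balancer Controller is installed in the cluster.
ingress:
  create: true
  commonAnnotations:
    # Ask the AWS Load Balancer Controller (not ingress-nginx) to reconcile this Ingress.
    kubernetes.io/ingress.class: alb
    # Certificate ARN kept exactly as elided in the original report.
    alb.ingress.kubernetes.io/certificate-arn: 'arn:aws:acm:eu-central-1:example...'
    # Assumed values; adjust to the actual network layout.
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
  grpcAnnotations:
    # ALB counterpart of the nginx backend-protocol annotation for gRPC traffic.
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC
  host: flyte.dev.example
```

If ingress-nginx is the controller actually installed, the opposite change would apply: keep the nginx annotations and drop the ALB one. Either way, the ADDRESS column of kubectl get ingress stays empty until some controller reconciles the resource.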
Hey again :)
The logs are not reaching CloudWatch for some reason.
Logs section in the helm file:
logging:
  level: 4  # also tried 1
  plugins:
    cloudwatch:
      enabled: true
      templateUri: |-
        https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/eks/flyte-staging/cluster;stream=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log
CloudWatch can't find the log stream:
Looking at the other log streams, it seems that most of the pods get log streams, but none of the workflows/tasks get them.
Any idea what is not configured right? (I also tried adding the CloudWatch full access policy to the role.)
We used the Terraform scripts for GCP to create a Flyte cluster. We found that the node pool on the GKE cluster cannot be scaled down because "pod is blocking scale down because it’s not backed by a controller."
For more details:
GCP:
Pod is blocking scale down of underutilized node because it doesn’t have a controller, such as a deployment or replicaset. Refer to logs for more details.
Recommended actions
Set annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" for Pod or define a controller, such as a deployment or replicaset, for the Pod
Log:
noScaleDown: {
  ......
  reason: {
    messageId: "no.scale.down.node.pod.kube.system.unmovable"
  }
}
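For reference, the recommended action quoted above (setting the safe-to-evict annotation) would look roughly like this on a bare Pod. This is a minimal sketch: the pod name, container name, image, and command are hypothetical placeholders, and only the annotation key and value come from the GCP message.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod  # hypothetical name, for illustration only
  annotations:
    # Tells the cluster autoscaler it may evict this pod when scaling a node down.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: main            # hypothetical container
      image: busybox:1.36   # placeholder image
      command: ["sleep", "3600"]
```

The annotation only helps for pods whose manifests you control. The messageId in the log mentions kube.system, so the blocking pod may be a system pod that needs a different remedy.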
As a user, I'd like to see a demonstration of a working auth setup. Probably, using Keycloak as an IdP (due to its open nature and the availability of a Terraform provider).
It includes adding a programmatic way to label the node pool with the specific GPU type, applied only if the user indicates (via a bool) that GPUs are required.