aws / aws-app-mesh-roadmap
AWS App Mesh is a service mesh that you can use with your microservices to manage service-to-service communication
License: Apache License 2.0
As discussed in #74, the current way to model an external service that a service within the mesh can route to is to model it as a VirtualNode. For example, suppose you had two services named Service-A and Service-B, where Service-B was an external service (e.g. gitlab) hosted at the DNS name gitlab.my-intranet.com. If you wanted the VirtualNode representing Service-A to be able to egress traffic to Service-B, you would model your mesh configuration as:
Service-A:
{
  "meshName": "foo",
  "virtualNodeName": "service-a",
  "spec": {
    "listeners": [
      {
        "portMapping": {
          "port": 8080,
          "protocol": "http"
        }
      }
    ],
    "serviceDiscovery": {
      "dns": {
        "serviceName": "service-a.foo-mesh.local"
      }
    },
    "backends": [
      "gitlab.my-intranet.com"
    ]
  }
}
Service-B:
{
  "meshName": "foo",
  "virtualNodeName": "service-b",
  "spec": {
    "listeners": [
      {
        "portMapping": {
          "port": 80,
          "protocol": "tcp"
        },
        "healthCheck": {
          "protocol": "tcp",
          "healthyThreshold": 2,
          "unhealthyThreshold": 2,
          "timeoutMillis": 2000,
          "intervalMillis": 5000
        }
      }
    ],
    "serviceDiscovery": {
      "dns": {
        "serviceName": "gitlab.my-intranet.com"
      }
    }
  }
}
The VirtualNode model contains many specifications which would not normally apply to an external service not within the control of the mesh (such as backends), while others still do (such as health checks).
This issue is to track the investigation of a general simplification of modeling external entities within the mesh.
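As a rough illustration of the workaround described above, the Service-B document could be generated by a small helper. This is only a sketch: the function name `external_virtual_node` and its parameters are hypothetical, and the field layout simply copies the Service-B JSON in this issue rather than any confirmed schema.

```python
import json


def external_virtual_node(mesh_name, node_name, hostname, port=80):
    """Build a VirtualNode document for a service outside the mesh.

    Copies the shape of the Service-B JSON in this issue: a TCP listener
    with a TCP health check, DNS service discovery, and no backends.
    """
    return {
        "meshName": mesh_name,
        "virtualNodeName": node_name,
        "spec": {
            "listeners": [
                {
                    "portMapping": {"port": port, "protocol": "tcp"},
                    "healthCheck": {
                        "protocol": "tcp",
                        "healthyThreshold": 2,
                        "unhealthyThreshold": 2,
                        "timeoutMillis": 2000,
                        "intervalMillis": 5000,
                    },
                }
            ],
            "serviceDiscovery": {"dns": {"serviceName": hostname}},
        },
    }


node = external_virtual_node("foo", "service-b", "gitlab.my-intranet.com")
print(json.dumps(node, indent=2))
```

The helper only needs a name, a hostname, and a port, which hints at how small a dedicated external-service model could be.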
In ECS, the awsvpc networking mode doesn't meet everyone's needs. Enabling other network modes would be great.
Summary
When deleting and re-creating a VirtualNode with the same name (e.g. my-virtual-node under a Mesh named my-mesh), an Envoy identified as that VirtualNode name (e.g. mesh/my-mesh/virtualNode/my-virtual-node) which connects shortly after the VirtualNode is re-created may receive the previous VirtualNode's configuration. This may commonly occur if the Envoy connects less than 10 seconds after the VirtualNode was re-created.
This issue is closely related to #49.
We are working on a solution for this bug that will occasionally check for this state and request that the Envoy reconnect to receive updated configuration.
Steps to reproduce
Expected behavior: The Envoy receives the configuration for the new VirtualNode
Actual behavior: The Envoy receives the configuration for the previous VirtualNode by the same name
Work-arounds
Set APPMESH_VIRTUAL_NODE_NAME to the VirtualNode's UID as returned by the CreateVirtualNode API, instead of the ARN or truncated resource name.
We are developing an Envoy tracing driver for AWS X-Ray (https://github.com/awslabs/aws-app-mesh-examples/issues/5). This tracing driver will be upstreamed to Envoy.
The primary mechanism will be AWS Cloud Map. Work with HashiCorp to build a two-way sync between Consul and Cloud Map.
You should be able to use the official release of Envoy.
Expand to all AWS regions.
Current region list
As part of #16 , we want customers to be able to
The integration will happen primarily with a controller running in the customer's cluster on the master instances, managed by EKS. The controller will watch the Kubernetes API of the customer's cluster and react to certain objects being created or modified. It will create the necessary components in App Mesh and Cloud Map.
Initial support will be for a single App Mesh mesh and a single Cloud Map namespace per cluster (though many clusters can share a mesh/namespace). Customers can provide an existing mesh/namespace as well.
An additional component that will be used on the customer's worker nodes is the App Mesh CNI (https://github.com/awslabs/aws-app-mesh-examples/issues/15). Its responsibility is to enter the network namespace of a new pod and set up iptables rules to route incoming and outgoing traffic through Envoy. This takes the place of an init container, and is preferred because it avoids running privileged containers altogether.
Optionally, a mutating admission webhook could be employed to inject Envoy as a sidecar container into pods that are launched in the cluster.
Hello, I was wondering what would be the correct way to access the host on which a service is running. It would be cumbersome to model each host as a TCP virtual node and virtual service and add them all as backends to each service to account for dynamic task placement.
My initial thought is to add them to the APPMESH_EGRESS_IGNORED_IP since that's how the metadata services are reached, but I wanted to know if there was a better approach.
Thanks!
A Retry Policy in App Mesh enables clients to protect themselves from intermittent network failures, or intermittent server-side failures. A Retry Policy is an immutable entity in App Mesh that allows users to specify the conditions under which a retry is attempted, including HTTP status codes that will trigger a retry. A Retry Policy also has parameters specifying how many times to retry, and the timeout to use per retry.
Once a Retry Policy is created, it can be attached to one or more Virtual Nodes as part of the backends. Each backend in a Virtual Node can have its own retry policy.
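The behaviour described above can be sketched on the client side as a plain retry loop. This is a minimal illustration, not the App Mesh or Envoy implementation: `call_with_retry` and its parameters (`retry_on`, `max_retries`, `per_retry_timeout`, `backoff`) are hypothetical names standing in for the policy fields mentioned in the text.

```python
import time


def call_with_retry(send, retry_on=(502, 503, 504), max_retries=2,
                    per_retry_timeout=1.0, backoff=0.0):
    """Call send(timeout) and retry while the result is retriable.

    send is any callable that takes a timeout in seconds and returns an
    HTTP status code, or raises TimeoutError / ConnectionError.
    Returns the last status seen, or None if every attempt raised.
    """
    attempts = max_retries + 1  # the first try plus the retries
    status = None
    for attempt in range(attempts):
        try:
            status = send(per_retry_timeout)
        except (TimeoutError, ConnectionError):
            status = None  # treat network failures as retriable
        if status is not None and status not in retry_on:
            return status
        if attempt < attempts - 1 and backoff:
            time.sleep(backoff)
    return status


# Two intermittent 503s followed by a success.
responses = iter([503, 503, 200])
final_status = call_with_retry(lambda timeout: next(responses))
print(final_status)
```

Keeping the retriable status codes, the retry count, and the per-try timeout as separate parameters mirrors the three knobs the policy is described as exposing.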
Summary
When deleting a VirtualNode in a Mesh, the resulting Envoy configuration for that VirtualNode will remain available to an Envoy which identifies itself as that VirtualNode name (e.g. mesh/my-mesh/virtualNode/my-virtual-node). Envoys which are connected to the Envoy Management Service endpoint identified as that VirtualNode will remain connected and may receive improper configuration.
Note: Other Envoys identified as separate VirtualNodes, who may have previously relied on the deleted VirtualNode as part of a backend definition, will be updated with the correct configuration.
The period that this configuration is available after deleting a VirtualNode is approximately 7 days. We are working to reduce this time.
Steps to reproduce
Scenario 1: A connected Envoy remains connected after deletion of the VirtualNode
Expected behavior: The Envoy no longer receives configuration updates and is disconnected from the Envoy Management Service endpoint.
Actual behavior: The Envoy remains connected and may receive improper configuration.
Scenario 2: An Envoy connects after deletion of the VirtualNode
Expected behavior: The Envoy is not allowed to connect to the Envoy Management Service, and receives an appropriate error code (e.g. NOT_FOUND)
Actual behavior: The Envoy remains connected and may receive improper configuration.
Work-around
Make sure your Envoys are disconnected, and the associated ECS tasks, EKS pods, or applications running on EC2 are not serving traffic, then delete the VirtualNode.
Details: AWS Cloud Map will act as a cross-service registry for service endpoints and metadata. ECS already integrates with Cloud Map, and we plan to build an EKS connector to Cloud Map.
I'm not sure if this is the right place to open this issue, but the App Mesh website sends you to an invalid URL when you click "Get started with AWS App Mesh". It sends you to:
https://us-west-2.awsc-integ.aws.amazon.com/appmesh/get-started?region=us-west-2
Add App Mesh to CloudFormation so that customers can easily automate App Mesh setup.
Describe the bug
See envoyproxy/envoy#5174 for the detailed description.
Platform
ALL
Expected behavior
Should not return 503 on route change.
Additional context
envoyproxy/envoy#5174
What happened?
Updates to routes have no effect on running envoys
What you expected to happen?
Tasks must be restarted for route changes to take effect.
How to reproduce it (as minimally and precisely as possible)?
Not easy to reproduce, as it happens sporadically. It seems to be an issue with the xDS protocol.
Observed that it does not, though you may need to try to change the routes a few times and leave A running for an extended period of time (>30mins).
Issue gets resolved if I restart A as it gets new configuration.
Customers should be able to build their own filters into Envoy and we should allow config of those filters.
Tell us about your request
This feature request is for implementing traffic mirroring (also referred to as shadowing). Traffic mirroring allows one service to send the same traffic to more than one upstream service while still only using a single upstream service for the authoritative response. Other services which are receiving mirrored traffic can be tested for bugs and performance regressions prior to serving real traffic and becoming the authoritative upstream.
Which integration(s) is this request for?
All
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When working with microservices, developers and infrastructure engineers often need to test their new versions against real traffic before shifting all live traffic over to the new version. This increases confidence in code changes and allows teams to find bugs during periods of change.
Are you currently working around this issue?
App Mesh does not currently support traffic mirroring, so teams may work around the issue by replaying old traffic patterns from previously collected logs.
Additional context
Envoy Proxy supports traffic mirroring on routes.
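To make the request concrete, here is a minimal sketch of the mirroring idea in application code: the request goes to one authoritative upstream and is copied to shadow upstreams whose responses and errors are discarded. The function `mirror_request` and the lambda upstreams are hypothetical and only illustrate the semantics; this is not how Envoy or App Mesh implements mirroring.

```python
from concurrent.futures import ThreadPoolExecutor


def mirror_request(request, primary, shadows, pool):
    """Send request to primary and copy it to every shadow upstream.

    Only the primary's response is returned; shadow responses and
    errors are discarded (fire-and-forget).
    """
    def safe_call(upstream):
        try:
            upstream(request)
        except Exception:
            pass  # a failing shadow must never affect the caller

    for shadow in shadows:
        pool.submit(safe_call, shadow)
    return primary(request)


def broken_shadow(request):
    raise RuntimeError("new version crashed")


seen = []  # requests observed by the healthy shadow
with ThreadPoolExecutor(max_workers=2) as pool:
    reply = mirror_request(
        "GET /color",
        primary=lambda r: "blue-v1",
        shadows=[lambda r: seen.append(r), broken_shadow],
        pool=pool,
    )
print(reply, seen)
```

The crashing shadow has no effect on the reply, which is the property that makes mirroring safe for testing a new version against real traffic.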
Implement tagging of App Mesh resources so that our customers can have a consistent management and authorization experience.
The Envoy Management Service (EMS) is App Mesh's managed Aggregated Discovery Service (ADS) that customer Envoys connect to for dynamic configuration.
App Mesh will allow customers to enable DogStatsD metrics to a local sidecar agent or remote agent. The agent must be capable of ingesting DogStatsD metrics; examples include the CloudWatch agent and the DataDog agent. The metrics will be tagged/dimensioned to allow for aggregating metrics across multiple endpoints as desired.
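For readers unfamiliar with the wire format, a tagged DogStatsD metric is a single text line of the form `name:value|type|#tag:value,...`. The sketch below renders one such line; the metric name and tag keys are made up for illustration and are not the names App Mesh will emit.

```python
def format_dogstatsd(name, value, metric_type="c", tags=None):
    """Render one metric in the DogStatsD datagram format.

    metric_type is "c" (counter), "g" (gauge), "ms" (timer), and so on.
    Tags are sorted by key so the output is deterministic.
    """
    line = f"{name}:{value}|{metric_type}"
    if tags:
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


# Hypothetical metric name and tags, for illustration only.
datagram = format_dogstatsd(
    "envoy.cluster.upstream_rq_completed",
    1,
    tags={"mesh": "AppMeshExample", "virtual_node": "colorgateway-vn"},
)
print(datagram)
```

An agent would typically receive this line as a UDP datagram; the tags are what allow aggregation across multiple endpoints.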
Will it be possible to deploy a custom-built Envoy binary to App Mesh, provided it is compatible with App Mesh (e.g. with SigV4 support)? Custom filters for Envoy require a custom Envoy.
Describe the bug
Resources vended via the Envoy Management Service contain unique version numbers that may change when new configuration is generated. This may cause TCP connections proxied by Envoy to fail, or HTTP connections to be prematurely drained due to resource replacement. It also causes some of Envoy's generated metrics to contain the version numbers, which makes it difficult to track a given statistic through Envoy configuration changes.
Platform
All
To Reproduce
Steps to reproduce the TCP connection failure behavior:
Steps to reproduce the metrics behavior:
Expected behavior
Additional Context
Here are some examples of statistics generated by Envoy which have unique version numbers in them:
$ curl http://colorgateway.appmesh-example.local:9901/stats
...
cluster.cds|egress|AppMeshExample|colorgateway-vn|colorteller-black-vn|http|9080|22459664.external.upstream_rq_completed: 1
...
http.ingress.AppMeshExample.colorgateway-vn.rds.rds|ingress|AppMeshExample|colorgateway-vn|http|9080|31467114.config_reload: 1
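Until the names are stable, a consumer could normalize them before aggregating. The sketch below strips the version from stat names like the two shown above. It rests on an assumption drawn only from those two examples: that the version is the all-digit segment following the port and preceding the first `.` of the stat suffix. The name `strip_version` is hypothetical.

```python
import re

# Assumption: the version is the all-digit segment that follows the port
# and precedes the first "." of the stat suffix, as in the two examples.
_VERSION = re.compile(r"(\|\d+)\|\d+(?=\.)")


def strip_version(stat_name):
    """Return stat_name with the per-configuration version removed."""
    return _VERSION.sub(r"\1", stat_name, count=1)


raw_stats = [
    "cluster.cds|egress|AppMeshExample|colorgateway-vn|colorteller-black-vn"
    "|http|9080|22459664.external.upstream_rq_completed",
    "http.ingress.AppMeshExample.colorgateway-vn.rds.rds|ingress"
    "|AppMeshExample|colorgateway-vn|http|9080|31467114.config_reload",
]
stable_stats = [strip_version(name) for name in raw_stats]
for name in stable_stats:
    print(name)
```

Names without a version segment pass through unchanged, so the function is safe to apply to every line of the `/stats` output.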
It would be great not to have to set up an IGW, NATGW or HTTP proxy for communication between Envoy sidecars and the App Mesh xDS API (e.g. appmesh-envoy-management.us-east-1.amazonaws.com).
Summary
When updating the value of the new mesh egress filter, any Envoy which is currently connected to the App Mesh ADS endpoint will not immediately receive the updated setting. The Envoy will receive the updated configuration after a maximum period of 30 minutes, or after the Envoy disconnects and reconnects to the ADS endpoint.
Steps to reproduce
Expected behavior: The Envoy should receive the updated setting within a matter of seconds.
Actual behavior: The Envoy does not receive the updated setting until it disconnects and reconnects to the App Mesh ADS endpoint (which happens automatically every 30 minutes).
Workaround: You can force the Envoy to update its configuration by restarting it (via the ECS task, EKS pod, or similar). Note that this will need to be handled carefully for Envoys serving production traffic (i.e. issue a rolling restart).
A fix has been proposed and will be rolled out over the coming days to address this issue.
Send our customers' API call data to CloudTrail so customers can reliably audit their account activity.
Per discussions in #49 and #71, the usage of ServiceNames in the mesh is not abundantly clear in the current APIs and documentation.
The current use of ServiceNames as described by @ivitjuk:
This task is for tracking clarification work against the usage of ServiceNames in the APIs and documentation.
App Mesh will enable authorization at the resource level, including resource prefixes. This will allow customers to create IAM policies and roles for specific resources or groups of resources in App Mesh. These roles can be assumed by multiple accounts, in order to enable multiple accounts to operate in the same mesh, with well-defined resource-level authorizations for each role.
Describe the bug
Currently aws-appmesh-proxy-route-manager assumes that it is setting up iptables rules for a service and requires the environment variable APPMESH_APP_PORTS to be set. However, there is a use case for running Envoy as a sidecar to an application that is client-only and has no open ingress ports. In this case, the user would have to specify a fake port in order to launch the pod.
Platform
EKS, ECS
To Reproduce
Steps to reproduce the behavior:
Expected behavior
APPMESH_APP_PORTS should be able to be unset, perhaps with a warning message printed.
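The requested behaviour could look roughly like the sketch below. It does not reflect the actual aws-appmesh-proxy-route-manager code: the function `parse_app_ports`, the logger name, and the comma-separated port format are all assumptions made for illustration.

```python
import logging
import os

log = logging.getLogger("route-manager-sketch")


def parse_app_ports(environ=os.environ):
    """Return the ingress ports to redirect, or [] for client-only apps.

    Assumes APPMESH_APP_PORTS holds a comma-separated list of ports.
    """
    raw = environ.get("APPMESH_APP_PORTS", "").strip()
    if not raw:
        # Warn instead of failing, so client-only sidecars can start.
        log.warning("APPMESH_APP_PORTS is unset; skipping ingress rules")
        return []
    return [int(port) for port in raw.split(",") if port.strip()]


client_only = parse_app_ports({})
service = parse_app_ports({"APPMESH_APP_PORTS": "8080, 9080"})
print(client_only, service)
```

With an empty result the caller would skip the ingress redirection rules and still install the egress ones.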
Enable a virtual node definition to include additional attributes beyond the service name when configuring the service registration details. This will enable routing to different ECS/Fargate task sets or k8s deployments under the same service name.
Create a CNI plugin that can be used to route network traffic instead of using the containerized proxymanager script, which requires extra privileges.
App Mesh will allow customers to enable X-Ray tracing on a per-mesh basis. Once enabled, customers can view X-Ray segment data and configure sampling through the AWS X-Ray Console, API, or CLI. If X-Ray Tracing is enabled, App Mesh will emit X-Ray tracing segments to the X-Ray agent, running either as a sidecar in the task/pod or running elsewhere in the customer's account.
Allow customers to enable HTTP/TCP access logging for a Virtual Node. Access logs will be written to a deterministic location in the Envoy container. This location can be shared with a log ingestion sidecar such as fluentd, or (for ECS and EKS) shared with the host and ingested by an agent running on the host (e.g. CloudWatch agent).
I have seen #76 and related issues discussing egress configuration, but it's still unclear to me how to properly set up egress to something completely outside of my cluster on the internet. My scenario is running a sidecar container, next to my application container and the Envoy/App Mesh container, that collects metrics and pushes them to an external service. Obviously this doesn't work by default because there's no backend defined to let that traffic leave the cluster. Is this a scenario that is covered in the docs, or an issue that I missed?