turnerlabs / udeploy
A simple way to deploy versioned AWS resources.
During a deployment via the portal, container definition fields like StopTimeout are being wiped out. We use this field to ensure sufficient time for a graceful shutdown of our containers in the event AWS needs to terminate or replace a Fargate container.
Manually revise the task definition in the AWS Console to re-insert the StopTimeout value.
Upgrade to the latest (v1.37.0) AWS Go SDK version. StopTimeout (and other current) container fields are then serialized and included in ContainerDefinition when calling DescribeTaskDefinition, which ensures the portal copies and preserves the fields.
N/A
When attempting a deployment for a lambda function an error is thrown.
ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: arn...
This appears to be due to lambda function state changes that were recently introduced. https://aws.amazon.com/de/blogs/compute/coming-soon-expansion-of-aws-lambda-states-to-all-functions/
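One possible way to avoid the ResourceConflictException is to wait until the previous update has finished before issuing the next call. The sketch below is only an illustration: statusGetter is a hypothetical stand-in for a call such as GetFunctionConfiguration, reduced to the LastUpdateStatus value (Successful, Failed, or InProgress), and waitForUpdate is not part of udeploy or the SDK.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// statusGetter is a hypothetical stand-in for a call such as Lambda's
// GetFunctionConfiguration, reduced to the LastUpdateStatus value.
type statusGetter func() (string, error)

// waitForUpdate polls until the function reports "Successful", returns an
// error on "Failed", and gives up after maxAttempts polls.
func waitForUpdate(get statusGetter, maxAttempts int, delay time.Duration) error {
	for i := 0; i < maxAttempts; i++ {
		status, err := get()
		if err != nil {
			return err
		}
		switch status {
		case "Successful":
			return nil
		case "Failed":
			return errors.New("previous function update failed")
		}
		time.Sleep(delay) // still "InProgress": wait and poll again
	}
	return fmt.Errorf("update still in progress after %d attempts", maxAttempts)
}

func main() {
	// Simulated status sequence standing in for real API responses.
	seq := []string{"InProgress", "InProgress", "Successful"}
	i := 0
	get := func() (string, error) {
		s := seq[i]
		i++
		return s, nil
	}
	if err := waitForUpdate(get, 5, 10*time.Millisecond); err != nil {
		fmt.Println("not ready:", err)
		return
	}
	fmt.Println("safe to send the next update")
}
```

With the real SDK, the getter would wrap the configuration lookup, and the deployment step would only run once waitForUpdate returns nil.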
When a commit has line returns/new lines, the line returns aren't visible in the HTML.
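A minimal sketch of one way to keep the line breaks, assuming the commit message is converted to HTML on the server side. commitToHTML is a hypothetical helper, not an existing udeploy function; a CSS rule such as white-space: pre-wrap on the element that shows the message would be an alternative that needs no code change.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// commitToHTML escapes a commit message and turns line breaks into <br> tags.
func commitToHTML(msg string) string {
	msg = strings.ReplaceAll(msg, "\r\n", "\n") // normalize Windows line endings
	// Escape first so the <br> tags added afterwards are not escaped themselves.
	return strings.ReplaceAll(html.EscapeString(msg), "\n", "<br>")
}

func main() {
	fmt.Println(commitToHTML("Fix paging\n\nList <all> images"))
	// prints: Fix paging<br><br>List &lt;all&gt; images
}
```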
When trying to integrate an application in udeploy with GitHub to display commit history, a GitHub Personal Access Token (PAT) is required. However, when you enter the PAT on the app configuration portal screen and try to save, you receive an error stating KMS access is denied, and the save is aborted. If you clear the Personal Access Token field, you can then save successfully.
Give the udeploy role access to encrypt using the KMS key.
https://github.com/turnerlabs/udeploy/blob/master/infrastructure/modules/portal/kms.tf
# list of saml users for policies
configUserIds = flatten([
  data.aws_caller_identity.current.account_id,
  "${aws_iam_role.app_role.unique_id}:*", # <<<<-----------
  formatlist(
    "%s:%s",
    data.aws_iam_role.saml_role_config.unique_id,
    var.saml_users,
  )
])
Attempting to terraform the prod instance of the portal results in the following error:
Error: Reference to undeclared module
on infrastructure/portals/prod/main.tf line 61, in output "s3_change_queue":
61: value = module.dev.s3_change_queue
module.dev.s3_change_queue should be module.env.s3_change_queue.
When performing a deployment through the portal's deploy modal, some ECR images are missing from the drop-down list even though they exist in ECR and are tagged properly. This is difficult to reproduce, and which images are missing is not consistent. The SDK ECR service call ListImages only returns the first page of results when called once. The number of results per page is determined dynamically by AWS through internal algorithms.
Delete all the old ECR images, leaving only the few most recent, so that all the images can be returned in a single call.
To show all images from ECR, paging needs to be implemented.
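The paging described above could look roughly like the loop below. This is a sketch under assumptions: page and listPage are hypothetical, simplified stand-ins for the ListImages response and call, and listAllTags is not existing udeploy code. The AWS Go SDK also appears to provide generated pagination helpers for this call, which may be preferable in the real implementation.

```go
package main

import "fmt"

// page is a hypothetical, simplified view of one ListImages response.
type page struct {
	Tags      []string
	NextToken string // empty when there are no more pages
}

// listPage stands in for the SDK call; it receives the token from the
// previous page (empty for the first request).
type listPage func(nextToken string) (page, error)

// listAllTags keeps requesting pages until no further token is returned.
func listAllTags(list listPage) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := list(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Tags...)
		if p.NextToken == "" {
			return all, nil
		}
		token = p.NextToken
	}
}

func main() {
	// Fake two-page backend keyed by the token used to request each page.
	pages := map[string]page{
		"":   {Tags: []string{"v1.0.0", "v1.0.1"}, NextToken: "t1"},
		"t1": {Tags: []string{"v1.1.0"}},
	}
	tags, err := listAllTags(func(tok string) (page, error) { return pages[tok], nil })
	fmt.Println(tags, err) // prints: [v1.0.0 v1.0.1 v1.1.0] <nil>
}
```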
empty
When a CI/CD process deploys a new version of a scheduled task or lambda function to AWS, the portal does not show the new version on the card. The new version should automatically appear on the app's card in the portal after a few seconds. It appears that the AWS events are being consumed by the portal correctly, even in multiple-account situations, but the uniquely assigned app ids for the events are incorrect, so the portal cannot find a match in the app cache and the events are ignored.
Clicking the manual app refresh button on the right side of the environment group.
By correctly generating or using the app ids the portal can find and update the correct app/env in the cache.
empty
Hey, guys. I work on the ODT/Compass team with Bart Alcorn/Cea Mosley. Would it be possible to sort the versions in the deployment modal by the task definition revision number instead of the version number, or maybe have this as a configuration option? The Go code is returning a map instead of an array, so it'd be a bit more involved than just sorting an array differently somewhere.
I don't mind doing the changes and submitting a PR, but it's non-trivial to get an instance running to experiment with. I may try again tonight/tomorrow, but I have an @turner address and am on Slack if any of you have a few minutes to give me a hand setting it up locally.
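As a rough illustration of the request, a map can be flattened into a slice and ordered by revision before it is returned to the UI. The version type and its field names are assumptions made for this sketch; the actual udeploy types may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// version is a hypothetical entry; the field names are assumptions.
type version struct {
	Version  string
	Revision int // task definition revision number
}

// sortByRevision flattens the map into a slice ordered by revision,
// newest first.
func sortByRevision(m map[string]version) []version {
	out := make([]version, 0, len(m))
	for _, v := range m {
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Revision > out[j].Revision })
	return out
}

func main() {
	m := map[string]version{
		"1.2.0":  {Version: "1.2.0", Revision: 41},
		"1.10.0": {Version: "1.10.0", Revision: 57},
		"1.9.3":  {Version: "1.9.3", Revision: 52},
	}
	for _, v := range sortByRevision(m) {
		fmt.Println(v.Revision, v.Version)
	}
}
```

Returning a slice instead of a map also gives the front end a stable order, since Go map iteration order is not defined.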
When a CloudWatch alarm in a linked AWS account changes state between OK and ALARM and sends the event to the portal, the lambda function app/env card does not reflect the changes.
empty
GitHub announced a new token format: https://github.blog/2021-04-05-behind-githubs-new-authentication-token-formats.
After updating the GH personal token to the new format, this error will show up:
{
"time": "2021-10-19T20:43:01.712703657Z",
"level": "-",
"prefix": "echo",
"file": "recover.go",
"line": "73",
"message": "[PANIC RECOVER] runtime error: index out of range [0] with length 0 goroutine 3701753 [running]:\ngithub.com/labstack/echo/v4/middleware.RecoverWithConfig.func1.1.1(0x13594e8, 0x1000, 0x0, 0x1717ec0, 0xc003ae84d0)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/middleware/recover.go:71 +0xee\npanic(0x128aea0, 0xc001818c40)\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/turnerlabs/udeploy/component/commit.BuildRelease(0xc0044c9990, 0xa, 0xc0044a4860, 0x16, 0xc0048b6af0, 0x9, 0x0, 0x0, 0x132be93, 0x16, ...)\n\t/deploy/component/commit/build.go:67 +0xcb6\ngithub.com/turnerlabs/udeploy/handler.GetInstanceCommits(0x1717ec0, 0xc003ae84d0, 0xc00320623d, 0x1e)\n\t/deploy/handler/commit.go:37 +0x34c\ngithub.com/turnerlabs/udeploy/component/cache.EnsureCache.func1(0x1717ec0, 0xc003ae84d0, 0x3, 0x12aec40)\n\t/deploy/component/cache/ensure.go:24 +0x123\ngithub.com/turnerlabs/udeploy/component/request.RouteContext.func1.2(0x170b500, 0xc002175c40, 0x1173ce0, 0x1fb33b0)\n\t/deploy/component/request/context.go:51 +0x98\ngo.mongodb.org/mongo-driver/mongo.WithSession(0x16fa400, 0xc003a62030, 0x1707980, 0xc002f538c0, 0xc0020bf928, 0x0, 0x0)\n\t/go/pkg/mod/go.mongodb.org/[email protected]/mongo/client.go:498 +0xed\ngithub.com/turnerlabs/udeploy/component/request.RouteContext.func1(0x1717ec0, 0xc003ae84d0, 0x0, 0x0)\n\t/deploy/component/request/context.go:48 +0x279\ngithub.com/turnerlabs/udeploy/component/auth.UnAuthError.func1(0x1717ec0, 0xc003ae84d0, 0xc003c4d110, 0x1341358)\n\t/deploy/component/auth/authentication.go:64 +0xdc\ngithub.com/labstack/echo/v4.(*Echo).Add.func1(0x1717ec0, 0xc003ae84d0, 0x28, 0x12ae8e0)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/echo.go:490 +0x8a\ngithub.com/turnerlabs/udeploy/component/session.NewWithConfig.func1.1(0x1717ec0, 0xc003ae84d0, 0x40d800, 0xc003ce7140)\n\t/deploy/component/session/echo.go:70 +0x19f\ngithub.com/labstack/echo/v4/middleware.RecoverWithConfig.func1.1(0x1717ec0, 0xc003ae84d0, 0x0, 
0x0)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/middleware/recover.go:78 +0x10e\ngithub.com/turnerlabs/udeploy/component/logger.LogErrors.func1(0x1717ec0, 0xc003ae84d0, 0x3, 0xc003206234)\n\t/deploy/component/logger/error.go:14 +0x43\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP(0xc000104000, 0x16f15c0, 0xc002fe60e0, 0xc0044a6100)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/echo.go:593 +0x222\nnet/http.serverHandler.ServeHTTP(0xc0002c0700, 0x16f15c0, 0xc002fe60e0, 0xc0044a6100)\n\t/usr/local/go/src/net/http/server.go:2831 +0xa4\nnet/http.(*conn).serve(0xc0038f40a0, 0x16fa340, 0xc003ce7000)\n\t/usr/local/go/src/net/http/server.go:1919 +0x875\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:2957 +0x384\n\ngoroutine 1 [IO wait]:\ninternal/poll.runtime_pollWait(0x7fd8ec5854e8, 0x72, 0x0)\n\t/usr/local/go/src/runtime/netpoll.go:184 +0x55\ninternal/poll.(*pollDesc).wait(0xc000424518, 0x72, 0x0, 0x0, 0x131e21a)\n\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45\ninternal/poll.(*pollDesc).waitRead(...)\n\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:92\ninternal/poll.(*FD).Accept(0xc000424500, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)\n\t/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1f8\nnet.(*netFD).accept(0xc000424500, 0x203000, 0x203000, 0x203000)\n\t/usr/local/go/src/net/fd_unix.go:238 +0x42\nnet.(*TCPListener).accept(0xc0004cc0e0, 0xc000305870, 0xaf262dc9, 0x7989f1d013ab5d13)\n\t/usr/local/go/src/net/tcpsock_posix.go:139 +0x32\nnet.(*TCPListener).AcceptTCP(0xc0004cc0e0, 0xc0002c07a0, 0xc00083e4b0, 0x44d5b8)\n\t/usr/local/go/src/net/tcpsock.go:248 +0x47\ngithub.com/labstack/echo/v4.tcpKeepAliveListener.Accept(0xc0004cc0e0, 0xc0003058d0, 0x4821e6, 0x616f2dd4, 0x438bee)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/echo.go:767 +0x2f\nnet/http.(*Server).Serve(0xc0002c0700, 0x16f1140, 0xc0000b8880, 0x0, 0x0)\n\t/usr/local/go/src/net/http/server.go:2925 
+0x280\ngithub.com/labstack/echo/v4.(*Echo).StartServer(0xc000104000, 0xc0002c0700, 0x131c1c9, 0x2)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/echo.go:663 +0x385\ngithub.com/labstack/echo/v4.(*Echo).Start(...)\n\t/go/pkg/mod/github.com/labstack/echo/[email protected]/echo.go:604\nmain.startRouter(0xc000510de0)\n\t/deploy/routes.go:140 +0x1828\n"
}
When a Fargate service task fails, the portal calls AWS services to evaluate the failing tasks and determine the health of the service. If the service keeps failing, the portal evaluates all the failing tasks. When the number of failing tasks gets high, the portal crashes and restarts over and over again.
The portal is not properly retrieving failing task details in the recursive function. It was calling AWS with an empty NextToken, causing an infinite loop that eventually maxed out memory and caused the portal to fail.
none
Pass the correct NextToken when retrieving ECS tasks.
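A hedged sketch of what passing the token through a recursive retrieval might look like. taskPage, listTasks, and collectTasks are hypothetical names, and the page limit is an extra safeguard added for this example so that a repeated or mishandled token cannot recurse forever.

```go
package main

import (
	"errors"
	"fmt"
)

// taskPage is a hypothetical, simplified view of one task listing response.
type taskPage struct {
	TaskArns  []string
	NextToken string // empty on the last page
}

// listTasks stands in for the SDK call that lists ECS tasks.
type listTasks func(nextToken string) (taskPage, error)

// collectTasks recursively gathers task ARNs, forwarding the token from each
// response to the next request and stopping after pagesLeft pages.
func collectTasks(list listTasks, token string, pagesLeft int) ([]string, error) {
	if pagesLeft == 0 {
		return nil, errors.New("page limit reached; refusing to keep paging")
	}
	p, err := list(token)
	if err != nil {
		return nil, err
	}
	if p.NextToken == "" {
		return p.TaskArns, nil // last page: recursion ends here
	}
	rest, err := collectTasks(list, p.NextToken, pagesLeft-1)
	if err != nil {
		return nil, err
	}
	return append(p.TaskArns, rest...), nil
}

func main() {
	pages := map[string]taskPage{
		"":   {TaskArns: []string{"task-a"}, NextToken: "n1"},
		"n1": {TaskArns: []string{"task-b"}},
	}
	arns, err := collectTasks(func(tok string) (taskPage, error) { return pages[tok], nil }, "", 10)
	fmt.Println(arns, err) // prints: [task-a task-b] <nil>
}
```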
empty
Kenneth and I talked about this earlier on the phone. When making lambda deployments, the old version of the lambda is still kept around. If you're using S3 for lambda storage, this isn't really necessary since your old versions are available there for re-deployment. It can also eventually exhaust the account level code storage limit, which is 75GB by default. It appears that versions of the AWS lambda consume storage that counts towards the account's code storage limit, regardless of whether you're using S3 for storage or uploading a zip directly to the lambda. If that turns out to not be the case, this issue is moot.
We talked about adding a configuration option that would tell udeploy to remove older versions when making a deployment. When talking about it with my team, Ben mentioned that it would also be nice to have a configuration option for the number of versions to keep, 2 or 3 or however many you want to have quickly available in the console.
I want to note too that the lambda API prevents deletions of versions that have an alias pointing to them, so it's safe to attempt to remove all older versions. I.e., if you have an older lambda version with an alias you've created to keep around for testing, the deletion attempt will fail. Whether udeploy silently ignores this error or surfaces it in the UI to the user as a warning is up for discussion.
Thoughts?
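To make the proposal concrete, here is a small sketch of how the versions to remove might be selected. fnVersion and versionsToDelete are hypothetical and not part of udeploy; skipping aliased versions up front is a choice made for this example, since the post notes the API would reject those deletions anyway.

```go
package main

import (
	"fmt"
	"sort"
)

// fnVersion is a hypothetical, simplified record of a published version.
type fnVersion struct {
	Number  int  // numeric published version
	Aliased bool // true if any alias points at this version
}

// versionsToDelete returns the version numbers that could be removed while
// keeping the `keep` newest versions and never selecting an aliased one.
func versionsToDelete(all []fnVersion, keep int) []int {
	sorted := append([]fnVersion(nil), all...) // copy; leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Number > sorted[j].Number })
	var del []int
	for i, v := range sorted {
		if i < keep || v.Aliased {
			continue
		}
		del = append(del, v.Number)
	}
	return del
}

func main() {
	vs := []fnVersion{{1, false}, {2, true}, {3, false}, {4, false}, {5, false}}
	fmt.Println(versionsToDelete(vs, 2)) // prints: [3 1]
}
```

Here keep would come from the suggested configuration option, with versions 5 and 4 retained and version 2 left alone because of its alias.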
When a CloudWatch alarm in a linked AWS account changes state between OK and ALARM and sends the event to the portal, the lambda function app/env card does not reflect the changes by turning the card red or back to blue.
The CloudWatch alarms in linked accounts are not being sent to the portal's SNS topic to be consumed by the portal, and the app ids generated by the portal for AWS events do not match the ids in the portal's app cache, causing the events to be ignored.
none
Send CloudWatch alarms from linked accounts to the portal's SNS topic and correctly generate the app ids from the events to match the apps in the cache.
empty
When an ECS task stops for a service monitored by the portal, it may be because the task failed to start, in which case the ExecutionStoppedAt field will not be populated. The portal checks this field when determining how many task errors to consider when determining the task state of the entire service. This is done to avoid the service staying in an error state for long periods of time even after all tasks start running successfully again.
The following error was being mishandled. The portal always assumed there would be a timestamp populated for ExecutionStoppedAt, which is not the case in the error below.
StoppedReason: Timeout waiting for network interface provisioning to complete.
StoppedCode: TaskFailedToStart
Stop the running udeploy service/task in ECS. Wait 45 minutes to allow all task errors to clear. Restart the service/task in ECS.
To correct this, the portal should look at the task CreatedAt field when the StoppedCode is TaskFailedToStart. This cannot be done for running task failure errors since the CreatedAt field could be days ago; so, the running task failure errors should continue to look at the ExecutionStoppedAt field.
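The proposed rule can be sketched as a nil-safe timestamp lookup. stoppedTask is a hypothetical, reduced task type whose pointer fields may be nil, and failureTime is not existing udeploy code. The stop code field is named StopCode here, which is an assumption about the SDK's naming; the text above calls it StoppedCode.

```go
package main

import (
	"fmt"
	"time"
)

// stoppedTask is a hypothetical, reduced view of a stopped ECS task.
// The timestamp fields are pointers and may be nil.
type stoppedTask struct {
	StopCode           string
	CreatedAt          *time.Time
	ExecutionStoppedAt *time.Time
}

// failureTime picks the timestamp used to decide whether a task error is
// still recent. It reports false instead of dereferencing a nil pointer.
func failureTime(t stoppedTask) (time.Time, bool) {
	ts := t.ExecutionStoppedAt
	if t.StopCode == "TaskFailedToStart" {
		ts = t.CreatedAt // the task never ran, so there is no stop timestamp
	}
	if ts == nil {
		return time.Time{}, false
	}
	return *ts, true
}

func main() {
	created := time.Date(2021, 2, 10, 12, 0, 0, 0, time.UTC)
	task := stoppedTask{StopCode: "TaskFailedToStart", CreatedAt: &created}
	when, ok := failureTime(task)
	fmt.Println(when, ok)
}
```

Callers would skip any task for which the second return value is false rather than treating it as a current error.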
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1913a7b]
goroutine 33 [running]:
github.com/turnerlabs/udeploy/component/integration/aws/service.getServiceError(0xc00069adc0, 0x13, 0x14, 0x1176592e000, 0xc000488360, 0xc000014300, 0x13)
/Users/USER/repo/udeploy/component/integration/aws/service/populate.go:217 +0x23b
github.com/turnerlabs/udeploy/component/integration/aws/service.checkError(0xc000396c60, 0xc00069adc0, 0x13, 0x14, 0x1176592e000, 0x13, 0x14)
/Users/USER/repo/udeploy/component/integration/aws/service/populate.go:164 +0x62
github.com/turnerlabs/udeploy/component/integration/aws/service.populateInst(0x0, 0x0, 0x4, 0xc00061b840, 0x16, 0xc0008184b8, 0x6, 0x0, 0x0, 0x0, ...)
/Users/USER/repo/udeploy/component/integration/aws/service/populate.go:89 +0x4ac
github.com/turnerlabs/udeploy/component/integration/aws/service.Populate(0xc0005f8b10, 0x0, 0xb5b2e6d5a1be945e, 0x282569e5)
/Users/USER/repo/udeploy/component/integration/aws/service/populate.go:47 +0x4c9
github.com/turnerlabs/udeploy/component/supplement.Instances(0x1f23c20, 0xc0004be000, 0xc0004e9f18, 0x7, 0xc0005f8b10, 0x0, 0x1, 0x0, 0x0)
/Users/USER/repo/udeploy/component/supplement/instances.go:30 +0x1f7
github.com/turnerlabs/udeploy/component/cache.EnsureApp(0x1f23c20, 0xc0004be000, 0xc0003d9690, 0xd, 0x0, 0x0)
/Users/USER/repo/udeploy/component/cache/ensure.go:42 +0x15e
github.com/turnerlabs/udeploy/component/cache.Ensure(0x1f23c20, 0xc0004be000, 0x1b1fa01, 0xc0004be000)
/Users/USER/repo/udeploy/component/cache/ensure.go:76 +0x1ba
main.main.func2.1(0x1f23c20, 0xc0004be000, 0x1b1fa80, 0x2667d90)
/Users/USER/repo/udeploy/main.go:52 +0x35
go.mongodb.org/mongo-driver/mongo.WithSession(0x1f14e80, 0xc000124000, 0x1f20360, 0xc00000e440, 0x1ca2c08, 0x0, 0x0)
/Users/USER/go/pkg/mod/go.mongodb.org/[email protected]/mongo/client.go:498 +0xed
main.main.func2(0x1f14e80, 0xc000124000, 0x1f20360, 0xc00000e440, 0xc0002ac130)
/Users/USER/repo/udeploy/main.go:51 +0x59
created by main.main
/Users/USER/repo/udeploy/main.go:50 +0x5ec
exit status 2
When terraforming up new infrastructure, the terraform apply will fail, due to the lambda using node 8.10.
Error: Error creating Lambda function: InvalidParameterValueException: The runtime parameter of nodejs8.10 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs12.x) while creating or updating functions.
{
Message_: "The runtime parameter of nodejs8.10 is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs12.x) while creating or updating functions.",
Type: "User"
}
When an ECS service is scaling down, AWS sends an event with a task stop code called ServiceSchedulerInitiated, which was introduced a few months back for shutting down tasks. The event does not always have the field ExecutionStoppedAt populated when using this stop code. When the portal tries to process this ECS event status as an error, the missing ExecutionStoppedAt crashes the portal.
N/A
The portal should not try to process the event with the stop code of ServiceSchedulerInitiated as an error. This stop code should be considered a normal shutdown for the task and be ignored by the portal.
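A minimal sketch of that filter, assuming the portal counts errors from a list of stop codes. normalStopCodes and countTaskErrors are hypothetical names; only ServiceSchedulerInitiated comes from this issue, and treating every other code as an error is an assumption of the example.

```go
package main

import "fmt"

// normalStopCodes lists stop codes treated as routine shutdowns rather than
// failures. Only ServiceSchedulerInitiated comes from the issue text.
var normalStopCodes = map[string]bool{
	"ServiceSchedulerInitiated": true,
}

// countTaskErrors returns how many stopped tasks should count as errors.
func countTaskErrors(stopCodes []string) int {
	n := 0
	for _, code := range stopCodes {
		if normalStopCodes[code] {
			continue // normal scale-down, not an error
		}
		n++
	}
	return n
}

func main() {
	codes := []string{"ServiceSchedulerInitiated", "TaskFailedToStart", "ServiceSchedulerInitiated"}
	fmt.Println(countTaskErrors(codes)) // prints: 1
}
```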
| 2021-02-10T12:29:28.024-05:00 | panic: runtime error: invalid memory address or nil pointer dereference |
| 2021-02-10T12:29:28.024-05:00 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0xd55689]|
| 2021-02-10T12:29:28.024-05:00 | goroutine 45 [running]:|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/integration/aws/service.getServiceError(0xc0004ba5a0, 0x14, 0x14, 0x1176592e000, 0xc000440f80, 0xc000730d80, 0x14)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/integration/aws/service/populate.go:223 +0xd9|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/integration/aws/service.checkError(0xc000269e40, 0xc0004ba5a0, 0x14, 0x14, 0x1176592e000, 0x14, 0x14)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/integration/aws/service/populate.go:166 +0x62|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/integration/aws/service.populateInst(0xc000774690, 0x2e, 0x4, 0xc00033b840, 0x3d, 0xc000487198, 0x6, 0x0, 0x0, 0x0, ...)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/integration/aws/service/populate.go:91 +0x4ac|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/integration/aws/service.Populate(0xc0004af4a8, 0xc00077b440, 0xc00034cf98, 0xc0004af7c8)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/integration/aws/service/populate.go:49 +0x4c9|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/supplement.Instances(0x1393200, 0xc000471000, 0xc000486b48, 0x7, 0xc0004af4a8, 0xb4cf1d00, 0xc0003fef50, 0x4c, 0xc000ab6c60)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/supplement/instances.go:30 +0x248|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/sync.handleChange(0x1393200, 0xc000471000, 0xc0003fe780, 0x4c, 0xc0000cd200, 0x832)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/sync/aws.go:95 +0x3c9|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/integration/aws/sqs.MonitorChanges(0x1393200, 0xc000471000, 0x1113760, 0x20, 0x10a3c20)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/integration/aws/sqs/events.go:68 +0x573|
| 2021-02-10T12:29:28.024-05:00 | github.com/turnerlabs/udeploy/component/sync.AWSWatchEvents(...)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/component/sync/aws.go:60|
| 2021-02-10T12:29:28.024-05:00 | main.monitorChanges.func3.1(0x1393200, 0xc000471000, 0xf812c0, 0x1ae64b0)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/monitor.go:42 +0x45|
| 2021-02-10T12:29:28.024-05:00 | go.mongodb.org/mongo-driver/mongo.WithSession(0x1383900, 0xc000092000, 0x138f700, 0xc000470ba0, 0x1113f00, 0xc000488160, 0x4)|
| 2021-02-10T12:29:28.024-05:00 | /go/pkg/mod/go.mongodb.org/[email protected]/mongo/client.go:498 +0xed|
| 2021-02-10T12:29:28.024-05:00 | main.monitorChanges.func3(0x1383900, 0xc000092000, 0x138f700, 0xc000470ba0)|
| 2021-02-10T12:29:28.024-05:00 | /deploy/monitor.go:41 +0x59|
| 2021-02-10T12:29:28.025-05:00 | created by main.monitorChanges|
| 2021-02-10T12:29:28.025-05:00 | /deploy/monitor.go:40 +0xe1 |