Comments (10)
AWS Config does not have a complete accounting of resources in AWS. IAM access keys are a good example: you can get IAM users, roles, and groups out of Config, but you cannot query access key IDs.
You could use it to get some subset of the data, but it wouldn't be complete.
from cloudquery.
That assumes you have a CloudTrail log, and it might be challenging to follow the stream. If the long-running process dies, would it know where to pick back up, or would it re-pull everything and then restart following the stream?
I like the idea of parallelizing as much as possible, and having robust retry/backoff logic for provider API calls. I made a separate issue for that: #59
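To make the retry/backoff idea concrete, here is a minimal sketch of exponential backoff with full jitter. It is not CloudQuery's implementation: `ThrottledError`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and a real provider would need to map its SDK's throttling errors onto something like `ThrottledError`.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider throttling error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then sleep a random time below the cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting. The jitter spreads retries out so that many concurrent workers do not all retry at the same instant.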
from cloudquery.
So actually, CloudQuery now pulls data concurrently from the same region. It should be pretty easy to add the same logic for accounts. One issue I think we might hit is rate limits if we have too many concurrent API calls.
A few thoughts: maybe we can add a variable that specifies the number of concurrent requests? Another option is to have one long-running job that fetches all the data and then subscribes to CloudTrail logs to pull only the resources that were changed. What do you think?
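As a rough illustration of the "variable for the number of concurrent requests" idea, the sketch below caps in-flight work with a thread pool. `fetch_all`, `fetchers`, and `max_concurrency` are hypothetical names chosen for this example, not an existing CloudQuery setting.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetchers, max_concurrency=4):
    """Run each zero-argument fetcher with at most max_concurrency in flight.

    Returns results in the same order as `fetchers`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = [pool.submit(f) for f in fetchers]
        # .result() re-raises any exception raised in the worker thread.
        return [future.result() for future in futures]
```

Lowering `max_concurrency` trades throughput for fewer throttled calls, so it would likely need tuning per provider.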
from cloudquery.
A rate limit could be a quick fix at first, since even with concurrency the AWS API is rate limited by IP, not only by access key/role.
from cloudquery.
Yeah, I guess robust retry/backoff should be part of the solution here. Also, there is AWS V2, which should be faster in general, and I think we need to migrate to it. Another option is to try to pull data from AWS Config in bulk (never tried it, so just an idea).
from cloudquery.
Hey @yevgenypats, what do you mean by 'AWS Config in bulk'?
from cloudquery.
@Rackme I haven't done enough research yet, but an idea I had in the back of my mind is to try the https://docs.aws.amazon.com/config/latest/APIReference/API_SelectResourceConfig.html or https://docs.aws.amazon.com/config/latest/APIReference/API_BatchGetResourceConfig.html API calls to get the data without going through the standard APIs, which should help with the throttling issue. I'm not sure it's possible, though, and this API might not have all the data we want. Are you familiar with AWS Config? Maybe you can help shed some light on this one?
from cloudquery.
@yevgenypats I've never used AWS Config to pull a bunch of data, only for a few checks, sorry...
As you said, some of the services already covered by cloudquery (directconnect, emr, organizations) seem to be missing from their schema:
https://github.com/awslabs/aws-config-resource-schema/tree/master/config/properties/resource-types
If there is a maximum response size, the Select API documentation is a little worrying about how easily pagination can be handled, don't you think?
LIMIT
Valid Range: Minimum value of 0. Maximum value of 100.
I've tried with select-resource-config; only 25 resources are returned per page.
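For reference, paging through SelectResourceConfig results could look roughly like the sketch below. It assumes a boto3-style client whose `select_resource_config` accepts `Expression`, `Limit`, and `NextToken` and returns `Results` (JSON strings) plus an optional `NextToken`; `select_all` and `page_size` are hypothetical names, and this has not been checked against throttling or response-size limits.

```python
import json


def select_all(client, expression, page_size=100):
    """Yield every row of an AWS Config query, following NextToken across pages."""
    token = None
    while True:
        kwargs = {"Expression": expression, "Limit": page_size}
        if token:
            kwargs["NextToken"] = token  # only sent after the first page
        page = client.select_resource_config(**kwargs)
        for row in page.get("Results", []):
            yield json.loads(row)  # each result is a JSON-encoded string
        token = page.get("NextToken")
        if not token:
            break
```

With boto3 installed, the client would presumably come from `boto3.client("config")`, and the loop keeps requesting pages until no `NextToken` is returned, whatever page size the service actually applies.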
from cloudquery.
I think AWS tends to rate limit at the account level. So if you run each account concurrently, they shouldn't step on each other's toes, unless AWS also implements a global rate limit based on IP or something like that.
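If the limit really is per account, one way to express that is a separate concurrency cap per account while accounts run side by side. This is only a sketch under that assumption: `fetch_accounts`, `per_account_limit`, and `max_workers` are hypothetical names, and a global IP-based limit would need an additional shared limiter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fetch_accounts(tasks, per_account_limit=2, max_workers=8):
    """Run (account_id, fn) tasks concurrently, capping in-flight calls per account."""
    limits = {account: threading.Semaphore(per_account_limit)
              for account, _ in tasks}

    def run(account, fn):
        with limits[account]:  # blocks only callers for the same account
            return account, fn()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run, account, fn) for account, fn in tasks]
        return [future.result() for future in futures]
```

Note that a worker waiting on a semaphore still occupies a pool thread, so `max_workers` should be comfortably larger than `per_account_limit`.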
from cloudquery.
Solved with https://github.com/cloudquery/cq-provider-aws/releases/tag/v0.2.5
from cloudquery.