Comments (7)
Great post! I took a lot of advice and ideas from it while researching which tool to pick for my team. In the end we went with Google's Data Catalog b/c we are heavy users of GCP. Do you have experience with that? I'd be curious to hear your opinion. I'm not 100% happy b/c there is so much development left to do (tools for uploading templates and tagging resources, automated metric stats calculators, etc.). But maybe these obstacles exist on most platforms.
from eugeneyan-comments.
Please check spothero's link. It is the same as saxobank's.
from eugeneyan-comments.
I'm afraid I haven't looked at any proprietary data discovery tools (including Google Data Catalog). I haven't been able to find many reviews of it either, though this may be helpful.
from eugeneyan-comments.
Thanks for raising this JeongHoon! Fixed.
from eugeneyan-comments.
I feel that you should include CKAN in the comparison list. While admittedly an older solution, it is viable as a data catalog for flat files. Its limiting constraint is that it doesn't include live connections to data sources. It is relatively mature as a product compared to these still-developing products.
from eugeneyan-comments.
Great post! Lots of useful insights.
I have a question about feature reuse. Consider a case where 10 users are using feature A, and feature A's owner is Jack. Can Jack modify and optimize the feature? Does Jack need to guarantee the quality of the feature, both its performance and its accuracy? In this circumstance, feature reuse increases the feature owner's burden. So how does this kind of collaboration work?
from eugeneyan-comments.
@FreddieSun This is an interesting question and I don't have it all worked out yet.
If Jack is simply creating features as exhaust of his own machine learning pipeline, he can make the features available without guarantees of performance (i.e., caveat emptor). Thus, Jack can publish features he's using for his own use case and make them available to others without taking on the ops burden of updating/maintaining them for other use cases.
Alternatively, Jack can go the extra mile and maintain multiple versions of the feature in the short term. Thus, if he's updating a feature from v1 -> v2, he might provide v1 and v2 simultaneously for a period (e.g., a month) before deprecating v1. Consumers of Jack's features can be identified by looking at query logs and then sent a notification. Nonetheless, this is more burdensome and IMHO, Jack is in no way obliged to do this.
On the other hand, if Jack is part of a team that provides features for internal users and their downstream use cases, he'll probably have to adhere to some contract with downstream users, such as ensuring the quality of embeddings, accuracy of imputed data, etc.
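To make the "serve v1 and v2 side by side, then deprecate v1" idea above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `FeatureVersion` record, the `release_new_version` and `consumers_to_notify` helpers, and the shape of the query log (a list of dicts with `user`, `feature`, and `version` keys) are all hypothetical and not taken from any particular feature store.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FeatureVersion:
    """Hypothetical record for one published version of a feature."""
    name: str
    version: str
    deprecated_on: Optional[date] = None  # None means no planned removal


def release_new_version(current, new_version, today, overlap_days=30):
    """Return (old, new): the old version stays available for an overlap window."""
    old = FeatureVersion(
        current.name, current.version, today + timedelta(days=overlap_days)
    )
    new = FeatureVersion(current.name, new_version)
    return old, new


def consumers_to_notify(query_log, feature):
    """Collect the distinct users who queried the given feature version.

    Assumes each log row is a dict with 'user', 'feature', and 'version' keys.
    """
    return sorted(
        {
            row["user"]
            for row in query_log
            if row["feature"] == feature.name and row["version"] == feature.version
        }
    )
```

With something like this, Jack would call `release_new_version` when publishing v2, then pass the old version and his query log to `consumers_to_notify` to get the list of people to warn before v1 goes away.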
from eugeneyan-comments.