Comments (2)
How do we feel about the following API?
from etl_manager.meta import TableMeta, get_existing_database_from_glue_catalogue
# Note: I am not going to attempt to read current tables from Glue and create table objects
db = get_existing_database_from_glue_catalogue('my_database')
t = TableMeta(name="table1", location="somewhere")
t.add_column(name="employee_id2", type="character", description="a new description")
db.add_table(t)
# Will not replace existing tables unless overwrite is set to True
db.append_tables_to_glue_database(overwrite=False)
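To make the proposed overwrite behaviour concrete, here is a minimal in-memory sketch that does not call Glue. The function name `merge_tables` and the dict-based catalogue layout are hypothetical stand-ins for illustration, not part of etl_manager.

```python
def merge_tables(existing, new, overwrite=False):
    """Return a copy of `existing` with the tables from `new` appended.

    Both arguments map table name -> table definition (a hypothetical layout).
    Tables already present are kept as they are unless overwrite is True.
    """
    merged = dict(existing)
    for name, definition in new.items():
        if name in merged and not overwrite:
            continue  # leave the existing table untouched
        merged[name] = definition
    return merged


catalogue = {"table1": {"location": "old"}}
incoming = {"table1": {"location": "new"}, "table2": {"location": "somewhere"}}

kept = merge_tables(catalogue, incoming)                      # table1 stays as it was
replaced = merge_tables(catalogue, incoming, overwrite=True)  # table1 is replaced
```

In both calls table2 is added; only the handling of the name collision on table1 differs.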
Yeah, fine by me. On top of that, this should be used to fix #117. I'd imagine that you could have something like:
def create_glue_database(self, delete_if_exists=False):
    """
    Creates a database in Glue based on the database object calling the method.
    By default, will error out if database exists - unless delete_if_exists is set to True (default is False).
    """
    if delete_if_exists:
        self.delete_glue_database()
    db = get_existing_database_from_glue_catalogue(self.name)
    if db:
        existing_tables = db._tables
    else:
        db = {"DatabaseInput": {"Description": self.description, "Name": self.name}}
        _glue_client.create_database(**db)
        existing_tables = []
    for tab in [t for t in self._tables if t not in existing_tables]:
        glue_table_def = tab.glue_table_definition(self.s3_database_path)
        _glue_client.create_table(DatabaseName=self.name, TableInput=glue_table_def)
There are some issues with the above (the indentation, probably, for one). We would probably want to parameterise the function so that it can update only new tables, update a given list of tables, or update all of them. Anyway, I thought I'd add this because it will define what is returned from get_existing_database_from_glue_catalogue.
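The three options mentioned above (new tables only, a given list, or all of them) could be sketched as a small selection helper that works on table names. The name `select_tables` and its `mode` and `only` parameters are hypothetical; this is an illustration of the idea, not the etl_manager implementation.

```python
def select_tables(local_names, existing_names, mode="new", only=None):
    """Pick which table names to write to Glue.

    mode="new":  tables that are not yet in the catalogue
    mode="list": tables named in `only` (each must exist locally)
    mode="all":  every local table
    """
    if mode == "new":
        existing = set(existing_names)
        return [n for n in local_names if n not in existing]
    if mode == "list":
        wanted = set(only or [])
        unknown = wanted - set(local_names)
        if unknown:
            raise ValueError(f"Unknown tables: {sorted(unknown)}")
        return [n for n in local_names if n in wanted]
    if mode == "all":
        return list(local_names)
    raise ValueError(f"Unsupported mode: {mode!r}")


local = ["table1", "table2", "table3"]
in_glue = ["table1"]

new_only = select_tables(local, in_glue)
chosen = select_tables(local, in_glue, mode="list", only=["table3", "table1"])
everything = select_tables(local, in_glue, mode="all")
```

Comparing by name rather than by table object avoids relying on how table objects define equality.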