Comments (3)
I tried the following simple script, but I get an error. 😢
For now, I don't know what implementation I should do. It is necessary to investigate.
It would be useful to be able to write DataFrame values directly to Athena with the to_sql method.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib.parse import quote_plus
import pandas as pd
from sqlalchemy import create_engine
df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
conn_str = 'awsathena+rest://:@athena.{region_name}.amazonaws.com:443/' \
'{schema_name}?s3_staging_dir={s3_staging_dir}'
engine = create_engine(conn_str.format(
region_name='us-west-2',
schema_name='default',
s3_staging_dir=quote_plus('s3://BUCKET/path/to/')),
echo=True)
df.to_sql('test', engine, if_exists='append')
2019-11-24 23:00:21,324 INFO sqlalchemy.engine.base.Engine
SELECT
table_schema,
table_name,
column_name,
data_type,
is_nullable,
column_default,
ordinal_position,
comment
FROM information_schema.columns
WHERE table_schema = 'default'
AND table_name = 'test'
2019-11-24 23:00:21,324 INFO sqlalchemy.engine.base.Engine {}
2019-11-24 23:00:30,992 INFO sqlalchemy.engine.base.Engine
CREATE TABLE "test" (
"index" BIGINT,
"a" BIGINT
)
2019-11-24 23:00:30,992 INFO sqlalchemy.engine.base.Engine {}
Failed to execute query.
Traceback (most recent call last):
File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 202, in _execute
**request).get('QueryExecutionId', None)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 58, in retry_api_call
return retry(func, *args, **kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 358, in call
do = self.iter(retry_state=retry_state)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 319, in iter
return fut.result()
File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 428, in result
return self.__get_result()
File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 384, in __get_result
raise self._exception
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 361, in call
result = fn(*args, **kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
Traceback (most recent call last):
File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 202, in _execute
**request).get('QueryExecutionId', None)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 58, in retry_api_call
return retry(func, *args, **kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 358, in call
do = self.iter(retry_state=retry_state)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 319, in iter
return fut.result()
File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 428, in result
return self.__get_result()
File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 384, in __get_result
raise self._exception
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 361, in call
result = fn(*args, **kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
cursor, statement, parameters, context
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\default.py", line 550, in do_execute
cursor.execute(statement, parameters)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 29, in _wrapper
return wrapped(*args, **kwargs)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\cursor.py", line 50, in execute
cache_size=cache_size)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 205, in _execute
raise_from(DatabaseError(*e.args), e)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\future\utils\__init__.py", line 400, in raise_from
exec(execstr, myglobals, mylocals)
File "<string>", line 1, in <module>
pyathena.error.DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/laughingman7743/github/PyAthena/to_sql_test.py", line 17, in <module>
df.to_sql('test', engine, if_exists='append')
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\core\generic.py", line 2531, in to_sql
dtype=dtype, method=method)
2019-11-24 23:00:31,363 INFO sqlalchemy.engine.base.Engine ROLLBACK
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
chunksize=chunksize, dtype=dtype, method=method)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 1173, in to_sql
table.create()
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 585, in create
self._execute_create()
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 569, in _execute_create
self.table.create()
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\schema.py", line 860, in create
bind._run_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 2033, in _run_visitor
conn._run_visitor(visitorcallable, element, **kwargs)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1607, in _run_visitor
visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\visitors.py", line 131, in traverse_single
return meth(obj, **kw)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\ddl.py", line 826, in visit_table
include_foreign_key_constraints,
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 988, in execute
return meth(self, multiparams, params)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\ddl.py", line 72, in _execute_on_connection
return connection._execute_ddl(self, multiparams, params)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1050, in _execute_ddl
compiled,
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
e, statement, parameters, cursor, context
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1466, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\util\compat.py", line 383, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\util\compat.py", line 128, in reraise
raise value.with_traceback(tb)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
cursor, statement, parameters, context
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\default.py", line 550, in do_execute
cursor.execute(statement, parameters)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 29, in _wrapper
return wrapped(*args, **kwargs)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\cursor.py", line 50, in execute
cache_size=cache_size)
File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 205, in _execute
raise_from(DatabaseError(*e.args), e)
File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\future\utils\__init__.py", line 400, in raise_from
exec(execstr, myglobals, mylocals)
File "<string>", line 1, in <module>
sqlalchemy.exc.DatabaseError: (pyathena.error.DatabaseError) An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
[SQL:
CREATE TABLE "test" (
"index" BIGINT,
"a" BIGINT
)
]
(Background on this error at: http://sqlalche.me/e/4xp6)
Process finished with exit code 1
from pyathena.
hi @laughingman7743,
This is a fantastic package and it inspired me to create an R alternative RAthena. Anyway I believe there could be a simple solution for uploading pandas dataframes from python to Athena.
In the RAthena
package the method I use is to upload the dataframe straight to S3 using boto3 and then register the S3 file as an external table using an athena ddl. For the s3 location I allow the user to set it within the function or the default s3 location is used that is set the in connect function.
Here is the source script: https://github.com/DyfanJones/RAthena/blob/master/R/table.R .
If you are unfamiliar with R I am happy to test out some python sample code to see if it will work for pyathena.
from pyathena.
@DyfanJones Thanks! I will check it.
from pyathena.
Related Issues (20)
- Add custom filesystem object to arrow engine HOT 2
- Compatibility issue with SQLAlchemy<1.4 HOT 2
- `UUID` in a query gets garbled HOT 3
- Add support for Spark calculations HOT 8
- Add Endpoint_URL param to SQLAlchemy HOT 2
- SQLAlchemy dialect uses deprecated dbapi() method HOT 1
- Create documents in Sphinx and publish them on GitHub Pages HOT 1
- Breaking change in the release between 3.0.10 and 3.1.0 HOT 6
- Okta authentication support HOT 1
- Integer variant types incorrectly rendered in DDL HOT 5
- Support for Iceberg FOR SYSTEM_VERSION AS OF HOT 7
- AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached HOT 2
- FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas.
- Latest PyAthena no longer compatible with SQLAlchemy 1.4
- [Warning] SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi(). HOT 1
- pyathena.error.DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:3242: mismatched input 'OFFSET'. Expecting: <EOF> HOT 2
- If a value for a partition key is None, to_sql doesn't warn you and no data is written
- GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException HOT 1
- URLLIB3 / Requests HOT 4
- Formatter does not escape single quotes when operation is MERGE or UPDATE HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyathena.