Giter Site home page Giter Site logo

pandas.to_sql about pyathena HOT 3 CLOSED

kismsu avatar kismsu commented on May 10, 2024
pandas.to_sql

from pyathena.

Comments (3)

laughingman7743 avatar laughingman7743 commented on May 10, 2024

I tried the following simple script, but I get an error. 😢
For now, I don't know what implementation I should do. It is necessary to investigate.
It would be useful to be able to write DataFrame values directly to Athena with the to_sql method.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib.parse import quote_plus

import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
conn_str = 'awsathena+rest://:@athena.{region_name}.amazonaws.com:443/' \
           '{schema_name}?s3_staging_dir={s3_staging_dir}'
engine = create_engine(conn_str.format(
    region_name='us-west-2',
    schema_name='default',
    s3_staging_dir=quote_plus('s3://BUCKET/path/to/')),
    echo=True)

df.to_sql('test', engine, if_exists='append')
2019-11-24 23:00:21,324 INFO sqlalchemy.engine.base.Engine 
                SELECT
                  table_schema,
                  table_name,
                  column_name,
                  data_type,
                  is_nullable,
                  column_default,
                  ordinal_position,
                  comment
                FROM information_schema.columns
                WHERE table_schema = 'default'
                AND table_name = 'test'
                
2019-11-24 23:00:21,324 INFO sqlalchemy.engine.base.Engine {}
2019-11-24 23:00:30,992 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE "test" (
	"index" BIGINT, 
	"a" BIGINT
)


2019-11-24 23:00:30,992 INFO sqlalchemy.engine.base.Engine {}
Failed to execute query.
Traceback (most recent call last):
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 202, in _execute
    **request).get('QueryExecutionId', None)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 58, in retry_api_call
    return retry(func, *args, **kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 319, in iter
    return fut.result()
  File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 428, in result
    return self.__get_result()
  File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
Traceback (most recent call last):
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 202, in _execute
    **request).get('QueryExecutionId', None)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 58, in retry_api_call
    return retry(func, *args, **kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 319, in iter
    return fut.result()
  File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 428, in result
    return self.__get_result()
  File "c:\users\laughingman7743\appdata\local\programs\python\python37\Lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\tenacity\__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\botocore\client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\default.py", line 550, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 29, in _wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\cursor.py", line 50, in execute
    cache_size=cache_size)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 205, in _execute
    raise_from(DatabaseError(*e.args), e)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\future\utils\__init__.py", line 400, in raise_from
    exec(execstr, myglobals, mylocals)
  File "<string>", line 1, in <module>
pyathena.error.DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/laughingman7743/github/PyAthena/to_sql_test.py", line 17, in <module>
    df.to_sql('test', engine, if_exists='append')
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\core\generic.py", line 2531, in to_sql
    dtype=dtype, method=method)
2019-11-24 23:00:31,363 INFO sqlalchemy.engine.base.Engine ROLLBACK
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
    chunksize=chunksize, dtype=dtype, method=method)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 1173, in to_sql
    table.create()
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 585, in create
    self._execute_create()
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\pandas\io\sql.py", line 569, in _execute_create
    self.table.create()
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\schema.py", line 860, in create
    bind._run_visitor(ddl.SchemaGenerator, self, checkfirst=checkfirst)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 2033, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1607, in _run_visitor
    visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\visitors.py", line 131, in traverse_single
    return meth(obj, **kw)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\ddl.py", line 826, in visit_table
    include_foreign_key_constraints,
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\sql\ddl.py", line 72, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1050, in _execute_ddl
    compiled,
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\util\compat.py", line 383, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\util\compat.py", line 128, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\sqlalchemy\engine\default.py", line 550, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\util.py", line 29, in _wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\cursor.py", line 50, in execute
    cache_size=cache_size)
  File "C:\Users\laughingman7743\github\PyAthena\pyathena\common.py", line 205, in _execute
    raise_from(DatabaseError(*e.args), e)
  File "C:\Users\laughingman7743\.virtualenvs\PyAthena-L16IeiG_\lib\site-packages\future\utils\__init__.py", line 400, in raise_from
    exec(execstr, myglobals, mylocals)
  File "<string>", line 1, in <module>
sqlalchemy.exc.DatabaseError: (pyathena.error.DatabaseError) An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:14: no viable alternative at input 'CREATE TABLE "test"'
[SQL: 
CREATE TABLE "test" (
	"index" BIGINT, 
	"a" BIGINT
)

]
(Background on this error at: http://sqlalche.me/e/4xp6)

Process finished with exit code 1

from pyathena.

DyfanJones avatar DyfanJones commented on May 10, 2024

hi @laughingman7743,

This is a fantastic package and it inspired me to create an R alternative RAthena. Anyway I believe there could be a simple solution for uploading pandas dataframes from python to Athena.

In the RAthena package the method I use is to upload the dataframe straight to S3 using boto3 and then register the S3 file as an external table using an athena ddl. For the s3 location I allow the user to set it within the function or the default s3 location is used that is set the in connect function.

Here is the source script: https://github.com/DyfanJones/RAthena/blob/master/R/table.R .

If you are unfamiliar with R I am happy to test out some python sample code to see if it will work for pyathena.

from pyathena.

laughingman7743 avatar laughingman7743 commented on May 10, 2024

@DyfanJones Thanks! I will check it.

from pyathena.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.