
d6tstack's Introduction

Databolt File Ingest

Quickly ingest raw files. Works with XLS, CSV and TXT files, which can be exported to CSV, Parquet, SQL and pandas. d6tstack solves many performance and schema problems typically encountered when ingesting raw files.

Features include

  • Fast pd.to_sql() for postgres and mysql
  • Quickly check columns for consistency across files
  • Fix added/missing columns
  • Fix renamed columns
  • Check Excel tabs for consistency across files
  • Excel to CSV converter (incl. multi-sheet support)
  • Out-of-core functionality to process large files
  • Export to CSV, Parquet, SQL, pandas DataFrame

Installation

Latest published version: pip install d6tstack. Additional requirements (an example install command follows the list):

  • d6tstack[psql]: for pandas to postgres
  • d6tstack[mysql]: for pandas to mysql
  • d6tstack[xls]: for excel support
  • d6tstack[parquet]: for ingesting CSV to Parquet
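
For example, to install the Postgres and Excel extras together: pip install d6tstack[psql,xls]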

Latest dev version from GitHub: pip install git+https://github.com/d6t/d6tstack.git

Sample Use


import d6tstack

# fast CSV to SQL import - see SQL examples notebook
d6tstack.utils.pd_to_psql(df, 'postgresql+psycopg2://usr:pwd@localhost/db', 'tablename')
d6tstack.utils.pd_to_mysql(df, 'mysql+mysqlconnector://usr:pwd@localhost/db', 'tablename')
d6tstack.utils.pd_to_mssql(df, 'mssql+pymssql://usr:pwd@localhost/db', 'tablename') # experimental

# ingest multiple CSVs which may have data schema changes - see CSV examples notebook

import glob
>>> c = d6tstack.combine_csv.CombinerCSV(glob.glob('data/*.csv'))

# show columns of each file
>>> c.columns()

# quick check if all files have consistent columns
>>> c.is_all_equal()
False

# show which files have missing columns
>>> c.is_column_present()
   filename  cost  date profit profit2 sales
0  feb.csv  True  True   True   False  True
2  mar.csv  True  True   True    True  True

>>> c.combine_preview() # keep all columns
   filename  cost        date profit profit2 sales
0   jan.csv  -80  2011-01-01     20     NaN   100
0   mar.csv  -100  2011-03-01    200     400   300

>>> d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'), columns_select_common=True).combine_preview() # keep common columns
   filename  cost        date profit sales
0   jan.csv  -80  2011-01-01     20   100
0   mar.csv  -100  2011-03-01    200   300

>>> d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'), columns_rename={'sales':'revenue'}).combine_preview()
   filename  cost        date profit profit2 revenue
0   jan.csv  -80  2011-01-01     20     NaN   100
0   mar.csv  -100  2011-03-01    200     400   300

# to come: check if columns match database
>>> c.is_columns_match_db('postgresql+psycopg2://usr:pwd@localhost/db', 'tablename')

# create csv with first nrows_preview rows of each file
>>> c.to_csv_head()

# export to csv, parquet, sql. Out of core with optimized fast imports for postgres and mysql
>>> c.to_pandas()
>>> c.to_csv_align(output_dir='process/')
>>> c.to_parquet_align(output_dir='process/')
>>> c.to_sql_combine('postgresql+psycopg2://usr:pwd@localhost/db', 'tablename')
>>> c.to_psql_combine('postgresql+psycopg2://usr:pwd@localhost/db', 'tablename') # fast, using COPY FROM
>>> c.to_mysql_combine('mysql+mysqlconnector://usr:pwd@localhost/db', 'tablename') # fast, using LOAD DATA LOCAL INFILE

# read Excel files - see Excel examples notebook for more details
import d6tstack.convert_xls

d6tstack.convert_xls.read_excel_advanced('test.xls',
    sheet_name='Sheet1', header_xls_range="B2:E2")

d6tstack.convert_xls.XLStoCSVMultiSheet('test.xls').convert_all(header_xls_range="B2:E2")

d6tstack.convert_xls.XLStoCSVMultiFile(glob.glob('*.xls'),
    cfg_xls_sheets_sel_mode='name_global', cfg_xls_sheets_sel='Sheet1') \
    .convert_all(header_xls_range="B2:E2")

Documentation

Faster Data Engineering

Check out other d6t libraries to solve common data engineering problems, including

  • data workflows: build highly effective data science workflows
  • fuzzy joins: quickly join data
  • data pipes: quickly share and distribute data

https://github.com/d6t/d6t-python

We also encourage you to follow the Databolt blog for updates and tips & tricks: http://blog.databolt.tech

Collecting Error Messages and Usage Statistics

We have put a lot of effort into making this library useful to you. To help us make this library even better, it collects ANONYMOUS error messages and usage statistics. See d6tcollect for details including how to disable collection. Collection is asynchronous and doesn't impact your code in any way.

It may not catch all errors, so if you run into any problems or have any questions, please raise an issue on GitHub.

d6tstack's People

Contributors

brunompacheco, d6tdev, leonardaukea, royopa


d6tstack's Issues

pd_to_psql does not work if your data has comma in it

The library works well with data columns that don't contain commas, but if a value contains a comma, e.g. a name like "Smith, John", and you then call pd_to_psql, you get the error below.

Any thoughts?


DataError Traceback (most recent call last)
in
----> 8 d6tstack.utils.pd_to_psql(df, fast_engine, 'xxx', 'xxx', if_exists='append')
9
10

/usr/local/lib/python3.7/site-packages/d6tstack/utils.py in pd_to_psql(df, uri, tablename, schema_name, if_exists)
122 df.to_csv(fbuf, index=False, header=False)
123 fbuf.seek(0)
--> 124 cursor.copy_from(fbuf, tablename, sep=',', null='')
125 sql_cnxn.commit()
126 cursor.close()

DataError: extra data after last expected column
CONTEXT: COPY xxxx, line 1:
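
A possible workaround, assuming your installed version of pd_to_psql exposes the sep argument (newer versions do, as used in a snippet further down this page), is to pick a separator that does not appear in the data, or to fall back to pandas' own to_sql, which handles quoting itself. A minimal sketch with hypothetical connection details:

import pandas as pd
import sqlalchemy
import d6tstack.utils

df = pd.DataFrame({'name': ['Smith, John'], 'sales': [100]})  # value contains a comma
uri = 'postgresql+psycopg2://usr:pwd@localhost/db'  # hypothetical connection string

# tab separator so the embedded comma doesn't break COPY FROM
# (assumes the installed d6tstack version accepts a sep argument)
d6tstack.utils.pd_to_psql(df, uri, 'tablename', if_exists='append', sep='\t')

# slower but robust fallback: let pandas/SQLAlchemy handle quoting and escaping
engine = sqlalchemy.create_engine(uri)
df.to_sql('tablename', engine, if_exists='append', index=False)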

Writing File in Chunk

Hi,
I am wondering whether it is possible to write the file in chunks, because when I try to handle a large file my server runs out of memory. Your library works outstandingly well for small files. I am using the following method to write a file to the database.

d6tstack.utils.pd_to_mysql(meta_dataframe, 'mysql+mysqlconnector://abc:[email protected]/abcdb', meta_filename)

How can I handle large files? Any suggestions would be great.
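
One option is to stream the source file in chunks and insert each chunk separately, so only one chunk is held in memory at a time. This is a sketch under the assumption that pd_to_mysql accepts an if_exists argument (pd_to_psql and pd_to_mssql do in snippets elsewhere on this page); the file and table names are hypothetical:

import pandas as pd
import d6tstack.utils

uri = 'mysql+mysqlconnector://usr:pwd@localhost/db'  # hypothetical connection string

# read the CSV in chunks so only one chunk is in memory at a time
reader = pd.read_csv('meta_data.csv', chunksize=200000)
for i, chunk in enumerate(reader):
    # first chunk creates/replaces the table, later chunks append
    d6tstack.utils.pd_to_mysql(chunk, uri, 'meta_table',
                               if_exists='replace' if i == 0 else 'append')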

UnicodeDecodeError

Allow setting the encoding when writing the dataframe to CSV, before loading the infile. It raises an error when the CSV contains unicode characters.

Cannot import csv to mssql

Hi, I'm trying to import a CSV into an MSSQL server and I get the following error:

creating mysql.csv ok
loading mysql.csv ok

---------------------------------------------------------------------------
MSSQLDatabaseException                    Traceback (most recent call last)
src/pymssql.pyx in pymssql.Cursor.execute()

src/_mssql.pyx in _mssql.MSSQLConnection.execute_query()

src/_mssql.pyx in _mssql.MSSQLConnection.execute_query()

src/_mssql.pyx in _mssql.MSSQLConnection.format_and_run_query()

src/_mssql.pyx in _mssql.check_cancel_and_raise()

src/_mssql.pyx in _mssql.maybe_raise_MSSQLDatabaseException()

MSSQLDatabaseException: (4860, b'Cannot bulk load. The file "mysql.csv" does not exist.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')

During handling of the above exception, another exception occurred:

OperationalError                          Traceback (most recent call last)
~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1245                     self.dialect.do_execute(
-> 1246                         cursor, statement, parameters, context
   1247                     )

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    580     def do_execute(self, cursor, statement, parameters, context=None):
--> 581         cursor.execute(statement, parameters)
    582 

src/pymssql.pyx in pymssql.Cursor.execute()

OperationalError: (4860, b'Cannot bulk load. The file "mysql.csv" does not exist.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
<ipython-input-11-267ba832f344> in <module>
      5 #uri_mssql = f'mysql+mysqlconnector://{user}:{passW}@{host}:1433/{db}'
      6 
----> 7 d6tstack.utils.pd_to_mssql(df, uri_mssql, 'prueba', if_exists='replace') # experimental

~/codigos/thomson/myenv/lib/python3.6/site-packages/d6tcollect/__init__.py in wrapper(*args, **kwargs)
     78             payload['exceptionMsg'] = str(e)
     79             _submit(payload)
---> 80             raise e
     81 
     82     return wrapper

~/codigos/thomson/myenv/lib/python3.6/site-packages/d6tcollect/__init__.py in wrapper(*args, **kwargs)
     72         _submit(payload)
     73         try:
---> 74             return func(*args, **kwargs)
     75         except Exception as e:
     76             payload['event'] = 'exception'

~/codigos/thomson/myenv/lib/python3.6/site-packages/d6tstack/utils.py in pd_to_mssql(df, uri, table_name, schema_name, if_exists, tmpfile)
    188         table_name = '{}.{}'.format(schema_name,table_name)
    189     sql_load = "BULK INSERT {} FROM '{}';".format(table_name, tmpfile)
--> 190     sql_engine.execute(sql_load)
    191 
    192     os.remove(tmpfile)

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params)
   2180 
   2181         connection = self._contextual_connect(close_with_result=True)
-> 2182         return connection.execute(statement, *multiparams, **params)
   2183 
   2184     def scalar(self, statement, *multiparams, **params):

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params)
    974         """
    975         if isinstance(object_, util.string_types[0]):
--> 976             return self._execute_text(object_, multiparams, params)
    977         try:
    978             meth = object_._execute_on_connection

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_text(self, statement, multiparams, params)
   1147             parameters,
   1148             statement,
-> 1149             parameters,
   1150         )
   1151         if self._has_events or self.engine._has_events:

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1248         except BaseException as e:
   1249             self._handle_dbapi_exception(
-> 1250                 e, statement, parameters, cursor, context
   1251             )
   1252 

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1474                 util.raise_from_cause(newraise, exc_info)
   1475             elif should_wrap:
-> 1476                 util.raise_from_cause(sqlalchemy_exception, exc_info)
   1477             else:
   1478                 util.reraise(*exc_info)

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/util/compat.py in raise_from_cause(exception, exc_info)
    396     exc_type, exc_value, exc_tb = exc_info
    397     cause = exc_value if exc_value is not exception else None
--> 398     reraise(type(exception), exception, tb=exc_tb, cause=cause)
    399 
    400 

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
    150             value.__cause__ = cause
    151         if value.__traceback__ is not tb:
--> 152             raise value.with_traceback(tb)
    153         raise value
    154 

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1244                 if not evt_handled:
   1245                     self.dialect.do_execute(
-> 1246                         cursor, statement, parameters, context
   1247                     )
   1248         except BaseException as e:

~/codigos/thomson/myenv/lib/python3.6/site-packages/sqlalchemy/engine/default.py in do_execute(self, cursor, statement, parameters, context)
    579 
    580     def do_execute(self, cursor, statement, parameters, context=None):
--> 581         cursor.execute(statement, parameters)
    582 
    583     def do_execute_no_params(self, cursor, statement, context=None):

src/pymssql.pyx in pymssql.Cursor.execute()

OperationalError: (pymssql.OperationalError) (4860, b'Cannot bulk load. The file "mysql.csv" does not exist.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n')
[SQL: BULK INSERT prueba FROM 'mysql.csv';]
(Background on this error at: http://sqlalche.me/e/e3q8)


pd_to_mssql doesn't use schema argument

example code

uri_mssql = f'mssql+pymssql://{u}:{p}@$server/$db'

d6tstack.utils.pd_to_mssql(df = df, uri = uri_mssql, table_name = 'mytable', schema_name = 'myschema', if_exists='replace')

The error says I don't have access to the schema 'dbo' when making the connection, which I assume is the default schema the connection tries to use when schema_name is set to None.

pd_to_mssql error: Cannot bulk load because the file "mysql.csv" could not be opened

The function pd_to_mssql uses BULK INSERT which requires the target file to exist on the same instance as the SQL Server. If you're running Python code on a remote machine, like your own computer, then pd_to_mssql will return the error, "Cannot bulk load because the file "mysql.csv" could not be opened". I'm hopeful you can find a workaround for this but I imagine you'll have to change your approach.
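
If the Python client and SQL Server are on different machines, one alternative (a sketch, not part of d6tstack) is to skip BULK INSERT entirely and use pandas' to_sql with pyodbc's fast_executemany, which streams rows over the normal connection; the driver name and connection string below are assumptions:

import pandas as pd
import sqlalchemy

# hypothetical connection string; requires the pyodbc driver rather than pymssql
uri = 'mssql+pyodbc://usr:pwd@server/db?driver=ODBC+Driver+17+for+SQL+Server'

# fast_executemany batches the INSERTs client-side, so no server-side file is needed
engine = sqlalchemy.create_engine(uri, fast_executemany=True)

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
df.to_sql('tablename', engine, if_exists='append', index=False)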

_mysql_connector.MySQLInterfaceError: LOAD DATA LOCAL INFILE file request rejected due to restrictions on access

I've been using d6tstack for months and have never had any problems, but recently the error message below appeared. How can I solve this without having to change the variables or permissions of my database?

I think this is related to mysqlconnector. Is there a way to use a different driver, for example pymysql?

[2022-01-07 15:40:35,524] {bash_operator.py:126} INFO - [2022-01-07 15:40:31] Insert - SaldosB2C IS
[2022-01-07 15:40:35,524] {bash_operator.py:126} INFO - creating tb_saldos_b2c_iss.csv ok
[2022-01-07 15:40:35,525] {bash_operator.py:126} INFO - loading tb_saldos_b2c_iss.csv ok
[2022-01-07 15:40:35,535] {bash_operator.py:126} INFO - Traceback (most recent call last):
[2022-01-07 15:40:35,536] {bash_operator.py:126} INFO - File "/usr/local/airflow/.local/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 519, in cmd_query
[2022-01-07 15:40:35,537] {bash_operator.py:126} INFO - query_attrs=self._query_attrs)
[2022-01-07 15:40:35,537] {bash_operator.py:126} INFO - _mysql_connector.MySQLInterfaceError: LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.
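
LOAD DATA LOCAL INFILE has to be permitted on both sides: the server needs local_infile enabled and the client has to opt in. Below is a sketch of the client-side opt-in with mysql-connector, assuming you build the engine yourself (whether your installed pd_to_mysql can be pointed at such an engine depends on the version, since it builds its own engine from the URI):

import sqlalchemy

uri = 'mysql+mysqlconnector://usr:pwd@localhost/db'  # hypothetical connection string

# mysql-connector-python disables LOCAL INFILE by default; opt in explicitly
engine = sqlalchemy.create_engine(uri, connect_args={'allow_local_infile': True})

# the server must also permit it, e.g. an admin runs: SET GLOBAL local_infile = 1;
with engine.connect() as conn:
    print(conn.execute(sqlalchemy.text("SHOW VARIABLES LIKE 'local_infile'")).fetchall())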

d6tstack.utils seems to leak database connections

I have a 200 million row dataframe that I've sliced in-memory into chunks of 200,000 rows to insert gradually.

After running d6tstack.utils.pd_to_psql for 30 minutes or so (which is the halfway mark), I will get the following error:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "...", port ... failed: FATAL: sorry, too many clients already

I added the following lines to d6tstack.utils.py and the error stopped reproducing:

sql_cnxn.close()
sql_engine.dispose()
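
Until this is fixed upstream, here is a standalone sketch of the same COPY FROM approach with explicit cleanup, based on the psycopg2/SQLAlchemy calls visible in the tracebacks on this page (the target table is assumed to exist already):

import io
import sqlalchemy

def copy_df_to_psql(df, uri, table_name):
    """COPY a dataframe into an existing Postgres table and release the connection afterwards."""
    engine = sqlalchemy.create_engine(uri)
    conn = engine.raw_connection()
    try:
        buf = io.StringIO()
        df.to_csv(buf, index=False, header=False)
        buf.seek(0)
        cur = conn.cursor()
        cur.copy_from(buf, table_name, sep=',', null='')
        conn.commit()
        cur.close()
    finally:
        conn.close()      # return the connection to the pool
        engine.dispose()  # drop the pool so repeated calls don't accumulate connections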

Issue with Schema Name

In the file:
/usr/local/lib/python3.7/site-packages/d6tstack/utils.py

def pd_to_psql(df, uri, tablename, schema_name=None, if_exists='fail'):

This references schema_name, not schema, so it throws an error when you need to use a schema. I fixed it via:

sql_engine = sqlalchemy.create_engine(uri, connect_args={'options': '-csearch_path={}'.format(schema_name)})

Please make the change in your code. Thanks.

Regards,
Cody Kirrabilli
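
Until schema_name is wired through, a possible user-side workaround (a sketch, assuming the psycopg2 driver; whether the query-string form is honoured by your SQLAlchemy version should be verified) is to set the schema via the search_path option in the connection URI, which SQLAlchemy passes on to psycopg2:

import pandas as pd
import d6tstack.utils

# append libpq options to the URI so every connection uses the desired schema
# (hypothetical credentials; '%3D' is the URL-encoded '=')
uri = 'postgresql+psycopg2://usr:pwd@localhost/db?options=-csearch_path%3Dmyschema'

df = pd.DataFrame({'a': [1]})
d6tstack.utils.pd_to_psql(df, uri, 'tablename', if_exists='append')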

Error with pd_to_mssql

Hello everybody,

Great package, but I get a strange error message when I try the above-mentioned function with a SQLAlchemy engine connected via ODBC.

See code below:
engine = create_engine(constring2)
engine.url
mssql+pymssql://XXXXXX:***@azurexxxx

import d6tstack.utils as du
du.pd_to_mssql(I_df, engine, "t_abda_pacapo01_stage", if_exists="replace")

This results in the error message below:
argument of type 'Engine' is not iterable

Any ideas how I can solve this?

thanks and br

christoph
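
The likely cause is that pd_to_mssql expects a connection URI string, not an Engine object. A minimal sketch of the change, reusing the names from the snippet above (constring2, I_df):

import d6tstack.utils as du

# pass the original connection string (or str(engine.url)) instead of the Engine
du.pd_to_mssql(I_df, constring2, 't_abda_pacapo01_stage', if_exists='replace')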

Issue on importing Dataframe into SQL Server

Hello everyone,

I am having problems with the function d6tstack.utils.pd_to_mssql() when importing a DataFrame into a Microsoft SQL Server database. The SQL table is named fxxProduktKatalogDE. I read that the table name in the function call should be lowercase, so I did that, but I get an error saying the table does not exist, even though it is there in the database. I also tried writing the table name exactly as it appears in the database, but the same error happens.

Here the code and the error (attached as screenshots, not reproduced here).

Can you help me to solve this issue ?

Thank you in advance!

Encoding error reading csv files to combine

I have csv files and I'm trying to fix different columns between them.

The function d6tstack.combine_csv.CombinerCSV works fine in other cases but now I have problems with the encoding.

I need to pass the parameter encoding="latin-1" to pandas read_csv, but it doesn't work.
In the documentation I found that it is possible to pass read_csv_params={"encoding": "latin-1"}, but that doesn't work either; the encoding isn't applied.

I tried reading the same file directly with pandas, and with the encoding parameter it works fine.

Is there another possibility to resolve this?
Thanks in advance.
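
If read_csv_params really isn't applied in your installed version, a blunt workaround (a sketch, independent of d6tstack; directory names are hypothetical) is to re-encode the files to UTF-8 first and combine the converted copies:

import glob
import pathlib

import d6tstack.combine_csv

# convert latin-1 encoded files to UTF-8 copies that CombinerCSV can read with defaults
for fname in glob.glob('data/*.csv'):
    text = pathlib.Path(fname).read_text(encoding='latin-1')
    out = pathlib.Path('data_utf8') / pathlib.Path(fname).name
    out.parent.mkdir(exist_ok=True)
    out.write_text(text, encoding='utf-8')

c = d6tstack.combine_csv.CombinerCSV(glob.glob('data_utf8/*.csv'))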

postgres and Primary Key

I have searched but had no luck; I may be missing the right terms. I have a pre-existing table with a primary key.

class District(Base):
    __tablename__ = 'districts'

    id = Column(Integer, primary_key=True, autoincrement=True)

    district_county = Column(String(30))
    district_type = Column(String(100))
    district_name = Column(String(100))

    file_date = Column(Date)

I process a bunch of CSV files with pandas and then pass the result to a function:

    def _write_df(self, df, engine, table_name) -> None:
        # df.to_sql(table_name, engine, if_exists='append', index=False, chunksize=100000)
        # add null id column
        # df.insert(0,'id','')
        d6tstack.utils.pd_to_psql(df, str(engine.url), table_name, if_exists='append', sep='\t')
        return None

But I get this error,

    File "/usr/local/lib/python3.8/site-packages/d6tstack/utils.py", line 107, in pd_to_psql
    cursor.copy_from(fbuf, table_name, sep=sep, null='')
    psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "BAKER"
    CONTEXT:  COPY districts, line 1, column id: "BAKER"

I tried adding an 'id' column with df.insert(0,'id','') but then I get this error.

psycopg2.errors.NotNullViolation: null value in column "id" of relation "districts" violates not-null constraint

What am I doing wrong? When using pandas.to_sql, I don't need an 'id' column and I use index=False in the call.
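
A likely explanation is that COPY FROM maps CSV fields to table columns positionally, so with an auto-generated id column the first data value ("BAKER") lands in the integer id column. One possible workaround (a sketch using psycopg2's copy_expert directly, not part of d6tstack) is to COPY into an explicit column list that omits id and let Postgres fill it:

import io
import sqlalchemy

def copy_without_id(df, uri, table_name):
    """COPY a dataframe into an existing table whose SERIAL id column is auto-generated."""
    engine = sqlalchemy.create_engine(uri)
    conn = engine.raw_connection()
    try:
        buf = io.StringIO()
        df.to_csv(buf, index=False, header=False)
        buf.seek(0)
        cols = ', '.join(df.columns)  # explicit column list, id deliberately omitted
        cur = conn.cursor()
        cur.copy_expert(f"COPY {table_name} ({cols}) FROM STDIN WITH (FORMAT csv)", buf)
        conn.commit()
        cur.close()
    finally:
        conn.close()
        engine.dispose()

# e.g. copy_without_id(df, str(engine.url), 'districts') in place of pd_to_psql above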

ImportError: cannot import name 'coordinate_from_string' from 'openpyxl.utils'

Hi,
I'm using:
Python: 3.7
openpyxl: 2.6.2 (latest stable)
d6tstack: 0.1.8

Traceback (most recent call last):
  File "/home/kedf/PycharmProjects/XLStoSQL/xlsxtosql.py", line 1, in <module>
    import d6tstack.convert_xls
  File "/usr/local/lib/python3.7/site-packages/d6tstack/convert_xls.py", line 12, in <module>
    from openpyxl.utils import coordinate_from_string
ImportError: cannot import name 'coordinate_from_string' from 'openpyxl.utils' (/usr/local/lib/python3.7/site-packages/openpyxl/utils/__init__.py)

In the docs of openpyxl 2.6.2, the function seems to be named openpyxl.utils.cell.coordinate_from_string:
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.utils.cell.html#openpyxl.utils.cell.coordinate_from_string

So maybe changing line 12 of d6tstack/convert_xls.py from
from openpyxl.utils import coordinate_from_string
to
from openpyxl.utils.cell import coordinate_from_string
will fix it?

Thanks

Issue with database name with "\"

Use a database name with \ in it:

#server name is someserver123\data
#database name = Comissions
#no user name and password because Trusted_Connection = yes

import pandas as pd
import d6tstack.utils

my_file = "myfile.csv"  # absolute file path
csv_file = pd.read_csv(my_file, iterator = True, chunksize= 20000, low_memory=False, dtype=str)

for i, chunk in enumerate(csv_file):
    d6tstack.utils.pd_to_mssql(chunk , 'mssql+pymssql://someserver123\data/Comissions', 'dbo.mydb', if_exists="append")

will yield the error:

UserWarning: .pd_to_mssql() is experimental, if any problems please raise an issue on https://github.com/d6t/d6tstack/issues or make a pull request
warnings.warn('.pd_to_mssql() is experimental, if any problems please raise an issue on https://github.com/d6t/d6tstack/issues or make a pull request')
Traceback (most recent call last):
File "src\pymssql.pyx", line 636, in pymssql.connect
File "src_mssql.pyx", line 1957, in _mssql.connect
File "src_mssql.pyx", line 677, in _mssql.MSSQLConnection.init
_mssql.MSSQLDriverException: Connection to the database failed for an unknown reason.

This error is familiar from pyodbc; to fix it there I use urllib and add a param. I don't know how to pull this off with d6tstack.
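
The pyodbc trick you mention can be combined with d6tstack, since pd_to_mssql just takes a URI string: build the ODBC connection string with urllib and wrap it in a mssql+pyodbc URI. A sketch with hypothetical driver and server details (whether pd_to_mssql works with the pyodbc dialect is untested):

import urllib.parse
import d6tstack.utils

# build an ODBC connection string and URL-encode it into a SQLAlchemy URI
params = urllib.parse.quote_plus(
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=someserver123\data;'
    r'DATABASE=Comissions;'
    r'Trusted_Connection=yes;'
)
uri = 'mssql+pyodbc:///?odbc_connect=' + params

for i, chunk in enumerate(csv_file):
    d6tstack.utils.pd_to_mssql(chunk, uri, 'dbo.mydb', if_exists='append')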

Error while inserting a list/array object existing in a Dataframe Column to a Postgresql table column

Providing an example to replicate this issue. Consider the following pandas DataFrame called my_data:

| id | coordinates  |
|----|--------------|
| 01 | [50.1, 68.2] |
| 02 | [52.2, 67.9] |

and consider the following postgresql table definition

CREATE TABLE coordinates (
id int2,
coordinates int8range
);

Now, even if you cast this dataframe column to the type object
my_data['coordinates'] = my_data['coordinates'].astype(object)

the following insert operation throws an error:
d6tstack.utils.pd_to_psql(my_data, db, table, schema, if_exists='replace')

The Error log is:
BadCopyFileFormat: extra data after last expected column

The same insert operation works with the DataFrame.to_sql functionality provided by pandas (via psycopg2).

Adding to a table with primary key defined with SERIAL

When I tried inserting I got an error like so:

Traceback (most recent call last):
  File "results_sql_fasttext.py", line 284, in <module>
    d6tstack.utils.pd_to_psql(aux_results, URI, 'aux_results', if_exists='append')
  File "/usr/local/lib/python3.6/dist-packages/d6tcollect/__init__.py", line 58, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/d6tstack/utils.py", line 107, in pd_to_psql
    cursor.copy_from(fbuf, table_name, sep=sep, null='')
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type integer: "ANSVAR"
CONTEXT:  COPY aux_results, line 1, column aux_id: "ANSVAR"

aux_id is the id created by the database - nothing should be inserted into it since it is auto-generated. "ANSVAR" is a string that should go into the fourth column, not the first. So it seems there are multiple issues at play.

pd_readsql_table_from_sqlengine

Hi,

there is a spelling error.

File "C:\Python37\lib\site-packages\d6tstack\utils.py", line 65, in pd_readsql_table_from_sqlengine
return pd_readsql_query_from_sqlengine(uri, "SELECT * FROM {};".fromat(table_name), schema_name=schema_name, connect_args=connect_args)
AttributeError: 'str' object has no attribute 'fromat'

error while dumping NA values

Hi,
First of all, thank you very much for such a wonderful library.
I am facing an issue while dumping data to a MySQL database.
NA and blank values end up as either NULL or blank, seemingly at random, in the database.
For example, if a value in the CSV is 'null' or 'NA', then after dumping that CSV into the database the value becomes 'NA' or blank.

Can you please address this problem?
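
Until the loader handles this consistently, one workaround (a sketch, plain pandas, with a hypothetical input file) is to normalize all NA-like strings to real missing values before dumping, so they all arrive as NULL:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv', dtype=str)  # hypothetical input file

# map the various NA spellings to a single missing-value marker before loading
df = df.replace({'': np.nan, 'NA': np.nan, 'null': np.nan, 'NULL': np.nan})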

installation on anaconda

I tried to install it as a conda package but didn't succeed, so I can't use it.
Would it be possible to create a repo for installation via Anaconda?

Automatically appending \n in last column of df

Hi,
Whenever I dump data using d6tstack, it appends a \n (escape character) at the end of each row in the last column. Any suggestions on how I can tackle such situations?
Any response would be highly appreciated.
Thanks
