robin900 / gspread-dataframe
Read/write Google spreadsheets using pandas DataFrames
License: MIT License
If get_as_dataframe does not provide an option to remove the empty rows and columns at the end (which, needless to say, are added by default when you create a spreadsheet in Google Drive), it is easier to just do
rows = worksheet.get_all_values()
df = pd.DataFrame(rows[1:], columns=rows[0])
I kind of want to try this library and see what it's about, but it leaves a bad taste in the mouth that the first thing you'd want to do when working on a real spreadsheet you have lying around is to skip what it provides.
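For what it's worth, a minimal workaround (a sketch, assuming the trailing empty rows and columns come back as all-NaN) is to drop them right after reading:

```python
from gspread_dataframe import get_as_dataframe

df = get_as_dataframe(worksheet)      # worksheet: an already-opened gspread Worksheet
df = df.dropna(axis=0, how="all")     # drop rows that are entirely empty
df = df.dropna(axis=1, how="all")     # drop columns that are entirely empty
```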
Hello,
I'm using this in combination with gspread, and I'm looking for advice on how to capture the API response from set_with_dataframe. I've posted this as a gspread-tagged question on SO, but it may be more appropriate here.
thanks
In most cases, a Google spreadsheet has 1000 rows per tab by default. I tried to read all the tabs inside the spreadsheet and got a gspread API error saying "Quota exceeded for quota metric 'Read requests' and limit 'Read requests per minute per user' of service 'sheets.googleapis.com' ...".
When I tried to print the shape of the dataframe for each tab, it showed (999, <col_num>). So it looks like get_as_dataframe() is reading all of the sheet's rows, including the blank lines. I tried the skip_blank_lines option from the documentation mentioned here, but it didn't work.
My temporary solution is setting skipfooter to 900, because luckily all of my tabs had at most 100 rows. But I feel there should be a better solution for cases where we can't guess the maximum number of rows in each tab.
The problem is in line 74: it should say `and` instead of `or`, so that if cols is not None, we check whether it is less than current_cols.
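A hypothetical illustration of that boolean fix (the helper name and body below are made up; only the condition mirrors the issue):

```python
def _limit_cols(cols, current_cols):
    # With `or`, a cols of None would wrongly satisfy the condition;
    # with `and`, we only compare when cols was actually provided.
    if cols is not None and cols < current_cols:
        return cols
    return current_cols
```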
Hi!
I want to get 39 rows of a spreadsheet as a dataframe using nrows, but it is not working and it is importing more than 1,000 rows. I tried different syntaxes, but none of them are working.
Would you be able to help me?
Thanks!
aux = ws_contact.get_all_values()
df_contact=get_as_dataframe(ws_contact,nrows=39)
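As a stop-gap while nrows misbehaves, one option (a sketch; it reads everything and trims client-side) is:

```python
from gspread_dataframe import get_as_dataframe

# Read the whole tab, drop the all-empty padding rows, then keep the first 39.
df_contact = get_as_dataframe(ws_contact).dropna(how="all").head(39)
```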
As a developer, I want to be able to add enough rows to the Google spreadsheet with gspread's add_row() method before using the function set_with_dataframe() with large datasets.
This might be due to here, in the case where I don't want to resize:
https://github.com/robin900/gspread-dataframe/blob/master/gspread_dataframe.py#L282
which calls this function: gspread_dataframe.py, line 94 at ee2aa14.
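If the worksheet is too small and you'd rather grow it yourself than let set_with_dataframe resize it, a sketch along these lines (using gspread's add_rows; df and worksheet are assumed to exist) may help:

```python
from gspread_dataframe import set_with_dataframe

needed = len(df) + 1 - worksheet.row_count     # +1 accounts for the header row
if needed > 0:
    worksheet.add_rows(needed)                 # grow the sheet before writing
set_with_dataframe(worksheet, df, resize=False)
```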
I am exporting short git sha1 references, such as "1234", "abcd", "fe3a" etc, and whenever a sha1 reference happens to be something like "7e30" or "9e50", the cells are filled with the corresponding number in scientific notation instead of the strings they are.
This stems from value_input_option='USER_ENTERED' on gspread_dataframe.py, line 254 at 9ea7aec.
Allowing one to override this option could be a workaround, or one could expect to be able to prepend the relevant cell values with ' before export (just like one would as a user entering such values in the spreadsheet).
However, due to gspread_dataframe.py, lines 58-59 at 9ea7aec, the ' gets another ' prepended, resulting in the literal value '7e30 being saved in the cell rather than 7e30.
Changing line 58 to if (not allow_formulas) and value.startswith('='): resolves this issue; however, one then needs to remember to prepend ' to the relevant cell values before every export.
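Later releases expose a string_escaping argument on set_with_dataframe (it appears in the signatures quoted elsewhere in these issues); assuming your version also accepts a 'full' mode, escaping every string cell avoids the problem without touching the data:

```python
from gspread_dataframe import set_with_dataframe

# 'full' (if supported by your version) prepends an apostrophe to every string
# cell, so short hex values such as "7e30" stay strings instead of becoming 7e+30.
set_with_dataframe(worksheet, df, string_escaping="full")
```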
The above image shows a worksheet with a single header row. If you get_as_dataframe(header=0, index_col=0), the header cell "Acronym" becomes the name for the index, while the other header cell values become the column labels.
If we instead get_as_dataframe(header=0, index_col=[0,1]), the resulting index is a MultiIndex with levels named "Acronym" and "Z", and columns is a simple Index with the other labels in the header row.
When there are multiple header rows, however:
The above image shows a worksheet with two header rows, and we get_as_dataframe(header=[0,1], index_col=0). We expect the columns to be a multi-index and we expect that the first column will become the index and have the name "Acronym". But instead, "Acronym" is interpreted as the name of the second level of the column multi-index, and the index itself has no name. The Pandas expectation, when there is more than one header row, is that header values on the index column(s) represent names for the column multi-index; if the index itself is to have a name, an additional header row beneath the column header rows is expected to contain the index's name(s), as in the following worksheet:
In all cases where an index is to be included when writing to a worksheet, if the columns are a MultiIndex, the level names, if they exist, need to be written in the first column of the worksheet (i.e. the first column of the included index):
In the above example, the two levels of the columns MultiIndex are named "Acronym" and "Type", and the index itself has level names "Ipso" and "Facto".
Also note: If columns is a MultiIndex, and include_index=False for set_with_dataframe(), there will be no space in the header row to write the columns' level names, and the expected behavior for set_as_dataframe() is not to write the level names anywhere in the worksheet.
So, our package needs to follow Pandas's expectations in our set_with_dataframe() behavior:
include_index | index has name(s) | columns is MultiIndex | columns has name(s) | behavior |
---|---|---|---|---|
False | n/a | n/a | n/a | No additional header row. Columns' name(s) never written to header even if they exist. |
True | False | False | False | No additional header row. |
True | False | False | True | No additional header row. Columns' name is NOT written in first column of header row, because pandas has no way to read it back properly. |
True | False | True | False | No additional header row. |
True | False | True | True | No additional header row. Columns' names are written in first column of header rows. |
True | True | False | False | No additional header row. Index name(s) will appear in header row atop index's columns. |
True | True | False | True | No additional header row. Nothing will appear atop index column(s) in header row. Because of pandas limitations, there is no way to write columns object's name in the header row. |
True | True | True | False | Additional header row required for index name(s). Column object has no names. |
True | True | True | True | Additional header row required for index name(s). Column object's name appears in prior header rows in first column. |
I'm using gspread_dataframe to import data from Google Sheets. I'd like to force all imported data to strings, independently of the cell type used in the worksheet.
The documentation mentions I can use all options supported by the pandas text parsing readers. In theory dtype=str or dtype=object should force all values to be preserved without interpreting them. Somehow this is not true; I'm not sure if it's a bug or I'm doing something wrong.
In the scenario below, the imported dataframe has its decimals dropped because all Amounts are in number format in the worksheet. If I change the worksheet type to 'string' the outcome is correct, but I'm trying to avoid tweaking the file before importing the data.
import gspread
import gspread_dataframe as gsframe
gsframe.get_as_dataframe(
worksheet=sheet,
header=0,
dtype=str,
usecols=cols,
skiprows=row_offset,
skip_blank_lines=True).dropna(axis = 0, how = 'all').fillna('')
Worksheet Imported Dataframe
string numbers string string
Name Amount Name Amount
A -25.00 A -25
B -63.00 B -63
C 20.00 C 20
D -10.00 D -10
▲ dropped decimals
Expected outcome
Worksheet Imported Dataframe
string numbers string string
Name Amount Name Amount
A -25.00 A -25.00
B -63.00 B -63.00
C 20.00 C 20.00
D -10.00 D -10.00
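One way to keep exactly what the sheet displays (a sketch that bypasses get_as_dataframe and keeps every cell as text; it assumes a single header row) is to build the frame from gspread's formatted string values:

```python
import pandas as pd

rows = sheet.get_all_values()   # gspread returns the displayed strings, e.g. "-25.00"
df = pd.DataFrame(rows[1:], columns=rows[0], dtype=str)
```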
Deprecation warnings are raised due to invalid escape sequences on Python 3.8. Below is a log of the warnings raised while compiling all the Python files. Using raw strings or escaping the backslashes will fix this issue.
find . -iname '*.py' | xargs -P 4 -I{} python -Wall -m py_compile {}
./gspread_dataframe.py:121: DeprecationWarning: invalid escape sequence \*
"""
I relatively frequently try to use get_as_dataframe to pull values in from a gsheet, do some operation on them, and then update only a single column in the original gsheet with set_with_dataframe. However, I always get a ValueError when I do so, as in the stack trace below:
Traceback (most recent call last):
File "<string>", line 19, in __PYTHON_EL_eval
File "~/Documents/Phon_Utils/scratch.py", line 16, in <module>
File "~/Documents/Phon_Utils/venv/lib/python3.11/site-packages/gspread_dataframe.py", line 262, in set_with_dataframe
y, x = dataframe.shape
^^^^
ValueError: not enough values to unpack (expected 2, got 1)
Would it be possible to update the function so that it can also accept dataframes that are only a single column wide as input?
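For what it's worth, a single-column DataFrame already has a two-element shape, so the unpacking error suggests a pandas Series was passed instead; converting it first may be enough (a sketch; df and worksheet are assumed):

```python
from gspread_dataframe import set_with_dataframe

col = df["some_column"]                        # a Series has shape (n,), which trips the unpack
set_with_dataframe(worksheet, col.to_frame())  # a one-column DataFrame has shape (n, 1)
```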
gspread just released 3.0.0, which has a very different internal structure and API, and most thrillingly uses the v4 Sheets API, which offers cell-styling powers previously unavailable to gspread users.
My general plan is to make a breaking change to gspread-dataframe to support only gspread 3.0.0 and later, and release it as a major version increase. Bugfix releases to 2.1, if ever necessary, I'll make from a 2.1 maintenance branch.
I am getting this error; what should I do? I am using Google Colab.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-61-a16ddb767c75> in <module>()
1 #Export to Google Sheets / Part 2 Export
----> 2 from gspread_dataframe import get_as_dataframe, set_with_dataframe
3
4 sh = gc.create('filename')
5 worksheet = gc.open('filename').sheet1
/usr/local/lib/python3.7/dist-packages/gspread_dataframe.py in <module>()
11 """
12 from gspread.utils import fill_gaps
---> 13 from gspread.models import Cell
14 import logging
15 import re
Gspread 6.0.0 was released yesterday and it seems to have some breaking changes that affect gspread-dataframe.
Code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
service_account = gspread.service_account(filename = "xxx.json")
spreadsheet = service_account.open("SPREADSHEET")
mapping_id = spreadsheet.worksheet("WORKSHEET")
df = get_as_dataframe(mapping_id)
AttributeError: 'Worksheet' object has no attribute 'spreadsheet'
I think it is due to a change in gspread: there is no longer a .spreadsheet attribute on the Worksheet object. For anyone having the same issue, pinning gspread back to 5.12.4 works.
The discussion in #1 resulted in a plan to have get_as_dataframe use the available TextParser in pandas.io.parsers, and thus offer all of the parsing options TextParser supports, instead of designing our own set of parsing options. To switch to using TextParser, we also need to make sure evaluate_formulas is handled properly.
I am using get_as_dataframe to read a sheet. If the sheet name is "CRM26" the extraction fails with:
{'code': 400, 'message': 'Range (query!CRM26) exceeds grid limits. Max rows: 10002, max columns: 35', 'status': 'INVALID_ARGUMENT'}
If the sheet name is "CRM", "CR26M" or "CRM 26" the extraction is successful.
Got an error on what I think is a column with a list in it:
File "get_paypal.py", line 789, in main
donor, wkb='OTAT Donor Database', wks='DonorLevel')
File "/Users/---/Desktop/repos/otat/config.py", line 246, in upload_to_google
set_with_dataframe(worksheet, df, include_index=True)
File "/usr/local/lib/python2.7/site-packages/gspread_dataframe.py", line 185, in set_with_dataframe
_cellrepr(cell_value, allow_formulas))
File "/usr/local/lib/python2.7/site-packages/gspread_dataframe.py", line 51, in _cellrepr
if pd.isnull(value):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
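A possible workaround (a sketch, assuming the offending column really holds Python lists) is to flatten those cells to strings before the upload:

```python
from gspread_dataframe import set_with_dataframe

# Join list cells into a single comma-separated string so pd.isnull sees scalars.
df = df.applymap(lambda v: ", ".join(map(str, v)) if isinstance(v, list) else v)
set_with_dataframe(worksheet, df, include_index=True)
```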
It would be nice to be able to specify the cell format when writing (e.g. % or date, as well as the precision).
When allow_formulas = False and a DataFrame cell contains a value like "=A", the code at line 88 tries to escape the value (to prepend an apostrophe '), but this escaped value is not returned from the function anywhere, so the return value is None.
The problematic line: gspread_dataframe.py, line 88 at d1ec66d. The line should read:
return "'%s" % value
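In context, the corrected branch would look roughly like this (a hypothetical simplification of the helper, not the exact source):

```python
def _cellrepr(value, allow_formulas):
    # The essential point of the fix: the escaped value must be returned,
    # not merely computed and discarded.
    value = str(value)
    if (not allow_formulas) and value.startswith("="):
        return "'%s" % value
    return value
```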
Describe the bug
Submitted by @sam-s and copied here by me:
Uploading a DataFrame with int64 columns fails with a TypeError.
Version and Environment
Version of package: 0.3.7
Python interpreter:
Python 3.9.1 (default, Dec 10 2020, 10:36:35)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
and Python 3.6.9
on Linux 5.4.67
OS: MacOS, Linux
To Reproduce
import pandas as pd
df = pd.DataFrame({"a":[1,2,3]})
set_with_dataframe(...,df)
produces this error:
---> 59 set_with_dataframe(ws, df, include_index=True, resize=True)
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread_dataframe.py in set_with_dataframe(worksheet, dataframe, row, col, include_index, include_column_header, resize, allow_formulas, string_escaping)
324
325 resp = worksheet.update_cells(
--> 326 cells_to_update, value_input_option="USER_ENTERED"
327 )
328 logger.debug("Cell update response: %s", resp)
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/models.py in update_cells(self, cell_list, value_input_option)
907 range_name,
908 params={'valueInputOption': value_input_option},
--> 909 body={'values': values_rect},
910 )
911
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/models.py in values_update(self, range, params, body)
233 """
234 url = SPREADSHEET_VALUES_URL % (self.id, quote(range))
--> 235 r = self.client.request('put', url, params=params, json=body)
236 return r.json()
237
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/client.py in request(self, method, endpoint, params, data, json, files, headers)
65 data=data,
66 files=files,
---> 67 headers=headers,
68 )
69
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in put(self, url, data, **kwargs)
600 """
601
--> 602 return self.request('PUT', url, data=data, **kwargs)
603
604 def patch(self, url, data=None, **kwargs):
~/.virtualenvs/brazeds/lib/python3.6/site-packages/google/auth/transport/requests.py in request(self, method, url, data, headers, max_allowed_time, timeout, **kwargs)
468 headers=request_headers,
469 timeout=timeout,
--> 470 **kwargs
471 )
472 remaining_time = guard.remaining_timeout
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
526 hooks=hooks,
527 )
--> 528 prep = self.prepare_request(req)
529
530 proxies = proxies or {}
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in prepare_request(self, request)
464 auth=merge_setting(auth, self.auth),
465 cookies=merged_cookies,
--> 466 hooks=merge_hooks(request.hooks, self.hooks),
467 )
468 return p
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/models.py in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks, json)
317 self.prepare_headers(headers)
318 self.prepare_cookies(cookies)
--> 319 self.prepare_body(data, files, json)
320 self.prepare_auth(auth, url)
321
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/models.py in prepare_body(self, data, files, json)
467 # provides this natively, but Python 3 gives a Unicode string.
468 content_type = 'application/json'
--> 469 body = complexjson.dumps(json)
470 if not isinstance(body, bytes):
471 body = body.encode('utf-8')
/usr/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
/usr/lib/python3.6/json/encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
/usr/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/usr/lib/python3.6/json/encoder.py in default(self, o)
178 """
179 raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180 o.__class__.__name__)
181
182 def encode(self, o):
TypeError: Object of type 'int64' is not JSON serializable
Expected behavior
data is uploaded
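Until the underlying conversion is fixed, one workaround that appears to avoid the error (a sketch; worksheet is assumed to be an open gspread Worksheet) is to cast the frame to plain Python objects first:

```python
import pandas as pd
from gspread_dataframe import set_with_dataframe

df = pd.DataFrame({"a": [1, 2, 3]})
# astype(object) turns the numpy.int64 values into plain Python ints,
# which the standard json encoder used by gspread can serialize.
set_with_dataframe(worksheet, df.astype(object), include_index=True, resize=True)
```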
This is an enhancement proposal.
For my use case, it could be nice if gspread-dataframe were able to try to infer column dtypes when fetching data from a sheet. While individual cells are converted through numericise, their column dtype remains object, and the returned dataframe fails equality checks with the original dataframe.
>>> df = pd.DataFrame({'a': [4,1,2,4],
... 'b': list('abba')},
... index=pd.Index(list('ABCD'), name='our index'))
>>> df
a b
our index
A 4 a
B 1 b
C 2 b
D 4 a
>>> df.dtypes
a int64
b object
dtype: object
>>> ws = # Get a test worksheet here
>>> set_with_dataframe(ws, df, include_index=True, resize=True)
>>> r = get_as_dataframe(ws, index_column_number=1)
>>> r # Looks as expected
a b
our index
A 4 a
B 1 b
C 2 b
D 4 a
>>> r.dtypes # All object dtype
a object
b object
dtype: object
>>> [type(v) for v in r['a']] # correctly converted to int
[int, int, int, int]
>>> df.equals(r) # The equality check fails
False
>>> df['a'].equals(r['a']) # because of the dtype of column 'a'.
False
>>> df['a'] == r['a'] # The values *are* the same, though.
our index
A True
B True
C True
D True
Name: a, dtype: bool
>>> df['b'].equals(r['b']) # str works as expected
True
I am unsure what the best way to deal with this is, and whether it is a general enough use case to warrant an addition to gspread-dataframe. At any rate, the following code is my initial stab at how dtype inference could be implemented:
import pandas as pd
converters = (
pd.to_numeric,
pd.to_timedelta,
pd.to_datetime,
)
def _assign_column_dtypes(df):
    # Try each converter in turn on columns that are still plain object dtype;
    # errors='ignore' leaves a column unchanged when conversion fails.
    for conv in converters:
        for col in df:
            if df[col].dtype != object:
                continue
            df[col] = conv(df[col], errors='ignore')
    return df
It intentionally places timedelta before datetime, as '00:03:00' can be interpreted as either one by pandas. In my use-case, datetimes always include a date, so '00:03:00' would definitely be a timedelta.
# Construct a dataframe where everything is either str or object
n = 10
df = pd.DataFrame({
'datetime str': pd.date_range('2017-03-15', freq='D', periods=n
).astype(str),
'timedelta str': pd.timedelta_range('00:03:00', periods=n, freq='10 s'
).to_native_types().astype(str),
'int obj': pd.Series(range(n), dtype=object),
'int str': [str(i) for i in range(n)],
'float obj': pd.Series(map(float, range(n)), dtype=object),
'float str': [str(float(i)) for i in range(n)],
})
print(df)
# datetime str float obj float str int obj int str timedelta str
# 0 2017-03-15 0 0.0 0 0 00:03:00
# 1 2017-03-16 1 1.0 1 1 00:03:10
# 2 2017-03-17 2 2.0 2 2 00:03:20
# 3 2017-03-18 3 3.0 3 3 00:03:30
# 4 2017-03-19 4 4.0 4 4 00:03:40
# 5 2017-03-20 5 5.0 5 5 00:03:50
# 6 2017-03-21 6 6.0 6 6 00:04:00
# 7 2017-03-22 7 7.0 7 7 00:04:10
# 8 2017-03-23 8 8.0 8 8 00:04:20
# 9 2017-03-24 9 9.0 9 9 00:04:30
print(df.dtypes)
# datetime str object
# float obj object
# float str object
# int obj object
# int str object
# timedelta str object
# dtype: object
df = _assign_column_dtypes(df)
print(df)
# datetime str float obj float str int obj int str timedelta str
# 0 2017-03-15 0.0 0.0 0 0 00:03:00
# 1 2017-03-16 1.0 1.0 1 1 00:03:10
# 2 2017-03-17 2.0 2.0 2 2 00:03:20
# 3 2017-03-18 3.0 3.0 3 3 00:03:30
# 4 2017-03-19 4.0 4.0 4 4 00:03:40
# 5 2017-03-20 5.0 5.0 5 5 00:03:50
# 6 2017-03-21 6.0 6.0 6 6 00:04:00
# 7 2017-03-22 7.0 7.0 7 7 00:04:10
# 8 2017-03-23 8.0 8.0 8 8 00:04:20
# 9 2017-03-24 9.0 9.0 9 9 00:04:30
print(df.dtypes)
# datetime str datetime64[ns]
# float obj float64
# float str float64
# int obj int64
# int str int64
# timedelta str timedelta64[ns]
# dtype: object
From an earlier issue (#20), @sam-s said:
In this case, I would like to have a separate column (with a header!) for each level.
I am torn between
foo one
foo two
and
foo one
    two
I think there should be an option to select one of them, the default being the first variant.
The same goes for column indexes.
I might be in the wrong spot for this, but I keep getting the following error:
AttributeError: 'Client' object has no attribute 'get_cells_feed'
This happens whenever I try to write to my Google Spreadsheet using any DataFrame.
Sometimes it happens in the resize_to_minimum function, and other times it happens in the _get_all_values function, but I believe it errors as soon as it tries to read the client.worksheet.
This package is popular, but the gspread-formatting package is less so. I know from personal experience that writing tabular data to a gspread worksheet is pleasant only when proper formatting is applied to the worksheet -- mainly for bolding and shading header rows, and applying column alignment appropriate to the data type of the values for each column. Freezing the header rows is nice to have, especially when the data is more than 50 rows.
I wonder whether this enhancement is best accomplished by importing gspread_formatting as a new dependency, or by including bare-bones formatting functionality in this package without a new dependency on gspread_formatting.
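For reference, a minimal sketch of the dependency route for the header-row case (worksheet is an open gspread Worksheet; the class and function names below are my recollection of the gspread_formatting API and are worth double-checking):

```python
from gspread_formatting import cellFormat, textFormat, format_cell_range, set_frozen

header_fmt = cellFormat(textFormat=textFormat(bold=True))
format_cell_range(worksheet, "A1:Z1", header_fmt)   # bold the header row
set_frozen(worksheet, rows=1)                       # keep the header visible while scrolling
```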
I'm seeing the following error when trying to build documentation on ReadTheDocs for a package that depends on gspread-dataframe. I'm not sure what the issue is.
I can reproduce by running python setup.py install --force on this package: https://github.com/open-contracting/kingfisher-colab/blob/master/setup.py
Searching for gspread-dataframe
Reading https://pypi.python.org/simple/gspread-dataframe/
Downloading https://files.pythonhosted.org/packages/a5/60/274ba137f89b4cbbb3695a544a2ef48c64e7ab8072a0d3542f6cdb5d904d/gspread-dataframe-3.0.5.tar.gz#sha256=97170b36e2e92fa93102e6c73a19e2da1223eec6c11c08b48e3fdb56c3043790
Best match: gspread-dataframe 3.0.5
Processing gspread-dataframe-3.0.5.tar.gz
Writing /var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/setup.cfg
Running gspread-dataframe-3.0.5/setup.py -q bdist_egg --dist-dir /var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/egg-dist-tmp-8r41z5mh
error: [Errno 2] No such file or directory: '/var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/VERSION'
Hi!
I use this library very often and realized that the value of the variable WORKSHEET_MAX_CELL_COUNT is outdated. Google has increased the cell limit to 10 million cells. I would like to ask if the max cell count limit can be increased from 5,000,000 to 10,000,000.
Thanks!
I have an issue, not sure if it is in get_as_dataframe() or set_with_dataframe(), with a worksheet where one column will hold ('? | '+ | '-). I suspect it is in get_as_dataframe().
The gspread-dataframe library has no issue with the question mark; however, for the plus (+) and minus (-) I have taken steps to ensure that these characters are written as strings by prefixing them with an apostrophe. When gspread-dataframe does a get_as_dataframe() or a set_with_dataframe(), these string characters become non-string characters, instead becoming math operations that result in #ERROR in the worksheet cells.
My program first performs a get_as_dataframe(), then appends to that dataframe, and then does a set_with_dataframe(). I suspect the issue is actually in get_as_dataframe(), because the newly appended dataframe items are written out to the worksheet correctly.
Hierarchical indexes are uploaded as tuples.
E.g., DataFrame
first second third
bar one 1
two 2
baz one 3
two 4
is uploaded as a 2-column sheet
third
("bar","one") 1
("bar","two") 2
("baz","one") 3
("baz","two") 4
instead of 3 columns
Hello
I am trying to add 2 dataframes to a Google Sheet (in separate tabs) using the set_with_dataframe function.
Here are the shapes of those dataframes:
(41967, 14) (200000, 4)
This gives 1,387,538 cells in total.
However I get the following error:
Traceback (most recent call last):
  File "dynamo_to_google_sheet.py", line 144, in <module>
    set_with_dataframe(output_sheet, df)
  File "venv/lib/python3.8/site-packages/gspread_dataframe.py", line 247, in set_with_dataframe
    _resize_to_minimum(worksheet, y, x)
  File "venv/lib/python3.8/site-packages/gspread_dataframe.py", line 108, in _resize_to_minimum
    worksheet.resize(rows, cols)
  File "venv/lib/python3.8/site-packages/gspread/models.py", line 1272, in resize
    return self.spreadsheet.batch_update(body)
  File "venv/lib/python3.8/site-packages/gspread/models.py", line 145, in batch_update
    r = self.client.request(
  File "venv/lib/python3.8/site-packages/gspread/client.py", line 73, in request
    raise APIError(response)
gspread.exceptions.APIError: {'code': 400, 'message': 'Invalid requests[0].updateSheetProperties: This action would increase the number of cells in the workbook above the limit of 5000000 cells.', 'status': 'INVALID_ARGUMENT'}
Since my dataframe sizes do not reach the quota, I think the error message is misleading. Are there other limitations, or am I doing something wrong?
Thanks for your help !
Hi there,
I'm trying to get the content of a GSheet as dataframe.
The GSheet contains something like:
v1 | v2 | v3 |
---|---|---|
A | TRUE | FALSE |
B | TRUE | FALSE |
C | FALSE | TRUE |
D | TRUE | TRUE |
Then I get the worksheet loaded as in the tutorial:
gsheet = gc.open_by_url(url)
worksheet = gsheet.worksheet("Sheet1")
df = get_as_dataframe(worksheet, parse_dates=True, header=0)
But the output shows not the booleans I'd expect, but floats of 0.0 or 1.0.
v1 | v2 | v3 |
---|---|---|
A | 1.0 | 0.0 |
B | 1.0 | 0.0 |
C | 0.0 | 1.0 |
D | 1.0 | 1.0 |
Should I pass some parameter to the function? Is there a solution to overcome this?
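Since get_as_dataframe forwards keyword arguments to the pandas text parser, one possible workaround (a sketch; the column names are the ones from the example above) is to pass converters for the boolean columns:

```python
from gspread_dataframe import get_as_dataframe

to_bool = lambda v: str(v).strip().upper() == "TRUE"
df = get_as_dataframe(worksheet, header=0,
                      converters={"v2": to_bool, "v3": to_bool})
```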
A writer object would mainly serve to automate the arguments passed to get_as_dataframe, but also, to a lesser degree, those for set_with_dataframe:
from gspread_dataframe import make_reader_writer
ws = some_worksheet
df = some_dataframe
df2 = some_dataframe_with_same_columns
reader_writer = make_reader_writer(example=df, resize=True, include_index=True)
reader_writer.write(ws, df2)
df3 = reader_writer.read(ws)
print(reader_writer.get_as_dataframe_call_repr(numpy_alias='np'))
>>> "get_as_dataframe(worksheet, header=0, indexcol=0, dtype={'col1': object, 'col2': np.int64, 'col3': np.datetime64})"
Hi,
I'm getting this error message on:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
I'm using the latest gspread release (3.0.0) which doesn't seem to contain the ns module
Hey, I am dealing with this behavior and I don't know if it's a bug or I am doing something wrong:
import pandas as pd
import numpy as np
import gspread
import gspread_dataframe
from gspread_dataframe import set_with_dataframe
gc = gspread.service_account()
sh = gc.open('pandas_spreadsheet')
worksheet = sh.get_worksheet(0)
test_df = pd.DataFrame.from_records([{'a': i/500 + np.random.randn(), 'b': i * 2} for i in np.arange(0,1000,100)])
test_df
set_with_dataframe(worksheet, test_df)
And if I go to cell A1 and copy it, the cell value is -9622382525353560.
I thought it was something with the formatting, but this number is not a float.
If I export the df using
set_with_dataframe(worksheet, test_df.round(3))
I obtain the following:
And this is the same test in a new blank worksheet (to avoid any kind of previous formatting problems):
Please let me know if there is something that I can do to help you with a PR.
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
class GoogleSheet():
scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
#creds = ServiceAccountCredentials.from_json_keyfile_name("creds.json", scope)
creds = gspread.service_account()
#client = gspread.authorize(creds)
sh = creds.open("DI ITEM SHEET") # Open the spreadhseet
wks = sh.worksheet("2022-APR") # Open the workhseet
data = wks.get_all_records()
This file runs successfully when run directly in the console, but gives an error on the server: No module named 'gspread'.
Just updated the gspread library today (5.0.0) and a few scripts broke.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-b68f22640753> in <module>
----> 1 import gspread_dataframe
~\anaconda3\lib\site-packages\gspread_dataframe.py in <module>
11 """
12 from gspread.utils import fill_gaps
---> 13 from gspread.models import Cell
14 import pandas as pd
15 from pandas.io.parsers import TextParser
ModuleNotFoundError: No module named 'gspread.models'
I am trying to write my pandas dataframe to a Google sheet using the gspread_dataframe package. Adding include_index=True should add an extra column to the sheet with the index from the dataframe, but it does not add it for me. My code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
df=pd.read_csv(r'C:\Users\Laila\Desktop\rawData18Sheet.csv',thousands=',',
index_col='Client',parse_dates=True,encoding = "ISO-8859-1",usecols=columns)
test_sheet = client.open("2018 Raw Data").worksheet("test_sheet")
set_with_dataframe(test_sheet, df,resize=True,
include_index=True)
I did set the index when I read the dataframe from a csv, and pulling the index with df.index.values gives me the values I want in the index of the google sheet.
import pandas as pd
import gspread
import gspread_dataframe
from gspread_dataframe import set_with_dataframe
gc = gspread.service_account()
sh = gc.open('pandas_spreadsheet')
test_dt = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
set_with_dataframe(sh, test_df)
And I am obtaining the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-329ece69d8d4> in <module>
----> 1 set_with_dataframe(sh, test_df) #-> THIS EXPORTS YOUR DATAFRAME TO THE GOOGLE SHEET
~/HIPCAM/env_hipcam_gpu/lib/python3.7/site-packages/gspread_dataframe.py in set_with_dataframe(worksheet, dataframe, row, col, include_index, include_column_header, resize, allow_formulas, string_escaping)
235 worksheet.resize(y, x)
236 else:
--> 237 _resize_to_minimum(worksheet, y, x)
238
239 updates = []
~/HIPCAM/env_hipcam_gpu/lib/python3.7/site-packages/gspread_dataframe.py in _resize_to_minimum(worksheet, rows, cols)
98 # get the current size
99 current_cols, current_rows = (
--> 100 worksheet.col_count,
101 worksheet.row_count
102 )
AttributeError: 'Spreadsheet' object has no attribute 'col_count'
I am using
gspread 3.6.0 pypi_0 pypi
gspread-dataframe 3.1.0 pypi_0 pypi
Edit: if I add values to the spreadsheet by hand, I am able to get them using gspread.
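The traceback points at col_count, which is a Worksheet attribute, so it looks like the Spreadsheet object (sh) is being passed where a Worksheet is expected (note also the test_dt/test_df name mismatch in the snippet); a sketch of the corrected call:

```python
import pandas as pd
from gspread_dataframe import set_with_dataframe

worksheet = sh.get_worksheet(0)   # pass a Worksheet, not the Spreadsheet object
test_df = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
set_with_dataframe(worksheet, test_df)
```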
Hi,
I've got a worksheet with True/False columns. I'd like to point out that when I try get_as_dataframe, the True/False columns are sometimes parsed as 0's and 1's in the dataframe.
Thanks for your work.
I am getting formulas in my pandas DataFrame instead of the values:
ValueError: could not convert string to float: '=iferror(BB3/W3,"")'
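If the goal is the computed cell values rather than the formula text, get_as_dataframe's evaluate_formulas option (mentioned elsewhere in these issues) should help; a sketch:

```python
from gspread_dataframe import get_as_dataframe

# evaluate_formulas=True requests the calculated values, so '=iferror(BB3/W3,"")'
# comes back as its result instead of as the formula string.
df = get_as_dataframe(worksheet, evaluate_formulas=True)
```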
Hey @robin900 ,
we (@AndreCimander and I) might have found a bug which leads to the following error.
The error is caused by this commit's (00d2de4) changes in gspread_dataframe.py:53.
As a default pandas DataFrame casts integers to numpy.int64, your changes make the JSON serialization fail; the following information reproduces the error.
Rolling back to version 3.1.0 fixes the issue, as there numpy.int64 is properly cast to str before insertion.
Code:
test_dataframe = pd.DataFrame({"column1": [6, 6, 7], "column2": [4, 4, 4]})
[...]
gspread_dataframe.set_with_dataframe(
worksheet, dataframe, row=start_row, col=1, include_column_header=use_header
)
Error message: TypeError: Object of type int64 is not JSON serializable
Stack trace:
google_sheets_api/sheet.py:85: in append_pandas_dataframe_to_sheet
gspread_dataframe.set_with_dataframe(
venv/lib/python3.8/site-packages/gspread_dataframe.py:299: in set_with_dataframe
resp = worksheet.update_cells(cells_to_update, value_input_option='USER_ENTERED')
venv/lib/python3.8/site-packages/gspread/models.py:906: in update_cells
data = self.spreadsheet.values_update(
venv/lib/python3.8/site-packages/gspread/models.py:235: in values_update
r = self.client.request('put', url, params=params, json=body)
venv/lib/python3.8/site-packages/gspread/client.py:61: in request
response = getattr(self.session, method)(
venv/lib/python3.8/site-packages/requests/sessions.py:590: in put
return self.request('PUT', url, data=data, **kwargs)
venv/lib/python3.8/site-packages/google/auth/transport/requests.py:464: in request
response = super(AuthorizedSession, self).request(
venv/lib/python3.8/site-packages/requests/sessions.py:516: in request
prep = self.prepare_request(req)
venv/lib/python3.8/site-packages/requests/sessions.py:449: in prepare_request
p.prepare(
venv/lib/python3.8/site-packages/requests/models.py:317: in prepare
self.prepare_body(data, files, json)
venv/lib/python3.8/site-packages/requests/models.py:467: in prepare_body
body = complexjson.dumps(json)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py:231: in dumps
return _default_encoder.encode(obj)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:199: in encode
chunks = self.iterencode(o, _one_shot=True)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:257: in iterencode
return _iterencode(o, 0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <json.encoder.JSONEncoder object at 0x7f9038667790>, o = 1
def default(self, o):
"""Implement this method in a subclass such that it returns
a serializable object for ``o``, or calls the base implementation
(to raise a ``TypeError``).
For example, to support arbitrary iterators, you could
implement default like this::
def default(self, o):
try:
iterable = iter(o)
except TypeError:
pass
else:
return list(iterable)
# Let the base class default method raise the TypeError
return JSONEncoder.default(self, o)
"""
> raise TypeError(f'Object of type {o.__class__.__name__} '
f'is not JSON serializable')
E TypeError: Object of type int64 is not JSON serializable
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:179: TypeError
If you need further information or assistance, please let me know.
Hi!
I frequently use this library; thanks to it, my work goes smoothly.
I've had many use cases that call for an "append" feature for a manually updated existing sheet. I implemented this from scratch using the underlying gspread features, but I believe a sort of "append" feature could help many engineers' work.
If you think this wouldn't cause inconsistency with the existing code, I'd love to introduce it!
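For reference, a minimal sketch of how such an append could work on top of the existing API (assuming the sheet already has a header row and the dataframe has matching columns):

```python
from gspread_dataframe import set_with_dataframe

def append_dataframe(worksheet, df):
    # Find the first empty row below the existing data, then write the frame
    # there without repeating the header and without shrinking the sheet.
    next_row = len(worksheet.get_all_values()) + 1
    set_with_dataframe(worksheet, df, row=next_row,
                       include_column_header=False, resize=False)
```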
Consider this scenario where worksheet and dataframe have duplicated data:
Worksheet Dataframe
0 a0 b0 c0 b0 c0 d0
1 a1 b1 c1 b1 c1 d1
Would it be possible to add an option to eliminate the duplicated values (in this case b0, c0, b1, c1) when uploading the data with set_with_dataframe?