robin900 / gspread-dataframe
Read/write Google spreadsheets using pandas DataFrames
License: MIT License
If get_as_dataframe does not provide an option to remove the empty rows and columns at the end (which, needless to say, are added by default when you create a spreadsheet in Google Drive), it is easier to just do
rows = worksheet.get_all_values()
df = pd.DataFrame(rows[1:], columns=rows[0])
I kind of want to try this library and see what it's about, but it leaves a bad taste in the mouth that the first thing you'd want to do when working on a real spreadsheet you have lying around is to skip what it provides.
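For what it's worth, a minimal workaround (a sketch, assuming the trailing empty rows and columns come back as all-NaN) is to drop them right after reading:

```python
from gspread_dataframe import get_as_dataframe

df = get_as_dataframe(worksheet)      # worksheet: an already-opened gspread Worksheet
df = df.dropna(axis=0, how="all")     # drop rows that are entirely empty
df = df.dropna(axis=1, how="all")     # drop columns that are entirely empty
```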
Hello,
I'm using this in combination with gspread, and I'm looking for advice on how to capture the API response from set_with_dataframe. I've posted this as a gspread-tagged question on SO, but it may be more appropriate here.
thanks
In most cases, a Google spreadsheet has 1000 rows per tab by default. I tried to read all the tabs inside the spreadsheet and got a gspread API error saying "Quota exceeded for quota metric 'Read requests' and limit 'Read requests per minute per user' of service 'sheets.googleapis.com' ...".
When I tried to print the shape of the dataframe for each tab, it showed (999, <col_num>). So it looks like get_as_dataframe() is reading all of the sheet's rows, including the blank lines. I tried the skip_blank_lines option from the documentation mentioned here, but it didn't work.
My temporary solution is setting skipfooter to 900, because luckily all of my tabs had at most 100 rows. But I feel there should be a better solution for cases where we can't guess the maximum number of rows in each tab.
The problem is in line 74: it should say `and` instead of `or`, so that if cols is not None, we check whether it is less than current_cols.
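A hypothetical illustration of that boolean fix (the helper name and body below are made up; only the condition mirrors the issue):

```python
def _limit_cols(cols, current_cols):
    # With `or`, a cols of None would wrongly satisfy the condition;
    # with `and`, we only compare when cols was actually provided.
    if cols is not None and cols < current_cols:
        return cols
    return current_cols
```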
Hi!
I want to get 39 rows of a spreadsheet as a dataframe using nrows, but it is not working and it is importing more than 1,000 rows. I tried different syntaxes, but none of them are working.
Would you be able to help me?
Thanks!
aux = ws_contact.get_all_values()
df_contact=get_as_dataframe(ws_contact,nrows=39)
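As a stop-gap while nrows misbehaves, one option (a sketch; it reads everything and trims client-side) is:

```python
from gspread_dataframe import get_as_dataframe

# Read the whole tab, drop the all-empty padding rows, then keep the first 39.
df_contact = get_as_dataframe(ws_contact).dropna(how="all").head(39)
```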
As a developer, I want to be able to add enough rows to the Google spreadsheet with gspread's add_row() method before using the function set_with_dataframe() with large datasets.
This might be due to here, in the case where I don't want to resize:
https://github.com/robin900/gspread-dataframe/blob/master/gspread_dataframe.py#L282
which calls this function: gspread_dataframe.py, line 94 at ee2aa14.
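If the worksheet is too small and you'd rather grow it yourself than let set_with_dataframe resize it, a sketch along these lines (using gspread's add_rows; df and worksheet are assumed to exist) may help:

```python
from gspread_dataframe import set_with_dataframe

needed = len(df) + 1 - worksheet.row_count     # +1 accounts for the header row
if needed > 0:
    worksheet.add_rows(needed)                 # grow the sheet before writing
set_with_dataframe(worksheet, df, resize=False)
```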
I am exporting short git sha1 references, such as "1234", "abcd", "fe3a" etc, and whenever a sha1 reference happens to be something like "7e30" or "9e50", the cells are filled with the corresponding number in scientific notation instead of the strings they are.
This stems from value_input_option='USER_ENTERED' on gspread_dataframe.py, line 254 at 9ea7aec.
Allowing one to override this option could be a workaround, or one could expect to be able to prepend the relevant cell values with ' before export (just like one would as a user entering such values in the spreadsheet).
However, due to gspread_dataframe.py, lines 58-59 at 9ea7aec, the ' gets another ' prepended, resulting in the literal value '7e30 being saved in the cell rather than 7e30.
Changing line 58 to if (not allow_formulas) and value.startswith('='): resolves this issue; however, one then needs to remember to prepend ' to the relevant cell values before every export.
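Later releases expose a string_escaping argument on set_with_dataframe (it appears in the signatures quoted elsewhere in these issues); assuming your version also accepts a 'full' mode, escaping every string cell avoids the problem without touching the data:

```python
from gspread_dataframe import set_with_dataframe

# 'full' (if supported by your version) prepends an apostrophe to every string
# cell, so short hex values such as "7e30" stay strings instead of becoming 7e+30.
set_with_dataframe(worksheet, df, string_escaping="full")
```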
The above image shows a worksheet with a single header row. If you get_as_dataframe(header=0, index_col=0), the header cell "Acronym" becomes the name for the index, while the other header cell values become the column labels.
If we instead get_as_dataframe(header=0, index_col=[0,1]), the resulting index is a MultiIndex with levels named "Acronym" and "Z", and columns is a simple Index with the other labels in the header row.
When there are multiple header rows, however:
The above image shows a worksheet with two header rows, and we get_as_dataframe(header=[0,1], index_col=0). We expect the columns to be a multi-index and we expect that the first column will become the index and have the name "Acronym". But instead, "Acronym" is interpreted as the name of the second level of the column multi-index, and the index itself has no name. The Pandas expectation, when there is more than one header row, is that header values on the index column(s) represent names for the column multi-index; if the index itself is to have a name, an additional header row beneath the column header rows is expected to contain the index's name(s), as in the following worksheet:
In all cases where an index is to be included when writing to a worksheet, if the columns are a MultiIndex, the level names, if they exist, need to be written in the first column of the worksheet (i.e. the first column of the included index):
In the above example, the two levels of the columns MultiIndex are named "Acronym" and "Type", and the index itself has level names "Ipso" and "Facto".
Also note: If columns is a MultiIndex, and include_index=False for set_with_dataframe(), there will be no space in the header row to write the columns' level names, and the expected behavior for set_as_dataframe() is not to write the level names anywhere in the worksheet.
So, our package needs to follow Pandas's expectations in our set_with_dataframe() behavior:
include_index | index has name(s) | columns is MultiIndex | columns has name(s) | behavior |
---|---|---|---|---|
False | n/a | n/a | n/a | No additional header row. Columns' name(s) never written to header even if they exist. |
True | False | False | False | No additional header row. |
True | False | False | True | No additional header row. Columns' name is NOT written in first column of header row, because pandas has no way to read it back properly. |
True | False | True | False | No additional header row. |
True | False | True | True | No additional header row. Columns' names are written in first column of header rows. |
True | True | False | False | No additional header row. Index name(s) will appear in header row atop index's columns. |
True | True | False | True | No additional header row. Nothing will appear atop index column(s) in header row. Because of pandas limitations, there is no way to write columns object's name in the header row. |
True | True | True | False | Additional header row required for index name(s). Column object has no names. |
True | True | True | True | Additional header row required for index name(s). Column object's name appears in prior header rows in first column. |
I'm using gspread_dataframe to import data from Google Sheets. I'd like to force all imported data to strings, independently of the cell type used in the worksheet.
The documentation mentions I can use all options supported by the pandas text parsing readers. In theory dtype=str or dtype=object should force all values to be preserved without interpreting them. Somehow this is not true; I'm not sure if it's a bug or I'm doing something wrong.
In the scenario below, the imported dataframe has its decimals dropped because all Amounts are in number format in the worksheet. If I change the worksheet type to 'string' the outcome is correct, but I'm trying to avoid tweaking the file before importing the data.
import gspread
import gspread_dataframe as gsframe
gsframe.get_as_dataframe(
worksheet=sheet,
header=0,
dtype=str,
usecols=cols,
skiprows=row_offset,
skip_blank_lines=True).dropna(axis = 0, how = 'all').fillna('')
Worksheet Imported Dataframe
string numbers string string
Name Amount Name Amount
A -25.00 A -25
B -63.00 B -63
C 20.00 C 20
D -10.00 D -10
▲ dropped decimals
Expected outcome
Worksheet Imported Dataframe
string numbers string string
Name Amount Name Amount
A -25.00 A -25.00
B -63.00 B -63.00
C 20.00 C 20.00
D -10.00 D -10.00
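One way to keep exactly what the sheet displays (a sketch that bypasses get_as_dataframe and keeps every cell as text; it assumes a single header row) is to build the frame from gspread's formatted string values:

```python
import pandas as pd

rows = sheet.get_all_values()   # gspread returns the displayed strings, e.g. "-25.00"
df = pd.DataFrame(rows[1:], columns=rows[0], dtype=str)
```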
Deprecation warnings are raised due to invalid escape sequences on Python 3.8. Below is a log of the warnings raised while compiling all the Python files. Using raw strings or escaping the backslashes will fix this issue.
find . -iname '*.py' | xargs -P 4 -I{} python -Wall -m py_compile {}
./gspread_dataframe.py:121: DeprecationWarning: invalid escape sequence \*
"""
I relatively frequently try to use get_as_dataframe to pull values in from a gsheet, do some operation on them, and then update only a single column in the original gsheet with set_with_dataframe. However, I always get a ValueError when I do so, as in the stack trace below:
Traceback (most recent call last):
File "<string>", line 19, in __PYTHON_EL_eval
File "~/Documents/Phon_Utils/scratch.py", line 16, in <module>
File "~/Documents/Phon_Utils/venv/lib/python3.11/site-packages/gspread_dataframe.py", line 262, in set_with_dataframe
y, x = dataframe.shape
^^^^
ValueError: not enough values to unpack (expected 2, got 1)
Would it be possible to update the function so that it can also accept dataframes that are only a single column wide as input?
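For what it's worth, a single-column DataFrame already has a two-element shape, so the unpacking error suggests a pandas Series was passed instead; converting it first may be enough (a sketch; df and worksheet are assumed):

```python
from gspread_dataframe import set_with_dataframe

col = df["some_column"]                        # a Series has shape (n,), which trips the unpack
set_with_dataframe(worksheet, col.to_frame())  # a one-column DataFrame has shape (n, 1)
```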
gspread just released 3.0.0, which has a very different internal structure and API, and most thrillingly uses the v4 Sheets API, which offers cell-styling powers previously unavailable to gspread users.
My general plan is to make a breaking change to gspread-dataframe to support only gspread 3.0.0 and later, and release it as a major version increase. Bugfix releases to 2.1, if ever necessary, I'll make from a 2.1 maintenance branch.
I am getting this error; what should I do? I am using Google Colab.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-61-a16ddb767c75> in <module>()
1 #Export to Google Sheets / Part 2 Export
----> 2 from gspread_dataframe import get_as_dataframe, set_with_dataframe
3
4 sh = gc.create('filename')
5 worksheet = gc.open('filename').sheet1
/usr/local/lib/python3.7/dist-packages/gspread_dataframe.py in <module>()
11 """
12 from gspread.utils import fill_gaps
---> 13 from gspread.models import Cell
14 import logging
15 import re
Gspread 6.0.0 was released yesterday and it seems to have some breaking changes that affect gspread-dataframe.
Code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
service_account = gspread.service_account(filename = "xxx.json")
spreadsheet = service_account.open("SPREADSHEET")
mapping_id = spreadsheet.worksheet("WORKSHEET")
df = get_as_dataframe(mapping_id)
AttributeError: 'Worksheet' object has no attribute 'spreadsheet'
I think it is due to a change in gspread: there is no longer a .spreadsheet attribute on the Worksheet object. For anyone having the same issue, pinning gspread back to 5.12.4 works.
The discussion in #1 resulted in a plan to have get_as_dataframe use the available TextParser in pandas.io.parsers, and thus offer all of the parsing options TextParser supports, instead of designing our own set of parsing options. To switch to using TextParser, we also need to make sure evaluate_formulas is handled properly.
I am using get_as_dataframe to read a sheet. If the sheet name is "CRM26" the extraction fails with:
{'code': 400, 'message': 'Range (query!CRM26) exceeds grid limits. Max rows: 10002, max columns: 35', 'status': 'INVALID_ARGUMENT'}
If the sheet name is "CRM", "CR26M" or "CRM 26" the extraction is successful.
Got an error on what I think is a column with a list in it:
File "get_paypal.py", line 789, in main
donor, wkb='OTAT Donor Database', wks='DonorLevel')
File "/Users/---/Desktop/repos/otat/config.py", line 246, in upload_to_google
set_with_dataframe(worksheet, df, include_index=True)
File "/usr/local/lib/python2.7/site-packages/gspread_dataframe.py", line 185, in set_with_dataframe
_cellrepr(cell_value, allow_formulas))
File "/usr/local/lib/python2.7/site-packages/gspread_dataframe.py", line 51, in _cellrepr
if pd.isnull(value):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
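A possible workaround (a sketch, assuming the offending column really holds Python lists) is to flatten those cells to strings before the upload:

```python
from gspread_dataframe import set_with_dataframe

# Join list cells into a single comma-separated string so pd.isnull sees scalars.
df = df.applymap(lambda v: ", ".join(map(str, v)) if isinstance(v, list) else v)
set_with_dataframe(worksheet, df, include_index=True)
```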
It would be nice to be able to specify the cell format when writing (e.g. % or date, as well as the precision).
When allow_formulas = False and a DataFrame cell contains a value like "=A", the code at line 88 tries to escape the value (to prepend an apostrophe '), but this escaped value is not returned from the function anywhere, so the return value is None.
The problematic line: gspread_dataframe.py, line 88 at d1ec66d. The line should read:
return "'%s" % value
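In context, the corrected branch would look roughly like this (a hypothetical simplification of the helper, not the exact source):

```python
def _cellrepr(value, allow_formulas):
    # The essential point of the fix: the escaped value must be returned,
    # not merely computed and discarded.
    value = str(value)
    if (not allow_formulas) and value.startswith("="):
        return "'%s" % value
    return value
```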
Describe the bug
Submitted by @sam-s and copied here by me:
Uploading a DataFrame with int64 columns fails with a TypeError.
Version and Environment
Version of package: 0.3.7
Python interpreter:
Python 3.9.1 (default, Dec 10 2020, 10:36:35)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
and Python 3.6.9
on Linux 5.4.67
OS: MacOS, Linux
To Reproduce
import pandas as pd
df = pd.DataFrame({"a":[1,2,3]})
set_with_dataframe(...,df)
produces this error:
---> 59 set_with_dataframe(ws, df, include_index=True, resize=True)
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread_dataframe.py in set_with_dataframe(worksheet, dataframe, row, col, include_index, include_column_header, resize, allow_formulas, string_escaping)
324
325 resp = worksheet.update_cells(
--> 326 cells_to_update, value_input_option="USER_ENTERED"
327 )
328 logger.debug("Cell update response: %s", resp)
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/models.py in update_cells(self, cell_list, value_input_option)
907 range_name,
908 params={'valueInputOption': value_input_option},
--> 909 body={'values': values_rect},
910 )
911
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/models.py in values_update(self, range, params, body)
233 """
234 url = SPREADSHEET_VALUES_URL % (self.id, quote(range))
--> 235 r = self.client.request('put', url, params=params, json=body)
236 return r.json()
237
~/.virtualenvs/brazeds/lib/python3.6/site-packages/gspread/client.py in request(self, method, endpoint, params, data, json, files, headers)
65 data=data,
66 files=files,
---> 67 headers=headers,
68 )
69
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in put(self, url, data, **kwargs)
600 """
601
--> 602 return self.request('PUT', url, data=data, **kwargs)
603
604 def patch(self, url, data=None, **kwargs):
~/.virtualenvs/brazeds/lib/python3.6/site-packages/google/auth/transport/requests.py in request(self, method, url, data, headers, max_allowed_time, timeout, **kwargs)
468 headers=request_headers,
469 timeout=timeout,
--> 470 **kwargs
471 )
472 remaining_time = guard.remaining_timeout
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
526 hooks=hooks,
527 )
--> 528 prep = self.prepare_request(req)
529
530 proxies = proxies or {}
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/sessions.py in prepare_request(self, request)
464 auth=merge_setting(auth, self.auth),
465 cookies=merged_cookies,
--> 466 hooks=merge_hooks(request.hooks, self.hooks),
467 )
468 return p
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/models.py in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks, json)
317 self.prepare_headers(headers)
318 self.prepare_cookies(cookies)
--> 319 self.prepare_body(data, files, json)
320 self.prepare_auth(auth, url)
321
~/.virtualenvs/brazeds/lib/python3.6/site-packages/requests/models.py in prepare_body(self, data, files, json)
467 # provides this natively, but Python 3 gives a Unicode string.
468 content_type = 'application/json'
--> 469 body = complexjson.dumps(json)
470 if not isinstance(body, bytes):
471 body = body.encode('utf-8')
/usr/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
/usr/lib/python3.6/json/encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
/usr/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/usr/lib/python3.6/json/encoder.py in default(self, o)
178 """
179 raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180 o.__class__.__name__)
181
182 def encode(self, o):
TypeError: Object of type 'int64' is not JSON serializable
Expected behavior
data is uploaded
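Until the underlying conversion is fixed, one workaround that appears to avoid the error (a sketch; worksheet is assumed to be an open gspread Worksheet) is to cast the frame to plain Python objects first:

```python
import pandas as pd
from gspread_dataframe import set_with_dataframe

df = pd.DataFrame({"a": [1, 2, 3]})
# astype(object) turns the numpy.int64 values into plain Python ints,
# which the standard json encoder used by gspread can serialize.
set_with_dataframe(worksheet, df.astype(object), include_index=True, resize=True)
```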
This is an enhancement proposal.
For my use case, it could be nice if gspread-dataframe were able to try to infer column dtypes when fetching data from a sheet. While individual cells are converted through numericise, their column dtype remains object, and the returned dataframe fails equality checks with the original dataframe.
>>> df = pd.DataFrame({'a': [4,1,2,4],
... 'b': list('abba')},
... index=pd.Index(list('ABCD'), name='our index'))
>>> df
a b
our index
A 4 a
B 1 b
C 2 b
D 4 a
>>> df.dtypes
a int64
b object
dtype: object
>>> ws = # Get a test worksheet here
>>> set_with_dataframe(ws, df, include_index=True, resize=True)
>>> r = get_as_dataframe(ws, index_column_number=1)
>>> r # Looks as expected
a b
our index
A 4 a
B 1 b
C 2 b
D 4 a
>>> r.dtypes # All object dtype
a object
b object
dtype: object
>>> [type(v) for v in r['a']] # correctly converted to int
[int, int, int, int]
>>> df.equals(r) # The equality check fails
False
>>> df['a'].equals(r['a']) # because of the dtype of column 'a'.
False
>>> df['a'] == r['a'] # The values *are* the same, though.
our index
A True
B True
C True
D True
Name: a, dtype: bool
>>> df['b'].equals(r['b']) # str works as expected
True
I am unsure what the best way to deal with this is, and whether it is a general enough use case to warrant an addition to gspread-dataframe. At any rate, the following code is my initial stab at how dtype inference could be implemented:
import pandas as pd
converters = (
pd.to_numeric,
pd.to_timedelta,
pd.to_datetime,
)
def _assign_column_dtypes(df):
    # Try each converter in turn on columns that are still plain object dtype;
    # errors='ignore' leaves a column unchanged when conversion fails.
    for conv in converters:
        for col in df:
            if df[col].dtype != object:
                continue
            df[col] = conv(df[col], errors='ignore')
    return df
It intentionally places timedelta before datetime, as '00:03:00' can be interpreted as either one by pandas. In my use-case, datetimes always include a date, so '00:03:00' would definitely be a timedelta.
# Construct a dataframe where everything is either str or object
n = 10
df = pd.DataFrame({
'datetime str': pd.date_range('2017-03-15', freq='D', periods=n
).astype(str),
'timedelta str': pd.timedelta_range('00:03:00', periods=n, freq='10 s'
).to_native_types().astype(str),
'int obj': pd.Series(range(n), dtype=object),
'int str': [str(i) for i in range(n)],
'float obj': pd.Series(map(float, range(n)), dtype=object),
'float str': [str(float(i)) for i in range(n)],
})
print(df)
# datetime str float obj float str int obj int str timedelta str
# 0 2017-03-15 0 0.0 0 0 00:03:00
# 1 2017-03-16 1 1.0 1 1 00:03:10
# 2 2017-03-17 2 2.0 2 2 00:03:20
# 3 2017-03-18 3 3.0 3 3 00:03:30
# 4 2017-03-19 4 4.0 4 4 00:03:40
# 5 2017-03-20 5 5.0 5 5 00:03:50
# 6 2017-03-21 6 6.0 6 6 00:04:00
# 7 2017-03-22 7 7.0 7 7 00:04:10
# 8 2017-03-23 8 8.0 8 8 00:04:20
# 9 2017-03-24 9 9.0 9 9 00:04:30
print(df.dtypes)
# datetime str object
# float obj object
# float str object
# int obj object
# int str object
# timedelta str object
# dtype: object
df = _assign_column_dtypes(df)
print(df)
# datetime str float obj float str int obj int str timedelta str
# 0 2017-03-15 0.0 0.0 0 0 00:03:00
# 1 2017-03-16 1.0 1.0 1 1 00:03:10
# 2 2017-03-17 2.0 2.0 2 2 00:03:20
# 3 2017-03-18 3.0 3.0 3 3 00:03:30
# 4 2017-03-19 4.0 4.0 4 4 00:03:40
# 5 2017-03-20 5.0 5.0 5 5 00:03:50
# 6 2017-03-21 6.0 6.0 6 6 00:04:00
# 7 2017-03-22 7.0 7.0 7 7 00:04:10
# 8 2017-03-23 8.0 8.0 8 8 00:04:20
# 9 2017-03-24 9.0 9.0 9 9 00:04:30
print(df.dtypes)
# datetime str datetime64[ns]
# float obj float64
# float str float64
# int obj int64
# int str int64
# timedelta str timedelta64[ns]
# dtype: object
From an earlier issue (#20), @sam-s said:
In this case, I would like to have a separate column (with a header!) for each level.
I am torn between
foo one
foo two
and
foo one
    two
I think there should be an option to select one of them, the default being the first variant.
The same goes for column indexes.
I might be in the wrong spot for this, but I keep getting the following error:
AttributeError: 'Client' object has no attribute 'get_cells_feed'
This happens whenever I try to write to my Google Spreadsheet using any DataFrame.
Sometimes it happens in the resize_to_minimum function, and other times it happens in the _get_all_values function, but I believe it errors as soon as it tries to read the client.worksheet.
This package is popular, but the gspread-formatting package is less so. I know from personal experience that writing tabular data to a gspread worksheet is pleasant only when proper formatting is applied to the worksheet -- mainly for bolding and shading header rows, and applying column alignment appropriate to the data type of the values for each column. Freezing the header rows is nice to have, especially when the data is more than 50 rows.
I wonder whether this enhancement is best accomplished by importing gspread_formatting as a new dependency, or by including bare-bones formatting functionality in this package without a new dependency on gspread_formatting.
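For reference, a minimal sketch of the dependency route for the header-row case (worksheet is an open gspread Worksheet; the class and function names below are my recollection of the gspread_formatting API and are worth double-checking):

```python
from gspread_formatting import cellFormat, textFormat, format_cell_range, set_frozen

header_fmt = cellFormat(textFormat=textFormat(bold=True))
format_cell_range(worksheet, "A1:Z1", header_fmt)   # bold the header row
set_frozen(worksheet, rows=1)                       # keep the header visible while scrolling
```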
I'm seeing the following error when trying to build documentation on ReadTheDocs for a package that depends on gspread-dataframe. I'm not sure what the issue is.
I can reproduce by running python setup.py install --force on this package: https://github.com/open-contracting/kingfisher-colab/blob/master/setup.py
Searching for gspread-dataframe
Reading https://pypi.python.org/simple/gspread-dataframe/
Downloading https://files.pythonhosted.org/packages/a5/60/274ba137f89b4cbbb3695a544a2ef48c64e7ab8072a0d3542f6cdb5d904d/gspread-dataframe-3.0.5.tar.gz#sha256=97170b36e2e92fa93102e6c73a19e2da1223eec6c11c08b48e3fdb56c3043790
Best match: gspread-dataframe 3.0.5
Processing gspread-dataframe-3.0.5.tar.gz
Writing /var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/setup.cfg
Running gspread-dataframe-3.0.5/setup.py -q bdist_egg --dist-dir /var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/egg-dist-tmp-8r41z5mh
error: [Errno 2] No such file or directory: '/var/folders/cs/3pnj707s581bjnsfv0wndqkh0000gn/T/easy_install-03z8zowc/gspread-dataframe-3.0.5/VERSION'
Hi!
I use this library very often and realized that the value of the variable WORKSHEET_MAX_CELL_COUNT is outdated. Google has increased the cell limit to 10 million cells. I would like to ask if the max cell count limit can be increased from 5,000,000 to 10,000,000.
Thanks!
I have an issue, not sure if it is in get_as_dataframe() or set_with_dataframe(), with a worksheet where one column will hold ('? | '+ | '-). I suspect it is in get_as_dataframe().
The gspread-dataframe library has no issue with the question mark; however, for the plus (+) and minus (-) I have taken steps to ensure that these characters are written as strings by prefixing them with an apostrophe. When gspread-dataframe does a get_as_dataframe() or a set_with_dataframe(), these string characters become non-string characters, instead becoming math operations that result in #ERROR in the worksheet cells.
My program first performs a get_as_dataframe(), then appends to that dataframe, and then does a set_with_dataframe(). I suspect the issue is actually in get_as_dataframe(), because the newly appended dataframe items are written out to the worksheet correctly.
Hierarchical indexes are uploaded as tuples.
E.g., DataFrame
first second third
bar one 1
two 2
baz one 3
two 4
is uploaded as a 2-column sheet
third
("bar","one") 1
("bar","two") 2
("baz","one") 3
("baz","two") 4
instead of 3 columns
Hello
I am trying to add 2 dataframes to a Google Sheet (in separate tabs) using the set_with_dataframe function.
Here are the shapes of those dataframes:
(41967, 14) (200000, 4)
This gives 1,387,538 cells in total.
However I get the following error:
Traceback (most recent call last):
  File "dynamo_to_google_sheet.py", line 144, in <module>
    set_with_dataframe(output_sheet, df)
  File "venv/lib/python3.8/site-packages/gspread_dataframe.py", line 247, in set_with_dataframe
    _resize_to_minimum(worksheet, y, x)
  File "venv/lib/python3.8/site-packages/gspread_dataframe.py", line 108, in _resize_to_minimum
    worksheet.resize(rows, cols)
  File "venv/lib/python3.8/site-packages/gspread/models.py", line 1272, in resize
    return self.spreadsheet.batch_update(body)
  File "venv/lib/python3.8/site-packages/gspread/models.py", line 145, in batch_update
    r = self.client.request(
  File "venv/lib/python3.8/site-packages/gspread/client.py", line 73, in request
    raise APIError(response)
gspread.exceptions.APIError: {'code': 400, 'message': 'Invalid requests[0].updateSheetProperties: This action would increase the number of cells in the workbook above the limit of 5000000 cells.', 'status': 'INVALID_ARGUMENT'}
Since my dataframe sizes do not reach the quota, I think the error message is misleading. Are there other limitations, or am I doing something wrong?
Thanks for your help !
Hi there,
I'm trying to get the content of a GSheet as dataframe.
The GSheet contains something like:
v1 | v2 | v3 |
---|---|---|
A | TRUE | FALSE |
B | TRUE | FALSE |
C | FALSE | TRUE |
D | TRUE | TRUE |
Then I get the worksheet loaded as in the tutorial:
gsheet = gc.open_by_url(url)
worksheet = gsheet.worksheet("Sheet1")
df = get_as_dataframe(worksheet, parse_dates=True, header=0)
But the output shows not the booleans I'd expect, but floats of 0.0 or 1.0.
v1 | v2 | v3 |
---|---|---|
A | 1.0 | 0.0 |
B | 1.0 | 0.0 |
C | 0.0 | 1.0 |
D | 1.0 | 1.0 |
Should I pass some parameter to the function? Is there a solution to overcome this?
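Since get_as_dataframe forwards keyword arguments to the pandas text parser, one possible workaround (a sketch; the column names are the ones from the example above) is to pass converters for the boolean columns:

```python
from gspread_dataframe import get_as_dataframe

to_bool = lambda v: str(v).strip().upper() == "TRUE"
df = get_as_dataframe(worksheet, header=0,
                      converters={"v2": to_bool, "v3": to_bool})
```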
A writer object would mainly serve to automate the arguments passed to get_as_dataframe, but also, to a lesser degree, those for set_with_dataframe:
from gspread_dataframe import make_reader_writer
ws = some_worksheet
df = some_dataframe
df2 = some_dataframe_with_same_columns
reader_writer = make_reader_writer(example=df, resize=True, include_index=True)
reader_writer.write(ws, df2)
df3 = reader_writer.read(ws)
print(reader_writer.get_as_dataframe_call_repr(numpy_alias='np'))
>>> "get_as_dataframe(worksheet, header=0, indexcol=0, dtype={'col1': object, 'col2': np.int64, 'col3': np.datetime64})"
Hi,
I'm getting this error message on:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
I'm using the latest gspread release (3.0.0) which doesn't seem to contain the ns module
Hey, I am dealing with this behavior and I don't know if it's a bug or I am doing something wrong:
import pandas as pd
import numpy as np
import gspread
import gspread_dataframe
from gspread_dataframe import set_with_dataframe
gc = gspread.service_account()
sh = gc.open('pandas_spreadsheet')
worksheet = sh.get_worksheet(0)
test_df = pd.DataFrame.from_records([{'a': i/500 + np.random.randn(), 'b': i * 2} for i in np.arange(0,1000,100)])
test_df
set_with_dataframe(worksheet, test_df)
And if I go to cell A1 and copy it, the cell value is -9622382525353560.
I thought it was something with the formatting, but this number is not a float.
If I export the df using
set_with_dataframe(worksheet, test_df.round(3))
I obtain the following:
And this is the same test in a new blank worksheet (to avoid any kind of previous formatting problems):
Please let me know if there is something that I can do to help you with a PR.
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
class GoogleSheet():
scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
#creds = ServiceAccountCredentials.from_json_keyfile_name("creds.json", scope)
creds = gspread.service_account()
#client = gspread.authorize(creds)
sh = creds.open("DI ITEM SHEET") # Open the spreadhseet
wks = sh.worksheet("2022-APR") # Open the workhseet
data = wks.get_all_records()
This file runs successfully when run directly in the console, but gives an error on the server: No module named 'gspread'.
Just updated the gspread library today (5.0.0) and a few scripts broke.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-b68f22640753> in <module>
----> 1 import gspread_dataframe
~\anaconda3\lib\site-packages\gspread_dataframe.py in <module>
11 """
12 from gspread.utils import fill_gaps
---> 13 from gspread.models import Cell
14 import pandas as pd
15 from pandas.io.parsers import TextParser
ModuleNotFoundError: No module named 'gspread.models'
I am trying to write my pandas dataframe to a Google sheet using the gspread_dataframe package. Adding include_index=True should add an extra column to the sheet with the index from the dataframe, but it does not add it for me. My code:
from gspread_dataframe import get_as_dataframe, set_with_dataframe
df=pd.read_csv(r'C:\Users\Laila\Desktop\rawData18Sheet.csv',thousands=',',
index_col='Client',parse_dates=True,encoding = "ISO-8859-1",usecols=columns)
test_sheet = client.open("2018 Raw Data").worksheet("test_sheet")
set_with_dataframe(test_sheet, df,resize=True,
include_index=True)
I did set the index when I read the dataframe from a csv, and pulling the index with df.index.values gives me the values I want in the index of the google sheet.
import pandas as pd
import gspread
import gspread_dataframe
from gspread_dataframe import set_with_dataframe
gc = gspread.service_account()
sh = gc.open('pandas_spreadsheet')
test_dt = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
set_with_dataframe(sh, test_df)
And I am obtaining the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-329ece69d8d4> in <module>
----> 1 set_with_dataframe(sh, test_df) #-> THIS EXPORTS YOUR DATAFRAME TO THE GOOGLE SHEET
~/HIPCAM/env_hipcam_gpu/lib/python3.7/site-packages/gspread_dataframe.py in set_with_dataframe(worksheet, dataframe, row, col, include_index, include_column_header, resize, allow_formulas, string_escaping)
235 worksheet.resize(y, x)
236 else:
--> 237 _resize_to_minimum(worksheet, y, x)
238
239 updates = []
~/HIPCAM/env_hipcam_gpu/lib/python3.7/site-packages/gspread_dataframe.py in _resize_to_minimum(worksheet, rows, cols)
98 # get the current size
99 current_cols, current_rows = (
--> 100 worksheet.col_count,
101 worksheet.row_count
102 )
AttributeError: 'Spreadsheet' object has no attribute 'col_count'
I am using
gspread 3.6.0 pypi_0 pypi
gspread-dataframe 3.1.0 pypi_0 pypi
Edit: if I add values to the spreadsheet by hand, I am able to get them using gspread.
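The traceback points at col_count, which is a Worksheet attribute, so it looks like the Spreadsheet object (sh) is being passed where a Worksheet is expected (note also the test_dt/test_df name mismatch in the snippet); a sketch of the corrected call:

```python
import pandas as pd
from gspread_dataframe import set_with_dataframe

worksheet = sh.get_worksheet(0)   # pass a Worksheet, not the Spreadsheet object
test_df = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
set_with_dataframe(worksheet, test_df)
```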
Hi,
I've got a worksheet with True/False columns. I'd like to point out that when I try get_as_dataframe, the True/False columns are sometimes parsed as 0's and 1's in the dataframe.
Thanks for your work.
I am getting formulas in my pandas DataFrame instead of the values:
ValueError: could not convert string to float: '=iferror(BB3/W3,"")'
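If the goal is the computed cell values rather than the formula text, get_as_dataframe's evaluate_formulas option (mentioned elsewhere in these issues) should help; a sketch:

```python
from gspread_dataframe import get_as_dataframe

# evaluate_formulas=True requests the calculated values, so '=iferror(BB3/W3,"")'
# comes back as its result instead of as the formula string.
df = get_as_dataframe(worksheet, evaluate_formulas=True)
```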
Hey @robin900 ,
we (@AndreCimander and I) might have found a bug which leads to the following error.
The error is caused by this commit's (00d2de4) changes in gspread_dataframe.py:53.
As a default pandas DataFrame casts integers to numpy.int64, your changes make the JSON serialization fail; the following information reproduces the error.
Rolling back to version 3.1.0 fixes the issue, as there numpy.int64 is properly cast to str before insertion.
Code:
test_dataframe = pd.DataFrame({"column1": [6, 6, 7], "column2": [4, 4, 4]})
[...]
gspread_dataframe.set_with_dataframe(
worksheet, dataframe, row=start_row, col=1, include_column_header=use_header
)
Error message: TypeError: Object of type int64 is not JSON serializable
Stack trace:
google_sheets_api/sheet.py:85: in append_pandas_dataframe_to_sheet
gspread_dataframe.set_with_dataframe(
venv/lib/python3.8/site-packages/gspread_dataframe.py:299: in set_with_dataframe
resp = worksheet.update_cells(cells_to_update, value_input_option='USER_ENTERED')
venv/lib/python3.8/site-packages/gspread/models.py:906: in update_cells
data = self.spreadsheet.values_update(
venv/lib/python3.8/site-packages/gspread/models.py:235: in values_update
r = self.client.request('put', url, params=params, json=body)
venv/lib/python3.8/site-packages/gspread/client.py:61: in request
response = getattr(self.session, method)(
venv/lib/python3.8/site-packages/requests/sessions.py:590: in put
return self.request('PUT', url, data=data, **kwargs)
venv/lib/python3.8/site-packages/google/auth/transport/requests.py:464: in request
response = super(AuthorizedSession, self).request(
venv/lib/python3.8/site-packages/requests/sessions.py:516: in request
prep = self.prepare_request(req)
venv/lib/python3.8/site-packages/requests/sessions.py:449: in prepare_request
p.prepare(
venv/lib/python3.8/site-packages/requests/models.py:317: in prepare
self.prepare_body(data, files, json)
venv/lib/python3.8/site-packages/requests/models.py:467: in prepare_body
body = complexjson.dumps(json)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py:231: in dumps
return _default_encoder.encode(obj)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:199: in encode
chunks = self.iterencode(o, _one_shot=True)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:257: in iterencode
return _iterencode(o, 0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <json.encoder.JSONEncoder object at 0x7f9038667790>, o = 1
def default(self, o):
"""Implement this method in a subclass such that it returns
a serializable object for ``o``, or calls the base implementation
(to raise a ``TypeError``).
For example, to support arbitrary iterators, you could
implement default like this::
def default(self, o):
try:
iterable = iter(o)
except TypeError:
pass
else:
return list(iterable)
# Let the base class default method raise the TypeError
return JSONEncoder.default(self, o)
"""
> raise TypeError(f'Object of type {o.__class__.__name__} '
f'is not JSON serializable')
E TypeError: Object of type int64 is not JSON serializable
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py:179: TypeError
If you need further information or assistance, please let me know.
Hi!
I frequently use this library; thanks to it, my work goes smoothly.
I've had many use cases that call for an "append" feature for a manually updated existing sheet. I implemented this from scratch using the underlying gspread features, but I believe a sort of "append" feature could help many engineers' work.
If you think this wouldn't cause inconsistency with the existing code, I'd love to introduce it!
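For reference, a minimal sketch of how such an append could work on top of the existing API (assuming the sheet already has a header row and the dataframe has matching columns):

```python
from gspread_dataframe import set_with_dataframe

def append_dataframe(worksheet, df):
    # Find the first empty row below the existing data, then write the frame
    # there without repeating the header and without shrinking the sheet.
    next_row = len(worksheet.get_all_values()) + 1
    set_with_dataframe(worksheet, df, row=next_row,
                       include_column_header=False, resize=False)
```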
Consider this scenario where worksheet and dataframe have duplicated data:
Worksheet Dataframe
0 a0 b0 c0 b0 c0 d0
1 a1 b1 c1 b1 c1 d1
Would it be possible to add an option to eliminate the duplicated values (in this case b0, c0, b1, c1) when uploading the data with set_with_dataframe?