dexplo / dataframe_image

A python package for embedding pandas DataFrames as images into pdf and markdown documents

Home Page: https://dexplo.org/dataframe_image

License: MIT License

Python 3.13% HTML 0.49% CSS 0.03% Jupyter Notebook 96.35%

dataframe_image's Introduction

dexplo

A data analysis library comparable to pandas

Installation

You must have Cython installed. Then run: python setup.py build_ext --use-cython -i

Main Goals

  • A minimal set of features
  • Be as explicit as possible
  • There should be one-- and preferably only one --obvious way to do it.

Data Structures

  • Only DataFrames
  • No Series

Only Scalar Data Types

All data types allow nulls

  • bool - always 8 bits
  • int
  • float
  • str - stored as a categorical
  • datetime
  • timedelta

Column Labels

  • No hierarchical index
  • Column names must be strings
  • Column names must be unique

Row Labels

  • No row labels for now
  • Only a number display on the output

Subset Selection

  • Only one way to select data - [ ]
  • Subset selection will be explicit and necessitate both rows and columns
  • Rows will be selected only by integer location
  • Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
  • Slice notation is also OK

Development

  • Must use type hints
  • Must use Python 3.6+ (for f-strings)
  • numpy

Advantages over pandas

  • Easier to write idiomatically
  • String processing will be much faster
  • Nulls allowed in each data type
  • Nearly all operations will be faster

API

Attributes

  • size
  • shape
  • values
  • dtypes

Methods

Stats

  • abs
  • all
  • any
  • argmax
  • argmin
  • clip
  • corr
  • count
  • cov
  • cummax
  • cummin
  • cumprod
  • cumsum
  • describe
  • max
  • min
  • median
  • mean
  • mode
  • nlargest
  • nsmallest
  • prod
  • quantile
  • rank
  • round
  • std
  • streak
  • sum
  • var
  • unique
  • nunique
  • value_counts

Selection

  • drop
  • head
  • isin
  • rename
  • sample
  • select_dtypes
  • tail
  • where

Missing Data

  • isna
  • dropna
  • fillna
  • interpolate

Other

  • append
  • astype
  • factorize
  • groupby
  • iterrows
  • join
  • melt
  • pivot
  • replace
  • rolling
  • sort_values
  • to_csv

Other (after 0.1 release)

  • cut
  • plot
  • profile

Functions

  • read_csv
  • read_sql
  • concat

Group By - specifically with groupby method

  • agg
  • all
  • apply
  • any
  • corr
  • count
  • cov
  • cumcount
  • cummax
  • cummin
  • cumsum
  • cumprod
  • head
  • first
  • fillna
  • filter
  • last
  • max
  • median
  • min
  • ngroups
  • nunique
  • prod
  • quantile
  • rank
  • rolling
  • size
  • sum
  • tail
  • var

str - df.str.<method>

  • capitalize
  • cat
  • center
  • contains
  • count
  • endswith
  • find
  • findall
  • get
  • get_dummies
  • isalnum
  • isalpha
  • isdecimal
  • isdigit
  • islower
  • isnumeric
  • isspace
  • istitle
  • isupper
  • join
  • len
  • ljust
  • lower
  • lstrip
  • partition
  • repeat
  • replace
  • rfind
  • rjust
  • rpartition
  • rsplit
  • rstrip
  • slice
  • slice_replace
  • split
  • startswith
  • strip
  • swapcase
  • title
  • translate
  • upper
  • wrap
  • zfill

dt - df.dt.<method>

  • ceil
  • day
  • day_of_week
  • day_of_year
  • days_in_month
  • floor
  • freq
  • hour
  • is_leap_year
  • is_month_end
  • is_month_start
  • is_quarter_end
  • is_quarter_start
  • is_year_end
  • is_year_start
  • microsecond
  • millisecond
  • minute
  • month
  • nanosecond
  • quarter
  • round
  • second
  • strftime
  • to_pydatetime
  • to_pytime
  • tz
  • tz_convert
  • tz_localize
  • weekday_name
  • week_of_year
  • year

td - df.td.<method>

  • ceil
  • components
  • days
  • floor
  • freq
  • microseconds
  • milliseconds
  • nanoseconds
  • round
  • seconds
  • to_pytimedelta

dataframe_image's People

Contributors

geohyeon, hns258, itamarshalev, mbustosorg, mddietz1, moshemoshe137, padgy, paleneutron, ramzeng, tdpetrou, theonlywayup, valerianrossigneux

dataframe_image's Issues

nbconvert failed: module 'asyncio' has no attribute 'run'

Downloading the result from

File -> Download as -> PDF - Dataframe as image (from browser) .pdf

runs smoothly in a Python 3.8 environment but raises an error in a Python 3.6 environment:

  nbconvert failed: module 'asyncio' has no attribute 'run'

Reason:
The asyncio.run() function was added in Python 3.7.
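
As an illustration of the reason given above, here is a minimal sketch of a helper that works on Python 3.6 as well as 3.7+. The names run_coroutine and _demo are hypothetical and are not part of dataframe_image or nbconvert; this only shows the general fallback pattern, not how either library handles it.

```python
import asyncio
import sys


def run_coroutine(coro):
    """Run a coroutine to completion on Python 3.6 as well as 3.7+."""
    if sys.version_info >= (3, 7):
        # asyncio.run() exists from Python 3.7 onwards.
        return asyncio.run(coro)
    # Python 3.6 has no asyncio.run(), so drive an event loop by hand.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


async def _demo():
    # Hypothetical coroutine standing in for the real work.
    await asyncio.sleep(0)
    return "done"


result = run_coroutine(_demo())
```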

Nothing is exported if filename is pathlib.Path type

When a pathlib.Path object is passed to dfi.export, nothing is saved and no error is raised (although pathlib.Path should be supported)

To replicate:

  1. The code below will save nothing (no errors, just no output!)
from pathlib import Path

import numpy as np
import pandas as pd
import dataframe_image as dfi

filename = Path('test.png')
pd.DataFrame(np.random.rand(6,4)).dfi.export(filename)
  2. Adding a str() will make dataframe_image work as per documentation:
pd.DataFrame(np.random.rand(6,4)).dfi.export(str(filename))

Version information:

>>> sys.version
'3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]'
>>> dfi.__version__
'0.1.1'
>>> pd.__version__
'1.3.2'
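
The str() workaround from this report can be generalized with os.fspath, which accepts both plain strings and pathlib.Path objects. This is only a sketch of how a caller might normalize the filename before passing it on. The helper name as_filename is hypothetical, and nothing here claims this is how the library resolves the issue.

```python
import os
from pathlib import Path


def as_filename(path_like):
    """Return the file-system representation of a str or os.PathLike object."""
    # os.fspath returns str input unchanged and calls __fspath__ on Path objects.
    return os.fspath(path_like)


name_from_path = as_filename(Path("test.png"))
name_from_str = as_filename("test.png")
```

The result could then be passed as the filename argument, as in the second step of the reproduction.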

Latex is not rendered in the exported image

Hi, thank you for developing this package, however I am having some problems. This is my configuration:

Windows 11
MiKTeX installed and updated (as suggested in nbconvert's documentation)
python 3.10.1
dataframe-image 0.1.2
jupyter-client 7.1.0
jupyter-contrib-core 0.3.3
jupyter-contrib-nbextensions 0.5.1
jupyter-core 4.9.1
jupyter-highlight-selected-word 0.2.0
jupyter-latex-envs 1.4.6
jupyter-nbextensions-configurator 0.4.1
jupyterlab-pygments 0.1.2
jupyterlab-widgets 1.1.0

Case 1

When I run this notebook in Google Chrome

[image]

the LaTeX code in the exported image is not rendered

[image: df_styled]

Case 2

I also tried running it from cmd, i.e.

[image]

but the result is the same.

Case 3

Finally, I also tried File -> Download as -> DataFrame as Image (PDF or Markdown) -> Tool used for Conversion to PDF -> LaTeX, but in this case too the LaTeX code is not rendered

[image]

Am I doing something wrong?

Export just a DataFrame - not just the whole notebook

Really useful package. It works well for me. One thing I would be interested in doing is exporting just the Dataframe - not the entire notebook. Is that a potential option or is that way outside the scope of this project?

Thanks again for the package.

Can't encode unicode characters

Example minimum working dataframe:

import pandas as pd
import dataframe_image as dfi

df = pd.DataFrame(["абв"])
dfi.export(df, 'smirnov_sent_df.png', max_rows = 1)


UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-79-48c9f88e0332> in <module>
----> 1 dfi.export(df, 'smirnov_sent_df.png', max_rows = 1)

~\Anaconda3\lib\site-packages\dataframe_image\_pandas_accessor.py in export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
     22 def export(obj, filename, fontsize=14, max_rows=None, max_cols=None, 
     23                table_conversion='chrome', chrome_path=None):
---> 24         return _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
     25 
     26 

~\Anaconda3\lib\site-packages\dataframe_image\_pandas_accessor.py in _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
     71         html = obj.to_html(max_rows=max_rows, max_cols=max_cols, notebook=True)
     72 
---> 73     img_str = converter(html)
     74 
     75     if isinstance(filename, str):

~\Anaconda3\lib\site-packages\dataframe_image\_screenshot.py in run(self, html)
    165     def run(self, html):
    166         self.html = self.css + html
--> 167         img = self.take_screenshot()
    168         img_str = self.finalize_image(img)
    169         return img_str

~\Anaconda3\lib\site-packages\dataframe_image\_screenshot.py in take_screenshot(self)
     93         temp_img = Path(temp_dir.name) / "temp.png"
     94         with open(temp_html, "w") as f:
---> 95             f.write(self.html)
     96 
     97         with open(temp_img, "wb") as f:

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 1296-1298: character maps to <undefined>

edit: If we could pass encoding, for instance utf-16, that would solve the issue
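
The traceback shows the HTML being written with open(temp_html, "w"), which falls back to the locale encoding (cp1252 here). The sketch below shows, independently of the library, that an explicit encoding avoids the error. UTF-8 is used as an assumption, and the file name merely mirrors the one in the traceback.

```python
import tempfile
from pathlib import Path

html = "<table><tr><td>абв</td></tr></table>"

with tempfile.TemporaryDirectory() as temp_dir:
    temp_html = Path(temp_dir) / "temp.html"
    # An explicit encoding makes the write independent of the OS locale.
    with open(temp_html, "w", encoding="utf-8") as f:
        f.write(html)
    round_tripped = temp_html.read_text(encoding="utf-8")

# cp1252 has no Cyrillic letters, which is what the traceback runs into.
try:
    html.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
```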

Does matplotlib not support the DataFrame Styler?

I want to export a picture with some DataFrame styling on my remote server (table_conversion='matplotlib'), but it doesn't seem to work. The style works when tested locally with Chrome.

Table is created with additional whitespace between rows.

Using dataframe_image.export creates an image with additional whitespace between rows compared to the image that is created when I use styledPandasDataframe.to_html() (in other words, the text font size is the same but the row heights are greater in the dataframe_image.export version).

Is there a way to restrict / change row height of a table that is created using dataframe_image.export?

Thank you very much!

highlight based on cell value

How can I change the font color or the background color of a cell in a certain column if the value is greater than or equal to some number? Thank you.
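
One possible approach, sketched under assumptions: a small function maps each cell value to a CSS string, which is the kind of function a pandas Styler accepts. The name highlight_at_least, the threshold of 100 and the colors are all hypothetical choices for illustration.

```python
def highlight_at_least(value, threshold=100,
                       css="background-color: yellow; color: red"):
    """Return a CSS string for values >= threshold, else an empty string."""
    try:
        return css if value >= threshold else ""
    except TypeError:
        # Non-numeric cells (e.g. strings or None) are left unstyled.
        return ""


styled_high = highlight_at_least(150)
styled_low = highlight_at_least(5)
styled_text = highlight_at_least("n/a")
```

With pandas, such a function would typically be handed to Styler.applymap with subset=[column name] (newer pandas releases call this method Styler.map), and the resulting Styler object is what gets exported.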

Chrome executable error

I'm running jupyter lab on a linux machine and this is the error I get while trying to export a dataframe as an image:
OSError Traceback (most recent call last)
in
----> 1 user_df.head().dfi.export('df.png')

~/py36/lib/python3.6/site-packages/dataframe_image/_pandas_accessor.py in export(self, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
17 table_conversion='chrome', chrome_path=None):
18 return _export(self._df, filename, fontsize, max_rows, max_cols,
---> 19 table_conversion, chrome_path)
20
21

~/py36/lib/python3.6/site-packages/dataframe_image/_pandas_accessor.py in _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
31 if table_conversion == 'chrome':
32 converter = Screenshot(max_rows=max_rows, max_cols=max_cols, chrome_path=chrome_path,
---> 33 fontsize=fontsize, encode_base64=False, limit_crop=False).run
34 else:
35 from ._matplotlib_table import TableMaker

~/py36/lib/python3.6/site-packages/dataframe_image/_screenshot.py in __init__(self, center_df, max_rows, max_cols, chrome_path, fontsize, encode_base64, limit_crop)
74 self.ss_width = 1400
75 self.ss_height = 900
---> 76 self.chrome_path = get_chrome_path(chrome_path)
77 self.css = self.get_css(fontsize)
78 self.encode_base64 = encode_base64

~/py36/lib/python3.6/site-packages/dataframe_image/_screenshot.py in get_chrome_path(chrome_path)
50 if chrome_path:
51 return chrome_path
---> 52 raise OSError("Chrome executable not able to be found on your machine")
53 elif system == "windows":
54 import winreg

OSError: Chrome executable not able to be found on your machine

Any ideas how to solve this?
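
The traceback shows that export accepts a chrome_path argument, so one thing to try is locating a Chrome or Chromium binary yourself and passing it in. The sketch below is only an assumption-laden helper: the candidate executable names vary by distribution, and find_chrome is a hypothetical name, not a library function.

```python
import shutil

# Candidate executable names; an assumption, adjust for your distribution.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome(candidates=CHROME_CANDIDATES, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is found."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

If find_chrome() returns a path, it could be passed as user_df.head().dfi.export('df.png', chrome_path=...). If it returns None, no Chrome-like browser is on PATH and one has to be installed first.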

FileNotFoundError: [Errno 2] No such file or directory:

I can no longer export the dataframe to an image after compiling my code into an .exe with PyInstaller.
It shows the following error.

Exception in Tkinter callback
Traceback (most recent call last):
File "tkinter\__init__.py", line 1892, in __call__
File "automacao_dash.py", line 352, in atualiza_image
File "automacao_dash.py", line 341, in selecao_dash_image
File "automacao_dash.py", line 363, in cria_image_png
File "dataframe_image\_pandas_accessor.py", line 24, in export
File "dataframe_image\_pandas_accessor.py", line 32, in _export
File "dataframe_image\_screenshot.py", line 77, in __init__
File "dataframe_image\_screenshot.py", line 84, in get_css
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\*****\AppData\Local\Temp\_MEI221163\dataframe_image\static\style.css'

Save table in Docker problem

Hey there!)
I got some problems using the package in docker. I got this traceback inside:

>>> df = pd.DataFrame({"1": [1, 2, 3], "2":[5, 1, 5]})
>>> dfi.export(df, "pic.png")
[0819/121227.333151:ERROR:zygote_host_impl_linux.cc(90)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/dataframe_image/_pandas_accessor.py", line 24, in export
    return _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
  File "/usr/local/lib/python3.6/site-packages/dataframe_image/_pandas_accessor.py", line 73, in _export
    img_str = converter(html)
  File "/usr/local/lib/python3.6/site-packages/dataframe_image/_screenshot.py", line 167, in run
    img = self.take_screenshot()
  File "/usr/local/lib/python3.6/site-packages/dataframe_image/_screenshot.py", line 119, in take_screenshot
    img = mimage.imread(buffer)
  File "/usr/local/lib/python3.6/site-packages/matplotlib/image.py", line 1496, in imread
    with img_open(fname) as image:
  File "/usr/local/lib/python3.6/site-packages/PIL/ImageFile.py", line 121, in __init__
    self._open()
  File "/usr/local/lib/python3.6/site-packages/PIL/PngImagePlugin.py", line 676, in _open
    raise SyntaxError("not a PNG file")
SyntaxError: not a PNG file

The relevant part of the Dockerfile:

FROM python:3.6.14-buster

RUN apt-get update && \
    apt install chromium-driver libpng-dev tree -y

COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

I use dataframe-image==0.1.1 and matplotlib==3.3.4, and I have installed ChromeDriver 90.0.4430.212.

You can also see the libpng-dev installation; it was one of my attempts to fix the problem. So I guess the image may not contain the utilities chromedriver needs to work correctly.

Maybe this problem has already been solved? If not, can you list which dependencies are needed?
Thank you!)
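
The first log line points at the likely cause: Chrome refuses to start as root unless --no-sandbox is given, so no PNG is written and the later read fails. The sketch below shows how a headless command line could add that flag only when running as root. The function name and argument layout are hypothetical and do not describe the library's internals. Disabling the sandbox reduces isolation, so running the container as a non-root user is the other option.

```python
import os


def headless_chrome_args(chrome_path, html_path, png_path, running_as_root=None):
    """Build a headless Chrome screenshot command, adding --no-sandbox for root."""
    if running_as_root is None:
        # os.geteuid does not exist on Windows; treat that case as non-root.
        running_as_root = hasattr(os, "geteuid") and os.geteuid() == 0
    args = [
        chrome_path,
        "--headless",
        "--disable-gpu",
        f"--screenshot={png_path}",
    ]
    if running_as_root:
        # Chrome will not run as root with the sandbox enabled.
        args.append("--no-sandbox")
    # The page to render is the last argument.
    args.append(html_path)
    return args


root_args = headless_chrome_args("chromium", "temp.html", "temp.png", running_as_root=True)
user_args = headless_chrome_args("chromium", "temp.html", "temp.png", running_as_root=False)
```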

Problems with Google Chrome 99.0.4844.82 on Ubuntu 20.04

Running dfi.export(df_styled, "output_file.png") on Ubuntu 20.04 with latest Google Chrome (version 99) creates the following warnings and errors.

[0324/143535.290253:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0324/143538.178682:ERROR:sandbox_linux.cc(377)] InitializeSandbox() called with multiple threads in process gpu-process.
[0324/143538.267157:INFO:headless_shell.cc(659)] Written to file /tmp/tmpb2ua5voz/temp.png.

The problem is related to _screenshot.py, lines 97-102:

with open(temp_img, "wb") as f:
    args = [
        "--enable-logging",
        "--disable-gpu",
        "--headless"
    ]

Google Chrome version 99 seems to have a problem with the "--disable-gpu" and "--headless" settings.
Running google-chrome --disable-gpu in Terminal causes:
[34249:34249:0324/144348.025377:ERROR:sandbox_linux.cc(377)] InitializeSandbox() called with multiple threads in process gpu-process.
and
google-chrome --headless
[0324/144437.931618:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.

When I disable these two options inside _screenshot.py the *.html file is displayed onscreen and a png file is not created, so I am not sure how to fix this problem.

Can someone help?

Change DPI of the saving image

Awesome package to save tables in Jupyter! Just would love to know whether it could be possible to support the change of DPI while saving the image, so like dpi=300 in the export function? Thank you!

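A bug with Chinese text on line 95 of _screenshot.py

When the content is Chinese, a gbk encoding error is raised at the location named in the title. It has to be changed by hand to with open(temp_html, "w",encoding='utf8') as f: before it works.
In addition, when Chrome is installed on Windows Server 2019, the registry location that is read has no corresponding key, so that raises an error as well. The code after line 53 has to be modified by hand.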

Script being killed after "multiple threads in process gpu-process" message

Hi,

I have a script which creates a dataframe and generates a PNG file using dfi.export(). I ran it for a year on a Raspberry Pi and it never failed once. Now I run it on an Ubuntu VPS and it works 90%-95% of the time. Sometimes the process just gets killed. The VPS has the same amount of RAM as the Raspberry Pi I was using.

When running the code, I get following messages:

[0105/163538.099004:ERROR:sandbox_linux.cc(376)] InitializeSandbox() called with multiple threads in process gpu-process.
[0105/163538.352526:INFO:headless_shell.cc(653)] Written to file /tmp/tmp35i6n8rc/temp.png.

So I guess it has something to do with the memory usage of the GPU. But for such small single images, it seems strange that it takes up all the memory. How can we disable the use of the GPU?

Or are there any other solutions I can apply not to receive these messages?

Error in running exe file created from pyinstaller-NotImplementedError: Can't perform this operation for unregistered loader type

I am getting the error below while running an exe file created with PyInstaller:

import dataframe_image as dfi
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 493, in exec_module
exec(bytecode, module.__dict__)
File "dataframe_image\__init__.py", line 2, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 493, in exec_module
exec(bytecode, module.__dict__)
File "dataframe_image\_pandas_accessor.py", line 2, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 493, in exec_module
exec(bytecode, module.__dict__)
File "pandas\io\formats\style.py", line 62, in <module>
File "pandas\io\formats\style.py", line 141, in Styler
File "jinja2\environment.py", line 883, in get_template
File "jinja2\environment.py", line 857, in load_template
File "jinja2\loaders.py", line 115, in load
File "jinja2\loaders.py", line 248, in get_source
File "pkg_resources\__init__.py", line 1389, in has_resource
File "pkg_resources\__init__.py", line 1456, in _has
NotImplementedError: Can't perform this operation for unregistered loader type

Quality of Pandas DataFrame Image Export

I'm currently in the project stage of my Master's dissertation and I need a way to turn a Pandas DataFrame table into an image which is to be included into a Word Document.

I am using the API to export the data frame but the converted image resolution is quite low. Is there a way of changing this please? Apologies if it's a silly question or an easy one to answer, but I can't for the life of me figure it out

I've included a couple of images as examples.

Thanks in advance

[image: XX_crime_top_crimes_display_table_London_Islington_]

[image: XX_earnings_ranking_display_table_London_Croydon_]

Adjusting image resolution

When exporting the dataframe as an image, there is an argument to reduce the font size in the export function. To accommodate long texts such as e-mail IDs in the dataframe, I had to lower that value to 5 (where all the mail IDs fit perfectly). There was no option to increase the size of the image, so reducing the font size seems to be the only way around.
However, if I reduce the font size, the image is very low-resolution and blurry, which it shouldn't be. Is there any way to increase the resolution of the image?

How to disable image cropping with table_conversion="chrome"

Hi, many thanks for this library. Came across an issue where setting the table conversion to chrome seems to come out with a third of what should be the full image. Switching the table conversion to matplotlib returns the right image but devoid of all styling. Is there a flag we might be able to use to stop the cropping?

Thanks

Save to remote file system

Hey there, quick question that I am trying to help someone with. They are looking to save their dfi-generated image directly to an S3 bucket on AWS without saving the image locally. Do you know of any way to do this currently with the library?
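
One workaround, sketched under assumptions, is to export into a temporary directory, read the bytes back, and hand those bytes to whatever upload client is in use. The helper export_to_bytes and the stand-in fake_export are hypothetical names. The approach still touches local temporary storage, so it only avoids keeping a permanent local copy.

```python
import tempfile
from pathlib import Path


def export_to_bytes(export_fn, suffix=".png"):
    """Call export_fn(path_str) with a temporary path and return the file's bytes."""
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir) / f"table{suffix}"
        export_fn(str(temp_path))
        # Read the content before the temporary directory is removed.
        return temp_path.read_bytes()


def fake_export(path):
    # Stand-in for a real export call; writes only the PNG signature bytes.
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")


png_bytes = export_to_bytes(fake_export)
```

With the library, export_fn could be something like lambda path: dfi.export(df, path). The returned bytes could then be given to an S3 client, for example as the Body argument of boto3's put_object, with bucket and key names of your choosing.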

Unable to save png on a headless server

Hello when attempting to save a dataframe as such:

import dataframe_image as dfi
from IPython.display import display, HTML
...

        percent_columns = ['Precision', 'Recall', 'F-1']
        print('INTENT ARGS EVAL:')
        output = pd.DataFrame(exp_intent_args['accuracy'] - prod_intent_args['accuracy'])
        output = output.style.applymap(lambda x: 'color : green' if x>=0 else 'color : red')
        display(output)
        output.export_png('intent_args_diff.png')

I get the following error only on my server:

Traceback (most recent call last):
  File "./utterance_understanding/eval/run_evaluation.py", line 279, in <module>
    main()
  File "./utterance_understanding/eval/run_evaluation.py", line 115, in main
    get_baseline_diff(args.baseline_path, os.path.join(evaluation_output_dir, 'debug'))
  File "./utterance_understanding/eval/run_evaluation.py", line 142, in get_baseline_diff
    get_exp_diff(exp_intent, exp_intent_args, prod_intent, prod_intent_args)
  File "./utterance_understanding/eval/run_evaluation.py", line 124, in get_exp_diff
    output.export_png('intent_args_diff.png')
  File "/miniconda/lib/python3.7/site-packages/dataframe_image/_pandas_accessor.py", line 24, in export
    return _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
  File "/miniconda/lib/python3.7/site-packages/dataframe_image/_pandas_accessor.py", line 73, in _export
    img_str = converter(html)
  File "/miniconda/lib/python3.7/site-packages/dataframe_image/_screenshot.py", line 167, in run
    img = self.take_screenshot()
  File "/miniconda/lib/python3.7/site-packages/dataframe_image/_screenshot.py", line 119, in take_screenshot
    img = mimage.imread(buffer)
  File "/miniconda/lib/python3.7/site-packages/matplotlib/image.py", line 1490, in imread
    with img_open(fname) as image:
  File "/miniconda/lib/python3.7/site-packages/PIL/ImageFile.py", line 121, in __init__
    self._open()
  File "/miniconda/lib/python3.7/site-packages/PIL/PngImagePlugin.py", line 676, in _open
    raise SyntaxError("not a PNG file")
SyntaxError: not a PNG file

Saving styled table as png files, outside of notebook

I am trying a simple export on a headless machine, at the shell. The following gives me an error saying that it is trying to connect to my X11 server. Of course, the issue is that I am at a command prompt, ssh'd in. Presumably it is trying to use Chrome. If that is the issue, is there a way to get it to use matplotlib, like you can when you export entire notebooks?

(styled template derived from dataframe)
import dataframe_image as dfi
dfi.export(styled_table,'test_file.png')

FutureWarning: this method is deprecated in favour of `Styler.to_html()`

With recent changes to pandas (I'm using version 1.4.0), Styler.render() is being deprecated in favour of Styler.to_html().

Inside _pandas_accessor.py (lines 68-69), change:
if is_styler:
    html = '<div>' + obj.render() + '</div>'

to
if is_styler:
    html = '<div>' + obj.to_html() + '</div>'

That got rid of the future warning for me.
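
If older pandas versions that lack Styler.to_html() also need to keep working, the change could be wrapped in a small compatibility helper. This is a sketch only: styler_to_html is a hypothetical name, and the two dummy classes stand in for old and new Styler objects so the example runs without pandas.

```python
def styler_to_html(styler):
    """Use to_html() when the object has it, otherwise fall back to render()."""
    if hasattr(styler, "to_html"):
        return styler.to_html()
    return styler.render()


class _OldStyler:
    # Stand-in for a Styler from a pandas version without to_html().
    def render(self):
        return "<table>old</table>"


class _NewStyler:
    # Stand-in for a Styler where render() is deprecated.
    def to_html(self):
        return "<table>new</table>"

    def render(self):
        raise AssertionError("deprecated path should not be used")


old_html = "<div>" + styler_to_html(_OldStyler()) + "</div>"
new_html = "<div>" + styler_to_html(_NewStyler()) + "</div>"
```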

Dataframe as PNG object

Can we get the ability to return the styled Dataframe as a PNG object, without needing to save the image?

Even though dfi.export() is currently broken with respect to several situations (including on Google Colab), once I have the image object, I can use other means to save it.

This would address several issues, including #6 #7 #9 #13 #15

Superscript text not exported properly

I have a DataFrame with some superscript text created with the code [cm\N{SUPERSCRIPT THREE}/min].
[image]

However, when I export this DataFrame as an image with the code below:

import dataframe_image as dfi
dfi.export(styled_df, "resultTable.png")

it does not get exported properly (picture below). There is a weird straight-line-like character instead of the superscript.
[image]

Is there a workaround for this?

Add parameter for nbconvert configuration options

With nbconvert, you have the option of specifying a large number of configuration options. We even set a single config option here with PDFExporter. Perhaps we can add a config parameter, have it accept a dictionary, and forward it to PDFExporter.

All of the preprocessors and exporters have config options, so I think it will be messy to try and parse all of the options. Maybe we just do PDFExporter options first?

I needed this personally to use a different latex template.

Can the font family be changed when using matplotlib?

The result is wrong when I export a PNG. It seems that matplotlib doesn't have the 'Helvetica' font family. Can I change it in my code, or choose 'sans-serif' instead?
At the moment I edit _matplotlib_table.py myself.

Saving as SVG using matplotlib backend

Hi,
Love this library and it solves a problem I've had for a while, so thank you!

I'd like to be able to save to .svg with the matplotlib backend, due to resolution/filesize issues with .png. Matplotlib accepts a format keyword in savefig, i.e. self.fig.savefig(buffer, bbox_inches=bbox, format=self.format).

This requires inferring the format by taking the last three letters of filename in _pandas_accessor.py, i.e. converter = TableMaker(fontsize=fontsize, encode_base64=False, for_document=False, format=filename[-3:]).run. That seems errorprone but I just thought I'd mention the need and reasons for saving to svg :)

Thanks for your time!
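
On the concern that taking the last three letters of the filename is error-prone: a sketch of a sturdier inference using pathlib's suffix handling follows. The set of accepted formats and the name infer_format are assumptions for illustration, not the library's behavior.

```python
from pathlib import Path

# Illustrative set only; not the list of formats the library supports.
SUPPORTED_FORMATS = {"png", "svg", "pdf", "jpg", "jpeg"}


def infer_format(filename, default="png"):
    """Infer an image format from the file extension of a str or Path."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    if not suffix:
        # No extension at all: fall back to the default format.
        return default
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {suffix!r}")
    return suffix


svg_format = infer_format("table.svg")
jpeg_format = infer_format("table.JPEG")
no_ext_format = infer_format("table")
```

Unlike filename[-3:], this handles four-letter extensions, upper-case extensions and names without any extension.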

Non-fatal chromedriver error logs are bloating app logs

I'm using this project as a library in my own work.

Seems like since this library uses chromedriver for rendering, the below logs are written upon every conversion. Any strategy to remove these or fix this?

I tried disabling all loggers but to no avail. These logs only occur when table_conversion="chrome" in dfi.export(...)

OS: MacOS BigSur
Run Time env: IntelliJ IDE
Python: 3.9.x

[0508/054235.070550:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054235.073022:ERROR:file_io.cc(90)] ReadExactly: expected 8, observed 0
[0508/054235.074372:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054236.207822:INFO:headless_shell.cc(616)] Written to file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/tmpf9m8tj4w/temp.png.
[0508/054236.609532:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054236.610523:ERROR:file_io.cc(90)] ReadExactly: expected 8, observed 0
[0508/054236.611275:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054238.018062:INFO:headless_shell.cc(616)] Written to file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/tmpdd2jt105/temp.png.
[0508/054238.785091:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054238.786112:ERROR:file_io.cc(90)] ReadExactly: expected 8, observed 0
[0508/054238.786894:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/: Operation not permitted (1)
[0508/054241.031103:INFO:headless_shell.cc(616)] Written to file /var/folders/m9/lbcz29990ygdcdyrt8xjsdgm0000gn/T/tmp_ygkarof/temp.png.
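
These lines look like output written by the Chrome child process to its stderr rather than records from Python's logging module, which would presumably explain why disabling loggers has no effect. The sketch below shows the general technique of discarding a child process's stderr. It uses a stand-in child process and does not claim that the library exposes such an option.

```python
import subprocess
import sys

# Stand-in child process that, like headless Chrome, writes noise to stderr.
noisy_child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('ERROR:xattr.cc noise\\n'); print('ok')",
]

completed = subprocess.run(
    noisy_child,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # discard everything the child writes to stderr
    universal_newlines=True,
)
child_output = completed.stdout.strip()
```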

When table_conversion = 'matplotlib' is passed to the export function, the DataFrame's set_caption does not work.

When I run the code below on my MacBook, everything is OK:

    helper = WxMsgHelper()
    current_date = datetime.datetime.now().strftime('%Y-%m-%d')
    for user_name in user_tables_map:
        df1 = df[df.createby.str.contains(fr'\b{user_name}\b', regex=True, case=False)]
        if len(df1) > 10:
            df_styled = df1.style.background_gradient() 
            df_styled = df_styled.set_caption('<b>time:{}<br>'.format(current_date))
            dfi.export(df_styled,"{}.png".format(user_name),max_rows=-1)
            helper.send_wx_image('zds', "{}.png".format(user_name))
        break

When I run the code below on my MacBook, the table's caption is lost:

    helper = WxMsgHelper()
    current_date = datetime.datetime.now().strftime('%Y-%m-%d')
    for user_name in user_tables_map:
        df1 = df[df.createby.str.contains(fr'\b{user_name}\b', regex=True, case=False)]
        if len(df1) > 10:
            df_styled = df1.style.background_gradient() 
            df_styled = df_styled.set_caption('<b>time:{}<br>'.format(current_date))
            dfi.export(df_styled,"{}.png".format(user_name),max_rows=-1,table_conversion = 'matplotlib')
            helper.send_wx_image('zds', "{}.png".format(user_name))
        break

Does anyone know how to make set_caption work when I set table_conversion = 'matplotlib'?

NoSuchKernel Error

Trying to convert a notebook to pdf, .convert raises a NoSuchKernel exception.
I'm inside a conda environment, so I don't have any Python bound to python3.

Steps to reproduce:

import dataframe_image as dfi
dfi.convert('notebook.ipynb', to='pdf')

gives:

---------------------------------------------------------------------------
NoSuchKernel                              Traceback (most recent call last)
<ipython-input-27-1f6fca5272c8> in <module>
----> 1 dfi.convert('Tutorial-QuickStart.ipynb', to='pdf')

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/dataframe_image/_convert.py in convert(filename, to, max_rows, max_cols, ss_width, ss_height, resize, chrome_path, limit)
    193     c = Converter(Path(filename), to, max_rows, max_cols, ss_width, ss_height,
    194                  resize, chrome_path, limit)
--> 195     c.convert()

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/dataframe_image/_convert.py in convert(self)
    128
    129     def convert(self):
--> 130         self.execute_notebook()
    131         for kind in self.to:
    132             getattr(self, f"to_{kind}")()

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/dataframe_image/_convert.py in execute_notebook(self)
     91             timeout=600, kernel_name="python3", allow_errors=True, extra_arguments=extra_arguments
     92         )
---> 93         ep.preprocess(self.nb, resources)
     94
     95     def to_pdf(self):

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py in preprocess(self, nb, resources, km)
    401             resources = {}
    402
--> 403         with self.setup_preprocessor(nb, resources, km=km):
    404             self.log.info("Executing notebook with kernel: %s" % self.kernel_name)
    405             nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)

//anaconda/envs/jupyter_to_medium/lib/python3.7/contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py in setup_preprocessor(self, nb, resources, km, **kwargs)
    343         if km is None:
    344             kwargs["cwd"] = path
--> 345             self.km, self.kc = self.start_new_kernel(**kwargs)
    346             try:
    347                 # Yielding unbound args for more easier understanding and downstream consumption

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py in start_new_kernel(self, **kwargs)
    289         if km.ipykernel and self.ipython_hist_file:
    290             self.extra_arguments += ['--HistoryManager.hist_file={}'.format(self.ipython_hist_file)]
--> 291         km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
    292
    293         kc = km.client()

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/jupyter_client/manager.py in start_kernel(self, **kw)
    299              and launching the kernel (e.g. Popen kwargs).
    300         """
--> 301         kernel_cmd, kw = self.pre_start_kernel(**kw)
    302
    303         # launch the kernel subprocess

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/jupyter_client/manager.py in pre_start_kernel(self, **kw)
    252         # build the Popen cmd
    253         extra_arguments = kw.pop('extra_arguments', [])
--> 254         kernel_cmd = self.format_kernel_cmd(extra_arguments=extra_arguments)
    255         env = kw.pop('env', os.environ).copy()
    256         # Don't allow PYTHONEXECUTABLE to be passed to kernel process.

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/jupyter_client/manager.py in format_kernel_cmd(self, extra_arguments)
    176             cmd = self.kernel_cmd + extra_arguments
    177         else:
--> 178             cmd = self.kernel_spec.argv + extra_arguments
    179
    180         if cmd and cmd[0] in {'python',

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/jupyter_client/manager.py in kernel_spec(self)
     82     def kernel_spec(self):
     83         if self._kernel_spec is None and self.kernel_name != '':
---> 84             self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
     85         return self._kernel_spec
     86

//anaconda/envs/jupyter_to_medium/lib/python3.7/site-packages/jupyter_client/kernelspec.py in get_kernel_spec(self, kernel_name)
    233         resource_dir = self._find_spec_directory(kernel_name.lower())
    234         if resource_dir is None:
--> 235             raise NoSuchKernel(kernel_name)
    236
    237         return self._get_kernel_spec_by_name(kernel_name, resource_dir)

NoSuchKernel: No such kernel named python3
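
The traceback shows execute_notebook passing kernel_name="python3", so a kernel has to be registered with Jupyter under exactly that name. A minimal sketch for checking this from inside the same environment, assuming jupyter_client is importable (it appears in the traceback). kernel_hint is a hypothetical helper written for this illustration, and the suggested ipykernel command is one common way to register a kernel, not something the issue confirms as the fix.

```python
def available_kernels():
    """Map each registered kernel name to its kernelspec directory."""
    # Imported lazily so the helper below can be used without Jupyter installed.
    from jupyter_client.kernelspec import KernelSpecManager
    return KernelSpecManager().find_kernel_specs()


def kernel_hint(wanted, available):
    """Describe whether `wanted` is among the registered kernel names."""
    if wanted in available:
        return f"kernel '{wanted}' is registered at {available[wanted]}"
    names = ", ".join(sorted(available)) or "none"
    return (
        f"kernel '{wanted}' is not registered (found: {names}); "
        f"try: python -m ipykernel install --user --name {wanted}"
    )


def check_python3_kernel():
    """Print a hint for the kernel name that the traceback shows being requested."""
    print(kernel_hint("python3", available_kernels()))
```

Calling check_python3_kernel() in the conda environment lists what Jupyter can see. If python3 is missing, registering a kernel under that name is a plausible next step.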

Round to two decimal places

How can I round all numbers in the DataFrame to two decimal places?
I currently use this:
df_styled = df.style.background_gradient()
dfi.export(df_styled, "adv.png")
The exported image shows 6 decimal places.

Thank you!
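
A possible approach is to set the display format on the Styler itself, since a Styler formats floats with its own precision regardless of how the underlying values were rounded. A minimal sketch, assuming pandas' Styler.format with a format string; style_two_decimals and export_two_decimals are hypothetical helper names, and the numeric-column subset is an assumption so that text columns are left alone.

```python
TWO_DECIMALS = "{:.2f}"  # standard Python format spec: fixed point, two decimals


def style_two_decimals(df):
    """Return a Styler that shows numeric columns with two decimals and a gradient."""
    numeric_cols = list(df.select_dtypes("number").columns)
    return (
        df.style
        .format(TWO_DECIMALS, subset=numeric_cols)   # display format only, data unchanged
        .background_gradient(subset=numeric_cols)    # needs matplotlib installed
    )


def export_two_decimals(df, path):
    """Export the formatted Styler with dataframe_image (imported lazily)."""
    import dataframe_image as dfi
    dfi.export(style_two_decimals(df), path)
```

For the snippet in the question this would be called as export_two_decimals(df, "adv.png").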

Table size smaller when exporting

When exported, the table always has the same size, regardless of the size settings made in the Jupyter notebook (it has the specified size when viewed in the notebook, but is minimal when exported). As a result, the text is unreadable for tables with many rows and few columns.

Hiding Input

Hi, love your package. Is there maybe a possibility to hide the input cells? For example, when I am using jupyter nbconvert I would use it with these flags: jupyter nbconvert notebook.ipynb --to pdf --no-input --no-prompt --output result.pdf. So are there flags available equivalent to these ones?

Strange markdown conversion behavior

Hi,

Thanks for your work, very useful to me.

When using dataframe_image to produce a markdown conversion of my notebooks, I notice that part of the markdown result is ... HTML (paragraphs, sections, ...)!

I looked at the source code but I'm not familiar with the usage of nbconvert as a library.

All I can say is that a MarkdownExporter is used but before that a MarkdownPreprocessor is called.
If I remove it, my problem is solved.

I guess that preprocessing of Markdown cells is needed when rendering to latex/pdf, but I think it is unnecessary when rendering to markdown.

A simple fix in the convert() function of _convert.py would be (line 327 in master):

if "md" not in self.to:
    MarkdownPreprocessor().preprocess(self.nb, self.resources)

What do you think about it?

If you wish, I can do a PR. I have never done that before but it's an opportunity to learn :)

Change the DPI or quality

Hi,

Thank you very much for this amazing package for saving tables as images. Would it be possible to support changing the DPI or otherwise improving the image quality, please?

Thank you.

Florent

Works great on local machine, but doesn't work on Google Colab

Hi, thank you for the library.
It works great on my local machine, but not since I moved my script to Google Colab.
Step 1: install all libraries

!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver

import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

!pip install dataframe_image
import dataframe_image as dfi

Step 2: export df using library

df.dfi.export('df.png')

And got the error: OSError: read past end of file

Step 3: export df using library with specifying chrome webdriver path

df.dfi.export('df.png', chrome_path='/usr/lib/chromium-browser/chromedriver')

This process runs forever. On my local machine it takes a few seconds with the same dataframe.

Please ask if you need any more details.

I got the error 'Styler' object has no attribute 'dfi'

Hello, I am using dataframe_image to save a styled dataframe as PNG, but I got the error 'Styler' object has no attribute 'dfi'. This error is surprising, because the description of this library states that styled dataframes can also be saved.

My code is:

df = df.round(2)
df = df.style.applymap(highlight_geo)
df.dfi.export('figure/xxxx.png')

and the error is:

AttributeError: 'Styler' object has no attribute 'dfi'

Oh, dfi.export(df, 'figure/xxx.png') works well. But df.dfi.export('figure/xxxx.png') does not work.
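
The snippet above rebinds df to the Styler returned by df.style.applymap(...), and the error says that a Styler has no dfi accessor, whereas the module-level dfi.export call is reported to work. A minimal sketch that keeps the DataFrame and the Styler under separate names and uses the call that worked. highlight_negative is a hypothetical stand-in for highlight_geo, which the issue does not show, and export_styled is a hypothetical helper name. Newer pandas versions name Styler.applymap Styler.map, so the method name may need adjusting.

```python
def highlight_negative(value):
    """Hypothetical cell styler standing in for the reporter's highlight_geo."""
    return "color: red" if value < 0 else ""


def export_styled(df, path, cell_style=highlight_negative):
    """Style a rounded copy of df and export it through the module-level function."""
    import dataframe_image as dfi  # imported lazily; requires dataframe_image

    styled = df.round(2).style.applymap(cell_style)  # a Styler, not a DataFrame
    dfi.export(styled, path)                         # the form reported to work
    return styled
```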

Export styled dataframe to png not working with matplotlib

Hello,

I have run the following example code to export a dataframe to a png, using the "matplotlib" table converter.

import dataframe_image as dfi
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(6, 4))
df_styled = df.style.background_gradient()
dfi.export(df_styled, 'df_styled.png', table_conversion='matplotlib')

The image is exported without errors (only a warning about the helvetica font not being found). However, when opening the image afterwards, the style of the dataframe has disappeared and a simple dataframe without colors is displayed.
On the other hand, when using "chrome" as a table converter, the dataframe is correctly exported with its style.
Sadly, in the environment in which I am working, chrome is not available and I can't install it there, so this is not an option.

Is this the "expected" behavior when using matplotlib, or should it work like with chrome, i.e. export dataframe colors too?

Missing dependency for `table_conversion='matplotlib'`

Execution of e.g. dfi.export(dataframe, filename, table_conversion='matplotlib') currently fails with

    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

This can obviously be addressed by installing lxml.

However, since the BeautifulSoup documentation notes that requested parsers must be installed manually, and this parser is explicitly requested as a hard-coded argument here, shouldn't lxml be listed as a dependency of this package?
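
Until the dependency is declared, a script can check for the parser up front instead of failing inside the export call. A minimal sketch using only the standard library; missing_modules is a hypothetical helper name, and the list of modules to check is an assumption based on the error message above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_matplotlib_conversion_deps():
    """Report what the matplotlib table conversion appears to need, per the error above."""
    missing = missing_modules(["bs4", "lxml"])
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("bs4 and lxml are both importable")
```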

Table snapshots are shrinking throughout the document

Hi all,
I am trying to convert a Jupyter Notebook to PDF using "Download as" - "DataFrame as Image", then the Latex option with Chrome for screenshots. My document has tables of different widths. The large tables are shrunk correctly to fit onto the page. However, it seems that the shrinkage factor isn't reset, so some "normal sized" tables towards the end of the document are shrunk to a fraction of their size.

Example attached. I would expect the first and third table to look identical, but the third table is half the size. It seems the shrinkage factor applied to table 2 is never reset. In my actual document, the impact is much more severe.

The issue seems to be Chrome specific and Matplotlib does not seem to have the same issue. The shrinkage issue is also showing when saving the notebook with embedded images "_dataframe_image", but picture files when downloading as a Markdown file look OK.

image

Run in a docker

Hi, when I try to run the package inside a Docker container I get:

>>> import dataframe_image as dfi
>>> dfi.export(df, 'dframe.png')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/dataframe_image/_pandas_accessor.py", line 24, in export
    return _export(obj, filename, fontsize, max_rows, max_cols, table_conversion, chrome_path)
  File "/usr/local/lib/python3.8/site-packages/dataframe_image/_pandas_accessor.py", line 32, in _export
    converter = Screenshot(max_rows=max_rows, max_cols=max_cols, chrome_path=chrome_path,
  File "/usr/local/lib/python3.8/site-packages/dataframe_image/_screenshot.py", line 76, in __init__
    self.chrome_path = get_chrome_path(chrome_path)
  File "/usr/local/lib/python3.8/site-packages/dataframe_image/_screenshot.py", line 52, in get_chrome_path
    raise OSError("Chrome executable not able to be found on your machine")
OSError: Chrome executable not able to be found on your machine

I tried to look it up but found only some comments about "pip install chromium-chromedriver", and that package can't be found.

Any help would be welcome!
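
The traceback shows that export accepts both chrome_path and table_conversion, so one option is to look for a browser binary first and fall back to the matplotlib converter when none is installed in the image. A minimal sketch under stated assumptions: the executable names in CHROME_CANDIDATES are common Linux names and are not taken from the package, and find_chrome and export_with_fallback are hypothetical helpers.

```python
import shutil

# Assumed executable names; adjust to whatever the container image actually installs.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome(candidates=CHROME_CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def export_with_fallback(df, filename):
    """Use Chrome when a binary is found, otherwise the matplotlib converter."""
    import dataframe_image as dfi  # imported lazily; requires dataframe_image

    chrome = find_chrome()
    if chrome:
        dfi.export(df, filename, chrome_path=chrome)
    else:
        dfi.export(df, filename, table_conversion="matplotlib")
```

The which parameter exists only so the lookup can be exercised without a browser installed.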

Problems with export_png, export

import pandas as pd
import numpy as np
import dataframe_image as dfi
import os

cwd=os.getcwd()
df = pd.DataFrame(np.random.randn(6, 6), columns=list('ABCDEF'))
dfi.export(df, os.path.join(cwd, 'title.png'))
dfi.export_png(df, os.path.join(cwd, 'title.png'))

Problems
dfi.export_png(df,"mytable.png") did not work for me:
Object dfi.export_png not found.

dfi.export(df,"mytable.png") worked, but it also saved the jupyter notebook.
