ricklupton / floweaver
View flow data as Sankey diagrams
Home Page: https://floweaver.readthedocs.io
License: MIT License
Hi,
I'm having trouble with my flows when I want them to miss a stage in the diagram. In the image below, for example, I'd like Polyethylene to join directly to Thermoplastics, instead of grouping together with the Urea and other flows in the stage in between. I bundle each stage with each other stage so wondered if this might be the issue, but can't see a way to create the diagram without this. I've copied excerpts of my code below too.
nodes = {
'Source': ProcessGroup(['Oil', 'Natural Gas', 'Refinery Sourced', 'Secondary Reactants']),
'Primaries': ProcessGroup(['Ammonia', 'Methyl Alcohol', 'Ethylene', 'Propylene', 'C4 Stream', 'BTX aromatics']),
'First-Tier': ProcessGroup(['Urea', 'Ammonium Phosphate', 'Ammonium Sulphate', 'Nitric Acid', 'Polyethylene', 'Vinyl Chloride', 'Polypropylene', 'Styrene', 'Ethylene Glycol', 'Terephthalic Acid']),
'Second-Tier': ProcessGroup(['Ammonium Nitrate', 'Polyvinyl Chloride', 'Polystyrene', 'Polystyrene', 'Polyethylene Terephthalate']),
'End Products': ProcessGroup(['N-fertilisers', 'Thermoplastics', 'Thermosets, Fibre & Elastomers', 'Solvents, Additives & Explosives', 'Other', 'Secondary Products']),
}
ordering = [
['Source'],
['Primaries'],
['First-Tier'],
['Second-Tier'],
['End Products'],
]
bundles = [
Bundle('Source', 'Primaries'),
Bundle('Primaries', 'First-Tier'),
Bundle('First-Tier', 'Second-Tier'),
Bundle('Source', 'First-Tier'),
Bundle('Source', 'Second-Tier'),
Bundle('Source', 'End Products'),
Bundle('Primaries', 'End Products'),
Bundle('First-Tier', 'End Products'),
Bundle('Second-Tier', 'End Products'),
]
nodes['Source'].partition = sources_by_name
nodes['Primaries'].partition = primaries_by_name
nodes['First-Tier'].partition = firsttier_by_name
nodes['Second-Tier'].partition = secondtier_by_name
nodes['End Products'].partition = endproducts_by_name
Cheers
Should be compatible with:
It's an easy mistake to make, but just gives a generic KeyError.
Hi,
I decided to ask this question as an issue, since I feel like this is missing in your tutorial. After hours of playing around I haven't figured out how to generate a png / svg etc. I tried to use show_sankey without using an IPython notebook, but the image strings are not generated. Is there a way to save the plot to a file (not only the JSON description with save_sankey_data)? Alternatively, do you have some code to generate JSON data that is compatible with your d3 plugin?
Thank you,
Roman
Unless I'm missing something, I can't find this explained anywhere. Does this have to be done through ipywidgets? If so, an explanation of how to feed that into weave() would be nice!
I have tried to run the example you provide in the IPython notebook, but I get an AttributeError: 'DiGraph' object has no attribute 'get_node' when I run the sankey = show_sankey(sdd, dataset, width=800, height=500) command. Is that a known issue, or am I using the wrong version of Python or something like that?
Thank you!
I am using python 3.5.4 with the following environment:
attrs (17.3.0)
bleach (2.1.1)
certifi (2016.2.28)
decorator (4.1.2)
entrypoints (0.2.3)
html5lib (1.0b10)
ipykernel (4.6.1)
ipysankeywidget (0.2.2)
ipython (6.2.1)
ipython-genutils (0.2.0)
ipywidgets (7.0.5)
jedi (0.11.0)
Jinja2 (2.10)
jsonschema (2.6.0)
jupyter-client (5.1.0)
jupyter-core (4.4.0)
MarkupSafe (1.0)
mistune (0.8.3)
nbconvert (5.3.1)
nbformat (4.4.0)
networkx (2.0)
notebook (5.2.2)
numpy (1.13.3)
palettable (3.1.0)
pandas (0.21.0)
pandocfilters (1.4.2)
parso (0.1.0)
pexpect (4.3.0)
pickleshare (0.7.4)
pip (9.0.1)
prompt-toolkit (1.0.15)
ptyprocess (0.5.2)
Pygments (2.2.0)
python-dateutil (2.6.1)
pytz (2017.3)
pyzmq (16.0.3)
sankeyview (1.1.7)
setuptools (36.4.0)
simplegeneric (0.8.1)
six (1.11.0)
terminado (0.8.1)
testpath (0.3.1)
tornado (4.5.2)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.29.0)
widgetsnbextension (3.0.8)
Draft JSON schema is here:
https://github.com/ricklupton/sankeydata
Please comment on attributes that are missing, or parts of the data model that don't make sense to you. The schema is incomplete -- contributions to fill out the definition are welcome.
See #49
Some people have had trouble because their ipysankeywidget was out of date. Check the pip version requirements are correct and explain in the docs.
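When reporting a problem like this, it helps to list the versions that are actually installed in the running environment. The following is a minimal sketch using only the standard library (it assumes Python 3.8+ for importlib.metadata; the helper name installed_versions is hypothetical and not part of floWeaver):

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages mentioned in the reports on this page.
    wanted = ["floweaver", "ipysankeywidget", "ipywidgets", "networkx"]
    for name, version in installed_versions(wanted).items():
        print("{}: {}".format(name, version or "not installed"))
```

Running this inside the same kernel as the notebook shows what the notebook really imports, which can differ from what pip reports in another shell.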
I tried to import an xlsx. file, but that did not work. I used the following line:
dataset = Dataset.from_excel('glass.xlsx')
Please explain me what I did wrong. Thanks!
The package metadata currently says it is Python 2 compatible, but it isn't. This needs fixing in setup.py:
https://github.com/ricklupton/floweaver/blob/master/setup.py#L42-L43
If anyone has experience of building conda packages, or wants to give it a try, it would be great if floWeaver and ipysankeywidget could be installed using conda. I think this just involves putting together a conda recipe.
Any questions let me know by replying below or asking in the chat!
I encountered a "Could not instantiate widget" error when running a Python script on Azure Notebooks.
Environment :
code :
!jupyter nbextension enable --py --sys-prefix ipysankeywidget
Enabling notebook extension jupyter-sankey-widget/extension...
- Validating: OK
!jupyter nbextension enable --py --sys-prefix widgetsnbextension
Enabling notebook extension jupyter-js-widgets/extension...
- Validating: OK
import pandas as pd
flow_df = pd.DataFrame(
columns=['source', 'target', 'type', 'value'],
data=[{'source': 'farm1', 'target': 'Mary', 'type': 'apples', 'value': 5},
{'source': 'farm1', 'target': 'James', 'type': 'apples', 'value': 3},
{'source': 'farm2', 'target': 'Fred', 'type': 'apples', 'value': 10},
{'source': 'farm2', 'target': 'Fred', 'type': 'bananas', 'value': 10},
{'source': 'farm2', 'target': 'Susan', 'type': 'bananas', 'value': 5},
{'source': 'farm3', 'target': 'Susan', 'type': 'apples', 'value': 10},
{'source': 'farm4', 'target': 'Susan', 'type': 'bananas', 'value': 1},
{'source': 'farm5', 'target': 'Susan', 'type': 'bananas', 'value': 1},
{'source': 'farm6', 'target': 'Susan', 'type': 'bananas', 'value': 1}])
from floweaver import ProcessGroup, Bundle, SankeyDefinition, weave
size = {'width': 570, 'height': 300}
nodes = {
'farms': ProcessGroup(['farm1', 'farm2', 'farm3', 'farm4', 'farm5', 'farm6']),
'customers': ProcessGroup(['James', 'Mary', 'Fred', 'Susan']),
}
ordering = [
['farms'],
['customers'],
]
bundles = [
Bundle('farms', 'customers'),
]
sdd = SankeyDefinition(nodes=nodes, bundles=bundles, ordering=ordering)
weave_ = weave(sankey_definition=sdd, dataset=flow_df)
# When this code runs, an error is displayed in the browser console.
weave_.to_widget(**size, debugging=True)
Dear current & former contributors to floWeaver and related projects,
@LeoPaoli @asoliverez @AstronautFireman @rodelius @konstantinstadler @charlieselway @coenraadwestbroek @space-curiosity @simon-ritchie @Nemecsek @jfouillou @krrome @linhuiw @dvdbng @pmackay @tarikaltuncu @verhulststefanie @mmeendez8 @sildar @uipo78 @snth @dylancsumner @ilaxes @ghost @chanansh @chananshgong @timsainb @hakanjonsson @dewald-galjaard @bollwyvl
Next week floWeaver is taking part in the Mozilla Global Sprint (10-11 May). If you are interested and have time, it would be great to see you there (you can take part online or at a physical local site, for as short or long a time as you wish).
For more information: https://www.ricklupton.name/2018/post/floweaver-mozsprint-2018/
Please excuse the abuse of Github and @mentions, and feel free to pass the invitation on to anyone you think would be interested :)
Rick
Hello,
I've been having trouble handling flows that run backwards relative to the overall direction of the Sankey. In this case (images below), the general direction is from left to right, but 'Utilities' and 'Refineries' exchange flows with each other, so there is inevitably a backwards flow, which floweaver does not seem to handle well. Furthermore, floweaver seems to show the wrong direction for the backwards flow (for instance, the output of the refinery that goes to the utility is shown as an input to the refinery). I've tried bundling these flows in different ways, but to no avail. Below are two screenshots: one where 'Utilities' and 'Refineries' are in the same horizontal band but different vertical bands, and one with the reverse. Here's a copy of the code I've used so far. Would you happen to know how to handle this?
Thank you very much for your help!
nodes = {
'Sources': ProcessGroup(['Coal', 'Crude Oil' , 'Gas', 'Solid biomass & waste', 'Renewable','Minerals', 'Scrap', 'Other Materials']),
'Refineries': ProcessGroup(['Refineries']),
'Utilities': ProcessGroup(['Utilities']),
'Industry': ProcessGroup(['Iron and steel', 'Non-ferrous metals', 'Paper, pulp and print', 'Chemical and petrochemical']),
'End': ProcessGroup(['Construction','Equipment', 'Residential','Transport', 'Commercial and public services', 'Other', 'Losses']),
}
ordering = [['Sources'],['Refineries'],['Utilities'],['Industry'],['End']]
Sources_Partitioned = Partition.Simple('process',['Coal','Gas','Crude Oil','Renewable','Solid biomass & waste', 'Minerals', 'Scrap','Other Materials'])
Industry_Partitioned = Partition.Simple('process',['Chemical and petrochemical','Iron and steel', 'Non-ferrous metals', 'Paper, pulp and print' ])
End_Partitioned = Partition.Simple('process',['Construction', 'Residential','Equipment','Transport','Commercial and public services', 'Other', 'Losses'])
nodes['Sources'].partition = Sources_Partitioned
nodes['End'].partition = End_Partitioned
bundles = [
Bundle('Sources', 'Refineries'),
Bundle('Sources', 'Utilities'),
Bundle('Sources','Industry'),
Bundle('Sources','End'),
Bundle('Refineries', 'Utilities'),
Bundle('Refineries', 'Industry'),
Bundle('Refineries', 'End'),
Bundle('Utilities', 'Industry'),
Bundle('Utilities','Refineries'),
Bundle('Utilities','End'),
Bundle('Industry','End'),
]
Types_Partitioned = Partition.Simple('type', np.unique(flows['type']))
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition = Types_Partitioned)
weave(sdd, flows, palette = 'Paired_12').to_widget(**size)
It would be nice to have a way of annotating individual flow lines, where one could annotate the value represented by a flow line for the sake of presentations.
For instance, annotating a value on each of the lines coming out of "bananas".
Could be very cluttered in the attached example, but simpler diagrams could benefit from the inclusion of the mouse-over values statically annotated on each flow.
When a Bundle has the same source and target, you need to specify which flows to include.
TODO: make a list here of what needs to be covered
floWeaver can be slow with larger datasets. We haven't profiled this properly yet. I'll write more about how to do this, but if anyone would like to give this a go, please ask below!
We need explicit examples of how to save Sankey diagrams as png/svg for further processing.
There is an example in the quickstart tutorial (cell 8) but it's not obvious.
@coenraadwestbroek does that help? use
weave(sdd, flows).to_widget().auto_save_png('filename.png')
or
weave(sdd, flows).to_widget().auto_save_svg('filename.svg')
The SVG you can open in Inkscape or another editor to tweak the text positions etc.
When working with datasets where each record represents a single "value", it seems quite unnatural to add df["value"] = 1. Although it's quite simple, it could be done by default when the dataset has no value column.
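Until such a default exists, one workaround is to count the records before passing them on. Here is a minimal sketch in plain Python, assuming each record is a dict with source and target keys (the helper count_flows is hypothetical, not part of floWeaver):

```python
from collections import Counter


def count_flows(records, keys=("source", "target")):
    """Collapse one-row-per-event records into flows with an explicit value."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return [dict(zip(keys, combo), value=n) for combo, n in counts.items()]


# Made-up records: each row is a single event with an implicit value of 1.
records = [
    {"source": "farm1", "target": "Mary"},
    {"source": "farm1", "target": "Mary"},
    {"source": "farm2", "target": "Fred"},
]
flows = count_flows(records)
```

The resulting list of dicts has the source, target and value keys that the examples on this page use, so it could be turned into a DataFrame with pd.DataFrame(flows).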
We want a gallery of real-world examples of uses of floWeaver! If you have used it, please tell us about it.
The best way of contributing your example is to add it directly following the instructions below. You can also leave a comment below to tell us about your example and we can add it for you.
Examples are in the docs/gallery folder of the repository. To add a new example:
1. Create a folder docs/gallery/your-example (maybe copy an existing one).
2. Write docs/gallery/your-example/index.rst. You should include:
3. Add your example to the list in docs/gallery/index.rst:
.. toctree::
bayesian-mfa/index.rst
[...]
your-example/index.rst # ADD THIS LINE FOR YOUR EXAMPLE!
After successful (pip) installation (v2.0.0a3) I tried importing the floweaver module but got the following error. Also tried an earlier version but same error.
Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 5 2017, 02:28:52)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
import floweaver
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mr/anaconda2/lib/python2.7/site-packages/floweaver/__init__.py", line 13, in <module>
from .color_scales import CategoricalScale, QuantitativeScale
File "/Users/mr/anaconda2/lib/python2.7/site-packages/floweaver/color_scales.py", line 59
raise ValueError('No qualitative palette called {}'.format(palette)) from None
^
SyntaxError: invalid syntax
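The caret in this traceback points at raise ... from None, which is Python 3-only syntax, so the import fails on Python 2.7 before any floweaver code runs. A minimal sketch of what that construct does on Python 3 follows; lookup_palette and the PALETTES table are hypothetical stand-ins, not floweaver's actual code:

```python
PALETTES = {"Paired_12": ["#a6cee3", "#1f78b4"]}  # toy data for illustration


def lookup_palette(name):
    """Return a palette, hiding the internal KeyError from the caller."""
    try:
        return PALETTES[name]
    except KeyError:
        # 'from None' suppresses exception chaining.
        # Python 2 cannot parse this line and raises SyntaxError at import time.
        raise ValueError("No qualitative palette called {}".format(name)) from None


try:
    lookup_palette("missing")
except ValueError as err:
    message = str(err)
    suppressed = err.__suppress_context__  # True when 'from None' was used
```

Because the failure happens at parse time, no try/except inside the package can work around it; running under a Python 3 interpreter is the only fix.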
Looks like this is an issue with networkx, but I can't figure out how floweaver is using this package, or how I should go about running the correct versions. Floweaver was working for me a few days ago, so I'm not sure what changed.
import os
import pandas as pd
import numpy as np
from floweaver import *
os.chdir("/Users/XYZ/Dropbox/")
df = pd.read_csv('REProcSectoral.csv')
flows = df[['source', 'target', 'value']].dropna(axis = 0)
# Names must match partition_source / partition_target used in `nodes` below
partition_source = Partition.Simple('source', np.unique(flows['source']))
partition_target = Partition.Simple('target', np.unique(flows['target']))
nodes = {
'source': ProcessGroup(['source'], partition_source),
'target': ProcessGroup(['target'], partition_target),
}
bundles = [
Bundle('source', 'target'),
]
ordering = [
['source'],
['target'],
]
# These are the same each time, so just write them here once
size_options = dict(width=500, height=400,
margins=dict(left=100, right=100))
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size_options)
AttributeError Traceback (most recent call last)
<ipython-input-28-4a3c134a6064> in <module>()
29
30 sdd = SankeyDefinition(nodes, bundles, ordering)
---> 31 weave(sdd, flows).to_widget(**size_options)
~/anaconda3/lib/python3.6/site-packages/floweaver/weave.py in weave(sankey_definition, dataset, measures, link_width, link_color, palette)
36 # consistency.
37 new_waypoints, new_bundles = elsewhere_bundles(sankey_definition)
---> 38 GV2 = augment(GV, new_waypoints, new_bundles)
39
40 # XXX messy
~/anaconda3/lib/python3.6/site-packages/floweaver/augment_view_graph.py in augment(G, new_waypoints, new_bundles)
73 G = G.copy()
74
---> 75 R = len(G.ordering.layers)
76 # XXX sorting makes order deterministic, which can affect final placement
77 # of waypoints
AttributeError: 'DiGraph' object has no attribute 'ordering'
I want to model the following situation:
In the real-world example provided (fruits and farms) this was handled with the Compost part, but is it possible to limit the flow and end it at a certain waypoint instead of a target?
I believe this flexibility would be very useful, since there are lots of situations where some of the input is lost, or where datasets have missing data (especially in the target label).
I didn't find a proper example of this behavior anywhere in the docs.
Moved the question to sankey diagram repo
The SankeyData's to_json method will always return JSON in a widget format. The 'format' and 'metadata' fields are missing, and the 'order' field is included, even if there is no "format" argument passed into the method. The output also has single quotes instead of double quotes, False instead of false, and uses parentheses for 'order' tuples instead of brackets.
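Those three symptoms (single quotes, False, parentheses) are what Python's repr of a dict looks like, as opposed to the output of a JSON encoder. A minimal sketch with the standard library shows the difference; the dict below is a made-up structure, not actual floweaver output:

```python
import json

# Made-up data with the same kinds of values as in the report:
# strings, a boolean, and tuples in 'order'.
data = {
    "links": [{"source": "a", "target": "b", "value": 1.0}],
    "order": [("a",), ("b",)],
    "directed": False,
}

as_repr = str(data)         # Python repr: single quotes, False, ('a',)
as_json = json.dumps(data)  # valid JSON: double quotes, false, ["a"]

print(as_repr)
print(as_json)
```

Only the second string can be parsed by a JSON reader such as json.loads or a JavaScript client.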
I want to use it outside jupyter, if possible: I would like to create offline the sankey and then import in my page only the final image (JPG/SVG).
Is this possible?
Thanks a lot for this great project. I've been using it to display customer flow dynamics and it works like a treat.
I was wondering if it is possible to display the value for each of the partitions in each node alongside the title? For example, in your Basic diagram example I would like to see something like farms (46) --> customers (46), and the same for each section of the more complicated versions, so that the value/weight of the connection is immediately clear. This would be particularly useful when exporting images.
I can calculate the value/weights and dynamically change the title but was wondering if there is an easier way to do this.
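For reference, the manual workaround mentioned here can be sketched in plain Python. The flow values are the farm example data used elsewhere on this page; the helper titled is hypothetical, and how the resulting strings are attached to the node titles is left as an assumption:

```python
flows = [
    {"source": "farm1", "target": "Mary", "value": 5},
    {"source": "farm1", "target": "James", "value": 3},
    {"source": "farm2", "target": "Fred", "value": 10},
    {"source": "farm2", "target": "Fred", "value": 10},
    {"source": "farm2", "target": "Susan", "value": 5},
    {"source": "farm3", "target": "Susan", "value": 10},
    {"source": "farm4", "target": "Susan", "value": 1},
    {"source": "farm5", "target": "Susan", "value": 1},
    {"source": "farm6", "target": "Susan", "value": 1},
]
farms = {"farm1", "farm2", "farm3", "farm4", "farm5", "farm6"}
customers = {"James", "Mary", "Fred", "Susan"}


def titled(name, members, rows, column):
    """Build 'name (total)' from the rows whose `column` is in `members`."""
    total = sum(row["value"] for row in rows if row[column] in members)
    return "{} ({})".format(name, total)


titles = {
    "farms": titled("farms", farms, flows, "source"),
    "customers": titled("customers", customers, flows, "target"),
}
```

This only covers one title per group; per-partition labels would need the same sum computed for each member.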
When using auto_save_png and auto_save_svg, the image is only saved when it is displayed in the notebook. If a ; is added to that line to avoid showing duplicates (in case both svg and png are requested), the image on the line with the ; is not generated.
I believe it should be the default to save first and then show the image in the notebook, instead of saving only when it is displayed. This would also allow generating images without explicitly opening the notebook (through nbconvert, for example).
See #49
The mybinder binding is great for experimenting. However, it would be quite useful to also be able to quickly run the quickstart locally. Could you provide the required csv file in the example folder?
Hi,
You can make the badge (e.g. zenodo, but they are also available for mybinder, pypi version) clickable and interactive. See https://github.com/konstantinstadler/pymrio_article, https://github.com/konstantinstadler/pymrio, https://github.com/pandas-dev/pandas
Hi MozSprint contributors!
@apw10 @dnuka @abmakko @QuLogic @neiljp
Thanks again for getting involved during the sprint and giving up your time to work on the project. If you have a few minutes more, it'd be really helpful to hear how you found that experience and understand what to do differently next time. If you don't mind giving some feedback, please fill out this quick form here:
https://goo.gl/forms/XqmQ9eNNtNTEkrD93
Thanks,
Rick
Here are some welcome (but easy!) contributions:
When working with pre-processed datasets, there may be no column named source, target or value, and even worse, columns with those names may be present with a completely different meaning!
I suggest adding optional parameters to set the names of the columns that play these roles, so that one doesn't have to rename columns just to fit the library specs.
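A sketch of the kind of mapping this suggestion describes, in plain Python on a list of dict records. The helper to_flow_records and the orig_ prefix are hypothetical choices for illustration, not an existing floWeaver API:

```python
def to_flow_records(rows, source, target, value):
    """Return copies of rows keyed by source/target/value.

    Columns already named source/target/value but meaning something else
    are kept under an 'orig_' prefix instead of being overwritten.
    """
    mapping = {source: "source", target: "target", value: "value"}
    reserved = set(mapping.values())
    out = []
    for row in rows:
        rec = {}
        for key, val in row.items():
            if key in mapping:
                rec[mapping[key]] = val
            elif key in reserved:
                rec["orig_" + key] = val  # same name, different meaning
            else:
                rec[key] = val
        out.append(rec)
    return out


# Made-up row: 'source' here means where the data came from, not a flow source.
rows = [{"origin": "farm1", "customer": "Mary", "kg": 5, "source": "survey-2017"}]
result = to_flow_records(rows, source="origin", target="customer", value="kg")
```

With a pandas DataFrame, the same effect would come from DataFrame.rename applied before the data is handed to the library.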
First off, thanks for the cool library. I am having some problems getting this to work on Ubuntu 16.04. When I execute the quickstart notebook (after following installation directions and installation of the requirements.txt libraries), the diagrams do not visually display, instead only some text is displayed (see screenshot below.)
I'm not sure what the problem is and have tried various virtualenv and conda setups. The current configuration is:
python --version
Python 3.6.2 :: Anaconda, Inc.
pip freeze
alabaster==0.7.11
attrs==18.1.0
Babel==2.6.0
backcall==0.1.0
bleach==2.1.3
certifi==2018.4.16
chardet==3.0.4
cycler==0.10.0
decorator==4.3.0
defusedxml==0.5.0
docutils==0.14
entrypoints==0.2.3
floweaver==2.0.0a3
html5lib==1.0.1
idna==2.7
imagesize==1.0.0
ipykernel==4.8.2
ipysankeywidget==0.2.4
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
jedi==0.12.1
Jinja2==2.10
jsonschema==2.6.0
jupyter-client==5.2.3
jupyter-core==4.4.0
kiwisolver==1.0.1
MarkupSafe==1.0
matplotlib==2.2.2
mistune==0.8.3
nbconvert==5.3.2.dev0
nbformat==4.4.0
nbsphinx==0.3.3
networkx==1.11
notebook==5.5.0
numpy==1.14.5
packaging==17.1
palettable==3.1.1
pandas==0.23.3
pandocfilters==1.4.2
parso==0.3.0
pexpect==4.6.0
pickleshare==0.7.4
prompt-toolkit==1.0.15
ptyprocess==0.6.0
Pygments==2.2.0
pyparsing==2.2.0
python-dateutil==2.7.3
pytz==2018.5
pyzmq==17.0.0
requests==2.19.1
Send2Trash==1.5.0
simplegeneric==0.8.1
six==1.11.0
snowballstemmer==1.2.1
Sphinx==1.7.5
sphinxcontrib-websupport==1.1.0
terminado==0.8.1
testpath==0.3.1
tornado==5.0.2
traitlets==4.3.2
urllib3==1.23
wcwidth==0.1.7
webencodings==0.5.1
widgetsnbextension==3.2.1
I have tried Chrome (Version 65.0.3325.181 (Official Build) (64-bit)) and Firefox (60.0.2 64-bit). I'm using Ubuntu 16.04.4 LTS.
Hi,
I think the notebooks provided in the example folder are out of date.
When I install floweaver as explained in the docs, I do not get a module 'floweaver.jupyter' as required in the example notebooks.
Hi, I can use weave().to_widget().auto_save_png() to save a chart when running from a jupyter notebook
However, I have an application that runs multiple reports from a CLI and I want to include this one. I tried the same .auto_save_png() command as in jupyter, and it doesn't save a png. It doesn't do anything, not even throw an error.
Is it because I should be using a different widget outside of jupyter?
I have a pandas dataframe, and I'm trying to get a sankey diagram out of it.
Here's the excerpt of code, copied mostly from the quickstart tutorial.
from ipysankeywidget import SankeyWidget
from floweaver import *
import pandas as pd
flowDf = lostWonDf.sort_values(['LeadSource','WonCount'], ascending=[True,False]).groupby(['LeadSource','WonCount','Type'], sort=False).agg({
'AmountGBP':'sum'
}).reset_index()
flowDf.rename(columns={'LeadSource':'source','AmountGBP':'value','Type':'type'}, inplace=True)
flowDf['target'] = flowDf.WonCount.apply(lambda x: 'Won' if x == 1 else 'Lost')
flowDf.drop('WonCount', axis=1, inplace=True)
SankeyWidget(links=flowDf.to_dict('records'))
I get this error when I run the cell.
TypeError Traceback (most recent call last)
in ()
13 #.unstack().reset_index().set_index('LeadSource')
14
---> 15 SankeyWidget(links=flowDf.to_dict('records'))
16
17 size = dict(width=570, height=300)
TypeError: wrap() got an unexpected keyword argument 'links'