data_hacks's People

Contributors

aripollak, elazarl, erikvanzijst, genevera, jehiah, lorrin, mateidavid, mjschultz, ojilles, phillipkent, poikilotherm, randyau, seanoc, tnxbutno

data_hacks's Issues

should ceil the scale in bar_chart.py

Around line 54.
scale = int(float(max_value) / value_characters)

Should be
scale = int(math.ceil(float(max_value) / value_characters))

That way values like 1.4 become 2 instead of 1, which helps keep the tick marks from line wrapping.
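
A small sketch of the difference, using made-up numbers. `max_value` and `value_characters` follow the names in the snippet above; the two helper functions are hypothetical and only wrap the two expressions for comparison.

```python
import math


def truncated_scale(max_value, value_characters):
    # Current expression: int() truncates toward zero, so 1.4 becomes 1.
    return int(float(max_value) / value_characters)


def ceiled_scale(max_value, value_characters):
    # Proposed expression: round up, so 1.4 becomes 2.
    return int(math.ceil(float(max_value) / value_characters))


# Illustrative values only: 105 / 75 = 1.4
max_value, value_characters = 105, 75

for scale in (truncated_scale(max_value, value_characters),
              ceiled_scale(max_value, value_characters)):
    # Length of the longest bar when one tick stands for `scale` units.
    print(scale, max_value // scale)
```

With the truncated scale the longest bar is wider than `value_characters`; with the ceiled scale it fits.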

Sort by values flag does not work in bar_chart.py

Need to modify bar_chart.py as follows:

if options.sort_values:
    data = [[value,key] for key,value in data.items()]
    data.sort(reverse=True)
else:
    # sort by keys
    data = [[key,value] for key,value in data.items()]
    data.sort()
    data = [[value, key] for key,value in data]
format = "%" + str(max_length) + "s [%6d] %s"
for value,key in data:
    print format % (key[:max_length], value, (value / scale) * "*")
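
For comparison, here is roughly how the same ordering could look in Python 3. This is only a sketch, not the project's code: `sorted_rows` and `render` are hypothetical names, and `data` is assumed to be a dict mapping keys to integer counts.

```python
def sorted_rows(data, sort_values=False):
    """Return (key, value) pairs in display order."""
    if sort_values:
        # Largest value first; ties fall back to the key, also descending,
        # which mirrors a reverse sort on [value, key] pairs.
        return sorted(data.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    # Default: sort by key.
    return sorted(data.items())


def render(rows, max_length, scale):
    """Format each row with the same layout as the snippet above."""
    fmt = "%" + str(max_length) + "s [%6d] %s"
    # Integer division keeps the repeat count an int under Python 3.
    return [fmt % (key[:max_length], value, (value // scale) * "*")
            for key, value in rows]


counts = {"a": 1, "b": 10, "c": 3}  # illustrative data
for line in render(sorted_rows(counts, sort_values=True), max_length=1, scale=1):
    print(line)
```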

histogram.py errors if all values are equal

histogram.py raises a ValueError if all values are equal, e.g.:

vagrant@localhost:~/ws$ echo -e '16\n16\n16\n' | ~/ws/bitly/data_hacks/histogram.py
Traceback (most recent call last):
  File "/home/vagrant/ws/bitly/data_hacks/histogram.py", line 300, in <module>
    options.agg_key_value), options)
  File "/home/vagrant/ws/bitly/data_hacks/histogram.py", line 150, in histogram
    raise ValueError('max must be > min. max:%s min:%s' % (max_v, min_v))
ValueError: max must be > min. max:16 min:16
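
One possible way to handle this case, sketched in isolation. `bucket_counts` is a hypothetical function, not the one in histogram.py, and collapsing identical samples into a single bucket is an assumption about the desired behaviour.

```python
def bucket_counts(values, buckets=10):
    """Return (low, high, count) tuples for equal-width buckets."""
    min_v, max_v = min(values), max(values)
    if max_v == min_v:
        # Assumption: identical samples become one bucket instead of an error.
        return [(min_v, max_v, len(values))]
    step = (max_v - min_v) / buckets
    counts = [0] * buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - min_v) / step), buckets - 1)
        counts[idx] += 1
    return [(min_v + i * step, min_v + (i + 1) * step, counts[i])
            for i in range(buckets)]


print(bucket_counts([16, 16, 16]))
```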

New version of this package

Hello devs,

This package has been a part of my workflow for several years now, mainly since I spend most of my time on the command line. I see it's not really maintained anymore. I would like to take responsibility for it if no one minds, mostly so I can get it working with Python 3 (and have this version on PyPI) and add some features.

If this sounds okay, I propose one of three ways to transition this (in order of my preference):

  1. I fork the package and keep the same name. In this scenario, I'd like access to the data_hacks package on PyPI so I can upload the python 3 version and keep it up-to-date.
  2. I fork the package and use a new name (e.g. data_hacks_3 or data_cli). This would be more like making my own package with this package as a starting point. In this scenario, I would just need your permission, I'll handle the rest.
  3. I offer my support to the main fork of the package. I think this solution would cause the most overhead for you all, so that's why I've listed this solution last.

Let me know which sounds best for you. Thanks!
Ewen

Error when piping output into another program

On OS X, I had a list in my clipboard and ran:
pbpaste | bar_chart.py -v |head -n 30

The chart works fine when I do not pipe into head, but I wanted to show only the top 30 items. Piping into head does limit the output to 30 rows, as expected, but at the end I also see this error output:

close failed in file object destructor:
Error in sys.excepthook:

Original exception was:
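
A likely cause is that head exits after 30 lines and closes the pipe while the script is still writing. A minimal Python 3 sketch of stopping quietly in that case follows. `write_lines` is a hypothetical helper, not part of bar_chart.py, and under Python 2 the error would surface as an IOError with errno EPIPE rather than BrokenPipeError.

```python
import sys


def write_lines(lines, out=sys.stdout):
    """Write lines until the reader goes away; return how many were written."""
    written = 0
    try:
        for line in lines:
            out.write(line + "\n")
            written += 1
        out.flush()
    except BrokenPipeError:
        # The downstream process (e.g. head) closed its end of the pipe.
        # Stop writing instead of letting the error propagate.
        pass
    return written


write_lines(["a", "b", "c"])
```

In a real script the interpreter may still try to flush stdout at exit; the note on SIGPIPE in the Python `signal` module documentation describes redirecting stdout to `os.devnull` before exiting to avoid that.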

Python 3 compatibility?

Hi folks. Planning Python 3 compatibility for this awesome tool? I'm getting SyntaxErrors, looks like from print statements missing parens.

<3

plotting script cannot handle missing values

If uniq -c produces a count with an empty value, such as

235054 
3629 0
136189 1
18418 10
 258 100

then bar_chart.py fails:

cat results/train_4.txt | bar_chart.py -a  --sort-keys
Traceback (most recent call last):
  File "/Users/aub3/portenv/bin/bar_chart.py", line 114, in <module>
    run(load_stream(sys.stdin), options)
  File "/Users/aub3/portenv/bin/bar_chart.py", line 52, in run
    data[kv[1]] += value
IndexError: list index out of range
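
A sketch of more tolerant parsing, assuming the input is `uniq -c` style "count key" lines as in the sample above. `aggregate` and its `missing_key` parameter are hypothetical and not taken from bar_chart.py.

```python
from collections import defaultdict


def aggregate(lines, missing_key=""):
    """Sum counts per key, tolerating lines that carry a count but no key."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        count = int(parts[0])
        # A line such as "235054 " has no second field; use the placeholder.
        key = parts[1].strip() if len(parts) > 1 else missing_key
        totals[key] += count
    return dict(totals)


print(aggregate(["235054 ", "3629 0", "136189 1", "18418 10", " 258 100"]))
```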


histogram.py switch for logarithmic buckets

When the data has many outliers, I often get histograms like:

$ time (./a.out 100000|histogram.py -b 10)
# NumSamples = 100000; Min = 237.00; Max = 37599.00
# Mean = 321.560610; Variance = 64719.622326; SD = 254.400516; Median 303.000000
# each ∎ represents a count of 1333
  237.0000 -  3973.2000 [ 99993]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 3973.2000 -  7709.4000 [     0]: 
 7709.4000 - 11445.6000 [     1]: 
11445.6000 - 15181.8000 [     0]: 
15181.8000 - 18918.0000 [     0]: 
18918.0000 - 22654.2000 [     0]: 
22654.2000 - 26390.4000 [     3]: 
26390.4000 - 30126.6000 [     1]: 
30126.6000 - 33862.8000 [     0]: 
33862.8000 - 37599.0000 [     2]: 

Not helpful. I can see that I have outliers, but what does the distribution inside the first bucket look like? It is the most important one, and I want to understand what's in it.

What I want is a logarithmic histogram, like the ones dtrace shows: double the bucket width at every bucket.

Can I send a PR?
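
A rough sketch of what doubling bucket widths could look like. Both function names and the `first_width` parameter are hypothetical; this is not an existing histogram.py option, and dtrace's own bucketing may differ in detail.

```python
import bisect


def doubling_edges(min_v, max_v, first_width=1.0):
    """Bucket edges starting at min_v, each bucket twice as wide as the last."""
    edges = [min_v]
    width = first_width
    # Always produce at least one bucket, then grow until max_v is covered.
    while len(edges) < 2 or edges[-1] < max_v:
        edges.append(edges[-1] + width)
        width *= 2
    return edges


def count_into(values, edges):
    """Count values per bucket; the last bucket also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        # Clamp so values on the outermost edges fall into the end buckets.
        idx = max(0, min(idx, len(counts) - 1))
        counts[idx] += 1
    return counts


# Min and Max taken from the histogram above; first_width is an arbitrary choice.
edges = doubling_edges(237, 37599, first_width=16)
print(len(edges) - 1, "buckets")
```

Because the widths double, a handful of extra buckets covers the long tail while the low end stays finely divided.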
