danbernier / wordcram

open-source word clouds for Processing

License: Apache License 2.0

Ruby 3.39% Processing 10.65% Java 43.31% HTML 42.65%

wordcram's Introduction

WordCram lets you generate word clouds in Processing. It does the heavy lifting -- text analysis, collision detection -- for you, so you can focus on making your word clouds as beautiful, as revealing, or as silly as you like.

Make a Word Cloud

import wordcram.*;

// Set up the Processing sketch
size(1000, 600);
colorMode(HSB);
background(230);

// Make a wordcram from a random wikipedia page.
new WordCram(this)
  .fromWebPage("https://en.wikipedia.org/wiki/Special:Random")
  .withColors(color(30), color(110),
              color(random(255), 240, 200))
  .sizedByWeight(5, 120)
  .withFont("Copse")
  .drawAll();

You can control where words appear, what angle they're at, their font, their color, and how they're sized.

Install

Installing WordCram is simple, like any standard Processing library.

How do I use this thing? Show me examples!

You can check out the tutorials and examples at http://wordcram.org. You can watch WordCram in action, on OpenProcessing: popular baby names, and the U.S. Constitution.

But the best way to see WordCram in action is to install it, and look at the examples under File > Examples > Contributed Libraries > WordCram.

Problems?

If you're running into problems, see the FAQ, or read the javadocs.

If a question has you stumped, and the FAQ is no help, send me a note. My email account is 'wordcram', and I use gmail.

Want a better WordCram?

WordCram is open-source under the Apache 2 license. That means you can help make it better! I try to keep the source clean so it's easy to find your way around. There's a laundry list of things to do, and it's easy to build WordCram from source.

wordcram's Issues

Make it easy to re-render a wordcram, without re-parsing all the text

I'm thinking something like this:

def draw_all
  reset_words
  while has_more
    draw_next
  end
end

def reset_words
  words.each(&:reset)
end

(...except in java, not ruby.)

We want a way to reset the progress when you're drawing incrementally (call reset_words when you're done), and a way to reset it if you just call draw again.
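In Java, the idea above might look roughly like the sketch below. SketchWord and RedrawableCram are hypothetical stand-ins, not WordCram's actual classes; the point is only that the parsed word list is kept, while the layout state is thrown away before each full draw.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word that remembers whether it has been placed.
class SketchWord {
    final String text;
    boolean placed = false;

    SketchWord(String text) { this.text = text; }

    // Forget the layout state, but keep the parsed text.
    void reset() { placed = false; }
}

// Hypothetical engine: the word list is built once, drawing can be repeated.
class RedrawableCram {
    private final List<SketchWord> words = new ArrayList<>();
    private int nextIndex = 0;

    RedrawableCram(List<String> parsedWords) {
        for (String w : parsedWords) words.add(new SketchWord(w));
    }

    boolean hasMore() { return nextIndex < words.size(); }

    void drawNext() {
        // A real engine would place and render the word here.
        words.get(nextIndex).placed = true;
        nextIndex++;
    }

    // Call this after drawing incrementally, to start over.
    void resetWords() {
        for (SketchWord w : words) w.reset();
        nextIndex = 0;
    }

    // Always starts from a clean slate, so calling it twice re-renders everything.
    int drawAll() {
        resetWords();
        int drawn = 0;
        while (hasMore()) {
            drawNext();
            drawn++;
        }
        return drawn;
    }
}
```

With this shape, drawAll() can be called repeatedly without touching the text source again, and incremental callers use resetWords() themselves.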

shape drawing problems

Hi,
I've been trying variations of code and have not been able to get shape drawing to work. I am using the batman example. All files referenced have been checked and exist in the current folder. If I have the following code:

import wordcram.*;
import java.awt.Shape;

WordCram wc;
PImage cachedImage;

void setup() {
  size(500, 500);

  background(#110000);

  PImage image = loadImage("batman.gif");
  Shape imageShape = new ImageShaper().shape(image, #000000);
  ShapeBasedPlacer placer = new ShapeBasedPlacer(imageShape);

  // Using fromTextFile: the text file has the text 'this is just' in it
  // and nothing else right now.
  // I have used the text file to generate a wordcram.
  // I have also tried using fromWebPage("http://en.wikipedia.org/wiki/Batman"),
  // with the same result.
  wc = new WordCram(this).
    fromTextFile("processing.txt").
    withPlacer(placer).
    withNudger(placer).
    sizedByWeight(12, 120).
    withFont("Futura-CondensedExtraBold").
    withColor(#DBC900);

  while (wc.hasMore()) {
    wc.drawNext();
  }

  cachedImage = get();
}

void draw() {
  image(cachedImage, 0, 0);
}

it ends up making a big black window with nothing in it.

I do get the message that cue.language guesses my text is in English.

If I remove the withPlacer(placer) and withNudger(placer) calls, then I get the word "just" drawn in #DBC900 Futura-CondensedExtraBold, as expected.

Limit minShapeSize to 1

The BBTreeBuilder can't handle shapes smaller than 1 pixel, because it creates bounding box trees that go down to 1 pixel across - so if the Shape is smaller than 1 pixel, it'll get a NullPointerException on the Shape bounds.

Just limit minShapeSize to 1.
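A minimal sketch of that clamp, assuming minShapeSize is an int measured in pixels. ShapeSizeLimits and clampMinShapeSize are made-up names for illustration, not part of BBTreeBuilder.

```java
// Hypothetical helper: never let the minimum shape size drop below one pixel.
class ShapeSizeLimits {
    static int clampMinShapeSize(int requested) {
        return Math.max(1, requested);
    }
}
```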

Broken link to blog on FAQ

The FAQ (https://github.com/danbernier/WordCram/wiki/FAQ) links to http://wordcram.org/2013/03/02/why-dont-all-my-words-show-up/ which 404s.

Combined Image and BBTree

Not sure if this is helpful to you, but I have code for combining the BBTree and ImageShaper in my fork. It's in C# though...

I'm not sure I like needing to pass the BBTree for the word into the Nudger/Placers, but this does allow nicer collisions (angled words fit into the shape better rather than using the rectangle bounding box for collision), and had a nice speed boost over the default .Net "shape" collision stuff. I don't know what the performance of the Java shapes is, so it may not be helpful there.

Also there are unit tests for the BBTree.Contains method which may be helpful.

Update example sketches that still use old stop-words methods

Possible to run in SWT instead of Swing?

Any thoughts on how you might set things up to work in SWT instead of Swing?

The current position of a Word is not visible to Filters and Placers/Nudgers

While a word is being placed, the EngineWord tracks the position and nudges the word until the position is accepted or rejected by the engine.

Placer/Nudger are part of this process. They were/are designed to not know anything about the word's position.
With shapes coming into play, I would like to know where the word currently "is".

The Word.getTargetPlace method returns the place set by the placer, not the currentLocation (from engineword) influenced by the nudger. As every word is nudged by the engine once before being tested, Word.getTargetPlace never returns the place the word will be drawn at.

The Filter interface is "only" using the Word to filter. And because of the above, the Filter can only check if the Word was placed inside the mask/filter by the placer. If the nudger has nudged the word out of the mask, the filter can't detect it.

My suggestion would be to either let Word.getTargetPlace return the currentLocation or have a Word.currentLocation method. I do not know the side effects of the former, but the latter complicates the user-visible interface further.

Timeline for support of server-based processing.js?

Hi Dan,

Thanks for this great project, I've had lots of fun with it so far. This is more of a question than an issue: when will processing.js be supported in WordCram? Are there any details on the timeline or implementation yet?

Thanks for short response...

Doucor

Support for Processing 3.*; "ClassNotFoundException: processing.core.PGraphicsJava2D"

When running an ordinary sketch in 3.0b5, you quickly get this error: "ClassNotFoundException: processing.core.PGraphicsJava2D"

Hat-tip to DarthPeet for reporting it first: https://twitter.com/DarthPeet/status/636637811089764352

Issues with complex unicode characters with fonts

I tried to render some non-English words and successfully loaded a font with the required Unicode ligatures. I confirmed that the font loads and that it has the required ligatures. It renders correctly in the IDE and in other checks.

बुद्धि माया

It is missing the last ligatures, not sure how to debug this.

cue.lang erroneously chooses Arabic

I've noticed that cue.language will often get the language wrong. That isn't a problem in itself, as I use my own stop words. However, if it decides that the text is Arabic, which it sometimes does, then all the text is reversed.

ArrayIndexOutOfBoundsException when loading a non-ASCII text source

Sometimes, when loading a text source with non-ASCII characters, you'll get this exception:

Exception in thread "Animation Thread" java.lang.ArrayIndexOutOfBoundsException: 0
    at wordcram.WordSorterAndScaler.sortAndScale(WordSorterAndScaler.java:27)
    at wordcram.WordCram.getWordCramEngine(WordCram.java:727)
    at wordcram.WordCram.drawAll(WordCram.java:772)
    ...

The problem is that cue.language, for different reasons, guesses that all your words are stop-words. (Maybe your text source is encoded in Unicode, maybe all your words are single letters.) AFAICT, there's not much I can do to directly solve this problem, except improve the error messages. Which is what this ticket is for.
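One way to improve the error message is to check for an empty word list before sorting and scaling. The sketch below uses plain strings and a made-up WordListGuard class, so it only shows the shape of the check, not where it would live in WordCram.

```java
// Hypothetical guard: fail early, with a message that points at the likely cause.
class WordListGuard {
    static String[] requireSomeWords(String[] words) {
        if (words == null || words.length == 0) {
            throw new IllegalStateException(
                "No words left to draw after stop-word removal. "
                + "The language guess may be wrong, or every word may have been "
                + "treated as a stop-word; check the text encoding and the stop-word settings.");
        }
        return words;
    }
}
```

An IllegalStateException with this text is easier to act on than an ArrayIndexOutOfBoundsException: 0 from deep inside the sorter.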

Processing-3.3 Compatibility

@danbernier I don't know when it happened, but processing.core.PGraphicsJava2D has been changed to processing.awt.PGraphicsJava2D, so the current library doesn't work. However, if you substitute the latest Processing core.jar into your build and change occurrences of processing.core.PGraphicsJava2D to processing.awt.PGraphicsJava2D, then it works OK (and builds). For how much longer, who can tell, as you seem to have noticed vanilla Processing is moving away from awt.
PS I'm thinking of offering a WordCram gem for use with JRubyArt / propane, which I would anticipate building with polyglot maven (ruby).
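If one build had to run on both old and new Processing versions, a reflective lookup is one possible alternative to hard-coding either class name. This is only a sketch of that idea, not what the library does; ClassLocator and firstAvailable are made-up names, and it only helps where the renderer is referenced by name rather than by a compile-time import.

```java
// Hypothetical helper: return the first class name that can actually be loaded.
class ClassLocator {
    static Class<?> firstAvailable(String... classNames) {
        for (String name : classNames) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // Not on this classpath; try the next candidate.
            }
        }
        throw new IllegalStateException(
            "None of these classes were found: " + String.join(", ", classNames));
    }
}
```

A caller would pass the newer name first, for example ClassLocator.firstAvailable("processing.awt.PGraphicsJava2D", "processing.core.PGraphicsJava2D").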

When redrawing the sample firstNameusingWordPreset, word placing doesn't avoid collisions

Using Processing 2 on mac os,

when clicking on the output window, the wordcram is redrawn; however, some names are placed on top of each other.

I assume this is related to the instantiation of WordCram:

wc = new WordCram(this);

I believe the "this" passed as argument keeps artifact values that disrupt the new wordcram.

Download links seem broken

Could someone please provide a well-formed library folder for manual installation and use in the Processing IDE?

I have downloaded this repo but I lack the skills to produce a working installation on my sketchbook/libraries environment.

Support for processing 2.0

Hi Dan,

Had to do a very minor commit to support processing 2.0:

hellonico/wordcram@09c3d7c

If you can incorporate something similar to your code it'd be super awesome.

Cheers,

[Contrib] A Colorer taking the color from an image and the position of the word

The idea is to have two source images. One solid one for the ShapeBasedPlacer to place the words in, and another colored one for the color of the words, in that way, one can create shapes by color. Works best with crams with lots of evenly sized words:

It currently has two modes, one which takes only the renderedLocation of the image and a slower one that counts all the covered pixels in the image and takes the most-covered color.

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/ImageBasedWordColorer.java

Exception when maxNumberOfWords > actual number of words

May want to add the following to WordCramEngine::wordsIntoEngineWords() ~ line 69

            if(words.length < maxNumberOfWords)
                maxNumberOfWords = words.length;

Solves the bug where you specify a larger maxNumberOfWords parameter than the number of words that are available.

(via http://twitter.com/#!/skhanyz/status/129761090199498752)
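The same clamp, written as a small standalone helper so it can be tried outside the engine. WordLimiter and takeAtMost are illustrative names, and plain strings stand in for Word objects.

```java
import java.util.Arrays;

// Hypothetical helper: keep at most maxNumberOfWords, never more than exist.
class WordLimiter {
    static String[] takeAtMost(String[] words, int maxNumberOfWords) {
        int count = Math.min(words.length, Math.max(0, maxNumberOfWords));
        return Arrays.copyOf(words, count);
    }
}
```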

Processing3.0.2 compatibility

This also happens with other libraries, but the "super-easy" installation doesn't work well on Processing 3.0.2. I'm not sure why; maybe it has something to do with directory naming rules?

NullPointerException in WordCounter.isStopWord

I have a text file that produces this in the latest release as well as in trunk:

Exception in thread "Animation Thread" java.lang.NullPointerException
at wordcram.WordCounter.isStopWord(WordCounter.java:90)
at wordcram.WordCounter.shouldCountWord(WordCounter.java:76)
at wordcram.WordCounter.countWords(WordCounter.java:61)
at wordcram.WordCounter.count(WordCounter.java:54)
at wordcram.WordCram.getWordCramEngine(WordCram.java:726)
at wordcram.WordCram.drawAll(WordCram.java:773)
at twitterwords.setup(twitterwords.java:62)
at processing.core.PApplet.handleDraw(PApplet.java:1608)
at processing.core.PApplet.run(PApplet.java:1530)
at java.lang.Thread.run(Thread.java:680)

I was able to bypass this error with a check for null, although I don't know if this is your intended behavior.

private boolean isStopWord(String word) {
    if (cueStopWords == null) return false;
    return cueStopWords.isStopWord(word) ||
           extraStopWords.contains(word.toLowerCase());
}

Redraw

Hi,

I am working on trying WordCram in a medical context and I need more control over Skipped Words. I know I can see the words via getSkippedWords(), but I'd like to ask WordCram to try again (e.g. a Redraw() feature that simply takes another crack at it.) I currently have things working with a "Generate" button and often the skipped words are able to be reintroduced after a few tries - just because words were skipped on a first try, it doesn't mean they get skipped on the 2nd.

This "regenerate" procedure is harder to do programmatically; I can see if there are skipped words, but I'm struggling with getting it to try again until there are no skipped words.

Alternatively, a feature that disallowed word skipping in its entirety would be great... Perhaps all words could be sized-down to accommodate?

This note assumes I'm not missing some other obvious solution to this problem.
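A bounded retry loop is one way to do the "try again" part programmatically. In the sketch below, renderAttempt is a hypothetical callback that lays the words out once and returns how many were skipped (in a sketch, that could wrap a fresh layout plus getSkippedWords().length); RetryingRenderer is a made-up name.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: repeat a layout attempt until nothing is skipped,
// or until the attempt budget runs out.
class RetryingRenderer {
    static int renderUntilNoneSkipped(IntSupplier renderAttempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int skipped = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            skipped = renderAttempt.getAsInt();
            if (skipped == 0) {
                break; // everything fit on this attempt
            }
        }
        return skipped; // skipped-word count from the last attempt
    }
}
```

The cap matters because some word sets may never fit, in which case the caller still has to decide what to do with the leftovers, such as shrinking the size range.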

Exception in Processing 2.0: NoClassDefFoundError: processing/core/PGraphics2D

Hi, I installed Processing 2.0b3 and am using WordCram 0.5.1, but none of the WordCram examples run in Processing.
I get the following exception:

Exception in thread "Animation Thread" java.lang.NoClassDefFoundError: processing/core/PGraphics2D
at wordcram.WordCramEngine.<init>(Unknown Source)
at wordcram.WordCram.getWordCramEngine(Unknown Source)
at wordcram.WordCram.drawAll(Unknown Source)
at helloworld.setup(helloworld.java:47)
at processing.core.PApplet.handleDraw(PApplet.java:2103)
at processing.core.PGraphicsJava2D.requestDraw(PGraphicsJava2D.java:190)
at processing.core.PApplet.run(PApplet.java:2006)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: processing.core.PGraphics2D
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
… 8 more

But when I run the same WordCram sketch on Processing 1.5.1, it works.
Processing 2.0 is not recognizing WordCram.
Please help.

Idea: Alternative Placing of Words in a Shape

The current Implementation finds a place to place the word agnostic of any placed words, then checks if it collides with any placed word, if so, nudges the word agnostic of any word placed and checks again if it collides with any word placed.

If shapes are in place, one approach could be:
We start with the whole shape.
Search for a free place, agnostic of any words placed
place the word
remove the exact shape of the placed word from the shape
continue with the next word and the remaining shape.

Questions I have:
Your BBTree implementation is blazing fast, I am sure. Does this approach have a chance to be faster for the given scenario?
Can we subtract the exact shape of the glyphs from the area of the shape?
Would such an implementation have any chance of getting into wordcram-contrib?
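On the subtraction question: java.awt.geom.Area can subtract one arbitrary Shape from another, so the "remaining shape" step can at least be expressed with the standard library. The sketch below uses rectangles in place of glyph outlines, and RemainingShape is a made-up helper name; it says nothing about how this would perform against the BBTree approach.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Hypothetical helper: carve a placed word's outline out of the free area.
class RemainingShape {
    static Area subtractPlaced(Area remaining, Shape placedWord) {
        Area result = new Area(remaining);      // copy, so the input is left untouched
        result.subtract(new Area(placedWord));  // remove the exact outline
        return result;
    }

    public static void main(String[] args) {
        Area whole = new Area(new Rectangle2D.Double(0, 0, 100, 100));
        Shape word = new Rectangle2D.Double(10, 10, 20, 20); // stand-in for a glyph outline
        Area left = subtractPlaced(whole, word);
        System.out.println("still free at (50,50): " + left.contains(50, 50));
        System.out.println("still free at (15,15): " + left.contains(15, 15));
    }
}
```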

Make the Graphics pluggable to enable direct SVG export

The apache Batik SVG lib (http://xmlgraphics.apache.org/batik/) contains an implementation of the Java Graphics Interface that draws directly to an SVG file.

If I could plug this Graphics into Wordcram, I could directly output to an SVG file and not to a PDF File. The PDF workflow is painful atm:

Import the PDF in Inkscape (slow!!), ungroup everything (even slower!!), delete the background layers ==> every word is then in one path object ==> many of them!

Move downloads off github

To S3? Google drive? Figure it out.

Info:
https://github.com/blog/1302-goodbye-uploads
http://aws.amazon.com/free/

WordCram Fails in "Android" mode

When running the Hello World in "java" mode, no problems at all, but the following exception is thrown for "android" mode, probably because java.awt.font.FontRenderContext isn't included in Android.

FATAL EXCEPTION: Animation Thread
java.lang.NoClassDefFoundError: java.awt.font.FontRenderContext
at wordcram.WordShaper.<init>(WordShaper.java:29)
at wordcram.WordCram.getWordCramEngine(WordCram.java:740)
at wordcram.WordCram.drawAll(WordCram.java:781)
at processing.test.helloworld.helloworld.setup(helloworld.java:40)
at processing.core.PApplet.handleDraw(Unknown Source)
at processing.core.PGraphicsAndroid2D.requestDraw(Unknown Source)
at processing.core.PApplet.run(Unknown Source)
at java.lang.Thread.run(Thread.java:856)

Not knowing much about the structure of WordCram, how much of the code would be reusable for a standalone (non-processing-based) Android word cloud library?

[Contrib] A Standard Console Output Observer

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/StandardConsoleObserver.java

It has a level which configures verbosity, it can give information about:

progress through skipped and drawn words
overall performance through overall timestamps
results through a summary

Default space between words to 1

Originally it was 0, but this makes for kind of crummy pictures.

1 is a nicer default.

Array of String not correctly joined in WordCram.fromTextString(...)

I embedded your code in a larger Java program and found this bug. I am quite certain it will behave the same when executed in Processing. Basically, all Strings are merged into one giant word, without spaces. It might be the case as well with the fromHTML function; I did not test this.
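The symptom can be reproduced without WordCram. The sketch below uses String.join as a stand-in for PApplet.join (an assumption made so it runs without Processing on the classpath), and JoinDemo is a made-up name.

```java
// Hypothetical demo: the separator decides whether word boundaries survive.
class JoinDemo {
    static String joinWith(String separator, String[] text) {
        return String.join(separator, text);
    }

    public static void main(String[] args) {
        String[] text = {"hello", "world"};
        System.out.println(joinWith("", text));  // one giant word
        System.out.println(joinWith(" ", text)); // two separate words
    }
}
```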

Diff to solve this:

365c365
<         return fromText(new Text(PApplet.join(text, "")));

---
>         return fromText(new Text(PApplet.join(text, " ")));

The class processing.core.PGraphicsJava2D does not exist

I get this error:
ClassNotFoundException: processing.core.PGraphicsJava2D
And for no apparent reason. The class obviously exists, yet it says it does not. I installed the latest version, and ran the helloworld example.

It's too hard to make a WordCram from multiple sources

It should be as easy as this:

new WordCram(this)
  .fromWebPage("http://example.com")
  .fromTextFile("~/texts/book.txt")
  .drawAll();

All the text source methods should concatenate, not overwrite.
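Internally, "concatenate, not overwrite" could be as simple as collecting each source's text in a list. TextSources below is a hypothetical sketch with a single string-based source method; it is not how WordCram is structured today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accumulator: each source call adds text instead of replacing it.
class TextSources {
    private final List<String> texts = new ArrayList<>();

    TextSources fromTextString(String text) {
        texts.add(text);
        return this; // keep the fluent style
    }

    // Join with a space so words from adjacent sources don't run together.
    String combinedText() {
        return String.join(" ", texts);
    }
}
```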

WordCram is not drawing all the words.

Hi Danbernier,

I am facing below issue with "WordCram 0.5.6" , "Processing 2.0.b7- windows 32".

Issue: WordCram is leaving out a few words. In the code below, I am sending 15 words to WordCram, but it draws only 12 to 13 words on every run.

Please help me with resolution.

package wordcram;

//import controlP5.Tooltip;

import java.awt.BorderLayout;
import java.awt.Color;

import java.awt.Container;
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.RenderingHints;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;

import java.awt.event.MouseMotionListener;

import javax.swing.JFrame;

import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;

import javax.swing.UIManager;

import processing.core.PApplet;
import processing.core.PVector;

public class WordCramApplet extends PApplet {
  WordCram words;

  public WordCramApplet() {
  }

  public void setup() {
    size(1000, 1000);
    words = new WordCram(this);
  }

  public void draw() {
    background(200);
    Word[] w = new Word[15];
    w[0] = new Word("Hello",5);
    w[1] = new Word("World",2);
    w[2] = new Word("Timely",3);
    w[3] = new Word("Incomplete",5);
    w[4] = new Word("Good",8);
    w[5] = new Word("Excellent",10);
    w[6] = new Word("Bad review",2);
    w[7] = new Word("Average",5);
    w[8] = new Word("Very bad",1);
    w[9] = new Word("Optimal",7);
    w[10] = new Word("Nice work",9);
    w[11] = new Word("Very low",3);
    w[12] = new Word("Outstanding",11);
    w[13] = new Word("Highly rated",9);
    w[14] = new Word("Imperfect",4);

    Color c = new Color(139, 0, 0);
    System.out.println(" c.getBlue() : " + c.getBlue());
    for (int i = 0; i < w.length; i++) {
      if (w[i].weight >= 5.0)
        w[i].setColor(Color.BLUE.getRGB());
      if (w[i].weight < 5.0)
        w[i].setColor(Color.RED.getRGB());
      if (w[i].weight > 10.0)
        w[i].setColor(Color.GREEN.getRGB());
    }
    this.words.fromWords(w);

    this.words.drawAll();

    PVector position = w[2].getRenderedPlace();

    noLoop();
    System.out.println(" this.height : " + this.height);
  }

  public static void main(String args[]) {
    JFrame f1 = new JFrame();
    f1.setBounds(100, 100, 1000, 1000);
    WordCramApplet p1 = new WordCramApplet();
    p1.init();
    f1.add(p1);
    f1.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
    f1.setVisible(true);
    p1.destroy();
  }
}

Calling Word.setPlace, and using a ShapeBasedPlacer, throws a NullPointerException when nudging

ShapeBasedPlacer sets "height" and "width" properties on each Word it places, and uses those when it nudges the words.

Calling Word.setPlace means the Placer is never called, so it never sets the "width" and "height" properties - so when the Nudger reaches for them, they're null.

How to include Stop words?

I have read the javadoc, which says "WordCram uses cue.language to remove common words from the text by default". Is there an easy way to include these words?

Thank you very much.

Best,
Ning

[Contrib] Tango Colors

The tango color scheme
http://tango.freedesktop.org/static/cvs/tango-art-tools/palettes/Tango-Palette.png
is a nice standard set of colors.

An enum containing all the colors:

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/TangoColorEnum.java

A Colorer Iterating through these words based on some strategies:

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/TangoWordColorer.java

A Colorer that uses the temperature levels of the colors in relation to the word weight:

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/TangoHeatMap.java

A Colorer that distributes shades of grey

With the option to give more "weight" to the heavier ones, that is more black when the weight is higher.

Maybe the current Colorers can do this and I am too silly to use them..
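The weight-to-grey mapping itself is small. The sketch below assumes the weight is already normalized to the range 0 to 1, and returns a packed opaque ARGB int; GreyShades and greyFor are made-up names, and wiring this into a Colorer is left out.

```java
// Hypothetical mapping: heavier words get darker greys.
class GreyShades {
    // Returns an opaque ARGB grey; weight 1.0 maps to black, 0.0 to white.
    static int greyFor(float weight) {
        float clamped = Math.max(0f, Math.min(1f, weight));
        int level = Math.round(255f * (1f - clamped));
        return 0xFF000000 | (level << 16) | (level << 8) | level;
    }
}
```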

Can't get a Word's Shape, or bounds

This would've been useful for making a WordCram render to an arbitrary shape.

I can't see a reason to not have them on there.

(Plus, it lets us get away from the extra-long signature of WordPlacer, with word-width and word-height. But we'll see about that.)

svn repo -> git repo, in docs

In the javadoc overview (the "next steps" section)
In the README, which is rendered on http://danbernier.github.com/WordCram/

And make sure the javadocs are being updated & uploaded as part of the release.

ShapeBasedPlacer fromImage only works with black

Both as a reminder and a cry for help this issue is.
I have no idea why this works only for black.
Sometimes it will create an outline of the shape, but nothing fits in there. Most of the time there will be nothing.

Cannot find a class or type named "Shape"

If I have the following:

import wordcram.*;

void setup() {
  size(700, 400);
  background(255);
  PImage image = loadImage("../twitter_tools/images/robot1_processed.png");
  Shape imageShape = new ImageShaper().shape(image, #000000);
  ..
}

I get the error: Cannot find a class or type named "Shape"

Using processing 2.0 - should I switch?

Thanks,
Bryan Rasmussen

Make the Resulting Shape/Area accessible/pluggable

If the shape of WordCram could be accessed and given upfront, we could do two (or more) consecutive runs on the same output. With the new Color-aware shape based placer we would be able to do something like this:

http://browse.deviantart.com/art/Harley-Quinn-In-Text-209380962

Arabic text is not displayed properly.

Wordcram works well with many languages but when it comes to Arabic, it displays it from left to right whereas it should be from Right to Left. I hope someone looks into this issue.

Allow XPath/CSS selectors for HTML

Sometimes the whole HTML page is too much - especially since it can fool cue.language into thinking you're using a different language.

Wikipedia is a good example: the list of languages down the bar make cue.lang think your page is in Arabic.

And sometimes you don't want footers, ads, navigation, etc in your word cloud.

So let the pages be filtered by an XPath or CSS selector, to narrow the content. It can default to the whole document.

Incompatible with Processing 2.0.6, 2.0.7? Investigate

I have used the library with Processing 1.5.1 and everything was OK.
Now I have tried to use it with Processing 2.0.6 and 2.0.7, and it throws a Java class-not-found exception for the PGraphics class at the beginning of draw() (at the if (wc.hasMore()) check in your examples).
Thanks

[Contrib] An importer for csv files

https://github.com/simpsus/WordCram/blob/sketches/src/contrib/CSVImporter.java

It can be given a CSV file and two indices. The CSV file is then parsed and a Word[] returned. The CSV parsing is handled by the lightweight opencsv library:
http://opencsv.sourceforge.net/
https://sourceforge.net/projects/opencsv/
There seems to be a fork at
https://code.google.com/p/opencsv/
which I did not use.

getSkippedWords() returns null when there are no skipped words

It should return a new Word[0] instead.
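A sketch of the intended behavior, with strings standing in for Word objects and a made-up SkippedWords helper: callers can then loop over the result without a null check.

```java
import java.util.List;

// Hypothetical helper: always hand back an array, even when nothing was skipped.
class SkippedWords {
    static String[] toArray(List<String> skipped) {
        if (skipped == null) {
            return new String[0];
        }
        return skipped.toArray(new String[0]);
    }
}
```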

Create Words from a set of ... "words"

Usecase: The cram shall be filled with the same few words, but in different sizes. The normal Wordcram methods (fromText-likes) will eliminate the duplicate words and the resulting cram will show each word only once.

So it would be nice to have a helper class to which you can pass a word, how often it shall appear, and with what weights, and this class builds the Word[] for you.

The applications go mostly towards ShapeBased crams where you want to fill a shape with a distinct set of words (the age or silhouette with the name of the person or such things).
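Such a helper could look roughly like this. WeightedWord and RepeatedWords are hypothetical names, WeightedWord stands in for WordCram's Word, and spreading the weights linearly from maxWeight down to minWeight is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a word with a weight.
class WeightedWord {
    final String text;
    final float weight;

    WeightedWord(String text, float weight) {
        this.text = text;
        this.weight = weight;
    }
}

// Hypothetical helper: repeat each text several times, at decreasing weights.
class RepeatedWords {
    static List<WeightedWord> repeat(String[] texts, int copies,
                                     float minWeight, float maxWeight) {
        List<WeightedWord> result = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            // t runs from 0 (first copy) to 1 (last copy).
            float t = (copies == 1) ? 0f : (float) i / (copies - 1);
            float weight = maxWeight - t * (maxWeight - minWeight);
            for (String text : texts) {
                result.add(new WeightedWord(text, weight));
            }
        }
        return result;
    }
}
```

The resulting list can then be turned into whatever array type the renderer expects, so the duplicate elimination in the fromText-style methods never comes into play.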

Allow "by hand" enabling of right-to-left text rendering.

I've built a solution based on the rel-060 branch and its support for RTL. Our system has information about the text being processed, including if it needs RTL enabled. What I did was to add a simple method on the WordCram object that sets the renderOptions.rightToLeft field to true (I'd send you a pull-request but I'm struggling with GitHub to only pick up that one change). Just prior to calling draw we enable the RTL flag if needed. I'm thinking that others might be in a similar situation and could make use of this functionality. Please think about adding it to a future release.