beatgeek / tweetparser
This project was forked from neilkod/tweetparser.
Parses raw Twitter JSON from stdin using Python. I'm only extracting a few fields for quick processing in Pig, and there is still a lot of work to do. Currently it extracts the tweet id, timestamp, client program, author, and tweet text. I'll add more fields, such as geo, on request. The filenames for the output and for bad tweets are currently hardcoded for my testing; I'll make them configurable shortly.
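Here is a minimal sketch of the approach described above. It is not the project's actual code. It assumes one JSON tweet per line on stdin and the field names used in Twitter's classic (v1.1) tweet payloads: `id_str`, `created_at`, `source`, `user.screen_name`, and `text`. The functions `parse_tweet` and `process` and the default filenames `tweets.tsv` and `bad_tweets.txt` are hypothetical. Records are written tab-separated because Pig's default `PigStorage` loader splits on tabs. Lines that fail to parse, or that lack those fields (for example, streaming delete notices), go to a separate bad-tweets file.

```python
import json
import sys


def parse_tweet(line):
    """Return (id, timestamp, client, author, text), or None if the line is unusable.

    Field names are assumed from Twitter's classic v1.1 tweet JSON.
    """
    try:
        tweet = json.loads(line)
        fields = (
            tweet["id_str"],
            tweet["created_at"],
            tweet["source"],               # client program (often an HTML link)
            tweet["user"]["screen_name"],  # author
            tweet["text"],
        )
    except (ValueError, KeyError, TypeError):
        # ValueError: not valid JSON; KeyError/TypeError: JSON, but not a tweet
        return None
    # Collapse tabs, newlines, and runs of spaces so each record stays one TSV line.
    return tuple(" ".join(str(f).split()) for f in fields)


def process(lines, out, bad):
    """Write one tab-separated record per good tweet; copy bad lines to `bad`."""
    good_count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines (e.g. stream keep-alives)
        record = parse_tweet(line)
        if record is None:
            bad.write(line if line.endswith("\n") else line + "\n")
        else:
            out.write("\t".join(record) + "\n")
            good_count += 1
    return good_count


if __name__ == "__main__":
    # Hypothetical: take filenames from the command line instead of hardcoding them.
    out_path = sys.argv[1] if len(sys.argv) > 1 else "tweets.tsv"
    bad_path = sys.argv[2] if len(sys.argv) > 2 else "bad_tweets.txt"
    with open(out_path, "w", encoding="utf-8") as out, \
            open(bad_path, "w", encoding="utf-8") as bad:
        process(sys.stdin, out, bad)
```

For example, if the sketch were saved as `parse_sketch.py` (a hypothetical name), `python parse_sketch.py tweets.tsv bad_tweets.txt < raw.json` would produce a file that Pig could load with something like `LOAD 'tweets.tsv' USING PigStorage('\t') AS (id:chararray, created_at:chararray, source:chararray, author:chararray, text:chararray);`.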
Home Page: http://www.neilkodner.com