
cdh-twitter-example's Introduction

Analyzing Twitter Data Using CDH

This repository contains an example application for analyzing Twitter data using a variety of CDH components, including Flume, Oozie, and Hive.

Getting Started

  1. Install Cloudera Manager 4.8 and CDH4

    Before you get started with the actual application, you'll first need CDH4 installed. Specifically, you'll need Hadoop, Flume, Oozie, and Hive. The easiest way to get the core components is to use Cloudera Manager to set up your initial environment. You can download Cloudera Manager from the Cloudera website, or install CDH manually.

    If you go the Cloudera Manager route, you'll still need to install Flume manually.
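
    On a RHEL/CentOS system with the CDH4 package repository configured, the manual Flume install is roughly as follows (package names assume the standard CDH4 packaging; adjust for your distribution):

     $ sudo yum install flume-ng flume-ng-agent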

  2. Install MySQL

    MySQL is the recommended database for the Oozie database and the Hive metastore. Click here for installation documentation.
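
    As a rough sketch on a RHEL/CentOS system (package names assumed; see the linked documentation for your distribution):

     $ sudo yum install mysql-server mysql-connector-java
     $ sudo service mysqld start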

Configuring Flume (Cloudera Manager path)

  1. Build or Download the custom Flume Source

    A pre-built version of the custom Flume Source is available here.

    The flume-sources directory contains a Maven project with a custom Flume source designed to connect to the Twitter Streaming API and ingest tweets in a raw JSON format into HDFS.

    To build the flume-sources JAR, from the root of the git repository:

    $ cd flume-sources  
    $ mvn package
    $ cd ..  
    

    This will generate a file called flume-sources-1.0-SNAPSHOT.jar in the target directory.

  2. Add the JAR to the Flume classpath

    Copy flume-sources-1.0-SNAPSHOT.jar into /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/ and, to be safe, into /var/lib/flume-ng/plugins.d/twitter-streaming/lib/ as well (the authoritative list of plugin directories is the Plugin Directories property under Cloudera Manager -> Flume -> Configuration -> Agent (Default)). If those directories don't exist, create them with sudo mkdir -p.
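
    A minimal sketch of those commands, assuming the default plugin directories (adjust the paths if Plugin Directories is set differently on your cluster):

     $ sudo mkdir -p /usr/lib/flume-ng/plugins.d/twitter-streaming/lib
     $ sudo mkdir -p /var/lib/flume-ng/plugins.d/twitter-streaming/lib
     $ sudo cp flume-sources/target/flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/
     $ sudo cp flume-sources/target/flume-sources-1.0-SNAPSHOT.jar /var/lib/flume-ng/plugins.d/twitter-streaming/lib/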

  3. Configure the Flume agent in the Cloudera Manager Web UI

    Go to the Flume Service page (by selecting Flume service from the Services menu or from the All Services page).

    Pull down the Configuration tab, and select View and Edit.

    Select the Agent (Default) in the left hand column.

    Set the Agent Name property to TwitterAgent, the agent name whose configuration is defined in flume.conf.

    Copy the contents of the flume.conf file, in its entirety, into the Configuration File field. If you wish to edit the keywords or fill in your Twitter API credentials, now is the right time to do it (an abridged example follows this list).

    Click the Save Changes button.
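
    For reference, here is an abridged sketch of the kind of configuration flume.conf contains (the keys, keywords, and HDFS path below are placeholders you must fill in; see the file itself for the full set of properties):

     TwitterAgent.sources = Twitter
     TwitterAgent.channels = MemChannel
     TwitterAgent.sinks = HDFS

     TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
     TwitterAgent.sources.Twitter.channels = MemChannel
     TwitterAgent.sources.Twitter.consumerKey = <consumer key>
     TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
     TwitterAgent.sources.Twitter.accessToken = <access token>
     TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
     TwitterAgent.sources.Twitter.keywords = hadoop, big data, hive, flume

     TwitterAgent.sinks.HDFS.channel = MemChannel
     TwitterAgent.sinks.HDFS.type = hdfs
     TwitterAgent.sinks.HDFS.hdfs.path = hdfs://<namenode-host>:8020/user/flume/tweets/%Y/%m/%d/%H/
     TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
     TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

     TwitterAgent.channels.MemChannel.type = memory
     TwitterAgent.channels.MemChannel.capacity = 10000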

Setting up Hive

  1. Build or Download the JSON SerDe

    A pre-built version of the JSON SerDe is available here.

    The hive-serdes directory contains a Maven project with a JSON SerDe which enables Hive to query raw JSON data.

    To build the hive-serdes JAR, from the root of the git repository:

    $ cd hive-serdes    
    $ mvn package  
    $ cd ..  
    

    This will generate a file called hive-serdes-1.0-SNAPSHOT.jar in the target directory.

  2. Create the Hive directory hierarchy

     $ sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse   
     $ sudo -u hdfs hadoop fs -chown -R hive:hive /user/hive  
     $ sudo -u hdfs hadoop fs -chmod 750 /user/hive  
     $ sudo -u hdfs hadoop fs -chmod 770 /user/hive/warehouse  
     

    You'll also want to add whatever user you plan on executing Hive scripts with to the hive Unix group:

    $ sudo usermod -a -G hive <username>
  3. Configure the Hive metastore

    The Hive metastore should be configured to use MySQL. Follow these instructions to configure the metastore. Make sure to install the MySQL JDBC driver in /var/lib/hive/lib.
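
    Once configured, the relevant hive-site.xml entries look roughly like this (the host, database name, and credentials below are placeholders):

     <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:mysql://mysql-host/metastore</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>com.mysql.jdbc.Driver</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionUserName</name>
       <value>hive</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionPassword</name>
       <value>hive-password</value>
     </property>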

  4. Create the tweets table

    Run hive, and execute the following commands:

     ADD JAR <path-to-hive-serdes-jar>;
     
     CREATE EXTERNAL TABLE tweets (
       id BIGINT,
       created_at STRING,
       source STRING,
       favorited BOOLEAN,
       retweeted_status STRUCT<
         text:STRING,
         user:STRUCT<screen_name:STRING,name:STRING>,
         retweet_count:INT>,
       entities STRUCT<
         urls:ARRAY<STRUCT<expanded_url:STRING>>,
         user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
         hashtags:ARRAY<STRUCT<text:STRING>>>,
       text STRING,
       user STRUCT<
         screen_name:STRING,
         name:STRING,
         friends_count:INT,
         followers_count:INT,
         statuses_count:INT,
         verified:BOOLEAN,
         utc_offset:INT,
         time_zone:STRING>,
       in_reply_to_screen_name STRING
     ) 
     PARTITIONED BY (datehour INT)
     ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
     LOCATION '/user/flume/tweets';

    The table can be modified to include other columns from the Twitter data, but they must have the same name and structure as the JSON fields referenced in the Twitter documentation.
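
    Once the Flume agent is writing data and the Oozie workflow has added partitions, a quick sanity check from the hive shell might look like this (remember to ADD JAR the hive-serdes JAR again in every new session):

     SELECT COUNT(*) FROM tweets;
     SELECT user.screen_name, text FROM tweets LIMIT 10;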

Prepare the Oozie workflow

  1. Configure Oozie to use MySQL

    If using Cloudera Manager, Oozie can be reconfigured to use MySQL via the service configuration page on the Databases tab. Make sure to restart the Oozie service after reconfiguring. You will need to install the MySQL JDBC driver in /usr/lib/oozie/libext.

    If Oozie was installed manually, Cloudera provides instructions for configuring Oozie to use MySQL.
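
    Whichever route you take, the Oozie database settings come down to the JPAService properties in oozie-site.xml, roughly as follows (the host and credentials below are placeholders):

     <property>
       <name>oozie.service.JPAService.jdbc.driver</name>
       <value>com.mysql.jdbc.Driver</value>
     </property>
     <property>
       <name>oozie.service.JPAService.jdbc.url</name>
       <value>jdbc:mysql://mysql-host:3306/oozie</value>
     </property>
     <property>
       <name>oozie.service.JPAService.jdbc.username</name>
       <value>oozie</value>
     </property>
     <property>
       <name>oozie.service.JPAService.jdbc.password</name>
       <value>oozie-password</value>
     </property>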

  2. Create a lib directory and copy any necessary external JARs into it

    External JARs are provided to Oozie through a lib directory in the workflow directory. The workflow will need a copy of the MySQL JDBC driver and the hive-serdes JAR.

     $ mkdir oozie-workflows/lib
     $ cp hive-serdes/target/hive-serdes-1.0-SNAPSHOT.jar oozie-workflows/lib
     $ cp /var/lib/oozie/mysql-connector-java.jar oozie-workflows/lib
     
  3. Copy hive-site.xml to the oozie-workflows directory

    To execute the Hive action, Oozie needs a copy of hive-site.xml.

     $ sudo cp /etc/hive/conf/hive-site.xml oozie-workflows
     $ sudo chown <username>:<username> oozie-workflows/hive-site.xml
     
  4. Copy the oozie-workflows directory to HDFS

    $ hadoop fs -put oozie-workflows /user/<username>/oozie-workflows
  5. Install the Oozie ShareLib in HDFS

     $ sudo -u hdfs hadoop fs -mkdir /user/oozie
     $ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
     

    In order to use the Hive action, the Oozie ShareLib must be installed. Installation instructions can be found here.
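
    As a rough sketch for a manual CDH4 installation, the ShareLib that ships with the Oozie package can be unpacked and pushed to HDFS like this (the tarball location may differ on your system):

     $ mkdir /tmp/ooziesharelib
     $ cd /tmp/ooziesharelib
     $ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
     $ sudo -u oozie hadoop fs -put share /user/oozie/share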

Starting the data pipeline

  1. Start the Flume agent

    Create the HDFS directory hierarchy for the Flume sink. Make sure that it will be accessible by the user running the Oozie workflow.

     $ hadoop fs -mkdir /user/flume/tweets
     $ hadoop fs -chown -R flume:flume /user/flume
     $ hadoop fs -chmod -R 770 /user/flume
     $ sudo /etc/init.d/flume-ng-agent start
     

    If using Cloudera Manager, start the Flume agent from the Cloudera Manager Web UI.

  2. Adjust the start time of the Oozie coordinator workflow in job.properties

    You will need to modify the job.properties file, and change the jobStart, jobEnd, and initialDataset parameters. The start and end times are in UTC, because the version of Oozie packaged in CDH4 does not yet support custom timezones for workflows. The initial dataset should be set to something before the actual start time of your job in your local time zone. Additionally, the tzOffset parameter should be set to the difference between the server's timezone and UTC. By default, it is set to -8, which is correct for US Pacific Time.
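
    For example, on a server in US Pacific Time, the relevant job.properties values might look like this (the dates below are placeholders; keep the rest of the file as shipped):

     jobStart=2012-09-01T00:00Z
     jobEnd=2012-12-31T00:00Z
     initialDataset=2012-08-31T23:00Z
     tzOffset=-8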

  3. Start the Oozie coordinator workflow

    $ oozie job -oozie http://<oozie-host>:11000/oozie -config oozie-workflows/job.properties -run
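
    This command prints a coordinator job ID. You can then check on the job with the standard Oozie client, for example (the job ID below is a placeholder):

     $ oozie job -oozie http://<oozie-host>:11000/oozie -info <coord-job-id>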


cdh-twitter-example's Issues

Configuration issues with Oozie 3.1.3-cdh4.0.1 ?

This is a great tutorial - many thanks for posting it. I followed all of the setup instructions, but get hung up on running the Oozie workflow, with the error: "Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/oozie-workflows/coord-app.xml] does not exist"

The file certainly does exist, and there doesn't seem to be an issue with permissions. I'm not sure if this error is suggesting it can't find coord-app.xml or if there is an issue with a setting in coord-app.xml. Could there be some issue with my default CDH4 setup?

tim@phocion:/user/tim$ oozie version
Oozie client build version: 3.1.3-cdh4.0.1

tim@phocion:/user/tim$ oozie job -oozie http://localhost:11000/oozie -config oozie-workflows/job.properties -run
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/oozie-workflows/coord-app.xml] does not exist

tim@phocion:/user/tim$ sudo -u oozie [ -f oozie-workflows/coord-app.xml ] && echo "FOUND" || "NOT FOUND"
FOUND

tim@phocion:/user/tim$ ls -l oozie-workflows/
total 24
-rwxr-xr-x 1 tim tim 938 Sep 24 21:29 add_partition.q
-rwxr-xr-x 1 tim tim 1356 Sep 26 11:09 coord-app.xml
-rwxr-xr-x 1 tim tim 1918 Sep 24 21:29 hive-action.xml
-rwxr-xr-x 1 tim tim 2200 Sep 24 21:29 hive-site.xml
-rwxr-xr-x 1 tim tim 1356 Sep 26 11:32 job.properties
drwxr-xr-x 2 tim tim 4096 Sep 24 21:29 lib

Use example with Hadoop 2.0.0-cdh4.2.1

We tried the example with the following software:

  • Hadoop 2.0.0-cdh4.2.1
  • Hive 0.10.0-cdh4.2.1
  • Flume 1.3.0-cdh4.2.1
  • Oozie 3.3.0-cdh4.2.1

In the description of the example the use of MySQL is stressed. By default, Hadoop 2.0.0-cdh4.2.1 is installed with PostgreSQL for Hive and Derby for Oozie, which works with no problem for this example.

We didn't need to install Flume manually either. In Cloudera Manager you can add Flume as a service. On the service's page you can add the content of flume.conf under Configuration - Agent (Base). On the same page you can set the agent name to TwitterAgent. When you put the flume-sources-1.0-SNAPSHOT.jar in /usr/share/cmf/lib/plugins/, the JAR is added to FLUME_CLASSPATH in /var/run/cloudera-scm-agent/process/-flume-AGENT/flume-env.sh when the service is started.

However, one issue prevented us from using this service for the example. You have to add com.cloudera.flume.source.TwitterSource to flume.plugin.classes in flume-site.xml; otherwise you get a ClassNotFound error. We haven't found a way to do this via Cloudera Manager. When starting the service, a directory /var/run/cloudera-scm-agent/process/-flume-AGENT is created, which includes flume-site.xml. When you restart the service via Cloudera Manager, a new directory is created with a different number. But after changing flume-site.xml you could use this directory to start Flume via the command line.

Concerning the custom Flume Source, it's probably best to build the source with the right value for hadoop.version (in our case 2.0.0-cdh4.2.1) and flume.version (1.3.0-cdh4.2.1) in pom.xml.

We had some trouble with the time zone. In our case the time zone in coord-app.xml in oozie-workflows had to be changed to "Europe/Amsterdam". In job.properties tzOffset had to be changed to 1, otherwise we got a mismatch between the directory mentioned in the parameter WFINPUT and the DATEHOUR-parameter in Action Configuration of Oozie (viewed via Oozie Web Console).

We didnโ€™t need to install the Oozie ShareLib in HDFS.

We used Hue File Browser to create the necessary directories in HDFS.

It turned out that each time a Hive session is started, the ADD JAR statement has to be executed again.
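
One common workaround is to put the ADD JAR statement in a ~/.hiverc file, which the Hive CLI runs at the start of every session (the JAR path below is only an example location):

ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;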

Hive table gives an error during creation

Hi,
When I try to create the table, it gives me the following error. Can anyone please help me figure out what I can do about this?

I have added Jar :
ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
ADD JAR /usr/local/Hive-JSON-Serde/json-serde/target/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.cloudera.hive.serde.JSONSerDe

Thanks.

variable [wfInput] cannot be resolved

Hello!

I have followed all the steps, but when I run the job the following error occurs:
variable [wfInput] cannot be resolved

<coordinator-app name="add-partition-coord" frequency="${coord:hours(1)}" start="${jobStart}" end="${jobEnd}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="tweets" frequency="${coord:hours(1)}" initial-instance="${initialDataset}" timezone="America/Los_Angeles">
      <uri-template>hdfs://jupiter:8020/user/flume/tweets/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="tweets">
      <instance>${coord:current(coord:tzOffset() / 60)}</instance>
    </data-in>
    <data-in name="readyIndicator" dataset="tweets">
      <instance>${coord:current(1 + (coord:tzOffset() / 60))}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowRoot}/hive-action.xml</app-path>
      <configuration>
        <property>
          <name>wfInput</name>
          <value>${coord:dataIn('input')}</value>
        </property>
        <property>
          <name>dateHour</name>
          <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), tzOffset, 'HOUR'), 'yyyyMMddHH')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>

nameNode=hdfs://jupiter:8020
jobTracker=jupiter:8021
workflowRoot=${nameNode}/user/${user.name}/oozie-workflows

jobStart=2016-11-15T12:30Z
jobEnd=2016-11-15T15:00Z

initialDataset=2016-11-15T11:00Z

tzOffset=+1

oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/${user.name}/oozie-workflows/coord-app.xml

I live in Spain (UTC+1). Job starts correctly at the scheduled time.
Does anyone know why this error occurs? Can anyone help me?
Thanks in advance.

Flume: Unable to start EventDrivenSourceRunner

Hi, I tried this code and guidance for extracting data from Twitter, but when I start the agent I get the following error:

Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:139)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

What could be my mistake? I am an absolute beginner with Flume!

Thanks and best regards,
Martin

Null values in nested structures issue

When we have a nested structure that doesn't have a value in the JSON object, the hive deserialization silently fails for the whole structure.
I've created a fork of the project with a patch that I believe will demonstrate the issue and the suggested fix:

sjulias@33617a0

It would be great if you could review this change and hopefully apply it to the main project.

Thanks in advance,
Yulia

Authentication credentials are missing

I got this error when I tried to stream Twitter data using Flume to HBase:

2015-06-30 17:01:46,352 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Authentication credentials are missing. See http://twitter4j.org/configuration.html for the detail.
at twitter4j.TwitterBaseImpl.ensureAuthorizationEnabled(TwitterBaseImpl.java:200)
at twitter4j.TwitterStreamImpl.sample(TwitterStreamImpl.java:159)
at com.cloudera.flume.source.TwitterSource.start(TwitterSource.java:121)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

I wonder what the error is caused by.

Oozie job execution "Internal Server Error"

Hi ,

I have executed the same procedure as given by you; the only difference is that I am not using Cloudera Manager. I am using CDH4.7. When I execute: oozie job -oozie http://localhost:11000/oozie -config job.properties -run, I get the following error:

Error: HTTP error code: 500 : Internal Server Error.

Please help me get past this hurdle.

Getting an error while connecting Solr with SQL Server 2005

Hi,
Please tell me why I am getting this error when using the Solr DIH with SQL Server 2005:

WARN - 2015-01-20 18:56:28.534; org.apache.solr.handler.dataimport.SolrWriter; Error creating document : SolrInputDocument(fields: [id=[email protected], Resume_Size=397090, Document_Type=.pdf, Application_date=2014-09-03 15:12:19.0, Resume_content=[B@14ebc94, Phone_Day=435454354545, Years_of_Experience=2, Main_Skills=fdsf, Resume_Content_Type=text/html, Exported=false, City=grtgr, Timestamp=[B@3ec96e, Source=Makro-Care, Category=3, Fname=dsfsdf, Lname=dfdsf, statename=Chiba, Resume_File_Name=doc_en_FAQ.pdf, Current_Address=, version=1490823821167427584])
org.apache.solr.common.SolrException: ERROR: [doc=[email protected]] multiple values encountered for non multiValued copy field Phone_Day: 435454354545
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:140)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:238)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:926)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1080)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:692)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:265)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
INFO - 2015-01-20 18:56:28.534; org.apache.solr.handler.dataimport.DocBuilder; Time taken = 0:0:0.693

Apache Hive is not loading tweets from flume's directory

Hi!

I have been following the tutorial and I am stuck. Flume is gathering tweets correctly and saving them in /user/flume/tweets.

Then, I executed:

ADD JAR ;

CREATE EXTERNAL TABLE tweets (.....

But when, for example, I execute SELECT COUNT(*) FROM tweets;, the table is empty and the result is 0. Do I have to execute some other command in order to load the tweets?

Thanks!

Issues with the json serde.

Hi,

I'm using the JSON SerDe in Hive to parse another set of JSON files I have from Valentine's Day. I noticed that there is no option to ignore malformed JSON, and there seem to be some problems with deserializing all of the JSON.

This tweet is causing the error:

{"text":"@KimKardashian happy valentines day, hope it's a good one","retweet_count":0,"geo":{"type":"Point","coordinates":[38.7313358,-108.05278695]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":25365536,"source":"\u003Ca href="http://twitter.com/download/android" rel="nofollow"\u003ETwitter for Android\u003C/a\u003E","in_reply_to_user_id_str":"25365536","id_str":"169483808003989505","entities":{"user_mentions":[{"indices":[0,14],"screen_name":"KimKardashian","id_str":"25365536","name":"Kim Kardashian","id":25365536}],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/6a7e7dbf9d6c7ac4.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Delta, CO","bounding_box":{"type":"Polygon","coordinates":[[[-108.104644,38.71503],[-108.021863,38.71503],[-108.021863,38.769794],[-108.104644,38.769794]]]},"name":"Delta","id":"6a7e7dbf9d6c7ac4","country":"United States"},"in_reply_to_screen_name":"Ki{"text":"@bbrandivirgo too bad I dont have the number. Happy valentines day tho :)","retweet_count":0,"geo":{"type":"Point","coordinates":[33.77406404,-84.39270512]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://mobile.twitter.com" rel="nofollow"\u003EMobile Web\u003C/a\u003E","in_reply_to_user_id_str":null,"id_str":"169497701241716736","entities":{"user_mentions":[],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/8173485c72e78ca5.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Atlanta, GA","bounding_box":{"type":"Polygon","coordinates":[[[-84.54674,33.647908],[-84.289389,33.647908],[-84.289389,33.887618],[-84.54674,33.887618]]]},"name":"Atlanta","id":"8173485c72e78ca5","country":"United States"},"in_reply_to_screen_name":null,"favorited":false,"truncated":false,"created_at":"Tue Feb 14 19:06:15 +0000 2012","contributors":null,"user":{"contributors_enabled":false,"profile_background_image_url":"http://a3.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","url":"http://facebook.com/cperk3","profile_link_color":"0084B4","followers_count":773,"profile_image_url":"http://a3.twimg.com/profile_images/1792490671/000011110000_normal.jpg","default_profile_image":false,"show_all_inline_media":true,"statuses_count":3271,"profile_background_color":"C0DEED","description":"Ga Tech Athlete-Student.. Black&Samoan...Follow me as I follow Jesus-","location":"Atlanta, GA","profile_background_tile":true,"favourites_count":1,"profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","time_zone":"Quito","profile_sidebar_fill_color":"DDEEF6","screen_name":"Cpeezy21","id_str":"312682111","lang":"en","geo_enabled":true,"profile_image_url_https":"https://si0.twimg.com/profile_images/1792490671/000011110000_normal.jpg","verified":false,"notifications":null,"profile_sidebar_border_color":"04080a","protected":false,"listed_count":5,"created_at":"Tue Jun 07 14:14:34 +0000 2011","name":"Charles Perkins III","is_translator":false,"follow_request_sent":null,"following":null,"profile_use_background_image":true,"friends_count":223,"id":312682111,"default_profile":false,"utc_offset":-18000,"profile_text_color":"333333"},"retweeted":false,"id":169497701241716736,"coordinates":{"type":"Point","coordinates":[-84.39270512,33.77406404]}}

I'm getting this error when processing some sample twitter data:

2012-09-26 15:15:39,059 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2012-09-26 15:15:39,215 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-09-26 15:15:39,372 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /mapred/local/taskTracker/distcache/-624804405132306423_-2027207125_45603557/hadoop1.domain.com/tmp/hive-root/hive_2012-09-26_15-15-33_715_8669028640552125101/-mr-10004/af319f96-99f0-4f06-8fba-3fbf5b880148 <- /mapred/local/taskTracker/root/jobcache/job_201209252321_0010/attempt_201209252321_0010_m_000000_0/work/HIVE_PLANaf319f96-99f0-4f06-8fba-3fbf5b880148
2012-09-26 15:15:39,380 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/job.jar <- /mapred/local/taskTracker/root/jobcache/job_201209252321_0010/attempt_201209252321_0010_m_000000_0/work/job.jar
2012-09-26 15:15:39,388 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/.job.jar.crc <- /mapred/local/taskTracker/root/jobcache/job_201209252321_0010/attempt_201209252321_0010_m_000000_0/work/.job.jar.crc
2012-09-26 15:15:39,451 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2012-09-26 15:15:39,452 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2012-09-26 15:15:39,767 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-09-26 15:15:39,773 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@484845aa
2012-09-26 15:15:40,065 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
2012-09-26 15:15:40,222 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2012-09-26 15:15:40,222 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2012-09-26 15:15:40,232 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
2012-09-26 15:15:40,236 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2012-09-26 15:15:40,242 INFO ExecMapper: maximum memory = 119341056
2012-09-26 15:15:40,243 INFO ExecMapper: conf classpath = [file:/var/run/cloudera-scm-agent/process/93-mapreduce-TASKTRACKER/, file:/usr/java/jdk1.6.0_31/lib/tools.jar, file:/usr/lib/hadoop-0.20-mapreduce/, file:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.5.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.5.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-api-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/core-3.1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jdiff-1.0.9.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/json-simple-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.16.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/oro-2.0.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar, 
file:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-2.1/jsp-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-2.1/jsp-api-2.1.jar, file:/usr/share/cmf/lib/plugins/tt-instrumentation-4.0.4.jar, file:/usr/share/cmf/lib/plugins/event-publish-4.0.4-shaded.jar, file:/usr/lib/hadoop-hdfs/lib/avro-1.5.4.jar, file:/usr/lib/hadoop-hdfs/lib/paranamer-2.3.jar, file:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop-hdfs/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar, file:/usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar, file:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar, file:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/lib/log4j-1.2.15.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1-tests.jar, file:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar, file:/usr/lib/hadoop/lib/commons-codec-1.4.jar, file:/usr/lib/hadoop/lib/jets3t-0.6.1.jar, file:/usr/lib/hadoop/lib/json-simple-1.1.jar, file:/usr/lib/hadoop/lib/guava-11.0.2.jar, file:/usr/lib/hadoop/lib/avro-1.5.4.jar, file:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar, file:/usr/lib/hadoop/lib/commons-configuration-1.6.jar, file:/usr/lib/hadoop/lib/asm-3.2.jar, file:/usr/lib/hadoop/lib/paranamer-2.3.jar, file:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar, file:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar, file:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop/lib/commons-cli-1.2.jar, file:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop/lib/commons-lang-2.5.jar, file:/usr/lib/hadoop/lib/kfs-0.3.jar, file:/usr/lib/hadoop/lib/hue-plugins-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar, file:/usr/lib/hadoop/lib/jettison-1.1.jar, file:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop/lib/jsch-0.1.42.jar, file:/usr/lib/hadoop/lib/stax-api-1.0.1.jar, file:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar, file:/usr/lib/hadoop/lib/jsr305-1.3.9.jar, file:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop/lib/jsp-api-2.1.jar, file:/usr/lib/hadoop/lib/oro-2.0.8.jar, file:/usr/lib/hadoop/lib/jersey-server-1.8.jar, file:/usr/lib/hadoop/lib/commons-digester-1.8.jar, file:/usr/lib/hadoop/lib/commons-math-2.1.jar, file:/usr/lib/hadoop/lib/jline-0.9.94.jar, file:/usr/lib/hadoop/lib/core-3.1.1.jar, file:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar, file:/usr/lib/hadoop/lib/commons-el-1.0.jar, file:/usr/lib/hadoop/lib/jersey-core-1.8.jar, file:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar, file:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.1.jar, file:/usr/lib/zookeeper/zookeeper-3.4.3-cdh4.0.1.jar, file:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar, file:/usr/lib/hadoop/lib/commons-net-3.1.jar, file:/usr/lib/hadoop/lib/servlet-api-2.5.jar, file:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar, 
file:/usr/lib/hadoop/lib/commons-io-2.1.jar, file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar, file:/usr/lib/hadoop/lib/commons-logging-api-1.1.jar, file:/usr/lib/hadoop/lib/xmlenc-0.52.jar, file:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar, file:/usr/lib/hadoop/lib/activation-1.1.jar, file:/usr/lib/hadoop/lib/jersey-json-1.8.jar, file:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar, file:/usr/lib/hadoop/lib/log4j-1.2.15.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1-tests.jar, file:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/classes, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/job.jar, file:/mapred/local/taskTracker/root/distcache/4260026189093522549_-70309741_45603944/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-builtins-0.8.1-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/distcache/-6339710882011042599_2132445101_45603979/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-serdes-1.0-SNAPSHOT.jar, file:/mapred/local/taskTracker/root/distcache/7269667103068590023_-978189584_45604014/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-contrib-0.8.1-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/attempt_201209252321_0010_m_000000_0/work/]
2012-09-26 15:15:40,243 INFO ExecMapper: thread classpath = [file:/var/run/cloudera-scm-agent/process/93-mapreduce-TASKTRACKER/, file:/usr/java/jdk1.6.0_31/lib/tools.jar, file:/usr/lib/hadoop-0.20-mapreduce/, file:/usr/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.5.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.5.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.7.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-api-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/core-3.1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jdiff-1.0.9.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/json-simple-1.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.16.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/oro-2.0.8.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar, 
file:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-2.1/jsp-2.1.jar, file:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-2.1/jsp-api-2.1.jar, file:/usr/share/cmf/lib/plugins/tt-instrumentation-4.0.4.jar, file:/usr/share/cmf/lib/plugins/event-publish-4.0.4-shaded.jar, file:/usr/lib/hadoop-hdfs/lib/avro-1.5.4.jar, file:/usr/lib/hadoop-hdfs/lib/paranamer-2.3.jar, file:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop-hdfs/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar, file:/usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar, file:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar, file:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/lib/log4j-1.2.15.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.0.1-tests.jar, file:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar, file:/usr/lib/hadoop/lib/commons-codec-1.4.jar, file:/usr/lib/hadoop/lib/jets3t-0.6.1.jar, file:/usr/lib/hadoop/lib/json-simple-1.1.jar, file:/usr/lib/hadoop/lib/guava-11.0.2.jar, file:/usr/lib/hadoop/lib/avro-1.5.4.jar, file:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar, file:/usr/lib/hadoop/lib/commons-configuration-1.6.jar, file:/usr/lib/hadoop/lib/asm-3.2.jar, file:/usr/lib/hadoop/lib/paranamer-2.3.jar, file:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar, file:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar, file:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar, file:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar, file:/usr/lib/hadoop/lib/commons-cli-1.2.jar, file:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.1.jar, file:/usr/lib/hadoop/lib/commons-lang-2.5.jar, file:/usr/lib/hadoop/lib/kfs-0.3.jar, file:/usr/lib/hadoop/lib/hue-plugins-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar, file:/usr/lib/hadoop/lib/jettison-1.1.jar, file:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar, file:/usr/lib/hadoop/lib/jsch-0.1.42.jar, file:/usr/lib/hadoop/lib/stax-api-1.0.1.jar, file:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar, file:/usr/lib/hadoop/lib/jsr305-1.3.9.jar, file:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar, file:/usr/lib/hadoop/lib/jsp-api-2.1.jar, file:/usr/lib/hadoop/lib/oro-2.0.8.jar, file:/usr/lib/hadoop/lib/jersey-server-1.8.jar, file:/usr/lib/hadoop/lib/commons-digester-1.8.jar, file:/usr/lib/hadoop/lib/commons-math-2.1.jar, file:/usr/lib/hadoop/lib/jline-0.9.94.jar, file:/usr/lib/hadoop/lib/core-3.1.1.jar, file:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar, file:/usr/lib/hadoop/lib/commons-el-1.0.jar, file:/usr/lib/hadoop/lib/jersey-core-1.8.jar, file:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar, file:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar, file:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.1.jar, file:/usr/lib/zookeeper/zookeeper-3.4.3-cdh4.0.1.jar, file:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar, file:/usr/lib/hadoop/lib/commons-net-3.1.jar, file:/usr/lib/hadoop/lib/servlet-api-2.5.jar, file:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar, 
file:/usr/lib/hadoop/lib/commons-io-2.1.jar, file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar, file:/usr/lib/hadoop/lib/commons-logging-api-1.1.jar, file:/usr/lib/hadoop/lib/xmlenc-0.52.jar, file:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar, file:/usr/lib/hadoop/lib/activation-1.1.jar, file:/usr/lib/hadoop/lib/jersey-json-1.8.jar, file:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar, file:/usr/lib/hadoop/lib/log4j-1.2.15.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1-tests.jar, file:/usr/lib/hadoop/hadoop-annotations-2.0.0-cdh4.0.1.jar, file:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/classes, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/jars/job.jar, file:/mapred/local/taskTracker/root/distcache/4260026189093522549_-70309741_45603944/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-builtins-0.8.1-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/distcache/-6339710882011042599_2132445101_45603979/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-serdes-1.0-SNAPSHOT.jar, file:/mapred/local/taskTracker/root/distcache/7269667103068590023_-978189584_45604014/hadoop1.domain.com/user/root/.staging/job_201209252321_0010/libjars/hive-contrib-0.8.1-cdh4.0.1.jar, file:/mapred/local/taskTracker/root/jobcache/job_201209252321_0010/attempt_201209252321_0010_m_000000_0/work/]
2012-09-26 15:15:40,253 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias tweets to work list for file hdfs://hadoop1.domain.com:8020/uploads
2012-09-26 15:15:40,256 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump TS structtext:string,user:struct<screen_name:string>
2012-09-26 15:15:40,256 INFO ExecMapper:
Id =3

Id =0

Id =1

Id =2
Id = 1 null<\Parent>
<\FS>
<\Children>
Id = 0 null<\Parent>
<\SEL>
<\Children>
Id = 3 null<\Parent>
<\TS>
<\Children>
<\MAP>
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 3 MAP
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self 0 TS
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Operator 0 TS initialized
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children of 0 TS
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 1 SEL
2012-09-26 15:15:40,257 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 1 SEL
2012-09-26 15:15:40,262 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT structtext:string,user:struct<screen_name:string>
2012-09-26 15:15:40,262 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 1 SEL initialized
2012-09-26 15:15:40,262 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 1 SEL
2012-09-26 15:15:40,262 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 2 FS
2012-09-26 15:15:40,262 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 2 FS
2012-09-26 15:15:40,293 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 2 FS initialized
2012-09-26 15:15:40,293 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 2 FS
2012-09-26 15:15:40,293 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 1 SEL
2012-09-26 15:15:40,293 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done 0 TS
2012-09-26 15:15:40,293 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 3 MAP
2012-09-26 15:15:40,298 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing path hdfs://hadoop1.domain.com:8020/uploads/twitter.txt
2012-09-26 15:15:40,298 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tweets for file hdfs://hadoop1.domain.com:8020/uploads
2012-09-26 15:15:40,497 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarding 1 rows
2012-09-26 15:15:40,497 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2012-09-26 15:15:40,497 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2012-09-26 15:15:40,497 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://hadoop1.domain.com:8020/tmp/hive-root/hive_2012-09-26_15-15-33_715_8669028640552125101/_tmp.-ext-10002/000000_0
2012-09-26 15:15:40,498 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://hadoop1.domain.com:8020/tmp/hive-root/hive_2012-09-26_15-15-33_715_8669028640552125101/_task_tmp.-ext-10002/_tmp.000000_0
2012-09-26 15:15:40,498 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://hadoop1.domain.com:8020/tmp/hive-root/hive_2012-09-26_15-15-33_715_8669028640552125101/_tmp.-ext-10002/000000_0
2012-09-26 15:15:40,560 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 24284800
2012-09-26 15:15:40,577 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarding 10 rows
2012-09-26 15:15:40,577 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 10 rows
2012-09-26 15:15:40,577 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 10 rows
2012-09-26 15:15:40,577 INFO ExecMapper: ExecMapper: processing 10 rows: used memory = 24860552
2012-09-26 15:15:40,705 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarding 100 rows
2012-09-26 15:15:40,705 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100 rows
2012-09-26 15:15:40,705 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 100 rows
2012-09-26 15:15:40,705 INFO ExecMapper: ExecMapper: processing 100 rows: used memory = 28885000
2012-09-26 15:15:41,499 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarding 1000 rows
2012-09-26 15:15:41,499 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1000 rows
2012-09-26 15:15:41,499 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1000 rows
2012-09-26 15:15:41,499 INFO ExecMapper: ExecMapper: processing 1000 rows: used memory = 7598072
2012-09-26 15:15:42,992 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"text":"@KimKardashian happy valentines day, hope it's a good one","retweet_count":0,"geo":{"type":"Point","coordinates":[38.7313358,-108.05278695]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":25365536,"source":"\u003Ca href="http://twitter.com/download/android" rel="nofollow"\u003ETwitter for Android\u003C/a\u003E","in_reply_to_user_id_str":"25365536","id_str":"169483808003989505","entities":{"user_mentions":[{"indices":[0,14],"screen_name":"KimKardashian","id_str":"25365536","name":"Kim Kardashian","id":25365536}],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/6a7e7dbf9d6c7ac4.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Delta, CO","bounding_box":{"type":"Polygon","coordinates":[[[-108.104644,38.71503],[-108.021863,38.71503],[-108.021863,38.769794],[-108.104644,38.769794]]]},"name":"Delta","id":"6a7e7dbf9d6c7ac4","country":"United States"},"in_reply_to_screen_name":"Ki{"text":"@bbrandivirgo too bad I dont have the number. Happy valentines day tho :)","retweet_count":0,"geo":{"type":"Point","coordinates":[33.77406404,-84.39270512]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://mobile.twitter.com" rel="nofollow"\u003EMobile Web\u003C/a\u003E","in_reply_to_user_id_str":null,"id_str":"169497701241716736","entities":{"user_mentions":[],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/8173485c72e78ca5.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Atlanta, GA","bounding_box":{"type":"Polygon","coordinates":[[[-84.54674,33.647908],[-84.289389,33.647908],[-84.289389,33.887618],[-84.54674,33.887618]]]},"name":"Atlanta","id":"8173485c72e78ca5","country":"United States"},"in_reply_to_screen_name":null,"favorited":false,"truncated":false,"created_at":"Tue Feb 14 19:06:15 +0000 2012","contributors":null,"user":{"contributors_enabled":false,"profile_background_image_url":"http://a3.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","url":"http://facebook.com/cperk3","profile_link_color":"0084B4","followers_count":773,"profile_image_url":"http://a3.twimg.com/profile_images/1792490671/000011110000_normal.jpg","default_profile_image":false,"show_all_inline_media":true,"statuses_count":3271,"profile_background_color":"C0DEED","description":"Ga Tech Athlete-Student.. 
Black&Samoan...Follow me as I follow Jesus-","location":"Atlanta, GA","profile_background_tile":true,"favourites_count":1,"profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","time_zone":"Quito","profile_sidebar_fill_color":"DDEEF6","screen_name":"Cpeezy21","id_str":"312682111","lang":"en","geo_enabled":true,"profile_image_url_https":"https://si0.twimg.com/profile_images/1792490671/000011110000_normal.jpg","verified":false,"notifications":null,"profile_sidebar_border_color":"04080a","protected":false,"listed_count":5,"created_at":"Tue Jun 07 14:14:34 +0000 2011","name":"Charles Perkins III","is_translator":false,"follow_request_sent":null,"following":null,"profile_use_background_image":true,"friends_count":223,"id":312682111,"default_profile":false,"utc_offset":-18000,"profile_text_color":"333333"},"retweeted":false,"id":169497701241716736,"coordinates":{"type":"Point","coordinates":[-84.39270512,33.77406404]}}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('t' (code 116)): was expecting comma to separate OBJECT entries
at [Source: java.io.StringReader@366ef7ba; line: 1, column: 999]
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('t' (code 116)): was expecting comma to separate OBJECT entries
at [Source: java.io.StringReader@366ef7ba; line: 1, column: 999]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:285)
at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeserializer.java:220)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:165)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:25)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2402)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
... 10 more

2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarded 4551 rows
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:1
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 4551 rows
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 4551 rows
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
2012-09-26 15:15:42,993 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 forwarded 0 rows
2012-09-26 15:15:43,066 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:4551
2012-09-26 15:15:43,066 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2012-09-26 15:15:43,066 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2012-09-26 15:15:43,066 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 Close done
2012-09-26 15:15:43,066 INFO ExecMapper: ExecMapper: processed 4551 rows: used memory = 17571376
2012-09-26 15:15:43,074 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-09-26 15:15:43,077 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"text":"@KimKardashian happy valentines day, hope it's a good one","retweet_count":0,"geo":{"type":"Point","coordinates":[38.7313358,-108.05278695]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":25365536,"source":"\u003Ca href="http://twitter.com/download/android" rel="nofollow"\u003ETwitter for Android\u003C/a\u003E","in_reply_to_user_id_str":"25365536","id_str":"169483808003989505","entities":{"user_mentions":[{"indices":[0,14],"screen_name":"KimKardashian","id_str":"25365536","name":"Kim Kardashian","id":25365536}],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/6a7e7dbf9d6c7ac4.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Delta, CO","bounding_box":{"type":"Polygon","coordinates":[[[-108.104644,38.71503],[-108.021863,38.71503],[-108.021863,38.769794],[-108.104644,38.769794]]]},"name":"Delta","id":"6a7e7dbf9d6c7ac4","country":"United States"},"in_reply_to_screen_name":"Ki{"text":"@bbrandivirgo too bad I dont have the number. Happy valentines day tho :)","retweet_count":0,"geo":{"type":"Point","coordinates":[33.77406404,-84.39270512]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://mobile.twitter.com" rel="nofollow"\u003EMobile Web\u003C/a\u003E","in_reply_to_user_id_str":null,"id_str":"169497701241716736","entities":{"user_mentions":[],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/8173485c72e78ca5.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Atlanta, GA","bounding_box":{"type":"Polygon","coordinates":[[[-84.54674,33.647908],[-84.289389,33.647908],[-84.289389,33.887618],[-84.54674,33.887618]]]},"name":"Atlanta","id":"8173485c72e78ca5","country":"United States"},"in_reply_to_screen_name":null,"favorited":false,"truncated":false,"created_at":"Tue Feb 14 19:06:15 +0000 2012","contributors":null,"user":{"contributors_enabled":false,"profile_background_image_url":"http://a3.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","url":"http://facebook.com/cperk3","profile_link_color":"0084B4","followers_count":773,"profile_image_url":"http://a3.twimg.com/profile_images/1792490671/000011110000_normal.jpg","default_profile_image":false,"show_all_inline_media":true,"statuses_count":3271,"profile_background_color":"C0DEED","description":"Ga Tech Athlete-Student.. Black&Samoan...Follow me as I follow Jesus-","location":"Atlanta, GA","profile_background_tile":true,"favourites_count":1,"profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","time_zone":"Quito","profile_sidebar_fill_color":"DDEEF6","screen_name":"Cpeezy21","id_str":"312682111","lang":"en","geo_enabled":true,"profile_image_url_https":"https://si0.twimg.com/profile_images/1792490671/000011110000_normal.jpg","verified":false,"notifications":null,"profile_sidebar_border_color":"04080a","protected":false,"listed_count":5,"created_at":"Tue Jun 07 14:14:34 +0000 2011","name":"Charles Perkins III","is_translator":false,"follow_request_sent":null,"following":null,"profile_use_background_image":true,"friends_count":223,"id":312682111,"default_profile":false,"utc_offset":-18000,"profile_text_color":"333333"},"retweeted":false,"id":169497701241716736,"coordinates":{"type":"Point","coordinates":[-84.39270512,33.77406404]}}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"text":"@KimKardashian happy valentines day, hope it's a good one","retweet_count":0,"geo":{"type":"Point","coordinates":[38.7313358,-108.05278695]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":25365536,"source":"\u003Ca href="http://twitter.com/download/android" rel="nofollow"\u003ETwitter for Android\u003C/a\u003E","in_reply_to_user_id_str":"25365536","id_str":"169483808003989505","entities":{"user_mentions":[{"indices":[0,14],"screen_name":"KimKardashian","id_str":"25365536","name":"Kim Kardashian","id":25365536}],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/6a7e7dbf9d6c7ac4.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Delta, CO","bounding_box":{"type":"Polygon","coordinates":[[[-108.104644,38.71503],[-108.021863,38.71503],[-108.021863,38.769794],[-108.104644,38.769794]]]},"name":"Delta","id":"6a7e7dbf9d6c7ac4","country":"United States"},"in_reply_to_screen_name":"Ki{"text":"@bbrandivirgo too bad I dont have the number. Happy valentines day tho :)","retweet_count":0,"geo":{"type":"Point","coordinates":[33.77406404,-84.39270512]},"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"source":"\u003Ca href="http://mobile.twitter.com" rel="nofollow"\u003EMobile Web\u003C/a\u003E","in_reply_to_user_id_str":null,"id_str":"169497701241716736","entities":{"user_mentions":[],"urls":[],"hashtags":[]},"in_reply_to_status_id":null,"place":{"url":"http://api.twitter.com/1/geo/id/8173485c72e78ca5.json","place_type":"city","country_code":"US","attributes":{},"full_name":"Atlanta, GA","bounding_box":{"type":"Polygon","coordinates":[[[-84.54674,33.647908],[-84.289389,33.647908],[-84.289389,33.887618],[-84.54674,33.887618]]]},"name":"Atlanta","id":"8173485c72e78ca5","country":"United States"},"in_reply_to_screen_name":null,"favorited":false,"truncated":false,"created_at":"Tue Feb 14 19:06:15 +0000 2012","contributors":null,"user":{"contributors_enabled":false,"profile_background_image_url":"http://a3.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","url":"http://facebook.com/cperk3","profile_link_color":"0084B4","followers_count":773,"profile_image_url":"http://a3.twimg.com/profile_images/1792490671/000011110000_normal.jpg","default_profile_image":false,"show_all_inline_media":true,"statuses_count":3271,"profile_background_color":"C0DEED","description":"Ga Tech Athlete-Student.. Black&Samoan...Follow me as I follow Jesus-","location":"Atlanta, GA","profile_background_tile":true,"favourites_count":1,"profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/376284279/yyyyyyyyyyyyyyyyyyyy.jpg","time_zone":"Quito","profile_sidebar_fill_color":"DDEEF6","screen_name":"Cpeezy21","id_str":"312682111","lang":"en","geo_enabled":true,"profile_image_url_https":"https://si0.twimg.com/profile_images/1792490671/000011110000_normal.jpg","verified":false,"notifications":null,"profile_sidebar_border_color":"04080a","protected":false,"listed_count":5,"created_at":"Tue Jun 07 14:14:34 +0000 2011","name":"Charles Perkins III","is_translator":false,"follow_request_sent":null,"following":null,"profile_use_background_image":true,"friends_count":223,"id":312682111,"default_profile":false,"utc_offset":-18000,"profile_text_color":"333333"},"retweeted":false,"id":169497701241716736,"coordinates":{"type":"Point","coordinates":[-84.39270512,33.77406404]}}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
... 8 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('t' (code 116)): was expecting comma to separate OBJECT entries
at [Source: java.io.StringReader@366ef7ba; line: 1, column: 999]
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('t' (code 116)): was expecting comma to separate OBJECT entries
at [Source: java.io.StringReader@366ef7ba; line: 1, column: 999]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:285)
at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeserializer.java:220)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:165)
at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:25)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2402)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1602)
at com.cloudera.hive.serde.JSONSerDe.deserialize(JSONSerDe.java:126)
... 10 more
2012-09-26 15:15:43,081 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
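
The root cause is visible in the writable dumped above: the record appears to contain one tweet cut off mid-field with a second tweet spliced onto it (note the `"in_reply_to_screen_name":"Ki{"text":` sequence), so the SerDe hands Jackson a line that is not a single well-formed JSON object and the map task dies. If you hit this, one workaround is to filter out the occasional corrupted line before querying. The sketch below is a hypothetical standalone helper, not part of this repository; it mirrors the SerDe's own call pattern (Jackson 1.x `readValue` into a `Map`, as shown in the trace) and simply skips lines that fail to parse. The class name and argument handling are illustrative.

    // Hypothetical helper (not part of this repository): drop lines that are not a
    // single well-formed JSON object so the JSONSerDe never sees spliced tweets.
    // Uses the same Jackson 1.x (org.codehaus.jackson) library the SerDe depends on.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.PrintWriter;
    import java.util.Map;

    import org.codehaus.jackson.map.ObjectMapper;

    public class FilterMalformedTweets {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            BufferedReader in = new BufferedReader(new FileReader(args[0]));  // raw tweet file
            PrintWriter out = new PrintWriter(args[1]);                       // cleaned output
            String line;
            while ((line = in.readLine()) != null) {
                try {
                    // Same call pattern the SerDe uses; throws JsonParseException on bad JSON.
                    mapper.readValue(line, Map.class);
                    out.println(line);                // keep only lines that parse cleanly
                } catch (Exception e) {
                    // Skip the corrupted record instead of failing the whole query.
                }
            }
            in.close();
            out.close();
        }
    }

You would typically pull the offending file down from HDFS, run the filter over it, and put the cleaned copy back into the partition (or simply delete the bad file); since the log shows only a single bad record among the forwarded rows, the data loss from skipping it is minimal.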
