trend-sentiment-miner's Introduction

To run the application, execute the following command from the current directory:
./setup.bash

train: trainmodel.py


input format:
	"id";"polarity";"tweet"
	e.g. sts_gold_tweet.csv
	polarity:
		0: negative
		4: positive
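
A minimal sketch of reading this input format with Python's standard csv module. The sample rows, the POLARITY_NAMES mapping and the read_tweets helper are hypothetical and only illustrate the layout; they are not taken from trainmodel.py or from sts_gold_tweet.csv.

```python
import csv
import io

# Hypothetical sample in the "id";"polarity";"tweet" layout (not real data).
SAMPLE = '''"id";"polarity";"tweet"
"1";"0";"the service was slow; never again"
"2";"4";"loving the new update"
'''

# Polarity codes as described above: 0 is negative, 4 is positive.
POLARITY_NAMES = {"0": "negative", "4": "positive"}


def read_tweets(handle):
    """Yield (id, polarity_name, tweet) tuples from a semicolon-delimited file."""
    reader = csv.DictReader(handle, delimiter=";", quotechar='"')
    for row in reader:
        yield row["id"], POLARITY_NAMES[row["polarity"]], row["tweet"]


rows = list(read_tweets(io.StringIO(SAMPLE)))
for tweet_id, polarity, tweet in rows:
    print(tweet_id, polarity, tweet)
```

Because every field is quoted, a semicolon inside a tweet does not split the row.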


output:
	rf_model.model

steps:

	1. replace all whitespace (\s) with " "						(line: 27)
	2. replace all URLs with " "							(line: 28)
	3. tokenize									(line: 30)
	4. filter stop words								(line: 32) 	(Spark's default stop words; might need to change)
	5. convert to vector: Word2Vec							(line: 34)
	6. index label									(line: 36)
	7. split data into training and test sets					(line: 40)
	8. train 									(line: 48)
	9. test										(line: 50)
	10. save model 									(line: 63)
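
The text-cleaning steps (1 to 4) can be sketched in plain Python as follows. This is an illustration, not the Spark code in trainmodel.py: the URL pattern, the tiny stop-word set and the preprocess helper are assumptions, and steps 5 to 10 (Word2Vec, label indexing, training, saving) are not covered.

```python
import re

# Assumed patterns; the actual ones in trainmodel.py may differ.
WHITESPACE = re.compile(r"\s")
URL = re.compile(r"https?://\S+")

# Tiny illustrative stop-word set; Spark's default list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "to", "and"}


def preprocess(tweet):
    """Apply steps 1-4: normalise whitespace, drop URLs, tokenize, filter."""
    text = WHITESPACE.sub(" ", tweet)   # step 1: every whitespace char -> " "
    text = URL.sub(" ", text)           # step 2: every URL -> " "
    tokens = text.lower().split()       # step 3: lowercase, split on spaces
    return [t for t in tokens if t not in STOP_WORDS]  # step 4


tokens = preprocess("Check this\tout http://t.co/abc the movie is GREAT")
print(tokens)
```

Note that str.split() also discards the empty tokens left behind by consecutive spaces, which a tokenizer that splits on every single whitespace character would keep.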


model: 

	random forest 									(line: 42)

	from pyspark.ml.classification import RandomForestClassifier, RandomForestClassificationModel

	RandomForestClassifier(featuresCol="features", labelCol="label", …)

	in which:
		featuresCol: 					vectorized words 	(step: 5)
		labelCol: 					indexed polarity  	(step: 6)	(0.0: negative, 1.0: positive)
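
How the polarity values become 0.0 and 1.0 depends on the indexer used in step 6. Assuming it is pyspark.ml.feature.StringIndexer with its default ordering (most frequent label first), the mapping can be mimicked in plain Python as below. The index_labels helper is hypothetical, and the alphabetical tie-break is a choice made for this sketch.

```python
from collections import Counter


def index_labels(labels):
    """Map each distinct label to a float index, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending frequency, then alphabetically to break ties.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(ordered)}
    return mapping, [mapping[label] for label in labels]


# Hypothetical polarity column with more negative ("0") than positive ("4") rows.
mapping, indexed = index_labels(["0", "4", "0", "0", "4"])
print(mapping)
print(indexed)
```

Under that assumption, negative maps to 0.0 only while negative tweets are the majority in the training file, so it may be worth checking the mapping after changing the data set.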




classification: sentimentAnalysis.py

input: 	short.json

output: 	sentiments.json/part-r-00000xxxxxxx.json

steps:

	1. load model 									(line: 26)
	2. replace all whitespace (\s) with " "						(line: 32)
	3. replace all URLs with " "							(line: 34)
	4. tokenize									(line: 36)
	5. filter stop words								(line: 38) 	(Spark's default stop words; might need to change)
	6. convert to vector: Word2Vec							(line: 40)
	7. predict with the model 							(line: 42)
	8. format output 								(line: 47~55)
	9. compute the percentage of positive predictions 				(line: 49) 	(because 0: negative, 1: positive, the average is the positive share)
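
Step 9 relies on the predictions being 0 or 1, so their mean equals the fraction of positive tweets. A minimal sketch in plain Python, with a hypothetical positive_share helper and made-up prediction values:

```python
def positive_share(predictions):
    """Return the fraction of positive (1.0) predictions in a 0/1 list."""
    if not predictions:
        raise ValueError("no predictions to average")
    return sum(predictions) / len(predictions)


# Three positive and one negative prediction (made-up values).
share = positive_share([1.0, 0.0, 1.0, 1.0])
print(f"{share:.0%} positive")
```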





