Comments (5)
Perfect! Please use -T 176 and -P 2687340 for 4tasks03.txt.
4tasks01.txt (http://bd.hafni.cs.uga.edu/test/4tasks01.txt) was the largest dataset that worked in local mode with 16 GB RAM (-T 176 -P 895780).
from dr1dl-pyspark.
What about r and e? I believe we generally set m to 100.
from dr1dl-pyspark.
We generally used -r 0.07 -m 5 -e 0.01 to obtain results faster.
from dr1dl-pyspark.
@quinngroup/bigneuron
I seem to have a reliable BlueData image working. It's currently crunching the 4tasks03.txt dataset; so far it's working. I also implemented a few optimizations (broadcasting the random seed at the start of each iteration, and representing v with a SparseVector object) to see how they work. They're not fully tested yet, so the job may crash at some point.
In the meantime, feel free to use the image and stress test it against either the cluster I've spun up or your own custom cluster. Let me know if there are any problems.
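For anyone following along, here is a rough sketch of the idea behind those two optimizations. It is not the code in the repository: it uses only the Python standard library, and the helper names (random_vector, to_sparse, sparse_dot) are made up for illustration. In the actual PySpark job the seed would presumably be shipped with SparkContext.broadcast and v stored as a pyspark.mllib.linalg.SparseVector; here a plain dict stands in for the sparse vector.

```python
import random


def random_vector(seed, length):
    """Rebuild the same random vector on any worker from a shared seed.

    Shipping one integer seed is much cheaper than shipping the full vector.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def to_sparse(dense, tol=0.0):
    """Keep only entries whose magnitude exceeds tol, as an {index: value} dict."""
    return {i: x for i, x in enumerate(dense) if abs(x) > tol}


def sparse_dot(sparse_v, row):
    """Dot product that only touches the nonzero entries of the sparse vector."""
    return sum(value * row[i] for i, value in sparse_v.items())


if __name__ == "__main__":
    # Two "workers" given the same seed regenerate an identical vector.
    assert random_vector(42, 5) == random_vector(42, 5)

    # A mostly-zero v is stored and multiplied using only its nonzero entries.
    v = to_sparse([0.0, 2.0, 0.0, -1.0])
    print(v, sparse_dot(v, [1.0, 3.0, 5.0, 7.0]))
```

The point of the seed approach is that each iteration only needs to send a single number to the executors, and the sparse form keeps the cost of each product proportional to the number of nonzeros in v rather than its full length.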
from dr1dl-pyspark.
@magsol
Dear Dr. Quinn, would you please set up some credentials for me to work with your cluster?
Thanks
from dr1dl-pyspark.