Comments (2)
Take a look here: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
Example notebooks:
https://docs.databricks.com/_static/notebooks/spark-tensorflow-connector.html
https://github.com/GoogleCloudPlatform/cloud-dataproc/tree/master/spark-tensorflow
from ecosystem.
Thanks @joyeshmishra!
@srinivasugaddam Here is a short example of how to use the connector with PySpark.
Start PySpark with the connector JAR passed in the --jars argument, as shown below:
$SPARK_HOME/bin/pyspark --jars target/spark-connector_2.11-1.8.0.jar
Here is the README example translated to Python.
from pyspark.sql.types import *
path = "test-output.tfrecord"
# Schema with one column per type used in the README example.
fields = [StructField("id", IntegerType()), StructField("IntegerCol", IntegerType()),
          StructField("LongCol", LongType()), StructField("FloatCol", FloatType()),
          StructField("DoubleCol", DoubleType()),
          StructField("VectorCol", ArrayType(DoubleType(), True)),
          StructField("StringCol", StringType())]
schema = StructType(fields)
test_rows = [[11, 1, 23, 10.0, 14.0, [1.0, 2.0], "r1"],
             [21, 2, 24, 12.0, 15.0, [2.0, 2.0], "r2"]]
# `spark` is the SparkSession that the pyspark shell creates on startup.
rdd = spark.sparkContext.parallelize(test_rows)
df = spark.createDataFrame(rdd, schema)
# Write the DataFrame as TFRecords (tf.train.Example records), then read it back.
df.write.format("tfrecords").option("recordType", "Example").save(path)
df = spark.read.format("tfrecords").option("recordType", "Example").load(path)
df.show()
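One thing that may be worth keeping in mind with the example above: a TFRecord Example stores values in only three list kinds (Int64List, FloatList, BytesList), so several Spark types share one representation on disk. The sketch below is plain Python with no Spark or TensorFlow dependency. The mapping table is my paraphrase of the connector README and should be treated as an assumption to verify against your connector version; the names SPARK_TO_TF_FEATURE, EXAMPLE_COLUMNS, and feature_kinds are hypothetical helpers, not part of the connector.

```python
# Assumed mapping from Spark SQL types to tf.train.Feature list kinds.
# Paraphrased from the connector README; verify against your version.
SPARK_TO_TF_FEATURE = {
    "IntegerType": "Int64List",
    "LongType": "Int64List",
    "FloatType": "FloatList",
    "DoubleType": "FloatList",
    "StringType": "BytesList",
    "ArrayType(DoubleType)": "FloatList",
}

# (column name, Spark type) pairs mirroring the schema in the example above.
EXAMPLE_COLUMNS = [
    ("id", "IntegerType"),
    ("IntegerCol", "IntegerType"),
    ("LongCol", "LongType"),
    ("FloatCol", "FloatType"),
    ("DoubleCol", "DoubleType"),
    ("VectorCol", "ArrayType(DoubleType)"),
    ("StringCol", "StringType"),
]


def feature_kinds(columns):
    """Return {column name: feature list kind} for the given columns."""
    return {name: SPARK_TO_TF_FEATURE[spark_type] for name, spark_type in columns}


if __name__ == "__main__":
    for name, kind in feature_kinds(EXAMPLE_COLUMNS).items():
        print(f"{name}: {kind}")
```

If this mapping holds, IntegerCol and LongCol are indistinguishable in the written file, as are FloatCol and DoubleCol, so a read that relies on schema inference may not return the original column types. Passing the original schema to the reader is one way to preserve them.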