Comments (8)
It looks like the array return type has a bug. I will fix it ASAP. I have a local fix that works as follows:
var udf = Udf<string, string[]>((str) => new[] { str, str + str });
df.Select(Explode(udf(df["name"]))).Show();
The original table:
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
After exploding:
+--------------+
|           col|
+--------------+
|       Michael|
|MichaelMichael|
|          Andy|
|      AndyAndy|
|        Justin|
|  JustinJustin|
+--------------+
from spark.
A few more related questions:
- Can I have custom classes as the return type?
- Can the return type be IEnumerable?
from spark.
- Can I have custom classes as the return type?
Not at the moment. However, we'd like to understand the use case. Can you explain the scenario where you want this? (A sample scenario with some snippets would be best.)
- Can the return type be IEnumerable?
From what I understand, you want to iterate over the result set? If so, have you considered using ToLocalIterator, which returns an IEnumerable?
from spark.
- Can the return type be IEnumerable?
From what I understand, you want to iterate over the result set? If so, have you considered using ToLocalIterator, which returns an IEnumerable?
I think @guruvonline meant using IEnumerable as the return type of a UDF. Yes, this will be supported:
var udf = Udf<string, IEnumerable<string>>((str) => new[] { str, str + str });
from spark.
- Can I have custom classes as the return type?
Not at the moment. However, we'd like to understand the use case. Can you explain the scenario where you want this? (A sample scenario with some snippets would be best.)
I have added a new feature request with an example scenario.
from spark.
I also get this error, and the workaround doesn't work for me.
SparkSession spark = SparkSession
    .Builder()
    .AppName("RunExe")
    .GetOrCreate();
spark.Udf().Register<string, string[]>("udf1", s => new string[] { s, s + s });
spark.Udf().Register<string[], string>("udf2", g => g[0]);
DataFrame dt = xxxx;
dt.Select(CallUDF("udf1", dt.Col("value"))); // doesn't work
// or
dt.Select(Explode(CallUDF("udf1", dt.Col("value")))); // doesn't work
I always get the following error stack:
[JvmBridge] java.lang.IllegalArgumentException: Failed to convert the JSON string 'array<string>' to a data type.
at org.apache.spark.sql.types.DataType$$anonfun$nameToType$1.apply(DataType.scala:129)
at org.apache.spark.sql.types.DataType$$anonfun$nameToType$1.apply(DataType.scala:129)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.apache.spark.sql.types.DataType$.nameToType(DataType.scala:127)
at org.apache.spark.sql.types.DataType$.parseDataType(DataType.scala:144)
at org.apache.spark.sql.types.DataType$.fromJson(DataType.scala:113)
at org.apache.spark.sql.types.DataType.fromJson(DataType.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.api.dotnet.DotnetBackendHandler.handleMethodCall(DotnetBackendHandler.scala:162)
at org.apache.spark.api.dotnet.DotnetBackendHandler.handleBackendRequest(DotnetBackendHandler.scala:102
I just looked at the source code, DataType.scala (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala). I am totally new to Scala, and I can't understand why "array" falls into the wrong match case.
def fromJson(json: String): DataType = parseDataType(parse(json)) // json = "array<string>"
private[sql] def parseDataType(json: JValue): DataType = json match {
  // it falls into this case
  case JString(name) =>
    nameToType(name)
  // supposed to go here?
  case JSortedObject(
      ("containsNull", JBool(n)),
      ("elementType", t: JValue),
      ("type", JString("array"))) =>
    ArrayType(parseDataType(t), n)
  // (other cases omitted)
}
Is there any workaround for me? If needed, I can modify Microsoft.Spark locally to make it work.
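For anyone reading along, the two match cases above seem to correspond to two JSON shapes. A bare type name such as "string" is a JSON string and hits the JString(name) case, which is why "array<string>" fails in nameToType. The array case expects a JSON object with "type", "elementType", and "containsNull" fields. Below is a minimal sketch of that object form in C#, assuming System.Text.Json is available. ArrayTypeJson is a hypothetical helper for illustration only; it is not part of Microsoft.Spark and not necessarily how the actual fix is implemented.

```csharp
using System;
using System.Text.Json;

public static class ArrayTypeJsonSketch
{
    // Hypothetical helper (not a Microsoft.Spark API): builds the JSON object
    // form of an array type, using the three field names that the
    // JSortedObject case in parseDataType matches on.
    public static string ArrayTypeJson(string elementType, bool containsNull = true) =>
        JsonSerializer.Serialize(new
        {
            type = "array",
            elementType,
            containsNull
        });

    public static void Main()
    {
        // The simple-name form "array<string>" would be parsed as a JSON string
        // and routed to nameToType, which has no entry for it.
        // The object form printed here is what the array case expects instead.
        Console.WriteLine(ArrayTypeJson("string"));
    }
}
```

The field order in the output should not matter on the Scala side, since JSortedObject sorts the fields by name before matching.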
from spark.
I finally understand the Scala code, and I have now fixed it locally in Microsoft.Spark. I will send a PR later.
from spark.
@danny8002 there is already a PR for this: #114.
from spark.