Comments (6)
@Ragavenderan: Thanks for raising this! Can you give us a sample Scala method and how you intend to use it within Spark?
Currently we have a jar file that contains the function definitions.
From PySpark, this is how we call the method:

    from pyspark.sql import SparkSession, DataFrame

    def load_curated_email(spark, category, startdate="", enddate="", location=""):
        # Call the Scala method in the jar through the JVM gateway, then wrap
        # the returned Java DataFrame in a PySpark DataFrame.
        df = spark._jvm.com.microsoft.odinml.Extractor.loadCuratedEmail(spark._jsparkSession, category, startdate, enddate, location)
        return DataFrame(df, spark)
The proper way to do this is to introduce a new data source, so that you can do:
var df = spark.Read().Format("your_format_here").Option("startdate","something").Load();
Have you considered this option?
Otherwise, we don't have a plan to support calling Scala methods from .NET, other than RegisterJava for UDFs.
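For comparison with the Python wrapper earlier in the thread, here is a rough sketch of what the same call could look like once such a data source exists. It is only an illustration: "your_format_here" is the placeholder from the comment above, the option keys are assumed names, and the data source itself would still have to be implemented on the JVM side.

```python
def load_curated_email(spark, category, startdate="", enddate="", location=""):
    """Load curated email through a custom data source instead of spark._jvm.

    Assumptions: "your_format_here" and the option keys are placeholders;
    nothing in this thread confirms the names a real data source would use.
    """
    reader = spark.read.format("your_format_here").option("category", category)
    # Pass only the filters that were actually supplied by the caller.
    for key, value in (("startdate", startdate),
                       ("enddate", enddate),
                       ("location", location)):
        if value:
            reader = reader.option(key, value)
    # load() returns a regular DataFrame, so no manual wrapping is needed.
    return reader.load()
```

Because this uses only the public reader API, the same chain of Format/Option/Load calls would be available from .NET without reaching into the JVM directly.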
Hi @imback82
I assume the UDF can only be used in Spark SQL, right?
The scenario Raga shared is calling a Scala method that returns a DataFrame, as the Python code shows. Since we don't want to duplicate the logic in C#, is there any way to achieve this?
@garyyang2002: Yes, that's correct.
I'm afraid what you want is not possible by any simple means, and it is a use case we cannot support immediately. While you can call regular Java functions through SparkSQL, the use case described here is a wrapper function that itself invokes spark.read.format().load() and returns a DataFrame. This is a bit unconventional and is not the recommended approach. That brings me to the next question: can you share some details about loadCuratedEmail()? How complex would it be to write this one function in .NET?
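To illustrate the SparkSQL route mentioned above, here is a minimal PySpark sketch of registering a Java UDF and calling it from a query. The UDF name, the class com.example.udf.ExtractCategory, and the subject column are all hypothetical; the class would need to implement one of Spark's Java UDF interfaces (for example org.apache.spark.sql.api.java.UDF1) and be on the classpath.

```python
def tag_categories(spark, table):
    """Register a Java UDF and apply it to each row of `table` through SQL.

    Assumptions: "extract_category", the class name, and the `subject`
    column are illustrative only and do not come from the thread.
    """
    spark.udf.registerJavaFunction(
        "extract_category",                  # name visible to Spark SQL
        "com.example.udf.ExtractCategory",   # hypothetical JVM class
        "string",                            # return type as a DDL string
    )
    # The UDF maps column values to column values; it cannot itself
    # build and return a DataFrame the way loadCuratedEmail() does.
    return spark.sql(
        f"SELECT extract_category(subject) AS category FROM {table}"
    )
```

This shows the limitation under discussion: a registered UDF operates on values inside a query, so it does not cover a method whose job is to produce a whole DataFrame.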
A workaround is to use https://github.com/aelij/IgnoresAccessChecksToGenerator to access some of the internal classes, but this is not recommended because internal classes can change and break between releases.