Comments (6)
Hmm, swappable jar?
I'm super open to adding any necessary compat layers; shoot a PR and we'll get it in if you have any nice ideas!
from frameless.
13.3 LTS backported the 3.5 isNativeType change as well so that's reflected in the title. (I was mistaken, 0.16 spark33 builds work fine)
fyi - I've created a shim to handle the abstraction; barring isNative, the approach seems workable. I'll start on the frameless shims in the next few days, there's lots to do there.
fyi - With the first shim snapshot push, compiling against 3.5.oss works when running against 14.3.dbr; only StaticInvoke needed handling. So the same frameless jar can be run against both 14.0/14.1 and 14.2/14.3 by swapping the shim to the right dbr version, or users can stay with the oss version as per a normal dependency.
The code for the StaticInvoke handling, shims etc. is on a branch here, with the diff here.
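To make the "swap the shim" idea concrete, here is a minimal sketch in plain Scala. Every name in it (RuntimeShim, OssShim, DbrShim, forRuntime, encode) is hypothetical and is not the frameless or shim API. In the real setup the choice is made by which shim jar is on the classpath; the map lookup below only stands in for that so the example is self-contained.

```scala
// Hypothetical names throughout: this is NOT the frameless or shim API.
object ShimSketch {
  // Stable interface the library jar is compiled against.
  trait RuntimeShim {
    def runtimeName: String
    // Stand-in for an internal call whose shape differs per runtime.
    def staticInvoke(className: String, method: String): String
  }

  // Implementation that would ship in the oss shim jar (assumption).
  object OssShim extends RuntimeShim {
    val runtimeName: String = "3.5.oss"
    def staticInvoke(className: String, method: String): String =
      s"oss:$className.$method"
  }

  // Implementation that would ship in the dbr shim jar (assumption).
  object DbrShim extends RuntimeShim {
    val runtimeName: String = "14.3.dbr"
    def staticInvoke(className: String, method: String): String =
      s"dbr:$className.$method"
  }

  // Stand-in for classpath selection: look the shim up by runtime name.
  private val shims: Map[String, RuntimeShim] =
    Seq[RuntimeShim](OssShim, DbrShim).map(s => s.runtimeName -> s).toMap

  def forRuntime(name: String): Option[RuntimeShim] = shims.get(name)

  // Library code only ever talks to the trait, never to a concrete runtime.
  def encode(shim: RuntimeShim): String = shim.staticInvoke("Codec", "encode")
}
```

The point of the shape is that encode never changes when the runtime does; only the object behind RuntimeShim is replaced.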
I'll target the major pain points impacted in each OSS major/minor release next (i.e. TypedEncoder, TypedExpressionEncoder and RecordEncoder) to have each internal api usage pulled out (e.g. [Un]WrapOption, Invoke, NewInstance, GetStructField, ifisnull, GetColumnByOrdinal, MapObjects and probably TypedExpressionEncoder itself). It's probably worth doing them in advance of any pull request.
What I'll attempt with this is to see how much of the encoding logic can be re-used from the current frameless codebase and targeted major versions on older dbrs (e.g. can we get a 3.5 oss frameless jar running on a 3.1.2 Databricks runtime).
If you'd like me to add FramelessInternals.objectTypeFor, ScalaReflection.dataTypeFor etc. as well I think that'd make sense but Reflection had been fairly stable code before they ripped it out :)
@pomadchin -
So at time of writing, building the current 0.16-based fork branch (rev 7944fe9 is pre-reformatting) against the correct 3.5 shim_runtime version, and then testing the encoding functionality (used by Quality tests built against 0.16 frameless with a 3.1.3 oss base) with the shim_runtime for 9.1.dbr, works despite the very different impl.
I'd not want to advertise that it's possible to jump versions so much (there are other issues like kmeans and join interface changes of course) but it proves the approach works at least and may ease 4.x support.
The pre-reformatting functional change diff is here. The key mima change is the removal of frameless.MapGroups; it could of course be kept as a simple forwarder if needed.
Per b880261, #803 and #804 are confirmed as working on all LTS versions of Databricks, Spark 4 and the latest 15.0 runtime. Test combinations are documented here.
A number of test issues appear when running on a cluster; these do not appear on a single-node server (e.g. GitHub runners, a dev box or even Databricks Community Edition).
- all generated double values used in tests
- the OrderByTest "derives a CatalystOrdered for case classes when all fields are comparable"
doubles lose precision on serialisation, e.g.:
stddev_samp *** FAILED *** (19 seconds, 196 milliseconds)
GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
(AggregateFunctionsTests.scala:591)
Falsified after 5 successful property evaluations.
Location: (AggregateFunctionsTests.scala:591)
Occurred when passed generated values (
arg0 = List("X2(1,-2147483648)", "X2(1,654883454)", "X2(-1,-2147483648)", "X2(1,0)") // 4 shrinks
)
Label of failing property:
Expected Map(1 -> Some(1.4659365454162877E9), -1 -> None) but got Map(1 -> Some(1.4659365454162874E9), -1 -> None)
The very last digit didn't match, so all double gens have to be serializable. The same occurs for BigDecimals in other tests (like AggregateFunctionsTest first/last), but that is likely because the package arbitraries are not correct in the testless shade (they are correct when used via TestlessSingle in the IDE).
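As an illustration of how small the mismatch is, here is a sketch of a relative-tolerance comparison applied to the two values from the failure above. This is not what the frameless tests do and not a proposed fix; approxEqual and the 1e-12 tolerance are assumptions made up for the example.

```scala
object DoubleTolerance {
  // Hypothetical helper: true when a and b agree to within relTol,
  // relative to the larger magnitude. NaN never compares equal.
  def approxEqual(a: Double, b: Double, relTol: Double = 1e-12): Boolean =
    a == b || math.abs(a - b) <= relTol * math.max(math.abs(a), math.abs(b))

  // The two stddev_samp results from the failing property above.
  val expected: Double = 1.4659365454162877E9
  val got: Double = 1.4659365454162874E9

  // Exact equality fails, the tolerant comparison accepts the pair.
  val exactMatch: Boolean = expected == got
  val tolerantMatch: Boolean = approxEqual(expected, got)
}
```

A strict == on the collected Map is what turns a last-digit difference into a falsified property.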
for the order by:
import frameless.{X2, X3}
import spark.implicits._
val v = Vector(X3(-1,false,X2(586394193,6313416569807298536L)), X3(2147483647,false,X2(1,-1L)), X3(729528245,false,X2(1,-1L)))
v.toDS.orderBy("c").collect().toVector
the error that can occur is:
derives a CatalystOrdered for case classes when all fields are comparable *** FAILED *** (11 seconds, 784 milliseconds)
GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
(OrderByTests.scala:177)
Falsified after 5 successful property evaluations.
Location: (OrderByTests.scala:177)
Occurred when passed generated values (
arg0 = Vector(X3(-1,false,X2(586394193,6313416569807298536)), X3(2147483647,false,X2(1,-1)), X3(729528245,false,X2(1,-1))) // 2 shrinks
)
Label of failing property:
Expected Vector(X3(729528245,false,X2(1,-1)), X3(2147483647,false,X2(1,-1)), X3(-1,false,X2(586394193,6313416569807298536))) but got Vector(X3(2147483647,false,X2(1,-1)), X3(729528245,false,X2(1,-1)), X3(-1,false,X2(586394193,6313416569807298536)))
testless.org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException:
i.e. the two rows with c = X2(1,-1) can come back in either order and both are acceptable results. The test needs to be rewritten to account for this and compare only the c's.
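A minimal sketch of what "just compare c's" could look like, using the expected and actual vectors from the failure above. X2 and X3 here are local, non-generic stand-ins for the frameless test classes, and sameKeyOrder is a hypothetical helper, so treat it as an outline of the check and not the actual test rewrite.

```scala
object OrderByTieCheck {
  // Local stand-ins for the frameless test case classes (assumed shapes).
  final case class X2(a: Int, b: Long)
  final case class X3(a: Int, b: Boolean, c: X2)

  // Order X2 field by field, the way a struct sort key would be compared.
  implicit val x2Ordering: Ordering[X2] =
    Ordering.by((x: X2) => (x.a, x.b))

  // Hypothetical check: both results carry the same sequence of sort keys,
  // and that sequence is ascending. Rows with equal keys may swap freely.
  def sameKeyOrder(expected: Vector[X3], got: Vector[X3]): Boolean = {
    val keys = got.map(_.c)
    keys == expected.map(_.c) && keys == keys.sorted
  }

  // Values copied from the failing property above.
  val expected: Vector[X3] = Vector(
    X3(729528245, false, X2(1, -1L)),
    X3(2147483647, false, X2(1, -1L)),
    X3(-1, false, X2(586394193, 6313416569807298536L))
  )
  val got: Vector[X3] = Vector(
    X3(2147483647, false, X2(1, -1L)),
    X3(729528245, false, X2(1, -1L)),
    X3(-1, false, X2(586394193, 6313416569807298536L))
  )
}
```

With these inputs the full-row comparison differs while the key-only comparison agrees, which is the tie behaviour described above.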