apache / incubator-fury
A blazingly fast multi-language serialization framework powered by JIT and zero-copy.
Home Page: https://fury.apache.org/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
Fury is under rapid development, and serialization is widely used.
We need a way to tell users which APIs are stable and which are expected to change.
Describe the solution you'd like
Add Java API annotations to mark API stability:
@Public is stable
@Internal is subject to change
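A minimal sketch of what such stability annotations could look like. The holder class Stability, the chosen targets, and the runtime retention are assumptions made for illustration, not Fury's actual definitions.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical holder class so the sketch fits in one file; in a real code
// base each annotation would normally get its own source file.
final class Stability {
  private Stability() {}

  /** Marks an API that users can rely on across releases. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
  @interface Public {}

  /** Marks an API that is subject to change without notice. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
  @interface Internal {}
}

// Usage: a stable class with one method that may still change.
@Stability.Public
final class StableApi {
  @Stability.Internal
  void tuningHook() {}
}
```

Runtime retention lets tooling or tests inspect the markers reflectively; source or class retention would also work if only documentation is needed.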
Is your feature request related to a problem? Please describe.
Support extracting the variables captured by a lambda. Codegen can use this feature to extract dependent expressions when splitting big methods into small ones.
Describe the solution you'd like
When the lambda is Serializable, we can use java.lang.invoke.SerializedLambda#getCapturedArg to extract the captured variables.
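The approach can be sketched as follows. The helper names (LambdaCaptures, SerializableFunction, capturedArgs) are hypothetical; the sketch relies on the private writeReplace() method that serializable lambda classes carry, which returns a SerializedLambda.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

final class LambdaCaptures {
  /** Hypothetical helper interface: a Function that is also Serializable. */
  interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

  /** Returns the values captured by a serializable lambda. */
  static Object[] capturedArgs(Serializable lambda) {
    try {
      // Serializable lambda classes have a private writeReplace() method
      // that returns a SerializedLambda describing the lambda.
      Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
      writeReplace.setAccessible(true);
      SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
      Object[] args = new Object[serialized.getCapturedArgCount()];
      for (int i = 0; i < args.length; i++) {
        args[i] = serialized.getCapturedArg(i);
      }
      return args;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Not a serializable lambda: " + lambda, e);
    }
  }

  /** Usage: the lambda below captures the locals prefix and base. */
  static Object[] demo() {
    String prefix = "id-";
    int base = 40;
    SerializableFunction<Integer, String> f = x -> prefix + (base + x);
    return capturedArgs(f);
  }
}
```

Primitive captures come back boxed, so the int above is returned as an Integer.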
Common IR: ValueExpression, ListExpression, Literal, Reference, Empty, Block, FieldValue, SetField, Cast, Invoke, StaticInvoke, NewInstance, NewArray, AssignArrayElem, If, IsNull, Not, Comparator, Arithmetic, Add, Subtract, ForEach, ZipForEach, ForLoop, ListFromIterable, Return
Is your feature request related to a problem? Please describe.
For a class with nested generics such as:
class Foo {
  List<Integer> intLists;
  Map<String, List<Long>> map;
}
If we push the Integer type to ListSerializer and String, List<Long> to MapSerializer, then ListSerializer will know that every element is an Integer. There is no need to look up the element serializer and write the element type for every element, which saves both space and time.
MapSerializer can use the same mechanism. Also, when serializing a List<Long> value, MapSerializer can push Long to ListSerializer, which makes nested list serialization more efficient too.
Java generics are erased at runtime, so a List value does not carry its element type. We need a way to push and propagate those erased generics along the serialization.
Describe the solution you'd like
TypeToken to extract generics
Generics to record the generics hierarchy and the current generics
GenericType to track child generics and bind serializers, reducing map lookup cost
Additional context
#70
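Although generics are erased on values, field declarations keep them. The sketch below uses plain java.lang.reflect instead of TypeToken to show the information that could be pushed down to serializers; GenericsDemo and its methods are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

final class GenericsDemo {
  static class Foo {
    List<Integer> intLists;
    Map<String, List<Long>> map;
  }

  /** Renders a type and its type arguments recursively. */
  static String describe(Type type) {
    if (type instanceof ParameterizedType) {
      ParameterizedType p = (ParameterizedType) type;
      StringBuilder sb =
          new StringBuilder(((Class<?>) p.getRawType()).getSimpleName()).append('<');
      Type[] args = p.getActualTypeArguments();
      for (int i = 0; i < args.length; i++) {
        if (i > 0) {
          sb.append(", ");
        }
        // Nested arguments such as List<Long> are handled by recursion.
        sb.append(describe(args[i]));
      }
      return sb.append('>').toString();
    }
    return type instanceof Class ? ((Class<?>) type).getSimpleName() : type.getTypeName();
  }

  /** Looks up a field of Foo and describes its declared generic type. */
  static String describeField(String name) {
    try {
      Field field = Foo.class.getDeclaredField(name);
      return describe(field.getGenericType());
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

A serializer for the map field could walk the same structure once, cache the element serializers, and reuse them for every entry.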
Is your feature request related to a problem? Please describe.
A serialization context is needed to carry context-related information, so that serializers can relate the different objects being serialized. The context is reset after the object tree has been serialized or deserialized.
Additional context
#70
Is your feature request related to a problem? Please describe.
When serializing multiple objects of the same type, the class name is written to the buffer multiple times. There should be a way to write the class name only once and to write an id for later occurrences.
Such class names are enumerable strings, so there should be an abstraction for writing such strings only once.
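One possible shape for such an abstraction, as a sketch only. EnumStringWriter, the one-byte flag, and the use of java.nio.ByteBuffer in place of Fury's own buffer are all assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class EnumStringWriter {
  // Maps each string already written to the id assigned on first write.
  private final Map<String, Integer> ids = new HashMap<>();

  /** Writes the full string the first time and only its id afterwards. */
  void write(ByteBuffer buffer, String value) {
    Integer id = ids.get(value);
    if (id != null) {
      buffer.put((byte) 1); // flag: an id follows
      buffer.putInt(id);
      return;
    }
    ids.put(value, ids.size());
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    buffer.put((byte) 0); // flag: length and bytes follow
    buffer.putInt(bytes.length);
    buffer.put(bytes);
  }

  /** Forgets all ids so the next object tree starts fresh. */
  void reset() {
    ids.clear();
  }
}
```

The reader would mirror this by appending each full string to a list and resolving ids by index.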
Additional context
#70
Is your feature request related to a problem? Please describe.
Implement the Java serialization framework for Fury. JIT serialization is not covered by this issue.
Describe the solution you'd like
The serialization framework includes the following classes:
Is your feature request related to a problem? Please describe.
Getting a field value by reflection is slow; using sun.misc.Unsafe#getXXX(java.lang.Object, long) is much faster.
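A minimal sketch of the idea, assuming a JDK that still exposes sun.misc.Unsafe (newer JDKs deprecate these methods). UnsafeFieldAccess and Point are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeFieldAccess {
  /** Hypothetical sample class with a private field. */
  static final class Point {
    private int x;

    Point(int x) {
      this.x = x;
    }
  }

  private static final Unsafe UNSAFE = loadUnsafe();
  // Resolve the field offset once; later reads are plain memory loads.
  private static final long X_OFFSET = fieldOffset(Point.class, "x");

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }

  private static long fieldOffset(Class<?> type, String name) {
    try {
      return UNSAFE.objectFieldOffset(type.getDeclaredField(name));
    } catch (NoSuchFieldException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reads Point.x without going through Field#get. */
  static int readX(Point p) {
    return UNSAFE.getInt(p, X_OFFSET);
  }
}
```

The offset lookup still uses reflection, so it should happen once per field and be cached, as done here with a static final.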
Is your feature request related to a problem? Please describe.
Add a tool to traverse the expression tree and update nodes in place with a specified action.
Additional context
#32
Is your feature request related to a problem? Please describe.
Serialization performs many hash lookups.
We need a very fast map implementation so that map lookups do not become a bottleneck.
Describe the solution you'd like
Use linear probing and Fibonacci hashing.
Is your feature request related to a problem? Please describe.
Serialization involves many memory reads and writes, so a convenient and highly efficient utility is necessary.
Describe the solution you'd like
Use sun.misc.Unsafe for efficient memory operations, and handle off-heap and heap memory in one class to avoid virtual method call cost.
If the heap buffer is null, Unsafe treats the offset as an absolute off-heap address; otherwise it addresses memory relative to the heap array.
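The addressing trick can be sketched as below. UnifiedBuffer is a hypothetical name, not Fury's buffer class; bounds checks are omitted for brevity, and the sketch assumes sun.misc.Unsafe is available.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnifiedBuffer {
  private static final Unsafe UNSAFE = loadUnsafe();
  private static final long BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);

  private final byte[] heap;  // null when the memory is off-heap
  private final long address; // array base offset (heap) or absolute address (off-heap)

  private UnifiedBuffer(byte[] heap, long address) {
    this.heap = heap;
    this.address = address;
  }

  static UnifiedBuffer wrap(byte[] array) {
    return new UnifiedBuffer(array, BYTE_ARRAY_BASE);
  }

  static UnifiedBuffer allocateOffHeap(int size) {
    return new UnifiedBuffer(null, UNSAFE.allocateMemory(size));
  }

  // One code path for both kinds of memory: with a null base object, Unsafe
  // interprets the offset as an absolute address. No bounds checks here.
  void putInt(int index, int value) {
    UNSAFE.putInt(heap, address + index, value);
  }

  int getInt(int index) {
    return UNSAFE.getInt(heap, address + index);
  }

  /** Releases off-heap memory; a no-op for heap-backed buffers. */
  void free() {
    if (heap == null) {
      UNSAFE.freeMemory(address);
    }
  }

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }
}
```

Because the class is final and has a single implementation of putInt/getInt, the JIT can inline the calls instead of dispatching through an interface.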
Describe alternatives you've considered
Making the memory buffer an interface with off-heap and heap implementations is feasible, but it incurs virtual method calls, which is unacceptable for such a performance-critical scenario.
Is your feature request related to a problem? Please describe.
Because strong encapsulation is enabled by default in JDK 17, we can't access the String zero-copy constructor without some hacks, so string deserialization on JDK 17 makes an extra copy when creating the String object.
Additional context
#90
Is your feature request related to a problem? Please describe.
Add Java CI support
Is your feature request related to a problem? Please describe.
Add type ids that are consistent across languages
Describe the solution you'd like
Base them on the Arrow type ids: arrow/type_fwd.h
Is your feature request related to a problem? Please describe.
Implement bigint/decimal serialization for Java
Is your feature request related to a problem? Please describe.
#97 implements enum serialization by writing the enum ordinal, which is fast. But when enum constants are reordered, deserialization returns the wrong value.
Describe the solution you'd like
Support serialization by enum name, in a configurable way.
By default, enums are serialized by ordinal, but this can be configured to serialize by enum name instead.
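A sketch of the configurable behaviour. EnumCodec, its byName flag, and the wire layout (a 4-byte ordinal, or a 2-byte length followed by UTF-8 name bytes) are assumptions for illustration, with java.nio.ByteBuffer standing in for Fury's buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

enum Color { RED, GREEN, BLUE }

final class EnumCodec<E extends Enum<E>> {
  private final Class<E> type;
  private final boolean byName; // false: write the ordinal (default); true: write the name

  EnumCodec(Class<E> type, boolean byName) {
    this.type = type;
    this.byName = byName;
  }

  void write(ByteBuffer buffer, E value) {
    if (byName) {
      // Survives reordering of constants, at the cost of more bytes.
      byte[] name = value.name().getBytes(StandardCharsets.UTF_8);
      buffer.putShort((short) name.length);
      buffer.put(name);
    } else {
      // Compact and fast, but breaks if constants are reordered.
      buffer.putInt(value.ordinal());
    }
  }

  E read(ByteBuffer buffer) {
    if (byName) {
      byte[] name = new byte[buffer.getShort()];
      buffer.get(name);
      return Enum.valueOf(type, new String(name, StandardCharsets.UTF_8));
    }
    return type.getEnumConstants()[buffer.getInt()];
  }
}
```

Both peers must agree on the mode, so in practice the flag would belong to the shared serialization configuration.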
Is your feature request related to a problem? Please describe.
For class forward/backward compatibility, Fury needs to send class metadata to the peer every time, which is time-consuming and consumes more bandwidth.
Describe the solution you'd like
If the sender and receiver exchange messages sequentially within one context (such as a TCP connection), then some metadata (class names, field names, final field type information, etc.) can be shared across the requests in that context. This type information is sent to the other end during the first serialization in that context. The other end can then rebuild the same deserializer from the type information, so that it can still deserialize correctly when the fields on the serializing and deserializing sides are inconsistent. At the same time, unnecessary metadata serialization overhead is avoided in subsequent serializations.
Additional context
#197
Is your feature request related to a problem? Please describe.
Add a class registry for fast resolution of class information on read and write.
Additional context
#70
Is your feature request related to a problem? Please describe.
JDK java.util.WeakHashMap only supports a single weak key, but sometimes we need a weak map whose key is an array of multiple weak items. In such cases, creating a temporary key and putting it into a WeakHashMap is not feasible, because nothing strongly references the temporary key. We need a new weak map that natively supports keys made of multiple weak items.
Is your feature request related to a problem? Please describe.
Serialization performs frequent memory operations, so efficient memory access is necessary for performance. JDK Unsafe is an efficient utility for this case.
Is your feature request related to a problem? Please describe.
Fury prints some logs for diagnostics. They are useful but can sometimes be annoying, so we should support disabling logging.
Describe the solution you'd like
When logging is disabled, switch to org.slf4j.helpers.NOPLogger#NOP_LOGGER.
Describe alternatives you've considered
Configure log4j2.xml/log4j2.properties for the io.fury package.
Additional context
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
String code =
    new Expression.If(
            ExpressionUtils.eq(
                Expression.Literal.ofInt(1),
                new Expression.Reference("classId", PRIMITIVE_SHORT_TYPE, false)),
            new Expression.Return(Expression.Literal.True),
            new Expression.Return(Expression.Literal.False))
        .genCode(new CodegenContext())
        .code();
Describe the bug
The Java license auto-format leaves no blank line before the package declaration, which conflicts with the checkstyle plugin.
To Reproduce
mvn -T10 clean license:format
mvn -T10 clean checkstyle:check
Set up the basic Java code structure:
Is your feature request related to a problem? Please describe.
String is very common in serialization, but because of its variable length and multiple encodings, string serialization is pretty slow and sometimes becomes the bottleneck of the whole serialization. We need a way to serialize strings fast.
The bottleneck mainly consists of:
char[]/byte[] outside for serialization.
char[]/byte[] into ascii/unicode16/utf8
ascii/unicode16 char[]/byte[]
char[]/byte[] for immutability.
Describe the solution you'd like
sun.misc.Unsafe to extract the inner char[]/byte[]
ascii/unicode16/utf8 to minimize encoding cost
java.lang.invoke.MethodHandle to invoke the package-level zero-copy constructor with minimal cost
Is your feature request related to a problem? Please describe.
There should be a way to guide users to get involved
Is your feature request related to a problem? Please describe.
The Java JIT codegen needs to generate Java code strings, which requires some string utilities such as format/stripBlankLines/capitalize/uncapitalize/isBlank.
Describe the solution you'd like
Copy capitalize/uncapitalize/isBlank from commons-lang and implement the others.
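For a sense of scale, a few of these helpers can be sketched in a handful of lines. These are minimal stand-ins written for illustration, not the commons-lang sources, and the blank-line handling is an assumption about what stripBlankLines should do.

```java
final class StringUtils {
  private StringUtils() {}

  /** Upper-cases the first character; null and empty strings are returned as-is. */
  static String capitalize(String s) {
    if (s == null || s.isEmpty()) {
      return s;
    }
    return Character.toUpperCase(s.charAt(0)) + s.substring(1);
  }

  /** Lower-cases the first character; null and empty strings are returned as-is. */
  static String uncapitalize(String s) {
    if (s == null || s.isEmpty()) {
      return s;
    }
    return Character.toLowerCase(s.charAt(0)) + s.substring(1);
  }

  /** True for null, empty, or whitespace-only input. */
  static boolean isBlank(CharSequence cs) {
    if (cs == null) {
      return true;
    }
    for (int i = 0; i < cs.length(); i++) {
      if (!Character.isWhitespace(cs.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  /** Drops blank lines from generated code, keeping a newline after each kept line. */
  static String stripBlankLines(String code) {
    StringBuilder sb = new StringBuilder(code.length());
    for (String line : code.split("\n")) {
      if (!isBlank(line)) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }
}
```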
Describe alternatives you've considered
Adding commons-lang as a dependency would work, but we try to avoid extra dependencies because serialization is so widely used.
Additional context
Is your feature request related to a problem? Please describe.
Fury Java JIT generates bytecode for each generated serializer class, which is then loaded as a class in a new or existing classloader.
Class definition and loading should not create too many new classloaders, should keep new classes eligible for GC, and should not pollute existing classloaders.
Additional context
Is your feature request related to a problem? Please describe.
Support zero-copy to avoid large buffer serialization cost
Describe the solution you'd like
Python pickle5 out-of-band serialization is zero-copy. Fury can implement a similar protocol, but in a cross-language way.
Is your feature request related to a problem? Please describe.
Add a Java serializer interface, so that supporting a new type only requires implementing the serializer for that type.
Describe the solution you'd like
public abstract class Serializer<T> {
  // Default implementations throw; a serializer overrides the methods it supports.
  public void write(MemoryBuffer buffer, T value) {
    throw new UnsupportedOperationException();
  }

  public T read(MemoryBuffer buffer) {
    throw new UnsupportedOperationException();
  }

  public void crossLanguageWrite(MemoryBuffer buffer, T value) {
    throw new UnsupportedOperationException();
  }

  public T crossLanguageRead(MemoryBuffer buffer) {
    throw new UnsupportedOperationException();
  }
}
Additional context
#70
Is your feature request related to a problem? Please describe.
Add support for Java enum serialization.
Is your feature request related to a problem? Please describe.
Fury JIT generates Java code from an expression tree, so we need a way to compile that Java code into bytecode.
Describe the solution you'd like
We can use the Janino compiler to compile Java code into bytecode, since it is faster than the JDK compiler.
Describe alternatives you've considered
javax.tools.JavaCompiler is also feasible, but it is too slow and only generates class files.
Additional context
The Janino compiler doesn't support generics, so the generated code shouldn't contain generics.
Is your feature request related to a problem? Please describe.
A map with long keys built on java.util.HashMap incurs boxing cost, so a new map implementation is needed.
Describe the solution you'd like
Implement a new map with a long[] key array and an Object[] value array, using linear probing and Fibonacci hashing.
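The design can be sketched as follows. LongMap here is a hypothetical minimal version: it has no remove operation, it rejects null values because null marks an empty slot, and the 0.5 load factor is an arbitrary choice.

```java
final class LongMap<V> {
  // 2^64 divided by the golden ratio, the usual Fibonacci hashing multiplier.
  private static final long GOLDEN = 0x9E3779B97F4A7C15L;

  private long[] keys = new long[16];
  private Object[] values = new Object[16]; // null marks an empty slot
  private int shift = 60;                   // 64 - log2(capacity)
  private int size;

  // Fibonacci hashing: multiply, then keep the top log2(capacity) bits.
  private int slot(long key) {
    return (int) ((key * GOLDEN) >>> shift);
  }

  @SuppressWarnings("unchecked")
  V get(long key) {
    int mask = keys.length - 1;
    // Linear probing: walk forward until the key or an empty slot is found.
    for (int i = slot(key); ; i = (i + 1) & mask) {
      Object value = values[i];
      if (value == null) {
        return null;
      }
      if (keys[i] == key) {
        return (V) value;
      }
    }
  }

  void put(long key, V value) {
    if (value == null) {
      throw new IllegalArgumentException("null values are not supported");
    }
    if ((size + 1) * 2 > keys.length) {
      grow(); // keep the table at most half full so probes stay short
    }
    insert(key, value);
  }

  int size() {
    return size;
  }

  private void insert(long key, Object value) {
    int mask = keys.length - 1;
    int i = slot(key);
    while (values[i] != null && keys[i] != key) {
      i = (i + 1) & mask;
    }
    if (values[i] == null) {
      size++;
    }
    keys[i] = key;
    values[i] = value;
  }

  private void grow() {
    long[] oldKeys = keys;
    Object[] oldValues = values;
    keys = new long[oldKeys.length * 2];
    values = new Object[oldValues.length * 2];
    shift--; // one more bit of the hash is used after doubling
    size = 0;
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldValues[i] != null) {
        insert(oldKeys[i], oldValues[i]);
      }
    }
  }
}
```

Keys stay in a primitive long[] the whole time, so neither put nor get allocates a boxed Long.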
Additional context
#42
Is your feature request related to a problem? Please describe.
A Java object graph may contain shared or circular references.
Serialization should support tracking such references to avoid writing duplicate data or failing with a recursion error.
At the same time, reference tracking needs a map to track references, which is pretty slow, although we can use the optimized map in io.fury.collection. So there should be an option to disable reference tracking.
Describe the solution you'd like
ReferenceResolver is an abstract interface. MapReferenceResolver tracks references with a map; NoReferenceResolver just ignores references.
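A sketch of how the two resolvers could differ. The method names and the convention of returning -1 for a new object are assumptions; only the three type names come from the description above.

```java
import java.util.IdentityHashMap;
import java.util.Map;

interface ReferenceResolver {
  /**
   * Returns the id of an object that was already written, or -1 if the object
   * is new and its content still has to be written.
   */
  int writeReference(Object obj);

  /** Clears all state after an object tree has been written. */
  void reset();
}

final class MapReferenceResolver implements ReferenceResolver {
  // Identity semantics: two equal but distinct objects are different references.
  private final Map<Object, Integer> written = new IdentityHashMap<>();

  @Override
  public int writeReference(Object obj) {
    Integer id = written.get(obj);
    if (id != null) {
      return id;
    }
    written.put(obj, written.size());
    return -1;
  }

  @Override
  public void reset() {
    written.clear();
  }
}

final class NoReferenceResolver implements ReferenceResolver {
  @Override
  public int writeReference(Object obj) {
    return -1; // never tracks, so every object is written in full
  }

  @Override
  public void reset() {}
}
```

With NoReferenceResolver a circular graph would recurse without end, so disabling tracking is only safe when the data is known to be a tree.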
Describe alternatives you've considered
Binding a reference resolver to every type, i.e. implementing a hierarchical resolver, may perform better in some cases.
Additional context
#70
Is your feature request related to a problem? Please describe.
Java lacks tuple support, which is common in other languages such as C++/Python/Go and is useful as a common data structure for users and for Fury itself.
Describe the solution you'd like
Add tuple2/tuple3 support for now, other tuple classes can be added later.
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
Add StringBuilder/StringBuffer serializer
Describe the solution you'd like
Convert StringBuilder/StringBuffer to String, then serialize it using StringSerializer.
Describe alternatives you've considered
Converting to/from String has some cost; a better solution is to work on the inner data structure of StringBuilder/StringBuffer directly.
But StringBuilder/StringBuffer serialization is not common, so we can use the conversion first and optimize later if truly needed.
Additional context
#89
Is your feature request related to a problem? Please describe.
When a buffer can be serialized with zero-copy, the buffer callback should be invoked to handle it.
If the buffer callback returns false, the given buffer is out-of-band and thus zero-copied.
Additional context
#85
Is your feature request related to a problem? Please describe.
Binary protocol bugs are hard to debug; when there is a bug in an implementation, crashes sometimes happen. A detailed debugging doc is necessary for troubleshooting.
Is your feature request related to a problem? Please describe.
Java is a strongly typed language; class fields have types and generics. Using this type info, serialization performance and size can be improved notably.
Type inference performance is critical, since the first serialization infers the type info of object fields. If inference is slow, there may be latency spikes when serving requests, which is unacceptable.
Describe the solution you'd like
TypeToken
@Ignore
Additional context
#29
Is your feature request related to a problem? Please describe.
Java ArrayList is slower:
We should implement a faster auto-growing object array.
Describe the solution you'd like
ObjectArray, which holds an Object[] array internally.
System.arraycopy from an array of null elements for clear.
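The two points above can be sketched as follows. This ObjectArray is a hypothetical minimal version; the initial capacity and the size of the shared null array are arbitrary.

```java
import java.util.Arrays;

final class ObjectArray {
  // Shared all-null source array used to wipe slots in bulk.
  private static final Object[] NULLS = new Object[1024];

  private Object[] objects = new Object[16];
  private int size;

  void add(Object element) {
    if (size == objects.length) {
      objects = Arrays.copyOf(objects, size * 2); // grow by doubling
    }
    objects[size++] = element;
  }

  Object get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return objects[index];
  }

  int size() {
    return size;
  }

  /** Nulls out the used slots so old elements can be collected; keeps the capacity. */
  void clear() {
    for (int pos = 0; pos < size; pos += NULLS.length) {
      System.arraycopy(NULLS, 0, objects, pos, Math.min(NULLS.length, size - pos));
    }
    size = 0;
  }
}
```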
Is your feature request related to a problem? Please describe.
MurmurHash3 in Guava is too slow for performance-critical scenarios; we need a new implementation.
Describe the solution you'd like
The implementation in https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java is pretty fast for our use.
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
The README has some syntax errors and is hard to read.
Is your feature request related to a problem? Please describe.
JDK ArrayList<Integer> has boxing overhead, which is unacceptable for performance-critical serialization. An auto-growing IntArray is needed in such cases.
Describe the solution you'd like
Implement an auto-growing IntArray which holds an int[] internally.
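A minimal sketch of such a class, with hypothetical method names and a doubling growth policy chosen for illustration:

```java
import java.util.Arrays;

final class IntArray {
  private int[] elements;
  private int size;

  IntArray(int initialCapacity) {
    elements = new int[Math.max(1, initialCapacity)];
  }

  /** Appends a primitive int without boxing, doubling the backing array when full. */
  void add(int value) {
    if (size == elements.length) {
      elements = Arrays.copyOf(elements, size * 2);
    }
    elements[size++] = value;
  }

  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return elements[index];
  }

  int size() {
    return size;
  }

  /** Primitives hold no references, so resetting the size is enough. */
  void clear() {
    size = 0;
  }

  /** Returns a copy trimmed to the current size. */
  int[] toArray() {
    return Arrays.copyOf(elements, size);
  }
}
```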
Is your feature request related to a problem? Please describe.
Serialization uses reflection frequently, both in codegen and at runtime, so reflection utilities will be convenient for code reuse.