flink-extended / clink

Clink is a library that provides APIs and infrastructure to facilitate the development of parallelizable feature engineering operators that can be used in both C++ and Java runtimes.

C++ 52.99% Java 31.71% Starlark 12.64% MLIR 0.71% Shell 1.96%
machine-learning feature-engineering online-learning flink-streaming

Clink

Clink is a library that provides infrastructure to do the following:

  • Define C++ functions that can be parallelized by the TFRT thread pool.
  • Execute a graph (in the MLIR format) of these C++ functions in parallel.
  • Make C++ functions executable as Java functions using JNA.
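
To make "parallelizable" concrete, here is a minimal sketch of the general idea of splitting an element-wise feature function across worker tasks. It uses only the C++ standard library (`std::async`), not the TFRT thread pool or any Clink API; the name `ParallelMap` and its parameters are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical helper, NOT part of Clink or TFRT: applies `fn` to every
// element of `input`, splitting the work into at most `num_tasks` contiguous
// chunks that run concurrently via std::async.
std::vector<double> ParallelMap(const std::vector<double>& input,
                                const std::function<double(double)>& fn,
                                std::size_t num_tasks) {
  assert(num_tasks > 0);
  std::vector<double> output(input.size());
  // Ceiling division so that every element lands in exactly one chunk.
  const std::size_t chunk = (input.size() + num_tasks - 1) / num_tasks;
  std::vector<std::future<void>> pending;
  for (std::size_t begin = 0; begin < input.size(); begin += chunk) {
    const std::size_t end = std::min(begin + chunk, input.size());
    // Each task writes to a disjoint slice of `output`, so no locking is needed.
    pending.push_back(std::async(
        std::launch::async, [&input, &output, &fn, begin, end] {
          for (std::size_t i = begin; i < end; ++i) output[i] = fn(input[i]);
        }));
  }
  // Wait for all tasks; get() also rethrows any exception raised by `fn`.
  for (auto& task : pending) task.get();
  return output;
}
```

A caller might write `ParallelMap(values, [](double x) { return x * 2.0; }, 4)` to scale a feature column on four tasks. Some toolchains require `-pthread` when linking code that uses `std::async`.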

Furthermore, Clink provides an off-the-shelf library of reusable Feature Processing functions that can be executed as Java and C++ functions.

Clink is useful when users want to do online feature processing in C++ with sub-millisecond latency, apply the same logic to offline feature processing in Java, and implement this logic only once (in C++).

Getting Started

Prerequisites

Clink uses TFRT as the underlying execution engine and therefore follows TFRT's operating system and installation requirements.

Currently supported operating systems are as follows:

  • Ubuntu 16.04
  • CentOS 7.7.1908

Here are the prerequisites to build and install Clink:

  • Bazel 4.0.0
  • Clang 11.1.0
  • libstdc++8 or greater
  • openjdk-8

Clink provides Dockerfiles and pre-built Docker images that satisfy the installation requirements listed above. You can use one of the following commands to build the Docker image for the operating system you intend to use.

$ docker build -t ubuntu:16.04_clink -f docker/Dockerfile_ubuntu_1604 .
$ docker build -t centos:centos7.7.1908_clink -f docker/Dockerfile_centos_77 .

Or you can use one of the following commands to pull the pre-built Docker image from Docker Hub.

$ docker pull docker.io/flinkextended/clink:ubuntu16.04
$ docker pull docker.io/flinkextended/clink:centos7.7.1908

If you plan to set up the Clink environment without the Docker images provided above, please check the TFRT README for more detailed instructions to install, configure, and verify Bazel, Clang, and libstdc++8.

Initializing Submodules Before Building Clink from Source

After setting up the environment according to the instructions above and cloning the Clink repository, use the following command to initialize submodules such as TFRT before building any Clink target from source.

$ git submodule update --init --recursive

Executing Examples

Users can execute the Clink C++ function example in parallel using the following command.

$ bazel run //:executor -- `pwd`/mlir_test/executor/basic.mlir --work_queue_type=mstd --host_allocator_type=malloc

Developer Guidelines

Running All Tests

Developers can run the following command to build all targets and to run all tests.

$ bazel test $(bazel query //...) -c dbg

Code Formatting

Changes to Clink C++ code should conform to the Google C++ Style Guide.

Clink uses ClangFormat to check C++ code, diffplug/spotless to check Java code, and Buildifier to check Bazel code.

Please run the following command to format the code before uploading PRs for review.

$ ./tools/format-code.sh

View & Edit Java Code with IDE

Clink provides a Maven configuration that allows users to view or edit Java code with IDEs like IntelliJ IDEA. Before an IDE can compile the Java project correctly, users need to run the following commands after setting up the Clink repository and building Clink.

$ bazel build //:clink_java_proto
$ cp bazel-bin/libclink_proto-speed.jar java-lib/lib/

Users can then open the java-lib directory with their IDEs.

clink's People

Contributors

lindong28, yunfengzhou-hub

clink's Issues

Add automatic check for memory leak in unit tests

Clink needs to check for memory leaks in its operators' unit tests, since C++ does not provide garbage collection.

One possible method is to reuse TFRT's infrastructure. For example, LeakCheckAllocator calls exit(1) if it finds memory chunks that have not been freed, and RCReference reports an error if it is freed while there are still references to it. Clink might adopt this infrastructure as well.
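
The idea behind a leak-checking allocator can be sketched as a wrapper that counts outstanding allocations and lets a test fail when the count is non-zero at teardown. The class below is a hypothetical illustration named `CountingAllocator`; it does not reproduce TFRT's LeakCheckAllocator interface, and it reports leaks through a return value rather than calling exit(1).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator wrapper illustrating leak detection by counting.
// This is NOT TFRT's LeakCheckAllocator API.
class CountingAllocator {
 public:
  // Allocates `size` bytes and records the block as live on success.
  void* Allocate(std::size_t size) {
    void* ptr = std::malloc(size);
    if (ptr != nullptr) ++live_blocks_;
    return ptr;
  }

  // Frees a block previously returned by Allocate.
  void Deallocate(void* ptr) {
    if (ptr == nullptr) return;
    assert(live_blocks_ > 0 && "Deallocate called more often than Allocate");
    --live_blocks_;
    std::free(ptr);
  }

  // Number of blocks handed out and not yet returned.
  std::size_t live_blocks() const { return live_blocks_.load(); }

  // A unit test would check this at teardown and fail if it returns true.
  bool HasLeaks() const { return live_blocks_.load() != 0; }

 private:
  std::atomic<std::size_t> live_blocks_{0};
};
```

In a unit test, the operator under test would receive this allocator, and the test fixture would assert `!allocator.HasLeaks()` after the operator is destroyed.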

Pre-build TFRT in docker images

Currently, each PR test run spends ~30 minutes building TFRT before running any tests. We could significantly speed up PR tests and save CPU resources by pre-building TFRT in the Docker images.

Avoid passing model data through memory

Current Clink operators, like OneHotEncoderModel, provide methods like setModelData that allow passing model data entirely through memory. This could cause resource problems when the model data is large.

To avoid this problem, Clink should remove such methods and migrate all their usages to alternatives. Clink C++ operators would then load model data only through the filesystem.
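
A filesystem-only loading path might look like the sketch below, which reads model data from a stream line by line instead of accepting it as one in-memory payload. Everything here is an assumption made for illustration: the class name `FileBackedOneHotModel`, its methods, and the text format (one `<feature_index> <num_categories>` pair per line) are hypothetical and are not Clink's actual OneHotEncoderModel interface or model format.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator model. The text format used here
// ("<feature_index> <num_categories>" per line) is an assumption for this
// sketch and is not Clink's real model format.
class FileBackedOneHotModel {
 public:
  // Parses model data one line at a time, so the serialized payload is never
  // handed over as a single in-memory blob. Returns false on malformed input
  // and leaves any previously loaded data unchanged.
  bool Load(std::istream& in) {
    std::vector<int> sizes;
    std::string line;
    while (std::getline(in, line)) {
      if (line.empty()) continue;
      std::istringstream fields(line);
      int feature_index = 0;
      int num_categories = 0;
      if (!(fields >> feature_index >> num_categories)) return false;
      if (feature_index < 0 || num_categories <= 0) return false;
      const std::size_t index = static_cast<std::size_t>(feature_index);
      if (index >= sizes.size()) sizes.resize(index + 1, 0);
      sizes[index] = num_categories;
    }
    category_sizes_ = sizes;
    return true;
  }

  // Opens `path` and loads model data from it. Returns false if the file
  // cannot be opened or its content is malformed.
  bool LoadFromFile(const std::string& path) {
    std::ifstream file(path);
    return file.is_open() && Load(file);
  }

  // Returns the number of categories of a feature, or 0 if it is unknown.
  int NumCategories(int feature_index) const {
    assert(feature_index >= 0);
    const std::size_t index = static_cast<std::size_t>(feature_index);
    return index < category_sizes_.size() ? category_sizes_[index] : 0;
  }

 private:
  std::vector<int> category_sizes_;
};
```

With an interface of this shape there is no setter that takes the whole model as an argument; callers pass only a path, for example `model.LoadFromFile(path)`.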

Change ClangFormat style to Google

Clink currently runs clang-format with -style=llvm to check C++ code style. This style should be changed to google to better match the coding style of TFRT.

Add document for API usage and examples

Clink needs more documentation and examples covering the usage of its APIs to improve readability and usability. It is equally important to set up conventions for the whole documentation system covering Clink's infrastructure and operators.

Possible steps for this issue:

  • Add a docs/ folder containing markdown files of user guides
  • Add various Clink programs as examples
  • Add more detailed description directly to APIs, like Flink has done to org.apache.flink.table.api.Table.
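
As an illustration of the last point, an API-level description could follow a doc-comment style such as the one below. The function `OneHotEncode` is a hypothetical example written for this sketch, not an existing Clink API, and the comment layout is only one possible convention.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

/// Encodes a categorical value as a one-hot vector.
///
/// Hypothetical example function; it is not part of the Clink API.
///
/// @param category        Zero-based category index. Must satisfy
///                        0 <= category < num_categories.
/// @param num_categories  Total number of categories. Must be positive.
/// @return A dense vector of length `num_categories` whose element at
///         position `category` is 1.0 and whose other elements are 0.0.
/// @throws std::invalid_argument if either argument is out of range.
///
/// Example: OneHotEncode(1, 3) yields {0.0, 1.0, 0.0}.
std::vector<double> OneHotEncode(int category, int num_categories) {
  if (num_categories <= 0 || category < 0 || category >= num_categories) {
    throw std::invalid_argument("category must be in [0, num_categories)");
  }
  std::vector<double> encoded(static_cast<std::size_t>(num_categories), 0.0);
  encoded[static_cast<std::size_t>(category)] = 1.0;
  // Post-condition: the output length always matches the category count.
  assert(encoded.size() == static_cast<std::size_t>(num_categories));
  return encoded;
}
```

Documenting preconditions, return values, and error behavior next to the declaration keeps the description close to the code it describes.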
