Chinese/English documentation about seatunnel (13 comments, closed)

apache commented on August 15, 2024
Chinese/English documentation

from seatunnel.

Comments (13)

garyelephant commented on August 15, 2024

A quick example:

No code, compilation, or packaging is needed, which makes this simpler than the official Spark Quick Example.

Configure Waterdrop:

spark {
  # Waterdrop defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5

  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000
}

input {
  socket {}
}
filter {
}

output {
  stdout {}
}

Start a netcat server to send data:

nc -l -p 9999

On Windows:
nc64 -l -p 9999

Start the Waterdrop receiver:
sbt "-Dconfig.path=C:\Users\Administrator\Desktop\softwares\waterdrop\config\ConfigExample.conf" "run-main org.interestinglab.waterdrop.WaterdropMain"

Type the following into the nc session:

Hello World

The Waterdrop log prints:

+-----------+
|raw_message|
+-----------+
|Hello World|
+-----------+

Reference:

https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
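
Where netcat is not available, a short script can play the same role. The sketch below is a Python stand-in for `nc -l -p 9999`, not part of Waterdrop: it listens on a port, waits for one client, and sends it lines of text. The name `serve_lines` and its parameters are made up for illustration, and the default port assumes the socket input connects to port 9999 on the local machine, as the nc command above implies.

```python
import socket


def serve_lines(lines, host="127.0.0.1", port=9999, ready=None):
    """Listen on host:port, accept one client, and send each line to it.

    `ready` is an optional threading.Event that is set once the port is open.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        if ready is not None:
            ready.set()
        # Block until a client (here, presumably the socket input) connects.
        conn, _addr = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
```

Calling `serve_lines(["Hello World"])` before starting Waterdrop should deliver the same line that was typed into nc above; the function returns after the first client has been served.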

garyelephant commented on August 15, 2024

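Workflow for writing plugin documentation:
(1) Suppose your plugin is called Drop and is a filter plugin. Create Drop.docs under docs/zh-cn/configuration/filter-plugins.
(2) Write the plugin documentation following the docs syntax rules.
(3) Run PluginDocCommand to generate the markdown file for the plugin documentation.
(4) Add the corresponding link to docs/zh-cn/configuration/_sidebar.md.
(5) To check the generated documentation locally, install docsify first, then run cd docs and ./start-doc.sh, and open localhost:3000.
(6) Commit all changes in git. Once they are merged into the master branch, the documentation is visible online.
Where plugin documentation is stored:
docs/zh-cn/configuration/input-plugins
docs/zh-cn/configuration/filter-plugins
docs/zh-cn/configuration/output-plugins
Example of generating the markdown:
sbt "run-main org.interestinglab.waterdrop.docutils.PluginDocCommand /Users/yixia/IdeaProjects/waterdrop/docs/zh-cn/configuration/filter-plugins/Drop.docs true"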
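
The path and command in the workflow above follow a fixed pattern, so they can be derived from the plugin type and name. The following is a minimal sketch, not a project tool: the helper names `docs_path` and `doc_command` are hypothetical, and the trailing `true` argument is copied verbatim from the example command because its meaning is not stated here.

```python
from pathlib import PurePosixPath

# Directory layout taken from the list of documentation locations above.
DOCS_ROOT = PurePosixPath("docs/zh-cn/configuration")
PLUGIN_DIRS = {
    "input": "input-plugins",
    "filter": "filter-plugins",
    "output": "output-plugins",
}


def docs_path(plugin_type, plugin_name):
    """Return the repository-relative path of a plugin's .docs file."""
    return DOCS_ROOT / PLUGIN_DIRS[plugin_type] / f"{plugin_name}.docs"


def doc_command(repo_root, plugin_type, plugin_name):
    """Build the sbt command line that runs PluginDocCommand on one plugin."""
    path = PurePosixPath(repo_root) / docs_path(plugin_type, plugin_name)
    main_class = "org.interestinglab.waterdrop.docutils.PluginDocCommand"
    # The final "true" mirrors the example command; it is not interpreted here.
    return f'sbt "run-main {main_class} {path} true"'
```

For instance, `doc_command("/Users/yixia/IdeaProjects/waterdrop", "filter", "Drop")` rebuilds the example command shown above.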

garyelephant commented on August 15, 2024

Generate markdown documentation from the plugin javadoc.

Steps: define the plugin doc rules, retrieve the javadoc, parse the javadoc, and generate markdown.

https://tomassetti.me/extracting-javadoc-documentation-source-files-using-javaparser/

javaparser/javaparser#325

https://dzone.com/articles/extracting-javadoc-documentation-from-source-files

https://github.com/antlr/grammars-v4/tree/master/javadoc
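
The retrieve, parse, and generate steps can be sketched roughly as follows. This is only an illustration in Python using regular expressions, not the javaparser approach from the links above and not the project's implementation; it handles just a javadoc block placed directly before a class declaration, and the names `clean_javadoc` and `javadoc_to_markdown` are hypothetical.

```python
import re

# A /** ... */ block immediately followed by a class declaration (simplified).
# The tempered pattern keeps the match from running across a closing "*/".
JAVADOC_CLASS = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*(?:public\s+)?(?:final\s+)?class\s+(?P<name>\w+)",
    re.DOTALL,
)


def clean_javadoc(raw):
    """Strip the leading '*' decoration from each javadoc line."""
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("*"):
            line = line[1:].strip()
        lines.append(line)
    return "\n".join(lines).strip()


def javadoc_to_markdown(java_source):
    """Render every documented class in the source as a markdown section."""
    sections = []
    for match in JAVADOC_CLASS.finditer(java_source):
        body = clean_javadoc(match.group("doc"))
        sections.append(f"## {match.group('name')}\n\n{body}\n")
    return "\n".join(sections)
```

A real generator would more likely walk a proper syntax tree, as the javaparser articles describe, since regular expressions miss nested and unusual comment layouts.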

garyelephant commented on August 15, 2024

Documentation content:

Plugin development guide

garyelephant commented on August 15, 2024

Add an introduction to the internals to the documentation.

garyelephant commented on August 15, 2024
  • [Docs] Inputs natively supported by Spark: S3, Kinesis

garyelephant commented on August 15, 2024

Core data structure: Event

Functions:

Features:

Related concepts: field, value, field references

Special fields: raw_message, "root"

Implementation: SparkSQL Row

garyelephant commented on August 15, 2024

Make it clear that, besides the filter plugins listed in the documentation, every Spark UDF can also be used as a filter in SQL, so a great deal is possible!
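
The general pattern of registering a user-defined function and then calling it inside a SQL statement to filter rows can be shown without a Spark cluster. The sketch below deliberately swaps in Python's built-in sqlite3 module for Spark SQL, so it illustrates the idea only and says nothing about Waterdrop's or Spark's actual API; the function `is_adult`, the table `events`, and its columns are hypothetical.

```python
import sqlite3


def is_adult(age):
    """User-defined function: 1 when age is 18 or above, otherwise 0."""
    return 1 if age is not None and age >= 18 else 0


def filter_with_udf(rows):
    """Load (name, age) rows and keep those accepted by the UDF, using SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Register the Python function so SQL statements can call it by name.
        conn.create_function("is_adult", 1, is_adult)
        conn.execute("CREATE TABLE events (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name, age FROM events WHERE is_adult(age) = 1 ORDER BY name"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

In Spark the equivalent steps would be registering the UDF with the session and referencing it in the SQL text; the filtering logic itself stays in ordinary code either way.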

garyelephant commented on August 15, 2024

Compare Waterdrop with Spark, Logstash, and similar tools.

garyelephant commented on August 15, 2024

A chapter on performance, covering mainly: (1) Spark's performance, (2) the Spark optimizations we take advantage of, (3) Waterdrop's performance.

garyelephant commented on August 15, 2024

A configuration example: fake -> split -> stdout, mysql

# fake -> split -> stdout, mysql

spark {
  # Waterdrop defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5

  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  spark.master = "local[2]"
  spark.app.name = "Waterdrop-1"
  spark.ui.port = 13000

//  spark.executor.instances = 60
//  spark.executor.cores = 2
//  spark.executor.memory = "4g"
//  spark.streaming.blockInterval= "1000ms"
//  spark.streaming.kafka.maxRatePerPartition = 30000
//  spark.streaming.kafka.maxRetries = 2
//  spark.driver.extraJavaOptions = "-Dconfig.file=/data/slot6/waterdrop/application.conf"
}

input {
  fake {
    rate = 1
  }
}
filter {
  split {
    fields = ["name", "age"]
    delimiter = ","
//    target_field = "wrapped"
  }
}

output {
  stdout {}

  mysql {
    url = "jdbc:mysql://localhost:3306/data"
    user = "root"
    password = "123456"
    table = "sample_data_table"
  }
//  textfile {
//    save_mode = "ignore"
//    serializer = "orc"
//    path = "file:///Users/yixia/work/waterdrop-data3"
//  }
}
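
To make the split step in this pipeline concrete, here is a rough Python model of what such a filter might do to a single event. It is a sketch under assumptions, not Waterdrop's code: it assumes the filter reads the raw_message field, that the fake input emits comma-separated text, and that target_field (commented out in the config) nests the extracted fields under one key. The name `split_filter` and its parameters are hypothetical.

```python
def split_filter(event, fields, delimiter=",", source_field="raw_message",
                 target_field=None):
    """Split event[source_field] on delimiter and assign the parts to fields.

    Without target_field the new fields are added at the top level of the
    event; with it they are nested under that key (assumed behavior).
    """
    parts = event[source_field].split(delimiter)
    # zip stops at the shorter sequence, so surplus parts or names are dropped.
    extracted = dict(zip(fields, parts))
    result = dict(event)  # leave the caller's event untouched
    if target_field is None:
        result.update(extracted)
    else:
        result[target_field] = extracted
    return result
```

With a made-up event `{"raw_message": "Tom,30"}` and `fields = ["name", "age"]`, the result gains `name` and `age` as string values alongside the original raw_message.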

garyelephant commented on August 15, 2024

For the socket example, the Waterdrop version should be contrasted sharply with the socket example on the official Spark site.

garyelephant commented on August 15, 2024

Site for testing the grok plugin: https://grokdebug.herokuapp.com/
