webankfintech / exchangis

Exchangis is a lightweight, highly extensible data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources.

Home Page: https://github.com/WeBankFinTech/Exchangis.git

License: Apache License 2.0

Languages: Java 78.26%, Shell 0.58%, Python 0.96%, JavaScript 1.64%, CSS 0.38%, HTML 2.24%, Dockerfile 0.01%, Scala 3.57%, Less 0.04%, Vue 12.32%
Topics: exchangis, datax, transmission-engine, sqoop, wedatasphere, dataspherestudio, linkis, etl, flink

exchangis's People

Contributors: 15100399015, 393562632, alexkun, alexzywu, binbincheng, bleachzk, boliza, chaogefeng, davidhua1996, det101, dlimeng, finaltarget, hch666888, jefftlin, kiduncle, liuyou2, luban08, mingfengwang, nutsjian, one-beauty, peacewong, photon8231, ryanqin01, schumiyi, starchouzz, tomshy1, wushengyeyouya, xj2jx, yuxin-no1, zwx-master

exchangis's Issues

Dev 1.0.0 project list

Project list front-end development tasks.

  1. Search bar
    Supports fuzzy search by project name. For the related interfaces, please check the interface documentation.

  2. Project list
    Displays all projects as cards.
    The main contents of a card are: project icon, project name, management button, description, and tags.
    The newly created project defaults to the first card; cards are paginated according to the actual resolution of the browser.

  3. Project management
    When the management button is clicked, it switches to an edit button and a delete button.

  4. Project creation and editing

When creating a new project

Project name: required; only Chinese characters, letters, digits, and underscores are allowed; special characters are not supported (a validation sketch follows this list).
Business tag: optional; only Chinese characters, letters, and digits are allowed. For the specific interaction, refer to the business-tag interaction when creating a project in DSS. Note: if a project is created on the Exchangis side because it was created on the DSS side, it is labeled "DSS Project" by default.
Edit permission / view permission / execute permission: the user list is returned by the backend, and multiple selections are allowed. For details, refer to the permission-setting interaction when creating a project in DSS.
Project description: no more than 300 characters.

When editing

For a DSS project, no attribute may be modified.
For a non-DSS project, the project name cannot be modified; everything else can be.

  5. Deletion of a project
    When deleting a project, the backend must be queried first. If the project has been published to the scheduling system and has batch scheduling tasks, it must not be deleted; if it is a DSS project, it must not be deleted either.
    Otherwise, a dialog box pops up asking the user to confirm the deletion. After the user confirms, the backend interface is called to delete the project.
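
A minimal sketch of the project-name rule above; the class name and regex are illustrative assumptions, not the actual Exchangis implementation:

import java.util.regex.Pattern;

public class ProjectNameValidator {
    // \u4e00-\u9fa5 covers the common CJK unified ideographs range,
    // so the name may contain Chinese characters, letters, digits, underscores.
    private static final Pattern NAME_PATTERN =
            Pattern.compile("^[\\u4e00-\\u9fa5A-Za-z0-9_]+$");

    public static boolean isValid(String name) {
        return name != null && !name.isEmpty() && NAME_PATTERN.matcher(name).matches();
    }
}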

The executor running DataX fails to fetch Hive metadata

Log:

2020-05-26 12:03:18.235 [job-1476524239652786176] INFO  StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All task WaitWriterTime 0.000s |  All task WaitReaderTime 0.000s | Percentage 0.00%
2020-05-26 12:03:18.345 [job-1476524239652786176] ERROR Engine - 

The most likely cause of this task's failure is:

com.alibaba.datax.common.exception.DataXException: Code:[Framework-02], Description:[An error occurred while the DataX engine was running; for the specific cause, see the error diagnostics printed at the end of the DataX run.] -
java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hive/metastore/HiveMetaHookLoader;Ljava/util/concurrent/ConcurrentHashMap;Ljava/lang/String;Z)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:4299)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4367)
	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4347)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4603)
	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:291)
	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:274)
	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:435)
	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:375)
	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:355)
	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:331)
	at com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriterUtil.lambda$getHiveConnByUris$5(HdfsWriterUtil.java:935)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriterUtil.getHiveConnByUris(HdfsWriterUtil.java:934)
	at com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriterUtil.updateConfigByHiveMeta(HdfsWriterUtil.java:218)
	at com.alibaba.datax.plugin.writer.hdfswriter.HdfsWriter$Job.prepare(HdfsWriter.java:187)
	at com.alibaba.datax.core.job.JobContainer.lambda$prepareJobWriter$3(JobContainer.java:846)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at com.alibaba.datax.core.job.JobContainer.prepareJobWriter(JobContainer.java:841)
	at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:367)
	at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:132)
	at com.alibaba.datax.core.Engine.start(Engine.java:98)
	at com.alibaba.datax.core.Engine.entry(Engine.java:177)
	at com.alibaba.datax.core.Engine.main(Engine.java:242)

Help!
Is the hive-metastore jar missing?
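
For reference, a NoSuchMethodError usually means the class was found but in a conflicting version, rather than missing entirely, so this often points to a Hive jar version conflict on the DataX classpath. A quick diagnostic sketch (assuming you can run a snippet with the same classpath as the DataX job) is to print which jar actually supplied the class:

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar that supplied RetryingMetaStoreClient at runtime;
        // compare its Hive version with the one your cluster's Hive expects.
        Class<?> c = org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.class;
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}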

[Feature] Dev 1.0.0 project management

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Problem Description
Project management back-end tasks.

Description
1. Project management
Project addition, deletion, modification, and query.

Use case
No response

Solutions
No response

Anything else
No response

  • Are you willing to submit a PR?

Yes I am willing to submit a PR!

Retry mechanism

By reading the code, I found that there is currently no failure-retry mechanism for the scheduling of DataX tasks.
The process as I understand it:

  1. Create a task and bind it to an execution node.
  2. The service runs the task: based on the bound execution node, it calls the run method and, using Feign's load-balancing strategy, sends the request to the bound executor.
  3. The executor launches the task itself, manages the tasks it is executing, and reports execution status to the service through the heartbeat interface and the status-reporting interface.

Therefore:
In one case, if a task fails, only the failure status is reported; there is no retry on the current node, and the service does not retry on another execution node.
In the other case, if the current execution node goes down, the task status is not updated (it is stuck in its last state) until the node recovers.

Since the documentation is sparse, the above is what I pieced together from the code. I am not sure whether what I described is accurate; I hope a maintainer can reply and confirm at your convenience.
Thanks!

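For reference, a bounded retry on the service side might look like the following sketch; all names here are hypothetical, and per the report above Exchangis does not currently implement this:

import java.util.concurrent.Callable;

public class BoundedRetry {
    // Runs the task up to maxAttempts times, returning the first successful result.
    // A real scheduler would also reselect the executor node between attempts,
    // which would cover the node-down case described above.
    public static <T> T run(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // record and retry
            }
        }
        throw last;
    }
}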

Exchangis configured mysql -> hive: is writing data into Hive supported?


  • 1: I have configured the MySQL data source and the Hive data source on Exchangis.
  • 2: When the job is executed, the results are written to HDFS as ".gz"-compressed files.

Question:
Is writing data into Hive supported, either directly or as an option?

For example: $hive> LOAD DATA LOCAL INPATH '/home/hadoop/tmp/user/hive/warehouse/hivedata.db/pub_info/exchangis_hive_w__03458f02_e1a8_499f_b4c3_301ba473b1d3.gz' OVERWRITE INTO TABLE pub_info;


When will Exchangis be integrated into DSS?

I would like to know about the community's plan to integrate Exchangis into DSS.

Sqoop task execution fails.

Executing a task with the Sqoop engine reports an error:
java.lang.IllegalArgumentException: Cannot find the SQOOP Tool option: ["{}"]

Is a Dockerfile supported?

I mean not putting all modules into one Dockerfile, but running different modules in separate Docker containers.

How to add a task execution node

If you want to add a task execution node, which service should be deployed?

Dev 1.0.0 Data source management

Data source management front-end development tasks.

  1. Data source list
    Displays all data sources in batches, with paging support.
    A search function is provided at the top, supporting search by data source name, data source type, and creator.
    If the logged-in user is the creator of the data source and is not an administrator, the operation bar of the data source table provides: test connection, edit, and delete.
    If the logged-in user is the creator of the data source or has edit rights, and is an administrator, the operation bar provides: test connection, edit, and expire.
    If the logged-in user is not the creator of the data source, the operation bar provides only: test connection.
    (These rules are restated in the sketch after this list.)

  2. Multiple versions of data sources
    A data source has multiple versions, allowing version rollback.
    A historical version can be rolled back to become the latest version, i.e. the historical version is copied into the latest version of the data source.
    The latest version is opened by default; historical versions can also be opened read-only.

  3. New data source
    Creating a data source takes three steps: select the data source type -> fill in the connection information and basic information -> fill in the permission information.
    3.1 Select the data source type
    Request the backend for the classification of all data sources, as well as the name, icon, and description of each data source under each category, and display all data sources that can be added.
    3.2 Fill in the connection information and basic information
    The connection information differs per data source type; for details, refer to the Linkis data source architecture design document. The form content is dynamically loaded based on what the backend returns.
    3.3 Fill in the permission information
    Only an administrator who wants to share the data source with particular users enters this step; a non-administrator cannot share data sources, so creation completes at the second step. Permission information is divided into data source read/write authorization and collaborative development authorization.
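
As noted in item 1, the operation-bar rules can be restated as a small sketch; the enum and method names are illustrative assumptions, not actual Exchangis code:

import java.util.EnumSet;
import java.util.Set;

public class DataSourceActions {
    enum Action { TEST_CONNECTION, EDIT, DELETE, EXPIRE }

    // Restates the operation-bar rules from the data source list above.
    static Set<Action> actionsFor(boolean isCreator, boolean canEdit, boolean isAdmin) {
        if (isCreator && !isAdmin) {
            return EnumSet.of(Action.TEST_CONNECTION, Action.EDIT, Action.DELETE);
        }
        if ((isCreator || canEdit) && isAdmin) {
            return EnumSet.of(Action.TEST_CONNECTION, Action.EDIT, Action.EXPIRE);
        }
        return EnumSet.of(Action.TEST_CONNECTION);
    }
}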

A new DataX task cannot list the "database" names in an Oracle data source

The schemas/users in the database cannot be listed; only the service name or instance name from the template creation stage is shown. Moreover, "database name" is not the right term here; to be precise, it should be user or schema.
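
For what it's worth, listing Oracle "databases" indeed amounts to listing schemas/users; a minimal JDBC sketch (illustrative connection details, not Exchangis code) would be:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListOracleSchemas {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service_name", "user", "password");
             ResultSet rs = conn.getMetaData().getSchemas()) {
            while (rs.next()) {
                // In Oracle, a schema corresponds to a user.
                System.out.println(rs.getString("TABLE_SCHEM"));
            }
        }
    }
}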

On macOS the executor module fails to start, reporting a sigar problem.

Hello:
When the development environment is macOS, the executor module throws the following exception at startup:
Caused by: java.lang.UnsatisfiedLinkError: org.hyperic.sigar.Mem.gather(Lorg/hyperic/sigar/Sigar;)V

How did you solve this problem during development? Looking forward to your reply, thank you.
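
A common workaround for this class of error (an assumption based on how sigar loads its native code, not project documentation): sigar resolves its platform-specific native library through java.library.path, so placing the macOS library (e.g. libsigar-universal64-macosx.dylib) in a local directory and starting the executor JVM with a flag such as

java -Djava.library.path=/path/to/sigar/native ...

usually resolves the UnsatisfiedLinkError.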

How to get the current time in the WHERE condition when syncing from MySQL

I want to fetch the data from the 5 minutes before the insert time, for example:
select * from user where insert_time >= ${currtime} -5 and insert_time <= ${currtime}
I see there is a WHERE condition option, but I don't know how to use the current system time.
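
If the WHERE fragment is ultimately spliced into the query that MySQL executes (an assumption about how the reader works, not confirmed by the docs), the window can be computed server-side so no template variable is needed. A minimal sketch:

public class TimeWindow {
    public static void main(String[] args) {
        // Let MySQL compute the 5-minute window itself via NOW()/DATE_SUB,
        // instead of relying on a ${currtime} template variable.
        String where = "insert_time >= DATE_SUB(NOW(), INTERVAL 5 MINUTE)"
                + " AND insert_time <= NOW()";
        System.out.println("select * from user where " + where);
    }
}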

Exchangis configured a mysql -> es task; after clicking Execute, the tasks stay queued.


  1. Version:
    exchangis-0.5.0.RELEASE
  2. Phenomenon
    After configuring the mysql -> es task and clicking Execute, all tasks stay in the queued state and never enter the running state.
    Looking at the service, executor, and gateway logs, no obvious error log is found.


Gateway keeps printing: [screenshot]

Service keeps printing: [screenshot]

Executor keeps printing: [screenshot]

> @jkl0898 Excuse me, how did you bind the IP? I ran into this problem too.

Hello. You need to check the network-related IP configuration in your environment; you can look into the source logic, specifically the IP-binding part.
You can also trace the error via remote debugging.

Originally posted by @jkl0898 in #12 (comment)

Task execution failed, and the error log cannot be seen.

When exchanging data from MySQL to MySQL, the logs in the job's execution list show only "No Logs are available".

No available candidate servers

exchangis-service keeps reporting the following error after startup:
com.webank.wedatasphere.exchangis.route.exception.NoAvailableServerException: No available candidate servers
	at com.webank.wedatasphere.exchangis.route.MachineLoadRule.choose(MachineLoadRule.java:108)
	at com.webank.wedatasphere.exchangis.route.MachineLoadRule.choose(MachineLoadRule.java:65)
	at com.netflix.loadbalancer.BaseLoadBalancer.chooseServer(BaseLoadBalancer.java:736)
	at com.netflix.loadbalancer.ZoneAwareLoadBalancer.chooseServer(ZoneAwareLoadBalancer.java:113)
	at com.netflix.loadbalancer.LoadBalancerContext.getServerFromLoadBalancer(LoadBalancerContext.java:481)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:184)
	at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:180)
	at rx.Observable.unsafeSubscribe(Observable.java:10327)
	at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:94)
	at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:42)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.Observable.subscribe(Observable.java:10423)
	at rx.Observable.subscribe(Observable.java:10390)

Exchangis ES version problem

Our ES version is 6.7, and some RestHighLevelClient methods from before ES 6.7 are not compatible.

Starting from ES 6.3, the RestHighLevelClient methods used in DataX are relatively complete; staying compatible with every minor version would require large adjustments.

If ES is not version 6.7, the compatibility restrictions are severe, and errors like the following are reported:

Caused by: ElasticsearchStatusException[method [HEAD], host [http://192.168.200.18:9212], URI [/linkis_db.node1?include_type_name=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 400 Bad Request]]; nested: ResponseException[method [HEAD], host [http://192.168.200.18:9212], URI [/linkis_db.node1?include_type_name=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 400 Bad Request]]; nested: ResponseException[method [HEAD], host [http://192.168.200.18:9212], URI [/linkis_db.node1?include_type_name=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 400 Bad Request]];
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2027)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1749)
	at org.elasticsearch.client.IndicesClient.exists(IndicesClient.java:1087)
	at com.webank.wedatasphere.exchangis.datax.plugin.writer.elasticsearchwriter.v6.ElasticRestClient.lambda$existIndices$2(ElasticRestClient.java:145)
	at com.webank.wedatasphere.exchangis.datax.plugin.writer.elasticsearchwriter.v6.ElasticRestClient.execute(ElasticRestClient.java:242)
	... 11 more
Caused by: org.elasticsearch.client.ResponseException: method [HEAD], host [http://192.168.200.18:9212], URI [/linkis_db.node1?include_type_name=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 400 Bad Request]
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764)
	... 15 more
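
For reference, the 400 here is consistent with the include_type_name request parameter, which servers older than ES 6.7 do not recognize. One way to sidestep the high-level client's version-sensitive parameters for this particular call is to issue the index-exists check through the low-level RestClient; a sketch (class and method names are mine, not the Exchangis plugin's):

import java.io.IOException;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class EsIndexCheck {
    // HEAD /<index> behaves the same across 6.x servers and sends no extra parameters.
    public static boolean indexExists(RestClient lowLevelClient, String index) throws IOException {
        try {
            Response response = lowLevelClient.performRequest(new Request("HEAD", "/" + index));
            return response.getStatusLine().getStatusCode() == 200;
        } catch (ResponseException e) {
            if (e.getResponse().getStatusLine().getStatusCode() == 404) {
                return false; // the index simply does not exist
            }
            throw e;
        }
    }
}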

DataX dependency question

In the currently committed code, the dependency on DataX is handled by copying the DataX project's source code into the engine-datax module, rather than depending on it directly through the POM. Is this because Exchangis enhances DataX, or for some other reason? Why not fork the DataX project and then add a POM dependency on the fork? Looking forward to your reply.
