Comments (20)
In addition, debugging locally in IDEA shows nothing wrong; the problem only happens on the server.
from seatunnel.
@ruanwenjun this may be related to your issue #5903, i.e. whether checkpoint uses HDFS or the cache. I saw that you submitted #6039 and handled the cache problem there, but checkpoint did not, in this class: org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsConfiguration
And in this code: hadoopConf's getSchema method sometimes returns s3n instead of s3a, resulting in the s3 connector actually being served from the cache.
The s3n appears because the S3Conf obtained in AggregatedCommit does not go through the buildWithConfig method but uses DEFAULT_SCHEMA instead. Debugging suggests it is related to the DAG:

buildWithConfig method screenshot:

DAG initialization of S3Conf screenshot:

Screenshot of the hadoopConf and schema obtained by AggregatedCommit:

So far I have found two workarounds:

1. Change DEFAULT_SCHEMA to s3a.
2. Set it in the configuration file.

_But I am not familiar with the DAG and AggregatedCommit code, so I need a maintainer to help me look at the root cause!_
@EricJoy2048 I see that you have been working on the multi-table feature #6698 of the S3File connector recently. Have you ever seen the schema in the hadoop conf obtained by the multi-table sink be the default s3n instead of the s3a specified in the configuration file?
I'll look at that as soon as I can
- SeaTunnel Engine (Zeta) uses the HDFS API to write checkpoints to the FileSystem. The config is at $SEATUNNEL_HOME/conf/seatunnel.yaml, and the code is in the seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common package. I do not know the content of this config file in your environment, but I think we need to disable the cache in checkpoint storage too.
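For reference, Hadoop's shared FileSystem cache can be switched off per scheme with the standard client property pattern `fs.<scheme>.impl.disable.cache`. A sketch of how that could look in the checkpoint-storage section of seatunnel.yaml follows; the exact key layout under `storage` is an assumption for illustration, only `fs.s3a.impl.disable.cache` is a standard Hadoop property:

```yaml
# seatunnel.yaml -- hedged sketch, not copied from the docs; the layout of
# the checkpoint storage block is an assumption, verify against your version.
seatunnel:
  engine:
    checkpoint:
      interval: 10000
      storage:
        type: hdfs
        plugin-config:
          storage.type: s3
          fs.s3a.impl.disable.cache: true   # bypass the shared FileSystem cache
```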
- I can show you how the FileSinkAggregatedCommitter is initialized.
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached

Is this reproducible?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible?

Yes, I can reproduce it 100% now. I downloaded the official seatunnel 2.3.4 package and plugins, deployed them on the server, and ran in local mode; the problem always appears there, but it does not appear when debugging locally in IDEA.
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible?

Can you try remote debugging?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible? Can you try remote debugging?

Yes, I discovered through remote debugging that s3n was actually being passed.
> I noticed that this code is actually fine: executing the buildWithConfig method assigns the schema as s3a. The key is that when the whole sink is passed downstream to the multi-table sink, it is deserialized.

I understand now. It's really a matter of serialization and deserialization of static variables. I don't think we should define static variables for classes that need to be serialized and deserialized. Can you put up a PR to fix this?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible? Can you try remote debugging?

I seem to have spotted the problem. This code is actually fine: executing the buildWithConfig method assigns SCHEMA to "s3a". The key is that when the whole sink is passed downstream to the MultiTableSink, it is deserialized. The deserialization re-initializes the static variables, and the member variables of the whole S3Conf class are all static, including SCHEMA, so the multi-table sink gets SCHEMA's default value, "s3n".
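The failure mode described above can be reproduced in isolation: Java object serialization writes instance fields into the byte stream, but static fields belong to the class, so the deserializing side sees whatever its own classload produced. A minimal sketch (class and field names are illustrative, not SeaTunnel's real S3Conf):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a conf class; NOT SeaTunnel's real S3Conf.
class SchemaHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    // BAD: static state belongs to the class, not the serialized object.
    static String schema = "s3n";
    // GOOD: an instance field travels inside the byte stream.
    String instanceSchema = "s3n";
}

public class StaticSerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SchemaHolder holder = new SchemaHolder();
        SchemaHolder.schema = "s3a";      // configured, e.g. by buildWithConfig
        holder.instanceSchema = "s3a";

        byte[] bytes = serialize(holder);

        // Simulate the receiving side, where a fresh classload would leave
        // the static field at its default value again.
        SchemaHolder.schema = "s3n";

        SchemaHolder restored = (SchemaHolder) deserialize(bytes);
        System.out.println("static schema:   " + SchemaHolder.schema);       // s3n -- lost
        System.out.println("instance schema: " + restored.instanceSchema);   // s3a -- kept
    }
}
```

This matches what remote debugging showed: the configured "s3a" survives only as long as the original object instance does; once the sink is shipped across a serialization boundary, any static-held configuration silently reverts to the class default.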
> I noticed that this code is actually fine: executing the buildWithConfig method assigns the schema as s3a. The key is that when the whole sink is passed downstream to the multi-table sink, it is deserialized.
>
> I understand now. It's really a matter of serialization and deserialization of static variables. I don't think we should define static variables for classes that need to be serialized and deserialized. Can you put up a PR to fix this?

OK, I am willing to submit the PR. I have drafted a first version that removes the static modifiers, but I see that you submitted a change to the s3 connector, so my submission may cause a conflict.
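The shape of that fix, moving the serialized state from a static field to an instance field, might look like this (a hedged sketch; the real S3Conf fields and the actual PR diff are not reproduced here):

```java
import java.io.Serializable;

// Before (hypothetical): SCHEMA is static, so it is re-initialized on the
// deserializing side and the configured "s3a" is lost.
class S3ConfBefore implements Serializable {
    static String SCHEMA = "s3n";
}

// After (hypothetical): schema is an instance field, so it is written into
// the serialized bytes and survives the hop to the multi-table sink.
class S3ConfAfter implements Serializable {
    private String schema = "s3n";   // default, overridden by configuration

    String getSchema() { return schema; }
    void setSchema(String schema) { this.schema = schema; }
}
```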
> OK, I am willing to submit the PR. I have drafted a first version that removes the static modifiers, but I see that you submitted a change to the s3 connector, so my submission may cause a conflict.

Don't worry, I will resolve the conflict after your PR merges.
Related PR: #6698
You can add `close #6678` to the description of the PR to ensure that the issue is automatically closed when the PR merges.
> You can add `close #6678` to the description of the PR to ensure that the issue is automatically closed when the PR merges.

OK