Comments (20)
In addition, debugging locally in IDEA shows nothing wrong; the problem only happens on the server.
from seatunnel.
@ruanwenjun this may be related to your issue #5903, i.e. whether checkpoint uses HDFS or the cache. I saw that you submitted #6039 and handled the cache problem there, but checkpoint did not, in this class: org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsConfiguration
And in this code: hadoopConf's getSchema method sometimes returns s3n instead of s3a, resulting in the s3 connector actually being served from the cache.
The s3n appears because the S3Conf obtained in AggregatedCommit does not go through the buildWithConfig method but uses DEFAULT_SCHEMA instead. Debugging suggests it is related to the DAG:

buildWithConfig method screenshot:

DAG initialization of S3Conf screenshot:

Screenshot of the hadoopConf and schema obtained by AggregatedCommit:

So far I have found two workarounds:

1. Change DEFAULT_SCHEMA to s3a.
2. Set it in the configuration file.

_But I am not familiar with the DAG and AggregatedCommit code, so I need a maintainer to help me look at the root cause!_
@EricJoy2048 I see that you have been working on the multi-table feature #6698 of the S3File connector recently. Have you ever seen the schema in the hadoop conf obtained by the multi-table sink be the default s3n instead of the s3a specified in the configuration file?
I'll look at that as soon as I can
- SeaTunnel Engine (Zeta) uses the HDFS API to write checkpoints to the FileSystem. The config is at $SEATUNNEL_HOME/conf/seatunnel.yaml, and the code is in the seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common package. I do not know the content of this config file in your environment, but I think we need to disable the cache in checkpoint storage too.
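For reference, Hadoop's shared FileSystem cache can be switched off per scheme with the standard client property pattern `fs.<scheme>.impl.disable.cache`. A sketch of how that could look in the checkpoint-storage section of seatunnel.yaml follows; the exact key layout under `storage` is an assumption for illustration, only `fs.s3a.impl.disable.cache` is a standard Hadoop property:

```yaml
# seatunnel.yaml -- hedged sketch, not copied from the docs; the layout of
# the checkpoint storage block is an assumption, verify against your version.
seatunnel:
  engine:
    checkpoint:
      interval: 10000
      storage:
        type: hdfs
        plugin-config:
          storage.type: s3
          fs.s3a.impl.disable.cache: true   # bypass the shared FileSystem cache
```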
- I can show you how the FileSinkAggregatedCommitter is initialized.
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached

Is this reproducible?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible?

Yes, I can reproduce it 100% now. I downloaded the official seatunnel 2.3.4 package and plugins, deployed them on the server, and ran in local mode; the problem always appears there, but it does not appear when debugging locally in IDEA.
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible?

Can you try remote debugging?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible? Can you try remote debugging?

Yes, I discovered through remote debugging that s3n was actually being passed.
> I noticed that this code is actually fine: executing the buildWithConfig method assigns the schema as s3a. The key is that when the whole sink is passed downstream to the multi-table sink, it is deserialized.

I understand now. It's really a matter of serialization and deserialization of static variables. I don't think we should define static variables for classes that need to be serialized and deserialized. Can you put up a PR to fix this?
> Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached
>
> Is this reproducible? Can you try remote debugging?

I seem to have spotted the problem. This code is actually fine: executing the buildWithConfig method assigns SCHEMA to "s3a". The key is that when the whole sink is passed downstream to the MultiTableSink, it is deserialized. The deserialization re-initializes the static variables, and the member variables of the whole S3Conf class are all static, including SCHEMA, so the multi-table sink gets SCHEMA's default value, "s3n".
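The failure mode described above can be reproduced in isolation: Java object serialization writes instance fields into the byte stream, but static fields belong to the class, so the deserializing side sees whatever its own classload produced. A minimal sketch (class and field names are illustrative, not SeaTunnel's real S3Conf):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a conf class; NOT SeaTunnel's real S3Conf.
class SchemaHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    // BAD: static state belongs to the class, not the serialized object.
    static String schema = "s3n";
    // GOOD: an instance field travels inside the byte stream.
    String instanceSchema = "s3n";
}

public class StaticSerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SchemaHolder holder = new SchemaHolder();
        SchemaHolder.schema = "s3a";      // configured, e.g. by buildWithConfig
        holder.instanceSchema = "s3a";

        byte[] bytes = serialize(holder);

        // Simulate the receiving side, where a fresh classload would leave
        // the static field at its default value again.
        SchemaHolder.schema = "s3n";

        SchemaHolder restored = (SchemaHolder) deserialize(bytes);
        System.out.println("static schema:   " + SchemaHolder.schema);       // s3n -- lost
        System.out.println("instance schema: " + restored.instanceSchema);   // s3a -- kept
    }
}
```

This matches what remote debugging showed: the configured "s3a" survives only as long as the original object instance does; once the sink is shipped across a serialization boundary, any static-held configuration silently reverts to the class default.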
> I noticed that this code is actually fine: executing the buildWithConfig method assigns the schema as s3a. The key is that when the whole sink is passed downstream to the multi-table sink, it is deserialized.
>
> I understand now. It's really a matter of serialization and deserialization of static variables. I don't think we should define static variables for classes that need to be serialized and deserialized. Can you put up a PR to fix this?

OK, I am willing to submit the PR. I have drafted a first version that removes the static modifiers, but I see that you submitted a change to the s3 connector, so my submission may cause a conflict.
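The shape of that fix, moving the serialized state from a static field to an instance field, might look like this (a hedged sketch; the real S3Conf fields and the actual PR diff are not reproduced here):

```java
import java.io.Serializable;

// Before (hypothetical): SCHEMA is static, so it is re-initialized on the
// deserializing side and the configured "s3a" is lost.
class S3ConfBefore implements Serializable {
    static String SCHEMA = "s3n";
}

// After (hypothetical): schema is an instance field, so it is written into
// the serialized bytes and survives the hop to the multi-table sink.
class S3ConfAfter implements Serializable {
    private String schema = "s3n";   // default, overridden by configuration

    String getSchema() { return schema; }
    void setSchema(String schema) { this.schema = schema; }
}
```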
> OK, I am willing to submit the PR. I have drafted a first version that removes the static modifiers, but I see that you submitted a change to the s3 connector, so my submission may cause a conflict.

Don't worry, I will resolve the conflict after your PR merges.
Related PR: #6698
You can add `close #6678` to the description of the PR to ensure that the issue is automatically closed when the PR merges.
> You can add `close #6678` to the description of the PR to ensure that the issue is automatically closed when the PR merges.

OK