[Bug] [S3File] [zeta-local] Error writing to S3File in version 2.3.4: java.lang.IllegalStateException: Connection pool shut down (seatunnel, CLOSED)

LeonYoah avatar LeonYoah commented on June 5, 2024
[Bug] [S3File] [zeta-local] Error writing to S3File in version 2.3.4: java.lang.IllegalStateException: Connection pool shut down

from seatunnel.

Comments (20)

LeonYoah avatar LeonYoah commented on June 5, 2024

In addition, the error does not occur when debugging in IDEA; it only happens on the server.

LeonYoah avatar LeonYoah commented on June 5, 2024

@ruanwenjun This may be related to your issue #5903, which is about whether checkpoint access to HDFS uses the cache. I saw that you submitted #6039 for the cache problem, but the checkpoint storage was not changed. It is in this class: org.apache.seatunnel.engine.checkpoint.storage.hdfs.common.HdfsConfiguration
image

LeonYoah avatar LeonYoah commented on June 5, 2024

And this code:
image
image
Sometimes hadoopConf's getSchema method returns s3n instead of s3a, so the FileSystem used by the S3 connector is actually still cached.
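
To illustrate why the wrong schema matters, here is a minimal sketch. Hadoop decides whether to bypass its FileSystem cache by reading a per-scheme key of the form fs.<scheme>.impl.disable.cache. The sketch assumes the connector builds that key from getSchema(); the class and method names below are hypothetical, and a plain Map stands in for the Hadoop Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of SeaTunnel or Hadoop.
final class DisableCacheKeyDemo {

    // Hadoop looks up "fs.<scheme>.impl.disable.cache" for the URI scheme in use.
    static String disableCacheKey(String scheme) {
        return String.format("fs.%s.impl.disable.cache", scheme);
    }

    // Stand-in for a Hadoop Configuration: the disable-cache key is built
    // from whatever schema the connector config reports (an assumption here).
    static Map<String, String> buildConf(String schemaFromConnector) {
        Map<String, String> conf = new HashMap<>();
        conf.put(disableCacheKey(schemaFromConnector), "true");
        return conf;
    }

    // The cache is bypassed for a scheme only if that scheme's own key is "true".
    static boolean cacheDisabledFor(Map<String, String> conf, String uriScheme) {
        return Boolean.parseBoolean(conf.getOrDefault(disableCacheKey(uriScheme), "false"));
    }

    public static void main(String[] args) {
        // Schema reported correctly: the s3a cache is disabled.
        System.out.println(cacheDisabledFor(buildConf("s3a"), "s3a")); // true
        // Schema falls back to s3n: the s3a FileSystem is still cached.
        System.out.println(cacheDisabledFor(buildConf("s3n"), "s3a")); // false
    }
}
```

If the s3a FileSystem stays cached, the same instance can be shared between callers, which would be consistent with one caller closing it and another then hitting "Connection pool shut down".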

LeonYoah avatar LeonYoah commented on June 5, 2024

The schema is s3n because the S3Conf obtained in AggregatedCommit is not built through the buildWithConfig method; it uses DEFAULT_SCHEMA instead. Debugging suggests this is related to the DAG:

buildWithConfig method screenshot:

image

DAG initialization S3conf screenshot:

image

image

Screenshot of the hadoopConf and schema obtained by AggregatedCommit:

image

So far I have found two solutions:

1. Change DEFAULT_SCHEMA to s3a:

image

2. Set it in the configuration file:

image

However, I am not familiar with the DAG and AggregatedCommit code, so I would like to ask the maintainers to help find the root cause.

LeonYoah avatar LeonYoah commented on June 5, 2024

@EricJoy2048 I see that you have recently been working on the multi-table feature (#6698) of the S3File connector. Have you ever seen the schema in the hadoop conf obtained by the multi-table sink come back as the default s3n instead of the s3a specified in the configuration file?

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

I'll look at that as soon as I can

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024
  1. SeaTunnel Engine (Zeta) uses the HDFS API to write checkpoints to the FileSystem. The configuration is in $SEATUNNEL_HOME/conf/seatunnel.yaml, and the code is in the seatunnel-engine/seatunnel-engine-storage/checkpoint-storage-plugins/checkpoint-storage-hdfs/src/main/java/org/apache/seatunnel/engine/checkpoint/storage/hdfs/common package. I do not know what this config file contains in your environment, but I think we need to disable the cache in checkpoint storage too.

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024
  1. I can show you how FileSinkAggregatedCommitter is initialized.
image image image

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024
image

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

"Sometimes hadoopConf's getSchema method returns s3n instead of s3a, resulting in the s3 connector actually being cached"

Is this reproducible?

LeonYoah avatar LeonYoah commented on June 5, 2024

"Is this reproducible?"

Yes, I can now reproduce it 100% of the time. I downloaded the official SeaTunnel 2.3.4 package and plugins, deployed them on a server, and ran the job in local mode; the problem occurs there. It does not occur when debugging locally in IDEA.

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

Can you try remote debugging?

LeonYoah avatar LeonYoah commented on June 5, 2024

"Can you try remote debugging?"

Yes. Through remote debugging I found that s3n was indeed being passed:
image

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

"I noticed that this code is actually fine: executing the buildWithConfig method sets the schema to s3a. The key point is that when the whole sink is passed downstream to the multi-table sink, it is deserialized."

I understand now. This is really a problem with the serialization and deserialization of static variables. I don't think we should define static variables in classes that need to be serialized and deserialized. Can you put up a PR to fix this?
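
As a rough illustration of that suggestion, the sketch below keeps the schema in an instance field, so it is written into the serialized form and survives a round trip. S3ConfSketch and the helper names are hypothetical stand-ins, not the actual SeaTunnel classes or the merged fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of SeaTunnel.
final class InstanceFieldConfDemo {

    static class S3ConfSketch implements Serializable {
        private static final long serialVersionUID = 1L;
        // A static constant is fine: it is only a default, never mutated.
        private static final String DEFAULT_SCHEMA = "s3n";
        // Instance state is written to the stream by Java serialization.
        private String schema = DEFAULT_SCHEMA;

        String getSchema() {
            return schema;
        }

        void setSchema(String schema) {
            this.schema = schema;
        }
    }

    // Serializes and deserializes the config, as a stand-in for passing it downstream.
    static S3ConfSketch roundTrip(S3ConfSketch conf) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(conf);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (S3ConfSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        S3ConfSketch conf = new S3ConfSketch();
        conf.setSchema("s3a");
        // The configured value travels with the object.
        System.out.println(roundTrip(conf).getSchema()); // s3a
    }
}
```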

LeonYoah avatar LeonYoah commented on June 5, 2024

I think I have found the problem. This code is actually fine: executing the buildWithConfig method sets SCHEMA to "s3a". The key point is that when the whole sink is passed downstream to the multi-table sink, it is serialized and deserialized. Static variables are not carried through that process, and all member variables of the S3Conf class, including SCHEMA, are declared static. As a result, the multi-table sink sees the default value of SCHEMA, which is "s3n".
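
The effect can be reproduced in isolation with a small sketch. Java serialization does not write static fields, so a receiver that loads the class fresh sees only the static initializer's value. The sketch simulates the fresh class by resetting the static field before deserializing; StaticConf and the helper names are hypothetical, not the SeaTunnel code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of SeaTunnel.
final class StaticFieldSerializationDemo {

    static class StaticConf implements Serializable {
        private static final long serialVersionUID = 1L;
        static final String DEFAULT_SCHEMA = "s3n";
        // Static: belongs to the class, so it is NOT part of the serialized object.
        static String schema = DEFAULT_SCHEMA;
        // Instance field: serialized with the object.
        final String bucket;

        StaticConf(String bucket) {
            this.bucket = bucket;
        }

        static StaticConf buildWithConfig(String configuredSchema, String bucket) {
            schema = configuredSchema;
            return new StaticConf(bucket);
        }

        String getSchema() {
            return schema;
        }
    }

    // Returns {schema, bucket} as seen by a simulated receiver after a round trip.
    static String[] roundTripInFreshClass(String configuredSchema, String bucket) {
        try {
            StaticConf conf = StaticConf.buildWithConfig(configuredSchema, bucket);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(conf);
            }
            // Simulation only: a freshly loaded class would hold the initializer value.
            StaticConf.schema = StaticConf.DEFAULT_SCHEMA;
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                StaticConf copy = (StaticConf) in.readObject();
                return new String[] {copy.getSchema(), copy.bucket};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] seen = roundTripInFreshClass("s3a", "my-bucket");
        System.out.println(seen[0]); // s3n: the configured "s3a" was lost
        System.out.println(seen[1]); // my-bucket: instance state survived
    }
}
```

This would also be consistent with the problem not showing up in IDEA if everything there runs against one loaded copy of the class, though that is a guess rather than something confirmed in this thread.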

image

LeonYoah avatar LeonYoah commented on June 5, 2024

"Can you put up a PR to fix this?"

OK, I am willing to submit the PR. I have made a first version of the change that removes the static modifiers. However, I see that you have also submitted a change to the S3 connector, so my submission may cause a conflict.

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

"I see that you have also submitted a change to the S3 connector, so my submission may cause a conflict."

Don't worry, I will resolve the conflict after your PR is merged.

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

Related PR: #6698

EricJoy2048 avatar EricJoy2048 commented on June 5, 2024

You can add "close #6678" to the PR description so that the issue is closed automatically when the PR is merged.

LeonYoah avatar LeonYoah commented on June 5, 2024

OK.
