apnadkarni avatar apnadkarni commented on August 23, 2024

Sorry, Petro, nothing similar that I've encountered comes to mind.

If I understand correctly, the sending end is not Tcl, only the client side is, right?

Can you post the readSocket proc? That may help me target a specific code path to examine for bugs.

Out of curiosity, if you are using event driven i/o, why is the socket kept in blocking mode?

Also, could you confirm you still see the issue with Tcl 8.6.13? It fixed two bugs in channel buffering. I don't think they would impact your case (binary), but it would nevertheless be good to confirm that you still see the issue in 8.6.13.

If the code is open source, you can just send me a link to look at it.

/Ashok

from iocp.

Kazmirchuk avatar Kazmirchuk commented on August 23, 2024

Thanks a lot for the pointer to 8.6.13! Good to hear there have been some fixes in this area 👍 I saw the announcement on comp.lang.tcl, but didn't look much thru bugfixes. We're using 8.6.12 at the moment.

The sending side is some proprietary SW (not Tcl) running on Windows 10 x64. The LAN is 10GbE using Intel Adapters. All this, including the server running our Tcl application, is our partner's HW that I only accessed remotely. We've received a C#-based simulator of the sending side, and we tried running it in house with the Tcl application, but the bug doesn't occur in such setup.

I've tried increasing sorcvbuf & sosndbuf to 1MB - it didn't make any difference.

"if you are using event driven i/o, why is the socket kept in blocking mode?"

I thought it would be simpler to implement readSocket with blocking [read] and [chan copy]: just read one packet at a time and rely on Tcl to invoke readSocket again (~4000 pkts/s). Basically something like:

proc readSocket {sock} {
    set pcapHdr [read $sock 44]
    binary scan $pcapHdr @8i incl_len
    if {$incl_len < 24 || $incl_len > 262144} {
        error "Invalid incl_len"
    }
    set pktHeader [read $sock 20]  ;# CCSDS packet header: extract some values and insert into Postgres; done in a separate thread
    ...
    chan copy $sock $::binFile -size $pktPayloadLength ;# copy packet payload straight into a file
}

After a bit more debugging I noticed that when the bug occurs, the very first [read] 'skips' the first 256 bytes received from TCP, so I land in the middle of a packet payload. Interestingly, it's always 256 bytes.

In hindsight I realize that letting Tcl event loop call readSocket 4000 times/s probably wasn't the best idea :-) maybe I should have another proc on top that would have [chan pending] and then call readSocket in a loop until all available packets are read (in practice they have the same size). Or do a non-blocking socket... [chan copy] would be inconvenient in this case, but I can replace it with plain [read].
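A minimal sketch of the non-blocking alternative mentioned above: the readable handler drains whatever is available, appends it to a buffer, and only processes complete records. The framing used here (a 4-byte little-endian length followed by the payload) and the names extractRecords, onReadable and rxBuf are assumptions made for illustration; they are not the pcap/CCSDS layout from readSocket.

```tcl
# Hypothetical framing, for illustration only: each record is a 4-byte
# little-endian length followed by that many payload bytes.
# Returns a two-element list: the complete records found, and the
# unconsumed remainder of the buffer.
proc extractRecords {buf} {
    set records {}
    while {[string length $buf] >= 4} {
        binary scan $buf iu len
        if {[string length $buf] < 4 + $len} {
            break ;# record not fully received yet
        }
        lappend records [string range $buf 4 [expr {3 + $len}]]
        set buf [string range $buf [expr {4 + $len}] end]
    }
    return [list $records $buf]
}

# Readable handler for a non-blocking, binary-mode socket.
proc onReadable {sock} {
    global rxBuf
    append rxBuf($sock) [read $sock]
    lassign [extractRecords $rxBuf($sock)] records rxBuf($sock)
    foreach rec $records {
        # process $rec
    }
    if {[chan eof $sock]} {
        close $sock
        unset rxBuf($sock)
    }
}

# Typical setup (sock is an already connected channel):
#   chan configure $sock -blocking 0 -translation binary
#   set rxBuf($sock) ""
#   chan event $sock readable [list onReadable $sock]
```

With this shape one handler invocation can consume several packets, so the event loop no longer has to fire once per packet.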

Unfortunately, the HW is now being packaged for delivery to a customer, and they seem to accept it as-is for now, so I won't be able to experiment further until we start working on a 2nd delivery in a few months. I will let you know.

apnadkarni avatar apnadkarni commented on August 23, 2024

Before calling chan copy, are you turning off the fileevent handler (in the ... section of readSocket)? I would be a little uncomfortable using chan copy while an event handler is registered. It is possible there is a race condition between the event handler firing and the chan copy internal loop. But I don't know for sure.
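A minimal sketch of what suspending the handler around the copy could look like. The proc name copyPayload is hypothetical, and whether this is actually required for a blocking copy is not established in this thread; it only shows the mechanics.

```tcl
# Temporarily remove the readable handler while a blocking chan copy runs,
# then put it back even if the copy raises an error.
proc copyPayload {sock out size} {
    set handler [chan event $sock readable] ;# currently registered script, or ""
    chan event $sock readable {}
    try {
        chan copy $sock $out -size $size
    } finally {
        chan event $sock readable $handler
    }
}
```

In readSocket this would replace the direct [chan copy] call, e.g. copyPayload $sock $::binFile $pktPayloadLength.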

/Ashok

Kazmirchuk avatar Kazmirchuk commented on August 23, 2024

I don't. The [chan copy] docs mention turning off fileevent handlers only when doing a background copy. I believed that it wasn't necessary when doing a blocking copy, like here. Anyway, while troubleshooting this I did try replacing [chan copy] with plain [read]+[puts], and it made no difference - neither in performance, nor with the bug. BTW I guess, for packets of 5-10KB using [chan copy] probably means just showing off my Tcl skills rather than making a real difference. I guess, it becomes worthwhile starting with... 1MB?..

Also, do you know if [chan pending] is a cheap operation or it should be cached in a variable? E.g. if in a fileevent handler I want to read from a socket until there's 10KB left:

while {[chan pending input $chan] > 10000} {
    set data [read $chan $numBytes]
    # process $data
}

Is this an OK implementation for a high-performance loop, or is it better to call [chan pending] once before the loop?

E.g. recently I accidentally discovered that getting all socket options with [chan configure $chan] is in fact very expensive, and the other side disconnected me as a slow client :-)

On a more general note, do you know of a reasonably modern open-source Tcl project that is a good example of combining TCP sockets and coroutines properly, with full error handling and testing (in addition to your book, of course)? A couple of years ago, while working on my NATS client, I tried looking around and couldn't find much, so I had to learn many things through trial and error. E.g. the fact that [socket -async] can throw an error when given an invalid host name, because DNS resolution is done synchronously anyway, was quite a surprise :D
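A minimal sketch of handling both failure paths of [socket -async]: the synchronous error described above (e.g. a host name that does not resolve) and a connection failure reported later through the event loop. The names connectAsync and connectDone and the callback convention (status followed by a value) are made up for this example.

```tcl
# onResult is a command prefix invoked as: {*}$onResult ok $sock
#                                      or: {*}$onResult error $message
proc connectAsync {host port onResult} {
    if {[catch {socket -async $host $port} sock]} {
        # Failed before returning a channel; $sock holds the error message.
        # Report it from the event loop so the callback is always asynchronous.
        after 0 [list {*}$onResult error $sock]
        return
    }
    # The socket becomes writable once the connection attempt has finished.
    chan event $sock writable [list connectDone $sock $onResult]
}

proc connectDone {sock onResult} {
    chan event $sock writable {}
    set err [chan configure $sock -error]
    if {$err ne ""} {
        close $sock
        {*}$onResult error $err
    } else {
        {*}$onResult ok $sock
    }
}
```

A caller would pass something like [list handleConnect $sessionId] as onResult (both names hypothetical) and run the event loop, for example with vwait.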

Kazmirchuk avatar Kazmirchuk commented on August 23, 2024

Oh, and another completely unrelated question :-) Do you plan to publish a 2nd edition of your Tcl book in the near future? I'm going to buy it and thought it might be worth waiting a bit for an update.

apnadkarni avatar apnadkarni commented on August 23, 2024

I don't want to say never but no plans currently for a second edition. The thought of proofing again is too daunting :-)

Having said that, purchasing the PDF version (from gumroad) will also allow access to future editions.

apnadkarni avatar apnadkarni commented on August 23, 2024

Regarding your other questions, fconfigure on sockets can be expensive because of the reverse DNS lookup on the remote address.

Sorry I don't know the answer to your other questions regarding sockets (chan pending etc.). Probably best to measure and see.
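A minimal sketch of how such a measurement could be set up with the built-in [time] command. It uses a local [chan pipe] instead of a real socket, so the absolute numbers will differ from a production setup; the variable names are arbitrary. Note that [chan pending input] reports only the bytes already held in Tcl's own input buffer for the channel, not data still queued in the operating system.

```tcl
# Create a pipe, put some data into it, and pull it into Tcl's input buffer.
lassign [chan pipe] rd wr
chan configure $rd -translation binary
chan configure $wr -translation binary
puts -nonewline $wr [string repeat x 1000]
flush $wr
read $rd 1 ;# forces Tcl to fill its input buffer from the pipe

set pending [chan pending input $rd]

# Compare the cost of querying the channel against reading a cached variable.
set tPending [time {chan pending input $rd} 100000]
set tCached  [time {set pending} 100000]

puts "buffered bytes:     $pending"
puts "chan pending input: $tPending"
puts "cached variable:    $tCached"

close $wr
close $rd
```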

/Ashok

apnadkarni avatar apnadkarni commented on August 23, 2024

Forgot to mention: I had another look through the code paths on the receive side but didn't see anything that would explain the sporadic data corruption. That does not mean it doesn't exist, of course. I'll set up a long-term sink test and see if I can spot it. I can't produce data at the rates you are seeing, though.

/Ashok

Kazmirchuk avatar Kazmirchuk commented on August 23, 2024

Don't worry too much. We've suggested a workaround to our customer (reading the missed packets later from an archive), and they are fine with it. I couldn't reproduce the problem in-house, so it could be limited to the specific HW setup. The next time I get a chance to look at it will be in a few months, so I suppose it doesn't make sense to keep the issue open. If I have any news, I'll comment here. Thank you for all the replies and for your immense contribution to the Tcl ecosystem!

Kazmirchuk avatar Kazmirchuk commented on August 23, 2024

After updating Tcl 8.6.10 -> 8.6.13 the loss of sync doesn't occur anymore, and we have a stable throughput of 650Mbit/s (as received from our instrument) with very little CPU and memory consumption.

I've been working with Tcl for almost 10 years, and the discovery that the standard Tcl sockets on Windows are still based on the slow API from the late 90s came as a real surprise. I've just tried googling "tcl socket performance on windows", and none of the results point to your package. That's why I think it is highly important to integrate iocp_inet into the core for Tcl 9.

apnadkarni avatar apnadkarni commented on August 23, 2024

It's interesting that the sync issues went away with 8.6.13. Thanks for letting me know.

Yes, I'm aware of the need for better networking performance in the Tcl core. But Tcl 9 is in such a state of flux that I'm reluctant to add one more risk factor right now, and I do not have the time either. Eventually, though, once things settle down.

/Ashok
