[Re] Ten years challenge: Velho and Legrand (2009) - Accuracy Study and Improvement of Network Simulation in the SimGrid Framework

rescience avatar rescience commented on June 9, 2024
[Re] Ten years challenge: Velho and Legrand (2009) - Accuracy Study and Improvement of Network Simulation in the SimGrid Framework

from submissions.

Comments (33)

benoit-girard avatar benoit-girard commented on June 9, 2024 4

Ok, then I have the pleasure to state that the paper has been accepted!

benoit-girard avatar benoit-girard commented on June 9, 2024 3

And we have a reviewer: @rgrunbla

rgrunbla avatar rgrunbla commented on June 9, 2024 1

Hi,
Thank you for the paper.

I have a few questions / comments:

  • The creation of the /root/simutools09/01-onelink directory is missing from the instructions (it could probably be put inside the original Dockerfile) in section 3.2, page 13
  • Formatting: in the second block of shell commands in section 3.3, page 14, it would be better to adopt a single convention for copying files into a directory (either always specify the destination filename or never), rather than mixing both. I'd go with « specifying the filename » as it underlines it goes into a directory.
  • Running the commands from page 14, the sweep-parse takes around 13 minutes to give me the shell back on my machine, at which point it crashes:
sh: line 1: 410408 Aborted                 (core dumped) ./gtnets /root/simutools09/01-onelink/tmp/plateforme-1-1.xml /root/simutools09/01-onelink/tmp/deployment-1-1.xml --cfg=workstation_model:compound --cfg=cpu_model:Cas01 --cfg=network_model:GTNets 2>&1 >&/root/simutools09/01-onelink/tmp/tempotrace-1-1.log

I get 10581 lines of output (according to wc -l) before it crashes (and it's always the same number of lines). Could you confirm whether or not this is happening on your side?

  • While the argument for separation of concerns (between the input files, the perl script, and the simulation code) is a good one (section 3.2), I would also appreciate an overlay "Dockerfile" containing these things, to avoid typing the commands needed to reproduce the results of this paper (they seem to be available only in the PDF, so they are hard to copy/paste). It would allow me to check whether the aforementioned crash is a problem on my side, or something else :)

  • How is the "raw.data" file, mentioned in section 3.4, generated? Is it supposed to be generated by sweep-parse.pl?

Thanks,

Rémy

benoit-girard avatar benoit-girard commented on June 9, 2024 1

@rgrunbla A gentle reminder: do you consider the answer of @alegrand satisfactory? Do you have additional requirements on this submission?

alegrand avatar alegrand commented on June 9, 2024 1

I have finally taken the time to polish this article. I apologize deeply to the editors and the reviewer (who all worked in a timely manner) and thank them for their patience.
I have just uploaded an updated version here: https://github.com/alegrand/reproducibility-challenge
The main changes are related to the recent shutdown of the Inria gforge. I have now ensured that everything is on a perennial archive (Software Heritage and Zenodo) and I have updated the recipes. We may now proceed to the publication.

rougier avatar rougier commented on June 9, 2024

Thanks for your submission. I'm afraid I have a conflict of interest (https://rr-france.github.io/bookrr/) and I cannot edit it.

@labarba Could you edit this submission for the Ten Years Reproducibility Challenge (only 1 reviewer needed)?

rougier avatar rougier commented on June 9, 2024

@labarba @eroesch Could you edit this submission for the Ten Years Reproducibility Challenge (only 1 reviewer needed)?

labarba avatar labarba commented on June 9, 2024

I don't have field expertise in computer networks, and I'm also slammed with multiple service roles, so I cannot take one more task.

rougier avatar rougier commented on June 9, 2024

@labarba Ok thanks for the quick answer.
@benoit-girard Could you edit this submission ? I know it's not your domain but @alegrand may help in finding reviewers.

benoit-girard avatar benoit-girard commented on June 9, 2024

OK, let's try that. @alegrand, any reviewer suggestion?

benoit-girard avatar benoit-girard commented on June 9, 2024

Thanks a lot @rgrunbla !

alegrand avatar alegrand commented on June 9, 2024

Hi Remy,

sorry for the late reply. Huge thanks for being more responsive than me. I've been quite busy over the last days so I haven't been able to peacefully look at what may be wrong. I should finally be able to do this in the following days. Thanks for your patience.

Best,

Arnaud

rougier avatar rougier commented on June 9, 2024

@benoit-girard @alegrand Gentle reminder

benoit-girard avatar benoit-girard commented on June 9, 2024

@alegrand

Have you been able to address the questions raised by @rgrunbla?
Let us know as soon as possible...

alegrand avatar alegrand commented on June 9, 2024

Hi @rgrunbla, @benoit-girard and @rougier . I really apologize for this unacceptable delay. :(

I have finally been able to find some time to look into this this afternoon (on a completely fresh system on an old laptop as my regular one has recently crashed, 3 weeks after the end of the warranty :( ).

Good catch! There is indeed a problem, even though it does not fail in the same way as it does for you.

In my case the script runs to completion (no crash), which is why I had not noticed this. But indeed, when I monitor stderr, there is one configuration for which I get the same message as you:

sh: line 1: 363014 Aborted                 (core dumped) ./gtnets /root/simutools09/01-onelink/tmp/plateforme-1-1.xml /root/simutools09/01-onelink/tmp/deployment-1-1.xml --cfg=workstation_model:compound --cfg=cpu_model:Cas01 --cfg=network_model:GTNets 2>&1 >&/root/simutools09/01-onelink/tmp/tempotrace-1-1.log

I have to say I do not understand why it would crash and stop on your machine and run to completion on mine since we're both in a Docker image!

Anyway, I looked into the log to determine which configuration fails. There is only one, which corresponds to this in the log file:

========> Bandwidth (B) : 1.000000e+05 B/s (Bytes per second)
========> Latency   (L) : 0.50000 s (seconds)
========> Size      (S) : 17000 B (Bytes) 
========> Model     (M) : GTNets
[0.000000] [simix_kernel/INFO] setting 'workstation_model' to 'compound'
[0.000000] [xbt_cfg/INFO] type in variable = 2
[0.000000] [simix_kernel/INFO] setting 'cpu_model' to 'Cas01'
[0.000000] [xbt_cfg/INFO] type in variable = 2
[0.000000] [simix_kernel/INFO] setting 'network_model' to 'GTNets'
[0.000000] [xbt_cfg/INFO] type in variable = 2
<<<<<================================>>>>>
Dumping GTNETS topollogy information
== LINKID: 0
  [SRC] ID: 0, router?: 0, hosts[]: [ 0]
  [DST] ID: 1, router?: 0, hosts[]: [ 1]
>>>>>================================<<<<<
[0.000000] [simix_kernel/INFO] Oops ! Deadlock or code not perfectly clean.
[0.000000] [simix_kernel/INFO] 2 processes are still running, waiting for something.
[0.000000] [simix_kernel/INFO] Legend of the following listing: "<process> on <host>: <status>."
[0.000000] [simix_kernel/INFO] master on S1:  Blocked on condition 0xdead; Waiting for the following actions: 'sleep'(0xdead) 'Task_0'(0xdead) 'sleep'(0xdead).
[0.000000] [simix_kernel/INFO] slave on C1:  Blocked on condition 0xdead; Waiting for the following actions: 'sleep'(0xdead) 'Task_0'(0xdead) 'sleep'(0xdead).
[0.000000] [simix_kernel/INFO] Return a Warning.
** SimGrid: UNCAUGHT EXCEPTION received on (0): category: unknown_err; value: 0
** Cannot cancel GTNetS flow
** Thrown by () in this process
[0.000000] xbt/ex.c:113: [xbt_ex/CRITICAL] Cannot cancel GTNetS flow

**   In action_cancel() at /root/simgrid-3.3/src/surf/network_gtnets.c:349
**   In action_cancel() at /root/simgrid-3.3/src/surf/workstation.c:126
**   In SIMIX_action_cancel() at /root/simgrid-3.3/src/simix/smx_action.c:170
**   In MSG_process_kill() at /root/simgrid-3.3/src/msg/m_process.c:196
**   In MSG_clean() at /root/simgrid-3.3/src/msg/global.c:237
=========================><=========================

This is extremely weird since when looking in the log files of the original article, I should get:

trace-file-1-1.log-114497->==================================================<
trace-file-1-1.log:114498:========> Bandwidth (B) : 1.000000e+05 B/s (Bytes per second)
trace-file-1-1.log-114499-========> Latency   (L) : 0.50000 s (seconds)
trace-file-1-1.log-114500-========> Size      (S) : 17000 B (Bytes) 
trace-file-1-1.log-114501-========> Model     (M) : GTNets
trace-file-1-1.log-114502-[0.000000] [simix_kernel/INFO] setting 'workstation_model' to 'compound'
trace-file-1-1.log-114503-[0.000000] [xbt_cfg/INFO] type in variable = 2
trace-file-1-1.log-114504-[0.000000] [simix_kernel/INFO] setting 'cpu_model' to 'Cas01'
trace-file-1-1.log-114505-[0.000000] [xbt_cfg/INFO] type in variable = 2
trace-file-1-1.log-114506-[0.000000] [simix_kernel/INFO] setting 'network_model' to 'GTNets'
trace-file-1-1.log-114507-[0.000000] [xbt_cfg/INFO] type in variable = 2
trace-file-1-1.log-114508-[S1:master:(1) 6.068680] [msg_test/INFO] Send completed (to C1). Transfer time: 6.068680	 Agregate bandwidth: 2801.268151
trace-file-1-1.log-114509-[S1:master:(1) 6.068680] [msg_test/INFO] Completed peer: C1 time: 6.068680
trace-file-1-1.log-114510-[C1:slave:(2) 6.068680] [msg_test/INFO] ===> Estimated Bw of FLOW[1] : 2801.268151 ;  message from S1 to C1  with remaining : 0.000000
trace-file-1-1.log-114511-=========================><=========================

So out of the 7920 tested configurations, there is one that mysteriously fails (the simulation does not even start) and it does not appear to have any particular characteristic.
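
For readers who want to repeat this search, here is a minimal Python sketch of how the failing configuration could be located in such a log. It assumes that every run starts with the four "========>" header lines shown above and that a failed run contains the "UNCAUGHT EXCEPTION" or deadlock message. The function name find_failed_configs is hypothetical and is not part of SimGrid or of the original scripts.

```python
import re

# Header lines look like: "========> Size      (S) : 17000 B (Bytes)"
HEADER = re.compile(r"^=+> \w+\s+\((B|L|S|M)\) : (\S+)")

# Messages assumed to signal a failed run (taken from the log excerpt above)
FAILURE_MARKERS = ("UNCAUGHT EXCEPTION", "Deadlock or code not perfectly clean")


def find_failed_configs(lines):
    """Return one {B, L, S, M} dict per run whose log block contains a failure marker."""
    failed = []
    current = {}
    in_header = False
    already_reported = False
    for line in lines:
        match = HEADER.match(line)
        if match:
            if not in_header:
                # First header line of a new run: start a fresh configuration
                current = {}
                already_reported = False
                in_header = True
            current[match.group(1)] = match.group(2)
            continue
        in_header = False
        if current and not already_reported:
            if any(marker in line for marker in FAILURE_MARKERS):
                failed.append(dict(current))
                already_reported = True
    return failed
```

Passing an open trace file (any iterable of lines works) would return one dictionary per failed run, which makes it easy to confirm that only a single configuration is affected.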

I'm going to investigate this during the week-end and I'll keep you posted.

benoit-girard avatar benoit-girard commented on June 9, 2024

Thanks for the update!

alegrand avatar alegrand commented on June 9, 2024

Hello,

This is really crazy. It seems to fail solely for this particular configuration (B=1E5, L=0.5, S=17000, M=GTNets) and I really cannot figure out why. It works like a charm for every value of S in [16990,17010] except 17000 (note that this parameter is the size of the message sent from one host to another in the simulation, so this is really weird)! I've activated SimGrid's debugging logs, and when you compare the executions for 17000 and 17001, ignoring pointer address differences, the first difference is right before the deadlock message: there is no next action end for GTNets whereas there should be one. So I've run gdb, and even attached it to the forked child (because the GTNets simulation is forked to determine the next completion), and although a flow was created, it appears that there is no event to dequeue. It already took me quite some time and I do not know GTNets well enough (and I really do not want to debug it) to dig into what could be wrong.
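
The log comparison described here (two debug logs compared while ignoring pointer addresses) can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats any 0x... hexadecimal token as a pointer, and the helper names normalize and first_difference are hypothetical. Lines that legitimately differ between the two runs, such as the message size itself, would still need to be skipped or masked by hand.

```python
import re
from itertools import zip_longest

# Any hexadecimal token such as 0x7ffd1234 is assumed to be a pointer value
POINTER = re.compile(r"0x[0-9a-fA-F]+")


def normalize(line):
    """Strip the trailing newline and mask pointer values with a fixed placeholder."""
    return POINTER.sub("0xPTR", line.rstrip("\n"))


def first_difference(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first differing line, or None.

    Lines are compared after pointer masking. If one log is shorter,
    the missing side is reported as None.
    """
    for index, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        masked_a = None if a is None else normalize(a)
        masked_b = None if b is None else normalize(b)
        if masked_a != masked_b:
            return index, masked_a, masked_b
    return None
```

The returned index is zero-based, so adding one gives the line number to open in the two log files.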

The goal of this challenge was not to fix/improve the old code (especially as this one was a prototype which was deprecated when SimGrid moved to NS3, as GTNets was no longer developed). So I think I'll stop the investigation there and amend the article by explaining that I checked that all the (7920) "new" results match the (316800) "old" ones, except for that one mysterious configuration where it stops.
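
A check of this kind could look like the Python sketch below. It assumes that the results of both campaigns have already been extracted into dictionaries keyed by the (B, L, S, M) configuration, with the reported transfer time as the value and None when a run produced no result. The function name compare_results and this data layout are assumptions made for illustration; they are not taken from the article or its scripts.

```python
def compare_results(new, old):
    """Classify each configuration of `new` relative to `old`.

    Both arguments map a (B, L, S, M) tuple to the transfer time as printed
    in the log (a string), or to None when the run produced no result.
    Returns three lists: matching, mismatching, and missing configurations.
    """
    matching, mismatching, missing = [], [], []
    for config, value in new.items():
        if config not in old:
            # The old campaign has no entry for this configuration
            missing.append(config)
        elif value == old[config]:
            matching.append(config)
        else:
            # Different value, or a failed run (None) on one side only
            mismatching.append(config)
    return matching, mismatching, missing
```

Comparing the printed strings rather than parsed floats keeps the check exact: two runs only match if they logged the same digits.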

While I'm at it, I'll take @rgrunbla's suggestion about the overlay into account and update the URLs with more stable ones (gforge is closing, and Docker recently announced that they would automatically remove images that had not been accessed in the last few months :( ).

The other weird point is that the behavior of this old perl script seems to differ between my machine (the simulation fails silently) and @rgrunbla's (it stops). If we ever understand what was wrong, I'll also amend the article.

benoit-girard avatar benoit-girard commented on June 9, 2024

@rgrunbla could you have a look at @alegrand 's answer and provide your reviewer's feedback?

rgrunbla avatar rgrunbla commented on June 9, 2024

Yeah, sorry about the long delay in answering. I'm completely satisfied with @alegrand's answer and explanations. No additional requirements, everything is OK on my side.

benoit-girard avatar benoit-girard commented on June 9, 2024

@rgrunbla : do you have an ORCID? If yes, please let me know...

rgrunbla avatar rgrunbla commented on June 9, 2024

Yes I do ! https://orcid.org/0000-0002-9146-9888

benoit-girard avatar benoit-girard commented on June 9, 2024

@rgrunbla Thanks!
@alegrand At this publication step, please process my PR about metadata and modify your article.pdf accordingly.

benoit-girard avatar benoit-girard commented on June 9, 2024

@alegrand a gentle reminder that I need your updated article.pdf to proceed with publication.

khinsen avatar khinsen commented on June 9, 2024

🔔 This is a wakeup call for @alegrand. All we need from you is a final PDF for publication! 🔔

benoit-girard avatar benoit-girard commented on June 9, 2024

@alegrand please finalize the pdf update (alegrand/reproducibility-challenge#1)

benoit-girard avatar benoit-girard commented on June 9, 2024

@alegrand a gentle reminder that without the updated pdf your paper cannot be published.

rougier avatar rougier commented on June 9, 2024

@benoit-girard Maybe you can try email

benoit-girard avatar benoit-girard commented on June 9, 2024

I just tried that, we will see...

benoit-girard avatar benoit-girard commented on June 9, 2024

Some information has been lost since March: I was requiring the integration of my pull request (alegrand/reproducibility-challenge#1), that contains the necessary metadata concerning editor and reviewer identity, as well as dates of submission, acceptance and publication.
Can you do that (and remove the "under review" label), so that I can use that pdf for the end of the publication process?

alegrand avatar alegrand commented on June 9, 2024

@benoit-girard, I have finally merged and kind of fixed the latex (for some reason, after merging, lualatex no longer wanted to break my very long hrefs generated by the swhid). It is here and is imho OK for a final version: https://github.com/alegrand/reproducibility-challenge/blob/master/article.pdf

rougier avatar rougier commented on June 9, 2024

@benoit-girard @alegrand Is the article ready to be published then? I can help if necessary.

alegrand avatar alegrand commented on June 9, 2024

Well yes, I think so. As I said, the final version is ready but I'm not sure how to proceed with the publication.

rougier avatar rougier commented on June 9, 2024

@benoit-girard will proceed with the publication (or I can do it, just tell me, Benoît)
