Giter Site home page Giter Site logo

Comments (15)

veprbl avatar veprbl commented on August 15, 2024 1

Indeed, I see messages for both ROOTs using a relatively new testing dataset:

star-sw/tests/joblist.json

Lines 1100 to 1108 in 28b6c4f

{
"jobid": "123",
"descr": "Test for StSpinPool/StFcsPi0FinderForEcal (PR #412)",
"chopts": "pp2022,StiCA,BEmcChkStat,ftt,-picowrite,-hitfilt,-evout",
"inpfile": "st_fwd_23040001_raw_0000003.daq",
"inppath": "/star/rcf/test/daq/2022/040/23040001",
"outpath": "",
"nevents": "50"
}

ROOT5 (test 123)

https://github.com/star-bnl/star-sw/actions/runs/4119348287/jobs/7114791395#step:5:1767

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=6)<=0,  crID=64, skip it

ROOT6 (test 123)

https://github.com/star-bnl/star-sw/actions/runs/4119348287/jobs/7114796210#step:5:1997

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=6)<=0,  crID=64, skip it

from star-sw.

plexoos avatar plexoos commented on August 15, 2024

Thanks for the report @veprbl

I've compared the corresponding logs from a few test jobs and it appears to me that the warnings you report can be seen in both ROOT5 and ROOT6 jobs. So, it does not look like something specific to ROOT6. The tests jobs I looked at were: 90, 102, 119, and some earlier chains don't show them at all.

Actually, the logs look almost identical for ROOT5 and ROOT6 which is a very good news given the fact that there is lots of numerical information dumped in the files such as number of tracks, vertex positions, etc.

Hopefully, @kkauder, the owner of StEmc* packages, can shed some light on the nature of these warnings.

from star-sw.

kkauder avatar kkauder commented on August 15, 2024

Sorry :-/ I was assigned as owner by mistake. I was briefly software coordinator for the BEMC back in 2016 but never had much to do with this code, and certainly not the EEMC

from star-sw.

gvisser avatar gvisser commented on August 15, 2024

hi guys, This looks like a usual "crate header error" for ESMD. Data from "crates" showing this error may be bad in such runs. It sounds like such data is automatically skipped? That's probably a good thing. You could go back to STAR shift log for the runs in question and see if any issues were remarked on. Hope this comment helps some.

  • Gerard Visser (EEMC HW guy)

from star-sw.

klendathu2k avatar klendathu2k commented on August 15, 2024

This looks like a usual "crate header error" for ESMD. Data from "crates" showing this error may be bad in such runs. It sounds like such data is automatically skipped?

That is my recollection of what that warning implies. Looking at the code which generates it (StRoot/StEmcRawMaker/StEemcRaw.cxx), the condition is that the size of the header is reported as <=0. The data from that crate is skipped.

from star-sw.

fgeurts avatar fgeurts commented on August 15, 2024

Hi,

If this is expected behavior (re: @gvisser's comment) for bad/skipped data, can we then close this as a software issue? Please advise.

from star-sw.

gvisser avatar gvisser commented on August 15, 2024

Frank, I think probably yes this issue should be closed. I can only suggest someone checks that runs which are known to be good (e.g. at least, did not have suchonline crate header messages when taken) do not make these errors. I hope this whole business has been about certain particular runs. For sure, we have had at times a dead crate or other corruption, sometimes noted in shiftlog though probably sometimes not mentioned. For sure I expect such errors to show in some runs, offline.

from star-sw.

veprbl avatar veprbl commented on August 15, 2024

@gvisser Unfortunately your criteria is not met. The issue is seen in a random test I've picked (125):

from star-sw.

gvisser avatar gvisser commented on August 15, 2024

ok...
If you can help me to understand any way that I can be of assistance in this, I'll certainly try. I don't understand your first two links there, sorry. (Do I get to see the error messages about headers somehow through those links?) I agree from shiftlog that 23072022 looks ok, although sometimes there are problems not noted in the shiftlog of course. Perhaps we should try a new test run we could probably take tomorrow, if ESMD has been left on, I can ask Will about that. With a new run I can look at DAQ logfile in detail and see if any relevant messages there. What do you think, is it worth trying this?

from star-sw.

fgeurts avatar fgeurts commented on August 15, 2024

I am also confused about what the issue is.

If I understand Gerard correctly, this is what a warning message looks like when a crate has an issue. So, presuming this is the case, what is wrong with using this warning?

Or, is the issue that we know there should be no crate issues, and the software still gives us the message.

In the former case, I propose we close this issue unless there is an objection to using this mechanism to warn about a potential data-integrity/crate issue. In the latter case, it would be good to have EMC experts fix this software bug.

from star-sw.

veprbl avatar veprbl commented on August 15, 2024

Actually, looking for the second time. The test above (125) is actually fine, producing singular messages like:

StEmcRawMaker:INFO  - StEemcRaw::headersAreSick() --> Nfibers=54

The suspicious test is (123) produces additional messages:

StEmcRawMaker:INFO  - StEemcRaw::headersAreSick() --> Nfibers=54
StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=6)<=0,  crID=64, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=7)<=0,  crID=65, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=8)<=0,  crID=66, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=9)<=0,  crID=67, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=10)<=0,  crID=68, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=11)<=0,  crID=69, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=12)<=0,  crID=70, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=13)<=0,  crID=71, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=14)<=0,  crID=72, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=15)<=0,  crID=73, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=16)<=0,  crID=74, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=17)<=0,  crID=75, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=18)<=0,  crID=76, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=19)<=0,  crID=77, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=20)<=0,  crID=78, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=21)<=0,  crID=79, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=22)<=0,  crID=80, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=23)<=0,  crID=81, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=24)<=0,  crID=82, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=25)<=0,  crID=83, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=26)<=0,  crID=84, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=27)<=0,  crID=85, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=28)<=0,  crID=86, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=29)<=0,  crID=87, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=30)<=0,  crID=88, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=31)<=0,  crID=89, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=32)<=0,  crID=90, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=33)<=0,  crID=91, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=34)<=0,  crID=92, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=35)<=0,  crID=93, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=36)<=0,  crID=94, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=37)<=0,  crID=95, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=38)<=0,  crID=96, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=39)<=0,  crID=97, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=40)<=0,  crID=98, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=41)<=0,  crID=99, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=42)<=0,  crID=100, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=43)<=0,  crID=101, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=44)<=0,  crID=102, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=45)<=0,  crID=103, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=46)<=0,  crID=104, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=47)<=0,  crID=105, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=48)<=0,  crID=106, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=49)<=0,  crID=107, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=50)<=0,  crID=108, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=51)<=0,  crID=109, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=52)<=0,  crID=110, skip it

StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=53)<=0,  crID=111, skip it

So the correct links would be:

But, again, I don't see issue reported in run logs.

I've clicked the gear button and then "Download log archive" which pulls are logs as text files. Grepping through it:

grep 'StEemcRaw::headersAreSick() sizeHeader' *.txt | cut -d : -f 1 | sort -u

I get that quite a few tests show this issue:

100_test (102, root5, gcc485).txt
101_test (102, root6, gcc485).txt
102_test (103, root5, gcc485).txt
103_test (103, root6, gcc485).txt
104_test (104, root5, gcc485).txt
105_test (104, root6, gcc485).txt
106_test (105, root5, gcc485).txt
107_test (105, root6, gcc485).txt
108_test (106, root5, gcc485).txt
109_test (106, root6, gcc485).txt
110_test (107, root5, gcc485).txt
111_test (107, root6, gcc485).txt
112_test (108, root5, gcc485).txt
113_test (108, root6, gcc485).txt
114_test (109, root5, gcc485).txt
115_test (109, root6, gcc485).txt
116_test (110, root5, gcc485).txt
117_test (110, root6, gcc485).txt
118_test (111, root5, gcc485).txt
119_test (111, root6, gcc485).txt
120_test (112, root5, gcc485).txt
121_test (112, root6, gcc485).txt
122_test (113, root5, gcc485).txt
123_test (113, root6, gcc485).txt
124_test (114, root5, gcc485).txt
125_test (114, root6, gcc485).txt
126_test (115, root5, gcc485).txt
127_test (115, root6, gcc485).txt
128_test (116, root5, gcc485).txt
129_test (116, root6, gcc485).txt
130_test (117, root5, gcc485).txt
131_test (117, root6, gcc485).txt
132_test (118, root5, gcc485).txt
133_test (118, root6, gcc485).txt
134_test (119, root5, gcc485).txt
135_test (119, root6, gcc485).txt
136_test (120, root5, gcc485).txt
137_test (120, root6, gcc485).txt
138_test (121, root5, gcc485).txt
139_test (121, root6, gcc485).txt
142_test (123, root5, gcc485).txt
143_test (123, root6, gcc485).txt
40_test (37, root5, gcc485).txt
41_test (37, root6, gcc485).txt
92_test (90, root5, gcc485).txt
93_test (90, root6, gcc485).txt
94_test (91, root5, gcc485).txt
95_test (91, root6, gcc485).txt
96_test (92, root5, gcc485).txt
97_test (92, root6, gcc485).txt
98_test (101, root5, gcc485).txt
99_test (101, root6, gcc485).txt

Next, if we ask for context on header size for icr=0

grep -B4 'StEemcRaw::headersAreSick() sizeHeader(icr=0)' *.txt

That correlates perfectly with disabled ETOW, with outputs like:

99_test (101, root6, gcc485).txt-2024-03-20T14:02:19.1882279Z StEmcRawMaker:ERROR - ETOW Structure not found
99_test (101, root6, gcc485).txt-2024-03-20T14:02:19.1890687Z StEmcRawMaker:ERROR - ESMD Structure not found
99_test (101, root6, gcc485).txt-2024-03-20T14:02:19.1891721Z StEmcRawMaker:INFO  - StEemcRaw::copyRawData() 0 data bloks copied
99_test (101, root6, gcc485).txt-2024-03-20T14:02:19.1892426Z StEmcRawMaker:INFO  - StEemcRaw::headersAreSick() --> Nfibers=54
99_test (101, root6, gcc485).txt:2024-03-20T14:02:19.1893195Z StEmcRawMaker:WARN  - StEemcRaw::headersAreSick() sizeHeader(icr=0)<=0,  crID=1, skip it

Selecting lines that precede icr=6, but not additional sizeHeader errors:

grep -B4 'StEemcRaw::headersAreSick() sizeHeader(icr=6)' *.txt | grep -v 'sizeHeader(icr' | grep '.\{70\}'

correlates with missing ESMD

97_test (92, root6, gcc485).txt-2024-03-20T15:14:54.6056032Z StEmcRawMaker:ERROR - ESMD Structure not found
97_test (92, root6, gcc485).txt-2024-03-20T15:14:54.6057297Z StEmcRawMaker:INFO  - StEemcRaw::copyRawData() 6 data bloks copied
97_test (92, root6, gcc485).txt-2024-03-20T15:14:54.6058403Z StEmcRawMaker:INFO  - StEemcRaw::headersAreSick() --> Nfibers=54

This makes me conclude that we are just seeing false alarms when detectors were included in the run, but detectors were down. This should be treated in the physics data analysis QA to some extent already. In principle, it would be nice to have those additional errors silenced for the cases when we already know that ESMD or ETOW are not found.

from star-sw.

veprbl avatar veprbl commented on August 15, 2024

ok... If you can help me to understand any way that I can be of assistance in this, I'll certainly try. I don't understand your first two links there, sorry. (Do I get to see the error messages about headers somehow through those links?)

Oh, sorry, I could have explained more. For the first link, you need to expand the collapsed logs behind Run TEST_CMD= and look for the error message. The second link is just pointer to a file that has a mapping from test id like (123) to an actual DAQ file and the chain options.

from star-sw.

veprbl avatar veprbl commented on August 15, 2024

I'm seeing there are two implementations of the check

Bool_t StEemcRaw::headersAreSick(StEEMCReader *eeReader, StEmcRawData *raw, int token, int runId, int time)

and that one, looks like it has some check for if the bank is in
LOG_INFO << "StEemcRaw::headersAreSick() --> Nfibers=" << mDb->getNFiber() << endm;
LOG_INFO << "StEemcRaw::headersAreSick() --> isEemcBankIn('T')=" << eeReader->isEemcBankIn('T') << endm;
LOG_INFO << "StEemcRaw::headersAreSick() --> isEemcBankIn('S')=" << eeReader->isEemcBankIn('S') << endm;
for(icr=0;icr<mDb->getNFiber();icr++)
{
const EEmcDbCrate *fiber=mDb-> getFiber(icr);
if(!fiber->useIt)
continue; // drop masked out crates
if (fiber->type == 'T' && !eeReader->isEemcBankIn('T')) continue;
if (fiber->type == 'S' && !eeReader->isEemcBankIn('S')) continue;

This was introduced in a5acc06

But what we are seeing is a similar function

Bool_t StEemcRaw::headersAreSick(StEmcRawMaker* maker, StEmcRawData *raw, int token, int runId, int time)

which originally had something
LOG_INFO << "StEemcRaw::headersAreSick() --> Nfibers=" << mDb->getNFiber() << endm;
//LOG_INFO << "StEemcRaw::headersAreSick() --> isEemcBankIn('T')=" << etow << endm;
//LOG_INFO << "StEemcRaw::headersAreSick() --> isEemcBankIn('S')=" << esmd << endm;
for(icr=0;icr<mDb->getNFiber();icr++)
{
const EEmcDbCrate *fiber=mDb-> getFiber(icr);
if(!fiber->useIt)
continue; // drop masked out crates
//if (fiber->type == 'T' && !etow) continue;
//if (fiber->type == 'S' && !esmd) continue;

but it's been commented for ~20 years

I don't know the context on why there are code paths going through the StEmcRawMaker vs StEEMCReader.

from star-sw.

klendathu2k avatar klendathu2k commented on August 15, 2024

from star-sw.

fgeurts avatar fgeurts commented on August 15, 2024

If I understand Gerard correctly, this is what a warning message looks like when a crate has an issue. So, presuming this is the case, what is wrong with using this warning?

Or, is the issue that we know there should be no crate issues, and the software still gives us the message.

In the former case, I propose we close this issue unless there is an objection to using this mechanism to warn about a potential data-integrity/crate issue. In the latter case, it would be good to have EMC experts fix this software bug.

Consensus is that this is expected behavior.

from star-sw.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.