arpa-simc / arkimet

A set of tools to organize, archive and distribute data files.

License: Other

Topics: meteorology, archiving, grib, grib2, bufr, hdf5

arkimet's Introduction


Arkimet

Introduction

Arkimet is a set of tools to organize, archive and distribute data files. It currently supports data in GRIB, BUFR, HDF5 and VM2 formats.

Arkimet manages a set of datasets, each of which contains homogeneous data stored in segments. It exploits the commonalities between the data in a dataset to implement a fast, powerful and space-efficient indexing system.

When data is ingested into arkimet, it is scanned and annotated with metadata, such as reference time and product information, then it is dispatched to one of the datasets configured in the system.

Datasets can be queried with a comprehensive query language, both locally and over the network using an HTTP-based protocol.

Old data can be archived using an automatic maintenance procedure, and archives can be taken offline and back online at will.

A summary of offline data is kept online, so that arkimet is able to report that more data for a query would be available but is currently offline.

Arkimet is Free Software, licensed under the GNU General Public License version 2 or later.

Arkimet documentation: https://arpa-simc.github.io/arkimet/

Installing arkimet

Arkimet is already packaged in .rpm format.

For CentOS and Fedora, rpm files are hosted in a copr repo: https://copr.fedorainfracloud.org/coprs/simc/stable/

If you want to build and install arkimet yourself, you'll need to install Meson and run the following commands:

meson setup builddir && cd builddir
meson compile
meson test
meson install
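
By default Meson installs under its standard prefix (usually /usr/local); to install somewhere else, the prefix can be set when configuring the build, for example (a minimal sketch, adjust the path to your system):

meson setup builddir --prefix=/opt/arkimet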

If you're familiar with .rpm and .deb packaging you'll find the packaging files in the debian and fedora directories.

Features

General

All arkimet functionality besides metadata extraction and dataset recovery is file format agnostic.

Data is treated as an opaque, read-only binary string that is never modified, to guarantee integrity.

Data files in the archive are only accessed using append operations, to avoid the risk of accidentally corrupting existing data.

Metadata

The extraction of metadata is very flexible, and it can be customized with Python scripts.

Metadata contains timestamped annotations to track data workflow.

Metadata can be summarised, to represent what data can be found in a big dataset without needing to access its contents.

Summaries can be shared to build data catalogs.

Remote access

Remote data access is provided through arki-server, an HTTP server application.

arki-server can serve data from local datasets, as well as from remote datasets served by other arki-server instances (this allows, for example, providing a single external arki-server front-end to various internal arki-servers in an organisation).

arki-server can be run behind apache mod-proxy to provide encrypted (SSL) or authenticated access.
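
As a sketch only, a reverse-proxy setup along these lines could be used; it assumes mod_proxy and mod_ssl are enabled, arki-server is listening on localhost port 8080, and the hostname and certificate paths are placeholders:

<VirtualHost *:443>
    ServerName arkimet.example.org
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/arkimet.example.org.pem
    SSLCertificateKeyFile /etc/ssl/private/arkimet.example.org.key

    # Forward all requests to the local arki-server instance
    ProxyPass        / http://127.0.0.1:8080/
    ProxyPassReverse / http://127.0.0.1:8080/
</VirtualHost>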

Client data access is done using the featureful libCURL, and can access the server over SSL or through HTTP proxies.

When performing a query, it is possible to extract only the summary of its results, as a quick preview before actually transferring the result data.

Postprocessing chains can be provided by the server to transfer only the postprocessed data (e.g. transferring an average value instead of a large grid of data).
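
For example, a query that asks the server to run a postprocessor on its results could look like the following (the postprocessor name and arguments are placeholders; the available chains depend on the server configuration):

$ arki-query --postproc="subarea 10 40 15 45" 'reftime:=today 00:00' http://host.name/dataset/name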

Archive

File layout can be customised depending on data volumes (one file per day, one file per month, etc.)

Each dataset can be configured to index a different set of metadata items, to provide the best tradeoff between indexing speed, disk space used by the index and query speed.

Arkimet can detect if a datum already exists in a dataset, and either replace the old version or refuse to import the new one. It is possible to customize what metadata fields make data unique in each dataset.
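
As a sketch, a dataset configuration along these lines declares the segment step, the metadata to index, the fields that make a datum unique, and whether duplicates should replace the old version (the values are illustrative):

type = ondisk2
step = daily
filter = area:VM2,
index = reftime, area, product
unique = reftime, area, product
replace = yes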

Datasets are self-contained, so it is possible to store them in offline media, and query them right away as soon as the offline media comes online.

User interfaces

A powerful and flexible suite of command-line tools makes it easy to integrate arkimet into automated data processing chains in production systems.

arki-server not only allows remote access to the datasets, but it also provides a low-level, web-based query interface.

ArkiWEB is a web-based front-end to arkimet that provides simple and powerful browsing and data retrieval for end users.

Factsheet

The metadata currently supported are:

  • Reference time
  • Origin
  • Product
  • Level
  • Time Range
  • Area
  • Ensemble run information

The formats used to handle metadata are:

  • Arkimet-specific compact binary representation
  • Human-readable, easy to parse YAML representation
  • JSON

The file formats that can be scanned are:

  • WMO GRIB (edition 1 and edition 2); any template can be supported via ecCodes and Python scripts
  • WMO BUFR (edition 3 and edition 4), via DB-All.e
  • HDF5, currently only used for ODIM radar data
  • VM2, an ASCII line-based format used by ARPAE-SIMC

The software architecture is mostly format-agnostic, and is built to support implementing more data formats easily.

The query language supports:

  • Exact or partial match on any type of metadata
  • Advanced matching of reference times:
# Open or close intervals
reftime:>=2005
reftime:>=2005,<2007
# Exact years, months, days, hours, etc.
reftime:=2007
reftime:=2007-05
# Time extremes can vary in precision
reftime:>=2005-06-01 12:00
# Intervals of the time of day can also be specified
reftime:>=12:00,<18:00
# And repetitions during the day
reftime:>=2007-06/6h
reftime:=2007,>=12:00/30m,<18:00
  • Geospatial queries
area: bbox covers POINT(11 44)
area: bbox intersects LINESTRING(10 43, 10 45, 12 45, 12 43, 10 43)
# It is possible to configure aliases for query fragments, to use in common queries:
area: italy
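
Matchers of different types can be combined in a single query expression by separating them with semicolons, for example:

origin:GRIB1,98; reftime:>=2005,<2007; area: bbox covers POINT(11 44)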

Contact and copyright information

The author of arkimet is Enrico Zini [email protected]

Arkimet is Copyright (C) 2005-2022 ARPAE-SIMC [email protected]

Arkimet is Free Software, licensed under the terms of the GNU General Public License version 2.

Software dependencies

Arkimet should build and run on any reasonably recent Unix system, and has been tested on Debian, RedHat, Suse Linux and AIX systems.

Python version 3.6 or later is required for the import scripts: https://www.python.org/

SQLite is required for indexed datasets (http://www.sqlite.org/); an embedded version is included with arkimet and is used if the system does not provide one.

The fast LZO compression library is required for saving space-efficient metadata, and an embedded version is included with arkimet and used if the system does not provide one.

CURL is optionally required to access remote datasets http://curl.haxx.se/

The ECMWF GRIB API is optionally required for GRIB (edition 1 and 2) support: http://www.ecmwf.int/products/data/software/gribapi.html

DB-All.e version 6.0 or later is optionally required to enable BUFR support.

Examples

To extract metadata:

$ arki-scan --yaml --annotate file.grib

To create a summary of the content of many GRIB files:

$ arki-scan --summary *.grib > summary

To view the contents of the generated summary file:

$ arki-dump --yaml summary

To select GRIB messages from a file:

$ arki-scan file.grib | arki-grep --data 'origin:GRIB1,98;reftime:>=2008' > file1.grib

To configure a dataset interactively:

$ arki-config datasets/name

To dispatch grib messages into datasets:

$ arki-mergeconf datasets/* > allowed-datasets
$ arki-scan --dispatch=allowed-datasets file.grib > dispatched.md

To query:

$ arki-query 'origin:GRIB1,98;reftime:>=2008' datasets/* --data > output.grib

To get a preview of the results without actually retrieving the data:

$ arki-query 'origin:GRIB1,98;reftime:>=2008' datasets/* --summary --yaml --annotate

To export some datasets for remote access:

$ arki-mergeconf datasets/* > datasets.conf
$ arki-server --port=8080 datasets.conf

To query a remote dataset:

$ arki-mergeconf http://host.name > datasets.conf
$ arki-query 'origin:GRIB1,98;reftime:>=2008' datasets.conf --data

arkimet's People

Contributors

brancomat, dcesari, edigiacomo, mnuccioarpae, spanezz


arkimet's Issues

arki-query --data occasionally does not find files in the dataset

$ yum info arkimet | grep 'Version\|Release'
Version     : 1.0
Release     : 7

$ arki-query --data "reftime: >=2016-07-16, <=2016-07-21; product: t or u or v or pr; level: laypbl40" --debug --verbose /arkivio/arkimet/dataset/lamaz > /tmp/aq.grib 
arkimet 1.0, compiled on May  2 2016 17:25:27
Copyright (C) 2007-2010 ARPA Emilia Romagna.
arkimet comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version.
Processing /arkivio/arkimet/dataset/lamaz...
Running query SELECT m.id, m.format, m.file, m.offset, m.size, m.notes, m.reftime, m.uniq, m.other FROM md AS m WHERE (reftime>='2016-07-16 00:00:00' AND reftime<='2016-07-21 23:59:59') AND uniq IN (SELECT id FROM mduniq WHERE product IN(15,17,19,20) AND level IN(33,34,35,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56)) ORDER BY m.reftime
file ./2016/07-16.grib not found

$ echo $?
1

The same thing happens:

  • with other datasets
  • when shifting the time interval (e.g. from 15 to 20 July)

The problem no longer occurs when reducing the number of days or the number of products. It almost looks as if it happens once a certain amount of data is exceeded.

Moreover, if the query is made over HTTP, the error message is not sent to the client and the process exits with status 0.

arki-query --data prints headers at the end of the file

arkimet-1.0-4

The headers are printed at the end of the response with --data only.

$ arki-query --data 'reftime:>= today 00:00' http://arkioss4.metarpa:8090/dataset/agrmet | tail 
201603151800,7357,80,10.4,,,000000000
201603151800,7357,81,1050,,,000000000
201603151800,7357,82,1029,,,000000000
201603151800,7357,99,1031,,,000000000
HTTP/1.0 200 OK
Date: Tue, 15 Mar 2016 18:42:19 GMT
Server: arkimet/1.0
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=agrmet.bin

I ran arki-server under gdb; it seems that data_start_hook is not fired. I'm in a hurry, tomorrow I'll try to run some more tests.

@brancomat I think that tomorrow arkioss4.metarpa should be downgraded to the previous version.

Test fails: arki_scan_grib.10

$ eatmydata make check -C arki TEST_WHITELIST="arki_scan_grib.10"
GRIB_API ERROR   :  Unable to find template gridDefinitionSection from grib2/template.3.32768.def 
arki_scan_grib.10: metadata does not contain a timerange, but we expected: "Timedef(0s,254,0s)"
  scan/grib-tut.cc:498:actual(md.md).contains("timerange", "Timedef(0s,254,0s)") [Sample: inbound/cosmo/anist_1.grib2]

grib_api is not able to open test/data/cosmo/anist_1.grib2. Either grib_api needs to be fixed, or some of the files in test/data/cosmo are not relevant anymore and the test should be adjusted.

When importing, silently discard data older than `delete age`

Data older than `delete age` that is imported into a dataset would be deleted immediately at the first dataset repack. Discarding it at import time avoids that step, while staying within the idea of `delete age`: to keep all the data in the dataset up to that age, and no more.
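
For reference, a minimal sketch of where this would live, assuming `delete age` is set in the dataset config in the same `key = value` form as the `archive age` option shown elsewhere on this page (the value is illustrative):

type = ondisk2
step = daily
delete age = 30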

testAcquire not implemented for ondisk2 datasets

BUFR and conf files attached; observed on arkimet-1.0-6.x86_64

 [arkimet@arkimet4 SAVESOUND]$ arki-scan --testdispatch=conf --dump --status n1SMRPISOUNDLMM.2016041700.bufr
Message BLOB(bufr,/autofs/scarola/arkimet/SAVESOUND/n1SMRPISOUNDLMM.2016041700.bufr:0+2068): acquire to lami_i2_ope_temp dataset
import FAILED: testAcquire not implemented for ondisk2 datasets
/autofs/scarola/arkimet/SAVESOUND/n1SMRPISOUNDLMM.2016041700.bufr failed: testAcquire not implemented for ondisk2 datasets

data.zip

Test fails: arki_scan_grib.7

$ eatmydata make check -C arki TEST_WHITELIST="arki_scan_grib.7"
[...]
arki_scan_grib.7: product 'GRIB2(00250, 002, 000, 000, 005, 000)' is different than the expected 'GRIB2(00250, 002, 000, 000)'
  scan/grib-tut.cc:365:actual(md).contains("product", "GRIB2(250, 2, 0, 0)")

Something changed in grib_api when parsing test/data/cleps_pf16_HighPriority.grib2, and I do not know what the expected value of product should be in this case.

--disable-geos does not work

When configuring the arkimet build with ./configure --disable-geos I get this compilation error:

libtool: compile:  g++ -DHAVE_CONFIG_H -I.. -I.. -L/usr/lib -DCONF_DIR=\"/usr/local/etc/arkimet\" -DPOSTPROC_DIR=\"/usr/local/lib/arkimet\" -Werror -Wall -g -O2 -std=c++11 -MT libarkimet_la-bbox.lo -MD -MP -MF .deps/libarkimet_la-bbox.Tpo -c bbox.cc  -fPIC -DPIC -o .libs/libarkimet_la-bbox.o
bbox.cc: In member function 'virtual std::unique_ptr<dummygeos::DummyGeos> arki::BBox::operator()(const arki::types::Area&) const':
bbox.cc:184:43: error: call of overloaded 'unique_ptr(int)' is ambiguous
  return unique_ptr<ARKI_GEOS_GEOMETRY>(0);
                                           ^
bbox.cc:184:43: note: candidates are:
In file included from /usr/include/c++/4.8.3/memory:81:0,
                 from ../arki/types/area.h:4,
                 from ../arki/bbox.h:26,
                 from bbox.cc:2:
/usr/include/c++/4.8.3/bits/unique_ptr.h:273:7: note: std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = dummygeos::DummyGeos; _Dp = std::default_delete<dummygeos::DummyGeos>] <deleted>
       unique_ptr(const unique_ptr&) = delete;
       ^
/usr/include/c++/4.8.3/bits/unique_ptr.h:160:7: note: std::unique_ptr<_Tp, _Dp>::unique_ptr(std::unique_ptr<_Tp, _Dp>&&) [with _Tp = dummygeos::DummyGeos; _Dp = std::default_delete<dummygeos::DummyGeos>]
       unique_ptr(unique_ptr&& __u) noexcept
       ^
/usr/include/c++/4.8.3/bits/unique_ptr.h:157:17: note: constexpr std::unique_ptr<_Tp, _Dp>::unique_ptr(std::nullptr_t) [with _Tp = dummygeos::DummyGeos; _Dp = std::default_delete<dummygeos::DummyGeos>; std::nullptr_t = std::nullptr_t]
       constexpr unique_ptr(nullptr_t) noexcept : unique_ptr() { }
                 ^
/usr/include/c++/4.8.3/bits/unique_ptr.h:141:7: note: std::unique_ptr<_Tp, _Dp>::unique_ptr(std::unique_ptr<_Tp, _Dp>::pointer) [with _Tp = dummygeos::DummyGeos; _Dp = std::default_delete<dummygeos::DummyGeos>; std::unique_ptr<_Tp, _Dp>::pointer = dummygeos::DummyGeos*]
       unique_ptr(pointer __p) noexcept
       ^
bbox.cc:186:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
bbox.cc: At global scope:
bbox.cc:95:39: error: 'std::vector<std::pair<double, double> > arki::bbox(lua_State*)' defined but not used [-Werror=unused-function]
 static vector< pair<double, double> > bbox(lua_State* L)
                                       ^

arki-check --remove doesn't remove anything

With arkimet-1.0-15, the --remove option does nothing.

Creating and populating the dataset

$ mkdir -p /tmp/dataset/test/2016
$ cat > /tmp/dataset/test/config <<EOF
type = ondisk2
filter = area:VM2,
step = daily
unique = reftime, area, product
EOF
$ echo "201610050000,12626,139,70,,,000000000" > /tmp/dataset/test/2016/10-05.vm2
$ arki-check -f /tmp/dataset/test/
test:2016/10-05.vm2: segment found on disk but not in index
test:2016/10-05.vm2: rescanned
test: check 0 files ok, 1 file rescanned
$ arki-query --summary --summary-restrict=reftime --dump '' /tmp/dataset/test/
SummaryItem:
SummaryStats:
  Count: 1
  Size: 37
  Reftime: 2016-10-05T00:00:00Z

Creating the metadata file

$ arki-query '' /tmp/dataset/test/ > /tmp/todelete.md
$ arki-dump /tmp/todelete.md
Source: BLOB(vm2,/tmp/tmp/dataset/test/2016/10-05.vm2:0+37)
Product: VM2(139, bcode=B13003, l1=2000, lt1=103, p1=0, p2=0, tr=254, unit=%)
Reftime: 2016-10-05T00:00:00Z
Area: VM2(12626,lat=4452250, lon=1251139, rep=boa)
Note: [2016-10-05T14:13:04Z]Scanned from /tmp/dataset/test/2016/10-05.vm2

Trying to delete the data, without success.

$ arki-check --remove=/tmp/todelete.md --debug --verbose /tmp/dataset/test/
arkimet 1.0, compiled on Sep 26 2016 10:41:09
Copyright (C) 2007-2010 ARPA Emilia Romagna.
arkimet comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version.
$ arki-query --summary --summary-restrict=reftime --dump '' /tmp/dataset/test/
SummaryItem:
SummaryStats:
  Count: 1
  Size: 37
  Reftime: 2016-10-05T00:00:00Z

Using --fix, same result.

$ arki-check --fix --remove=/tmp/todelete.md --debug --verbose /tmp/dataset/test/
arkimet 1.0, compiled on Sep 26 2016 10:41:09
Copyright (C) 2007-2010 ARPA Emilia Romagna.
arkimet comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version.
$ arki-query --summary --summary-restrict=reftime --dump '' /tmp/dataset/test/
SummaryItem:
SummaryStats:
  Count: 1
  Size: 37
  Reftime: 2016-10-05T00:00:00Z

test fails: arki_scan_grib.8

$ eatmydata make check -C arki TEST_WHITELIST="arki_scan_grib.8"
[...]
GRIB_API ERROR   :  Unable to find template gridDefinitionSection from grib2/template.3.32768.def 
arki_scan_grib.8: metadata does not contain a origin, but we expected: "GRIB2(00200, 00000, 000, 000, 203)"
  scan/grib-tut.cc:385:actual(md).contains("origin", "GRIB2(00200, 00000, 000, 000, 203)")

grib_api cannot open test/data/calmety_20110215.grib2 anymore: I do not know if grib_api should be modified, or if this kind of file is not relevant anymore and the test should just be removed.

arki-query --summary with values grouped by metadata type

(The title is misleading, but I really cannot find a better way to summarise this)

It would be useful to have an arki-query option that lists all the values for each metadata type.

For example:

$ arki-query --dump --group-by-type QUERY DATASET

SummaryItem:
    Origin:
        GRIB1(098, 000, 145)
    Product:
         GRIB1(098, 128, 187)
         GRIB1(098, 228, 024)
         ...
    Level:
         GRIB1(001)
     ...
SummaryStats:
    Count: 124
    Size: 24404
    Reftime: 2015-12-14T00:00:00Z to 2015-12-15T00:00:00Z

probably off-by-one bug in importing OR repacking data (maybe already fixed)

I'll switch to Italian, since this is already complicated enough:

  • the gts_bufr and gts_synop datasets have errors like

    BUFR validation failed: buffer does not end with '7777'
    

    and apparently one byte is missing at the end
    (the datasets have been copied and are accessible from the development server at /autofs/scratch2/dbranchini/dataset_gts)

  • these errors would not have passed an arki-scan, so the problem presumably lies downstream of data acquisition and preparation

  • the most recent of these errors in the gts_temp dataset dates back to 10/04/2016:

    Cannot parse BUFR message #1225: 2016/04-10.bufr:8871404+992: section 5 does not contain '7777' (0b inside section End section) at offset 8871404.
    
    • this is the sequence of updates on the arkimet server:
    1.0.5 since 05/11/2016
    1.0.6 since 11/04/2015
    1.0.7 since 03/05/2016
    

    so IN THEORY (unless a repack problem is involved) the problem has not shown up again since 1.0.7

Desirable activities, in order:

  • identify the problem and rule out that it is still present
  • diagnose the affected data and repair it
  • figure out when it was introduced (bonus)

make error (cpp11 branch)

Given that configure looks sane and the spec runs it without options (in the old version there was a --with-wibble=embedded, which I removed), this is the error:

libtool: compile:  g++ -DHAVE_CONFIG_H -I.. -I.. -L/usr/lib -DCONF_DIR=\"/etc/arkimet\" -DPOSTPROC_DIR=\"/usr/lib64/arkimet\" -I/usr/include -Werror -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -std=c++11 -c wibble/sys/childprocess.cpp -o wibble/sys/libarkimet_la-childprocess.o
wibble/sys/childprocess.cpp: In member function 'void wibble::sys::ChildProcess::waitError()':
wibble/sys/childprocess.cpp:160:15: error: 'runtime_error' is not a member of 'std'
         throw std::runtime_error("system call interrupted while waiting for child termination");
               ^
wibble/sys/childprocess.cpp: In member function 'void wibble::sys::ChildProcess::waitForSuccess()':
wibble/sys/childprocess.cpp:203:19: error: 'runtime_error' is not a member of 'std'
             throw std::runtime_error(buf);
                   ^
wibble/sys/childprocess.cpp:212:15: error: 'runtime_error' is not a member of 'std'
         throw std::runtime_error(buf);
               ^
wibble/sys/childprocess.cpp:214:11: error: 'runtime_error' is not a member of 'std'
     throw std::runtime_error("error waiting for subprocess.");
           ^
wibble/sys/childprocess.cpp: In member function 'void wibble::sys::ChildProcess::kill(int)':
wibble/sys/childprocess.cpp:220:15: error: 'runtime_error' is not a member of 'std'
         throw std::runtime_error("cannot kill child process: child process has not been started");
               ^
make[4]: *** [wibble/sys/libarkimet_la-childprocess.lo] Error 1

arki-check -r does not repack VM2 files

test dataset in arkimet@arkioss4:~/ds/agrmet

[arkimet@arkioss4 agrmet]$ arki-check .
.: 2015/01-02.vm2 should be packed
.: 2015/01-03.vm2 should be packed
.: 2 files should be packed.

[arkimet@arkioss4 agrmet]$ arki-check -f -r .
.: packed 2015/01-02.vm2 (0 saved)
.: packed 2015/01-03.vm2 (0 saved)
.: 2 files packed, 585728 total bytes freed.

[arkimet@arkioss4 agrmet]$ arki-check .
.: 2015/01-02.vm2 should be packed
.: 2015/01-03.vm2 should be packed
.: 2 files should be packed.

arkimet/server not installed

Arkimet rpm package was successfully built, including tests, but when starting the python arki-server I get the error:

Traceback (most recent call last):
File "/usr/bin/arki-server", line 12, in
from arkimet.server import views
ImportError: No module named 'arkimet.server'

Indeed, the /usr/lib/python3.x/site-packages/arkimet/server directory is not installed; is this a makefile problem or a spec problem?

Is sysconfdir relocatable from environment?

runtime/config.h and runtime/config.cc seem to allow overriding the sysconfdir path hardcoded in the CONF_DIR macro with an environment variable, but it is not clear which variable. Is this possible, and how (mainly for arki-query)?

Monthly summary caches are not correct

Each file .summaries/YYYY-mm.summary contains the summary of the previous months.

The bug is in the implementation of Contents::summaryForAll, a PR is ready.

Do we need to support different data formats in the same dataset?

At the moment, arkimet theoretically supports having data in multiple formats in the same dataset, although this is not being tested.

Unless I missed something, in actual use, only one data type is ever stored in a dataset. There is a dataset full of GRIB data, one full of BUFR data, and no dataset with both GRIB and BUFR at the same time.

Can I make this official, add the data type to the dataset configuration, and enforce that only data of that type ever enters the dataset? This would also make the dataset implementation a little simpler.

1/50 tests failed

On CentOS 7 64-bit the following tests fail; are they a problem?

arki_scan_nc.1: actual value 0 is not true
  scan/nc-tut.cc:27:actual(scanner.next(md)).istrue()

arki_dataset_http_inbound.testdispatch: St13runtime_error: testAcquire
not implemented for simple datasets
  dataset/http/inbound-test.cc:141:f.do_testdispatch(r)

arki_dataset_checker_grib_simple_plain.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:2007/10.grib: should
be rescanned
  testds:2007/10.grib extra info: validation failed at
BLOB(grib,/tmp/tmp.u94pvhnS19/testds/2007/10.grib:0+2234): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_grib_simple_sqlite.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:2007/10.grib: should
be rescanned
  testds:2007/10.grib extra info: validation failed at
BLOB(grib,/tmp/tmp.u94pvhnS19/testds/2007/10.grib:0+2234): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_bufr_simple_plain.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2005.bufr: should
be rescanned
  testds:20/2005.bufr extra info: validation failed at
BLOB(bufr,/tmp/tmp.u94pvhnS19/testds/20/2005.bufr:0+194): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_bufr_simple_sqlite.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2005.bufr: should
be rescanned
  testds:20/2005.bufr extra info: validation failed at
BLOB(bufr,/tmp/tmp.u94pvhnS19/testds/20/2005.bufr:0+194): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_vm2_simple_plain.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2011.vm2: should be
rescanned
  testds:20/2011.vm2 extra info: validation failed at
BLOB(vm2,/tmp/tmp.u94pvhnS19/testds/20/2011.vm2:0+33): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_vm2_simple_sqlite.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2011.vm2: should be
rescanned
  testds:20/2011.vm2 extra info: validation failed at
BLOB(vm2,/tmp/tmp.u94pvhnS19/testds/20/2011.vm2:0+33): metadata to
validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_odim_ondisk2.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2000.odimh5: should
be rescanned
  testds:20/2000.odimh5 extra info: validation failed at
BLOB(odimh5,/tmp/tmp.u94pvhnS19/testds/20/2000.odimh5:0+320696): cannot
open file 000000.odimh5: No such file or directory

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_odim_simple_plain.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2000.odimh5: should
be rescanned
  testds:20/2000.odimh5 extra info: validation failed at
BLOB(odimh5,/tmp/tmp.u94pvhnS19/testds/20/2000.odimh5:0+320696):
metadata to validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

arki_dataset_checker_odim_simple_sqlite.scan_corrupted: 2 mismatches in
maintenance results:
  segment output not matched: rescanned on testds:20/2000.odimh5: should
be rescanned
  testds:20/2000.odimh5 extra info: validation failed at
BLOB(odimh5,/tmp/tmp.u94pvhnS19/testds/20/2000.odimh5:0+320696):
metadata to validate does not appear to be from this segment

  dataset/tests.cc:861:reporter.check(expected)
  dataset-checker-test.cc:288:actual(checker.get()).check(e, false, false)

16/4128 tests failed
Failed with result 0
make[4]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/arki'
make[3]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/arki'
make[2]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/arki'
make[1]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/arki'
Making check in src
make[1]: Entering directory `/root/rpmbuild/BUILD/arkimet-1.0-5/src'
make  test-src
make[2]: Entering directory `/root/rpmbuild/BUILD/arkimet-1.0-5/src'
g++ -DHAVE_CONFIG_H   -I.. -I..   -L/usr/lib
-DSERVER_DIR=\"/usr/share/arkimet/server\" -Wall -Werror -I/usr/include
  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
  -m64 -mtune=generic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches   -m64 -mtune=generic -std=c++11 -c -o
arki-server-tut.o arki-server-tut.cc
/bin/sh ../libtool  --tag=CXX   --mode=link g++  -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic
-std=c++11  -Wl,-z,relro  -o test-src arki-server-tut.o
../arki/utils/tests.o ../arki/types/tests.o ../arki/tests/tut-main.o
../arki/libarkimet.la  -ldballe -lwreport -llua -lm -ldl -lsqlite3
-I/usr/include -lsqlite3 -lgrib_api -ljasper -lm -llua -lm -ldl   -lz
 -lpthread -L/usr/lib64 -lgeos -lcurl   -llzo2 -lreadline
libtool: link: g++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -std=c++11 -Wl,-z -Wl,relro -o
test-src arki-server-tut.o ../arki/utils/tests.o ../arki/types/tests.o
../arki/tests/tut-main.o -I/usr/include  ../arki/.libs/libarkimet.a
-L/usr/lib64 -L/usr/lib -L/usr/lib64/mysql /usr/lib64/libmeteo-vm2.so
-lhdf5 -lhdf5_hl -lnetcdf_c++ /usr/lib64/libdballe.so -lmysqlclient
-lssl -lcrypto -lodbc -lpopt /usr/lib64/libwreport.so -lsqlite3
/usr/lib64/libgrib_api.so -lpng -lopenjpeg -lnetcdf -ljasper -llua -lm
-ldl -lz -lpthread -lgeos -lcurl -llzo2 -lreadline
make[2]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/src'
make  check-local
make[2]: Entering directory `/root/rpmbuild/BUILD/arkimet-1.0-5/src'
for test in test-arki-server test-arki-scan test-arki-query
test-arki-xargs run-test-src; do \
        ./$test ; \
done
Testing arki-server ...............while Performing query at
http://localhost:7117/dataset/test200/query: A libcurl function was
given a bad argument(�����)
Failed: arki-query --postproc=checkfiles --postproc-data=/dev/null ''
http://localhost:7117/dataset/test200
while Performing query at http://localhost:7117/dataset/test200/query: A
libcurl function was given a bad argument(�H���)
Failed: arki-query --postproc=checkfiles --postproc-data=/dev/null
--postproc-data=/dev/null '' http://localhost:7117/dataset/test200
...20 tests, 18 ok, 2 fail
Testing arki-scan..2 tests, 2 ok, 0 fail
Testing arki-query.1 tests, 1 ok, 0 fail
Testing arki-xargscannot run /tmp/tmp.QVeH6NHaVp/script.sh: process
returned exit status 256
Failed: arki-query --inline '' test200 | arki-xargs -n 1 --
/tmp/tmp.QVeH6NHaVp/script.sh
+ test GRIB = GRIB1
cannot run /tmp/tmp.QVeH6NHaVp/script.sh: process returned exit status 256
Failed: arki-query --inline '' test200 | arki-xargs -s 8000 --
/tmp/tmp.QVeH6NHaVp/script.sh
2 tests, 0 ok, 2 fail
Running  test-src
arki_server: ..........................................x.......

arki_server.8: metadata item
'BLOB(grib,http://localhost:7117/tmp/tmp.WBhm224opE/test200/2007/07-08.grib:0+7218)'
is not a arki::types::source::URL

arki-server-tut.cc:209:actual_type(mdc[0].source()).is_source_url("grib", "http://localhost:7117/dataset/test200/query")

1/50 tests failed
make[2]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-5/src'

*** No rule to make target 'matcher/reftime/reftime-lex.cc', needed by 'distdir'. Stop.

On the master and cpp11 branches:
autoreconf -if
./configure
make dist
make dist-gzip am__post_remove_distdir='@:'
make[1]: Entering directory `/root/git/arkimet'
if test -d "arkimet-0.81"; then find "arkimet-0.81" -type d ! -perm -200 -exec chmod u+w {} ';' && rm -rf "arkimet-0.81" || { sleep 5 && rm -rf "arkimet-0.81"; }; else :; fi
test -d "arkimet-0.81" || mkdir "arkimet-0.81"
(cd conf && make top_distdir=../arkimet-0.81 distdir=../arkimet-0.81/conf \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[2]: Entering directory `/root/git/arkimet/conf'
make[2]: Leaving directory `/root/git/arkimet/conf'
(cd embedded && make top_distdir=../arkimet-0.81 distdir=../arkimet-0.81/embedded \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[2]: Entering directory `/root/git/arkimet/embedded'
(cd wibble/wibble && make top_distdir=../../../arkimet-0.81 distdir=../../../arkimet-0.81/embedded/wibble/wibble \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[3]: Entering directory `/root/git/arkimet/embedded/wibble/wibble'
make[3]: Leaving directory `/root/git/arkimet/embedded/wibble/wibble'
(cd sqlite && make top_distdir=../../arkimet-0.81 distdir=../../arkimet-0.81/embedded/sqlite \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[3]: Entering directory `/root/git/arkimet/embedded/sqlite'
make[3]: Leaving directory `/root/git/arkimet/embedded/sqlite'
(cd minilzo && make top_distdir=../../arkimet-0.81 distdir=../../arkimet-0.81/embedded/minilzo \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[3]: Entering directory `/root/git/arkimet/embedded/minilzo'
make[3]: Leaving directory `/root/git/arkimet/embedded/minilzo'
make[2]: Leaving directory `/root/git/arkimet/embedded'
(cd arki && make top_distdir=../arkimet-0.81 distdir=../arkimet-0.81/arki \
  am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[2]: Entering directory `/root/git/arkimet/arki'
make[2]: *** Nessuna regola per generare l'obiettivo «matcher/reftime/reftime-lex.cc», necessario per «distdir». Stop.
make[2]: Leaving directory `/root/git/arkimet/arki'
make[1]: *** [distdir] Errore 1
make[1]: Leaving directory `/root/git/arkimet'
make: *** [dist] Errore 2

Use of AssignedDataset and Source on import

When data is imported into arkimet, the metadata of what has been imported is produced as output.

At the moment, that metadata includes an AssignedDataset field containing the name of the dataset where the data was imported, while Source remains the original source of the file being imported.

I would like to change this by removing AssignedDataset and setting Source to the position the data now has in the destination dataset.

Are there any objections?

(assigning @edigiacomo for now, but I would like to assign both @edigiacomo and @brancomat)

Concurrency of arki-scan --import

Which dataset types support concurrent arki-scan --import runs?

In particular, this matters to us (me and @brancomat) for the error and duplicates datasets: we would like to assign one or more datasets to each script (no concurrency there, especially when using ad-hoc conf files), but all the scripts would write to the same error and duplicates datasets.

Test fails: arki_scan_grib.9

$ eatmydata make check -C arki TEST_WHITELIST="arki_scan_grib.9"
[...]
GRIB_API ERROR   :  Unable to find template gridDefinitionSection from grib2/template.3.32768.def 
arki_scan_grib.9: metadata does not contain a timerange, but we expected: "Timedef(0s,254,0s)"
  scan/grib-tut.cc:407:actual(md).contains("timerange", "Timedef(0s,254,0s)")

grib_api is not able to open test/data/ninfa_ana.grib2 anymore. Either grib_api needs to be fixed, or this kind of file is not relevant anymore and the test should be removed. The test later would also test test/data/ninfa_forc.grib2, for which the same might apply.

Doubt about the conversion from Lua number to integer in ValueBag

Bug #41 raised a doubt for me about the conversion from double to int in metadata set from Lua (in ValueBag::load_lua_table): currently (with Lua < 5.3, or after the #41 fix b94e206) the double is silently truncated.

This behaviour seems to have always been implicit, and the above fix kept backward compatibility so that the new version of arkimet could be packaged quickly (and also because nobody has ever complained about the truncation).

However, this should either be formalised, or we could decide that truncation/rounding is delegated to the Lua scanners; in that case, I think it would be better to use lua_tointegerx and throw an exception when the double-to-int conversion fails.

Archiving segments of sharded datasets moves files in `.archives/last/` with the wrong segment names

With arkimet-1.0-12, when repacking (and in subsequent arki-check runs) a sharded dataset with an archive age set, I get the following message, which I cannot make sense of:

test.archives.last:2016-08/08/01.vm2: segment contents do not fit inside the step of this dataset

Steps to reproduce:

$ mkdir -p /tmp/dataset/test /tmp/dataset/error
$ cat > /tmp/dataset/error/config <<EOF
type = error
step = daily
EOF
$ cat > /tmp/dataset/test/config <<EOF
filter = area: VM2,
index = reftime, area, product
replace = yes
smallfiles = yes
step = daily
type = ondisk2
shard = monthly
unique = reftime, area, product
archive age = 15
EOF
$ arki-mergeconf /tmp/dataset/error /tmp/dataset/test > /tmp/conf
$ echo 201608011400,8,99,840,,,000000000 > /tmp/test.vm2
$ arki-scan --dump --dispatch=/tmp/conf /tmp/test.vm2
Source: BLOB(vm2,/tmp/dataset/test/2016-08/08/01.vm2:0+33)
Product: VM2(99, bcode=B04196, l1=2000, l2=0, lt1=103, lt2=0, p1=0, p2=3600, tr=3, unit=mn)
Reftime: 2016-08-01T14:00:00Z
Area: VM2(8,lat=4426093, lon=1132646, rep=locali)
Value: 840,,,000000000
Note: [2016-09-09T05:52:40Z]Scanned from /tmp/test.vm2

$ arki-check -f /tmp/dataset/test
test:08/01.vm2: segment old enough to be archived
test: check 0 files ok
$ arki-check -f -r /tmp/dataset/test
test:08/01.vm2: segment old enough to be archived
test:2016-08/08/01.vm2: archived
test: repack 0 files ok, 1 file archived
test.archives.last:2016-08/08/01.vm2: segment contents do not fit inside the step of this dataset
test.archives.last: repack 0 files ok
$ arki-check /tmp/dataset/test
test: check 0 files ok
test.archives.last:2016-08/08/01.vm2: segment contents do not fit inside the step of this dataset
test.archives.last: check 0 files ok

arki-check failure on a dataset while it is being queried

On a dataset, when queries run concurrently with arki-check, we get:

database is locked. Context:     executing query PRAGMA journal_mode = TRUNCATE

Some datasets are becoming impossible to maintain without taking them offline (which is hardly acceptable): is it possible to handle concurrency between writers (arki-check) and readers (arki-query)?

Empty reply from arki-server using expa

Arkimet version: v1.0-14

$ arki-query --summary --dump --summary-restrict=reftime 'reftime:=today 00:00; product:GRIB1,200,2,2; level:GRIB1,102; timerange:GRIB1,0,0h,0h' http://maialinux.metarpa:8090/dataset/COSMO_I7 | grep Size
  Size: 186250
$ arki-query --data 'reftime:=today 00:00; product:GRIB1,200,2,2; level:GRIB1,102; timerange:GRIB1,0,0h,0h' http://maialinux.metarpa:8090/dataset/COSMO_I7 | wc -c
186250
$ echo "ds:COSMO_I7. d:@. t:0000. s:GRIB1/0/0h/0h. l:GRIB1/102. v:GRIB1/200/2/2." > expa
$ arki-query --data --qmacro="expa 2016-09-21" --file=expa http://maialinux.metarpa:8090
while Performing query at http://maialinux.metarpa:8090/query: Server returned nothing (no headers, no data)(Empty reply from server)
$ arki-query --data --qmacro="expa 2016-09-21" --file=expa http://maialinux.metarpa:8090/dataset/COSMO_I7
while Performing query at http://maialinux.metarpa:8090/query: Server returned nothing (no headers, no data)(Empty reply from server)
$ arki-query --summary --dump --qmacro="expa 2016-09-21" --file=expa http://maialinux.metarpa:8090/dataset/COSMO_I7
while Performing query at http://maialinux.metarpa:8090/summary: Server returned nothing (no headers, no data)(Empty reply from server)

server-error.log is empty.
server-access.log:

172.20.5.151 - - [21/Sep/2016:14:31:32 +0000] "GET /dataset/COSMO_I7/config HTTP/1.1" 200 -

Running the query locally:

[lm_ope@maialinux ~]$ arki-query --data --qmacro='expa 2016-09-21' --file=expa /ope/lm_ope/arkimet/COSMO_I7 | wc -c
186250

Refuse to import data older than `archive age`

Importing data older than archive age would create a segment that potentially cannot be archived, because it would have the same name as a segment already present in the archive.

A viable solution to this is to refuse to import data older than archive age.

`cat` in postprocessor hangs when called from `arki-query --postproc`

Server Fedora 16, arkimet-0.80-3162.

arki-query --postproc="subarea 10 40 15 45" 'reftime:=today 00:00; product:GRIB1,200,2,11 or GRIB1,200,2,17; level:GRIB1,105,2' /arkivio/arkimet/dataset/lmsmr4x52

hangs on the last command (cat).

The same problem with a simple cat postprocessor.

When using the postprocessor in pipe, everything is fine:

arki-query --inline 'reftime:=today 00:00; product:GRIB1,200,2,11 or GRIB1,200,2,17; level:GRIB1,105,2' /arkivio/arkimet/dataset/lmsmr4x52 | /usr/lib64/arkimet/subarea 10 40 15 45

@brancomat if the problem is already solved in a more recent version, we could wait for the server update instead of fixing this bug in the 0.80 version; what do you think?

Thread safety of metadata.cc:dataReader

metadata.cc has a dataReader global that is not thread safe, and thread access happens with MergedDatasets.

dataReader should become a thread local affair.

arki-scan --dispatch --files can't read the list of files properly

(reported by @eminguzzi, who currently has no GitHub account)

The command:
arki-scan --dispatch=config --dump --status --summary --files=import.lst
instead of importing the data contained in anai_CHI_20160206.grb, empties the import.lst file and gives the error "import.lst: cannot read"

The command:
arki-scan --dispatch=config --dump --status --summary grib:anai_CHI_20160206.grb

works correctly.

import_lst.zip

Strange -Wmisleading-indentation

In #38 I fixed some misleading indentations, but there are more that I don't understand: they look like typos (a semicolon right after an if), but git blame says they come from very old commits...

if (luaL_newmetatable(L, "arki.gridquery"));
{
...
if (sys::unlink_ifexists(summary_pathname(t.ye, t.mo)));
    deleted = True
if (luaL_newmetatable(L, "arki.summary"));
{
if (luaL_newmetatable(L, "arki.metadata"));
{

error compiling

with rpmbuild on an i386 Fedora 20 system I get:

libtool: compile: g++ -DHAVE_CONFIG_H -I.. -I.. -L/usr/lib -DCONF_DIR="/etc/arkimet" -DPOSTPROC_DIR="/usr/lib/arkimet" -I/usr/include -Werror -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -std=c++11 -c iotrace.cc -o libarkimet_la-iotrace.o
iotrace.cc: In member function 'virtual void arki::iotrace::Logger::operator()(const arki::iotrace::Event&)':
iotrace.cc:136:83: error: format '%zu' expects argument of type 'size_t', but argument 4 has type 'off_t {aka long long int}' [-Werror=format=]
fprintf(out, "%s:%zu:%zu:%s\n", e.filename().c_str(), e.offset, e.size, e.desc);
^
cc1plus: all warnings being treated as errors
make[4]: *** [libarkimet_la-iotrace.lo] Error 1
make[4]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-1/arki'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-1/arki'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-1/arki'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/rpmbuild/BUILD/arkimet-1.0-1'
make: *** [all] Error 2
errore: Stato d'uscita errato da /var/tmp/rpm-tmp.PkCPvY (%build)

arki-check not working in odimh5 archives in version 1.0 (cpp11 branch)

odimh5 archive data structure as usual:

odimSPC/.archive/odimSPC-YYYYMMDD/YYYY/MM-DD.odimh5/000xxx.odimh5

(with a .sequence empty file in each subdir)

with arkimet-1.0-1 (bf20a28)

$ rpm -q arkimet
arkimet-1.0-1.x86_64
$ arki-check -f odimSPC/
odimSPC: check 0 files ok
odimSPC.archives.last: check 0 files ok

with arkimet-0.81 (43d56a3)

$ rpm -q arkimet
arkimet-0.81-1.fc20.x86_64
$ arki-check -f odimSPC/
odimSPC: rescanned in archive odimSPC-20130101/2013/01-01.odimh5
odimSPC: rescanned in archive odimSPC-20130101/2013/01-02.odimh5
odimSPC: rescanned in archive odimSPC-20130101/2013/01-03.odimh5
odimSPC: rescanned in archive odimSPC-20130101/2013/01-04.odimh5
odimSPC: rescanned in archive odimSPC-20130101/2013/01-05.odimh5
(...)

Server startup script fails

...because pidof fails, since arki-server is now a Python script and -x does not help either. So, to cut the matter short, I drafted a systemd startup script and adapted the spec file. It works on CentOS 7; more testing is needed.

Best practices on managing offline archives

There are essentially two needs:

  • creating archive sections from data that has never been archived, without interfering with the production arki-server
  • managing and moving archive sections (what to bring along, and what is needed to bring them back online)

In particular, with reference to the comments in #17: the opportunity of having .archive/ subdirectories outside .archive/last (I did not understand the comment about MANIFEST, though; it should be recreated dynamically by arki-check)

Let me try to list the tasks:

  • a short meeting on the topic, preferably in person, to compare our current setup with the ideal situation
  • best-practice documentation, at least in the arkiguide
  • evaluate integrating/extending ad-hoc tools (like arkimet-tools by @edigiacomo) (optional)

test fails: arki_scan_any.5

$ eatmydata make check -C arki TEST_WHITELIST="arki_scan_any.5"
[...]
arki_scan_any.5: actual value 0 is not true
  scan/any-tut.cc:232:actual(scan::scan("inbound/example_1.nc", mdc.inserter_func())).istrue()

This test was added in preparation for implementing NetCDF support, which then became less of a priority. It can probably be disabled until NetCDF support becomes a priority again.

VM2 ordering

TL;DR: how can I sort VM2 data in the same way arki-check -fr does?

Having to do a very large import of VM2 data, I thought of recreating the datasets by hand.

Each dataset has a config with the following structure:

type = ondisk2
step = daily
smallfiles = yes
shard = monthly
unique = reftime, area, product
index = reftime, area, product
filter = area:VM2:rep=NAME

(NAME is the name of the network, but that is a detail; for testing, a single dataset with filter = area:VM2, is enough)

After arranging the data by hand, and before using arki-check to recreate the index, I would like to reorder the data so that a repack is not needed (the data is already unique).

Unfortunately, I cannot work out how the data is ordered. I tried a `sort -t, -n -k1,1 -k2,2 -k3,3` (sorting by the same fields as `unique`), but `arki-check -f -r` still reorders the data according to a criterion I do not understand:

$ head dataset/agrmet/2000-01/01/01.vm2
200001010000,2313,305,21,,,
200001010000,2312,305,21,,,
200001010000,2307,305,21,,,
200001010000,2301,305,21,,,
200001010000,2313,257,192,,,000000000
200001010000,2312,257,433,,,000000000
200001010000,2307,257,0,,,000000000
200001010000,2301,257,0,,,000000000
200001010000,2309,233,6.2,,,000000000
200001010000,2302,233,4.9,,,000000000
$ sort -t, -n -k1,1 -k2,2 -k3,3 dataset/agrmet/2000-01/01/01.vm2 | head
200001010000,2287,78,-4.5,,,000000000
200001010000,2289,78,-3.5,,,000000000
200001010000,2290,78,-4.7,,,000000000
200001010000,2292,78,-2.4,,,000000000
200001010000,2293,78,-.9,,,000000000
200001010000,2296,78,-2.8,,,000000000
200001010000,2297,78,1.4,,,000000000
200001010000,2298,78,1.3,,,000000000
200001010000,2299,78,.2,,,000000000
200001010000,2305,78,-5.6,,,000000000

Attached is the 01.vm2.zip file as it was repacked by arki-check -f -r (v1.0-14).

[Fedora 24] C++11 destructors default to noexcept

compiling on server "ventiquattro":

/bin/sh ../libtool  --tag=CXX   --mode=compile g++ -DHAVE_CONFIG_H   -I.. -I.. -L/usr/lib -DCONF_DIR=\"/etc/arkimet\" -DPOSTPROC_DIR=\"/usr/lib64/arkimet\"     -I/usr/include        -Werror -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -Wall  -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -c -o utils/libarkimet_la-files.lo `test -f 'utils/files.cc' || echo './'`utils/files.cc
libtool: compile:  g++ -DHAVE_CONFIG_H -I.. -I.. -L/usr/lib -DCONF_DIR=\"/etc/arkimet\" -DPOSTPROC_DIR=\"/usr/lib64/arkimet\" -I/usr/include -Werror -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -Wall -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -c utils/files.cc  -fPIC -DPIC -o utils/.libs/libarkimet_la-files.o
utils/files.cc: In destructor 'arki::utils::files::PreserveFileTimes::~PreserveFileTimes()':
utils/files.cc:166:87: error: throw will always call terminate() [-Werror=terminate]
         throw std::system_error(errno, std::system_category(), "cannot set file times");
                                                                                       ^
utils/files.cc:166:87: note: in C++11 destructors default to noexcept
cc1plus: all warnings being treated as errors
Makefile:2636: recipe for target 'utils/libarkimet_la-files.lo' failed
make[4]: *** [utils/libarkimet_la-files.lo] Error 1

Cannot compile with wibble embedded

  • wibble/config.h not found: when compiling the directory embedded/wibble/wibble, the directory itself should be included in the headers search path.
  • #ifdef POSIX in embedded/wibble/wibble/net/server.cpp fails.

In commit bd063ae I patched the Fedora package. The same patches could be applied directly to wibble.

arki-query --inline

With version 1.0.3, the command:
arki-query --inline --config=lamaz.conf --file=tmp.query.1 > tmp.out
gives the error: "no decoder found for item type ASSIGNEDATASE"

bug.zip

arki-bufr-prepare with --usn segfaults when it cannot decode the BUFR

With reference to the file IUSG11AMMC180000CCA.BIN.zip:

$ arki-bufr-prepare IUSG11AMMC180000CCA.BIN > /dev/null
IUSG11AMMC180000CCA.BIN:25: BUFR #0 failed to decode: /usr/share/wreport/B0000000000000021000.txt:863: input file is not sorted. Passing it through unmodified.
$ echo $?
0

But with the --usn option:

$ arki-bufr-prepare --usn=100 IUSG11AMMC180000CCA.BIN > /dev/null
IUSG11AMMC180000CCA.BIN:25: BUFR #0 failed to decode: /usr/share/wreport/B0000000000000021000.txt:863: input file is not sorted. Passing it through unmodified.
Segmentation fault

The segfault happens at arki-bufr-prepare.cc:97:

(gdb) r --usn=100 IUSG11AMMC180000CCA.BIN 
Starting program: /usr/bin/arki-bufr-prepare --usn=100 IUSG11AMMC180000CCA.BIN
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
IUSG11AMMC180000CCA.BIN:25: BUFR #0 failed to decode: /usr/share/wreport/B0000000000000021000.txt:863: input file is not sorted. Passing it through unmodified.

Program received signal SIGSEGV, Segmentation fault.
splitmsg (outfile=..., importer=..., msg=..., rmsg=..., this=0x7fffffffd9f0) at arki-bufr-prepare.cc:97
97          copy_base_msg(*newmsg, msg);

(gdb) bt
#0  splitmsg (outfile=..., importer=..., msg=..., rmsg=..., this=0x7fffffffd9f0) at arki-bufr-prepare.cc:97
#1  Copier::process (this=this@entry=0x7fffffffd9f0, infile=..., outfile=...) at arki-bufr-prepare.cc:181
#2  0x000000000040b3bd in process (outfile=..., filename="IUSG11AMMC180000CCA.BIN", this=0x7fffffffd9f0) at arki-bufr-prepare.cc:156
#3  main (argc=<optimized out>, argv=<optimized out>) at arki-bufr-prepare.cc:213

I think the problem is in this if/else https://github.com/ARPA-SIMC/arkimet/blob/master/src/arki-bufr-prepare.cc#L178-181: even though msg=NULL and decoded=false, the override_usn_active variable is true, so the splitmsg function is invoked (and the segfault happens inside it).

make distcheck fails

$ make distcheck
...
make[3]: Entering directory '/home/edg/src/arkimet/arkimet-1.0/_build/sub/arki'
../../../arki/buildflex ../../../arki/matcher/reftime/reftime-lex.ll
flex: could not create reftime-lex.cc
Makefile:4068: recipe for target '../../../arki/matcher/reftime/reftime-lex.h' failed
make[3]: *** [../../../arki/matcher/reftime/reftime-lex.h] Error 1
make[3]: Leaving directory '/home/edg/src/arkimet/arkimet-1.0/_build/sub/arki'
Makefile:614: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory '/home/edg/src/arkimet/arkimet-1.0/_build/sub'
Makefile:503: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/edg/src/arkimet/arkimet-1.0/_build/sub'
Makefile:820: recipe for target 'distcheck' failed
make: *** [distcheck] Error 1

Add minimal python3 bindings

I intend to implement minimal python3 bindings, functional enough to replace arki-server with a python version that uses python's HTTP implementation, and to do away with the internal http server in arkimet.

There is no need at first to access the contents of a Metadata from python, and there can just be the binding of a dataset::Reader with query methods that stream output to an integer file descriptor.

Further steps from there can be to expand the bindings to replace one at a time all the other command line tools.
