rug-compling / alpinocorpus
Library for handling Alpino corpora
License: GNU Lesser General Public License v2.1
For non-local corpora, there is currently no interface for discovering which corpora are available. For RemoteCorpusReader, this is done in Dact. It would be better to have a general interface for remote corpora.
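A general interface could be as small as the sketch below: an abstract source that lists the corpora it can provide. `CorpusSource`, `CorpusInfo` and `StaticCorpusSource` are hypothetical names, not part of alpinocorpus. A remote implementation would fill the list from the server, a local one from a directory.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of an available corpus.
struct CorpusInfo {
    std::string name;         // identifier that would be used to open the corpus
    std::string description;  // human-readable label
};

// Hypothetical discovery interface, usable for local and remote corpora alike.
class CorpusSource {
public:
    virtual ~CorpusSource() = default;
    virtual std::vector<CorpusInfo> available() const = 0;
};

// Trivial implementation backed by a fixed list (e.g. for tests).
class StaticCorpusSource : public CorpusSource {
public:
    explicit StaticCorpusSource(std::vector<CorpusInfo> corpora)
        : d_corpora(std::move(corpora)) {}
    std::vector<CorpusInfo> available() const override { return d_corpora; }

private:
    std::vector<CorpusInfo> d_corpora;
};
```

With such an interface, Dact could list corpora without knowing whether they come from RemoteCorpusReader or from local files.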
CorpusWriter.open() fails if the corpus does not exist and the argument 'overwrite' is false.
A RecursiveCorpusReader query returns the filenames of dact files as results, even for dact files that contain no matches for the query.
Output of "cmake .":
-- Boost version: 1.62.0
-- Found the following Boost libraries:
-- system
-- chrono
-- date_time
-- filesystem
-- thread
-- regex
-- atomic
Output of "make":
[ 49%] Linking CXX shared library libalpino_corpus.so
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libboost_filesystem.a(operations.o): relocation R_X86_64_PC32 against symbol `_ZN5boost6detail15sp_counted_base7destroyEv' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
I have a small C program that reads all entries (name and contents) from a set of dact files. Its memory usage keeps growing steadily. After processing almost 100,000 entries (in ten dact files), top shows the following (on machine urd):
PID USER VIRT RES SHR SWAP %MEM %CPU TIME COMMAND
23667 alfa 794m 704m 10m 89m 35.0 20 1:08 dctest
Here is the program:
#include <stdio.h>
#include <AlpinoCorpus/capi.h>
int main (int argc, char *argv [])
{
    int i;
    long count = 0;
    alpinocorpus_reader r;
    alpinocorpus_iter it;
    alpinocorpus_entry ent;
    alpinocorpus_initialize();
    for (i = 1; i < argc; i++) {
        r = alpinocorpus_open(argv[i]);
        it = alpinocorpus_entry_iter(r);
        while (alpinocorpus_iter_has_next(r, it)) {
            ent = alpinocorpus_iter_next(r, it);
            printf("%li %s\n", ++count, alpinocorpus_entry_name(ent));
            alpinocorpus_entry_contents(ent);
            alpinocorpus_entry_free(ent);
        }
        alpinocorpus_close(r);
    }
    return 0;
}
Another bug: if I swap the two includes in the program above, the compiler reports an error:
In file included from dctest.c:1:
/my/opt/alpino/include/AlpinoCorpus/capi.h:63: error: expected declaration specifiers or ‘...’ before ‘size_t’
/my/opt/alpino/include/AlpinoCorpus/capi.h:127: error: expected declaration specifiers or ‘...’ before ‘size_t’
/my/opt/alpino/include/AlpinoCorpus/capi.h:146: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘alpinocorpus_size’
make: *** [dctest] Error 1
It throws an exception as soon as I call CorpusReader::query(XPATH, "//node[@rel='su']"). alpino2db also does not work:
% ./alpino2db "//node[@rel='su']" ~/Workspace/treebanks/cdb.dact /tmp/su.dact
./alpino2db: Error: Cannot resolve container: . Container not open and auto-open is not enabled. Container may not exist.
edit: it could be related to this: http://forums.oracle.com/forums/thread.jspa?messageID=2189426
This use case is now covered by PaQu. I also think that there is no server running anymore, so keeping this code only risks confusing library users.
Since the classes are in git, we can always choose to revive it when necessary.
For the DirectoryCorpusReader this would be implementable.
Currently, Dact tries to open each argument supplied through ac::CorpusReaderFactory::open and adds each CorpusReader instance to a MultiCorpusReader.
One way to implement lazy opening of corpora would be to extend the MultiCorpusReader with an additional constructor. One could supply the paths instead of the corpora and the reader would open and close the corpora when needed. This would change the public interface of MultiCorpusReader but is probably the easiest way to implement it.
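A minimal sketch of this first approach, assuming a hypothetical `Reader` interface with a single `entries()` method and an opener callback that stands in for ac::CorpusReaderFactory::open; none of these names are alpinocorpus API. The multi-reader stores paths instead of open readers, opens each corpus just before reading it, and destroys it before moving on to the next one.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal reader interface (not the alpinocorpus CorpusReader).
struct Reader {
    virtual ~Reader() = default;
    virtual std::vector<std::string> entries() const = 0;
};

// In-memory reader used only to demonstrate the sketch.
struct VectorReader : Reader {
    explicit VectorReader(std::vector<std::string> e) : d_entries(std::move(e)) {}
    std::vector<std::string> entries() const override { return d_entries; }
    std::vector<std::string> d_entries;
};

// Stands in for ac::CorpusReaderFactory::open in this sketch.
using Opener = std::function<std::unique_ptr<Reader>(std::string const &)>;

// Hypothetical lazy multi-corpus reader: constructed from paths, not readers.
class LazyMultiReader {
public:
    LazyMultiReader(std::vector<std::string> paths, Opener open)
        : d_paths(std::move(paths)), d_open(std::move(open)) {}

    // Visit every entry. Each corpus is opened right before it is read and
    // closed when its unique_ptr goes out of scope at the end of the loop body.
    void forEachEntry(std::function<void(std::string const &)> const &visit) const {
        for (auto const &path : d_paths) {
            std::unique_ptr<Reader> reader = d_open(path);
            for (auto const &entry : reader->entries())
                visit(path + "::" + entry);
        }
    }

private:
    std::vector<std::string> d_paths;
    Opener d_open;
};
```

Note that in this design an unreadable corpus is only detected during iteration, not at construction, which is the same trade-off described for the second method.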
Another method might be to add methods to the CorpusReader interface to signal that a reader is no longer actively used. The MultiCorpusReader could then tell the previous CorpusReader that it won't query it for some time, and start querying the next one. The public interface would remain the same, but the behavior of CorpusReader instances would change slightly. Corpora would no longer be opened as soon as an instance of CorpusReader is created, and as a result the error checking and even the try-catch statements of ac::CorpusReaderFactory::open would no longer serve their purpose.
The constructor of the iterator in RemoteCorpusReader blocks until some data has been received. This is a problem when the server needs a long time to find the first match (because the server does not send headers until there is data to send).
Currently this is worked around as follows: the server immediately sends a line consisting of Ctrl-B, and the GetUrl class ignores that line. This is not elegant.
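The current workaround can be sketched as follows, assuming the response body is read line by line from a std::istream (the real GetUrl class is not shown here); the function name `readDataLines` is hypothetical. A line consisting of a single Ctrl-B character (0x02) is treated as a keep-alive marker and dropped.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect data lines, skipping keep-alive lines that
// consist of a single Ctrl-B character (ASCII 0x02).
std::vector<std::string> readDataLines(std::istream &in) {
    static std::string const keepAlive(1, '\x02');
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (line == keepAlive)
            continue;  // sent by the server only to get the headers out early
        lines.push_back(line);
    }
    return lines;
}
```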
Alternatives:
CorpusWriter.write(CorpusReader const &corpus, ...) overwrites existing entries, even if the corpus was opened with overwrite=false.
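The expected semantics can be sketched with a hypothetical in-memory writer keyed by entry name; `MemoryWriter` does not mirror the CorpusWriter interface. With overwrite set to false, writing an entry whose name already exists throws instead of replacing it; a bulk write from a CorpusReader would apply the same check to every entry it copies.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory corpus writer illustrating the overwrite flag.
class MemoryWriter {
public:
    explicit MemoryWriter(bool overwrite) : d_overwrite(overwrite) {}

    void write(std::string const &name, std::string const &contents) {
        // Refuse to replace an existing entry unless overwriting was requested.
        if (!d_overwrite && d_entries.count(name) != 0)
            throw std::runtime_error("entry already exists: " + name);
        d_entries[name] = contents;
    }

    std::string const &contents(std::string const &name) const {
        return d_entries.at(name);
    }

private:
    bool d_overwrite;
    std::map<std::string, std::string> d_entries;
};
```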
I'll remove the API documentation from the gh-pages branch: each diff is huge, and tracking its history has no added value.
As a substitute, I'll run a cron script somewhere that runs a 'git pull ; doxygen'.
We have long been considering whether to add support for XSL transformations to alpinocorpus. Until now, we decided against it, because it was cleaner to let the library user do transformations. However, this poses a problem for a future RemoteCorpusReader: it should implement the CorpusReader interface (and nothing more), but we cannot realistically expect a client to download every XML file matching a query to apply transformations (e.g. in the sentence widget of Dact).
We cannot just implement transformations in the server, because then only RemoteCorpusReader would provide this functionality.
Summary: we need library/server-side XSL transformations.
My proposal is to add a method to CorpusReader:
EntryIterator CorpusReader::queryWithStylesheet(QueryDialect d, std::string const &q,
std::string const &stylesheet, std::list<MarkerQuery> const &markerQueries) const;
If this method is called, it will return a normal EntryIterator, with an overloaded version of the ill-fated contents() method. Calling the contents() method would:
The markerQueries argument can be used to mark nodes using queries (like readMarkQueries). We could have some default behavior, where, if the argument is unspecified, it will use the query specified in the second argument to mark nodes.
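The proposed default for markerQueries can be sketched as below. The `MarkerQuery` struct here is a stand-in (a query plus the attribute name and value to set on matching nodes); its fields and the "active"/"1" defaults are assumptions for illustration, not the library's definitions.

```cpp
#include <cassert>
#include <list>
#include <string>

// Stand-in for a marker query: nodes matching `query` get attr="value".
struct MarkerQuery {
    std::string query;
    std::string attr;
    std::string value;
};

// If no marker queries were given, fall back to marking the nodes that
// match the main query (the proposed default behavior).
std::list<MarkerQuery> effectiveMarkers(std::string const &query,
                                        std::list<MarkerQuery> const &markers) {
    if (!markers.empty())
        return markers;
    return {MarkerQuery{query, "active", "1"}};
}
```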
This change would make it possible to:
Any comments?
RecursiveCorpusReader should sort filenames before returning a list of entries.
There is far too much publicly visible in CorpusReader, for instance FilterIter and StylesheetIter.
Such implementation details should be made private.
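One common way to do this is the pimpl idiom: the public header only forward-declares a private implementation struct, so helper types never appear in it. A minimal single-file sketch with hypothetical names (`PublicReader`, `Impl`, `FilterHelper`); in a real split, the first part would live in the header and the rest in the .cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// --- What would go in the public header ---
class PublicReader {
public:
    explicit PublicReader(std::vector<std::string> entries);
    ~PublicReader();  // defined below, where Impl is a complete type
    std::size_t countMatching(std::string const &substring) const;

private:
    struct Impl;      // implementation details are hidden
    std::unique_ptr<Impl> d;
};

// --- What would go in the .cpp file ---
namespace {
// Helper type that no longer leaks into the public interface.
struct FilterHelper {
    std::string needle;
    bool operator()(std::string const &s) const {
        return s.find(needle) != std::string::npos;
    }
};
}

struct PublicReader::Impl {
    std::vector<std::string> entries;
};

PublicReader::PublicReader(std::vector<std::string> entries)
    : d(new Impl{std::move(entries)}) {}

PublicReader::~PublicReader() = default;

std::size_t PublicReader::countMatching(std::string const &substring) const {
    FilterHelper matches{substring};
    std::size_t n = 0;
    for (auto const &e : d->entries)
        if (matches(e))
            ++n;
    return n;
}
```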
I can't seem to get dbxml compiled with multiple -arch arguments. My gcc compiler (i686-apple-darwin10-g++-4.2.1) exits with "g++-4.2: -E, -S, -save-temps and -M options are not allowed with multiple -arch flags" when I add "-arch i386 -arch x86_64" to CXXFLAGS/CFLAGS.
Building it just x86_64, which seems to be the default choice of my compiler, works fine. But it did require some modifications to the CMakeLists.txt (http://mirror.ikhoefgeen.nl/CMakeLists.diff)
Executing queries such as
//node[@rel='su']/@root/string()
with alpinocorpus-act results in a segmentation fault.
At the time of writing, this is in branch 'RemoteCorpus'.
When RemoteCorpusReader is asked for the list of entries without a query, and the process is interrupted and later started again, RemoteCorpusReader retrieves the entries from the server from the beginning again.
Should RemoteCorpus be changed to resume retrieving entries from the point where the previous run was cut off?
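Resuming could work roughly as follows, assuming the server could accept a starting offset (an assumption; the current protocol is not described here). The sketch keeps the entries already received and, on the next run, only requests entries from that position onward. `ResumableEntryList` and the `FetchFrom` callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Fetches entries starting at `offset`, passing each one to `sink`; returns
// false if the transfer was interrupted before the last entry was received.
using FetchFrom = std::function<bool(std::size_t offset,
                                     std::function<void(std::string const &)> const &sink)>;

class ResumableEntryList {
public:
    explicit ResumableEntryList(FetchFrom fetch) : d_fetch(std::move(fetch)) {}

    // Continue where the previous (possibly interrupted) run stopped.
    bool fetch() {
        if (d_complete)
            return true;
        d_complete = d_fetch(d_entries.size(), [this](std::string const &e) {
            d_entries.push_back(e);
        });
        return d_complete;
    }

    std::vector<std::string> const &entries() const { return d_entries; }

private:
    FetchFrom d_fetch;
    std::vector<std::string> d_entries;
    bool d_complete = false;
};
```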
We are now using libxml2 as a dependency, as well as Xerces-C and XQilla via dbxml. This has some disadvantages:
The nolibxml branch attempts to eliminate the use of libxml2.
DbCorpusReader::contents() returns an empty string.
Simple file to test:
#include <AlpinoCorpus/CorpusReader.hh>
#include <AlpinoCorpus/CorpusReaderFactory.hh>
#include <iostream>
int main(int argc, char *argv[])
{
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " corpusfile.dact" << std::endl;
        return 1;
    }
    alpinocorpus::CorpusReader *r = alpinocorpus::CorpusReaderFactory::open(argv[1]);
    for (alpinocorpus::CorpusReader::EntryIterator i =
             r->query(alpinocorpus::CorpusReader::XPATH, "//node[@root=\"fiets\"]");
         i != r->end(); i++)
        std::cout << *i << "\t" << i.contents(*r) << std::endl;
    delete r; // release the reader returned by the factory
    return 0;
}
The alpinocorpus library has experimental websockets support (besides a more REST-like remote corpus reader). There is some error handling in that reader, but some things are currently not dealt with:
In directory and compact corpora, we could support bidirectional iteration.
The obvious solution would be to add operator-- to EntryIterator, and throw an exception in iterators that do not support it.
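The suggested design can be sketched with a hypothetical iterator implementation interface: the base class provides a prev() that throws by default, and only implementations that can step backwards (here a vector-backed one standing in for directory or compact corpora) override it. `IterImplSketch`, `VectorIter`, `CountingIter` and `BidiIterator` are illustrative names, not the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical iterator implementation interface.
struct IterImplSketch {
    virtual ~IterImplSketch() = default;
    virtual std::string current() const = 0;
    virtual void next() = 0;
    // Default: backward iteration is not supported.
    virtual void prev() { throw std::logic_error("backward iteration not supported"); }
};

// Random-access backing store, e.g. a sorted list of file names.
class VectorIter : public IterImplSketch {
public:
    explicit VectorIter(std::vector<std::string> names) : d_names(std::move(names)) {}
    std::string current() const override { return d_names.at(d_pos); }
    void next() override { ++d_pos; }
    void prev() override {
        if (d_pos == 0)
            throw std::out_of_range("already at the first entry");
        --d_pos;
    }

private:
    std::vector<std::string> d_names;
    std::size_t d_pos = 0;
};

// Forward-only source (e.g. a streamed remote result); keeps the default prev().
class CountingIter : public IterImplSketch {
public:
    std::string current() const override { return std::to_string(d_n) + ".xml"; }
    void next() override { ++d_n; }

private:
    int d_n = 0;
};

// Public iterator wrapper exposing operator++ and operator--.
class BidiIterator {
public:
    explicit BidiIterator(std::unique_ptr<IterImplSketch> impl) : d_impl(std::move(impl)) {}
    std::string operator*() const { return d_impl->current(); }
    BidiIterator &operator++() { d_impl->next(); return *this; }
    BidiIterator &operator--() { d_impl->prev(); return *this; }

private:
    std::unique_ptr<IterImplSketch> d_impl;
};
```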
We should provide a functional equivalent to dtxslt:
http://www.let.rug.nl/vannoord/alp/Alpino/TreebankTools.html#_dtxslt_running_stylesheets_on_a_corpus
The name is set on the private class, but is not available from the outside.
I see no difference between using natural_order and numerical_order:
0.xml
1.xml
10.xml
100.xml
1000.xml
1001.xml
1002.xml
1003.xml
1004.xml
1005.xml
1006.xml
1007.xml
1008.xml
1009.xml
101.xml
1010.xml
1011.xml
1012.xml
1013.xml
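The listing above is plain lexicographic (string) order: 101.xml sorts after 1009.xml. A natural ordering compares runs of digits by numeric value instead. The comparator below is a minimal sketch, not the library's natural_order implementation; it assumes ASCII file names and compares digit runs as strings without leading zeros, so arbitrarily long numbers do not overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Compare two file names, treating each run of digits as a number, so that
// "2.xml" < "10.xml" < "101.xml" < "1009.xml".
bool naturalLess(std::string const &a, std::string const &b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        bool da = std::isdigit(static_cast<unsigned char>(a[i])) != 0;
        bool db = std::isdigit(static_cast<unsigned char>(b[j])) != 0;
        if (da && db) {
            // Find the end of both digit runs.
            std::size_t ei = i, ej = j;
            while (ei < a.size() && std::isdigit(static_cast<unsigned char>(a[ei]))) ++ei;
            while (ej < b.size() && std::isdigit(static_cast<unsigned char>(b[ej]))) ++ej;
            // Skip leading zeros (but keep a single "0").
            std::size_t si = i, sj = j;
            while (si + 1 < ei && a[si] == '0') ++si;
            while (sj + 1 < ej && b[sj] == '0') ++sj;
            std::size_t la = ei - si, lb = ej - sj;
            if (la != lb)
                return la < lb;  // fewer significant digits: smaller number
            int c = a.compare(si, la, b, sj, lb);
            if (c != 0)
                return c < 0;
            i = ei;
            j = ej;
        } else {
            if (a[i] != b[j])
                return a[i] < b[j];
            ++i;
            ++j;
        }
    }
    // All compared parts are equal: the shorter remainder sorts first.
    return (a.size() - i) < (b.size() - j);
}
```

Usage: std::sort(names.begin(), names.end(), naturalLess) puts 0.xml, 2.xml, 10.xml, 100.xml, 101.xml, 1009.xml, 1010.xml in numeric order.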
(peter) /my/src git clone https://github.com/rug-compling/alpinocorpus
Cloning into 'alpinocorpus'...
remote: Enumerating objects: 5980, done.
remote: Counting objects: 100% (107/107), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 5980 (delta 49), reused 59 (delta 35), pack-reused 5873
Receiving objects: 100% (5980/5980), 1.34 MiB | 3.05 MiB/s, done.
Resolving deltas: 100% (3245/3245), done.
(peter) /my/src cd /my/src/alpinocorpus
(peter) /my/src/alpinocorpus rm -rf builddir /my/opt/alpinocorpus
(peter) /my/src/alpinocorpus meson builddir -D dbxml_bundle=/my/opt/dbxml-2 --prefix=/my/opt/alpinocorpus
The Meson build system
Version: 0.56.2
Source dir: /my/src/alpinocorpus
Build dir: /my/src/alpinocorpus/builddir
Build type: native build
Project name: alpinocorpus
Project version: 3.0.0
C++ compiler for the host machine: c++ (gcc 10.2.1 "c++ (Debian 10.2.1-6) 10.2.1 20210110")
C++ linker for the host machine: c++ ld.bfd 2.35.2
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /usr/bin/pkg-config (0.29.2)
Run-time dependency Boost (found: filesystem, system) found: YES 1.74.0 (/usr)
Run-time dependency libexslt found: YES 0.8.20
Run-time dependency libxml-2.0 found: YES 2.9.10
Run-time dependency libxslt found: YES 1.1.34
Run-time dependency zlib found: YES 1.2.11
Library xerces-c found: YES
Library xqilla found: YES
Library dbxml found: YES
Configuring config.h using configuration
Build targets in project: 9
Found ninja-1.10.1 at /usr/bin/ninja
(peter) /my/src/alpinocorpus ninja -C builddir install
ninja: Entering directory `builddir'
[4/65] Compiling C++ object src/libalpinocorpus.so.3.0.0.p/capi.cpp.o
FAILED: src/libalpinocorpus.so.3.0.0.p/capi.cpp.o
c++ -Isrc/libalpinocorpus.so.3.0.0.p -Isrc -I../src -Iinclude -I../include -I/my/opt/dbxml-2/include -I/usr/include -I/usr/include/libxml2 -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -std=c++11 -g -fPIC -DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_SYSTEM_DYN_LINK=1 -DBOOST_ALL_NO_LIB -MD -MQ src/libalpinocorpus.so.3.0.0.p/capi.cpp.o -MF src/libalpinocorpus.so.3.0.0.p/capi.cpp.o.d -o src/libalpinocorpus.so.3.0.0.p/capi.cpp.o -c ../src/capi.cpp
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/capi.cpp:13:
/usr/include/unicode/localpointer.h:67:1: error: template with C linkage
67 | template<typename T>
| ^~~~~~~~
In file included from ../src/capi.cpp:13:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/capi.cpp:13:
/usr/include/unicode/localpointer.h:190:1: error: template with C linkage
190 | template<typename T>
| ^~~~~~~~
In file included from ../src/capi.cpp:13:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/capi.cpp:13:
/usr/include/unicode/localpointer.h:365:1: error: template with C linkage
365 | template<typename T>
| ^~~~~~~~
In file included from ../src/capi.cpp:13:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/capi.cpp:13:
/usr/include/unicode/ucnv.h:585:1: error: conflicting declaration of C function ‘void icu_67::swap(icu_67::LocalUConverterPointer&, icu_67::LocalUConverterPointer&)’
585 | U_DEFINE_LOCAL_OPEN_POINTER(LocalUConverterPointer, UConverter, ucnv_close);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/unicode/uenum.h:68:1: note: previous declaration ‘void icu_67::swap(icu_67::LocalUEnumerationPointer&, icu_67::LocalUEnumerationPointer&)’
68 | U_DEFINE_LOCAL_OPEN_POINTER(LocalUEnumerationPointer, UEnumeration, uenum_close);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
[10/65] Compiling C++ object src/libalpinocorpus.so.3.0.0.p/CorpusReader.cpp.o
FAILED: src/libalpinocorpus.so.3.0.0.p/CorpusReader.cpp.o
c++ -Isrc/libalpinocorpus.so.3.0.0.p -Isrc -I../src -Iinclude -I../include -I/my/opt/dbxml-2/include -I/usr/include -I/usr/include/libxml2 -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -std=c++11 -g -fPIC -DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_SYSTEM_DYN_LINK=1 -DBOOST_ALL_NO_LIB -MD -MQ src/libalpinocorpus.so.3.0.0.p/CorpusReader.cpp.o -MF src/libalpinocorpus.so.3.0.0.p/CorpusReader.cpp.o.d -o src/libalpinocorpus.so.3.0.0.p/CorpusReader.cpp.o -c ../src/CorpusReader.cpp
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/CorpusReader.cpp:17:
/usr/include/unicode/localpointer.h:67:1: error: template with C linkage
67 | template<typename T>
| ^~~~~~~~
In file included from ../src/CorpusReader.cpp:17:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/CorpusReader.cpp:17:
/usr/include/unicode/localpointer.h:190:1: error: template with C linkage
190 | template<typename T>
| ^~~~~~~~
In file included from ../src/CorpusReader.cpp:17:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/CorpusReader.cpp:17:
/usr/include/unicode/localpointer.h:365:1: error: template with C linkage
365 | template<typename T>
| ^~~~~~~~
In file included from ../src/CorpusReader.cpp:17:
../include/AlpinoCorpus/Stylesheet.hh:8:1: note: ‘extern "C"’ linkage started here
8 | extern "C" {
| ^~~~~~~~~~
In file included from /usr/include/unicode/uenum.h:23,
from /usr/include/unicode/ucnv.h:53,
from /usr/include/libxml2/libxml/encoding.h:31,
from /usr/include/libxml2/libxml/parser.h:810,
from /usr/include/libxml2/libxml/globals.h:18,
from /usr/include/libxml2/libxml/threads.h:35,
from /usr/include/libxml2/libxml/xmlmemory.h:218,
from /usr/include/libxml2/libxml/tree.h:1307,
from /usr/include/libxslt/xsltInternals.h:16,
from ../include/AlpinoCorpus/Stylesheet.hh:9,
from ../src/CorpusReader.cpp:17:
/usr/include/unicode/ucnv.h:585:1: error: conflicting declaration of C function ‘void icu_67::swap(icu_67::LocalUConverterPointer&, icu_67::LocalUConverterPointer&)’
585 | U_DEFINE_LOCAL_OPEN_POINTER(LocalUConverterPointer, UConverter, ucnv_close);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/unicode/uenum.h:68:1: note: previous declaration ‘void icu_67::swap(icu_67::LocalUEnumerationPointer&, icu_67::LocalUEnumerationPointer&)’
68 | U_DEFINE_LOCAL_OPEN_POINTER(LocalUEnumerationPointer, UEnumeration, uenum_close);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /my/opt/dbxml-2/include/xqilla/utils/UTF8Str.hpp:28,
from /my/opt/dbxml-2/include/xqilla/xqilla-dom3.hpp:28,
from ../src/CorpusReader.cpp:32:
/my/opt/dbxml-2/include/xercesc/util/XMLUTF8Transcoder.hpp: In member function ‘void xercesc_3_0::XMLUTF8Transcoder::checkTrailingBytes(XMLByte, unsigned int, unsigned int) const’:
/my/opt/dbxml-2/include/xercesc/util/XMLUTF8Transcoder.hpp:110:25: warning: narrowing conversion of ‘(XMLByte)toCheck’ from ‘XMLByte’ {aka ‘unsigned char’} to ‘char’ [-Wnarrowing]
110 | char byte[2] = {toCheck,0};
| ^~~~~~~
[13/65] Compiling C++ object src/libalpinocorpus.so.3.0.0.p/DbCorpusWriter.cpp.o
ninja: build stopped: subcommand failed.
alpinocorpus now has alpinocorpus-extract to extract a Dact or compact corpus, but most of the functionality of dtget is missing, such as printing one particular entry to stdout.
This functionality could be provided by read(), using an empty list of marker queries by default.
This works with dtxslt:
dtxslt -s Scripts/plat.xsl Enhanced/wiki-1846
Here, Scripts/plat.xsl imports another stylesheet (alp.basename.xsl), also from the Scripts directory.
Trying the same with alpinocorpus-xslt, I get an error:
alpinocorpus-xslt Scripts/plat.xsl Enhanced/wiki-1846
I/O warning : failed to load external entity "alp.basename.xsl"
compilation error: element import
xsl:import : unable to load alp.basename.xsl
Too bad, since dtxslt does not know about .dact files...
Bug introduced in commit 08fa499
Queries such as
//node[node[@cat="np"]/@begin = node[@cat="np"]/@end ]
do not work with the Berkeley DB XML backend. This seems to be a bug in DB XML, since this query evaluates fine using XQilla on a regular document. I am discussing this problem upstream:
http://forums.oracle.com/forums/thread.jspa?threadID=2224974
Most classes in AlpinoCorpus are not thread-safe.
Since a file is opened when the object is constructed, we cannot tell the object to stop opening a file (or query its progress). For the DirectoryCorpusReader this might be really useful. As far as I understand, this cannot be implemented in the DbCorpusReader because dbxml does not support it.
Todo: determine the priority of this feature
The problem:
The open-file operation cannot be cancelled, and it can take a lot of time for big corpora.
Possible solutions:
I think 3 or 4 are the best options. I would prefer 4 over 3 because it is less magical.
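Independently of which option is chosen, moving the long-running work out of the constructor makes cancellation and progress reporting straightforward. A generic sketch, assuming the work can be split into per-file steps (as for a DirectoryCorpusReader scanning files); `openWithProgress` and `ProgressFn` are hypothetical, and the callback returns false to request cancellation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Progress callback: receives (done, total); return false to cancel.
using ProgressFn = std::function<bool(std::size_t done, std::size_t total)>;

// Hypothetical: "open" a directory corpus by indexing its files one by one,
// reporting progress after each file and stopping early on cancellation.
// Returns true if indexing completed, false if it was cancelled.
bool openWithProgress(std::vector<std::string> const &files,
                      std::vector<std::string> &index,
                      ProgressFn const &progress) {
    index.clear();
    for (std::size_t i = 0; i < files.size(); ++i) {
        index.push_back(files[i]);  // stands in for reading/validating a file
        if (!progress(i + 1, files.size())) {
            index.clear();          // leave no half-opened state behind
            return false;
        }
    }
    return true;
}
```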
GetUrl.line() stops when GetUrl.interrupt() has been called. The way this is implemented prevents detecting 'end of data' from the server. Therefore the server has to finish with a line containing Ctrl-D; otherwise GetUrl.line() keeps trying to read new lines.
Can this be changed so that sending a Ctrl-D is no longer necessary?
alpinocorpus-create -c COMPACT/xxx rno
Segmentation fault (core dumped)
This happens if I don't have permission to write in the directory COMPACT.
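One plausible cause is that a failed open is not checked before the output is used. A minimal sketch of the defensive pattern, assuming output goes through a standard stream (the real writer has its own file handling, so this only illustrates the idea): open first, and throw a descriptive exception instead of continuing with an unusable handle. `openOutputOrThrow` is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: open a file for writing or fail with a clear error,
// rather than continuing with a stream that cannot be written to.
std::ofstream openOutputOrThrow(std::string const &path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open '" + path +
                                 "' for writing (missing directory or no write permission?)");
    return out;
}
```

The command-line tool could then catch the exception and print the message instead of crashing.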
When the MultiCorpusReader is used in Dact, the statistics window does not contain any values.
(To test, start Dact from the command line with multiple corpora as arguments.)
edit: I think MultiCorpusReaderPrivate::MultiIter::contents() is missing.
For the statistics window in Dact it would be really useful if we could access the value of the matching nodes. For example, I could create a query //node[@pt="ww"]/@root and ask the iterator for the filename of the XML file that matched and for the string value of @root of the matched node.
edit: to do this correctly, it might be best to merge the search functionality that is now divided between XPathMapper in Dact and runQuery for the dbxml corpora into alpinocorpus. Let's support runQuery for all corpus types.
In CMake...
(ALPINOCORPUS_VERSION "1.2.0")
I get:
[ 2%] Building CXX object CMakeFiles/alpino_corpus.dir/src/CompactCorpusWriter.cpp.o
In file included from /home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CorpusReader.hh:10,
from /home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CompactCorpusWriter.hh:6,
from /home/peter/alpino/alpinocorpus/src/CompactCorpusWriter.cpp:3:
/home/peter/alpino/alpinocorpus/include/AlpinoCorpus/IterImpl.hh:4:33: error: AlpinoCorpus/Entry.hh: No such file or directory
In file included from /home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CorpusReader.hh:10,
from /home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CompactCorpusWriter.hh:6,
from /home/peter/alpino/alpinocorpus/src/CompactCorpusWriter.cpp:3:
/home/peter/alpino/alpinocorpus/include/AlpinoCorpus/IterImpl.hh:16: error: ‘Entry’ does not name a type
In file included from /home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CompactCorpusWriter.hh:6,
from /home/peter/alpino/alpinocorpus/src/CompactCorpusWriter.cpp:3:
/home/peter/alpino/alpinocorpus/include/AlpinoCorpus/CorpusReader.hh:35: error: ‘Entry’ does not name a type
make[2]: *** [CMakeFiles/alpino_corpus.dir/src/CompactCorpusWriter.cpp.o] Error 1
make[1]: *** [CMakeFiles/alpino_corpus.dir/all] Error 2
make: *** [all] Error 2