inveniosoftware / invenio

Invenio digital library framework

Home Page: https://invenio.readthedocs.io

License: MIT License

Python 88.44% Shell 11.56%

python flask elasticsearch postgresql redis rabbitmq digital-library digital-repository invenio inveniosoftware

invenio's Introduction

Invenio Framework v3

Warning

The main development effort is currently on the InvenioRDM project and there will be no new releases of Invenio framework. However, each Invenio module is actively maintained as part of InvenioRDM.

Open Source framework for large-scale digital repositories.

Invenio Framework is like a Swiss Army knife of battle-tested, safe and secure modules providing you with all the features you need to build a trusted digital repository.

Our other products

Looking for a turn-key Research Data Management platform? Check out InvenioRDM

Looking for a modern Integrated Library System? Check out InvenioILS

Built with Invenio Framework

See examples on https://inveniosoftware.org/products/framework/ and https://inveniosoftware.org/showcase/.

invenio's People

tiborsimko kaplun epsallida giorgospa egabancho jirikuncar adrianp lnielsen jalavik osso mmo arcolife pamfilos antonis-manaras pombredanne chokribr makistsantekidis fpoli konstantinoskostis caitriana switowski kasioumis ludmilamarian cerndocumentserver fschwenn jeanyvescern wohthan lwegelin maellak rubinir jmartinm chsterz helix84 lukeandrewsmith nalinc pbroz gardenunez bogdan-kulynych inspirehep njbeyli atsikiridis rpantic dzielke voltane kuna91 jsvgoncalves blixhavn robertwinchell tpmccauley mvassilev harrycutts jmacmahon hachreak romanchyla bessemaamira tkurze dset0x kennethhole scoap3 omegak wulfila marioskogias bmckinney aw-bib quatroelementos sdsg-invenio aaroc eamonnmag mvesper jstypka ddaze crepererum atx2020 danmichaelo jochenklein nharraud ffelsner derekstrom otron jiangmin9 costaflavio lchrzaszcz dorsivalg rthieledesy ividim eudat-b2share chakshuahuja charlotteirisc osub3 aurdan jaags bruno314 rosenthalo javierdelgadofernandez drschoener bigyan icimod kgiokas hikkihiki aeonium

invenio's Issues

full-text snippets: configuration to use number of chars

Originally on 2010-05-11

The full-text snippet configuration needs to use the number of
characters, not the number of words. We are allowed to show say 100
characters around the pattern, rounded to the closest word outside of
these 100 characters. So we need to replace
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS configuration variables with
character counting before v1.0 is out, in order to stabilize the
config file.
Moreover, the length of the snippet depends on the full-text file
provenance. The provenance is currently stored as bibdoc type, so this
has to be analyzed when snippets are generated. The full-text snippet
configuration should then look almost like a dictionary:

CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
  'arXiv': 200,
  'Springer': 180,
  'APS': 100,
}

Even the number of snippets to show can perhaps vary per source, so
it may be perhaps good to store it in the configuration as well, e.g.
(50, 200) would mean we are able to show up to 50 snippets containing
up to 200 characters.
The configuration variable that determines how many snippets are
shown per record in the HTML brief output format on the search results
pages can probably stay source independent.
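
A minimal sketch of how character-based, per-source snippet limits could work. The dictionary name comes from the proposal above; the (snippets, characters) tuple layout follows the "(50, 200)" idea, while the concrete counts, the DEFAULT_LIMITS fallback and the get_snippets helper are assumptions made for illustration, not existing Invenio code. The window is centred on the match and then widened outwards to the nearest word boundaries.

```python
import re

# (max number of snippets, max characters per snippet) per provenance.
# The character counts come from the proposal; the snippet counts are made up.
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
    'arXiv': (50, 200),
    'Springer': (30, 180),
    'APS': (10, 100),
}
DEFAULT_LIMITS = (10, 100)  # assumed fallback for an unknown provenance


def get_snippets(text, pattern, source):
    """Return snippets of about `nchars` characters around each match."""
    max_snippets, nchars = CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS.get(
        source, DEFAULT_LIMITS)
    snippets = []
    for match in re.finditer(re.escape(pattern), text, re.IGNORECASE):
        if len(snippets) >= max_snippets:
            break
        centre = (match.start() + match.end()) // 2
        start = max(0, centre - nchars // 2)
        end = min(len(text), centre + nchars // 2)
        # Round outwards to the closest word boundary outside the window.
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        while end < len(text) and not text[end].isspace():
            end += 1
        snippets.append(text[start:end].strip())
    return snippets
```

Rounding outwards means a snippet can be slightly longer than the configured limit, which matches the "closest word outside of these 100 characters" wording.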

ERROR: bibdocfile - BibDocs functions

Originally on 2010-05-04

======================================================================
ERROR: bibdocfile - BibDocs functions
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibdocfile_regression_tests.py", line 165, in test_BibDocs
    my_new_bibdoc.add_icon( CFG_PREFIX + '/lib/webtest/invenio/icon-test.gif', basename=None, format=None)
TypeError: add_icon() got an unexpected keyword argument 'basename'

FAIL: webjournal - gets an article view of a journal from cache

Originally on 2010-05-04

======================================================================
FAIL: webjournal - gets an article view of a journal from cache
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/webjournal_regression_tests.py", line 311, in test_get_article_page_from_cache
    assert("April 14th, 1832.—Leaving Socêgo, we rode to another estate on the Rio Macâe" in value)
AssertionError

batchuploader: add Run Batchuploader to administration menu

Originally on 2010-05-05

The action Run batchuploader is currently not included in the Administration menu.

search_engine.py is catching Exception all over the place

Originally on 2010-05-08

Many places in search_engine catch Exception, when they should be more specific about the errors that they catch. It not only helps the reader to understand how the try-wrapped call could fail, it also means that unexpected exception types still get raised up the stack - so we see more types of errors, and can fix them.
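
As an illustration of the point (not code taken from search_engine.py), a narrow except clause might look like the sketch below. The helper name and the record layout, a dict mapping a tag to a list of values, are hypothetical.

```python
def get_first_field_value(record, tag):
    """Return the first value stored under `tag`, or None if there is none."""
    try:
        return record[tag][0]
    except (KeyError, IndexError):
        # Only the two failures we expect are handled: a missing tag or an
        # empty value list. Anything else (e.g. record being None because of
        # a bug upstream) still propagates up the stack.
        return None
```

With `except Exception` instead, passing `None` as the record would silently return `None` and hide the real bug.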

BibIndex (fulltext): Demo fulltext searching on inspire-hep-dev doesn't find 'rattazzon'

Originally on 2010-04-29

Surprisingly:

astro-ph/0607086

does not find rattazzon even though it is in the fulltext (see snippet for "honor theorist" search in fulltext)

This may be due to its enclosure in '' in the text...

full-text snippets: `Show more' snippets and `Find inside' facility

Originally on 2010-05-11

By default, in the HTML brief format on the search results pages, we
are showing only a limited number of snippets, governed by the
CFG_WEBSEARCH_FULLTEXT_SNIPPETS configuration variable.

The snippet box on the search results page should offer a link in the
bottom-right entitled e.g. Show more that the user could click upon
to see more snippets. This link would lead the user to the Fulltext
tab on the detailed record page for the given record. This page would
offer on the rhs a box entitled Find inside that would provide
snippet grepping facility with more snippets to show. It would be
kind of similar to `Search in this book' of Google Books.

Here is an ASCII art mock-up of the detailed record page tab:


Information | References (34) | Citations (124) | Fulltext (3)


 Main file(s):                              Find inside:

 thesis                                     [neutralino_______] [FIND]
 version 1
  thesis.pdf [247.7 KB] 04 May 2010         1. ... supersymmetric neutralino ...
                                            2. ... 238 GeV neutralinos ...
                                           20. ... neutralino dark matter ...

                                            [view next 20 snippets]

BibEdit: adding field seems broken

Originally on 2010-05-18

Adding new fields in BibEdit seems broken. UI says "Updating..." but nothing happens.

batchuploader: permit submission of Text MARC files

Originally on 2010-05-05

Catalogers may prefer to work offline with Text MARC files instead of
with MARCXML files in their editors. So batchuploader should permit
submitting of Text MARC files, not only MARCXML files. For Text MARC
submissions, the conversion utility (textmarc2xmlmarc, to be
committed soon) should be called to convert Text MARC files into
MARCXML files before further processing. (That is, before checking
collection rights, submitting to bibupload queue, etc.)

Test Ticket for RSS integration

Originally on 2010-04-20

This is a Test to see how newly opened tickets show up in RSS/feeds

FAIL: webjournal - checks if an article is new or not

Originally on 2010-05-04

======================================================================
FAIL: webjournal - checks if an article is new or not
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/webjournal_regression_tests.py", line 42, in test_is_new_article
    self.assertEqual(article, True)
AssertionError: False != True

test

Originally on 2010-04-20

test

#1?resolve=fixed test mail from same machine

Originally by [email protected] on 2009-10-21

test

WebSubmit Icon Creator should become a plugin of WebSubmit Converter Tools

Originally on 2010-04-22

Creating an icon is indeed a conversion from one format to another. So this is a good use case for a high-level plugin to add to the WebSubmit Converter Tools library.

batchuploader: prettify user messages

Originally on 2010-05-05

The batchuploader facility output messages should be prettified.

E.g. try to submit document for insertion vs document for revision.
The mode works, but the output message is misleading.

E.g. try to submit a MARCXML file to which you don't have proper
collection rights. The demo user Dorian can submit only Books files.
The check works, but if he does not have rights, the general output
message from the access control system is misleading.

WebSubmit: Refactoring websubmit_converter_tools to use pluginutils

Originally on 2010-04-22

Currently, recently merged converter tools are implemented as a set of hardcoded algorithms in the websubmit_converter_tools.py module. This is not directly extensible by Invenio admins.

The library should be refactored to use plugins.

FAIL: htmlutils - washing of tags altering formatting of a page (e.g. </html>)

Originally on 2010-05-04

======================================================================
FAIL: htmlutils - washing of tags altering formatting of a page (e.g. </html>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/htmlutils_tests.py", line 42, in test_forbidden_formatting_tags
    '')
AssertionError: '</div>' != ''

----------------------------------------------------------------------

FAIL: websearch - query ellis, citation summary output format

Originally on 2010-05-04

======================================================================
FAIL: websearch - query ellis, citation summary output format
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 1420, in test_ellis_citation_summary
    expected_link_label='1'))
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/search?p=ellis&of=hcs (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/search?p=ellis&of=hcs (login guest) does not contain link to http://pcuds33.cern.ch/search?p=ellis%20cited%3A1-%3E9&rm=citation entitled 1..']

use specific jQuery versions

Originally on 2010-04-20

One more thing to do before 1.0 in the jQuery department: there are
still some svn/tags/latest parts in install-jquery-plugins. This
is too bleeding edge, and we have been bitten by this in the past, so
to speak, e.g. file renames in
0c841f1, e.g. calendar changes in
e2a978e.

So it would be good to go through the whole install-jquery-plugins
target and change remaining parts in order to wget always some very
specific version of a jQuery library, as is needed, like in my changes
mentioned above. (I have been doing that as was necessary for
merging, I have not gone systematically through every jQuery
dependency we have.)

Can you please look at that?

If there is a trouble with some dependency lib URL stability, or
versions are lacking on the remote site, then we can host the
dependency on cdsware.cern.ch site, like the other installation files,
e.g. demo records, e.g. keyword ontologies.

Remove 'TextColor' item from WebComment FCKeditor toolbar

Originally on 2010-04-30

Since submitted comments are washed from potentially unsafe attributes and tags, it does not make sense to offer the 'TextColor' item in the toolbar displayed for WebComment input form (as the "style" attribute will be washed away). This item should be removed from the default settings.

Behaviour of other toolbar items should also be checked.

Bibclassify slow - cache loading

Originally on 2010-05-18

Bibclassify is very slow when loading the cache.

The cache contains many regexes, and Python recompiles them during load: http://stackoverflow.com/questions/65266/caching-compiled-regex-objects-in-python

Since we have a much bigger taxonomy now, I need to investigate ways to:

- make it smaller (perhaps share regexes among Keyword objects)
- import only once (and share among threads)
- make the console application interactive (thus there is a penalty only for the first run)

(the cache is loaded only once, as this profile shows):

Mon May 17 15:25:57 2010 /opt/cds-invenio/var/tmp/invenio-profile-stats-20100517152447.raw

    10719206 function calls (10505840 primitive calls) in 69.884 CPU seconds

Ordered by: cumulative time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.000    0.000   69.884   69.884 webinterface_handler.py:330(_handler)
        2/1    0.000    0.000   69.883   69.883 webinterface_handler.py:171(_traverse)
          1    0.000    0.000   69.878   69.878 websearch_webinterface.py:418(call)
          1    0.000    0.000   69.874   69.874 search_engine.py:3986(perform_request_search)
          1    0.000    0.000   69.762   69.762 search_engine.py:3214(print_records)
          1    0.000    0.000   69.750   69.750 bibclassify_webinterface.py:65(main_page)
          1    0.000    0.000   69.734   69.734 bibclassify_webinterface.py:189(generate_keywords)
          1    0.000    0.000   69.720   69.720 bibclassify_engine.py:141(get_keywords_from_local_file)
          1    0.000    0.000   69.637   69.637 bibclassify_engine.py:165(get_keywords_from_text)
          1    0.000    0.000   56.950   56.950 bibclassify_ontology_reader.py:69(get_regular_expressions)
          1    0.003    0.003   56.948   56.948 bibclassify_ontology_reader.py:572(_get_cache)
          1    1.354    1.354   56.945   56.945 {cPickle.load}
      17752    0.527    0.000   55.818    0.003 re.py:227(_compile)
      17701    0.465    0.000   55.023    0.003 sre_compile.py:501(compile)
      17701    0.325    0.000   31.684    0.002 sre_parse.py:669(parse)
28313/17701    0.743    0.000   30.786    0.002 sre_parse.py:307(_parse_sub)
34424/17701    8.570    0.000   30.433    0.002 sre_parse.py:385(_parse)
      17701    0.197    0.000   22.618    0.001 sre_compile.py:486(_code)
83568/17701    5.385    0.000   17.814    0.001 sre_compile.py:38(_compile)
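
One possible direction, sketched here under assumptions: keep only pattern strings in the pickled cache and compile them lazily, sharing identical patterns among keywords. The Keyword class below is a hypothetical stand-in, not the actual bibclassify class, and the patterns are invented.

```python
import pickle
import re


class Keyword:
    """Hypothetical keyword that stores a pattern string, not a regex."""

    _compiled = {}  # shared by all keywords: pattern string -> compiled regex

    def __init__(self, label, pattern):
        self.label = label
        self.pattern = pattern

    @property
    def regex(self):
        # Compile on first use only; identical patterns share one object.
        try:
            return Keyword._compiled[self.pattern]
        except KeyError:
            compiled = Keyword._compiled[self.pattern] = re.compile(self.pattern)
            return compiled


# The cache holds plain (label, pattern) tuples, so loading it compiles nothing.
cache = [('neutralino', r'\bneutralinos?\b'),
         ('neutralinos', r'\bneutralinos?\b'),
         ('dark matter', r'\bdark\s+matter\b')]
blob = pickle.dumps(cache)
keywords = [Keyword(label, pattern) for label, pattern in pickle.loads(blob)]
```

The compilation cost is then paid only for the keywords that are actually matched, at the price of a slower first match per pattern.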

a test ticket for invenio

Originally by [email protected] on 2009-10-22

test

"raise StandardError" in bibindex_engine

Originally on 2010-05-18

In bibindex_engine, we find 8 occurrences of 'raise StandardError' without a message. If not caught and treated, this leads to useless emails being sent to the admin: they do not contain the cause of the exception and thus prevent the admin from correctly fixing the problem.

Here are the lines causing problems:
1245: raise StandardError
1311: raise StandardError
1486: raise StandardError
1510: raise StandardError
1542: raise StandardError
1565: raise StandardError
1605: raise StandardError
1629: raise StandardError
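
For illustration, an exception raised with a message that names the failing operation could look like this. The IndexingError class, the flush_words function and its arguments are hypothetical and not taken from bibindex_engine; a custom Exception subclass is used because StandardError exists only in Python 2.

```python
class IndexingError(Exception):
    """Hypothetical error type for failures inside the indexer."""


def flush_words(index_name, recid_range, succeeded):
    """Pretend to flush an index; raise a descriptive error on failure."""
    if not succeeded:
        raise IndexingError(
            "Could not flush words for index %r, records %d-%d"
            % (index_name, recid_range[0], recid_range[1]))
    return True
```

An uncaught error of this kind ends up in the admin email together with the index and the record range, which is usually enough to start debugging.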

#1?resolve=fixed

Originally by [email protected] on 2009-10-21

test

grapher: investigate usage of flot

Originally on 2010-04-26

Instead of using gnuplot in the grapher, investigate the usage of
jQuery plotting tools, such as [http://code.google.com/p/flot/].
Advantage being the possibility to zoom into graphs easily.
Disadvantage may be higher server side stress related to Ajax. So we
would have to nicely cache the {(X1,Y1),(X2,Y2),...} JSON structures
that Flot would need, just as we are now caching the PNGs produced by
gnuplot. The server stress would have to be tested. Caching PNGs is
definitely more scalable: once cached, they are served directly by
Apache, not by Invenio WSGI application.
Flot would be especially well suited for admin-like graphs in the
WebStat module, where currently the x-axis scale is chosen from a
drop-down selection box. It would be even more practical to zoom
here, and the stress issue would be less important here, since there
are only a few connections to the admin-level pages, as opposed to the
user-level pages.
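
A rough sketch of the caching idea, assuming the series is serialised as a JSON list of [x, y] pairs (the data format Flot accepts) and stored in one file per graph. The function name, cache key and sample data are invented for illustration.

```python
import json
import os
import tempfile


def cached_series_json(cache_dir, key, compute_points):
    """Return the JSON for a data series, computing it only on a cache miss."""
    path = os.path.join(cache_dir, key + '.json')
    if os.path.exists(path):
        with open(path) as cache_file:
            return cache_file.read()
    payload = json.dumps([[x, y] for x, y in compute_points()])
    with open(path, 'w') as cache_file:
        cache_file.write(payload)
    return payload


# Usage: the second call is served from the cache file.
cache_dir = tempfile.mkdtemp()
calls = []


def citation_history():
    calls.append(1)  # count how often the expensive computation runs
    return [(2008, 3), (2009, 7), (2010, 12)]


first = cached_series_json(cache_dir, 'citations-recid-81', citation_history)
second = cached_series_json(cache_dir, 'citations-recid-81', citation_history)
```

Because the cached files are static, they could in principle be served directly by the web server, like the cached PNGs; cache invalidation is left out of this sketch.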

Remove hardcoded 100a 700a etc values

Originally by man on 2010-04-30

In some websearch utils, author tags are still hardcoded in the Python source. I'll replace them with tag names.

BitTorrent support

Originally on 2010-05-16

As Invenio is a digital object repository, BitTorrent could be a perfect solution for users to access big files. Although not currently allowed at CERN, it could be a good solution for other installations handling huge multimedia material or, e.g., huge scientific data.

A possible integration would be as a special plugin to the WebSubmit converter tools. We can introduce a new converter to .torrent files (e.g. by using the transmission CLI) that would first create the torrent, and then seed it.

This would require integration with a Tracker, and e.g. we might use OpenTracker that can be configured to track only white-listed torrents (e.g. those created before with transmission).

Since BitTorrent would be useful only for huge files which are of interest to many users, we might either extend the /files handler to answer even for a format that does not exist yet, and register the desire for a conversion that can later be handled by a daemon (see ticket #47), or alternatively we might have an automatic procedure that, based on a threshold of size and number of downloads, will pre-create the torrent.

WebSession: more user-friendly /yourgroups messages

Originally on 2010-05-04

The output messages of /yourgroups subsystem should be prettified.
Currently, when you try to edit a group that you don't have rights for
(via URL mangling), the system says:

Error: Sorry there was an error with the database.
<type 'exceptions.TypeError'> 'dict' object is not callable

which is misleading. Moreover, invenio.err is unnecessarily
filled with ERR_WEBSESSION_DB_ERROR alarms.

The UI should report something more user friendly, like:

Error: Sorry, you don't have rights to edit this user group.

or:

Error: Sorry, group foo does not exist.

The database errors should be reported only when there are real
database problems.

WebAccess: Missing Record Merge in admin menu

Originally on 2010-05-11

Link to 'Record Merge' is missing under the administration menu.

crash in /yourgroups related to unwashed arguments

Originally on 2010-04-23

The /yourgroups facility should improve its argument washing.

A URL such as https://localhost/yourgroups/edit?grpID=foo leads to
500 Internal Server Error and a traceback, because grpID had not been
washed properly in the web interface layer before being passed onto
the business logic layer.

>>>> Frame edit in /usr/lib/python2.5/site-packages/invenio/websession_webinterface.py at line 1190
*******************************************************************************
      1187         else :
      1188             (body, errors, warnings)= webgroup.perform_request_edit_group(uid=uid,
      1189                                                                           grpID=argd['grpID'],
----> 1190                                                                           ln=argd['ln'])
      1191
      1192
      1193
*******************************************************************************


>>>> Frame perform_request_edit_group in /usr/lib/python2.5/site-packages/invenio/webgroup.py at line 387
*******************************************************************************
       384
       385     body = ''
       386     errors = []
---->  387     user_status = db.get_user_status(uid, grpID)
       388     if not len(user_status):
       389         errors.append('ERR_WEBSESSION_DB_ERROR')
       390         return (body, errors, warnings)
*******************************************************************************

>>>> Frame get_user_status in /usr/lib/python2.5/site-packages/invenio/webgroup_dblayer.py at line 296
*******************************************************************************
       293                 WHERE id_user = %s
       294                 AND id_usergroup=%s"""
       295     uid = int(uid)
---->  296     grpID = int(grpID)
       297     res = run_sql(query, (uid, grpID))
       298     return res
       299
*******************************************************************************
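
A minimal sketch of washing grpID in the web interface layer, before it reaches the business logic. The helper names and the shape of the returned message are assumptions, not the actual websession code.

```python
def wash_group_id(raw_value):
    """Return grpID as a positive int, or None if it is not a valid ID."""
    try:
        group_id = int(raw_value)
    except (TypeError, ValueError):
        return None
    return group_id if group_id > 0 else None


def edit_group_page(argd):
    """Hypothetical handler: reject bad input instead of crashing later."""
    group_id = wash_group_id(argd.get('grpID'))
    if group_id is None:
        return "Error: Sorry, group %s does not exist." % argd.get('grpID')
    return "Editing group %d" % group_id
```

With this check in place, the int(grpID) call in the database layer only ever sees values that have already been validated.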

WebStat: Clean escape_string from queries

Originally on 2010-05-11

Running 'make kwalitee-check-sql-queries' reveals, among others, use of escape_string in generation of queries in webstat_engine.py.

** SQL queries using charset-ignorant escape_string():

...

./modules/webstat/lib/webstat.py:33:from invenio.dbquery import run_sql, escape_string
./modules/webstat/lib/webstat.py:174:        arg = escape_string(argument)
./modules/webstat/lib/webstat_engine.py:25:from invenio.dbquery import run_sql, escape_string
./modules/webstat/lib/webstat_engine.py:259:                sql_query.append("AND `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:261:                sql_query.append("OR `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:263:                sql_query.append("AND NOT `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:317:                    sql_query.append("AND `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:319:                    sql_query.append("OR `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:321:                    sql_query.append("AND NOT `%s`" % escape_string(col_title))

...

This should be cleaned.
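
Column names cannot be passed as bound query parameters, so one way to drop escape_string here is to accept only column titles that are known in advance. The sketch below is an assumption about how that could look; the function name and the example column names are invented. Values, as opposed to identifiers, would still go through the parameter tuple of run_sql.

```python
import re

_IDENTIFIER_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')
_OPERATORS = ('AND', 'OR', 'AND NOT')


def column_clause(operator, col_title, known_columns):
    """Build e.g. "AND `cols1`" from a whitelisted column name."""
    if operator not in _OPERATORS:
        raise ValueError("Unsupported operator: %r" % (operator,))
    if col_title not in known_columns or not _IDENTIFIER_RE.match(col_title):
        raise ValueError("Unknown column: %r" % (col_title,))
    return "%s `%s`" % (operator, col_title)
```

For example, `column_clause('AND', 'cols1', {'cols1', 'cols2'})` yields the same fragment as the current code, while a title containing a backtick is rejected before any SQL is built.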

FAIL: bibformat - MARCXML output

Originally on 2010-05-04

======================================================================
FAIL: bibformat - MARCXML output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibformat_regression_tests.py", line 345, in test_marcxml_output
    self.assertEqual([], result)
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/9?of=xm (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/9?of=xm (login guest) does not contain <?xml version="1.0" encoding="UTF-8"?>\n<collection xmlns="http://www.loc.gov/MARC21/slim">\n<record>\n  <controlfield tag="001">9</controlfield>\n  <datafield tag="041" ind1=" " ind2=" ">\n    <subfield code="a">eng</subfield>\n  </datafield>\n  <datafield tag="088" ind1=" " ind2=" ">\n    <subfield code="a">PRE-25553</subfield>\n  </datafield>\n  <datafield tag="088" ind1=" " ind2=" ">\n    <subfield code="a">RL-82-024</subfield>\n  </datafield>\n  <datafield tag="100" ind1=" " ind2=" ">\n    <subfield code="a">Ellis, J</subfield>\n    <subfield code="u">University of Oxford</subfield>\n  </datafield>\n  <datafield tag="245" ind1=" " ind2=" ">\n    <subfield code="a">Grand unification with large supersymmetry breaking</subfield>\n  </datafield>\n  <datafield tag="260" ind1=" " ind2=" ">\n    <subfield code="c">Mar 1982</subfield>\n  </datafield>\n  <datafield tag="300" ind1=" " ind2=" ">\n    <subfield code="a">18 p</subfield>\n  </datafield>\n  <datafield tag="650" ind1="1" ind2="7">\n    <subfield code="2">SzGeCERN</subfield>\n    <subfield code="a">General Theoretical Physics</subfield>\n  </datafield>\n  <datafield tag="700" ind1=" " ind2=" ">\n    <subfield code="a">Ibanez, L E</subfield>\n  </datafield>\n  <datafield tag="700" ind1=" " ind2=" ">\n    <subfield code="a">Ross, G G</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="0">\n    <subfield code="y">1982</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="0">\n    <subfield code="b">11</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Oxford Univ.</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Univ. Auton. 
Madrid</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Rutherford Lab.</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="c">1990-01-28</subfield>\n    <subfield code="l">50</subfield>\n    <subfield code="m">2002-01-04</subfield>\n    <subfield code="o">BATCH</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="S">\n    <subfield code="s">h</subfield>\n    <subfield code="w">1982n</subfield>\n  </datafield>\n  <datafield tag="980" ind1=" " ind2=" ">\n    <subfield code="a">PREPRINT</subfield>\n  </datafield>\n</record>\n</collection>..']

FAIL: bibformat - Detailed HTML output

Originally on 2010-05-04

======================================================================
FAIL: bibformat - Detailed HTML output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibformat_regression_tests.py", line 155, in test_detailed_html_output
    self.assertEqual([], result)
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/7?of=hd (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/7?of=hd (login guest) does not contain <img src="http://pcuds33.cern.ch/record/7/files/icon-9806033.gif" alt="" /><br /><font size="-2"><b>\xc2\xa9 CERN Geneva</b></font>..']

test from out of the machine

Originally by [email protected] on 2009-10-22

test

(pdf2)hocr(2pdf) family of algorithms should use compressed images

Originally on 2010-04-22

Currently, when a PDF is re-generated after an intermediate OCR process in the WebSubmit Converter Tools, the generated images are uncompressed bitmaps that are stored directly in the final PDF, thus unnecessarily bloating it. Usage of JPEG is advisable instead.

Test for RSS: unicode ≍≬☠☮☻♥♾

Originally on 2010-04-21

≍≬☠☮☻♥♾

≍≬☠☮☻♥♾
≍≬☠☮☻♥♾

bibrank grapher: adaptive x-axis ticks

Originally on 2010-04-26

The citation and download history grapher tool should be more
clever with respect to x-axis unit calculation. For articles published
a long time ago, the unit used for x-axis ticks is so small that the
final result is unreadable:

[http://inspirebeta.net/record/3198/citations]

The x-axis ticks should be adapted to the real x-axis range used
in the input data, close to (x_max - x_min) / 10.
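
One way to turn the (x_max - x_min) / 10 rule into a readable tick spacing is to round the raw step up to the next "nice" value (1, 2 or 5 times a power of ten). This is only a sketch; the function name and the choice of nice values are assumptions, and a real implementation for dates would work in months or years rather than plain numbers.

```python
import math


def tick_step(x_min, x_max, target_ticks=10):
    """Return a tick spacing close to (x_max - x_min) / target_ticks."""
    span = x_max - x_min
    if span <= 0:
        return 1  # degenerate range: fall back to a unit step
    raw = span / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    # Pick the smallest "nice" multiple that is not below the raw step.
    for factor in (1, 2, 5, 10):
        if raw <= factor * magnitude:
            return factor * magnitude
    return 10 * magnitude
```

For a citation history spanning 1970 to 2010 this gives a step of 5 years, so the axis carries about eight ticks instead of one per month.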

BibEdit: crash when empty history

Originally on 2010-05-10

BibEdit produces a traceback if for some reason there are no MARCXML
historical versions for the given record in the hstRECORD table. BibEdit
should simply continue gracefully in these cases, with the History panel
being empty.

How to reproduce:

$ echo "TRUNCATE hstRECORD" | /opt/cds-invenio/bin/dbexec

and then edit a record.

The traceback is:

  File "/usr/lib/python2.5/site-packages/invenio/bibedit_engine.py", line 527, in perform_request_record
    record_revision, record = create_cache_file(recid, uid)
  File "/usr/lib/python2.5/site-packages/invenio/bibedit_utils.py", line 104, in create_cache_file
    record_revision = get_record_last_modification_date(recid)
  File "/usr/lib/python2.5/site-packages/invenio/bibedit_dblayer.py", line 61, in get_record_last_modification_date
    return run_sql("SELECT max(job_date) FROM hstRECORD WHERE id_bibrec=%s", (recid, ))[0][0].timetuple();
AttributeError: 'NoneType' object has no attribute 'timetuple'
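
The traceback comes from max() returning NULL when no rows match. A guarded version might look like the sketch below, which uses the standard sqlite3 module as a stand-in for Invenio's run_sql and MySQL: the placeholder syntax, the text-based date storage and the extra conn argument are assumptions of the sketch.

```python
import sqlite3
from datetime import datetime


def get_record_last_modification_date(conn, recid):
    """Return the last revision date as a time tuple, or None if no history."""
    row = conn.execute(
        "SELECT max(job_date) FROM hstRECORD WHERE id_bibrec=?",
        (recid,)).fetchone()
    if row is None or row[0] is None:
        # No historical versions: let the caller show an empty History panel.
        return None
    return datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S').timetuple()
```

The caller then has to treat None as "no revision known" rather than assuming a date is always present.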

ERROR: bibconvert - availability of BibConvert Admin Guide parts

Originally on 2010-05-04

======================================================================
ERROR: bibconvert - availability of BibConvert Admin Guide parts
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibconvert_regression_tests.py", line 43, in test_availability_bibconvert_admin_guide_parts
    test_web_page_existence(CFG_SITE_URL + '/admin/bibconvert/bibtex.cfg')
  File "/usr/lib/python2.5/site-packages/invenio/testutils.py", line 146, in test_web_page_existence
    browser.open(url)
  File "/usr/lib/python2.5/site-packages/mechanize/_mechanize.py", line 209, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/lib/python2.5/site-packages/mechanize/_mechanize.py", line 261, in _mech_open
    raise response
httperror_seek_wrapper: HTTP Error 404: Not Found

BTW, I'll take care of this when merging mpp->wsgi branches.

WebSubmit File Stamper should become a plugin of WebSubmit Converter Tools

Originally on 2010-04-22

As in #15, the File Stamper tool should also become a plugin of WebSubmit Converter Tools: stamping a PDF yields a PDF again, so the content of the document stays the same.

This too depends on #14

BibRank: micro-optimize citation dict memory footprint

Originally on 2010-04-27

The citation dictionary is cached inside each WSGI Invenio daemon
process for speed purposes. It looks like this: (for the demo site)

{18: [96],
 74: [92],
 77: [85, 86],
 78: [79, 91],
 79: [91],
 81: [82, 83, 87, 89],
 84: [85, 88, 91],
 91: [92],
 94: [80],
 95: [77, 86]}

For bigger sites containing 1M of records and having fuller citation
maps, this dictionary can get quite big, e.g. WSGI daemon processes of
the INSPIRE instance eat about 1 GB of RAM.

It would be good to decrease the memory footprint of this citation
dictionary, especially since we are running on a 64-bit OS, where we
may easily consume more bytes to store list elements (of `unsigned
mediumint' type) than necessary.

We should investigate potential local replacements for the list
structure, for example using numpy.array. We can measure the
memory footprint of various data structures via sys.getsizeof()
or via ps auxw process sizes, aiming to find a more memory
optimized, yet still fast enough, data structure to represent the
citation dict.

If needed, we can even create a dedicated intbitset-like C extension,
that would be capable of storing recID vectors in a memory-efficient
way. This is arguably the best micro-optimization technique that we
could go for, albeit it would represent a bit more work than reusing
numpy.array or some other such pre-existing module.

Note that this task is of a micro-optimization kind only, keeping the
overall citation indexer and searcher machinery unchanged, only
changing its internal data structures. The tests will show how much
such a micro-optimization would be worth it. The overall rethinking
of the citation dictionary handling and the inherent memory sharing
procedures would be another task, see some older musings at
[https://twiki.cern.ch/twiki/bin/view/CDS/InvenioScalability].
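
To get a first feeling for the possible gain, the footprint of a plain list can be compared with a typed array via sys.getsizeof(). The sketch below deliberately uses the standard library's array module instead of numpy.array so that it runs without extra dependencies; the record IDs are made up.

```python
import sys
from array import array

# 1000 invented citing recIDs for one cited record.
citers = list(range(100000, 101000))

as_list = citers
as_array = array('I', citers)  # unsigned int, typically 4 bytes per item

# A list stores pointers, and each int is a separate Python object,
# so both have to be counted.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(recid) for recid in as_list)
array_bytes = sys.getsizeof(as_array)

print("list: %d bytes, array: %d bytes" % (list_bytes, array_bytes))
```

The exact numbers depend on the Python version and platform, and lookup speed would have to be measured separately before choosing a replacement structure.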

test from outside

Originally by [email protected] on 2009-10-22

test

A test ticket for invenio

Originally by [email protected] on 2009-10-22

Test ticket

test from outside

Originally by [email protected] on 2009-10-22

test

automated file converter daemon

Originally on 2010-05-16

It would be great to have a daemon (BibTask) to perform conversions. We might have a new table (e.g. conversion) with:

id_bibdoc
format
status
id_users
comment

where id_bibdoc is the document involved in the conversion, format is the desired format, status is one of 'waiting', 'running', 'done' or 'error', and id_users is an intbitset of the uids of all users that should be alerted about the outcome of a given conversion (e.g. a user who has asked for a conversion, or a submitter).
In case of error, comment would contain the error message.
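
The proposed table could be prototyped as follows. This is a sketch using sqlite3 rather than Invenio's MySQL layer; the column types, the composite primary key, the CHECK constraint and the two helper functions are assumptions, and id_users is stored as an opaque BLOB where a serialised intbitset would go.

```python
import sqlite3

SCHEMA = """
CREATE TABLE conversion (
    id_bibdoc INTEGER NOT NULL,
    format    TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'waiting'
              CHECK (status IN ('waiting', 'running', 'done', 'error')),
    id_users  BLOB,
    comment   TEXT,
    PRIMARY KEY (id_bibdoc, format)
)
"""


def next_waiting(conn):
    """Return (id_bibdoc, format) of the next waiting conversion, or None."""
    return conn.execute(
        "SELECT id_bibdoc, format FROM conversion "
        "WHERE status='waiting' ORDER BY id_bibdoc LIMIT 1").fetchone()


def mark(conn, id_bibdoc, fmt, status, comment=None):
    """Record the outcome of a conversion; comment holds the error message."""
    conn.execute(
        "UPDATE conversion SET status=?, comment=? "
        "WHERE id_bibdoc=? AND format=?",
        (status, comment, id_bibdoc, fmt))
```

A daemon run would then loop over next_waiting(), mark the row as 'running', perform the conversion, and finish with 'done' or 'error'.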

make install does not work when the directory contains spaces

Originally on 2010-04-30

If the Invenio source code is placed under a directory whose name contains a space, the make install command fails.

FAIL: websearch - restricted pictures not available to Mr. Hyde

Originally on 2010-05-04

======================================================================
FAIL: websearch - restricted pictures not available to Mr. Hyde
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 988, in test_restricted_pictures_hyde
    self.fail(merge_error_messages(error_messages))
AssertionError:
*** ERROR: Page http://pcuds33.cern.ch/record/1/files/0106015_01.jpg (login hyde) not accessible. HTTP Error 401: Unauthorized

Note: as with many failed tests I'm now ticketizing, this is a problem
with the test case comparison technique, not with the file restriction
facility. The files are well restricted. I'm noting this down just
in case the summary of this ticket may appear to be alarming...

FAIL: bibcirculation - availability of your loans page

Originally on 2010-05-04

======================================================================
FAIL: bibcirculation - availability of your loans page
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibcirculation_regression_tests.py", line 44, in test_your_loans_page_availability
    self.fail(merge_error_messages(error_messages))
AssertionError:
*** ERROR: Page http://pcuds33.cern.ch/yourloans/ (login guest) not accessible. HTTP Error 500: Internal Server Error

BibEdit: integrate BibEdit with bibCirculation

Originally on 2010-05-18

When editing documents that have a physical copy, a link to the BibCirculation interface should be provided.

FAIL: websearch - check formats exported through /record/1/export/ URLs

Originally on 2010-05-04

======================================================================
FAIL: websearch - check formats exported through /record/1/export/ URLs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 272, in test_exported_formats
    CFG_SITE_LANG))
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/1/export/hs (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/1/export/hs (login guest) does not contain <a href="/record/1?ln=en">ALEPH experiment..']
