inveniosoftware / invenio

Invenio digital library framework

Home Page: https://invenio.readthedocs.io

License: MIT License

Python 88.44% Shell 11.56%
python flask elasticsearch postgresql redis rabbitmq digital-library digital-repository invenio inveniosoftware

invenio's Introduction

Invenio Framework v3

Warning

The main development effort is currently on the InvenioRDM project, and there will be no new releases of the Invenio framework. However, each Invenio module is actively maintained as part of InvenioRDM.

Open Source framework for large-scale digital repositories.

Invenio Framework is like a Swiss Army knife of battle-tested, safe and secure modules providing you with all the features you need to build a trusted digital repository.

Our other products

Looking for a turn-key Research Data Management platform? Check out InvenioRDM

Looking for a modern Integrated Library System? Check out InvenioILS

Built with Invenio Framework

See examples on https://inveniosoftware.org/products/framework/ and https://inveniosoftware.org/showcase/.

invenio's People

Contributors

crepererum, drjova, egabancho, fjorba, greut, jalavik, jeromecaffaro, jirikuncar, jmartinm, kaplun, kasioumis, kneczaj, konstantinoskostis, kpsherva, lnielsen, ludmilamarian, manzikki, nkalodimas, ntarocco, osso, pedrogaudencio, ppiotr, pxke, romanchyla, sbujam, slint, tgbaron, tiborsimko, valkyriesavage, wohthan

invenio's Issues

full-text snippets: configuration to use number of chars

Originally on 2010-05-11

  1. The full-text snippet configuration needs to use the number of
    characters, not the number of words. We may show, say, 100
    characters around the pattern, rounded to the closest word boundary
    outside these 100 characters. So we need to replace the
    CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS configuration variable with
    character counting before v1.0 is out, in order to stabilize the
    config file.

  2. Moreover, the length of the snippet depends on the full-text file
    provenance. The provenance is currently stored as the bibdoc type, so
    it has to be analyzed when snippets are generated. The full-text
    snippet configuration should then look almost like a dictionary:

CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
  'arXiv': 200,
  'Springer': 180,
  'APS': 100,
}
  3. Even the number of snippets to show may vary per source, so it
    may be good to store it in the configuration as well, e.g.
    (50, 200) would mean we are able to show up to 50 snippets containing
    up to 200 characters each.

  4. The configuration variable that determines how many snippets are
    shown per record in the HTML brief output format on the search results
    pages can probably stay source independent.
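A minimal sketch of character-based snippet extraction with a per-source limit. It assumes the dictionary proposed in point 2, plus a hypothetical fallback limit `DEFAULT_SNIPPET_CHARS` for unknown sources. `get_snippet` is also a hypothetical name. The character budget is split around the first case-insensitive match, and the window is then widened to the nearest word boundaries outside it.

```python
import re

# Per-source limits, mirroring the dictionary proposed above.
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
    'arXiv': 200,
    'Springer': 180,
    'APS': 100,
}
DEFAULT_SNIPPET_CHARS = 100  # hypothetical fallback for unknown sources


def get_snippet(text, pattern, source):
    """Return a snippet of roughly the configured number of characters
    around the first case-insensitive match of `pattern`, rounded outwards
    to whole words. Returns None when the pattern does not occur."""
    limit = CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS.get(source, DEFAULT_SNIPPET_CHARS)
    match = re.search(re.escape(pattern), text, re.IGNORECASE)
    if match is None:
        return None
    # Share the characters left over by the match equally on both sides.
    context = max(limit - len(match.group(0)), 0) // 2
    start = max(match.start() - context, 0)
    end = min(match.end() + context, len(text))
    # Round outwards: move start back to the beginning of its word...
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # ...and end forward to the end of its word.
    while end < len(text) and not text[end].isspace():
        end += 1
    prefix = '... ' if start > 0 else ''
    suffix = ' ...' if end < len(text) else ''
    return prefix + text[start:end].strip() + suffix
```

With a 20-character limit, a match for "neutralino" in "alpha beta gamma neutralino delta epsilon zeta" becomes `... gamma neutralino delta ...`. The snippet is never cut mid-word, at the cost of slightly exceeding the limit.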

ERROR: bibdocfile - BibDocs functions

Originally on 2010-05-04

======================================================================
ERROR: bibdocfile - BibDocs functions
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibdocfile_regression_tests.py", line 165, in test_BibDocs
    my_new_bibdoc.add_icon( CFG_PREFIX + '/lib/webtest/invenio/icon-test.gif', basename=None, format=None)
TypeError: add_icon() got an unexpected keyword argument 'basename'

FAIL: webjournal - gets an article view of a journal from cache

Originally on 2010-05-04

======================================================================
FAIL: webjournal - gets an article view of a journal from cache
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/webjournal_regression_tests.py", line 311, in test_get_article_page_from_cache
    assert("April 14th, 1832.—Leaving Socêgo, we rode to another estate on the Rio Macâe" in value)
AssertionError

search_engine.py is catching Exception all over the place

Originally on 2010-05-08

Many places in search_engine catch Exception when they should be more specific about the errors they catch. Being specific not only helps the reader understand how the try-wrapped call could fail; it also means that unexpected exception types still get raised up the stack, so we see more types of errors and can fix them.
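To illustrate the difference, here is a sketch with hypothetical helper names, not code from search_engine. The narrow version catches only the errors that bad user input can produce. The broad version also swallows a genuine bug.

```python
def wash_page_number(value, default=1):
    """Convert a user-supplied page number to an int.

    Only the failures expected from bad input are caught; anything else
    propagates so that it shows up in the logs and can be fixed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # TypeError: value is None or an unsupported type.
        # ValueError: value is a string that is not a number.
        return default


def wash_page_number_broad(value, default=1):
    """Anti-pattern for comparison: hides every failure, bugs included."""
    try:
        return int(value)
    except Exception:
        return default
```

If the conversion hits an unexpected `RuntimeError`, the narrow version lets it reach the caller. The broad version silently returns the default.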

full-text snippets: `Show more' snippets and `Find inside' facility

Originally on 2010-05-11

By default, in the HTML brief format on the search results pages, we
are showing only a limited number of snippets, governed by the
CFG_WEBSEARCH_FULLTEXT_SNIPPETS configuration variable.

The snippet box on the search results page should offer a link in the
bottom-right corner entitled e.g. `Show more' that the user could click
to see more snippets. This link would lead the user to the Fulltext
tab on the detailed record page for the given record. This page would
offer, on the right-hand side, a box entitled `Find inside' that would
provide a snippet grepping facility with more snippets to show. It
would be similar to `Search in this book' in Google Books.

Here is an ASCII art mock-up of the detailed record page tab:


Information | References (34) | Citations (124) | Fulltext (3)


 Main file(s):                              Find inside:

 thesis                                     [neutralino_______] [FIND]
 version 1
  thesis.pdf [247.7 KB] 04 May 2010         1. ... supersymmetric neutralino ...
                                            2. ... 238 GeV neutralinos ...
                                           20. ... neutralino dark matter ...

                                            [view next 20 snippets]
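The paginated `Find inside' box in the mock-up could be backed by something like the sketch below. It assumes the full text is already available as a string. `find_inside` and its `context` parameter (characters shown on each side of a hit) are hypothetical. Numbering continues across pages, as in the mock-up, and a flag tells the UI whether to offer "[view next 20 snippets]".

```python
import re


def find_inside(text, pattern, page=1, per_page=20, context=30):
    """Return ([(number, snippet), ...], has_next) for one page of
    `Find inside' results. Snippets are numbered from 1 across pages."""
    snippets = []
    for match in re.finditer(re.escape(pattern), text, re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        snippets.append('... ' + text[start:end].strip() + ' ...')
    first = (page - 1) * per_page
    page_items = snippets[first:first + per_page]
    has_next = first + per_page < len(snippets)
    return [(first + i + 1, s) for i, s in enumerate(page_items)], has_next
```

For a document with 25 hits, page 1 returns snippets 1 to 20 with `has_next` set. Page 2 returns snippets 21 to 25.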

batchuploader: permit submission of Text MARC files

Originally on 2010-05-05

Catalogers may prefer to work offline with Text MARC files instead of
MARCXML files in their editors, so batchuploader should permit
submitting Text MARC files, not only MARCXML files. For Text MARC
submissions, the conversion utility (textmarc2xmlmarc, to be
committed soon) should be called to convert Text MARC files into
MARCXML before further processing (that is, before checking
collection rights, submitting to the bibupload queue, etc.).
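Here is a sketch of where the conversion step would sit, assuming uploads arrive as text. The MARCXML detection is a simple heuristic, not an Invenio API. `convert_textmarc` is a hypothetical stand-in for the forthcoming textmarc2xmlmarc utility, whose interface is not yet known.

```python
def looks_like_marcxml(data):
    """Heuristic: MARCXML starts with an XML declaration or a
    <collection>/<record> element once leading whitespace is removed."""
    head = data.lstrip()
    return head.startswith(('<?xml', '<collection', '<record'))


def prepare_upload(data, convert_textmarc):
    """Return MARCXML ready for the usual checks (collection rights,
    bibupload queue). Text MARC input is converted first; MARCXML input
    passes through unchanged."""
    if looks_like_marcxml(data):
        return data
    return convert_textmarc(data)
```

The existing rights checks and queueing code can then keep operating on MARCXML only.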

FAIL: webjournal - checks if an article is new or not

Originally on 2010-05-04

======================================================================
FAIL: webjournal - checks if an article is new or not
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/webjournal_regression_tests.py", line 42, in test_is_new_article
    self.assertEqual(article, True)
AssertionError: False != True

test

Originally on 2010-04-20

test

batchuploader: prettify user messages

Originally on 2010-05-05

The batchuploader facility output messages should be prettified.

E.g. try to submit a document for insertion vs. a document for revision.
The mode works, but the output message is misleading.

E.g. try to submit a MARCXML file for which you don't have proper
collection rights. The demo user Dorian can submit only Books files.
The check works, but if he does not have rights, the general output
message from the access control system is misleading.

FAIL: htmlutils - washing of tags altering formatting of a page (e.g. </html>)

Originally on 2010-05-04

======================================================================
FAIL: htmlutils - washing of tags altering formatting of a page (e.g. </html>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/htmlutils_tests.py", line 42, in test_forbidden_formatting_tags
    '')
AssertionError: '</div>' != ''

----------------------------------------------------------------------

FAIL: websearch - query ellis, citation summary output format

Originally on 2010-05-04

======================================================================
FAIL: websearch - query ellis, citation summary output format
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 1420, in test_ellis_citation_summary
    expected_link_label='1'))
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/search?p=ellis&of=hcs (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/search?p=ellis&of=hcs (login guest) does not contain link to http://pcuds33.cern.ch/search?p=ellis%20cited%3A1-%3E9&rm=citation entitled 1..']

use specific jQuery versions

Originally on 2010-04-20

One more thing to do before 1.0 in the jQuery department: there are
still some svn/tags/latest parts in install-jquery-plugins. This
is too bleeding edge, and we have been bitten by it in the past;
see e.g. the file renames in 0c841f1 and the calendar changes in
e2a978e.

So it would be good to go through the whole install-jquery-plugins
target and change the remaining parts so that wget always fetches a
specific version of each jQuery library, as needed, like in my changes
mentioned above. (I have been doing that as necessary for merging, but
I have not gone systematically through every jQuery dependency we
have.)

Can you please look at that?

If there is trouble with the stability of some dependency's URL, or
versions are lacking on the remote site, then we can host the
dependency on the cdsware.cern.ch site, like the other installation
files, e.g. demo records or keyword ontologies.
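A small kwalitee-style check could help with the systematic pass over install-jquery-plugins by flagging every wget line that still points at a moving target. This is a sketch. The list of "unpinned" URL fragments is an assumption and should be adjusted to the URLs actually used. `find_unpinned_downloads` is a hypothetical helper.

```python
import re

# Assumption: URL fragments that indicate a moving target rather than a
# fixed release of a jQuery plugin.
UNPINNED_URL = re.compile(r'(svn/tags/latest|/trunk/|/latest/)')


def find_unpinned_downloads(makefile_text):
    """Return (line_number, line) for every wget line whose URL does not
    point at a fixed version."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), 1):
        if 'wget' in line and UNPINNED_URL.search(line):
            hits.append((number, line.strip()))
    return hits
```

Running it over the Makefile that defines the target lists the remaining lines to pin. An empty result means every download is versioned.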

Remove 'TextColor' item from WebComment FCKeditor toolbar

Originally on 2010-04-30

Since submitted comments are washed of potentially unsafe attributes and tags, it does not make sense to offer the 'TextColor' item in the toolbar displayed for the WebComment input form (the "style" attribute will be washed away anyway). This item should be removed from the default settings.

Behaviour of other toolbar items should also be checked.

Bibclassify slow - cache loading

Originally on 2010-05-18

Bibclassify is very slow when loading the cache.

The cache contains many regexes, and Python recompiles them while loading: http://stackoverflow.com/questions/65266/caching-compiled-regex-objects-in-python

Since we now have a much bigger taxonomy, I need to investigate ways to:

  • make it smaller (perhaps share regexes among Keyword objects)
  • import it only once (and share it among threads)
  • make the console application interactive (so that there is a penalty
    only for the first run)

The cache is loaded only once, as this profile shows:

Mon May 17 15:25:57 2010    /opt/cds-invenio/var/tmp/invenio-profile-stats-20100517152447.raw

         10719206 function calls (10505840 primitive calls) in 69.884 CPU seconds

   Ordered by: cumulative time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.000    0.000   69.884   69.884 webinterface_handler.py:330(_handler)
        2/1    0.000    0.000   69.883   69.883 webinterface_handler.py:171(_traverse)
          1    0.000    0.000   69.878   69.878 websearch_webinterface.py:418(call)
          1    0.000    0.000   69.874   69.874 search_engine.py:3986(perform_request_search)
          1    0.000    0.000   69.762   69.762 search_engine.py:3214(print_records)
          1    0.000    0.000   69.750   69.750 bibclassify_webinterface.py:65(main_page)
          1    0.000    0.000   69.734   69.734 bibclassify_webinterface.py:189(generate_keywords)
          1    0.000    0.000   69.720   69.720 bibclassify_engine.py:141(get_keywords_from_local_file)
          1    0.000    0.000   69.637   69.637 bibclassify_engine.py:165(get_keywords_from_text)
          1    0.000    0.000   56.950   56.950 bibclassify_ontology_reader.py:69(get_regular_expressions)
          1    0.003    0.003   56.948   56.948 bibclassify_ontology_reader.py:572(_get_cache)
          1    1.354    1.354   56.945   56.945 {cPickle.load}
      17752    0.527    0.000   55.818    0.003 re.py:227(_compile)
      17701    0.465    0.000   55.023    0.003 sre_compile.py:501(compile)
      17701    0.325    0.000   31.684    0.002 sre_parse.py:669(parse)
28313/17701    0.743    0.000   30.786    0.002 sre_parse.py:307(_parse_sub)
34424/17701    8.570    0.000   30.433    0.002 sre_parse.py:385(_parse)
      17701    0.197    0.000   22.618    0.001 sre_compile.py:486(_code)
83568/17701    5.385    0.000   17.814    0.001 sre_compile.py:38(_compile)
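The profile is consistent with how pickling works for compiled patterns. The `re` module pickles a compiled pattern as its source string and flags, so unpickling calls `re._compile` again for every regex. That is the 55 s under `cPickle.load` above. The sketch below addresses the first two bullets. It assumes the cache would store plain pattern strings, which are resolved through one process-wide cache so that identical patterns are compiled once and shared by all Keyword objects and threads. `RegexCache` and `load_keyword_regexes` are hypothetical names.

```python
import pickle
import re


class RegexCache(object):
    """Compile each distinct (pattern, flags) pair once and hand out the
    same compiled object to every caller. Under CPython's GIL, the worst
    case with concurrent threads is compiling a pattern twice."""

    def __init__(self):
        self._compiled = {}

    def get(self, pattern, flags=0):
        key = (pattern, flags)
        regex = self._compiled.get(key)
        if regex is None:
            regex = re.compile(pattern, flags)
            self._compiled[key] = regex
        return regex


def load_keyword_regexes(blob, cache):
    """Unpickle a list of pattern strings (cheap) and resolve each one
    through the shared cache, so duplicates are compiled only once."""
    return [cache.get(pattern, re.IGNORECASE) for pattern in pickle.loads(blob)]
```

Because the cache lives at module level in a long-running process, only the first request pays the compilation cost. Later requests reuse the compiled objects.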

"raise StandardError" in bibindex_engine

Originally on 2010-05-18

In bibindex_engine, we find 8 occurrences of 'raise StandardError' without a message. If the exception is not caught and handled, this leads to useless emails being sent to the admin: they do not contain the cause of the exception, which prevents the admin from correctly fixing the problem.

Here are the lines causing problems:
1245: raise StandardError
1311: raise StandardError
1486: raise StandardError
1510: raise StandardError
1542: raise StandardError
1565: raise StandardError
1605: raise StandardError
1629: raise StandardError
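The fix is to raise an exception that carries its cause. Python 3 no longer has `StandardError`, so this sketch subclasses `Exception`. The exception class, the check and its message are hypothetical illustrations, not the actual bibindex_engine code paths.

```python
class BibIndexError(Exception):
    """Hypothetical exception class for bibindex_engine failures."""


def check_word_table(table_name, rows):
    """Refuse to continue on an empty word table, saying which one."""
    if not rows:
        # The message names the table, so the admin email states the cause.
        raise BibIndexError(
            "word table %s is empty; cannot continue indexing" % table_name)
    return len(rows)
```

Compared with a bare `raise StandardError`, the traceback in the admin email now states what went wrong and where.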

grapher: investigate usage of flot

Originally on 2010-04-26

  1. Instead of using gnuplot in the grapher, investigate the usage of
    jQuery plotting tools, such as [http://code.google.com/p/flot/].
    The advantage would be the possibility to zoom into graphs easily.
    The disadvantage may be higher server-side stress related to Ajax, so
    we would have to cache the {(X1,Y1),(X2,Y2),...} JSON structures
    that Flot would need, just as we are now caching the PNGs produced by
    gnuplot. The server stress would have to be tested. Caching PNGs is
    definitely more scalable: once cached, they are served directly by
    Apache, not by the Invenio WSGI application.

  2. Flot would be especially well suited for admin-like graphs in the
    WebStat module, where the x-axis scale is currently chosen from a
    drop-down selection box. It would be even more practical to zoom
    there, and the stress issue would be less important, since there
    are only a few connections to the admin-level pages, as opposed to
    the user-level pages.
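Here is a sketch of caching the Flot data on disk, the same way the PNGs are cached. A cached file could then also be served directly by Apache. The cache key and the `compute_points` callable are hypothetical. The output is a JSON list of `[x, y]` pairs, which is the series format Flot accepts. The file is written to a temporary name and then renamed, so a reader never sees a half-written file.

```python
import json
import os


def get_flot_series(cache_dir, key, compute_points):
    """Return the JSON series for `key`, computing and caching it on the
    first request only. `compute_points` returns an iterable of (x, y)."""
    path = os.path.join(cache_dir, '%s.json' % key)
    if os.path.exists(path):
        with open(path) as cached:
            return cached.read()
    data = json.dumps([[x, y] for x, y in compute_points()])
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w') as out:
        out.write(data)
    os.replace(tmp_path, path)  # atomic rename, Python 3.3+
    return data
```

Invalidation would work like the PNG cache: delete the file when the underlying citation or download counts change.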

BitTorrent support

Originally on 2010-05-16

As Invenio is a digital object repository, BitTorrent could be a perfect solution for users to access big files when such files are being handled. Although not currently allowed at CERN, it could be a good solution for other installations handling huge multimedia material or, e.g., huge scientific data.

A possible integration would be as a special plugin to the WebSubmit converter tools. We could introduce a new converter to .torrent files (e.g. by using the transmission CLI) that would first create the torrent and then seed it.

This would require integration with a tracker; e.g. we might use OpenTracker, which can be configured to track only white-listed torrents (e.g. those created beforehand with transmission).

Since BitTorrent would be useful only for huge files that are of interest to many users, we might either extend the /files handler to answer even for formats that do not exist yet and register the desired conversion, so that it can later be handled by a daemon (see ticket #47), or alternatively we might have an automatic procedure that pre-creates the torrent based on thresholds for file size and number of downloads.
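The threshold-based alternative could select candidates as in the sketch below. The threshold values and function names are hypothetical and would need tuning per installation. The input is assumed to be (document id, size in bytes, download count) triples from whatever download statistics the site keeps.

```python
# Hypothetical thresholds; real values would need tuning per installation.
MIN_TORRENT_SIZE = 1024 ** 3     # 1 GiB
MIN_TORRENT_DOWNLOADS = 50


def should_precreate_torrent(size_in_bytes, download_count,
                             min_size=MIN_TORRENT_SIZE,
                             min_downloads=MIN_TORRENT_DOWNLOADS):
    """True when a file is both big and popular enough for a .torrent
    to be worth creating and seeding."""
    return size_in_bytes >= min_size and download_count >= min_downloads


def torrent_candidates(files):
    """Return the ids of files that should get a pre-created torrent,
    given (doc_id, size_in_bytes, download_count) triples."""
    return [doc_id for doc_id, size, downloads in files
            if should_precreate_torrent(size, downloads)]
```

Requiring both conditions keeps small but popular files on plain HTTP, where a torrent would only add overhead.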

WebSession: more user-friendly /yourgroups messages

Originally on 2010-05-04

The output messages of the /yourgroups subsystem should be prettified.
Currently, when you try to edit a group that you don't have rights for
(via URL mangling), the system says:

Error: Sorry there was an error with the database.
<type 'exceptions.TypeError'> 'dict' object is not callable

which is misleading. Moreover, invenio.err is unnecessarily
filled with ERR_WEBSESSION_DB_ERROR alarms.

The UI should report something more user-friendly, like:

Error: Sorry, you don't have rights to edit this user group.

or:

Error: Sorry, group foo does not exist.

The database errors should be reported only when there are real
database problems.
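The distinction could be made explicit, as in the sketch below. It assumes hypothetical flags telling whether the group exists, whether the user may edit it, and whether a query actually failed. How those flags would be computed in webgroup is not shown here. An empty result from a membership lookup is then reported as a rights or existence problem rather than a database error.

```python
def edit_group_error(group_name, group_exists, user_can_edit, db_failed=False):
    """Pick the message for /yourgroups/edit, or None if the user may
    proceed. Only genuine query failures are reported as database errors."""
    if db_failed:
        return "Error: Sorry there was an error with the database."
    if not group_exists:
        return "Error: Sorry, group %s does not exist." % group_name
    if not user_can_edit:
        return "Error: Sorry, you don't have rights to edit this user group."
    return None
```

Only the `db_failed` branch would then log an ERR_WEBSESSION_DB_ERROR alarm, which keeps invenio.err free of false alarms.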

crash in /yourgroups related to unwashed arguments

Originally on 2010-04-23

The /yourgroups facility should improve its argument washing.

A URL such as https://localhost/yourgroups/edit?grpID=foo leads to a
500 Internal Server Error and a traceback, because grpID had not been
washed properly in the web interface layer before being passed on to
the business logic layer.

>>>> Frame edit in /usr/lib/python2.5/site-packages/invenio/websession_webinterface.py at line 1190
*******************************************************************************
      1187         else :
      1188             (body, errors, warnings)= webgroup.perform_request_edit_group(uid=uid,
      1189                                                                           grpID=argd['grpID'],
----> 1190                                                                           ln=argd['ln'])
      1191
      1192
      1193
*******************************************************************************


>>>> Frame perform_request_edit_group in /usr/lib/python2.5/site-packages/invenio/webgroup.py at line 387
*******************************************************************************
       384
       385     body = ''
       386     errors = []
---->  387     user_status = db.get_user_status(uid, grpID)
       388     if not len(user_status):
       389         errors.append('ERR_WEBSESSION_DB_ERROR')
       390         return (body, errors, warnings)
*******************************************************************************

>>>> Frame get_user_status in /usr/lib/python2.5/site-packages/invenio/webgroup_dblayer.py at line 296
*******************************************************************************
       293                 WHERE id_user = %s
       294                 AND id_usergroup=%s"""
       295     uid = int(uid)
---->  296     grpID = int(grpID)
       297     res = run_sql(query, (uid, grpID))
       298     return res
       299
*******************************************************************************
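The traceback ends in `int(grpID)` deep in the DB layer. Here is a sketch of washing the argument in the web interface layer instead. `wash_group_id` is a hypothetical helper. Returning None lets the caller show a friendly "group does not exist" message instead of crashing.

```python
def wash_group_id(raw_value):
    """Wash the grpID URL argument before it reaches webgroup.

    Returns a positive int, or None when the value is missing, not an
    integer (e.g. grpID=foo), or not a valid row id."""
    try:
        grp_id = int(raw_value)
    except (TypeError, ValueError):
        return None
    return grp_id if grp_id > 0 else None
```

With this in place, `grpID=foo` never reaches `get_user_status()`, and the DB layer can keep assuming it receives integers.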

WebStat: Clean escape_string from queries

Originally on 2010-05-11

Running 'make kwalitee-check-sql-queries' reveals, among other things, the use of escape_string in the generation of queries in webstat_engine.py.

** SQL queries using charset-ignorant escape_string():

...

./modules/webstat/lib/webstat.py:33:from invenio.dbquery import run_sql, escape_string
./modules/webstat/lib/webstat.py:174:        arg = escape_string(argument)
./modules/webstat/lib/webstat_engine.py:25:from invenio.dbquery import run_sql, escape_string
./modules/webstat/lib/webstat_engine.py:259:                sql_query.append("AND `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:261:                sql_query.append("OR `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:263:                sql_query.append("AND NOT `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:317:                    sql_query.append("AND `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:319:                    sql_query.append("OR `%s`" % escape_string(col_title))
./modules/webstat/lib/webstat_engine.py:321:                    sql_query.append("AND NOT `%s`" % escape_string(col_title))

...

This should be cleaned up.
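In the flagged lines, escape_string is applied to column titles, which are identifiers and not values. A safer pattern is to accept only known column names and quote them as identifiers. Any values then go through `run_sql` parameters. In the sketch below, the allowed column names are hypothetical, and so are the helper names. The backtick doubling follows MySQL's identifier-quoting rule.

```python
def quote_column(col_title, allowed_columns):
    """Return a backtick-quoted MySQL identifier, but only for column
    names we know about; anything else is rejected, not escaped."""
    if col_title not in allowed_columns:
        raise ValueError("unknown column: %r" % (col_title,))
    # In MySQL a backtick inside a quoted identifier is escaped by doubling.
    return "`%s`" % col_title.replace("`", "``")


def build_column_conditions(conditions, allowed_columns):
    """Build the AND/OR/AND NOT fragments seen in webstat_engine.py from
    (operator, col_title) pairs. Values must still be passed as run_sql()
    parameters, never formatted into the string."""
    parts = []
    for operator, col_title in conditions:
        if operator not in ("AND", "OR", "AND NOT"):
            raise ValueError("unknown operator: %r" % (operator,))
        parts.append("%s %s" % (operator, quote_column(col_title, allowed_columns)))
    return " ".join(parts)
```

Whitelisting removes the need for charset-aware escaping of identifiers altogether.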

FAIL: bibformat - MARCXML output

Originally on 2010-05-04

======================================================================
FAIL: bibformat - MARCXML output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibformat_regression_tests.py", line 345, in test_marcxml_output
    self.assertEqual([], result)
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/9?of=xm (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/9?of=xm (login guest) does not contain <?xml version="1.0" encoding="UTF-8"?>\n<collection xmlns="http://www.loc.gov/MARC21/slim">\n<record>\n  <controlfield tag="001">9</controlfield>\n  <datafield tag="041" ind1=" " ind2=" ">\n    <subfield code="a">eng</subfield>\n  </datafield>\n  <datafield tag="088" ind1=" " ind2=" ">\n    <subfield code="a">PRE-25553</subfield>\n  </datafield>\n  <datafield tag="088" ind1=" " ind2=" ">\n    <subfield code="a">RL-82-024</subfield>\n  </datafield>\n  <datafield tag="100" ind1=" " ind2=" ">\n    <subfield code="a">Ellis, J</subfield>\n    <subfield code="u">University of Oxford</subfield>\n  </datafield>\n  <datafield tag="245" ind1=" " ind2=" ">\n    <subfield code="a">Grand unification with large supersymmetry breaking</subfield>\n  </datafield>\n  <datafield tag="260" ind1=" " ind2=" ">\n    <subfield code="c">Mar 1982</subfield>\n  </datafield>\n  <datafield tag="300" ind1=" " ind2=" ">\n    <subfield code="a">18 p</subfield>\n  </datafield>\n  <datafield tag="650" ind1="1" ind2="7">\n    <subfield code="2">SzGeCERN</subfield>\n    <subfield code="a">General Theoretical Physics</subfield>\n  </datafield>\n  <datafield tag="700" ind1=" " ind2=" ">\n    <subfield code="a">Ibanez, L E</subfield>\n  </datafield>\n  <datafield tag="700" ind1=" " ind2=" ">\n    <subfield code="a">Ross, G G</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="0">\n    <subfield code="y">1982</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="0">\n    <subfield code="b">11</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Oxford Univ.</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Univ. Auton. Madrid</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="u">Rutherford Lab.</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="1">\n    <subfield code="c">1990-01-28</subfield>\n    <subfield code="l">50</subfield>\n    <subfield code="m">2002-01-04</subfield>\n    <subfield code="o">BATCH</subfield>\n  </datafield>\n  <datafield tag="909" ind1="C" ind2="S">\n    <subfield code="s">h</subfield>\n    <subfield code="w">1982n</subfield>\n  </datafield>\n  <datafield tag="980" ind1=" " ind2=" ">\n    <subfield code="a">PREPRINT</subfield>\n  </datafield>\n</record>\n</collection>..']

FAIL: bibformat - Detailed HTML output

Originally on 2010-05-04

======================================================================
FAIL: bibformat - Detailed HTML output
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibformat_regression_tests.py", line 155, in test_detailed_html_output
    self.assertEqual([], result)
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/7?of=hd (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/7?of=hd (login guest) does not contain <img src="http://pcuds33.cern.ch/record/7/files/icon-9806033.gif" alt="" /><br /><font size="-2"><b>\xc2\xa9 CERN Geneva</b></font>..']

(pdf2)hocr(2pdf) family of algorithms should use compressed images

Originally on 2010-04-22

Currently, when a PDF is re-generated after an intermediate OCR process in the WebSubmit Converter Tools, the generated images are uncompressed bitmaps that are stored directly in the final PDF, unnecessarily bloating it. Using JPEG is advisable instead.

bibrank grapher: adaptive x-axis ticks

Originally on 2010-04-26

The citation and download history grapher tool should be smarter
with respect to the x-axis unit calculation. For articles published a
long time ago, the unit used for the x-axis ticks is so small that the
final result is unreadable:

[http://inspirebeta.net/record/3198/citations]

The x-axis ticks should be adapted to the real x-axis range used
in the input data, with a tick spacing close to (x_max - x_min) / 10.
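Here is a sketch of picking a tick spacing close to (x_max - x_min) / 10. It uses the common "nice numbers" approach, which is a swapped-in technique and not something the ticket prescribes: the raw step is rounded up to 1, 2, 5 or 10 times a power of ten, so labels stay readable. The function names are hypothetical. The sketch assumes numeric x values, e.g. years.

```python
import math


def nice_tick_step(x_min, x_max, target_ticks=10):
    """Smallest of 1, 2, 5, 10 x 10**k that is >= (x_max - x_min) / target_ticks."""
    span = x_max - x_min
    if span <= 0:
        return 1
    raw = span / float(target_ticks)
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw:
            return step
    return 10 * magnitude  # guard against floating-point edge cases


def tick_positions(x_min, x_max, target_ticks=10):
    """Tick positions covering [x_min, x_max] at the chosen step."""
    step = nice_tick_step(x_min, x_max, target_ticks)
    first = math.ceil(x_min / float(step)) * step
    ticks = []
    n = 0
    while first + n * step <= x_max:
        ticks.append(first + n * step)  # multiply, don't accumulate, to avoid drift
        n += 1
    return ticks
```

For a citation history from 1960 to 2010, the step is 5 years, which gives 11 ticks from 1960 to 2010. The step grows automatically for longer histories.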

BibEdit: crash when empty history

Originally on 2010-05-10

BibEdit produces a traceback if, for some reason, there are no MARCXML
historical versions for the given record in the hstRECORD table. BibEdit
should simply continue gracefully in these cases, with the History panel
left empty.

How to reproduce:

$ echo "TRUNCATE hstRECORD" | /opt/cds-invenio/bin/dbexec

and then edit a record.

The traceback is:

  File "/usr/lib/python2.5/site-packages/invenio/bibedit_engine.py", line 527, in perform_request_record
    record_revision, record = create_cache_file(recid, uid)
  File "/usr/lib/python2.5/site-packages/invenio/bibedit_utils.py", line 104, in create_cache_file
    record_revision = get_record_last_modification_date(recid)
  File "/usr/lib/python2.5/site-packages/invenio/bibedit_dblayer.py", line 61, in get_record_last_modification_date
    return run_sql("SELECT max(job_date) FROM hstRECORD WHERE id_bibrec=%s", (recid, ))[0][0].timetuple();
AttributeError: 'NoneType' object has no attribute 'timetuple'
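The crash comes from SQL semantics. `max()` over zero rows returns a single NULL row, i.e. `[(None,)]`. Here is a sketch of a None-safe version of the failing function. In this sketch, `run_sql` is passed in as a parameter purely to keep it testable, unlike the real bibedit_dblayer code. Callers such as `create_cache_file` would then have to treat None as "no history" and leave the History panel empty.

```python
def get_record_last_modification_date(recid, run_sql):
    """Return the timetuple of the latest hstRECORD entry for recid,
    or None when the record has no history at all."""
    res = run_sql("SELECT max(job_date) FROM hstRECORD WHERE id_bibrec=%s", (recid,))
    # max() over zero rows yields one NULL, i.e. [(None,)].
    last_date = res[0][0] if res else None
    if last_date is None:
        return None
    return last_date.timetuple()
```

This reproduces the TRUNCATE scenario from the ticket. An empty history now yields None instead of an AttributeError.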

ERROR: bibconvert - availability of BibConvert Admin Guide parts

Originally on 2010-05-04

======================================================================
ERROR: bibconvert - availability of BibConvert Admin Guide parts
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibconvert_regression_tests.py", line 43, in test_availability_bibconvert_admin_guide_parts
    test_web_page_existence(CFG_SITE_URL + '/admin/bibconvert/bibtex.cfg')
  File "/usr/lib/python2.5/site-packages/invenio/testutils.py", line 146, in test_web_page_existence
    browser.open(url)
  File "/usr/lib/python2.5/site-packages/mechanize/_mechanize.py", line 209, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/lib/python2.5/site-packages/mechanize/_mechanize.py", line 261, in _mech_open
    raise response
httperror_seek_wrapper: HTTP Error 404: Not Found

BTW, I'll take care of this when merging mpp->wsgi branches.

BibRank: micro-optimize citation dict memory footprint

Originally on 2010-04-27

The citation dictionary is cached inside each WSGI Invenio daemon
process for speed purposes. For the demo site, it looks like this:

{18: [96],
 74: [92],
 77: [85, 86],
 78: [79, 91],
 79: [91],
 81: [82, 83, 87, 89],
 84: [85, 88, 91],
 91: [92],
 94: [80],
 95: [77, 86]}

For bigger sites containing 1M records and having fuller citation
maps, this dictionary can get quite big; e.g. the WSGI daemon processes
of the INSPIRE instance eat about 1 GB of RAM.

It would be good to decrease the memory footprint of this citation
dictionary, especially since we are running on a 64-bit OS, where we
may easily consume more bytes than necessary to store list elements
(of `unsigned mediumint' type).

We should investigate potential local replacements for the list
structure, for example using numpy.array. We can measure the
memory footprint of various data structures via sys.getsizeof()
or via ps auxw process sizes, aiming to find a more memory
optimized, yet still fast enough, data structure to represent the
citation dict.

If needed, we can even create a dedicated intbitset-like C extension
that would be capable of storing recID vectors in a memory-efficient
way. This is arguably the best micro-optimization technique that we
could go for, albeit it would represent a bit more work than reusing
numpy.array or some other such pre-existing module.

Note that this task is of a micro-optimization kind only, keeping the
overall citation indexer and searcher machinery unchanged and only
changing its internal data structures. The tests will show how much
such a micro-optimization would be worth. The overall rethinking
of the citation dictionary handling and the inherent memory sharing
procedures would be another task; see some older musings at
[https://twiki.cern.ch/twiki/bin/view/CDS/InvenioScalability].
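Here is a first measurement sketch. It uses the standard-library `array` module instead of numpy.array, which is a swapped-in choice to avoid the extra dependency. The helper names are hypothetical. The size estimate is rough. For lists, it adds the list object and the int objects it points to. For arrays, `sys.getsizeof` already covers the inline items. Typecode 'I' is a C unsigned int, typically 4 bytes, which is enough for `unsigned mediumint' values.

```python
import sys
from array import array

# The demo-site citation dictionary quoted above.
citation_dict = {18: [96], 74: [92], 77: [85, 86], 78: [79, 91], 79: [91],
                 81: [82, 83, 87, 89], 84: [85, 88, 91], 91: [92],
                 94: [80], 95: [77, 86]}


def compact_citation_dict(cdict, typecode='I'):
    """Replace each list of citing recIDs by a compact array of C ints."""
    return dict((recid, array(typecode, citers)) for recid, citers in cdict.items())


def approx_values_size(cdict):
    """Rough bytes used by the dict values. CPython shares small int
    objects, so this overestimates lists of tiny recIDs."""
    total = 0
    for values in cdict.values():
        total += sys.getsizeof(values)
        if isinstance(values, list):
            total += sum(sys.getsizeof(v) for v in values)
    return total
```

Arrays support iteration, `len()` and `in`, so read-only code that loops over citers should keep working. A `ps auxw` comparison on a realistic dictionary would still be needed to confirm the gain.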

automated file converter daemon

Originally on 2010-05-16

It would be great to have a daemon (BibTask) to perform conversions. We might have a new table (e.g. conversion) with the columns:

id_bibdoc
format
status
id_users
comment

where id_bibdoc is the document involved in the conversion, format is the desired format, status is one of 'waiting', 'running', 'done' or 'error', and id_users is an intbitset of the UIDs of all users who should be alerted about the outcome of a given conversion (e.g. a user who has asked for a conversion, or a submitter).
In case of error, comment would contain the error message.
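Here is a sketch of the proposed table and the daemon's main loop. It uses sqlite3 only so the example is self-contained, whereas Invenio itself runs on MySQL. The surrogate `id` column and the function names are hypothetical. id_users is stored here as plain text because intbitset is not in the standard library. User alerting is left out.

```python
import sqlite3

CREATE_CONVERSION_TABLE = """
CREATE TABLE IF NOT EXISTS conversion (
    id INTEGER PRIMARY KEY,             -- hypothetical surrogate key
    id_bibdoc INTEGER NOT NULL,
    format TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'waiting',
    id_users TEXT NOT NULL DEFAULT '',  -- intbitset in the proposal
    comment TEXT
)"""


def open_conversion_queue(path=':memory:'):
    conn = sqlite3.connect(path)
    conn.execute(CREATE_CONVERSION_TABLE)
    return conn


def run_pending_conversions(conn, convert):
    """Process waiting conversions in order: waiting -> running -> done/error.
    `convert(id_bibdoc, format)` does the actual work."""
    waiting = conn.execute("SELECT id, id_bibdoc, format FROM conversion "
                           "WHERE status = 'waiting' ORDER BY id").fetchall()
    for conv_id, id_bibdoc, fmt in waiting:
        conn.execute("UPDATE conversion SET status = 'running' WHERE id = ?", (conv_id,))
        conn.commit()
        try:
            convert(id_bibdoc, fmt)
        except Exception as exc:
            # The daemon must survive any converter failure and record why.
            conn.execute("UPDATE conversion SET status = 'error', comment = ? "
                         "WHERE id = ?", (str(exc), conv_id))
        else:
            conn.execute("UPDATE conversion SET status = 'done' WHERE id = ?", (conv_id,))
        conn.commit()
```

Committing the 'running' state before converting lets other processes see that a job is in progress.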

FAIL: websearch - restricted pictures not available to Mr. Hyde

Originally on 2010-05-04

======================================================================
FAIL: websearch - restricted pictures not available to Mr. Hyde
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 988, in test_restricted_pictures_hyde
    self.fail(merge_error_messages(error_messages))
AssertionError:
*** ERROR: Page http://pcuds33.cern.ch/record/1/files/0106015_01.jpg (login hyde) not accessible. HTTP Error 401: Unauthorized

Note: as with many failed tests I'm now ticketizing, this is a problem
with the test case comparison technique, not with the file restriction
facility. The files are well restricted. I'm noting this down just
in case the summary of this ticket may appear to be alarming...

FAIL: bibcirculation - availability of your loans page

Originally on 2010-05-04

======================================================================
FAIL: bibcirculation - availability of your loans page
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/bibcirculation_regression_tests.py", line 44, in test_your_loans_page_availability
    self.fail(merge_error_messages(error_messages))
AssertionError:
*** ERROR: Page http://pcuds33.cern.ch/yourloans/ (login guest) not accessible. HTTP Error 500: Internal Server Error

FAIL: websearch - check formats exported through /record/1/export/ URLs

Originally on 2010-05-04

======================================================================
FAIL: websearch - check formats exported through /record/1/export/ URLs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/invenio/websearch_regression_tests.py", line 272, in test_exported_formats
    CFG_SITE_LANG))
AssertionError: [] != ['ERROR: Page http://pcuds33.cern.ch/record/1/export/hs (login guest) led to an error: ERROR: Page http://pcuds33.cern.ch/record/1/export/hs (login guest) does not contain <a href="/record/1?ln=en">ALEPH experiment..']
