Comments (5)
From the error message, it appears the code tries to open a single file whose name is the entire comma-separated list:
"Bibcode '2003A&A...402..531C' is linked to a non-existent file '/proj/ads/fulltext/sources/A+A/backdata/2003/17/aah3724/aah3724.right.html,/proj/ads/fulltext/sources/A+A/backdata/2003/17/aah3724/tableE.1.html,/proj/ads/fulltext/sources/A+A/backdata/2003/17/aah3724/table2.html,/proj/ads/fulltext/sources/A+A/backdata/2003/17/aah3724/table1.html'"
from adsfulltext.
The non-existent file error is generated at https://github.com/adsabs/ADSfulltext/blob/master/adsft/checker.py#L235
Currently, 11259 bibcodes in fulltext/all.links list multiple files, out of 4823177 bibcodes that have fulltext (roughly 0.23%). When multiple files are listed, it looks like the first file (right.html) contains the text of the paper while the other files hold tables.
One complicating factor is that commas can be part of a filename:
2008bhgs.confE...1V /proj/ads/fulltext/sources/downloads/cache/POS/pos.sissa.it//archive/conferences/075/001/BHs,%20GR%20and%20Strings_001.pdf POS
Perhaps a comma followed by a '/' marks the start of a new filename, since every path listed is absolute.
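A minimal sketch of that heuristic, assuming every path in the field is absolute. The function names `split_fulltext_paths` and `missing_files` are hypothetical and are not taken from adsft/checker.py. The split happens only where a comma is immediately followed by '/', so a comma inside a filename is left alone, and each resulting path can then be checked on its own.

```python
import os
import re

# Assumption: a new path starts only where a comma is immediately followed
# by '/'. The lookahead keeps the '/' as part of the next path.
_PATH_SEPARATOR = re.compile(r",(?=/)")


def split_fulltext_paths(field):
    """Split one links field into its individual absolute paths."""
    return [path for path in _PATH_SEPARATOR.split(field.strip()) if path]


def missing_files(field):
    """Return the paths in the field that do not exist on disk."""
    return [path for path in split_fulltext_paths(field)
            if not os.path.isfile(path)]
```

This would still split incorrectly on a name that itself contains ',/' (for example a directory whose name ends in a comma), so it is a heuristic rather than a guarantee.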
I verified that the Solr body field for 2003A&A...402..531C contains text from each of the 4 files listed in fulltext/all.links.