Comments (1)
There is a publication-type tag, but most of the publications I've tested (mainly the NYT) don't use it. The "section" tag may also help us more than this tag. We are now adding a "database" field to the JSON metadata, so we will be able to separate documents by the database they come from. We may want to close this issue and not implement it.
from we1s-collector.
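The comment's point, that a "database" field in the JSON metadata lets documents be separated by source, can be sketched as below. This is a minimal, hypothetical Python example, not code from we1s-collector. It assumes each result is stored as one JSON object per `*.json` file with a top-level `"database"` key. `load_docs` and `group_by_database` are illustrative names.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_docs(json_dir):
    """Load every *.json file in json_dir as a dict.

    Hypothetical layout: one JSON object per file.
    """
    docs = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            docs.append(json.load(f))
    return docs


def group_by_database(docs, key="database"):
    """Bucket documents by a metadata field (default: "database").

    Assumes the field sits at the top level of each document; documents
    that lack it are collected under "unknown".
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.get(key, "unknown")].append(doc)
    return dict(groups)
```

Passing a different `key` would bucket documents by another tag, such as "section", assuming that tag is stored as a top-level field as well.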
Related Issues (20)
- rename repo and rebind to docker hub automated builds
- Generate merged "all" collection and simple sampling (1%, 5%, 10%...)
- `query_expander` minor issue
- Improve error reporting on `searchcmd.py` log file
- Why do searches abort?
- `searchcmd.py` date collection
- Sort results from `searchcmd.py` into subdirectories by keyword string
- `searchcmd.py` add no-exact-match to each JSON file in no-exact-match folder
- `searchcmd.py` add file slug to each JSON file
- `searchcmd.py` add regions of coverage field to results
- `searchcmd.py` improve title collection for university wire
- `searchcmd.py` scrubbing inspection
- `searchcmd.py` move scrubbing outside of notebook workflow
- move scrubbing to collector
- improve scrubber by using NLTK for tokenization
- improve scrub by fixing broken punctuation
- delete stopword removal from scrubber
- title text storage options for collector search / searchcmd
- `search.py` fails on CSV date fields with leading or trailing whitespace