Comments (5)
a. Who is scraping the catchup page?
We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.
There was no need to scrape the website; all the data was available in the repo.
b. The idea is to expose an api, which people can use instead of scraping.
That is an alternative, but it requires additional effort. What is your plan for this API?
No, that is a side effect of implementing infinite scroll.
The endpoint that will be called to get the next set of summaries will be the same one that someone might use to scrape them.
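A minimal sketch of the paging logic such a shared endpoint might use, in Python for illustration only. The function name `get_summary_page`, the `page` and `page_size` parameters, and the response shape are all assumptions, not anything decided in this thread.

```python
from typing import List, Optional, TypedDict


class SummaryPage(TypedDict):
    sections: List[str]       # HTML fragments, one per summary
    next_page: Optional[int]  # None when there is nothing left to load


def get_summary_page(all_sections: List[str], page: int, page_size: int = 5) -> SummaryPage:
    """Return one page of summaries in the order they are stored.

    Hypothetical helper: the infinite-scroll client and a scraper
    would both call this with an increasing page number.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    chunk = all_sections[start:start + page_size]
    has_more = start + page_size < len(all_sections)
    return {"sections": chunk, "next_page": page + 1 if has_more else None}
```

A client would keep requesting `next_page` until it comes back as `None`, which is the same loop whether it is driven by scrolling or by a script.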
How much detail will it include? Will it send over the entire file or will it provide options to get dates, durations and other specific parts of the content?
Just the `<section>` tags that currently contain each summary in the combined summary page.
-e, --embedded
Output an embeddable document, which excludes the header, the footer, and everything outside the body of the document. This option is useful for producing documents that can be inserted into an external template.
We shouldn't need a new parser, just the `-e` flag.
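A rough sketch of how the `-e` flag could be used to get an embeddable fragment for a single summary, assuming the `asciidoctor` CLI is installed and on the PATH. The helper names and the example path are hypothetical. Whether the output still needs wrapping in `<section>` depends on the site's build, which this thread does not show.

```python
import subprocess
from typing import List


def build_embed_command(adoc_path: str) -> List[str]:
    """Build the asciidoctor command line for an embeddable render.

    -e omits the header, footer, and everything outside the body;
    "-o -" asks asciidoctor to write the result to standard output.
    """
    return ["asciidoctor", "-e", "-o", "-", adoc_path]


def render_embedded(adoc_path: str) -> str:
    """Run asciidoctor and return the embeddable HTML as a string."""
    result = subprocess.run(
        build_embed_command(adoc_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if asciidoctor fails
    )
    return result.stdout
```

For example, `render_embedded("summaries/example.adoc")` (a made-up path) would return only the body markup, ready to be dropped into a response or a template.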
Also more importantly, how would we let someone who wanted to scrape our pages know that we have such a feature available?
Hmm maybe add a page, but most likely someone who wants to scrape the page will analyze network requests.
Or ask us about it.
But in general, there are very few reasons to scrape the summaries from the website.
If someone wants to run static analysis, individual files in the repo are better for that.
The only other reason might be to integrate with another website, but in that case an api would be easier.
from catchup.
Will make it a pain in the ass to scrape ;-;
from catchup.
a. Who is scraping the catchup page?
b. The idea is to expose an api, which people can use instead of scraping.
from catchup.
a. Who is scraping the catchup page?
We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.
b. The idea is to expose an api, which people can use instead of scraping.
That is an alternative, but it requires additional effort.
What is your plan for this API? How much detail will it include? Will it send over the entire file or will it provide options to get dates, durations and other specific parts of the content? (This API will also act as a blocker if we have to change the file formatting in the future, as we would have to parse and return several different file formats.)
Also more importantly, how would we let someone who wanted to scrape our pages know that we have such a feature available?
from catchup.
Makes sense. Thank you.
We should add a note somewhere for scrapers though, just to inform them about the API. (Maybe in the API response?)
We will also have to document the API.
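One possible way to put that note in the response itself, sketched in Python under the assumption that the endpoint returns an HTML fragment: prepend an HTML comment pointing at the documentation. The function name and the docs URL are placeholders, not existing parts of the project.

```python
def with_scraper_note(fragment: str, docs_url: str) -> str:
    """Prepend an HTML comment telling scrapers where the API is documented.

    Hypothetical helper; docs_url is whatever page ends up documenting the API.
    """
    # "--" is not allowed inside an HTML comment, so reject it up front.
    if "--" in docs_url:
        raise ValueError("docs_url must not contain '--'")
    note = f"<!-- Scraping? This endpoint is documented at {docs_url} -->"
    return note + "\n" + fragment
```

Browsers ignore the comment, so the infinite-scroll client is unaffected, while anyone inspecting the network response sees the pointer.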
from catchup.