feat: Infinite Scroll for `/summary` page (catchup, open issue)

ourtechcommunity commented on September 26, 2024

Make the /summary page initially load only the last 3 …

Comments (5)

KartikSoneji commented on September 26, 2024

> > a. Who is scraping the catchup page?
>
> We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.

There was no need to scrape the website; all the data was available in the repo.

> > b. The idea is to expose an api, which people can use instead of scraping.
>
> That is an alternative, but it requires additional effort. What is your plan for this API?

No, that is a side effect of implementing infinite scroll.
The endpoint that will be called to get the next set of summaries will be the same one that someone might use to scrape them.
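To make that concrete, here is a minimal sketch of how such an endpoint could page through summaries. It assumes the summaries are already rendered as HTML fragments and kept newest first; `get_summary_page`, its `offset`/`limit` parameters and the default page size of 3 (echoing the proposal to initially load only the last 3) are hypothetical, not catchup's actual code.

```python
from typing import List, Optional, Tuple

# Hypothetical default: the issue proposes loading only the latest 3 at first.
DEFAULT_PAGE_SIZE = 3


def get_summary_page(
    summaries: List[str], offset: int = 0, limit: int = DEFAULT_PAGE_SIZE
) -> Tuple[str, Optional[int]]:
    """Return one page of summary fragments and the offset of the next page.

    `summaries` is assumed to be ordered newest first, each item being the
    HTML of one summary. The second value is None when nothing is left,
    which tells the client to stop requesting more.
    """
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = summaries[offset : offset + limit]
    next_offset = offset + limit
    return "\n".join(page), (next_offset if next_offset < len(summaries) else None)
```

A client would request `offset=0` on page load and keep asking with the returned offset until it gets `None`. A scraper could walk the same offsets, which is why the scroll endpoint doubles as an API.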

> How much detail will it include? Will it send over the entire file or will it provide options to get dates, durations and other specific parts of the content?

Just the `<section>` tags that currently contain each summary in the combined summary page.

> `-e, --embedded`
>
> Output an embeddable document, which excludes the header, the footer, and everything outside the body of the document. This option is useful for producing documents that can be inserted into an external template.

We shouldn't need a new parser, just the `-e` flag.
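The quoted help text appears to come from the Asciidoctor CLI. Assuming the summaries are AsciiDoc files converted with `asciidoctor` 2.x, a minimal Python sketch of rendering one summary as an embeddable fragment might look like this. The helper names and the `<section>` wrapper are assumptions for illustration; only `-e` comes from the discussion, and `-o -` is Asciidoctor's way of writing the output to stdout.

```python
import subprocess
from pathlib import Path
from typing import List


def asciidoctor_embedded_cmd(source: Path) -> List[str]:
    """Build the command that converts one AsciiDoc file to an embeddable fragment.

    -e / --embedded drops the header, footer and everything outside the body;
    -o - writes the result to stdout instead of a file.
    """
    return ["asciidoctor", "-e", "-o", "-", str(source)]


def render_summary(source: Path) -> str:
    """Run Asciidoctor on one summary and wrap the fragment in a <section> tag.

    The <section> wrapper is an assumption mirroring how the combined
    summary page is described; requires asciidoctor on the PATH.
    """
    result = subprocess.run(
        asciidoctor_embedded_cmd(source),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if asciidoctor fails
    )
    return f"<section>\n{result.stdout}</section>"
```

Because the fragments come straight from the existing converter, the endpoint serves the same markup as the combined page, with no second parser to keep in sync with the file format.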

> Also more importantly, how would we let someone who wanted to scrape our pages know that we have such a feature available?

Hmm, maybe add a page, but most likely someone who wants to scrape the page will analyze network requests, or ask us about it.

But in general, there are very few reasons to scrape the summaries from the website.
If someone wants to run static analysis, individual files in the repo are better for that.
The only other reason might be to integrate with another website, but in that case an api would be easier.


sreekaransrinath commented on September 26, 2024

Will make it a pain in the ass to scrape ;-;


KartikSoneji commented on September 26, 2024

a. Who is scraping the catchup page?
b. The idea is to expose an api, which people can use instead of scraping.


HarshKapadia2 commented on September 26, 2024

> a. Who is scraping the catchup page?

We've already had one project that does it (https://github.com/mihikagaonkar/OTC-Dashboard), so let us not make any assumptions and keep things open for the future.

> b. The idea is to expose an api, which people can use instead of scraping.

That is an alternative, but it requires additional effort.
What is your plan for this API? How much detail will it include? Will it send over the entire file or will it provide options to get dates, durations and other specific parts of the content? (This API will also act as a blocker if we have to change any file formatting in the future, as we will have to parse and return several different file formats.)
Also, more importantly, how would we let someone who wanted to scrape our pages know that we have such a feature available?


HarshKapadia2 commented on September 26, 2024

Makes sense. Thank you.

We should add a note somewhere for scrapers though, just to inform them about the API. (Maybe in the API response?)
We will also have to document the API.
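One low-effort way to leave such a note in the API response, sketched under the assumption that the endpoint returns HTML fragments: prepend an HTML comment, which stays invisible when the fragment is inserted into the page but shows up for anyone inspecting the network response. The notice wording and the `with_api_notice` name are hypothetical; the note should point at wherever the API documentation ends up.

```python
# Hypothetical notice text; catchup has not decided on wording or a docs location.
API_NOTICE = (
    "<!-- This fragment is served by the catchup summary endpoint. "
    "Please use it instead of scraping /summary; see the API documentation. -->"
)


def with_api_notice(fragment_html: str) -> str:
    """Prepend the notice as an HTML comment.

    Browsers do not render comments when the fragment is inserted into the
    page, but anyone reading the raw response in their network tools sees it.
    """
    return f"{API_NOTICE}\n{fragment_html}"
```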
