tokitouq / manga-api



A Python-based web scraping API built with FastAPI that provides easy access to manga content.

Home Page: https://mangareader-api.vercel.app

License: MIT License

Python 87.68% Makefile 2.89% Shell 5.41% HTML 4.02%

fastapi manga mangareader python python-web-scraper scraping web-scraping manga-api anime

manga-api's Introduction

mangareader-api

A Python-based web scraping tool built with FastAPI that provides easy access to manga content from the mangareader.to website. The API lets users retrieve up-to-date information, enabling developers to create their own manga-related applications and services.

API URL: https://mangareader-api.vercel.app/

Setup Project

Important

The Makefile rules and Bash scripts are based on Poetry,
so please make sure you've already installed it.

Bash Script

bash setup.sh

Makefile

make install
make dev # run development server

Poetry

poetry install
poetry run dev # run development server

You can also create a Python virtual environment, activate it, install all the dependencies from the requirements.txt file with pip install -r requirements.txt, and finally run python3 main.py

The server will then be running on 0.0.0.0:8000.

Contribution

Contributions are welcome!
If you encounter issues or want to add new features, feel free to open pull requests.
Give a ⭐️ if you find this project interesting and useful!

Disclaimer

This project is developed for educational purposes and convenience in accessing manga content.
Respect the website's terms of use and consider the legality of web scraping in your jurisdiction.


manga-api's Issues

docs: add description to endpoints

Helpful links:

https://fastapi.tiangolo.com/tutorial/metadata/#create-metadata-for-tags
https://fastapi.tiangolo.com/tutorial/metadata/#use-your-tags
https://fastapi.tiangolo.com/tutorial/path-operation-configuration/#description-from-docstring
https://fastapi.tiangolo.com/tutorial/path-operation-configuration/#summary-and-description

feat: migrate to `Django` and `drf`

Currently this project uses FastAPI, but I'm now thinking of rewriting it in Django and django-rest-framework.
Since I use this kind of scraping logic for mangacore (a project for streaming manga), it would be better to try this in Django and DRF.

I'd like to hear your opinions.

feat: add `pre-commit`

Merge code with coreproject

The scraper should live in shinobi.

The rest of the logic should be ported to django-rest-framework.

fix: Typo in endpoint details

https://github.com/tokitou-san/MangaAPI/blob/4a0660358d2e40053921458c2588e0f0af82fdfd/app/api/endpoints.py#L139-L143

Here, the summary and description of the genre endpoint were used instead of those for type (a mistake).
fix: Change summary and description for type endpoint

feat: add endpoint `/type/`

Add a new endpoint /type/ that returns a list of manga of a specific type.

refactor: use `self.parser` instead of `utilities.get-*`

https://github.com/tokitou-san/MangaAPI/blob/e09ad4d36c7001a95c9ece318cb5d3c23bc6fa97/app/api/utils.py#L1-L15

Use self.parser.* instead of the functions from utilities.
e.g. instead of this code:
https://github.com/tokitou-san/MangaAPI/blob/cccbcb6437861ebaf8aea7ef3d52bd00881cc3be/app/api/scrapers/popular.py#L16

Do something like this (it might differ for other cases):

slug = self.parser.css_first("div#manga-trending div.swiper-slide a.link-mask").attributes["href"]

Lmk if you want more info or anything :)

[`Ideas`] : change API architecture

Hi there, hope you like this API.

What do you think about making this a single provider? I mean, scraping data from just one site?
Currently you can get the same data from two different providers.

I think if we move to a single provider, we can provide more info, such as "latest", "trending", "sorting", etc., as extra queries.
Duplicate of #175, never mind.

Please share your thoughts about this.
Have a nice day.... ❤️

refactor: add better `404` response

https://github.com/tokitou-san/MangaAPI/blob/4a0660358d2e40053921458c2588e0f0af82fdfd/app/api/scrapers/base_manga.py#L30

Currently it returns empty fields if the manga is not found or any error occurs.
Instead, I'd like it to return a proper 404 response with a message.

[`Bug`]: exclude Index route from OpenAPI

https://github.com/tokitouq/manga-api/blob/aa6570d8f9ea777103c8fbc45548016f1dd7d34f/server/api.py#L25-L27

Exclude this route from OpenAPI
use this: https://fastapi.tiangolo.com/advanced/path-operation-advanced-configuration/#exclude-from-openapi

feat: add `response_model` for each endpoint

Create a Pydantic model and follow this:
https://fastapi.tiangolo.com/tutorial/response-model/

[`Backend`]: refactor codebase for supporting more providers

Unfortunately, mangareader seems to be down (the site shows it's for sale).
Currently manga-api relies only on mangareader for data, but since it went down, we have to change providers.

I think it's better to change the whole codebase to support different providers; we need to scale (the current architecture isn't scalable).
We will probably change the folder structure as well, and the URL could look like: /api/<provider>/<query>/

List of manga providers we can use: https://github.com/anshumanv/awesome-anime-sources#manga
Note: we might need to use Selenium instead, but it's better if we can do this with selectolax itself.
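A rough sketch of a provider-agnostic layout, assuming a hypothetical `MangaProvider` interface and a registry keyed by provider name, so that a route such as `/api/<provider>/<query>/` only has to look up the provider and delegate. Here `<query>` is treated as a search string, which is an assumption; `DummyProvider` is a placeholder, not one of the sites from the linked list.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class MangaProvider(ABC):
    """Hypothetical interface every provider scraper would implement."""

    name: str

    @abstractmethod
    def search(self, query: str) -> List[dict]:
        ...


PROVIDERS: Dict[str, Type[MangaProvider]] = {}


def register(cls: Type[MangaProvider]) -> Type[MangaProvider]:
    # Class decorator: makes the provider reachable by its name.
    PROVIDERS[cls.name] = cls
    return cls


@register
class DummyProvider(MangaProvider):
    name = "dummy"

    def search(self, query: str) -> List[dict]:
        return [{"title": query.title(), "provider": self.name}]


def handle(provider: str, query: str) -> List[dict]:
    # What a route like /api/{provider}/{query}/ could delegate to.
    try:
        provider_cls = PROVIDERS[provider]
    except KeyError:
        raise LookupError(f"Unknown provider: {provider}") from None
    return provider_cls().search(query)
```

Adding a provider then means writing one subclass and decorating it with `@register`; the routing layer does not change.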

refactor: move `__get_parser` to `helpers`

https://github.com/tokitou-san/MangaAPI/blob/e09ad4d36c7001a95c9ece318cb5d3c23bc6fa97/app/api/scrapers/base_manga.py#L13-L15

feat: add `Search` endpoint

Add a new endpoint that searches manga and returns a list.
Example endpoint: /v1/search/?keyword=one+piece
Scraping URL: https://mangareader.to/search?keyword=one+piece
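A minimal sketch of turning the endpoint's `keyword` into the scraping URL above with the standard library. `urlencode` quotes with `quote_plus` by default, so spaces become `+` as in the example. The optional `page` parameter is an assumption about the site, not something the issue specifies.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://mangareader.to/search"


def build_search_url(keyword: str, page: Optional[int] = None) -> str:
    params: dict = {"keyword": keyword}
    if page is not None:
        params["page"] = page  # assumed parameter name for pagination
    # urlencode uses quote_plus, so "one piece" becomes "one+piece".
    return f"{SEARCH_URL}?{urlencode(params)}"
```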

feat: add `markdown`

Add Markdown rendering for the index page: it is currently rendered as raw Markdown, but we need HTML.
Convert it with markdown and pass the result to the template,
then render it with the |safe filter.

feat: new `Completed` endpoint

A new `completed` endpoint that returns a list of manga that have finished publishing.
scraping link: https://mangareader.to/completed

features:

Offset and Limit queries
Page query
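A sketch of the two features as plain helpers, assuming the site paginates `/completed` with a `?page=N` parameter (not confirmed here) and that `offset`/`limit` are applied to the scraped page afterwards. Both helper names are hypothetical.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")

COMPLETED_URL = "https://mangareader.to/completed"


def completed_url(page: int = 1) -> str:
    # Assumes the site paginates with a ?page=N query parameter.
    if page < 1:
        raise ValueError("page must be >= 1")
    return COMPLETED_URL if page == 1 else f"{COMPLETED_URL}?page={page}"


def apply_offset_limit(items: List[T], offset: int = 0, limit: Optional[int] = None) -> List[T]:
    # offset skips items from the scraped page; limit caps how many are returned.
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("offset and limit must be non-negative")
    end = None if limit is None else offset + limit
    return items[offset:end]
```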

feat: new endpoint `/genre/`

New endpoint to get a list of manga for the genre passed as a parameter, e.g. /genre/action returns a list.
Use BaseSearchScraper for this, since the HTML structure is the same.

Also add other queries like page, offset, limit, and sort.

[`Feature`] : Revert to single provider

Our current workflow uses two providers for getting manga content.
Because of this, we can only provide limited information for each endpoint (the info from both providers needs to stay in sync).
I think it's better if we can provide more information,
and for this goal, we need to rely on a single provider.

AniList seems promising and has server-side rendering,
which means we can use selectolax to get info (faster).