Web Scraper for Fashion Marketplace Sites

A Python tool for scraping multiple shopping websites such as Grailed, Depop, GOAT, and StockX (and possibly more).

Table of Contents

  • Introduction
  • Project Plan
  • Installation
  • Usage
  • License

Introduction

This project aims to provide a convenient interface for scraping product listings and related data from various online shopping platforms.

This originated from my AP Computer Science Principles project, which was just a Grailed scraper; I wanted to expand it to more sites, so I created this. The original is here.

Project Plan

To-Do List / Possible Features:

  • Implement logging

  • Implement Depop data extraction and scraping.

  • Figure out how to handle the respective scrapers.

  • Refactor directory structure to the type found here

  • Figure out a way to visualize the data (HTML)

  • Feature to specify how many items we want to scrape (command line and config file)

  • Implement StockX data extraction and scraping.

  • Instead of scraping StockX for market data, use their API (maybe use Go for speed).

  • Options to filter the dataframe by a category

  • Process the output files and filter them, or maybe display them visually

  • Add headless mode and print progress updates to stdout

  • Keep Poetry and requirements.txt synchronized (see the export command after this list)
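
For the last item, Poetry can regenerate requirements.txt from its lock file; on recent Poetry versions this may require the poetry-plugin-export plugin:

# regenerate requirements.txt from the Poetry lock file
poetry export -f requirements.txt --output requirements.txt --without-hashes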

Installation

Install using Poetry (recommended):

# clone repository
git clone https://github.com/peppapig450/FashionCrawler

# switch to directory
cd FashionCrawler

# install dependencies
poetry install

Install using a virtual environment:

# clone repository
git clone https://github.com/peppapig450/FashionCrawler

# switch to directory
cd FashionCrawler

# setup and activate virtual environment
python3 -m venv venv && source venv/bin/activate

# install dependencies
pip install -r requirements.txt
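
With either install method, a quick sanity check is to print the CLI help (this assumes main.py is the entry point, as in the usage example below, and exposes a standard --help flag):

# verify the install by printing the CLI help
poetry run python main.py --help    # Poetry install
python main.py --help               # virtual environment install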

Usage

Below are the available options for running the scraper.

Options:

Site Selection:

  • By default, all supported sites are enabled, or the sites specified in the config.yaml file are used (a minimal example is sketched after this list).
  • --enable-site ENABLE_SITE: Enable specific site(s) by providing a comma-separated list of supported site names.
  • --disable-site DISABLE_SITE: Disable specific site(s) by providing a comma-separated list of supported site names.
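
The snippet below is only an illustrative sketch of what the site section of config.yaml might look like; the key names are assumptions, not the project's confirmed schema, so check the repository's config.yaml for the actual fields:

# hypothetical config.yaml layout (key names are assumptions)
sites:
  grailed: true
  depop: true
  goat: false
  stockx: false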

Search Options:

  • -s SEARCH, --search SEARCH: Specify a search query to scrape for.

Output Options:

  • If no output option is specified, the scraper prints the result as a table on the command line.
  • -j, --json: Output the result as JSON.
  • -c, --csv: Output the result as CSV.
  • -y, --yaml: Output the result as YAML.
  • -o OUTPUT, --output OUTPUT: Specify the output file name (without extension).
  • --output-dir OUTPUT_DIR: Specify the output directory.

Example Usage:

To enable only Grailed and Depop sites, search for "Nike Air Force", and output the result as JSON to a file named "output.json" in the "data" directory, the command would be:

poetry run python main.py --enable-site Grailed,Depop --search "Nike Air Force" -j -o output --output-dir data
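
If the project was installed with the virtual-environment method instead of Poetry, the same command should work without the poetry run prefix:

# same example, run inside the activated virtual environment
python main.py --enable-site Grailed,Depop --search "Nike Air Force" -j -o output --output-dir data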

License

Apache License 2.0


fashioncrawler's Issues

Setting up CI/CD Environment with Poetry

Description:

To improve the development process, I'm setting up a CI/CD environment integrated with Poetry. The pipeline will automate the build, test, and deployment processes, ensuring efficiency and consistency in the development workflow (a minimal workflow sketch follows the task list below). The following tasks need to be addressed:

  1. CI/CD Tool Selection: Research and select a CI/CD tool compatible with Poetry and suited to the project's requirements.
  2. Pipeline Configuration: Define the stages and steps of the CI/CD pipeline, including building, testing, and deploying the project with Poetry.
  3. Integration with GitHub: Integrate the CI/CD pipeline with the GitHub repository so builds are triggered automatically on code changes.
  4. Artifact Management: Establish a process for managing artifacts produced by the CI/CD pipeline, ensuring traceability and reproducibility.
  5. Monitoring and Alerts: Set up monitoring and alerting mechanisms to detect and respond to failures or issues in the CI/CD pipeline.

Tasks:

  • Research and evaluate CI/CD tools compatible with Poetry and suited to the project's requirements.

  • Configure the CI/CD pipeline to automate the build, test, and deployment processes using Poetry.

  • Integrate the CI/CD pipeline with the version control system to trigger builds on code changes.

  • Establish artifact management practices to store and track artifacts produced by the CI/CD pipeline.

  • Implement monitoring and alerting mechanisms to ensure the reliability and stability of the CI/CD pipeline.
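
As a starting point, a GitHub Actions workflow roughly like the following could run the test suite with Poetry. The file path, Python version, and use of pytest are assumptions; none of this is decided yet:

# .github/workflows/ci.yml -- minimal sketch; runner, Python version, and test command are assumptions
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pipx install poetry        # pipx is preinstalled on GitHub-hosted runners
      - run: poetry install
      - run: poetry run pytest          # assumes pytest is (or will be) the test runner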

HTML and PDF output

Add the ability to output the information scraped by the application in both HTML and PDF formats. This feature will enhance the usability of the application by allowing users to view and share scraped data in a more accessible and versatile manner.

Proposed Implementation:

  1. HTML Output:

    • Utilize Jinja2 templating engine to generate HTML output from the scraped data.
    • Design HTML templates to present the scraped information in a structured and visually appealing format. See image below
    • Ensure compatibility with modern web browsers and responsive design principles for optimal viewing experience across devices.
  2. PDF Output:

    • Convert the rendered HTML output to PDF format for offline viewing and sharing.
    • Investigate suitable libraries or tools for converting HTML to PDF, considering factors such as performance and customization options.
    • Implement a solution that integrates seamlessly with the application and produces high-quality PDF output (a minimal sketch follows this list).
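
As a rough illustration of the HTML side, here is a minimal Jinja2 rendering sketch; the template name and the fields on the scraped items are assumptions, not the project's actual code:

# minimal Jinja2 rendering sketch -- template name and item fields are assumptions
from jinja2 import Environment, FileSystemLoader

def render_report(items, template_dir="templates", template_name="report.html.j2"):
    # items is assumed to be a list of dicts, e.g. {"title": ..., "price": ..., "url": ...}
    env = Environment(loader=FileSystemLoader(template_dir), autoescape=True)
    template = env.get_template(template_name)
    return template.render(items=items)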

Use Cases:

  • Users can generate HTML reports containing the scraped data for easy viewing and sharing via web browsers.
  • Users can export the HTML reports to PDF format for offline access or distribution.

Requirements:

  • Ensure compatibility with common web browsers and PDF viewers.
  • Support customization options for the HTML templates, such as styling and layout configurations.
  • Implement error handling mechanisms to gracefully handle edge cases during HTML to PDF conversion.

Dependencies:

  • Investigate and select a suitable library or tool for HTML to PDF conversion (one candidate is sketched after this list).
  • Ensure compatibility with the existing data scraping functionality and data processing pipeline.
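
For example, WeasyPrint is one candidate library that can convert an HTML string to PDF; whether it is the right fit for the project is still to be evaluated:

# one possible HTML-to-PDF option (WeasyPrint), shown only as a candidate
from weasyprint import HTML

def html_to_pdf(html_string, pdf_path="report.pdf"):
    HTML(string=html_string).write_pdf(pdf_path)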
