Amazon Orders Scraper

Overview

This project aims to develop an automatic agent that scrapes Amazon orders, retrieving their details in a structured format. The initial focus is on Amazon Egypt due to the availability of existing orders for testing.

Strategy to Develop an Automated Data Scraping Agent for Amazon

Strategy Overview:

  1. Initial Manual Process:

    • Perform the entire data scraping process manually to understand each step thoroughly.
    • This includes reading library documentation, understanding and debugging code errors, reading HTML contents of the pages, and interacting with page elements.
  2. Incremental Automation:

    • Abstract and automate each step identified in the manual process.
    • Ensure the agent can handle common obstacles such as slow internet speeds, ads, and popups.
  3. Quality and Adaptability:

    • Develop the agent with qualities such as robust abstraction of manual steps and adaptability to handle unexpected scenarios.
    • Envision a "super agent" that can complete tasks with minimal input, such as "scrape Amazon orders using the following username and password."

Phased Implementation Plan:

Phase 1: Basic Functionality
  • Agent 1: Generates Selenium code to fetch product details based on detailed instructions.
  • Agent 2: Extracts data using Beautiful Soup code, also based on detailed instructions.
Phase 2: Enhanced Automation with Feedback
  • Agent 1: Executes Selenium code multiple times, incorporating feedback from the website to improve performance.
  • Agent 2: Explores HTML patterns to generate more efficient Beautiful Soup code.
Phase 3: Generalized Instructions
  • Agent 1 and Agent 2: Operate based on more general instructions, reducing the need for detailed guidance.
Phase 4: Unified Intelligent Agent
  • Combine the functionalities of the two agents into a single, highly autonomous agent that can perform the entire scraping process with minimal input.

Getting Started

Prerequisites

  • Python 3.x
  • Selenium
  • WebDriver for your preferred browser (e.g., ChromeDriver)

Installation

  1. Clone the repository:

    git clone https://github.com/bely66/amazon-orders-scraper.git
    cd amazon-orders-scraper
  2. Install the required packages:

    pip install -r requirements.txt
  3. Create a .env file with the following contents (the .env file keeps the API key and the Amazon credentials out of the source code):

    OPENAI_API_KEY='openai_api_key'
    # ask the user about their username and password
    amz_mail="amazon_email"
    amz_pass="amazon_password"
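
As a rough illustration of how a script could read these values, here is a minimal sketch that parses KEY=value lines. The `parse_env_text` helper is hypothetical and uses only the standard library; the repository itself may rely on a package such as python-dotenv instead.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

A caller would typically pass in the file contents, for example `parse_env_text(open(".env").read())`, and then look up `amz_mail` and `amz_pass` in the returned dictionary.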

Usage

Stage One: Manual Approach

  1. Run the manual scraping script to understand the process:

    python login_amazon.py
  2. Follow the instructions to navigate and extract order details manually.

Stage Two: Automated Agent

The agent doesn't have access to the passwords themselves, only to the names of the variables that contain them. This way the credentials are never sent to OpenAI's servers.
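
One way this separation could work is sketched below. It is an assumption about the mechanism, not the project's actual code: the function names `build_prompt` and `run_generated_code` are hypothetical. The prompt mentions only variable names, and the real values are injected locally when the generated code runs.

```python
def build_prompt(task: str, secret_names: list[str]) -> str:
    """Describe the task to the model using only the NAMES of the secret variables."""
    names = ", ".join(secret_names)
    return (
        f"{task}\n"
        f"The following Python variables are already defined: {names}. "
        "Refer to them by name and never print their values."
    )


def run_generated_code(code: str, secrets: dict[str, str]) -> dict:
    """Execute model-written code locally with the real values injected."""
    namespace = dict(secrets)
    exec(code, namespace)  # only run generated code you trust or have sandboxed
    return namespace
```

This only keeps the credentials private as long as whatever output is fed back to the model does not contain the values themselves.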

Run the automated agent script:

```bash
python main.py
```

Todos

Todos for Phase 1

  • Add a way to scroll through all the products using the years filter.
  • Extract the properties of all divs and send them to the Structurer agent so it can generate suitable Beautiful Soup code.
  • Add a retry mechanism that sends the error trace back to the agent so it can correct code issues.
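
The retry item above could be sketched roughly as follows. This is an illustration under assumptions: `run_with_retries` and the `generate_code` callback are hypothetical names, and the callback stands in for whatever call asks the model for new code.

```python
import traceback
from typing import Callable, Optional


def run_with_retries(
    generate_code: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> dict:
    """Run generated code, passing the traceback of each failure back to the generator."""
    error_trace: Optional[str] = None
    for _ in range(max_attempts):
        # The first call receives None; later calls receive the last traceback.
        code = generate_code(error_trace)
        namespace: dict = {}
        try:
            exec(code, namespace)  # only run generated code you trust or have sandboxed
            return namespace
        except Exception:
            error_trace = traceback.format_exc()
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{error_trace}")
```

Capping the number of attempts keeps a persistently failing generation from looping forever and from spending API calls without bound.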

Todos for Phase 2

  • Rewrite the prompt to be more generic, so that the agent responds to code outputs rather than only writing code.
  • Find a way to produce a brief overview of the elements on a page with Selenium or BeautifulSoup, so the agent has enough information to make an informed decision without exceeding the model's maximum input length, since raw HTML can be very long.
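
One possible shape for such a page overview is sketched below. To stay dependency-free it swaps in the standard library's `html.parser` rather than Selenium or BeautifulSoup, and the names `PageOutline` and `outline_html` are hypothetical. It emits one short line per element that has an id or class, capped at a fixed number of lines.

```python
from html.parser import HTMLParser


class PageOutline(HTMLParser):
    """Collect one short line per element that has an id or class attribute."""

    def __init__(self, max_lines: int = 50):
        super().__init__()
        self.max_lines = max_lines
        self.lines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if len(self.lines) >= self.max_lines:
            return  # stop collecting once the size budget is used up
        attributes = dict(attrs)
        element_id = attributes.get("id")
        classes = attributes.get("class")
        if element_id or classes:
            line = tag
            if element_id:
                line += f"#{element_id}"
            if classes:
                line += "." + ".".join(classes.split())
            self.lines.append(line)


def outline_html(html: str, max_lines: int = 50) -> str:
    """Return a compact, CSS-selector-like outline of the page."""
    parser = PageOutline(max_lines)
    parser.feed(html)
    return "\n".join(parser.lines)
```

The outline drops text content and untagged markup entirely, which is what keeps it short; whether that leaves the agent enough context to pick the right elements would need to be checked on real order pages.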

Todos for Model Size

  • First reach the desired performance with a very capable model.
  • Use the logs and documentation collected from that model to train a smaller one (a good Python code model, fine-tuned on Selenium and BeautifulSoup).
  • The right model size needs experimentation, but a 13B model quantized to 4 bits and fine-tuned with LoRA looks like a good balance of accuracy, memory requirements, flexibility, and speed.
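
To put the memory side of that trade-off in numbers, here is a back-of-the-envelope sketch. It counts the model weights only and ignores LoRA adapters, activations, and any optimizer state, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


print(weight_memory_gb(13e9, 16))  # 26.0 (16-bit weights)
print(weight_memory_gb(13e9, 4))   # 6.5 (4-bit quantized weights)
```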

Ideas to improve automation

  • Incorporate memory so that the agent can learn from its past actions.
