Giter Site home page Giter Site logo

skyvern-ai / skyvern Goto Github PK

View Code? Open in Web Editor NEW
3.5K 26.0 247.0 107.78 MB

Automate browser-based workflows with LLMs and Computer Vision

Home Page: https://www.skyvern.com

License: GNU Affero General Public License v3.0

Python 70.99% Mako 0.09% Shell 1.65% Jinja 1.07% JavaScript 5.69% HTML 0.05% TypeScript 19.78% CSS 0.55% Dockerfile 0.14%
api automation browser computer gpt llm playwright python rpa vision

skyvern's Introduction

๐Ÿ‰ Automate Browser-based workflows using LLMs and Computer Vision ๐Ÿ‰

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions.

Traditional approaches to browser automations required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed.

Instead of only relying on code-defined XPath interactions, Skyvern adds computer vision and LLMs to the mix to parse items in the viewport in real-time, create a plan for interaction and interact with them.

This approach gives us a few advantages:

  1. Skyvern can operate on websites itโ€™s never seen before, as itโ€™s able to map visual elements to actions necessary to complete a workflow, without any customized code
  2. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
  3. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
    1. If you wanted to get an auto insurance quote from Geico, the answer to a common question โ€œWere you eligible to drive at 18?โ€ could be inferred from the driver receiving their license at age 16
    2. If you were doing competitor analysis, itโ€™s understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)

Want to see examples of Skyvern in action? Jump to #real-world-examples-of-skyvern

How it works

Skyvern was inspired by the Task-Driven autonomous agent design popularized by BabyAGI and AutoGPT -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like Playwright.

Demo

skyvern_demo_video.mp4

Skyvern Cloud

We offer a managed cloud version of Skyvern that allows you to run Skyvern without having to manage the infrastructure. It allows to you run multiple Skyvern instances in parallel to automate your workflows at scale. In addition, Skyvern cloud comes bundled with anti-bot detection mechanisms, proxy network, and CAPTCHA solving to allow you to complete more complicated workflows.

Skyvern Cloud is currently in private beta. If you're interested in using Skyvern Cloud, please reach out to us via email

Quickstart

This quickstart guide will walk you through getting Skyvern up and running on your local machine.

Prerequisites

โš ๏ธ โš ๏ธ MAKE SURE YOU ARE USING PYTHON 3.11 โš ๏ธ โš ๏ธ

Before you begin, make sure you have the following installed:

Note: Our setup script does these two for you, but they are here for reference.

  • Python 3.11
    • poetry env use 3.11
  • PostgreSQL 14 (if you're on a Mac, setup script will install it for you if you have homebrew installed)
    • brew install postgresql

Setup

  1. Clone the repository and navigate to the root directory
  2. Open Docker Desktop (Works for Windows, macOS, and Linux) or run Docker Daemon
  3. Run the setup script to install the necessary dependencies and setup your environment
    ./setup.sh
  4. Start the server
    ./run_skyvern.sh
  5. You can start sending requests to the server, but we built a simple UI to help you get started. To start the UI, run the following command:
    ./run_ui.sh
  6. Navigate to http://localhost:8501 in your browser to start using the UI

Docker Compose setup

  1. Fill in the LLM provider key on the docker-compose.yml
  2. Run the following command:
     docker compose up -d
  3. Navigate to http://localhost:8501 in your browser to start using the UI

Additional Setup for Contributors

If you're looking to contribute to Skyvern, you'll need to install the pre-commit hooks to ensure code quality and consistency. You can do this by running the following command:

pre-commit install

Running your first automation

Executing tasks (UI)

Once you have the UI running, you can start an automation by filling out the fields shown in the UI and clicking "Execute"

Executing tasks (cURL)

curl -X POST -H 'Content-Type: application/json' -H 'x-api-key: {Your local API key}' -d '{
    "url": "https://www.geico.com",
    "webhook_callback_url": "",
    "navigation_goal": "Navigate through the website until you generate an auto insurance quote. Do not generate a home insurance quote. If this page contains an auto insurance quote, consider the goal achieved",
    "data_extraction_goal": "Extract all quote information in JSON format including the premium amount, the timeframe for the quote.",
    "navigation_payload": "{Your data here}",
    "proxy_location": "NONE"
}' http://0.0.0.0:8000/api/v1/tasks

Debugging Skyvern

Skyvern's visualizer allows you to debug every interaction Skyvern takes on the web.

demo_visualizer.mp4

Tasks, Steps, and Actions

Each API request you sent to Skyvern is called a "task". Each task is made up of "steps" which are the individual actions Skyvern takes to complete the task. Each step is made up of "actions" which are the individual interactions Skyvern takes on a particular website.

Every time you call the API, you will be given a task_id you can use to find a task within the visualizer. Within each task, you'll be able to interact with each step, and see the specific actions Skyvern took to complete the task.

In the screenshot below, we're navigating to finditparts.com and searching for a truck part. You'll see each action it took listed there, alongside the reasoning behind each action.

In addition to the actions suggested by the LLM in text form, Skyvern's visualizer also shows the state of the screen at the time of the action, with a 1:1 action to screenshot mapping. This allows you to see exactly what Skyvern saw when it made a decision, and debug any issues that may have arisen.

Real-world examples of Skyvern

We love to see how Skyvern is being used in the wild. Here are some examples of how Skyvern is being used to automate workflows in the real world. Please open PRs to add your own examples!

You'll need to have Skyvern running locally if you want to try these examples out. Please run the following command after going through the quickstart guide:

./run_skyvern.sh

Automate materials procurement for a manufacturing company

๐Ÿ’ก See it in action

./run_ui.sh finditparts

Navigating to government websites to register accounts or fill out forms

๐Ÿ’ก See it in action

./run_ui.sh california_edd 

Retrieving insurance quotes from insurance providers in any language

๐Ÿ’ก See it in action

./run_ui.sh bci_seguros

๐Ÿ’ก See it in action

./run_ui.sh geico

Frequently Asked Questions (FAQs)

What gets us excited about Skyvern?

Our focus is bringing stability to browser-based workflows. We leverage LLMs to create an AI Agent capable of interacting with websites like you or I would โ€” all via a simple API call.

Feature Roadmap

This is our planned roadmap for the next few months. If you have any suggestions or would like to see a feature added, please don't hesitate to reach out to us via email or discord.

  • Open Source - Open Source Skyvern's core codebase
  • [BETA] Workflow support - Allow support to chain multiple Skyvern calls together
  • Improved context - Improve Skyvern's ability to understand content around interactable elements by introducing feeding relevant label context through the text prompt
  • Cost Savings - Improve Skyvern's stability and reduce the cost of running Skyvern by optimizing the context tree passed into Skyvern
  • Self-serve UI - Deprecate the Streamlit UI in favour of a React-based UI component that allows users to kick off new jobs in Skyvern
  • Prompt Caching - Introduce a caching layer to the LLM calls to dramatically reduce the cost of running Skyvern (memorize past actions and repeat them!)
  • Chrome Viewport streaming - Introduce a way to live-stream the Chrome viewport to the user's browser (as a part of the self-serve UI)
  • Past Runs UI - Deprecate the Streamlit UI in favour of a React-based UI that allows you to visualize past runs and their results
  • Integrate LLM Observability tools - Integrate LLM Observability tools to allow back-testing prompt changes with specific data sets + visualize the performance of Skyvern over time
  • Integrate public datasets - Integrate Skyvern with public benchmark tests to track the quality our models over time
  • Workflow UI Builder - Introduce a UI to allow users to build and analyze workflows visually
  • Langchain Integration - Create langchain integration in langchain_community to use Skyvern as a "tool".

Contributing

We welcome PRs and suggestions! Don't hesitate to open a PR/issue or to reach out to us via email or discord. Please have a look at our contribution guide and "Help Wanted" issues to get started!

Telemetry

By Default, Skyvern collects basic usage statistics to help us understand how Skyvern is being used. If you would like to opt-out of telemetry, please set the SKYVERN_TELEMETRY environment variable to false.

License

Skyvern's open source repository is supported via a managed cloud. All of the core logic powering Skyvern is available in this open source repository licensed under the AGPL-3.0 License, with the exception of anti-bot measures available in our managed cloud offering.

If you have any questions or concerns around licensing, please contact us and we would be happy to help.

Star History

Star History Chart

skyvern's People

Contributors

eltociear avatar lawyzheng avatar martincarapia avatar msalihaltun avatar suchintan avatar webermatias avatar wintonzheng avatar ykeremy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

skyvern's Issues

Error logs don't make sense if LLM_KEY is an empty string

When the LLM_KEY is an empty string, error logs don't make sense.

Example:

class InvalidLLMConfigError(BaseLLMError):
    def __init__(self, llm_key: str) -> None:
        super().__init__(f"LLM config with key {llm_key} is not a valid LLMConfig")

Example output:
skyvern.forge.sdk.api.llm.exceptions.InvalidLLMConfigError: LLM config key with key is not a valid LLMConfig

Possible solutions:

  • Instead of LLM config key with key, it could log LLM_KEY= is not a valid LLMConfig
  • Default LLM_KEY can be FILLMEIN
  • We can handle the empty string/None cases while logging: instead of key {llm_key} is, key {llm_key if llm_key else 'EMPTY_VALUE'} is

Add tutorial for utilizing skyvern for automated UI testing

I'm not sure if this is an intended use case, but this looks like something that could seriously help tremendously with automated UI testing. Would you be able to add something to the ReadMe that explains how one might integrate this into a CI/CD pipeline for that purpose?

Can't determine where or why the timeout error occurred.

How can I set the wait_until option for a specific URL format be domcontentloaded instead of commit, load, or networkidle?

โžœ  skyvern git:(main) ./run_skyvern.sh
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Installing dependencies from lock file

No dependencies to install or update

Installing the current project: skyvern (0.1.0)
Alembic mode:  online
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Alembic mode:  online
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
No new upgrade operations detected.
2024-04-01T22:28:45.277841Z [info     ] Agent server starting.         host=0.0.0.0 port=8000
INFO:     Will watch for changes in these directories: ['/Users/user/GitHub/free-font-downloader/skyvern']
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [11003] using WatchFiles
2024-04-02 07:28:49 [info     ] Registering LLM config         llm_key=OPENAI_GPT4_TURBO
2024-04-02 07:28:49 [info     ] Registering LLM config         llm_key=OPENAI_GPT4V
2024-04-01T22:28:49.912753Z [info     ] Initializing ForgeAgent        browser_action_timeout_ms=5000 browser_type=chromium-headful debug_mode=False env=local execute_all_steps=True long_running_task_warning_ratio=0.95 max_scraping_retries=0 max_steps_per_run=50 video_path=./videos
2024-04-01T22:28:50.016098Z [info     ] Starting the skyvern scheduler.
2024-04-01T22:46:22.247142Z [info     ] Created new task               data_goal=Extract the actual URL used for downloading the file and the name of the file being downloaded. Ensure the data is provided in JSON format, including both the direct download link and the file name. nav_goal=Navigate through the website to first locate the name of the font and the publisher who created and distributed it. After identifying these details, search for a yellow download page button and click it to open the page where the font can be downloaded. In the newly opened page, find and click on the button or buttons to download the font. Some pages may contain multiple download buttons; ensure all are clicked to achieve the goal. proxy_location=NONE task_id=tsk_241928560382524842 title=None url=https://noonnu.cc/font_page/1339
2024-04-01T22:46:22.248015Z [info     ] Executing task using background task executor task_id=tsk_241928560382524842
2024-04-01T22:46:22.340120Z [info     ] Creating browser state for task task_id=tsk_241928560382524842
2024-04-01T22:46:31.162207Z [info     ] Creating a new page           
2024-04-01T22:46:31.858802Z [info     ] A new page is created         
2024-04-01T22:46:31.858918Z [info     ] Navigating page to https://noonnu.cc/font_page/1339 and waiting for 3 seconds
/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/structlog/_base.py:167: UserWarning: Remove `format_exc_info` from your processor chain if you want pretty exceptions.
  event_dict = proc(self._logger, method_name, event_dict)
2024-04-01T22:47:04.922176Z [error    ] Error while navigating to url: Timeout 30000ms exceeded.
Traceback (most recent call last):
  File "/Users/user/GitHub/free-font-downloader/skyvern/skyvern/webeye/browser_factory.py", line 176, in check_and_fix_state
    await self.page.goto(url)
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 9275, in goto
    await self._impl_obj.goto(
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_page.py", line 484, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_frame.py", line 149, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 63, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 495, in wrap_api_call
    return await cb()
           ^^^^^^^^^^
  File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 101, in inner_send
    result = next(iter(done)).result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.

Refactor the setup

If I update the LLM provider now, my whole env will be re-setup again, including poetry install, database install, playwright install...

As the PR #102 said, I simply split the functions apart. Sometimes I just want to setup some parts of the project, such as env, or database(actually my db is setup in the remote server, but I can't just execute alembic upgrade head and create secrets.toml alone).

For the long term, the setup still should be refactored by Python scripts or CLI tools to deal with more complicated commands like docker build for image building, clean for local cache cleaning, pytest for testing, uninstall for project uninstalling...

UserWarning: Field "model_max_budget" has conflict with protected namespace "model_".

Its not clear whether this installation error is fatal or can be ignored. This is after running ./setup.sh once...

Installing the current project: skyvern (0.1.0)
Installing postgresql using brew
/tmp:5432 - no response
PostgreSQL is already running in a Docker container.
Database user exists.
Database exists.
Installing playwright dependencies...
Running Alembic upgrade...
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Creating organization and API token...
/Users/steve/Library/Caches/pypoetry/virtualenvs/skyvern-HWQwphh0-py3.11/lib/python3.11/site-packages/pydantic/_internal/fields.py:151: UserWarning: Field "model_max_budget" has conflict with protected namespace "model".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
Existing secrets.toml file backed up as secrets.backup.toml
.streamlit/secrets.toml file updated with organization details.
error uploading: HTTPSConnectionPool(host='us-api.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10f7b6a50>: Failed to establish a new connection: [Errno 61] Connection refused'))
Setup completed successfully.

I still seem to be able to run the scheduler after this.

Unable to interact with popup modals on costcotravel.com

https://www.costcotravel.com/

Feedback from here: https://news.ycombinator.com/reply?id=39707620&goto=item%3Fid%3D39706004%2339707620

I played with the Geico example, and it seems to do a good job on the happy path there. But I tried another one where it struggled... I want to get me car rental prices fromย https://www.costcotravel.com/. I gave it airport + time of pickup and dropoff, but it struggled to hit the "rental car" tab. It got caught up on hitting the Rental Car button at the top, which brings up a popup that it doesn't seem to read.When I put inย https://www.costcotravel.com/Rental-Cars, it entered JFK into the pickup location, but then failed to click the popup.

Postgres in Docker not getting connected

I am trying to run the skyvern after cloning from the main branch in my local windows system. I have setup all the required recommendations, as directed in readme.md in documentation using python 3.11.

The mongodb docker got setup properly as I could see the database with tables and user are properly setup, But I couldn't see any data in the database, which are supposed to be added.
To further analyze after running the command "poetry run python scripts/create_organization.py Skyvern-Open-Source" I am getting errors, I am attaching the following screenshots for the further reference in this regard so that I can get & fix the root cause run skyvern tool locally from my system,
top-part
bottom-part

Evaluate: Run skyvern on mobile layouts instead of desktop layouts to reduce the number of clickable elements

Today: browser actions are mostly "Desktop-based", but i believe that "Mobile-based" pages are usually lighter and with less elements, which can result in faster and easier understanding of the content. WDYT?

Hypothesis: this will make running Skyvern a bit cheaper by reducing the set of possible actions on a page

Counter hypothesis: some mobile pages tend to have less information than desktop pages, leading to more steps to complete a worfklow (which tends to be more expensive overall)

Worth testing out!

"--disable-blink-features=AutomationControlled" Really?

I can understand wanting to use an LLM to browse the web.

Purposely obfuscating the fact that you're an automation tool is gross and supports people using more intrusive DRM to prevent bots from accessing their sites - at the expense of real visitors.

By all means this is automation software and should respect anti-bot protections.

For context:

"--disable-blink-features=AutomationControlled" is a command line argument that prevents Chromium from indicating that it's currently being controlled by automation software. This is a typical method used by data scrapers to obfuscate their program, bypassing anti-scraping/anti-bot protection.

For the good health of the WWW this service should be more respectful. Otherwise more intrusive methods will be introduced/implemented such as the intrusive Web Environment Integrity

Add quickstart instructions for Linux and Windows platforms

The current quickstart instructions and the setup.sh script is written for and tested on MacOS.

There is nothing that prevents Skyvern from working successfully on other platforms.

It'd be great if we could add instructions for other platforms.

Geico Test not passing step 1

Hello team,

I have followed the installation instructions and have both

./run_skyvern.sh as well as ./run_ui.sh executing.

However, When trying to run ANY of the tests, It never gets passed step 1:

image

As you can see, it only creates the first step and then nothing happens, Can you please assist in debugging this issue?

dynamic choice of browser context

Although chromium-headless and chromium-heaful are registered into BrowserContextFactory,
we can only choose one for all tasks(workflows) because we create the type by reading the config in env.

browser_type = SettingsManager.get_settings().BROWSER_TYPE

Is it necessary to offer a dynamic option? maybe at the task(workflow) level? Like task(workflow) A could choose headless and task(workflow) B could choose headful?
Sure, the BROWSER_TYPE in the env config could be the default choice if not browser type is specified by the task(workflow).

Adding Gemini support

LiteLLM docs for Gemini: https://litellm.vercel.app/docs/providers/gemini

Required steps for integrating a new LLM model:

  1. Define the required variables for the new models

#####################
# LLM Configuration #
#####################
# ACTIVE LLM PROVIDER
LLM_KEY: str = "OPENAI_GPT4V"
# COMMON
LLM_CONFIG_MAX_TOKENS: int = 4096
LLM_CONFIG_TEMPERATURE: float = 0
# LLM PROVIDER SPECIFIC
ENABLE_OPENAI: bool = True
ENABLE_ANTHROPIC: bool = False
ENABLE_AZURE: bool = False
# OPENAI
OPENAI_API_KEY: str | None = None
# ANTHROPIC
ANTHROPIC_API_KEY: str | None = None
# AZURE
AZURE_DEPLOYMENT: str | None = None
AZURE_API_KEY: str | None = None
AZURE_API_BASE: str | None = None
AZURE_API_VERSION: str | None = None

  1. Implement the configuration for the new models and register them

if SettingsManager.get_settings().ENABLE_OPENAI:
LLMConfigRegistry.register_config("OPENAI_GPT4_TURBO", LLMConfig("gpt-4-turbo-preview", ["OPENAI_API_KEY"], False))
LLMConfigRegistry.register_config("OPENAI_GPT4V", LLMConfig("gpt-4-vision-preview", ["OPENAI_API_KEY"], True))

  1. Update Skyvern setup script with the LLM configuration options

setup_llm_providers() {

Budget Management

What would be nice is a simple feature where you can input the maximum number of calls / tokens to use on the entire call. Or even better, do some math and put in a dollar cap. i.e., go fill out the Geico forms for me and don't spend more than $1.00 doing it.

Add uninstall script

There are a lot of dependencies that at install or upgraded. we need a better story on how to undo that changes by this project

How to use local vision model to replace gpt-4 turbo?

I am interested in this project, I tried a lot and find this work very well. But this seems have to use a lot token of gpt, because of screenshot processing. I tried to replace gpt by local other vision model, but not find where should I modify? where is gpt vision used in the source code?

Pass skyvern a reference to a file?

I'm playing around with Skyvern to evaluate it's potential impact on automating general purpose job applications, and hopefully leverage it. Here's what I tried:

URL: https://www.tesla.com/careers/search/job/security-intelligence-analyst--220453
Navigation Goal: Visit the Tesla's job application page for Security Intelligence Analyst, and create and apply as a convincing application profile for a 5-year junior security engineer named John Bogdanovic.

Data Extraction Goal: [Empty]
Navigation Payload JSON: [Empty]
Extracted Information Schema: [Empty]

Result: task fails after repeated attempts at filling out the page. Let me be a little more specific:

  1. Skyvern does an AMAZING job first round, filling out all of the text fields & items that are instantly fillable
    2-5: Skyvern attempts to fill out dropdowns, but isn't able to. It attempts to self correct through a retry loop, but each time, ends up with the following:
2024-03-28T15:51:29.376578Z [info     ] Updating step in db            diff={'status': {'old': <StepStatus.running: 'running'>, 'new': <StepStatus.failed: 'failed'>}, 'output': {'old': None, 'new': AgentStepOutput({'action_results': [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}], 'actions_and_results': [({'action_type': <ActionType.UPLOAD_FILE: 'upload_file'>, 'description': None, 'reasoning': "The user needs to upload a resume to proceed with the application. This action is required as indicated by the context message stating 'Resume; One file max size 10 MB (PDF, Doc, TXT)'.", 'element_id': 305, 'file_url': 'https://example.com/john_bogdanovic_resume.pdf', 'is_upload_file_tag': True}, [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}]), ({'action_type': <ActionType.CLICK: 'click'>, 'description': None, 'reasoning': "After uploading the resume, the next step is to proceed with the application by clicking the 'Next' button.", 'element_id': 307, 'file_url': None}, [])], 'errors': []})}} step_id=stp_240337201361129268 task_id=tsk_240336174863945298

Is there some way to refer skyvern to a local file?

Also, if any of you do get a chance, I'd love to see a demo of the prompts you'd use to tackle job application pages like Tesla's and get it to work. It would be a really effective use case (none of the SOTA open source agents known can currently fill out a job app page) that would help a lot!

Wrong host in secrets.toml

setup.sh will create the API host http://0.0.0.0:8000/api/v1 in secrets.toml

echo -e "[skyvern]\nconfigs = [\n {\"env\" = \"local\", \"host\" = \"http://0.0.0.0:8000/api/v1\", \"orgs\" = [{name=\"Skyvern\", cred=\"$api_token\"}]}\n]" > .streamlit/secrets.toml

This will make UI send the request to http://0.0.0.0:8000, and fail to get the response in some situations (maybe firewall policy or some permission problems)

We start API listening at 0.0.0.0, but we should access it through localhost, 127.0.0.1, or YOUR LAN IP rather than 0.0.0.0.

Prompt exceeding OpenAI's rate limit

I asked Skyvern to perform a search in google and do an exploration. One of the results was a YouTube page with many comments. In this step, it hit the rate limit several times, resulting in OpenAI marking the key as unavailable.

2024-03-10T15:25:57 064833_a_233651193154000422_screenshot_final
2024-03-10T15:25:54.087388_a_233651180269098532_llm_request.json
2024-03-10T15:25:54.039585_a_233651180269098518_llm_prompt.txt

The errors:

OpenAI rate limit exceeded, marking key as unavailable. error_code=rate_limit_exceeded error_message=Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

./setup.sh OperationalError: (psycopg.OperationalError) connection failed: FATAL: role "skyvern" does not exist

brew install poetry
brew install postgresql
gh repo clone Skyvern-AI/skyvern
cd skyvern/
poetry env use 3.11
./setup.sh 

OperationalError: (psycopg.OperationalError) connection failed: FATAL:  role "skyvern" does not exist

/Users/josh/Library/Caches/pypoetry/virtualenvs/skyvern-Lm4w_20w-py3.11/lib/python3.11/site-pac โ”‚
โ”‚ kages/psycopg/connection.py:748 in connect                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    745 โ”‚   โ”‚                                                                                     โ”‚
โ”‚    746 โ”‚   โ”‚   if not rv:                                                                        โ”‚
โ”‚    747 โ”‚   โ”‚   โ”‚   assert last_ex                                                                โ”‚
โ”‚ โฑ  748 โ”‚   โ”‚   โ”‚   raise last_ex.with_traceback(None)                                            โ”‚
โ”‚    749 โ”‚   โ”‚                                                                                     โ”‚
โ”‚    750 โ”‚   โ”‚   rv._autocommit = bool(autocommit)                                                 โ”‚
โ”‚    751 โ”‚   โ”‚   if row_factory:                                                                   โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ locals โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ โ”‚
โ”‚ โ”‚           attempt = {                                                                        โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   'host': 'localhost',                                                 โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   'dbname': 'skyvern',                                                 โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   'user': 'skyvern',                                                   โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   'hostaddr': '127.0.0.1'                                              โ”‚ โ”‚
โ”‚ โ”‚                     }                                                                        โ”‚ โ”‚
โ”‚ โ”‚          attempts = [                                                                        โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   {                                                                    โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   โ”‚   'host': 'localhost',                                             โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   โ”‚   'dbname': 'skyvern',                                             โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   โ”‚   'user': 'skyvern',                                               โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   โ”‚   'hostaddr': '127.0.0.1'                                          โ”‚ โ”‚
โ”‚ โ”‚                     โ”‚   }                                                                    โ”‚ โ”‚
โ”‚ โ”‚                     ]                                                                        โ”‚ โ”‚
โ”‚ โ”‚        autocommit = False                                                                    โ”‚ โ”‚
โ”‚ โ”‚               cls = <class 'psycopg.Connection'>                                             โ”‚ โ”‚
โ”‚ โ”‚          conninfo = 'host=localhost dbname=skyvern user=skyvern hostaddr=127.0.0.1'          โ”‚ โ”‚
โ”‚ โ”‚           context = <psycopg.adapt.AdaptersMap object at 0x122d1bf50>                        โ”‚ โ”‚
โ”‚ โ”‚    cursor_factory = None                                                                     โ”‚ โ”‚
โ”‚ โ”‚            kwargs = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'}            โ”‚ โ”‚
โ”‚ โ”‚           last_ex = OperationalError('connection failed: FATAL:  role "skyvern" does not     โ”‚ โ”‚
โ”‚ โ”‚                     exist')                                                                  โ”‚ โ”‚
โ”‚ โ”‚            params = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'}            โ”‚ โ”‚
โ”‚ โ”‚ prepare_threshold = 5                                                                        โ”‚ โ”‚
โ”‚ โ”‚       row_factory = None                                                                     โ”‚ โ”‚
โ”‚ โ”‚                rv = None                                                                     โ”‚ โ”‚
โ”‚ โ”‚           timeout = 130                                                                      โ”‚ โ”‚
โ”‚ โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
OperationalError: (psycopg.OperationalError) connection failed: FATAL:  role "skyvern" does not exist
(Background on this error at: https://sqlalche.me/e/20/e3q8)
.streamlit/secrets.toml file updated with organization details.
Setup completed successfully.

Integration with Ollama

Great work on this! For testing purposes and local development, potentially can we integrate this with Ollama Chat Completions?

https://github.com/ollama/ollama/blob/main/docs/openai.md

When I set mine up and changed the OpenAPIWrapper to a localhost base_url

class OpenAIKeyClientWrapper:
    client: AsyncOpenAI
    key: str
    remaining_requests: int | None

    def __init__(self, key: str, remaining_requests: int | None) -> None:
        self.key = key
        self.remaining_requests = remaining_requests
        self.updated_at = datetime.utcnow()
        self.client = AsyncOpenAI(api_key=self.key, base_url = 'http://localhost:11434/v1',)

and changing the its model to point towards llama2

    json_response = await app.OPENAI_CLIENT.chat_completion(
        model="llama2",
        step=step,
        prompt=extract_information_prompt,
        screenshots=scraped_page.screenshots,
    )

It doesn't seem to like the request from the Geico.com boilerplate Task

Error message:

Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://localhost:11434/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

During handling of the above exception, another exception occurred:

  File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/api/open_ai.py", line 154, in chat_completion
    response = await available_client.client.chat.completions.with_raw_response.create(**chat_completion_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 335, in agent_step
    json_response = await app.OPENAI_CLIENT.chat_completion(
  File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 211, in execute_step
    step, detailed_output = await self.agent_step(task, step, browser_state, organization=organization)
  File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/agent.py", line 97, in __call__
    await self.app(scope, receive, send)
  File "<string>", line 1, in <module>
openai.BadRequestError: Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}

Not sure is this more of a setup issue or Ollama unable to support all use cases of OpenAI Chat Completion API

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.