skyvern-ai / skyvern
Automate browser-based workflows with LLMs and Computer Vision
Home Page: https://www.skyvern.com
License: GNU Affero General Public License v3.0
Please dockerize this project and provide a docker compose file for running it.
I saw a couple of cases on Discord where re-creating the PostgreSQL user with a password solved some issues.
References:
If I update the LLM provider now, my whole env will be re-set up again, including poetry install, database install, playwright install...
As PR #102 said, I simply split the functions apart. Sometimes I just want to set up some parts of the project, such as env or database (actually my db is set up on a remote server, but I can't just execute alembic upgrade head and create secrets.toml alone).
For the long term, the setup should still be refactored into Python scripts or CLI tools to handle more complicated commands, like docker build for image building, clean for local cache cleaning, pytest for testing, and uninstall for project uninstalling...
Reference from the openai package: https://github.com/openai/openai-python?tab=readme-ov-file#microsoft-azure-openai
This could be handled during the setup script to set some additional environment variables, etc.
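As a sketch of what the setup script could append for Azure, following the openai package's Azure documentation linked above. The variable names below are illustrative and may not match Skyvern's actual config keys:

```shell
# Hypothetical .env additions for Azure OpenAI (names are illustrative,
# not Skyvern's actual configuration keys).
ENABLE_AZURE=true
LLM_KEY=AZURE_OPENAI_GPT4V
AZURE_API_KEY=your-azure-openai-key-here
AZURE_API_BASE=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-02-01
AZURE_DEPLOYMENT=your-gpt4v-deployment-name
```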
I am trying to run Skyvern after cloning the main branch on my local Windows system. I have set everything up as directed in the readme.md documentation, using Python 3.11.
The database Docker container was set up properly: I could see that the database, tables, and user were created correctly. But I couldn't see any of the data that was supposed to be added.
To analyze further, I ran the command "poetry run python scripts/create_organization.py Skyvern-Open-Source" and got errors. I am attaching the following screenshots for reference so that I can find and fix the root cause and run the Skyvern tool locally on my system.
I asked Skyvern to perform a search on Google and do an exploration. One of the results was a YouTube page with many comments. At this step, it hit the rate limit several times, resulting in OpenAI marking the key as unavailable.
2024-03-10T15:25:54.087388_a_233651180269098532_llm_request.json
2024-03-10T15:25:54.039585_a_233651180269098518_llm_prompt.txt
The errors:
OpenAI rate limit exceeded, marking key as unavailable. error_code=rate_limit_exceeded error_message=Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
I can't see it.
The browser opens with run_ui.sh, but nothing happens when I click the button "Execute Task".
What would be nice is a simple feature where you can set the maximum number of calls/tokens to use for the entire task. Or even better, do some math and put in a dollar cap, e.g., "go fill out the Geico forms for me and don't spend more than $1.00 doing it."
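As a sketch of what such a cap could look like: a tracker accumulates estimated spend per LLM call and aborts once a dollar limit is crossed. The class names and per-token prices below are illustrative, not part of Skyvern's API:

```python
# Sketch of a per-task spend cap. Names (BudgetTracker, the price constants)
# and the prices themselves are illustrative assumptions, not Skyvern's API.
class BudgetExceededError(Exception):
    pass


class BudgetTracker:
    PRICE_PER_1K_PROMPT = 0.01      # assumed USD price per 1K prompt tokens
    PRICE_PER_1K_COMPLETION = 0.03  # assumed USD price per 1K completion tokens

    def __init__(self, max_dollars: float) -> None:
        self.max_dollars = max_dollars
        self.spent = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate estimated cost, then fail fast once over the cap
        self.spent += (prompt_tokens / 1000) * self.PRICE_PER_1K_PROMPT
        self.spent += (completion_tokens / 1000) * self.PRICE_PER_1K_COMPLETION
        if self.spent > self.max_dollars:
            raise BudgetExceededError(
                f"spent ${self.spent:.4f}, which exceeds the ${self.max_dollars:.2f} cap"
            )


tracker = BudgetTracker(max_dollars=1.00)
tracker.record(prompt_tokens=50_000, completion_tokens=1_000)  # ~$0.53, still under cap
```

The agent loop would call `record()` after each chat completion, using the token counts the API already returns, and treat `BudgetExceededError` as a task failure.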
I can understand wanting to use an LLM to browse the web.
Purposely obfuscating the fact that you're an automation tool is gross and supports people using more intrusive DRM to prevent bots from accessing their sites - at the expense of real visitors.
By all means this is automation software and should respect anti-bot protections.
For context:
"--disable-blink-features=AutomationControlled"
is a command-line argument that prevents Chromium from indicating that it is currently being controlled by automation software. This is a typical method used by data scrapers to obfuscate their programs and bypass anti-scraping/anti-bot protection.
For the good health of the WWW, this service should be more respectful. Otherwise, more intrusive methods will be introduced, such as Web Environment Integrity.
Skyvern fails to fill out autocomplete fields
https://discordapp.com/channels/1212486326352617534/1214296823066534021/1218050218856153118
setup.sh will create the API host http://0.0.0.0:8000/api/v1 in secrets.toml
(Line 253 in d273510: http://0.0.0.0:8000)
and fail to get a response in some situations (maybe a firewall policy or some permission problem).
We start the API listening on 0.0.0.0, but we should access it through localhost, 127.0.0.1, or your LAN IP rather than 0.0.0.0.
This is one of the most common issues people run into while starting our service.
I am interested in this project; I tried it a lot and found it works very well. But it seems to use a lot of GPT tokens because of the screenshot processing. I tried to replace GPT with a local vision model, but couldn't find where to make the change. Where is GPT vision used in the source code?
Great work on this! For testing purposes and local development, potentially can we integrate this with Ollama Chat Completions?
https://github.com/ollama/ollama/blob/main/docs/openai.md
When I set mine up, I changed the OpenAI client wrapper to use a localhost base_url:
from datetime import datetime

from openai import AsyncOpenAI


class OpenAIKeyClientWrapper:
    client: AsyncOpenAI
    key: str
    remaining_requests: int | None

    def __init__(self, key: str, remaining_requests: int | None) -> None:
        self.key = key
        self.remaining_requests = remaining_requests
        self.updated_at = datetime.utcnow()
        # Point the AsyncOpenAI client at the local Ollama server
        self.client = AsyncOpenAI(api_key=self.key, base_url="http://localhost:11434/v1")
and changed its model to point to llama2:
json_response = await app.OPENAI_CLIENT.chat_completion(
    model="llama2",
    step=step,
    prompt=extract_information_prompt,
    screenshots=scraped_page.screenshots,
)
It doesn't seem to like the request from the Geico.com boilerplate task.
Error message:
Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://localhost:11434/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
During handling of the above exception, another exception occurred:
File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/api/open_ai.py", line 154, in chat_completion
response = await available_client.client.chat.completions.with_raw_response.create(**chat_completion_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 335, in agent_step
json_response = await app.OPENAI_CLIENT.chat_completion(
File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 211, in execute_step
step, detailed_output = await self.agent_step(task, step, browser_state, organization=organization)
File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/agent.py", line 97, in __call__
await self.app(scope, receive, send)
File "<string>", line 1, in <module>
openai.BadRequestError: Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}
I'm not sure if this is more of a setup issue or Ollama being unable to support all use cases of the OpenAI Chat Completions API.
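The 400 error ("cannot unmarshal array into Go struct field Message.messages.content of type string") suggests Ollama's OpenAI-compatible endpoint expected message content to be a plain string, while vision requests send a list of content parts. One possible workaround sketch (not part of Skyvern) is to flatten the parts before sending:

```python
# Sketch: collapse OpenAI-style multi-part message content into a plain
# string before sending to an endpoint that only accepts string content,
# as the 400 from Ollama suggests. Image parts are dropped, since they
# cannot be represented as a string.
def flatten_content(messages: list[dict]) -> list[dict]:
    flattened = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep only the text parts of the multimodal payload
            text = "\n".join(p["text"] for p in content if p.get("type") == "text")
            msg = {**msg, "content": text}
        flattened.append(msg)
    return flattened


msgs = [{"role": "user", "content": [
    {"type": "text", "text": "Describe this page."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]}]
print(flatten_content(msgs)[0]["content"])  # -> Describe this page.
```

Of course, dropping the screenshots defeats the vision use case; this only confirms whether string-only content is the incompatibility.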
Today, browser actions are mostly "desktop-based", but I believe "mobile-based" pages are usually lighter, with fewer elements, which could make understanding the content faster and easier. WDYT?
Hypothesis: this will make running Skyvern a bit cheaper by reducing the set of possible actions on a page.
Counter-hypothesis: some mobile pages tend to have less information than desktop pages, leading to more steps to complete a workflow (which tends to be more expensive overall).
Worth testing out!
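For anyone who wants to test the hypothesis: Playwright contexts can emulate a mobile device. A minimal sketch with approximate iPhone-class values (Playwright also ships predefined device descriptors that could be used instead):

```python
# Sketch: context options for mobile emulation in Playwright, to test
# whether mobile page variants are cheaper for the agent to process.
# The values approximate an iPhone-class device and are assumptions.
def mobile_context_options() -> dict:
    return {
        "viewport": {"width": 390, "height": 844},
        "user_agent": (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.0 Mobile/15E148 Safari/604.1"
        ),
        "is_mobile": True,       # enables mobile-specific rendering in Chromium
        "has_touch": True,
        "device_scale_factor": 3,
    }


# Usage (inside an async Playwright session):
#   context = await browser.new_context(**mobile_context_options())
```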
We have plans to write some docs hosted on Mintlify. We should also document our endpoints.
The current quickstart instructions and the setup.sh script are written for and tested on macOS.
Nothing prevents Skyvern from working successfully on other platforms.
It would be great if we could add instructions for other platforms.
We are currently using Ollama and Llama 3, and we would love to see them integrated.
brew install poetry
brew install postgresql
gh repo clone Skyvern-AI/skyvern
cd skyvern/
poetry env use 3.11
./setup.sh
OperationalError: (psycopg.OperationalError) connection failed: FATAL: role "skyvern" does not exist
/Users/josh/Library/Caches/pypoetry/virtualenvs/skyvern-Lm4w_20w-py3.11/lib/python3.11/site-pac │
│ kages/psycopg/connection.py:748 in connect │
│ │
│ 745 │ │ │
│ 746 │ │ if not rv: │
│ 747 │ │ │ assert last_ex │
│ ❱ 748 │ │ │ raise last_ex.with_traceback(None) │
│ 749 │ │ │
│ 750 │ │ rv._autocommit = bool(autocommit) │
│ 751 │ │ if row_factory: │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ attempt = { │ │
│ │ │ 'host': 'localhost', │ │
│ │ │ 'dbname': 'skyvern', │ │
│ │ │ 'user': 'skyvern', │ │
│ │ │ 'hostaddr': '127.0.0.1' │ │
│ │ } │ │
│ │ attempts = [ │ │
│ │ │ { │ │
│ │ │ │ 'host': 'localhost', │ │
│ │ │ │ 'dbname': 'skyvern', │ │
│ │ │ │ 'user': 'skyvern', │ │
│ │ │ │ 'hostaddr': '127.0.0.1' │ │
│ │ │ } │ │
│ │ ] │ │
│ │ autocommit = False │ │
│ │ cls = <class 'psycopg.Connection'> │ │
│ │ conninfo = 'host=localhost dbname=skyvern user=skyvern hostaddr=127.0.0.1' │ │
│ │ context = <psycopg.adapt.AdaptersMap object at 0x122d1bf50> │ │
│ │ cursor_factory = None │ │
│ │ kwargs = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'} │ │
│ │ last_ex = OperationalError('connection failed: FATAL: role "skyvern" does not │ │
│ │ exist') │ │
│ │ params = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'} │ │
│ │ prepare_threshold = 5 │ │
│ │ row_factory = None │ │
│ │ rv = None │ │
│ │ timeout = 130 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OperationalError: (psycopg.OperationalError) connection failed: FATAL: role "skyvern" does not exist
(Background on this error at: https://sqlalche.me/e/20/e3q8)
.streamlit/secrets.toml file updated with organization details.
Setup completed successfully.
Is there a way to connect to psql and make sure it's using the right local credentials / API key?
Hello team,
I have followed the installation instructions and have both
./run_skyvern.sh and ./run_ui.sh executing.
However, when trying to run ANY of the tests, it never gets past step 1:
As you can see, it only creates the first step and then nothing happens. Can you please assist in debugging this issue?
Make sure people are using >1.8
How can I replace the model with a local large model?
Flagging that Docker needs to be running, and that's not covered in the setup instructions.
How can I set the wait_until option for a specific URL format to domcontentloaded instead of commit, load, or networkidle?
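As far as I can tell, Skyvern doesn't expose such an option today. One possible sketch is a per-URL-pattern lookup that feeds Playwright's `page.goto(wait_until=...)`; the config mapping below is hypothetical:

```python
# Sketch: choose a Playwright wait_until value per URL pattern.
# The override table is a hypothetical config, not a Skyvern feature.
from fnmatch import fnmatch

WAIT_UNTIL_OVERRIDES = {
    # Slow pages: don't wait for the full load event
    "https://noonnu.cc/*": "domcontentloaded",
}
DEFAULT_WAIT_UNTIL = "load"  # Playwright's default


def wait_until_for(url: str) -> str:
    for pattern, wait_until in WAIT_UNTIL_OVERRIDES.items():
        if fnmatch(url, pattern):
            return wait_until
    return DEFAULT_WAIT_UNTIL


# Usage (inside an async Playwright session):
#   await page.goto(url, wait_until=wait_until_for(url))
print(wait_until_for("https://noonnu.cc/font_page/1339"))  # -> domcontentloaded
```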
➜ skyvern git:(main) ./run_skyvern.sh
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Installing dependencies from lock file
No dependencies to install or update
Installing the current project: skyvern (0.1.0)
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
No new upgrade operations detected.
2024-04-01T22:28:45.277841Z [info ] Agent server starting. host=0.0.0.0 port=8000
INFO: Will watch for changes in these directories: ['/Users/user/GitHub/free-font-downloader/skyvern']
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: Started reloader process [11003] using WatchFiles
2024-04-02 07:28:49 [info ] Registering LLM config llm_key=OPENAI_GPT4_TURBO
2024-04-02 07:28:49 [info ] Registering LLM config llm_key=OPENAI_GPT4V
2024-04-01T22:28:49.912753Z [info ] Initializing ForgeAgent browser_action_timeout_ms=5000 browser_type=chromium-headful debug_mode=False env=local execute_all_steps=True long_running_task_warning_ratio=0.95 max_scraping_retries=0 max_steps_per_run=50 video_path=./videos
2024-04-01T22:28:50.016098Z [info ] Starting the skyvern scheduler.
2024-04-01T22:46:22.247142Z [info ] Created new task data_goal=Extract the actual URL used for downloading the file and the name of the file being downloaded. Ensure the data is provided in JSON format, including both the direct download link and the file name. nav_goal=Navigate through the website to first locate the name of the font and the publisher who created and distributed it. After identifying these details, search for a yellow download page button and click it to open the page where the font can be downloaded. In the newly opened page, find and click on the button or buttons to download the font. Some pages may contain multiple download buttons; ensure all are clicked to achieve the goal. proxy_location=NONE task_id=tsk_241928560382524842 title=None url=https://noonnu.cc/font_page/1339
2024-04-01T22:46:22.248015Z [info ] Executing task using background task executor task_id=tsk_241928560382524842
2024-04-01T22:46:22.340120Z [info ] Creating browser state for task task_id=tsk_241928560382524842
2024-04-01T22:46:31.162207Z [info ] Creating a new page
2024-04-01T22:46:31.858802Z [info ] A new page is created
2024-04-01T22:46:31.858918Z [info ] Navigating page to https://noonnu.cc/font_page/1339 and waiting for 3 seconds
/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/structlog/_base.py:167: UserWarning: Remove `format_exc_info` from your processor chain if you want pretty exceptions.
event_dict = proc(self._logger, method_name, event_dict)
2024-04-01T22:47:04.922176Z [error ] Error while navigating to url: Timeout 30000ms exceeded.
Traceback (most recent call last):
File "/Users/user/GitHub/free-font-downloader/skyvern/skyvern/webeye/browser_factory.py", line 176, in check_and_fix_state
await self.page.goto(url)
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 9275, in goto
await self._impl_obj.goto(
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_page.py", line 484, in goto
return await self._main_frame.goto(**locals_to_params(locals()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_frame.py", line 149, in goto
await self._channel.send("goto", locals_to_params(locals()))
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 63, in send
return await self._connection.wrap_api_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 495, in wrap_api_call
return await cb()
^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 101, in inner_send
result = next(iter(done)).result()
^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.
This is making it hard to write tests
Although chromium-headless and chromium-headful are registered in BrowserContextFactory, we can only choose one type for all tasks (workflows), because we create the type by reading the config from the env.
skyvern/skyvern/webeye/browser_factory.py
Line 86 in d273510
Is it necessary to offer a dynamic option, maybe at the task (workflow) level? For example, task (workflow) A could choose headless and task (workflow) B could choose headful.
Sure, the BROWSER_TYPE in the env config could be the default choice if no browser type is specified by the task (workflow).
Just went through setup with someone:
Should we just have the quickstart be a docker image?
I'm not sure if this is an intended use case, but this looks like something that could help tremendously with automated UI testing. Would you be able to add something to the README that explains how one might integrate this into a CI/CD pipeline for that purpose?
There are a lot of dependencies that get installed or upgraded. We need a better story on how to undo the changes made by this project.
Hi
I tried to set up this automation but faced a problem.
Chromium doesn't save login information, so every time I run a task it cannot open my videos list page, as I'm not logged in.
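One possible workaround, assuming direct access to the Playwright launch: use a persistent browser profile so cookies and logins survive across runs. The paths and helper name below are illustrative:

```python
# Sketch: build kwargs for Playwright's launch_persistent_context so that
# cookies and logins are kept between runs. Profile path is illustrative.
from pathlib import Path


def persistent_context_kwargs(profile_dir: str = "~/.skyvern/chromium-profile") -> dict:
    user_data_dir = Path(profile_dir).expanduser()
    user_data_dir.mkdir(parents=True, exist_ok=True)  # reuse the same profile every run
    return {"user_data_dir": str(user_data_dir), "headless": False}


# Usage (inside an async Playwright session):
#   context = await playwright.chromium.launch_persistent_context(
#       **persistent_context_kwargs()
#   )
```

You would log in manually once inside that profile; subsequent automated runs then start already authenticated.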
Updated description in Linear.
Is there a way or a tutorial on how to configure Ollama with LiteLLM to work with Skyvern? How can Skyvern work with a local LLM?
Feedback from here: https://news.ycombinator.com/reply?id=39707620&goto=item%3Fid%3D39706004%2339707620
I played with the Geico example, and it seems to do a good job on the happy path there. But I tried another one where it struggled... I want to get car rental prices from https://www.costcotravel.com/. I gave it an airport plus pickup and dropoff times, but it struggled to hit the "rental car" tab. It got caught up on hitting the Rental Car button at the top, which brings up a popup that it doesn't seem to read. When I put in https://www.costcotravel.com/Rental-Cars, it entered JFK into the pickup location, but then failed to click the popup.
Currently only GPT-4V is supported
It's not clear whether this installation error is fatal or can be ignored. This is after running ./setup.sh once...
Installing the current project: skyvern (0.1.0)
Installing postgresql using brew
/tmp:5432 - no response
PostgreSQL is already running in a Docker container.
Database user exists.
Database exists.
Installing playwright dependencies...
Running Alembic upgrade...
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Creating organization and API token...
/Users/steve/Library/Caches/pypoetry/virtualenvs/skyvern-HWQwphh0-py3.11/lib/python3.11/site-packages/pydantic/_internal/fields.py:151: UserWarning: Field "model_max_budget" has conflict with protected namespace "model".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ()
.
warnings.warn(
Existing secrets.toml file backed up as secrets.backup.toml
.streamlit/secrets.toml file updated with organization details.
error uploading: HTTPSConnectionPool(host='us-api.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10f7b6a50>: Failed to establish a new connection: [Errno 61] Connection refused'))
Setup completed successfully.
I still seem to be able to run the scheduler after this.
I'm playing around with Skyvern to evaluate its potential impact on automating general-purpose job applications, and hopefully leverage it. Here's what I tried:
URL: https://www.tesla.com/careers/search/job/security-intelligence-analyst--220453
Navigation Goal: Visit the Tesla's job application page for Security Intelligence Analyst, and create and apply as a convincing application profile for a 5-year junior security engineer named John Bogdanovic.
Data Extraction Goal: [Empty]
Navigation Payload JSON: [Empty]
Extracted Information Schema: [Empty]
Result: the task fails after repeated attempts at filling out the page. Let me be a little more specific:
2024-03-28T15:51:29.376578Z [info ] Updating step in db diff={'status': {'old': <StepStatus.running: 'running'>, 'new': <StepStatus.failed: 'failed'>}, 'output': {'old': None, 'new': AgentStepOutput({'action_results': [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}], 'actions_and_results': [({'action_type': <ActionType.UPLOAD_FILE: 'upload_file'>, 'description': None, 'reasoning': "The user needs to upload a resume to proceed with the application. This action is required as indicated by the context message stating 'Resume; One file max size 10 MB (PDF, Doc, TXT)'.", 'element_id': 305, 'file_url': 'https://example.com/john_bogdanovic_resume.pdf', 'is_upload_file_tag': True}, [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}]), ({'action_type': <ActionType.CLICK: 'click'>, 'description': None, 'reasoning': "After uploading the resume, the next step is to proceed with the application by clicking the 'Next' button.", 'element_id': 307, 'file_url': None}, [])], 'errors': []})}} step_id=stp_240337201361129268 task_id=tsk_240336174863945298
Is there some way to refer skyvern to a local file?
Also, if any of you get a chance, I'd love to see a demo of the prompts you'd use to tackle job application pages like Tesla's and get it to work. It would be a really effective use case (none of the known SOTA open-source agents can currently fill out a job application page), and it would help a lot!
When the LLM_KEY is an empty string, error logs don't make sense.
Example:
class InvalidLLMConfigError(BaseLLMError):
    def __init__(self, llm_key: str) -> None:
        super().__init__(f"LLM config with key {llm_key} is not a valid LLMConfig")
Example output:
skyvern.forge.sdk.api.llm.exceptions.InvalidLLMConfigError: LLM config key with key is not a valid LLMConfig
Possible solutions:
- Instead of logging "LLM config key with key", it could log "LLM_KEY= is not a valid LLMConfig"
- FILLMEIN
- Instead of "key {llm_key} is", use "key {llm_key if llm_key else 'EMPTY_VALUE'} is"
LiteLLM docs for Gemini: https://litellm.vercel.app/docs/providers/gemini
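A sketch of the last suggestion applied to the exception shown earlier (class names copied from the snippet, message body adjusted):

```python
# Sketch: make an empty LLM_KEY visible in the error message instead of
# silently formatting an empty string. Based on the exception snippet above.
class BaseLLMError(Exception):
    pass


class InvalidLLMConfigError(BaseLLMError):
    def __init__(self, llm_key: str) -> None:
        # Substitute a sentinel when the key is empty so the log is readable
        shown = llm_key if llm_key else "EMPTY_VALUE"
        super().__init__(f"LLM config with key {shown} is not a valid LLMConfig")


print(InvalidLLMConfigError(""))  # -> LLM config with key EMPTY_VALUE is not a valid LLMConfig
print(InvalidLLMConfigError("FOO_KEY"))  # -> LLM config with key FOO_KEY is not a valid LLMConfig
```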
Required steps for integrating a new LLM model:
- Lines 52 to 72 in 5706295
- skyvern/skyvern/forge/sdk/api/llm/config_registry.py, Lines 57 to 59 in 5706295
- Line 39 in 5706295