skyvern-ai / skyvern
Automate browser-based workflows with LLMs and Computer Vision
Home Page: https://www.skyvern.com
License: GNU Affero General Public License v3.0
Please dockerize this project and provide a docker compose file for running it.
I saw a couple of cases on Discord where re-creating the PostgreSQL user with a password solved some issues.
References:
If I update the LLM provider now, my whole env will be re-set up again, including poetry install, database install, playwright install...
As PR #102 said, I simply split the functions apart. Sometimes I just want to set up some parts of the project, such as env or database (actually my db is set up on a remote server, but I can't just execute alembic upgrade head and create secrets.toml alone).
For the long term, the setup should still be refactored into Python scripts or CLI tools to handle more complicated commands, like docker build for image building, clean for local cache cleaning, pytest for testing, and uninstall for project uninstalling...
Reference from the openai package: https://github.com/openai/openai-python?tab=readme-ov-file#microsoft-azure-openai
This could be handled during the setup script to set some additional environment variables, etc.
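As a sketch of what the setup script could append for Azure, following the openai package's Azure documentation linked above. The variable names below are illustrative and may not match Skyvern's actual config keys:

```shell
# Hypothetical .env additions for Azure OpenAI (names are illustrative,
# not Skyvern's actual configuration keys).
ENABLE_AZURE=true
LLM_KEY=AZURE_OPENAI_GPT4V
AZURE_API_KEY=your-azure-openai-key-here
AZURE_API_BASE=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-02-01
AZURE_DEPLOYMENT=your-gpt4v-deployment-name
```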
I am trying to run Skyvern after cloning the main branch on my local Windows system. I have set everything up as directed in the readme.md documentation, using Python 3.11.
The database Docker container was set up properly: I could see that the database, tables, and user were created correctly. But I couldn't see any of the data that was supposed to be added.
To analyze further, I ran the command "poetry run python scripts/create_organization.py Skyvern-Open-Source" and got errors. I am attaching the following screenshots for reference so that I can find and fix the root cause and run the Skyvern tool locally on my system.
I asked Skyvern to perform a search on Google and do an exploration. One of the results was a YouTube page with many comments. At this step, it hit the rate limit several times, resulting in OpenAI marking the key as unavailable.
2024-03-10T15:25:54.087388_a_233651180269098532_llm_request.json
2024-03-10T15:25:54.039585_a_233651180269098518_llm_prompt.txt
The errors:
OpenAI rate limit exceeded, marking key as unavailable. error_code=rate_limit_exceeded error_message=Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-KvD1ZUhg9B7cNvz3mxtyAZBX on tokens per min (TPM): Limit 40000, Requested 53355. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
I can't see it.
The browser opens with run_ui.sh, but nothing happens when I click the button "Execute Task".
What would be nice is a simple feature where you can set the maximum number of calls/tokens to use for the entire task. Or even better, do some math and put in a dollar cap, e.g., "go fill out the Geico forms for me and don't spend more than $1.00 doing it."
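As a sketch of what such a cap could look like: a tracker accumulates estimated spend per LLM call and aborts once a dollar limit is crossed. The class names and per-token prices below are illustrative, not part of Skyvern's API:

```python
# Sketch of a per-task spend cap. Names (BudgetTracker, the price constants)
# and the prices themselves are illustrative assumptions, not Skyvern's API.
class BudgetExceededError(Exception):
    pass


class BudgetTracker:
    PRICE_PER_1K_PROMPT = 0.01      # assumed USD price per 1K prompt tokens
    PRICE_PER_1K_COMPLETION = 0.03  # assumed USD price per 1K completion tokens

    def __init__(self, max_dollars: float) -> None:
        self.max_dollars = max_dollars
        self.spent = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate estimated cost, then fail fast once over the cap
        self.spent += (prompt_tokens / 1000) * self.PRICE_PER_1K_PROMPT
        self.spent += (completion_tokens / 1000) * self.PRICE_PER_1K_COMPLETION
        if self.spent > self.max_dollars:
            raise BudgetExceededError(
                f"spent ${self.spent:.4f}, which exceeds the ${self.max_dollars:.2f} cap"
            )


tracker = BudgetTracker(max_dollars=1.00)
tracker.record(prompt_tokens=50_000, completion_tokens=1_000)  # ~$0.53, still under cap
```

The agent loop would call `record()` after each chat completion, using the token counts the API already returns, and treat `BudgetExceededError` as a task failure.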
I can understand wanting to use an LLM to browse the web.
Purposely obfuscating the fact that you're an automation tool is gross and supports people using more intrusive DRM to prevent bots from accessing their sites - at the expense of real visitors.
By all means this is automation software and should respect anti-bot protections.
For context:
"--disable-blink-features=AutomationControlled"
is a command-line argument that prevents Chromium from indicating that it is currently being controlled by automation software. This is a typical method used by data scrapers to obfuscate their programs and bypass anti-scraping/anti-bot protection.
For the good health of the WWW, this service should be more respectful. Otherwise, more intrusive methods will be introduced, such as Web Environment Integrity.
Skyvern fails to fill out autocomplete fields
https://discordapp.com/channels/1212486326352617534/1214296823066534021/1218050218856153118
setup.sh will create the API host http://0.0.0.0:8000/api/v1 in secrets.toml
(Line 253 in d273510: http://0.0.0.0:8000)
and fail to get a response in some situations (maybe a firewall policy or some permission problem).
We start the API listening on 0.0.0.0, but we should access it through localhost, 127.0.0.1, or your LAN IP rather than 0.0.0.0.
This is one of the most common issues people run into while starting our service.
I am interested in this project; I tried it a lot and found it works very well. But it seems to use a lot of GPT tokens because of the screenshot processing. I tried to replace GPT with a local vision model, but couldn't find where to make the change. Where is GPT vision used in the source code?
Great work on this! For testing purposes and local development, potentially can we integrate this with Ollama Chat Completions?
https://github.com/ollama/ollama/blob/main/docs/openai.md
When I set mine up, I changed the OpenAI client wrapper to use a localhost base_url:
from datetime import datetime

from openai import AsyncOpenAI


class OpenAIKeyClientWrapper:
    client: AsyncOpenAI
    key: str
    remaining_requests: int | None

    def __init__(self, key: str, remaining_requests: int | None) -> None:
        self.key = key
        self.remaining_requests = remaining_requests
        self.updated_at = datetime.utcnow()
        # Point the AsyncOpenAI client at the local Ollama server
        self.client = AsyncOpenAI(api_key=self.key, base_url="http://localhost:11434/v1")
and changed its model to point to llama2:
json_response = await app.OPENAI_CLIENT.chat_completion(
    model="llama2",
    step=step,
    prompt=extract_information_prompt,
    screenshots=scraped_page.screenshots,
)
It doesn't seem to like the request from the Geico.com boilerplate task.
Error message:
Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'http://localhost:11434/v1/chat/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
During handling of the above exception, another exception occurred:
File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/api/open_ai.py", line 154, in chat_completion
response = await available_client.client.chat.completions.with_raw_response.create(**chat_completion_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 335, in agent_step
json_response = await app.OPENAI_CLIENT.chat_completion(
File "/Users/bryankho/Code/skyvern/skyvern/forge/agent.py", line 211, in execute_step
step, detailed_output = await self.agent_step(task, step, browser_state, organization=organization)
File "/Users/bryankho/Code/skyvern/skyvern/forge/sdk/agent.py", line 97, in __call__
await self.app(scope, receive, send)
File "<string>", line 1, in <module>
openai.BadRequestError: Error code: 400 - {'error': {'message': 'json: cannot unmarshal array into Go struct field Message.messages.content of type string', 'type': 'invalid_request_error', 'param': None, 'code': None}}
I'm not sure if this is more of a setup issue or Ollama being unable to support all use cases of the OpenAI Chat Completions API.
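The 400 error ("cannot unmarshal array into Go struct field Message.messages.content of type string") suggests Ollama's OpenAI-compatible endpoint expected message content to be a plain string, while vision requests send a list of content parts. One possible workaround sketch (not part of Skyvern) is to flatten the parts before sending:

```python
# Sketch: collapse OpenAI-style multi-part message content into a plain
# string before sending to an endpoint that only accepts string content,
# as the 400 from Ollama suggests. Image parts are dropped, since they
# cannot be represented as a string.
def flatten_content(messages: list[dict]) -> list[dict]:
    flattened = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep only the text parts of the multimodal payload
            text = "\n".join(p["text"] for p in content if p.get("type") == "text")
            msg = {**msg, "content": text}
        flattened.append(msg)
    return flattened


msgs = [{"role": "user", "content": [
    {"type": "text", "text": "Describe this page."},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]}]
print(flatten_content(msgs)[0]["content"])  # -> Describe this page.
```

Of course, dropping the screenshots defeats the vision use case; this only confirms whether string-only content is the incompatibility.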
Today, browser actions are mostly "desktop-based", but I believe "mobile-based" pages are usually lighter, with fewer elements, which could make understanding the content faster and easier. WDYT?
Hypothesis: this will make running Skyvern a bit cheaper by reducing the set of possible actions on a page.
Counter-hypothesis: some mobile pages tend to have less information than desktop pages, leading to more steps to complete a workflow (which tends to be more expensive overall).
Worth testing out!
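For anyone who wants to test the hypothesis: Playwright contexts can emulate a mobile device. A minimal sketch with approximate iPhone-class values (Playwright also ships predefined device descriptors that could be used instead):

```python
# Sketch: context options for mobile emulation in Playwright, to test
# whether mobile page variants are cheaper for the agent to process.
# The values approximate an iPhone-class device and are assumptions.
def mobile_context_options() -> dict:
    return {
        "viewport": {"width": 390, "height": 844},
        "user_agent": (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) "
            "Version/17.0 Mobile/15E148 Safari/604.1"
        ),
        "is_mobile": True,       # enables mobile-specific rendering in Chromium
        "has_touch": True,
        "device_scale_factor": 3,
    }


# Usage (inside an async Playwright session):
#   context = await browser.new_context(**mobile_context_options())
```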
We have plans to write some docs hosted on Mintlify. We should also document our endpoints.
The current quickstart instructions and the setup.sh script are written for and tested on macOS.
Nothing prevents Skyvern from working successfully on other platforms.
It would be great if we could add instructions for other platforms.
We are currently using Ollama and Llama 3, and we would love to see them integrated.
brew install poetry
brew install postgresql
gh repo clone Skyvern-AI/skyvern
cd skyvern/
poetry env use 3.11
./setup.sh
OperationalError: (psycopg.OperationalError) connection failed: FATAL: role "skyvern" does not exist
/Users/josh/Library/Caches/pypoetry/virtualenvs/skyvern-Lm4w_20w-py3.11/lib/python3.11/site-pac │
│ kages/psycopg/connection.py:748 in connect │
│ │
│ 745 │ │ │
│ 746 │ │ if not rv: │
│ 747 │ │ │ assert last_ex │
│ ❱ 748 │ │ │ raise last_ex.with_traceback(None) │
│ 749 │ │ │
│ 750 │ │ rv._autocommit = bool(autocommit) │
│ 751 │ │ if row_factory: │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ attempt = { │ │
│ │ │ 'host': 'localhost', │ │
│ │ │ 'dbname': 'skyvern', │ │
│ │ │ 'user': 'skyvern', │ │
│ │ │ 'hostaddr': '127.0.0.1' │ │
│ │ } │ │
│ │ attempts = [ │ │
│ │ │ { │ │
│ │ │ │ 'host': 'localhost', │ │
│ │ │ │ 'dbname': 'skyvern', │ │
│ │ │ │ 'user': 'skyvern', │ │
│ │ │ │ 'hostaddr': '127.0.0.1' │ │
│ │ │ } │ │
│ │ ] │ │
│ │ autocommit = False │ │
│ │ cls = <class 'psycopg.Connection'> │ │
│ │ conninfo = 'host=localhost dbname=skyvern user=skyvern hostaddr=127.0.0.1' │ │
│ │ context = <psycopg.adapt.AdaptersMap object at 0x122d1bf50> │ │
│ │ cursor_factory = None │ │
│ │ kwargs = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'} │ │
│ │ last_ex = OperationalError('connection failed: FATAL: role "skyvern" does not │ │
│ │ exist') │ │
│ │ params = {'host': 'localhost', 'dbname': 'skyvern', 'user': 'skyvern'} │ │
│ │ prepare_threshold = 5 │ │
│ │ row_factory = None │ │
│ │ rv = None │ │
│ │ timeout = 130 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OperationalError: (psycopg.OperationalError) connection failed: FATAL: role "skyvern" does not exist
(Background on this error at: https://sqlalche.me/e/20/e3q8)
.streamlit/secrets.toml file updated with organization details.
Setup completed successfully.
Is there a way to connect to psql and make sure it's using the right local credentials / API key?
Hello team,
I have followed the installation instructions and have both
./run_skyvern.sh and ./run_ui.sh executing.
However, when trying to run ANY of the tests, it never gets past step 1:
As you can see, it only creates the first step and then nothing happens. Can you please assist in debugging this issue?
Make sure people are using >1.8
How can I replace the model with a local large model?
Flagging that Docker needs to be running, and that's not covered in the setup instructions.
How can I set the wait_until option for a specific URL format to domcontentloaded instead of commit, load, or networkidle?
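As far as I can tell, Skyvern doesn't expose such an option today. One possible sketch is a per-URL-pattern lookup that feeds Playwright's `page.goto(wait_until=...)`; the config mapping below is hypothetical:

```python
# Sketch: choose a Playwright wait_until value per URL pattern.
# The override table is a hypothetical config, not a Skyvern feature.
from fnmatch import fnmatch

WAIT_UNTIL_OVERRIDES = {
    # Slow pages: don't wait for the full load event
    "https://noonnu.cc/*": "domcontentloaded",
}
DEFAULT_WAIT_UNTIL = "load"  # Playwright's default


def wait_until_for(url: str) -> str:
    for pattern, wait_until in WAIT_UNTIL_OVERRIDES.items():
        if fnmatch(url, pattern):
            return wait_until
    return DEFAULT_WAIT_UNTIL


# Usage (inside an async Playwright session):
#   await page.goto(url, wait_until=wait_until_for(url))
print(wait_until_for("https://noonnu.cc/font_page/1339"))  # -> domcontentloaded
```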
➜ skyvern git:(main) ./run_skyvern.sh
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Installing dependencies from lock file
No dependencies to install or update
Installing the current project: skyvern (0.1.0)
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
No new upgrade operations detected.
2024-04-01T22:28:45.277841Z [info ] Agent server starting. host=0.0.0.0 port=8000
INFO: Will watch for changes in these directories: ['/Users/user/GitHub/free-font-downloader/skyvern']
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: Started reloader process [11003] using WatchFiles
2024-04-02 07:28:49 [info ] Registering LLM config llm_key=OPENAI_GPT4_TURBO
2024-04-02 07:28:49 [info ] Registering LLM config llm_key=OPENAI_GPT4V
2024-04-01T22:28:49.912753Z [info ] Initializing ForgeAgent browser_action_timeout_ms=5000 browser_type=chromium-headful debug_mode=False env=local execute_all_steps=True long_running_task_warning_ratio=0.95 max_scraping_retries=0 max_steps_per_run=50 video_path=./videos
2024-04-01T22:28:50.016098Z [info ] Starting the skyvern scheduler.
2024-04-01T22:46:22.247142Z [info ] Created new task data_goal=Extract the actual URL used for downloading the file and the name of the file being downloaded. Ensure the data is provided in JSON format, including both the direct download link and the file name. nav_goal=Navigate through the website to first locate the name of the font and the publisher who created and distributed it. After identifying these details, search for a yellow download page button and click it to open the page where the font can be downloaded. In the newly opened page, find and click on the button or buttons to download the font. Some pages may contain multiple download buttons; ensure all are clicked to achieve the goal. proxy_location=NONE task_id=tsk_241928560382524842 title=None url=https://noonnu.cc/font_page/1339
2024-04-01T22:46:22.248015Z [info ] Executing task using background task executor task_id=tsk_241928560382524842
2024-04-01T22:46:22.340120Z [info ] Creating browser state for task task_id=tsk_241928560382524842
2024-04-01T22:46:31.162207Z [info ] Creating a new page
2024-04-01T22:46:31.858802Z [info ] A new page is created
2024-04-01T22:46:31.858918Z [info ] Navigating page to https://noonnu.cc/font_page/1339 and waiting for 3 seconds
/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/structlog/_base.py:167: UserWarning: Remove `format_exc_info` from your processor chain if you want pretty exceptions.
event_dict = proc(self._logger, method_name, event_dict)
2024-04-01T22:47:04.922176Z [error ] Error while navigating to url: Timeout 30000ms exceeded.
Traceback (most recent call last):
File "/Users/user/GitHub/free-font-downloader/skyvern/skyvern/webeye/browser_factory.py", line 176, in check_and_fix_state
await self.page.goto(url)
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 9275, in goto
await self._impl_obj.goto(
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_page.py", line 484, in goto
return await self._main_frame.goto(**locals_to_params(locals()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_frame.py", line 149, in goto
await self._channel.send("goto", locals_to_params(locals()))
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 63, in send
return await self._connection.wrap_api_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 495, in wrap_api_call
return await cb()
^^^^^^^^^^
File "/Users/user/GitHub/free-font-downloader/skyvern/.venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 101, in inner_send
result = next(iter(done)).result()
^^^^^^^^^^^^^^^^^^^^^^^^^
playwright._impl._errors.TimeoutError: Timeout 30000ms exceeded.
This is making it hard to write tests
Although chromium-headless and chromium-headful are registered in BrowserContextFactory, we can only choose one type for all tasks (workflows), because we create the type by reading the config from the env.
skyvern/skyvern/webeye/browser_factory.py
Line 86 in d273510
Is it necessary to offer a dynamic option, maybe at the task (workflow) level? For example, task (workflow) A could choose headless and task (workflow) B could choose headful.
Sure, the BROWSER_TYPE in the env config could be the default choice if no browser type is specified by the task (workflow).
Just went through setup with someone:
Should we just have the quickstart be a docker image?
I'm not sure if this is an intended use case, but this looks like something that could help tremendously with automated UI testing. Would you be able to add something to the README that explains how one might integrate this into a CI/CD pipeline for that purpose?
There are a lot of dependencies that get installed or upgraded. We need a better story on how to undo the changes made by this project.
Hi
I tried to set up this automation but faced a problem.
Chromium doesn't save login information, so every time I run a task it cannot open my videos list page, as I'm not logged in.
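One possible workaround, assuming direct access to the Playwright launch: use a persistent browser profile so cookies and logins survive across runs. The paths and helper name below are illustrative:

```python
# Sketch: build kwargs for Playwright's launch_persistent_context so that
# cookies and logins are kept between runs. Profile path is illustrative.
from pathlib import Path


def persistent_context_kwargs(profile_dir: str = "~/.skyvern/chromium-profile") -> dict:
    user_data_dir = Path(profile_dir).expanduser()
    user_data_dir.mkdir(parents=True, exist_ok=True)  # reuse the same profile every run
    return {"user_data_dir": str(user_data_dir), "headless": False}


# Usage (inside an async Playwright session):
#   context = await playwright.chromium.launch_persistent_context(
#       **persistent_context_kwargs()
#   )
```

You would log in manually once inside that profile; subsequent automated runs then start already authenticated.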
Updated description in Linear.
Is there a way or a tutorial on how to configure Ollama with LiteLLM to work with Skyvern? How can Skyvern work with a local LLM?
Feedback from here: https://news.ycombinator.com/reply?id=39707620&goto=item%3Fid%3D39706004%2339707620
I played with the Geico example, and it seems to do a good job on the happy path there. But I tried another one where it struggled... I want to get car rental prices from https://www.costcotravel.com/. I gave it an airport plus pickup and dropoff times, but it struggled to hit the "rental car" tab. It got caught up on hitting the Rental Car button at the top, which brings up a popup that it doesn't seem to read. When I put in https://www.costcotravel.com/Rental-Cars, it entered JFK into the pickup location, but then failed to click the popup.
Currently only GPT-4V is supported
It's not clear whether this installation error is fatal or can be ignored. This is after running ./setup.sh once...
Installing the current project: skyvern (0.1.0)
Installing postgresql using brew
/tmp:5432 - no response
PostgreSQL is already running in a Docker container.
Database user exists.
Database exists.
Installing playwright dependencies...
Running Alembic upgrade...
Alembic mode: online
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
Creating organization and API token...
/Users/steve/Library/Caches/pypoetry/virtualenvs/skyvern-HWQwphh0-py3.11/lib/python3.11/site-packages/pydantic/_internal/fields.py:151: UserWarning: Field "model_max_budget" has conflict with protected namespace "model".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ()
.
warnings.warn(
Existing secrets.toml file backed up as secrets.backup.toml
.streamlit/secrets.toml file updated with organization details.
error uploading: HTTPSConnectionPool(host='us-api.i.posthog.com', port=443): Max retries exceeded with url: /batch/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10f7b6a50>: Failed to establish a new connection: [Errno 61] Connection refused'))
Setup completed successfully.
I still seem to be able to run the scheduler after this.
I'm playing around with Skyvern to evaluate its potential impact on automating general-purpose job applications, and hopefully leverage it. Here's what I tried:
URL: https://www.tesla.com/careers/search/job/security-intelligence-analyst--220453
Navigation Goal: Visit the Tesla's job application page for Security Intelligence Analyst, and create and apply as a convincing application profile for a 5-year junior security engineer named John Bogdanovic.
Data Extraction Goal: [Empty]
Navigation Payload JSON: [Empty]
Extracted Information Schema: [Empty]
Result: the task fails after repeated attempts at filling out the page. Let me be a little more specific:
2024-03-28T15:51:29.376578Z [info ] Updating step in db diff={'status': {'old': <StepStatus.running: 'running'>, 'new': <StepStatus.failed: 'failed'>}, 'output': {'old': None, 'new': AgentStepOutput({'action_results': [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}], 'actions_and_results': [({'action_type': <ActionType.UPLOAD_FILE: 'upload_file'>, 'description': None, 'reasoning': "The user needs to upload a resume to proceed with the application. This action is required as indicated by the context message stating 'Resume; One file max size 10 MB (PDF, Doc, TXT)'.", 'element_id': 305, 'file_url': 'https://example.com/john_bogdanovic_resume.pdf', 'is_upload_file_tag': True}, [{'success': False, 'exception_type': 'ImaginaryFileUrl', 'exception_message': 'File url https://example.com/john_bogdanovic_resume.pdf is imaginary.', 'data': None, 'step_retry_number': 5, 'step_order': 2, 'javascript_triggered': False, 'interacted_with_sibling': False, 'interacted_with_parent': False}]), ({'action_type': <ActionType.CLICK: 'click'>, 'description': None, 'reasoning': "After uploading the resume, the next step is to proceed with the application by clicking the 'Next' button.", 'element_id': 307, 'file_url': None}, [])], 'errors': []})}} step_id=stp_240337201361129268 task_id=tsk_240336174863945298
Is there some way to refer skyvern to a local file?
Also, if any of you get a chance, I'd love to see a demo of the prompts you'd use to tackle job application pages like Tesla's and get it to work. It would be a really effective use case (none of the known SOTA open-source agents can currently fill out a job application page), and it would help a lot!
When the LLM_KEY is an empty string, error logs don't make sense.
Example:
class InvalidLLMConfigError(BaseLLMError):
    def __init__(self, llm_key: str) -> None:
        super().__init__(f"LLM config with key {llm_key} is not a valid LLMConfig")
Example output:
skyvern.forge.sdk.api.llm.exceptions.InvalidLLMConfigError: LLM config key with key is not a valid LLMConfig
Possible solutions:
- Instead of logging "LLM config key with key", it could log "LLM_KEY= is not a valid LLMConfig"
- FILLMEIN
- Instead of "key {llm_key} is", use "key {llm_key if llm_key else 'EMPTY_VALUE'} is"
LiteLLM docs for Gemini: https://litellm.vercel.app/docs/providers/gemini
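A sketch of the last suggestion applied to the exception shown earlier (class names copied from the snippet, message body adjusted):

```python
# Sketch: make an empty LLM_KEY visible in the error message instead of
# silently formatting an empty string. Based on the exception snippet above.
class BaseLLMError(Exception):
    pass


class InvalidLLMConfigError(BaseLLMError):
    def __init__(self, llm_key: str) -> None:
        # Substitute a sentinel when the key is empty so the log is readable
        shown = llm_key if llm_key else "EMPTY_VALUE"
        super().__init__(f"LLM config with key {shown} is not a valid LLMConfig")


print(InvalidLLMConfigError(""))  # -> LLM config with key EMPTY_VALUE is not a valid LLMConfig
print(InvalidLLMConfigError("FOO_KEY"))  # -> LLM config with key FOO_KEY is not a valid LLMConfig
```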
Required steps for integrating a new LLM model:
- Lines 52 to 72 in 5706295
- skyvern/skyvern/forge/sdk/api/llm/config_registry.py, Lines 57 to 59 in 5706295
- Line 39 in 5706295