reworkd / tarsier Goto Github PK

View Code? Open in Web Editor NEW

492.0 492.0 31.0 1.64 GB

Vision utilities for web interaction agents 👀

Home Page: https://reworkd.ai

License: MIT License

Python 20.56% Shell 0.23% Jupyter Notebook 72.69% TypeScript 6.52%

gpt4v llms ocr playwright pypi-package python selenium webscraping

tarsier's People

Stargazers

Watchers

tarsier's Issues

🔨 Make selenium and playwright optional dependancies

Currently both playwright and selenium are required to run tarsier. Ideally we want user to only have to install a single one of these to get up and running. This is normally done though pip optional dependancies.

pip install tarsier[playwright]
pip install tarsier[selenium]

Textarea Tagging System Doesn't Work on Google Search

Hey, I ran the same prompt as the example video. The new tagging system does not tag the textareas properly on the Google search page!

Selecting Icons

Hi! ,

I'm trying to automate using the search bar on a list of unknown sites.

In most cases the bar is not visible but there is an icon I must click before to display the search bar.

This example, I want to detect and click the magnifying glass:

The problem is it shows this way in the text [ @ 18 ] so GPT can not pick it (I'm using the llamaindex agent)

The website is https://elastic.co

I read @asim-shrestha mentions GPT-V mode in another issue but I'm not sure on how activate that one, I'm following the docs without success.

Any advice? thanks

🐛 Annotations not removed when filling out inputs

Seeing this a lot, e.g. when I tweak the example in the cookbook to log in to resy :)

I've tried adding both await page.locator(x_path).fill('') and await page.locator(x_path).clear() to the type_text tool, but it doesn't seem to work (my playwright experience is very limited).

fine tune

I feel like the success rate of a given objective will always be significantly less than 100%. E.g., I've been testing Tarsier to try and get it to make a reservation at a restaurant using resy.com, but it fails at the login step because it incorrectly picks the label on the "password" input to type text into (and wants to type my email in that field no less).

I wonder if there's some system that could be built that has a human in the loop correcting mistakes made by the AI, and those corrections are then used to fine tune the model for that specific task.

Anyway this is more of a "idea" than an issue, but wondering what you think! (Feel free to close)

Extract web text directly instead of OCR

I'm working on something pretty similar to what you guys are doing and had a thought. Why not grab text directly from the web instead of using OCR? Langchain and llamaindex both have such tools, and there are also some repos about converting html to markdown.

Just a thought. Would love to know what you think!

✨ Ability to customize tag colours

As a user, I'd like to customize the colors of tags. I'd also like to set different tag colors based on the type of element.

For example, maybe I want tags blue and 's red

🗑️ Expose an API method to remove tags from elements

When tags are added to elements, we insert the tag text directly within elements. This is problematic for input elements as it will always add unnecessary tags to forms which break the input value. See #5

As a simple fix, we should expose an API method to delete all existing tags from elements (Or perhaps just input elements). We mark tagged elements via an attribute. We would need to just filter for this attribute and delete the tag text along with the tagged marked.

👀 Integrate with Amazon Textextract

Currently the only OCR service tarsier supports is GoogleOCR vision. It would be good to provide another ocr service that allows textextract to be used

reworkd / tarsier Goto Github PK

tarsier's People

Stargazers

Watchers

Forkers

tarsier's Issues

🔨 Make selenium and playwright optional dependancies

Textarea Tagging System Doesn't Work on Google Search

Selecting Icons

🐛 Annotations not removed when filling out inputs

fine tune

Extract web text directly instead of OCR

✨ Ability to customize tag colours

🗑️ Expose an API method to remove tags from elements

👀 Integrate with Amazon Textextract

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent