reworkd / tarsier

Vision utilities for web interaction agents 👀

Home Page: https://reworkd.ai

License: MIT License

Python 20.56% Shell 0.23% Jupyter Notebook 72.69% TypeScript 6.52%
gpt4v llms ocr playwright pypi-package python selenium webscraping


tarsier's Issues

🔨 Make selenium and playwright optional dependencies

Currently both playwright and selenium are required to run tarsier. Ideally, users should only have to install one of them to get up and running. This is normally done through pip optional dependencies:

pip install tarsier[playwright]
pip install tarsier[selenium]
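
A minimal sketch of how a package could cope with only one driver being installed, assuming extras like the ones above exist. The helper names `available_backends` and `require_backend` are hypothetical and not part of tarsier; the only real API used is the standard library's `importlib.util.find_spec`, which checks whether a top-level package is importable without importing it.

```python
from importlib.util import find_spec


def available_backends(candidates=("playwright", "selenium")):
    """Return the candidate driver packages that are importable here."""
    return [name for name in candidates if find_spec(name) is not None]


def require_backend(candidates=("playwright", "selenium")):
    """Return the first installed driver, or raise a helpful ImportError."""
    found = available_backends(candidates)
    if not found:
        # Hypothetical message; the extras names mirror the issue text.
        raise ImportError(
            "No browser driver found. Install one with "
            "`pip install tarsier[playwright]` or "
            "`pip install tarsier[selenium]`."
        )
    return found[0]
```

Calling `require_backend()` at the point where a driver is first needed, rather than at import time, would let the package import cleanly with either extra installed.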

Selecting Icons

Hi!

I'm trying to automate using the search bar on a list of unknown sites.

In most cases the bar is not visible, and there is an icon I must click first to display it.

In this example, I want to detect and click the magnifying glass:

image

The problem is that it shows up in the text as [ @ 18 ], so GPT cannot pick it (I'm using the llamaindex agent).

The website is https://elastic.co

I read that @asim-shrestha mentioned a GPT-V mode in another issue, but I'm not sure how to activate it; I'm following the docs without success.

Any advice? Thanks.

๐Ÿ› Annotations not removed when filling out inputs

Seeing this a lot, e.g. when I tweak the example in the cookbook to log in to resy :)

image

I've tried adding both await page.locator(x_path).fill('') and await page.locator(x_path).clear() to the type_text tool, but it doesn't seem to work (my playwright experience is very limited).

fine tune

I feel like the success rate of a given objective will always be significantly less than 100%. E.g., I've been testing Tarsier to try and get it to make a reservation at a restaurant using resy.com, but it fails at the login step because it incorrectly picks the label on the "password" input to type text into (and wants to type my email in that field no less).

I wonder if there's some system that could be built that has a human in the loop correcting mistakes made by the AI, and those corrections are then used to fine tune the model for that specific task.

Anyway, this is more of an "idea" than an issue, but I'm wondering what you think! (Feel free to close.)

Extract web text directly instead of OCR

I'm working on something pretty similar to what you guys are doing and had a thought. Why not grab text directly from the web instead of using OCR? Langchain and llamaindex both have such tools, and there are also some repos about converting html to markdown.

Just a thought. Would love to know what you think!
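
As a rough illustration of the idea in this issue, here is a minimal sketch that pulls text out of raw HTML using only Python's standard `html.parser` module. The class and function names are hypothetical, and this is not how tarsier, Langchain, or llamaindex do it; it simply collects text nodes and skips the contents of `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text nodes of an HTML string, one per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

One trade-off worth noting: a parser like this returns the strings but not where they appear on screen, and it does not know whether an element is actually rendered.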

✨ Ability to customize tag colours

As a user, I'd like to customize the colors of tags. I'd also like to set different tag colors based on the type of element.

For example, maybe I want tags blue and 's red

๐Ÿ—‘๏ธ Expose an API method to remove tags from elements

When tags are added to elements, we insert the tag text directly within elements. This is problematic for input elements as it will always add unnecessary tags to forms which break the input value. See #5

As a simple fix, we should expose an API method to delete all existing tags from elements (Or perhaps just input elements). We mark tagged elements via an attribute. We would need to just filter for this attribute and delete the tag text along with the tagged marked.
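
Until such a method exists, a caller could clean tag text out of a string value after the fact. The sketch below is a string-level stand-in, not the DOM-based fix this issue proposes. The marker pattern is an assumption inferred from the `[ @ 18 ]` example reported in the "Selecting Icons" issue (a bracket, an optional symbol, a number), and the name `strip_tag_markers` is hypothetical.

```python
import re

# Assumed marker shape: "[", optional symbol, digits, "]", with optional spaces.
# The symbol set (#, @, $) is a guess and may not match tarsier's real tags.
TAG_MARKER = re.compile(r"\[\s*[#@$]?\s*\d+\s*\]")


def strip_tag_markers(value):
    """Remove tag markers from a string and tidy leftover whitespace."""
    without_tags = TAG_MARKER.sub("", value)
    return re.sub(r"\s{2,}", " ", without_tags).strip()
```

For example, `strip_tag_markers("[ @ 18 ] user@example.com")` returns `"user@example.com"`. A legitimate value that happens to contain a bracketed number would be altered too, which is one reason removing the tags in the page itself is the more robust approach.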
