reworkd / tarsier
Vision utilities for web interaction agents
Home Page: https://reworkd.ai
License: MIT License
Currently both Playwright and Selenium are required to run Tarsier. Ideally, users should only have to install one of them to get up and running. This is normally done through pip optional dependencies.
pip install tarsier[playwright]
pip install tarsier[selenium]
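Optional dependencies usually pair with a lazy import, so that a missing driver fails with an install hint rather than at package import time. A minimal sketch, assuming the extras are named `playwright` and `selenium` as in the commands above; `require_driver` and `SUPPORTED_DRIVERS` are hypothetical names, not part of Tarsier:

```python
import importlib
from types import ModuleType

# Extras assumed from the pip commands above; this tuple is illustrative only.
SUPPORTED_DRIVERS = ("playwright", "selenium")


def require_driver(name: str) -> ModuleType:
    """Import a browser driver lazily, raising a helpful error if it is missing."""
    if name not in SUPPORTED_DRIVERS:
        raise ValueError(f"Unknown driver {name!r}; expected one of {SUPPORTED_DRIVERS}")
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Point the user at the matching extra instead of a bare ImportError.
        raise ImportError(
            f"{name} is not installed. Install it with: pip install tarsier[{name}]"
        ) from exc
```

With this shape, code paths that only need one driver never touch the other, so installing a single extra would be enough.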
Hi!
I'm trying to automate using the search bar on a list of unknown sites.
In most cases the bar is not visible, and there is an icon I must click first to display it.
In this example, I want to detect and click the magnifying glass:
The problem is that it shows up in the text as [ @ 18 ],
so GPT cannot pick it (I'm using the LlamaIndex agent).
The website is https://elastic.co
I read that @asim-shrestha mentions a GPT-V mode in another issue, but I'm not sure how to activate it; I'm following the docs without success.
Any advice? Thanks.
I feel like the success rate of a given objective will always be significantly less than 100%. E.g., I've been testing Tarsier to try and get it to make a reservation at a restaurant using resy.com, but it fails at the login step because it incorrectly picks the label on the "password" input to type text into (and wants to type my email in that field no less).
I wonder if there's some system that could be built that has a human in the loop correcting mistakes made by the AI, and those corrections are then used to fine tune the model for that specific task.
Anyway, this is more of an "idea" than an issue, but I'm wondering what you think! (Feel free to close.)
I'm working on something pretty similar to what you guys are doing and had a thought. Why not grab text directly from the web page instead of using OCR? LangChain and LlamaIndex both have such tools, and there are also some repos for converting HTML to Markdown.
Just a thought. Would love to know what you think!
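For context, grabbing text straight from the HTML can be sketched with the standard library alone. This is an illustration of the suggestion, not how Tarsier, LangChain, or LlamaIndex do it; `VisibleTextExtractor` and `html_to_text` are hypothetical names:

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style elements."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML string, one per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Note that this sketch has no idea which elements are actually rendered or where they sit on the page; it only sees the markup.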
As a user, I'd like to customize the colors of tags. I'd also like to set different tag colors based on the type of element.
For example, maybe I want tags blue and 's red
When tags are added to elements, we insert the tag text directly within the elements. This is problematic for input elements, as it always adds unnecessary tags to forms, which breaks the input value. See #5
As a simple fix, we should expose an API method to delete all existing tags from elements (or perhaps just from input elements). We mark tagged elements via an attribute, so we would just need to filter for this attribute and delete the tag text along with the marker attribute.
Currently the only OCR service Tarsier supports is Google's Vision OCR. It would be good to support another OCR service so that Textract can be used.
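One way to make the OCR backend swappable is a small shared interface that each service implements. A rough sketch under stated assumptions: `OCRService`, `annotate`, `FakeOCRService`, and `page_text` are hypothetical names for illustration and do not reflect Tarsier's actual API or any vendor SDK:

```python
from typing import Dict, List, Protocol


class OCRService(Protocol):
    """Hypothetical interface: turn screenshot bytes into text annotations."""

    def annotate(self, image: bytes) -> List[Dict[str, object]]:
        ...


class FakeOCRService:
    """Stand-in backend for illustration; a real one would call an OCR provider."""

    def annotate(self, image: bytes) -> List[Dict[str, object]]:
        return [{"text": "Search", "x": 10, "y": 20}]


def page_text(service: OCRService, image: bytes) -> str:
    """Join the recognized words, independent of which backend produced them."""
    return " ".join(str(item["text"]) for item in service.annotate(image))
```

Callers would depend only on the interface, so adding a second provider would mean adding one class rather than touching the calling code.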