
sphinxbio / sliceanddice


Experiment to slice, dice, and clean up spreadsheets

Home Page: https://sheets.sphinxbio.com/

License: Other

JavaScript 25.76% HTML 0.39% SCSS 6.66% Svelte 42.10% TypeScript 25.10%
dice slice spreadsheet

sliceanddice's Introduction

Slice n Dice Logo

A small experiment that uses LLMs to analyze and isolate data boundaries within messy spreadsheets.

🔗 Try it here • 🐦 @sphinx_bio • 😼 Sphinx Bio

Slice & Dice is an experiment in using AI to extract structured data from unstructured spreadsheets. Upload an Excel/CSV file and watch the AI attempt to identify the datasets in the sheet and slice it into separate regions.

This is a work in progress, and everyone is welcome to contribute!


🔗 Try it here

Try out the live demo on our website.

⚠️ Please don't upload any personal or private data in the demo! Your files will be visible to AI providers and our analytics platform ⚠️

Feel free to reach out about self-hosted or enterprise versions.


👀 Discussion

Currently, the app sends the CSV/Excel data as text to a series of fast Llama3 or Mixtral prompts: some ask for boundaries, others check for correctness. Previous attempts using LangChain agents, the OpenAI Assistants API, and a combination of Claude and OpenAI yielded fairly unimpressive results (and were very expensive and slow as well). Though Llama3 and Mixtral are very fast and affordable (especially through Groq) for prototype development and iteration, they present various challenges and shortcomings:

  • No function calling or JSON mode (sometimes the results are not well-formatted)
  • Small context window (large datasets won't work)
  • Lack of planning, reasoning, and agent-like decision-making (all models are incredibly error-prone and generally poor at handling spreadsheets)
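Without a JSON mode, a reply that asks for boundaries may arrive wrapped in commentary. The sketch below shows one way to parse such a reply defensively and discard slices that fall outside the sheet. The reply format, `SliceBounds`, `extractJsonArray`, and `parseSlices` are all assumptions made for illustration (zero-based, inclusive bounds); the app's actual prompts and schema may differ.

```typescript
// Hypothetical slice shape; the real app's schema may differ.
interface SliceBounds {
  name: string;
  startRow: number;
  endRow: number;
  startCol: number;
  endCol: number;
}

// Take the text between the first "[" and the last "]" and try to parse it.
// Returns null when no parseable JSON array is found.
function extractJsonArray(reply: string): unknown[] | null {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end <= start) return null;
  try {
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

// True for non-negative integers, which is what a row/column index must be.
function isIndex(n: unknown): n is number {
  return typeof n === "number" && Number.isInteger(n) && n >= 0;
}

// Keep only entries that are well-formed and fit inside the sheet.
function parseSlices(reply: string, rowCount: number, colCount: number): SliceBounds[] {
  const slices: SliceBounds[] = [];
  for (const item of extractJsonArray(reply) ?? []) {
    if (typeof item !== "object" || item === null) continue;
    const { name, startRow, endRow, startCol, endCol } = item as Record<string, unknown>;
    if (typeof name !== "string") continue;
    if (!isIndex(startRow) || !isIndex(endRow) || !isIndex(startCol) || !isIndex(endCol)) continue;
    const inBounds =
      startRow <= endRow && endRow < rowCount &&
      startCol <= endCol && endCol < colCount;
    if (inBounds) slices.push({ name, startRow, endRow, startCol, endCol });
  }
  return slices;
}
```

A caller could treat an empty result as a signal to retry the prompt or to run a correctness-checking prompt again. Taking everything between the first and last bracket is deliberately crude and will fail if the surrounding commentary itself contains brackets.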

We wrote up a blog post about the entire process.

🎉 Features & Roadmap

Slice & Dice is a small experiment and playground for manipulating spreadsheets. We'd love to add more features like:

  • Manually create slices: Create slices by hand, and an LLM will add context such as names and descriptions.
  • Manipulate sheets & slices: Add a way to join and concatenate sheets and slices in order to merge them or generate new tabular datasets.
  • Download tabular slices: Create & download well-formed, tabular slices of data.
  • Explore alternate UIs: Add "Chat with sheets" and other UI modes to progressively get and edit the slices you need.
  • Excel macro or Google Apps Script: A few people have asked for this to be built into Excel.
  • Handle more messy use cases: Many use cases will break the tool; add better slicing for more kinds of data. More use cases can be found under src/lib/samples/dirtysheets.
  • UI improvements: A clear/reset button and other UI improvements would make the demo more than a toy.
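As an illustration of the "Download tabular slices" idea, here is a minimal sketch of cutting a rectangular region out of a sheet and serializing it as CSV. It assumes the sheet has already been parsed into rows of cell strings and that a slice is given as zero-based, inclusive bounds; `Sheet`, `Region`, `extractSlice`, `escapeCsvCell`, and `toCsv` are hypothetical names, not part of the project.

```typescript
// Hypothetical types: a sheet as rows of cell strings, a region as inclusive bounds.
type Sheet = string[][];

interface Region {
  startRow: number;
  endRow: number;
  startCol: number;
  endCol: number;
}

// Copy the rectangular region out of the sheet.
// Missing rows or cells (ragged sheets) are padded with empty strings.
function extractSlice(sheet: Sheet, region: Region): Sheet {
  const out: Sheet = [];
  for (let r = region.startRow; r <= region.endRow; r++) {
    const row = sheet[r] ?? [];
    const cells: string[] = [];
    for (let c = region.startCol; c <= region.endCol; c++) {
      cells.push(row[c] ?? "");
    }
    out.push(cells);
  }
  return out;
}

// Quote a cell only when it contains a quote, comma, or line break,
// doubling any embedded quotes (RFC 4180-style quoting).
function escapeCsvCell(cell: string): string {
  return /[",\r\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
}

// Serialize rows as CSV text, one line per row, joined with "\n".
function toCsv(rows: Sheet): string {
  return rows.map((row) => row.map(escapeCsvCell).join(",")).join("\n");
}

// Usage: skip a title row and export the table beneath it.
const sheet: Sheet = [
  ["Experiment A", "", ""],
  ["id", "name", "value"],
  ["1", "x, y", "3"],
];
const csv = toCsv(extractSlice(sheet, { startRow: 1, endRow: 2, startCol: 0, endCol: 2 }));
```

Padding ragged rows keeps every exported row the same width, which is what makes the result "well-formed" for downstream tools.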

We'd also like to work with the community on better ways to overcome some of the severe limitations of LLMs when handling spreadsheets. These might include methods for identifying the data type (e.g. in what way is the data "messed up"?) and for defining sub-tasks (e.g. given that we know a spreadsheet has a control and an experiment arm, how can that be used to prompt a model?). There are lots of variations and possibilities, and we'd like to explore them with you!

🚀 Tech Stack

Note that we use Posthog for usage analytics on the demo site.

Supporters


Many thanks to Sphinx Bio for sponsoring this project. If you'd like to work on this (and other cool projects), consider joining us!

License

This project is licensed under the terms of the Apache 2.0 license.

Contributing

Nothing in life is certain except Death, Taxes, and Really Messy Spreadsheets. We're excited to permanently remove the spreadsheet problem! We'd love your contributions, so please submit ideas and bug reports on GitHub!

Acknowledgements

This is a Sphinx Bio project! If you're interested in a hosted solution for your lab, please reach out.

Thanks to Harrison and the LangChain team for their help as well!

sliceanddice's People

Contributors

alexrejto, janzheng

