approximatelabs / sketch

AI code-writing assistant that understands data content

License: MIT License

Python 100.00%

ai codex copilot data data-science dataframe datasketch datasketches df ds gpt3 lambdaprompt pandas python sketches tabular-data

sketch's Issues

Local Mode fails on GGML models

Through the CTransformers library, we are using the ggml library.

Increasing the context length, which is necessary for the local-mode CPU version of StarCoder, makes sketch fail and can crash the whole kernel.

I raised an issue in ggml; hopefully the fix will flow through ctransformers transparently.
Note, from the thread about quantization support (marella/ctransformers#1): if the new ggml fix lands after the quantization changes and ctransformers doesn't update, we might be "stuck" for a bit.

Issue in ggml: ggerganov/ggml#158

Gunicorn FastAPI - ValueError while importing sketch

I get the error below when importing the sketch library:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/home/ubuntu/.local/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
super(UvicornWorker, self).init_process()
File "/usr/lib/python3/dist-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/lib/python3/dist-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/lib/python3/dist-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/lib/python3/dist-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/lib/python3/dist-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/lib/python3/dist-packages/gunicorn/util.py", line 384, in import_app
mod = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/ubuntu/api/main.py", line 15, in <module>
import chat
File "/home/ubuntu/api/chat.py", line 25, in <module>
import sketch
File "/home/ubuntu/.local/lib/python3.10/site-packages/sketch/__init__.py", line 2, in <module>
from .pandas_extension import SketchHelper # noqa
File "/home/ubuntu/.local/lib/python3.10/site-packages/sketch/pandas_extension.py", line 16, in <module>
import lambdaprompt
File "/home/ubuntu/.local/lib/python3.10/site-packages/lambdaprompt/__init__.py", line 8, in <module>
nest_asyncio.apply()
File "/home/ubuntu/.local/lib/python3.10/site-packages/nest_asyncio.py", line 14, in apply
raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
[2023-06-29 06:34:51 +0000] [341645] [INFO] Worker exiting (pid: 341645)
[2023-06-29 06:34:51 +0000] [341644] [INFO] Shutting down: Master
[2023-06-29 06:34:51 +0000] [341644] [INFO] Reason: Worker failed to boot.

Code:
import sketch
content = df.sketch.ask("hello", call_display=False)
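
The traceback shows `nest_asyncio.apply()` (called when lambdaprompt is imported) rejecting the `uvloop.Loop` that is active under the Uvicorn worker. A minimal sketch of one possible workaround, assuming it is acceptable to run on the standard asyncio loop and that nest_asyncio only accepts loops derived from `asyncio.BaseEventLoop`: restore the default event loop policy before `import sketch`. The helper names are hypothetical, and how this interacts with a particular Gunicorn/Uvicorn setup is untested here.

```python
import asyncio


def use_stdlib_event_loop():
    """Restore asyncio's default policy so new loops are standard-library loops.

    Assumption: this runs before `import sketch`, so that the
    nest_asyncio.apply() call in lambdaprompt no longer sees a uvloop loop.
    """
    asyncio.set_event_loop_policy(None)  # None restores the default policy


def is_stdlib_loop(loop):
    """Return True for loops derived from asyncio.BaseEventLoop."""
    return isinstance(loop, asyncio.BaseEventLoop)


use_stdlib_event_loop()
loop = asyncio.new_event_loop()
print(is_stdlib_loop(loop))
loop.close()
# import sketch  # would go here, after the policy has been reset
```

Resetting the policy trades uvloop's speed for compatibility, so it may be preferable to limit it to the process that actually imports sketch.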

Not working

Hello, I tried to use sketch with a simple use case, but I'm not getting results. I'm working in the Spyder IDE on Anaconda. The pip install of sketch worked fine, but my console output is always: "<IPython.core.display.HTML object>"

This happens regardless of the type of question I ask it, following the documentation.

Thank you

sketch package no longer working in Google Colab

Hello! I recently installed Sketch and imported a dataset with 18 columns. Previously, I could use the 'how to' functions on this dataset, but now they're not generating the code as they used to. Any update is greatly appreciated, thanks!

I want to implement a Google PaLM switch

I want to implement a Google PaLM switch for this package. Are there any suggestions or pointers to the code I should try to replace?

List inside a dataframe causes "TypeError: unhashable type: 'list'" error

Hello,

First of all, thank you for your awesome tool. I just wanted to report a little issue. When a dataframe contains a list as a cell value, sketch will not work. Here is a minimum piece of code to reproduce the problem:

import pandas as pd
import sketch

df = pd.DataFrame({'A':[[2, 2]], 'B': [[1,2]]})
df.sketch.ask("how many columns this dataframe has?")

Error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [110], in <cell line: 5>()
      2 import sketch
      4 df = pd.DataFrame({'A':[[2, 2]], 'B': [[1,2]]})
----> 5 df.sketch.ask("how many columns this dataframe has?")

File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:330, in SketchHelper.ask(self, question, call_display)
    329 def ask(self, question, call_display=True):
--> 330     result = call_prompt_on_dataframe(self._obj, ask_from_parts, question=question)
    331     if not call_display:
    332         return result

File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:146, in call_prompt_on_dataframe(df, prompt, **kwargs)
    144 names = retrieve_name(df)
    145 name = "df" if len(names) == 0 else names[0]
--> 146 column_names, data_types, extras, index_col_name = get_parts_from_df(df)
    147 max_columns = int(os.environ.get("SKETCH_MAX_COLUMNS", "20"))
    148 if len(column_names) > max_columns:

File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:121, in get_parts_from_df(df, useSketches)
    116 extras = []
    117 for col in df.columns:
    118     extra = {
    119         "rows": len(df[col]),
    120         "count": int(df[col].count()),
--> 121         "uniqecount": int(df[col].nunique()),
    122         "head-sample": str(
    123             [string_repr_truncated(x) for x in df[col].head(5).tolist()]
    124         ),
    125     }
    126     # if column is numeric, get quantiles
    127     if df[col].dtype in [np.float64, np.int64]:

File ~/.local/lib/python3.9/site-packages/pandas/core/base.py:1027, in IndexOpsMixin.nunique(self, dropna)
    993 def nunique(self, dropna: bool = True) -> int:
    994     """
    995     Return number of unique elements in the object.
    996 
   (...)
   1025     4
   1026     """
-> 1027     uniqs = self.unique()
   1028     if dropna:
   1029         uniqs = remove_na_arraylike(uniqs)

File ~/.local/lib/python3.9/site-packages/pandas/core/series.py:2088, in Series.unique(self)
   2030 def unique(self) -> ArrayLike:
   2031     """
   2032     Return unique values of Series object.
   2033 
   (...)
   2086     Categories (3, object): ['a' < 'b' < 'c']
   2087     """
-> 2088     return super().unique()

File ~/.local/lib/python3.9/site-packages/pandas/core/base.py:989, in IndexOpsMixin.unique(self)
    987             result = np.asarray(result)
    988 else:
--> 989     result = unique1d(values)
    991 return result

File ~/.local/lib/python3.9/site-packages/pandas/core/algorithms.py:440, in unique(values)
    437 htable, values = _get_hashtable_algo(values)
    439 table = htable(len(values))
--> 440 uniques = table.unique(values)
    441 uniques = _reconstruct_data(uniques, original.dtype, original)
    442 return uniques

File pandas/_libs/hashtable_class_helper.pxi:5361, in pandas._libs.hashtable.PyObjectHashTable.unique()

File pandas/_libs/hashtable_class_helper.pxi:5310, in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'list'

Note that this is not just a random error. This happens when someone tries to aggregate a column into a list.
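
The failure comes from `nunique()` hashing each cell, and lists are not hashable. A possible workaround, sketched here under the assumption that converting list cells to tuples is acceptable for the analysis, is to make the cells hashable before calling sketch. The helper name `make_hashable` is hypothetical.

```python
def make_hashable(value):
    """Recursively convert lists and sets into hashable tuples and frozensets."""
    if isinstance(value, (list, tuple)):
        return tuple(make_hashable(v) for v in value)
    if isinstance(value, set):
        return frozenset(make_hashable(v) for v in value)
    return value


cells = [[2, 2], [1, 2], [2, 2]]
hashable_cells = [make_hashable(c) for c in cells]
# Distinct cells can now be counted, which is what nunique() needs to do.
print(len(set(hashable_cells)))  # → 2
```

With pandas, the same helper could be applied per column before asking, for example `df['A'] = df['A'].map(make_hashable)`. Whether sketch then summarizes tuple cells usefully is not confirmed by this report.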

Misspelling error

In pandas_extension.py, the key 'uniquecount' is misspelled as 'uniqecount'. Because this key is used in the prompts, could the misspelling hurt the quality of the results?

aiohttp does not respect proxy env variables by default

I work for a corporation that uses a web proxy for any external connections. I wanted to try out sketch, but hit this error:

  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/lambdaprompt/gpt3.py", line 51, in async_get_gpt3_response
    async with session.post(
  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
    conn = await self._connector.connect(
  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 1166, in _create_direct_connection
    raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host api.openai.com:443 ssl:default [nodename nor servname provided, or not known]

The problem is that aiohttp doesn’t respect the HTTPS_PROXY environment variable by default. Could you either expose a way to configure the proxy in code, or do aiohttp.ClientSession(trust_env=True) to get the config from the standard env variables?

Edit: just realised that the problem is in your other library lambdaprompt, hope you don't mind the issue being here.

BTW, very cool library! We think it can do wonders for python/pandas beginners
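
The requested change amounts to creating the session as `aiohttp.ClientSession(trust_env=True)`, which would have to happen inside lambdaprompt. As a diagnostic sketch using only the standard library, the hypothetical helper `env_proxy_for` below shows which proxy the environment variables define for a given URL. aiohttp also accepts such a value explicitly through the `proxy=` argument of its request methods. The proxy address is made up for illustration.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def env_proxy_for(url):
    """Return the proxy URL the environment defines for this URL's scheme, or None.

    Note: this simplified lookup ignores NO_PROXY exclusions.
    """
    scheme = urlsplit(url).scheme
    return urllib.request.getproxies().get(scheme)


# Hypothetical proxy address, set here only so the example has something to find.
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"
os.environ["https_proxy"] = "http://proxy.example.com:8080"
print(env_proxy_for("https://api.openai.com/v1/completions"))
```

If this prints a proxy while the connection still fails with a name-resolution error, that is consistent with the client bypassing the proxy and trying to reach the host directly.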

Can we calculate the Token costs (aka figure out what OpenAI will charge)

First: this is really great, and I can see tremendous value that I and others can get out of it.

This is more of a question than an issue...

Is there any way to use the data model and sample questions to estimate costs? As with any tool we bring in, it's about ROI, so it would help to have some way to evaluate it. (I assume the questions count as tokens, and that the data sent to OpenAI does as well.)
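
A rough way to reason about this, offered as a sketch rather than as how sketch or OpenAI actually bills: count the characters of everything sent (question plus data summary), convert to tokens with the common rule of thumb of about 4 characters per token for English text, add the expected completion length, and multiply by a price. The function names and the price below are hypothetical placeholders; a real tokenizer would give more accurate counts.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return round(len(text) / chars_per_token)


def estimate_cost(prompt, expected_completion_tokens, price_per_1k_tokens):
    """Estimate the cost of one request, in the same currency as the price."""
    total_tokens = estimate_tokens(prompt) + expected_completion_tokens
    return total_tokens / 1000 * price_per_1k_tokens


# Hypothetical prompt and price; check the provider's current price list.
prompt = "Column summary: " + "x" * 784 + "\nQuestion: which columns contain PII?"
print(estimate_tokens(prompt))
print(estimate_cost(prompt, expected_completion_tokens=200, price_per_1k_tokens=0.02))
```

Multiplying the per-request figure by the expected number of questions gives a first-order budget to weigh against the value of the tool.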

Privacy Policy

First off, amazing package and thoughtful design! As mentioned in the readme, the default behavior is to call out to https://prompts.approx.dev. Is there more information on that endpoint regarding the privacy policy? How is the data used, is it stored, is it used for other purposes, etc?

LAMBDAPROMPT_BACKEND = StarCoder causes runtime ValidationError on .sketch.ask

Hello,

I am using these in .env and load_dotenv() to load these into the runtime

LAMBDAPROMPT_BACKEND=StarCoder
SKETCH_USE_REMOTE_LAMBDAPROMPT='False'
HF_ACCESS_TOKEN=(my token)

This results in the ValidationError below.
It does work when I do not use the local copy.

Any ideas?

Thank you !

Python 3.10.0
sketch==0.5.2

ValidationError Traceback (most recent call last)
Cell In[27], line 1
----> 1 df.sketch.ask("What is this dataset about?")
...
[329]def ask(self, question, call_display=True):
--> [330] result = call_prompt_on_dataframe(self._obj, ask_from_parts, question=question)
...
ValidationError: 1 validation error for Parameters
stop
Field required [type=missing, input_value={}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.7/v/missing

Question: What is the use of to_b64() and from_b64()? Why are they used?
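
The question is not answered in this listing. As general background, and not a description of sketch's own implementation, functions with names like these usually turn binary or structured data into a plain ASCII string that can travel inside JSON or an HTTP request, and back again. The standalone functions below only borrow the names to illustrate that round trip.

```python
import base64
import json


def to_b64(obj):
    """Serialize a JSON-compatible object to an ASCII-safe base64 string."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def from_b64(text):
    """Inverse of to_b64: decode the base64 string back into the object."""
    raw = base64.b64decode(text.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


summary = {"column": "price", "rows": 3}
encoded = to_b64(summary)
print(encoded)
print(from_b64(encoded) == summary)
```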

Wrong result for query "Get the top 5 grossing states" in sample colab

According to the data, the total value of each row should be calculated as Price Each * Quantity Ordered. In the sample Colab, the library sums the prices but doesn't take the quantity into account.
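
The intended calculation can be sketched as follows, using plain Python rows so the arithmetic is easy to follow. The `Price Each` and `Quantity Ordered` names come from the report; the `State` column name and the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict


def top_grossing_states(rows, n=5):
    """Return the n states with the highest total of price * quantity."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["State"]] += row["Price Each"] * row["Quantity Ordered"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up rows for illustration; "State" is an assumed column name.
rows = [
    {"State": "CA", "Price Each": 10.0, "Quantity Ordered": 3},
    {"State": "NY", "Price Each": 25.0, "Quantity Ordered": 1},
    {"State": "CA", "Price Each": 5.0, "Quantity Ordered": 2},
]
print(top_grossing_states(rows))  # → [('CA', 40.0), ('NY', 25.0)]
```

Summing `Price Each` alone would rank NY (25) above CA (15) for these rows, which is the kind of wrong ordering described above. In pandas the equivalent would be along the lines of `(df['Price Each'] * df['Quantity Ordered']).groupby(df['State']).sum().nlargest(5)`.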

<IPython.core.display.HTML object> error

I get that error when I try to run it as a Python file in VS Code rather than in a Jupyter notebook. In the sketch extension file I can see that the ask method calls display(HTML(f"""{result}""")), which pretty much means it should be able to output to the console, yet it doesn't. I have also tried importing display and applying it to the ask line, and it still didn't work. Here's my Python file code:

import sketch
import pandas as pd
state_data = pd.read_csv('state.csv')
state_data.sketch.ask("How many columns are there?")
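
`<IPython.core.display.HTML object>` is the repr of the object handed to `display`, which a plain console cannot render. The `ask` source visible in the tracebacks elsewhere in this listing returns `result` when `call_display=False`, so a script can print the answer itself. A minimal sketch, assuming the returned value is a string that may contain HTML markup (the helper names are hypothetical):

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the visible text of an HTML fragment for console printing."""
    extractor = _TextExtractor()
    extractor.feed(fragment)
    extractor.close()
    return "".join(extractor.parts).strip()


# In a plain script one would pass the returned answer instead, e.g.
#   answer = state_data.sketch.ask("How many columns are there?", call_display=False)
print(html_to_text("<p>There are <b>5</b> columns.</p>"))  # → There are 5 columns.
```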

Joining dataframes

Nice. I've been doing similar things with SQL sources. I'm curious why you attach the ask to a specific dataframe. Can multiple dataframes be considered in one request? Thanks

There is a validation error

Hello,

While I ran the following code:

import sketch
import pandas as pd
sales_data = pd.read_csv("https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv")
sales_data.sketch.ask("What columns might have PII information in them?")

I got the error below:

File /usr/local/anaconda3/envs/mypersonal_env/lib/python3.10/site-packages/pydantic/main.py:159, in BaseModel.__init__(__pydantic_self__, **data)
    157 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    158 __tracebackhide__ = True
--> 159 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)

ValidationError: 1 validation error for Parameters
stop
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.2/v/missing

Could you please help me figure out the reason? Thanks!

Valid OpenAI API Key Not Accepted with df.sketch.apply

I copied your original Google Colab example notebook:
https://colab.research.google.com/gist/bluecoconut/410a979d94613ea2aaf29987cf0233bc/sketch-demo.ipynb#scrollTo=6xZgjwWypy91

top_5_states = state_sales.sort_values(by='Price Each', ascending=False).head(5).copy()
top_5_states.sketch.apply("new column with full name of the states. just the top 5")

I added the lines from the screenshot; the error is always:

Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

I copied the API key from
https://beta.openai.com/account/api-keys

Bug? or
What am I missing?
Thx

Discord link not working

I'm seeing "Invalid Invite" when I click it.

This library is amazing. Is there any way to use the library for PySpark and SQL instead of Pandas?

Hello Team,

Thanks for creating this amazing library. Is there any way to use the library for PySpark and SQL instead of Pandas?

sketch.ask "Plot a bar chart..." in jupyter lab not including new lines..

sketch.ask is no longer returning code properly.

I am getting the following:
import matplotlib.pyplot as plt # Get the data for 2014 NY AGI by county df_2014 = df[df['Tax Year'] == 2014] county_agi = df_2014.groupby('County')['NY AGI of Returns'].sum() # Plot the bar chart plt.bar(county_agi.index, county_agi.values) plt.xlabel('County') plt.ylabel('NY AGI of Returns') plt.title('2014 NY AGI by County') plt.show()

What I should get is the following (and if I put the line breaks in the right places, it works). Funny, a week or so ago it was working.
import matplotlib.pyplot as plt
# Get the data for 2014 NY AGI by county
df_2014 = df[df['Tax Year'] == 2014]
county_agi = df_2014.groupby('County')['NY AGI of Returns'].sum()
# Plot the bar chart
plt.bar(county_agi.index, county_agi.values)
plt.xlabel('County')
plt.ylabel('NY AGI of Returns')
plt.title('2014 NY AGI by County')
plt.show()

LAMBDAPROMPT_BACKEND = StarCoder causes a runtime error

When I activate the local execution, I get the following error message:
ValueError: The current "device_map" had weights offloaded to the disk. Please provide an "offload_folder" for them. Alternatively, make sure you have "safetensors" installed if the model you are using offers the weights in this format.

To activate the local execution, I did the following:

os.environ['LAMBDAPROMPT_BACKEND'] = 'StarCoder'
os.environ['SKETCH_USE_REMOTE_LAMBDAPROMPT'] = 'False'
os.environ['HF_ACCESS_TOKEN'] = 'myToken'

Unfortunately, I cannot find a way to set the offload_folder.

How can I do that? Note: "safetensors" is already installed

AttributeError: 'DataFrame' object has no attribute 'sketch'

I just ran pip install sketch, and after import sketch it fails with:

File "/home/plagchk/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 5902, in __getattr__
  return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'sketch'
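
The report does not include the calling code, so the cause is unknown. One thing worth ruling out, offered as an assumption rather than a diagnosis, is that `import sketch` resolved to something other than the installed package (for example a local file named `sketch.py`), in which case the `.sketch` accessor would never be registered. The hypothetical helper below shows where a module name resolves to.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# None means no module named "sketch" is importable from this environment.
print(module_origin("sketch"))
# A standard-library module, for comparison.
print(module_origin("json"))
```

If the printed path for `sketch` is not inside the environment's site-packages directory, a different module is shadowing the library.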

Generated code execution

Is it possible to directly run the code that sketch provides, in an automated way?
That is, the user asks a question, the code is generated and executed behind the scenes, and the user sees only the answer to the question.
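
Nothing in this listing indicates that sketch offers this itself. A minimal sketch of the idea, assuming the generated code is available as a string and stores its answer in a variable named `result` (both are assumptions, as is the helper name):

```python
def run_generated_code(code, variables):
    """Execute a code string with the given variables and return the namespace.

    Warning: exec runs arbitrary code; only use this with output you trust
    or inside a sandbox.
    """
    namespace = dict(variables)
    exec(code, namespace)
    return namespace


# Stand-in for code returned by the assistant; by (assumed) convention the
# snippet stores its answer in a variable named `result`.
generated = "result = len(columns)"
namespace = run_generated_code(generated, {"columns": ["A", "B"]})
print(namespace["result"])  # → 2
```

Because model-generated code can do anything the process is allowed to do, reviewing it or isolating its execution matters more than the few lines needed to run it.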

and how to check that I'm using openai?

import pandas as pd
import sketch
import os
from dotenv import load_dotenv

os.environ['OPENAI_API_KEY'] = 'key'
os.environ['SKETCH_USE_REMOTE_LAMBDAPROMPT'] = 'False'

and it doesn't seem to work
