approximatelabs / sketch
AI code-writing assistant that understands data content
License: MIT License
Via the CTransformers library we're using the ggml library.
When increasing the context length, which is necessary for the local-mode CPU version of StarCoder, sketch fails and can crash, taking down the whole kernel.
Raised an issue in ggml; hopefully the fix will flow through ctransformers transparently.
Note, from the thread about quantization support (marella/ctransformers#1): if the new fix for ggml lands after the quantization changes and ctransformers doesn't update, we might be "stuck" for a bit.
Issue in ggml: ggerganov/ggml#158
Getting the error below while importing the sketch library:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/home/ubuntu/.local/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
super(UvicornWorker, self).init_process()
File "/usr/lib/python3/dist-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/lib/python3/dist-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/lib/python3/dist-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/lib/python3/dist-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/lib/python3/dist-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/lib/python3/dist-packages/gunicorn/util.py", line 384, in import_app
mod = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/ubuntu/api/main.py", line 15, in <module>
import chat
File "/home/ubuntu/api/chat.py", line 25, in <module>
import sketch
File "/home/ubuntu/.local/lib/python3.10/site-packages/sketch/__init__.py", line 2, in <module>
from .pandas_extension import SketchHelper # noqa
File "/home/ubuntu/.local/lib/python3.10/site-packages/sketch/pandas_extension.py", line 16, in <module>
import lambdaprompt
File "/home/ubuntu/.local/lib/python3.10/site-packages/lambdaprompt/__init__.py", line 8, in <module>
nest_asyncio.apply()
File "/home/ubuntu/.local/lib/python3.10/site-packages/nest_asyncio.py", line 14, in apply
raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
[2023-06-29 06:34:51 +0000] [341645] [INFO] Worker exiting (pid: 341645)
[2023-06-29 06:34:51 +0000] [341644] [INFO] Shutting down: Master
[2023-06-29 06:34:51 +0000] [341644] [INFO] Reason: Worker failed to boot.
Code:
import sketch
content = df.sketch.ask("hello", call_display=False)
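The traceback suggests that nest_asyncio.apply() rejects the uvloop loop installed by the Uvicorn worker. Below is a minimal sketch of a pre-import check, assuming nest_asyncio only patches loops derived from the stdlib asyncio.BaseEventLoop (the error message implies this, but the thread does not confirm it). The helper can_nest_patch and the FakeUvloop class are hypothetical names used for illustration only.

```python
import asyncio


def can_nest_patch(loop):
    """Return True if `loop` is a stdlib asyncio loop (assumed to be patchable)."""
    return isinstance(loop, asyncio.BaseEventLoop)


class FakeUvloop:
    """Stand-in for a non-stdlib loop such as uvloop.Loop (illustration only)."""


stdlib_loop = asyncio.new_event_loop()
try:
    print(can_nest_patch(stdlib_loop))    # True
    print(can_nest_patch(FakeUvloop()))   # False
finally:
    stdlib_loop.close()
```

If the check fails in a real deployment, one option worth trying is to run the server on the stdlib loop instead of uvloop, for example with Uvicorn's `--loop asyncio` option when launching through the uvicorn CLI.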
Hello, I tried to use sketch with a simple use case, but I'm not getting results. I'm working in the Spyder IDE on Anaconda. The pip install of sketch worked fine, but my console output is always: "<IPython.core.display.HTML object>"
This happens regardless of the type of question I ask, even when following the documentation.
Thank you
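Outside a notebook, the HTML display object has nothing to render into, so only its repr is printed. A possible workaround, sketched below, is to request the raw string with call_display=False (the parameter visible in the ask signature in the traceback further down) and print it as plain text. The sample string stands in for a real response, and the assumption that the response may contain HTML markup is mine; html_to_text is a hypothetical helper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags from `html` and return the remaining text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Stand-in for: result = df.sketch.ask("...", call_display=False)
result = "<p>The dataframe has <b>18</b> columns.</p>"
print(html_to_text(result))  # The dataframe has 18 columns.
```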
Hello! I recently installed Sketch and imported a dataset with 18 columns. Previously, I could use the 'how to' functions on this dataset, but now they're not generating the code as they used to. Any update is greatly appreciated, thanks!
I want to implement a Google PaLM backend switch for this package. Any suggestions or pointers to the code I should try to replace?
Hello,
First of all, thank you for your awesome tool. I just wanted to report a little issue. When a dataframe contains a list as a cell value, sketch will not work. Here is a minimum piece of code to reproduce the problem:
import pandas as pd
import sketch
df = pd.DataFrame({'A':[[2, 2]], 'B': [[1,2]]})
df.sketch.ask("how many columns this dataframe has?")
Error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [110], in <cell line: 5>()
2 import sketch
4 df = pd.DataFrame({'A':[[2, 2]], 'B': [[1,2]]})
----> 5 df.sketch.ask("how many columns this dataframe has?")
File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:330, in SketchHelper.ask(self, question, call_display)
329 def ask(self, question, call_display=True):
--> 330 result = call_prompt_on_dataframe(self._obj, ask_from_parts, question=question)
331 if not call_display:
332 return result
File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:146, in call_prompt_on_dataframe(df, prompt, **kwargs)
144 names = retrieve_name(df)
145 name = "df" if len(names) == 0 else names[0]
--> 146 column_names, data_types, extras, index_col_name = get_parts_from_df(df)
147 max_columns = int(os.environ.get("SKETCH_MAX_COLUMNS", "20"))
148 if len(column_names) > max_columns:
File ~/.local/lib/python3.9/site-packages/sketch/pandas_extension.py:121, in get_parts_from_df(df, useSketches)
116 extras = []
117 for col in df.columns:
118 extra = {
119 "rows": len(df[col]),
120 "count": int(df[col].count()),
--> 121 "uniqecount": int(df[col].nunique()),
122 "head-sample": str(
123 [string_repr_truncated(x) for x in df[col].head(5).tolist()]
124 ),
125 }
126 # if column is numeric, get quantiles
127 if df[col].dtype in [np.float64, np.int64]:
File ~/.local/lib/python3.9/site-packages/pandas/core/base.py:1027, in IndexOpsMixin.nunique(self, dropna)
993 def nunique(self, dropna: bool = True) -> int:
994 """
995 Return number of unique elements in the object.
996
(...)
1025 4
1026 """
-> 1027 uniqs = self.unique()
1028 if dropna:
1029 uniqs = remove_na_arraylike(uniqs)
File ~/.local/lib/python3.9/site-packages/pandas/core/series.py:2088, in Series.unique(self)
2030 def unique(self) -> ArrayLike:
2031 """
2032 Return unique values of Series object.
2033
(...)
2086 Categories (3, object): ['a' < 'b' < 'c']
2087 """
-> 2088 return super().unique()
File ~/.local/lib/python3.9/site-packages/pandas/core/base.py:989, in IndexOpsMixin.unique(self)
987 result = np.asarray(result)
988 else:
--> 989 result = unique1d(values)
991 return result
File ~/.local/lib/python3.9/site-packages/pandas/core/algorithms.py:440, in unique(values)
437 htable, values = _get_hashtable_algo(values)
439 table = htable(len(values))
--> 440 uniques = table.unique(values)
441 uniques = _reconstruct_data(uniques, original.dtype, original)
442 return uniques
File pandas/_libs/hashtable_class_helper.pxi:5361, in pandas._libs.hashtable.PyObjectHashTable.unique()
File pandas/_libs/hashtable_class_helper.pxi:5310, in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'list'
Note that this is not just a random error. This happens when someone tries to aggregate a column into a list.
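The failure comes from nunique() hashing each cell, which lists do not support. The following is a rough sketch of a unique count that tolerates unhashable cells by falling back to their repr. It is not how sketch computes the value, and safe_unique_count is a hypothetical helper. Unlike nunique(), it does not drop missing values.

```python
def safe_unique_count(values):
    """Count distinct values, tolerating unhashable cells such as lists."""
    seen = set()
    for value in values:
        try:
            seen.add(value)
        except TypeError:
            # Unhashable (list, dict, set): use its repr as the key instead.
            seen.add(("__unhashable__", repr(value)))
    return len(seen)


column_a = [[2, 2]]  # same cell value as column 'A' in the report
print(safe_unique_count(column_a))                 # 1
print(safe_unique_count([[1, 2], [1, 2], [3]]))    # 2
print(safe_unique_count([1, 1, 2]))                # 2
```

On a dataframe this could be applied as safe_unique_count(df[col].tolist()). Converting list cells to tuples before calling sketch is another option.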
In pandas_extension.py, the key 'uniquecount' is misspelled as 'uniqecount', and it is used in prompts. Could this hurt the final performance?
I work for a corporation that uses a web proxy for any external connections. I wanted to try out sketch, but was hit with this error:
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/lambdaprompt/gpt3.py", line 51, in async_get_gpt3_response
async with session.post(
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/client.py", line 1141, in __aenter__
self._resp = await self._coro
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/client.py", line 536, in _request
conn = await self._connector.connect(
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 540, in connect
proto = await self._create_connection(req, traces, timeout)
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 901, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/Users/einarbui.magnusson/Development/gpt-demo/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 1166, in _create_direct_connection
raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host api.openai.com:443 ssl:default [nodename nor servname provided, or not known]
The problem is that aiohttp doesn't respect the HTTPS_PROXY environment variable by default. Could you either expose a way to configure the proxy in code, or do aiohttp.ClientSession(trust_env=True) to get the config from the standard env variables?
Edit: just realised that the problem is in your other library lambdaprompt, hope you don't mind the issue being here.
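Before patching anything, it can help to confirm which proxy the standard environment variables actually expose, since that is what a trust_env-style lookup would read. Below is a small diagnostic using only the standard library. The proxy address is a made-up placeholder, and env_proxy_for is a hypothetical helper.

```python
import os
import urllib.request


def env_proxy_for(scheme):
    """Return the proxy URL configured for `scheme`, or None if unset."""
    return urllib.request.getproxies().get(scheme)


# Hypothetical proxy address, set here only so the example is self-contained.
os.environ.pop("https_proxy", None)
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"

print(env_proxy_for("https"))  # http://proxy.example.com:8080
```

If this prints the expected proxy but requests still fail, the HTTP client is probably ignoring the environment, which matches the behaviour described above.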
BTW, very cool library! We think it can do wonders for python/pandas beginners
First: this is really great, and I can see tremendous value that I and others can get out of it.
This is more of a question than an issue...
Is there any way to build the data model and sample questions so that costs can be estimated? As always with any tool we bring in, it's about ROI, so it would help to know some way to evaluate it (I assume the questions count as tokens, but so does the data sent to OpenAI, etc.).
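As a back-of-the-envelope approach, the cost of a request can be approximated from the prompt size. The sketch below assumes roughly four characters per token (a common rule of thumb, not an exact tokenizer count) and takes the price as a parameter. The price used in the example is a placeholder, not a real rate, and estimate_request_cost is a hypothetical helper.

```python
import math


def estimate_request_cost(prompt_text, output_tokens, price_per_1k_tokens,
                          chars_per_token=4.0):
    """Return (total_tokens, cost) for one request under a chars-per-token heuristic."""
    prompt_tokens = math.ceil(len(prompt_text) / chars_per_token)
    total_tokens = prompt_tokens + output_tokens
    return total_tokens, total_tokens / 1000 * price_per_1k_tokens


# Placeholder prompt: question plus the dataframe summary sent along with it.
prompt = "x" * 400
tokens, cost = estimate_request_cost(prompt, output_tokens=100,
                                     price_per_1k_tokens=0.02)
print(tokens)           # 200
print(f"{cost:.4f}")    # 0.0040
```

Note that the prompt includes the dataframe summary as well as the question, so wide dataframes cost more per request than the question text alone suggests.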
First off, amazing package and thoughtful design! As mentioned in the readme, the default behavior is to call out to https://prompts.approx.dev. Is there more information on that endpoint regarding the privacy policy? How is the data used, is it stored, is it used for other purposes, etc?
Hello,
I am using the settings below in .env, with load_dotenv() loading them into the runtime:
LAMBDAPROMPT_BACKEND=StarCoder
SKETCH_USE_REMOTE_LAMBDAPROMPT='False'
HF_ACCESS_TOKEN=(my token)
This results in the ValidationError below.
It does work when I do not use the local copy.
Any ideas?
Thank you!
Python 3.10.0
sketch==0.5.2
ValidationError Traceback (most recent call last)
Cell In[27], line 1
----> 1 df.sketch.ask("What is this dataset about?")
...
329 def ask(self, question, call_display=True):
--> 330 result = call_prompt_on_dataframe(self._obj, ask_from_parts, question=question)
...
ValidationError: 1 validation error for Parameters
stop
Field required [type=missing, input_value={}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.7/v/missing
According to the data, the total value for every row should be calculated as Price Each * Quantity Ordered. In the sample Colab, the library sums the prices but doesn't take the quantity into account.
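To make the expected calculation concrete, here is a small worked example with made-up rows. The column names 'Price Each' and 'Quantity Ordered' come from the report; the 'State' column and the values are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, not taken from the demo dataset.
rows = [
    {"State": "CA", "Price Each": 10.0, "Quantity Ordered": 3},
    {"State": "CA", "Price Each": 5.0, "Quantity Ordered": 1},
    {"State": "NY", "Price Each": 20.0, "Quantity Ordered": 2},
]


def sales_by_state(rows):
    """Sum Price Each * Quantity Ordered per state."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["State"]] += row["Price Each"] * row["Quantity Ordered"]
    return dict(totals)


print(sales_by_state(rows))  # {'CA': 35.0, 'NY': 40.0}
```

Summing 'Price Each' alone would give 15.0 for CA and 20.0 for NY, which is the discrepancy described above.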
I get that error when I try to run it as a Python file in VS Code and not in a Jupyter notebook. In the sketch extension file I can see that the ask method has display(HTML(f"""{result}""")), which pretty much means it should be able to output to the console, yet it doesn't. I have also tried importing display and applying it to the ask line, and it still didn't work. Here's my Python file code.
import sketch
import pandas as pd
state_data = pd.read_csv('state.csv')
state_data.sketch.ask("How many columns are there?")
Nice. I've been doing similar things with SQL sources. I'm curious why you attach the ask to a specific dataframe. Can multiple dataframes be considered in one request? Thanks
Hello,
When I ran the following code:
import sketch
import pandas as pd
sales_data = pd.read_csv("https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv")
sales_data.sketch.ask("What columns might have PII information in them?")
I got the error below:
File /usr/local/anaconda3/envs/mypersonal_env/lib/python3.10/site-packages/pydantic/main.py:159, in BaseModel.__init__(__pydantic_self__, **data)
157 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
158 __tracebackhide__ = True
--> 159 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
ValidationError: 1 validation error for Parameters
stop
Field required [type=missing, input_value={}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.2/v/missing
Could you please help me figure out the reason? Thanks!
I copied your original Google Colab example notebook https://colab.research.google.com/gist/bluecoconut/410a979d94613ea2aaf29987cf0233bc/sketch-demo.ipynb#scrollTo=6xZgjwWypy91
top_5_states = state_sales.sort_values(by='Price Each', ascending=False).head(5).copy()
top_5_states.sketch.apply("new column with full name of the states. just the top 5")
I added the lines from the screenshot, and the error is always:
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Not sure what happened: {'error': {'message': "Incorrect API key provided: 'sk-ZCTf*****************************************cUZ'. You can find your API key at https://beta.openai.com/.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
I copied the API key from
https://beta.openai.com/account/api-keys
Bug? or
What am I missing?
Thx
I'm seeing "Invalid Invite" when I click it.
Hello Team,
Thanks for creating this amazing library. Is there any way to use the library for PySpark and SQL instead of Pandas?
Using sketch.ask is no longer returning output properly.
I am getting the following:
import matplotlib.pyplot as plt # Get the data for 2014 NY AGI by county df_2014 = df[df['Tax Year'] == 2014] county_agi = df_2014.groupby('County')['NY AGI of Returns'].sum() # Plot the bar chart plt.bar(county_agi.index, county_agi.values) plt.xlabel('County') plt.ylabel('NY AGI of Returns') plt.title('2014 NY AGI by County') plt.show()
What I should get is the following (and if I put the line breaks in the right places, it works). Funnily enough, it was working a week or so ago.
import matplotlib.pyplot as plt
# Get the data for 2014 NY AGI by county
df_2014 = df[df['Tax Year'] == 2014]
county_agi = df_2014.groupby('County')['NY AGI of Returns'].sum()
# Plot the bar chart
plt.bar(county_agi.index, county_agi.values)
plt.xlabel('County')
plt.ylabel('NY AGI of Returns')
plt.title('2014 NY AGI by County')
plt.show()
When I activate the local execution, I get the following error message:
ValueError: The current "device_map" had weights offloaded to the disk. Please provide an "offload_folder" for them. Alternatively, make sure you have "safetensors" installed if the model you are using offers the weights in this format.
To activate the local execution, I did the following:
os.environ['LAMBDAPROMPT_BACKEND'] = 'StarCoder'
os.environ['SKETCH_USE_REMOTE_LAMBDAPROMPT'] = 'False'
os.environ['HF_ACCESS_TOKEN'] = 'myToken'
Unfortunately, I cannot find a way to set the offload_folder.
How can I do that? Note: "safetensors" is already installed
Just pip install sketch and import sketch, and it fails with:
File "/home/plagchk/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 5902, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sketch'
Is it possible to run the code that sketch provides directly, in an automated way?
That is, the user asks a question, the code is generated and executed behind the scenes, and the user only sees the response to their question.
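One way this could be wired up, sketched under the assumption that the generated code is already available as a string: execute it in a scratch namespace and capture whatever it prints. run_generated_code is a hypothetical helper, not part of sketch. Executing model-generated code is risky, so this should only be done with trusted input or inside a sandbox.

```python
import contextlib
import io


def run_generated_code(code, namespace=None):
    """Execute `code` and return (captured stdout, resulting namespace)."""
    env = dict(namespace or {})
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, env)  # Runs arbitrary code: only use with trusted input.
    return buffer.getvalue(), env


# Stand-in for code returned by the assistant.
generated = "print(len(columns))"
output, _ = run_generated_code(generated, {"columns": ["A", "B"]})
print(output.strip())  # 2
```

In practice the namespace would carry the dataframe (for example {"df": df}), and the captured output would be shown to the user in place of the code.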
import pandas as pd
import sketch
import os
from dotenv import load_dotenv
os.environ['OPENAI_API_KEY'] = 'key'
os.environ['SKETCH_USE_REMOTE_LAMBDAPROMPT'] = 'False'
and it doesn't seem to work