Comments (4)
Hey @mark-druffel -- thanks for the report! This should be fixed on main if you want to try out a pre-release. We're no longer sanitizing the inputs to format, so if the underlying Spark version supports writing to Delta, it should "just work", since create_table is calling saveAsTable under the hood.
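For illustration, a minimal sketch of the passthrough described above. Note that RecordingWriter and this create_table are stand-ins I've made up to show the idea without a Spark session; this is not ibis's actual implementation:

```python
def create_table(name, df_writer, *, format=None, mode="error", **kwargs):
    # Forward `format` untouched to the writer: there is no allow-list of
    # formats on our side, so if the engine understands "delta", it just works.
    return df_writer.saveAsTable(name, format=format, mode=mode, **kwargs)


class RecordingWriter:
    """Stand-in for a pyspark DataFrameWriter that records the call it receives."""

    def saveAsTable(self, name, **kwargs):
        return (name, kwargs)


table_name, passed_kwargs = create_table(
    "raw_camp_info", RecordingWriter(), format="delta", mode="overwrite"
)
```

Because nothing filters the format string, any value the underlying writer accepts flows straight through.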
Hey @mark-druffel -- can you open a separate issue for that? That is definitely an oversight and we can add ('catalog', 'database') support for create_table, but it's helpful to have a separate issue to track it.
@gforsyth Thanks, the pre-release works great! I do have one other question though 😬 Let me know if this question deserves its own issue.
TLDR
I'm wondering if it's intended that the database argument in create_table works differently than the one in drop_table? create_table only accepts a str, while drop_table accepts a tuple.
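To illustrate what unified handling of the two forms could look like, here is a hypothetical normalize_database helper (not ibis's API; the name and behavior are my own invention) that accepts either a dotted string or a (catalog, database) tuple and returns a normalized pair:

```python
def normalize_database(database):
    """Hypothetical helper: accept "catalog.db", ("catalog", "db"), or a bare
    "db", and return a (catalog, database) pair (catalog may be None)."""
    if database is None:
        return (None, None)
    if isinstance(database, str):
        parts = database.split(".")
        if len(parts) == 1:
            return (None, parts[0])
        if len(parts) == 2:
            return (parts[0], parts[1])
        # Mirrors the "too many name parts" failure mode seen below.
        raise ValueError(f"too many name parts: {database}")
    catalog, db = database  # expects a two-element tuple
    return (catalog, db)


assert normalize_database(("comms_media_dev", "dart_extensions")) == (
    "comms_media_dev",
    "dart_extensions",
)
assert normalize_database("comms_media_dev.dart_extensions") == (
    "comms_media_dev",
    "dart_extensions",
)
assert normalize_database("dart_extensions") == (None, "dart_extensions")
```

With something like this, both create_table and drop_table could accept either spelling.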
If I set the catalog and database via pyspark, create_table works as expected, but I can't figure out a way to do it through create_table itself; I had to go through the pyspark session directly:
from pyspark.sql import SparkSession
import ibis
spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session = spark)
ispark._session.catalog.setCurrentCatalog("comms_media_dev")
ispark._session.catalog.setCurrentDatabase("dart_extensions")
ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta")
I can drop a table without accessing the pyspark session:
ispark.drop_table(name = "raw_camp_info", database=tuple(["comms_media_dev", "dart_extensions"]))
Additional Details
To drop my table I can just specify the catalog and database in my call:
from pyspark.sql import SparkSession
import ibis
spark = SparkSession.builder.getOrCreate()
ispark = ibis.pyspark.connect(session = spark)
ispark.drop_table(name = "raw_camp_info", database=tuple(["comms_media_dev", "dart_extensions"]))
Trying the same approach with create_table fails:
ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta", database=tuple(["comms_media_dev", "dart_extensions"]))
py4j.Py4JException: Method setCurrentDatabase([class java.util.ArrayList]) does not exist
I also tried with dot separator:
ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta", database="comms_media_dev.dart_extensions")
AnalysisException: too many name parts for schema name: comms_media_dev.dart_extensions
I then tried setting the catalog and providing only the database name, and got a permissions error. Looking through the traceback, it looks like create_table didn't pass the database argument, because the database was still set to default (i.e. comms_media_dev.default):
ispark._session.catalog.setCurrentCatalog("comms_media_dev")
ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta", database="dart_extensions")
com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have USE SCHEMA on Schema 'comms_media_dev.default'.
Py4JJavaError Traceback (most recent call last)
File :10
8 df = ispark.read_parquet(source = "abfss://my_parquet")
9 df = df.select(_.CAMP_START_DATE, _.CAMP_END_DATE, _.KPM_PROJECT_ID)
---> 10 ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta", database="dart_extensions")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-c1f2e765-cb45-4a96-a1dc-bd25ecd4f460/lib/python3.9/site-packages/ibis/backends/pyspark/init.py:453, in Backend.create_table(self, name, obj, schema, database, temp, overwrite, format)
451 self._run_pre_execute_hooks(table)
452 df = self._session.sql(query)
--> 453 df.write.saveAsTable(name, format=format, mode=mode)
454 elif schema is not None:
455 schema = PySparkSchema.from_ibis(schema)

File /usr/lib/python3.9/contextlib.py:124, in _GeneratorContextManager.exit(self, type, value, traceback)
122 if type is None:
123 try:
--> 124 next(self.gen)
125 except StopIteration:
126 return False

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-c1f2e765-cb45-4a96-a1dc-bd25ecd4f460/lib/python3.9/site-packages/ibis/backends/pyspark/init.py:234, in Backend._active_database(self, name)
232 yield
233 finally:
--> 234 self._session.catalog.setCurrentDatabase(current)

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function..wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
51 )
52 return res

File /databricks/spark/python/pyspark/sql/catalog.py:185, in Catalog.setCurrentDatabase(self, dbName)
172 def setCurrentDatabase(self, dbName: str) -> None:
173 """
174 Sets the current default database in this session.
175
(...)
183 >>> spark.catalog.setCurrentDatabase("default")
184 """
--> 185 return self._jcatalog.setCurrentDatabase(dbName)

File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321, in JavaMember.call(self, *args)
1315 command = proto.CALL_COMMAND_NAME +
1316 self.command_header +
1317 args_command +
1318 proto.END_COMMAND_PART
1320 answer = self.gateway_client.send_command(command)
-> 1321 return_value = get_return_value(
1322 answer, self.gateway_client, self.target_id, self.name)
1324 for temp_arg in temp_args:
1325 temp_arg._detach()

File /databricks/spark/python/pyspark/errors/exceptions.py:228, in capture_sql_exception..deco(*a, **kw)
226 def deco(*a: Any, **kw: Any) -> Any:
227 try:
--> 228 return f(*a, **kw)
229 except Py4JJavaError as e:
230 converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o569.setCurrentDatabase.
: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: User does not have USE SCHEMA on Schema 'comms_media_dev.default'.
at com.databricks.managedcatalog.UCReliableHttpClient.reliablyAndTranslateExceptions(UCReliableHttpClient.scala:84)
at com.databricks.managedcatalog.UCReliableHttpClient.get(UCReliableHttpClient.scala:136)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getSchema$1(ManagedCatalogClientImpl.scala:378)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:3401)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:3400)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:106)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:3397)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getSchema(ManagedCatalogClientImpl.scala:374)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.$anonfun$getSchemaMetadata$5(ManagedCatalogCommon.scala:227)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getSchemaMetadata(ManagedCatalogCommon.scala:227)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.schemaExists(ManagedCatalogCommon.scala:233)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.databaseExists(ManagedCatalogSessionCatalog.scala:631)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.requireScExists(ManagedCatalogSessionCatalog.scala:274)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentDatabase(ManagedCatalogSessionCatalog.scala:492)
at com.databricks.sql.DatabricksCatalogManager.setCurrentNamespace(DatabricksCatalogManager.scala:159)
at org.apache.spark.sql.internal.CatalogImpl.setCurrentDatabase(CatalogImpl.scala:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
If I set the catalog and database via pyspark, create_table works as expected:
ispark._session.catalog.setCurrentCatalog("comms_media_dev")
ispark._session.catalog.setCurrentDatabase("dart_extensions")
ispark.create_table(name = "raw_camp_info", obj = df, overwrite = True, format="delta")
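For what it's worth, the _active_database frame in the traceback suggests ibis temporarily switches the session's current database and restores it on exit, which is why the restore step itself can hit the permissions error on comms_media_dev.default. A minimal sketch of that yield/finally pattern, using a made-up DummyCatalog so it runs without Spark (not ibis's actual code):

```python
from contextlib import contextmanager


class DummyCatalog:
    """Stand-in for spark.catalog so the pattern can run without Spark."""

    def __init__(self, current="default"):
        self._current = current

    def currentDatabase(self):
        return self._current

    def setCurrentDatabase(self, name):
        self._current = name


@contextmanager
def active_database(catalog, name):
    # Temporarily switch the current database, restoring it on exit -- the
    # finally clause is the setCurrentDatabase(current) call in the traceback.
    current = catalog.currentDatabase()
    try:
        catalog.setCurrentDatabase(name)
        yield
    finally:
        catalog.setCurrentDatabase(current)


catalog = DummyCatalog(current="default")
with active_database(catalog, "dart_extensions"):
    assert catalog.currentDatabase() == "dart_extensions"
assert catalog.currentDatabase() == "default"
```

If the session's current database is one the user lacks USE SCHEMA on, restoring it fails even when the write itself would have been permitted.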
Thanks for all your help @gforsyth!