Comments (3)
This is the version that works with ThreadPoolExecutor, but seems to block. Here is the version that uses thread pooling:
```python
import timeit
import warnings

import pandas as pd
import psycopg

# method of the DataReader class
def read_sql(self, sql, params=None):
    t1 = timeit.default_timer()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        with psycopg.connect(self.connect_str, autocommit=True) as conn:
            df = pd.read_sql(sql, con=conn, params=params)
    self.log.debug(f'perf: {timeit.default_timer() - t1}')
    return df
```
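As an aside, the `catch_warnings` block above silences the UserWarning pandas emits when `read_sql` is handed a raw DBAPI connection instead of a SQLAlchemy connectable. A minimal stand-alone sketch of that suppression pattern (the `noisy` function here is just a made-up stand-in for the warning source):

```python
import warnings

def noisy():
    # stand-in for pd.read_sql warning about a non-SQLAlchemy connection
    warnings.warn("pandas only supports SQLAlchemy connectables", UserWarning)
    return 42

# record=True lets us verify the warning was filtered out, not just hidden
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", UserWarning)
    result = noisy()

print(result, len(caught))  # 42 0
```

This only hides the warning; pandas still runs the query through the raw connection the same way.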
The concurrent.futures code is this:
```python
import concurrent.futures as cf

import numpy as np

def test_thread_pool():
    db_reader = DataReader()
    sql = "select id, name from factor f where f.id = ANY(%s)"
    threads = 20
    id_partitions = np.array_split(list(range(1, 10000)), threads)
    id_partitions = [[p.tolist()] for p in id_partitions]
    # renamed from `exec`, which shadows the builtin
    with cf.ThreadPoolExecutor(max_workers=threads) as executor:
        futures = {
            executor.submit(db_reader.read_sql, sql, params=p): p
            for p in id_partitions
        }
        for future in cf.as_completed(futures):
            ids = futures[future]
            try:
                df = future.result()
            except Exception:
                log.exception(f'error retrieving data for: {ids}')
            else:
                if df is not None:
                    print(f'shape: {df.shape}')
```
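For clarity, here is what the partitioning step above produces: `np.array_split` divides the 9999 ids into 20 nearly equal chunks, and each chunk is wrapped in an extra list so psycopg binds it as a single array argument for `ANY(%s)`:

```python
import numpy as np

threads = 20
id_partitions = np.array_split(list(range(1, 10000)), threads)
# wrap each chunk so it binds as one array parameter for ANY(%s)
id_partitions = [[p.tolist()] for p in id_partitions]

print(len(id_partitions))        # 20
print(id_partitions[0][0][:3])   # [1, 2, 3]
print(len(id_partitions[0][0]))  # 500 (9999 ids don't split evenly)
```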
The output of the debug line from read_sql looks like this:
perf: 0.7313497869981802
perf: 0.8116309550023288
perf: 3.401154975006648
perf: 5.22201336100261
perf: 6.325166654998611
perf: 6.338692951001576
perf: 6.573095380997984
perf: 6.5976604809984565
perf: 6.8282670119951945
perf: 7.291718505999597
perf: 7.4276196580030955
perf: 7.407097272000101
perf: 8.38801568299823
perf: 9.119963648998237
You'll notice that the times keep increasing. I'd have expected them all to be roughly the same, so it seems there is some blocking around the SQL calls. Also, the time gap between the first two threads and the 3rd is always about 2-3 seconds; why is that?
I've also tried creating a new DataReader instance for each thread, but it has the same effect.
Does anyone know if the connection or pandas read_sql blocks, or how to solve this?
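One hunch you can test without a database: building the DataFrame is CPU-bound work that holds the GIL, so the CPU-heavy part of each read_sql call serializes across threads, while pure network waiting does not. This toy sketch (no pandas or psycopg involved; `cpu_task` and `io_task` are made-up stand-ins) shows the two regimes:

```python
import concurrent.futures as cf
import time

def cpu_task(n):
    # pure-Python loop: holds the GIL, so concurrent threads serialize
    t0 = time.perf_counter()
    total = sum(i * i for i in range(n))
    return time.perf_counter() - t0

def io_task(seconds):
    # sleep releases the GIL, like waiting on a socket
    t0 = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - t0

with cf.ThreadPoolExecutor(max_workers=4) as ex:
    cpu_times = sorted(f.result() for f in cf.as_completed(
        [ex.submit(cpu_task, 1_000_000) for _ in range(4)]))

with cf.ThreadPoolExecutor(max_workers=4) as ex:
    io_times = sorted(f.result() for f in cf.as_completed(
        [ex.submit(io_task, 0.2) for _ in range(4)]))

print(cpu_times)  # per-task times inflate and spread out, like the perf log
print(io_times)   # all roughly 0.2s: real concurrency while the GIL is free
```

If read_sql's times look like `cpu_times`, the bottleneck is GIL contention during result parsing / DataFrame construction, and a ProcessPoolExecutor (or fewer, larger queries) would help more than adding threads.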
from pandas.
I have not had enough time to look at this, but from the issue title, shouldn't you await this to remove the warning:
tasks = {db_reader.read_sql_async(sql, params=p) for p in id_partitions}
Because db_reader.read_sql_async is an async method.
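A minimal stand-alone sketch of the pattern being suggested (`read_sql_async` below is a dummy coroutine, not the real method) showing where the await has to happen when collecting results with as_completed:

```python
import asyncio

async def read_sql_async(p):
    # dummy stand-in for the real async DB call
    await asyncio.sleep(0.01)
    return f"result for {p}"

async def main():
    tasks = [asyncio.create_task(read_sql_async(p)) for p in range(3)]
    results = []
    for coro in asyncio.as_completed(tasks):
        # each coroutine must be awaited somewhere, or Python warns
        # "coroutine ... was never awaited"
        results.append(await coro)
    return results

results = asyncio.run(main())
print(sorted(results))  # ['result for 0', 'result for 1', 'result for 2']
```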
from pandas.
I am. You can see the await in the for/as_completed loop.
I think the problem is probably that an async connection or cursor doesn't work with pandas. If so, I'd like to understand whether pandas blocks; as per the 2nd example, why am I not able to achieve better concurrency?
from pandas.