afair / postgresql_cursor Goto Github PK
View Code? Open in Web Editor NEWActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set
License: MIT License
ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set
License: MIT License
When running a query using an association, ActiveRecord has changed to use prepared statements with bind parameters, something like:
"select * from comments where post_id = $1", [["post_id", 123]]
so when to_sql
is called on the relation, the $1 is not yet interpolated.
Solution: use connection.unprepared_statement { to_sql }
syntax to skip preparation. Cursor declaration includes preparation anyway.
I'd like to assign my own name of cursor during declare. Here it is defined rigidly:
Please, use rspec, appraisal, combustion for tests.
Impossible contribute to this gem.
Your Gem is unique and exists in the singular. It must develop and improve with the community more effectively.
https://github.com/rspec/rspec
https://github.com/thoughtbot/appraisal
https://github.com/pat/combustion
Any error within the cursor block seems to spit out only this exception...
NameError: undefined local variable or method `e' for #<PostgreSQLCursor:0x007f83f76fe190>
It seems as if the original exception (in this case it was a missing column) does not get bubbled up.
Won't it be more natural for the each_instance
method (as well as other iterators) w/o a block to return a enumerator instead of the "raw" cursor?
Now I can do this explicitly:
MyModel.all.each_instance.to_enum.with_object([]) { |record, obj| obj << record }
I think this to_enum
can be applied to cursor under the hood, namely here:
https://github.com/afair/postgresql_cursor/blob/master/lib/postgresql_cursor/active_record/sql_cursor.rb#L34
This would make API a bit more natural IMHO:
MyModel.all.each_instance.with_object([]) { |raw, obj| ... }
First of all, great gem, love it ๐
I've only run into a problem with enum attributes in ActiveRecord. If I define a model with an enum attribute, .each() outputs the symbolic enum value, but .each_instance() outputs the underlying numeric value. e.g.
class User < ApplicationRecord
enum gender: [:male, :female]
end
User.all.each {|u| puts u.gender} # Outputs "male", "male", "female"
User.all.each_instance {|u| puts u.gender} # Outputs 0, 0, 1
I'am trying to use this gem in the following way:
sum = 0
Product.for_sale.each_instance {|p| p.sum = sum + p.balance ; sum += p.balance }
And return that as a collection. From the documentation, it says that each_instance returns the cursor as an enumerable. Is tthere a way to get a collection of Products?
Thanks!
I use #each_instance_batch
method to iterate over batches, and when it tries to find column types for postgresql enum
columns it warns
/gems/2.7.0/gems/activerecord-6.0.3.4/lib/active_record/connection_adapters/abstract_adapter.rb:698: warning: deprecated Object#=~ is called on Integer; it always returns nil
/gems/2.7.0/gems/activerecord-6.0.3.4/lib/active_record/connection_adapters/abstract_adapter.rb:694: warning: deprecated Object#=~ is called on Integer; it always returns nil
ruby -v => ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]
rails -v => Rails 6.0.3.4
With Rails 5 around the corner, no recent commits to this project and a few forks with a few small optimiztions that have not been pulled back in, I'd like to ask if a different way of lazily plucking attributes has arisen recently or if postgresql_cursor is still suggested to use?
Thanks!
I'm not sure how difficult it would be to do, but I realized Model.includes(:something).each_instance
will return back records that do not have the associations preloaded.
To reproduce run the following:
ActiveRecord::Base.each_row_by_sql('select * from bad_table') { }
This leads to the following exception:
ActiveRecord::StatementInvalid: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block
: close cursor_1
It would be much easier to troubleshoot issues if the original exception was thrown:
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "bad_table" does not exist
LINE 1: declare cursor_1 cursor for select * from bad_table
^
: declare cursor_1 cursor for select * from bad_table
Our app opens a cursor which can run for hours at a time.
On the longer runs (7+ hours) the cursor occasionally errors out with:
ActiveRecord::StatementInvalid
PG::ConnectionBad: PQsocket() can't get socket descriptor: ROLLBACK
I am trying to rule this gem out as the culprit. I'm not sure if this is masking a different error.
Should we be passing with_hold: true
in the options?
Here is a snippet of the cursor loop:
# BATCH_SIZE = 10_000
records.each_instance(block_size: BATCH_SIZE) { |record| @handler.call(record) }
Couple things that may or may not be pertinent:
#each_instance
Hello there,
Great work!
I want to use the version for Rails 4 in another gem. Its not possible, since gemspec doesnt' support git branches as dependencies, but only dependencies from Rubygems.
Any chance version 0.5 will be published to Rubygems soon?
Looking at https://github.com/afair/postgresql_cursor/blob/master/lib/postgresql_cursor/cursor.rb#L298, the comment explains that the gem prefers to override PostgreSQL's default cursor_tuple_fraction = 0.1
to 1.0
.
However because the arg frac=1.0
and there's an default on the options lookup @cursor_tuple_fraction ||= @options.fetch(:fraction) { 1.0 }
, I believe the early return return @cursor_tuple_fraction if frac == @cursor_tuple_fraction
will always fire unless you set a custom value not equal to 1.0
.
This means both that the 1.0
default never gets applied, and also that it's impossible to apply a configuration value of 1.0
.
I think the early return probably is an unnecessary optimization in terms of speed, and having any early return will possibly cause bugs since the default config on a given Postgres install might not be 0.1
anyway, so it's not possible to fully fix by changing to only return early if the desired value is 0.1
.
Finally, the gem doesn't reset the value after the cursor is done, so it's potentially poisoning the connection settings for any other use not going through the gem.
PostgreSQLCursor::Cursor#each_row
symbolizes keys in the yield hashes when the :symbolize_keys
option is truthy. It would be nice if PostgreSQLCursor::Cursor#fetch
also supported this option.
According to my tests and code - https://github.com/afair/postgresql_cursor/blob/master/lib/postgresql_cursor.rb#L161 - each_row is not using options
provided in argument. But what's funny the each_instance is using them properly.
I will provide fix in PR soon.
It's related to this problem from Rails: rails/rails#15107
Because of the row.symbolize_keys
line ( #61 ) instantiate
from line #173 is not properly assigning attributes.
I will write a tests and temporary fix for it. At least until it's not fixed in Rails 4.
Currently, I have my main code working where I run each_row
on my ActiveRecord Relation. But in the test, I have an array, and the original code used to be .each
so the test used to pass. Now I am running into undefined method 'each_row' for #<Array:0x00007f8a584358e8>
. Any suggestion on what the structure in the test should be that it supports each_row
? Any advice will be great! Thank you.
The README says that this gem is not intended to support FOR UPDATE to process and update each row in place. But why not? I scanned the functionality of this gem and the PostgreSQL documentation for a bit but it's not obvious why FOR UPDATE is not supported.
Does postgresql_cursor work with current Rails 5 pre-releases or is it better to expect that this gem will not be compatible with Rails 5 and will need major reworking? As the last commit was 14 months ago, have people moved on to alternative solutions, and if yes, which?
The DECLARE
command allows the NO SCROLL
argument, which hints to PostgreSQL that you will only be scrolling forward, not backwards. If you do not specify this argument then PostgreSQL tries to figure out for you whether it should allow scrolling or not, although the documentation does not explain what that means.
It seems this gem is only designed for scrolling forward anyway, so would it be a good idea to always call DECLARE with NO SCROLL as a micro-optimization? According to the documentation, NO SCROLL was introduced in PostgreSQL 8.3, which is very old. For reference: Ubuntu 12.04 and Debian 7 both supply PostgreSQL 9.1.
Documentation on cursors: https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html
each_instance
incorrectly sets the column types, for example
class User < ActiveRecord::Base
enum state: {
active: 1,
}
end
pp User.first
# #<User:0x00007f884e1bee40
# id: 1,
# created_at: Wed, 17 Apr 2019 14:33:34 AEST +10:00
# state: "active">
pp User.each_instance.first
# #<User:0x00007f884e1bee40
# id: 1,
# created_at: 2019-04-16 22:24:59 UTC
# state: 1>
Expected.
in the example above objects from both queries should be identical
rails -v => Rails 6.0.3.4
ruby -v => ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19]
We have discovered that this gem doesn't handle properly the ActiveRecord none
method.
subscriptions = Subscription.none
subscriptions.each_row { |row| p row }
The above code raises:
ActiveRecord::StatementInvalid:
PG::SyntaxError: ERROR: syntax error at end of input
LINE 1: ...ursor_a7205fd123ab9169d9a3105 no scroll cursor for
^
The expected behavior is that no error is raised, because the above code is perfectly valid. For example the following code that uses ActiveRecord each
does not raise any error:
subscriptions = Subscription.none
subscriptions.each { |row| p row }
I think that the problem is that this gem uses the to_sql
method, which returns an empty string when none
is used (somewhere in the filter chain).
Probably the best solution is to check if to_sql
returns an empty string and in that case return an empty array of results.
For example when using .each_row dates and integers are returned as strings in a hash. Also dynamic methods do not work, i.e. user.name does not work, but user[:name] does.
For anyone seeing this error, I encountered it when adding postgresql_cursor
to my Gemfile
.
I had to use gem "postgresql_cursor", require: false
and then require "postgresql_cursor"
manually where I needed it.
This would be good to document, I think.
Rails 4.2.11, Ruby 2.5.3p105, Bundler 1.17.2
Hi,
Thanks for the recent release. I've just tried to install 0.6.5 but it's not compatible with Rails 6 - was this intentional? We're likely to be a few months away from upgrading to Rails 7 but would really like to keep using this gem. Was the restriction intentional?
The following minimal Gemfile shows the problem when using bundle install
:
source "https://rubygems.org"
gem "rails", "~> 6.1.7"
gem "postgresql_cursor", "~> 0.6.5"
Bundler could not find compatible versions for gem "activerecord":
In Gemfile:
postgresql_cursor (~> 0.6.5) was resolved to 0.6.5, which depends on
activerecord (>= 7.0.0)
rails (~> 6.1.7) was resolved to 6.1.7, which depends on
activerecord (= 6.1.7)
Thanks,
Owen.
Hello!
I try render a collection from cursor in Ruby on Rails (it may be useful for rendering big collections).
But got an error
undefined method 'size' for #<PostgreSQLCursor::Cursor:0x00007fcc4edd4dd0
$ psql
postgres=# create user books with password 'books';
CREATE ROLE
postgres=# create database books_development owner books;
CREATE DATABASE
rails new books --database postgresql
cd books
Fix development block in config/database.yml
file for connect to database.
bundle add postgresql_cursor
rails g scaffold book name:string
rails db:migrate
rails s
Open in a browser: http://localhost:3000/books
It should show book list (now empty).
Append one or several books to list.
# app/views/books/index.html.erb
<p style="color: green"><%= notice %></p>
<h1>Books</h1>
<div id="books">
<%= render partial: "books/book", collection: @books %>
</div>
<%= link_to "New book", new_book_path %>
Link http://localhost:3000/books should show book list.
index.html.erb
for use postgresql_cursor enumAppend .each_instance
to books collection:
<%= render partial: "books/book", collection: @books.each_instance %>
Refresh book list in browser. It show an error:
NoMethodError in Books#index
Showing books/app/views/books/index.html.erb where line #6 raised:
undefined method `size' for #<PostgreSQLCursor::Cursor:0x00007fcc4edd4dd0 @sql="SELECT \"books\".* FROM \"books\"", ...
Using @@cursor_seq +=
in this line is not thread safe!
I can pull together a more descriptive example if it's useful, but I have a JSON type in the db where I store arrays of stringified integers. I use serialize
to serialize arrays of integers to arrays of strings upon save, and to deserialize back into arrays of integers upon load from the db. each_instance
doesn't appear to do this; a standard ActiveRecord load does.
http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.