Comments (6)
This question is better asked on the libRETS-users mailing list. Lots of people experienced with all sorts of wackiness there.
This is more of a general RETS question than a libRETS issue.
Keith T. Garner
[email protected] - 312-329-3294
National Association of REALTORS®
VP - Information Technology Services
On Nov 25, 2014, at 4:27 PM, Mike Sparr <[email protected]> wrote:
What is the best way to grab the full set and also prevent nightmares if an MLS decides to update the modified timestamp for the entire database in a single day? (Python snippet example)
Using python's binding for librets, and attempting to iterate through metadata resource/class and pull all records for that class for the initial pull, I come across a market for example with 43K records with (L_UpdateDate=1957-01-01T00:00:00+) and it will only retrieve 2500 at a time.
request.SetLimit(librets.SearchRequest.LIMIT_DEFAULT)   # changing these does nothing on some servers
request.SetOffset(librets.SearchRequest.OFFSET_NONE)    # changing these does nothing on some servers
request.SetCountType(librets.SearchRequest.RECORD_COUNT_AND_RESULTS)
request.SetFormatType(librets.SearchRequest.COMPACT)

# perform query
t_start = time.strftime('%Y-%m-%d %H:%M:%S')
results = session.Search(request)
record_count = results.GetCount()
print "Record count: " + str(record_count)
print

columns = results.GetColumns()
while results.HasNext():
    rec = {'request-id': request_id, 'data': {}}
    for column in columns:
        rec['data'][column] = results.GetString(column).decode('utf-8')  # had to fix encoding issues
As @ktgeek (https://github.com/ktgeek) mentioned in other posts, all the MAY rules mean RETS server vendors don't have to guarantee cursor position or properly support limit/offset. In some markets this causes missing listings during larger pulls, or forces us to "chunk" results into many queries by last-modified or price intervals, etc. Unfortunately, we've come across some large markets that, during conversions or major changes, update the timestamp on a million-plus records in a single day, which wreaks havoc on the last-modified approach.
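One way to make the last-modified approach more resilient (a sketch of the interval-chunking idea mentioned above, not part of librets itself) is to pre-compute bounded date windows and issue one query per window. The field name L_UpdateDate comes from the snippet above; the window size and the exact DMQL2 range syntax are assumptions to verify against your server:

```python
from datetime import datetime, timedelta

def date_window_queries(field, start, end, days=7):
    """Build DMQL2-style date-range queries covering [start, end) in
    fixed-size windows, so a full pull can proceed in bounded chunks
    even when the server caps result sizes."""
    queries = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        queries.append("(%s=%s-%s)" % (
            field,
            cur.strftime("%Y-%m-%dT%H:%M:%S"),
            nxt.strftime("%Y-%m-%dT%H:%M:%S")))
        cur = nxt
    return queries
```

If a single window still exceeds the server's cap, it can be split recursively until each chunk fits.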
Thanks Keith for getting back so quickly.
Using libRETS, would you first query with request.SetCountType(librets.SearchRequest.RECORD_COUNT_ONLY) to get the count and check the response for MaxRows? Then, using that information, recursively query each "chunk" with request.SetCountType(librets.SearchRequest.NO_RECORD_COUNT), setting request.SetLimit(librets.SearchRequest.LIMIT_NONE) and request.SetOffset(chunk_size * i)?
I'm curious whether the best practice is to first query the total count and use it to determine a chunking strategy, and, if limit/offset aren't working and the modified timestamp isn't failproof, what "chunk" strategy is guaranteed. The latter might be a users-group question, but on the former I'd appreciate suggestions on the most performant way to leverage the lib.
Also, assuming limit/offset works and the query itself doesn't have to be used for chunking, can the request object be reused across many session.Search(request) calls while iterating through "chunks", or is there a preferred method for recursively executing the queries with the lib? If limit/offset do not work, should you create a new request object each time with a new query, or is there a way to just update the query on the existing request?
Mainly, I'm after the best practice for using the lib to recursively query a result set, regardless of how we solve the "chunking" issue.
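A minimal sketch of the count-then-chunk loop being asked about, assuming the librets Python binding names used in the snippets above and a server that actually honors Offset (many don't, per this thread); only the offset arithmetic is testable without a live server:

```python
def plan_chunks(total, chunk_size):
    """Compute (offset, limit) pairs covering `total` records in order,
    so each chunk can be fetched with one Search call."""
    return [(off, min(chunk_size, total - off))
            for off in range(0, total, chunk_size)]

# Sketch of the loop itself (hypothetical usage, untested against a server):
#
#   request.SetCountType(librets.SearchRequest.RECORD_COUNT_ONLY)
#   total = session.Search(request).GetCount()
#   request.SetCountType(librets.SearchRequest.NO_RECORD_COUNT)
#   for offset, limit in plan_chunks(total, 2500):
#       request.SetLimit(limit)
#       request.SetOffset(offset)          # some servers treat Offset as 1-based
#       results = session.Search(request)  # reusing the same request object
#       while results.HasNext():
#           ...                            # process rows
```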
On Nov 25, 2014, at 2:46 PM, Mike Sparr [email protected] wrote:
Thanks Keith for getting back so quickly.
Using libRETS would you first query request.SetCountType(librets.SearchRequest.RECORD_COUNT_ONLY) to get count and check response for MaxRows?
Looks like today is a RETS Spec kinda day, and let’s hope your server provider is “doing the right thing” with regard to HasKeyIndex and InKeyIndex as well as the suspension of limits when a search only includes those.
First thing I’d do is do my own “chunkification” of the data and not rely upon the server because of the excess of “MAY”s in the spec. That said …
I’d do a query that only returns the keys and store them somewhere. Then I’d iterate through that list requesting the entries. That way you can update your list to indicate what succeeded and what didn’t. That way you have a way to restart/continue if/when things go south.
Here’s some c++ code I’ve used in the past as an example. You can get fancier and construct your secondary query such that it is (keyfield=value)|(keyfield=value)|(keyfield=value) …. so as to do bulk queries. Or, just loop through as I do below:
SearchResultSetAPtr results = session->Search(searchRequest.get());

if (printCount)
{
    cout << "Matching record count: " << results->GetCount() << endl;
}

/*
 * For all listings found, fetch the full listing detail and then the
 * associated Photos.
 */
while (results->HasNext())
{
    totalListings++;
    listingIds.push_back(results->GetString(keyField));

    /*
     * Create a new search to fetch all the detail for the listing.
     */
    boost::shared_ptr<boost::thread> thd;
    thd = boost::shared_ptr<boost::thread>(new boost::thread(
        boost::bind(
            find_listing,
            resource,
            searchClass,
            str_stream() << "("
                         << keyField
                         << "="
                         << results->GetString(keyField)
                         << ")")));
    threads.push_back(thd);

    thd = boost::shared_ptr<boost::thread>(new boost::thread(
        boost::bind(
            fetch_media,
            resource,
            "Photo",
            results->GetString(keyField))));
    threads.push_back(thd);
}

for (vector<boost::shared_ptr<boost::thread> >::iterator i = threads.begin(); i != threads.end(); i++)
    (*i)->join();

cout << "Total Listings Retrieved: " << totalListings << endl;
cout << "Listing IDs:" << endl;
for (vector<string>::iterator i = listingIds.begin(); i != listingIds.end(); i++)
    cout << *i << endl;
I'm making sure the MAY madness is questioned in ongoing RESO API, Data Dictionary, Transport discussions for exactly that reason. Thanks for the tip on technique. Some servers still appear to be 1.5 so won't support the key field check but I'm confident we can get by that.
Using this approach will it unnecessarily "tax" the servers with countless queries versus larger batches of results or is this the only way to guarantee accuracy?
FBS servers do allow a “Query=*” for the approximation of a table scan. That HAS to be less taxing than repeated narrow queries. (I think it even came up back in RETS meetings before there was a thing called RESO.)
Unfortunately, the RETS vendors don't give you a lot of other choices beyond the keyfield thing. For servers that have it, it is pretty much an exception to the limits. I think the only reason those queries are slightly "better" than other queries is that, on the backend, the keyfield is generally expected to be an indexed unique field.
For the other servers, I’ve seen people do queries like “(City=A*)” and then rotate through the alphabet. Obviously, the field to choose varies by area.
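The alphabet-rotation idea can be sketched as a pure query generator; the field name City is just the example from above, and listings whose value starts with a digit or is blank would need a separate catch-all query:

```python
import string

def alphabet_queries(field):
    """One wildcard DMQL2 query per starting letter, e.g. (City=A*) ...
    (City=Z*). Rotating through these approximates a full pull on
    servers that cap result sizes but lack reliable offset support."""
    return ["(%s=%s*)" % (field, letter) for letter in string.ascii_uppercase]
```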
(Also, thanks for fighting the good fight! I got tired of that fight 7 years ago.)
On Nov 25, 2014, at 3:24 PM, Mike Sparr [email protected] wrote:
Using this approach will it unnecessarily "tax" the servers with countless queries versus larger batches of results or is this the only way to guarantee accuracy?
It will be poorer performing. If you don’t mind a little code, my preference would be to do a compound query. You’d need to figure out the best mix of memory consumed vs. hard limits (if the server uses one) vs. restart-ability if something goes wrong.
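The compound query suggested here can be sketched as a batching helper that ORs key values together in the (keyfield=value)|(keyfield=value) form shown earlier in the thread; the batch size is an assumption to tune against server row caps and URL-length limits:

```python
def compound_queries(field, values, batch_size=50):
    """Batch key values into OR'd DMQL2 queries of the form
    (field=v1)|(field=v2)|... so each Search call fetches a batch of
    listings instead of one. Smaller batches ease restarts after a
    failure; larger batches mean fewer round trips."""
    batches = []
    for i in range(0, len(values), batch_size):
        chunk = values[i:i + batch_size]
        batches.append("|".join("(%s=%s)" % (field, v) for v in chunk))
    return batches
```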