lugensa / scorched

Sunburnt offspring solr client

License: MIT License
`search.is_iter` returns False for sets (and other iterables). A more appropriate implementation would be:

```python
try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:  # Python 2
    from collections import Iterable

try:
    basestring
except NameError:  # Python 3
    basestring = str


def is_iter(val):
    return not isinstance(val, basestring) and isinstance(val, Iterable)
```
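For reference, a quick self-contained check of the proposed behaviour (Python 3 only, so the `basestring` shim is dropped):

```python
from collections.abc import Iterable


def is_iter(val):
    # iterable containers count as iterable values; strings are excluded
    return not isinstance(val, str) and isinstance(val, Iterable)


checks = [is_iter({1, 2}), is_iter([1]), is_iter((1,)), is_iter('abc'), is_iter(3)]
# → [True, True, True, False, False]
```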
Just found this bug while browsing the code: the results_as function ignores its keyword argument and just returns a copy of the query.

Line 550 in 0074415

Since this function is also documented nowhere, maybe it should just be removed?
I've stumbled upon this annoying bug on Python 3.4.
```python
import unittest
from scorched.search import SolrSearch


class TestOptionsMethodWipesPivots(unittest.TestCase):

    def test_there_is_a_problem_with_pivot_by_with_facet(self):
        facet = 'facet'
        pivot = 'pivot'
        search = SolrSearch(None)
        search = search.facet_by(facet, mincount=1).pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})
        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')
        self.assertIn(pivot, search.pivoter.fields)
        # Equals True
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

    def test_there_is_a_problem_with_pivot_by_even_without_facet(self):
        facet = 'facet'
        pivot = 'pivot'
        search = SolrSearch(None)
        search = search.pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})
        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')
        # Equals True
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})
```
If you do a query in the current version of scorched, you'll get back a SolrResponse object that has a bunch of properties, but the important ones are:

- `response.highlighting` (a dict mapping document IDs to highlighted field values)
- `response.result` (a SolrResult object with `docs` as a property)
- `response.result.docs` (a list of the first N results you requested)

There's a fantastic feature for pagination and iteration that allows you to iterate a SolrResponse object, so you can do:

```python
for r in response.results:
    print r.my_field
```

But if you make a query that involves highlighting, this utterly fails. What you want to do is something like:

```python
for r in response.results:
    print r.my_field, r.my_highlighted_field
```

But the highlighted fields are a separate property on the SolrResponse, and aren't part of the iterated object. This makes it basically impossible to return highlighted results without pre-processing the SolrResponse to merge the `highlighting` attribute with the `docs`.
In sunburnt there was code that did exactly this, creating a `solr_highlights` property on every result document that contained the highlighting for that document:

```python
if result.highlighting:
    for d in result.result.docs:
        # if the unique key for a result doc is present in highlighting,
        # add the highlighting for that document into the result dict
        # (but don't override any existing content).
        # If the unique key field is not a string field (e.g. int) then we
        # need to convert it to its solr representation
        unique_key = self.schema.fields[self.schema.unique_key].to_solr(d[self.schema.unique_key])
        if 'solr_highlights' not in d and unique_key in result.highlighting:
            d['solr_highlights'] = result.highlighting[unique_key]
```
I think we need something like this or else highlighting is very difficult to use and requires that the calling code do some wonky merging.
I think the easiest place to fix this is in the to_json method of the SolrResponse. Maybe it could be fixed in the constructor instead, but I haven't looked into that yet.
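A standalone version of the merge could look like this (a sketch; the helper name and the `unique_key` default are made up, and Solr keys the highlighting dict by the string form of the unique key):

```python
def merge_highlighting(docs, highlighting, unique_key='id'):
    # Attach each document's snippets as a 'solr_highlights' key,
    # mirroring the sunburnt behaviour quoted above.
    for doc in docs:
        key = str(doc.get(unique_key))
        if 'solr_highlights' not in doc and key in highlighting:
            doc['solr_highlights'] = highlighting[key]
    return docs


docs = [{'id': 1, 'text': 'hello world'}, {'id': 2, 'text': 'no match'}]
highlighting = {'1': {'text': ['<em>hello</em> world']}}
merged = merge_highlighting(docs, highlighting)
```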
Hi all, I would like to use the latest scorched code.
I see there is some activity that indicates a release happened, but I cannot find the package on PyPI.
Is it possible to have a release on PyPI? Are there any known issues?
I discovered some issues with how result grouping works. The implementation of groups doesn't support a few different things:

- `ngroups` must be set to true; if it's not, the query will crash.
- `group.format` cannot be set to `simple`; if it is, the query will crash.
- `group.main` doesn't work in conjunction with `group.format=simple`; if it's set, the query will crash.
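Until these are fixed, a caller-side guard can encode the constraints above so an unsupported combination fails fast instead of crashing mid-query (a sketch; the option names are standard Solr grouping parameters, the helper itself is hypothetical):

```python
def validate_group_options(opts):
    # Reject the parameter combinations the current implementation
    # cannot handle.
    if opts.get('group.ngroups') != 'true':
        raise ValueError("group.ngroups must be set to 'true'")
    if opts.get('group.format') == 'simple':
        raise ValueError("group.format='simple' is not supported")
    if 'group.main' in opts:
        raise ValueError("group.main is not supported")
    return opts


# the only shape of options the current code handles
validate_group_options({'group': 'true', 'group.field': 'author',
                        'group.ngroups': 'true'})
```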
I'm not sure if there's an appetite for this, but I find it enormously useful to be able to send arbitrary parameters to Solr outside of what the API typically allows (i.e., low-level queries like what pysolr provides).

I have a function I hacked on top of sunburnt that allows me to do raw queries. For example, I can do this:

```python
self.si.raw_query(**{'q': '*', 'caller': 'update_index'})
```

And that'll make a Solr request like:

```
http://localhost:8983/select/?q=*&caller=update_index
```

The main way I use this is to add a `caller` parameter to every request I make, so that I can keep track of which ones are slow or later sort things out in the logs, but I also use it when sunburnt lacks a parameter I need.

Any appetite for adding this into core? I can provide a PR if so.
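The URL-building part of such a hack is small; a sketch in the spirit of the example above (the helper name is hypothetical, and unlike the example URL this version percent-encodes values):

```python
import urllib.parse


def raw_query_url(base_url, **params):
    # Build a select URL with arbitrary extra parameters passed
    # straight through to Solr, untouched by the query API.
    return base_url.rstrip('/') + '/select/?' + urllib.parse.urlencode(params)


url = raw_query_url('http://localhost:8983', q='*', caller='update_index')
# → 'http://localhost:8983/select/?q=%2A&caller=update_index'
```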
Hi,

I tried a UTF-8 search on a text field and got the following error. Please let me know the fix.

```python
import scorched

si = scorched.SolrInterface("http://192.168.0.115:8983/solr/unicodecore/")
for result in si.query(name="आदित्य"):
    print result
```

```
Traceback (most recent call last):
  File "test_sss.py", line 9, in <module>
    for result in si.query(name="आदित्य"):
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/connection.py", line 389, in query
    return q.query(*args, **kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 411, in query
    newself.query_obj.add(args, kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 329, in add
    self.add_exact(field_name, v, terms_or_phrases)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 346, in add_exact
    this_term_or_phrase = term_or_phrase or self.term_or_phrase(inst)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 369, in term_or_phrase
    return 'terms' if self.default_term_re.match(str(arg)) else 'phrases'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
```
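The crash comes from `str(arg)` in `term_or_phrase`: on Python 2 it tries to encode the unicode argument as ASCII. A possible fix is to coerce to the text type instead of bytes (a sketch; the regex below is only a stand-in for scorched's actual `default_term_re`):

```python
import re
import sys

# stand-in pattern: a single word with no whitespace or operators
default_term_re = re.compile(r'^\w+$', re.UNICODE)


def term_or_phrase(arg):
    # str(arg) would raise UnicodeDecodeError on Python 2 for non-ASCII
    # input; coerce to unicode there and to str on Python 3.
    text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821
    if not isinstance(arg, text_type):
        arg = text_type(arg)
    return 'terms' if default_term_re.match(arg) else 'phrases'


term_or_phrase("आदित्य")  # no longer raises UnicodeDecodeError
```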
Hi:

I've added support for deep paging / cursorMark (see https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results) of query results to scorched. At present it only supports iterating through the whole result set without having to explicitly fetch each page (which is done in the iterator), which I imagine is the most common use case.

I'd like to contribute this. Should I just send you the diffs (to results.py and search.py) or open a pull request?

best
-Simon
Example:

```
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scorched import SolrInterface
>>> si = SolrInterface("http://localhost:8983/solr/mysearchengine/")
>>> si.query('foo +bar -baz').alt_parser('edismax').options()
{'defType': 'edismax', 'q': 'foo\\ \\+bar\\ \\-baz'}
```

Notice how valid edismax syntax is being escaped: spaces, `+`, and `-` no longer have special meaning, and so the edismax parser will not actually look at them.

The results I get from this query do not reflect my intent: I get documents that do not include `bar` and do include `baz`. The Solr log shows this query:

```
q=foo\+\%2Bbar\+\-baz
```

If I run the same query in Solr Admin (which shows correct results), then the Solr log shows this query:

```
q=foo+%2Bbar+-baz
```

I think that when using an "alternative parser", Scorched should not preprocess the query like this. It means the alternative parser doesn't even get a chance to parse anything.

Thoughts?
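The proposed behaviour could be as simple as skipping the escaping step whenever an alternative parser is set (a sketch; the function name and the list of special characters are illustrative, not scorched's actual code):

```python
def escape_for_solr(raw_query, alt_parser=None):
    # Leave the query untouched when an alternative parser such as
    # edismax will interpret the operators itself; otherwise escape
    # characters that are special to the standard query parser.
    special = ' +-!(){}[]^"~*?:\\'
    if alt_parser is not None:
        return raw_query
    return ''.join('\\' + c if c in special else c for c in raw_query)


escape_for_solr('foo +bar -baz', alt_parser='edismax')  # → 'foo +bar -baz'
escape_for_solr('foo +bar -baz')  # escapes spaces, + and -, as in the options() output above
```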
We should test with Solr 5 and also get the test suite running against Solr 5. testing-solr.sh needs to be updated, maybe along the lines of https://github.com/ghindows/travis-solr.
I have a weird problem. I'm querying a Solr 5.3 instance from Django through Scorched. It all works great as long as I don't ask for an exact match. In other words,

```
q=something something else
```

returns exactly the same result as:

```
q="something something else"
```

The culprit, as far as I can see, is the actual query which Django throws at Solr. In fact, for the second case this is:

```
q="something+something+else"
```

So, in other words, the `"` character is escaped. Am I right? How do I tell Solr that when I query something between double quotes I want an exact match?

In the Solr admin webpage it all works well, i.e. if I search for "something something else" I get the correct result.

I'm not sure whether this is a Scorched problem or not. Does it have something to do with filters/tokenizers (e.g. solr.MappingCharFilterFactory)?
Any plans to support Solr 4 join queries?
I had an open pull request with join support for sunburnt (tow/sunburnt#88). If I can clean that up and/or reimplement it to work with scorched, would a pull request for join support be welcome?
The SolrResponse object has a `__len__` method with very simple code right now:

```python
def __len__(self):
    if self.groups:
        return len(getattr(self.groups, self.group_field)['groups'])
    else:
        return len(self.result.docs)
```

I wrote the third line of that, copying the pattern from the last line, but both are wrong. The updated code should be:

```python
def __len__(self):
    if self.groups:
        return getattr(self.groups, self.group_field)['ngroups']
    else:
        return self.result.numFound
```

This way things like paginators will be able to know the full size of the response.
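To illustrate the difference with stub objects (FakeResult and FakeResponse are made up for this demo; `docs` holds only the current page while `numFound` is the total reported by Solr):

```python
class FakeResult:
    docs = [{'id': 1}, {'id': 2}]  # only the current page of results
    numFound = 5000                # total hits reported by Solr


class FakeResponse:
    groups = None
    result = FakeResult()

    def __len__(self):
        # the corrected implementation
        if self.groups:
            return getattr(self.groups, self.group_field)['ngroups']
        return self.result.numFound


len(FakeResponse())  # → 5000, not 2
```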
My issue is fairly simple: my index contains fields that have dots (.) in them, for instance `object.id`. I try to create a range query:

```python
search.query(object.id__lt=before)
```

But that's not a legal way to pass the field as a keyword argument. What I have found so far is to add to the LuceneQuery manually:

```python
search.query_obj.add_range('object.id', 'lt', before)
```

Is there a better way to work around this problem?
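If scorched's query() accepts arbitrary keyword arguments (which I haven't verified), Python's `**` unpacking may already sidestep the identifier restriction: CPython allows non-identifier strings to pass through `**` into a `**kwargs` parameter. A minimal demonstration of the mechanism with a stand-in function:

```python
def query(**kwargs):
    # Stand-in for SolrSearch.query, just to show the mechanism:
    # 'object.id__lt' is not a valid Python identifier, but it is a
    # perfectly good **kwargs key when passed via ** unpacking.
    return kwargs


params = {'object.id__lt': 42}
query(**params)  # → {'object.id__lt': 42}
```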
This is a small thing that I just discovered while looking at the code, but if you don't supply a `mode` parameter to your SolrInterface, you get a read/write interface by default. The code in the `__init__` method is:

```python
if mode == 'r':
    self.writeable = False
elif mode == 'w':
    self.readable = False
```

This is a design question, but I think it would be better if the default were read-only. That would require people to create writable interfaces explicitly, which seems important.

This is a breaking API change, so we'll want to consider it carefully, but it seems like a good direction to me.
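The proposed change could look roughly like this (a sketch with a heavily simplified constructor; the `'rw'` mode is my invention to keep a read/write option available explicitly):

```python
class SolrInterface:
    def __init__(self, url, mode='r'):
        # Default to read-only: callers must opt in to writing.
        self.url = url
        self.readable = mode in ('r', 'rw')
        self.writeable = mode in ('w', 'rw')


si = SolrInterface('http://localhost:8983/solr/core/')
# → si.readable is True, si.writeable is False
```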
I would like to add regex support for Solr 4+. I was considering doing it as scorched.strings.RegexpString. It looks pretty straightforward. Would you consider accepting a PR (with tests) for this feature?
Hello,

it seems that example files in recent Solr distributions define dates using a `pdate` type, not a `date` type, in the schema.xml file:

```xml
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
```

This type is not recognised as a date type by Scorched, because date types are detected in the method `SolrInterface._extract_datefields` in connection.py:

```python
def _extract_datefields(self, schema):
    ret = [x['name'] for x in
           schema['fields'] if x['type'] == 'date']
    ret.extend([x['name'] for x in schema['dynamicFields']
                if x['type'] == 'date'])
    return ret
```

Recognising `pdate` as a date type could be achieved by replacing the two occurrences of the test:

```python
if x['type'] == 'date'
```

with:

```python
if x['type'] in ('date', 'pdate')
```

For now, the workaround is to rename `pdate` to `date` in the schema file.
```python
q = scorched.strings.WildcardString('abc abc*')
si.query(q)
```

raises `scorched.exc.SolrError: <Response [400]>`.
I am using this library but hit a "maximum recursion depth exceeded" error in this function. Does anyone have any idea how to fix this?
https://github.com/lugensa/scorched/blob/master/scorched/search.py#L146
Any plans to support result grouping?
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
I might be able to take a look at implementing this; would a pull request be welcomed? Any thoughts on how you'd like to see it implemented?
I have a need for this and was planning to start with the simple group result format, because it would require the least change in handling and displaying results.
Hello,

I've just come across a problem which arose after upgrading scorched from 0.11.0 to 0.12.0, in connection with the use of caching in the httplib2 module. The following code:

```python
from scorched import SolrInterface
from httplib2 import Http

si = SolrInterface(url='http://localhost:8983/solr/XXX', http_connection=Http('/tmp'))
```

works with scorched <= 0.11.0, but produces the following error with 0.12.0:

```
File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 292, in __init__
  self.schema = self.init_schema()
File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 298, in init_schema
  self.remote_schema_file))
File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 73, in request
  return self.http_connection.request(*args, **kwargs)
File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 1176, in request
  (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 148, in urlnorm
  raise RelativeURIError("Only absolute URIs are allowed. uri = %s" % uri)
httplib2.RelativeURIError: Only absolute URIs are allowed. uri = GET
```

I haven't taken the time to investigate the problem yet, and I'm not sure whether I should continue using httplib2 caching at all.
tow/sunburnt#93
tow/sunburnt#94

Are either of the above PRs of interest to you? I'd be happy to re-make them for scorched.
When you try to index an item with a multi-valued date field, you run into this error:

```
In [14]: sun.add(judy.as_search_dict())
---------------------------------------------------------------------------
SolrError                                 Traceback (most recent call last)
<ipython-input-14-c11bdcf59b84> in <module>()
----> 1 sun.add(judy.as_search_dict())

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in add(self, docs, chunk, **kwargs)
    343         ret = []
    344         for doc_chunk in grouper(docs, chunk):
--> 345             update_message = json.dumps(self._prepare_docs(doc_chunk))
    346             ret.append(scorched.response.SolrUpdateResponse.from_json(
    347                 self.conn.update(update_message, **kwargs)))

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in _prepare_docs(self, docs)
    319                     continue
    320                 if scorched.dates.is_datetime_field(name, self._datefields):
--> 321                     value = str(scorched.dates.solr_date(value))
    322                 new_doc[name] = value
    323             prepared_docs.append(new_doc)

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/dates.py in __init__(self, v)
     93         else:
     94             raise scorched.exc.SolrError(
---> 95                 "Cannot initialize solr_date from %s object" % type(v))
     96
     97     @staticmethod

SolrError: Cannot initialize solr_date from <type 'list'> object
```

This appears to be because of the code here, which assumes that date fields are never multi-valued:

```python
def _prepare_docs(self, docs):
    prepared_docs = []
    for doc in docs:
        new_doc = {}
        for name, value in list(doc.items()):
            # XXX remove all None fields; this is needed for adding date
            # fields
            if value is None:
                continue
            if scorched.dates.is_datetime_field(name, self._datefields):
                # This is where the code needs a tweak, I'd say:
                value = str(scorched.dates.solr_date(value))
            new_doc[name] = value
        prepared_docs.append(new_doc)
    return prepared_docs
```

I can think of two solutions here. We can either interrogate the schema to see if the field is multi-valued, and assume a list in that case, or we can check whether we got a list, and assume that means it's a multi-valued field.
I'd be happy to implement either solution, if desired.
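The second option can be sketched in isolation like this (the helper names are hypothetical, and `to_solr_date` is only a stand-in for `str(scorched.dates.solr_date(value))`, using Solr's ISO-8601 UTC format with a trailing 'Z'):

```python
import datetime


def to_solr_date(value):
    # Stand-in for str(scorched.dates.solr_date(value)).
    return value.strftime('%Y-%m-%dT%H:%M:%SZ')


def prepare_date_value(value):
    # If we got a list, treat the field as multi-valued and convert
    # each element; otherwise convert the single value as before.
    if isinstance(value, (list, tuple)):
        return [to_solr_date(v) for v in value]
    return to_solr_date(value)


prepare_date_value([datetime.datetime(2015, 6, 1), datetime.datetime(2016, 7, 2)])
# → ['2015-06-01T00:00:00Z', '2016-07-02T00:00:00Z']
```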