
scorched's People

Contributors

ale-rt, chronial, delijati, flooie, lujh, mamico, mehaase, mghh, mlissner, quinot, rlskoeser, sweh


scorched's Issues

results_as() does nothing

Just found this bug while browsing the code: the results_as function ignores its constructor argument and simply returns a copy of the query.

def results_as(self, constructor):

Since this function is also documented nowhere, maybe it should just be removed?
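For reference, a minimal sketch of what a working results_as() could look like. This uses a stand-in FakeSearch class rather than scorched's real internals; clone() and execute() here are assumptions for illustration:

```python
# Sketch only: FakeSearch stands in for scorched's search class.
class FakeSearch:
    def __init__(self, docs, constructor=dict):
        self.docs = docs
        self.constructor = constructor

    def clone(self):
        return FakeSearch(self.docs, self.constructor)

    def results_as(self, constructor):
        # The reported bug: the copy is returned without ever storing
        # `constructor`. Storing it on the clone is the obvious fix.
        new = self.clone()
        new.constructor = constructor
        return new

    def execute(self):
        # Wrap each raw doc with the stored constructor.
        return [self.constructor(d) for d in self.docs]
```

With this shape, results_as(SomeClass).execute() would return SomeClass instances instead of plain dicts.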

pivoter fields are wiped by options()

I've stumbled on this annoying bug on Python 3.4.

import unittest
from scorched.search import SolrSearch


class TestOptionsMethodWipesPivots(unittest.TestCase):
    def test_there_is_a_problem_with_pivot_by_with_facet(self):
        facet = 'facet'
        pivot = 'pivot'

        search = SolrSearch(None)
        search = search.facet_by(facet, mincount=1).pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')

        self.assertIn(pivot, search.pivoter.fields)
        # Fails: after options() this value has become True
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

    def test_there_is_a_problem_with_pivot_by_even_without_facet(self):
        facet = 'facet'
        pivot = 'pivot'

        search = SolrSearch(None)
        search = search.pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')

        # Fails: after options() these values have become True
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})
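A likely fix, sketched against a stand-in Pivoter class (not scorched's real one): have options() serialize from a copy of the pivot state instead of mutating it in place.

```python
import copy

# Stand-in Pivoter; scorched's real class has a different interface.
class Pivoter:
    def __init__(self):
        self.fields = {}

    def update(self, fields, **kwargs):
        for f in fields:
            self.fields[f] = dict(kwargs)

    def options(self):
        # Build the output from a deep copy so repeated calls (or later
        # inspection of .fields) still see the original per-field options.
        fields = copy.deepcopy(self.fields)
        return {"facet.pivot": ",".join(sorted(fields))}
```

The point is only the deepcopy: whatever transformation options() applies must not leak back into the stored field options.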

Highlighting results at top level of SolrResponse isn't sufficient

If you do a query in the current version of scorched, you'll get back a SolrResponse object that has a bunch of properties, but the important ones are:

  • response.highlighting (a dict mapping document IDs to highlighted field values)
  • response.result (a SolrResult object with docs as a property)

response.result.docs is a list of the first N results you requested.

There's a fantastic feature for pagination and iteration that allows you to iterate a SolrResponse object, so you can do:

for r in response.results:
    print(r.my_field)

But if you make a query that involves highlighting, this utterly fails. What you want to do is something like:

for r in response.results:
    print(r.my_field, r.my_highlighted_field)

But the highlighted fields are a separate property on the SolrResponse and aren't part of the iterated object. This makes it basically impossible to return highlighted results without pre-processing the SolrResponse to merge the highlighting attribute with the docs.

In sunburnt there was code that did exactly this, creating a solr_highlights property on every result document that contained the highlighting for that document:

if result.highlighting:
    for d in result.result.docs:
        # if the unique key for a result doc is present in highlighting,
        # add the highlighting for that document into the result dict
        # (but don't override any existing content)
        # If unique key field is not a string field (eg int) then we need to
        # convert it to its solr representation
        unique_key = self.schema.fields[self.schema.unique_key].to_solr(d[self.schema.unique_key])
        if 'solr_highlights' not in d and \
               unique_key in result.highlighting:
            d['solr_highlights'] = result.highlighting[unique_key]

I think we need something like this or else highlighting is very difficult to use and requires that the calling code do some wonky merging.

I think the easiest place to fix this is in the to_json method of the SolrResponse. Maybe this can be fixed with a constructor, but I haven't looked into that yet.
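For illustration, a minimal sketch of the proposed merge, assuming `response` is a plain dict shaped like Solr's JSON output rather than a SolrResponse; the real fix would do the same thing inside the response-building code:

```python
def merge_highlighting(response, unique_key="id"):
    """Copy per-document highlighting onto each doc as 'solr_highlights',
    without overriding any existing content (mirrors the sunburnt code)."""
    highlighting = response.get("highlighting", {})
    for doc in response["response"]["docs"]:
        # Solr keys the highlighting dict by the string form of the id.
        key = str(doc[unique_key])
        if "solr_highlights" not in doc and key in highlighting:
            doc["solr_highlights"] = highlighting[key]
    return response
```

After this runs, iterating the docs gives you the highlighting alongside the stored fields, which is exactly the ergonomics the loop above wants.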

Grouping has a few issues

I discovered some issues with how result grouping works. The implementation of groups doesn't support a few different things:

  1. ngroups must be set to true. If it's not, the query will crash.

  2. group.format cannot be set to simple. If it is, the query will crash.

  3. group.main doesn't work in conjunction with group.format = simple. If it's set, the query will crash.

Support for sending custom parameters to Solr

I'm not sure if there's an appetite for this, but I find it enormously useful to be able to send arbitrary parameters to Solr outside of what the API typically allows (i.e., low-level queries like what pysolr provides).

I have a function I hacked on top of sunburnt that allows me to do raw queries. For example, I can do this:

self.si.raw_query(**{'q': '*', 'caller': 'update_index'})

And that'll make a Solr request like:

http://localhost:8983/select/?q=*&caller=update_index

The main way I use this is to add a caller parameter to every request I make, so that I can keep track of which ones are slow or later sort things out in the logs, but I also use it when sunburnt or scorched lacks a parameter I need.

Any appetite for adding this into core? I can provide a PR if so.
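For what it's worth, the pass-through itself is trivial. A hypothetical helper (the name build_raw_query_url is mine, not scorched's or sunburnt's API) just encodes arbitrary parameters onto the select handler's URL; sending it with requests.get(url) would then produce exactly the kind of request shown above:

```python
from urllib.parse import urlencode

def build_raw_query_url(base_url, **params):
    """Build a select URL with arbitrary pass-through parameters,
    including ones the client API doesn't know about (e.g. `caller`)."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

Note that urlencode percent-escapes reserved characters, so `q=*` appears as `q=%2A`; Solr decodes it back on the other side.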

UTF-8 search fails

Hi,

I tried a UTF-8 search on a text field and got the following error. Please let me know the fix.

import scorched

si = scorched.SolrInterface("http://192.168.0.115:8983/solr/unicodecore/")

for result in si.query(name="आदित्य"):
    print(result)

Traceback (most recent call last):
  File "test_sss.py", line 9, in <module>
    for result in si.query(name="आदित्य"):
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/connection.py", line 389, in query
    return q.query(*args, **kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 411, in query
    newself.query_obj.add(args, kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 329, in add
    self.add_exact(field_name, v, terms_or_phrases)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 346, in add_exact
    this_term_or_phrase = term_or_phrase or self.term_or_phrase(inst)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 369, in term_or_phrase
    return 'terms' if self.default_term_re.match(str(arg)) else 'phrases'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
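The crash comes from the last frame: on Python 2, str() on a unicode argument triggers an implicit ASCII encode. A sketch of a safer term_or_phrase(), using a stand-in regex (scorched's real default_term_re may differ), simply matches the text as-is instead of round-tripping through str():

```python
import re

# Stand-in for scorched's default_term_re.
default_term_re = re.compile(r"^\w+$", re.UNICODE)

def term_or_phrase(arg):
    # Match the argument directly; no str() coercion, so non-ASCII
    # input such as u"आदित्य" no longer raises UnicodeDecodeError.
    return "terms" if default_term_re.match(arg) else "phrases"
```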

Adding support for deep paging / cursorMark queries

Hi:

I've added support for deep paging / cursorMark queries (see https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results) to scorched. At present it only supports iterating through the whole result set without having to explicitly fetch each page (the iterator does that), which I imagine is the most common use case.
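The cursorMark loop from the Solr docs can be sketched like this, written against a generic fetch(params) callable so it runs without a live Solr; the wiring into scorched's iterator would look similar:

```python
def iter_all(fetch, base_params):
    """Yield every document, fetching page after page via cursorMark.
    `fetch` takes a params dict and returns Solr's decoded JSON response."""
    params = dict(base_params, cursorMark="*")
    # Deep paging requires a sort that includes the uniqueKey field;
    # "id asc" here is an assumption about the schema.
    params.setdefault("sort", "id asc")
    while True:
        response = fetch(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == params["cursorMark"]:
            break  # cursor unchanged: the result set is exhausted
        params["cursorMark"] = next_cursor
```

The termination condition (stop when the cursor stops advancing) is the part the Solr docs insist on, since the last page can still contain documents.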

I'd like to contribute this. Should I just send you the diffs (to results.py and search.py) or open a pull request?

best

-Simon

Edismax queries are parsed incorrectly

Example:

Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scorched import SolrInterface
>>> si = SolrInterface("http://localhost:8983/solr/mysearchengine/")
>>> si.query('foo +bar -baz').alt_parser('edismax').options()
{'defType': 'edismax', 'q': 'foo\\ \\+bar\\ \\-baz'}

Notice how valid edismax syntax is being escaped: spaces, +, and - no longer have special meaning, and so the edismax parser will not actually look at them.

The results I get from this query do not reflect my intent: I get documents that do not include bar and do include baz. The Solr log shows this query:

q=foo\+\%2Bbar\+\-baz

If I run the same query in Solr Admin (which shows correct results), then the Solr log shows this query:

q=foo+%2Bbar+-baz

I think that when using an "alternative parser", Scorched should not preprocess the query like this. It means the alternative parser doesn't even have a chance to parse anything.

Thoughts?

Exact search (double quotes) ignored

I have a weird problem. I'm querying a Solr 5.3 instance from Django through Scorched. It all works great as long as I don't run an exact-match query. In other words,

q=something something else

returns exactly the same result as:

q="something something else"

The culprit, as far as I can see, is the actual query which Django throws at Solr. In fact, for the second case this is:

q="something+something+else"

So, in other words, the " character is escaped. Am I right? How do I tell Solr that when I query something between double quotes I want an exact match?

In the Solr admin webpage it all works well, i.e. if I search for "something something else" I get the correct result.

I'm not sure this is a Scorched problem or not. Does it have something to do with filters/tokenizers (e.g. solr.MappingCharFilterFactory)?

Solr4 join support

Any plans to support solr4 join queries?

I had an open pull request with join support for sunburnt (tow/sunburnt#88). If I can clean that up and/or reimplement to work with scorched, would a pull request for join support be welcomed?

__len__ method on SolrResponse only returns number of rows

The SolrResponse object has a __len__ method with very simple code right now:

def __len__(self):
    if self.groups:
        return len(getattr(self.groups, self.group_field)['groups'])
    else:
        return len(self.result.docs)

I wrote the third line of that (the grouped branch), copying it from the last line, but both branches are wrong. The updated code should be:

def __len__(self):
    if self.groups:
        return getattr(self.groups, self.group_field)['ngroups']
    else:
        return self.result.numFound

This way things like paginators will be able to know the full size of the response.

Dot in the field name and query composition

My issue is fairly simple: my index contains fields that have dots (.) in them, for instance object.id.

I try to create a range query:

search.query(object.id__lt=before)

But that's not legal Python syntax for a keyword argument.

What I have found so far is to add a LuceneQuery manually:

search.query_obj.add_range('object.id', 'lt', before)

Is there a better way to work around this problem?
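One possible workaround, assuming query() accepts arbitrary **kwargs (demonstrated here with a stand-in function): keyword names with dots can't be written literally, but they can still be passed by **-unpacking a dict, since Python only restricts the literal call syntax, not the key strings themselves:

```python
# Stand-in for search.query(); scorched's real method builds a query
# from these kwargs instead of returning them.
def query(**kwargs):
    return kwargs

# 'object.id__lt' is not a valid identifier, but **-unpacking allows it.
opts = query(**{"object.id__lt": 42})
```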

Default mode should be read only

This is a small thing that I just discovered while looking at the code, but if you don't supply a mode parameter to your SolrInterface, you get a read/write interface by default. The code in the __init__ method is:

    if mode == 'r':
        self.writeable = False
    elif mode == 'w':
        self.readable = False

This is a design question, but I think this would be better if the default were read-only. That would require people to create writable interfaces explicitly, which seems important.

This is a breaking API change, so we'll want to consider it carefully, but it seems like a good direction to me.
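A sketch of the proposed default, using a stand-in class (and the 'rw' mode is my own addition here, to make read/write opt-in and explicit; it is not part of scorched's current API):

```python
class SolrInterface:
    """Stand-in sketch: default to read-only unless a mode is given."""

    def __init__(self, url, mode="r"):
        self.url = url
        self.readable = mode in ("r", "rw")
        self.writeable = mode in ("w", "rw")
```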

Consider adding regex support

I would like to add regex support for Solr 4+. I was considering doing it as scorched.strings.RegexpString. It looks pretty straightforward. Would you consider accepting a PR (with tests) for this feature?

Include `pdate` as a date type?

Hello,

it seems that the example files in recent Solr distributions define dates using a pdate, not date, type in the schema.xml file:

    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

This type is not recognised as a date type by Scorched, because date types are collected by the method SolrInterface._extract_datefields in the file connection.py:

    def _extract_datefields(self, schema):
        ret = [x['name'] for x in
               schema['fields'] if x['type'] == 'date']
        ret.extend([x['name'] for x in schema['dynamicFields']
                    if x['type'] == 'date'])
        return ret

Recognising pdate as a date could be achieved by replacing the two occurrences of the test:

if x['type'] == 'date'

by:

if x['type'] in ('date', 'pdate')

For now, the workaround is to rename pdate to date in the schema file.
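The suggested change, written out as a standalone version of the method for illustration (the real one is a method on SolrInterface):

```python
# Both 'date' (TrieDateField) and 'pdate' (DatePointField) count as dates.
DATE_TYPES = ("date", "pdate")

def extract_datefields(schema):
    ret = [x["name"] for x in schema["fields"]
           if x["type"] in DATE_TYPES]
    ret.extend(x["name"] for x in schema["dynamicFields"]
               if x["type"] in DATE_TYPES)
    return ret
```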

group query support

Any plans to support result grouping?

https://cwiki.apache.org/confluence/display/solr/Result+Grouping

I might be able to take a look at implementing this, would a pull request be welcomed? Any thoughts on how you'd like to see it implemented?

I have a need for this, and was planning to start with the simple group result format, because it would require the least change in handling and displaying results.

Scorched 0.12.0 no longer compatible with httplib2 caching

Hello,
I've just come across a problem which arose after upgrading scorched from 0.11.0 to 0.12.0, related to the use of caching in the httplib2 module. The following code:

from scorched import SolrInterface
from httplib2 import Http
si = SolrInterface(url='http://localhost:8983/solr/XXX', http_connection=Http('/tmp'))

works with scorched <= 0.11.0, but produces the following error with 0.12.0:

  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 292, in __init__
    self.schema = self.init_schema()
  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 298, in init_schema
    self.remote_schema_file))
  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 73, in request
    return self.http_connection.request(*args, **kwargs)
  File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 1176, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 148, in urlnorm
    raise RelativeURIError("Only absolute URIs are allowed. uri = %s" % uri)
httplib2.RelativeURIError: Only absolute URIs are allowed. uri = GET

I haven't taken the time to investigate the problem yet. I'm not sure if I should continue using httplib2 caching.

Multi-valued date fields cannot be indexed

When you try to index an item with a multi-valued date field, you run into this error:

In [14]: sun.add(judy.as_search_dict())
---------------------------------------------------------------------------
SolrError                                 Traceback (most recent call last)
<ipython-input-14-c11bdcf59b84> in <module>()
----> 1 sun.add(judy.as_search_dict())

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in add(self, docs, chunk, **kwargs)
    343         ret = []
    344         for doc_chunk in grouper(docs, chunk):
--> 345             update_message = json.dumps(self._prepare_docs(doc_chunk))
    346             ret.append(scorched.response.SolrUpdateResponse.from_json(
    347                 self.conn.update(update_message, **kwargs)))

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in _prepare_docs(self, docs)
    319                     continue
    320                 if scorched.dates.is_datetime_field(name, self._datefields):
--> 321                     value = str(scorched.dates.solr_date(value))
    322                 new_doc[name] = value
    323             prepared_docs.append(new_doc)

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/dates.py in __init__(self, v)
     93         else:
     94             raise scorched.exc.SolrError(
---> 95                 "Cannot initialize solr_date from %s object" % type(v))
     96 
     97     @staticmethod

SolrError: Cannot initialize solr_date from <type 'list'> object

This appears to be because of the code here, which assumes that date fields are never multi-value:

def _prepare_docs(self, docs):
    prepared_docs = []
    for doc in docs:
        new_doc = {}
        for name, value in list(doc.items()):
            # XXX remove all None fields this is needed for adding date
            # fields
            if value is None:
                continue
            if scorched.dates.is_datetime_field(name, self._datefields):
                # This is where the code needs a tweak, I'd say:
                value = str(scorched.dates.solr_date(value))
            new_doc[name] = value
        prepared_docs.append(new_doc)
    return prepared_docs

I can think of two solutions here. We can either interrogate the schema to see if the field is multi-valued and assume a list in that case, or we can check whether we got a list and assume that means it's a multi-valued field.

I'd be happy to implement either solution, if desired.
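The second option could be sketched like this, with solr_date stubbed via isoformat() for illustration (scorched's real scorched.dates.solr_date does more than this):

```python
import datetime

def solr_date(v):
    # Stand-in for scorched.dates.solr_date: Solr expects UTC ISO-8601.
    return v.isoformat() + "Z"

def prepare_date_value(value):
    """Convert a date value for indexing; if it's a list (multi-valued
    field), convert each element instead of crashing."""
    if isinstance(value, list):
        return [solr_date(v) for v in value]
    return solr_date(value)
```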
