
scorched's People

Contributors

ale-rt, chronial, delijati, flooie, lujh, mamico, mehaase, mghh, mlissner, quinot, rlskoeser, sweh


scorched's Issues

results_as() does nothing

Just found this bug while browsing the code: the results_as function ignores its constructor argument and simply returns a copy of the query.

def results_as(self, constructor):

Since this function is also documented nowhere, maybe it should just be removed?
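For reference, a minimal sketch of what a working results_as() could look like. This uses a stand-in FakeSearch class rather than scorched's real internals; clone() and execute() here are assumptions for illustration:

```python
# Sketch only: FakeSearch stands in for scorched's search class.
class FakeSearch:
    def __init__(self, docs, constructor=dict):
        self.docs = docs
        self.constructor = constructor

    def clone(self):
        return FakeSearch(self.docs, self.constructor)

    def results_as(self, constructor):
        # The reported bug: the copy is returned without ever storing
        # `constructor`. Storing it on the clone is the obvious fix.
        new = self.clone()
        new.constructor = constructor
        return new

    def execute(self):
        # Wrap each raw doc with the stored constructor.
        return [self.constructor(d) for d in self.docs]
```

With this shape, results_as(SomeClass).execute() would return SomeClass instances instead of plain dicts.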

pivoter fields are wiped by options()

I've stumbled on this annoying bug on Python 3.4.

import unittest
from scorched.search import SolrSearch


class TestOptionsMethodWipesPivots(unittest.TestCase):
    def test_there_is_a_problem_with_pivot_by_with_facet(self):
        facet = 'facet'
        pivot = 'pivot'

        search = SolrSearch(None)
        search = search.facet_by(facet, mincount=1).pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')

        self.assertIn(pivot, search.pivoter.fields)
        # Fails: after options() this value has become True
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

    def test_there_is_a_problem_with_pivot_by_even_without_facet(self):
        facet = 'facet'
        pivot = 'pivot'

        search = SolrSearch(None)
        search = search.pivot_by([pivot, facet], mincount=1)
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})

        options = search.options()
        self.assertIn('facet.pivot', options)
        self.assertEqual(options['facet.pivot'], 'facet,pivot')

        # Fails: after options() these values have become True
        self.assertIn(pivot, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[pivot], {'mincount': 1})
        self.assertIn(facet, search.pivoter.fields)
        self.assertEqual(search.pivoter.fields[facet], {'mincount': 1})
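A likely fix, sketched against a stand-in Pivoter class (not scorched's real one): have options() serialize from a copy of the pivot state instead of mutating it in place.

```python
import copy

# Stand-in Pivoter; scorched's real class has a different interface.
class Pivoter:
    def __init__(self):
        self.fields = {}

    def update(self, fields, **kwargs):
        for f in fields:
            self.fields[f] = dict(kwargs)

    def options(self):
        # Build the output from a deep copy so repeated calls (or later
        # inspection of .fields) still see the original per-field options.
        fields = copy.deepcopy(self.fields)
        return {"facet.pivot": ",".join(sorted(fields))}
```

The point is only the deepcopy: whatever transformation options() applies must not leak back into the stored field options.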

Highlighting results at top level of SolrResponse isn't sufficient

If you do a query in the current version of scorched, you'll get back a SolrResponse object that has a bunch of properties, but the important ones are:

  • response.highlighting (a dict mapping document IDs to highlighted field values)
  • response.result (a SolrResult object with docs as a property)

response.result.docs is a list of the first N results you requested.

There's a fantastic feature for pagination and iteration that allows you to iterate a SolrResponse object, so you can do:

for r in response.results:
    print(r.my_field)

But if you make a query that involves highlighting, this utterly fails. What you want to do is something like:

for r in response.results:
    print(r.my_field, r.my_highlighted_field)

But the highlighted fields are a separate property on the SolrResponse and aren't part of the iterated object. This makes it basically impossible to return highlighted results without pre-processing the SolrResponse to merge the highlighting attribute with the docs.

In sunburnt there was code that did exactly this, creating a solr_highlights property on every result document that contained the highlighting for that document:

if result.highlighting:
    for d in result.result.docs:
        # if the unique key for a result doc is present in highlighting,
        # add the highlighting for that document into the result dict
        # (but don't override any existing content)
        # If unique key field is not a string field (eg int) then we need to
        # convert it to its solr representation
        unique_key = self.schema.fields[self.schema.unique_key].to_solr(d[self.schema.unique_key])
        if 'solr_highlights' not in d and \
               unique_key in result.highlighting:
            d['solr_highlights'] = result.highlighting[unique_key]

I think we need something like this or else highlighting is very difficult to use and requires that the calling code do some wonky merging.

I think the easiest place to fix this is in the to_json method of the SolrResponse. Maybe this can be fixed with a constructor, but I haven't looked into that yet.
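For illustration, a minimal sketch of the proposed merge, assuming `response` is a plain dict shaped like Solr's JSON output rather than a SolrResponse; the real fix would do the same thing inside the response-building code:

```python
def merge_highlighting(response, unique_key="id"):
    """Copy per-document highlighting onto each doc as 'solr_highlights',
    without overriding any existing content (mirrors the sunburnt code)."""
    highlighting = response.get("highlighting", {})
    for doc in response["response"]["docs"]:
        # Solr keys the highlighting dict by the string form of the id.
        key = str(doc[unique_key])
        if "solr_highlights" not in doc and key in highlighting:
            doc["solr_highlights"] = highlighting[key]
    return response
```

After this runs, iterating the docs gives you the highlighting alongside the stored fields, which is exactly the ergonomics the loop above wants.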

Grouping has a few issues

I discovered some issues with how result grouping works. The implementation of groups doesn't support a few different things:

  1. ngroups must be set to true. If it's not, the query will crash.

  2. group.format cannot be set to simple. If it is, the query will crash.

  3. group.main doesn't work in conjunction with group.format = simple. If it's set, the query will crash.

Support for sending custom parameters to Solr

I'm not sure if there's an appetite for this, but I find it enormously useful to be able to send arbitrary parameters to Solr outside of what the API typically allows (i.e., low-level queries like what pysolr provides).

I have a function I hacked on top of sunburnt that allows me to do raw queries. For example, I can do this:

self.si.raw_query(**{'q': '*', 'caller': 'update_index'})

And that'll make a Solr request like:

http://localhost:8983/select/?q=*&caller=update_index

The main way I use this is to add a caller parameter to every request I make, so that I can keep track of which ones are slow or later sort things out in the logs, but I also use it when sunburnt or scorched lacks a parameter I need.

Any appetite for adding this into core? I can provide a PR if so.
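For what it's worth, the pass-through itself is trivial. A hypothetical helper (the name build_raw_query_url is mine, not scorched's or sunburnt's API) just encodes arbitrary parameters onto the select handler's URL; sending it with requests.get(url) would then produce exactly the kind of request shown above:

```python
from urllib.parse import urlencode

def build_raw_query_url(base_url, **params):
    """Build a select URL with arbitrary pass-through parameters,
    including ones the client API doesn't know about (e.g. `caller`)."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

Note that urlencode percent-escapes reserved characters, so `q=*` appears as `q=%2A`; Solr decodes it back on the other side.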

UTF-8 search fails

Hi,

I tried a UTF-8 search on a text field and got the following error. Please let me know the fix.

import scorched

si = scorched.SolrInterface("http://192.168.0.115:8983/solr/unicodecore/")

for result in si.query(name="आदित्य"):
    print(result)

Traceback (most recent call last):
  File "test_sss.py", line 9, in <module>
    for result in si.query(name="आदित्य"):
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/connection.py", line 389, in query
    return q.query(*args, **kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 411, in query
    newself.query_obj.add(args, kwargs)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 329, in add
    self.add_exact(field_name, v, terms_or_phrases)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 346, in add_exact
    this_term_or_phrase = term_or_phrase or self.term_or_phrase(inst)
  File "/home/nagabhushan/nagav/lib/python2.7/site-packages/scorched-0.6-py2.7.egg/scorched/search.py", line 369, in term_or_phrase
    return 'terms' if self.default_term_re.match(str(arg)) else 'phrases'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
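The crash comes from the last frame: on Python 2, str() on a unicode argument triggers an implicit ASCII encode. A sketch of a safer term_or_phrase(), using a stand-in regex (scorched's real default_term_re may differ), simply matches the text as-is instead of round-tripping through str():

```python
import re

# Stand-in for scorched's default_term_re.
default_term_re = re.compile(r"^\w+$", re.UNICODE)

def term_or_phrase(arg):
    # Match the argument directly; no str() coercion, so non-ASCII
    # input such as u"आदित्य" no longer raises UnicodeDecodeError.
    return "terms" if default_term_re.match(arg) else "phrases"
```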

Adding support for deep paging / cursorMark queries

Hi:

I've added support for deep paging / cursorMark queries (see https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results) to scorched. At present it only supports iterating through the whole result set without having to explicitly fetch each page (the iterator does that), which I imagine is the most common use case.
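The cursorMark loop from the Solr docs can be sketched like this, written against a generic fetch(params) callable so it runs without a live Solr; the wiring into scorched's iterator would look similar:

```python
def iter_all(fetch, base_params):
    """Yield every document, fetching page after page via cursorMark.
    `fetch` takes a params dict and returns Solr's decoded JSON response."""
    params = dict(base_params, cursorMark="*")
    # Deep paging requires a sort that includes the uniqueKey field;
    # "id asc" here is an assumption about the schema.
    params.setdefault("sort", "id asc")
    while True:
        response = fetch(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == params["cursorMark"]:
            break  # cursor unchanged: the result set is exhausted
        params["cursorMark"] = next_cursor
```

The termination condition (stop when the cursor stops advancing) is the part the Solr docs insist on, since the last page can still contain documents.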

I'd like to contribute this. Should I just send you the diffs (to results.py and search.py) or open a pull request?

best

-Simon

Edismax queries are parsed incorrectly

Example:

Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scorched import SolrInterface
>>> si = SolrInterface("http://localhost:8983/solr/mysearchengine/")
>>> si.query('foo +bar -baz').alt_parser('edismax').options()
{'defType': 'edismax', 'q': 'foo\\ \\+bar\\ \\-baz'}

Notice how valid edismax syntax is being escaped: spaces, +, and - no longer have special meaning, and so the edismax parser will not actually look at them.

The results I get from this query do not reflect my intent: I get documents that do not include bar and do include baz. The Solr log shows this query:

q=foo\+\%2Bbar\+\-baz

If I run the same query in Solr Admin (which shows correct results), then the Solr log shows this query:

q=foo+%2Bbar+-baz

I think that when using an "alternative parser", Scorched should not preprocess the query like this. It means the alternative parser doesn't even have a chance to parse anything.

Thoughts?

Exact search (double quotes) ignored

I have a weird problem. I'm querying a Solr 5.3 instance from Django through Scorched. It all works great as long as I don't run an exact-match query. In other words,

q=something something else

returns exactly the same result as:

q="something something else"

The culprit, as far as I can see, is the actual query which Django throws at Solr. In fact, for the second case this is:

q="something+something+else"

So, in other words, the " character is escaped. Am I right? How do I tell Solr that when I query something between double quotes I want an exact match?

In the Solr admin webpage it all works well, i.e. if I search for "something something else" I get the correct result.

I'm not sure this is a Scorched problem or not. Does it have something to do with filters/tokenizers (e.g. solr.MappingCharFilterFactory)?

Solr4 join support

Any plans to support solr4 join queries?

I had an open pull request with join support for sunburnt (tow/sunburnt#88). If I can clean that up and/or reimplement to work with scorched, would a pull request for join support be welcomed?

__len__ method on SolrResponse only returns number of rows

The SolrResponse object has a __len__ method with very simple code right now:

def __len__(self):
    if self.groups:
        return len(getattr(self.groups, self.group_field)['groups'])
    else:
        return len(self.result.docs)

I wrote the third line of that (the grouped branch), copying it from the last line, but both branches are wrong. The updated code should be:

def __len__(self):
    if self.groups:
        return getattr(self.groups, self.group_field)['ngroups']
    else:
        return self.result.numFound

This way things like paginators will be able to know the full size of the response.

Dot in the field name and query composition

My issue is fairly simple: my index contains fields that have dots (.) in them, for instance object.id.

I try to create a range query:

search.query(object.id__lt=before)

But that's not legal Python syntax for a keyword argument.

What I have found so far is to add a LuceneQuery manually:

search.query_obj.add_range('object.id', 'lt', before)

Is there a better way to work around this problem?
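One possible workaround, assuming query() accepts arbitrary **kwargs (demonstrated here with a stand-in function): keyword names with dots can't be written literally, but they can still be passed by **-unpacking a dict, since Python only restricts the literal call syntax, not the key strings themselves:

```python
# Stand-in for search.query(); scorched's real method builds a query
# from these kwargs instead of returning them.
def query(**kwargs):
    return kwargs

# 'object.id__lt' is not a valid identifier, but **-unpacking allows it.
opts = query(**{"object.id__lt": 42})
```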

Default mode should be read only

This is a small thing that I just discovered while looking at the code, but if you don't supply a mode parameter to your SolrInterface, you get a read/write interface by default. The code in the __init__ method is:

    if mode == 'r':
        self.writeable = False
    elif mode == 'w':
        self.readable = False

This is a design question, but I think this would be better if the default were read-only. That would require people to create writable interfaces explicitly, which seems important.

This is a breaking API change, so we'll want to consider it carefully, but it seems like a good direction to me.
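A sketch of the proposed default, using a stand-in class (and the 'rw' mode is my own addition here, to make read/write opt-in and explicit; it is not part of scorched's current API):

```python
class SolrInterface:
    """Stand-in sketch: default to read-only unless a mode is given."""

    def __init__(self, url, mode="r"):
        self.url = url
        self.readable = mode in ("r", "rw")
        self.writeable = mode in ("w", "rw")
```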

Consider adding regex support

I would like to add regex support for Solr 4+. I was considering doing it as scorched.strings.RegexpString. It looks pretty straightforward. Would you consider accepting a PR (with tests) for this feature?

Include `pdate` as a date type?

Hello,

it seems that the example files in recent Solr distributions define dates using a pdate, not date, type in the schema.xml file:

    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

This type is not recognised as a date type by Scorched, because date types are collected by the method SolrInterface._extract_datefields in the file connection.py:

    def _extract_datefields(self, schema):
        ret = [x['name'] for x in
               schema['fields'] if x['type'] == 'date']
        ret.extend([x['name'] for x in schema['dynamicFields']
                    if x['type'] == 'date'])
        return ret

Recognising pdate as a date could be achieved by replacing the two occurrences of the test:

if x['type'] == 'date'

by:

if x['type'] in ('date', 'pdate')

For now, the workaround is to rename pdate to date in the schema file.
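The suggested change, written out as a standalone version of the method for illustration (the real one is a method on SolrInterface):

```python
# Both 'date' (TrieDateField) and 'pdate' (DatePointField) count as dates.
DATE_TYPES = ("date", "pdate")

def extract_datefields(schema):
    ret = [x["name"] for x in schema["fields"]
           if x["type"] in DATE_TYPES]
    ret.extend(x["name"] for x in schema["dynamicFields"]
               if x["type"] in DATE_TYPES)
    return ret
```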

group query support

Any plans to support result grouping?

https://cwiki.apache.org/confluence/display/solr/Result+Grouping

I might be able to take a look at implementing this, would a pull request be welcomed? Any thoughts on how you'd like to see it implemented?

I have a need for this, and was planning to start with the simple group result format, because it would require the least change in handling and displaying results.

Scorched 0.12.0 no longer compatible with httplib2 caching

Hello,
I've just come across a problem which arose after upgrading scorched from 0.11.0 to 0.12.0, related to the use of caching in the httplib2 module. The following code:

from scorched import SolrInterface
from httplib2 import Http
si = SolrInterface(url='http://localhost:8983/solr/XXX', http_connection=Http('/tmp'))

works with scorched <= 0.11.0, but produces the following error with 0.12.0:

  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 292, in __init__
    self.schema = self.init_schema()
  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 298, in init_schema
    self.remote_schema_file))
  File "/Users/daverio/pydev/lib/python3.5/site-packages/scorched/connection.py", line 73, in request
    return self.http_connection.request(*args, **kwargs)
  File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 1176, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "/Users/daverio/pydev/lib/python3.5/site-packages/httplib2/__init__.py", line 148, in urlnorm
    raise RelativeURIError("Only absolute URIs are allowed. uri = %s" % uri)
httplib2.RelativeURIError: Only absolute URIs are allowed. uri = GET

I haven't taken the time to investigate the problem yet. I'm not sure if I should continue using httplib2 caching.

Multi-valued date fields cannot be indexed

When you try to index an item with a multi-valued date field, you run into this error:

In [14]: sun.add(judy.as_search_dict())
---------------------------------------------------------------------------
SolrError                                 Traceback (most recent call last)
<ipython-input-14-c11bdcf59b84> in <module>()
----> 1 sun.add(judy.as_search_dict())

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in add(self, docs, chunk, **kwargs)
    343         ret = []
    344         for doc_chunk in grouper(docs, chunk):
--> 345             update_message = json.dumps(self._prepare_docs(doc_chunk))
    346             ret.append(scorched.response.SolrUpdateResponse.from_json(
    347                 self.conn.update(update_message, **kwargs)))

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/connection.py in _prepare_docs(self, docs)
    319                     continue
    320                 if scorched.dates.is_datetime_field(name, self._datefields):
--> 321                     value = str(scorched.dates.solr_date(value))
    322                 new_doc[name] = value
    323             prepared_docs.append(new_doc)

/home/mlissner/.virtualenvs/courtlistener/local/lib/python2.7/site-packages/scorched/dates.py in __init__(self, v)
     93         else:
     94             raise scorched.exc.SolrError(
---> 95                 "Cannot initialize solr_date from %s object" % type(v))
     96 
     97     @staticmethod

SolrError: Cannot initialize solr_date from <type 'list'> object

This appears to be because of the code here, which assumes that date fields are never multi-value:

def _prepare_docs(self, docs):
    prepared_docs = []
    for doc in docs:
        new_doc = {}
        for name, value in list(doc.items()):
            # XXX remove all None fields this is needed for adding date
            # fields
            if value is None:
                continue
            if scorched.dates.is_datetime_field(name, self._datefields):
                # This is where the code needs a tweak, I'd say:
                value = str(scorched.dates.solr_date(value))
            new_doc[name] = value
        prepared_docs.append(new_doc)
    return prepared_docs

I can think of two solutions here. We can either interrogate the schema to see if the field is multi-valued and assume a list in that case, or we can check whether we got a list and assume that means it's a multi-valued field.

I'd be happy to implement either solution, if desired.
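The second option could be sketched like this, with solr_date stubbed via isoformat() for illustration (scorched's real scorched.dates.solr_date does more than this):

```python
import datetime

def solr_date(v):
    # Stand-in for scorched.dates.solr_date: Solr expects UTC ISO-8601.
    return v.isoformat() + "Z"

def prepare_date_value(value):
    """Convert a date value for indexing; if it's a list (multi-valued
    field), convert each element instead of crashing."""
    if isinstance(value, list):
        return [solr_date(v) for v in value]
    return solr_date(value)
```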
