
sunburnt's Introduction

Sunburnt

Sunburnt is a Python-based interface for working with the Apache Solr search engine.

It was written by Toby White <[email protected]> for use in the Timetric platform.

Please send queries/comments/suggestions to the mailing list.

Bugs can be filed on the issue tracker.

It's tested with Solr 1.4.1 and 3.1; previous versions were known to work with 1.3 and 1.4 as well.

Full documentation can be found at http://opensource.timetric.com/sunburnt/index.html.

Dependencies

  • Requirements:

  • Strongly recommended:

    • mx.DateTime

      Sunburnt will happily deal with dates stored either as Python datetime objects, or as mx.DateTime objects. The latter are preferable, having better semantics and a wider representation range. They will be used if present, otherwise sunburnt will fall back to Python datetime objects.

    • pytz

      If you're using native Python datetime objects with Solr (rather than mx.DateTime objects) you should also have pytz installed to guarantee correct timezone handling.

  • Optional (only to run the tests)

sunburnt's People

Contributors

arafalov, davidjb, evansd, foobacca, helix84, lawlesst, mehaase, mlissner, ogrisel, pankajkgarg, rboulton, rlskoeser, simon-liu, tgs, tow


sunburnt's Issues

facet.query issues

I'm new to Python & Django, but pretty handy with Solr. I'm trying to facet on a query instead of just a field, so here's what I added:

diff --git a/sunburnt/search.py b/sunburnt/search.py
index 3eb0b6b..8aebaad 100644
--- a/sunburnt/search.py
+++ b/sunburnt/search.py
@@ -251,12 +251,13 @@ class LuceneQuery(object):


class SolrSearch(object):
-    option_modules = ('query_obj', 'filter_obj', 'paginator', 'more_like_this', 'highlighter', 'faceter', 'sorter')
+    option_modules = ('query_obj', 'filter_obj', 'facet_obj', 'paginator', 'more_like_this', 'highlighter', 'faceter', 'sorter')
    def __init__(self, interface):
        self.interface = interface
        self.schema = interface.schema
        self.query_obj = LuceneQuery(self.schema, 'q')
        self.filter_obj = LuceneQuery(self.schema, 'fq')
+        self.facet_obj = LuceneQuery(self.schema, 'facet.query')
        self.paginator = PaginateOptions(self.schema)
        self.more_like_this = MoreLikeThisOptions(self.schema)
        self.highlighter = HighlightOptions(self.schema)
@@ -283,6 +284,10 @@ class SolrSearch(object):
    def query(self, *args, **kwargs):
        self.query_obj.add(args, kwargs)
        return self
+    
+    def facet_query(self, *args, **kwargs):
+        self.facet_obj.add(args, kwargs)
+        return self

    def exclude(self, *args, **kwargs):
        self.query(~self.Q(*args, **kwargs))

Am I close? It works, but I still need to add facet_by("somefield") to get the actual facet=true added to the query. Not sure if I should try to merge that code with the above or leave them separate and just use that workaround?

execute().facet_counts.facet_queries

works, though, so it's aware of facet_queries at least in theory? Is there another way I should be using facet queries?

Thanks!

Unknown field_class 'solr.CurrencyField'

Hi,

I can't connect to Solr; the error below is raised.

My version of Solr is the latest stable release, 3.6.

si = sunburnt.SolrInterface("http://192.168.10.202:8080/solr/core0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/sunburnt/sunburnt.py", line 153, in __init__
    self.init_schema()
  File "/Library/Python/2.7/site-packages/sunburnt/sunburnt.py", line 164, in init_schema
    self.schema = SolrSchema(schemadoc)
  File "/Library/Python/2.7/site-packages/sunburnt/schema.py", line 394, in __init__
    = self.schema_parse(f)
  File "/Library/Python/2.7/site-packages/sunburnt/schema.py", line 414, in schema_parse
    name, field_type_class = self.field_type_factory(field_type_node)
  File "/Library/Python/2.7/site-packages/sunburnt/schema.py", line 442, in field_type_factory
    raise SolrError("Unknown field_class '%s'" % class_name)
sunburnt.schema.SolrError: Unknown field_class 'solr.CurrencyField'

dynamic fields

Sunburnt looks like it can't handle dynamic field definitions, or at least, I get a SolrError "No such field in current schema" when I try to add something to a dynamic field.

Is this something I'm doing wrong, or does Sunburnt not currently support dynamic fields? If not, will it soon?
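
For context, Solr dynamic field definitions are glob patterns such as *_s or attr_*. Below is a minimal sketch of how a client could resolve a field name against such patterns. It assumes that explicitly declared fields take priority and that the first matching pattern wins; match_field_name and the sample patterns are hypothetical and are not part of sunburnt's API.

```python
from fnmatch import fnmatchcase


def match_field_name(name, fields, dynamic_patterns):
    """Return the schema entry that applies to `name`, or None.

    Hypothetical helper: `fields` is a set of declared field names,
    `dynamic_patterns` a list of glob patterns like "*_s".
    """
    # Explicitly declared fields take priority over dynamic patterns.
    if name in fields:
        return name
    # Otherwise fall back to the first dynamic pattern that matches.
    for pattern in dynamic_patterns:
        if fnmatchcase(name, pattern):
            return pattern
    return None
```

With this, a document key such as color_s would resolve to the *_s definition instead of raising "No such field in current schema".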

Support date range facets

AFAICT, faceting by date ranges is not natively supported in sunburnt 0.6.

I added simple support for date range facets last year for a project I was working on, but I never got around to cleaning it up and submitting a merge request:

Example in action (see "Modification Date" facet): http://scapsync.com/search?q=flash&solrDocumentType=cve

Committed at: https://github.com/mehaase/sunburnt/commit/9d31af3b166eed5c4cb4e7bbda0dd297a47d9229

Is this something you're interested in? I was in a rush when I did it -- and it was a while back -- so it's ugly code (spread between sunburnt and my client, not cleanly factored apart) and I'm not even sure if it would merge cleanly anymore. But if you like the idea and you can provide some feedback, I'd be happy to clean it up and submit a merge request.

Add support for solr.LatLonType

The default schema.xml of the example folder of solr 3.1 is deemed invalid by sunburnt with the following exception:

Traceback (most recent call last):
 ...
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/sunburnt.py", line 73, in __init__
    self.schema = SolrSchema(schemadoc)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 208, in __init__
   = self.schema_parse(f)
  File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 233, in schema_parse
    raise SolrError("Invalid schema.xml: %s field_type undefined" % type)
sunburnt.schema.SolrError: Invalid schema.xml: location field_type undefined

The cause seems to be the lack of a suitable datatype definition for solr.LatLonType.

sunburnt.schema SolrError

Hi,
I am using Apache solr with sunburnt. I am getting this error

sunburnt.schema.SolrError: unexpected field found in result (field name: version)

version is defined in the file as type:long. When I print field name in python, it is printed as None.

Any help would be highly appreciated.

Thanks,
Zee

As of Solr 4.8, sunburnt will fail to parse the schema

Cause: https://issues.apache.org/jira/browse/SOLR-5228 (fields and types sections have been flattened).
Tested with sunburnt 0.6 against Solr 4.8 example schema, but the Master seems to have the same XPath expressions:

Traceback (most recent call last):
  File "/Users/arafalov/Projects/solr-client-book/code/PythonClients/sunburntClient/test.py", line 5, in <module>
    solr_interface = sunburnt.SolrInterface("http://localhost:8983/solr/")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 153, in __init__
    self.init_schema()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 164, in init_schema
    self.schema = SolrSchema(schemadoc)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/schema.py", line 398, in __init__
    if self.unique_key else None
KeyError: 'id'
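
One possible way to cope with both layouts is to look for field elements in the old nested position and in the new flattened position. The sketch below uses the standard library's ElementTree rather than the XPath calls in sunburnt itself, and find_field_nodes is a hypothetical name, so treat it as an illustration of the idea only.

```python
import xml.etree.ElementTree as ET


def find_field_nodes(schema_root):
    """Collect <field> elements from either schema layout."""
    # Older layout: <schema><fields><field .../></fields></schema>
    nested = schema_root.findall("fields/field")
    # Flattened layout: <schema><field .../></schema>
    flat = schema_root.findall("field")
    return nested + flat


# Two tiny sample schemas, one per layout.
old_style = ET.fromstring('<schema><fields><field name="id"/></fields></schema>')
new_style = ET.fromstring('<schema><field name="id"/></schema>')
```

The same approach would apply to the fieldType elements, which used to sit under a types section.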

Support more comprehensive options on add/update

When adding to or deleting documents from Solr via the update URL, you have the ability to specify a commit inline (/solr/update?commit=true). You also have the ability to block until the new index has been flushed to disk (waitFlush) or until a new searcher is made available (waitSearcher). It would be nice to have these options in sunburnt on server.add and server.delete (perhaps as optional arguments).

See here for more info:

http://wiki.apache.org/solr/UpdateXmlMessages
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/AbstractUpdateRequest.html#setAction(org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION, boolean, boolean)
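
As a rough sketch of what the request would look like, the options above could be passed as query parameters on the update URL. The function name update_url and its keyword arguments are hypothetical (this is not sunburnt's API), and the example is written for Python 3.

```python
from urllib.parse import urlencode


def update_url(base_url, commit=None, wait_flush=None, wait_searcher=None):
    """Build an update URL, adding only the options that were given."""
    params = []
    for key, value in (("commit", commit),
                       ("waitFlush", wait_flush),
                       ("waitSearcher", wait_searcher)):
        if value is not None:
            # Booleans are sent as lowercase "true"/"false" strings.
            params.append((key, "true" if value else "false"))
    url = base_url.rstrip("/") + "/update"
    return url + ("?" + urlencode(params) if params else "")
```

Leaving an option as None keeps it out of the URL, so the server defaults still apply.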

Index-time boost

Index-time boost can be applied to document and to field - http://wiki.apache.org/solr/UpdateXmlMessages
I implemented it as _boost and _boost_fieldname for array, but this naming won't work for objects - therefore object tests are commented.
Any better ideas for objects?

diff --git a/sunburnt/schema.py b/sunburnt/schema.py
index 4d39cb9..d561273 100644
--- a/sunburnt/schema.py
+++ b/sunburnt/schema.py
@@ -518,12 +518,15 @@ class SolrUpdate(object):
         self.schema = schema
         self.xml = self.add(docs)

-    def fields(self, name, values):
+    def fields(self, name, values, boost=None):
         # values may be multivalued - so we treat that as the default case
         if not hasattr(values, "__iter__"):
             values = [values]
         field_values = [self.schema.field_from_user_data(name, value) for value in values]
-        return [self.FIELD({'name':name}, field_value.to_solr())
+        attrs = {'name':name}
+        if boost is not None:
+            attrs["boost"] = str(boost)
+        return [self.FIELD(attrs, field_value.to_solr())
             for field_value in field_values]

     def doc(self, doc):
@@ -534,9 +537,14 @@ class SolrUpdate(object):
         if not doc:
             return self.DOC()
         else:
-            return self.DOC(*reduce(operator.add,
-                                    [self.fields(name, values)
-                                     for name, values in doc.items()]))
+            body = reduce(operator.add,
+                          [self.fields(name, values, doc.get("_boost_"+name))
+                           for name, values in doc.items()
+                           if not name.startswith("_boost")])
+            if "_boost" in doc:
+                return self.DOC(boost=str(doc["_boost"]), *body)
+            else:
+                return self.DOC(*body)

     def add(self, docs):
         if hasattr(docs, "items") or not hasattr(docs, "__iter__"):
diff --git a/sunburnt/test_schema.py b/sunburnt/test_schema.py
index e5d3209..f59ea05 100644
--- a/sunburnt/test_schema.py
+++ b/sunburnt/test_schema.py
@@ -220,6 +220,14 @@ class D_with_callables(object):
     def my_arse(self):
         return self._my_arse

+#class D_with_boost(object):
+#    def __init__(self, int_field, text_field, boost=None, boost_int_field=None):
+#        self.int_field = int_field
+#        self.text_field = text_field
+#        if boost:
+#            self._boost = boost
+#        if boost_int_field:
+#            self._boost_int_field = boost_int_field

 update_docs = [
     # One single dictionary, not making use of multivalued field
@@ -263,6 +271,22 @@ update_docs = [
     # Check that strings aren't query-escaped
     (D(1, "a b", True),
      """<add><doc><field name="int_field">1</field><field name="text_field">a b</field></doc></add>"""),
+
+
+    # Serialize _boost for whole document
+    ({"int_field":1, "text_field":"a b", "_boost": 2.5},
+     """<add><doc boost="2.5"><field name="int_field">1</field><field name="text_field">a b</field></doc></add>"""),
+
+#    (D_with_boost(1, "a b", 2.5, None),
+#     """<add><doc boost="2.5"><field name="int_field">1</field><field name="text_field">a b</field></doc></add>"""),
+
+    # Serialize _boost for single field
+    ({"int_field":1, "text_field":"a b", "_boost_int_field": 2.5},
+     """<add><doc><field boost="2.5" name="int_field">1</field><field name="text_field">a b</field></doc></add>"""),
+
+#    (D_with_boost(1, "a b", None, 2.5),
+#     """<add><doc><field name="int_field" boost="2.5">1</field><field name="text_field">a b</field></doc></add>"""),
+
     ]

 def check_update_serialization(s, obj, xml_string):

Change the printing of schema objects

for facet,v in sorted(si.schema.fields.items()):
    print facet, v
    print type(v)
    print isinstance(v, sunburnt.schema.SolrFieldType_int)

Type will print FIELD_NAME <sunburnt.schema.SolrFieldType_int object at 0x1a4a8d0> but

    print isinstance(v, sunburnt.schema.SolrFieldType_int)
AttributeError: 'module' object has no attribute 'SolrFieldType_int'

The correct way is print isinstance(v, sunburnt.schema.SolrIntField). Posting this just in case anyone thinks that SolrFieldType_int exists.

Make sure invalid XML characters are stripped

Invalid XML chars cause lxml to throw

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Pysolr fixes this by filtering away special characters (see django-haystack/pysolr#87).

As the library consumer may not know that these characters aren't allowed when putting stuff into solr, sunburnt should do the same.

Feel free to close this issue if you think this isn't supposed to be sunburnt's job to do.
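
A minimal sketch of such a filter, based on the character ranges allowed by the XML 1.0 specification. The name strip_invalid_xml_chars is made up for this example, and the code is written for Python 3 strings; it is not taken from pysolr or sunburnt.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that cannot appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)
```

Running field values through a function like this before serialization would avoid the ValueError, at the cost of silently dropping the offending characters.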

Optimize filter queries

The way sunburnt uses filter queries (the fq parameter) is far from optimal from a caching point of view.

Solr allows more than one fq parameter per request. Each filter query is executed and cached separately, and the results are then intersected using very fast algorithms.

Sunburnt does not make it possible to build a query with more than one fq parameter.

For example:

si.query().filter(field1=True).filter(field2=False)

will produce something like this:

?q=*:*&fq=field1:true AND field2:false

but, from a caching point of view, it would be much better to produce something like this:

?q=*:*&fq=field1:true&fq=field2:false

Both cases give the same results, but in the first case Solr caches and reuses the result set for field1:true AND field2:false as a single entry. In the second case, two separate result sets are cached and reused, one for field1 and one for field2.

After changing the query like this:
si.query().filter(field1=True).filter(field2=True)
the cache for field1:true would already be populated and reused, so only the field2:true query would need to be executed and cached.

This change would still allow building a query with "AND" in fq like this:
si.query().filter(si.Q(field1=True) & si.Q(field2=False))
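
To illustrate the request shape being proposed, here is a small Python 3 sketch that encodes two separate fq parameters. It only shows the URL encoding and does not reflect how sunburnt builds its parameters internally.

```python
from urllib.parse import urlencode

# A list of pairs lets the same key ("fq") appear more than once.
params = [
    ("q", "*:*"),
    ("fq", "field1:true"),
    ("fq", "field2:false"),
]
query_string = urlencode(params)
```

Each ("fq", ...) pair becomes its own fq parameter in the query string, which is what allows Solr to cache the two filters independently.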

DisMax Support

Currently sunburnt seems to have no support for Solr's DisMax query mode. However, DisMax mode has a very useful feature: you can specify the "qf" parameter to search across multiple fields. To achieve that with the current version, I have to do lots of AND and OR operations like "(title:foo OR body:foo) AND (title:bar OR body:bar) ...". I have more than 20 fields to search, and this creates a lot of overhead as the number of keywords grows. So is there any plan to support that?
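
For comparison, this is roughly what the raw request parameters look like when DisMax is used directly. It is a Python 3 sketch of the Solr parameters only (defType and qf), not a sunburnt API; the field names title and body come from the example above.

```python
from urllib.parse import urlencode

params = {
    "defType": "dismax",   # select the DisMax query parser
    "q": "foo bar",        # plain keywords, no field prefixes
    "qf": "title body",    # fields to search, separated by spaces
}
query_string = urlencode(params)
```

How many of the keywords must match is governed by DisMax's mm parameter, so reproducing the strict AND behaviour above would likely need that set as well.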

Proposing a PR to fix a few small typos

Issue Type

[x] Bug (Typo)

Steps to Replicate and Expected Behaviour

  • Examine docs/queryingsolr.rst and observe undescores, however expect to see underscores.
  • Examine docs/queryingsolr.rst and observe stylye, however expect to see style.
  • Examine sunburnt/walktree.py and observe previoulsy, however expect to see previously.
  • Examine docs/connectionconfiguration.rst and observe depnding, however expect to see depending.
  • Examine docs/contributing.rst and observe contributro, however expect to see contributor.

Notes

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/sunburnt/pull/new/bugfix_typos

Thanks.

SolrResponse does not show query params

Hi,
I have a question. The docs have a paragraph explaining the SolrResponse methods, and one of them is response.params.

I ran a query against my Solr instance with one parameter, but params does not return any parameters. Is this the expected behavior?

In [11]: res = si.query(id='500428550').execute()

In [12]: res.params

In [13]: res
Out[13]: <sunburnt.schema.SolrResponse at 0x10ef8a550>

In [14]: print res
1 results found, starting at #0

Thx

/Yago

Variable set but never used

The serialize_term_queries method sets but never uses a variable called field:

def serialize_term_queries(self, terms):

Given that no one has noticed this before, I presume it's just unnecessary code rather than a more severe bug.

Allow boost_relevancy() to use the same queries as query()

I have a query like this:

q = query.boost_relevancy(boost1, field1__lte=1000000).\
    boost_relevancy(boost2, field2__lte=10)

However, an exception is thrown saying that the fields don't exist in the schema. This is because the __lte part is not being taken into account.

There is already code in add() for checking field names without the __lte part, and by changing add_boost() to use this code, the exception goes and the query runs correctly:

def add_boost(self, kwargs, boost_score):
    for k, v in kwargs.items():
        try:
            field_name, rel = k.split("__")
        except ValueError:
            field_name, rel = k, 'eq'
        field = self.schema.match_field(field_name)
        if not field:
            if (k, v) != ("*", "*"):
                # the only case where wildcards in field names are allowed
                raise ValueError("%s is not a valid field name" % field_name)
        elif not field.indexed:
            raise SolrError("Can't query on non-indexed field '%s'" % field_name)
        value = field.instance_from_user_data(v)
    self.boosts.append((kwargs, boost_score))

Apologies for not providing a patch/pull request, but I'll try to follow up after next week -- for now I have a serious deadline to meet!

filtering based on dictionary key value pairs

When creating a query object I'm adding a filter() object to the query in the form of a dictionary like so:

drilldown = {"site":"www.reddit.com","name":"foo"}

My expectation for this query would be for it to construct and add the parameters

fq=site:www.reddit.com+AND+name:foo

to the query string, but what I see happens is

fq=www.reddit.com+AND+foo

In my opinion it should be doing a fielded search to drill down further on the query results.

Thanks,
Alp

Overly brittle to bad data

I've regularly had sunburnt tripping over bad data and just flipping out. I added a little info to help debugging with this patch.

I'm wondering if it might be worth having a "non-strict" mode where it just makes the field value null rather than throwing an exception. This would suit my use case fine. It could be set as an argument to the constructor for SolrInterface, or maybe just on a per-query basis.

An example of the data I trip up on is:


I could have a go at writing this myself, but thought I'd just check if you'd be up for a "non-strict" mode.

Fallback to POST when GET query is too long.

With long & complex queries, the URL constructed by sunburnt will become 1000s of characters long, causing the query to fail.

We should fall back to POSTing queries in this case.

We still want to preserve GET queries where possible, they can be cached much better at the HTTP layer.
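
A minimal sketch of that decision, written for Python 3. The 2048-character threshold and the name build_select_request are assumptions made for illustration; the real limit depends on the server and any proxies in between.

```python
from urllib.parse import urlencode

MAX_GET_URL_LENGTH = 2048  # assumed threshold; tune for your server or proxy


def build_select_request(base_url, params, max_url_length=MAX_GET_URL_LENGTH):
    """Return (method, url, body, headers) for a select request."""
    select_url = base_url.rstrip("/") + "/select"
    encoded = urlencode(params)
    get_url = select_url + "?" + encoded
    if len(get_url) <= max_url_length:
        # Short enough: keep GET so HTTP caches can still be used.
        return "GET", get_url, None, {}
    # Too long: send the same parameters form-encoded in a POST body.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return "POST", select_url, encoded.encode("utf-8"), headers
```

The caller can hand the returned tuple to whatever HTTP client is in use, so only oversized queries lose HTTP-level caching.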

solr syntax for searching everything is incorrect

in search.py (currently lines 449-450):

if 'q' not in options:
        options['q'] = '*' # search everything

This generates an error response from Solr. I believe it should actually be:

if 'q' not in options:
        options['q'] = '*:*' # search everything

can't query on UUID field

Hi, I'm unable to search on a UUIDField. Is this by design? (Or am I just stupid and doing it wrong?!) I couldn't see any examples in the docs or from a Google search. Your help is appreciated. Thanks.

in schema.xml (field "entryid" is defined as a solr.UUIDField):
<field name="entryid" type="uuid" indexed="true" stored="true" multiValued="false" required="true"/>
(this field in my config is also the uniqueKey)

running a simple "entryid:018596d0-2077-4089-a676-b8cfc72b76b1" query from the solr web admin interface returns the correct doc.

in python:

r=si.query(entryid='018596d0-2077-4089-a676-b8cfc72b76b1').execute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/sunburnt.py", line 219, in query
    return q.query(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 387, in query
    newself.query_obj.add(args, kwargs)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 296, in add
    self.add_exact(field_name, v, terms_or_phrases)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 320, in add_exact
    this_term_or_phrase = term_or_phrase or self.term_or_phrase(inst.value)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 347, in term_or_phrase
    return 'terms' if self.default_term_re.match(arg) else 'phrases'
TypeError: expected string or buffer

also tried as a python UUID:

r=si.query(entryid=uuid.UUID('018596d0-2077-4089-a676-b8cfc72b76b1')).execute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/sunburnt.py", line 219, in query
    return q.query(*args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 387, in query
    newself.query_obj.add(args, kwargs)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 296, in add
    self.add_exact(field_name, v, terms_or_phrases)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 320, in add_exact
    this_term_or_phrase = term_or_phrase or self.term_or_phrase(inst.value)
  File "/usr/local/lib/python2.6/dist-packages/sunburnt/search.py", line 347, in term_or_phrase
    return 'terms' if self.default_term_re.match(arg) else 'phrases'
TypeError: expected string or buffer

Adding maxScore to result

Hello -

I'm a new sunburnt user - so far it is working amazingly well for my project and I look forward to following the project.

For the work I'm doing, I've needed maxScore from Solr, which was easy for me to add to the SolrResult object. (See the line with self.maxScore below; that's the only change.)

(sunburnt/schema.py):

class SolrResult(object):
    def __init__(self, schema, node):
        self.schema = schema
        self.name = node.attrib['name']
        self.numFound = int(node.attrib['numFound'])
        self.maxScore = node.attrib['maxScore'] # this is the only new line
        self.start = int(node.attrib['start'])
        self.docs = [schema.parse_result_doc(n) for n in node.xpath("doc")]

For now I am just ensuring that my virtualenv has that line in its sunburnt install, but would obviously love to see support added to the library at some point. I haven't had a chance to write any tests yet, but notice that perhaps a test of SolrResult initialization could be useful in general. I'm happy to work on that and other tests to support the project - just figured I'd send along the small issue comment to get started.

Best,
Megan.
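
One caveat with the maxScore change above: node.attrib['maxScore'] raises KeyError if the result element has no maxScore attribute, which may be the case when scores are not requested. A slightly more defensive sketch, using the standard library's ElementTree and a hypothetical read_max_score helper, could look like this:

```python
import xml.etree.ElementTree as ET


def read_max_score(result_node):
    """Return maxScore as a float, or None when the attribute is missing."""
    raw = result_node.attrib.get("maxScore")
    return float(raw) if raw is not None else None


# Sample <result> elements with and without the attribute.
with_score = ET.fromstring(
    '<result name="response" numFound="1" start="0" maxScore="1.5"/>')
without_score = ET.fromstring(
    '<result name="response" numFound="1" start="0"/>')
```

Converting to float also keeps the attribute consistent with numFound and start, which are already converted from strings.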

Error when using commitWithin parameter

sunburnt throws an error, when commitWithin parameter is provided, while adding a new document.

For example

import sunburnt
si = sunburnt.SolrInterface("http://localhost:8983/solr/")
si.add({"id": 432, "text": "Hello World!"}, commitWithin=10000)

The error message thrown is

SolrError: ({'transfer-encoding': 'chunked', 'status': '400', 'content-type': 'application/xml; charset=UTF-8'}, '\n\n4000For input string: "10000.0"400\n\n')

This is happening because commitWithin parameter is converted to float before the request is sent.
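
A sketch of the kind of fix this suggests: serialize the value as an integer number of milliseconds. format_commit_within is a hypothetical helper name, not a function in sunburnt.

```python
def format_commit_within(milliseconds):
    """Serialize commitWithin as an integer string, never as a float."""
    # int() drops the ".0" that str() would keep on a float.
    return str(int(milliseconds))
```

With this, a value that has been converted to 10000.0 along the way is still sent to Solr as "10000".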

DeprecationWarning in dates.py

sunburnt/dates.py:80: DeprecationWarning: integer argument expected, got float
  return datetime.datetime(**kwargs)

It looks like the seconds parameter is initialized to a float on line 43.
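
One way to avoid passing a float to datetime.datetime is to build the datetime without seconds and add the fractional seconds as a timedelta. This is only a sketch; build_datetime and its arguments are hypothetical and do not mirror the actual code in dates.py.

```python
import datetime


def build_datetime(year, month, day, hour=0, minute=0, seconds=0.0):
    """Build a datetime from parts where `seconds` may be fractional."""
    base = datetime.datetime(year, month, day, hour, minute)
    # timedelta accepts float seconds and carries the fraction
    # over into microseconds.
    return base + datetime.timedelta(seconds=seconds)
```

This keeps sub-second precision while every argument passed to the datetime constructor stays an integer.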

Escaping spaces = only phrase matches

The escape_for_lqs_term() method is escaping spaces to treat them as literals, so by default it's only matching exact phrases. Changing word order gives wildly different results. Is that really the expected behavior? Or do I possibly have an issue somewhere else in my schema?

Required fields are unspecified

I get this error when I try to add documents to Solr:

SolrError at /admin/page/page/1/
These required fields are unspecified:
['app_type', 'description']

What is odd about it is the documents still get added or updated without a problem.

I am using sunburnt==0.6

I am working in Django. Here is the specific model:

class Page(models.Model):
    title = models.CharField(max_length=200)
    slug = models.SlugField(unique=True)
    description = models.TextField()

    def __unicode__(self):
        return self.title

    def app_type(self):
        return 'page'

Here is my schema file:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="myapp" version="1.5">
        <types>
                <fieldType name="string" class="solr.StrField" multiValued="true" />
                <fieldType name="text" class="solr.TextField" />
        </types>
        <fields>
                <field name="pk" type="string" stored="true" required="true" />
                <field name="app_type" type="string" stored="true" required="true" />
                <field name="title" type="string" indexed="true" stored="true"/>
                <field name="slug" type="string" stored="true"/>
                <field name="description" type="text" indexed="true" stored="true" required="true" />
        </fields>
</schema>

Not suitable exception

When a request is made to an address where Solr is not bound, an AttributeError is raised: object has no attribute 'makefile', from /usr/lib/python2.7/httplib.py in __init__, line 346. Might it be better to throw a ConnectionException?

The non required parameter defaultSearchField in solr 3.6 causes search error

Comments from the schema.xml of solr 3.6

<!-- field for the QueryParser to use when an explicit fieldname is absent
DEPRECATED: specify "df" in your request handler instead. 

<defaultSearchField>text</defaultSearchField> -->

I had to uncomment this parameter in order to avoid:

/usr/local/lib/python2.7/dist-packages/sunburnt/search.pyc in add_exact(self, field_name, values, term_or_phrase)
    315             else:
    316                 raise SolrError("If field_name is '*', then only '*' is permitted as the query")
--> 317         insts = [field.instance_from_user_data(value) for value in values]
    318         for inst in insts:
    319             if isinstance(field, SolrUnicodeField):

AttributeError: 'NoneType' object has no attribute 'instance_from_user_data'

Add solr.TrieDateField

'solr.TrieDateField' is missing from solr_data_types in sunburnt.schema.SolrSchema

I think it just needs to be added as SolrDateField type.

Thanks,
Dan

Needs more documentation.

Please please write some more (or any, really) documentation. I have been trying for hours to find how I can filter on "fieldname:value", and it appears to be impossible. Is there a way to do that currently?

LocalParams syntax

Hi !

I was wondering if it would be easy to add support for LocalParams syntax, especially when dealing with faceting (multi-select facets as described in the Solr docs: http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams )

Something like this should work to retrieve doctype facets without taking doctype:pdf filter into account:

s.query('query').filter('{!tag=dt}doctype:pdf').facet_by('{!ex=dt}doctype')

but currently it won't work because sunburnt will try to get the "{!ex=dt}doctype" field in the schema which obviously doesn't exist.

One workaround is to monkey patch the schema.match_field method to strip the localparam tag but i have no idea if this is a good or bad thing :)

How would you do it ?
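
As a starting point for the workaround mentioned above, the local-params prefix can be split off before the field name is looked up in the schema, then put back when the parameter is serialized. The helper below is a hypothetical sketch; it does not handle quoted values that contain a closing brace.

```python
import re

# Matches a leading local-params block such as "{!ex=dt}".
_LOCAL_PARAMS_RE = re.compile(r"^\{![^}]*\}")


def split_local_params(value):
    """Split "{!ex=dt}doctype" into ("{!ex=dt}", "doctype")."""
    match = _LOCAL_PARAMS_RE.match(value)
    if match is None:
        # No prefix: the whole value is the field name.
        return "", value
    return match.group(0), value[match.end():]
```

The second element of the tuple is what would be passed to the schema lookup.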

Sunburnt not compatible with newer versions of Solr XML output

>>> import httplib2, sunburnt
>>> h = httplib2.Http()
>>> si = sunburnt.SolrInterface('http://localhost:8983/solr/items', http_connection=h)
>>> si.query(identifier='16dhnwtakze4po5x').execute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/aengus/.virtualenvs/mini_itemsearch/local/lib/python2.7/site-packages/sunburnt/search.py", line 599, in execute
    result = self.interface.search(**self.options())
  File "/home/aengus/.virtualenvs/mini_itemsearch/local/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 213, in search
    return self.schema.parse_response(self.conn.select(params))
  File "/home/aengus/.virtualenvs/mini_itemsearch/local/lib/python2.7/site-packages/sunburnt/schema.py", line 510, in parse_response
    return SolrResponse(self, msg)
  File "/home/aengus/.virtualenvs/mini_itemsearch/local/lib/python2.7/site-packages/sunburnt/schema.py", line 646, in __init__
    details['responseHeader'] = dict(details['responseHeader'])
KeyError: 'responseHeader'

This is with solr 2.4.0. It is caused by the fact that when Sunburnt makes this call, it calls: http://localhost:8983/solr/items/select/?q=identifier%3A16dhnwtakze4po5x . This returns data using <responseHeader> tags, instead of between <lst name="responseHeader"></lst>

Specifying a version of 2.2 functions as a work-around

Valid type list is missing SpatialRecursivePrefixTreeFieldType

This type was introduced in Solr 4.0 and has been used in the example schema since.

Exception caused:

Traceback (most recent call last):
  File "/Users/arafalov/Projects/solr-client-book/code/PythonClients/sunburntClient/test.py", line 5, in <module>
    solr_interface = sunburnt.SolrInterface("http://localhost:8983/solr/")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 153, in __init__
    self.init_schema()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/sunburnt.py", line 164, in init_schema
    self.schema = SolrSchema(schemadoc)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/schema.py", line 394, in __init__
    = self.schema_parse(f)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/schema.py", line 414, in schema_parse
    name, field_type_class = self.field_type_factory(field_type_node)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sunburnt/schema.py", line 442, in field_type_factory
    raise SolrError("Unknown field_class '%s'" % class_name)
sunburnt.schema.SolrError: Unknown field_class 'solr.SpatialRecursivePrefixTreeFieldType'

Empty strings shouldn't be indexed

If you create an object with attributes that are empty strings, they'll get indexed.

So:

item.attribute = ''

Will result in an item in your index with attribute = ''.

This is very easy to fix since there's already a check to make sure that attributes aren't None. All that needs to happen is for the line to also say, and a != ''.
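
A minimal sketch of the proposed check, applied to a plain dictionary of field values. drop_empty_values is a hypothetical name; the actual check in sunburnt lives wherever attributes are already tested against None.

```python
def drop_empty_values(doc):
    """Return a copy of `doc` without None or empty-string values."""
    return {name: value for name, value in doc.items()
            if value is not None and value != ""}
```

Note that other falsy values such as 0 or False are kept, since they are legitimate field values.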
