alastair / python-musicbrainzngs

Python bindings for Musicbrainz' NGS webservice

Home Page: http://python-musicbrainzngs.readthedocs.io/

License: Other

Python 100.00%

python-musicbrainzngs's People

sampsyo marineam scottr thomasvs laarmen ianmcorvidae motivator mineo smurfix krbaker rinbo-consulting palli81 doskir jsternberg inytar yasound mhtruong fructuscode phw redapple cloudtunes victorakabutu shridharmishra4 lucaliz uservidya madclicker testvidya11 navap imclab rbi13 kaninfod jdetrey vidyar rageshkrishna chrisnolan1992 paulshannon ric03uecs jesseweinstein frewsxcv rembo10 siebenschlaefer ruippeixotog dufferzafar arlevi felixdasgupta agateblue digideskio stefanor projectrecommend mediakraken-dependancies wuvt jonathanholvey pranavg189 itaybb fat84 bernd-wechner mhendu rikrd ohikuy lolo211017 gorgobacka matthewmessmer freso flesner horrendus rudynutbeij pythonthings music-apps hashhar sundarnagarajan tanupoo viv95 aereaux kishorkunal-raj dosoe blwuer simonhova spellew ritiek orfium aidanlw17 loveisgrief heavytony2 jordanlevy99 nikhilsebastiank astrojuanlu rauldav tct123 jimpatterson louson eigenric iq-scm fortefrankie whatsnowplaying gerion0 kon-sv

python-musicbrainzngs's Issues

automatic rate limiting

The server enforces a rate limit. Queries should be throttled automatically if they happen too quickly. There should be an option to turn this off for people who run a local server.
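
A minimal sketch of how such throttling could work, not the library's actual code. The `RateLimiter` class and all of its parameters are hypothetical; the `enabled` flag is the "turn this off" switch for local servers, and the injectable `clock` and `sleep` only exist to make the behaviour easy to test.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (hypothetical helper)."""

    def __init__(self, max_calls=1, period=1.0, enabled=True,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.enabled = enabled  # set to False when talking to a local server
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        if not self.enabled:
            return
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Each request function would call `wait()` once before touching the network.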

Catch socket errors that occur during read (not just open)

Currently, we use the _safe_open function to catch lots of errors when opening a URL and retry when necessary. For example, if we see a "connection reset by peer" error during URL open, it gets retried rather than propagated to the application.

However, errors like this can also occur during the data transfer, not just at opening. The call to message.read() inside mbxml.parse_message can raise socket errors that go unhandled.

The easy way to address this would be to move the read() call into _safe_open and make that function return a string instead of a file-like object.
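
Roughly what that could look like, as a sketch only. `read_with_retries` and its `open_url` argument are hypothetical stand-ins (the latter for whatever callable returns the file-like response); this is not the current `_safe_open`. Catching every `OSError` is deliberately crude, and a real version would be more selective about which errors are worth retrying.

```python
def read_with_retries(open_url, url, max_retries=3):
    """Open `url` and read the whole body, retrying on socket-level errors.

    Returns the body (bytes) instead of a file-like object, so errors raised
    during the transfer are handled in the same place as errors on open.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for _attempt in range(max_retries):
        try:
            response = open_url(url)
            try:
                return response.read()
            finally:
                response.close()
        except OSError as exc:  # socket errors are OSError subclasses in Python 3
            last_error = exc
    raise last_error
```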

Handle request errors

It would be useful to have the library make communication errors with the server "nicer" by not exposing lower-level exceptions. For example, XML parse errors could be translated into something like MalformedMBResponseError exceptions; HTTP timeouts and the like could be turned into ServerBusyErrors. This would greatly reduce the headaches for clients trying to implement robust queries.

At the same time, perhaps the library should be responsible for retrying under certain conditions (e.g., after 502 errors)?
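
To make the idea concrete, here is a small sketch of the translation step for XML parse errors. `WebServiceError` and `parse_response` are made-up names for illustration; `MalformedMBResponseError` is just the name floated above, not an existing class.

```python
import xml.etree.ElementTree as ET


class WebServiceError(Exception):
    """Hypothetical base class for everything the library raises."""


class MalformedMBResponseError(WebServiceError):
    """The server replied, but the body was not well-formed XML."""


def parse_response(body):
    """Parse an XML response body without leaking ElementTree's exception type."""
    try:
        return ET.fromstring(body)
    except ET.ParseError as exc:
        # Keep the original error as __cause__ for debugging.
        raise MalformedMBResponseError(str(exc)) from exc
```

Clients would then only need to catch `WebServiceError` and its subclasses.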

Make entity-list fields entitys instead

e.g., instead of having release["medium-list"] we should have release["mediums"]

user ratings, tags

Show the tags and ratings submitted by the authenticated user. Asking for them when no login info has been given should be an error.

ext:score support?

I looked around for a way to get the score back from various searches but I don't see one.

It looks like adding "{http://musicbrainz.org/ns/ext#-2.0}score" to the list of attributes in, for example, parse_recording will give me the attribute score.

I messed around with trying to fix the attributes with namespace but I'm too dim to figure it out (as is being done with ws:recording, etc).

Could someone more versed in this maybe add support for the ext:score attributes? They are very handy. Or maybe there's already a way and I'm just not seeing it?

Thanks!
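
For what it's worth, a standalone sketch of reading the namespaced attribute with ElementTree. It does not use the library's parse_recording at all; `recording_scores` is a hypothetical helper and the XML in `SAMPLE` is invented for illustration, not a real server response.

```python
import xml.etree.ElementTree as ET

MMD_NS = "http://musicbrainz.org/ns/mmd-2.0#"
EXT_NS = "http://musicbrainz.org/ns/ext#-2.0"

# Invented sample, shaped like a search result.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#"
          xmlns:ext="http://musicbrainz.org/ns/ext#-2.0">
  <recording-list>
    <recording id="example-id" ext:score="100"><title>Example</title></recording>
  </recording-list>
</metadata>"""


def recording_scores(xml_text):
    """Return one dict per recording, with the search score under 'ext:score'."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iter("{%s}recording" % MMD_NS):
        results.append({
            "id": rec.get("id"),
            # ElementTree stores namespaced attributes as "{uri}name".
            "ext:score": rec.get("{%s}score" % EXT_NS),
        })
    return results
```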

Change user agent

Clients should be able to set the user agent that gets sent with requests.

Multiple IPI

Moving from 1 to a list: http://tickets.musicbrainz.org/browse/MBS-2532

Cover art archive

We should support the new cover art archive API, either as part of pymb or as a separate library.

oauth support

Give error on (now) incorrect usage of ratelimiting method

The change to the rate limiting code means that if you call it with 2 integers like you used to, it breaks quite badly. Fix this.

Documentation

Ensure all docs are present and consistent

404 while fetching by ID should not be a simple ResponseError

Currently when issuing get_releases_by_discid I get a ResponseError for any of the status codes 400, 404 and 411.

However, I do think a 404 is quite distinct from other ResponseErrors.
When checking for a discid I want to know if the disc ID is not found on the server or if there really was a response error.

That should either be a None as a return value or an Exception distinguishable from ResponseError (without having to check the cause or message). It can be a derived class, though.
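
The derived-class option could be as small as the sketch below. The `ResponseError` here is a local stand-in for the library's class, and `NotFoundError` and `error_for_status` are hypothetical names.

```python
class ResponseError(Exception):
    """Stand-in for the library's ResponseError."""


class NotFoundError(ResponseError):
    """Hypothetical subclass raised only for HTTP 404."""


def error_for_status(status, message=""):
    """Pick the exception type for an HTTP error status."""
    if status == 404:
        return NotFoundError(message)
    return ResponseError(message)
```

Existing code that catches `ResponseError` keeps working, while new code can catch `NotFoundError` first to tell "disc ID not on the server" apart from other failures.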

make sure releases contain documentation and examples

It's been pointed out that the 0.3 tarball on pypi doesn't have docs or examples. These need to be added to setup.py, including the re-generation of docs before build.

Encode Unicode search query terms

If Unicode arguments are passed to the search function, the library eventually dies in urllib.urlencode(), which only supports byte strings. This library should encode arguments (using UTF-8, like the old library) when building the request.
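
A sketch of the encoding step, written for Python 3 so it can be run today (the failure described above is a Python 2 one; Python 3's `urlencode` already accepts text). `encode_params` is a hypothetical helper, not a function in the library.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Encode text values as UTF-8 bytes before building the query string."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        encoded[key] = value
    return urlencode(encoded)
```

For example, `encode_params({"query": "Sigur Rós"})` gives `query=Sigur+R%C3%B3s`.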

Add support for attributes for aliases

Background: An alias-list element, which contains multiple alias elements, is included on artist, label or work entities when they are requested with the aliases include. For example: http://musicbrainz.org/ws/2/artist/0e43fe9d-c472-4b62-be9e-55f971a023e1?inc=aliases

Currently alias-list elements are treated as a list of strings via parse_element_list. However, each alias element can have several attributes. From the schema:

    <define name="def_alias">
        <element name="alias">
            <optional>
                <attribute name="locale">
                    <ref name="def_iso-3166-2" />
                </attribute>
            </optional>
            <optional>
              <attribute name="sort-name">
                <text />
              </attribute>
            </optional>
            <optional>
              <attribute name="type">
                <text />
              </attribute>
            </optional>
            <optional>
              <attribute name="primary">
                <text />
              </attribute>
            </optional>
            <optional>
              <attribute name="begin-date">
                <ref name="def_incomplete-date"/>
              </attribute>
            </optional>
            <optional>
              <attribute name="end-date">
                <ref name="def_incomplete-date"/>
              </attribute>
            </optional>
            <text/>
        </element>
    </define>

In particular the locale and primary attributes are critical to being able to select the most appropriate alias for a given language.

I would like to change this by introducing a parse_alias_list function that returns a list of dictionaries and so follows the XML schema more closely. However, such a change will break any software that uses the existing alias-list implementation.
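
A rough sketch of what parse_alias_list might return, using plain ElementTree rather than the library's internal parsing helpers. The attribute names come from the schema above; storing the element text under an "alias" key is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

MMD_NS = "http://musicbrainz.org/ns/mmd-2.0#"
ALIAS_ATTRS = ("locale", "sort-name", "type", "primary", "begin-date", "end-date")


def parse_alias_list(alias_list):
    """Turn an <alias-list> element into a list of dicts, one per <alias>."""
    aliases = []
    for alias in alias_list.findall("{%s}alias" % MMD_NS):
        entry = {"alias": alias.text or ""}  # "alias" key is an assumption
        for attr in ALIAS_ATTRS:
            if attr in alias.attrib:  # every attribute is optional
                entry[attr] = alias.attrib[attr]
        aliases.append(entry)
    return aliases
```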

Label comment

New issue fixed on ws/2: http://tickets.musicbrainz.org/browse/MBS-4467
We need to make sure we show this field.

object wrapper for query includes

Python-musicbrainz2 has classes for encapsulating what entities to include with a query, e.g. http://users.musicbrainz.org/~matt/python-musicbrainz2/html/musicbrainz2.webservice.TrackIncludes-class.html

We could consider having the same for queries

Lyrics language for works

New schema change: http://tickets.musicbrainz.org/browse/MBS-1798

suggest putting code into a package namespace; e.g. musicbrainzngs

Field access on returned data

It'd be neat to be able to access information in the returned data as fields as well as dictionary keys:

release.title

instead of (or in addition to)

release["title"]
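
One way this could be done is a thin dict subclass, sketched below. `AttrDict` is a hypothetical name, and keys containing hyphens (such as "medium-list") would still need the dictionary syntax.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .items() and .keys() keep working.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access works too.
        return AttrDict(value) if isinstance(value, dict) else value
```

With `release = AttrDict({"title": "Example"})`, both `release.title` and `release["title"]` return the same value.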

life-span is parsed wrong

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
<artist type="Group" id="952a4205-023d-4235-897c-6fdb6f58dfaa">
<name>Dynamo Go</name><sort-name>Dynamo Go</sort-name>
<life-span><begin>2005-06</begin></life-span>
</artist></metadata>

becomes

{'artist': {'sort-name': 'Dynamo Go', 'type': 'Group', 'id': '952a4205-023d-4235-897c-6fdb6f58dfaa', 
'2005-06': '',
'name': 'Dynamo Go'}}

Please note the second line: the life-span begin date ends up as a dictionary key with an empty value.

artist credits

inc=artist-credits is valid for releases and recordings. We should provide the xml->dict conversion as well as a way of combining the credit back together.

Avoid Bad Request when searching for "AND" or "OR"

The search_* functions' keyword arguments are escaped for inclusion in Lucene queries, but strings that look like boolean operators (e.g., "AND" and "OR" in upper case) still work as boolean operators. This can cause undesired behavior when someone intends to actually search for one of these words, or, more seriously, it leads to an HTTP 400 error ("Bad Request") when the query seems malformed. For example, if OR appears at the end of a query part (as in album:(PORTLAND, OR)), an error occurs.

I can think of three ways of addressing this:

1. Lower case all keyword arguments so they can't contain operators. (The query parameter can still be used if the user actually wants to use boolean operators.)
2. Detect incorrect usage of boolean operators and throw an error on the client side.
3. Leave it as is, but document the behavior.

I think the first seems the most reasonable. I'll implement it unless anyone has an objection.
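
A minimal sketch of the lower-casing option. `build_query` is a hypothetical helper and leaves out the escaping of Lucene special characters that the real search functions perform; it only shows that "OR" stops being an operator once it is lower-cased.

```python
def build_query(**fields):
    """Build a Lucene-style query string from keyword arguments.

    Values are lower-cased so that words like AND, OR and NOT are treated
    as ordinary search terms rather than boolean operators.
    """
    parts = []
    for key, value in sorted(fields.items()):
        parts.append("%s:(%s)" % (key, value.lower()))
    return " ".join(parts)
```

`build_query(album="PORTLAND, OR")` produces `album:(portland, or)`.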

Internal consistency

Ensure all methods have a similar way of calling them, and return similar looking objects

package name must not have the prefix "python-"

The package as named in setup.py must not have the prefix "python-". This is the python universe, no need to say it again.

Automatically add required includes

Some includes require other includes to be present. E.g. puid requires recordings. We should consider adding these checks in ourselves (because otherwise we let people make an invalid request). An alternative could be to automatically add the includes required to make a request valid - e.g. if someone adds puid we automatically add recordings.
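
The "add them automatically" alternative could be sketched like this. The `REQUIRED_INCLUDES` table only contains the one dependency mentioned above and `add_required_includes` is a hypothetical helper; dependencies are not followed transitively.

```python
# Only the dependency named in this issue; a real table would be longer.
REQUIRED_INCLUDES = {"puid": ["recordings"]}


def add_required_includes(includes, required=REQUIRED_INCLUDES):
    """Return `includes` plus any includes they depend on, preserving order."""
    result = list(includes)
    for inc in includes:
        for dependency in required.get(inc, ()):
            if dependency not in result:
                result.append(dependency)
    return result
```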

validate rate limiting values

If you call set_rate_limit(0,0) you get a division by zero error. If you call set_rate_limit(1,0) it hangs forever, trying to complete 0 requests in 1 second.
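
A sketch of the kind of check that would catch both cases before they cause trouble. `check_rate_limit` and its parameter names are hypothetical, with the argument order (seconds, then requests) inferred from the examples above.

```python
def check_rate_limit(period, requests):
    """Validate rate limit values and return the minimum interval between requests.

    `period` is a number of seconds and `requests` is how many requests are
    allowed within it; both must be positive.
    """
    if period <= 0:
        raise ValueError("period must be greater than zero")
    if requests <= 0:
        raise ValueError("requests must be greater than zero")
    return float(period) / requests
```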

Vinyl track numbers

New field <number>: http://tickets.musicbrainz.org/browse/MBS-842

browse requests

"Browse requests are a direct lookup of all the entities directly linked to another entity"
for example, all releases given a label.

Also consider an object for valid links for a particular browse request (like #3)

Rework examples/demos

Move examples from a single file to a series of demos that use all the features that are available

Features that should be demonstrated:

User collections

add, remove, list

Caching/304 Not Modified support

http://codereview.musicbrainz.org/r/1759/ is hot off the presses, but it means (once merged) that applications can potentially save some bandwidth (for themselves and musicbrainz both) by utilizing HTTP caching (ETags specifically). We should support this, or even if that codereview doesn't get merged, eventually someone will solve http://tickets.musicbrainz.org/browse/MBS-358 and we'll want it for Last-Modified and If-Modified-Since.
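
On the client side, the ETag part boils down to sending the previously seen value back in an If-None-Match header. A sketch with the standard library follows; `conditional_request` is a hypothetical helper and storing the ETag between calls is left to the caller. Note that urllib reports a 304 reply as an HTTPError, so the caller would need to catch that and reuse its cached copy.

```python
import urllib.request


def conditional_request(url, etag=None):
    """Build a request that asks the server to skip the body if nothing changed."""
    request = urllib.request.Request(url)
    if etag is not None:
        request.add_header("If-None-Match", etag)
    return request
```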

Make common length field that uses track length or recording length

When requesting the track-list, in some cases the length is only parsed correctly if it is inside a recording element.
Example: calling
musicbrainzngs.get_release_by_id("7118801c-cb38-43a3-a76a-b25ee81769bd",["artists","release-groups","media","recordings"])

will not have a length on most of the tracks. The XML response has a length for every track, but they are outside of the recording elements.

This should be fixable by parsing the length right inside the track element.
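
Until that is fixed, the common length field from the title could be approximated on the client side with a fallback like this sketch. `track_length` is a hypothetical helper and the dictionary layout (a "length" key on the track and on its nested "recording") is assumed.

```python
def track_length(track):
    """Return the track's own length, falling back to its recording's length."""
    if "length" in track:
        return track["length"]
    return track.get("recording", {}).get("length")
```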

include libdiscid binding

I was using libdiscid through python-musicbrainz2. It would be nice to have something similar here.

The implementation in Pymb2 was in musicbrainz2.disc and one could use readDisc(devicename) to get the discID from a cd in a drive.

Always return Unicode strings

I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:

>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'

The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.

Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)

object wrapper for search terms

ext:score not properly exported into result.

It seems the ext:score attribute is not parsed correctly, so the score is never exported into the resulting dictionary object.

I haven't found the actual problem myself yet due to a lack of time. I will try to look into it later this week.

various artists

from the docs:

various-artists include only those releases where the artist appears on one of the tracks,
but not in the artist credit for the release itself (this is only valid on a
/ws/2/artist?inc=releases request).

unit tests

encode/check non-ascii input

Ascii input is no problem.

Unicode input works throughout the code currently (at least I haven't found problems), also because of using unicode literals in _do_mb_search and conversion from unicode to utf8 for the output in _mb_request (see #28)

However, we don't have any checking or conversion on the input. We just expect everything to be in unicode or ascii.
Just using sys.argv does not generate unicode strings and other input might also have problems.

So if we decide to handle unicode strings in the library itself, we have to decode non-ascii byte strings to unicode on input.

Otherwise every function must be prepared to use non-ascii strings AND unicode.

Right now I only see _do_mb_search handling user input that is possibly non-ascii. So we probably should convert there.

We should also check how things change when we add Python 3 support.

fill in "missing" track information

For space reasons, the MusicBrainz webservice skips filling in track details that can be inherited from the recording (if the recording and track don't differ).

We should fill in these values again, so that we don't need to check one element to see if it exists before falling back to another one.

For some examples, see:
http://test.musicbrainz.org/ws/2/release/5e3524ca-b4a1-4e51-9ba5-63ea2de8f49b?inc=recordings (track name)
https://beta.musicbrainz.org/ws/2/release/704b7bbd-ffdb-4e01-b211-713d0506ba85?inc=recordings+artists+artist-credits (artist credits)
https://beta.musicbrainz.org/ws/2/release/5dc6c088-0b65-4501-90e5-2b07d60618a2?inc=artists+recordings+artist-credits (compare to previous)

should allow setting a custom User-Agent string

In musicbrainz.py, I see a hard-coded _useragent value -- since MusicBrainz filters by User Agent (and is currently having lots of problems with the rather generic python-musicbrainz2/0.7.3 user agent!) it should be possible to set the User Agent when using this library.

Expose the musicbrainz server error on HTTP400

That is, the contents of <error><text>Invalid mbid.</text></error>
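
Pulling that text out of the error body could look roughly like the sketch below. `server_error_text` is a hypothetical helper; it ignores namespaces and returns None when the body is not XML.

```python
import xml.etree.ElementTree as ET


def server_error_text(body):
    """Return the text of all <text> elements in an error body, or None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    texts = [
        element.text
        for element in root.iter()
        # Strip any "{namespace}" prefix before comparing the tag name.
        if element.tag.rsplit("}", 1)[-1] == "text" and element.text
    ]
    return "; ".join(texts) or None
```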

automatic paging

Browse requests and searches support paging. We should return from these requests an object that gives the results, with an easy method to call to get the next set of results.

An option might be to make these responses iterable so that you can just call next() on them and paging will happen magically in the background
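
The iterable option could be sketched as a generator. `iter_pages` and its `fetch` argument are hypothetical: `fetch(limit=..., offset=...)` stands for any browse or search call that returns one page of results as a list.

```python
def iter_pages(fetch, limit=25):
    """Yield every result, requesting the next page only when it is needed."""
    offset = 0
    while True:
        page = fetch(limit=limit, offset=offset)
        if not page:
            return
        for item in page:
            yield item
        if len(page) < limit:  # a short page means there is nothing left
            return
        offset += len(page)
```

Callers can then loop over the results, or call next() on the generator, without thinking about offsets.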

plural functions and parameters should have plural names

E.g:

browse_artist -> browse_artists (same for all browse_* functions)
get_*_by_id(... release_status=[], release_type=[]) should be release_stati and release_types

filter by release type, status

"Any query which includes release-groups in the results can be filtered to only include release groups of a certain type"

AttributeError: 'etree' object has no attribute 'ParseError'

  File "query.py", line 14, in main
    print m.get_recordings_by_puid("070359fc-8219-e62b-7bfd-5a01e742b490")
  [...]
  File "python-musicbrainz-ngs/musicbrainz.py", line 576, in _mb_request
    except etree.ParseError, exc:
AttributeError: 'module' object has no attribute 'ParseError'

using Python 2.6
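
A possible workaround, sketched here in Python 3 syntax so it can be run today: look the exception class up once and fall back to expat's error type where ElementTree has no ParseError. As far as I know ParseError only arrived with Python 2.7, and older versions raised ExpatError instead, but treat that as an assumption. `is_well_formed` is a made-up helper to show the lookup in use.

```python
import xml.etree.ElementTree as etree
from xml.parsers import expat

# Use ElementTree's ParseError where it exists, otherwise expat's error type.
ParseError = getattr(etree, "ParseError", expat.ExpatError)


def is_well_formed(text):
    """Return True if `text` parses as XML."""
    try:
        etree.fromstring(text)
    except ParseError:
        return False
    return True
```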

Can't get releases in a collection without authing first

If you want to get someone else's collection then you shouldn't need to log in