alastair / python-musicbrainzngs
Python bindings for Musicbrainz' NGS webservice
Home Page: http://python-musicbrainzngs.readthedocs.io/
License: Other
There is a rate limit on the server. Queries should be automatically limited if they happen too quickly. There should be an option to turn this off for people who run a local server.
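A minimal sketch of what such a limiter could look like. The class name `RateLimiter`, its `interval` parameter and the `enabled` switch are assumptions made for illustration, not the library's actual API.

```python
import time


class RateLimiter(object):
    """Hypothetical limiter: keeps calls at least `interval` seconds apart."""

    def __init__(self, interval=1.0, enabled=True):
        self.interval = interval
        self.enabled = enabled  # set to False when talking to a local server
        self._last_call = None

    def wait(self):
        """Sleep if the previous call was too recent; return the delay used."""
        if not self.enabled:
            return 0.0
        delay = 0.0
        if self._last_call is not None:
            elapsed = time.monotonic() - self._last_call
            delay = max(0.0, self.interval - elapsed)
        if delay:
            time.sleep(delay)
        self._last_call = time.monotonic()
        return delay
```

A request function would call `wait()` before each query; someone running a local server would construct the limiter with `enabled=False`.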
Currently, we use the _safe_open function to catch lots of errors when opening a URL and retry when necessary. For example, if we see a "connection reset by peer" error during URL open, it gets retried rather than propagated to the application.
However, errors like this can also occur during the data transfer, not just at opening. The call to message.read() inside mbxml.parse_message can raise socket errors that go unhandled.
The easy way to address this would be to move the read() call into _safe_open and make that function return a string instead of a file-like object.
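A rough sketch of the idea, assuming a retry helper shaped like this. `safe_read` and its `open_func` parameter are hypothetical names; the real _safe_open may look quite different.

```python
import socket
import time


def safe_read(open_func, max_retries=3, retry_delay=0.0):
    """Open a resource and read it fully, retrying on socket errors.

    `open_func` is a hypothetical zero-argument callable that returns a
    file-like object (for example a closure around a URL opener).
    """
    last_error = None
    for _attempt in range(max(1, max_retries)):
        try:
            response = open_func()
            # read() sits inside the try block, so errors raised during
            # the data transfer are retried like errors at open time.
            return response.read()
        except socket.error as exc:
            last_error = exc
            time.sleep(retry_delay)
    raise last_error
```

Because the function returns the body rather than a file-like object, the parser would then work on a string and never touch the socket.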
It would be useful to have the library make communication errors with the server "nicer" by avoiding the exposure of lower-level exceptions. For example, XML parse errors should maybe be translated into something like MalformedMBResponseError exceptions; HTTP timeouts and such could be turned into ServerBusyErrors or the like. This would greatly reduce the headaches involved for clients trying to implement robust queries.
At the same time, perhaps the library should be responsible for retrying under certain conditions (e.g., after 502 errors)?
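A sketch of how the translation might look. The exception names come from the suggestion above; the common base class `WebServiceError` and the helper `fetch_and_parse` are assumptions for illustration.

```python
import socket
import xml.etree.ElementTree as ET


class WebServiceError(Exception):
    """Hypothetical common base class for library errors."""


class MalformedMBResponseError(WebServiceError):
    """The server's response could not be parsed as XML."""


class ServerBusyError(WebServiceError):
    """The server did not answer in time."""


def fetch_and_parse(fetch):
    """Call `fetch()` (a hypothetical callable returning XML text) and
    translate low-level failures into library exceptions."""
    try:
        data = fetch()
    except socket.timeout as exc:
        raise ServerBusyError("server timed out: %s" % exc)
    try:
        return ET.fromstring(data)
    except ET.ParseError as exc:
        raise MalformedMBResponseError("could not parse response: %s" % exc)
```

Clients could then catch `WebServiceError` alone instead of a mix of socket and XML exceptions.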
e.g., instead of having release["medium-list"] we should have release["mediums"]
Show tags and ratings given by the authenticated user. It should be an error to ask for them if no login info has been given.
I looked around for a way to get the score back from various searches but I don't see one.
It looks like adding "{http://musicbrainz.org/ns/ext#-2.0}score" to the list of attributes in, for example, parse_recording will give me the attribute score.
I messed around with trying to fix the attributes with namespace but I'm too dim to figure it out (as is being done with ws:recording, etc).
Could someone more versed in this maybe add support for the ext:score attributes? They are very handy. Or maybe there's already a way and I'm just not seeing it?
Thanks!
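For reference, a small standalone sketch of reading the namespaced attribute with ElementTree, which addresses attributes by the "{namespace}name" form. The function name `parse_scores` and the shape of the returned dictionary are assumptions, not part of the library.

```python
import xml.etree.ElementTree as ET

MMD_NS = "http://musicbrainz.org/ns/mmd-2.0#"
EXT_NS = "http://musicbrainz.org/ns/ext#-2.0"


def parse_scores(xml_text):
    """Map each recording id in a search response to its ext:score."""
    root = ET.fromstring(xml_text)
    scores = {}
    for rec in root.iter("{%s}recording" % MMD_NS):
        # Namespaced attributes are looked up as "{namespace-uri}name".
        scores[rec.get("id")] = rec.get("{%s}score" % EXT_NS)
    return scores
```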
Clients should be able to set the user agent that gets sent with requests.
Moving from 1 to a list: http://tickets.musicbrainz.org/browse/MBS-2532
We should support the new cover art archive API. Either as part of pymb, or another library
The change to the rate limiting code means that if you call it with 2 integers like you used to, it breaks quite badly. Fix this.
Ensure all docs are present and consistent
Currently when issuing get_releases_by_discid I get a ResponseError for any of the status codes 400, 404 and 411.
However, I do think a 404 is quite distinct from other ResponseErrors.
When checking for a discid I want to know if the disc ID is not found on the server or if there really was a response error.
That should either be a None as a return value or an Exception distinguishable from ResponseError (without having to check the cause or message). It can be a derived class, though.
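The derived-class option could be sketched like this. `ResponseError` here is a simplified stand-in for the library's class, and `NotFoundError` and `error_for_status` are hypothetical names.

```python
class ResponseError(Exception):
    """Simplified stand-in for the library's ResponseError."""

    def __init__(self, status, message=""):
        Exception.__init__(self, message)
        self.status = status


class NotFoundError(ResponseError):
    """Hypothetical subclass used only for HTTP 404 responses."""


def error_for_status(status, message=""):
    """Pick the exception class that matches an HTTP status code."""
    if status == 404:
        return NotFoundError(status, message)
    return ResponseError(status, message)
```

Existing code that catches ResponseError keeps working, while callers checking a disc ID can catch the subclass first.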
It's been pointed out that the 0.3 tarball on pypi doesn't have docs or examples. These need to be added to setup.py, including the re-generation of docs before build.
If Unicode arguments are passed to the search function, the library eventually dies in urllib.urlencode(), which only supports byte strings. This library should encode arguments (using UTF-8, like the old library) when building the request.
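A minimal sketch of encoding arguments before building the query string. It is written for Python 3's urllib.parse rather than the Python 2 urllib.urlencode() mentioned above, and `build_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query(params):
    """Encode text values as UTF-8 bytes, then build the query string."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        encoded[key] = value
    # Sort the items so the output order is predictable.
    return urlencode(sorted(encoded.items()))
```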
Background: An alias-list element, which includes multiple alias elements, is included on artist, label or work entities when they are requested with the aliases include. For example: http://musicbrainz.org/ws/2/artist/0e43fe9d-c472-4b62-be9e-55f971a023e1?inc=aliases
Currently alias-list elements are treated as a list of strings via parse_element_list. However, each alias element can have any of several optional attributes. From the schema:
<define name="def_alias">
  <element name="alias">
    <optional>
      <attribute name="locale">
        <ref name="def_iso-3166-2" />
      </attribute>
    </optional>
    <optional>
      <attribute name="sort-name">
        <text />
      </attribute>
    </optional>
    <optional>
      <attribute name="type">
        <text />
      </attribute>
    </optional>
    <optional>
      <attribute name="primary">
        <text />
      </attribute>
    </optional>
    <optional>
      <attribute name="begin-date">
        <ref name="def_incomplete-date"/>
      </attribute>
    </optional>
    <optional>
      <attribute name="end-date">
        <ref name="def_incomplete-date"/>
      </attribute>
    </optional>
    <text/>
  </element>
</define>
In particular the locale and primary attributes are critical to being able to select the most appropriate alias for a given language.
I would like to change this by introducing a parse_alias_list function that would return a list of dictionaries and as such would more closely follow the XML schema. However, such a change will break any software that uses the existing alias-list implementation.
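A sketch of what the proposed parse_alias_list might return. Storing the element text under an "alias" key is an assumption made for this example; the attribute names are taken from the schema above.

```python
import xml.etree.ElementTree as ET

NS = "{http://musicbrainz.org/ns/mmd-2.0#}"


def parse_alias_list(alias_list_elem):
    """Return one dictionary per alias element."""
    aliases = []
    for elem in alias_list_elem.findall(NS + "alias"):
        # Copy the optional attributes (locale, sort-name, type, primary,
        # begin-date, end-date) that are present on this element.
        alias = dict(elem.attrib)
        alias["alias"] = elem.text  # assumed key for the alias text itself
        aliases.append(alias)
    return aliases
```

With the locale and primary attributes preserved, a client could pick the primary alias for its own language.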
New issue fixed on ws/2: http://tickets.musicbrainz.org/browse/MBS-4467
need to make sure we show this field
Python-musicbrainz2 has classes for encapsulating what entities to include with a query, e.g. http://users.musicbrainz.org/~matt/python-musicbrainz2/html/musicbrainz2.webservice.TrackIncludes-class.html
We could consider having the same for queries
New schema change: http://tickets.musicbrainz.org/browse/MBS-1798
It'd be neat to be able to access information in the returned data as fields as well as dictionary keys:
release.title
instead of (or in addition to)
release["title"]
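One possible way to get both access styles is a small dict subclass. `AttrDict` is a hypothetical name used only for this sketch.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)
```

Keys containing a hyphen, such as "medium-list", would still need the dictionary form because they are not valid attribute names.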
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
<artist type="Group" id="952a4205-023d-4235-897c-6fdb6f58dfaa">
<name>Dynamo Go</name><sort-name>Dynamo Go</sort-name>
<life-span><begin>2005-06</begin></life-span>
</artist></metadata>
becomes
{'artist': {'sort-name': 'Dynamo Go', 'type': 'Group', 'id': '952a4205-023d-4235-897c-6fdb6f58dfaa',
'2005-06': '',
'name': 'Dynamo Go'}}
Please note the second line of the output: the life-span begin date ends up as a key with an empty value.
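A sketch of one plausible way to parse the life-span element into a nested dictionary instead. The function name `parse_life_span` and the resulting shape are assumptions, not a description of the eventual fix.

```python
import xml.etree.ElementTree as ET

NS = "{http://musicbrainz.org/ns/mmd-2.0#}"


def parse_life_span(artist_elem):
    """Return the life-span children as a dict, e.g. {"begin": "2005-06"}."""
    result = {}
    span = artist_elem.find(NS + "life-span")
    if span is not None:
        for child in span:
            # Strip the namespace so the keys read "begin" and "end".
            result[child.tag.replace(NS, "")] = child.text
    return result
```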
inc=artist-credits, valid for releases and recordings. Should provide xml->dict as well as combining the credit back together
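Combining a credit back together could be sketched as follows. The input shape, a list of dictionaries with a "name" and an optional "joinphrase" key, is an assumption about how the parsed credit might look.

```python
def credit_phrase(artist_credit):
    """Join a parsed artist credit back into one display string."""
    parts = []
    for credit in artist_credit:
        parts.append(credit["name"])
        # The join phrase (" & ", " feat. ", ...) follows the name it
        # belongs to; the last credit usually has none.
        parts.append(credit.get("joinphrase", ""))
    return "".join(parts)
```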
The search_* functions' keyword arguments are escaped for inclusion in Lucene queries, but strings that look like boolean operators (e.g., "AND" and "OR" in upper case) still work as boolean operators. This can cause undesired behavior when someone intends to actually search for one of these words, or, more seriously, it leads to an HTTP 400 error ("Bad Request") when the query seems malformed. For example, if OR appears at the end of a query part (as in album:(PORTLAND, OR)), an error occurs.
I can think of three ways of addressing this:
query parameter can still be used if the user actually wants to use boolean operators.)
I think the first seems the most reasonable. I'll implement it unless anyone has an objection.
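As an illustration only, here is one possible approach, which is not necessarily any of the options listed in the original report: since only upper-case words act as operators, lower-casing them in keyword values takes away their special meaning. `neutralize_operators` is a hypothetical helper name.

```python
import re

# Whole words that Lucene would read as boolean operators.
_BOOLEAN_WORDS = re.compile(r"\b(AND|OR|NOT)\b")


def neutralize_operators(value):
    """Lower-case boolean operator words inside a search value."""
    return _BOOLEAN_WORDS.sub(lambda match: match.group(1).lower(), value)
```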
Ensure all methods have a similar way of calling them, and return similar looking objects
The package as named in setup.py must not have the prefix "python-". This is the python universe, no need to say it again.
Some includes require other includes to be present. E.g. puid requires recordings. We should consider adding these checks in ourselves (because otherwise we let people make an invalid request). An alternative could be to automatically add the includes required to make a request valid - e.g. if someone adds puid we automatically add recordings.
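The automatic variant might look like this sketch. The mapping contains only the puid example from the text, and both names are hypothetical.

```python
# Hypothetical table of include dependencies; only the example
# mentioned in the text is listed here.
REQUIRED_INCLUDES = {"puid": ["recordings"]}


def add_required_includes(includes):
    """Return the includes plus any others they depend on."""
    result = list(includes)
    for inc in includes:
        for needed in REQUIRED_INCLUDES.get(inc, []):
            if needed not in result:
                result.append(needed)
    return result
```

The checking variant would use the same table but raise an error instead of appending.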
If you call set_rate_limit(0,0) you get a division by zero error. If you call set_rate_limit(1,0) it hangs forever, trying to complete 0 requests in 1 second.
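A sketch of the kind of argument check that would avoid both cases. The helper name `check_rate_limit` is hypothetical, and the argument order (interval, then requests) is inferred from the examples above.

```python
def check_rate_limit(interval, requests):
    """Validate rate limit arguments and return requests per second."""
    if interval <= 0:
        raise ValueError("interval must be positive, got %r" % (interval,))
    if requests <= 0:
        raise ValueError("requests must be positive, got %r" % (requests,))
    return float(requests) / interval
```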
New field <number>: http://tickets.musicbrainz.org/browse/MBS-842
"Browse requests are a direct lookup of all the entities directly linked to another entity"
for example, all releases given a label.
Also consider an object for valid links for a particular browse request (like #3)
Move examples from a single file to a series of demos that use all the features that are available
Features that should be demonstrated:
add, remove, list
http://codereview.musicbrainz.org/r/1759/ is hot off the presses, but it means (once merged) that applications can potentially save some bandwidth (for themselves and musicbrainz both) by utilizing HTTP caching (ETags specifically). We should support this, or even if that codereview doesn't get merged, eventually someone will solve http://tickets.musicbrainz.org/browse/MBS-358 and we'll want it for Last-Modified and If-Modified-Since.
When requesting the track-list, in some cases the length element will only be parsed correctly if it is inside a recording element.
Example: calling musicbrainzngs.get_release_by_id("7118801c-cb38-43a3-a76a-b25ee81769bd",["artists","release-groups","media","recordings"]) will not have a length on most of the tracks. The XML response has length elements for all tracks but they are outside of the recording elements.
This should be fixable by parsing the length right inside the track element.
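A sketch of that fix on a single track element: look for length directly under the track first and only then inside the recording. `parse_track` here is a simplified stand-in, not the library's parser.

```python
import xml.etree.ElementTree as ET

NS = "{http://musicbrainz.org/ns/mmd-2.0#}"


def parse_track(track_elem):
    """Return a dict with the track length, wherever it is stored."""
    track = {}
    # Prefer a length that sits directly inside the track element.
    length = track_elem.findtext(NS + "length")
    if length is None:
        length = track_elem.findtext(NS + "recording/" + NS + "length")
    if length is not None:
        track["length"] = length
    return track
```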
I was using libdiscid through python-musicbrainz2. It would be nice to have something similar here.
The implementation in Pymb2 was in musicbrainz2.disc and one could use readDisc(devicename) to get the disc ID from a CD in a drive.
I noticed recently that the strings returned from our library are sometimes bytes and sometimes Unicode. Due to ElementTree's default behavior, only those strings that are non-ASCII are returned as Unicode objects. For example:
>>> rec = musicbrainzngs.search_recordings(artist='alt-j', recording='piano', limit=1)['recording-list'][0]
>>> rec['title']
u'\u2766 (Piano)'
>>> rec['release-list'][0]['title']
'An Awesome Wave'
The recording title, which has a "special" character in it, is a unicode object. The release title, which is all ASCII, is a str object. For consistency's sake (and for an eventual Python 3 port), the library should always return unicode objects.
Anyone have any bright ideas about the best way to go about addressing this? (I have a nagging sensation that we might have discussed this in the past, but I can't remember if we came to a conclusion about what to do.)
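One option would be to normalize the parsed result in a final pass. The sketch below is written in Python 3 terms, where the two types are bytes and str, and `to_text` is a hypothetical helper.

```python
def to_text(value, encoding="utf-8"):
    """Recursively decode byte strings in a parsed result to text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return dict((to_text(key, encoding), to_text(item, encoding))
                    for key, item in value.items())
    if isinstance(value, list):
        return [to_text(item, encoding) for item in value]
    return value
```

Decoding each string at the point where it is read from the XML would avoid the extra pass, at the cost of touching more of the parser.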
similar to #3, but for search queries:
http://users.musicbrainz.org/~matt/python-musicbrainz2/html/musicbrainz2.webservice.TrackFilter-class.html
It seems the ext:score attribute is not parsed correctly and is never exported into the resulting dictionary object.
I haven't found the actual problem myself yet due to a lack of time. I will try to look into it later this week.
from the docs:
ASCII input is no problem.
Unicode input works throughout the code currently (at least I haven't found problems), also because of using unicode literals in _do_mb_search and conversion from unicode to UTF-8 for the output in _mb_request (see #28).
However, we don't have any checking or conversion on the input. We just expect everything to be in unicode or ASCII.
Just using sys.argv does not generate unicode strings and other input might also have problems.
So when we decide on handling unicode strings in the library itself we have to decode non-ASCII strings to unicode on input.
Otherwise every function must be prepared to use non-ASCII strings AND unicode.
Right now I only see _do_mb_search handling user input that is possibly non-ASCII. So we probably should convert there.
Additionally we should check how things change when we try to support Python 3.
For space reasons the MusicBrainz webservice skips filling in track details that can be inherited from the recording (if the recording and track don't differ).
We should fill in these values again, so that we don't need to check one element to see if it exists before falling back to another one.
For some examples, see:
http://test.musicbrainz.org/ws/2/release/5e3524ca-b4a1-4e51-9ba5-63ea2de8f49b?inc=recordings (track name)
https://beta.musicbrainz.org/ws/2/release/704b7bbd-ffdb-4e01-b211-713d0506ba85?inc=recordings+artists+artist-credits (artist credits)
https://beta.musicbrainz.org/ws/2/release/5dc6c088-0b65-4501-90e5-2b07d60618a2?inc=artists+recordings+artist-credits (compare to previous)
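Filling the values back in could be sketched on the parsed dictionaries like this. The function name and the list of inherited keys are assumptions for illustration.

```python
def fill_track_from_recording(track, keys=("title", "length", "artist-credit")):
    """Copy values the track omits from its recording."""
    filled = dict(track)
    recording = track.get("recording", {})
    for key in keys:
        # Only fill gaps; a value set on the track itself wins.
        if key not in filled and key in recording:
            filled[key] = recording[key]
    return filled
```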
In musicbrainz.py, I see a hard-coded _useragent value -- since MusicBrainz filters by User Agent (and is currently having lots of problems with the rather generic python-musicbrainz2/0.7.3 user agent!) it should be possible to set the User Agent when using this library.
See also: http://wiki.musicbrainz.org/XML_Web_Service/Rate_Limiting#Providing_meaningful_User-Agent_strings
That is, the contents of <error><text>Invalid mbid.</text></error>
Browse requests and searches support paging. We should return from these requests an object that gives the results, with an easy method to call to get the next set of results.
An option might be to make these responses iterable so that you can just call next() on them and paging will happen magically in the background
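The iterable idea could be sketched as a generator. `fetch_page` is a hypothetical callable that takes limit and offset keyword arguments and returns one page of results as a list.

```python
def iter_results(fetch_page, limit=25):
    """Yield results one at a time, requesting further pages as needed."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        for item in page:
            yield item
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += limit
```

A caller would simply loop over `iter_results(...)`, and each further request would only be made once the previous page is exhausted.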
E.g:
"Any query which includes release-groups in the results can be filtered to only include release groups of a certain type"
File "query.py", line 14, in main
print m.get_recordings_by_puid("070359fc-8219-e62b-7bfd-5a01e742b490")
[...]
File "python-musicbrainz-ngs/musicbrainz.py", line 576, in _mb_request
except etree.ParseError, exc:
AttributeError: 'module' object has no attribute 'ParseError'
using Python 2.6
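A possible workaround, sketched under the assumption that on interpreters without etree.ParseError the parser reports problems as expat's ExpatError. `parse_xml` and the choice of ValueError are illustrative only.

```python
import xml.etree.ElementTree as etree
from xml.parsers import expat

# Catch ExpatError everywhere, and ParseError where it exists.
_parse_errors = (expat.ExpatError,)
if hasattr(etree, "ParseError"):
    _parse_errors += (etree.ParseError,)


def parse_xml(data):
    """Parse XML text, reporting malformed input as ValueError."""
    try:
        return etree.fromstring(data)
    except _parse_errors as exc:
        raise ValueError("malformed XML response: %s" % exc)
```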
If you want to get someone else's collection then you shouldn't need to log in