The mp3utensil from thefeshy

Merge MP3 files by frame

Take multiple MP3 files, and merge them - using only the frame data and not other junk.
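
A rough sketch of what frame-only merging could look like. It only handles MPEG-1 Layer III headers, and the names frame_length, iter_frames and merge_frames are made up for illustration; they are not the mp3utensil API. Anything that does not parse as a frame (tags, padding, a truncated last frame) is skipped.

```python
# Bitrates in kbit/s for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
# Index 0 ("free" bitrate) and 15 (invalid) are not handled in this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
# Sample rates in Hz for MPEG-1, indexed by the 2-bit sample rate field.
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def frame_length(header):
    """Return the frame length in bytes for an MPEG-1 Layer III header, else None."""
    if len(header) < 4:
        return None
    b0, b1, b2 = header[0], header[1], header[2]
    # The first 11 bits of a frame header are all ones (the sync word).
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None
    version = (b1 >> 3) & 0x03  # 3 means MPEG-1
    layer = (b1 >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None
    bitrate = BITRATES_KBPS[(b2 >> 4) & 0x0F]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding


def iter_frames(data):
    """Yield each complete frame found in data, skipping everything else."""
    pos = 0
    while pos + 4 <= len(data):
        length = frame_length(data[pos:pos + 4])
        if length is None or pos + length > len(data):
            pos += 1  # no complete frame here; try the next byte
            continue
        yield data[pos:pos + length]
        pos += length


def merge_frames(files_data):
    """Concatenate only the frames from each input, in order."""
    return b"".join(frame for data in files_data for frame in iter_frames(data))
```

A real scanner would probably also confirm each candidate header by checking that another valid header follows it, since junk data can contain a false sync word.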

Decide on and implement a consistent verbosity scheme

Test raised exceptions in mp3file

Dump non-mp3 data

Command-line action to dump all non-mp3 data to separate files, with appropriate labels if they can be identified (ID3 tags, unknown binary data.)

To figure out: should we separate out images from the ID3 v2 tags?

Verify CRC tags on frames

Verify that the CRC tags, if present, are correct.
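
A possible starting point for the check. The working assumption (to be confirmed against the spec) is that the frame CRC is a CRC-16 with polynomial 0x8005 and initial value 0xFFFF, computed over the last two header bytes plus the side information, and stored in the two bytes right after the header. The function names and the side_info_len parameter are hypothetical; for MPEG-1 Layer III the side information is assumed to be 32 bytes for stereo and 17 for mono.

```python
def crc16_mpeg(data, crc=0xFFFF):
    """CRC-16, polynomial 0x8005, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def has_crc(header):
    """The protection bit is 0 when a CRC follows the header."""
    return (header[1] & 0x01) == 0


def frame_crc_ok(frame, side_info_len):
    """Return True/False for protected frames, None when no CRC is present."""
    if not has_crc(frame):
        return None
    stored = int.from_bytes(frame[4:6], "big")
    # Assumption: the CRC covers header bytes 2-3 plus the side information.
    protected = frame[2:4] + frame[6:6 + side_info_len]
    return crc16_mpeg(protected) == stored
```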

Some parameters, such as sample rate, are supposed to be fixed within a given file. Find out which ones are supposed to be fixed, and which ones actually cause players to misbehave when they change. By default, prevent users from joining MP3 files if the result would have these parameters changing partway through. However, allow users to override this (if nothing else, it will help us make test files.)

Lastly we should allow users to split a file with invalid parameter switching like that into valid file parts again (break on parameter transitions.)
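
The join check and the split on transitions could both sit on top of one grouping step, roughly as below. Which parameters count as "fixed" is still an open question, so sample_rate and channels here are placeholders, and FrameInfo, split_on_transitions and check_joinable are made-up names.

```python
from itertools import groupby
from typing import NamedTuple


class FrameInfo(NamedTuple):
    sample_rate: int
    channels: int
    data: bytes


def fixed_key(frame):
    # Placeholder set of parameters that must not change within a file.
    return (frame.sample_rate, frame.channels)


def split_on_transitions(frames):
    """Break a frame sequence into runs that share the same fixed parameters."""
    return [list(group) for _, group in groupby(frames, key=fixed_key)]


def check_joinable(frames, allow_override=False):
    """Refuse a join that changes fixed parameters unless overridden."""
    parts = split_on_transitions(frames)
    if len(parts) > 1 and not allow_override:
        raise ValueError("fixed parameters change %d time(s)" % (len(parts) - 1))
    return parts
```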

Restore "pure python" (e.g. no Numpy) support

Currently the framelist only works with Numpy; we need to add support for numpy not being available.
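
One common way to do this is an optional import with a standard-library fallback. The sketch below is only an illustration; make_frame_offsets is a made-up helper, not the existing framelist code.

```python
from array import array

try:
    import numpy as np
except ImportError:
    np = None  # numpy is optional; fall back to the standard library


def make_frame_offsets(offsets):
    """Store frame offsets in a numpy array if available, else in array.array."""
    if np is not None:
        return np.asarray(offsets, dtype=np.int64)
    return array("q", offsets)  # "q" is a signed 64-bit integer
```

Both return types support len(), indexing and iteration, so code that sticks to those operations works either way.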

Identify APE tags

I read something about other types of MP3 tags, like APE. Look into this.

Move to array storage of headers instead of lists

Currently the largest amount of time is spent allocating objects, such as headers. The closest we can come to a pool allocator in python is an array, so we'll start using an array to store header values in. Hopefully this will help the continuing improvements in performance. Current targets to beat:

Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822

Python only:
Big: 21.14
One: 2.37
Two: 0.96
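
A sketch of the array idea: keep each 4-byte header as one integer in an array.array and decode fields on demand, so no per-header object is allocated. HeaderStore and its methods are hypothetical names, and only two fields are decoded here.

```python
from array import array


class HeaderStore:
    """Stores raw 32-bit frame headers in a flat array instead of objects."""

    def __init__(self):
        self._headers = array("L")  # unsigned long, at least 32 bits wide

    def append(self, header):
        self._headers.append(int.from_bytes(header[:4], "big"))

    def __len__(self):
        return len(self._headers)

    def bitrate_index(self, i):
        # Bits 12-15 of the header hold the bitrate table index.
        return (self._headers[i] >> 12) & 0x0F

    def sample_rate_index(self, i):
        # Bits 10-11 of the header hold the sample rate table index.
        return (self._headers[i] >> 10) & 0x03
```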

Support for no Numpy

Add fallback python-only processing for users who don't have numpy

Report results of mp3 scan

Scan for attributes that change and shouldn't, report status of all attributes such as copy flag, crc, bitrate, file length, etc.
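
The report could be as simple as counting the distinct values seen for each attribute. In this sketch frames are plain dicts and summarize is a made-up name; the attribute names are examples only.

```python
from collections import Counter


def summarize(frames, attributes):
    """For each attribute, count the values seen and flag whether it changes."""
    report = {}
    for name in attributes:
        counts = Counter(frame[name] for frame in frames)
        report[name] = {"values": dict(counts), "changes": len(counts) > 1}
    return report
```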

Preserve ID3 tags on merge

Should be on by default, but able to be disabled via the command line.

To figure out: Should it preserve the first mp3 tag, or try to merge them somehow?

Identify poorly placed ID3v1 tags

Some ID3v1 taggers actually overwrote the last 128 bytes of the last frame with data. This means that we should check the last full frame to see if it contains an ID3v1 tag 128 bytes from its end.

Additionally, because the file could be a badly joined set of MP3 files, this 128 byte tag could be at the end of every frame, and they should all be checked.

Checking for ID3 tags at arbitrary locations is unwise, both because it's slow and because the loose ID3v1 standard means at best we'd be relying on heuristics. But we might allow it as an option?

Add a license and copyright

Make file scanning "Chunky"

For large (1gb+) files, we probably don't want to read the whole thing at once if possible. Instead we should break scanning and performing tasks on the file up into chunks. Ideally the program will do this automatically for large files, and support command-line override of this behavior. E.g. chunk-size=50mb (valid values are number of mb to chunk, or -1 for never chunk)
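
A minimal sketch of the chunking side, assuming "mb" means 1024 * 1024 bytes. parse_chunk_size and iter_chunks are illustrative names, not existing functions.

```python
def parse_chunk_size(value):
    """Turn '50mb' or '50' into a byte count; '-1' means never chunk (None)."""
    text = value.strip().lower()
    if text.endswith("mb"):
        text = text[:-2]
    megabytes = int(text)
    if megabytes == -1:
        return None
    if megabytes <= 0:
        raise ValueError("chunk size must be positive or -1")
    return megabytes * 1024 * 1024


def iter_chunks(stream, chunk_size):
    """Yield the stream in chunk_size pieces, or all at once if chunk_size is None."""
    if chunk_size is None:
        data = stream.read()
        if data:
            yield data
        return
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Frames will straddle chunk boundaries, so the scanner would need to carry the unprocessed tail of one chunk over to the start of the next.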

Add "string of null bytes" to list of tag types scanned for

Our real-world tests are sometimes showing files with extra padding after the ID3v2 tag, or just data that's been blanked. Either way it results in long strings of zeros. It would be good to add a tag-type scanner that looks for (and eliminates) these.
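
Finding the runs is straightforward with a bytes regex. The sketch below uses a made-up find_null_runs helper and an arbitrary default minimum length of 16 bytes.

```python
import re


def find_null_runs(data, min_length=16):
    """Return (start, end) offsets of every run of at least min_length zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(data)]
```

This should only be run over the non-frame regions, since zeros inside frame data are legitimate audio and must not be removed.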

Identify ID3 v 1 and 1.1 tags

Identifying 1 and 1.1 tags should be easy, where they are supposed to be - at the end of a file. But when dealing with badly joined files, or badly tagged files, this isn't always the case. Here are some possible locations:

At the end of the file, in junk data (where it is supposed to be)
At the end of arbitrary junk data within the file (where it would be if tagged files were joined by "cat" or the like)
At the end of a file, overwriting 128 bytes of frame data (where a careless tagger might have written it without changing the file size, figuring it would only corrupt a single end frame)
At the end of an arbitrary frame within the file overwriting data (which is what would happen if the case above is joined with "cat")
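
Whatever the location, the check on a candidate 128-byte block is the same, so it could be a single helper that each of the cases above calls with a different slice. This sketch follows the usual ID3v1 layout ("TAG", then title, artist, album, year, comment, genre) and treats a zero byte at offset 125 followed by a non-zero track byte as ID3v1.1. parse_id3v1 and trailing_id3v1 are made-up names.

```python
def _text(raw):
    """Decode a fixed-width ID3v1 field, dropping padding."""
    return raw.rstrip(b"\x00 ").decode("latin-1")


def parse_id3v1(block):
    """Parse a 128-byte block as an ID3v1/1.1 tag, or return None."""
    if len(block) != 128 or block[:3] != b"TAG":
        return None
    # ID3v1.1 gives up the last two comment bytes: a zero, then the track number.
    is_v11 = block[125] == 0 and block[126] != 0
    return {
        "version": "1.1" if is_v11 else "1",
        "title": _text(block[3:33]),
        "artist": _text(block[33:63]),
        "album": _text(block[63:93]),
        "year": _text(block[93:97]),
        "comment": _text(block[97:125] if is_v11 else block[97:127]),
        "track": block[126] if is_v11 else None,
        "genre": block[127],
    }


def trailing_id3v1(data):
    """Check the last 128 bytes of a file, junk region, or frame."""
    return parse_id3v1(data[-128:])
```

Because the only marker is the three bytes "TAG", a match inside frame data is a heuristic at best.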

Add test for junk at end of file

Somehow this passed our test cases, but failed on the 128 byte ID3 tag on real data.

Get tasks to show up in mylyn

Test mylyn integration and get it operational

Identify ID3 v2 tags

When using ID3v2 tags, auto-build "chapter" frame when joining files

Add memory profiling

Get some sort of memory profiling working. This might be important later when we are handling 1gb+ audio books.
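
The standard library's tracemalloc might be enough to get started. The measure_peak wrapper below is just an illustration of how a scan call could be measured.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

tracemalloc only sees allocations made through Python's allocators, so memory held by C extensions may not be fully counted.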

Ongoing performance monitoring

Continue to monitor and improve performance.

Pre-array storage of frames:

Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822

Python only:
Big: 21.14
One: 2.37
Two: 0.96

Fix up the test cases

Make tests more "requirements" based now that we have some basic structure here (this will help as we refactor code too.)

Generate short test files algorithmically, and make test cases for each branch of the mp3file code.

Add support for LAME tags

Get milestones to show up in eclipse

Trying to get the IDE well integrated with git. Currently milestones are not showing up in mylyn tasks.

Handle merge of badly split files

MP3 splitters often don't split on frame boundaries. Sometimes this means a frame straddles two files. Worse, there is often an MP3 tag inserted at the beginning of the second file, so we have [partial frame] -> EOF -> New File -> [ID3 tag] -> [rest of frame.] We should handle this case if possible.
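
A sketch of the stitching step, assuming the expected frame length is already known from the partial frame's header. It skips a leading ID3v2 tag using the 10-byte header (the "ID3" marker, two version bytes, a flags byte, and a 4-byte syncsafe size). id3v2_size and stitch are hypothetical names.

```python
def id3v2_size(data):
    """Total size in bytes of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    if any(b & 0x80 for b in data[6:10]):
        return 0  # syncsafe size bytes never have the high bit set
    size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
    footer = 10 if data[5] & 0x10 else 0  # flag bit 0x10 means a footer follows
    return 10 + size + footer


def stitch(partial_tail, next_file, frame_len):
    """Complete a frame cut off at the end of one file with bytes from the next.

    Returns (whole_frame, remaining_data_of_next_file).
    """
    body = next_file[id3v2_size(next_file):]
    missing = frame_len - len(partial_tail)
    if missing < 0 or missing > len(body):
        raise ValueError("cannot complete the frame from the next file")
    return partial_tail + body[:missing], body[missing:]
```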

To decide: does the resulting frame get added to the first file, or the second?
