
mp3utensil's People

Contributors

thefeshy

Forkers

sahwar

mp3utensil's Issues

Dump non-mp3 data

Command-line action to dump all non-MP3 data to separate files, labeled by type where it can be identified (ID3 tags, unknown binary data).

To figure out: should we separate images out of the ID3v2 tags?
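A minimal sketch of the labeling step, assuming the scanner already yields each non-audio span as an `(offset, data)` pair. The helpers `label_segment` and `dump_segments` and the output file-name scheme are hypothetical. Only a few well-known magic prefixes are checked: `ID3`, `TAG` on a 128-byte block, `APETAGEX`, and all-zero padding. Everything else is labeled "unknown".

```python
# Hypothetical helpers: label non-MP3 spans by their leading magic bytes
# and write each span to its own file.

def label_segment(data):
    """Return a best-guess label for a span of non-MP3 data."""
    if data.startswith(b"ID3"):
        return "id3v2"          # ID3v2 header begins with "ID3"
    if len(data) == 128 and data.startswith(b"TAG"):
        return "id3v1"          # ID3v1 tags are exactly 128 bytes, starting "TAG"
    if data.startswith(b"APETAGEX"):
        return "ape"            # APE tag header/footer preamble
    if data and not data.strip(b"\x00"):
        return "null-padding"   # nothing but zero bytes
    return "unknown"


def dump_segments(segments, prefix):
    """Write each (offset, data) pair to '<prefix>_<offset>_<label>.bin'."""
    names = []
    for offset, data in segments:
        name = f"{prefix}_{offset:08x}_{label_segment(data)}.bin"
        with open(name, "wb") as fh:
            fh.write(data)
        names.append(name)
    return names
```

Embedded images inside an ID3v2 tag would still be dumped as part of the `id3v2` span. Splitting them out would need a frame-level ID3v2 parser.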

Handle mismatched MP3 types

Some parameters, such as sample rate, are supposed to be fixed within a given file. Find out which ones are supposed to be fixed, and which changes actually stop players from playing the file properly. By default, prevent users from joining MP3 files if that would make these parameters change mid-file. However, allow users to override this (if nothing else, it will help us make test files).

Lastly, we should allow users to split a file with invalid parameter switches like these back into valid parts (breaking on parameter transitions).
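Splitting on parameter transitions amounts to grouping consecutive frames that share the same "fixed" parameters. A minimal sketch using `itertools.groupby` follows. The `FrameInfo` record and the default key (sample rate plus channel mode) are assumptions, since which parameters must stay fixed is still an open question above.

```python
from itertools import groupby
from typing import NamedTuple


class FrameInfo(NamedTuple):
    # Hypothetical per-frame record; the real scanner's fields may differ.
    offset: int
    length: int
    sample_rate: int
    channel_mode: str


def split_on_transitions(frames, key=lambda f: (f.sample_rate, f.channel_mode)):
    """Return (start, end) byte ranges of runs of frames with equal `key`.

    Each range could be written out as its own valid file.
    """
    parts = []
    for _, run in groupby(frames, key=key):
        run = list(run)
        start = run[0].offset
        end = run[-1].offset + run[-1].length
        parts.append((start, end))
    return parts
```

The same function can gate a join: if the frames of two files concatenated together produce more than one range, the join would change a fixed parameter.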

Identify APE tags

I read something about other types of MP3 tags, like APE. Look into this.
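As a starting point for the APE investigation, here is a sketch of detecting an APE tag footer. It assumes the APEv2 layout as we understand it: a 32-byte footer made of the `APETAGEX` preamble, four little-endian 32-bit fields (version, tag size, item count, flags), and 8 reserved bytes. The tag size excludes the optional header, and flag bit 31 marks that a header is present. These details should be checked against the APE specification before being relied on.

```python
import struct

APE_PREAMBLE = b"APETAGEX"
APE_FOOTER_LEN = 32  # preamble(8) + version, size, count, flags (4x4) + reserved(8)


def find_ape_footer(data, end=None):
    """Parse an APE tag footer ending at `end` (default: end of data).

    Returns (version, tag_size, item_count, flags), or None if no footer.
    tag_size is assumed to cover the items plus this footer, not a header.
    """
    if end is None:
        end = len(data)
    start = end - APE_FOOTER_LEN
    if start < 0 or data[start:start + 8] != APE_PREAMBLE:
        return None
    # Four little-endian unsigned 32-bit fields follow the preamble.
    return struct.unpack_from("<4I", data, start + 8)


def ape_tag_start(data, end=None):
    """Offset where the whole APE tag (including any header) begins."""
    if end is None:
        end = len(data)
    footer = find_ape_footer(data, end)
    if footer is None:
        return None
    _, tag_size, _, flags = footer
    has_header = bool(flags & (1 << 31))  # assumed: bit 31 = "tag has a header"
    return end - tag_size - (APE_FOOTER_LEN if has_header else 0)
```

APE tags are often followed by an ID3v1 tag. In that case, pass `end=len(data) - 128`.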

Move to array storage of headers instead of lists

Currently, most of the run time is spent allocating objects such as headers. The closest we can come to a pool allocator in Python is an array, so we'll start storing header values in an array. Hopefully this will continue the performance improvements. Current targets to beat:

Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822

Python only:
Big: 21.14
One: 2.37
Two: 0.96
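One way to read "array storage" is a structure-of-arrays layout. Each header field lives in its own typed `array.array`, so the table holds raw C values rather than one Python object per header. The sketch below follows that reading. The `HeaderTable` class and its field set are hypothetical, and whether this beats the targets above would have to be measured.

```python
from array import array


class HeaderTable:
    """Hypothetical header store: one typed array per field."""

    def __init__(self):
        self.offset = array("Q")       # unsigned 64-bit byte offset
        self.bitrate = array("H")      # kbps, fits in 16 bits
        self.sample_rate = array("L")  # Hz (at least 32 bits)
        self.length = array("H")       # frame length in bytes

    def append(self, offset, bitrate, sample_rate, length):
        self.offset.append(offset)
        self.bitrate.append(bitrate)
        self.sample_rate.append(sample_rate)
        self.length.append(length)

    def __len__(self):
        return len(self.offset)

    def row(self, i):
        """Materialize one header as a tuple only when it is needed."""
        return (self.offset[i], self.bitrate[i], self.sample_rate[i], self.length[i])
```

A NumPy structured array would be the equivalent layout for the NumPy code path.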

Report results of mp3 scan

Scan for attributes that change when they shouldn't, and report the status of all attributes, such as the copy flag, CRC, bitrate, file length, etc.
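A minimal sketch of the report's core: collect the distinct values of every attribute across all headers, and flag attributes that should be constant but are not. The header-dict format and the `EXPECTED_CONSTANT` set are assumptions for illustration. Bitrate is deliberately left out of the set, because variable-bitrate files legitimately change it.

```python
from collections import defaultdict

# Hypothetical: attributes assumed to be fixed for a whole file.
EXPECTED_CONSTANT = {"sample_rate", "channel_mode", "layer", "version"}


def scan_report(headers):
    """Return {attribute: (sorted distinct values, flagged)}.

    flagged is True when an attribute expected to be constant
    takes more than one value across the scanned headers.
    """
    seen = defaultdict(set)
    for h in headers:
        for name, value in h.items():
            seen[name].add(value)
    return {
        name: (sorted(values), name in EXPECTED_CONSTANT and len(values) > 1)
        for name, values in seen.items()
    }
```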

Preserve ID3 tags on merge

This should be on by default, but it should be possible to disable it from the command line.

To figure out: should it preserve the first file's tag, or try to merge the tags somehow?
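The "preserve the first tag" option needs to find where each file's leading ID3v2 tag ends. The 10-byte ID3v2 header stores the tag size as a 28-bit "syncsafe" integer (7 bits per byte) that excludes the header. In v2.4, flag `0x10` adds a 10-byte footer. The function names below are hypothetical. Only this one merge policy is sketched, and trailing ID3v1/APE tags are ignored.

```python
def id3v2_size(data):
    """Total length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    b6, b7, b8, b9 = data[6:10]
    size = (b6 << 21) | (b7 << 14) | (b8 << 7) | b9   # syncsafe: 7 bits per byte
    has_footer = data[3] == 4 and data[5] & 0x10      # v2.4 footer flag
    return 10 + size + (10 if has_footer else 0)


def merge_keep_first_tag(files):
    """Concatenate files, keeping only the first file's leading ID3v2 tag."""
    out = bytearray()
    for i, data in enumerate(files):
        out += data if i == 0 else data[id3v2_size(data):]
    return bytes(out)
```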

Identify poorly placed ID3v1 tags

Some ID3v1 taggers actually overwrote the last 128 bytes of the last frame with data. This means that we should check the last full frame to see if it contains an ID3v1 tag 128 bytes from its end.

Additionally, because the file could be a badly joined set of MP3 files, this 128-byte tag could be at the end of any frame, so every frame should be checked.

Checking for ID3 tags at arbitrary locations is unwise, both because it's slow and because the loose ID3v1 standard means that at best we'd be relying on heuristics. But we might allow it as an option?
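Restricting the check to frame ends keeps it cheap: one 3-byte comparison per frame instead of a search of the whole file. This sketch assumes the scanner supplies full frames as `(offset, length)` pairs. That input format and the function name are hypothetical.

```python
def frames_with_embedded_id3v1(data, frames):
    """Return offsets of frames whose last 128 bytes start with b"TAG".

    `frames` is an iterable of (offset, length) pairs for full frames.
    """
    hits = []
    for offset, length in frames:
        tag_start = offset + length - 128
        if length >= 128 and data[tag_start:tag_start + 3] == b"TAG":
            hits.append(offset)
    return hits
```

A match is still only a heuristic. Parsing the candidate as an ID3v1 tag and checking that its text fields look plausible would reduce false positives.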

Make file scanning "Chunky"

For large (1 GB+) files, we probably don't want to read the whole thing at once if we can avoid it. Instead, we should break scanning and other file operations into chunks. Ideally the program will do this automatically for large files, and support a command-line override of this behavior, e.g. chunk-size=50mb (valid values are the number of MB per chunk, or -1 to never chunk).
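A sketch of chunked reading, under one assumption. The main pitfall is a frame header or tag that straddles a chunk boundary. Here, each chunk is prefixed with the last `overlap` bytes of the previous one, so anything up to that size is always seen whole at least once. `iter_chunks` is a hypothetical name, and `chunk_size` is in bytes (50 MB would be `50 * 1024 * 1024`). `-1` mirrors the proposed "never chunk" value.

```python
import io


def iter_chunks(stream, chunk_size, overlap):
    """Yield (offset, bytes) pairs from a binary stream.

    After the first chunk, each chunk starts with the previous chunk's last
    `overlap` bytes; `offset` is the stream position of the chunk's first byte.
    """
    if chunk_size == -1:
        yield 0, stream.read()
        return
    offset = 0      # stream position just past the data read so far
    tail = b""      # carried-over bytes from the previous chunk
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield offset - len(tail), tail + block
        offset += len(block)
        tail = (tail + block)[-overlap:] if overlap else b""


if __name__ == "__main__":
    demo = io.BytesIO(bytes(range(10)))
    for off, chunk in iter_chunks(demo, chunk_size=4, overlap=2):
        print(off, list(chunk))
```

Callers must de-duplicate matches that fall entirely inside the overlap region, for example by recording absolute offsets in a set.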

Add "string of null bytes" to list of tag types scanned for

Our real-world tests sometimes turn up files with extra padding after the ID3v2 tag, or with data that has been blanked. Either way, the result is long strings of zeros. It would be good to add a tag-type scanner that looks for (and eliminates) these.
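A regular expression over the raw bytes is one simple way to find such runs. The 64-byte default minimum below is an arbitrary illustrative threshold, since short zero runs can occur naturally inside frame data. The scan should be applied to regions already known to lie outside frames. `find_null_runs` is a hypothetical name.

```python
import re


def find_null_runs(data, min_len=64):
    """Return (start, end) ranges of at least `min_len` consecutive zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [m.span() for m in pattern.finditer(data)]
```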

Identify ID3v1 and v1.1 tags

Identifying v1 and v1.1 tags should be easy when they are where they are supposed to be: at the end of the file. But with badly joined or badly tagged files, this isn't always the case. Here are some possible locations:

At the end of the file, in junk data (where it is supposed to be)
At the end of arbitrary junk data within the file (where it would be if tagged files were joined by "cat" or the like)
At the end of a file, overwriting 128 bytes of frame data (where a careless tagger might have written it without allocating extra space, figuring it would only corrupt a single end frame)
At the end of an arbitrary frame within the file, overwriting data (which is what would happen if the case above were joined with "cat")
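Wherever a candidate 128-byte block is found, telling v1 from v1.1 is a fixed-offset check. The sketch below uses the standard layout: `TAG`, then title, artist, and album (30 bytes each), year (4), comment (30), and genre (1). In v1.1, comment byte 29 is zero and byte 30 holds a non-zero track number. Decoding text as Latin-1 is an assumption, and `parse_id3v1` is a hypothetical name.

```python
def parse_id3v1(tag):
    """Decode a 128-byte ID3v1/v1.1 tag; return None if it isn't one."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fixed-width, NUL/space padded fields; latin-1 is an assumption.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],
    }
    if tag[125] == 0 and tag[126] != 0:
        # v1.1: 28-byte comment, zero byte, then the track number.
        info.update(version="1.1", comment=text(tag[97:125]), track=tag[126])
    else:
        info.update(version="1", comment=text(tag[97:127]))
    return info
```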

Add memory profiling

Get some sort of memory profiling working. This might become important later, when we are handling 1 GB+ audiobooks.
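The standard library's `tracemalloc` module is one low-effort option. The sketch below wraps a call and reports the peak traced allocation. `peak_memory` is a hypothetical helper, and note that it stops tracing on exit even if tracing was already running. Allocations made outside Python's traced allocators may not be counted.

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak
```

For a per-line breakdown, `tracemalloc.take_snapshot().statistics("lineno")` can be called while tracing is active.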

Ongoing performance monitoring

Continue to monitor and improve performance.

Pre-array storage of frames:

Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822

Python only:
Big: 21.14
One: 2.37
Two: 0.96
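The "best of three" figures can be reproduced with `timeit.repeat`, which reports seconds. The units of the figures above aren't stated, so comparisons assume they were measured the same way. In the sketch, `best_of` and `regressions` are hypothetical helpers, and `PYTHON_TARGETS` copies the "Python only" targets listed above.

```python
import timeit

# "Python only" targets from the list above (units as originally measured).
PYTHON_TARGETS = {"Big": 21.14, "One": 2.37, "Two": 0.96}


def best_of(func, repeat=3):
    """Run func once per trial, `repeat` trials; return the fastest time."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


def regressions(timings, targets):
    """Names whose new timing is slower than its recorded target."""
    return [name for name, t in timings.items() if t > targets.get(name, float("inf"))]
```

Taking the minimum rather than the mean reduces noise from other processes on the machine.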

Fix up the test cases

Make tests more "requirements"-based now that we have some basic structure in place (this will also help as we refactor code).

Generate short test files algorithmically, and make test cases for each branch of the mp3file code.
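Short test files can be built from synthetic frame headers. The sketch below covers only MPEG-1 Layer III without CRC. It uses the standard bitrate and sample-rate tables and the frame-length formula `144 * bitrate / sample_rate + padding`. The payload is filler rather than real audio, which is enough for parser tests. `make_frame` and `make_file` are hypothetical names.

```python
# MPEG-1 Layer III bitrates (kbps) by 4-bit index; 0 ("free") and 15 omitted.
BITRATES = {1: 32, 2: 40, 3: 48, 4: 56, 5: 64, 6: 80, 7: 96, 8: 112,
            9: 128, 10: 160, 11: 192, 12: 224, 13: 256, 14: 320}
SAMPLE_RATES = {0: 44100, 1: 48000, 2: 32000}


def make_frame(bitrate_index=9, rate_index=0, padding=0, channel_mode=0, payload=0):
    """Build one MPEG-1 Layer III frame (no CRC) with a filler payload."""
    bitrate = BITRATES[bitrate_index] * 1000
    rate = SAMPLE_RATES[rate_index]
    length = 144 * bitrate // rate + padding
    header = bytes([
        0xFF,  # first 8 sync bits
        0xFB,  # last 3 sync bits, MPEG-1 (11), Layer III (01), no CRC (1)
        (bitrate_index << 4) | (rate_index << 2) | (padding << 1),
        channel_mode << 6,  # mode extension, copyright, original, emphasis = 0
    ])
    return header + bytes([payload]) * (length - 4)


def make_file(frame_specs):
    """Concatenate frames described by keyword-argument dicts."""
    return b"".join(make_frame(**spec) for spec in frame_specs)
```

For example, `make_file([{}, {"rate_index": 1}])` produces a two-frame file whose sample rate switches mid-file, which suits the mismatched-types tests.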

Handle merge of badly split files

MP3 splitters often don't split on frame boundaries, so a frame sometimes straddles two files. Worse, an MP3 tag is often inserted at the beginning of the second file, so we get [partial frame] -> EOF -> New File -> [ID3 tag] -> [rest of frame]. We should handle this case if possible.

To decide: does the resulting frame get added to the first file, or the second?
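A sketch of the reassembly, assuming the scanner reports where the incomplete frame starts in the first file and its full length (from the frame header). Both are hypothetical parameters here. The second file's leading ID3v2 tag is skipped using its syncsafe size, and the missing bytes are taken from just after it. The sketch arbitrarily gives the whole frame to the first file and leaves the tag on the second, since that choice is still undecided.

```python
def id3v2_size(data):
    """Total length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    b6, b7, b8, b9 = data[6:10]
    size = (b6 << 21) | (b7 << 14) | (b8 << 7) | b9   # syncsafe: 7 bits per byte
    return 10 + size + (10 if data[3] == 4 and data[5] & 0x10 else 0)


def rejoin_split_frame(first, second, partial_start, frame_length):
    """Return (first_fixed, second_fixed) with the split frame made whole.

    The completed frame is appended to the first file; the second file keeps
    its ID3v2 tag followed by whatever came after the frame's remainder.
    """
    partial = first[partial_start:]
    missing = frame_length - len(partial)
    tag_len = id3v2_size(second)
    tag, rest = second[:tag_len], second[tag_len:]
    if missing < 0 or missing > len(rest):
        raise ValueError("partial frame does not match the second file")
    whole = partial + rest[:missing]
    return first[:partial_start] + whole, tag + rest[missing:]
```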
