thefeshy / mp3utensil
A tool for validating, merging, and splitting mp3 files - focused on badly split and merged audio books.
Take multiple MP3 files, and merge them - using only the frame data and not other junk.
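A minimal sketch of what "using only the frame data" could look like, assuming MPEG-1 Layer III frames only. The names `frame_length`, `extract_frames` and `merge_frames` are hypothetical and are not taken from the mp3utensil code. A real scanner would also confirm that another sync word follows each candidate frame, since two 0xFF-like bytes can occur by chance in junk data.

```python
# Lookup tables for MPEG-1 Layer III only (an assumption of this sketch).
# None marks values this sketch does not handle (free format, reserved).
MPEG1_L3_BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96,
                          112, 128, 160, 192, 224, 256, 320, None)
MPEG1_SAMPLE_RATES_HZ = (44100, 48000, 32000, None)


def frame_length(header):
    """Return the frame size in bytes for a 4-byte header, or None."""
    if len(header) < 4:
        return None
    b0, b1, b2 = header[0], header[1], header[2]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None  # no 11-bit sync word
    if (b1 >> 3) & 0x03 != 0x03:
        return None  # not MPEG-1
    if (b1 >> 1) & 0x03 != 0x01:
        return None  # not Layer III
    bitrate = MPEG1_L3_BITRATES_KBPS[b2 >> 4]
    sample_rate = MPEG1_SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding


def extract_frames(data):
    """Copy out complete frames, skipping every byte that is not in one."""
    out = bytearray()
    pos = 0
    while pos + 4 <= len(data):
        size = frame_length(data[pos:pos + 4])
        if size is None or pos + size > len(data):
            pos += 1  # junk or truncated frame: resync one byte later
            continue
        out += data[pos:pos + size]
        pos += size
    return bytes(out)


def merge_frames(parts):
    """Concatenate only the frame data of several in-memory MP3 files."""
    return b"".join(extract_frames(part) for part in parts)
```

Tags, padding and unknown binary data between frames are simply never copied, which is the behaviour the merge action asks for.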
Command-line action to dump all non-mp3 data to separate files, with appropriate labels if they can be identified (ID3 tags, unknown binary data.)
To figure out: should we separate out images from the ID3 v2 tags?
Verify that the CRC checksums, if present, are correct.
Some parameters are supposed to be fixed within a given file - such as sample rate. Find out which ones are supposed to be fixed, and which ones actually cause players to misbehave when they change. By default prevent users from joining MP3 files that would result in these parameters changing. However, allow users to override this (if nothing else it will help us make test files.)
Lastly we should allow users to split a file with invalid parameter switching like that into valid file parts again (break on parameter transitions.)
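Breaking on parameter transitions could be sketched roughly as below. `FrameInfo`, `fixed_key` and `split_on_transitions` are hypothetical names, and the choice of sample rate and channel count as the "fixed" parameters is only an assumption, since which parameters must stay constant is still an open question here.

```python
from itertools import groupby
from typing import NamedTuple


class FrameInfo(NamedTuple):
    """Hypothetical per-frame record; not the project's real frame type."""
    offset: int
    size: int
    sample_rate: int
    channels: int


def fixed_key(frame):
    """Parameters assumed to be constant within one valid file."""
    return (frame.sample_rate, frame.channels)


def split_on_transitions(frames):
    """Group consecutive frames into runs that share the fixed parameters."""
    return [list(run) for _, run in groupby(frames, key=fixed_key)]
```

Each returned run could then be written out as its own file part. The same key function could be reused when joining, to refuse a merge whose inputs disagree.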
Currently the framelist only works with NumPy; we need to add support for NumPy not being available.
I read something about other types of MP3 tags, like APE. Look into this.
Currently the largest amount of time is spent allocating objects, such as headers. The closest we can come to a pool allocator in python is an array, so we'll start using an array to store header values in. Hopefully this will help the continuing improvements in performance. Current targets to beat:
Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822
Python only:
Big: 21.14
One: 2.37
Two: 0.96
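The idea of storing header values in an array instead of allocating one object per header might look something like this. `HeaderPool` is a hypothetical name, and the layout (one packed 32-bit header plus one file offset per frame) is an assumption of this sketch rather than the project's actual design. The bit positions follow the standard MPEG audio frame header.

```python
from array import array


class HeaderPool:
    """Keeps frame headers as packed integers in flat arrays."""

    def __init__(self):
        self._headers = array("L")  # unsigned long: at least 32 bits per item
        self._offsets = array("Q")  # unsigned 64-bit file offsets

    def add(self, offset, header_bytes):
        """Store one 4-byte header and return its index in the pool."""
        self._headers.append(int.from_bytes(header_bytes[:4], "big"))
        self._offsets.append(offset)
        return len(self._headers) - 1

    def offset(self, index):
        return self._offsets[index]

    def bitrate_index(self, index):
        # Bits 15..12 of the 32-bit header.
        return (self._headers[index] >> 12) & 0xF

    def sample_rate_index(self, index):
        # Bits 11..10 of the 32-bit header.
        return (self._headers[index] >> 10) & 0x3

    def __len__(self):
        return len(self._headers)
```

Fields are decoded on demand from the packed integer, so no per-frame object is created during scanning. Whether this actually beats the timings above would need to be measured.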
Add fallback python-only processing for users who don't have numpy
Scan for attributes that change and shouldn't, report status of all attributes such as copy flag, crc, bitrate, file length, etc.
Should be on by default, but able to be disabled via the command line.
To figure out: Should it preserve the first mp3 tag, or try to merge them somehow?
Some ID3v1 taggers actually overwrote the last 128 bytes of the last frame with data. This means that we should check the last full frame to see if it contains an ID3v1 tag 128 bytes from its end.
Additionally, because the file could be a badly joined set of MP3 files, this 128 byte tag could be at the end of every frame, and they should all be checked.
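A rough sketch of the per-frame check, assuming the frame boundaries are already known as `(start, size)` pairs. The function names are hypothetical. Matching only the 3-byte "TAG" marker is itself a heuristic, since those bytes can occur in audio data by chance.

```python
ID3V1_SIZE = 128


def id3v1_in_frame_tail(data, frame_start, frame_size):
    """True if the last 128 bytes of the frame start with the ID3v1 marker."""
    frame_end = frame_start + frame_size
    if frame_size < ID3V1_SIZE or frame_end > len(data):
        return False
    tag_start = frame_end - ID3V1_SIZE
    return data[tag_start:tag_start + 3] == b"TAG"


def frames_with_embedded_tags(data, frames):
    """Return the indices of all frames whose tail looks like an ID3v1 tag."""
    return [
        index
        for index, (start, size) in enumerate(frames)
        if id3v1_in_frame_tail(data, start, size)
    ]
```

Checking every frame costs one 3-byte comparison per frame, which stays cheap compared to scanning arbitrary offsets.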
Checking for ID3 tags at arbitrary locations is unwise, both because it's slow and because the loose ID3v1 standard means at best we'd be relying on heuristics. But we might allow it as an option?
For large (1gb+) files, we probably don't want to read the whole thing at once if possible. Instead we should break scanning and performing tasks on the file up into chunks. Ideally the program will do this automatically for large files, and support command-line override of this behavior. E.g. chunk-size=50mb (valid values are the number of mb per chunk, or -1 to never chunk)
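The chunking behaviour could be sketched as follows. `parse_chunk_size` and `iter_chunks` are hypothetical helpers. The accepted option values follow the description above, and treating "mb" as 1024 * 1024 bytes is an assumption.

```python
def parse_chunk_size(value):
    """Turn a value such as '50mb' or '-1' into a byte count, or None."""
    text = value.strip().lower()
    if text.endswith("mb"):
        text = text[:-2]
    megabytes = int(text)
    if megabytes == -1:
        return None  # -1 means never chunk
    if megabytes <= 0:
        raise ValueError("chunk size must be a positive number of mb, or -1")
    return megabytes * 1024 * 1024


def iter_chunks(stream, chunk_bytes):
    """Yield the stream in pieces of at most chunk_bytes bytes."""
    if chunk_bytes is None:
        yield stream.read()  # chunking disabled: read everything at once
        return
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk
```

A frame can straddle two chunks, so a real scanner would need to carry the unparsed tail of one chunk over into the next. This sketch does not do that.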
Our real-world tests are sometimes showing files with extra padding after the ID3v2 tag, or just data that's been blanked. Either way it results in long strings of zeros. It would be good to add a tag-type scanner that looks for (and eliminates) these.
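One possible shape for such a scanner, as a sketch only. `find_zero_runs` and the 16-byte minimum run length are assumptions. It is meant to be run over regions already classified as non-frame data, because zero bytes inside audio frames are legitimate.

```python
import re


def find_zero_runs(data, min_len=16):
    """Return (offset, length) for every run of at least min_len zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]
```

The reported runs could then be labelled as padding and dropped, or dumped to a separate file like other non-MP3 data.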
Identifying ID3v1 and ID3v1.1 tags should be easy when they are where they are supposed to be - at the end of a file. But when dealing with badly joined files, or badly tagged files, this isn't always the case. Here are some possible locations:
At the end of the file, in junk data (where it is supposed to be)
At the end of arbitrary junk data within the file (where it would be if tagged files were joined by "cat" or the like)
At the end of a file, overwriting the last 128 bytes of frame data (where a careless tagger might have written it without growing the file, figuring it would only corrupt a single end frame)
At the end of an arbitrary frame within the file overwriting data (which is what would happen if the case above is joined with "cat")
Somehow this passed our test cases, but failed on the 128 byte ID3 tag on real data.
Test Mylyn integration and get it operational
Get some sort of memory profiling working. This might be important later when we are handling 1gb+ audio books.
Continue to monitor and improve performance.
Pre-array storage of frames:
Numpy (best of three):
Big: 17.97
One: 0.528
Two: 0.822
Python only:
Big: 21.14
One: 2.37
Two: 0.96
Make tests more "requirements" based now that we have some basic structure here (this will help as we refactor code too.)
Generate short test files algorithmically, and make test cases for each branch of the mp3file code.
Trying to get the IDE well integrated with Git. Currently milestones are not showing up in Mylyn tasks.
MP3 splitters often don't split on frame boundaries. Sometimes this means a frame straddles two files. Worse, there is often an MP3 tag inserted at the beginning of the second file, so we have [partial frame] -> EOF -> New File -> [ID3 tag] -> [rest of frame.] We should handle this case if possible.
To decide: does the resulting frame get added to the first file, or the second?
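A sketch of stitching such a frame back together, assuming the partial frame at the end of the first file is long enough to contain its 4-byte header, so that the full frame size is already known. `id3v2_tag_size` and `rejoin_straddled_frame` are hypothetical names. The tag size is read from the standard 10-byte ID3v2 header, whose size field is a 28-bit "synchsafe" integer.

```python
def id3v2_tag_size(data):
    """Return the total size of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for byte in data[6:10]:
        size = (size << 7) | (byte & 0x7F)  # 7 significant bits per byte
    footer = 10 if data[5] & 0x10 else 0    # optional footer flag
    return 10 + size + footer


def rejoin_straddled_frame(partial_head, next_file, frame_size):
    """Rebuild a frame split across two files, or return None on failure."""
    missing = frame_size - len(partial_head)
    if missing <= 0:
        return None  # nothing to recover
    skip = id3v2_tag_size(next_file)  # step over the inserted tag, if any
    rest = next_file[skip:skip + missing]
    if len(rest) < missing:
        return None  # second file is too short to complete the frame
    return partial_head + rest
```

The sketch only rebuilds the frame bytes. Which file the result belongs to is left open, as in the question above.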