sleepwalking / rocaloid-old
OBSOLETED! Moved to http://github.com/Rocaloid
License: GNU General Public License v3.0
Many people, like @tuxzz, do not know how to install Rocaloid or QTau.
There are so many components that installation is tough work.
Would you mind providing instructions for compilation and installation?
As I wrote in README.md before, the next version will be completely rewritten again, like the evolution from Rocaloid1 to Rocaloid1.6.
The RSC, CVS, and CDT formats have already reached version 2.x, so their version numbers no longer match the synthesizer's.
Considering also the significant change in the synthesis algorithm (TDPSM -> FECSOLA), I've decided to name the next generation "Rocaloid Engine 3" instead of 2, along with CVE 3.
Here I have to restate the definition and relations of "Rocaloid Engine":
RSC will not be included in Rocaloid Engine anymore, because RSC is strongly related to the note editor, and dealing with editor settings and musical notations is not the business of Rocaloid Engine.
RSC will be replaced by RVS (Rocaloid Vocal Script), which describes the general (but not detailed) information of notes and lyrics (but not phonemes). CVS Generator will be responsible for transforming RVS into CVS. The transformation from RSC (or .vsqx, .vsq, .ust, .nn, etc.) to RVS should be simple and should not require professional knowledge of phonetics.
Altogether, the major components and formats in RE3(Rocaloid Engine 3) will be:
Additionally, CVS 3 and RVS 3 will be stored in binary instead of text. This is because formant data will be included in CVS and RVS, which would greatly increase the file size and slow down I/O if stored as text. (A CVS 3 text file containing one song would be approximately 10 MB.)
CVDBStudio is the tool for making the sound db for Rocaloid 3; it replaces TDPSMStudio.
The reason for building this tool, rather than writing plug-ins for wave editors such as Audacity, is that wave editors are not convenient for batch processing.
In general, CVDBStudio turns batches of .wav files into .cvdb files, like:
a_C3.wav -> a_C3.cvdb
i_D#4.wav -> i_D#4.cvdb
...
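The naming pattern in the examples above can be sketched as a small helper. This only illustrates the file-name mapping; the functions `cvdb_name` and `split_unit` are hypothetical names, and the assumption that every input follows `<symbol>_<pitch>.wav` comes from the two examples, not from any CVDBStudio specification.

```python
from pathlib import PurePath


def cvdb_name(wav_name: str) -> str:
    """Map 'a_C3.wav' to 'a_C3.cvdb' (naming taken from the examples above)."""
    return str(PurePath(wav_name).with_suffix(".cvdb"))


def split_unit(wav_name: str) -> tuple[str, str]:
    """Split 'i_D#4.wav' into (symbol, pitch); assumes '<symbol>_<pitch>.wav'."""
    symbol, pitch = PurePath(wav_name).stem.rsplit("_", 1)
    return symbol, pitch


for name in ["a_C3.wav", "i_D#4.wav"]:
    print(name, "->", cvdb_name(name), split_unit(name))
```

The actual conversion (what goes inside a .cvdb file) is not covered here.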
More specifically, the three major jobs(steps) CVDBStudio does are:
CVDBStudio will be very similar to TDPSMStudio, which also has three similar functions.
Like TDPSMStudio, CVDBStudio also has three modes for the above steps:
To offer a direct view of what CVDBStudio should look like, here is a screenshot of TDPSMStudio, and CVDBStudio generally copies its UI.
Red contour, pink line, and green wave will only occur in CVDB Converter Mode.
This issue still doesn't cover all the details, but you can refer to the vb.net source code of TDPSMStudio:
The PitchMixer of CVE3 shows serious problems that make some of the synthesized vocals sound vague.
http://bbs.ivocaloid.com/thread-124636-1-1.html
I'm designing an improved algorithm for CVE. Development will continue once the research is done.
Well, it's an algorithm for formant modulation, which can be used to change the pronunciation of waves in the sound db, and I think it will be extremely useful for CVE2.
I call it FECSOLA (Formant Envelope Coefficient Shift and OverLap Add). Briefly, it works by modifying the spectral envelope with OLA.
For example, say you have the wave of "a" and you know its formant frequencies. Put it into FECSOLA, tell it the new formant frequencies, and the modified wave comes out (which might be transformed into "i" / "o" / "e").
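To make the "tell it the new formant frequencies" idea concrete, here is a minimal sketch of moving the peaks of a spectral envelope by piecewise-linear frequency warping. It is not FECSOLA itself (whose code is not shown in this thread) and it leaves out the overlap-add resynthesis entirely; the function names, the toy envelope, and the warping rule are all assumptions made for illustration.

```python
from bisect import bisect_right


def warp_frequency(f, old_formants, new_formants, nyquist):
    """Map an output frequency f back to the source frequency it is read from.

    Piecewise-linear map through (0, 0), (new_Fk, old_Fk)..., (nyquist, nyquist).
    Formant lists must be strictly increasing and lie between 0 and nyquist.
    """
    xs = [0.0, *new_formants, nyquist]
    ys = [0.0, *old_formants, nyquist]
    i = min(max(bisect_right(xs, f) - 1, 0), len(xs) - 2)
    t = (f - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def shift_envelope(envelope, bin_hz, old_formants, new_formants):
    """Resample an envelope so peaks at old_formants move to new_formants."""
    nyquist = bin_hz * (len(envelope) - 1)
    out = []
    for k in range(len(envelope)):
        src = warp_frequency(k * bin_hz, old_formants, new_formants, nyquist) / bin_hz
        lo = min(int(src), len(envelope) - 2)
        frac = src - lo
        # Linear interpolation between the two nearest source bins.
        out.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return out


# Toy envelope with a single peak at 800 Hz (bin 8 with 100 Hz bins).
env = [0.1] * 41
env[8] = 1.0
shifted = shift_envelope(env, 100.0, old_formants=[800.0], new_formants=[300.0])
print(shifted.index(max(shifted)))  # bin index of the moved peak
```

In this toy case the peak moves from bin 8 (800 Hz) to bin 3 (300 Hz). A real implementation would apply the modified envelope frame by frame and overlap-add the results.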
Obviously this algorithm can be used for correcting Miku's poor Chinese pronunciation.
I'm not going to use FECSOLA when building the new db, because that takes much more effort (there is a lot of work to do with the new db) and increases the size of the db. Instead I'm going to embed it into CVE2 and do the modification in real time (driven by some given parameters).
So the problem is we have to figure out:
Theoretical approaches such as observing and analyzing the spectra would not work, since we want the best output quality. So the only way is to try those symbols and formant parameters in a large number of real tests...
The tester would be really simple: nothing more than a few sliders (controlling F0, F1, F2, F3), picture boxes (to show the spectrum before and after modification), and several buttons to load and play the .wav files.
I have learned neither Qt nor C++... So I would be glad if someone could help me make this application. The algorithm has a C implementation, which is easy to port to C++.
For details and the code of FECSOLA, I'll post them below if someone replies to this post.
After reading some information about Cadencii, I learned that it uses MusicXML to exchange music data with other applications.
I wonder, can Rocaloid accept MusicXML as an input file? That would make it convenient to extend its features.
As far as I know, many programs support MusicXML, such as MuseScore and LilyPond, so one can easily export a MusicXML file from another app and import it here.
It seems that Rocaloid accepts an ini file, which is obviously hard to extend. We could use a modified version of MusicXML (for example, with additional information such as phonetic symbols included). Since it is XML, adding an attribute will not affect the existing file format, and the file stays compatible with other apps.
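A minimal sketch of the "add an attribute" idea, using Python's standard `xml.etree.ElementTree`. The snippet is a cut-down MusicXML-style note, not a complete valid score, and the `phoneme` attribute and its value are hypothetical: MusicXML defines no such attribute, and a strictly validating reader might reject it.

```python
import xml.etree.ElementTree as ET

# A cut-down MusicXML-style note. The "phoneme" attribute is hypothetical:
# it is not part of MusicXML and only illustrates the proposed extension.
SNIPPET = """
<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <duration>4</duration>
  <lyric phoneme="l a">
    <syllabic>single</syllabic>
    <text>la</text>
  </lyric>
</note>
"""


def read_note(xml_text):
    """Extract pitch, duration, lyric text and the optional phoneme attribute."""
    note = ET.fromstring(xml_text)
    lyric = note.find("lyric")
    return {
        "step": note.findtext("pitch/step"),
        "octave": int(note.findtext("pitch/octave")),
        "duration": int(note.findtext("duration")),
        "text": lyric.findtext("text"),
        # Returns None when the attribute is absent, so plain files still load.
        "phoneme": lyric.get("phoneme"),
    }


print(read_note(SNIPPET))
```

A reader that does not know about the extra attribute simply never asks for it, which is the compatibility argument made above.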
P.S. I do not expect this feature to be implemented soon. I only hope you keep in mind that it may become a feature in the future, so let us keep this issue open for a long time.
Hint: VOCALOID 3 also uses XML (but not MusicXML) instead of the modified binary MIDI format used in VOCALOID 2 (the MIDI format is packed so tightly that hardly any new feature can be appended). This shows the advantage of XML.
I had a look at your dictionary and found that you are using custom phonetic symbols. I suggest using X-SAMPA phonetic symbols, which are an international standard.
In fact, the phonetic symbols that VOCALOID uses are X-SAMPA.
However, there are exceptions. For example, some bugs in the Luo Tianyi voicebank make the phonetic symbols inconsistent with the actual pronunciation: Pinyin bo should be pronounced p uo but is actually p o, and Pinyin er is pronounced Ar by Luo Tianyi while the symbol is written as @`.
Leaving aside the various bugs in the Luo Tianyi voicebank caused by rushing to meet the deadline, I recommend that Rocaloid use the X-SAMPA phonetic symbol format.
If you are willing, I can provide a table for converting from Pinyin to standard X-SAMPA: m13253/pinyin2xsampa. (You may want to use a slightly modified version of X-SAMPA for easier implementation.)
I would also like to take part in porting to Linux. (Of course, not until it is rewritten in C++.)
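As a rough illustration of what a Pinyin conversion table could look like in code, here is a lookup sketch. It contains only the two syllables discussed in this issue, with the symbol strings copied from the post; it is not the content of m13253/pinyin2xsampa, and `PINYIN_TO_SYMBOLS` and `to_symbols` are hypothetical names.

```python
# Only the two syllables discussed in this issue; the symbol strings are copied
# from the post, not taken from pinyin2xsampa or any VOCALOID dictionary.
PINYIN_TO_SYMBOLS = {
    "bo": ["p", "uo"],
    "er": ["Ar"],
}


def to_symbols(syllables):
    """Convert pinyin syllables to phonetic symbols; unknown ones raise KeyError."""
    out = []
    for s in syllables:
        out.extend(PINYIN_TO_SYMBOLS[s.lower()])
    return out


print(to_symbols(["bo", "er"]))
```

Raising on unknown syllables rather than guessing keeps gaps in the table visible while it is being filled in.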
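Um... I know this is not the proper way to do things: an issue tracker is for tracking issues, not for forum chat... But I don't have permission to post on the iVocaloid forum, and I was afraid that an email sent directly to the developers would be caught by a spam filter... so I am posting here.
First of all, I think I count as half a programmer. I have been learning to program for about two years. I know C/C++/Python/Go/JavaScript, and I am fairly familiar with Linux and the various open-source software ecosystems. That's my self-introduction.
Next... Sleepwalking-san, I think your project is great! I have long wanted to tune Chinese songs with Hatsune Miku, but I know nothing about phonetics, so I could never get anywhere, and I have no idea how to do reverse engineering, so I couldn't work with Vocaloid either. I am also usually busy and don't have large blocks of free time. So now I hope to take part in this project. I saw on the forum that you said you are not good at GUI work or C++. I happen to be a bit stronger in that area and can help with front-end development. Of course, I don't think I can do the back end at all (laughs).
For now I have the following suggestions:
1. I suggest still developing in C++. C++ offers good encapsulation, the language is relatively intuitive, and it is convenient for front-end development. Both development efficiency and runtime efficiency are relatively high. Actually, I think Python would be the most convenient choice for the front end, but for cross-platform reasons, deploying a Python runtime on Windows is a bit of a pain.
2. I suggest not using wxWidgets for the GUI; use Qt instead. Qt is much easier to learn and understand than wxWidgets. You said you were scared off by MFC's Hello World when you learned C++. In fact, wxWidgets has the same style as MFC, and the inhuman complexity of the MFC API is known to everyone. Compared with wxWidgets, Qt is much easier to learn, and Qt is also cross-platform. Qt is also the GUI toolkit I am most familiar with.
3. About the open-source license: I feel I should remind you that GPLv3 permits commercial use. The GPL only forbids companies from taking the code and turning it into closed-source software. If a company modifies the code and keeps it open, or even sells it, that does not violate the GPL as long as the source code is provided. However, the GPL allows redistribution, which means that when a company sells GPL software, a user who buys it can copy it to others or share it online, and that is still completely legal. In other words, the GPL does not prohibit commercial use; it only makes commercial use of the software rather pointless. On the other hand, prohibiting commercial use is not something the open-source spirit advocates. If you really want to prohibit commercial use, please do not use the GPL; consider a CC license instead.
4. Still about open source: for the license statement, the usual practice is to put a LICENSE.txt containing the full license text in the root directory, plus a COPYING.txt with a short copyright notice. For the GPL, for example, the short notice is: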
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
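Also, each source file should generally carry the copyright notice in a comment as well.
Well... that's basically it. I look forward to working together (although I will only have time to write code during the summer vacation, after my final exams _(:3L)_).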
We've decided to write the entire project in C++. The basic algorithm won't change; it is still TDPSM. But CVE currently has a few bugs, and we'd better fix them in the rewritten version. Some functions are incomplete, such as the GEN factor and the factor conversion from vsqx to rsc.
There are two bugs in CVE 1.6:
When a transition takes place, for example a_C3 -> o_C3, it is done by stretching and mixing hundreds of periods of a_C3 and o_C3 at different ratios. Suppose the instantaneous transition ratio is 0.5, so ideally the output should be half a_C3 and half o_C3. Here a problem arises: the transition ratio at the beginning of a period is not the same as the transition ratio at its end. Why? Because otherwise the end of the period would not match the start of the next period.
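A minimal sketch of mixing one period with a ratio that ramps from its start value to its end value, so that the next period can pick up where this one stops. The function `mix_period` is hypothetical and assumes both periods have already been stretched to the same length; CVE's actual mixing code is not shown in this thread.

```python
def mix_period(a, o, tr_start, tr_end):
    """Crossfade two equal-length periods, ramping the ratio across the period.

    a, o: samples of one period from each source (e.g. a_C3 and o_C3).
    Ratio 0 means all a, ratio 1 means all o. The ramp uses i / n, so the
    ratio reaches tr_end exactly at the first sample of the *next* period.
    """
    n = len(a)
    if n == 0 or n != len(o):
        raise ValueError("periods must be non-empty and of equal length")
    out = []
    for i in range(n):
        r = tr_start + (tr_end - tr_start) * i / n
        out.append((1.0 - r) * a[i] + r * o[i])
    return out


print(mix_period([1.0] * 4, [0.0] * 4, 0.0, 1.0))
```

With a constant ratio inside each period instead, the ratio would jump at every period boundary, which is the mismatch described above.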
In a mathematical way to prove it, let's suppose the instantaneous transition ratio is given by:
TR(t) = t * 0.5 (When 0 <= t <= 2)
So this is a transition of 2 seconds. Suppose a period begins at 1 s and lasts 0.01 s. Then the TR at its beginning should be exactly 0.5, and at its end the TR should be (1 + 0.01) * 0.5 = 0.505.
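The arithmetic above can be checked with a few lines. The function name `transition_ratio` is just a label for the TR(t) formula given above.

```python
def transition_ratio(t):
    """TR(t) = 0.5 * t on 0 <= t <= 2, as defined above."""
    if not 0.0 <= t <= 2.0:
        raise ValueError("t outside the 2-second transition")
    return 0.5 * t


period_start = 1.0    # seconds
period_length = 0.01  # seconds

tr_begin = transition_ratio(period_start)
tr_end = transition_ratio(period_start + period_length)
print(tr_begin, tr_end)  # roughly 0.5 and 0.505
```

The difference of 0.005 within a single period is small, but it accumulates over the hundreds of periods in a transition.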
Period Prediction is a method used in CVE 1.6 to solve the problem of different TRs at the beginning and end of periods. In fact I didn't realize this problem when I designed CVE 1.6. And Period Prediction was added after I finished the first version of CVE 1.6...
Here is the code (in PitchPreSynthesizer):
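' TR1 is the transition ratio at the start of the current period.
Dim TR1 As Double, TR2 As Double
TR1 = PCalc.TransitionRatio
SetStartMixRatio(TR1)
' Advance PCalc by one estimated period (1 / current frequency) to predict the end ratio.
PCalc.PitchCalc(Time + 1 / PCalc.GetFreqAt(Time))
TR2 = PCalc.TransitionRatio
' If the predicted ratio exceeds 1 or falls below TR1, it is set to 1.
If TR2 > 1 Then TR2 = 1
If TR2 < TR1 Then TR2 = 1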
As you can see, the end ratio (TR2) comes from PCalc.PitchCalc(Time + 1 / PCalc.GetFreqAt(Time)), which simply adds the length of the current period to the current time. But we don't know the exact length of the current period! So there would be an error of around 5 samples.
(I guess this is the most minor bug... 5 samples... It may cause almost no change in the output wave...)
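To see why the period length is only an estimate, here is a sketch comparing the predicted period 1 / f(t) with the period obtained by integrating a changing frequency over one full cycle. The linear glide, its constants, and the sample rate are hypothetical values chosen for illustration, so the size of the error printed here says nothing about the "around 5 samples" figure above.

```python
import math

F0 = 130.0           # Hz at t = 0 (hypothetical)
SLOPE = 200.0        # Hz per second, must be non-zero (hypothetical glide rate)
SAMPLE_RATE = 44100  # samples per second (assumed)


def freq_at(t):
    """Hypothetical linear pitch glide: f(t) = F0 + SLOPE * t."""
    return F0 + SLOPE * t


def predicted_period(t):
    """Estimate used in the snippet above: assume the frequency stays at f(t)."""
    return 1.0 / freq_at(t)


def actual_period(t):
    """Length T such that the integral of f over [t, t + T] is one cycle."""
    f = freq_at(t)
    # f*T + SLOPE*T^2/2 = 1  ->  positive root of the quadratic in T
    return (-f + math.sqrt(f * f + 2.0 * SLOPE)) / SLOPE


t = 0.5
err = (predicted_period(t) - actual_period(t)) * SAMPLE_RATE
print(f"prediction error: {err:.2f} samples")
```

For a rising pitch the prediction is too long, because the frequency keeps increasing during the period; for a falling pitch it is too short.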
Here's an example that shows how Pitch Calculator works:
When CVE shifts the pitch from C3 to E3 and back to C3, the Pitch Calculator provides these transition instructions as time increases:
That sounds perfect, but what happens if we shrink the total time of the pitch change to 0.2 s? There are 8 transitions in all, so each transition can take only 0.025 s, which is about the length of 3 periods at C3... Such short transitions would cause a sharp decrease in quality.
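The numbers above can be reproduced as follows. The sketch assumes standard tuning (A4 = 440 Hz) and the convention that C3 is MIDI note 48; neither is stated in the post.

```python
A4_HZ = 440.0  # standard tuning, assumed


def note_freq(midi_note):
    """Equal-temperament frequency of a MIDI note number (A4 = note 69)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


C3 = note_freq(48)  # about 130.8 Hz, assuming C3 = MIDI 48
total_time = 0.2    # seconds for the whole pitch change
transitions = 8

per_transition = total_time / transitions
periods = per_transition * C3
print(f"{per_transition:.3f} s per transition, {periods:.2f} periods of C3")
```

Each transition therefore spans a little over three periods of C3.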
So I set up a limit in PCalc:
TimeResolution = 0.03
Then PCalc skips some of the transitions like this:
Then the bug comes...
What would happen if you suddenly change the pitch from D3 -> a bit lower than E3 -> A2?
Pay attention to the second and third transitions above. At the moment the second transition finishes, the output is in a state between D#3 and E3, and in the next moment it becomes a state between E3 and D3... As you know, D#3 and D3 come from different files... So a boom (an audible discontinuity) may occur at the junction of two neighboring periods...
My idea is to rewrite the PCalc. When a segment is loaded, send its FreqList to the PCalc. The PCalc should calculate all pitch transitions and store them in an array before being called by the synthesizers.
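A minimal sketch of what "calculate all pitch transitions and store them in an array" could look like. Everything here is an assumption for illustration: the `BANK` of sampled pitches, the `(time, freq)` shape of the FreqList, linear interpolation in Hz, and the `unsafe_jumps` check, which only flags the kind of file-pair jump described in the bug above and is not a statement of the intended rules.

```python
from bisect import bisect_right

# Hypothetical bank of sampled pitches (Hz), one per sound-db file, ascending.
BANK = [110.00, 130.81, 146.83, 155.56, 164.81]  # A2, C3, D3, D#3, E3


def plan_transitions(freq_list):
    """Precompute (time, lower_index, upper_index, ratio) for every point.

    freq_list: [(time_s, freq_hz), ...] for one segment. ratio is 0 at the
    lower bank pitch and 1 at the upper one (linear in Hz for simplicity).
    """
    plan = []
    for t, f in freq_list:
        f = min(max(f, BANK[0]), BANK[-1])  # clamp into the bank range
        hi = min(bisect_right(BANK, f), len(BANK) - 1)
        lo = hi - 1
        ratio = (f - BANK[lo]) / (BANK[hi] - BANK[lo])
        plan.append((t, lo, hi, ratio))
    return plan


def unsafe_jumps(plan):
    """Indices where consecutive entries share no source file at all."""
    return [
        i
        for i in range(1, len(plan))
        if not {plan[i - 1][1], plan[i - 1][2]} & {plan[i][1], plan[i][2]}
    ]


# D3 -> a bit lower than E3 -> A2, as in the example above.
plan = plan_transitions([(0.0, 146.83), (0.1, 160.0), (0.2, 110.0)])
print(plan)
print(unsafe_jumps(plan))
```

Because the whole plan exists before synthesis starts, problem spots can be detected and smoothed in advance instead of being discovered period by period.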
The pitch transitions should fit in two rules:
I would suggest using CMake.