dmitryulyanov / neural-style-audio-tf
TensorFlow implementation for audio neural style.
Hi Dmitry,
Thanks for putting this together; this is exactly what I was looking for, for an experiment!
I am definitely a beginner at this, but when I tried to run your example I got a SyntaxError in the Optimize cell and in the Output cell, on the print statements, since Python 3 now requires parentheses.
File "<ipython-input-16-9eb962c6044b>", line 50
print 'Final loss:', loss.eval()
^
SyntaxError: invalid syntax
I also figured out that tf.initialize_all_variables() is now deprecated, so I changed it to tf.global_variables_initializer().
Then it all works well!
Thanks!
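For other readers hitting the same errors, here is a minimal sketch of the two fixes described above (`final_loss` is a toy stand-in for the notebook's own `loss.eval()` value):

```python
# Python 3 fix: print is a function, so the Python 2 form
#   print 'Final loss:', loss.eval()
# needs parentheses.
final_loss = 1.23  # toy stand-in for loss.eval()
print('Final loss:', final_loss)

# TF fix: tf.initialize_all_variables() is deprecated; use
#   sess.run(tf.global_variables_initializer())
# instead when initializing the graph's variables.
```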
Hi Dmitry,
I wanted to try an audio style transfer, but I get this error at the optimize-and-invert-spectrum step.
AttributeError Traceback (most recent call last)
in ()
----> 1 get_ipython().run_cell_magic('time', '', 'from sys import stderr\n\n#@markdown ---\n#@markdown Advanced settings / Расширенные настройки\nALPHA= 0.1 #@param {type:"slider", min:0.01, max:0.2, step:0.01}\nlearning_rate= 0.01 #@param {type:"slider", min:0.001, max:0.02, step:0.001}\niterations = 300 #@param {type:"slider", min:100, max:500, step:10}\n#@markdown ---\nresult = None\nwith tf.Graph().as_default():\n\n # Build graph with variable input\n #x = tf.Variable(np.zeros([1,1,N_SAMPLES,N_CHANNELS], dtype=np.float32), name="x")\n x = tf.Variable(np.random.randn(1,1,N_SAMPLES,N_CHANNELS).astype(np.float32)*1e-3, name="x")\n\n kernel_tf = tf.constant(kernel, name="kernel", dtype='float32')\n conv = tf.nn.conv2d(\n x,\n kernel_tf,\n strides=[1, 1, 1, 1],\n padding="VALID",\n name="conv")\n \n \n net = tf.nn.relu(conv)\n\n content_loss = ALPHA * 2 * tf.nn.l2_loss(\n net - content_features)\n\n style_loss = 0\n\n _, height, width, number = map(lambda i: i.value, net.get_shape())\n\n size = height * width * number\n feats = tf.reshape(net, (-1, number))\n gram = tf.matmul(tf.transpose(feats), feats) / N_SAMPLES\n style_loss = 2 * tf.nn.l2_loss(gram - style_gram)\n\n # Overall loss\n loss = content_loss + style_loss\n\n opt = tf.contrib.opt.S...
2 frames
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2115 magic_arg_s = self.var_expand(line, stack_depth)
2116 with self.builtin_trap:
-> 2117 result = fn(magic_arg_s, cell)
2118 return result
2119
in time(self, line, cell, local_ns)
/usr/local/lib/python3.7/dist-packages/IPython/core/magic.py in (f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/usr/local/lib/python3.7/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None
in ()
AttributeError: module 'librosa' has no attribute 'output'
I'm trying to understand whether it would make sense to learn style from a group of examples (in this case, audio files) instead of just one. In the best case this would produce a sort of "mean style" representing the group of audio excerpts. In your experience, would such an approach work (as long as the examples do share some style in common), or would it produce just garbage?
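One way to sketch that "mean style" idea, assuming the repo's Gram-matrix style loss (the names and the normalization by frame count here are illustrative, not taken from the repo):

```python
import numpy as np

def gram(features):
    # features: (time, channels) activation matrix from the random CNN
    f = features.reshape(-1, features.shape[-1])
    return f.T @ f / f.shape[0]

rng = np.random.default_rng(0)
# Toy stand-ins for activations of three style excerpts.
style_feats = [rng.standard_normal((430, 64)).astype(np.float32)
               for _ in range(3)]

# "Mean style": average the per-example Gram matrices, then use the result
# as style_gram in the style loss in place of a single example's Gram.
mean_gram = np.mean([gram(f) for f in style_feats], axis=0)
```

Whether averaging Grams yields a perceptually meaningful "mean" depends on how similar the excerpts' textures are, which seems to be exactly the question being asked.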
You added this 3 years ago, and I am just now finding it. I have been searching for an implementation of neural style that treats music as the images, in this case waveforms. This is amazing; have you built more upon this? Thanks for this repo.
In your blog you wrote that 1D convolutions work better than 2D ones, but this TensorFlow version uses Conv2D rather than Conv1D. Why is that? Any reason, or am I missing something?
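It may help to note that the two are equivalent here: with a height-1 input of shape (batch, 1, time, channels) and a 1×W kernel, a 2D convolution reduces to a 1D convolution along the time axis. A sketch in TF 2 eager mode (shapes chosen arbitrarily for illustration):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1, 100, 8)).astype(np.float32)   # (batch, 1, time, ch)
k = rng.standard_normal((1, 11, 8, 16)).astype(np.float32)   # (1, width, in, out)

# The repo's "2D" convolution: a height-1 kernel over a height-1 input...
y2d = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='VALID')

# ...matches a 1D convolution along time with the same weights.
y1d = tf.nn.conv1d(x[:, 0], k[0], stride=1, padding='VALID')

assert np.allclose(y2d.numpy()[:, 0], y1d.numpy(), atol=1e-4)
```

So using Conv2D on a (1, 1, N_SAMPLES, N_CHANNELS) tensor is effectively a 1D convolution; only the tensor layout differs.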
hello Dmitry,
A quick question: how do I produce longer output files with this approach? Do I necessarily have to provide longer inputs, or is there another way?
Thank you very much for sharing your results
Giancarlo
Though I fully trust Dmitry and believe his claim that a random CNN is as good as a pretrained net at detecting and extracting texture features (the "style"), I would really appreciate the possibility of testing a pretrained net for extracting the "content" features.
While experimenting with this lovely software, I found that its ability to discriminate the content structure in "content" sound files does not appear as accurate as in the examples provided elsewhere for the "image style transfer" case. In particular, it seems that too much of the style still remains in the content, and this is perhaps the cause of the high dominance of some audio files when combined with others.
I noted that the best combinations (i.e., where the "content" audio imposes only its structure and the "style" audio enforces its own texture) are produced when the spectra of the two audios share most of their frequencies, but the "style" has less structure, or, in other words, less evident "beats". This would correspond, in images, to the "style" image having mostly the same spectrum as the "content" one, but featuring weaker and shorter edges. The output audio, in this case, resembles an "envelope" taken from the "content" audio modulating the amplitude of the "style" audio.
On the other hand, when the "style" audio lies on a mostly different region of the frequency spectrum (e.g. higher frequencies) with respect to the "content", then the two audios get mixed (their spectra appear to be merged) and both are almost equally present in the output, producing in most cases very confusing output.
I can provide some examples, but I guess anyone can figure out what I'm trying to explain, by testing on the available audio samples.
Looking at the results produced by applying style transfer to images, I would expect a different behavior, where the style (i.e. the texture) of the "style" image almost completely substitutes the texture of the "content" image. I suspect that some more investigation might be needed into the selection of the most suitable net for content feature extraction, and therefore I would love some hints on how to load and use a pretrained network.
Sorry for the long message.
In read_audio_spectum, in the 4th cell:
S = np.log1p(np.abs(S[:,:430]))
What's the purpose of the constant 430? Much thx!
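A guess at the answer, worth checking against the notebook: 430 crops the STFT to a fixed number of frames so the content and style spectrograms share one shape, and with librosa's defaults (sr=22050, hop=512) that happens to be roughly 10 seconds of audio. A sketch, with hypothetical `S_content` / `S_style` stand-ins:

```python
import numpy as np

sr, hop = 22050, 512        # librosa defaults assumed by this guess
n_frames = 430
seconds = n_frames * hop / sr
assert abs(seconds - 9.98) < 0.01   # ~10 s of audio

# A generic alternative: crop both spectrograms to their common length
# instead of hard-coding 430.
S_content = np.random.rand(1025, 500).astype(np.float32)
S_style = np.random.rand(1025, 430).astype(np.float32)
n = min(S_content.shape[1], S_style.shape[1])
S_content, S_style = S_content[:, :n], S_style[:, :n]
```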
@DmitryUlyanov
Hi Dmitry :)
I have encountered some kind of error when trying to transfer style from one song to another.
After running a few cells, the screen goes black and I cannot use the keyboard or mouse, and can't enter tty mode; it looks like a regular system crash.
I'm using Ubuntu 16.04 with tensorflow-gpu (GeForce 760gti, 2 GB VRAM).
Is this problem caused by using the GPU version?
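One possible cause, offered as an assumption: by default TensorFlow claims nearly all GPU memory at startup, and on a 2 GB card that also drives the display this can starve the desktop. A TF 1.x session-config fragment (matching the notebook's API) that limits this:

```python
import tensorflow as tf

# Let TensorFlow grow GPU memory on demand instead of grabbing it all.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap the fraction of VRAM TensorFlow may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.6

# Then pass it to every session the notebook opens:
# with tf.Session(config=config) as sess:
#     ...
```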
Hello @DmitryUlyanov,
Your GitHub page says it should be run in Jupyter; can I run it from the terminal on Ubuntu, and how?
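One common route (the notebook filename below is an assumption; substitute the repo's actual .ipynb name if it differs) is to convert the notebook to a plain Python script with nbconvert and run it directly:

```shell
# Convert the notebook to a .py script, then run it from the terminal.
jupyter nbconvert --to script neural-style-audio-tf.ipynb
python neural-style-audio-tf.py
```

Note that cells using IPython magics (e.g. %%time) may need those lines removed or adapted after conversion.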