siren's Issues

Finding the proper frequency multiplier

Hi,

How do you usually initialize the frequency of the network (leaving aside the magic number 30)?
As it is very much problem-dependent, I'm wondering whether it is currently just a trial-and-error task or whether there is a more
principled approach, even if just for finding the general scale (I saw you mention a value of 3000 for audio data in another issue).

Thanks in advance, and also for this nice codebase.

Question about the loss on SDF

Thanks for your great work, first of all.
In the SDF experiments, I am confused about why the norm of the spatial gradients should be 1 almost everywhere. Can you give me an intuitive explanation? Thanks so much!
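
For intuition, here is a minimal sketch (my own toy example, not from the repo): for a true signed distance function, moving a point a small step toward or away from the surface changes the distance by exactly that step, so the spatial gradient has unit norm almost everywhere (the eikonal property |∇f| = 1). The sphere SDF below shows this numerically.

import torch

# Toy SDF: signed distance to the unit sphere, f(x) = ||x|| - 1.
x = torch.randn(1000, 3, requires_grad=True)
f = x.norm(dim=-1) - 1.0

# Per-point spatial gradient via autograd; for this SDF it equals x / ||x||.
grad = torch.autograd.grad(f.sum(), x)[0]
print(grad.norm(dim=-1))  # ~1.0 everywhere, i.e. the eikonal property holds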

allows to take derivative w.r.t. input

In the notebook "explore_siren.ipynb"
Can you explain why the line coords.clone().detach().requires_grad_(True) is necessary? Why wouldn't you be able to get a gradient otherwise?

    def forward(self, coords):
        coords = coords.clone().detach().requires_grad_(True) # allows to take derivative w.r.t. input
        output = self.net(coords)
        return output, coords        
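
For what it's worth, here is a minimal standalone sketch (my own toy example, not from the notebook) of the difference: the coordinates passed into forward() generally do not require gradients, so without the clone/detach/requires_grad_ call they are not leaves of the autograd graph and torch.autograd.grad cannot differentiate with respect to them.

import torch

coords = torch.linspace(-1, 1, 5).view(-1, 1)          # requires_grad is False here
out = torch.sin(30 * coords)
# torch.autograd.grad(out.sum(), coords)               # would fail: out has no graph back to coords

coords = coords.clone().detach().requires_grad_(True)  # make coords a leaf of the graph
out = torch.sin(30 * coords)
grad = torch.autograd.grad(out.sum(), coords)[0]       # d(out)/d(coords) now works
print(grad)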

Missing script argument in example

When I run the code python experiment_scripts/train_img.py --model_type=sine given in README.md, I get the error

usage: train_img.py [-h] [-c CONFIG_FILEPATH] [--logging_root LOGGING_ROOT]
                    --experiment_name EXPERIMENT_NAME
                    [--batch_size BATCH_SIZE] [--lr LR]
                    [--num_epochs NUM_EPOCHS]
                    [--epochs_til_ckpt EPOCHS_TIL_CKPT]
                    [--steps_til_summary STEPS_TIL_SUMMARY]
                    [--model_type MODEL_TYPE]
                    [--checkpoint_path CHECKPOINT_PATH]
train_img.py: error: the following arguments are required: --experiment_name
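
As the usage message says, --experiment_name is simply a required argument; adding any name makes the command from the README run (the name below is arbitrary):

python experiment_scripts/train_img.py --model_type=sine --experiment_name=test_img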

How to run "Image Fitting" on GPU with low memory ~ 4GB?

Hi,
I am trying to run the code for the image fitting problem. I am setting batch_size=1 (the default value) as I have a 4 GB GPU, but the training still stops due to GPU out-of-memory. Can anyone let me know how I could solve this problem?
Thanks.

 python experiment_scripts/train_img.py --model_type=sine --experiment_name=output

SingleBVPNet(
  (image_downsampling): ImageDownsampling()
  (net): FCBlock(
    (net): MetaSequential(
      (0): MetaSequential(
        (0): BatchLinear(in_features=2, out_features=256, bias=True)
        (1): Sine()
      )
      (1): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (2): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (3): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (4): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=1, bias=True)
      )
    )
  )
)

  0%|                                                                          | 0/10000 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "experiment_scripts/train_img.py", line 62, in <module>
    model_dir=root_path, loss_fn=loss_fn, summary_fn=summary_fn)
  File "/home/mz/code/siren/training.py", line 92, in train
    summary_fn(model, model_input, gt, model_output, writer, total_steps)
  File "/home/mz/code/siren/utils.py", line 334, in write_image_summary
    img_gradient = diff_operators.gradient(model_output['model_out'], model_output['model_in'])
  File "/home/mz/code/siren/diff_operators.py", line 42, in gradient
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
  File "/home/mz/anaconda3/envs/siren/lib/python3.6/site-packages/torch/autograd/__init__.py", line 158, in grad
    inputs, allow_unused)

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 3.94 GiB total capacity; 2.76 GiB already allocated; 215.06 MiB free; 2.78 GiB reserved in total by PyTorch) (malloc at /opt/conda/conda-bld/pytorch_1587428091666/work/c10/cuda/CUDACachingAllocator.cpp:289)
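
Not an official fix, but one generic way to lower peak memory in situations like this is to evaluate the full coordinate grid in chunks instead of one 512x512 batch; a rough sketch (placeholder names, assuming a plain callable model):

import torch

def forward_in_chunks(model, coords, chunk_size=16384):
    # Run the model over the coordinate grid one chunk at a time to bound peak memory.
    outputs = []
    for i in range(0, coords.shape[0], chunk_size):
        outputs.append(model(coords[i:i + chunk_size]))
    return torch.cat(outputs, dim=0)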

Training videos. Missing test_video.py?

Trained up a custom video but then realized there was no equivalent of test_audio.py. Is that just an oversight, given that there is an example video on the project page?

Also note that there seems to be missing support for checkpoint_path in train_video.py, something like this can be added:

if opt.checkpoint_path is not None:
    model.load_state_dict(torch.load(opt.checkpoint_path))
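
A slightly more defensive variant (my own suggestion, not from the repo) also handles checkpoints saved on a different device:

if opt.checkpoint_path is not None:
    model.load_state_dict(torch.load(opt.checkpoint_path, map_location='cpu'))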

In my training the loss started increasing after hitting a minimum around 20000. Is that what you observed?

How does the suggested initialization compare to the Xavier initialization method?

I've just found out about the so-called Xavier initialization method, which to me seems to do what the paper claims to be its main contribution.

It's an option to Mathematica's NetInitialize function.

image = ExampleData[{"TestImage", "Lena"}]

image = ImageResize[image, 128];

{width, height } = ImageDimensions[image]

output = Flatten[#*2 - 1 &@ImageData[image], 1]

linspace[n_] := Array[# &, n, (n - 1)/2 // {-#, #} &]

input = Pi/2 Tuples [{linspace[height], linspace[width]}] // N

net = NetInitialize[
  width // NetChain[
     {
      #, Sin,
      #, Sin,
      #, Sin,
      #, Sin,
      3, Sin
      },
     "Input" -> 2
     ] &,
  Method -> "Xavier"
  ]

net = NetTrain[net, (input -> output)];
Image[Partition[(# + 1)/2 &[net /@ input], width], ColorSpace -> "RGB"]

image

Do[
 Print[Table[
   NetExtract[net, {i, x}] // Flatten // 
    Histogram, {i, {1, 3, 5, 7, 9}}]],
 {x, {"Weights", "Biases"}}
 ]
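
For a rough numerical comparison (my own sketch, using the hidden-layer sizes from the paper): Xavier/Glorot uniform draws from U(-b, b) with b = sqrt(6/(fan_in + fan_out)), whereas the SIREN hidden-layer init uses sqrt(6/fan_in) divided by omega_0, so the two bounds differ mainly by the omega_0 factor.

import numpy as np

fan_in, fan_out, omega_0 = 256, 256, 30.0
xavier_bound = np.sqrt(6.0 / (fan_in + fan_out))   # Glorot/Xavier uniform bound
siren_bound = np.sqrt(6.0 / fan_in) / omega_0      # SIREN hidden-layer bound
print(xavier_bound, siren_bound)                   # ~0.108 vs ~0.005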

dealing with non square images

I noticed in all the examples, and looking at the code, that images and videos have equal row and column counts (e.g., 512x512). I would like to use arbitrarily sized images (without resizing); how do I do that?
Thanks.
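
Not an official answer, but a minimal sketch of the kind of change involved (names are my own): build the normalized coordinate grid from separate height and width sidelengths instead of a single square sidelength, and reshape the image pixels to match.

import torch

def get_mgrid_rect(height, width):
    # Normalized (y, x) coordinates in [-1, 1] for an arbitrary H x W image.
    ys = torch.linspace(-1, 1, height).view(-1, 1, 1).expand(height, width, 1)
    xs = torch.linspace(-1, 1, width).view(1, -1, 1).expand(height, width, 1)
    return torch.cat([ys, xs], dim=-1).reshape(-1, 2)

coords = get_mgrid_rect(480, 640)   # shape (480*640, 2)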

test_sdf is crashing

I ran the SDF fitting module until completion.
Then, when I try to run test_sdf.py on the checkpoint file, the process crashes for some reason; this is the output I get:

SingleBVPNet(
  (image_downsampling): ImageDownsampling()
  (net): FCBlock(
    (net): MetaSequential(
      (0): MetaSequential(
        (0): BatchLinear(in_features=3, out_features=256, bias=True)
        (1): Sine()
      )
      (1): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (2): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (3): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=256, bias=True)
        (1): Sine()
      )
      (4): MetaSequential(
        (0): BatchLinear(in_features=256, out_features=1, bias=True)
      )
    )
  )
)
Killed

What could be the reason and how can I fix it?
Thank you

SIREN memory requirements

What are the minimal memory requirements for train_wave_equation.py? I am trying to run it on my GPU with 16 GB of memory, but I consistently get CUDA OOM errors.

Visualize results

Hi, I've successfully trained SIREN using two pictures in the experiment "train_poisson_gradcomp_img.py".
But once it finishes, how can I visualize the output image?

Thanks in advance
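
Not an official script, but a rough sketch of how one might render the trained model (placeholder names; it assumes the SingleBVPNet-style dict interface with 'coords'/'model_out' and a 256x256 coordinate grid, which is worth double-checking against the repo):

import torch
import matplotlib.pyplot as plt

model.eval()
with torch.no_grad():
    out = model({'coords': coords})['model_out']   # coords: the (1, 256*256, 2) training grid
img = out.view(256, 256).cpu().numpy()
plt.imsave('poisson_result.png', (img + 1) / 2, cmap='gray', vmin=0, vmax=1)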

Runtime error in "train_poisson_gradcomp_img.py"

Hi, I'm trying to run experiment "train_poisson_gradcomp_img.py"
I'm using two 256x256 B&W images, but when I run the script I keep getting:

Traceback (most recent call last):
File "/content/siren/experiment_scripts/train_poisson_gradcomp_img.py", line 44, in
sidelength=512)
File "/content/siren/dataio.py", line 854, in init
paddedImg[:, 512 - 340:512, :] = self.img2
RuntimeError: The expanded size of the tensor (84) must match the existing size (256) at non-singleton dimension 1. Target sizes: [1, 84, 256]. Tensor sizes: [256, 256]

Any clue on how to fix? Many thanks in advance

Question on Learning a Space of Implicit Functions

In section 4.4 of your paper you go into an interesting hypernetwork idea that can generate the siren parameters for a space of "functions" (images in that case).

In section 9 of the appendix, you go into more detail, and I specifically care about the part where you predict the SIREN parameters from the input:
[image]

As far as I understand

the weights of a 5-layer SIREN with hidden features of size 256

Are:

W1 ∈ R^(2×256) # maps x, y
b1 ∈ R^256

W2, W3, W4 ∈ R^(256×256)
b2, b3, b4 ∈ R^256

W5 ∈ R^(256×3) # maps to RGB
b5 ∈ R^3

Total params: 2*256 + 256 + 3*(256*256 + 256) + 256*3 + 3 = 198,915

So, do I understand correctly that your hypernetwork takes the input from the convnet, input, and does the following:

h1 = relu(U1*input + c1)
h2 = U2*h1 + c2

Where

U1 ∈ R^(|input|×256)
c1 ∈ R^256

U2 ∈ R^(256×198915)
c2 ∈ R^198915

This doesn't feel right to me.
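
As a quick sanity check of that parameter count (hedged: it just assumes the 5-layer, 256-wide SIREN with 2-D input and 3-D RGB output described above):

layers = [(2, 256), (256, 256), (256, 256), (256, 256), (256, 3)]
total = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)
print(total)  # 198915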

about representing shape with SDF (points and normals)

In Section 4 of the supplementary material, it is mentioned that points with their normals are collected. Would you please explain which norm is calculated (Euclidean, etc.)? Also, in which coordinate system (real-world or voxel coordinates) are the points collected?

readme misspelling

Lines 58, 68, and 77:
experiment_scipts -> experiment_scripts


model.cuda() and self.model.load_state_dict(torch.load(opt.checkpoint_path)) are very slow

When I try to train the SDF, I found that it takes a long time to call model.cuda() in train_sdf.py (just after the model is printed).
Also, when testing the SDF, self.model.load_state_dict(torch.load(opt.checkpoint_path)) needs a lot of time.

My PyTorch and cudatoolkit are installed according to environment.yml (pytorch=1.5.0=py3.6_cuda10.1.243_cudnn7.6.3_0).

How can I solve this issue? Is it normal to have such a problem?

A possible bug with sine_init

From modules.py at line 62, sine_init is

m.weight.uniform_(-np.sqrt(6 / num_input) / 30, np.sqrt(6 / num_input) / 30)

but I think it should be:

m.weight.uniform_(-np.sqrt(6 / num_input) * 30, np.sqrt(6 / num_input) * 30)

Below is my test code with 101 sin layers. The std of the outputs stays stable at around 0.7 when multiplying by 30,
and when I replace * factor with / factor, the std of the outputs drops to 0.

import torch
import numpy as np

dim = 1000
factor = 30
alpha = 1

# A column of standard-normal "inputs" and a dim x dim weight matrix to re-initialize below.
x = torch.Tensor(np.zeros(dim, dtype=np.float32)).reshape([dim, 1])
w = torch.Tensor(np.zeros([dim, dim], dtype=np.float32))

inputs = torch.nn.init.normal_(x, 0, 1)

# First layer: weights ~ U(-alpha/dim, alpha/dim), with the frequency factor applied inside the sine.
weights = torch.nn.init.uniform_(w, -alpha / dim, alpha / dim)
outputs = torch.sin(alpha * factor * weights @ inputs)
print(torch.mean(outputs), torch.std(outputs))

# 100 further sin layers: weights ~ U(-sqrt(6/dim) * factor / alpha, sqrt(6/dim) * factor / alpha), plain sin.
weights = torch.nn.init.uniform_(w, -np.sqrt(6 / dim) * factor / alpha, np.sqrt(6 / dim) * factor / alpha)

for _ in range(100):
    outputs = torch.sin(alpha * weights @ outputs)
    print(torch.mean(outputs), torch.std(outputs))

FC layers for VGG?

Reading vgg.py, I'm confused about the last layers of the network.

My understanding is that VGG normally ends with a few fully connected layers and a softmax, but here we end with:

layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
and then
self.classifier = nn.Linear(512, 10)

My questions are:

  1. What's the deal with AvgPool2d with kernel size 1? Seems like it should be a no-op?
  2. Why were the fully connected layers removed? Is this an adjustment for the easier task of CIFAR-10, as compared to ImageNet?

some about first_layer_sine_init

I am confused about first_layer_sine_init, where you set W ~ Uniform(-1/n, 1/n).

As we know, the input is X ~ Uniform(-1, 1), so Var[X] = 2^2/12 = 1/3. After the FC layer, is Var[sin(30·W·x + b)] = 30^2 · n · (1/3) · (c^2/3) = 1?
So how do you arrive at initializing the first-layer weights with Uniform(-1/n, 1/n)?

Misunderstanding usage of Implicit2DWrapper

I have an issue related to the meaning and usefulness of the Implicit2DWrapper class.

I understand that it is needed to retrieve a dictionary-like Python object containing the data, which also includes the first derivative (the gradient) and the second-order derivative (the Laplacian).

But when I fetch data via Implicit2DWrapper, feed it to a SIREN-based neural network, and finally measure PSNR and other metrics such as SSIM, someone pointed out to me that my recorded scores are too low.

So I think the issue is that the self.transform attribute inside the Implicit2DWrapper class contains a normalization operation that is not applied in explore_siren.ipynb, where ImageFitting is defined to train a SIREN-based model on the camera image.

Why did you decide to add this normalization for Implicit2DWrapper in dataio.py, but not perform the same transformation in the explore_siren.ipynb notebook when showcasing the SIREN architecture for image fitting?

Issues with training on audio (not a bug with this repo)

I reimplemented SIREN in TensorFlow 2.5. The network easily learns images, but I cannot reproduce the results with audio. On the sample file from the paper the loss gets stuck at a relatively high value (~0.0242), and the network's output turns very quiet (max(abs(x)) ~= 0.012). Just curious if anyone has faced the same issue when reimplementing SIREN on their own.

What I've tried so far:

  1. Double-checked omega: it is set to 3000.0 for the input layer and 30.0 for the inner layers
  2. Changing batch size to full length of the sample (I used to do randomized batches of 8*1024)
  3. Using float64 to avoid potential issues with numerical overflows/underflows
  4. Checked network weights: all are finite numbers
  5. Using SGD as a more stable optimizer
  6. Increasing network width/adding more layers

Essentially, all the above actions still led to the same result with loss ~0.0242

typo in proof of lemma 1.6

Thanks for the nice paper. In the last three equations in the proof of Lemma 1.6, shouldn't it be arcsin(y) rather than arcsin(x), and shouldn't the conclusion be Y ~ Arcsin(-1, 1) rather than X ~ Arcsin(-1, 1)?

Does SIREN only work well with over parameterised network?

Hello,

I spent a week testing SIREN and trying to introduce it into my system. To my knowledge, a ReLU network is at least able to produce a result with averaged or balanced patterns from the training data.

I did a small test: jointly train a neural feature image and a CNN auto-encoder with sin as the activation. With just 2 cat images as the reconstruction target, to my surprise, an over-parameterized network together with the 2 corresponding neural images very quickly (about 1000 iterations) produced results with beautiful high-frequency hair patterns. Then I began to reduce the size and dimension of the neural image. The loss value still decreased quickly at first, but at some iteration it suddenly jumped up and gave a very poor result. I attach the intermediate results here:
[image]

Thinking very naively: is it because the sin activation is very sensitive to the gradient step, so that any wrong step may lead the result into a bad local minimum? Have you tested the generalisation of SIREN? Or does SIREN only work well with over-parameterised networks?

How does SIREN work?

I've read your paper and compared it with your code. Your code link shows that the hypernetwork's parameters are given as params and used by the SIREN.

Is this the right way to use a "hypernetwork"? Your paper shows
[image]
and the $\psi$'s output is not the params ($\theta$) of the SIREN in your code.

Please explain why this is the same thing as the above.

Question about the lemma 1.6 and code implementation

Hi. In your paper, Lemma 1.6 discusses the distribution of X and Y = sin(pi/2 · X). We know that the output of a linear layer is normally distributed when using the specific initialization Uniform(-c, c). However, in your code the activation is just torch.sin(x), not torch.sin(pi/2 * x). Is there something I missed?

SIREN not converge

Hi, thanks for this great work. I recently adopted SIREN for my problem, but during training the loss diverges. I've set omega to 30 by default. Could you please give a suggestion about this? Thanks!

[image]

The insight of "out-of-range behavior"

What exactly is the insight of studying the "out-of-range behaviour" in the notebook?
The signal is not repeated over this range of (-50, 50), so...?
In my opinion, I would study whether the model can generalize to something larger than [-1, 1], for example whether it can do some reasonable "out-painting" beyond the image borders; however it doesn't, as shown in the image.
[image]

Do you think this is possible with some tweak to the network? Otherwise I don't really understand the point of showing this block.

Do we need to fit a network for every image?

I found that when fitting an image with SIREN, we only input a positional encoding (which doesn't contain the information of the image) and backprop by comparing the output with the ground-truth image. Does that mean we input the same tensors (a linear space from -1 to 1) when fitting different images? If so, the network parameters will be different for every image.

Question about first layer

Hi,
I have encountered a problem in my reading.
As you stated in Appendix 1, for input drawn uniformly at random in the interval [-1, 1], pushing this input through a sine nonlinearity yields an arcsine distribution as input to the dot products in later layers. So according to this, a correct SIREN should be composed as sin->linear->sin->linear->...->linear. But what you actually do in your code is linear->sin->linear->sin->...->linear, so my question is: why have you chosen this implementation, given that the first one follows the distribution assumption correctly and the second one does not?

Please tell me the right answer or point out my mistakes, thanks!

Best regards,
Xin Luo.

Reproducibility of SDF

Hey,

Thank you for your work.

[image]

According to the paper, the signed distance function constraint as well as the off-surface penalization should be multiplied by the same λ. However, the λ of 'inter_constraint(off-surface)' and 'sdf_constraint' is totally different in the code.

  1. Shouldn't 'inter' be multiplied by 3e3? Which one should I follow?
  2. Do you use a different hidden_features for thai and room?
  3. Do 50000 iterations mean 50000 steps or epochs?
  4. The loss value is very unstable sometimes. (see below)

I tried using the original setting of your code to train the network on room and thai with inter_loss * 1e2 / inter_loss * 3e3.
A. room1e2, B. room3e3, C. thai1e2, D. thai3e3.

A, room1e2:
[image]

B, room3e3:
[image]

C, thai1e2:
[image]

D, thai3e3:
[image]

As you can see, none of these converges perfectly. I use hidden_features=256 for all 4 experiments. So if the 'units' in your paper means 'hidden_features' in the code, then C (thai1e2) should be exactly the setting you described. But as you can see, the inter_loss did not converge at all.

Best,
Zhengdi

Helmholtz Model not converging

After running the train_helmholtz.py script, I got 'train_losses_final.txt', but the losses are still very large even after 50000 epochs.
[image]
Does this mean that the training has failed?
But I didn't change anything in the train_helmholtz.py script.

Choosing omega_0

Hi,

Is there any interpretation for the choice of \omega (i.e., does a large \omega lead to a better representation of high-frequency detail, etc.)? (Similar to the variance scale for random Fourier features.)

This is for the task of image fitting.

Computing gradient without autograd

First of all thanks for the amazing paper and sharing the code.

I have a question regarding computation of gradients (and higher order derivatives) w.r.t. the input. In the code you seem to be using autograd to compute them (diff_operators.py). However, in section 2 of the supplementary material of the paper you provide an explicit formula (another SIREN) that represents the gradient.

Did you consider implementing a general mechanism that inputs a SIREN network and outputs a new SIREN network representing its gradient while sharing the trainable parameters of the input network? One could basically obtain derivatives of any order this way and use them easily in training. Maybe I am missing something though.

Thanks!
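
Not the repo's mechanism, but a minimal sketch of the idea for a single layer (my own toy example): since d/dx sin(omega*(Wx + b)) = omega * cos(omega*(Wx + b)) * W and cos(z) = sin(z + pi/2), the gradient of a sine layer is again a phase-shifted sine layer that shares W and b.

import math
import torch

omega = 30.0
W = torch.randn(4, 2) / omega
b = torch.zeros(4)
x = torch.randn(1, 2)

pre = omega * (x @ W.t() + b)                     # pre-activation of one SIREN layer
y = torch.sin(pre)

# Analytic Jacobian dy/dx written with a phase-shifted sine, sharing W and b:
jac_analytic = torch.sin(pre + math.pi / 2).unsqueeze(-1) * (omega * W)

# Check against autograd:
jac_autograd = torch.autograd.functional.jacobian(
    lambda inp: torch.sin(omega * (inp @ W.t() + b)), x).squeeze(2)
print(torch.allclose(jac_analytic, jac_autograd, atol=1e-5))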

Runtime error in experiment "train_poisson_gradcomp_img.py"

Hi, I'm trying to run experiment "train_poisson_gradcomp_img.py"
I'm using two 256x256 B&W images, but when I run the script I keep getting:

Traceback (most recent call last):
File "/content/siren/experiment_scripts/train_poisson_gradcomp_img.py", line 44, in
sidelength=512)
File "/content/siren/dataio.py", line 854, in init
paddedImg[:, 512 - 340:512, :] = self.img2
RuntimeError: The expanded size of the tensor (84) must match the existing size (256) at non-singleton dimension 1. Target sizes: [1, 84, 256]. Tensor sizes: [256, 256]

Any clue on how to fix? Many thanks in advance

Is omega0 really needed ?

Is it not possible to scale the input linear space instead?

Also, not at all an issue, but I'd like to share the following Mathematica code attempting to replicate this, or at least the part for fitting an image:

image = ImageResize[ExampleData[{"TestImage", "Lena"}], 128]
{width, height } = ImageDimensions[image];
output = Flatten[ImageData[image] // #*2 - 1 &, 1];
linspace[n_] := Array[# &, n, {-#, #}] &[(n - 1)/2]
input = Pi/2*Tuples[{linspace[height], linspace[width]}] // N;
layer[n_, in_] := LinearLayer[
    n,
    "Weights" -> RandomReal[{-#, #}, {n, in}],
    "Biases" -> RandomReal[{-#, #}, {n}],
    "Input" -> in
] &[Sqrt[6/in]]
net = NetChain[
  {
   128, Sin,
   layer[128, 128], Sin,
   layer[128, 128], Sin,
   layer[128, 128], Sin,
   layer[3, 128], Sin
  },
  "Input" -> 2
]
net = NetTrain[net, (input -> output)]
Image[Partition[(# + 1)/2 &[net /@ input], width], ColorSpace -> "RGB"]
Table[NetExtract[net, {i, "Weights"}] // Flatten // Histogram, {i, {1, 3, 5, 7, 9}}]
Table[NetExtract[net, {i, "Biases"}] // Flatten // Histogram, {i, {1, 3, 5, 7, 9}}]

It works surprisingly well, but I haven't used omega_0; I scaled the input instead. Also, performance is better when the weights in the first layer are not initialized as advised in the paper. I'm not sure whether that's related to the input scaling or something else.
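
A small algebraic note on the omega_0 question above: scaling the input coordinates and scaling the first layer's weights are interchangeable, since sin(W(omega_0*x) + b) = sin((omega_0*W)x + b). So pre-multiplying the coordinates by Pi/2 (or any other factor) as in the snippet should be equivalent to folding that factor into the first layer's weight initialization.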

tensorflow implementation

Hello,

This is really interesting work! I was wondering if you have or are planning on building a tensorflow implementation of SIREN layers, so I can quickly integrate it with my projects.

Thanks!

fwi throws an out-of-memory error on a 32 GB GPU; the paper mentions a 24 GB GPU.

From the experiment_scripts/ folder, I try to run train_inverse_helmholtz.py experiment as follows.
python3 train_inverse_helmholtz.py --experiment_name fwi --batch_size 1

The supplementary section 5.3 of the paper states that a single 24GB GPU was used for running this experiment whereas I am using a 32GB V100 which should be sufficient. However, even with a batch size of 1 I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)

Here is the full trace:

Traceback (most recent call last):
  File "train_inverse_helmholtz.py", line 78, in <module>
    training.train(model=model, train_dataloader=dataloader, epochs=opt.num_epochs, lr=opt.lr,
  File "../siren/training.py", line 73, in train
    losses = loss_fn(model_output, gt)
  File "../siren/loss_functions.py", line 188, in helmholtz_pml
    b, _ = diff_operators.jacobian(modules.compl_mul(B, dudx2), x)
  File "../siren/diff_operators.py", line 53, in jacobian
    jac[:, :, i, :] = grad(y_flat, x, torch.ones_like(y_flat), create_graph=True)[0]
  File ".../anaconda3/envs/tf-gpu2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 202, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 31.75 GiB total capacity; 28.32 GiB already allocated; 11.75 MiB free; 30.49 GiB reserved in total by PyTorch)

Can you please help?

HyperNetwork (CNN+SIREN) on larger images?

Does it seem feasible to train the CNN+SIREN scheme on larger images? The given example (train_img_neural_process.py) uses 32x32 pixel images, and I haven't managed to generate visually pleasing results on a more useful size (eg 256x256px).

Either the image quality is very poor, or the memory blows up when I try increasing the number of parameters.

Another example (train_img.py) uses the same SIREN network to generate larger images (512x512) so the SIREN shouldn't be the problem, but its weight generation process is. It seems that the output of the CNN is reduced to only 256 features, first in ConvImgEncoder, which does something like

torch.nn.Linear(1024, 1)(torch.rand([batch_size, 256, 32, 32]).view(batch_size, 256, -1)).squeeze(-1).shape
torch.Size([batch_size, 256])

Then the HyperParameter network (FCBlock) does the same according to the hidden layer's (hyper) hidden features.

This seems to be very little when we are generating weights for a network which has 256*256 weights per layer. I guess it would work on 32x32px but it makes sense that this would not scale up.

Do you have any insight as how to generate the SIREN weights from the output of a CNN without blowing up memory?

SDF training time

Do the parameters in the SDF training script in the experiment_scripts folder reflect the parameters reported in the paper? I am experimenting with training the SDF from scratch using the Thai statue point cloud provided in the Google Drive link, on a V100 GPU, but the estimated training time so far is quite long. I previously tried to train it on an RTX 8000 and the result was the same. I think the paper mentions 5-6 hours of training for 50k epochs.

Screen Shot 2020-10-27 at 1 45 04 PM

Runtime error in experiment poisson_gradcomp

Hi, I'm trying to run experiment "train_poisson_gradcomp_img.py"
I'm using two 256x256 B&W images, but when I run the script I keep getting:

Traceback (most recent call last):
File "/content/siren/experiment_scripts/train_poisson_gradcomp_img.py", line 44, in
sidelength=512)
File "/content/siren/dataio.py", line 854, in init
paddedImg[:, 512 - 340:512, :] = self.img2
RuntimeError: The expanded size of the tensor (84) must match the existing size (256) at non-singleton dimension 1. Target sizes: [1, 84, 256]. Tensor sizes: [256, 256]

Any clue on how to fix? Many thanks in advance

[Inconsistency with paper] I reproduced the audio signal with ReLU + MLP.

To compare the SIREN layer with ReLU + MLP, we implemented two models:

  1. audio signal (B, T, 1) --> Linear(B, T, 128) + ReLU --> Linear(B, T, 128) + ReLU --> Linear(B, T, 128) + ReLU --> Linear(B, T, 1) --> reproduced signal (B, T, 1)

  2. audio signal (B, T, 1) --> SIREN layer --> SIREN layer --> SIREN layer --> Linear(B, T, 1) --> reproduced signal (B, T, 1)

In your paper, ReLU + MLP is not able to reproduce the audio signal. However, the first model can reproduce the audio signal even better than SIREN...

SIREN is also very unstable, so I used a lower learning rate, but the loss still fluctuated.

Could you explain why SIREN is better than the others in the audio reproduction domain?

Possible mismatch between supplementary section 1.5 and the implementation?

The initialization of the first layer as in here:

siren/modules.py

Lines 629 to 634 in ecd150f

def first_layer_sine_init(m):
    with torch.no_grad():
        if hasattr(m, 'weight'):
            num_input = m.weight.size(-1)
            # See paper sec. 3.2, final paragraph, and supplement Sec. 1.5 for discussion of factor 30
            m.weight.uniform_(-1 / num_input, 1 / num_input)

This does not apply a square root over the fan in of the layer. Am I missing something in the paper?

environment missing skvideo

I created an environment directly from the yml file.

When trying to run one of the experiment_scripts I get an error that skvideo is missing.
Just letting you know so you can add it there if you want (pip install sk-video).
