Comments (6)

jdppthk commented on July 23, 2024

Thanks @Melissa3248. I would recommend pre-populating the empty HDF5 files before running parallel_copy_small_set.py. I will push a script to do that. It should just do the following:

import h5py

time_steps = 52
with h5py.File('filename.h5', 'w') as f:
    # One empty float32 dataset: (time steps, channels, latitude, longitude)
    f.create_dataset('fields', shape=(time_steps, 20, 720, 1440), dtype='f')

See the h5py docs for more details: https://docs.h5py.org/en/stable/high/dataset.html
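
A minimal sketch of that pre-population step as a reusable helper, assuming that serial (non-MPI) h5py is enough for creating the empty file. The names `prepopulate` and `fields_shape` and their parameters are hypothetical; only the dataset name 'fields' and the default shape come from the snippet above.

```python
import h5py


def prepopulate(path, time_steps=52, channels=20, lat=720, lon=1440):
    """Create an HDF5 file holding one empty float32 'fields' dataset."""
    # Mode 'w' truncates any existing file, so run this once, before the copy.
    with h5py.File(path, "w") as f:
        f.create_dataset(
            "fields", shape=(time_steps, channels, lat, lon), dtype="f"
        )


def fields_shape(path):
    """Return the shape of the 'fields' dataset as a quick sanity check."""
    with h5py.File(path, "r") as f:
        return f["fields"].shape
```

Calling `prepopulate('filename.h5')` once per output file and then checking `fields_shape('filename.h5')` should confirm the layout before parallel_copy_small_set.py is run.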

from fourcastnet.

TeunZoer commented on July 23, 2024

Hi everyone, I ran into the exact same error with the example code but have no idea how to solve it. Does someone know what is causing the error?

Afshinshafei commented on July 23, 2024

I have been searching for more than a week for what could cause this error. The most common suggestion was data corruption, which I don't think applies in our case.
If someone knows how to solve it, it would save us a lot of time. Thank you very much.

Melissa3248 commented on July 23, 2024

I ran into the same problem running the parallel_copy_small_set.py file, and I was able to fix the issue by adding two lines in the writetofile() function:

def writetofile(src, dest, channel_idx, varslist, src_idx=0, frmt='nc'):
    if os.path.isfile(src):
        batch = 2**4
        rank = MPI.COMM_WORLD.rank
        Nproc = MPI.COMM_WORLD.size
        Nimgtot = 52#src_shape[0]

        Nimg = Nimgtot//Nproc
        base = rank*Nimg
        end = (rank+1)*Nimg if rank<Nproc - 1 else Nimgtot
        idx = base

        fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
        fdest['fields'] = np.empty((1,20,720,1440))

        for variable_name in varslist:

            if frmt == 'nc':
                fsrc = DS(src, 'r', format="NETCDF4").variables[variable_name]
            elif frmt == 'h5':
                fsrc = h5py.File(src, 'r')[varslist[0]]
            #print("fsrc shape", fsrc.shape)
            fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)

            start = time.time()
            while idx<end:
                if end - idx < batch:
                    if len(fsrc.shape) == 4:
                        ims = fsrc[idx:end,src_idx]
                    else:
                        ims = fsrc[idx:end]
                    print(ims.shape)
                    fdest['fields'][idx:end, channel_idx, :, :] = ims
                    break
                else:
                    if len(fsrc.shape) == 4:
                        ims = fsrc[idx:idx+batch,src_idx]
                    else:
                        ims = fsrc[idx:idx+batch]
                    #ims = fsrc[idx:idx+batch]
                    print("ims shape", ims.shape)
                    fdest['fields'][idx:idx+batch, channel_idx, :, :] = ims
                    idx+=batch
                    ttot = time.time() - start
                    eta = (end - base)/((idx - base)/ttot)
                    hrs = eta//3600
                    mins = (eta - 3600*hrs)//60
                    secs = (eta - 3600*hrs - 60*mins)

            ttot = time.time() - start
            hrs = ttot//3600
            mins = (ttot - 3600*hrs)//60
            secs = (ttot - 3600*hrs - 60*mins)
            channel_idx += 1

The two lines I added are:
fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
fdest['fields'] = np.empty((1,20,720,1440))

This initializes the 'fields' dataset in the destination file with dimensions (time points, number of features, latitude, longitude). The code then writes the data for each time point, so in the 13-day example you obtain a dataset of size (52,20,720,1440). Hope this fix also works for you!
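
To make the indexing in writetofile() easier to follow, here is a small sketch of how the time steps are divided among MPI ranks and then into batches. The helper names `rank_range` and `batches` are hypothetical and not part of the FourCastNet script; the arithmetic is meant to mirror the `base`/`end` computation and the `while idx<end` loop above, and it runs without MPI.

```python
def rank_range(rank, nproc, n_total):
    """Return the [base, end) range of time steps handled by one rank."""
    n_per_rank = n_total // nproc
    base = rank * n_per_rank
    # The last rank also takes the remainder when n_total is not divisible.
    end = (rank + 1) * n_per_rank if rank < nproc - 1 else n_total
    return base, end


def batches(base, end, batch=2**4):
    """Split [base, end) into consecutive slices of at most `batch` steps."""
    return [(i, min(i + batch, end)) for i in range(base, end, batch)]


if __name__ == "__main__":
    # 52 time steps (13 days at 4 steps per day) split over 3 ranks.
    for rank in range(3):
        base, end = rank_range(rank, 3, 52)
        print(rank, (base, end), batches(base, end))
```

With a single process, the whole range 0 to 52 is covered in three full batches of 16 followed by one partial batch of 4.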

TeunZoer commented on July 23, 2024

Thanks @Melissa3248 and @jdppthk for your solutions. Melissa's solution (with a small adjustment to the empty numpy array) only worked for the first variable. For the second variable ('v10') I get the following error:

  File "/home/teun/Documents/TUD/FourCastNet/FourCastNet-0.0.0/data_process/parallel_copy_small_set_Melissa.py", line 119, in <module>
    writetofile(src, dest, 1, ['v10'])
  File "/home/teun/Documents/TUD/FourCastNet/FourCastNet-0.0.0/data_process/parallel_copy_small_set_Melissa.py", line 71, in writetofile
    fdest['fields'] = np.empty((16,20,721,1440))
  File "/home/teun/miniconda3/lib/python3.9/site-packages/h5py/_hl/group.py", line 433, in __setitem__
    h5o.link(ds.id, self.id, name, lcpl=lcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

For me the pre-populating proposed by jdppthk worked!
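
The "name already exists" error is consistent with the assignment to fdest['fields'] running once per writetofile() call: assigning an array to a name in an h5py file creates a new dataset, which cannot succeed once 'fields' is already there. As a rough sketch of one way around it (the helper name `ensure_fields` is hypothetical, and this uses serial h5py rather than the 'mpio' driver), the dataset can be created only when it is missing:

```python
import h5py


def ensure_fields(f, shape, name="fields"):
    """Create the dataset on the first call and reuse it on later calls."""
    if name not in f:
        f.create_dataset(name, shape=shape, dtype="f")
    return f[name]
```

Pre-populating the file beforehand has the same effect, since the dataset then exists before any variable is copied and never needs to be created inside writetofile().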

jdppthk commented on July 23, 2024

Glad it works for you. Closing this issue.
