haozhongkai / gnot


gnot's Issues

How does GNOT work for time-dependent system?

Thanks a lot for your nice work!

I am wondering how this model is applied to time-dependent systems such as NS2D, since I don't see any details in either the GitHub repository or the paper. Do you use an autoregressive scheme, or do you directly change the input and output channels to cover the time dimension?

I would appreciate your response.
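
To make the two options named in this question concrete, here is a minimal pure-Python sketch. It does not describe how GNOT itself handles time; `autoregressive_rollout`, `direct_prediction`, and the toy decay dynamics are hypothetical names and assumptions chosen only for illustration.

```python
from typing import Callable, List

State = List[float]


def autoregressive_rollout(step: Callable[[State], State],
                           u0: State, n_steps: int) -> List[State]:
    """Option 1: apply a one-step model repeatedly, feeding each output back in."""
    states = []
    u = u0
    for _ in range(n_steps):
        u = step(u)
        states.append(u)
    return states


def direct_prediction(model: Callable[[State], List[State]], u0: State) -> List[State]:
    """Option 2: a single call returns every time frame (time as an output channel)."""
    return model(u0)


# Toy stand-ins for trained models (assumption: the field halves at every step).
def decay_step(u: State) -> State:
    return [0.5 * v for v in u]


def decay_all_frames(u: State, n_steps: int = 3) -> List[State]:
    return [[0.5 ** (t + 1) * v for v in u] for t in range(n_steps)]


u0 = [8.0, 4.0]
rolled = autoregressive_rollout(decay_step, u0, n_steps=3)
direct = direct_prediction(decay_all_frames, u0)
print(rolled)
print(direct)
```

In the autoregressive variant, errors can accumulate across steps but the horizon is flexible; in the direct variant, the number of output frames is fixed by the output channel count.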

Theta being concatenated with X?

In all your models, the model parameter vector u_p is concatenated with the grid x as the first operation in every forward pass (before being passed through an MLP and then the attention blocks). The model architecture diagram in the paper and README indicates that theta should be processed as keys and values, but wouldn't this concatenation make it part of the query instead?

x = torch.cat([x, u_p.unsqueeze(1).repeat([1, x.shape[1], 1])], dim=-1)
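
For readers unfamiliar with the quoted line, the following NumPy sketch mirrors its shape logic: the per-sample vector `u_p` is tiled across all grid points and appended to each point's coordinates. The sizes `B`, `N`, `d_x`, and `d_theta` are made-up values for illustration, and NumPy stands in for PyTorch here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: batch, points per sample, coordinate dim, parameter dim.
B, N, d_x, d_theta = 2, 5, 2, 3

x = rng.random((B, N, d_x))      # query point coordinates
u_p = rng.random((B, d_theta))   # one parameter vector (theta) per sample

# Equivalent of u_p.unsqueeze(1).repeat([1, x.shape[1], 1]) in the quoted line.
u_p_tiled = np.repeat(u_p[:, None, :], N, axis=1)   # (B, N, d_theta)

# Equivalent of torch.cat([...], dim=-1).
x_aug = np.concatenate([x, u_p_tiled], axis=-1)     # (B, N, d_x + d_theta)

print(x_aug.shape)
```

Every point in a sample therefore carries an identical copy of that sample's theta in its trailing channels, which is why the question asks whether theta ends up on the query side.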

Note: for the ns2d_1100_test.pkl file (rectangular grid with circular cavities), the theta array is all zeros as well. What information does it hold? No initial conditions or boundary conditions (inlet, outlet) are specified in the model or the data either. Were these inferred through training only?

Implementation with Geo-FNO NACA dataset

Hi dear author,

Thanks for your paper and implementation.
This is possibly a silly question, but could you please provide more specific details about running the model on the NACA dataset? For the NACA dataset used in FNO(-interp), directly using the mask and Mach number as input and output may not fit the model architecture.

About scaling experiments

Thank you for your excellent work! I would like to ask about the training settings of the scaling experiments in the paper. Should the same strategy be kept for different training data sizes? If you use a cyclic learning-rate schedule that is stepped every iteration, the learning rate at a given step will not be the same across experiments even when the schedule itself is identical, because the number of iterations depends on the data size.
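
The effect described here can be illustrated with a small sketch. The schedule below is a generic linear warm-up followed by cosine decay, not the scheduler used in the repository or the paper; `one_cycle_lr`, `steps_per_run`, and all numeric values are assumptions for illustration.

```python
import math


def one_cycle_lr(step: int, total_steps: int, max_lr: float = 1e-3,
                 pct_start: float = 0.2, min_lr: float = 1e-5) -> float:
    """Linear warm-up to max_lr, then cosine decay back to min_lr."""
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def steps_per_run(n_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps when the scheduler is stepped once per iteration."""
    return math.ceil(n_samples / batch_size) * epochs


small_total = steps_per_run(n_samples=1000, batch_size=4, epochs=100)
large_total = steps_per_run(n_samples=8000, batch_size=4, epochs=100)

# Same absolute step index: the two runs are at different points of the cycle.
lr_small_at_step = one_cycle_lr(5000, small_total)
lr_large_at_step = one_cycle_lr(5000, large_total)

# Same fraction of training: the two runs see the same learning rate.
lr_small_at_half = one_cycle_lr(small_total // 2, small_total)
lr_large_at_half = one_cycle_lr(large_total // 2, large_total)

print(small_total, large_total)
print(lr_small_at_step, lr_large_at_step)
print(lr_small_at_half, lr_large_at_half)
```

Under these assumptions the schedules agree as a function of training progress (epochs) but not as a function of the raw step count, which is the distinction the question raises.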

Running HeatSink 3D

Hi,

First of all, great work!

I would like to run this model on an example similar to yours, the 3D heat sink (heat3d). However, I have a few questions:

The data that you link to (heat3d) is just an array of shape (1000, 19517, 5). I assume you use 5 functions; would you be so kind as to elaborate on which functions/features are stored in this array?
The same link (heat3d) also contains a file "Heatsink_Output_XYZ.npy", but where can I find the ground truth?
How do you generate the 1000 samples? Are they just different simulation scenarios?

Looking forward to your answer!

Multi output functions

I really like the concept of GNOT, as it allows me to combine multiple input functions and an input vector of scalars. Is it also able to learn multiple output functions?

Does padding in GNOT contaminate the attention matrix?

In NLP, a masking mechanism is used to prevent this. In GNOT, however, the following code in https://github.com/HaoZhongkai/GNOT/blob/master/models/cgpt.py does not seem to apply any mask:

class LinearAttention(nn.Module):
    """
    A vanilla multi-head masked self-attention layer with a projection at the end.
    It is possible to use torch.nn.MultiheadAttention here but I am including an
    explicit implementation here to show that there is nothing too scary here.
    """

    def __init__(self, config):
        super(LinearAttention, self).__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads
        self.key = nn.Linear(config.n_embd, config.n_embd)
        self.query = nn.Linear(config.n_embd, config.n_embd)
        self.value = nn.Linear(config.n_embd, config.n_embd)
        # regularization
        self.attn_drop = nn.Dropout(config.attn_pdrop)
        # output projection
        self.proj = nn.Linear(config.n_embd, config.n_embd)

        self.n_head = config.n_head

        self.attn_type = 'l1'

    '''
        Linear Attention and Linear Cross Attention (if y is provided)
    '''
    def forward(self, x, y=None, layer_past=None):
        y = x if y is None else y
        B, T1, C = x.size()
        _, T2, _ = y.size()
        # calculate query, key, values for all heads in batch and move head forward to be the batch dim
        q = self.query(x).view(B, T1, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        k = self.key(y).view(B, T2, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        v = self.value(y).view(B, T2, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)


        if self.attn_type == 'l1':
            q = q.softmax(dim=-1)
            k = k.softmax(dim=-1)   #
            k_cumsum = k.sum(dim=-2, keepdim=True)
            D_inv = 1. / (q * k_cumsum).sum(dim=-1, keepdim=True)       # normalized
        elif self.attn_type == "galerkin":
            q = q.softmax(dim=-1)
            k = k.softmax(dim=-1)  #
            D_inv = 1. / T2                                           # galerkin
        elif self.attn_type == "l2":                                   # still use l1 normalization
            q = q / q.norm(dim=-1,keepdim=True, p=1)
            k = k / k.norm(dim=-1,keepdim=True, p=1)
            k_cumsum = k.sum(dim=-2, keepdim=True)
            D_inv = 1. / (q * k_cumsum).abs().sum(dim=-1, keepdim=True)  # normalized
        else:
            raise NotImplementedError

        context = k.transpose(-2, -1) @ v
        y = self.attn_drop((q @ context) * D_inv + q)

        # output projection
        y = rearrange(y, 'b h n d -> b n (h d)')
        y = self.proj(y)
        return y



class LinearCrossAttention(nn.Module):
    """
    A vanilla multi-head masked self-attention layer with a projection at the end.
    It is possible to use torch.nn.MultiheadAttention here but I am including an
    explicit implementation here to show that there is nothing too scary here.
    """

    def __init__(self, config):
        super(LinearCrossAttention, self).__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads
        self.query = nn.Linear(config.n_embd, config.n_embd)
        self.keys = nn.ModuleList([nn.Linear(config.n_embd, config.n_embd) for _ in range(config.n_inputs)])
        self.values = nn.ModuleList([nn.Linear(config.n_embd, config.n_embd) for _ in range(config.n_inputs)])
        # regularization
        self.attn_drop = nn.Dropout(config.attn_pdrop)
        # output projection
        self.proj = nn.Linear(config.n_embd, config.n_embd)

        self.n_head = config.n_head
        self.n_inputs = config.n_inputs

        self.attn_type = 'l1'

    '''
        Linear Attention and Linear Cross Attention (if y is provided)
    '''
    def forward(self, x, y=None, layer_past=None):
        y = x if y is None else y
        B, T1, C = x.size()
        # calculate query, key, values for all heads in batch and move head forward to be the batch dim
        q = self.query(x).view(B, T1, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        q = q.softmax(dim=-1)
        out = q
        for i in range(self.n_inputs):
            _, T2, _ = y[i].size()
            k = self.keys[i](y[i]).view(B, T2, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
            v = self.values[i](y[i]).view(B, T2, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
            k = k.softmax(dim=-1)  #
            k_cumsum = k.sum(dim=-2, keepdim=True)
            D_inv = 1. / (q * k_cumsum).sum(dim=-1, keepdim=True)  # normalized
            out = out +  1 * (q @ (k.transpose(-2, -1) @ v)) * D_inv


        # output projection
        out = rearrange(out, 'b h n d -> b n (h d)')
        out = self.proj(out)
        return out

So it seems that the entries of the attention matrix are contaminated by the padded part. Is that true? Thanks.
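
The concern can be checked with a small single-head NumPy sketch of the 'l1' branch quoted above. This is not the repository's code: `linear_attention`, the `key_mask` argument, and the toy shapes are assumptions introduced for illustration, and the inputs stand for the already projected q, k, and v. Because the softmax runs over the feature dimension, a padded key row of zeros becomes a uniform non-zero row, so it still enters the key sum used for normalization.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def linear_attention(q, k, v, key_mask=None):
    """Single-head 'l1' linear attention.

    q: (T1, d), k and v: (T2, d).
    key_mask: optional (T2,) array, 1 for real points and 0 for padding.
    """
    q = softmax(q, axis=-1)
    k = softmax(k, axis=-1)
    if key_mask is not None:
        k = k * key_mask[:, None]          # padded keys contribute nothing
    k_sum = k.sum(axis=0, keepdims=True)   # (1, d)
    d_inv = 1.0 / (q * k_sum).sum(axis=-1, keepdims=True)   # (T1, 1)
    context = k.T @ v                      # (d, d)
    return (q @ context) * d_inv + q


rng = np.random.default_rng(0)
q = rng.standard_normal((5, 4))
k = rng.standard_normal((3, 4))
v = rng.standard_normal((3, 4))

# Pad keys and values with two all-zero rows, as a batch collate step might.
k_pad = np.vstack([k, np.zeros((2, 4))])
v_pad = np.vstack([v, np.zeros((2, 4))])
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

ref = linear_attention(q, k, v)                   # no padding at all
masked = linear_attention(q, k_pad, v_pad, mask)  # padding, masked out
unmasked = linear_attention(q, k_pad, v_pad)      # padding, no mask

print(np.allclose(ref, masked))
print(np.allclose(ref, unmasked))
```

In this sketch the masked result matches the unpadded reference while the unmasked one does not, even though the padded values are zero, because the normalizer still counts the padded keys. Whether the repository handles padding elsewhere (for example in the data pipeline) is not something this sketch can answer.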

License for contributions and loading from checkpoints?

Hi Hao Zhongkai,

Cool library. What license is it under? I'd like to add the ability to load from checkpoints, and I can see that you have already made some attempt at this. Is there anything I should be aware of beforehand?

3D Heat Sink Case

Great piece of work here. I am particularly interested in the 3D heat sink case. Could you please supply the 3D heat sink training data and case-handling functions so that I can replicate your results?

Getting Error while creating Conda Environment with requirements.txt

PackagesNotFoundError: The following packages are not available from current channels:

einops==0.6.0
dgl==1.0.1+cu116
torch==1.10.0
scikit_learn==1.2.1

Missing data

Hi,

I noticed that some mph data files are missing from the shared Google Drive.

I'm interested in the 3D HeatSink data files or, if that is not possible, the mph files, so that I can understand the format of the data.

This example is complicated and we omit the technical details here and they could be found in the mph source files.

Thank you!

Multi-GPU training?

I have adapted this to use PyTorch's simple DataParallel, but the model sometimes seems to output NaNs. Have you been able to train this across multiple GPUs on a single node?
