nlpodyssey / spago

Self-contained Machine Learning and Natural Language Processing library in Go

License: BSD 2-Clause "Simplified" License

Go 83.06% Assembly 16.85% C 0.09%
deep-learning machine-learning natural-language-processing neural-network computation-graph automatic-differentiation artificial-intelligence deeplearning transformer-architecture recurrent-networks

spago's Introduction





If you like the project, please ★ star this repository to show your support! 🤩

15 Jan 2024 - As I reflect on the journey of Spago, I am filled with gratitude for the enriching experience it has provided me. Mastering Go and revisiting the fundamentals of Deep Learning through Spago has been immensely rewarding. The unique features of Spago, especially its asynchronous computation graph and its focus on clean coding, have made it an extraordinary project to work on. Our goal was to create a minimalist ML framework in Go, eliminating the dependency on Python in production by enabling the creation of standalone executables. This approach allowed Spago to successfully power several of my projects in challenging production environments.

However, the endeavor to elevate Spago to a level where it can compete effectively in the evolving 'AI space', which now extensively involves computation on GPUs, requires substantial commitment. At the same time, the vision that Spago aspired to achieve is now being impressively realized by the Candle project in Rust. With my limited capacity to dedicate the necessary attention to Spago, and in the absence of a supporting maintenance team, I have made the pragmatic decision to pause the project for now.

I am deeply grateful for the journey Spago has taken me on and for the community that has supported it. As we continue to explore the ever-evolving field of machine learning, I look forward to the exciting developments that lie ahead.

Warm regards,

Matteo Grella


Spago is a Machine Learning library written in pure Go designed to support relevant neural architectures in Natural Language Processing.

Spago is self-contained, in that it uses its own lightweight computational graph for both training and inference, making it easy to understand from start to finish.

It provides:

  • Automatic differentiation via dynamic define-by-run execution
  • Feed-forward layers (Linear, Highway, Convolution...)
  • Recurrent layers (LSTM, GRU, BiLSTM...)
  • Attention layers (Self-Attention, Multi-Head Attention...)
  • Gradient descent optimizers (Adam, RAdam, RMS-Prop, AdaGrad, SGD)
  • Gob compatible neural models for serialization
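
Since the models are Gob-compatible, the standard encoding/gob package is all that's needed to save and load them. A minimal sketch, using an illustrative parameter struct rather than a real spago model:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Params is an illustrative stand-in for a model's parameters.
type Params struct {
	W []float32
	B float32
}

func main() {
	var buf bytes.Buffer

	// serialize
	if err := gob.NewEncoder(&buf).Encode(Params{W: []float32{0.4}, B: -0.2}); err != nil {
		log.Fatal(err)
	}

	// deserialize
	var loaded Params
	if err := gob.NewDecoder(&buf).Decode(&loaded); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", loaded)
}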

If you're interested in NLP-related functionalities, be sure to explore the Cybertron package!

Usage

Requirements: a recent version of Go (see the repository's go.mod for the exact minimum).

Clone this repo or get the library:

go get -u github.com/nlpodyssey/spago

Getting Started

A good place to start is by looking at the implementation of built-in neural models, such as the LSTM.

Example 1

Here is an example of how to calculate the sum of two variables:

package main

import (
	"fmt"
	"log"

	"github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
)

func main() {
	// define the type of the elements in the tensors
	type T = float32

	// create a new node of type variable with a scalar
	a := mat.Scalar(T(2.0), mat.WithGrad(true))
	// create another node of type variable with a scalar
	b := mat.Scalar(T(5.0), mat.WithGrad(true))
	// create an addition operator (the calculation is actually performed here)
	c := ag.Add(a, b)

	// print the result
	fmt.Printf("c = %v (float%d)\n", c.Value(), c.Value().Item().BitSize())

	c.AccGrad(mat.Scalar(T(0.5)))

	if err := ag.Backward(c); err != nil {
		log.Fatalf("error during Backward(): %v", err)
	}

	fmt.Printf("ga = %v\n", a.Grad())
	fmt.Printf("gb = %v\n", b.Grad())
}

Output:

c = [7] (float32)
ga = [0.5]
gb = [0.5]
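
Both gradients are 0.5 because c = a + b gives ∂c/∂a = ∂c/∂b = 1, and the backward pass was seeded with an output gradient of 0.5 via AccGrad.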

Example 2

Here is a simple implementation of the perceptron formula:

package main

import (
	"fmt"
	
	. "github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
)

func main() {
	x := mat.Scalar(-0.8)
	w := mat.Scalar(0.4)
	b := mat.Scalar(-0.2)

	y := Sigmoid(Add(Mul(w, x), b))

	fmt.Printf("y = %0.3f\n", y.Value().Item())
}
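
Since y = sigmoid(0.4 × (−0.8) − 0.2) = sigmoid(−0.52) ≈ 0.373, the expected output is:

y = 0.373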

Contributing

If you think something is missing or could be improved, please open issues and pull requests.

To start contributing, check the Contributing Guidelines.

Contact

We highly encourage you to create an issue as it will contribute to the growth of the community. However, if you prefer to communicate with us privately, please feel free to email Matteo Grella with any questions or comments you may have.

spago's People

Contributors

dependabot[bot], evanmcclure, jcr179, jjviana, marco-nicola, matteo-grella, mattn, mbrukman, paralin, phanirithvij


spago's Issues

QA Chinese model result does not match python version

Using this Chinese model
The model runs correctly in Python locally, but the output from spaGO does not match.

Similar to #101, but I cannot find the bool parameter for QA.
How can I turn off the behavior where the output is forced to be a distribution (it must sum to 1), whereas with Python the output is unconstrained?

server := bert.NewServer(model)
answers := s.model.Answer(body.Question, body.Passage)

Translated QA:
Context: My name is Clara, I live in Berkeley
Q: what is my name?
A: Clara

Output is supposed to be
克拉拉
but got

{
    "answers": [
        {
            "text": "我叫克拉拉,我住在伯克利。",
            "start": 0,
            "end": 13,
            "confidence": 0.2547743
        },
        {
            "text": "住在伯克利。",
            "start": 7,
            "end": 13,
            "confidence": 0.22960596
        },
        {
            "text": "我叫克拉拉,我住",
            "start": 0,
            "end": 8,
            "confidence": 0.1548344
        }
    ],
    "took": 1075
}

./bert-server server --repo=~/.spago --model=luhua/chinese_pretrain_mrc_roberta_wwm_ext_large --tls-disable

PASSAGE="我叫克拉拉,我住在伯克利。"                                                                                                                                                 
QUESTION1="我的名字是什么?" 
curl -k -d '{"question": "'"$QUESTION1"'", "passage": "'"$PASSAGE"'"}' -H "Content-Type: application/json" "http://127.0.0.1:1987/answer?pretty"

Use static linking when building Go binaries for Docker image

This will create larger binaries. However, the Docker base image can be swapped out for a distroless image such as scratch, which will make the resulting Docker image smaller.

One aspect to consider is whether the CGO versions of dependencies are used, as this can affect how the statically linked binaries are built.
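
A sketch of how this might look as a two-stage build; the golang image tag and the cmd path are placeholders, not the repo's actual layout:

# build stage: disabling CGO yields a fully static binary
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /bert-server ./cmd/bert

# run stage: scratch works because the static binary needs no libc
FROM scratch
COPY --from=build /bert-server /bert-server
ENTRYPOINT ["/bert-server"]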

Is it possible to pre-load passages from csv?

Is it currently possible to preload, say, the Go FAQ and run semantic search on the passages by only providing a question to spago serving SQuAD2?
I would like to create behavior like that shown in this video: semantic-search

If not yet, could you give me some pointers to dig into it?

Thank you
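
As a sketch of the loading half, the standard encoding/csv package can read the passages; the file name and column layout below are made up:

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	f, err := os.Open("faq.csv") // hypothetical file: one passage per row, first column
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	passages := make([]string, 0, len(rows))
	for _, row := range rows {
		passages = append(passages, row[0])
	}
	fmt.Printf("loaded %d passages\n", len(passages))

	// Each passage could then be sent to the /answer endpoint together with
	// the user's question, keeping the highest-confidence answers overall.
}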

Question server example should use http not https

The documentation for the question server indicates:

curl -k -d '{"question": "'"$QUESTION1"'", "passage": "'"$PASSAGE"'"}' -H "Content-Type: application/json" "https://127.0.0.1:1987/answer?pretty"

However, the URL should be http or you will receive:

curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong version number

Make gRPC client print YAML by default

Make the gRPC client output pretty and easier to read, by printing the API responses in YAML format. Add a flag to optionally print the API responses in JSON format.
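
A sketch of the conversion step, decoding the JSON API response into a generic map and re-encoding it with gopkg.in/yaml.v3 (the sample payload is invented):

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

func main() {
	raw := []byte(`{"answers":[{"text":"Clara","confidence":0.97}]}`)

	// decode the JSON API response into a generic map
	var resp map[string]any
	if err := json.Unmarshal(raw, &resp); err != nil {
		log.Fatal(err)
	}

	// re-encode it as YAML for readability
	out, err := yaml.Marshal(resp)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}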

Cannot import project as library

Hello,

I am trying to import this project as a library by running the command
go get -u github.com/nlpodyssey/spago
(as instructed in the readme)

I am getting the following error:
cannot find package "github.com/dgraph-io/badger/v2"

Anyone else got this error?

Loading / converting PyTorch Lightning models

I could not work this out directly from the code (I have only just started working with this), and I realize that the documentation and examples are not exhaustive. If this project turns out to be good for what I wish to do (train in Python, run via Go), then I am willing to contribute. Your README encourages creating issues, so I hope this is the correct place to ask a question.

I am training a model based on a standard Huggingface BERT model. I have trained it and have PyTorch Lightning checkpoints, and of course I can save the model as a pickle via torch.save(model.state_dict(), "file.pt").

Now, how can I convert that into the .bin file that this code wants to use? Sorry if this seems obvious, but the converter code I traced seems to want more files than just a state_dict.

Any pointers, or example code that I missed?

Gorgonia tensors

Hi Matteo,
in your GopherCon deck you mentioned having GPU-friendly Gorgonia tensors on spago's roadmap.
I am curious about how this might work. Could you give any pointers?
I suppose it is feasible because the tensors are more or less just a slice of floats?
I read on their Git that, with regard to CUDA, the API is expected to change quite a bit before hitting v1.0.
They are currently on 0.9.17, so I guess not before Gorgonia's CUDA interface reaches a stable first release?

Other integration / Telegram Bot

Hi guys,

Hope you are all well !

I was looking for an implementation of an AI chatbot in Go and found your project spago.

And, I am developing a multibot for telegram based on go plugins: https://github.com/paper2code/telegram-multibot

So I was wondering how I can integrate spago as a QA bot plugin.

Do you have any insights or advice for such an integration?

Thanks in advance.

Cheers,
X

Support HTTPS by default

Support HTTPS by default, to maintain best practices in delivering cloud-native applications.
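
In Go this is a one-line change at serve time. A minimal sketch with net/http; the certificate files are assumed to exist (e.g. generated with openssl for development):

package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/answer", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"answers":[]}`))
	})

	// ListenAndServeTLS serves HTTPS using the given certificate and key.
	log.Fatal(http.ListenAndServeTLS(":1987", "cert.pem", "key.pem", mux))
}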

Cannot convert model to use it with spago

Hello, I'm trying to convert a model as described in the README.md, in the Import pretrained model section.

Step 1 (building):

Just build binary as usual: go build -o hugging_face_importer ./cmd/huggingfaceimporter/main.go

Step 2 (converting models):

Yes, directory ~/.spago is already created.

I found a model at Hugging Face - DeepPavlov/rubert-base-cased-conversational - and then I tried to run the huggingface importer like this: ./hugging_face_importer --model=DeepPavlov/rubert-base-cased-conversational --repo=~/.spago

After many lines of logging that everything is OK, I got a crash:

Reading encoder.layer.10.attention.output.LayerNorm.bias.... ok
2020/06/14 14:51:06 Convert word/positional/type embeddings...
panic: runtime error: slice bounds out of range [:768] with capacity 0

goroutine 1 [running]:
github.com/nlpodyssey/spago/pkg/nlp/transformers/bert.assignToParamsList(0x0, 0x0, 0x0, 0xc000172000, 0x200, 0x200, 0x200, 0x300)
        /Users/alex/Downloads/spago-master/pkg/nlp/transformers/bert/converter.go:225 +0x112
github.com/nlpodyssey/spago/pkg/nlp/transformers/bert.(*huggingFacePreTrainedConverter).convertEmbeddings(0xc080285dd8, 0xc000114600)
        /Users/alex/Downloads/spago-master/pkg/nlp/transformers/bert/converter.go:203 +0xaf
github.com/nlpodyssey/spago/pkg/nlp/transformers/bert.(*huggingFacePreTrainedConverter).convert(0xc080285dd8, 0x2, 0x2)
        /Users/alex/Downloads/spago-master/pkg/nlp/transformers/bert/converter.go:101 +0x17e
github.com/nlpodyssey/spago/pkg/nlp/transformers/bert.ConvertHuggingFacePreTrained(0xc00002e580, 0x3e, 0x2, 0xc00002e580)
        /Users/alex/Downloads/spago-master/pkg/nlp/transformers/bert/converter.go:65 +0x5b9
main.main()
        /Users/alex/Downloads/spago-master/cmd/huggingfaceimporter/main.go:48 +0x385

Can I help with anything else?

Better BERT Server Capabilities

(apologies if the many issues are annoying, really liking the library so far)

Overview

Right now, when using the BERT server, a set of defaults is used without any control from the user:

  • HTTP server lacks customizability
  • HTTP server listens on 0.0.0.0
  • No ability to enable TLS

Additionally, it's not possible to build "external servers", as the functions used by the BERT HTTP router (discriminateHandler, predictionHandler, qaHandler) are private. If this were changed to instead export the handler functions (DiscriminateHandler, PredictionHandler, QaHandler), it would allow people to have more control over the BERT server: better middleware capabilities, TLS, etc.

By using public handler functions, users would be able to define their own routers, say using chi, and overall have more control of the BERT server.

I'd be more than happy to open a PR that implements these suggestions.
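
To make the proposal concrete, here is a sketch of what a custom chi router could look like if the handlers were exported. DiscriminateHandler and friends are the names proposed in this issue, not functions that exist in spago today:

package main

import (
	"log"
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
)

// Stand-ins for the proposed exported handlers.
func DiscriminateHandler(w http.ResponseWriter, r *http.Request) { /* ... */ }
func PredictionHandler(w http.ResponseWriter, r *http.Request)   { /* ... */ }
func QaHandler(w http.ResponseWriter, r *http.Request)           { /* ... */ }

func main() {
	r := chi.NewRouter()
	r.Use(middleware.Logger) // custom middleware becomes possible

	r.Post("/discriminate", DiscriminateHandler)
	r.Post("/predict", PredictionHandler)
	r.Post("/answer", QaHandler)

	// the bind address is now under the user's control too
	log.Fatal(http.ListenAndServe("127.0.0.1:1987", r))
}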

Multi-label BERT classifier from PyTorch

So I can convert and then load my BERT model, but I am having trouble working out how to operate it from Spago.

It is a multi-label model and to use it in Python I do this:

    text_enc = bert_tokenizer.encode_plus(
            texttoclassify,
            None,
            add_special_tokens=True,
            max_length=MAX_LEN,
            padding='max_length',
            return_token_type_ids=False,
            return_attention_mask=True,
            truncation=True,
            return_tensors='pt'
    )

    # mymodel implements pl.LightningModule
    #
    outputs = mymodel(text_enc['input_ids'], text_enc['attention_mask'])
    pred_out = outputs[0].detach().numpy()

And then process the pred_out array. This model has 5 outputs and all works as you expect in Python.

So, how do I perform the equivalent in Spago? Borrowing code from the classifier server, I have gotten this far, but it just isn't obvious what I need to modify to cater for a 5-label output layer.

func getTokenized(vocab *vocabulary.Vocabulary, text string) []string {
	cls := wordpiecetokenizer.DefaultClassToken
	sep := wordpiecetokenizer.DefaultSequenceSeparator
	tokenizer := wordpiecetokenizer.New(vocab)
	tokenized := append([]string{cls}, append(tokenizers.GetStrings(tokenizer.Tokenize(text)), sep)...)
	return tokenized
}

// ....
	model, err := bert.LoadModel(dir)
	if err != nil {
		log.Fatalf("error during model loading (%v)\n", err)
	}
	defer model.Close()
	
	// We need a classifier that matches the output layer of our model.
	//
	var bc = bert.ClassifierConfig{
		InputSize: 768,
		Labels:    []string{"A", "B", "C", "D", "E"},
	}
	model.Classifier = bert.NewTokenClassifier(bc)

	tokenized := getTokenized(model.Vocabulary, s)

	g := ag.NewGraph(ag.ConcurrentComputations(runtime.NumCPU()))
	proc := nn.ReifyForInference(model, g).(*bert.Model)
	encoded := proc.Encode(tokenized)

	logits := proc.SequenceClassification(encoded)
	probs := floatutils.SoftMax(logits.Value().Data())

However, this just gives me 0.2 for each, so I seem to be miles off. Is there an example, or can a short code sequence be provided? Is the wordpiecetokenizer even the correct thing to use?

bert train

I want to generate my own BERT model, but I'm not sure how to train it on my corpus. Could you put together a tutorial? I am a rookie in Go :(

Verify integrity of models fetched from the Interwebs

Currently, models that are fetched from remote sources are uncompressed without having their integrity verified. Models fetched from remote sources should be verified first, using checksums that are published with the models.
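
A sketch of the verification step using the standard library, assuming a SHA-256 hex digest is published alongside each model:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
)

// verifyChecksum compares a file's SHA-256 digest with the published one.
func verifyChecksum(path, wantHex string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// the digest here is a placeholder, not a real published value
	if err := verifyChecksum("pytorch_model.bin", "0123...abcd"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("ok")
}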

BERT Server Prediction Doesn't Seem To Work

I'm attempting to use the BERT server and have successfully gotten the /answer API call to work. I can't seem to find much information on BERT prediction in general, but I'm guessing it's used to predict what the next sentence will be?

Based on this I tried sending a JSON request to the /predict route like so, but it gives an empty response:

$> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/predict
{"tokens":[]}

Contrast this with discriminate, which works:

$> curl -d '{"text": "BERT is a technique for NLP developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google."}' -H Content-Type=application/json http://localhost:1987/discriminate
{"tokens":[{"text":"BERT","start":0,"end":4,"label":"FAKE"},{"text":"is","start":5,"end":7,"label":"FAKE"},{"text":"a","start":8,"end":9,"label":"FAKE"},{"text":"technique","start":10,"end":19,"label":"FAKE"},{"text":"for","start":20,"end":23,"label":"FAKE"},{"text":"NLP","start":24,"end":27,"label":"FAKE"},{"text":"developed","start":28,"end":37,"label":"FAKE"},{"text":"by","start":38,"end":40,"label":"FAKE"},{"text":"Google","start":41,"end":47,"label":"FAKE"},{"text":".","start":47,"end":48,"label":"FAKE"},{"text":"BERT","start":49,"end":53,"label":"FAKE"},{"text":"was","start":54,"end":57,"label":"FAKE"},{"text":"created","start":58,"end":65,"label":"FAKE"},{"text":"and","start":66,"end":69,"label":"FAKE"},{"text":"published","start":70,"end":79,"label":"FAKE"},{"text":"in","start":80,"end":82,"label":"FAKE"},{"text":"2018","start":83,"end":87,"label":"FAKE"},{"text":"by","start":88,"end":90,"label":"FAKE"},{"text":"Jacob","start":91,"end":96,"label":"FAKE"},{"text":"Devlin","start":97,"end":103,"label":"FAKE"},{"text":"and","start":104,"end":107,"label":"FAKE"},{"text":"his","start":108,"end":111,"label":"FAKE"},{"text":"colleagues","start":112,"end":122,"label":"FAKE"},{"text":"from","start":123,"end":127,"label":"FAKE"},{"text":"Google","start":128,"end":134,"label":"FAKE"},{"text":".","start":134,"end":135,"label":"FAKE"}]}

Also, what does discriminate do exactly? I found an article explaining all the details of BERT, but I can't seem to find out what discriminate does.

Combine Multiple Models With BERT Server?

I'm experimenting with spago, and while attempting to start a BERT server, I noticed that it was impossible to load multiple models and use them with a single instance of a BERT server.

Is this not possible because the code to do so hasn't been written, or is it not possible due to the way the BERT server works?

Differences in the output of zero shot classification between python & spago for the same model

I appreciate everyone involved with the spago project for developing a proper Machine Learning framework for Go.

I'm in the process of exploring spago and found that the output for valhalla/distilbart-mnli-12-3 differs for zero-shot classification when using Python vs spago.

// main.go

	model, err := zsc.LoadModel("spago/valhalla/distilbart-mnli-12-3")
	if err != nil {
		log.Fatal(err)
	}
	defer model.Close()

	//Sequence
	sequence := "PalmOS on Raspberry Pi"

	// arbitrary list of topics
	candidateLabels := []string{"startup", "business", "legal", "tech"}

	result, err := model.Classify(sequence, "", candidateLabels, true)

	if err != nil {
		log.Fatal(err)
	}
	for i, item := range result.Distribution {
		fmt.Printf("%d. %s [%.2f]\n", i, item.Class, item.Confidence)
	}

0. tech [0.89]
1. startup [0.02]
2. legal [0.01]
3. business [0.00]

# main.py
    classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")

    sequence = "PalmOS on Raspberry Pi"
    candidate_labels = ["startup", "business", "legal", "tech"]

    res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)

    for i, label in enumerate(candidate_labels):
        print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))
0. tech [0.99]
1. legal [0.77]
2. startup [0.05]
3. business [0.00]

Is this expected behavior? If so, why?

Nearly 4 times the memory usage when compared to python for the same model

I ran memory profiling for the code in #103: the spago version uses 3.9 GB, compared to 1.2 GB for Python. The model sizes are similar (valhalla/distilbart-mnli-12-3): it is 2.5 GB after transforming (hf-importer) to spago, whereas the upstream Python version is 2.1 GB.

Memory profiling in spago

(memory profile screenshot)

Memory profiling in Python

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     7    217.3 MiB    217.3 MiB           1   @profile
     8                                         def classify():
     9   1227.3 MiB   1010.0 MiB           1       classifier = pipeline("zero-shot-classification", model="models/distilbart-mnli-12-3")
    10                                         
    11   1227.3 MiB      0.0 MiB           1       sequence = "PalmOS on Raspberry Pi"
    12   1227.3 MiB      0.0 MiB           1       candidate_labels = ["startup", "business", "legal", "tech"]
    13                                         
    14                                         
    15   1235.1 MiB      7.8 MiB           1       res = classifier(sequence, candidate_labels, multi_label=True, truncation=False)
    16                                         
    17   1235.1 MiB      0.0 MiB           5       for i, label in enumerate(candidate_labels):
    18   1235.1 MiB      0.0 MiB           4           print("%d. %s [%.2f]\n" % (i, res['labels'][i], res['scores'][i]))

Is this expected?
Spago can be very useful in low-memory environments, such as ARM SBCs conducting CPU-bound inference, but the memory usage needs to be optimized.

The Python version also seems to be faster in overall operation timing, because loading the configuration and model weights takes variable time in spago.

Chaining convolution+maxpooling layers

From what I found, one can create a single-convolution CNN with cnn.NewModel, which will look like this:

  • convolution
  • max pooling
  • linear layer

There is also the convolution.New function, which creates a new convolution layer. But how should I proceed if I want to create a bigger model by chaining multiple convolution and max-pooling layers before adding a linear layer? E.g.:

  • conv1
  • max pooling
  • conv2
  • maxpooling
  • linear

I was expecting something like:

model := stack.New(
    convolution.New(convConfig),
    pooling.NewMax(2, 2),
    convolution.New(convConfig),
    pooling.NewMax(2, 2),
    linear.New(in, out),
    activation.New(outputAct),
)

Benchmarks vs numpy, scipy, sklearn, Pytorch

Did you guys compare this library against equivalent implementations in numpy, scipy, sklearn, PyTorch, or TensorFlow?

If GPU is not supported, you can try reporting CPU versions. AFAIK both PyTorch and TensorFlow have CPU modes for their tensor operations.

This is the best thing since sliced bread

Hello, I do not have any issue, so feel free to close this, but I just wanted to say that spago is fantastic.
Love what has been done here in native Go!
Thank you, everybody involved.
FORZA ITALIA

Accelerators

There is a ton of work being done with RISC-V and machine learning accelerators.

TinyGo makes it possible to leverage spago on these accelerators, I feel.

Just wanted to point this out, as I saw in your Q&A that you felt it was not possible to accelerate spago.

bart-large-mnli multi_class does not agree with Python version

If you convert facebook/bart-large-mnli and use it to evaluate the demo text at huggingface and compare against a local Python setup for verification, we find that:

  • the online demo card and the local Python agree on the label score
  • the label probabilities given back are vastly different
  • the Python version takes roughly 16 seconds on my local machine, but the Spago version takes 37 seconds - this is a Mac and there is no GPU available

Python code is

    text = "A new model offers an explanation for how the Galilean satellites formed around the solar system’s " \
           "largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling " \
           "mysteries when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, " \
           "best known for his contributions to the search for the solar system’s missing “Planet Nine,” spotted a " \
           "beer bottle. At a steep, 20 degree grade, he wondered why it wasn’t rolling down the hill. He realized " \
           "there was a breeze at his back holding the bottle in place. Then he had a thought that would only pop " \
           "into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.” Europa is one of " \
           "Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal, " \
           "Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory " \
           "in France, present a theory explaining how some moons form around gas giants like Jupiter and Saturn, " \
           "suggesting that millimeter-sized grains of hail produced during the solar system’s formation became " \
           "trapped around these massive worlds, taking shape one at a time into the potentially habitable moons we " \
           "know today. "
    cc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    labels = ['space & cosmos', 'scientific discovery', 'microbiology', 'robots', 'archeology']
    r = cc(text, labels, multi_class=True)

Go code, with the same text and classes, is:

bartModel, err = zsc.LoadModel(bartDir)
// ... check err
result, err := bartModel.Classify(c.InputText, "", classes, true)

Similarly, using the model valhalla/distilbart-mnli-12-3 also gives wildly different results from the online huggingface demo, using the same text and label set as above.

So, is there something else I need to do, or is the zsc code not working? My Go code is essentially just like the zsc demo code.

how to convert hf models to spago format?

Apologies if this is a silly question, but it seems there used to be an importer for huggingface models, and it disappeared from the main branch / readme.
There are binaries available elsewhere, but I just want to make sure I am using the official and latest approach.

Demo endpoint /answer responds with HTTP code 404.

The demo endpoint /answer responds with the HTTP code 404.

$ curl -d '{"question": "'"$QUESTION1"'", "passage": "'"$PASSAGE"'"}' -H "Content-Type: application/json" "http://127.0.0.1:1987/answer?pretty"
404 page not found

Not very robust update of Mean and StdDev in BatchNorm

A second thought about the recent fix of the BatchNorm (#48) raises an interesting question.

It is the first time it has happened that, during a forward pass, the parameters are directly altered (in this case Mean and StdDev):

p.updateBatchNormParameters(meanVector.Value(), devVector.Value())

In general, I believe that the Optimizer should be solely responsible for updating the parameters. However, here we are not talking about gradients, so the situation is slightly different.

The first thing that comes to mind is to update these values (Mean and StdDev) by saving them inside the Processor. A special method is then added to it which has the effect of updating the model parameters. It remains an ad hoc act, but it seems more robust and consistent with the rest.

@jjviana What do you think? Let's discuss here :)

Publish godocs to https://pkg.go.dev

Some of the content in the README can be republished as godocs on https://pkg.go.dev (the original site was https://godoc.org). A badge should be placed in the README that points to the godocs.

Adding a package should be as simple as searching for the docs, as described below (https://go.dev/about).

Adding a package

Data for the site is downloaded from proxy.golang.org. We monitor the Go Module Index regularly for new packages to add to pkg.go.dev. If you don’t see a package on pkg.go.dev, you can add it by doing one of the following:

Making a request to proxy.golang.org for the module version, to any endpoint specified by the Module proxy protocol. For example: 
https://proxy.golang.org/example.com/my/module/@v/v1.0.0.info

Downloading the package via the go command. For example: 
GOPROXY=https://proxy.golang.org GO111MODULE=on go get example.com/my/module@v1.0.0

v1.0.0-alpha.0: operator gradients are sometimes null in non-obvious ways

  1. Download the imdb sentiments dataset from here: https://drive.google.com/drive/folders/1PZUyks3g1rvoSR9ZSaWr7hy61NQifcoq?usp=sharing
  2. Try to train a model with the following parameters:
    ./perceiver train -i train.shuf.csv --test-file validation.csv -j 32 -s 21 -k 2 -g 1 -e 1 -o imdb-minimal-2l.model -n 1

At some random point in training the following error will occur:
goroutine 4813637 [running]:
github.com/nlpodyssey/spago/mat.(*Dense[...]).Rows()
<autogenerated>:1 +0x9
github.com/nlpodyssey/spago/mat.SameDims[...]({0x1584880?, 0x0}, {0x1584880, 0xc00b16d950})
/Users/julianoviana/Development/spago/mat/matrix.go:231 +0x38
github.com/nlpodyssey/spago/mat.(*Dense[...]).AddInPlace(0x0, {0x1584880, 0xc00b16d950})
/Users/julianoviana/Development/spago/mat/dense.go:514 +0x5f
github.com/nlpodyssey/spago/ag.(*Operator[...]).AccGrad(0xc0588a3b00, {0x1584880, 0xc00b16d950})
/Users/julianoviana/Development/spago/ag/operator.go:165 +0x122
github.com/nlpodyssey/spago/ag/fn.(*Mul[...]).Backward.func2()
/Users/julianoviana/Development/spago/ag/fn/mul.go:64 +0x18d
created by github.com/nlpodyssey/spago/ag/fn.(*Mul[...]).Backward
/Users/julianoviana/Development/spago/ag/fn/mul.go:58 +0x2a5

Debugging the problem, I found that in operator.go it is possible to observe o.grad != nil and, at the same time, reflect.ValueOf(o.grad).IsNil() == true. That means somewhere a nil pointer is being cast to mat.Matrix[T] and stored in o.grad. Since mat.Matrix is an interface, the == nil test will return false, but any method call will panic, as mat.Dense does not consider the possibility of a nil pointer value.
The following seems to fix it, albeit with the use of reflection:

diff --git a/ag/operator.go b/ag/operator.go
index d0167911..47c39738 100644
--- a/ag/operator.go
+++ b/ag/operator.go
@@ -157,11 +157,12 @@ func (o *Operator[T]) AccGrad(grad mat.Matrix[T]) {
        o.cond.L.Lock()
        defer o.cond.L.Unlock()
 
-       if o.grad == nil {
+       if o.grad == nil || reflect.ValueOf(o.grad).IsNil() {
                o.cond.L.Unlock()
                o.grad = o.Value().ZerosLike()
                o.cond.L.Lock()
        }
+
        o.grad.AddInPlace(grad)
 
        if o.inBackward && atomic.AddInt64(&o.pendingGrads, -1) == 0 {

Not sure this is an acceptable fix, but I couldn't find any place where this weird grad is being set either...
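
The underlying Go pitfall is easy to reproduce outside spago: an interface holding a typed nil pointer is not == nil. A self-contained illustration (simplified types, not spago's actual ones):

package main

import "fmt"

type Matrix interface{ Rows() int }

type Dense struct{ rows int }

func (d *Dense) Rows() int { return d.rows } // panics if d is nil

func main() {
	var d *Dense          // typed nil pointer
	var m Matrix = d      // interface now holds (type=*Dense, value=nil)
	fmt.Println(m == nil) // false: the interface itself is non-nil
	// m.Rows() would dereference the nil *Dense and panic,
	// which is exactly the failure mode seen in AccGrad above.
}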

Help with German zero shot

I would like to run Sahajtomar/German_Zeroshot (https://huggingface.co/Sahajtomar/German_Zeroshot) model in spago.

The import was successful:
./huggingface-importer --model=Sahajtomar/German_Zeroshot --repo=./models
-> BERT has been converted successfully!

Can I now run the model with the BART server (as I believe that is what supports zero-shot, not the BERT server)?

I receive:

bassea@AP15557 spago % ./bart-server server --repo=./models --model=Sahajtomar/German_Zeroshot --tls-disable
Start loading pre-trained model from "models/Sahajtomar/German_Zeroshot"
[1/2] Loading configuration... ok
panic: bart: unsupported architecture BertForSequenceClassification

goroutine 1 [running]:
github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/loader.Load(0xc000038660, 0x21, 0x2, 0xc000038660, 0x21, 0x492f960)
/Users/bassea/go/src/spago/pkg/nlp/transformers/bart/loader/loader.go:43 +0x819
github.com/nlpodyssey/spago/cmd/bart/app.newServerCommandActionFor.func1(0xc00022b740, 0x0, 0x0)
/Users/bassea/go/src/spago/cmd/bart/app/server.go:106 +0x105
github.com/urfave/cli/v2.(*Command).Run(0xc000222ea0, 0xc00022b440, 0x0, 0x0)
/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:163 +0x4e0
github.com/urfave/cli/v2.(*App).RunContext(0xc0000351e0, 0x4ae0aa0, 0xc000036068, 0xc0000320a0, 0x5, 0x5, 0x0, 0x0)
/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:313 +0x814
github.com/urfave/cli/v2.(*App).Run(...)
/Users/bassea/go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main()
/Users/bassea/go/src/spago/cmd/bart/main.go:15 +0x72

Go.mod inside cmd sub-packages?

Pull request #22 introduced something I wasn't aware of. What are the pros and cons of having a dedicated go.mod inside a cmd sub-package, like here?

In my understanding, this is a good way to avoid adding global dependencies that are actually only required in cmd. Cons?

@paralin

Error handling

Hey guys,
I saw that there are 227 uses of panic in 87 files, and I do understand your motivation behind it.
Go's explicit error returns stand in the way of the function chaining you'd like to achieve,
e.g. gx := r.x1.Value().(*mat.Dense).MulT(gy)
Let's assume MulT can panic inside the backward scope, but that panic should be recovered in, e.g., the graph wrapper scope (see the sketch below).
This should explain a bit better what I mean.
What are your thoughts on this?
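
As a sketch of the pattern being suggested, assuming nothing about spago's internals: a wrapper at the graph boundary can convert panics from chained matrix operations into an ordinary error:

package main

import "fmt"

// safeRun converts a panic raised anywhere inside fn into an error,
// so chained calls inside fn can stay panic-based.
func safeRun(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safeRun(func() {
		panic("MulT: matrix dimension mismatch") // stand-in for a failing op
	})
	fmt.Println(err) // recovered: MulT: matrix dimension mismatch
}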

Suitable for categorising intents and slots?

Firstly, this looks awesome! Thanks for your hard work on it.

We currently have a very simple bot written in Go that our team can send messages to, which we are looking to upgrade. It currently works out the "intent" of the user based on some regex and bag-of-words, but it's not very good!

We were looking at Rasa, which lets you define your intents with examples that you can then use to train your model, but I'd rather not have a Python service running in Docker if I don't have to!

This is an example of the training data you could use in Rasa:

nlu:
- intent: ask_name
  examples: |
    - What is your name?
    - May I know your name?
    - What do people call you?
    - Do you have a name for yourself?

- intent: ask_weather
  examples: |
    - What's the weather like today?
    - Does it look sunny outside today?
    - Oh, do you mind checking the weather for me please?
    - I like sunny days in Berlin.

Could I implement something similar using spago with a text-similarity transformer? I'm thinking I would take the user's input, such as "tell me the weather", and hopefully it would return the top 3 matching sentences from the examples in the training data, with a percentage match next to each one. I could then look up the intent that the sentence was a part of.

I have a couple of questions if that's OK:

  1. Is there a simpler or better way that you can think of than using text similarity for this use case?
  2. What would be really nice is to return the intent and also have some slots defined in the input. For the weather I could have a Location slot. Again, would the easiest thing be to run entity detection on the sentence once I have a Weather intent and see if I can find a location... and if not, ask for it?

I'm happy to do some reading if you throw me some of the buzzwords I need to go read up on!

Thanks again!

Edit: It looks like Rasa has a dual-purpose classifier that does the intent and entities at the same time:
https://blog.rasa.com/introducing-dual-intent-and-entity-transformer-diet-state-of-the-art-performance-on-a-lightweight-architecture/
https://github.com/WeiNyn/DIETClassifier-pytorch

Recommendations for CLI sub-commands

Here are some commands that could be in an "auto-demo" CLI:

  • importer: the existing downloader we have now
  • answer: question/answer demo, imports a model + runs the demo interactively or with cli flags for the question and passage.
  • more optionally interactive "run it one off" commands

Thoughts?

float32 data type

This is just a question out of curiosity but: Do you have any plans to support the float32 data type (or any other types, like integers actually)?

  • It is very common to train a neural network with float32 precision, as it reduces the computation cost without any significant impact on accuracy, and I was wondering what the speed gain would be for spago?
  • I was thinking about something like an Enum type given to the matrix creation function, and maybe the possibility to convert an existing matrix from one type to another
  • Supporting types like uint8 could allow implementing some quantization schemes more easily, which seems like a good fit for Golang since it shines in distributed applications (e.g. sending quantized weights over the network is definitely more bandwidth-friendly)

For now I just find myself "fighting the matrix" by extracting the underlying data, converting it to the desired type, doing some work with it, and finally reloading it into a matrix to run some computation.
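
For reference, Example 1 earlier on this page already shows the direction the API took: the mat package is generic over the element type. A minimal sketch, assuming mat.Scalar infers the element type from its argument as in that example:

package main

import (
	"fmt"

	"github.com/nlpodyssey/spago/mat"
)

func main() {
	a := mat.Scalar(float32(1.5)) // float32-backed scalar
	b := mat.Scalar(float64(1.5)) // float64-backed scalar
	fmt.Println(a.Item().BitSize(), b.Item().BitSize()) // expected: 32 64
}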

Running hugging_face_importer from docker container causes strange behavior

I was following the instructions to test the question-answering demo and noticed that the container never completed by outputting the spago model, and it left a zombie Python process running on my machine (a mid-2013 MacBook Pro, 2.3 GHz quad-core i7, 16 GB DDR3). Here are the steps to reproduce:

# after cloning the repo in its latest form
git rev-parse --short HEAD
f91d1b8
docker build -t spago:main . -f Dockerfile
# that completes successfully
mkdir ~/.spago
# then i run the hugging face import step via the container
docker run --rm -it -v ~/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago

Running command: './hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago'
Downloading dataset...
Start downloading 🤗 `deepset/bert-base-cased-squad2`
2020/06/27 18:55:30 Fetch the model configuration from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/config.json`
Downloading... 508 B complete
2020/06/27 18:55:30 Fetch the model vocabulary from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/vocab.txt`
Downloading... 214 kB complete
2020/06/27 18:55:30 Fetch the model weights from `https://s3.amazonaws.com/models.huggingface.co/bert/deepset/bert-base-cased-squad2/pytorch_model.bin` (it might take a while...)

# this process runs for a _really_ long time - I've actually never seen it finish successfully (I have let it run for over 60 minutes)
...

In another shell session I was inspecting which Python processes were running, because I noticed some CPU hogging after the PyTorch model was fully downloaded...

$ ls -hal ~/.spago/deepset/bert-base-cased-squad2/
total 417M
drwxr-xr-x 6 anthcor staff  192 Jun 27 14:56 ./
drwxr-xr-x 3 anthcor staff   96 Jun 27 14:55 ../
-rw-r--r-- 1 anthcor staff  508 Jun 27 14:55 config.json
drwx------ 6 anthcor staff  192 Jun 27 14:57 embeddings_storage/
-rw-r--r-- 1 anthcor staff 414M Jun 27 14:56 pytorch_model.bin
-rw-r--r-- 1 anthcor staff 209K Jun 27 14:55 vocab.txt
$ ps aux | grep spago
anthcor          29694   6.1  0.1  4444408  22784 s003  S+    2:55PM   0:03.44 docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago
anthcor          29705   0.0  0.0  4268300    700 s004  S+    2:56PM   0:00.00 grep spago
anthcor          29693   0.0  0.0  4280612   6932 s003  S+    2:55PM   0:00.08 /usr/local/Cellar/python@3.8/3.8.3/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python /usr/local/bin/grc -es --colour=auto docker run --rm -it -v /Users/anthcor/.spago:/tmp/spago spago:main ./hugging_face_importer --model=deepset/bert-base-cased-squad2 --repo=/tmp/spago

When running this workflow without Docker, everything works as expected.

In order to stop everything, I just kill the Docker container and the zombie local Python process.

ps aux | grep -i spago | awk '{print $2}' | xargs kill -9 $1

Not really sure why this happens – it looks like the container is using a local system binary, and that causes hangups in the flow of things, but I could totally be wrong, as I haven't really spent too much time diving in. Hope this gives enough insight into my issue – let me know if you would like any more details. Cheers 🍻
