
Comments (12)

olegtarasov commented on June 5, 2024

@Sojaner Great, thank you!

I created the C++ exports by hand. I made a fork of Facebook's implementation here: https://github.com/olegtarasov/fastText (note that the default branch for this fork is c_api).

All of the exports are declared and implemented in https://github.com/olegtarasov/fastText/blob/c_api/src/fasttext_api.h and https://github.com/olegtarasov/fastText/blob/c_api/src/fasttext_api.cc. There is also https://github.com/olegtarasov/fastText/blob/c_api/src/fasttext_api_test.cc with some manual tests (never got around to creating automated tests unfortunately). There is a CMake target fasttext-api-test which will create a binary for these tests.

After you make a pull request to that repo and I merge it, new native NuGet packages (FastText.Native.*) will be built automatically.

After that you will need to update this project with the new versions of the NuGet packages, mirror the new exports in https://github.com/olegtarasov/FastText.NetWrapper/blob/master/FastText.NetWrapper/FastTextWrapper.Api.cs, and then extend https://github.com/olegtarasov/FastText.NetWrapper/blob/master/FastText.NetWrapper/FastTextWrapper.cs with the actual new methods.

If you have further questions, please don't hesitate to ask!

from fasttext.netwrapper.

olegtarasov commented on June 5, 2024

Merged and published a new version. Thank you!


olegtarasov commented on June 5, 2024

Huh, you are right, pretrained vectors should be in text format. Sorry for misleading you :) I didn't notice your training data link the first time, and now I see that it lacks labels.

You are trying to train a classifier that predicts which class a given example belongs to. You need to define those classes at train time, and the way to do this in FastText is to add labels like __label__0 at the beginning of each training example like this:

__label__1 Błąd w dokumentacji
__label__1 Błąd w formularzu delegacji 
__label__1 Błąd w kalkulacji
__label__2 Dokument FS, pobieranie z symfonii
__label__2 Dokument historyczny - niepoprawna wartość 
__label__3 Żeby usunać pozycje dokumentu trzeba wyjść z kontrolki pozycji i wrócić.

Here __label__ is the default prefix, which you can change with SupervisedArgs.LabelPrefix. Make sure that the label prefix in your training file matches the prefix you pass to FastText. You can use any string as a label, and you can even put multiple labels on a single example for multi-label classification:

__label__foo __label__bar __label__baz Błąd w dokumentacji
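
A quick way to catch the "lacks labels" problem before training is to scan the file for lines whose first token is not a label. The following is only a minimal sketch: the class and method names (LabelCheck, FindUnlabeledLines) are made up for illustration and are not part of FastText.NetWrapper, and it assumes the convention described above, where labels sit at the beginning of each example.

```csharp
using System;
using System.Collections.Generic;

public static class LabelCheck
{
    // Hypothetical helper, not part of FastText.NetWrapper.
    // Returns the 1-based numbers of non-empty lines whose first
    // whitespace-separated token is not "<prefix><label>".
    public static List<int> FindUnlabeledLines(
        IEnumerable<string> lines, string labelPrefix = "__label__")
    {
        var unlabeled = new List<int>();
        int lineNumber = 0;

        foreach (var line in lines)
        {
            lineNumber++;

            // Blank lines carry no example, so they are skipped.
            if (string.IsNullOrWhiteSpace(line))
                continue;

            // Passing null separators splits on any whitespace.
            var firstToken = line.Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries)[0];

            bool hasPrefix = firstToken.StartsWith(labelPrefix, StringComparison.Ordinal);
            bool hasLabelText = firstToken.Length > labelPrefix.Length;

            if (!hasPrefix || !hasLabelText)
                unlabeled.Add(lineNumber);
        }

        return unlabeled;
    }
}
```

For a real file, the lines could come from System.IO.File.ReadLines(path). If you changed SupervisedArgs.LabelPrefix, pass the same prefix as the second argument.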

I was able to successfully train a model on your data and pretrained vectors in text form, so let me know if you can reproduce the result :)
I also suggest that you read the original tutorial from fastText: https://github.com/facebookresearch/fastText/blob/main/docs/supervised-tutorial.md. You can also read the other docs in that directory; they are mostly useful.


olegtarasov commented on June 5, 2024

Hi @Sojaner! Yes, this library doesn't yet cover the full interface of the original FastText, and unsupervised models in particular. But it wouldn't be hard to add the implementation — I would appreciate a pull request :)


Sojaner commented on June 5, 2024

Hi @olegtarasov! I would love to help with that. I need to understand one thing before I can, though. Do the exports already exist on the C++ side, or are they missing too? I can't figure out how, where, or with which tool (if any) you did that part.
I will hopefully figure out the C# bindings from there. :)


Sojaner commented on June 5, 2024

Pull request #13 is waiting to solve this. :)


Sojaner commented on June 5, 2024

I have also added some extra code which makes it possible to load a model from memory as a byte array.

In my scenario, I embed the model as a resource and load it from memory, so there is no need for file extraction or temp file management.
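
For readers who want to try the same approach, here is a rough sketch of the first half: reading an embedded resource into a byte array with standard .NET reflection APIs. The class name, method name, and resource name are hypothetical. The wrapper call that accepts the bytes is left out on purpose, because its name is not given in this thread.

```csharp
using System.IO;
using System.Reflection;

public static class EmbeddedModel
{
    // Hypothetical helper: copies an embedded resource into memory.
    public static byte[] ReadResource(Assembly assembly, string resourceName)
    {
        // GetManifestResourceStream returns null when the name does not match.
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
                throw new FileNotFoundException(
                    "Embedded resource not found: " + resourceName);

            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}

// Usage (the resource name is an assumption; it depends on your project layout):
//   byte[] modelBytes = EmbeddedModel.ReadResource(
//       typeof(EmbeddedModel).Assembly, "MyApp.Resources.model.bin");
//   // Then pass modelBytes to the wrapper's load-from-memory method.
```

Resource names are case-sensitive and usually include the default namespace and folder path, so assembly.GetManifestResourceNames() is handy for finding the exact name.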


rzechu commented on June 5, 2024

I see it's closed and the project is probably abandoned, but I have the same problem.

I downloaded the vectors (vec + bin) from
https://fasttext.cc/docs/en/crawl-vectors.html
and can't use them.
I also trained a model with

            model = new FastText.NetWrapper.FastTextWrapper();
            var args = new SupervisedArgs
            {
                PretrainedVectors = "H:\\AI\\FastText\\cc.pl.300.vec",
                dim = 300
            };
            model.Supervised("H:\\AI\\FastText\\myData.txt", "H:\\AI\\FastText\\myData.txt", args);

But no matter what I use, the plain downloaded file:
model.LoadModel("H:\\AI\\FastText\\cc.pl.300.bin"); //downloaded from url

or the model trained with the code above:
model = new FastTextWrapper();
model.LoadModel("H:\\AI\\FastText\\zgloszenieDone.txt.bin");
//model.Quantize(); //optionally
var predictions = model.PredictMultiple(text, 3);

I always receive:
'Loaded model doesn't contain supervised labels. Maybe you loaded an unsupervised model?'


olegtarasov commented on June 5, 2024

Hi @rzechu! The project is definitely not abandoned, it's just complete :) There are no major updates to FastText being released and most bugs are eliminated in this wrapper.

Can you post a complete reproducible project along with data that you use to train the model?


rzechu commented on June 5, 2024

Probably I am doing it wrong, but I guess there should be an option to use unsupervised data.

using System;
using System.Windows.Forms;
using FastText.NetWrapper;
using FastText;

namespace FastText6
{
    public partial class Form1 : Form
    {
        private FastTextWrapper model;

        public Form1()
        {
            InitializeComponent();

            model = new FastTextWrapper();

            if (false)
            {
                //option 1

                //model url
                //https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.pl.300.bin.gz
                model.LoadModel("H:\\AI\\FastText\\cc.pl.300.bin");
            }
            else
            {
                //option 2 

                //train
                //vec file https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.pl.300.vec.gz

                var args = new SupervisedArgs
                {
                    PretrainedVectors = "H:\\AI\\FastText\\cc.pl.300.vec",
                    dim = 300,

                };
                model.Supervised("H:\\AI\\FastText\\zgloszenieShort.txt", "H:\\AI\\FastText\\zgloszenieShortDone.txt", args);

                model.LoadModel("H:\\AI\\FastText\\zgloszenieShortDone.txt.bin");
                model.Quantize();
            }
        }

        private void textBox1_TextChanged(object sender, EventArgs e)
        {
            string text = textBox1.Text;
            if (!text.Contains(" "))
                return; 
            //var z = model.GetModelDimension(); //300
            var predictions = model.PredictMultiple(text, 3); //error here 
            listBox1.Items.Clear();
            foreach (var prediction in predictions)
            {
                listBox1.Items.Add(prediction.Label);
            }
        }
    }
}

Short training file: zgloszenieShort.txt

GetNearestNeighbours(text, 3) works fine.


olegtarasov commented on June 5, 2024

Ok, I think I can see the problem.

  1. With option 1 you are just loading unsupervised vectors in binary form. If you try to predict with this model you will get an error, because it is an unsupervised model: it contains no classes and only maps words to their vector representations. You can probably use the GetSentenceVector and GetWordVector methods with this model, but that's about it.
  2. With option 2 you are training a supervised model on your data, which is the correct approach. You are also using pretrained vectors. This is fine, but it's optional: you can train your model from your data alone. Pretrained vectors might be useful if you don't have a lot of data and want to bootstrap your model with a previously trained vector representation. Anyway, here you use pretrained vectors in text form, and this is incorrect. Only binary models should be used with this library. So it should be:
var args = new SupervisedArgs
{
    PretrainedVectors = "H:\\AI\\FastText\\cc.pl.300.bin",
    dim = 300,
};

Anyway, for the first test I would recommend training the simplest possible classifier to see whether it works. You also don't need to load the model after training: it's still loaded in memory and can be used right away.
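
A "simplest possible classifier" test might look roughly like the sketch below. It uses only the wrapper calls that already appear in this thread (Supervised, PredictMultiple, prediction.Label) and default SupervisedArgs with no pretrained vectors. The file paths, the tiny labeled data set, and the class name are assumptions for illustration, and the snippet needs the FastText.NetWrapper NuGet package to build.

```csharp
using System;
using System.IO;
using FastText.NetWrapper;

public static class MinimalClassifier
{
    public static void Main()
    {
        // Hypothetical paths; any writable location works.
        string trainPath = Path.Combine(Path.GetTempPath(), "train.txt");
        string modelPath = Path.Combine(Path.GetTempPath(), "minimal-model");

        // Tiny labeled data set; every example starts with a label.
        File.WriteAllLines(trainPath, new[]
        {
            "__label__1 Błąd w dokumentacji",
            "__label__1 Błąd w formularzu delegacji",
            "__label__1 Błąd w kalkulacji",
            "__label__2 Dokument FS, pobieranie z symfonii",
            "__label__2 Dokument historyczny - niepoprawna wartość",
        });

        var model = new FastTextWrapper();

        // Default arguments: no pretrained vectors, default dimension.
        model.Supervised(trainPath, modelPath, new SupervisedArgs());

        // The trained model stays loaded, so no LoadModel call is needed.
        foreach (var prediction in model.PredictMultiple("Błąd w formularzu", 2))
        {
            Console.WriteLine(prediction.Label);
        }
    }
}
```

With so little data the predictions are not meaningful; the point is only to confirm that training and prediction run without the "doesn't contain supervised labels" error before adding pretrained vectors back in.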


rzechu commented on June 5, 2024

Thank you for the response.
I am new to ML and just wanted to find a working example and customize it to fulfill my needs.

I've tried the 2nd option, but I also got an error with bin. I understand the message but have no idea how to check or fix it. The pretrained model is described as dimension=300 (the same on the website as in the filename).
[image]

I guess my text is enough, but I see I will probably need 300-dimensional pretrained vectors in my custom text file?
[image]

   at FastText.NetWrapper.FastTextWrapper.ThrowNativeException()
   at FastText.NetWrapper.FastTextWrapper.Supervised(String inputPath, String outputPath, SupervisedArgs args, AutotuneArgs autotuneArgs, Boolean debug)
   at FastText.NetWrapper.FastTextWrapper.Supervised(String inputPath, String outputPath, SupervisedArgs args, TrainProgressCallback progressCallback)
   at FastText6.Form1..ctor() in C:\Users\xyz\source\repos\FastText\FastText6\Form1.cs:line 38
   at FastText6.Program.Main() in C:\Users\xyz\source\repos\FastText\FastText6\Program.cs:line 14

I found 2 similar issues, and it looks like people fix this by using vec instead of bin:
facebookresearch/fastText#108
facebookresearch/fastText#190

