
Is it possible to use the schema of our own dataset, generated by SQLite in JSON format, with SmBop code in Google Colab? · smbop · 18 comments · closed

varunpandya2004 commented on September 27, 2024
Is it possible to use the schema of our own dataset, generated by SQLite in JSON format, with SmBop code in Google Colab?


Comments (18)

OhadRubin commented on September 27, 2024

Sure, you can look at how Spider formats the schema (tables.json) and adapt your own DB to that structure.
Then you can just upload your DB to Colab and you're done.
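For reference, each database in Spider's tables.json is one JSON object with `db_id`, `table_names(_original)`, `column_names(_original)`, `column_types`, `primary_keys`, and `foreign_keys`. A minimal sketch of building such an entry straight from an SQLite file is below; the type mapping is crude and the "natural language" names are just the originals with underscores replaced, so treat both as assumptions to clean up by hand:

```python
import sqlite3

def sqlite_to_spider_schema(db_path, db_id):
    """Build a Spider-style tables.json entry from an SQLite database."""
    cur = sqlite3.connect(db_path).cursor()
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]

    column_names = [[-1, "*"]]   # Spider reserves column index 0 for "*"
    column_types = ["text"]
    primary_keys, foreign_keys = [], []
    col_index = {}               # (table, column) -> global column index

    for t_idx, table in enumerate(tables):
        for _cid, name, ctype, _nn, _dflt, pk in cur.execute(
                f"PRAGMA table_info('{table}')"):
            col_index[(table, name)] = len(column_names)
            if pk:
                primary_keys.append(len(column_names))
            column_names.append([t_idx, name])
            ctype = (ctype or "").lower()
            # Crude mapping onto Spider's small fixed type set
            column_types.append(
                "number" if any(k in ctype for k in ("int", "real", "num", "floa", "doub"))
                else "text")

    for table in tables:
        for _id, _seq, ref_table, from_col, to_col, *_rest in cur.execute(
                f"PRAGMA foreign_key_list('{table}')"):
            if to_col is None:   # implicit reference to the PK; resolve by hand
                continue
            foreign_keys.append(
                [col_index[(table, from_col)], col_index[(ref_table, to_col)]])

    return {
        "db_id": db_id,
        "table_names_original": tables,
        "table_names": [t.replace("_", " ").lower() for t in tables],
        "column_names_original": column_names,
        "column_names": [[i, n.replace("_", " ").lower()] for i, n in column_names],
        "column_types": column_types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }
```

The resulting dict can be appended to the list in Spider's tables.json, with the .sqlite file itself placed under database/<db_id>/.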


varunpandya2004 commented on September 27, 2024
  1. Are you aware of any scripts that can help in doing that?
  2. And once the schema is ready, will SmBop perform well on it, without training the network again on my own dataset?


OhadRubin commented on September 27, 2024
  1. Not really.
  2. Yeah! Actually, any model that does well on Spider will be able to generalise to a new schema.


OhadRubin commented on September 27, 2024

> How to fine-tune the model on our own dataset, without needing to retrain it again? Can the pretrained model you provided be used for this?

Yes, you can use the pre-trained model.
If the new data is in the same format as Spider, you can use the original dataset reader, and modify defaults.jsonnet so that the model is initialized from the pretrained model, as described here.
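In AllenNLP-style configs, the pretrained initializer usually sits under the model's `initializer` key. Assuming SmBop's defaults.jsonnet follows that convention (the surrounding field names below are placeholders; keep whatever your file already has), the placement would look roughly like:

```jsonnet
{
  // ... dataset_reader, trainer, etc. unchanged ...
  model: {
    // keep the existing model type and hyperparameters here
    initializer: {
      regexes: [
        [".*", {
          type: "pretrained",
          weights_file_path: "best.th",
          parameter_name_overrides: {},
        }],
      ],
    },
  },
}
```

Note that later in this thread a simpler workaround is used instead: replacing the whole "model" block with a from_archive reference.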


varunpandya2004 commented on September 27, 2024

> How to fine-tune the model on our own dataset, without needing to retrain it again? Can the pretrained model you provided be used for this?
>
> Yes. You can use the pre-trained model.
> If the new data is in the same format as Spider, you can use the original dataset reader. And modify the defaults.jsonnet such that the model is initialized from the pretrained model as described here.

[screenshot attached]
I am new to the jsonnet format. Where should I put

[".*",
{
"type": "pretrained",
"weights_file_path": "best.th",
"parameter_name_overrides": {}
}
]

in the jsonnet file?
I have already converted the dataset to the Spider format.


OhadRubin commented on September 27, 2024

I think instead of "initializerapplicator" it should be "initializer".


varunpandya2004 commented on September 27, 2024

It's still giving the same error after changing it to "initializer".
Have you found any alternatives?


Illumaria commented on September 27, 2024

> It's still giving the same error on changing it to "initializer"
> Have you found any alternatives?

Hi! It seems I found the solution: try replacing the whole "model": {...} block in the jsonnet file with just

"model": {
  "type": "from_archive",
  "archive_file": "<your_path>/model.tar.gz",
},

UPD: note that you'll need to modify config.json after training finishes, because it will still reference the pretrained model archive, and that leftover reference prevents the finetuned model from loading properly.
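That UPD step can be scripted. This is a sketch under the assumption that the problem is exactly the leftover {"type": "from_archive", ...} model block inside the finetuned archive's config.json: it swaps that block for the "model" block of the original (pre-finetuning) config so the new model.tar.gz loads on its own. `fix_finetuned_archive` is a hypothetical helper name, not part of SmBop:

```python
import json
import os
import shutil
import tarfile
import tempfile

def fix_finetuned_archive(finetuned_archive, original_config):
    """Rewrite config.json inside a finetuned model.tar.gz so it no
    longer points at the pretrained archive (assumption: the stale
    "from_archive" model block is what breaks loading)."""
    workdir = tempfile.mkdtemp()
    with tarfile.open(finetuned_archive) as tar:
        tar.extractall(workdir)

    config_path = os.path.join(workdir, "config.json")
    with open(config_path) as f:
        config = json.load(f)
    # Replace the stale "from_archive" block with the real model config
    with open(original_config) as f:
        config["model"] = json.load(f)["model"]
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

    # Repack the archive in place
    with tarfile.open(finetuned_archive, "w:gz") as tar:
        for name in os.listdir(workdir):
            tar.add(os.path.join(workdir, name), arcname=name)
    shutil.rmtree(workdir)
```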


varunpandya2004 commented on September 27, 2024

> It's still giving the same error on changing it to "initializer"
> Have you found any alternatives?
>
> Hi! Seems like I found the solution: try to replace the whole "model": {...} block in the jsonnet file with just
>
> "model": {
>   "type": "from_archive",
>   "archive_file": "<your_path>/model.tar.gz",
> },
>
> UPD: note that you'll need to modify the config.json after training finishes, because the reference to the pretrained model archive won't go anywhere and it will prevent the proper loading of the finetuned model.

Hey, thanks for the solution. It works now.
Although I am wondering: with the above modification, is it retraining the model or fine-tuning it? If it's retraining, the information learnt from the Spider dataset might be lost.
Any insights on this?


Illumaria commented on September 27, 2024

> Hey thanks for the solution. It works now.
> Although I am wondering, with the above modification, if it's retraining the model or fine-tuning it? If it's retraining then the information learnt from the Spider dataset might be lost.
> Any insights on this?

I'd say that if you take a pretrained model and then train it on a completely different dataset, it's retraining, but with a very good weights initialization. ;D Still, the weights will obviously shift towards the new data, just as expected. I would assume you'd have to add your data to the Spider data instead of replacing it if you want to keep the Spider performance, but there's no guarantee the model will generalize well to that union.

In my case, I started retraining with 74.6% overall exact-match accuracy on Spider (according to the evaluation script) and 0% on my data, and ended with 67.1% and 63.6%, respectively. I don't really care about model performance on Spider, though, since I want to use it on my data, not on Spider. ;)
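Adding your data to the Spider data, as suggested above, is mostly a JSON concatenation. A sketch follows; `merge_training_data` is a hypothetical helper, and it assumes both files are JSON lists of Spider-format examples (db_id, question, query, sql, ...):

```python
import json

def merge_training_data(spider_train, custom_train, out_path):
    """Append custom Spider-format examples to Spider's training set.

    Sketch only: you still need to append your schema entry to
    tables.json and place your .sqlite file under database/<db_id>/.
    """
    with open(spider_train) as f:
        examples = json.load(f)
    with open(custom_train) as f:
        examples += json.load(f)
    with open(out_path, "w") as f:
        json.dump(examples, f, indent=2)
    return len(examples)
```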


varunpandya2004 commented on September 27, 2024

> Hey thanks for the solution. It works now.
> Although I am wondering, with the above modification, if it's retraining the model or fine-tuning it? If it's retraining then the information learnt from the Spider dataset might be lost.
> Any insights on this?
>
> I'd say that if you take a pretrained model and then train it on a completely different dataset, it is a retraining but with a very good weights initialization. ;D Still, the weights will obviously shift towards the new data, just as expected. I would assume you have to add your data to the Spider data instead of replacing it if you want to keep the performance, but no guarantee that the model will generalize well to that union.
>
> In my case, I started retraining with 74.6% (according to the evaluation script) overall exact matching accuracy on Spider and 0% on my data, and ended with 67.1% and 63.6% accuracy, respectively. Now, I don't really care about model performance on Spider since I want to use it on my data, not on Spider. ;)

Hi, thanks for that answer! Really helpful.

But it seems I ran into another issue. The SmBop inference pipeline on Google Colab runs out of RAM (25 GB) because it keeps connecting and loading trees for my own finetuned model, whereas the pretrained model provided by the authors loads and connects the trees only once and works fine on Google Colab.

Any insights on this?


OhadRubin commented on September 27, 2024

Hey, can you provide an MWE?
Specifically, what is the error message?


varunpandya2004 commented on September 27, 2024

> Hey, can you provide an MWE?
> Specifically, what is the error message?

[screenshot attached]

I don't get an error message. It just keeps printing "before load_trees, before connecting" until it runs out of RAM (25 GB GPU RAM) with my finetuned model, whereas your pretrained model only loads a couple of times and runs fine.


OhadRubin commented on September 27, 2024

Can you share the code?


varunpandya2004 commented on September 27, 2024

> Can you share the code?

I think there was a problem with Google Colab's GPU. It's working fine now. Thanks for all your help!


alan-ai-learner commented on September 27, 2024

@varunpandya2004 Hi, can you please tell me how you prepared your custom dataset, and share the script or Colab notebook?


alan-ai-learner commented on September 27, 2024

I have my own dataset. I can generate its schema in JSON format using SQLite DB's export option. I want to convert natural-language questions to SQL using SmBop for querying my own database. Is it possible to do this with the pretrained SmBop model on Google Colab?

@varunpandya2004 Can you tell me how you created the schema? When I try to use the export option in SQLiteStudio, there is no option to export the schema; there are only options for tables, database, and query. Please help.
Thanks!
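A GUI export option isn't actually required here: SQLite stores the original CREATE statements in sqlite_master, which is usually what "the schema" means, so a few lines of Python recover the DDL directly (`dump_schema` is a hypothetical helper, not part of SmBop; you'd still convert the result into Spider's tables.json format for the model):

```python
import sqlite3

def dump_schema(db_path):
    """Return the CREATE statement for every table in an SQLite file,
    read straight from the sqlite_master catalog."""
    conn = sqlite3.connect(db_path)
    return [row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type='table' AND sql IS NOT NULL")]
```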


alan-ai-learner commented on September 27, 2024

> Can you share the code?
>
> I think there was a problem with Google Colab's GPU. It's working fine now. Thanks for all of your help!

I'm facing the same issue. How did you solve it?

