
Is it possible to use the schema of our own dataset, generated by SQLite in JSON format, with SmBop code in Google Colab? · smbop · 18 comments · closed

varunpandya2004 commented on September 27, 2024
Is it possible to use the schema of our own dataset, generated by SQLite in JSON format, with SmBop code in Google Colab?


Comments (18)

OhadRubin commented on September 27, 2024

Sure, you can look at how Spider formats the schema (tables.json) and adapt your own DB to that structure.
Then you can just upload your DB to Colab and you're done.
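For reference, each database in Spider's tables.json is one JSON object with `db_id`, `table_names(_original)`, `column_names(_original)`, `column_types`, `primary_keys`, and `foreign_keys`. A minimal sketch of building such an entry straight from an SQLite file is below; the type mapping is crude and the "natural language" names are just the originals with underscores replaced, so treat both as assumptions to clean up by hand:

```python
import sqlite3

def sqlite_to_spider_schema(db_path, db_id):
    """Build a Spider-style tables.json entry from an SQLite database."""
    cur = sqlite3.connect(db_path).cursor()
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]

    column_names = [[-1, "*"]]   # Spider reserves column index 0 for "*"
    column_types = ["text"]
    primary_keys, foreign_keys = [], []
    col_index = {}               # (table, column) -> global column index

    for t_idx, table in enumerate(tables):
        for _cid, name, ctype, _nn, _dflt, pk in cur.execute(
                f"PRAGMA table_info('{table}')"):
            col_index[(table, name)] = len(column_names)
            if pk:
                primary_keys.append(len(column_names))
            column_names.append([t_idx, name])
            ctype = (ctype or "").lower()
            # Crude mapping onto Spider's small fixed type set
            column_types.append(
                "number" if any(k in ctype for k in ("int", "real", "num", "floa", "doub"))
                else "text")

    for table in tables:
        for _id, _seq, ref_table, from_col, to_col, *_rest in cur.execute(
                f"PRAGMA foreign_key_list('{table}')"):
            if to_col is None:   # implicit reference to the PK; resolve by hand
                continue
            foreign_keys.append(
                [col_index[(table, from_col)], col_index[(ref_table, to_col)]])

    return {
        "db_id": db_id,
        "table_names_original": tables,
        "table_names": [t.replace("_", " ").lower() for t in tables],
        "column_names_original": column_names,
        "column_names": [[i, n.replace("_", " ").lower()] for i, n in column_names],
        "column_types": column_types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }
```

The resulting dict can be appended to the list in Spider's tables.json, with the .sqlite file itself placed under database/<db_id>/.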


varunpandya2004 commented on September 27, 2024
  1. Are you aware of any scripts that can help in doing that?
  2. And once the schema is ready, will SmBop perform well on it, without training the network again on my own dataset?


OhadRubin commented on September 27, 2024
  1. Not really.
  2. Yeah! Actually, any model that does well on Spider will be able to generalise to a new schema.


OhadRubin commented on September 27, 2024

> How to fine-tune the model on our own dataset, without needing to retrain it again? Can the pretrained model you provided be used for this?

Yes, you can use the pre-trained model.
If the new data is in the same format as Spider, you can use the original dataset reader, and modify defaults.jsonnet so that the model is initialized from the pretrained model, as described here.
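In AllenNLP-style configs, the pretrained initializer usually sits under the model's `initializer` key. Assuming SmBop's defaults.jsonnet follows that convention (the surrounding field names below are placeholders; keep whatever your file already has), the placement would look roughly like:

```jsonnet
{
  // ... dataset_reader, trainer, etc. unchanged ...
  model: {
    // keep the existing model type and hyperparameters here
    initializer: {
      regexes: [
        [".*", {
          type: "pretrained",
          weights_file_path: "best.th",
          parameter_name_overrides: {},
        }],
      ],
    },
  },
}
```

Note that later in this thread a simpler workaround is used instead: replacing the whole "model" block with a from_archive reference.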


varunpandya2004 commented on September 27, 2024

> How to fine-tune the model on our own dataset, without needing to retrain it again? Can the pretrained model you provided be used for this?
>
> Yes. You can use the pre-trained model.
> If the new data is in the same format as Spider, you can use the original dataset reader. And modify the defaults.jsonnet such that the model is initialized from the pretrained model as described here.

[screenshot attached]
I am new to the jsonnet format. Where should I put

[".*",
{
"type": "pretrained",
"weights_file_path": "best.th",
"parameter_name_overrides": {}
}
]

in the jsonnet file?
I have already converted the dataset to the Spider format.


OhadRubin commented on September 27, 2024

I think instead of "initializerapplicator" it should be "initializer".


varunpandya2004 commented on September 27, 2024

It's still giving the same error after changing it to "initializer".
Have you found any alternatives?


Illumaria commented on September 27, 2024

> It's still giving the same error on changing it to "initializer"
> Have you found any alternatives?

Hi! It seems I found the solution: try replacing the whole "model": {...} block in the jsonnet file with just

"model": {
  "type": "from_archive",
  "archive_file": "<your_path>/model.tar.gz",
},

UPD: note that you'll need to modify config.json after training finishes, because it will still reference the pretrained model archive, and that leftover reference prevents the finetuned model from loading properly.
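That UPD step can be scripted. This is a sketch under the assumption that the problem is exactly the leftover {"type": "from_archive", ...} model block inside the finetuned archive's config.json: it swaps that block for the "model" block of the original (pre-finetuning) config so the new model.tar.gz loads on its own. `fix_finetuned_archive` is a hypothetical helper name, not part of SmBop:

```python
import json
import os
import shutil
import tarfile
import tempfile

def fix_finetuned_archive(finetuned_archive, original_config):
    """Rewrite config.json inside a finetuned model.tar.gz so it no
    longer points at the pretrained archive (assumption: the stale
    "from_archive" model block is what breaks loading)."""
    workdir = tempfile.mkdtemp()
    with tarfile.open(finetuned_archive) as tar:
        tar.extractall(workdir)

    config_path = os.path.join(workdir, "config.json")
    with open(config_path) as f:
        config = json.load(f)
    # Replace the stale "from_archive" block with the real model config
    with open(original_config) as f:
        config["model"] = json.load(f)["model"]
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

    # Repack the archive in place
    with tarfile.open(finetuned_archive, "w:gz") as tar:
        for name in os.listdir(workdir):
            tar.add(os.path.join(workdir, name), arcname=name)
    shutil.rmtree(workdir)
```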


varunpandya2004 commented on September 27, 2024

> It's still giving the same error on changing it to "initializer"
> Have you found any alternatives?
>
> Hi! Seems like I found the solution: try to replace the whole "model": {...} block in the jsonnet file with just
>
> "model": {
>   "type": "from_archive",
>   "archive_file": "<your_path>/model.tar.gz",
> },
>
> UPD: note that you'll need to modify the config.json after training finishes, because the reference to the pretrained model archive won't go anywhere and it will prevent the proper loading of the finetuned model.

Hey, thanks for the solution. It works now.
Although I am wondering: with the above modification, is it retraining the model or fine-tuning it? If it's retraining, the information learnt from the Spider dataset might be lost.
Any insights on this?


Illumaria commented on September 27, 2024

> Hey thanks for the solution. It works now.
> Although I am wondering, with the above modification, if it's retraining the model or fine-tuning it? If it's retraining then the information learnt from the Spider dataset might be lost.
> Any insights on this?

I'd say that if you take a pretrained model and then train it on a completely different dataset, it's retraining, but with a very good weights initialization. ;D Still, the weights will obviously shift towards the new data, just as expected. I would assume you'd have to add your data to the Spider data instead of replacing it if you want to keep the Spider performance, but there's no guarantee the model will generalize well to that union.

In my case, I started retraining with 74.6% overall exact-match accuracy on Spider (according to the evaluation script) and 0% on my data, and ended with 67.1% and 63.6%, respectively. I don't really care about model performance on Spider, though, since I want to use it on my data, not on Spider. ;)
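Adding your data to the Spider data, as suggested above, is mostly a JSON concatenation. A sketch follows; `merge_training_data` is a hypothetical helper, and it assumes both files are JSON lists of Spider-format examples (db_id, question, query, sql, ...):

```python
import json

def merge_training_data(spider_train, custom_train, out_path):
    """Append custom Spider-format examples to Spider's training set.

    Sketch only: you still need to append your schema entry to
    tables.json and place your .sqlite file under database/<db_id>/.
    """
    with open(spider_train) as f:
        examples = json.load(f)
    with open(custom_train) as f:
        examples += json.load(f)
    with open(out_path, "w") as f:
        json.dump(examples, f, indent=2)
    return len(examples)
```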


varunpandya2004 commented on September 27, 2024

> Hey thanks for the solution. It works now.
> Although I am wondering, with the above modification, if it's retraining the model or fine-tuning it? If it's retraining then the information learnt from the Spider dataset might be lost.
> Any insights on this?
>
> I'd say that if you take a pretrained model and then train it on a completely different dataset, it is a retraining but with a very good weights initialization. ;D Still, the weights will obviously shift towards the new data, just as expected. I would assume you have to add your data to the Spider data instead of replacing it if you want to keep the performance, but no guarantee that the model will generalize well to that union.
>
> In my case, I started retraining with 74.6% (according to the evaluation script) overall exact matching accuracy on Spider and 0% on my data, and ended with 67.1% and 63.6% accuracy, respectively. Now, I don't really care about model performance on Spider since I want to use it on my data, not on Spider. ;)

Hi, thanks for that answer! Really helpful.

But it seems I ran into another issue. The SmBop inference pipeline on Google Colab runs out of RAM (25 GB) because it keeps connecting and loading trees for my own finetuned model, whereas the pretrained model provided by the authors loads and connects the trees only once and works fine on Google Colab.

Any insights on this?


OhadRubin commented on September 27, 2024

Hey, can you provide an MWE?
Specifically, what is the error message?


varunpandya2004 commented on September 27, 2024

> Hey, can you provide an MWE?
> Specifically, what is the error message?

[screenshot attached]

I don't get an error message. It just keeps printing "before load_trees, before connecting" until it runs out of RAM (25 GB GPU RAM) with my finetuned model, whereas your pretrained model only loads a couple of times and runs fine.


OhadRubin commented on September 27, 2024

Can you share the code?


varunpandya2004 commented on September 27, 2024

> Can you share the code?

I think there was a problem with Google Colab's GPU. It's working fine now. Thanks for all your help!


alan-ai-learner commented on September 27, 2024

@varunpandya2004 Hi, can you please tell me how you prepared your custom dataset, and share the script or Colab notebook?


alan-ai-learner commented on September 27, 2024

I have my own dataset. I can generate its schema in JSON format using SQLite DB's export option. I want to convert natural-language questions to SQL using SmBop for querying my own database. Is it possible to do this with the pretrained SmBop model on Google Colab?

@varunpandya2004 Can you tell me how you created the schema? When I try to use the export option in SQLiteStudio, there is no option to export the schema; there are only options for tables, database, and query. Please help.
Thanks!
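A GUI export option isn't actually required here: SQLite stores the original CREATE statements in sqlite_master, which is usually what "the schema" means, so a few lines of Python recover the DDL directly (`dump_schema` is a hypothetical helper, not part of SmBop; you'd still convert the result into Spider's tables.json format for the model):

```python
import sqlite3

def dump_schema(db_path):
    """Return the CREATE statement for every table in an SQLite file,
    read straight from the sqlite_master catalog."""
    conn = sqlite3.connect(db_path)
    return [row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type='table' AND sql IS NOT NULL")]
```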


alan-ai-learner commented on September 27, 2024

> Can you share the code?
>
> I think there was a problem with Google Colab's GPU. It's working fine now. Thanks for all of your help!

I'm facing the same issue. How did you solve it?

