Comments (18)
Sure: take a look at how Spider formats its schemas and adapt your own DB to that structure.
Then upload your DB to Colab and you're done.
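For reference, each database entry in Spider's tables.json has roughly this shape (the db_id and values below are illustrative; column entries are [table_index, name] pairs, with the first entry reserved for "*"):

```json
{
  "db_id": "my_database",
  "table_names_original": ["singer", "concert"],
  "table_names": ["singer", "concert"],
  "column_names_original": [[-1, "*"], [0, "singer_id"], [0, "name"], [1, "concert_id"]],
  "column_names": [[-1, "*"], [0, "singer id"], [0, "name"], [1, "concert id"]],
  "column_types": ["text", "number", "text", "number"],
  "primary_keys": [1, 3],
  "foreign_keys": []
}
```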
from smbop.
- Are you aware of any scripts that can help in doing that?
- And once the schema is ready, will SmBop perform well on it, without training the network again on my own dataset?
from smbop.
- Not really.
- Yeah! Actually, any model that does well on Spider will be able to generalise to a new schema.
from smbop.
How can we fine-tune the model on our own dataset without retraining it from scratch? Can the pretrained model you provided be used for this?
Yes, you can use the pre-trained model.
If the new data is in the same format as Spider, you can use the original dataset reader, and modify defaults.jsonnet so that the model is initialized from the pretrained model, as described here.
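For what it's worth, in AllenNLP-style configs that initialization usually means adding a pretrained-weights initializer to the model block. A sketch (field names depend on the AllenNLP version SmBop pins; in newer versions the regex list sits under a `regexes` key, while in older ones `initializer` takes the list directly):

```jsonnet
model: {
  // ... keep the existing SmBop model fields from defaults.jsonnet ...
  initializer: {
    regexes: [
      [".*", {
        type: "pretrained",
        weights_file_path: "best.th",  // path to the pretrained weights
        parameter_name_overrides: {},
      }],
    ],
  },
},
```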
from smbop.
I am new to the jsonnet format. Where should I put

```jsonnet
[".*",
 {
   "type": "pretrained",
   "weights_file_path": "best.th",
   "parameter_name_overrides": {}
 }]
```

in the jsonnet file? I have already converted the dataset to the Spider format.
from smbop.
I think instead of `initializerapplicator` it should be `initializer`.
from smbop.
It still gives the same error after changing it to `initializer`.
Have you found any alternatives?
from smbop.
Hi! It seems I found the solution: try replacing the whole "model": {...} block in the jsonnet file with just

```jsonnet
"model": {
  "type": "from_archive",
  "archive_file": "<your_path>/model.tar.gz",
},
```

UPD: note that you'll need to modify config.json after training finishes, because the reference to the pretrained model archive will still be there, and it will prevent the finetuned model from loading properly.
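The UPD step above can be scripted. A sketch (hypothetical helper, not part of SmBop): it rewrites config.json inside the finetuned model.tar.gz, replacing the from_archive model block with the real model config taken from the base archive, so loading the finetuned model no longer depends on the pretrained archive.

```python
import json
import os
import shutil
import tarfile
import tempfile

def patch_config(finetuned_archive, base_archive, out_archive):
    """Repack a finetuned AllenNLP-style model.tar.gz so its config.json no
    longer uses a "from_archive" model block pointing at the base archive."""
    workdir = tempfile.mkdtemp()
    try:
        with tarfile.open(finetuned_archive) as tar:
            tar.extractall(workdir)
        cfg_path = os.path.join(workdir, "config.json")
        with open(cfg_path) as f:
            cfg = json.load(f)
        if cfg.get("model", {}).get("type") == "from_archive":
            # Substitute the real model block from the base archive's config
            with tarfile.open(base_archive) as tar:
                base_cfg = json.load(tar.extractfile("config.json"))
            cfg["model"] = base_cfg["model"]
        with open(cfg_path, "w") as f:
            json.dump(cfg, f, indent=2)
        # Re-archive everything that was in the finetuned model.tar.gz
        with tarfile.open(out_archive, "w:gz") as tar:
            for name in os.listdir(workdir):
                tar.add(os.path.join(workdir, name), arcname=name)
    finally:
        shutil.rmtree(workdir)
```
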
from smbop.
Hey, thanks for the solution. It works now.
Although I am wondering: with the above modification, is it retraining the model or fine-tuning it? If it's retraining, then the information learnt from the Spider dataset might be lost.
Any insights on this?
from smbop.
I'd say that if you take a pretrained model and then train it on a completely different dataset, it is a retraining but with a very very good weights initialization. ;D Still, the weights will obviously shift towards the new data, just as expected. I would assume you have to add your data to the Spider data instead of replacing it if you want to keep the performance, but no guarantee that the model will generalize well to that union.
In my case, I started retraining with 74.6% (according to the evaluation script) overall exact-matching accuracy on Spider and 0% on my data, and ended with 67.1% and 63.6% accuracy, respectively. Now, I don't really care about model performance on Spider, since I want to use it on my data, not on Spider. ;)
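If you do want to train on the union of Spider and your own data, as suggested above, a sketch of the merge (hypothetical paths; both files must already be in Spider's train-JSON format):

```python
import json

def merge_datasets(spider_path, custom_path, out_path, custom_repeat=1):
    """Concatenate Spider training examples with custom examples.

    Oversampling the custom set (custom_repeat > 1) can help when it is
    much smaller than Spider's ~7k training examples."""
    with open(spider_path) as f:
        spider = json.load(f)
    with open(custom_path) as f:
        custom = json.load(f)
    merged = spider + custom * custom_repeat
    with open(out_path, "w") as f:
        json.dump(merged, f, indent=2)
    return len(merged)
```

Remember that the databases and tables.json entries for the custom examples also have to be present alongside Spider's.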
from smbop.
Hi! Thanks for that answer, really helpful.
But it seems I ran into another issue. The inference pipeline of SmBop hosted on Google Colab runs out of RAM (25 GB) as it keeps connecting and loading trees for my own finetuned model, whereas the pretrained model provided by the authors loads and connects the trees only once and works fine on Google Colab.
Any insights on this?
from smbop.
Hey, can you provide an MWE?
Specifically, what is the error message?
from smbop.
I don't get an error message. It just keeps printing "before load_trees, before connecting" until it runs out of RAM (25 GB of GPU RAM) with my finetuned model, whereas your pretrained model only loads a couple of times and runs fine.
from smbop.
Can you share the code?
from smbop.
I think there was a problem with Google Colab's GPU. It's working fine now. Thanks for all of your help!
from smbop.
@varunpandya2004 Hi, can you please tell me how you prepared your custom dataset, and share the script or Colab notebook?
from smbop.
I have my own dataset. I can generate its schema in JSON format using SQLite DB's export option. I want to convert natural-language questions to SQL using SmBop, for querying my own database. Is it possible to do this using the pretrained SmBop model on Google Colab?
@varunpandya2004 Can you tell me how you created the schema? When I try to use the export option in SQLiteStudio, there is no option to export the schema; there are only options for tables, database, and query. Please help.
Thanks
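If it helps, a Spider-style schema entry can be built directly from a SQLite file with PRAGMA statements, instead of relying on an export option. A rough sketch (not the official Spider preprocessing; the type mapping is simplified and implicit foreign-key targets are skipped):

```python
import sqlite3

def sqlite_to_spider_schema(db_path, db_id):
    """Build one Spider-style tables.json entry from a SQLite database."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    )
    tables = [r[0] for r in cur.fetchall()]

    column_names = [[-1, "*"]]   # Spider reserves the first column for "*"
    column_types = ["text"]
    primary_keys, foreign_keys = [], []
    col_index = {}               # (table, column) -> global column index

    for t_idx, table in enumerate(tables):
        cur.execute(f'PRAGMA table_info("{table}")')
        for _cid, name, ctype, _notnull, _dflt, pk in cur.fetchall():
            col_index[(table, name.lower())] = len(column_names)
            if pk:
                primary_keys.append(len(column_names))
            column_names.append([t_idx, name])
            # Crude type mapping: numeric affinities -> "number", else "text"
            ctype = (ctype or "").upper()
            column_types.append(
                "number" if "INT" in ctype or "REAL" in ctype else "text"
            )

    for table in tables:
        cur.execute(f'PRAGMA foreign_key_list("{table}")')
        for row in cur.fetchall():
            # row: (id, seq, ref_table, from_col, to_col, ...)
            _, _, ref_table, from_col, to_col = row[:5]
            if to_col is None:
                continue  # implicit reference to the PK; resolve manually
            foreign_keys.append([
                col_index[(table, from_col.lower())],
                col_index[(ref_table, to_col.lower())],
            ])
    conn.close()
    return {
        "db_id": db_id,
        "table_names": tables,
        "table_names_original": tables,
        "column_names": column_names,
        "column_names_original": column_names,
        "column_types": column_types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }
```

Spider's tables.json is a list of such entries, and the "human-readable" name fields (`table_names`, `column_names`) are normally lower-cased with underscores split into words, which this sketch does not do.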
from smbop.
I'm facing the same issue. How did you solve it?
from smbop.