
hamelsmu / code_search


Code For Medium Article: "How To Create Natural Language Semantic Search for Arbitrary Objects With Deep Learning"

Home Page: https://medium.com/@hamelhusain/semantic-code-search-3cd6d244a39c

License: MIT License

Languages: Jupyter Notebook 95.45%, Python 4.48%, Shell 0.01%, HTML 0.07%
code-search data-science deep-learning fastai keras machine-learning machine-learning-on-source-code ml-on-code natural-language-processing nlp python pytorch search search-algorithm searching-algorithms semantic-search semantic-search-engine tensorflow tutorial

code_search's People

Contributors

andrewnc, cheeseblubber, dependabot[bot], hamelsmu, zhaoyicc


code_search's Issues

OverflowError in 1 Preprocess Data

I'm trying to run the notebooks in my own python 3.6 conda environment.

I'm running into a problem when running this code:
pairs = flattenlist(apply_parallel(get_function_docstring_pairs_list, df.content.tolist(), cpu_cores=4))

I see the following traceback:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-16-3f34f247210c>", line 40, in get_function_docstring_pairs_list
    return [get_function_docstring_pairs(b) for b in blob_list]
  File "<ipython-input-16-3f34f247210c>", line 40, in <listcomp>
    return [get_function_docstring_pairs(b) for b in blob_list]
  File "<ipython-input-16-3f34f247210c>", line 23, in get_function_docstring_pairs
    source = astor.to_source(f)
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 52, in to_source
    generator.result.append('\n')
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/node_util.py", line 143, in visit
    return visitor(node)
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 320, in visit_FunctionDef
    if not self.indentation:
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 218, in body
    self.indentation -= 1
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 168, in write
    elif callable(item):
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/node_util.py", line 143, in visit
    return visitor(node)
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 472, in visit_Return
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 206, in conditional_write
    # Inform the caller that we wrote
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 168, in write
    elif callable(item):
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/node_util.py", line 143, in visit
    return visitor(node)
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 659, in visit_Tuple
    with self.delimit(node, op) as delimiters:
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 268, in comma_list
    self.write(',' if trailing else '')
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 168, in write
    elif callable(item):
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/node_util.py", line 143, in visit
    return visitor(node)
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 627, in visit_Num
    delimiters.discard = delimiters.pp != pow_lhs
  File "/home/brian/.conda/envs/tmp/lib/python3.6/site-packages/astor/code_gen.py", line 619, in part
    self.write(s)
OverflowError: int too large to convert to float
"""

The above exception was the direct cause of the following exception:

OverflowError                             Traceback (most recent call last)
/media/HDD/brian/code_search/notebooks/general_utils.py in apply_parallel(func, data, cpu_cores)
     75         pool = Pool(cpu_cores)
---> 76         transformed_data = pool.map(func, chunked(data, chunk_size), chunksize=1)
     77     finally:

~/.conda/envs/tmp/lib/python3.6/site-packages/multiprocess/pool.py in map(self, func, iterable, chunksize)
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 

~/.conda/envs/tmp/lib/python3.6/site-packages/multiprocess/pool.py in get(self, timeout)
    643         else:
--> 644             raise self._value
    645 

OverflowError: int too large to convert to float

I'm not sure what's going on here. Any help is appreciated.
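One possible workaround (an untested sketch, reusing the helpers defined in the notebook) is to make the per-blob helper tolerant of source files that astor cannot round-trip, such as files containing enormous integer literals, so that one bad file does not kill the whole parallel map:

# hypothetical wrapper around the notebook's get_function_docstring_pairs;
# drops sources that astor fails to regenerate instead of raising
def get_function_docstring_pairs_safe(blob):
    try:
        return get_function_docstring_pairs(blob)
    except (OverflowError, SyntaxError):
        return []

def get_function_docstring_pairs_list_safe(blob_list):
    return [get_function_docstring_pairs_safe(b) for b in blob_list]

pairs = flattenlist(apply_parallel(get_function_docstring_pairs_list_safe,
                                   df.content.tolist(), cpu_cores=4))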

Please Help

Hello,
I am on step 5 of the tutorial using a Jupyter notebook, and have had an interesting time with the dependencies. I am trying to run this cell in step 5:

lang_model = torch.load('./data/lang_model/lang_model_cpu_v2.torch',
                        map_location=lambda storage, loc: storage)

vocab = load_lm_vocab('./data/lang_model/vocab_v2.cls')
q2emb = Query2Emb(lang_model=lang_model.cpu(), vocab=vocab)

search_index = nmslib.init(method='hnsw', space='cosinesimil')
search_index.loadIndex('./data/search/search_index.nmslib')

however, an error is produced...

    135         # Note: be v. careful before removing this, as 3rd party device types
    136         # likely rely on this behavior to properly .to() modules like LSTM.
--> 137         self._flat_weights = [getattr(self, weight) for weight in self._flat_weights_names]
    138
    139         # Flattens params (on CUDA)

AttributeError: 'LSTM' object has no attribute '_flat_weights_names'

which originates in torch\nn\modules\module.py.

The saved model was built on an old version of PyTorch, right? There are also other libraries that I am having trouble getting hold of. I am pretty sure this is a compatibility error; is there any way you could provide the torch version you used for this tutorial, as it no longer seems to be available to download? My OS is Windows x64.
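One thing worth checking first (a guess based on the attribute name, not a confirmed diagnosis): _flat_weights_names was added to torch's RNN modules in later releases, so unpickling a checkpoint saved under an old torch into a much newer one can produce exactly this AttributeError. Printing the installed version is a quick first step:

import torch
# if this is much newer than the tutorial-era torch, a version pin
# (or re-saving the checkpoint under the new version) is the likely fix
print(torch.__version__)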

Consider refactoring f-strings and reconsidering the Python version

From @hohsiangwu, who makes some good points worth considering:

"""
I am not a big fan of f-strings yet. I know they are easier and clearer, but the downside is that they are only supported on Python 3.6 and onwards.

I ran into several problems while dealing with the Python and PyTorch dependencies of your libraries, where I might be on Python 3.5 only and, all of a sudden, none of your libraries work. I would highly recommend that we use more common patterns in a public repository.

I could be overthinking this, so if you decide to keep using f-strings, I will modify my part to follow that convention. I think nothing is more confusing than a repository with mixed patterns.
"""

The Docker container for this tutorial runs Python 3.6.3, but maybe that is not the best choice for our readers? I am posting this issue so I don't forget about it and can come back to it later!
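For anyone weighing the trade-off, the two styles produce identical output; only the minimum Python version differs. A minimal illustration (the names are arbitrary):

# f-strings require Python 3.6+; str.format() also runs on 3.5
name, step = 'code_search', 5
print(f'{name}: step {step}')             # Python 3.6+ only
print('{}: step {}'.format(name, step))   # Python 3.5 compatible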

Issue in fit method in fastai

When I run the fit function on the language model, it shows the error below:

Epoch:   0%|          | 0/7 [00:00<?, ?it/s]
  0%|          | 0/3280 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "E:/Mindtree_IDP/neural_language_model.py", line 41, in <module>
    wd=1e-6)
  File "E:\Mindtree_IDP\lang_model_utils.py", line 243, in train_lang_model
    best_save_name = 'language_model'
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\learner.py", line 287, in fit
    return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\learner.py", line 234, in fit_gen
    swa_eval_freq=swa_eval_freq, **kwargs)
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\model.py", line 132, in fit
    loss = model_stepper.step(V(x),V(y), epoch)
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\model.py", line 50, in step
    output = self.m(*xs)
  File "c:\users\gk\appdata\local\continuum\anaconda3\envs\mindtree\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "c:\users\gk\appdata\local\continuum\anaconda3\envs\mindtree\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "c:\users\gk\appdata\local\continuum\anaconda3\envs\mindtree\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\lm_rnn.py", line 97, in forward
    raw_output, new_h = rnn(raw_output, self.hidden[l])
  File "c:\users\gk\appdata\local\continuum\anaconda3\envs\mindtree\lib\site-packages\torch\nn\modules\module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\GK\AppData\Roaming\Python\Python36\site-packages\fastai-0.7.0-py3.6.egg\fastai\rnn_reg.py", line 122, in forward
    return self.module.forward(*args)
  File "c:\users\gk\appdata\local\continuum\anaconda3\envs\mindtree\lib\site-packages\torch\nn\modules\rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: shape '[1000000, 1]' is invalid for input of size 2000

Facing error while executing -> from fastai.text import *

I am facing the error below (also attached: errofile.txt):

from fastai import text
Traceback (most recent call last):

  File "", line 1, in <module>
    from fastai import text

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\text\__init__.py", line 1, in <module>
    from .. import basics

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\basics.py", line 1, in <module>
    from .basic_train import *

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\basic_train.py", line 2, in <module>
    from .torch_core import *

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\torch_core.py", line 2, in <module>
    from .imports.torch import *

ModuleNotFoundError: No module named 'fastai.imports.torch'; 'fastai.imports' is not a package

from fastai.text import *
Traceback (most recent call last):

  File "", line 1, in <module>
    from fastai.text import *

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\text\__init__.py", line 1, in <module>
    from .. import basics

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\basics.py", line 1, in <module>
    from .basic_train import *

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\basic_train.py", line 2, in <module>
    from .torch_core import *

  File "C:\Users\NJ077229\AppData\Local\Continuum\anaconda3\lib\site-packages\fastai\torch_core.py", line 2, in <module>
    from .imports.torch import *

ModuleNotFoundError: No module named 'fastai.imports.torch'; 'fastai.imports' is not a package

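A broken or half-upgraded fastai install is a plausible cause here (an assumption, not a confirmed diagnosis): the notebooks target the fastai 0.7 API, and a 1.x install whose files were partially overwritten can leave fastai.imports in this state. Checking which distribution is actually installed is a quick first step:

import pkg_resources
# the notebooks in this repo were written against fastai 0.7.x
print(pkg_resources.get_distribution('fastai').version)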

fastai backward compatibility issue

The image hamelsmu/ml-gpu installs the latest version of fastai, which makes some parts of the code crash. Solved by uninstalling it and running: pip install fastai==0.7.0; pip install torchtext==0.2.3. Alternatively, you could use the requirements.txt specification also provided in the repo.

Q4.ipynb: file code_summary_seq2seq_model.h5 not found

seq2seq_Model = load_model(str(seq2seq_path)+'/code_summary_seq2seq_model.h5')

fails with 'no such file or directory'.

In my data/seq2seq folder, I only have these 4 files: py_code_proc_v2.dpkl, py_comment_proc_v2.dpkl, py_t_code_vecs_v2.npy, py_t_comment_vecs_v2.npy.

There is no code_summary_seq2seq_model.h5 file.
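For context, that .h5 file is not shipped with the repo; it is presumably written out when the seq2seq summarizer is trained earlier in the tutorial. A minimal sketch, assuming the trained Keras model from that training notebook is in scope as seq2seq_Model:

# saving the trained model produces the file that Q4.ipynb later loads
seq2seq_Model.save(str(seq2seq_path) + '/code_summary_seq2seq_model.h5')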

Dataset Not found

Hi @hamelsmu, I really like your work on this code search, but I couldn't find the dataset for the development of this project, which you mentioned in notebook 1. Is it possible to share the 10 CSV files that you mentioned getting?
Thank you

Part 3 - Training the language model

Hi @hamelsmu

Training the language model (train_lang_model) seems to take 13 hours, and GPU utilization is at 0%. Why does this step not utilize the GPU? Is this intentional, or is there a configuration that needs to be set to enable GPU utilization?

I verified that the environment variable is set to a GPU device.
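A quick sanity check (a generic sketch; it assumes PyTorch is the framework doing the training, which is the case for the fastai language model):

import torch
# if this prints False, training silently falls back to CPU, and a
# 13-hour run is plausible regardless of the environment variable
print(torch.cuda.is_available())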

ModuleNotFoundError: No module named 'tensorflow'

Hello,
how are you?

I'm following notebook 2 using the hamelsmu/ml-cpu docker container, but I encounter the following error when trying to load the preprocessed data (in the "Read Text From File" part):

File "/opt/project/utils/seq2seq.py", line 2, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

Any suggestions for a solution?

Thank you very much, and congratulations on the excellent tutorial.
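A quick way to confirm whether the container's Python can see TensorFlow at all (a generic check, not specific to this image):

import importlib.util
# True means the module is importable; False means it is simply
# not installed in this environment and needs installing
print(importlib.util.find_spec('tensorflow') is not None)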

Can't load index?

Hello,

I am trying to load the index using the code provided in notebook 5:

search_index = nmslib.init(method='hnsw', space='cosinesimil')
search_index.loadIndex('./data/search/search_index.nmslib')

But, the following error happens:

Check failed: data_level0_memory_
Traceback (most recent call last):
search_index.loadIndex("./data/search/search_index.nmslib")
RuntimeError: Check failed: it's either a bug or inconsistent data!

My computer has only 8 GB of main memory. So, did this happen because the index is over 8 GB and could not be loaded into memory?

Thank you for any help.
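Checking the index size on disk is one way to test the memory hypothesis (a rough check; the in-memory footprint of an HNSW index is roughly at least its on-disk size):

import os
# compare this against available RAM; an index larger than memory
# plausibly fails inside nmslib with an internal check error like the above
print(os.path.getsize('./data/search/search_index.nmslib') / 1e9, 'GB')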

Could not vectorize functions without docstrings

Hello,

I am running notebook 4, the part where functions without docstrings are turned into embeddings. Specifically, I am having trouble executing the following code snippet:

encinp = enc_pp.transform_parallel(no_docstring_funcs)
np.save(code2emb_path/'nodoc_encinp.npy', encinp)

which returns the following error:

WARNING:root:...tokenizing data
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py", line 88, in process_text
    return [tokenizer(cleaner(doc)) for doc in text]
  File "/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py", line 88, in <listcomp>
    return [tokenizer(cleaner(doc)) for doc in text]
  File "/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py", line 55, in textacy_cleaner
    no_accents=True)
  File "/opt/conda/lib/python3.6/site-packages/textacy/preprocess.py", line 256, in preprocess_text
    text = replace_urls(text)
  File "/opt/conda/lib/python3.6/site-packages/textacy/preprocess.py", line 101, in replace_urls
    return constants.RE_URL.sub(
AttributeError: module 'textacy.constants' has no attribute 'RE_URL'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py in apply_parallel(func, data, cpu_cores)
     71         pool = Pool(cpu_cores)
---> 72         transformed_data = pool.map(func, chunked(data, chunk_size), chunksize=1)
     73     finally:

/opt/conda/lib/python3.6/site-packages/multiprocess/pool.py in map(self, func, iterable, chunksize)
    265         '''
--> 266         return self._map_async(func, iterable, mapstar, chunksize).get()
    267 

/opt/conda/lib/python3.6/site-packages/multiprocess/pool.py in get(self, timeout)
    643         else:
--> 644             raise self._value
    645 

AttributeError: module 'textacy.constants' has no attribute 'RE_URL'

During handling of the above exception, another exception occurred:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-19-8e34745b4d23> in <module>
----> 1 encinp = enc_pp.transform_parallel(test_codes[:5])
      2 np.save(code2emb_path/'test_codes_encinp.npy', encinp)

/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py in transform_parallel(self, data)
    375         """
    376         logging.warning(f'...tokenizing data')
--> 377         tokenized_data = self.parallel_process_text(data)
    378         logging.warning(f'...indexing data')
    379         indexed_data = self.indexer.tokenized_texts_to_sequences(tokenized_data)

/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py in parallel_process_text(self, data)
    231                                                 end_tok=self.end_tok)
    232         n_cores = self.num_cores
--> 233         return flattenlist(apply_parallel(process_text, data, n_cores))
    234 
    235     def generate_doc_length_stats(self):

/opt/conda/lib/python3.6/site-packages/ktext/preprocess.py in apply_parallel(func, data, cpu_cores)
     74         pool.close()
     75         pool.join()
---> 76         return transformed_data
     77 
     78 

UnboundLocalError: local variable 'transformed_data' referenced before assignment

Thanks in advance.
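Note that the root error is the AttributeError inside the worker: ktext's textacy_cleaner expects an older textacy API in which textacy.constants.RE_URL still exists; the final UnboundLocalError is only a secondary symptom of the pool failing before transformed_data is assigned. A quick compatibility check (assuming textacy is importable at all):

import textacy
from textacy import constants
# False here means the installed textacy is too new for this ktext release,
# and pinning an older textacy is the likely fix
print(textacy.__version__, hasattr(constants, 'RE_URL'))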

Pre-Trained Model for Code Search

Is there a pre-trained model for the code search task? I couldn't find one here or in the CodeSearchNet repository. A pre-trained model for either Part 3 or Part 4 would also help.

Why did you fit the model several times?

I have been reading the tutorial; however, in notebook 3 I noticed that you fitted the model several times:

In [18]:

if not use_cache:
    fastai_learner.fit(1e-3, 3, wds=1e-6, cycle_len=2)

HBox(children=(IntProgress(value=0, description='Epoch', max=6), HTML(value='')))

epoch      trn_loss   val_loss                                
    0      3.954703   3.989164  
    1      3.907728   3.975681                                
    2      3.936994   3.976287                                
    3      3.871557   3.96412                                 
    4      3.927649   3.969976                                
    5      3.873011   3.956639                                

Then you use different parameters here:

In [19]:

if not use_cache:
    fastai_learner.fit(1e-3, 2, wds=1e-6, cycle_len=3, cycle_mult=2)

HBox(children=(IntProgress(value=0, description='Epoch', max=9), HTML(value='')))

epoch      trn_loss   val_loss                                
    0      3.925804   3.971093  
    1      3.857519   3.951696                                
    2      3.840948   3.946251                                
    3      3.907309   3.970567                                
    4      3.879899   3.956719                                
    5      3.840587   3.947983                                
    6      3.823401   3.935096                                
    7      3.838912   3.929217                                
    8      3.778818   3.930717                                

And here you use cycle_mult=10:

In [20]:

if not use_cache:
    fastai_learner.fit(1e-3, 2, wds=1e-6, cycle_len=3, cycle_mult=10)

HBox(children=(IntProgress(value=0, description='Epoch', max=33), HTML(value='')))

epoch      trn_loss   val_loss                                
    0      3.86375    3.953147  
    1      3.851326   3.930299                                
    2      3.773453   3.927069                                
    3      3.879102   3.957266                                
    4      3.858202   3.954743                                
    5      3.852824   3.951508                                
    6      3.837561   3.9509                                  
    7      3.818845   3.947756                                
    8      3.809637   3.944036                                
    9      3.835555   3.942263                                
    10     3.824583   3.935868                                
    11     3.827287   3.932043                                
    12     3.817058   3.927741                                
    13     3.778389   3.927357                                
    14     3.779933   3.925774                                
    15     3.780848   3.918761                                
    16     3.746735   3.920191                                
    17     3.743517   3.915674                                
    18     3.752455   3.911835                                
    19     3.758213   3.908067                                
    20     3.768209   3.904584                                
    21     3.711149   3.904635                                
    22     3.770484   3.898746                                
    23     3.767993   3.897296                                
    24     3.707685   3.898568                                
    25     3.694116   3.898346                                
    26     3.749094   3.89368                                 
    27     3.727432   3.894122                                
    28     3.682065   3.89575                                 
    29     3.712119   3.894845                                
    30     3.721573   3.894399                                
    31     3.668023   3.89601                                 
    32     3.710865   3.896029                                

Why do you fit the model like this? Is it just for teaching purposes, or is that the number of times the model must be fitted?
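For what it's worth, the epoch totals in the pasted progress bars (6, 9, 33) are exactly what fastai 0.7's SGDR restart schedule predicts: fit(lr, n_cycle, cycle_len=..., cycle_mult=...) runs n_cycle restarts, each cycle_mult times longer than the previous one.

def total_epochs(n_cycle, cycle_len, cycle_mult=1):
    # sum of cycle lengths: cycle_len * cycle_mult**i for each restart
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycle))

print(total_epochs(3, 2))      # 6,  matches In [18]
print(total_epochs(2, 3, 2))   # 9,  matches In [19]
print(total_epochs(2, 3, 10))  # 33, matches In [20]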

Getting error with apply_parallel in 1-Preprocessing

I get UnboundLocalError: local variable 'transformed_data' referenced before assignment when running the line of code below. Please help out.

%%time
pairs = flattenlist(apply_parallel(get_function_docstring_pairs_list, df.content.tolist(), cpu_cores=32))
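One debugging approach (a sketch using the same helpers the notebook defines): run the function serially on a small sample first, so the real worker exception surfaces instead of the UnboundLocalError that apply_parallel raises when the pool fails before transformed_data is assigned:

# serial run over a small slice; any exception raised here is the true root cause
sample = df.content.tolist()[:100]
pairs = flattenlist([get_function_docstring_pairs_list(sample)])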

Questions on how to use GAN in your article

Hi, Hamel. I'm an undergraduate student at a Chinese university, and I'm currently doing a project (actually my graduation project) on code generation. I have read your article on semantic search on towardsdatascience.com, which inspired me a lot. I'm wondering about this paragraph in your article: "It should be noted that training a seq2seq model to summarize code is not the only technique you can use to build a feature extractor for code. For example, you could also train a GAN and use the discriminator as a feature extractor."

I am confused about how to do this with a GAN, because there seem to be many difficulties. Can you give me some specific advice or reference papers on how to do code search with a GAN?

I have also read some other papers, such as DeepCodeSearch (published at ICSE '18) by Xiaodong Gu of HKUST. Their work is mainly on joint embedding and got good results on Java code search. Their approach seems a little different from yours but also has good experimental results.

What's more, I want to reproduce your work in PyTorch, and I really hope I can get good results.

Sincerely

Regarding issue in parameter count

Why do Keras and PyTorch always show different parameter counts? I built a model in Torch and compared it with Keras, and Torch reports fewer learnable parameters than Keras. Why?
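A concrete way to compare (a generic sketch; the layer here is a hypothetical example) is to count trainable parameters explicitly on the PyTorch side and set the result against the total from Keras's model.summary():

import torch.nn as nn

model = nn.LSTM(input_size=100, hidden_size=128)  # hypothetical example model
# numel() counts elements per tensor; requires_grad filters out frozen weights
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)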

NameError: name 'LanguageModelData' is not defined

@hamelsmu

Hi, I am trying to run the set of notebooks provided here. I was able to preprocess the data successfully, but when I try to run the third notebook, "3 - Train Language Model Using FastAI", I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-a18631c71446> in <module>
     12                                                   cycle_mult=2,
     13                                                   bs = 32,
---> 14                                                   wd = 1e-6)
     15 
     16 elif use_cache:

D:\code_search-github\lang_model_utils.py in train_lang_model(model_path, trn_indexed, val_indexed, vocab_size, lr, n_cycle, cycle_len, cycle_mult, em_sz, nh, nl, bptt, wd, bs)
    191 
    192     # create lang model data
--> 193     md = LanguageModelData(mpath, 1, vocab_size, trn_dl, val_dl, bs=bs, bptt=bptt)
    194 
    195     # build learner

NameError: name 'LanguageModelData' is not defined

In one of the issues in the FastAI repository, I read that 'LanguageModelData' was replaced by 'TextLMDataBunch'. I tried using that but didn't have any success.

How should I proceed so that I can run this notebook properly?

Python-Version - 3.6.8
Cuda-Version - 10.0
FastAI-Version - 1.0
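Given the listed FastAI version, the 1.0 API is the likely culprit: per the "fastai backward compatibility issue" above, this repo targets fastai 0.7, whose text module still provides LanguageModelData (lang_model_utils.py relies on it via a star import). A minimal check, assuming fastai 0.7.0 is installed as that issue suggests:

# succeeds on fastai 0.7.x; fails on fastai 1.x, where the class was removed
from fastai.text import LanguageModelData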
