kaggle-quora-dup's People

Contributors

aerdem4

kaggle-quora-dup's Issues

Cannot properly generate kcore_dict in non-NLP features

Hi, I downloaded your code and tried to play with it. When I ran the script non_nlp_feature_extraction.py, I got the following error. I am not familiar with the graph library you used here. Could you please take a look and tell me what's wrong with the code?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-df736cd8be48> in <module>()
     10 print("Calculating kcore features...")
     11 all_df = pd.concat([train_df, test_df])
---> 12 kcore_dict = get_kcore_dict(all_df)
     13 train_df = get_kcore_features(train_df, kcore_dict)
     14 test_df = get_kcore_features(test_df, kcore_dict)

<ipython-input-5-4e9e86a38b2a> in get_kcore_dict(df)
     25     print(type(g.nodes()))
     26     print(g.nodes())
---> 27     df_output = pd.DataFrame(data=g.nodes(), columns=["qid"])
     28     df_output["kcore"] = 0
     29     for k in range(2, NB_CORES + 1):

D:\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    352                                          copy=False)
    353             else:
--> 354                 raise ValueError('DataFrame constructor not properly called!')
    355 
    356         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

I think the problem is with this function.

def get_kcore_dict(df):
    g = nx.Graph()
    g.add_nodes_from(df.qid1)
    edges = list(df[["qid1", "qid2"]].to_records(index=False))
    g.add_edges_from(edges)
    g.remove_edges_from(g.selfloop_edges())
    print(type(g.nodes()))
    print(g.nodes())
    df_output = pd.DataFrame(data=g.nodes(), columns=["qid"])  # <==== THIS LINE
    df_output["kcore"] = 0
    for k in range(2, NB_CORES + 1):
        ck = nx.k_core(g, k=k).nodes()
        print("kcore", k)
        df_output.ix[df_output.qid.isin(ck), "kcore"] = k

    return df_output.to_dict()["kcore"]
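A likely cause, offered as an assumption rather than a confirmed diagnosis: this function was written for networkx 1.x, where `g.nodes()` returned a plain list. From networkx 2.0 on it returns a `NodeView`, which older pandas versions reject with exactly this `ValueError`. Newer releases also removed `Graph.selfloop_edges()` (networkx 2.4, use `nx.selfloop_edges(g)`) and the `.ix` indexer (pandas 1.0, use `.loc`), so the same function would fail at those lines next. The sketch below patches all three. `nb_cores` stands in for the script's `NB_CORES` constant, and its default of 10 is a hypothetical placeholder, not the repo's value.

```python
import networkx as nx
import pandas as pd


def get_kcore_dict(df, nb_cores=10):
    """Map each qid to the largest k in 2..nb_cores whose k-core contains it, else 0.

    nb_cores replaces the script's NB_CORES global; 10 is only a placeholder.
    """
    g = nx.Graph()
    g.add_nodes_from(df["qid1"])
    # Plain (qid1, qid2) tuples instead of numpy records.
    g.add_edges_from(zip(df["qid1"], df["qid2"]))
    # k-core computation refuses graphs with self-loops; materialize the
    # generator into a list before removing edges from the graph.
    g.remove_edges_from(list(nx.selfloop_edges(g)))

    # networkx >= 2.0 returns a NodeView here, so convert it to a list for pandas.
    df_output = pd.DataFrame({"qid": list(g.nodes())})
    df_output["kcore"] = 0
    for k in range(2, nb_cores + 1):
        ck = set(nx.k_core(g, k=k).nodes())
        # .loc replaces the removed .ix indexer.
        df_output.loc[df_output["qid"].isin(ck), "kcore"] = k

    # Keyed by qid. Note that the original df_output.to_dict()["kcore"] is keyed
    # by row position instead, so check which one get_kcore_features expects.
    return dict(zip(df_output["qid"], df_output["kcore"]))
```

If the loop is slow on the full train + test graph, `nx.core_number(g)` returns every node's core number in one pass; taking `min(c, nb_cores)` and mapping values below 2 to 0 should give the same dictionary.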

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>

I am trying to run model.py, but I am getting the following error:

D:\imad_web\kaggle-quora-dup_24_position>python model.py
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Creating the vocabulary of words occurred more than 100
Traceback (most recent call last):
  File "model.py", line 122, in <module>
    embeddings_index = get_embedding()
  File "model.py", line 55, in get_embedding
    for line in f:
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>

def get_embedding():
    embeddings_index = {}
    f = open(EMBEDDING_FILE)
    for line in f:  # line 55
        values = line.split()
        word = values[0]
        if len(values) == EMBEDDING_DIM + 1 and word in top_words:
            coefs = np.asarray(values[1:], dtype="float32")
            embeddings_index[word] = coefs
    f.close()
    return embeddings_index


vectorizer = CountVectorizer(lowercase=False, token_pattern=r"\S+", min_df=MIN_WORD_OCCURRENCE)
vectorizer.fit(all_questions)
top_words = set(vectorizer.vocabulary_.keys())
top_words.add(REPLACE_WORD)

embeddings_index = get_embedding()  # line 122
print("Words are not found in the embedding:", top_words - embeddings_index.keys())
top_words = embeddings_index.keys()
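The traceback shows Python decoding the file with cp1252, a Windows default code page, because `open(EMBEDDING_FILE)` specifies no encoding; byte 0x90 is undefined in cp1252, hence the error. A likely fix, assuming the embedding file is UTF-8 (the usual encoding for GloVe-style text files), is to pass `encoding="utf-8"` explicitly. In the sketch below, `EMBEDDING_FILE`, `EMBEDDING_DIM` and `top_words` become parameters instead of globals; that restructuring is for illustration only and is not how the repo's `get_embedding` is written.

```python
import numpy as np


def get_embedding(embedding_file, embedding_dim, top_words):
    """Load vectors for the words in top_words from a GloVe-style text file."""
    embeddings_index = {}
    # Without an explicit encoding, open() uses the locale's code page
    # (cp1252 on many Windows setups), which cannot decode some UTF-8 bytes.
    with open(embedding_file, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            if not values:
                continue  # skip blank lines
            word = values[0]
            # Keep only well-formed lines for words in the vocabulary.
            if len(values) == embedding_dim + 1 and word in top_words:
                embeddings_index[word] = np.asarray(values[1:], dtype="float32")
    return embeddings_index
```

With the repo's globals, the call would become `embeddings_index = get_embedding(EMBEDDING_FILE, EMBEDDING_DIM, top_words)`.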

What is your offline score?

Hello!
First of all, thank you very much for open-sourcing this.
I am very interested in your implementation, so I downloaded your code and ran it on my machine. My GPU is a TITAN Xp.
After running model.py with epochs=15, I got a validation loss of about 0.203 (training loss of about 0.17). These results don't seem very good!
I see that you ranked 23rd on Kaggle with an online score of 0.12988, so I would like to ask: what was your offline score? How can I use this code to reach the same validation loss as you?

Looking forward to your reply. Thanks again!
@aerdem4

Possible to include some trained weights?

Hi, I'm interested in playing around with this model. Instead of renting an AWS server to train it, would it be possible to include some trained weights so we can use the model without training it?

Thanks

Model reference

May I ask whether the model in your solution is based on any other reference, such as a paper? Thanks.
