conceptnet5's Introduction

ConceptNet

Overview

ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave unstated.

ConceptNet is a semantic network that represents things that computers should know about the world, especially for the purpose of understanding text written by people. Its "concepts" are represented using words and phrases of many different natural languages -- unlike similar projects, it's not limited to a single language such as English. It expresses over 13 million links between these concepts, and makes the whole data set available under a Creative Commons license.

Much of the current development of ConceptNet involves using it as an input for machine learning about the semantics of text. Its multilingual representation makes it particularly expressive, because the semantic overlaps and differences between languages are a useful signal for a learning system.

ConceptNet grew out of Open Mind Common Sense, an early project for crowd-sourced knowledge, and expanded to cover many different languages through a collaboration with groups around the world. ConceptNet is cited in many research papers, and its public API gets over 50,000 hits per day.

This Python package contains a toolset for building the ConceptNet 5 knowledge graph, possibly with your own custom data, and it serves the HTML interface and JSON Web API for it.

You don't need this package to simply access ConceptNet 5; see http://conceptnet.io for more information and a browsable Web interface with an API.

Further documentation is available on the ConceptNet wiki.

Licensing and attribution appear in LICENSE.txt and DATA-CREDITS.md.

Discussion groups

If you're interested in using ConceptNet, please join the conceptnet-users Google group, for questions and occasional announcements: http://groups.google.com/group/conceptnet-users?hl=en

For real-time discussion, ConceptNet also has a chat channel on Gitter: https://gitter.im/commonsense/conceptnet5

Installing and building ConceptNet

To be able to run all steps of the ConceptNet build process, you'll need a Unix command line (Ubuntu 16.04 works great), Python 3.5 or later, 30 GB of RAM, and some other dependencies. See the build process on our wiki for instructions.

You may not need to build ConceptNet yourself! Try the Web API first.
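
For a quick look at the Web API before building anything, here is a minimal sketch. It assumes the public endpoint is http://api.conceptnet.io and that a response carries an "edges" list whose entries have "start", "rel", and "end" objects with a "label" field. The helper names are hypothetical and are not part of the conceptnet5 package.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "http://api.conceptnet.io"  # assumed public endpoint


def concept_url(language, text):
    """Build the lookup URL for a term, e.g. ('en', 'ground beef')."""
    term = text.strip().lower().replace(" ", "_")
    return "{}/c/{}/{}".format(API_ROOT, language, quote(term))


def summarize_edges(response):
    """Reduce a decoded JSON response to (start, relation, end) labels."""
    return [
        (edge["start"]["label"], edge["rel"]["label"], edge["end"]["label"])
        for edge in response.get("edges", [])
    ]


def fetch_concept(language, text):
    """Fetch and decode one page of edges (performs a network request)."""
    with urlopen(concept_url(language, text)) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling summarize_edges(fetch_concept("en", "example")) would then list the edges on the first page of results, provided the response has the shape assumed above.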

Testing

Run pytest to test the ConceptNet libraries and a small version of the build process.

Run pytest --quick to re-run the tests more quickly, with the assumption that the small test database has already been built.

Run pytest --fulldb to run additional tests on the fully built ConceptNet database.

conceptnet5's People

Contributors

amirouche, dant86, fcrimins, gsittyz, jlowryduda, joshua-chin, juliusvonkohout, jvarley, k-oizumi, luminoso-beaudoin, pdworzynski, rspeer, sheyvaert, vihari, waruts, ylkuo

conceptnet5's Issues

Database export scripts

Exporting the database from MongoDB is not that hard; the "mongoexport" command does most of the work, and then we just need to zip the heck out of it (using bzip2 or xz) and put it in our downloads directory.

We've promised a CC-By version, which means we need to:

  • Determine the set of nodes that are CC-By-SA licensed (without using up tons of memory)
  • Remove those nodes and all edges pointing to them
  • Put the results in a new zip archive
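
The three steps above can be sketched as a two-pass streaming filter. This is only an illustration: the record layout (dicts with "uri", "license", "start", and "end" keys), the "cc:by-sa" tag, and all function names are assumptions, not the actual MongoDB schema. Only the set of share-alike URIs is held in memory; edges are streamed. The sketch also drops edges that start at a removed node, which goes slightly beyond "edges pointing to them", so that no dangling edges remain.

```python
import bz2
import json

SHARE_ALIKE = "cc:by-sa"  # hypothetical license tag


def share_alike_uris(nodes):
    """Pass 1: collect the URIs of nodes under the share-alike license."""
    return {node["uri"] for node in nodes if node.get("license") == SHARE_ALIKE}


def attribution_only_edges(edges, excluded):
    """Pass 2: lazily yield edges that touch no excluded node."""
    for edge in edges:
        if edge["start"] not in excluded and edge["end"] not in excluded:
            yield edge


def write_archive(records, path):
    """Write records as bzip2-compressed JSON lines; return the count."""
    count = 0
    with bz2.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```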

Not able to mount data to conceptnet_data in docker container

I followed all the steps for the Docker container, but all I get is INTERNAL SERVER ERROR. Even when I tried the bash shell as shown in #57, I was unable to see the mapped container folder 'conceptnet_data'.

Docker Info:

Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64
Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:37:01 2016
 OS/Arch:      linux/amd64

Command I ran to start the container:
sudo docker run --privileged -p 127.0.0.1:10054:10054 --sig-proxy=false rspeer/conceptnet-web:5.4 -v /large-disk/conceptnet5.4/data:/conceptnet_data
Please help me out!

Differences between Docker images

Hello,
It may be obvious to others, but it seems a little unclear regarding the differences in the two Docker images - one seems to do everything and the other does the web.

Does that mean if I get the everything version I can easily switch it to also show the web-front end?

A little more colour would help out here (since I'm hesitant to "just have a go" given the resources needed 😄 )

Would be great if you could spell out a little more exactly what the differences are.

Many thanks,
Neil

use C# and F#?

Good afternoon. Is it possible to use (or integrate with) C# or F#, and how should this be done (jointly with you, or is it purely our own business)? PS: apart from the language you specify, you seem to prefer Python.
I should say in advance that we are a small company and, alas, not one of our programmers who once worked with Python wants to touch (study) it again. Our main language is C#, and we are starting to adopt F#.

Thank you in advance for your help

Refactor the main conceptnet5 package

The modules directly under "conceptnet5/", such as conceptnet5.nodes and conceptnet5.edges, have become a mess. The functions were written in various styles by several different people. Documentation refers to incorrect URIs, or details of ConceptNet 5.0 that don't exist anymore.

The tests have not been updated, and the data flow for building node and edge URIs is hard to follow and hard to use correctly in conceptnet5.builders (which has been updated more recently, for version 5.2).

I'm posting this issue here to indicate that I'm working on a refactor.

setup.py entry_points console_scripts missing

@rspeer When I looked through setup.py, I found an entry_points script that cannot be found in the source code: 'cn5-build-index = conceptnet5.hashtable.cli:run_build'. Where is it? There should be a hashtable subfolder in the conceptnet5 folder, shouldn't there? Here is the puzzling part of setup.py:
entry_points = {
    'console_scripts': [
        'cn5-vectors = conceptnet5.vectors.cli:cli',
        'cn5-build-index = conceptnet5.hashtable.cli:run_build',
        'cn5-read = conceptnet5.readers.cli:cli',
        'cn5-convert = conceptnet5.formats.convert:cli',
        'cn5-db = conceptnet5.db.cli:cli'
    ]
}
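
One way to catch a dangling entry point like this is to check that each target module can be located before packaging. The sketch below uses only the standard library; the function name is hypothetical, and it checks only that the module exists, not that the named function is defined in it.

```python
import importlib.util


def missing_entry_points(console_scripts):
    """Return the names of scripts whose target module cannot be found.

    Each item has the form 'name = package.module:function'.
    """
    missing = []
    for spec in console_scripts:
        name, target = [part.strip() for part in spec.split("=", 1)]
        module_name = target.split(":", 1)[0]
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ImportError:
            # Raised when a parent package of the module does not exist.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Run in an environment where conceptnet5 is installed, passing the console_scripts list above would report cn5-build-index if conceptnet5.hashtable.cli is indeed absent.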

Iberian Portuguese

I would like to point out that the Portuguese in ConceptNet is Brazilian Portuguese.
Please add support for Iberian Portuguese.

Run the example: sqlite3.OperationalError: unable to open database file

Hi,

I have followed all the instructions here: https://github.com/commonsense/conceptnet5/wiki/Running-your-own-copy and I tried to run the example:

>>> from conceptnet5.query import lookup
>>> for assertion in lookup('/c/en/example'):
...     print(assertion)

However, I get the following error:

  File "<stdin>", line 1, in <module>
  File "/home/yassine/conceptnet5/conceptnet5/query.py", line 74, in lookup
    self.load_index()
  File "/home/yassine/conceptnet5/conceptnet5/query.py", line 57, in load_index
    self._db_filename, self._edge_dir, self.nshards
  File "/home/yassine/conceptnet5/conceptnet5/formats/sql.py", line 210, in __init__
    self._connect()
  File "/home/yassine/conceptnet5/conceptnet5/formats/sql.py", line 215, in _connect
    self.dbs[i] = sqlite3.connect(filename)
sqlite3.OperationalError: unable to open database file

Does anyone know how to fix it?
Thanks :)
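
SQLite raises "unable to open database file" when the file's directory is missing or inaccessible, and sqlite3.connect silently creates an empty database when only the file is missing. A minimal sketch of a more explicit check follows; the helper name is hypothetical and this is not how conceptnet5 opens its shards.

```python
import os
import sqlite3


def open_database(filename):
    """Open an existing SQLite file, failing early with a clear message."""
    directory = os.path.dirname(os.path.abspath(filename))
    if not os.path.isdir(directory):
        raise FileNotFoundError("directory does not exist: " + directory)
    if not os.path.isfile(filename):
        raise FileNotFoundError("database file does not exist: " + filename)
    return sqlite3.connect(filename)
```

Checking the paths this way distinguishes "the data was never downloaded or mounted" from a genuine database problem.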

Language recognized incorrectly in some etymologies

This was reported over e-mail. ConceptNet 5.3 contains a few EtymologicallyDerivedFrom edges that claim to be in English when they're actually in another language, such as:

/r/EtymologicallyDerivedFrom    /c/en/masyu     /c/en/真珠

"masyu" is in fact being defined in English, but we're not recognizing the phrasing "Japanese, from a misreading of 真珠". (We would have expected something like "from Japanese 真珠".)

URI queries depend on case of first letter, ground beef does not work

I noticed last night that queries seemed to distinguish unnecessarily between capitalized and non-capitalized versions of terms. I noticed this morning that it had changed, with some terms that previously needed to be capitalized now needing to be lower case, and vice versa.

The example on the wiki: http://conceptnet5.media.mit.edu/data/5.4/uri?language=en&text=ground_beef
returns nothing as of this writing, and it is required to query
http://conceptnet5.media.mit.edu/data/5.4/uri?language=en&text=ground_Beef

to get a result.
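
A common way to make such lookups independent of case is to normalize terms identically when indexing and when querying. The sketch below illustrates the idea with hypothetical helpers; it is not the normalization that the ConceptNet API actually applies.

```python
def normalize_term(text):
    """Case-fold a term and join its words with underscores."""
    return "_".join(text.strip().casefold().replace("_", " ").split())


def build_index(terms):
    """Map each normalized form to the term as originally stored."""
    return {normalize_term(term): term for term in terms}


def find_term(index, query):
    """Look up a query regardless of its capitalization or spacing."""
    return index.get(normalize_term(query))
```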

5.5 feature request: add functionality to query all concepts of the same lemma

I'm currently running CN 5.4, and I really appreciate the fact that the (one-word) concepts are lemmas, and I can get all the knowledge edges about them even if their surface texts have different word forms. For example, if I want to know if birds can sing, I can easily find the lemma of "birds", then search for the edge CapableOf(bird, sing).

However CN 5.5 seems to distinguish between concepts on the word form level, and knowledge about the lemma concept doesn't seem to hold for concepts for word forms. E.g. according to CN 5.5, a bird can sing, but birds cannot sing. Therefore if I'm interested in a noun, I'll have to check its plural form to ensure coverage.

Basically, transforming word forms to lemmas is easy, but the other way around is hard. In the new version, users can skip the URI standardization, but seemingly at the cost of having to exhaustively look up all the word forms just to be sure (or else they'll get less interesting relations). I'm not migrating to 5.5 because of this issue.

Another example: in 5.5 there's an entry CapableOf(salesman, sell_products), but nothing for CapableOf(salesman, sell_product). This can be frustrating. All the user wants is to search for CapableOf(salesman, sell), and get all the related knowledge. On a related note, regex or fuzzy matching would be tremendously useful as well, something like CapableOf(sales.*, sell.*) or CapableOf(sales.*, .*deal.*)
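
The requested pattern matching can be approximated on the client side once edges are available locally. A minimal sketch, assuming edges are held as (relation, start, end) tuples of plain term names; the function name is hypothetical.

```python
import re


def match_edges(edges, relation, start_pattern, end_pattern):
    """Return edges of one relation whose endpoints fully match the patterns."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    return [
        (rel, start, end)
        for rel, start, end in edges
        if rel == relation and start_re.fullmatch(start) and end_re.fullmatch(end)
    ]
```

With this, match_edges(edges, "CapableOf", "sales.*", "sell.*") would return both the sell_products and sell_product variants if they are present in the data.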

Description of software I am downloading is missing, is this an actual AI?

I came here after reading about advanced AI, I was hoping to put it to the test.

I can't see anything about what the software actually does, no real time information, it all looks like programming gibberish to me.

If I am in the wrong place then I apologise, but is there any way to talk to this AI by just downloading a package and installing the software, or is this the software used to programme the AI from scratch?

Thanks in advance
Confused Yeti

Index CSV file data separation

Hi - I'm trying to create a local index from your CSV files, and I've noticed that, when there are multiple sources, an extraneous tab appears to have found its way into the middle of the source data, which thus splits it?
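
A quick way to locate the affected rows is to count tab-separated fields per line. The sketch below assumes 10 fields per row (URI, relation, start, end, context, weight, sources, id, dataset, surface text); that count and the function name are assumptions, so adjust the expected value to the file being checked.

```python
EXPECTED_FIELDS = 10  # assumed column count for the CSV dump


def malformed_lines(lines, expected=EXPECTED_FIELDS):
    """Yield (line_number, field_count) for rows with the wrong field count."""
    for number, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected:
            yield number, len(fields)
```

Passing an open file object as lines streams the file, so even a large part_NN.csv can be checked without loading it into memory.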

How can I establish a concept path ?

Hi guys,
I would like to know how to establish each path from a concept up to the most generic concept in the graph. Example:
for "/c/en/jaguar": IsA automobile, IsA organization, IsA automobile_machine, IsA product... I need something like this as the first path: [automobile_machine -> automobile -> product...]. I am currently building these paths by analysing all "IsA" relations with a backtracking algorithm, which is too expensive. Is this the right approach? Could you point me to a simpler way?
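
One cheaper alternative to enumerating every path by backtracking is a breadth-first search that keeps a single shortest IsA chain per ancestor, visiting each concept once. This is a sketch of that swapped-in technique, not a ConceptNet feature; it assumes the IsA edges have already been loaded into a dict mapping each concept to its direct parents, and the function name is hypothetical.

```python
from collections import deque


def shortest_isa_paths(is_a, start):
    """Map every ancestor of `start` to one shortest IsA chain leading to it."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in is_a.get(node, ()):
            if parent not in paths:  # first visit is the shortest chain
                paths[parent] = paths[node] + [parent]
                queue.append(parent)
    del paths[start]
    return paths
```

Because visited concepts are never re-expanded, cycles in the IsA data do not cause infinite loops. The trade-off is that only one chain per ancestor is reported, not every possible one.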

Thank you so much guys :)

Failed to import preprocess_text in node.py

I synced to the head but cannot import preprocess_text in nodes.py.
Here is the error I got:
from conceptnet5.nodes import make_concept_uri, normalize_uri
  File "........../lib/python2.7/site-packages/ConceptNet5-5.1.4-py2.7.egg/conceptnet5/nodes.py", line 6, in <module>
    from metanl.general import preprocess_text
ImportError: No module named general

The general module has been renamed to token_utils but there is no preprocess_text there.

Some lines of Portuguese CSV data incorrectly formatted

It appears that some lines of Portuguese data in the CSV files are incorrectly formatted (they have early and extra right brackets). This was only observed for Portuguese data when parsing through every file.

Example ones in part_01.csv are:

/a/[/r/AtLocation/,/c/pt/bicho_de_goiaba]/,/c/pt/pé_de_goiaba/]    /r/AtLocation   /c/pt/bicho_de_goiaba]  /c/pt/pé_de_goiaba /ctx/all    2.584962500721156   /or/[/and/[/s/activity/omcs/csamoa4_self-rating/,/s/contributor/omcs/dimiguel/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/filipeaoki/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/ivonegodoi/]/,/and/[/s/activity/omcs/vote/,/s/contributor/omcs/kis/]/] /e/e3bf73c9389d4cb7d6e97a2a2a866f6dff832f0a /d/conceptnet/4/pt  *Uma coisa que você pode encontrar em um(a) [[pé de goiaba]] é um(a) [[bicho de goiaba]]].
/a/[/r/AtLocation/,/c/pt/velho]/,/c/pt/praça_pública/]    /r/AtLocation   /c/pt/velho]    /c/pt/praça_pública   /ctx/all    1.5849625007211563  /and/[/s/activity/omcs/csamoa4_self-rating/,/s/contributor/omcs/chocolatra_cg/] /e/4b5c4fff87fe7254de329e0895a8a211a78ad68d /d/conceptnet/4/pt  *Uma coisa que você pode encontrar em um(a) [[praça pública]] é um(a) [[velho]]].
/a/[/r/AtLocation/,/c/pt/mofo]/,/c/pt/teto_com_goteiras/]   /r/AtLocation   /c/pt/mofo] /c/pt/teto_com_goteiras /ctx/all    1.5849625007211563  /and/[/s/activity/omcs/csamoa4_self-rating/,/s/contributor/omcs/_kamikaze_/]    /e/f40f1d8d86868cb2b63d1256c81f80194ca51774/d/conceptnet/4/pt   *Uma coisa que você pode encontrar em um(a) [[teto com goteiras]] é um(a) [[mofo]]].

regarding extraction of data

1. For my project I need to map the ConceptNet relations to another set of relations, so I need all the relations that are possible in ConceptNet. Is there a way to get them?

2. I need to iterate through the data by following relations to some depth. For now I am doing this with my own script against the Web API. It would be a great help if anyone could tell me exactly how to use the JSON data to extract data recursively to some depth.

PS: I am new to data extraction, so please explain the concepts in simple terms if possible.

Thank you for considering this.
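
Both questions can be approached with one depth-limited traversal that records the relations it encounters. The sketch below is independent of how edges are obtained: fetch_edges is a hypothetical callable returning (relation, start, end) tuples for a node, which could wrap a Web API call or a local file. Note that it only finds relations reachable from the starting node, not an authoritative list.

```python
def explore(start, fetch_edges, max_depth):
    """Follow edges outward from `start` for at most `max_depth` steps.

    Returns (edges, relations): the set of edges seen and the set of
    relation names that occur on them.
    """
    seen = {start}
    frontier = [start]
    edges = set()
    relations = set()
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for rel, edge_start, edge_end in fetch_edges(node):
                edges.add((rel, edge_start, edge_end))
                relations.add(rel)
                for neighbor in (edge_start, edge_end):
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return edges, relations
```

The seen set keeps the traversal from revisiting nodes, so the amount of work is bounded by the number of nodes within max_depth steps.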

[5.3] score attribute missing

Hi,

I just loaded version 5.3 on my server and what seems to be missing is the score attribute for every concept.
Because I really need this attribute, I'd like to ask where it went, as I can't find it being removed in any of the commits.

Is this a bug or did you remove it for some reason?
Also if it's not a bug, what can I use instead?
I need weighted links (scores)

Error while generating wiktionary data

Hi,
I am trying to build the data from scratch, replacing the original Wiktionary data with the latest article.tgz file.
I have run
make apsw

After running

make download

I then swapped in the English Wiktionary data.

Then I ran make and I get the following error.

cd /home/code/wordhunt/conceptnetmybuild/conceptnet5/conceptnet5/wiktparse && make en_parser.py
make[1]: Entering directory `/home/code/wordhunt/conceptnetmybuild/conceptnet5/conceptnet5/wiktparse'
grako -m en_wiktionary -w '' -o en_parser.py en_parser.ebnf
en_parser.ebnf(13:1) Expecting :

^
word
name
rule
grammar
start
make[1]: *** [en_parser.py] Error 1
make[1]: Leaving directory `/home/code/wordhunt/conceptnetmybuild/conceptnet5/conceptnet5/wiktparse'
make: *** [/home/code/wordhunt/conceptnetmybuild/conceptnet5/conceptnet5/wiktparse/en_parser.py] Error 2

The en_parser.ebnf file is empty except for the top 'Do not edit this file' comment paragraph. I am using python3 and have installed nltk and pyyaml. Is there anything else that I need to do?

adding languages.

Good afternoon. What is the correct procedure for adding a new language? (If possible, I would be grateful for a complete description.) That way it could later be added to the database of existing languages. If I am not mistaken, there are 27 of them right now.

api.conceptnet.io returns HTML to non-browsers

api.conceptnet.io is intended to provide JSON wrapped in HTML when you visit it in a browser, and plain JSON when you visit it from another source, such as 'curl'.

However, it seems to always be returning HTML.
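
When reproducing this from a script, it can help to ask for JSON explicitly and to detect an HTML reply. The sketch below builds such a request with the standard library. Whether the server selects its output format from the Accept header is an assumption, and the helper names are hypothetical.

```python
from urllib.request import Request


def json_request(url):
    """Build a request that explicitly asks for a JSON response."""
    return Request(url, headers={"Accept": "application/json"})


def looks_like_html(body):
    """Heuristic: HTML documents start with '<', JSON objects with '{'."""
    return body.lstrip().startswith("<")
```

Passing json_request(url) to urllib.request.urlopen and running looks_like_html on the decoded body would show which format actually came back.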

where is pymongo / flask /.. modules

When I run conceptnet5.graph it says "ImportError: No module named pymongo", and when I run conceptnet5.web_interface.web_interface it says "ImportError: No module named Flask".

Docker image: sqlite3.OperationalError: unable to open database file

Hi, I have followed the Docker instructions here:
https://github.com/commonsense/conceptnet5/wiki/Docker#building-conceptnet-with-docker
and this proceeded error-free.

I can run the container and see the following log:

$ docker run -it -p 10054 rspeer/conceptnet-web:5.4 -v /my-big-drive/data:/conceptnet_data
WARNING: Your kernel does not support memory swappiness capabilities, memory swappiness discarded.
[2015-10-13 21:13:57 +0000] [6] [INFO] Starting gunicorn 19.3.0
[2015-10-13 21:13:57 +0000] [6] [INFO] Listening at: http://0.0.0.0:10054 (6)
[2015-10-13 21:13:57 +0000] [6] [INFO] Using worker: sync
[2015-10-13 21:13:57 +0000] [9] [INFO] Booting worker with pid: 9
[2015-10-13 21:13:57 +0000] [11] [INFO] Booting worker with pid: 11
[2015-10-13 21:13:57 +0000] [13] [INFO] Booting worker with pid: 13
[2015-10-13 21:13:57 +0000] [15] [INFO] Booting worker with pid: 15

However if I then try an example query:
http://127.0.0.1:32768/data/5.4/c/en/toast

I get a 500 internal server error:

ERROR:conceptnet5:Exception on /data/5.4/c/en/toast [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.4/site-packages/Flask_Cors-2.1.0-py3.4.egg/flask_cors/extension.py", line 110, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.4/site-packages/Flask-0.10.1-py3.4.egg/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/src/conceptnet/conceptnet5/api.py", line 99, in query_node
    results = list(FINDER.lookup(path, offset=offset, limit=limit))
  File "/src/conceptnet/conceptnet5/query.py", line 75, in lookup
    self.load_index()
  File "/src/conceptnet/conceptnet5/query.py", line 58, in load_index
    self._db_filename, self._edge_dir, self.nshards
  File "/src/conceptnet/conceptnet5/formats/sql.py", line 211, in __init__
    self._connect()
  File "/src/conceptnet/conceptnet5/formats/sql.py", line 216, in _connect
    self.dbs[i] = sqlite3.connect(filename)
sqlite3.OperationalError: unable to open database file
xx.xx.xx.xx - - [13/Oct/2015:21:40:20 +0000] "GET /data/5.4/c/en/toast HTTP/1.1" 500 291 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.132 Safari/537.36"

-which looks like it may be the same underlying error as this:
#33

I have tried to figure out how to get a shell on the container so I can try the 'ln -s' fix, but have had no success.

Would appreciate any help or advice on
-how to access the container and try the above fix myself, or
-otherwise how to resolve the issue.

Thanks in advance for any advice!

JSON API crashes with TypeError

Here's what I get at http://anemone.media.mit.edu:5000/concept/en/coffee/:

Traceback (most recent call last):
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1518, in __call__
    return self.wsgi_app(environ, start_response)
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1506, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1504, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1264, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1262, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/conceptnet5/env/lib/python2.6/site-packages/flask/app.py", line 1248, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/srv/conceptnet5/conceptnet5/api.py", line 39, in get_data
    json[label] = f(uri)
  File "/srv/conceptnet5/conceptnet5/api.py", line 189, in get_normalized_of
    json[-1]['url'] = root_url + json[-1]['uri']
TypeError: 'NoneType' object is unsubscriptable

ERROR on the build process conceptnet5

The Ninja file that describes the ConceptNet build process was not created.

~/conceptnet5$ python ninja.py
  File "ninja.py", line 458
    print(ninja, file=open('build.ninja', mode='w'))
                     ^
SyntaxError: invalid syntax

node structure miss used?

Hi, when I look at the JSON data in the lookup response, I find the 'label' in 'start' or 'end' confusing. Why is it not the same as 'term'? I found that it is the same as the 'term' of the opposite node. What does the label mean? I cannot find its definition anywhere in the source code.

"end": { "@id": "/c/zh/馬廄", "label": "馬", "language": "zh", "term": "/c/zh/馬廄" },

doubt about /r

Hi guys,

How can I obtain a list of all the relations that can appear after /r/?

Thank you so much!

"next" links are erroneous and throw Internal Server Errors

If I look up http://conceptnet5.media.mit.edu/data/concept/en/example/, then at the end I get:

  "next": "http://conceptnet.media.mit.edu/data/incoming_assertions/concept/en/example/926.492473806"

As a URI, this should refer to the word "example" with the part of speech "926.492473806", which should be a 404. In practice it throws an Internal Server Error.

A better URI would be something like: http://conceptnet.media.mit.edu/data/incoming_assertions/concept/en/example?below=926.492473806

This would put {'below': '926.492473806'} into the GET arguments while still requesting the same URI.
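
The proposed link format can be produced with the standard library's query-string encoder. This is a sketch of the suggestion, with a hypothetical function name, not the API's actual implementation.

```python
from urllib.parse import urlencode


def next_page_url(base_url, below):
    """Append the paging cursor as a GET argument instead of a path segment."""
    return base_url.rstrip("/") + "?" + urlencode({"below": below})
```

Keeping the cursor out of the path means the URI still names the same concept, and the server reads the cursor from the request's GET arguments.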

Missing support_data

The entire support_data directory appears to be missing from the package on PyPI, which breaks a few methods. Is this supposed to be downloaded separately? If so, I can't find any documentation for doing so.

virtualenv .env
. .env/bin/activate
pip install ConceptNet
python
>>> import conceptnet5.util.language_codes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/myproject/.env/local/lib/python2.7/site-packages/conceptnet5/util/language_codes.py", line 86, in <module>
    _setup()
  File "/usr/local/myproject/.env/local/lib/python2.7/site-packages/conceptnet5/util/language_codes.py", line 58, in _setup
    for line in codecs.open(ISO_DATA_FILENAME, encoding='utf-8'):
  File "/usr/local/myproject/.env/lib/python2.7/codecs.py", line 881, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: u'/usr/local/myproject/.env/local/lib/python2.7/site-packages/conceptnet5/support_data/iso639.txt'

[Idea] Use antonyms to enable negation processing with conceptnet

Hi again,

As I mentioned in #35 already, I'm currently working on a project including conceptnet5.
Essentially I'm using it to extract emotions from text.

Right now, I'm facing a rather big issue concerning negation in sentences.
Take this sentence for example:

The movie was not bad.

Now, my application would analyze every word given in this sentence and perform a graph search on conceptnet in order to determine if the word has any connections to the concept of emotion.
Since I do not handle negation in sentences yet, it would probably use the emotion anger to rate the sentence's emotional features, based on the word bad.

Usually - at least that's what most NLP papers describe - negation handling is done by prefixing words for handling them separately in their lexicon later.
Unfortunately, this approach does not work using conceptnet5.

While thinking about the problem, I found Thesaurus.com's antonym feature. Given an arbitrary word, Thesaurus.com's website returns a rated list of antonym words. Given the exemplary word bad, Thesaurus.com returns good.

Therefore I'd like to propose parsing Thesaurus.com's database in order to add antonym words to conceptnet5's graph structure.
For every concept there could be an isAntonym link.
Since I neither know if this is technically and legally possible nor if this feature is desirable for you guys, I first wanted to describe the idea here.

I'd love to hear your feedback.
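
To make the idea concrete, here is a minimal sketch of antonym-based negation handling. The antonym table is a hypothetical stand-in for whatever isAntonym links would provide, the negator list is illustrative, and treating "not bad" as "good" is a simplification of the real meaning.

```python
NEGATORS = {"not", "never", "no"}  # illustrative, not exhaustive


def resolve_negation(tokens, antonyms):
    """Replace 'negator + word' with the word's antonym when one is known."""
    result = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        has_next = i + 1 < len(tokens)
        if word in NEGATORS and has_next and tokens[i + 1] in antonyms:
            result.append(antonyms[tokens[i + 1]])
            i += 2  # consume both the negator and the negated word
        else:
            result.append(word)
            i += 1
    return result
```

The rewritten token list can then go through the same per-word graph search as before, so the emotion lookup itself needs no changes.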

Make it easier to add concepts to ConceptNet

ConceptNet5 users are in dire need of a way to extend ConceptNet5 with their own data. It should be easy and straightforward to do.

My proposal is to create a database library for this that mimics the REST API.

WDYT?

Incoming and outgoing assertions appear to be identical

I got this feedback from Catherine White at BT. We should check whether these API calls are doing the right thing.

I noticed that ?get=incoming_assertions gives exactly the same result as ?get=outcoming_assertions (and as ?get=assertions). Is this by design? I would expect incoming_assertions to be left-side such as "play violin causes..." and outgoing assertions would be right-side such "IsA sound." Have I misunderstood?

web_interface currently depends on missing "utils" module

The current version of the web_interface doesn't run, because it depends on a module called "utils" that's not present in the Git repository.

Also, I doubt it would be able to import it under the name "utils", except in the case where you're running the devel server from that directory. It should be called "conceptnet5.web_interface.utils".

types of queries.

Good afternoon. Please tell me: are there any other ways of querying your server (besides URLs)?
For example, is it possible to analyze a whole sentence or text?

I apologize in advance for any incorrect translation.

Question about RAM needed

I got a MemoryError when running make build_assoc. Could you tell me the minimum amount of RAM needed to build assoc?
Another question: building the Solr index also uses a lot of RAM. Is there a simple way to build a small index (without needing the whole dataset)? That is to say, how do I build a mini-core Solr index?

172M all.csv
KiB Mem: 12099444 total (my PC)

Thanks for your great work! I will send a pull request in the future.

Detail print info:
➜ data sudo make build_assoc
python -m assoc_space.build_conceptnet assoc/all.csv assoc/assoc-space-5.2
loading
filtering entries
making assoc space
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/assoc_space-0.1-py2.7.egg/assoc_space/build_conceptnet.py", line 85, in <module>
    build_assoc_space(args.input_file, args.output_dir)
  File "/usr/local/lib/python2.7/dist-packages/assoc_space-0.1-py2.7.egg/assoc_space/build_conceptnet.py", line 73, in build_assoc_space
    space = AssocSpace.from_sparse_storage(sparse, 300, offset_weight=1e-5)
  File "/usr/local/lib/python2.7/dist-packages/assoc_space-0.1-py2.7.egg/assoc_space/__init__.py", line 157, in from_sparse_storage
    return cls.from_matrix(matrix, k, labels, strip_a0=strip_a0)
  File "/usr/local/lib/python2.7/dist-packages/assoc_space-0.1-py2.7.egg/assoc_space/__init__.py", line 114, in from_matrix
    u, s = eigensystem(mat, k=k, strip_a0=strip_a0)
  File "/usr/local/lib/python2.7/dist-packages/assoc_space-0.1-py2.7.egg/assoc_space/__init__.py", line 390, in eigensystem
    S, U = scipy.sparse.linalg.eigen.eigsh(mat.tocsr(), k=real_k, which='LA')
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1565, in eigsh
    ncv, v0, maxiter, which, tol)
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 506, in __init__
    ncv, v0, maxiter, which, tol)
  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 335, in __init__
    self.v = np.zeros((n, ncv), tp) # holds Ritz vectors
MemoryError
make: *** [assoc/assoc-space-5.2/u.npy] Error 1
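
The allocation that fails in the traceback is the dense (n, ncv) array of Ritz vectors, so its size can be estimated ahead of time. The sketch below relies on SciPy's documented default for eigsh, ncv = min(n, max(2*k + 1, 20)), and assumes 8-byte floats. The row count used in the usage note is purely hypothetical, since the actual matrix size is not given in the report.

```python
def default_ncv(n, k):
    """Default number of Lanczos vectors that eigsh uses when ncv is not given."""
    return min(n, max(2 * k + 1, 20))


def ritz_vector_bytes(n, k, itemsize=8):
    """Bytes needed for the dense (n, ncv) array of Ritz vectors."""
    return n * default_ncv(n, k) * itemsize
```

For k = 300 the default ncv is 601, so a hypothetical matrix with 1,000,000 rows would need about 4.8 GB for this one array alone, before counting the sparse matrix and the rest of the process.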

memory error

I was trying to build the data and got the following error message.

Traceback (most recent call last):
  File "/usr/local/bin/cn5-vectors", line 9, in <module>
    load_entry_point('ConceptNet==5.5.1', 'console_scripts', 'cn5-vectors')()
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/mnt/e/conceptnet/conceptnet5/conceptnet5/vectors/cli.py", line 83, in run_intersect
    intersected, projection = merge_intersect(frames)
  File "/mnt/e/conceptnet/conceptnet5/conceptnet5/vectors/merge.py", line 32, in merge_intersect
    joined = pd.concat(frames, join='inner', axis=1, ignore_index=True).astype('f')
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py", line 3054, in astype
    raise_on_error=raise_on_error, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3189, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3012, in apply
    self._consolidate_inplace()
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3529, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4521, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 4544, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
