
Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.

Home Page: http://fossology.github.io/atarashi

License: GNU General Public License v2.0


atarashi's Introduction

Atarashi


Open source software is licensed using open source licenses. There are many open source licenses around and, adding to that, open source software packages sometimes involve multiple licenses for different files.

Atarashi provides different methods for scanning for license statements in open source software. Unlike existing rule-based approaches - such as the Nomos license scanner from the FOSSology project - atarashi implements multiple text statistics and information retrieval algorithms.

The anticipated advantage is improved precision, while keeping it as easy as possible to add new license texts or new license references.

Atarashi is designed to work stand-alone and with FOSSology. More info at https://fossology.github.io/atarashi

Requirements

  • Python >= v3.5
  • pip >= 18.1

Steps for Installation

Install

Install from PyPi

  • pip install atarashi

Source install

  • pip install .
  • This will download all required dependencies and trigger the build as well.
  • The build will generate 3 new files in your current directory:
    1. data/Ngram_keywords.json
    2. licenses/<SPDX-version>.csv
    3. licenses/processedList.csv
  • These files will be placed in their appropriate locations by the install script.

Installing just dependencies

  • pip install -r requirements.txt

Build (optional)

  • $ python3 setup.py build

How to run

Get help by running atarashi -h or atarashi --help

Example

  • Running DLD agent

    atarashi -a DLD /path/to/file.c

  • Running wordFrequencySimilarity agent

    atarashi -a wordFrequencySimilarity /path/to/file.c

  • Running tfidf agent

    • With Cosine similarity

      atarashi -a tfidf /path/to/file.c

      atarashi -a tfidf -s CosineSim /path/to/file.c

    • With Score similarity

      atarashi -a tfidf -s ScoreSim /path/to/file.c

  • Running Ngram agent

    • With Cosine similarity

      atarashi -a Ngram /path/to/file.c

      atarashi -a Ngram -s CosineSim /path/to/file.c

    • With Dice similarity

      atarashi -a Ngram -s DiceSim /path/to/file.c

    • With Bigram Cosine similarity

      atarashi -a Ngram -s BigramCosineSim /path/to/file.c

  • Running in verbose mode

    atarashi -a DLD -v /path/to/file.c

  • Running with custom CSVs and JSONs

    • Please refer to the build instructions to generate the CSV and JSON files understood by atarashi.
    • atarashi -a DLD -l /path/to/processedList.csv /path/to/file.c
    • atarashi -a Ngram -l /path/to/processedList.csv -j /path/to/ngram.json /path/to/file.c

Running Docker image

  1. Pull Docker image

    docker pull fossology/atarashi:latest

  2. Run the image

    docker run --rm -v <path/to/scan>:/project fossology/atarashi:latest <options> /project/<path/to/file>

Since Docker cannot access the host filesystem directly, we mount the directory containing the files to scan as a volume at /project in the container. Simply pass the options and the path to the file relative to the mounted path, as in the example below.
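For example, to scan src/main.c (a hypothetical path) under the current directory with the tfidf agent:

    docker run --rm -v $(pwd):/project fossology/atarashi:latest -a tfidf /project/src/main.c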

Test

  • Run imtihaan (meaning Exam in Hindi) with the name of the Agent.
  • e.g. python atarashi/imtihaan.py /path/to/processedList.csv <DLD|tfidf|Ngram> <testfile>
  • See python atarashi/imtihaan.py --help for more

Creating Debian packages

  • Install dependencies
# apt-get install python3-setuptools python3-all debhelper
# pip install stdeb
  • Create Debian packages
$ python3 setup.py --command-packages=stdeb.command bdist_deb
  • Locate the files under deb_dist

License

SPDX-License-Identifier: GPL-2.0

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

How to generate the documentation using sphinx

  1. Go to project directory 'atarashi'.

  2. Install Sphinx and m2r: pip install sphinx m2r (since this project is based on Python, pip is already installed).

  3. Initialise docs/ directory with sphinx-quickstart

    mkdir docs
    cd docs/
    sphinx-quickstart
    • Root path for the documentation [.]: .
    • Separate source and build directories (y/n) [n]: n
    • autodoc: automatically insert docstrings from modules (y/n) [n]: y
    • intersphinx: link between Sphinx documentation of different projects (y/n) [n]: y
    • For everything else, use the default options.
  4. Set up conf.py and include README.md

    • Enable the following lines and change the insert path:

      import os
      import sys
      sys.path.insert(0, os.path.abspath('../'))
    • Enable m2r to insert .md files in Sphinx documentation:

      [...]
      extensions = [
        ...
        'm2r',
      ]
      [...]
      source_suffix = ['.rst', '.md']
    • Include README.md by editing index.rst

      .. toctree::
          [...]
          readme
      
      .. mdinclude:: ../README.md
  5. Auto-generate the .rst files in docs/source which will be used to generate documentation

    cd docs/
    sphinx-apidoc -o source/ ../atarashi
  6. cd docs

  7. make html

This will generate the HTML files in docs/_build/html. Open index.html.

You can change the theme of the documentation by changing html_theme in the conf.py file in the docs/ folder. You can choose from {'alabaster', 'classic', 'sphinxdoc', 'scrolls', 'agogo', 'traditional', 'nature', 'haiku', 'pyramid', 'bizstyle'}, as in the snippet below. Reference
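For example, to switch to the nature theme, set in docs/conf.py:

    html_theme = 'nature'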

atarashi's People

Contributors

ag4ums, aman-codes, amanjain97, gmishx, hastagab, its-sushant, kaushl2208, mcjaeger, singhshreya05, tanweerulhaque, vasudevmaduri, xavierfigueroav


atarashi's Issues

Build fails due to import error from nirjas

I get the following error:

Traceback (most recent call last):
  File "setup.py", line 26, in <module>
    from atarashi.build_deps import download_dependencies
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/build_deps.py", line 32, in <module>
    from atarashi.license.licensePreprocessor import LicensePreprocessor
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/license/licensePreprocessor.py", line 31, in <module>
    from atarashi.libs.commentPreprocessor import CommentPreprocessor
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/libs/commentPreprocessor.py", line 30, in <module>
    from nirjas import extract as commentExtract, LanguageMapper
ImportError: cannot import name 'LanguageMapper' from 'nirjas' (/home/xavierfigueroav/Documents/atarashi-project/atarashi/.env/lib/python3.8/site-packages/nirjas/__init__.py)

This happens when running python setup.py build, after cloning the repository and installing the dependencies.

ModuleNotFoundError occurs when running after installing with pip.

I tried running it after installing it with pip on Python 3.6.9, and the following error occurred.
Are there additional modules I need to install?

$pip install atarashi 
$atarashi -h
Traceback (most recent call last):
  File "/home/soimkim/test/venv/bin/atarashi", line 5, in <module>
    from atarashi.atarashii import main
  File "/home/soimkim/test/venv/lib/python3.6/site-packages/atarashi/atarashii.py", line 26, in <module>
    from atarashi.agents.cosineSimNgram import NgramAgent
  File "/home/soimkim/test/venv/lib/python3.6/site-packages/atarashi/agents/cosineSimNgram.py", line 30, in <module>
    from atarashi.agents.atarashiAgent import AtarashiAgent
  File "/home/soimkim/test/venv/lib/python3.6/site-packages/atarashi/agents/atarashiAgent.py", line 27, in <module>
    from atarashi.libs.commentPreprocessor import CommentPreprocessor
  File "/home/soimkim/test/venv/lib/python3.6/site-packages/atarashi/libs/commentPreprocessor.py", line 23, in <module>
    import code_comment  # https://github.com/amanjain97/code_comment/
ModuleNotFoundError: No module named 'code_comment'

Problem with identifying the short license text

Generally, the license contained in a source code file is either the short license itself or a large license block, which makes it difficult for the information retrieval and similarity-finding algorithms to classify efficiently.

Please suggest how this should be resolved before implementing other IR (Information retrieval) algorithms.

Removing third party module in dameruLevenDist agent

Right now atarashi uses damerau_levenshtein_distance imported from pyxdameraulevenshtein in the dameruLevenDist agent. The function is not that long, and we do not know if it will get removed. So we can drop the dependency and write our own damerau_levenshtein_distance function to increase the overall speed of the dameruLevenDist agent and make atarashi less dependent on other repositories.
I have already started working on it. Can I proceed further?
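For reference, here is a minimal pure-Python sketch of the restricted (optimal string alignment) variant of the distance; this is not the code proposed in the issue, just an illustration of how small the function is:

def damerau_levenshtein_distance(s1, s2):
    # Edits counted: insertion, deletion, substitution, and
    # transposition of two adjacent characters.
    d = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(len(s1) + 1):
        d[i][0] = i
    for j in range(len(s2) + 1):
        d[0][j] = j
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost)  # transposition
    return d[len(s1)][len(s2)]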

Invalid File Path in Atarashi

Whenever an invalid file path is provided to atarashi, it generates the following error:

(env) akshay@akshay-VirtualBox:~/atarashi/atarashi/evaluator$ atarashi -a tfidf Testfiles/APSL-style.html
Traceback (most recent call last):
  File "/home/akshay/atarashi/env/bin/atarashi", line 8, in <module>
    sys.exit(main())
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/atarashii.py", line 123, in main
    result = atarashii_runner(inputFile, processedLicense, agent_name, similarity, ngram_json, verbose)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/atarashii.py", line 83, in atarashii_runner
    result = scanner.scan(inputFile)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/tfidf.py", line 140, in scan
    return self.__tfidfcosinesim(filePath)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/tfidf.py", line 112, in __tfidfcosinesim
    processedData1 = super().loadFile(inputFile)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/agents/atarashiAgent.py", line 44, in loadFile
    self.commentFile = CommentPreprocessor.extract(filePath)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/libs/commentPreprocessor.py", line 129, in extract
    data1 = licenseComment(data)
  File "/home/akshay/atarashi/env/lib/python3.8/site-packages/atarashi/libs/commentPreprocessor.py", line 42, in licenseComment
    for id, item in enumerate(data[0]["multi_line_comment"]):
IndexError: list index out of range

Instead, it should generate a simple error message saying that the provided file path is wrong.
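A minimal sketch of the kind of guard being requested (the function name and message are illustrative, not atarashi's actual API):

import os
import sys

def validate_path(input_file):
    # Fail early with a readable message instead of a deep traceback.
    if not os.path.isfile(input_file):
        sys.exit("atarashi: error: '%s' is not a valid file path" % input_file)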

FEAT: Increasing the overall performance of Atarashi

We can improve the performance of atarashi, nirjas, and others by using Numba and RAPIDS by Nvidia. Regular NumPy, pandas, and other libraries are slow: most of the time is wasted in serialization, deserialization, pre-processing, and memory transfer between the CPU and other devices. We can make it fast using Numba's parallel processing, JIT compilation, and built-in features, which work even on a CPU (a toy sketch follows below). Also, most of the programs can be made even faster using RAPIDS' cuML, cuDF, dask, etc., by executing everything on a GPU: pre-processing, vectorization, database queries, serialization, deserialization, parallel processing, etc. The entire codebase can be translated without much hassle, resulting in computational efficiency, higher accuracy, and lower memory usage.
This can ensure Atarashi's integration with FOSSology.
I have somewhat started with the work. Can I proceed with the same??
@hastagAB @GMishx
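As a rough illustration of the Numba part of this proposal (not code from atarashi; the array layout and names are made up), a JIT-compiled, multi-threaded cosine similarity over a document-term matrix could look like:

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def cosine_sims(docs, query):
    # One similarity score per document row, computed across CPU threads.
    out = np.empty(docs.shape[0])
    qnorm = np.sqrt((query * query).sum())
    for i in prange(docs.shape[0]):
        row = docs[i]
        dot = (row * query).sum()
        out[i] = dot / (np.sqrt((row * row).sum()) * qnorm + 1e-12)
    return out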

Shift from argparse module to plac command line parser.

Description

argparse is a parser for command-line options, arguments, and sub-commands.
Read the docs: https://docs.python.org/3/library/argparse.html
Currently, argparse is used as the command-line parser in atarashi, and we're planning to shift to plac.
plac does the same thing with far fewer lines of code.
Repo: https://github.com/micheles/plac

How to Solve

Read the plac documentation: http://micheles.github.io/plac/

Files to be changed

Comment extraction not working properly

Multi-line comment extraction is still not working for the following file types: JS, PHP, and Python.

For example in Python

print """Some long
print message
"""
print 'Some different message'
"""
Actual comment
"""

In this case, the script returns print 'Some different message' as a comment and leaves out the Actual comment.

Create a unified entry point

Create a unified entry point for every file using command-line arguments instead of adding __main__ to every file. This will help keep the code concise, maintainable, and readable. A sketch follows.
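A sketch of what this could look like in setup.py; atarashi already exposes one such console script for atarashii.py (visible in the tracebacks elsewhere on this page), and the idea is to make that pattern the only entry point:

# setup.py (fragment): one console_scripts entry point instead of
# per-file `if __name__ == '__main__':` blocks.
from setuptools import setup

setup(
    name='atarashi',
    entry_points={
        'console_scripts': [
            'atarashi = atarashi.atarashii:main',
        ],
    },
)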

List Index Out of Range

When running atarashi -a <agent> -s <similarity>, the following error is produced:

[screenshot: 'list index out of range' traceback]

Take a reference from here: #80

Steps To Reproduce

  • Previously @Aman-Codes made changes in the atarashii.py file (#L83), which were working fine, but after this PR the same error started appearing again.
  • I think we either need to implement the scan function as done [here], or update the atarashii.py file.

Ability to scan directories

Currently atarashi can scan only files. If a directory is provided as input, it should be able to find all files under it and run the selected agent on them.
The results of each scan can be stored in a list and printed as a JSON array.

It would be preferable, however, to print results as they come while maintaining the validity of the JSON array, so that someone running a scan in an interactive terminal does not get the feeling that nothing is happening.
This can be emulated by printing a starting [, followed by each scan result object {...} and a ,; the last result gets no trailing , and a ] is printed at the end of the scan. This approach eliminates the need for an additional list to hold temporary results. A sketch follows.
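A minimal sketch of the streaming approach described above (the scanner object and its scan method are stand-ins for whichever agent is selected):

import json
import os

def scan_directory(root, scanner):
    # Print a valid JSON array incrementally: '[' first, then each result
    # object, with a comma before every result except the first.
    print('[', end='')
    first = True
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            result = scanner.scan(os.path.join(dirpath, name))
            if not first:
                print(',', end='')
            print(json.dumps(result), end='', flush=True)
            first = False
    print(']')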

Comment extraction not working on curly quotes

Curly quotes (‘, ’, “, ”) are not filtered in the comment extractor, which leads to some wrong results (a normalization sketch follows the character table below).

A more extensive listing of problematic word characters:

Character   UTF-8    ASCII   Name
–           \u2013   -       En dash
—           \u2014   -       Em dash
―           \u2015   -       Horizontal bar
‘           \u2018   '       Left single quotation mark
’           \u2019   '       Right single quotation mark
‚           \u201a   ,       Single low-9 quotation mark
‛           \u201b   '       Single high-reversed-9 quotation mark
“           \u201c   "       Left double quotation mark
”           \u201d   "       Right double quotation mark
„           \u201e   "       Double low-9 quotation mark
…           \u2026   ...     Horizontal ellipsis
′           \u2032   '       Prime
″           \u2033   "       Double prime
©           \u00a9   (c)     Copyright sign
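A sketch of one way to filter these before matching (the mapping mirrors the ASCII column above; names are illustrative, not atarashi's actual code):

# Map each problematic character to its ASCII stand-in from the table.
UNICODE_TO_ASCII = str.maketrans({
    '\u2013': '-', '\u2014': '-', '\u2015': '-',
    '\u2018': "'", '\u2019': "'", '\u201a': ',', '\u201b': "'",
    '\u201c': '"', '\u201d': '"', '\u201e': '"',
    '\u2026': '...', '\u2032': "'", '\u2033': '"',
    '\u00a9': '(c)',
})

def normalize(text):
    return text.translate(UNICODE_TO_ASCII)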

Error in CommentPreprocessor

I get the following error when running python atarashii.py -a wordFrequencySimilarity <file>:

Traceback (most recent call last):
  File "atarashii.py", line 213, in <module>
    main()
  File "atarashii.py", line 167, in main
    result = run_scan(scanner_obj, inputPath)
  File "atarashii.py", line 116, in run_scan
    return scanner.scan(inputFile)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/agents/wordFrequencySimilarity.py", line 41, in scan
    processedData = super().loadFile(filePath)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/agents/atarashiAgent.py", line 44, in loadFile
    self.commentFile = CommentPreprocessor.extract(filePath)
  File "/home/xavierfigueroav/Documents/atarashi-project/atarashi/atarashi/../atarashi/libs/commentPreprocessor.py", line 131, in extract
    data = json.loads(data_file)
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict

I got this error when running the wordFrequencySimilarity agent, but since commentPreprocessor is used by all the others, this may be affecting the whole package.

The error occurs because the result of the function extract from Nirjas is passed to the method loads of the module json, but extract returns a dictionary while loads expects a string. See lines 129 and 130 in commentPreprocessor.py.

with open(outputFile, 'w') as outFile:
    # if the file extension is supported
    if fileType in supportedFileExtensions:
        data_file = commentExtract(inputFile)
        data = json.loads(data_file)
        data1 = licenseComment(data)
        outFile.write(data1)

So the fix consists of removing line 130 and passing data_file to licenseComment instead of data, as sketched below.
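With that change applied, following the reporter's own description, the block would read:

with open(outputFile, 'w') as outFile:
    # if the file extension is supported
    if fileType in supportedFileExtensions:
        # nirjas' extract already returns a dict, so no json.loads is needed
        data_file = commentExtract(inputFile)
        data1 = licenseComment(data_file)
        outFile.write(data1)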

[Proposal] Improve the speed of matching

WHAT

Atarashi can use a lot of agents and a lot of similarity types.
But according to previous tests, we observed that there is a need to improve the speed of license scanning.

Proposal

to be decided

Pipfile: the replacement for requirements.txt

We can migrate our current requirements.txt file to a Pipfile for several reasons:

  • TOML syntax for declaring all types of Python dependencies.
  • One Pipfile (as opposed to multiple requirements.txt files).
  • A Pipfile is inherently ordered.

Or refer to: Why?

Build fails

Hi, I'm getting an error when trying to build (master branch) on Python 3.7:

Installing collected packages: code-comment
  Running setup.py develop for code-comment
    Complete output from command /home/rob/projects/atarashi/.venv/bin/python -c "import setuptools, tokenize;__file__='/home/rob/projects/atarashi/.venv/src/code-comment/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps --user --prefix=:
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: -c --help [cmd1 cmd2 ...]
       or: -c --help-commands
       or: -c cmd --help
    
    error: option --user not recognized

I'm able to build when I run the command with sudo, but this shouldn't be necessary per my understanding.

Parallelize the evaluator algorithm

Description

There is a script to evaluate the algorithms for Atarashi: evaluator.py

Currently, it scans the test files sequentially (one by one).
We have to parallelize the script using multiprocessing, multithreading, or something else to reduce the effective scanning time.

How to solve

Use multiprocessing or multithreading in the main loop of evaluator.py, as sketched below.
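A minimal sketch of what the parallel main loop could look like (evaluate_file is a stand-in for the per-file work evaluator.py currently does sequentially):

from multiprocessing import Pool

def evaluate_file(test_file):
    # Stand-in: run the selected agent on one test file and return
    # whatever the evaluator records (e.g. detected license, timing).
    pass

def evaluate_all(test_files, workers=4):
    # Fan the sequential loop out over worker processes; imap_unordered
    # yields each result as soon as its file finishes scanning.
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(evaluate_file, test_files))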

Make evaluation.py more informative

The evaluation script should

  • Allow printing a comparison table with all the algorithms supported by atarashi. You can find examples of comparison tables in #95 and #65.
  • Allow printing a confusion matrix so that we can easily do error analysis to make decisions on how to improve current agents or implement new ones.

Run the evaluator command without any 'similarity' parameter

Description

The evaluator commands are set for two parameters, i.e. agent_name and similarity, but some agents also run without a similarity type.

Example: for the tfidf agent there are three commands:

  1. With cosine similarity : atarashi -a tfidf -s CosineSim /path/to/file.c
  2. With Score similarity : atarashi -a tfidf -s ScoreSim /path/to/file.c
  3. Without any similarity : atarashi -a tfidf /path/to/file.c

The evaluator covers the first two cases but not the third. The same goes for other agents.

How to fix

  1. Go to the getCommand function of the evaluator.
  2. Write separate conditions as desired, or manipulate the existing ones (see the sketch below).
  3. Test and verify that it works.
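A hypothetical reconstruction of what the adjusted getCommand could do (names follow the issue, not necessarily the actual evaluator code):

def getCommand(agent_name, similarity=None):
    # Build the atarashi invocation, omitting -s when no similarity
    # type is given so the agent's default path is exercised too.
    command = ['atarashi', '-a', agent_name]
    if similarity:
        command += ['-s', similarity]
    return command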

Improve TF-IDF agent by tuning matches threshold

Hello.

I've been playing around with some parameters of the TF-IDF agent.

I've found that if we stop using a threshold (cosine similarity >= 0.30) to filter the match results, the accuracy improves by up to 3 points. However, filtering helps reduce the compute time, since the results are sorted at the end of the search. See the piece of code I am talking about (especially lines 126 and 133):

for counter, value in enumerate(all_documents_matrix, start=0):
    sim_score = self.__cosine_similarity(value, search_martix)
    if sim_score >= 0.3:
        matches.append({
            'shortname': self.licenseList.iloc[counter]['shortname'],
            'sim_type': "TF-IDF Cosine Sim",
            'sim_score': sim_score,
            'desc': ''
        })
matches.sort(key=lambda x: x['sim_score'], reverse=True)
if self.verbose > 0:
    print("time taken is " + str(time.time() - startTime) + " sec")
return matches

Using the evaluation.py script, I've carried out some experiments:

    Algorithm                                   Time elapsed (s)   Accuracy
1   tfidf (CosineSim) (thr=0.30)                30.19              59.0%
2   tfidf (CosineSim) (thr=0.17)                35.29              61.0%
3   tfidf (CosineSim) (thr=0.16, max_df=0.10)   27.34              62.0%
4   tfidf (CosineSim) (thr=0.16)                36.42              62.0%
5   tfidf (CosineSim) (thr=0.15)                38.45              62.0%
6   tfidf (CosineSim) (thr=0.10)                39.91              62.0%
7   tfidf (CosineSim) (thr=0.00)                61.49              62.0%
8   Ngram (CosineSim)                           -                  57.0%
9   Ngram (BigramCosineSim)                     -                  56.0%
10  Ngram (DiceSim)                             -                  55.0%
11  wordFrequencySimilarity                     -                  23.0%
12  DLD                                         -                  17.0%
13  tfidf (ScoreSim)                            -                  13.0%
  • Row 1 shows the performance (speed and accuracy) of the current configuration of the TF-IDF agent using CosineSim as the similarity measure.
  • Row 7 shows how we can reach an accuracy of 62.0% just by removing the threshold (cosine similarity >= 0.00). However, just removing the threshold makes the agent 2x slower, so I continued tuning the threshold, holding the last value that still produces 62.0% accuracy, which is 0.16, shown in row 4.
  • In order to continue decreasing the execution time while keeping the accuracy, I tuned some parameters of the TfidfVectorizer. Setting max_df to 0.10 (default is 1.0) keeps the accuracy at 62.0% but makes the agent 1.1x faster, shown in row 3 (see the sketch after this list).
    • Why does decreasing the max_df value increase the speed? Because the vectorizer ignores all the terms that appear in more than the max_df fraction of the documents (see docs), i.e., it ignores the more frequent terms, so each document vector is shorter, making the cosine similarity easier to compute.
    • Why does decreasing the max_df value keep the accuracy high? My explanation is that the terms that appear in most licenses do not help the algorithm distinguish licenses; rare terms are the ones that make licenses differ from each other, so they are enough for the algorithm to do a good job.
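A stand-alone sketch of the row-3 configuration using scikit-learn (not atarashi's actual agent code; the function and argument names are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_matches(license_texts, input_text, threshold=0.16, max_df=0.10):
    # max_df=0.10 drops terms appearing in more than 10% of the license
    # corpus; threshold=0.16 keeps the accuracy of thr=0.00 at lower cost.
    vectorizer = TfidfVectorizer(max_df=max_df)
    license_matrix = vectorizer.fit_transform(license_texts)
    query = vectorizer.transform([input_text])
    scores = cosine_similarity(license_matrix, query).ravel()
    matches = [(score, idx) for idx, score in enumerate(scores) if score >= threshold]
    return sorted(matches, reverse=True)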

I will be opening a PR for you to reproduce the results in row 3 and merge the changes if you consider them relevant.

Important notes:

  • I've left out the speed times for all the other algorithms, because I ran those experiments in another context, so the comparison of time wouldn't be fair.
  • All the results differ from the last report I could find out there. I do not fully understand why some of them are so different; probably changes in the test files or changes in the algorithms. Anyway, 62.0% is the new best result in both reports.
  • My findings may help improve other agents that use thresholds, such as Ngram.
  • This new state-of-atarashi performance 😅 may also push the goals of future agents implementations, since it would be the new baseline.
