cereja-project / cereja

Cereja is a bundle of useful functions we don't want to rewrite and .. just pure fun!

License: MIT License

Python 100.00%
python3 python-library python colab array-manipulations utilities progress-bar progress-view file-converter data-tools

cereja's Introduction

Cereja 🍒

Cereja was written using only the Python Standard Library. It started as a way to deepen our knowledge of the language and to avoid rewriting the same code.

Getting Started DEV

Don't be shy \0/ ... Clone the repository and submit a function or module you wrote, or just use a function you like.

See CONTRIBUTING 💻

Setup

Install

pip install --user cereja

or for all users

pip install cereja

Cereja Example usage

See some of the Cereja tools

To access Cereja's tools, import the package first: import cereja as cj.

📝 FileIO

Create new files

import cereja as cj

file_json = cj.FileIO.create('./json_new_file.json', data={'k': 'v', 'k2': 'v2'})

file_txt = cj.FileIO.create('./txt_new_file.txt', ['line1', 'line2', 'line3'])

file_json.save()
file_txt.save()

print(file_json.exists)
# True
print(file_txt.exists)
# True


# see what you can do with a .txt file
print(cj.can_do(file_txt))

# see what you can do with a .json file
print(cj.can_do(file_json))

Load and edit files

import cereja as cj

file_json = cj.FileIO.load('./json_new_file.json')

print(file_json.data)
# {'k': 'v', 'k2': 'v2'}

file_json.add(key='new_key', value='value')
print(file_json.data)
# {'k': 'v', 'k2': 'v2', 'new_key': 'value'}

file_txt = cj.FileIO.load('./txt_new_file.txt')

print(file_txt.data)
# ['line1', 'line2', 'line3']

file_txt.add('line4')
print(file_txt.data)
# ['line1', 'line2', 'line3', 'line4']

file_txt.save(exist_ok=True)  # overwrite the existing file
file_json.save(exist_ok=True)  # overwrite the existing file

📍 Path

import cereja as cj

file_path = cj.Path('/my/path/file.ext')
print(cj.can_do(file_path))
# ['change_current_dir', 'cp', 'created_at', 'exists', 'get_current_dir', 'is_dir', 'is_file', 'is_hidden', 'is_link', 'join', 'last_access', 'list_dir', 'list_files', 'mv', 'name', 'parent', 'parent_name', 'parts', 'path', 'rm', 'root', 'rsplit', 'sep', 'split', 'stem', 'suffix', 'updated_at', 'uri']

🆗 HTTP Requests

import cereja as cj

# Change url, headers and data values.
url = 'http://localhost:8000/example'
headers = {'Authorization': 'TOKEN'} # optional
data = {'q': 'test'} # optional

response = cj.request.post(url, data=data, headers=headers)

if response.code == 200:
    data = response.data
    # have fun!

Progress

import cereja as cj
import time

my_iterable = ['Cereja', 'is', 'very', 'easy']

for i in cj.Progress.prog(my_iterable):
    print(f"current: {i}")
    time.sleep(2)

# Output on terminal ...

# 🍒 Sys[out] » current: Cereja 
# 🍒 Sys[out] » current: is 
# 🍒 Cereja Progress » [▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱▱▱▱▱▱▱] - 50.00% - 🕢 00:00:02 estimated

📊 Freq

import cereja as cj

freq = cj.Freq([1, 2, 3, 3, 10, 10, 4, 4, 4, 4])
# Output -> Freq({1: 1, 2: 1, 3: 2, 10: 2, 4: 4})

freq.most_common(2)
# Output -> {4: 4, 3: 2}

freq.least_freq(2)
# Output -> {2: 1, 1: 1}

freq.probability
# Output -> OrderedDict([(4, 0.4), (3, 0.2), (10, 0.2), (1, 0.1), (2, 0.1)])

freq.sample(min_freq=1, max_freq=2)
# Output -> {3: 2, 10: 2, 1: 1, 2: 1}

# Save json file.
freq.to_json('./freq.json')

🧹 Text Preprocess

import cereja as cj

text = "Oi tudo bem?? meu nome é joab!"

text = cj.preprocess.remove_extra_chars(text)
print(text)
# Output -> 'Oi tudo bem? meu nome é joab!'

text = cj.preprocess.separate(text, sep=['?', '!'])
# Output -> 'Oi tudo bem ? meu nome é joab !'

text = cj.preprocess.accent_remove(text)
# Output -> 'Oi tudo bem ? meu nome e joab !'

# and more ..

# You can also use the Preprocessor class ...
preprocessor = cj.Preprocessor(stop_words=(),
                               punctuation='!?,.', to_lower=True, is_remove_punctuation=False,
                               is_remove_stop_words=False,
                               is_remove_accent=True)

print(preprocessor.preprocess(text))
# Output -> 'oi tudo bem ? meu nome e joab !'

print(preprocessor.preprocess(text, is_destructive=True))
# Output -> 'oi tudo bem meu nome e joab'

🔣 Tokenizer

import cereja as cj

text = ['oi tudo bem meu nome é joab']

tokenizer = cj.Tokenizer(text, use_unk=True)

# tokens 0 to 9 are UNK
# hash_ is used to restore the UNK tokens when decoding
token_sequence, hash_ = tokenizer.encode('meu nome é Neymar Júnior')
# Output -> [([10, 12, 11, 0, 1], 'eeb755960ce70c')]

decoded_sequence = tokenizer.decode(token_sequence, hash_=hash_)
# Output -> 'meu nome é Neymar Júnior'

Corpus

A handy way to split data into training and test sets.

import cereja as cj

X = ['how are you?', 'my name is Joab', 'I like coffee', 'how are you joab?', 'how', 'we are the world']
Y = ['como você está?', 'meu nome é Joab', 'Eu gosto de café', 'Como você está joab?', 'como', 'Nós somos o mundo']

corpus = cj.Corpus(source_data=X, target_data=Y, source_name='en', target_name='pt')
print(corpus)  # Corpus(examples: 6 - source_vocab_size: 13 - target_vocab_size:15)
print(corpus.source)  # LanguageData(examples: 6 - vocab_size: 13)
print(corpus.target)  # LanguageData(examples: 6 - vocab_size: 15)

corpus.source.phrases_freq
# Counter({'how are you': 1, 'my name is joab': 1, 'i like coffee': 1, 'how are you joab': 1, 'how': 1, 'we are the world': 1})

corpus.source.words_freq
# Counter({'how': 3, 'are': 3, 'you': 2, 'joab': 2, 'my': 1, 'name': 1, 'is': 1, 'i': 1, 'like': 1, 'coffee': 1, 'we': 1, 'the': 1, 'world': 1})

corpus.target.phrases_freq
# Counter({'como você está': 1, 'meu nome é joab': 1, 'eu gosto de café': 1, 'como você está joab': 1, 'como': 1, 'nós somos o mundo': 1})

corpus.target.words_freq
# Counter({'como': 3, 'você': 2, 'está': 2, 'joab': 2, 'meu': 1, 'nome': 1, 'é': 1, 'eu': 1, 'gosto': 1, 'de': 1, 'café': 1, 'nós': 1, 'somos': 1, 'o': 1, 'mundo': 1})

# split_data guarantees that the test set contains no examples identical to training examples
# and uses only vocabulary that exists in the training set
train, test = corpus.split_data()  # by default, 80% of the data goes to training
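
The guarantee described in the comments above can be illustrated with a small standard-library sketch. This is not Cereja's implementation: split_with_known_vocab and its parameters are hypothetical names, and words are split on whitespace only. It fills the training set first and admits a pair to the test set only if every one of its words already occurs in training.

```python
import random


def split_with_known_vocab(pairs, train_fraction=0.8, seed=0):
    """Split (source, target) pairs so that test pairs are never training
    pairs and contain only words that also appear in the training set."""
    unique = list(dict.fromkeys(pairs))  # drop exact duplicates, keep order
    random.Random(seed).shuffle(unique)
    n_train = int(len(unique) * train_fraction)
    train, candidates = unique[:n_train], unique[n_train:]

    src_vocab, tgt_vocab = set(), set()
    for src, tgt in train:
        src_vocab.update(src.lower().split())
        tgt_vocab.update(tgt.lower().split())

    test = []
    for src, tgt in candidates:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        if src_words <= src_vocab and tgt_words <= tgt_vocab:
            test.append((src, tgt))
        else:
            # Unknown words: keep the pair for training and grow the vocabulary.
            train.append((src, tgt))
            src_vocab |= src_words
            tgt_vocab |= tgt_words
    return train, test


example_pairs = [("how are you", "como você está"), ("how", "como"),
                 ("you are", "você está"), ("my name", "meu nome")]
train_pairs, test_pairs = split_with_known_vocab(example_pairs, train_fraction=0.5)
```

Because the vocabulary only grows while candidates are processed, a pair accepted into the test set stays valid for the final training set.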

🔢 Array

import cereja as cj

data = [[1, 2, 3], [3, 3, 3]]

cj.array.is_empty(data)  # False
cj.array.get_shape(data)  # (2, 3)

data = cj.array.flatten(data)  # [1, 2, 3, 3, 3, 3]
cj.array.prod(data)  # 162
cj.array.sub(data)  # -13
cj.array.div(data)  # 0.006172839506172839

cj.array.rand_n(0.0, 2.0, n=3)  # [0.3001196087729699, 0.639679494102923, 1.060200897124107]
cj.array.rand_n(1, 10)  # 5.086403830031244
cj.array.array_randn((3, 3,
                      3))  # [[[0.015077210355770374, 0.014298110484612511, 0.030410666810216064], [0.029319083335697604, 0.0072365209507707666, 0.010677361074992], [0.010576754075922935, 0.04146379877648334, 0.02188348813336284]], [[0.0451851551098092, 0.037074906805326824, 0.0032484586475421007], [0.025633380630695347, 0.010312669541918484, 0.0373624007621097], [0.047923908102496145, 0.0027939333359724224, 0.05976224377251878]], [[0.046869510719106486, 0.008325638358172866, 0.0038702998343255893], [0.06475268683502387, 0.0035638592537234623, 0.06551037943638163], [0.043317416824708604, 0.06579372884523939, 0.2477564291871006]]]
cj.chunk(data=[1, 2, 3, 4], batch_size=3, fill_with=0)  # [[1, 2, 3], [4, 0, 0]]
cj.array.remove_duplicate_items(['hi', 'hi', 'ih'])  # ['hi', 'ih'] 
cj.array.get_cols([['line1_col1', 'line1_col2'],
                   ['line2_col1', 'line2_col2']])  # [['line1_col1', 'line2_col1'], ['line1_col2', 'line2_col2']]
cj.array.dotproduct([1, 2], [1, 2])  # 5

a = cj.array.array_gen((3, 3), 1)  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
b = cj.array.array_gen((3, 3), 1)  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
cj.array.dot(a, b)  # [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
cj.mathtools.theta_angle((2, 2), (0, -2))  # 135.0

🧰 Utils

import cereja as cj

data = {"key1": 'value1', "key2": 'value2', "key3": 'value3', "key4": 'value4'}

cj.utils.chunk(list(range(10)), batch_size=3)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
cj.utils.chunk(list(range(10)), batch_size=3, fill_with=0, is_random=True)
# [[9, 7, 8], [0, 3, 2], [4, 1, 5], [6, 0, 0]]

# Invert Dict
cj.utils.invert_dict(data)
# Output -> {'value1': 'key1', 'value2': 'key2', 'value3': 'key3', 'value4': 'key4'}

# Get sample of large data
cj.utils.sample(data, k=2, is_random=True)
# Output -> {'key1': 'value1', 'key4': 'value4'}

cj.utils.fill([1, 2, 3, 4], max_size=20, with_=0)
# Output -> [1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

cj.utils.rescale_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], granularity=4)
# Output -> [1, 3, 5, 7]

cj.utils.import_string('cereja.file._io.FileIO')
# Output -> <class 'cereja.file._io.FileIO'>

cj.utils.list_methods(cj.Path)
# Output -> ['change_current_dir', 'cp', 'get_current_dir', 'join', 'list_dir', 'list_files', 'mv', 'rm', 'rsplit', 'split']


cj.utils.string_to_literal('[1,2,3,4]')
# Output -> [1, 2, 3, 4]

cj.utils.time_format(3600)
# Output -> '01:00:00'

cj.utils.truncate("Cereja is fun.", k=3)
# Output -> 'Cer...'

data = [[1, 2, 3], [3, 3, 3]]
cj.utils.is_iterable(data)  # True
cj.utils.is_sequence(data)  # True
cj.utils.is_numeric_sequence(data)  # True

See Usage - Jupyter Notebook

License

This project is licensed under the MIT License. See the LICENSE file for details.

cereja's People

Contributors

ailton-felix, chillrx, dennymarcels, gabrielbrasileiro, jlsneto, joaovictorcosta, rickards, rodrigobastos

cereja's Issues

Create dict relationship

limit n_lines sysout

FileIO bugs

  • Version: 1.5.4
  • Platform: all
  • Subsystem: file

import cereja as cj

# Bug 1
file = cj.FileIO.create('./test.txt')
file[0] = 'line1'
file[1] = 'line2'  # raises IndexError: list assignment index out of range

# Bug 2
print(file.path)  # /test.txt
file.set_path('./test2.txt')
file.undo()  # does not restore the previous path
print(file.path)  # /test2.txt
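
One way the two behaviours reported above could be addressed is sketched below. LineBuffer is a hypothetical stand-in written with the standard library only; it is not Cereja's FileIO, and the padding and snapshot strategy are assumptions for illustration.

```python
class LineBuffer:
    """Hypothetical stand-in for a text-file object (not Cereja's FileIO)."""

    def __init__(self, path, lines=None):
        self.path = path
        self.lines = list(lines or [])
        self._history = []  # (path, lines) snapshots taken before each change

    def _snapshot(self):
        self._history.append((self.path, list(self.lines)))

    def __setitem__(self, index, value):
        self._snapshot()
        # Pad with empty lines so assigning past the end does not raise IndexError.
        if index >= len(self.lines):
            self.lines.extend([""] * (index + 1 - len(self.lines)))
        self.lines[index] = value

    def set_path(self, path):
        self._snapshot()
        self.path = path

    def undo(self):
        # Restore both the path and the content from the latest snapshot.
        if self._history:
            self.path, self.lines = self._history.pop()


buf = LineBuffer("./test.txt")
buf[0] = "line1"
buf[1] = "line2"
buf.set_path("./test2.txt")
buf.undo()
print(buf.path, buf.lines)  # ./test.txt ['line1', 'line2']
```

Storing the path in the same snapshot as the content means a single undo step can never restore one without the other.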

Interpolation in rescale_values

Read .csv without column

Add file data format converters

Is your feature request related to a problem? Please describe.

I would like to be able to transform data using the File class or Python objects.

Describe the solution you'd like

It would be useful to read a .txt file, for example, and easily transform its data into various types of Python objects and other file formats.

Describe alternatives you've considered

This could simply be a method of the FileBase class or its child classes, or a new Transformation class, for example, whose only job is to format this data.
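
A minimal sketch of the kind of converter this issue asks for, using only the standard library. The function name lines_to and the set of supported formats are assumptions for illustration, not part of Cereja.

```python
import csv
import io
import json


def lines_to(lines, fmt):
    """Convert a list of text lines into another representation."""
    if fmt == "str":
        return "\n".join(lines)
    if fmt == "json":
        return json.dumps(lines, ensure_ascii=False)
    if fmt == "csv":
        # Each line becomes one row; columns are split on whitespace.
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(line.split() for line in lines)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


print(lines_to(["a b", "c d"], "json"))  # ["a b", "c d"]
```

Keeping the conversion in a standalone function (or a dedicated class, as the issue suggests) leaves the file classes responsible only for reading and writing.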

add base64 convert to Filetools

phrase frequency by words

Improvement in Filetools

Is your feature request related to a problem? Please describe.
The attribute and function names of File are not intuitive.

Describe the solution you'd like
Names that are easy to understand.

Describe alternatives you've considered

These are just ideas, e.g.:

1 - Rename File.content_file to File.content_lines or File.data_lines

2 - Rename File.content_str to File.content_string or File.data_string

3 - Rename File.line_sep to File.end_line

4 - Write a new file with File.new and edit it with File.edit (needs state control)

These are the main improvements I can see for now.

Change console colors

Today, console colors are changed by writing the name of the color.
The suggestion is to switch to hexadecimal values, so that a larger color palette becomes available.
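
As a rough illustration of the hexadecimal approach, the sketch below converts a hex color into a 24-bit ANSI foreground escape sequence. hex_to_ansi is a hypothetical helper, not a Cereja function, and 24-bit color support depends on the terminal.

```python
def hex_to_ansi(hex_color, text):
    """Wrap text in a 24-bit ANSI foreground color escape sequence."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color such as '#ff0000'")
    # Split 'rrggbb' into three integers in the range 0-255.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return f"\033[38;2;{red};{green};{blue}m{text}\033[0m"


print(hex_to_ansi("#ff0080", "Cereja"))
```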

Prevent data loss

Is your feature request related to a problem? Please describe.

There is a risk of data loss.

Describe the solution you'd like

1 - Do not overwrite an existing file without prior notice.

2 - Store instance change history

Describe alternatives you've considered

1 - Raise an error if a file already exists at the given path, but allow overwriting through a keyword argument when saving. The exception is an instance that comes from File.read: in that case, just warn the user that the file will be overwritten.

2 - Save a copy of the instance for each change in a list (it must be a private attribute), and decide on a limit for the number of copies.
For example:

_changes = [instance_copy]
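
The two alternatives above could look roughly like the following standard-library sketch. SafeFile, MAX_CHANGES and the exist_ok keyword are illustrative assumptions, not Cereja's API.

```python
import os


class SafeFile:
    """Hypothetical file wrapper that refuses silent overwrites."""

    MAX_CHANGES = 10  # assumed limit on stored copies

    def __init__(self, path, lines=None):
        self.path = path
        self.lines = list(lines or [])
        self._changes = []  # private history of previous states

    def add(self, line):
        self._changes.append(list(self.lines))
        del self._changes[:-self.MAX_CHANGES]  # keep only the newest copies
        self.lines.append(line)

    def undo(self):
        if self._changes:
            self.lines = self._changes.pop()

    def save(self, exist_ok=False):
        # Refuse to overwrite unless the caller asks for it explicitly.
        if os.path.exists(self.path) and not exist_ok:
            raise FileExistsError(f"{self.path} exists; pass exist_ok=True to overwrite")
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(self.lines))
```

Bounding the history keeps memory use predictable for files that are edited many times.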

Create .yml manipulator

  • Whitespace indentation is used for denoting structure; however, tab characters are not allowed as part of indentation.

  • Comments begin with the number sign (#), can start anywhere on a line and continue until the end of the line. Comments must be separated from other tokens by whitespace characters. If # characters appear inside of a string, then they are number sign (#) literals.

  • List members are denoted by a leading hyphen (-) with one member per line.

    • A list can also be specified by enclosing text in square brackets ([...]) with each entry separated by commas.
  • An associative array entry is represented using colon space in the form key: value with one entry per line. YAML requires the colon be followed by a space so that scalar values such as http://www.wikipedia.org can generally be represented without needing to be enclosed in quotes.

    • A question mark can be used in front of a key, in the form "?key: value" to allow the key to contain leading dashes, square brackets, etc., without quotes.
    • An associative array can also be specified by text enclosed in curly braces ({...}), with keys separated from values by colon and the entries separated by commas (no spaces are necessary, for JSON compatibility).
  • Strings (scalars) are ordinarily unquoted, but may be enclosed in double-quotes ("), or single-quotes (').

    • Within double-quotes, special characters may be represented with C-style escape sequences starting with a backslash (\). According to the documentation the only octal escape supported is \0.
  • Block scalars are delimited with indentation with optional modifiers to preserve (|) or fold (>) newlines.

  • Multiple documents within a single stream are separated by three hyphens (---).

    • Three periods (...) optionally end a document within a stream.
  • Repeated nodes are initially denoted by an ampersand (&) and thereafter referenced with an asterisk (*).

  • Nodes may be labeled with a type or tag using a double exclamation point (!!) followed by a string, which can be expanded into a URI.

  • YAML documents in a stream may be preceded by 'directives' composed of a percent sign (%) followed by a name and space-delimited parameters. Two directives are defined in YAML 1.1:

    • The %YAML directive is used for identifying the version of YAML in a given document.
    • The %TAG directive is used as a shortcut for URI prefixes. These shortcuts may then be used in node type tags.
  • Two additional sigil characters are reserved in YAML for possible future specification: the at sign (@) and backtick (`).

see more about .yml

Fix thread never die (only Colab)

Is your feature request related to a problem? Please describe.
When an exception occurs, the thread that controls the console and the progress output does not die.
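
A common way to make sure a helper thread stops when the main code raises is to signal it from a context manager's exit handler. The sketch below uses only the standard library; ConsoleWorker is a hypothetical class and does not reflect how Cereja's console thread is actually written.

```python
import threading


class ConsoleWorker:
    """Hypothetical background renderer that always stops on exit."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self._stop.wait(0.05)  # placeholder for redrawing the console

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the body raises, so the thread always stops.
        self._stop.set()
        self._thread.join(timeout=1)
        return False  # do not swallow the exception

    @property
    def alive(self):
        return self._thread.is_alive()


worker = ConsoleWorker()
try:
    with worker:
        raise RuntimeError("something failed inside the progress block")
except RuntimeError:
    pass
print(worker.alive)  # False
```

Marking the thread as a daemon is an extra safety net: even if the exit handler is never reached, the thread will not keep the interpreter alive.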

Freq refactoring

Consider refactoring the Freq class by adding checks and conversion to dict in the constructor.
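
A minimal sketch of what such a constructor check might look like, built on collections.Counter. CheckedFreq and to_dict are hypothetical names, and rejecting non-iterable input is an assumed policy rather than the behaviour of Cereja's Freq.

```python
from collections import Counter
from collections.abc import Iterable


class CheckedFreq(Counter):
    """Counter that validates its input before counting."""

    def __init__(self, data=()):
        # Mappings are iterable too, so this accepts lists, tuples and dicts.
        if not isinstance(data, Iterable):
            raise TypeError("data must be an iterable or a mapping")
        super().__init__(data)

    def to_dict(self):
        # Plain dict ordered from most to least frequent.
        return dict(self.most_common())


freq = CheckedFreq([1, 2, 3, 3, 10, 10, 4, 4, 4, 4])
print(freq.to_dict())
```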

Progress is not completing for small values

  • Version: 1.1.0.final.0
  • Platform: Linux 00c891b4b673 4.14.137+ #1 SMP Thu Aug 8 02:47:02 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux

Summary

When small values are passed (I have tested with values below 10), the Progress does not complete.
Sample:

import cereja as cj

with cj.Progress('Progress', style='bar', max_value=10) as bar:
  for i in range(10):
    bar.update(i)

Expected output when the operation finishes:

🍒 Progress: [=============================>] Done! ✅ - Time: 0.5s

Current behaviour:

🍒 Progress: [===========================>  ] Done! ✅ - Time: 0.5s
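
One possible contributor, which the issue does not confirm, is that the sample reports the zero-based loop index, so the last update is 9 of 10. The sketch below reproduces that effect with a hypothetical render_bar helper; it is not Cereja's rendering code.

```python
def render_bar(current, max_value, width=30):
    """Return a text bar; `current` is the number of completed steps."""
    fraction = min(max(current / max_value, 0.0), 1.0)
    filled = round(fraction * width)
    return "[" + "=" * filled + " " * (width - filled) + "]"


steps = 10
# Reporting the zero-based index stops at 9 of 10, so the bar ends at 90%.
last_with_index = render_bar(steps - 1, steps)
# Reporting completed steps (index + 1) reaches the maximum.
last_with_count = render_bar(steps, steps)
print(last_with_index)
print(last_with_count)
```

A progress implementation can also sidestep the problem by forcing the bar to 100% when the context manager exits without an error.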

Add tokenizer datatools

Format Progress with only two digits

  • Version: 1.1.0

Currently, the ProgressBar shows the percentage with too many decimal places.

It would be nicer to keep the percentage at two digits before and after the decimal point:

current: 3.9012......%
suggestion: 03.90%
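
The suggested format can be produced with Python's format specification mini-language. format_percent is a hypothetical helper name used only for this sketch.

```python
def format_percent(value):
    """Format a percentage zero-padded to two integer digits, with two decimals."""
    # Width 5 counts the decimal point: two digits, a dot, two digits.
    return f"{value:05.2f}%"


print(format_percent(3.9012))  # 03.90%
print(format_percent(50))      # 50.00%
```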

Add Csv dataprep

Is your feature request related to a problem? Please describe.
I need to group items by column, and much more.

Reversed items on File.insert function

  • Version: 1.1.0
  • Platform: all

File.insert adds the data in reversed order.
For example:

file_instance.insert(5, ["I like cereja!", "And you?"])

file_instance.content_file

[..., "And you?", "I like cereja!", ...]
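
A typical cause of this kind of reversal, assumed here rather than confirmed by the issue, is inserting each item at the same index in a loop. The sketch below contrasts that with slice assignment; both function names are hypothetical.

```python
def insert_reversed(lines, index, new_items):
    # Inserting each item at the same index pushes earlier items to the right.
    for item in new_items:
        lines.insert(index, item)
    return lines


def insert_in_order(lines, index, new_items):
    # Slice assignment places the whole batch at once, preserving its order.
    lines[index:index] = new_items
    return lines


print(insert_reversed(["a", "b"], 1, ["x", "y"]))  # ['a', 'y', 'x', 'b']
print(insert_in_order(["a", "b"], 1, ["x", "y"]))  # ['a', 'x', 'y', 'b']
```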

add list_dir with filters to class Path

Need to improve the documentation

Is your feature request related to a problem? Please describe.

Yes, documentation is essential!

Describe the solution you'd like

I think it would be useful to add some examples of using Cereja to the readme. Some modules and functions are not well documented. The Jupyter Notebook also needs to be updated to cover some of the tools.
