
cucco

Is that... is that a cucco? Sure it is!

Cucco is here to help you normalize those nasty texts. Removing extra whitespace is not that hard, right? But what about stop words? They do no good... oh, and don't even mention emojis!

This little friend will do the hard work for you. Just set it up and let him peck all over your text.

Oh please, shut up and show me where I can grab a cucco!

The easiest way to get a cucco is by using pip:

$ pip install cucco

But sometimes... sometimes you want to go wild and get the biggest... No, the best!... No, THE MIGHTY cucco!

To do so, you may use Git. Clone the repository from GitHub and do it all the hard way:

$ git clone https://github.com/davidmogar/cucco.git
$ cd cucco
$ python setup.py install

Got it. How do I use it?

Now that you have a cucco, I'll let him give you all the details.

Cucuco, cuco cuco cucucuco, CUCCO!

—Cucco

So true... so true... [tears falling down my face]. Just allow me to add some details.

The following example shows how to normalize a short text:

from cucco import Cucco

cucco = Cucco(language='en')
print(cucco.normalize('Who let the cucco out?'))

This applies all normalizations to the text "Who let the cucco out?". The output of these normalizations is:

let cucco
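To make that result less magical, here is a toy sketch of two of the steps involved (punctuation removal and stop word removal) in plain Python. It is not cucco's implementation: the function name `toy_normalize` and the tiny `STOP_WORDS_EN` set are hypothetical and exist only for illustration, while cucco ships its own, much larger per-language stop word lists.

```python
import string

# Hypothetical, tiny English stop word list for illustration only.
STOP_WORDS_EN = {"who", "the", "out", "a", "an", "of", "to"}


def toy_normalize(text: str) -> str:
    """Approximate a few normalization steps in plain Python (not cucco's code)."""
    # Delete ASCII punctuation characters such as '?' and '!'.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on any run of whitespace, which also collapses extra spaces.
    words = text.split()
    # Drop stop words, comparing case-insensitively.
    kept = [w for w in words if w.lower() not in STOP_WORDS_EN]
    return " ".join(kept)


print(toy_normalize("Who let the cucco out?"))  # let cucco
```

Lower-casing only for the comparison keeps the surviving words in their original case.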

It's also possible to pass a list of normalizations to apply; they are executed in order.

from cucco import Cucco

cucco = Cucco(language='en')

normalizations = [
    'remove_extra_whitespaces',
    ('replace_punctuation', {'replacement': ' '})
]

print(cucco.normalize('Who    let   the cucco out?', normalizations))

This is the output:

Who let the cucco out
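The list format above mixes bare method names with `(name, options)` pairs. One way such a list could be dispatched is sketched below, assuming each entry names a method that takes the text plus optional keyword arguments. `ToyNormalizer` and its methods are hypothetical stand-ins written for this example, not cucco's source.

```python
import re
import string
from typing import Any, Dict, List, Tuple, Union

# Hypothetical type: a step is a method name or a (method name, kwargs) pair,
# mirroring the list format used with cucco.normalize above.
Step = Union[str, Tuple[str, Dict[str, Any]]]


class ToyNormalizer:
    """Illustrative stand-in for Cucco; not the library's implementation."""

    def remove_extra_whitespaces(self, text: str) -> str:
        # Collapse each run of whitespace into a single space and trim the ends.
        return re.sub(r"\s+", " ", text).strip()

    def replace_punctuation(self, text: str, replacement: str = "") -> str:
        # Map every ASCII punctuation character to `replacement`.
        table = {ord(c): replacement for c in string.punctuation}
        return text.translate(table)

    def normalize(self, text: str, steps: List[Step]) -> str:
        for step in steps:
            # Bare strings carry no options; tuples carry keyword arguments.
            name, kwargs = (step, {}) if isinstance(step, str) else step
            # Look the method up by name and feed it the previous step's output.
            text = getattr(self, name)(text, **kwargs)
        return text


normalizer = ToyNormalizer()
sample = "Who    let   the cucco out?"
print(repr(normalizer.normalize(
    sample, ["remove_extra_whitespaces", ("replace_punctuation", {"replacement": " "})])))
# 'Who let the cucco out '
print(repr(normalizer.normalize(
    sample, [("replace_punctuation", {"replacement": " "}), "remove_extra_whitespaces"])))
# 'Who let the cucco out'
```

In this toy version, order matters at the edges: when whitespace is trimmed before `?` becomes a space, the trailing space survives; swapping the steps removes it.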

Finally, if you only need to apply one normalization, use one of these methods:

  • remove_accent_marks
  • remove_extra_whitespaces
  • remove_stop_words
  • replace_characters
  • replace_emails
  • replace_emojis
  • replace_hyphens
  • replace_punctuation
  • replace_symbols
  • replace_urls
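To give a feel for what some of these single-purpose methods do, here are standalone sketches of three of them using only the standard library. The function names mirror the list, but the signatures, the `replacement` defaults and the deliberately simple regular expressions are assumptions made for this example; cucco's own patterns and parameters may differ.

```python
import re
import unicodedata

# Deliberately simple, hypothetical patterns; real-world URL and email
# matching is considerably messier.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def remove_accent_marks(text: str) -> str:
    # Decompose characters (e.g. 'ó' -> 'o' + combining acute accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def replace_urls(text: str, replacement: str = "") -> str:
    # Swap anything that looks like an http(s) or www URL for `replacement`.
    return URL_RE.sub(replacement, text)


def replace_emails(text: str, replacement: str = "") -> str:
    # Swap anything shaped like user@domain.tld for `replacement`.
    return EMAIL_RE.sub(replacement, text)


print(remove_accent_marks("Canción pingüino"))  # Cancion pinguino
print(replace_urls("see https://github.com/davidmogar/cucco now", "<URL>"))  # see <URL> now
print(replace_emails("mail cucco@example.com today", "<EMAIL>"))  # mail <EMAIL> today
```

NFKD also folds compatibility characters (for example the "ﬁ" ligature becomes "fi"); NFD would strip accents without that extra folding.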

Supported languages

You never know when a cucco will learn a new trick, but at the moment they can remove stop words in these thirteen languages:

Language     Code
Danish       da
Dutch        nl
English      en
Finnish      fi
French       fr
German       de
Hungarian    hu
Italian      it
Norwegian    no
Portuguese   pt
Russian      ru
Spanish      es
Swedish      sv
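Since the `language` argument takes one of these codes, you might want to validate user input before building a `Cucco` instance and fail early with a readable error. The sketch below assumes exactly that; `STOP_WORD_LANGUAGES` and `check_language` are hypothetical names for this example and are not part of cucco, whose internal lookup may work differently.

```python
# Language codes from the table above, mapped to their English names.
STOP_WORD_LANGUAGES = {
    "da": "Danish", "de": "German", "en": "English", "es": "Spanish",
    "fi": "Finnish", "fr": "French", "hu": "Hungarian", "it": "Italian",
    "nl": "Dutch", "no": "Norwegian", "pt": "Portuguese", "ru": "Russian",
    "sv": "Swedish",
}


def check_language(code: str) -> str:
    """Return the language name for `code`, or raise ValueError if unsupported."""
    try:
        # Accept codes in any case, e.g. 'PT' as well as 'pt'.
        return STOP_WORD_LANGUAGES[code.lower()]
    except KeyError:
        supported = ", ".join(sorted(STOP_WORD_LANGUAGES))
        raise ValueError(
            f"unsupported language {code!r}; expected one of: {supported}"
        ) from None


print(check_language("pt"))  # Portuguese
```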

Can I contribute?

Are you a breeder? No? Don't worry, you can still help.

Maybe you have a good new feature to add. Maybe it's not even good. It doesn't matter! It is always good to share ideas, isn't it? Just go for it! Pull requests are warmly welcome.

Not in the mood to implement it yourself? You can still open an issue and tell us about it there. Feedback is always great!

Contributors

armaggedon, davidmogar, luzfcb, xecgr