vitorluizc / normalize-text
📝 Provides simple functions to normalize texts, whitespaces, paragraphs & diacritics.
License: MIT License
I have been thinking about this lib, and there is a growing need to handle emojis effectively in text normalization. This feature would convert emojis into their corresponding textual descriptions, making the text easier to understand and analyze, especially when processing social media content or informal communications.
Use Case:
Often, emojis are used in texts to convey emotions or actions that are not captured by plain text. Normalizing these into words can aid in sentiment analysis, text-to-speech applications, and in contexts where emojis are not supported or are less meaningful.
Implementation Idea:
We could create a mapping of commonly used emojis to their respective descriptive phrases. The normalization function should then detect these emojis in the text and replace them with the mapped phrases.
The Gitmoji project could serve as a reference, because it lists every emoji and code that can be used in commit messages, and this feature could adapt that list to its own context (e.g. they use :bug: as the emoji for commits that fix bugs; maybe :insect: or something similar could be used in its place). GitHub also has its own text-to-emoji cheat sheet.
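The mapping idea above can be sketched as follows. This is only a minimal illustration, not part of normalize-text: the function name `normalizeEmojis`, the `EMOJI_DESCRIPTIONS` table, and the three entries in it are assumptions made for the example. A real table would be much larger and could be derived from a list such as Gitmoji's.

```javascript
// Hypothetical, deliberately tiny emoji-to-description table (assumption).
const EMOJI_DESCRIPTIONS = new Map([
  ['😀', 'grinning face'],
  ['🐛', 'bug'],
  ['🎉', 'party popper'],
]);

// Replaces every mapped emoji with its description, then collapses the
// extra spaces introduced around the inserted words.
function normalizeEmojis(text, descriptions = EMOJI_DESCRIPTIONS) {
  let result = text;
  for (const [emoji, description] of descriptions) {
    result = result.split(emoji).join(` ${description} `);
  }
  return result.replace(/\s+/g, ' ').trim();
}

console.log(normalizeEmojis('Fixed it 🐛🎉'));
// prints "Fixed it bug party popper"
```

Emojis that are not in the table are left untouched, so the caller decides how complete the mapping needs to be.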
Potential Challenges:
Benefits:
I believe this feature would be a valuable addition to the 'normalize-text' project, helping people who want to support apps that receive emoji codes and handle the emoji as needed.
Hi,
I am currently working on a project developed with the NestJS framework and TypeScript, and when I try to use the normalizeText() function I get the error TypeError: Cannot read property 'normalize' of undefined.
The full error message is as follows:
TypeError: Cannot read property 'normalize' of undefined
at normalizeDiacritics (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\node_modules\normalize-text\src\normalizeDiacritics.js:12:16)
at C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\node_modules\@bitty\pipe\src\pipe.js:23:53
at Array.reduce (<anonymous>)
at Object.<anonymous> (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\node_modules\@bitty\pipe\src\pipe.js:23:20)
at new User (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\src\users\users.entity.ts:82:44)
at EntityMetadata.Object.<anonymous>.EntityMetadata.create (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\src\metadata\EntityMetadata.ts:527:23)
at EntityMetadataValidator.Object.<anonymous>.EntityMetadataValidator.validate (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\src\metadata-builder\EntityMetadataValidator.ts:118:47)
at C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\src\metadata-builder\EntityMetadataValidator.ts:46:56
at Array.forEach (<anonymous>)
at EntityMetadataValidator.Object.<anonymous>.EntityMetadataValidator.validateMany (C:\Users\jmcandia\Repositorios\proyecto-morpheus\morpheuscrm-api\src\metadata-builder\EntityMetadataValidator.ts:46:25)
The version of normalize-text I am using is 2.3.2.
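The error message indicates that a normalize function received `undefined` instead of a string, and the stack trace shows the call coming from the `User` constructor while entity metadata was being built. One possible workaround on the caller's side, assuming the value can legitimately be missing at that point, is to guard the call. The helper `withStringGuard` and the stand-in normalizer `collapseSpaces` below are hypothetical names for this sketch and are not part of normalize-text.

```javascript
// Wraps any string normalizer so that non-string input (undefined, null,
// numbers, ...) yields a fallback instead of throwing a TypeError.
function withStringGuard(normalize, fallback = '') {
  return (value) => (typeof value === 'string' ? normalize(value) : fallback);
}

// Stand-in normalizer used only for this example (assumption).
const collapseSpaces = (text) => text.replace(/\s+/g, ' ').trim();

const safeCollapseSpaces = withStringGuard(collapseSpaces);

console.log(safeCollapseSpaces(undefined)); // prints ""
console.log(safeCollapseSpaces('  a   b ')); // prints "a b"
```

The same wrapper could be applied to the function imported from the library before it is used inside the constructor.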
Add a function to capitalize words like a name.
import { normalizeName } from 'normalize-text';
normalizeName('fernanda montenegro') // 'Fernanda Montenegro'
normalizeName('leonardo matos nascimento ') // 'Leonardo Matos Nascimento'
normalizeName('ALOÍSIO NUNES') // 'Aloísio Nunes'
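A minimal sketch of how the requested function could behave, assuming it only trims, collapses whitespace, and capitalizes the first letter of each word. `normalizeName` does not exist in the library at the time of this request, so the implementation below is purely illustrative; it does not treat particles such as "da" or "de" specially.

```javascript
// Capitalizes each whitespace-separated word and lowercases the rest.
function normalizeName(name) {
  return name
    .trim()
    .split(/\s+/)
    .filter(Boolean) // drops the empty entry produced by an empty input
    .map(
      (word) =>
        word.charAt(0).toLocaleUpperCase() + word.slice(1).toLocaleLowerCase()
    )
    .join(' ');
}

console.log(normalizeName('fernanda montenegro')); // prints "Fernanda Montenegro"
console.log(normalizeName('ALOÍSIO NUNES')); // prints "Aloísio Nunes"
```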
It would be nice to provide tagged template string functions.
Something like:
normalizeWhiteSpaces`
Hi, ${student.name}!
How are you doing?
`;
//=> "Hi, Vitor! How are you doing?"
It seems this wouldn't be hard to do with the current normalize functions, but maybe they should be scoped in a different way.
Some implementation details in the gist below:
https://gist.github.com/VitorLuizC/f80a9b51c3a686e5b5e35259bbf263fa
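For illustration, a tag function along these lines could be sketched as below. This is not the implementation from the gist: the name `normalizeWhiteSpacesTag` is hypothetical, and the whitespace handling (collapse runs of whitespace to one space, then trim) is an assumption based on the expected output shown in the example above.

```javascript
// Tag function: interleaves the literal chunks with the interpolated
// values, then collapses whitespace and trims the result.
function normalizeWhiteSpacesTag(strings, ...values) {
  const text = strings.reduce(
    (acc, chunk, index) =>
      acc + chunk + (index < values.length ? String(values[index]) : ''),
    ''
  );
  return text.replace(/\s+/g, ' ').trim();
}

const student = { name: 'Vitor' };

const greeting = normalizeWhiteSpacesTag`
  Hi, ${student.name}!
  How are you doing?
`;

console.log(greeting); // prints "Hi, Vitor! How are you doing?"
```

Keeping the tag as a separate function avoids changing the signature of the existing string-based function, which is one way to address the scoping concern.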
uncouple
dependency.compose
function.
Node: v14.15.4 | OS: Linux (Arch) | normalize-text: v2.3.1
normalizeDiacritics unexpectedly removes the letter ß:
| Input | "Amélie plays Fußball" |
|---|---|
| Expected | "Amelie plays Fußball" or "Amelie plays Fussball" |
| Actual | "Amelie plays Fuball" |
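Either expected result can be produced by removing only Unicode combining marks after canonical decomposition, since ß has no decomposition and is not a combining mark. The sketch below is not the library's code; the function name `stripDiacritics` and the `expandEszett` option are hypothetical and exist only to show both expected outputs.

```javascript
// Removes combining diacritical marks (U+0300 to U+036F) after NFD
// decomposition. Letters such as ß are kept, or optionally expanded.
function stripDiacritics(text, { expandEszett = false } = {}) {
  const stripped = text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  return expandEszett ? stripped.replace(/ß/g, 'ss') : stripped;
}

console.log(stripDiacritics('Amélie plays Fußball'));
// prints "Amelie plays Fußball"
console.log(stripDiacritics('Amélie plays Fußball', { expandEszett: true }));
// prints "Amelie plays Fussball"
```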