photogabble / php-confusable-homoglyphs

A PHP port of vhf/confusable_homoglyphs

Home Page: https://photogabble.co.uk/projects/confusable-homoglyphs/

License: MIT License

php-confusable-homoglyphs's Introduction

Confusable Homoglyphs

About this package

Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to be impersonated by a trickster who deliberately chose the username ΑlaskaJazz. (The first letter of the second name is the Greek capital letter alpha, not the Latin capital letter A.)
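
To make the difference concrete, the following sketch lists the first code point of each username. It assumes the mbstring extension is enabled; the helper name codePoints is made up for this illustration and is not part of this library.

```php
<?php declare(strict_types=1);

// Hypothetical helper (not part of this library): returns the code points
// of a UTF-8 string in "U+XXXX" notation. Requires the mbstring extension.
function codePoints(string $text): array
{
    $points = [];
    // Split into single UTF-8 characters; fall back to an empty list on invalid input.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        $points[] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $points;
}

$genuine  = 'AlaskaJazz';         // starts with LATIN CAPITAL LETTER A
$imposter = "\u{0391}laskaJazz";  // starts with GREEK CAPITAL LETTER ALPHA

var_dump($genuine === $imposter);        // bool(false)
echo codePoints($genuine)[0], PHP_EOL;   // U+0041
echo codePoints($imposter)[0], PHP_EOL;  // U+0391
```

The two strings render almost identically in many fonts, yet a plain string comparison treats them as different usernames, which is why a uniqueness check alone does not prevent this kind of impersonation.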

This is a complete port of the Python library vhf/confusable_homoglyphs to PHP. I found myself needing its functionality after reading this article by James Bennett on validating usernames and how django-registration does so.

A huge thank you goes to the Python package creator Victor Felder and its contributors Ryan Kilby and muusik; without their work this port would not exist.

This library is compatible with PHP versions 7.3 and above.

Install

Install this library with Composer:

composer require photogabble/php-confusable-homoglyphs

Usage

Please see the tests for detailed examples of usage.

Known Usage

Is the data up to date?

This project currently ships with Unicode Consortium public data version 10.0.0.

The Unicode block aliases and names for each character are extracted from this file provided by the Unicode Consortium.

The matrix of which characters can be confused with which other characters is built using this file provided by the Unicode Consortium.
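
As an illustration of how such a matrix could be derived, the sketch below parses single data lines in the format used by the Unicode Consortium's confusables.txt: source code points, target code points and a type, separated by semicolons, followed by a # comment. The function names are hypothetical, the sample lines are written out by hand, and this is not the generator shipped with this project. It assumes the mbstring extension is enabled.

```php
<?php declare(strict_types=1);

// Hypothetical helper: turns "0072 006E" into the UTF-8 string "rn".
function hexSequenceToString(string $hexSequence): string
{
    $chars = '';
    foreach (preg_split('/\s+/', trim($hexSequence), -1, PREG_SPLIT_NO_EMPTY) as $hex) {
        $chars .= mb_chr((int) hexdec($hex), 'UTF-8');
    }
    return $chars;
}

// Hypothetical helper: parses one line of confusables data.
// Returns null for comments, blank lines and anything without two fields.
function parseConfusableLine(string $line): ?array
{
    $data = trim(explode('#', $line, 2)[0]);   // drop the trailing comment
    $fields = array_map('trim', explode(';', $data));
    if (count($fields) < 2 || $fields[0] === '' || $fields[1] === '') {
        return null;
    }
    return [
        'source' => hexSequenceToString($fields[0]),
        'target' => hexSequenceToString($fields[1]), // may be several characters
    ];
}

$single = parseConfusableLine("0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES, LATIN SMALL LETTER C");
$multi  = parseConfusableLine("006D ;\t0072 006E ;\tMA\t# LATIN SMALL LETTER M, LATIN SMALL LETTER R + N");

echo $single['target'], PHP_EOL; // c
echo $multi['target'], PHP_EOL;  // rn
```

Note that the target field can hold more than one code point, so any code that consumes these pairs has to treat both keys and values as strings rather than single characters.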

The version this project currently ships with was generated on 17 February 2022.

php-confusable-homoglyphs's People

Contributors

carbontwelve

php-confusable-homoglyphs's Issues

Issue with the json file? Confusables for characters 'm' and 'w'

Something seems off with just those characters in the json file.

Repro:

<?php declare(strict_types=1);

$all = file_get_contents('confusables.json');
$all = json_decode($all, true);

print_r($all['a']); // this is fine, prints out array of 23 confusables

print_r($all['m']); // PROBLEM: Only prints one confusable, {"c":"rn","n":"LATIN SMALL LETTER R, LATIN SMALL LETTER N"}

print_r($all['w']); // PROBLEM: Only prints one confusable, {"c":"vv","n":"LATIN SMALL LETTER V, LATIN SMALL LETTER V"}

What both of these have in common is that the only confusable happens to be a double char: m has rn and w has vv, so maybe there is a bug in the generation of this file that doesn't know about multi-character confusables?

Here's a link showing actual confusables for M and W, which I would expect to be in this JSON file:

https://util.unicode.org/UnicodeJsps/confusables.jsp?a=manwe&r=None

Breaks when string contains a zero-width character

Calling the isConfusable method with the input string www.microsоft.com and a preferredAliases of ['latin'] returns the following error: Undefined Index: Confusable.php:146.

This happens because the input string contains the zero-width character U+FEFF (decimal 65279). Incidentally, this zero-width character is correctly categorised under the common alias; however, on line 146 of Confusable.php it is essentially an empty string, and the array key lookup fails because the index does not exist.

If the index does exist in the JSON source, then maybe it needs converting so that the index is the escaped Unicode code point in ASCII string form; if so, this can be folded into #1.
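
One way to avoid this class of error is to check that a key exists before reading it. The sketch below is not the library's implementation: the table is a tiny hypothetical stand-in for the decoded confusables JSON, the helper name findConfusables is invented, and the U+FEFF character is appended to the input purely for illustration.

```php
<?php declare(strict_types=1);

// Hypothetical stand-in for the decoded confusables JSON: a single entry,
// keyed by CYRILLIC SMALL LETTER O, which resembles LATIN SMALL LETTER O.
$table = [
    "\u{043E}" => [['c' => 'o', 'n' => 'LATIN SMALL LETTER O']],
];

// Hypothetical helper: collects the table entries for each character of
// $input and skips characters that have no entry, instead of triggering
// an "Undefined index" notice.
function findConfusables(array $table, string $input): array
{
    $found = [];
    $chars = preg_split('//u', $input, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($chars as $char) {
        if (isset($table[$char])) {
            $found[$char] = $table[$char];
        }
    }
    return $found;
}

// Cyrillic "o" inside the host name, ZERO WIDTH NO-BREAK SPACE at the end.
$input = "www.micros\u{043E}ft.com\u{FEFF}";
$found = findConfusables($table, $input);

echo count($found), PHP_EOL; // 1
```

With the guard in place, the zero-width character is skipped and only the Cyrillic letter is reported. Whether characters without an entry should be ignored or flagged separately is a design decision for the library.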
