Giter Site home page Giter Site logo

utf8_maltese_ci's Introduction

Maltese MySQL Collation (utf8_maltese_ci)

By John J. Camilleri

About

This is a custom collation for MySQL which correctly sorts strings according to the Maltese alphabet:

a b ċ d e f ġ g għ h ħ i ie j k l m n o p q r s t u v w x ż z

MySQL's utf8_unicode_ci collation treats g and ġ etc. as interchangeable, and utf8_bin places ċ, ġ, ħ, ż after the letter z — neither of which is correct. This collation was written to provide a solution to this.

Notes

  • Prior to version MySQL 5.6, collations cannot handle double character sequences. So, is treated as two separate letters and is sorted after the character sequence gh, which is strictly incorrect. This also means that ie would be sorted after io, which again is wrong.

  • I have only tested this with data in Maltese and English. Compatibility with characters from other languages may not be what you expect. Let me know if you find issues when your data contains other characters which don't sort well with respect to Maltese under this collation.

Installation

Requirements

  • MySQL 5.1+
  • Root/administritive access on the machine where MySQL is running
  • Basic command line, text-editor, and MySQL skills

Installing the collation is relatively simple. It doesn't require recompiling anything, however you will require administrative access on your machine. Adding a new collation does not affect any existing tables; you need to explicitly specify the collation in your create statements and/or queries in order to benefit from it.

1. Preparation

  1. Identify your exact MySQL version with the command SHOW VARIABLES LIKE 'version'
  2. Find an available collation ID on your MySQL server by following the steps here: 5.1, 5.5, 5.6.
    The IDs I chose for the Maltese collation are 225 and 1356 for MySQL 5.1 and 5.5+ respectively. However you should make sure the chosen ID is not in use on your system by running the following:
    SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID=1356
    and making sure it turns up no results.
  3. Find the location of MySQL's Index.xml file on your system with the command
    SHOW VARIABLES LIKE 'character_sets_dir'
    On an Ubuntu system this returns /usr/share/mysql/charsets/

2. Installation

Script-based installation

An install script is provided which will automatically patch your Index.xml file for you (you will first need the details above). Here is an example of it in use:

$ make install
sudo ./install.py
[sudo] password for user: 
Script for installing utf8_maltese_ci in MySQL
MySQL version (5.5): ↵
Location of charsets file (/usr/share/mysql/charsets/Index.xml): ↵
Backup existing file [Y/n]? ↵
Backed up to Index.xml.bak
Done

Notes about the install script:

  • You will require Python ≥ 2.5
  • If you do not have Python's lxml library installed, you will lose any XML comments in your Index.xml file.

There is also a corresonding uninstall script, which can be invoked using make uninstall.

Manual installation

If the script above doesn't work for you (or you just want to do things manually) follow these steps:

  1. Open the Index.xml file in a text editor (you will need to be root).
  2. Copy the Maltese <collation ...> section from the correct utf8_maltese_ci-mysql_5.x.xml file for your version of MySQL.
  3. Paste the copied XML into the <charset name="utf8">...</charset> section of your Index.xml file.
  4. Save the file and exit.

3. Restart MySQL

On a Unix system you can usually restart MySQL in one of the following ways:

  • service mysql restart
  • service mysqld restart
  • /etc/init.d/mysql restart
  • /etc/init.d/mysqld restart

4. Testing

Quick test

To make sure the collation has been registered with MySQL, you can execute the supplied test.py script:

$ ./test.py
Script for checking that utf8_maltese_ci is registered in MySQL
MySQL hostname (localhost): ↵
Username (root): ↵
Password (): mypassword↵
OK

Alternatively, you can manually perform the quick test by executing the following query:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME='utf8_maltese_ci'

Full test

To test that the collation actually behaves as correctly, one should try it out on some test data:

  1. Create a simple test table by running test-table.sql
  2. Run the following query and check how the results are sorted as you'd expect:
    SELECT s FROM maltese_collation_test ORDER BY s ASC COLLATE 'utf8_maltese_ci'

utf8_maltese_ci's People

Contributors

johnjcamilleri avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.