UTF8 support (gdevelop issue, closed)

4ian avatar 4ian commented on April 28, 2024
UTF8 support

from gdevelop.

Comments (47)

4ian avatar 4ian commented on April 28, 2024

I don't know yet how to handle Unicode properly. If a custom string class is really needed, I'd prefer it to be as lightweight as possible, or even a typedef aliasing sf::String (or wxString).

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Unicode as UTF-8 can be stored in a std::string. But std::string::size() and all operations on individual chars become invalid, as a UTF-8 character can take 1, 2, 3 or 4 chars depending on its code point.
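
To illustrate the size() problem described above, here is a minimal sketch that counts code points in a UTF-8 std::string. The name CountCodePoints is hypothetical (it is not part of GDevelop or UTF8-CPP), and the function assumes the input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 encoded std::string by skipping
// continuation bytes (bytes of the form 10xxxxxx).
// Hypothetical helper; assumes the input is well-formed UTF-8.
std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80) ++count; // lead byte or ASCII byte
    }
    return count;
}
```

For the three-character word "été", written as the bytes "\xC3\xA9" "t" "\xC3\xA9", size() reports 5 while this function reports 3.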

victorlevasseur avatar victorlevasseur commented on April 28, 2024

We can use sf::String directly, but remember that it stores the string as UTF-32, which can be a bit huge in memory (4 times bigger than an ASCII std::string, with UTF-8 somewhere between the two depending on the characters used in the string). On the other hand, UTF-32 has advantages: it is easier to manipulate and gives faster access to single characters. Finally comes the question of storage: we can convert it to a UTF-8 std::string before saving (I've already made some tests with tinyxml, which successfully loads and saves UTF-8 characters without any loss) and do the opposite operation when loading.

This change also implies a lot of modifications in extensions: replacing std::string with sf::String for the text expressions.
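
As a rough illustration of the memory trade-off and of the "opposite operation" on load, the sketch below decodes a UTF-8 std::string into a std::u32string, used here as a stand-in for sf::String. DecodeUtf8 is a hypothetical name, and the function performs no validation, so it assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes a UTF-8 std::string into UTF-32, one char32_t per code point.
// Hypothetical helper; no validation, assumes well-formed UTF-8.
std::u32string DecodeUtf8(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i)
        {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

A string made of "a", "é" and "€" takes 6 bytes in UTF-8 (1 + 2 + 3) and 12 bytes in UTF-32 (3 code points of 4 bytes each), but in the UTF-32 form the second character is simply element [1].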

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I'm currently doing some tests on GDevelop for UTF8 (on my utf8 branch, based on c++11).

It's working fine: JS code works perfectly without changes (I just needed to add the charset to index.html) and features like text length are working correctly. I'm able to display almost all Unicode characters.

On GDCore/GDCpp, I'm able to store UTF8 inside a std::string, and I've created some tools for conversions between "UTF8 std::string", sf::String (UTF32) and wxString. These functions need to be used everywhere a conversion between wxString and std::string happens, because otherwise you'll get an ASCII std::string. Functions like string length are also invalidated (because of multi-byte characters), but UTF8-CPP provides functions to get the real size of the string.
Note that I am not able to show all Unicode characters in GDevelop and GDCpp games because of the font (web browsers seem able to fall back to another font if the character is not defined).

4ian avatar 4ian commented on April 28, 2024

You're doing an awesome job as usual! 😄 I've checked your branch: the file organisation and the naming scheme look good and consistent with the rest of the codebase (gd::utf8::xxx is perfect, as it makes it easy to find where utf8 is used in the code).
So, so far so good! 👍

I think we don't need to implement utf8 support "everywhere": the most important seems to be the Text object :)
When you're done with it, I'll merge it (I also have to merge the c++11 branch before that, by the way) and release a test version to make sure that the move to C++11 and utf8 didn't break anything for end users. :)

victorlevasseur avatar victorlevasseur commented on April 28, 2024

In fact, it would break strings using characters outside traditional ASCII (I mean from 128 to 255).
I don't agree that only the text object should have UTF8 support: expressions should be UTF8 strings. That way, you can set the text object content with a new UTF8 string.
It's already working on the version available on the utf8 branch (but it needs heavy testing). However, there are some weird things due to the way actions/conditions are declared: I need to show instruction sentences as non-UTF8 strings (as they are in the user's locale) but the parameters inside as UTF8 strings (as they may contain special characters). That's why the method EventsRenderingHelper::DrawTextInArea has a new bool parameter to determine whether the string is encoded in UTF8 or in the user's locale.

victorlevasseur avatar victorlevasseur commented on April 28, 2024

To keep compatibility with extensions which rely on a "locale" std::string (for example, when loading a file using its filepath), I've added a bool inside gd::ParameterMetadata to say whether the parameter accepts a UTF8 string or wants a locale string. If the parameter is not able to receive UTF8 input (default behavior), the result of the string expression is converted from UTF8 to the current locale. (Not used in JS.)
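
The per-parameter flag described above could look roughly like the following sketch. ParameterInfo, Converter and PrepareArgument are hypothetical stand-ins, not the actual gd::ParameterMetadata API, and the UTF8-to-locale conversion is passed in as a callback so that the example stays independent of any real conversion routine.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for the flag described above;
// this is not gd::ParameterMetadata.
struct ParameterInfo
{
    bool acceptsUtf8 = false; // default: the parameter expects a locale string
};

// Any function converting a UTF-8 string to the current locale encoding.
using Converter = std::function<std::string(const std::string&)>;

// Returns the value to hand to the instruction: untouched if the parameter
// accepts UTF-8, converted to the locale encoding otherwise.
std::string PrepareArgument(const ParameterInfo& parameter,
                            const std::string& utf8Value,
                            const Converter& utf8ToLocale)
{
    return parameter.acceptsUtf8 ? utf8Value : utf8ToLocale(utf8Value);
}
```

With this shape, the conversion only runs for parameters that did not opt in to UTF8, which is where the runtime cost of the scheme would come from.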

4ian avatar 4ian commented on April 28, 2024

Well, you surely know better than me what should be done, but I'm a bit afraid that lots of conversions from UTF8 to the current locale will happen at runtime, which could be bad for performance. What do you think about it?
(Unrelated, but I saw gd::utf8::SubStr and I think it would be nice to add one or two test cases for it :) It is a perfect candidate for tests.)
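
As an idea of what such test cases might cover, here is a sketch of a substring function that works on code points rather than bytes. SubStrUtf8 is a hypothetical re-implementation written for this example; the real gd::utf8::SubStr may have a different signature and behavior. It assumes well-formed UTF-8 and that start + length does not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns `length` code points starting at code point index `start`.
// Hypothetical helper, not the actual gd::utf8::SubStr.
std::string SubStrUtf8(const std::string& utf8, std::size_t start, std::size_t length)
{
    const std::size_t stop = start + length; // first code point NOT included
    std::size_t index = 0;                   // index of the current code point
    std::size_t beginByte = utf8.size();
    std::size_t endByte = utf8.size();
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        // Continuation bytes (10xxxxxx) never start a code point.
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) continue;
        if (index == start) beginByte = i;
        if (index == stop) { endByte = i; break; }
        ++index;
    }
    return utf8.substr(beginByte, endByte - beginByte);
}
```

Useful cases to check are a range that ends right after a multi-byte character, a range that starts on one, a length that runs past the end of the string, and a start index beyond the end.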

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Maybe there will be fewer conversions if all instructions use UTF8 by default and we deactivate it for special ones?
Or we can adapt the instructions that don't support UTF8 to use utf8::To/FromLocaleString.

4ian avatar 4ian commented on April 28, 2024

In which cases will using UTF8 break the instructions?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I've only noticed problems with actions using a filepath. For Common dialogs, I've added a conversion function between UTF8 and std::wstring (Windows can use UTF16 wide chars).
Other actions seem to be working properly: even variables with things like myVar["Unicode string here"] work 😄 .
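
For reference, a UTF8 to UTF16 conversion of the kind mentioned above can be sketched as follows. This is not the function from the branch: Utf8ToUtf16 is a hypothetical name, std::u16string is used instead of std::wstring so that the example behaves the same on every platform (wchar_t is only 16 bits wide on Windows), and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts well-formed UTF-8 to UTF-16 (no validation is performed).
// Code points above U+FFFF are emitted as a surrogate pair.
// Hypothetical helper written for illustration only.
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        char32_t cp = lead;
        std::size_t extra = 0; // continuation bytes still to read
        if (lead >= 0xF0)      { cp = lead & 0x07; extra = 3; }
        else if (lead >= 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xC0) { cp = lead & 0x1F; extra = 1; }
        for (; extra > 0 && i < utf8.size(); --extra, ++i)
        {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000; // split the remaining 20 bits over two code units
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, the resulting code units could then be copied into a std::wstring before calling the wide-character variants of the WinAPI functions.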

4ian avatar 4ian commented on April 28, 2024

Sounds good! So I think the right approach for instructions is to let everything in UTF8 unless there is a problem (like with file path) in which case we have to add a manual conversion.
This way we avoid lots of potentially costly conversions, and the only changes to be made in the code are in functions dealing with filepath.

myVar["Unicode string here"] will be awesome! 😄
Can I try your UTF8 branch or is there still some work in progress? :)

EDIT: Another question: does TinyXML work well with game files containing UTF8? I remember testing with JSON game files and it was working well (because the JSON serializer just writes the strings as they are in memory, without changes or conversion, so if the string contains UTF8 it will be written without problem).

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Good, I've made these changes (just call CantUseUtf8() after declaring a parameter or DontReturnUtf8() after a string expression).

You can try, but I can't guarantee that an old ASCII project can be opened without problems with this version (I'll look at that problem later, maybe by converting the whole project to UTF8 before opening it, according to the project file version?).

I'm currently working on the CommonDialogs extension: it works nicely (with the Windows Unicode functions and wchar_t*).

victorlevasseur avatar victorlevasseur commented on April 28, 2024

So, good news, everything is working fine for CommonDialogs (only tested on Windows). The WinAPI is so cool (irony 😄).
By the way, we can even use a Unicode string as a plain variable name.

4ian avatar 4ian commented on April 28, 2024

Nice! I'm really busy these days so sorry for not being able to be totally at work on it!
Are you making other changes?

I tested the branch yesterday and I didn't notice any problem with already existing games :) We'll have to make some tests to ensure that games saved on Windows are correctly opened on Ubuntu (for now, strings with non-ASCII characters (like French diacritics: é...) in games made on Windows are interpreted as empty strings).

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I still need to adapt some "String manipulation" expressions and make some tests. I'm also creating a group in the doxygen documentation to explain how to use UTF8 strings and where they are used.

4ian avatar 4ian commented on April 28, 2024

Excellent, I'm highly interested in these documentation pages, to make it clear for everyone how things work!

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I've fixed a crash when opening an ANSI project containing locale-dependent characters (the crash was inside utf8-cpp, because of invalid code points when trying to convert to sf::String or wxString).
So, in the last commit, when GD opens a project file with "ISO-8859-1" in its declaration, it converts the whole file to UTF8 (using gd::utf8::FromLocaleString) into myProject.gdg.utf8, then opens it (as if it were myProject.gdg).
EDIT: Still having an issue, as UTF8 projects are detected as ANSI projects...
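
The detect-then-convert step described above can be sketched as below, under two loud assumptions: the legacy file really is ISO-8859-1 (the actual code uses gd::utf8::FromLocaleString, which depends on the current locale), and the encoding name appears in exactly that spelling in the XML declaration. DeclaresLatin1 and Latin1ToUtf8 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the XML declaration (everything before the first "?>")
// mentions ISO-8859-1. Hypothetical helper, not GDevelop's actual check;
// the comparison is case-sensitive.
bool DeclaresLatin1(const std::string& xml)
{
    std::size_t end = xml.find("?>");
    if (end == std::string::npos) return false;
    return xml.substr(0, end).find("ISO-8859-1") != std::string::npos;
}

// Re-encodes ISO-8859-1 bytes as UTF-8. Every Latin-1 byte maps to the
// code point of the same value, so bytes >= 0x80 become two UTF-8 bytes.
std::string Latin1ToUtf8(const std::string& latin1)
{
    std::string out;
    for (unsigned char byte : latin1)
    {
        if (byte < 0x80)
        {
            out += static_cast<char>(byte);
        }
        else
        {
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}
```

Relying on the declaration alone is fragile: a file that declares ISO-8859-1 but already contains UTF8 bytes would be converted a second time.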

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I've fixed the issue. Also, GDevelop now forces the encoding when loading a project, to prevent TinyXML from auto-detecting the wrong encoding (some extensions still save ANSI strings and TinyXML might think that the file is an ANSI file).

EDIT: It's still more complicated than this: on Linux, projects were already saved in UTF8 (as the locale is UTF8)...

victorlevasseur avatar victorlevasseur commented on April 28, 2024

So, it's a bit weird. Old project files were already saved in UTF8 on Linux, so I've disabled the conversion on Linux. But then you can't correctly open a Windows ANSI project in GDevelop on Linux (we can't use the "locale string -> UTF8" conversion, as the locale is already UTF8): I've added a message saying that the user should convert the project file to UTF8 using an external text editor before opening it with GDevelop. :(

EDIT: The good news is that GDevelop projects are now totally compatible between Windows and Linux.

4ian avatar 4ian commented on April 28, 2024

I guess that it is ok, as people opening a project saved on Windows on Ubuntu are quite rare, and as long as they save it again with the new version on Windows it will be ok! So the warning message is good enough :)
Almost all users shouldn't notice the change, but I'll increase the minor version number and warn that projects saved with the next version might not be backward compatible.

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Some news: I'm currently adding a new macro, GD_T, which returns a UTF8-encoded string from a translation. It's defined in GDCore/Tools/Localization.h. Now, _() is no longer redefined as it was before: it returns a wxString and should only be used in GUI code.

4ian avatar 4ian commented on April 28, 2024

I'm a bit concerned by this extra complexity: why is it necessary?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Because the trick you were using to get std::string from _() was converting them to the current locale.

4ian avatar 4ian commented on April 28, 2024

Ok... To be clear, the rationale behind the current design was to have a single macro for all translations: this macro was set to nothing when compiling for the GDC++ Runtime, meaning that all runtime sentences are in English (that's fine because at runtime almost all translations are for logs and not for the end user). When compiling for the IDE, the macro was set to the standard wx macro to support translation, except that the result was also converted to a std::string to allow its use in non-GUI (i.e. not wxWidgets-related) classes.
Couldn't we convert to std::string as before, but using the appropriate wx functions for handling UTF8?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

The problem is that those strings (not only the log translations but also all the actions/conditions/expressions strings) are encoded in the current locale. That's different from the other strings (which are in UTF8) and can create errors.

Then, if we redefine _() to get a UTF8 string, the conversion to wxString will not work, as the string will be assumed to be in the current locale.

(By the way, I can't use other extensions in a game on the macos-bundle branch, as it causes linkage errors when compiling a scene.)

4ian avatar 4ian commented on April 28, 2024

So in the end, when should GD_T() and _() be used?

Ok, surely something broke in CodeCompiler when adapting it for Mac OS (for now it's broken on Mac OS, as the linker expects .dylib files and GD extensions are .xgd files).

victorlevasseur avatar victorlevasseur commented on April 28, 2024

GD_T() -> UTF8 std::string (instructions/expressions description, title, object's properties, debugger...)
_() -> wxString (mostly for generated GUI and all wx calls when a wxString is needed)

(You can look at the utf8-tr-macro branch to see the changes. It's not fully working, as some std::string are still converted without utf8::ToWxString.)

4ian avatar 4ian commented on April 28, 2024

Ok, I think you're more advanced on this point than me (I still struggle with these encoding details), so do it as you think is best :)
I just want to keep things as simple as possible (that's why I don't like this GD_T ;) ). Note that in the source code, everything is expected to be written using ANSI characters (not UTF8 source code) in plain English: I don't know if it can be of any help or if it is even related, but it might be worth keeping in mind.

victorlevasseur avatar victorlevasseur commented on April 28, 2024

It would be fantastic if we could tell wxWidgets to consider strings as UTF8 by default (that way, we could define _() with gd::utf8::FromWxString(wxGetText...(s)) and wxWidgets would do the opposite conversion automatically when needed), but this is not possible... That's why GD_T is, I think, the most appropriate solution. I think it could be more dangerous to keep the localized strings in the current locale and the others in UTF8 (and what happens if we want to launch GD in Russian on an English Windows? I've tested it: we get a buggy window without text). Also, there was some complicated stuff when mixing UTF8 and non-UTF8 strings. It was the case when rendering an instruction's text: parameters were UTF8 strings while the other parts of the instruction sentence were in the current locale, and the temporary solution was to add an isUtf8 boolean to tell them apart.
But it is up to you, it's your software. ;)

4ian avatar 4ian commented on April 28, 2024

Well, if you think it's the most appropriate solution, let's go for it. ;) (But as soon as we find a solution to get rid of it, I will be happy to see it disappear from the code ^^)
If I understand correctly, the pitfall with having _() return a std::string is that wxWidgets assumes that every std::string is in the current locale, is that it?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I may have another solution : _() returning a class which can convert to std::string or wxString automatically.

4ian avatar 4ian commented on April 28, 2024

Or (may sound dumb but who knows) a simple function/macro that:
- does nothing (as now) for the GDC++ Runtime.
- returns a std::string that can be converted to a wxString, that is to say, if I understood you correctly, a string that is in the current locale. To do so, I imagine that we can construct a wxString using wxGetTranslation, then convert it to a std::string with the right encoding/locale using a wxString function.

EDIT: In fact, that's almost what is done currently. What I mean is, the "trick" I used may no longer be good with UTF8 as it can mess with the encoding, but if there is a method to properly convert a wxString to a std::string, why not just use this method?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

You mean using gd::utf8::FromWxString to get a UTF8-encoded std::string?
Yes, but when _() is used in the GUI, wxWidgets will automatically convert the std::string to a wxString, assuming that the string is in the current locale (which won't be the case). That's the problem:
wxGetTranslation() (wxString) --- to UTF8 --> std::string --- from locale ---> wxString
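
The effect of that mismatched round trip can be shown without wxWidgets at all. The sketch below assumes, for illustration only, that the current locale is ISO-8859-1 and models the "from locale" step as one byte per character; DecodeAsLatin1 is a hypothetical helper and not how wxWidgets is implemented internally.

```cpp
#include <cassert>
#include <string>

// Interprets every byte as one ISO-8859-1 character, the way a
// "from locale" conversion would on a Latin-1 system (assumption made
// for this example; other locales behave differently).
std::u32string DecodeAsLatin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char byte : bytes)
    {
        out.push_back(static_cast<char32_t>(byte));
    }
    return out;
}
```

The single character "é" is the two bytes C3 A9 in UTF8. Read back as Latin-1, it becomes two characters, U+00C3 and U+00A9 ("Ã©"), instead of U+00E9.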

4ian avatar 4ian commented on April 28, 2024

I mean do somehow:
wxGetTranslation() (wxString) --- to locale --> std::string --- from locale ---> wxString

Is this even feasible?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

That's what is currently done, but the problem is that the std::string is in the current locale. That's not what I want (I need actions/conditions/expressions titles, descriptions and many other things in UTF8), and it can lose a lot of information (a Russian translation on an English computer will not work, for example).

4ian avatar 4ian commented on April 28, 2024

Ok, so in this case continue with what you've done. It's technically correct anyway, so let's build something stable and we'll see later if something can be simplified or not :)

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Ok, the macro is working now. I will update some of the GUI to handle UTF8 strings (for instruction and expression descriptions and titles, property names and descriptions...).

4ian avatar 4ian commented on April 28, 2024

Ok! By the way, is the boolean trick you used for rendering actions/conditions still needed?

victorlevasseur avatar victorlevasseur commented on April 28, 2024

No, because all strings are now in UTF8.

4ian avatar 4ian commented on April 28, 2024

Excellent! :)

4ian avatar 4ian commented on April 28, 2024

Did you have any time to make some progress on it? Stuck on another problem maybe?
Let me know :)

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I don't have any problems. I just didn't have a lot of time because I had a lot of exams last week and this week too.
Anyway, this is a kind of long-term feature: there are a lot of things to do and even more to test.

4ian avatar 4ian commented on April 28, 2024

Ok, no problem, this is perfectly ok :) I was just afraid that you had been discouraged by the task. Good luck with your exams! 😄

victorlevasseur avatar victorlevasseur commented on April 28, 2024

Thanks (but I finished them yesterday ^^). So, now, I'll have more time to develop this feature (and play GTA V ^^).

4ian avatar 4ian commented on April 28, 2024

Cool cool! :D (Ah GTA V, I still have to get it on my PC, the game looks awesome!)

victorlevasseur avatar victorlevasseur commented on April 28, 2024

I think we can close this issue now as UTF8 support is now stable on v4 branch.
