
Support for unicode identifiers

Started by October 26, 2015 05:33 PM
6 comments, last by WitchLord 8 years, 11 months ago

Hi, Andreas.

Would it be possible to add lexer support for identifiers that contain characters from the Unicode "letter" class?

I plan to write a small ORM for a database, and some fields have Cyrillic characters in their names.

In JavaScript, C#, or modern C++ we have no problem writing "someObject.????? = 10", but in AngelScript it is impossible :(

Anything is possible :)

Of course, it wouldn't be trivial to filter out which Unicode characters are actually letters, since there are more than a million code points and more characters are added every year, but perhaps it doesn't really matter. I could simply make the tokenizer accept any byte with a value of 128 or above as a valid byte for identifiers. Of course, this support would only be turned on through an engine property.

I'll look into it.
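To make the idea concrete, here is a minimal sketch of such a byte test. It is not AngelScript's actual tokenizer code: the names isIdentifierByte and identifierLength and the allowUnicode flag are made up for illustration, and the flag only stands in for the opt-in engine property. It relies on the fact that every byte of a multi-byte UTF-8 sequence has its high bit set, so no Unicode category lookup is needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only, not the library's code).
// Returns true if byte c may appear in an identifier.
// isFirst:      true for the first byte, where digits are not allowed.
// allowUnicode: stands in for an opt-in engine property.
bool isIdentifierByte(unsigned char c, bool isFirst, bool allowUnicode) {
    if (c >= 'a' && c <= 'z') return true;
    if (c >= 'A' && c <= 'Z') return true;
    if (c == '_') return true;
    if (!isFirst && c >= '0' && c <= '9') return true;
    // Every byte of a multi-byte UTF-8 sequence is in the range 0x80..0xFF.
    return allowUnicode && c >= 0x80;
}

// Length in bytes of the identifier starting at src[pos], or 0 if there is none.
std::size_t identifierLength(const std::string& src, std::size_t pos,
                             bool allowUnicode) {
    std::size_t n = 0;
    while (pos + n < src.size() &&
           isIdentifierByte(static_cast<unsigned char>(src[pos + n]),
                            n == 0, allowUnicode)) {
        ++n;
    }
    return n;
}
```

The trade-off is that this also accepts non-letter characters such as non-ASCII punctuation or symbols inside identifiers, and it assumes the script is UTF-8 encoded.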

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game


Oh, thanks!

Accepting any byte of 128 or above as valid is good enough for me.

Right now I have to make the same changes to the sources each time in my projects.

I've implemented support for this in revision 2248.

You turn on the support for unicode in identifiers with engine->SetEngineProperty(asEP_ALLOW_UNICODE_IDENTIFIERS, true);

Regards,

Andreas


Big thanks!

Andreas, for my purposes it works very well.

But there is one small problem.

Writing non-English identifiers together with English keywords means switching the keyboard layout frequently.

Would it be possible to register a callback that is called while parsing identifiers, so that it can identify keywords?

Then my users and I would be able to use non-English translations of the keywords.

Alex, I don't think it is feasible. As an alternative, you could use an AutoHotKey script: one that replaces the typed character sequence ":?" with "class", or ":?" with "function", and so on.

EDIT:
As for library methods, the only thing I can think of that would work is an additional layer of indirection: a dictionary mapping translations to English.

Having the tokenizer call a callback for every identifier would probably impact the compiler performance quite a bit for everyone, even those who would not use the callback.

Instead I suggest you modify the CScriptBuilder add-on to translate special identifiers to keywords before passing the script to the compiler. The CScriptBuilder already has logic for doing a pre-compile pass over the script code and changing some things, so it should be quite easy for you to implement that on your own.

Alternatively you can customize the asCTokenizer to translate the special identifiers for you.
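As a rough sketch of the pre-compile translation pass suggested above: the function below rewrites whole identifier words found in a translation table and copies everything else unchanged. It is a standalone illustration, not CScriptBuilder code. The names KeywordMap and translateKeywords are hypothetical, and it assumes UTF-8 source in which any byte of 128 or above counts as part of an identifier. It only skips double-quoted string literals, so comments, single-quoted strings, and heredoc strings would need extra handling.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical table: translated word (UTF-8 bytes) -> script keyword.
using KeywordMap = std::unordered_map<std::string, std::string>;

// Bytes that may be part of a word; bytes >= 0x80 cover multi-byte UTF-8.
static bool isIdentByte(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c >= 0x80;
}

// Returns a copy of src with every whole word found in map replaced.
std::string translateKeywords(const std::string& src, const KeywordMap& map) {
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '"') {
            // Copy double-quoted literals verbatim so their text is untouched.
            std::size_t j = i + 1;
            while (j < src.size() && src[j] != '"') {
                if (src[j] == '\\' && j + 1 < src.size()) ++j; // skip escaped byte
                ++j;
            }
            if (j < src.size()) ++j; // include the closing quote
            out.append(src, i, j - i);
            i = j;
        } else if (isIdentByte(c)) {
            // Collect one whole word and look it up in the table.
            std::size_t j = i;
            while (j < src.size() &&
                   isIdentByte(static_cast<unsigned char>(src[j]))) {
                ++j;
            }
            std::string word = src.substr(i, j - i);
            auto it = map.find(word);
            out += (it != map.end()) ? it->second : word;
            i = j;
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

With a table entry that maps a translated word to "class", a script line starting with that word followed by " Foo {}" comes out as "class Foo {}". Note that a replacement changes the byte length of the line, so column numbers in compiler messages would no longer match the original text.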

