Half-Life and Adrenaline Gamer forum • View topic

Time zone: UTC + 5 hours [ DST ]

UTF-8 in Chat check letters

Page 1 of 3

[ Posts: 21 ]

abdobiskra

Post subject: UTF-8 in Chat check letters

Posted: 11 May 2023, 16:12

Hi, and first of all, welcome back to the forum.

strlen() does not support multi-byte (UTF-8) characters. Is there a similar function that does?
Could I get an example of how strfind() can determine the position of a letter in a word?
And how do I replace a whole word except for its last letter, for example?

_________________
https://vk.com/kgbaghl

Lev

Post subject: Re: UTF-8 in Chat check letters

Posted: 15 May 2023, 12:27

Site Admin

Hi!
https://www.amxmodx.org/api/string/__functions
The documentation says nothing about strlen lacking UTF-8 support.
In fact, strlen works on UTF-8 strings; it returns the length in bytes, not in characters.
To find a letter, call strfind with the same multi-byte string you are looking for. It performs a byte-for-byte match.
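
To make the byte/character distinction concrete, here is a minimal Pawn sketch. utf8_strlen is a hypothetical helper, not an AMXX native, and the example assumes the plugin source is saved as UTF-8 so that the string literals contain UTF-8 bytes. strfind returns a byte offset (or -1 when nothing matches), not a character index.

```pawn
#include <amxmodx>

// Hypothetical helper (not part of AMXX): counts UTF-8 characters by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
stock utf8_strlen(const text[])
{
    new count = 0;
    for (new i = 0; text[i] != EOS; i++)
    {
        if ((text[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

public plugin_init()
{
    register_plugin("UTF-8 length demo", "0.1", "example");

    new const word[] = "سلام";    // Arabic letters take two bytes each in UTF-8
    new const letter[] = "ل";

    server_print("length in bytes: %d", strlen(word));
    server_print("length in characters: %d", utf8_strlen(word));
    server_print("byte offset of the letter: %d", strfind(word, letter));
}
```

The byte offset from strfind can be turned into a character index by running utf8_strlen over the part of the string that precedes the match.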

abdobiskra

Post subject: Re: UTF-8 in Chat check letters

Posted: 16 May 2023, 02:04

- How can I replace all the letters of each word in the chat except the last letter of each word, using the available functions?
- I located the last letter with the function below, but it only works with Latin (English) letters, and in the end it did not give the result I am looking for.

Code:

stock getLastChars(const in[], out[], oLen)
{
   new pos = 0, lastCharPos = 0, wordStart = 0, i = 0;
   new len = strlen(in);
   while (pos < len) {
      if ((in[pos] & 0xC0) != 0x80) { // This byte starts a character (it is not a UTF-8 continuation byte)
         if (in[pos] == ' ' || in[pos] == '\n' || in[pos] == '\r' || in[pos] == '\t') { // The beginning of a new word
            // Copy the last letter of the previous word into the resulting string
            if (i < oLen) {
               out[i] = in[lastCharPos];
               i++;
            }
            // Determine the beginning of the new word
            wordStart = pos + 1;
         }
      }
      // Determine the beginning of the last letter of the word
      lastCharPos = pos;
      pos++;
   }
   // Copy the last letter of the last word in the text into the resulting string
   if (i < oLen) {
      out[i] = in[lastCharPos];
      i++;
   }
   // Add the end of the string
   if (i < oLen) {
      out[i] = '\0';
   } else {
      out[oLen - 1] = '\0';
   }
}
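
The function above copies a single byte per word, so for a multi-byte letter it keeps only the trailing continuation byte. A possible UTF-8-aware variant is sketched below. getLastCharsUtf8 is a hypothetical name, oLen is assumed to be the maximum number of cells excluding the terminator (as returned by charsmax), and any byte value up to and including the space character is treated as a word separator.

```pawn
#include <amxmodx>

// Hypothetical variant: copies the complete UTF-8 byte sequence of the last
// character of every whitespace-separated word into out[].
stock getLastCharsUtf8(const in[], out[], oLen)
{
    new o = 0;
    new lastStart = -1;    // byte offset where the current word's last character starts

    for (new pos = 0; ; pos++)
    {
        new c = in[pos] & 0xFF;

        if (c <= ' ')    // end of string, space, tab, CR, LF
        {
            if (lastStart != -1)
            {
                // Length of the character: lead byte plus continuation bytes.
                new n = 1;
                while ((in[lastStart + n] & 0xC0) == 0x80)
                    n++;

                // Copy the character only if it fits completely.
                if (o + n <= oLen)
                {
                    for (new k = 0; k < n; k++)
                        out[o++] = in[lastStart + k];
                }
                lastStart = -1;
            }
            if (c == EOS)
                break;
        }
        else if ((c & 0xC0) != 0x80)
        {
            lastStart = pos;    // a new character starts at this byte
        }
    }

    out[o] = EOS;
    return o;
}
```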

This is my code and what I have tried:

Code:

new Separate_letters_Symbol[][] = {
   "ا", "ب", "ت", "ث", "ج", "ح", "خ", "س", "ش", "ص", "ض", "ع",
   "غ", "ف", "ق", "ك", "م", "ن", "ي", "ة", "ى", "ه", "ل",
   "ئ"
}

new Connected_letters_Symbol[][] = {
   "ﺎ", "ﺑ", "ﺗ", "ﺛ", "ﺟ", "ﺣ", "ﺧ", "ﺳ", "ﺷ", "ﺻ", "ﺿ", "ﻋ",
   "ﻏ", "ﻓ", "ﻗ", "ﻛ", "ﻣ", "ﻧ", "ﻳ", "ﺔ", "ﻰ", "ﻪ", "ﻟ",
   "ﺋ"
}

public plugin_init() {
   register_plugin( PLUGIN, VERSION, AUTHOR )
   
   register_clcmd( "say", "CheckMessage" )
   register_clcmd( "say_team", "CheckMessage" )
}


public CheckMessage(id) {
   static said[192], said_to_utf16[192], said_to_utf8[192], name[33];
   
   read_args( said, charsmax(said) )
   remove_quotes( said )
   trim( said )
   
   MultiByteToWideChar(said, said_to_utf16)

   if(isArabic(said_to_utf16)) {
   
      ReverseString(said_to_utf16)
   }
   
   WideCharToMultiByte(said_to_utf16, said_to_utf8)
   
   for( new i; i < sizeof Separate_letters_Symbol; i++ ) {
      
      new len = strlen(said_to_utf8)
      
      for (new i = 0; i < len; i++) {
         
         if (i != len - 1) {

            replace_all(said_to_utf8, charsmax(said_to_utf8), Separate_letters_Symbol[i], Connected_letters_Symbol[i]);
         }
      }
   }
   get_user_name(id, name, charsmax(name));
   
   client_print(0, print_chat, " (AR) %s : %s",name, said_to_utf8)

   return PLUGIN_HANDLED
}

Lev

Post subject: Re: UTF-8 in Chat check letters

Posted: 24 May 2023, 19:08

Site Admin

Why do you use the variable name i for both the outer and the inner for loops? That is a mistake.
I don't know what you are trying to do, nor do I understand why you use MultiByteToWideChar, WideCharToMultiByte and ReverseString.
This is probably about right-to-left writing, but I don't know how people chat in these languages.
Without a proper description I can't help.

abdobiskra

Post subject: Re: UTF-8 in Chat check letters

Posted: 26 May 2023, 20:33

Arabic is written from right to left (both words and letters), but inside the game it is not rendered correctly.
For example, "(سلام عليكم) = (slam alaykum)" appears in the game as "(م ك ي ل ع _ م ا ل س) = ( m l a s _ m u k y a l a )", i.e. as separate letters running from left to right, which is unreadable.
Arabic letters take their separate form when they are written alone or stand at the end of a word. For example, the letters of the alphabet: "(أ ب ت ث ...etc)= (a b c d ... etc)". At the end of a word, for example "(سلام) = (slam)", the letter "(م) = (M)" is separate, while at the beginning or in the middle of a word it is connected, like this: (مــ)
What I am trying to do in the plugin is replace the separate characters with the connected (cursive) ones and flip the characters from right to left so that they are readable. To do that I have to convert the string to UTF-16, reverse it, and then convert it back to UTF-8. That is what I was advised to do, and the results have been satisfactory so far.
What remains is checking and fixing a few other things, including the last letter, which should keep its separate form. My idea was to replace the letters in each word while skipping the last one. I tried, but I cannot find a suitable way to do it.
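
One way to skip the last letter of every word is to do the replacement on the decoded code points rather than on the UTF-8 bytes, because there each letter occupies exactly one cell and the following cell shows whether the word continues. The sketch below is only an illustration under assumptions: ShapeExceptLast and IsArabicLetter are hypothetical names, the code point values are one reading of the two letter tables used in this thread (in the same order), and it reproduces that simple table lookup rather than full Arabic shaping with initial, medial and final forms.

```pawn
#include <amxmodx>

// Assumed code points of the "separate" letters, in table order.
new const g_Separate[] = {
    0x0627, 0x0628, 0x062A, 0x062B, 0x062C, 0x062D, 0x062E, 0x0633,
    0x0634, 0x0635, 0x0636, 0x0639, 0x063A, 0x0641, 0x0642, 0x0643,
    0x0645, 0x0646, 0x064A, 0x0629, 0x0649, 0x0647, 0x0644, 0x0626
};

// Assumed code points of the matching "connected" presentation forms.
new const g_Connected[] = {
    0xFE8E, 0xFE91, 0xFE97, 0xFE9B, 0xFE9F, 0xFEA3, 0xFEA7, 0xFEB3,
    0xFEB7, 0xFEBB, 0xFEBF, 0xFECB, 0xFECF, 0xFED3, 0xFED7, 0xFEDB,
    0xFEE3, 0xFEE7, 0xFEF3, 0xFE94, 0xFEF0, 0xFEEA, 0xFEDF, 0xFE8B
};

stock bool:IsArabicLetter(ch)
{
    return (0x621 <= ch <= 0x64A);
}

// wide[] holds one code point per cell in typed (logical) order,
// i.e. the output of MultiByteToWideChar before ReverseString is called.
stock ShapeExceptLast(wide[])
{
    for (new i = 0; wide[i] != EOS; i++)
    {
        // A letter followed by a non-letter (space, digit, end of string)
        // is the last letter of its word: leave it in its separate form.
        if (!IsArabicLetter(wide[i + 1]))
            continue;

        for (new k = 0; k < sizeof g_Separate; k++)
        {
            if (wide[i] == g_Separate[k])
            {
                wide[i] = g_Connected[k];
                break;
            }
        }
    }
}
```

With this approach the call would sit between MultiByteToWideChar and ReverseString, and the later replace_all loop over the UTF-8 string would no longer be needed.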

Posted after 15 minutes 47 seconds:

Lev wrote:

Why do you use the variable name i for both the outer and the inner for loops? That is a mistake.

Yes, sorry, I sent the wrong code. I used "j" inside the loop, but I have now dropped all of that. Instead I extract the word without its last letter, extract the last letter on its own, and then join the two. It is a clumsy method, though: the result has a space between them, and I could not generalize it to every word in a sentence (i.e. every word in a sentence must have its last letter in separate form). It's complex :pardon:

And this is the whole code I am using:

Code:

#include <amxmodx>


#pragma ctrlchar '\'

#define PLUGIN "AR chat Fix HL"
#define VERSION "0.0"
#define AUTHOR "abdo"

#define getLastChar_UTF8(%1,%2,%3) copy(%1, min(%2, get_char_bytes(%3)), %3)

new Separate_letters_Symbol[][] = {
   "ا", "ب", "ت", "ث", "ج", "ح", "خ", "س", "ش", "ص", "ض", "ع",
   "غ", "ف", "ق", "ك", "م", "ن", "ي", "ة", "ى", "ه", "ل",
   "ئ"
}

new Connected_letters_Symbol[][] = {
   "ﺎ", "ﺑ", "ﺗ", "ﺛ", "ﺟ", "ﺣ", "ﺧ", "ﺳ", "ﺷ", "ﺻ", "ﺿ", "ﻋ",
   "ﻏ", "ﻓ", "ﻗ", "ﻛ", "ﻣ", "ﻧ", "ﻳ", "ﺔ", "ﻰ", "ﻪ", "ﻟ",
   "ﺋ"
}

public plugin_init() {
   register_plugin( PLUGIN, VERSION, AUTHOR )
   
   register_clcmd( "say", "CheckMessage" )
   register_clcmd( "say_team", "CheckMessage" )
}


public CheckMessage(id) {
   static said[192], said_to_utf16[192], said_to_utf8[192], name[33];
   
   read_args( said, charsmax(said) )
   remove_quotes( said )
   trim( said )
   
   get_user_name(id, name, charsmax(name));
   
   MultiByteToWideChar(said, said_to_utf16)

   if(isArabic(said_to_utf16)) {
   
      ReverseString(said_to_utf16) 
   
      WideCharToMultiByte(said_to_utf16, said_to_utf8)
      
      for( new i; i < sizeof Separate_letters_Symbol ; i++ ) {
         
         replace_all(said_to_utf8, charsmax(said_to_utf8), Separate_letters_Symbol[i], Connected_letters_Symbol[i]);
      }
      new last_leter[192]
      
      getLastChar_UTF8(last_leter, charsmax(last_leter), said_to_utf8)
      
      for( new i; i < sizeof Connected_letters_Symbol ; i++ ) {
         
         replace_all(last_leter, charsmax(last_leter), Connected_letters_Symbol[i], Separate_letters_Symbol[i]);
      }
      
      
      new said_utf8[192];
      
      getWord_UTF8(said_utf8, charsmax(said_utf8), said_to_utf8)
      
      new result[192];
      formatex(result, charsmax(result), "%s%s", last_leter, said_utf8);

      client_print(0, print_chat, "(AR) %s :defult : %s | lastLetters : %s | FirstLetters : %s | Results: %s", name, said_to_utf8, last_leter, said_utf8, result)
   }else {
      client_print(0, print_chat, "%s : %s", name, said)
   }
   
   
   return PLUGIN_HANDLED
}


// to get word without last letter
stock getWord_UTF8(out[], oLen, const in[], iLen = 0)
{
   if (!iLen) iLen = strlen(in);
   new cnt = 1;
   for (new pos = iLen - 1; ((in[pos] & 0xC0) == 0x80) && ((in[pos] & 0x40) == 0); cnt++)
   {
      pos--;
   }
   return copy(out, oLen, in[cnt - 1]);
}



stock MultiByteToWideChar(const mbszInput[], wcszOutput[])
{
   new nOutputChars = 0;
   for (new n = 0; mbszInput[n] != EOS; n++) {
      if (mbszInput[n] < 0x80) { // 0... 1-byte ASCII
         wcszOutput[nOutputChars] = mbszInput[n];
      } else if ((mbszInput[n] & 0xE0) == 0xC0) { // 110... 2-byte UTF-8
         wcszOutput[nOutputChars] = (mbszInput[n] & 0x1F) << 6; // Upper 5 bits

         if ((mbszInput[n + 1] & 0xC0) == 0x80) { // Is 10... ?
            wcszOutput[nOutputChars] |= mbszInput[++n] & 0x3F; // Lower 6 bits
         } else { // Decode error
            wcszOutput[nOutputChars] = '?';
         }
      } else if ((mbszInput[n] & 0xF0) == 0xE0) { // 1110... 3-byte UTF-8
         wcszOutput[nOutputChars] = (mbszInput[n] & 0xF) << 12; // Upper 4 bits

         if ((mbszInput[n + 1] & 0xC0) == 0x80) { // Is 10... ?
            wcszOutput[nOutputChars] |= (mbszInput[++n] & 0x3F) << 6; // Middle 6 bits

            if ((mbszInput[n + 1] & 0xC0) == 0x80) { // Is 10... ?
               wcszOutput[nOutputChars] |= mbszInput[++n] & 0x3F; // Lower 6 bits
            } else { // Decode error
               wcszOutput[nOutputChars] = '?';
            }
         } else { // Decode error
            wcszOutput[nOutputChars] = '?';
         }
      } else { // Decode error
         wcszOutput[nOutputChars] = '?';
      }

      nOutputChars++;
   }
   wcszOutput[nOutputChars] = EOS;
}

stock WideCharToMultiByte(const wcszInput[], mbszOutput[])
{
   new nOutputChars = 0;
   for (new n = 0; wcszInput[n] != EOS; n++) {
      if (wcszInput[n] < 0x80) { // 1-byte ASCII
         mbszOutput[nOutputChars++] = wcszInput[n];
      } else if (wcszInput[n] < 0x800) { // 2-byte UTF-8
         mbszOutput[nOutputChars++] = (wcszInput[n] >> 6) | 0xC0;
         mbszOutput[nOutputChars++] = (wcszInput[n] & 0x3F) | 0x80;
      } else { // 3-byte UTF-8
         mbszOutput[nOutputChars++] = (wcszInput[n] >> 12) | 0xE0;
         mbszOutput[nOutputChars++] = ((wcszInput[n] >> 6) & 0x3F) | 0x80;
         mbszOutput[nOutputChars++] = (wcszInput[n] & 0x3F) | 0x80;
      }
   }
   mbszOutput[nOutputChars] = EOS;
}

stock ReverseString(toggle[]) 
{ 
   for(new i = strlen(toggle) - 1, j = 0, temp ; i > j ; i--, j++) 
   { 
      temp = toggle[i]; 
      toggle[i] = toggle[j]; 
      toggle[j] = temp; 
   } 
}


stock isEnglish(const szString[])
{
   new i = 0;
   new ch;
   while((ch = szString[i]) != EOS)
   {
      if(0x21 <= ch <= 0x7F) 
         return true;
      
      i++;
   }   
   return false;
}

stock isArabic(const szString[])
{
   new i = 0;
   new ch;
   while((ch = szString[i]) != EOS)
   {
      if(0x621 <= ch && ch <= 0x64A) // covers the basic Arabic letters
         return true;
      
      if(0x660 <= ch && ch <= 0x669) // covers the Arabic-Indic digits
         return true;
         return true;
      
      i++;
   }   
   return false;
}

Results:

Attachments:

Capture.PNG [ 109.32 KB | Viewed 1577 times ]

Lev

Post subject: Re: UTF-8 in Chat check letters

Posted: 28 May 2023, 15:26

Site Admin

The first thing you should try to achieve is to output text to the client screen so that it looks as required. For that I advise you to move to the byte level. If you see that at least part of the sentence looks correct, capture it and analyze the byte order. Check the case of the last letter; you will probably be able to add some bytes (text, spaces, dots) so that the last letter does not appear separated.
Once you have an output byte sequence that looks good, you can start working on the input byte sequence to convert it into the correct form.
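
A small helper along these lines can make the byte-level comparison easier. It is only a sketch: DumpBytes is a hypothetical name, and it assumes that the "%02x" format specifier is available in the AMXX version in use. It prints every byte of a string as hexadecimal to the server console.

```pawn
#include <amxmodx>

// Hypothetical debugging helper: prints the bytes of text[] as hex pairs.
stock DumpBytes(const label[], const text[])
{
    new line[256], len = 0;

    // Each byte takes three characters ("d9 "), so stop while there is room.
    for (new i = 0; text[i] != EOS && len < charsmax(line) - 3; i++)
    {
        len += formatex(line[len], charsmax(line) - len, "%02x ", text[i] & 0xFF);
    }

    server_print("%s: %s", label, line);
}
```

Calling it once on the raw chat string and once on the converted string, for example DumpBytes("input", said) and DumpBytes("output", said_to_utf8), gives two byte sequences that can be compared side by side.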

abdobiskra

Post subject: Re: UTF-8 in Chat check letters

Posted: 08 June 2023, 00:20

Lev wrote:

The first thing you should try to achieve is to output text to the client screen so that it looks as required.

Code:

      MultiByteToWideChar(said, said_to_utf16)

   if(isArabic(said_to_utf16)) {
   
      ReverseString(said_to_utf16) 
   
      WideCharToMultiByte(said_to_utf16, said_to_utf8)
      
      for( new i; i < sizeof Separate_letters_Symbol ; i++ ) {
         
         replace_all(said_to_utf8, charsmax(said_to_utf8), Separate_letters_Symbol[i], Connected_letters_Symbol[i]);
      }

In this part of the code above I did capture the chat line as it should look (said_to_utf8):
the letters within each word and the words themselves were reordered from right to left, and then the separate letters were replaced with their connected forms.

Lev wrote:

For that I advise you to move to the byte level. If you see that at least part of the sentence looks correct, capture it and analyze the byte order. Check the case of the last letter; you will probably be able to add some bytes (text, spaces, dots) so that the last letter does not appear separated.

Code:

      new last_leter[192]
      
      getLastChar_UTF8(last_leter, charsmax(last_leter), said_to_utf8)
      
      for( new i; i < sizeof Connected_letters_Symbol ; i++ ) {
         
         replace_all(last_leter, charsmax(last_leter), Connected_letters_Symbol[i], Separate_letters_Symbol[i]);
      }

Yes, in this part of the code I think I did that: I captured the last letter of the word in the chat from the output of the previous code, and then turned it back from its connected form into its separate form (the opposite of what the previous code did).

Code:

      new said_utf8[192];//First Letters
      
      getWord_UTF8(said_utf8, charsmax(said_utf8), said_to_utf8)
      
      new result[192];
      formatex(result, charsmax(result), "%s%s", last_leter, said_utf8);

Here I captured the word without its last letter and then output both parts together in the result (i.e. the previously replaced last letter followed by the word without its last letter).
The results are not quite satisfactory because there is a space between the last letter and the word.
Result I get: م(space)سلا (as shown in the attached picture)
Result I need: سلام (like the "defult" output above, but the last letter should have the form (م), not (مـ))
I am wondering whether there is another way to analyze the letters inside each word and skip the replacement of the last letter, i.e. replace the letters of each word but leave its last letter as it is.
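
One possible explanation for the stray gap, offered as a guess rather than a confirmed diagnosis: getWord_UTF8 counts the bytes of the last character in the string and then starts copying at in[cnt - 1]. For text made of two-byte letters that offset is the second byte of the first character, so the copy begins with an orphaned continuation byte that the client may render as a blank. A sketch of a replacement that skips exactly one whole character at the start is shown below; getRest_UTF8 is a hypothetical name.

```pawn
#include <amxmodx>

// Hypothetical replacement for getWord_UTF8: copies everything after the
// first UTF-8 character of in[] (the last letter once the text is reversed).
stock getRest_UTF8(out[], oLen, const in[])
{
    if (in[0] == EOS)
    {
        out[0] = EOS;
        return 0;
    }

    // Skip the lead byte and every continuation byte (10xxxxxx) after it.
    new skip = 1;
    while ((in[skip] & 0xC0) == 0x80)
        skip++;

    return copy(out, oLen, in[skip]);
}
```

This still handles only the first character of the whole line, so it does not by itself solve the per-word case.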

Lev wrote:

Once you have an output byte sequence that looks good, you can start working on the input byte sequence to convert it into the correct form.

Could you give me a simple example to make it clearer?

Lev

Post subject: Re: UTF-8 in Chat check letters

Posted: 08 June 2023, 15:27

Site Admin

abdobiskra wrote:

Could you give me a simple example to make it clearer?

You can start with just a single-line plugin:

Code:

client_print(0, print_chat, "Your text in right-to-left form")

Then check how it looks on the client. If you manage to output the text correctly, post that string from the plugin together with the captured chat string that you want to convert into that output. Then I can probably help you transform it.

abdobiskra

Post subject: Re: UTF-8 in Chat check letters

Posted: 09 June 2023, 03:00

I think I already did that above?

Code:

public CheckMessage(id) {
   static said[192], said_to_utf16[192], said_to_utf8[192], name[33];
   
   read_args( said, charsmax(said) )
   remove_quotes( said )
   trim( said )
   
   get_user_name(id, name, charsmax(name));
   
   MultiByteToWideChar(said, said_to_utf16)
   
   if(isArabic(said_to_utf16)) {
   
      ReverseString(said_to_utf16) 
   
      WideCharToMultiByte(said_to_utf16, said_to_utf8)
      
      
      for( new i; i < sizeof Separate_letters_Symbol ; i++ ) {
   
         replace_string(said_to_utf8, charsmax(said_to_utf8), Separate_letters_Symbol[i], Connected_letters_Symbol[i]);
      }
      
      client_print(0, print_chat, "(Valve) %s : %s", name, said)
      client_print(0, print_chat, "(Plugin) %s : %s", name, said_to_utf8)
      
   }else {
      client_print(0, print_chat, "(ENG) %s : %s", name, said)
   }
   return PLUGIN_HANDLED
}

(Valve): how letters and words appear, from left to right, in the normal chat
(Plugin): the chat after modification by the functions in the plugin
The plugin's output still gets the last letter wrong. I need either to skip its replacement or to restore its shape afterwards, or any other method that works.

Quote:

(Valve) Abdo : م ك ل ا ح () ف ي ك () م ك ي ل ع () م ال س
(Plugin) Abdo : سلامـ عليكمـ كيفـ حالكمـ
I want it like this : سلام عليكم كيف حالكم
Note the last letter of each word: that is how I want them.

Attachments:

Capture.PNG [ 1.36 KB | Viewed 1499 times ]

Lev

Post subject: Re: UTF-8 in Chat check letters

Posted: 13 June 2023, 22:36

Site Admin

A byte-by-byte comparison of the two texts reveals extra bytes.

Attachment:

RTL.jpg [ 98.36 KB | Viewed 1465 times ]

The top text is "سلامـ عليكمـ كيفـ حالكمـ", the bottom one is "سلام عليكم كيف حالكم".
Try removing these bytes from the string.
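
As a rough sketch of what removing bytes could look like: in the two texts as quoted in this post, the extra character after each word is the tatweel (U+0640), which is the byte pair D9 80 in UTF-8. Whether the in-game string carries the same bytes is an assumption; the attached image is the authority on that. StripTatweel is a hypothetical helper that filters the pair out in place.

```pawn
#include <amxmodx>

// Hypothetical helper: removes every UTF-8 encoded tatweel (U+0640, bytes
// 0xD9 0x80) from text[] in place and returns the new length in bytes.
stock StripTatweel(text[])
{
    new r = 0, w = 0;

    while (text[r] != EOS)
    {
        // Reading text[r + 1] is safe: at worst it is the terminator.
        if ((text[r] & 0xFF) == 0xD9 && (text[r + 1] & 0xFF) == 0x80)
        {
            r += 2;    // skip both bytes of the tatweel
            continue;
        }
        text[w++] = text[r++];
    }

    text[w] = EOS;
    return w;
}
```

The same loop can be adapted to any other byte sequence that the comparison turns up by changing the two byte values.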
