Thursday, April 4, 2013

Wiktionary - Recent changes [en]: Wiktionary:Grease pit/2013/April

Apr 5th 2013, 01:11

Latest revision as of 01:11, 5 April 2013

The current code can do {{l}}'s job, and {{l}} will soon use the module instead of its current code. The aim of the module is to handle wikilinks generally, though -- not just in {{l}}, but in {{term}}, head templates, and other similar templates that create wikilinks.

Some new features have been proposed at Template_talk:l#Lua-ising. The code for the features has been written and tested; we just need to gain official community consensus to implement it.

Any thoughts or suggestions would be welcomed. --Z 04:28, 1 April 2013 (UTC)

Have you tested it to make sure it works in all cases that {{l}} works, and that it doesn't do anything it shouldn't? Also, what is the purpose of Module:useful stuff? "detect_script" in particular doesn't seem like it does anything useful. And the list of languages that have automated transliteration should be in Module:languages. I also warned you not to start adding all kinds of extra code to this until we're sure that it works the way it should. —CodeCat 13:41, 1 April 2013 (UTC)
Assuming "detect_script" does what it seems to based on its name, that would be extremely useful. Several templates for multiscriptal languages like Tatar, Ladino, and Japanese have parameters that require the user to input what script an entry is in. If we can scrap that, that'd be great. —Μετάknowledgediscuss/deeds 14:20, 1 April 2013 (UTC)
But what do you do when the word contains characters in multiple scripts? —CodeCat 16:40, 1 April 2013 (UTC)
That doesn't happen in Tatar or Ladino. It does happen in Japanese, and I'm not sure how it works. For example, アメリカ合衆国 is in both katakana and kanji, but is marked as katakana (in the template, that's kk). We'll have to ask a Japanese editor. —Μετάknowledgediscuss/deeds 00:45, 2 April 2013 (UTC)
Why use kk, the language code for Kazakh? In case it matters, the Japanese ISO script codes are:[1]
  • Hira: Hiragana
  • Kana: Katakana
  • Hrkt: Japanese syllabaries (alias for Hiragana + Katakana)
  • Jpan: Japanese (alias for Han + Hiragana + Katakana)
 Michael Z. 2013-04-03 21:38 z
This is totally off-topic, but I guess it's a valid complaint about the template. The answer is that it's faster to type, just like pl= (code for Polish, means plural in templates) or tr= (code for Turkish, means transliteration in templates). In cases like these, editors' ease definitely outweighs using ISO script codes, because it really doesn't matter which we use. —Μετάknowledgediscuss/deeds 23:45, 3 April 2013 (UTC)
Where is the code for the version with the proposed features?
Automatically detecting script sounds like a very good idea, imo. --Yair rand (talk) 01:22, 4 April 2013 (UTC)
Some were removed by CodeCat and you can find them in older revisions, some were moved to this module, and others are in the commented-out part of the code; e.g. recognizing reconstructed terms from "*" and linking to the appendix is in prepare_title(). --Z 01:42, 4 April 2013 (UTC)
Can somebody please help me work out how to implement script recognition in {{tt-pos}}? —Μετάknowledgediscuss/deeds 02:31, 4 April 2013 (UTC)
I'm not opposed to these innovations in principle, but I do think that we should first get {{l}} to work with this module, and keep it that way for at least a week or two so that we can be sure there are no unexpected problems. —CodeCat 03:20, 4 April 2013 (UTC)
Why would we want to get {{l}} working with it first? That might could be a while... —Μετάknowledgediscuss/deeds 04:22, 4 April 2013 (UTC)
Currently detect_script() can't be invoked from templates; it would be a better idea to rewrite that template in Lua. --Z 03:29, 4 April 2013 (UTC)
Is it? I was hoping to have a model off which I might be able to design more templates with these features. —Μετάknowledgediscuss/deeds 04:22, 4 April 2013 (UTC)
That would be possible after {{l}} and {{head}} are Lua-ized and script detection is added to them. --Z 05:08, 4 April 2013 (UTC)
Regarding Japanese, I have no idea how its writing system works, but it's possible to find the katakana characters of a word and tag them with the Kana class, and the other, non-katakana characters of the word (if there are any) would be kanji, I assume? If so, it's easy to fix. Does a similar thing happen in any other language? --Z 03:29, 4 April 2013 (UTC)
Not that I can think of, but we should assume so just to be safe. —Μετάknowledgediscuss/deeds 04:22, 4 April 2013 (UTC)
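A rough sketch of the per-character approach suggested above for mixed-script words like アメリカ合衆国. This is not the actual detect_script() code: the function names and the range table are made up for illustration, and the Unicode ranges are simplified to the basic Hiragana, Katakana, and CJK Unified Ideographs blocks. It is written in plain Lua so it runs outside the wiki; a Scribunto module would presumably use the mw.ustring functions rather than decoding UTF-8 by hand.

```lua
-- Decode a UTF-8 string into a list of code points (assumes valid input).
local function codepoints(text)
    local out = {}
    for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
        local b1 = ch:byte(1)
        local cp
        if b1 < 0x80 then cp = b1
        elseif b1 < 0xE0 then cp = b1 - 0xC0
        elseif b1 < 0xF0 then cp = b1 - 0xE0
        else cp = b1 - 0xF0 end
        -- Each continuation byte contributes its low six bits.
        for i = 2, #ch do
            cp = cp * 64 + (ch:byte(i) - 0x80)
        end
        out[#out + 1] = cp
    end
    return out
end

-- Simplified, illustrative ranges keyed to ISO 15924 script codes.
local RANGES = {
    { 0x3040, 0x309F, "Hira" },
    { 0x30A0, 0x30FF, "Kana" },
    { 0x4E00, 0x9FFF, "Hani" },
}

local function script_of(cp)
    for _, r in ipairs(RANGES) do
        if cp >= r[1] and cp <= r[2] then return r[3] end
    end
    return nil  -- character is outside the ranges listed above
end

-- Hypothetical helper: return a sorted list of the distinct scripts in text.
local function detect_scripts(text)
    local seen, list = {}, {}
    for _, cp in ipairs(codepoints(text)) do
        local sc = script_of(cp)
        if sc and not seen[sc] then
            seen[sc] = true
            list[#list + 1] = sc
        end
    end
    table.sort(list)
    return list
end

print(table.concat(detect_scripts("アメリカ合衆国"), "+"))  --> Hani+Kana
```

Returning every script found, rather than a single code, leaves it to the calling template to decide how to tag a mixed word.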
The module is tested; the only problem is the gender/number part: the output of Module:gender and number and that of the gender/number templates are not identical. That's not much of a problem though; we can use the gender templates in the module for now. --Z 05:32, 4 April 2013 (UTC)

Substring module

Thanks, Z! New question: do we have a basic string manipulation module, just to store stuff like taking a substring of a certain length from the end of a word, etc? If not, should I create Module:string or something? —Μετάknowledgediscuss/deeds 00:23, 5 April 2013 (UTC)

NP. No, that's not needed; for this particular task you can simply use string.sub(). --Z 00:39, 5 April 2013 (UTC)
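For illustration, a minimal sketch of the string.sub() suggestion: a negative start index counts from the end of the string, so no helper module is needed to take a suffix. The function name suffix is hypothetical. Plain string.sub() counts bytes, not characters, so for non-ASCII words in a Scribunto module mw.ustring.sub() would presumably be the safer choice.

```lua
-- Hypothetical helper: return the last n bytes of a word.
local function suffix(word, n)
    -- Guard n <= 0: string.sub(word, 0) would return the whole string.
    if n <= 0 then return "" end
    return string.sub(word, -n)
end

print(suffix("walking", 3))  --> ing
```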
I see a need to decompose words in any script into components, including any diacritics and ligatures. Use case: say you want to check how to pronounce a word in a complex script with diacritics, such as Burmese, Hindi, Bengali, Thai, Arabic, or Hebrew. A Devanagari syllable such as रा (rā) can't be looked up in Wiktionary:Hindi transliteration because it's र + ा, and you can't take the diacritic out of रा to look it up. Some word processors allow you to break strings into parts. So, yes, please. Not just a substring function but a full breakup.
Module:ko-hangul has the function syllable2Jamo, which in debug mode shows the individual jamo for each hangeul syllable (한 (han) = ㅎㅏㄴ (h a n)). It needs to break up hangeul in run mode as well. I tried to do that with "syllable2JamoSep", but it didn't work. --Anatoli (обсудить/вклад) 00:41, 5 April 2013 (UTC)
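As a hedged sketch of how a precomposed hangeul syllable can be split arithmetically, the following uses the standard Unicode Hangul decomposition formula. It is not the code of Module:ko-hangul; the function name decompose_syllable is hypothetical, and it works on code points rather than strings. It returns conjoining jamo (U+1100 block), not the compatibility jamo (ㅎㅏㄴ) shown in the example above, so a real module would still need a mapping step for display.

```lua
-- Constants from the Unicode Standard's Hangul syllable decomposition.
local S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
local V_COUNT, T_COUNT = 21, 28
local N_COUNT = V_COUNT * T_COUNT      -- 588 syllables per leading consonant
local S_COUNT = 19 * N_COUNT           -- 11172 precomposed syllables

-- Hypothetical helper: take the code point of one precomposed syllable and
-- return the code points of its conjoining jamo (two or three values).
local function decompose_syllable(cp)
    local s_index = cp - S_BASE
    if s_index < 0 or s_index >= S_COUNT then
        return cp                      -- not a Hangul syllable: return unchanged
    end
    local l = L_BASE + math.floor(s_index / N_COUNT)
    local v = V_BASE + math.floor((s_index % N_COUNT) / T_COUNT)
    local t_index = s_index % T_COUNT
    if t_index == 0 then
        return l, v                    -- no final consonant
    end
    return l, v, T_BASE + t_index
end

-- 한 is U+D55C
print(string.format("%X %X %X", decompose_syllable(0xD55C)))  --> 1112 1161 11AB
```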
If I understand you correctly, you need mw.ustring.gsub(text, "(.)", "%1 ") (try print(mw.ustring.gsub("रा", "(.)", "%1 ")) in console). --Z 00:58, 5 April 2013 (UTC)
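For anyone trying this outside the wiki, where mw.ustring is not available, here is a plain-Lua approximation of the same idea. The helper name spread and the byte pattern are illustrative assumptions, not part of any existing module; the pattern matches one UTF-8 lead byte plus its continuation bytes and assumes valid UTF-8 input.

```lua
-- Pattern matching one UTF-8 encoded code point: a lead byte
-- followed by any continuation bytes (0x80-0xBF).
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: insert a space after every code point.
local function spread(text)
    -- Parentheses drop gsub's second return value (the match count).
    return (text:gsub(UTF8_CHAR, "%0 "))
end

print(spread("रा"))
```

As with the mw.ustring.gsub() call, every code point, including the last, is followed by a space, so the result ends in a trailing space.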
Module:String exists on Wikipedia. Because it doesn't exist here yet, I copied the entire code and added extra bits to it when I wrote Module:bo-translit. Wyang (talk) 01:01, 5 April 2013 (UTC)
Great stuff, thank you! I also tried Thai "เค็ม" = เ ค ็ ม and Arabic "اَلْلُغَةُ ٱلْعَرَبِيَّةُ"‎ = ا َ ل ْ ل ُ غ َ ة ُ ٱ ل ْ ع َ ر َ ب ِ ي َ ّ ة ُ‎. --Anatoli (обсудить/вклад) 01:11, 5 April 2013 (UTC)
