Wiktionary:Grease pit/2013/March
Latest revision as of 01:10, 23 March 2013

I started a new page WT:CSS, because it was pointed out that our main style sheet is not documented. —Michael Z. 2013-03-01 19:34 z

translation tables slow

I think starting with the introduction of Web fonts (but I'm not sure), it now takes a long time for translation tables to become drop-down-able. (Newest Firefox for Mac; newest Firefox for Windows.) Is there anything to be done about this?—msh210℠ (talk) 06:42, 4 March 2013 (UTC)
- I suspect that it's not the translation tables, per se, but just general slowness. I could be wrong, but I believe that pretty much nothing clickable becomes active until the page is finished being drawn. I've noticed slowness everywhere, not just on pages with web fonts. Chuck Entz (talk) 06:59, 4 March 2013 (UTC)
- Over the years this Wiki has become slower and slower. Every time somebody adds a bit more cleverness, or adds complications to a template, or replaces simple text with a template allowing users to change its appearance, it gets a little bit slower. I think we should stamp down on added cleverness, and maybe roll back some we already have. KISS SemperBlotto (talk) 08:11, 4 March 2013 (UTC)
- I wonder too how much past cleverness might have simpler and more elegant solutions, now that folks have a clearer idea what we want for the site. I know that some processes I've encountered at various job sites have fossilized from years and years of past half-formed ideas about what was required, and sitting down and looking at specific inputs and required outputs can often lead to a much more streamlined way of doing things. -- Eiríkr Útlendi │ Tala við mig 16:19, 4 March 2013 (UTC)
- Aren't {{t}} and its relatives reasonable suspects for performance problems? Can't someone figure out a way for Lua to improve performance? The number of large and very large translation tables seems to be growing faster than average page size. DCDuring TALK 18:23, 4 March 2013 (UTC)
- If Scribunto really can make templates like {{t}}, {{context}} and other big templates load faster, it could be a very useful tool for improving the usability of our larger pages. Mglovesfun (talk) 18:55, 4 March 2013 (UTC)
- Have you tried turning off WebFonts in the preferences? -- Liliana • 18:27, 4 March 2013 (UTC)
- Good point. Okay, two more bits of information: (1) I think it only happens the first time I load an entry (any entry, not the specific one I'm looking at) in a cacheless browser. (2) I just tried it with WebFonts off and it didn't happen. (Or if it did then it was small enough a delay that I didn't notice it, which was not my experience previous times.)—msh210℠ (talk) 19:50, 4 March 2013 (UTC)
- Just what I suspected: downloading all these fonts takes a long time and the browser is effectively frozen while it happens. This makes WebFonts a real nuisance. -- Liliana • 20:26, 4 March 2013 (UTC)
- How do we make web fonts opt-in, on a per-language basis? My browser downloads 1.6 MB of web fonts on this site, but needs zero of them to display all of the languages on the Main Page. This is irresponsible for any website, much less one that should be accessible to people with poor network bandwidth. —Michael Z. 2013-03-04 20:42 z
- Wow, I thought the whole webfonts feature only kicked in if the browser was missing a font required to display a given page. I had no idea it was causing downloads even when not needed. That's not good. -- Eiríkr Útlendi │ Tala við mig 20:49, 4 March 2013 (UTC)
- It only loads a font when there's some text on the page that is set to use a specific font and the user doesn't already have that particular font on their computer, even if they already have some other font that could display the text, I think. --Yair rand (talk) 21:08, 4 March 2013 (UTC)
- I've been having this issue since December. No idea what's causing it. --Yair rand (talk) 21:04, 4 March 2013 (UTC)
So who chose the web font files for Wiktionary? Are they the same fonts that we are specifying in MediaWiki:Common.css? —Michael Z. 2013-03-05 00:15 z
- The WebFonts default settings, which we're currently using and can get changed through bugzilla, apply certain fonts to certain languages. Since this doesn't cover all of our uses, there are also some set from Common.css. --Yair rand (talk) 01:28, 5 March 2013 (UTC)
- What are the default settings (the docs[1] only list all supported languages)? Which fonts set from Common.css? Where is our documentation for this? Which specific browser or OS inadequacies are we serving fonts for? —Michael Z. 2013-03-05 01:58 z
- "Supported languages" seems to mean languages which fonts are served for by default. The only fonts set from Common.css are for {{Bugi}}, {{Ethi}}, and {{Mymr}}, I think. --Yair rand (talk) 02:05, 5 March 2013 (UTC)
- How do I find out which ones? How were the requirements determined? I hope we don't add another 300 kB to the page load just because one editor requests their favourite font. I can't afford to have my mobile plan notch up a tier just because I visit Wiktionary five times during a month. —Michael Z. 2013-03-05 02:24 z

Modules can now be documented

There has been an update to Scribunto, which automatically transcludes the documentation subpage onto the top of the page. Documentation subpages can be used to provide nicely-formatted documentation of the module, and also allow you to categorise it (put the category in <includeonly> tags). Documentation subpages are treated as "special" by the software. Unlike subpages with other names, they are not interpreted as modules. The documentation subpage's name can be changed by editing MediaWiki:scribunto-doc-subpage-name. Its default value is "doc", but I've changed it to "documentation" per WT:RFM#Documentation subpages to /documentation. —CodeCat 22:19, 6 March 2013 (UTC)
- A recent update to Scribunto [2] has changed the way documentation pages are handled; the setting is now at MediaWiki:Scribunto-doc-page-name instead. I've updated it accordingly. —CodeCat 16:33, 15 March 2013 (UTC)
This script generates sense-lines of the form {{form of|[[lemma]]|lang=foo}}. Unless I'm mistaken, it no longer needs to explicitly wikilink the terms, because the templates create links automatically and our page counter no longer relies on the presence of square brackets. Also: we could discuss whether to update it to use {{head|foo|partofspeech?}} rather than '''pagename'''. - -sche (discuss) 18:01, 9 March 2013 (UTC) - I would definitely agree with using {{head}}, although I'm not sure if a PoS is needed, since many form-of templates themselves already add categories. I don't know if that is desirable, but that's a separate question. Also, I think it would be a good idea to replace all existing cases where such raw links are still in use. Could someone make a list of all templates that still allow such usage? I can then add a cleanup category to them, and run a bot script to update all the usages so that we can finally abandon this "legacy". —CodeCat 18:08, 9 March 2013 (UTC)
- What is the advantage, apart from uniformity, to having {{head}} instead of using PAGENAME for, let's say, English? Why would we want to have such a vast number of transclusions of a single template? DCDuring TALK 18:47, 9 March 2013 (UTC)
- For English, the advantage is that it is consistent with our intended coding of headwords elsewhere. There is somewhat of a consensus to move towards more CSS-based formatting combined with making better use of semantic HTML and classes rather than hard-coded formatting. One of those things is to write headwords as <strong class="headword" lang="foo">word</strong>, which we've already started doing for several templates and modules and which I would definitely consider a good thing. However, if we use plain bold text for English, then that would make English inconsistent with all other languages. —CodeCat 19:04, 9 March 2013 (UTC)
- Is it worth calling {{head}} on more pages for this reason? Doesn't the extra template call to a relatively large template slow down the loading of pages? Mglovesfun (talk) 20:04, 9 March 2013 (UTC)
- It's not really a very large template, and when it's converted to Lua it will be quite a bit faster because Lua can easily support any number of optional parameters the way {{head}} uses them, without any significant slowdown. And anyway, {{head}} isn't really called that often per page... {{l}} is called, on average, more often within any single language section than {{head}} is called on any given page (to put that differently: most entries have more links than pages have entries). —CodeCat 20:25, 9 March 2013 (UTC)
- Yes, please get rid of square brackets. User:Mglovesfun/vector.js has a line (more than one in fact) to get rid of square brackets from templates that do literally nothing. Mglovesfun (talk) 20:30, 9 March 2013 (UTC)
- Ok, then I would like to have a list of all the templates that currently contain code to allow raw-linking in their parameter. I have already noted {{form of}}, which is used by many other templates as well; it now adds entries to Category:Entries using form-of templates with a raw link. You can recognise the templates because they use {{isValidPageName}}. Come to think of it... are there any other uses for that template at all? —CodeCat 14:05, 11 March 2013 (UTC)
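To make the headword markup mentioned in this thread concrete, here is a minimal plain-Lua sketch of a function that emits `<strong class="headword" lang="foo">word</strong>`. It is not the code of {{head}} or of any existing module; the function names `escape_html` and `format_headword` are hypothetical and exist only for illustration.

```lua
-- Sketch only: names are hypothetical, not taken from any Wiktionary module.

-- Escape the characters that matter inside element content and attributes.
local function escape_html(text)
  return (text:gsub("[&<>\"]", {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ["\""] = "&quot;",
  }))
end

-- Build the semantic headword markup described in the discussion.
local function format_headword(word, lang)
  return string.format('<strong class="headword" lang="%s">%s</strong>',
    escape_html(lang), escape_html(word))
end

print(format_headword("word", "en"))
```

Because the formatting lives in the `headword` class rather than in hard-coded bold text, the appearance could then be changed from the style sheet alone.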

Improving how module documentation currently displays

Currently, when a module needs documentation, it shows a link, like on Module:User:CodeCat. But most of the time, we only want/need the documentation page to put the module in a category, so once we create it, it ends up transcluding an empty page and looks like this: Module:eo-conj. I wonder if that could be improved, because it seems like a problem in a few ways. Firstly, there is no indication that anything at all has been transcluded, unlike what {{documentation}} displays. Secondly, there is no link to the documentation page itself; this would be fixed by fixing the previous problem, but a tab like we have on Template: pages would also be a good idea. And finally, it seems rather pointless for Scribunto to think that it has transcluded documentation. But all it has really transcluded is a category, so it ends up showing a horizontal rule with nothing above it, which leaves you to guess about what it means. —CodeCat 18:04, 9 March 2013 (UTC)

List request.

Hi. I'm trying to ensure that all English plurals belonging in the categories, Category:English plurals ending in "-ies", Category:English plurals ending in "-es", and Category:English irregular plurals ending in "-ves", are properly categorized. However, as we currently have 115,950 English plurals, weeding through that list is proving to be excessive. Can someone with the technical knowhow generate individual lists of all English plurals ending in "ies", "ses", "xes", "ches", "shes", and "ves", preferably limited to terms which are not already in the aforementioned categories? I will then plow through the lists and fix the ones which need to be categorized. (I suppose this could be automated entirely if someone could make a bot that understood that plurals like "waves" are normal formations while plurals like "pelves" are an "-es" formation and plurals like "wives" and "wolves" are a "-ves" formation). Cheers!
bd2412 T 03:22, 11 March 2013 (UTC)
- I noticed that you've been adding that category to many pages. However, when {{en-noun}} is converted to Lua, that will all become redundant, because Lua can easily perform the categorization itself, automatically. —CodeCat 13:44, 11 March 2013 (UTC)
- I'm afraid I don't know what Lua is, or how it would perform such categorization. Although some of these pluralizations are predictable, it would need to know for example that "leaf" becomes "leaves" while "waif" becomes "waifs". bd2412 T 01:54, 12 March 2013 (UTC)
- Re: what Lua is: See Wiktionary:Scribunto. Re: knowing that "leaf" becomes "leaves" while "waif" becomes "waifs": Well, technically, that information is already embedded in the templates; [[leaves]] contains {{plural of|leaf}}, for example. But I'm not sure how useful that fact is, since {{plural of}} is not English-specific, so we wouldn't really want to "contaminate" it with this sort of categorization information. (Though to be honest, I'm not sure these categories should exist, anyway; wouldn't it be better for [[leaf]] to be in Category:English nouns with irregular plurals in "-ves"? The latter, in addition to being preferable in general IMHO, is also doable by Luacizing {{en-noun}}.) —RuakhTALK 02:25, 12 March 2013 (UTC)
- I don't see a conflict between having leaf in a category for nouns having a certain kind of irregular plural, and having leaves in a category for nouns being that kind of irregular plural. I think the categorization would be particularly useful, given that leafs exists (as a form of the verb, to leaf), and that similar instances occur of words existing that readers might mistakenly assume to be the regular plural form of words with irregular plurals. If someone would be so kind as to generate the aforementioned lists, I will gladly effect this categorization in a matter of hours. bd2412 T 02:52, 12 March 2013 (UTC)
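As a rough illustration of how the categorization discussed above could be automated once both the singular and the plural are known (for example from {{plural of|leaf}} on [[leaves]]), here is a plain-Lua sketch. The category names are the ones from the list request; the function names and the rules themselves are simplified assumptions, not the logic of {{en-noun}} or of any existing module, and they will misclassify some words.

```lua
-- Sketch only: simplified rules, hypothetical function names.

local function ends_with(s, suffix)
  return suffix == "" or s:sub(-#suffix) == suffix
end

-- Returns a category name, or nil for a regular "-s" plural.
local function plural_category(singular, plural)
  if plural == singular .. "s" then
    return nil  -- regular formation: wave/waves, waif/waifs
  end
  -- leaf/leaves, wife/wives, wolf/wolves: final "f" or "fe" becomes "ves"
  local stem = singular:match("^(.*)fe?$")
  if stem and plural == stem .. "ves" then
    return 'English irregular plurals ending in "-ves"'
  end
  -- final "y" becomes "ies"
  if ends_with(singular, "y") and plural == singular:sub(1, -2) .. "ies" then
    return 'English plurals ending in "-ies"'
  end
  -- anything else ending in "es": pelvis/pelves, box/boxes
  if ends_with(plural, "es") then
    return 'English plurals ending in "-es"'
  end
  return nil
end
```

The point of the sketch is that the distinction between "waves" and "wives" cannot be made from the plural alone; the singular has to be passed in, which is why the thread ties the idea to a template that already knows both forms.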
I think the question of how we should really handle language-codes (etc.) is incredibly complex, because languages are incredibly complex, and there are a lot of just-slightly-independent dimensions (e.g. WMF language prefix vs. ISO language code vs. HTML language tag); but I don't think we should wait until we've hammered that stuff out (or even started hammering it out) before we start taking advantage of Scribunto.
So, how to take advantage of Scribunto, without hammering out the issues surrounding language codes? One option is to require that language-manipulation be handled in template-space, before invoking Lua; so, for example, Template:context would call {{languagex}} to get the language-name for a given code, and would pass that in to the Scribunto module it uses. The problem with this option — or at least, one problem with this option — is that {{languagex}} is exactly the sort of expensive template that Scribunto is supposed to help us move away from.
Another option is just to create Module:lang now, with the intent of improving it later. The problem with this option is that any real improvements will probably require fundamental changes that will break everything that uses the module.
So instead, I'd like to suggest that we create Module:lang/legacy ("legacy" being a software-engineering term describing an old system that's still in use but does things in ways that are now considered less than ideal), with a more-or-less direct translation of what we've got now. It would then be pretty straightforward to Luacize existing templates without making any breaking changes to them; and then, at some glorious future date when Module:lang is ready, we can slowly modify these templates to take advantage of its luminous beauty.
Are people O.K. with that general approach? If so, I'll set about creating Module:lang/legacy, and will post back here for further feedback before we actually start using it.
—RuakhTALK 05:09, 11 March 2013 (UTC)
- Isn't that more or less what Module:languages already does? It is pretty much a direct import of the language code templates, and I haven't made any other changes. —CodeCat 13:41, 11 March 2013 (UTC)
- Yeah, I noticed that module later. (And I noticed that you hadn't started using it yet, presumably because you wanted to gather input first? If so, I appreciate your caution.) So basically what I'm proposing is (1) that Module:languages be moved to Module:lang/legacy (or Module:languages/legacy if you prefer); (2) that it be changed to match our current structure more precisely (e.g., proto: and so on); and (3) that it be a table of functions (corresponding to existing templates like {{langnamex}}) rather than of raw data. (The raw data could still be exported as p.data or something, but the current approach has the module only include raw data, which is unfortunate.) —RuakhTALK 15:12, 11 March 2013 (UTC)
- I did post about it on the BP or GP (I don't remember which). And I haven't started using it because of the speed issues it has, which are discussed on the talk pages. However, the good news is that they've added a new function specifically for this case. It imports data as read-only, but allows it to be shared by all invocations on a page. So while a single use of that module is still somewhat expensive, it would never be imported more than once per page so it is not a problem. I'm not sure what the use would be of your proposal though. I realise that it would be for compatibility reasons, but even then I don't see the purpose of converting it into a table of functions. Also, one of the caveats with the read-only import is that the imported table can't contain functions, only raw data. —CodeCat 16:50, 11 March 2013 (UTC)
- Re: "I did post about it on the BP or GP (I don't remember which)": I'm almost positive that you didn't. You did post about User:CodeCat/Module:lang, though, which may be what you're thinking of. Re: read-only import: Well then, the data can go in Module:lang/legacy/data. :-) —RuakhTALK 02:34, 12 March 2013 (UTC)
- Ok, after thinking about it a bit more I think I understand. You are asking for a kind of "glue" module between old code and the language data. But in the case of {{languagex}} I don't see much of a point. After all, a Lua call like languages_legacy.languagex("fr") would just translate to languages["fr"][1]. There is an alternative though, if you like the idea of wrapper functions around raw data. Lua supports so-called metatables, which are tables that really have accessor functions behind them. Metatables, being functions, can't be included in a read-only module though. —CodeCat 16:56, 11 March 2013 (UTC)
- But languages_legacy.languagex("gem-pro") would translate to languages["proto:gem-pro"][1], because of the {{langprefix}} ugliness. (I'm quite seriously proposing that we reproduce exactly what we have now, including the stuff that no one likes, because there is still no agreement on how to improve that stuff. What I'm proposing is that we create a clearly-demarcated "legacy" area that allows us to migrate existing templates to Lua without breaking them.) —RuakhTALK 02:34, 12 March 2013 (UTC)
- Any comments? —RuakhTALK 04:31, 22 March 2013 (UTC)
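The "glue" idea in this thread can be sketched in plain Lua as follows. This is only an illustration of the shape of the proposal, not the contents of Module:languages or of a real Module:lang/legacy: the data table holds two sample entries laid out like `languages["fr"][1]`, and the rule that codes ending in "-pro" are looked up under a "proto:" key is an assumption made to reproduce the `languages["proto:gem-pro"][1]` example, not a description of what {{langprefix}} actually does.

```lua
-- Sketch only: sample data and the "-pro" rule are assumptions.

-- Raw data, kept free of functions so it could live in a data-only page.
local data = {
  ["fr"] = { "French" },
  ["proto:gem-pro"] = { "Proto-Germanic" },
}

local legacy = {}

-- Stand-in for the {{langprefix}} step: map a code to its storage key.
function legacy.langprefix(code)
  if code:sub(-4) == "-pro" then
    return "proto:" .. code
  end
  return code
end

-- Stand-in for {{languagex}}: language code in, language name out.
function legacy.languagex(code)
  local entry = data[legacy.langprefix(code)]
  return entry and entry[1] or nil
end

print(legacy.languagex("fr"))
print(legacy.languagex("gem-pro"))
```

Keeping the functions in the wrapper and the table in a separate data-only page is one way to reconcile the two constraints raised above: callers keep the old template-like interface, while the data itself stays importable as read-only.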

form of template bug

{{feminine of|calmo#Adjective|calmo}} displays
- feminine form of calmo#Adjective
For historical purposes when that gets fixed, it is:
- feminine form of calmo#Adjective
This syntax used to work, and I'm not sure why it doesn't. I guess... calmo#Adjective isn't a valid page name. Is my guess right? Mglovesfun (talk) 11:43, 11 March 2013 (UTC)
- Oddly, I think it's actually the code that allows putting raw links into form-of templates that is the cause of this. You're right, it's not a valid page name, and that's what that code goes by to determine whether something is a raw link or just a page name. So it treats its parameter as if it were a raw link, except it's not a link. However, once that code is removed, it should work. On the other hand, the template is missing a language parameter, so that still needs to be fixed. Another thing to consider is that there are probably several #Adjective sections on any given page, so the current approach doesn't actually do what it's intended to do. What you really want is to link to the adjective section of whatever language it is, but I don't think that is currently possible. I think if we have to choose between linking to #Adjective and linking to #language, the latter is preferable. —CodeCat 13:49, 11 March 2013 (UTC)
- Special:WhatLinksHere/calmo#Adjective seems to be valid, mind you. Mglovesfun (talk) 19:47, 11 March 2013 (UTC)
- Yes, but in a very sneaky way. Notice that when you actually visit the page, it shows links to "calmo" alone. When your browser sees that URL, it actually strips off the # part, so the webserver never sees #Adjective. If you ever actually sent a request for "calmo#Adjective" to the server, it would probably shout at you for providing an invalid request. :) —CodeCat 20:43, 11 March 2013 (UTC)
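The failure mode described above (a parameter that is not a valid page name gets treated as a raw link) can be illustrated with a small plain-Lua sketch. `is_valid_page_name` is only a simplified stand-in for {{isValidPageName}}: the list of rejected characters is an assumption, and real MediaWiki title rules are stricter. `split_target` is a hypothetical helper showing how the page and the section could be separated before the check.

```lua
-- Sketch only: simplified stand-ins with hypothetical names.

-- A page name may not contain "#" or link/template syntax characters.
local function is_valid_page_name(text)
  return text ~= "" and not text:find("[#<>%[%]{}|]")
end

-- Split "calmo#Adjective" into a page name and an optional section.
local function split_target(text)
  local page, section = text:match("^([^#]*)#(.*)$")
  if page then
    return page, section
  end
  return text, nil
end

local page, section = split_target("calmo#Adjective")
print(is_valid_page_name("calmo#Adjective"), is_valid_page_name(page), section)
```

Under these assumptions, "calmo#Adjective" fails the check as a whole, which matches the behaviour reported in the thread, while the page part on its own passes.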

Javascript to tackle 404-errors

I previously posted it in the beer corner, but I figured the grease pit might be more appropriate: I rewrote my example userscript which, upon hitting a 404 error page, scans other wiktionaries to see if the word exists there, and if so, displays them as interwiki. Enable the userscript at User:Stratoprutser/404_native.js and test it out with klompvoet, danim, or real non-existing words. -- Stratoprutser (talk) 13:34, 11 March 2013 (UTC)

No bot owner template?

I miss this template from the English Wikipedia, and it seems hard to introduce here. --Njardarlogar (talk) 17:54, 11 March 2013 (UTC)
- {{bot owner}} should be fine. Mglovesfun (talk) 19:47, 11 March 2013 (UTC)
- Or "I operate NjardarBot (talk • contribs)." No need for a userbox. :-) (If you really want a userbox, by the way, then that's a policy matter, not a technical question, and belongs at BP, not here.) —RuakhTALK 02:36, 12 March 2013 (UTC)
- We don't need babel boxes; personal fluency levels could be included in the text with more detailed specifications. We don't need user pages either; we could include all that information on Wiktionary:Stasi files.
- We should have {{bot owner}} because it standardises and makes more accessible highly relevant user information. --Njardarlogar (talk) 08:43, 12 March 2013 (UTC)

Etymology trees

A lot of proto- language entries duplicate some of the descendants content. For example, if a Proto-Germanic word is descended from a PIE word, the PG descendants are duplicated in the PIE entry. These often get out of sync, and require many edits to synchronize. Some entries (most?) even just don't duplicate them at all, and require the reader to click the link to find out the further descendants. Couldn't this be fixed by putting the entire tree into a standalone wiki page (maybe in the appendix or template space) and having lua scripts run through the things to pull out the relevant parts? Is this feasible? --Yair rand (talk) 21:31, 12 March 2013 (UTC)
- That could work, but it could turn rather nasty in itself if we have to deal with sub-descendants and sub-sub... For example, part of the Germanic tree would be duplicated on an Old Dutch entry, and part of its tree would in turn go in a Middle Dutch entry. So while it's a good idea, we should be very clear about when it should be applied and when not. Also, another point to consider is that a single "line" in the PIE descendants might have several words in it, each of which might have a separate entry and a list of descendants of its own; see *bʰerǵʰ- for an example. If we go with your approach, those would have to be split into several lines. —CodeCat 21:40, 12 March 2013 (UTC)
- If we're using Lua I assume we would be able to give it instructions as to which parts of the tree to display (for example, in a Dutch entry you may want to not go back all the way to the PIE root). So I don't see duplication as a problem with this approach.
- I don't understand your point about multiple "lines" for one entry- can you restate it? DTLHS (talk) 23:41, 12 March 2013 (UTC)
- At Appendix:Proto-Indo-European/bʰerǵʰ-, the Germanic line lists two separate Proto-Germanic forms. Both of these forms are derived from the same PIE etymon, but they're separate forms, and have separate descendants. So the descendants of the PIE etymon form a tree in multiple dimensions: not necessarily just one branch per daughter language. —RuakhTALK 06:58, 13 March 2013 (UTC)
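One possible shape for the "store the tree once, pull out the relevant part" idea is sketched below in plain Lua. Everything in it is hypothetical: the node names are placeholders rather than real reconstructions, and the field names `term` and `descendants` are invented for the example. Two sibling nodes under the root stand for the case raised above, where one parent form has two separate forms in the same daughter language, each with its own descendants.

```lua
-- Sketch only: placeholder terms and hypothetical field names.
local tree = {
  term = "pie-root",
  descendants = {
    { term = "pgm-form-1", descendants = {
        { term = "odt-form", descendants = {
            { term = "dum-form", descendants = {} },
        } },
    } },
    { term = "pgm-form-2", descendants = {} },
  },
}

-- Depth-first search for the node whose term matches.
local function find_node(node, term)
  if node.term == term then
    return node
  end
  for _, child in ipairs(node.descendants) do
    local found = find_node(child, term)
    if found then
      return found
    end
  end
  return nil
end

-- Render a subtree as wiki-style nested list lines ("*", "**", ...).
local function render(node, depth, lines)
  depth = depth or 0
  lines = lines or {}
  lines[#lines + 1] = string.rep("*", depth + 1) .. " " .. node.term
  for _, child in ipairs(node.descendants) do
    render(child, depth + 1, lines)
  end
  return lines
end

print(table.concat(render(find_node(tree, "pgm-form-1")), "\n"))
```

An entry for the first Proto-Germanic form would render only its own subtree, while the PIE page would render the whole tree, so the descendants would be written down once and could not drift out of sync.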

404 errors?

I keep getting this error randomly when I visit pages:
Not Found
The requested URL /w/index.php was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
Is anyone else getting that too? It's very annoying... —CodeCat 00:23, 13 March 2013 (UTC)
- Me too. DCDuring TALK 00:51, 13 March 2013 (UTC)
- On 'pedia too. Which is a good thing, because that means somebody will actually care and if it's a fixable problem, it'll be fixed soon. —Μετάknowledgediscuss/deeds 01:02, 13 March 2013 (UTC)

Genitive of proper nouns

The template {{genitive of}} puts words into the appropriate "... noun forms" category. Is that appropriate for proper nouns? See Kleinasiens as an example. SemperBlotto (talk) 11:35, 13 March 2013 (UTC)
- pos=proper noun. Mglovesfun (talk) 11:42, 13 March 2013 (UTC)
- OK - that puts it into both cats (presumably intentionally). SemperBlotto (talk) 11:46, 13 March 2013 (UTC)
- I would prefer the forms of proper nouns to be in the normal "noun forms" category. There is already some disagreement on whether proper nouns are as distinct from nouns as we consider them to be, and making that same distinction in forms is a bit overboard. I can't really think of a good reason why someone would want to look up a list of proper noun forms specifically. —CodeCat 14:56, 13 March 2013 (UTC)
- Yes, I tend to agree with you. Actually, I have often wondered if our users ever use any of our massive range of categories at all - does anyone have any evidence that they do? I think that their main use is for editors to see what words we have, and especially what similar words may be missing. SemperBlotto (talk) 15:58, 13 March 2013 (UTC)
- I usually treat the form-of categories as kind of a "because it has to be in at least one category" thing. So I don't usually make any further subdivisions. —CodeCat 16:01, 13 March 2013 (UTC)

Meetup & videostream tomorrow - focus on Lua

Tomorrow's meetup at Wikimedia Foundation headquarters in San Francisco focuses on how Lua as a templating/scripting language improves our sites, and includes a brief introduction to Lua. It'll also be streamed live on the web, and the video will be posted afterwards. Please feel free to visit or watch! Sumana Harihareswara, Wikimedia Foundation Engineering Community Manager (talk) 15:46, 13 March 2013 (UTC)

conjugation template for German reflexive verbs

Have we got a conjugation table template for German verbs that are reflexive? What about for verbs that are both reflexive and separable, e.g. fremdschämen (which can also be inseparable, and so really needs two tables), which conjugates like "ich schäme mich fremd" (and "ich fremdschäme mich")? - -sche (discuss) 21:56, 13 March 2013 (UTC)
- I have always preferred not to have separate entries for reflexive verbs if they are formed using separate words or clitics in a language. That especially applies to languages like Dutch or German where the word order may be vastly different. So different, in fact, that any entries we create for inflected forms will be almost useless. Just consider in how many different ways the reflexive pronoun may be arranged in a few typical German sentences. Add a separable verb into the mix and it becomes even worse. For that reason, I prefer to redirect reflexive verbs to their non-reflexive entries, and add {{reflexive}} to the specific senses. I have already done this for Dutch. —CodeCat 01:03, 14 March 2013 (UTC)
- Alright, but that doesn't answer my question. de.Wikt doesn't have entries for e.g. de:sich fremdschämen, de:sich benehmen, etc, but the tables in de:fremdschämen, de:benehmen, etc include "sich". Do we have tables that do likewise yet? If not, I can set about creating some (though I might need help). - -sche (discuss) 02:03, 14 March 2013 (UTC)
- What I am saying is that we probably shouldn't have such tables. Consider a verb like irren, which has some reflexive and some non-reflexive senses. Should that entry have two conjugation tables, both containing the exact same conjugated verb forms, but one with the reflexive pronoun and one without? I don't think it should. —CodeCat 02:15, 14 March 2013 (UTC)

A standard location for Lua transliterations

One of the obvious advantages of Lua is the ability to automatically transliterate words into Latin script. It is definitely something we'd want to add to templates like {{l}}, {{term}}, {{head}} and {{t}}. However, for that to work, there has to be a single common scheme for the functions that do the transliteration. The problem is that every language could have its own transliteration scheme, so just putting them all into one module will eventually run into speed issues because that module would eventually become too large. Therefore I propose that we form a single common scheme, an "interface" so to say, that transliteration functions have to adhere to so that they are interoperable with one another. Compare it to the way all of our script templates work the same way and are therefore interchangeable with one another. Is there a way we can do this for transliterations too? —CodeCat 00:58, 14 March 2013 (UTC)
- I think invoking them from Module:foo-translit is the most logical location, if that's what you mean by "scheme", but I don't really mind if people would rather have it at Module:foo-common, invoked as tr. I'd like to go on record that transliteration modules should be language-based, not script-based, to reduce the complexity (and sheer size) of individual modules. —Μετάknowledgediscuss/deeds 01:03, 14 March 2013 (UTC)
- I know, and that is kind of what I had in mind. However, if we do it for every language, how do we handle cases where there is no transliteration module for a language yet? Is Scribunto capable of handling a failed module import gracefully? —CodeCat 01:05, 14 March 2013 (UTC)
- No idea. But first of all, which location do you prefer? I ask because I'm planning on creating a bunch of these soon. —Μετάknowledgediscuss/deeds 01:52, 14 March 2013 (UTC)
- I would prefer keeping it separate, in Module:foo-transliteration. But I just thought of something else we could try. As far as I know, transliteration isn't context-dependent: the same letter always becomes the same Latin letter(s) regardless of how it appears in the word. That means we may not even need whole functions to do this; we could just store a list of letter-pairs. And since that would consist of only data, it would be possible to add it to Module:languages (which may not contain any functions). —CodeCat 01:58, 14 March 2013 (UTC)
- That's a really bad idea IMO. For one thing, the premise is wrong (example: Korean) and for another Module:languages is already too big for me to even load it in a reasonable amount of time, last I checked, let alone edit it. —Μετάknowledgediscuss/deeds 02:03, 14 March 2013 (UTC)
- The time for you to load it is a lot longer than the time Lua takes to load it. A recent Scribunto update actually added a function specifically for loading such large modules containing data. So the size is really not a problem, at least not if we are to believe the developers. As for the premise... when does it not apply to Korean? I had the impression that Korean was actually rather regular. Can you give an example of a single Korean letter or syllabic that can be transliterated in several different ways? Also, just to make it clear, this idea isn't meant to be able to transliterate every language, it would be hopeless to attempt it for the likes of Han characters. —CodeCat 02:10, 14 March 2013 (UTC)
- I know that... but there's still the problem of me wanting to edit it! If it gets too big, it's a real problem for editors. Anyway, my point with Korean is that if you just take the letters ㅇ, ㅗ, and ㄱ, if you combine them in one order you get 공 (gong) but in the opposite order you get 옥 (ok). Can Lua handle that? —Μετάknowledgediscuss/deeds 02:14, 14 March 2013 (UTC)
- If our browsers can tell the difference, why can't Lua? From what I can tell in w:Korean language and computers, Hangul is encoded by combining all three individual letters into a single character. I presume that means that from a transliteration perspective, Hangul behaves as a syllabary and "gong" and "ok" are two different characters, each with a single Unicode codepoint, like Chinese characters or Kana are. —CodeCat 02:21, 14 March 2013 (UTC)
- Oh and just to clarify, exceptions like Japanese "ha" being pronounced as "wa" can simply be explicitly overridden with a tr= parameter like we have now. Automatic transliteration is simply meant to provide a useful default transliteration, but it should be possible to override it when it's wrong, just like we could override an irregular plural form. —CodeCat 02:25, 14 March 2013 (UTC)
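The pair-table idea with a manual override can be sketched roughly as follows. This is a Python illustration rather than Scribunto code; the function name, the `tr` argument and the three-entry kana table are assumptions made up for the example, not part of any existing module.

```python
from typing import Mapping, Optional

# Illustrative letter-pair table (hypothetical; a real one would cover the whole script).
KANA_PAIRS = {"こ": "ko", "れ": "re", "は": "ha"}


def transliterate(text: str, pairs: Mapping[str, str], tr: Optional[str] = None) -> str:
    """Return the manual transliteration if one is given, else a table-based default."""
    if tr:
        # A manual value always wins, like the existing tr= template parameter.
        return tr
    # Characters missing from the table are passed through unchanged.
    return "".join(pairs.get(ch, ch) for ch in text)


print(transliterate("これは", KANA_PAIRS))                # default, letter by letter
print(transliterate("これは", KANA_PAIRS, tr="kore wa"))  # editor-supplied override
```

The default is only as good as the table, so the override path stays available for cases such as the particle は.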
- Sometimes Unicode is really weird... does that mean that Module:ko-translit will be gigantic? (Yes, I'd rather foo-translit over foo-transliteration, so I will be using that as the standard now unless you have a good reason not to do so.) I agree on the exceptions, although for languages like Kyrgyz where there don't appear to be any exceptions, overrides aren't necessary. —Μετάknowledgediscuss/deeds 02:29, 14 March 2013 (UTC)
- For Korean you need a formula to decompose hangeul characters into individual jamo. I wrote a transliteration tool a while ago in C#. I've got it somewhere at home, happy to share if someone wants to write a transliteration tool. I wonder what Google translate uses to transliterate Mandarin and Japanese (often wrong, especially Japanese!). --Anatoli (обсудить/вклад) 02:39, 14 March 2013 (UTC)
- It could become pretty large, yes. Which is kind of unfortunate considering that Hangul itself is so well-structured. Hangul could be easily transliterated if we could piece apart individual code points like Anatoli said, but that would require more than we can put into a simple data table like Module:languages. On the other hand, the module we will presumably create to handle automatic transliteration, Module:transliteration, could simply be hand-coded with an exception specific to Hangul. The function could work like this: if the script is Hangul, then do some fancy code-point processing in Unicode, else use the pair-wise table. —CodeCat 02:53, 14 March 2013 (UTC)
- The logic for decomposing is simple; JavaScript can handle this. I will get my code when I have a chance and post the logic somewhere. The complete program had some flaws, as it didn't take into account some consonant changes that should be reflected as per Revised romanisation. --Anatoli (обсудить/вклад) 03:07, 14 March 2013 (UTC)
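For reference, the decomposition mentioned here is plain arithmetic on the code point, because precomposed syllables are laid out in Unicode as 19 leads × 21 vowels × 28 finals starting at U+AC00. A minimal Python sketch (the function and constant names are invented for this illustration):

```python
HANGUL_BASE = 0xAC00  # first precomposed syllable, 가
HANGUL_LAST = 0xD7A3  # last precomposed syllable, 힣

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 medial vowels
TAILS = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
         "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ")  # 28, first = no final


def decompose(syllable: str) -> tuple:
    """Split one precomposed hangeul syllable into (lead, vowel, tail) jamo."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("not a precomposed hangeul syllable")
    # 588 = 21 vowels * 28 finals, so each lead owns a block of 588 code points.
    lead, rest = divmod(code - HANGUL_BASE, len(VOWELS) * len(TAILS))
    vowel, tail = divmod(rest, len(TAILS))
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


print(decompose("공"))  # lead ㄱ, vowel ㅗ, final ㅇ
print(decompose("옥"))  # lead ㅇ, vowel ㅗ, final ㄱ
```

With this, 공 and 옥 come apart into the same three letters in opposite order, so a small jamo table is enough and a table of thousands of syllables is not needed.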
- I don't understand exactly what's going on but it sounds interesting.
- For languages without manageable automatic transliteration this module should be skipped but it would be useful if people could add missing sounds or correct them, e.g. if Arabic "ظهر" were automatically transliterated as "ẓhr", an editor would edit to make it "ẓuhr" (insert the unwritten vowel). --Anatoli (обсудить/вклад) 02:05, 14 March 2013 (UTC)
- Well, for languages like Arabic, automatic transliteration wouldn't be terribly helpful if we use the basic page name for it. But I think that transliterating the fully vowel-marked version of the word could work? We already add vowels to the head= parameter, so the template/module could be written to use this instead of the page name. —CodeCat 02:10, 14 March 2013 (UTC)
- I meant if Lua could be used in editing or adding translations, not in ready entries (e.g. in preview). Fully vowelled Arabic (not sure about Hebrew) could be transliterated, but I'm not sure if this could be made perfect (without errors). Perhaps it can, if strict spelling rules are followed (e.g. hamza is written when it's appropriate and ى and ه are not used instead of ي and ة). --Anatoli (обсудить/вклад) 02:27, 14 March 2013 (UTC)
- Hebrew is impossible because WT:HE TR requires marking vowel stress, which Hebrew doesn't do. Yes, Arabic would require strict spelling rules to be followed which translations currently do not (but I think entries usually do). —Μετάknowledgediscuss/deeds 02:42, 14 March 2013 (UTC)
- I suggest using it only if transliteration is missing; a specific transliteration should override Lua. We have SO many translations and entries with no translit. Lua transliteration may have some warning advising people that it can be incorrect (for selected languages?). Also note my reply re: Korean above. I can spend some time with whoever works on the Korean transliteration. --Anatoli (обсудить/вклад) 02:49, 14 March 2013 (UTC)
- Well, at least for Greek and Cyrillic, and generally any fully alphabetic script, the transliteration could be made flawless (but it would not include stress marks). I don't see lack of stress marks as a reason to avoid automatic transliteration altogether. A transliteration without them may not be complete, but it won't be wrong either, so it may be usable for Hebrew too. Devanagari and the other Indic scripts are encoded as alphabets in Unicode (the consonants and vowels are separate), but they need special treatment because the transliteration of the consonants depends on whether a vowel character follows ("devanāgarī" is actually encoded as "d-e-v-n-ā-g-r-ī"), so a simple pair-table would need to be supplemented by a function that suppresses the inherent vowel of a consonant when necessary. Such a function could, however, probably work for all indic scripts as long as we tell it which letters in a given script are consonants and which are vowels. —CodeCat 02:53, 14 March 2013 (UTC)
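The inherent-vowel handling described above could look something like this. It is a Python sketch with tiny, hypothetical Devanagari tables (only the letters needed for one word), and it assumes that a consonant keeps its inherent "a" unless a vowel sign or a virama follows; final-schwa deletion as in Hindi is not handled.

```python
# Hypothetical, deliberately tiny tables; a real module would list the full script.
CONSONANTS = {"द": "d", "व": "v", "न": "n", "ग": "g", "र": "r"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "े": "e"}
VIRAMA = "्"          # explicitly cancels the inherent vowel
INHERENT_VOWEL = "a"


def transliterate_indic(text: str) -> str:
    """Pair-table lookup plus suppression of the inherent vowel where needed."""
    out = []
    for i, ch in enumerate(text):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            following = text[i + 1] if i + 1 < len(text) else ""
            # Keep the inherent vowel only when nothing overrides it.
            if following not in VOWEL_SIGNS and following != VIRAMA:
                out.append(INHERENT_VOWEL)
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue  # already accounted for when the consonant was emitted
        else:
            out.append(ch)  # unknown characters pass through unchanged
    return "".join(out)


print(transliterate_indic("देवनागरी"))
```

Only the three tables are script-specific, which fits the suggestion that one function could serve all Indic scripts once it is told which letters are consonants and which are vowels.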
- Actually, Cyrillic and Greek can be flawless even with stress marks :) See Module:ru-translit. —Μετάknowledgediscuss/deeds 02:57, 14 March 2013 (UTC)
- That is a lot of code. Somehow I think that it could be a lot simpler, but I don't really know what a lot of it does (specifically, what purpose does it serve, why is it there?). How much of it is actually specific to Cyrillic? —CodeCat 03:00, 14 March 2013 (UTC)
- The scheme used for Russian is a mix of transliteration with arbitrary exceptions where parts of words are phonemically transcribed. Not a good exemplar. —Michael Z. 2013-03-14 04:41 z
- The developer, also a Russian, did what he felt was right for the Russian language and what is our policy. It has described exceptions. E.g. adjective endings -ого/-его are transliterated as -ovo/-(j)evo, not -ogo/-(j)ego, it's standard. The code needs to cater for these where possible. --Anatoli (обсудить/вклад) 04:49, 14 March 2013 (UTC)
- Yes, but it is quite different from any other transliteration scheme, and is far from a typical example or prototype for any transliteration code. —Michael Z. 2013-03-14 05:34 z
- Not so sure about "any other". The Japanese particles は and へ, for example, are transliterated phonemically as "wa" and "e", not as their usual hiragana readings "ha" and "he". Catering for these exceptions may be a big hurdle in some cases. The position-dependent rules you yourself mention below, and the Indic cases CodeCat mentioned, will make automatic transliteration harder and will require more sophisticated code. Russian may turn out to be an easy example. --Anatoli (обсудить/вклад) 05:44, 14 March 2013 (UTC)
- Yes, romanizing logographic and syllabic scripts is more complicated than for the Cyrillic alphabet. Usually. —Michael Z. 2013-03-14 05:58 z
- I don't think automated transliteration should cater to such exceptions. If the default is wrong, it should be overridden just like we do with any other template that generates a default form (such as {{en-noun}}'s plural). See my comments further down. —CodeCat 14:09, 14 March 2013 (UTC)
- (In response to Metaknowledge 02:42, 14 March 2013 (UTC).) In Hebrew, not only stress is the problem. חָכְמָה, for example, can be chochmá or chach'má (two different words). Basically, even if we wouldn't mark stress, any word with U+05B0 HEBREW POINT SHEVA and many with U+05B8 HEBREW POINT QAMATS would be ambiguous — and that's a large proportion of all Hebrew words. (That's just re general problems of automating Hebrew transliteration. I haven't been following this discussion at all, and don't understand, e.g., its first post.)—msh210℠ (talk) 06:37, 14 March 2013 (UTC)
- For Russian there is currently no single entry missing transliteration, translations miss it sometimes, which I have been fixing. Still see some use for Russian in the future. Agreed about Indic. Uyghur is fully vowelised, even if it's Arabic abjad based. Armenian, Georgian seem easy. Thai, Khmer, Lao, Burmese would be complex but possible. Khmer might be the easiest, check with Stephen G. Brown.--Anatoli (обсудить/вклад) 03:07, 14 March 2013 (UTC)
If romanization is automated, then there could be multiple schemes per language. Perhaps the displayed scheme for a language could be a user pref, or an entry could show several commonly-used romanizations. It would be reasonable to add a BGN/PCGN transliteration for a geographic name, for example, as that's what would be seen on many maps. In the future we could decide to include other transliterations, e.g., Cyrillizations of Latin or Chinese script. The framework should leave room for this. We might also use LOC transliteration for foreign-language titles in citations, as this is used in library catalogues and bibliographies.
Romanizations aren't necessarily straight table lookups. Some important ones include exceptions for occurrences at the beginning or end of a word, or after a vowel or consonant. But we could start by implementing ones that are straight lookups.
ISO language tags have a standard representation for transformed text, although the tags can get lengthy. This might be useful for cataloguing our schemes. —Michael Z. 2013-03-14 04:43 z
- While that would be nice, I think it kind of misses the point. The point of automated transliteration, to me, is that it can provide a sensible default where it has not yet been provided. So, for example, if someone types {{head|ru|noun}} on an entry without a transliteration, automated transliteration could make one itself. But it would still be necessary and desirable to check it and to override it if necessary, so it's not meant as a substitute for manual transliterations. —CodeCat 14:04, 14 March 2013 (UTC)
- I disagree. Your position makes sense for some languages, like Russian. But we've had a real problem with some smaller languages, like Telugu, where the contributors use various transliteration schemes, ranging from ASCII to ad hoc, and often neglect transliteration altogether. The only person cleaning that mess up has been Stephen G. Brown. An automated transliteration system for Telugu will be more reliable than what users give as a transliteration value, and thus it definitely would be a substitute for manual transliteration. —Μετάknowledgediscuss/deeds 22:51, 14 March 2013 (UTC)
- I agree with CodeCat on this one. Humans know or should know better. We should, perhaps, have the ability to give both manual and automated transliterations; then someone with knowledge of the standards could fix a non-standard transliteration. So, to give a Russian example, no point in having automated transliteration of "что" as "čto" (incorrect), I'd prefer manual "shto" (non-standard but correctly showing the non-standard reading), then I know that I have to put "što" to make it standard. I stumbled across similar problems with Bengali and Thai. There are many exceptions in readings in various languages. I already mentioned the Japanese particles は and へ. --Anatoli (обсудить/вклад) 23:24, 14 March 2013 (UTC)
- But that only makes sense if exceptions exist; if we don't know, then we should ask our local experts like Stephen or just bring the issue up at language forums. In many languages, like Greek, our system specifically says that it is trying to reproduce the orthographical conventions rather than the sounds, so there will never be exceptions.—Μετάknowledgediscuss/deeds 23:31, 14 March 2013 (UTC)
-
- We can have exception-free languages, e.g. most Cyrillic-based ones, except for Russian. I don't know Telugu, but in Hindi (like Arabic) there is strict and relaxed spelling: ग़रीब (ġarīb) can be spelled (casually, but all too commonly) "गरीब", without a "nuqta", the dot under the letter that turns ग (ga) into ग़ (ġa). It's still "ġarīb", not "garīb". Manual transliteration should provide the correct pronunciation, as the nuqta is often ignored in Hindi (8 Devanagari letters can use it).
- I personally disagree with WT:EL TR, they totally ignore the way foreign words with "b" and "d" are transliterated "μπ" /b/, "ντ" /d/, etc. --Anatoli (обсудить/вклад) 00:19, 15 March 2013 (UTC)
-
-
- no point in having automated transliteration of "что" as "čto" (incorrect)
-
-
- Anatoli, there's exactly a point of having a transliteration of the word spelled č-t-o be čto. Otherwise, it is not a transliteration. Why are you putting pronunciation into the place for transliteration? —Michael Z. 2013-03-15 01:05 z
- I agree with Michael on that point. Consider what would happen if we started "transliterating" English the same way. We'd end up with something resembling enPR wouldn't we? —CodeCat 01:15, 15 March 2013 (UTC)
- That's what Lua will produce - "čto". I want to override it with "što" because "čto" is misleading and doesn't help anybody: foreigners and even some uneducated Russians still read out "что" as "čto" when it should be "što". It is a practice accepted and used over the years by editors working with Russian. This exception is not predictable like akanye, and knowledge of Russian phonology and sound changes doesn't help to arrive at the correct pronunciation of the word, so it has to be specifically explained. IPA is not sufficient: many people dislike or don't understand it, and IPA is not used in translations. --Anatoli (обсудить/вклад) 01:21, 15 March 2013 (UTC)
- I think you need to consider what transliterations are for. The purpose is to allow someone to read the word when they don't know the script. It's not meant to tell them how to pronounce the word, that's what the pronunciation section is for. Moreover, if someone is able to read Cyrillic, they shouldn't need the transliteration, should they? So if you think about it, someone who doesn't need the transliteration will end up reading the Cyrillic letters что (čto) while someone who does need it will find što instead. That is just inconsistent, and it's rather strange that the transliteration (which is meant as a reading aid) gives different information. I think the transliteration should only have information in it that can also be deduced from the original script (in combination with the regular phonology/orthography of the language). If you really want to show that что is to be read as što, then that should be written in addition to the regular transliteration čto, not replacing it. —CodeCat 01:32, 15 March 2013 (UTC)
- I think the present situation is fine, but if you want to have this conversation, move it to the BP. —Μετάknowledgediscuss/deeds 01:38, 15 March 2013 (UTC)
- MK, what is relevant here is that transliteration schemes that meet the criteria for transliteration schemes also tend to be suitable for mechanical transliteration (whether it be by machine or by a non-native reader). Substituting a complex, proprietary, ambiguously-defined, phonemic transcription system is a loss for readers, editors, and openness, as well as for automated transliteration. We should define some baseline standards for transliteration. —Michael Z. 2013-03-15 14:42 z
- Anatoli, one objective of having both transliteration and pronunciation is exactly to show when the two differ. By not transliterating the word (which requires a respect for its letters), you are obscuring that very information, potentially contributing to the problem you describe. If accessibility of the pronunciation is lacking, then improve the pronunciation, as you have done in some entries, instead of destroying the transliteration. —Michael Z. 2013-03-15 14:42 z
- Think about what would happen if someone who just started learning Cyrillic comes across что. They have just learned that ч is č or ch or some variety. Yet here they suddenly see what they think is a "wrong" transliteration, so they will correct it. That's what I'd probably do too if I found this. —CodeCat 15:05, 15 March 2013 (UTC)
- Relevant: #Criteria for romanization systems —Michael Z. 2013-03-15 20:41 z
-
- People who start learning Japanese and learn hiragana see the phrase これはなんですか。. If they only know hiragana they will read "kore ha nan desu ka?". An automatic transliterator would also romanise it so. You need a person who knows Japanese to correct it and say it's "kore wa nan desu ka?". This is how Japanese is transliterated. There's no difference with the Russian "что это?", which is "što éto?", not "čto éto?". It's not just a matter of understanding the writing system. Transliterating letter by letter ("čto eto") is just unhelpful in this case. Foreign users can ask if they think it's "wrong"; native users understand exactly why it's transliterated that way. People who know Cyrillic but don't know the exception will misread the word. I'm OK with Lua transliterating the default way (taking into account some basic rules of changing, like поезд (pójezd) but небо (nébo) "е" = je/e, берёза (berjóza) but жёлтый (žóltyj) "ё" = jo/o) but it's up to editors with the knowledge to override the default and correct it.
- If anyone wants to check the complicated rules about Korean standard transliteration, read w:Revised Romanization of Korean, there are too many consonant changes, like ㅂ + ㄴ (b + n = mn), ㄹ + ㄴ (l + n = ll, nn). @Michael, please just stop talking about "destruction" of the transliteration. --Anatoli (обсудить/вклад) 10:42, 16 March 2013 (UTC)
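The "е = je/e" rule quoted above is an example of a lookup that depends on the preceding letter. A rough Python sketch follows; the exact conditions (word-initial, after a vowel, or after ъ/ь gives "je", otherwise "e") are an assumption inferred from the two examples, the letter table is a hypothetical fragment, and stress marks are not generated.

```python
RU_VOWELS = set("аеёиоуыэюя")
RU_SIGNS = set("ъь")

# Hypothetical fragment of a plain letter table, just enough for the examples.
RU_PLAIN = {"п": "p", "о": "o", "з": "z", "д": "d", "н": "n", "б": "b", "а": "a"}


def translit_with_e_rule(word: str) -> str:
    """Letter-pair lookup, except that Cyrillic е depends on the preceding letter."""
    out = []
    prev = ""  # empty string marks the start of the word
    for ch in word.lower():
        if ch == "е":
            after_trigger = prev == "" or prev in RU_VOWELS or prev in RU_SIGNS
            out.append("je" if after_trigger else "e")
        else:
            out.append(RU_PLAIN.get(ch, ch))  # unknown letters pass through
        prev = ch
    return "".join(out)


print(translit_with_e_rule("поезд"))  # е follows a vowel
print(translit_with_e_rule("небо"))   # е follows a consonant
```

A rule like this is still fully predictable from the spelling, unlike "что", which is why the first can live in the default and the second needs an override.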
- An automated transliteration of Burmese would generate the ALA-LC system very easily, and could probably be made to generate the MLCTS as well, but four years ago Stephen and I reached the compromise that Burmese entries would show four romanization systems (two that are orthography-faithful transliterations and two that are pronunciation-faithful transcriptions), while Burmese words mentioned on other pages (e.g. in Etymology and Translations sections) would just use the pronunciation-faithful BGN/PCGN transcription. —Angr 11:02, 16 March 2013 (UTC)
- For multiple transliteration methods, we could use multiple modules named in the form:
{{ my-translit | မြန်မာဘာသာ }} – default romanization, e.g. BGN
{{ my-alaloc-translit | မြန်မာဘာသာ }} – other romanization, e.g. ALA-LC
- Or a single module with a transliteration, method, or scheme argument for the non-default methods:
{{ my-translit | မြန်မာဘာသာ }}
{{ my-translit | မြန်မာဘာသာ | method=alaloc }}
- Standard tags for ISO t extension are:
alaloc – American Library Association-Library of Congress
bgn – US Board on Geographic Names
buckwalt – Buckwalter Arabic transliteration system
din – Deutsches Institut für Normung
gost – Euro-Asian Council for Standardization, Metrology and Certification
iso – International Organization for Standardization
mcst – Korean Ministry of Culture, Sports and Tourism
satts – Standard Arabic Technical Transliteration System
ungegn – United Nations Group of Experts on Geographical Names
- Specific versions are typically tagged like ungegn-2012. Non-standard methods would be tagged with an "x" private-use code, e.g., x-wikt. —Michael Z. 2013-03-17 20:54 z
- But the problem is that only two of the systems in use here are predictable from the spelling—the other two (including the one used outside the Burmese pages themselves) are not (always) predictable from the spelling. Though I suppose the BGN/PCGN transcription is predictable often enough that it will be OK as long as it's possible to manually override the automatic transcription, e.g. via {{my-translit|မြန်မာစကား |tr=myanmazăga:}} to prevent the template from automatically generating myanmasăka:. —Angr 10:20, 19 March 2013 (UTC)
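The "single module with a method argument" variant, combined with the manual override, might be organised along these lines. This is a Python sketch only: the registry layout, the choice of "x-wikt" as the default tag and the three-letter Russian tables are all assumptions for illustration, and no Burmese data is attempted.

```python
# Hypothetical registry: (language code, scheme tag) -> letter-pair table.
SCHEMES = {
    ("ru", "x-wikt"): {"ч": "č", "т": "t", "о": "o"},
    ("ru", "bgn"): {"ч": "ch", "т": "t", "о": "o"},
}

# Scheme used when the caller names none (an assumption for this sketch).
DEFAULT_METHOD = {"ru": "x-wikt"}


def translit(lang: str, text: str, method: str = None, tr: str = None) -> str:
    """Romanize text with the requested scheme unless a manual tr value is given."""
    if tr is not None:
        return tr  # manual value, as with tr= in the template call
    table = SCHEMES[(lang, method or DEFAULT_METHOD[lang])]
    return "".join(table.get(ch, ch) for ch in text)


print(translit("ru", "что"))                # default scheme
print(translit("ru", "что", method="bgn"))  # named non-default scheme
print(translit("ru", "что", tr="što"))      # manual override
```

Keying the tables by scheme tag keeps each one small and leaves room for adding further schemes later without touching callers that rely on the default.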
- Does that correspond to note 1 on page 3 of this standard, or lines 1 and 4 of the first table in this one? It looks like it might be predictable, but requiring some more-complicated programming. A manual override like you describe sounds like a good compromise, until and if that programming can be added. —Michael Z. 2013-03-22 19:04 z
Can someone please generate a list of pages, each of which has a ==Hebrew== section containing <!?—msh210℠ (talk) 06:52, 14 March 2013 (UTC) - את . שוקולד . אגרוף . אגרוף תאילנדי . אינדונזי . אנגלית . מים . טוב . אלוהים . בן . דרום . היה . גדל . מצא . פן . עם . ילד . עור . אהרן . ויקרא . אחת . לבן . תוכי . כי . ז־כ־ר . כדור . אח . מת . אי . שם . מספר . נזהר . ישן . כוס . תת . ציבור . זה . יותר . ירדן . טבע . איזה . צילם . מתמטיקה . הפעיל . האיר . אָ . הבא . פקד . בער . ־ון . בא . קשת . קורס . נשא . שלח . עאכ״ו . פחות . באשר . שימוש . כפה . אלהים . צפון . ־ים . הבין . סדין . נפלא . מאפיה . התקלח . פסח . י־ל־ד . מעות חטים . ־ה . זיין . קעקע . גת . יום טוב . הזיע . י״ט . הארץ . הטהר . הצטנן . השתדל . השתכנע . הסתעף . השתמר . השתכר . התלכלך . התקמט . התחבא . התלבש . התקשר . השתתף . התנכר . תיכנת . ארצה . רעש . הביא . מלח לימון . חומצת לימון . חשמן . חרש . כרית . מחמד . ארוחת עשר . בנים . נ־כ־ר . ילדים . לבד . גילה . מרדכי . גלעד . פרו . ישרצו . כהה . מלכי־צדק . זכור . תמלא . יאמר . עמו . הבה . נתחכמה . ירבה . בנו . ישימו . משוגע . תיראן . מצה . תחיין . יראו . תחיון . לכי . תכה . שמך . רעך . להרגני . נודע . אסרה . אראה . כדי . להעלתו . העלה . בני ישראל . ואמרו . אלי . מכרה . ושמעו . נלכה . שלשת . ושלחתי . הכה . עיני . תשליך . ידו . והיה . ולקחת . פיך . והוריתיך . אוצר . שוק שחור . אשובה . מערכת הפעלה . מעבר לים . אינדונזיה . מג״ב . כוח . הקב״ה . האט . חומוס . קטון . ניגש . ויאמן . וישמעו . ענים —RuakhTALK 03:37, 15 March 2013 (UTC)
- Many thanks.—msh210℠ (talk) 15:37, 15 March 2013 (UTC)
== Edittools? ==
Anyone else having trouble with Edittools in Chrome? Using Chrome on Win 7. Edittools were working fine this morning, but I get back from lunch and they completely fail to load, not even the default ones... -- Eiríkr Útlendi │ Tala við mig 20:23, 14 March 2013 (UTC)

== Downloadtools ==
Is there any kind of API, or database, where I can download some ogg files from wiktionary??? Best regards --77.47.30.210 21:26, 14 March 2013 (UTC)

== A function to convert Korean hangeul to Roman letters (basic) in C# ==
This is the code I promised to share for converting Korean hangeul to Roman letters. The code breaks up hangeul blocks into jamo components, e.g. 한 (han) = ㅎ (h), ㅏ (a), and ㄴ (n). I can give the full code in C# as well for the graphical program (includes Cyrillisation of Korean). Just need a C# compiler (csc.exe). The code also handles ㄹ (l/r) but doesn't cover all cases.
<pre>
// useSyllableDelimiter and keepOriginal are referenced below but not declared here;
// presumably they are fields of the enclosing class in the full program.
private string romanize(string stringToConvert)
{
    string result = "";
    // Romanized jamo, indexed by position within a precomposed syllable
    string [] rLeads = {"g", "gg", "n", "d", "dd", "r", "m", "b", "bb", "s", "ss", "", "j", "jj", "ch", "k", "t", "p", "h"};
    string [] rVowels = {"a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "oa", "oae", "oi", "yo", "u", "ueo", "ue", "ui", "yu", "eu", "eui", "i"};
    string [] rTails = {"g", "gg", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb", "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "c", "k", "t", "p", "h"};
    char currentChar;
    int index = 0;
    string l = "";
    string v = "";
    string t = "";
    int charInt = 0;
    string syllable = "";
    bool wasVowel = false;

    for (int i = 0; i < stringToConvert.Length; i++)
    {
        currentChar = stringToConvert[index];
        // Precomposed hangeul syllables run from U+AC00 (44032) to U+D7A3 (55203)
        if (((int)currentChar >= 44032) && ((int)currentChar <= 55203))
        {
            charInt = (int)currentChar;
            try
            {
                l = rLeads[((charInt - 44032) / 588)];
                //convert R to L if after a consonant
                if ((l == "r") && (!wasVowel))
                    l = "l";
            }
            catch (IndexOutOfRangeException ex)
            {
                l = "";
            }
            try
            {
                // a syllable with no final gives index -1, which is caught below
                t = rTails[((charInt - 44032) % 28) - 1];
            }
            catch (IndexOutOfRangeException ex)
            {
                t = "";
            }
            try
            {
                v = rVowels[((charInt - 44032 - (charInt - 44032) % 28) % 588) / 28];
            }
            catch (IndexOutOfRangeException ex)
            {
                v = "";
            }
            syllable = l + v + t;
            // remember whether this syllable ended in a vowel, for the R/L rule above
            if ((syllable.Substring(syllable.Length - 1, 1) == "a") ||
                (syllable.Substring(syllable.Length - 1, 1) == "e") ||
                (syllable.Substring(syllable.Length - 1, 1) == "i") ||
                (syllable.Substring(syllable.Length - 1, 1) == "o") ||
                (syllable.Substring(syllable.Length - 1, 1) == "u"))
            {
                wasVowel = true;
            }
            else
            {
                wasVowel = false;
            }
            if (useSyllableDelimiter)
                result = result + syllable + "-";
            else
                result = result + syllable;
        }
        else
        {
            //trim dashes if the next character wasn't Korean
            if ((result.Length > 1) && (result.Substring(result.Length - 1, 1) == "-"))
                result = result.Substring(0, result.Length - 1) + currentChar;
            else
                result = result + currentChar;
        }
        index++;
    }
    if (keepOriginal)
        return stringToConvert + "\n" + result;
    else
        return result;
}
</pre>
Hopefully someone gets interested in making a transliteration tool for Korean. The above code is basic, it converts the Google Translate way - well, almost, the finals are "k", "p" and "t", not "g", "b" and "d", which is more standard. It doesn't take into account the changes required by Revised romanisation (current standard in South Korea) but if you're able to start, then I'll help to get the rules, which are not too complex. --Anatoli (обсудить/вклад) 04:35, 15 March 2013 (UTC)
- Example conversion of a text from Korean Wikipedia:
- Source:
- 한국어(韓國語)는 주로 한반도에서 쓰이는 언어로, 대한민국에서는 한국어, 한국말이라고 부른다. 조선민주주의인민공화국에서는 조선어(朝鮮語), 중국(조선족 위주)에서도 조선어(朝鮮語)로 불린다. 카자흐스탄 등 구 소련의 고려인들 사이에서는 고려말(高麗말)로 불린다.
- 19세기 이후 한반도와 주변 국가의 정치 사회상 변화에 따라 중국(특히 옌볜 조선족 자치주), 일본, 러시아(특히 연해주와 사할린), 우즈베키스탄, 카자흐스탄, 미국, 캐나다 등에 한민족(韓民族)이 이주하면서 이들 지역에서도 한국어가 쓰이고 있다. 한국어 사용 인구는 전 세계를 통틀어 약 8천200만 명으로 추산된다.[1] 일제 강점기에는 일본 제국의 문화 말살 정책으로 상당한 핍박을 받았다.
- Converted text (needs tweaking, I know):
hangugeo(韓國語)neun juro hanbandoeseo sseuineun eoneoro, daehanmingugeseoneun hangugeo, hangugmalirago bureunda. joseonminjujueuiinmingonghoagugeseoneun joseoneo(朝鮮語), junggug(joseonjog uiju)eseodo joseoneo(朝鮮語)ro bullinda. kajaheuseutan deung gu soryeoneui goryeoindeul saieseoneun goryeomal(高麗mal)lo bullinda.
19segi ihu hanbandooa jubyeon guggaeui jeongchi sahoisang byeonhoae ddara junggug(teughi yenbyen joseonjog jachiju), ilbon, leosia(teughi yeonhaejuoa sahallin), ujeubekiseutan, kajaheuseutan, migug, kaenada deunge hanminjog(韓民族)i ijuhamyeonseo ideul jiyeogeseodo hangugeoga sseuigo issda. hangugeo sayong inguneun jeon segyereul tongteuleo yag 8cheon200man myeongeuro chusandoinda.[1] ilje gangjeomgieneun ilbon jegugeui munhoa malsal jeongchaegeuro sangdanghan pibbageul badassda.
--Anatoli (обсудить/вклад) 04:46, 15 March 2013 (UTC)
- I have Luacized that function, cleaned it up slightly (IMHO; YMMV), and put it at Module:ko-utilities. —RuakhTALK 03:11, 17 March 2013 (UTC)
- @Anatoli: Which cases doesn't it cover?
- @Ruakh: I'm going to put that at Module:ko-translit with the function being named rv (to match Korean template parameters). Just thought I'd let you know; if there's a problem with me doing that you can move it back. —Μετάknowledgediscuss/deeds 03:29, 17 March 2013 (UTC)
- Can you give it a longer name? "rv" doesn't really mean much. —CodeCat 03:31, 17 March 2013 (UTC)
- Decided not to change the function's name for now. The reason for rv is that there are multiple transliteration systems for Korean. Wiktionary primarily uses Revised Romanization, but entries often use {{ko-pron}} to show three more methods, one of which cannot be reliably deduced from the hangeul alone (nor can the IPA, for that matter). We should Luacize all possible methods used on Wiktionary. —Μετάknowledgediscuss/deeds 03:36, 17 March 2013 (UTC)
- I was actually hoping for something like "revised_romanization" or maybe shorter "revised_rom" if you want. —CodeCat 03:40, 17 March 2013 (UTC)
- Ruakh, thanks for the efforts but do you have a working version so far? (The current module was renamed to Module:ko-translit, which requires Module:ko-hangul. I tried to call it but it didn't work.) Not sure if you're in the middle of development.
- @Metaknowledge, before we can start tweaking the details, we need to get the basic functionality to work. --Anatoli (обсудить/вклад) 11:13, 17 March 2013 (UTC)
-
- It works just fine, you just don't understand how to use Scribunto modules. Please read Wiktionary:Scribunto. —RuakhTALK 16:13, 17 March 2013 (UTC)
== Lua loops? ==
Does anyone know what happens if you put a never-ending loop into a Lua module? Does it stop the entire wiki? SemperBlotto (talk) 18:23, 16 March 2013 (UTC)
- It wouldn't stop everything as far as I know, there is a time limit. Why not try it? —CodeCat 18:30, 16 March 2013 (UTC)
- I somehow doubt that the servers would give exclusive access to one process from one instance of one page, let alone have no time limit on it. If they did, the system programmers should be fired as grossly incompetent. The worst that might happen would be that the page would freeze up for the person viewing the page. Chuck Entz (talk) 19:15, 16 March 2013 (UTC)
== Interlanguage links ==
I have a question unrelated to Wiktionary and hope someone can point me in the right direction. For a small wiki I sometimes contribute to, I want to introduce other-language versions. The wiki is small, though, so we don't want the overhead of multiple wikis. I'm trying to come up with a solution for the wikimaster, but I don't understand the configuration aspects very much. My idea is to have the language links at the left link to a subdirectory. For example, if you are on "thisPage.html" and click "Spanish" in the language list at left, it would go to my.wiki.org/es/questaPagina.html. I've found articles like mw:Manual:$wgInterwikiMagic, but nothing that addresses something exactly like this. Any suggestions welcome. --BB12 (talk) 20:44, 16 March 2013 (UTC)
- I don't get it, anyone? Mglovesfun (talk) 21:12, 16 March 2013 (UTC)
-
- I'm happy to explain it differently. What don't you get? --BB12 (talk) 21:13, 16 March 2013 (UTC)
-
- How about this: What's the easiest way to have a multilingual wiki in a case where the URL is wiki.myweb.org (so I can't have es.myweb.org, etc.)? --BB12 (talk) 22:16, 16 March 2013 (UTC)
-
- (e/c) On this wiki, a page with the absolute URL http://en.wiktionary.org/wiki/this might contain the interwiki link [[fr:this]], which is a link to http://fr.wiktionary.org/wiki/this. If I understand correctly, BB wants it to be a link like http://en.wiktionary.org/wiki/fr/this instead (but on his wiki, not on Wiktionary). - -sche (discuss) 22:18, 16 March 2013 (UTC)
- Yes, that seems, to me, to be the easiest way to make a wiki multilingual. I would think this is a really simple tweak in the settings, but I haven't gotten anywhere with the wikimaster, so I was wondering if someone here could point me where to go or suggest what should be done. --BB12 (talk) 00:49, 17 March 2013 (UTC)
- I think that such a thing could be done by creative use of the interwiki-map (e.g., mapping es to //my.wiki.org/es/$1.html), but it seems messy and potentially fragile. For example, I could easily imagine getting everything working so that [[thisPage]] links just fine to its Spanish counterpart, but then having no way for that Spanish counterpart to link back to the English.
- Instead, I'd suggest that you do something similar to how en.wikt produces sidebar links to Wikipedia when you use e.g. {{projectlink|pedia}}. The way that works is, the template produces wikitext like <span class="interProject">[[w:...|Wikipedia]]</span>, which results in HTML like <span class="interProject"><a href="//en.wikipedia.org/wiki/..." class="extiw" title="w:...">Wikipedia</a></span>. We then use CSS to prevent that link from being displayed normally, and we use JS to move it into the sidebar. In your case, you'd presumably add interwiki-links via a template like {{interwikis|es=questaPagina|fr=cettePage}} or whatnot.
- You'd probably also want to use mod_rewrite to implicitly add uselang=es to Spanish pages, so that the whole interface is in Spanish, rather than just the content.
- —RuakhTALK 02:13, 17 March 2013 (UTC)
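The interwiki-map idea mentioned above (mapping es to //my.wiki.org/es/$1.html) comes down to substituting the page title for $1 in a per-language URL pattern. The following is only a minimal Python sketch of that substitution, not MediaWiki's own code: the map contents, the function names, and the space-to-underscore convention are assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical interwiki map: language prefix -> URL pattern,
# where "$1" stands for the target page title.
INTERWIKI_MAP = {
    "es": "//my.wiki.org/es/$1.html",
    "fr": "//my.wiki.org/fr/$1.html",
}


def interwiki_url(prefix, title):
    """Expand one interwiki link, e.g. ("es", "questaPagina")."""
    pattern = INTERWIKI_MAP[prefix]
    # Assumed convention: spaces become underscores, the rest is URL-quoted.
    return pattern.replace("$1", quote(title.replace(" ", "_")))


def sidebar_links(**targets):
    """Mimic a call like {{interwikis|es=questaPagina|fr=cettePage}}."""
    return [(lang, interwiki_url(lang, title))
            for lang, title in sorted(targets.items())]


print(sidebar_links(es="questaPagina", fr="cettePage"))
```

Note that the sketch only covers the outgoing direction; the concern raised above about the Spanish page linking back to the English one would need a matching entry for English on the other side.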
- Thank you for the suggestion. I have passed that on to the wikimaster! --BB12 (talk) 17:44, 17 March 2013 (UTC)
Some Latin templates now only ever require one parameter -- How about making it all of them?
{{l/la}} and {{la-decl-1st}} can now be passed a single parameter with macrons and the templates will automatically generate the macronless version of the word. For example, while you can still generate an inflection table with:
- {{la-decl-1st|stell|stēll}}
now you can instead simply pass the single macron-bearing parameter. The magic happens in Module:Latin, written in Lua. I'd recommend also using the same logic in {{l|la|...}} and making the requirement for two versions of Latin words a thing of the past. Hopefully I haven't broken anything. Pengo (talk) 14:59, 17 March 2013 (UTC)
- I have changed the name to Module:la-utilities, and I changed {{l/la}} to reflect that. I think the next obvious step with Latin templates is to merge {{la-decl-2nd}}, {{la-decl-2nd-N}}, and {{la-decl-2nd-ER}} (they should all eventually be redirects to the first one), because we could just add a function to Module:la-utilities that outputs the last two characters of a string; for example, if it's um it takes the neuter declension and if it's us or er it takes the masculine declension. —Μετάknowledgediscuss/deeds 16:05, 17 March 2013 (UTC)
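The two ideas in this exchange, deriving the macronless form from the macron-bearing one and picking a second-declension pattern from the last two characters, can be sketched as follows. This is a Python illustration only, not the Lua in Module:la-utilities; the function names are hypothetical, and the ending rules are just the three mentioned in the comment above.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def strip_macrons(word):
    """Return the word without macrons, e.g. for use as a page title."""
    # Decompose so that "ē" becomes "e" + combining macron, drop the
    # macron, then recompose whatever other diacritics remain.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if ch != MACRON)
    return unicodedata.normalize("NFC", stripped)


def second_declension_kind(lemma):
    """Guess the pattern from the last two characters (hypothetical helper)."""
    ending = strip_macrons(lemma)[-2:]
    if ending == "um":
        return "neuter"
    if ending in ("us", "er"):
        return "masculine"
    return None  # not covered by the rule described above


print(strip_macrons("stēll"))            # stell
print(second_declension_kind("bellum"))  # neuter
```

Returning None for an unrecognised ending leaves room for the caller to fall back on explicit parameters rather than guessing.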
- Just to make it clear, {{l/la}} and its relatives were created before Lua came around, and were intended to be faster than {{l}}. However, now that Lua is here, they may well be redundant because {{l}} would presumably be quite a bit faster when Lua-cised. So it's better not to change or use those specialised link templates at all until we know for sure whether they are still needed. —CodeCat 17:52, 17 March 2013 (UTC)
- Well, in the meantime I think Pengo killed two birds with one stone by improving {{l/la}} and providing a way for us to edit one template and change which module is invoked in all the other templates that need macron-stripping (eventually, all of them). If you ever want to finish figuring out the best/fastest way to {{l}}-ify, with subpages or not, then we can make a copy of {{l/la}} and replace all uses of it in the template namespace with the copy. But it doesn't look like it'll be worked out anytime soon, so IMO there's no point preserving it as is. —Μετάknowledgediscuss/deeds 18:05, 17 March 2013 (UTC)
- What I'm worried about is backwards compatibility. If we extend {{l/la}} with this extra functionality, it will no longer be possible to replace it with {{l|la}} as easily, if and when the time comes. I strongly recommend that for the time being, the specialised templates should not have extra abilities that the general {{l}} does not also have. —CodeCat 18:16, 17 March 2013 (UTC)
- But when the time comes, {{l}} should have lang-specific functions like this. Where else would we put this kind of template? —Μετάknowledgediscuss/deeds 18:20, 17 March 2013 (UTC)
- I mostly agree with CodeCat. Language-specific functionality belongs in language-specific templates; in this case, I suppose that would be {{la-l}} or {{la-onym}}. {{l/la}} is intended to be a hackish variant of {{l|la}}, part of a family of templates with identical behavior, and it should conform to the requirements of that family. —RuakhTALK 18:48, 17 March 2013 (UTC)
- I agree with Metaknowledge on that point though. If {{l}} can be made to automatically strip diacritics, why not? It can probably be made to work the same as automatic transliteration (in effect, it's the same thing). —CodeCat 19:15, 17 March 2013 (UTC)
- Because all editors use {{l}}. Editors who don't usually work on Latin understand that they need to look at the documentation for (say) {{la-noun}} before using it, and that they can't just assume that it works the same way as {{en-noun}} or {{fr-noun}}; but they should be able to expect that {{l}} works the same way they're used to.
Also, there are a whole bunch of problems with that Lua module. Each of those problems could, in principle, be fixed, but I think it's reasonable to expect that clever language-specific code will always have little problems and inconsistencies, for two reasons: (1) none of us is perfect (our cleverness is in finite supply); and (2) such code almost always does, and should, optimize for the 99% case, such that it's sometimes inapplicable to rare edge cases (e.g., Latin entries that really should have macrons for whatever reason). Do we really want all of those problems to be in {{l}}? Currently, when the language-specific code is in a language-specific template, we can always fall back on using a generic template that imposes fewer requirements (e.g. using {{head}} for pluralia tantum because of a language-specific noun-headword template that "knows" that the noun lemma is a singular form); but if it's the generic template itself that has the problematic language-specific code, we're SOL. —RuakhTALK 20:00, 17 March 2013 (UTC)
- But there are no rare cases. AFAICT, it's 100%, not 99%. In the end, I don't really mind what you do, as long as you don't break stuff. For example, don't edit {{l/la}} without editing {{la-decl-1st}}. I would replace it, but it looks like Module talk:la-utilities/tests is currently failing, so I'm going to revert the changes to {{la-decl-1st}} for now. —Μετάknowledgediscuss/deeds 20:30, 17 March 2013 (UTC)
- Re: {{la-decl-1st}}: Thanks. Re: there being no rare cases: I think there are always rare edge cases, or at least, that we always want to leave the door open to rare edge cases. Maybe people who send SMSes in Latin treat ō_ō and o_o as two distinct emoticons? Maybe we get a Perseus dump of 10,000 entries with macrons in their titles, and want (temporarily) to be able to link to those entries (instead of having them be enforcedly orphaned until they're all properly fixed and merged)? Maybe we'll want {{l|la||bār}} to work? I have no idea. It just seems rather extreme to impose macronlessness as a technical restriction in 100.000% of cases. —RuakhTALK 21:05, 17 March 2013 (UTC)
- Things can be used in ways we couldn't have foreseen, and interact in ways we would never expect, so that we may need an out for reasons unconnected to the unreal and relatively tidy universe of Latin morphological rules. I firmly believe that having an override should always be the default, and that it should be removed only where experience shows it's unnecessary, and where there are compelling reasons such as performance or usability. It just seems a good idea on principle not to design things around our alleged omniscience and infallibility. Chuck Entz (talk) 22:16, 17 March 2013 (UTC)
- If you insist. What really matters to me right now is that, judging by Module talk:la-utilities/tests, the module isn't working correctly yet. (PS: When I text in Latin, I never use macra. If I really need to distinguish, I use an underscore following the letter. But that's just a bit of trivia I thought I'd share.) —Μετάknowledgediscuss/deeds 23:25, 17 March 2013 (UTC)
- It was working when I saved it. There have been many improvements made in this short time, but also someone broke it while trying to fix something I did that probably breaks conventions. As it says at the top of Module:la-utilities, to test while editing, "Preview page with this template" with: Module_talk:la-utilities/tests. I've fixed it for now, but it probably needs some work to be correct. Pengo (talk) 00:19, 18 March 2013 (UTC)
- The edge cases aren't really an issue as it is: if you use two parameters it uses the old behaviour. I've been as conservative as possible with the code, so if two parameters are given, they're still both used and no macron stripping occurs (I did this originally in anticipation of performance concerns). It means for New Latin emoticons, you can still use {{l/la|ō_ō|ō_ō}}, which is a syntax that could be guessed or worked out by any user of the template in this unlikely situation. I didn't document it explicitly because I didn't think it would ever be necessary, and I'd expect them to simply use [[ō_ō]], but I'll add it to the test cases. Note, I didn't make {{la-decl-1st}} as conservative (for simplicity's sake), but it could easily be made so. Pengo (talk) 00:03, 18 March 2013 (UTC)
- Maybe one of the parameters could be set to - to suppress the automatic stripping. Which of the two would be more intuitive, I don't know. —CodeCat 00:20, 18 March 2013 (UTC)
- That would be easy enough to do, but I don't think it's at all necessary, unless it's to fit in with behaviour of other {{l}} languages. And I really don't see the controversy. It's hardly surprising behaviour that a link to the Latin ācer should link to the actual entry, acer#Latin, and not to the non-existent page, ācer#Latin, as it currently does. All other existing behaviour stays the same -- {{l/la|zebra}} still links to zebra#Latin, and {{l/la|elegans|ēlegāns}} still does what it did too, and if you really want to override the macron stripping behaviour you just use two arguments, e.g. {{l/la|ō_ō|ō_ō}} although I'm yet to see a real-world example of where this would be necessary. By the way, {{l/la}} is only transcluded by a handful of pages (11 all up, while it's not being used by {{la-decl-1st}}), so the rush to protect it seems a little unwarranted. Pengo (talk) 00:37, 19 March 2013 (UTC)
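The one-parameter versus two-parameter behaviour described in the comments above can be summarised in a short sketch. This is Python for illustration, not the template or module code; the function name latin_link and the exact wikitext it returns are assumptions, and only the choice of link target follows the description given here.

```python
import unicodedata


def strip_macrons(word):
    """Remove combining macrons after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch != "\u0304")
    return unicodedata.normalize("NFC", kept)


def latin_link(*params):
    """Sketch of the described behaviour of {{l/la}} (hypothetical output format).

    One parameter: the link target is derived by stripping macrons.
    Two parameters: both are used as given and nothing is stripped,
    which acts as the override for unusual cases.
    """
    if len(params) == 1:
        display = params[0]
        target = strip_macrons(display)
    elif len(params) == 2:
        target, display = params
    else:
        raise ValueError("expected one or two parameters")
    return "[[{}#Latin|{}]]".format(target, display)


print(latin_link("ācer"))                # [[acer#Latin|ācer]]
print(latin_link("elegans", "ēlegāns"))  # [[elegans#Latin|ēlegāns]]
print(latin_link("ō_ō", "ō_ō"))          # [[ō_ō#Latin|ō_ō]]
```

Under this design the override costs nothing extra to implement, because the older two-parameter call path is simply left untouched.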
- At the time that I protected it, it was widely transcluded; I had no way of telling that almost all the transclusions were via {{la-decl-1st}}. Thanks for the note; I'll correct that. (BTW, regarding your earlier comment that "someone broke [the module] while trying to fix something I did that probably breaks conventions" — nope, it was just a stupid mistake on my part. Some of the unit-tests were already broken even before my changes, so when my changes broke a few more, I didn't catch on at first that I'd messed up. Sorry about that.) —RuakhTALK 05:35, 19 March 2013 (UTC)
- Fair enough, no worries. I think half of what I thought was broken code was from some other templates/pages being reverted. Anyway, any idea how to get those last two tests to pass? Would be nice if it would accept html entities, though not sure if it's needed. Pengo (talk) 11:08, 19 March 2013 (UTC)
Language table in Lua
With Lua it seems that it would be better (and easier) to have all language information (mainly code=names mapping) in a single page/module. This is what is being worked on in Module:languages here, and I'm working on a similar thing on fr.wikt with fr:Module:langues (the actual data table is in fr:Module:langues/data). It looks like it may be a much better way to handle languages, instead of creating several templates for every language as we do currently (i.e. thousands of templates in the end). However, someone on fr.wikt asked a question about performance. Although using such a module may be more efficient for a given page, what would happen if someone changed the data table, just to add a single language? How would this impact all the pages that use this module (in this case, potentially all articles)? I asked this question at mw:Talk:Lua scripting#Lua changes and Job queue and I believe you may be interested to have this answered as well. Dakdada (talk) 15:18, 17 March 2013 (UTC)
- Do we know about #mw.language.fetchLanguageName? In the Lua debug console:
- =mw.language.fetchLanguageName("ar")
  العربية
- =mw.language.fetchLanguageName("ar", "en")
  Arabic
- =mw.language.fetchLanguageName("ar-Arab")
- Uses ISO 639 language codes, of course. —Michael Z. 2013-03-21 17:01 z
- That's what I used at first when my initial table (on fr) was incomplete. There are two major issues with this:
- Some languages are missing, some codes are not standard (e.g. m Tosk Albanian) and the name may differ from the ones on Wiktionaries.
- It is slow when there are several names to retrieve. Loading the table in Module:languages is way more efficient (easily more than 10 times faster).
- So in the last version of the module in fr, we completed the table with our 4500 current language codes and I ditched this function (although it can still be used in a secondary module). But I'm still concerned with the job queue impact, so for now we can't use this module (but several other modules are being tested with it). Dakdada (talk) 18:55, 21 March 2013 (UTC)
- Ouch. Might be worth revisiting some time. I presume (ha ha) that a native function might get optimized to perform better than anything we could write in a scripting language. Also, it might be configurable to use Wiktionary codes or names.[3] —Michael Z. 2013-03-21 19:48 z
- Obviously the function was not made to be queried hundreds of times in a row. If this issue is solved then we may consider switching. Dakdada (talk) 20:51, 21 March 2013 (UTC)
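The approach discussed in this section, one data table that maps language codes to names and is consulted by every lookup, can be sketched as below. This is a Python illustration, not Module:languages or fr:Module:langues; the table contents and function names are placeholders chosen for the example.

```python
# Hypothetical excerpt of a code -> name table; a real table would hold
# thousands of entries, with names as used on the wiki in question.
LANGUAGE_NAMES = {
    "ar": "Arabic",
    "ca": "Catalan",
    "fr": "French",
    "la": "Latin",
}


def language_name(code, default=None):
    """Look up one code; unknown codes return the default."""
    return LANGUAGE_NAMES.get(code, default)


def language_names(codes):
    """Resolve many codes against the same in-memory table."""
    return [language_name(code) for code in codes]


print(language_names(["la", "ca", "xx"]))  # ['Latin', 'Catalan', None]
```

Each lookup is a single dictionary access, which is the property that makes a table attractive when a page needs many names at once. The open question raised above, what editing the table does to every page that depends on it, is not something a sketch like this can answer.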
Following WT:RFM#Documentation subpages to /documentation, I've added this category to all templates that still use the "old" name. Modules already use /documentation exclusively. How can these be moved automatically? I don't think bots can do moves, can they? Also, the tab at the top of the page should be changed as well (and if possible, one should be added to Modules too). —CodeCat 17:50, 17 March 2013 (UTC)
- Re: "I don't think bots can do moves, can they?": Sure they can; search /w/api.php for action=move, or check out e.g. mw:Manual:Pywikipediabot/movepages.py. But I don't know if there's any page-move analogue to the concept of a "bot edit", so it may flood recent-changes unless done very slowly. —RuakhTALK 17:57, 17 March 2013 (UTC)
- I just realised that regular accounts can't move pages without leaving a redirect. So whichever bot is used for this, it would need administrator rights... —CodeCat 22:00, 18 March 2013 (UTC)
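For a bot author, the action=move call mentioned above is a POST to /w/api.php with a handful of parameters. The sketch below only builds those parameters and performs no network request. The helper names are hypothetical, and "/doc" is used purely as a stand-in for the unnamed "old" subpage name; the parameter names (from, to, reason, noredirect, token) are those of the MediaWiki move API.

```python
def rename_subpage(title, old_suffix, new_suffix):
    """Return the new title, or None if the page does not use the old suffix."""
    if not title.endswith(old_suffix):
        return None
    return title[: -len(old_suffix)] + new_suffix


def build_move_request(old_title, new_title, token, reason="",
                       leave_redirect=False):
    """Build the POST parameters for action=move (no request is sent here)."""
    params = {
        "action": "move",
        "from": old_title,
        "to": new_title,
        "reason": reason,
        "token": token,
        "format": "json",
    }
    if not leave_redirect:
        # Suppressing the redirect needs a user right that ordinary
        # accounts lack, as noted in the comment above.
        params["noredirect"] = "1"
    return params


# "/doc" is an assumed example of the old name, not taken from the discussion.
new_title = rename_subpage("Template:foo/doc", "/doc", "/documentation")
print(build_move_request("Template:foo/doc", new_title, token="PLACEHOLDER"))
```

A real run would also need to obtain a valid token and throttle itself, given the concern above about flooding recent changes.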
Latin first declensions in a single template
I've mashed all of Latin's first declension templates into one: {{la-decl-first}}. See the documentation for how it works and examples. It largely replaces eight similar templates, which is possible because Lua can look at what the last few characters of a parameter are. For example:
I can't see any problems with using it as is, but some might want to wait for the dust to settle, or perhaps until second and third declension templates are done too, when we can be more certain they'll have a consistent format and parameters, or perhaps until a super-declension-template is made that encompasses them all. The guts of the code is in Module:la-utilities. I've tried to keep presentation code separate from other code, and also tried to leave it flexible enough to accommodate the addition of future declension tables relatively easily, or other uses. It largely still uses an existing empty-table template for presentation, but someone might feel like making it build the tables from scratch internally. I'm far from a native Lua or Latin speaker, so please let me know if there are any errors or issues or corner cases I may have missed. See the template's documentation for more information. Pengo (talk) 09:54, 19 March 2013 (UTC)
Simplification of romaji entries
Like Mandarin pinyin at some stage, Japanese rōmaji entries need to be converted to soft redirects to hiragana and katakana entries (not directly to kanji, as hiragana serves as disambiguation for multiple Japanese homophones). This is the outcome of the discussion we had on Wiktionary:Beer_parlour/2013/February#Stripping_extra_info_from_Japanese_romaji. I wonder if it's doable via a bot. There are too many entries in Category:Japanese romaji which have PoS headers and don't use the {{ja-romaji}} template. Generating new ones is perhaps straightforward, but conversion is not.
This is how the romaji entries will look (the only category they belong to is Category:Japanese romaji). Copying from Wiktionary:About_Japanese#Romaji_entries:
A hiragana only example: "tsuku"
==Japanese==
===Romanization===
{{ja-romaji|hira=つく}}
A katakana only example: "rūto"
==Japanese==
===Romanization===
{{ja-romaji|kata=ルート}}
A hiragana and katakana example: "ringo"
==Japanese==
===Romanization===
{{ja-romaji|hira=りんご|kata=リンゴ}}
--Anatoli (обсудить/вклад) 04:46, 20 March 2013 (UTC)
- For comparison, Japanese rōmaji will work similarly to Category:Mandarin pinyin. The debate about the Japanese rōmaji was resolved without a vote (see Wiktionary:Votes/2011-07/Pinyin entries for the vote on Mandarin pinyin). The vote actually prescribed NOT to add any definitions but some, especially old monosyllabic have definitions. With Japanese rōmaji we decided, not to have any definitions at all, only soft redirects. --Anatoli (обсудить/вклад) 04:53, 20 March 2013 (UTC)
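Generating new soft-redirect entries in the format quoted above is the straightforward half of the bot job. A minimal Python sketch, assuming a hypothetical helper name and that the three-line layout shown in the examples is the whole entry:

```python
def romaji_entry(hira=None, kata=None):
    """Build the wikitext of a romaji soft-redirect entry.

    At least one of hira/kata must be given; parameter order follows
    the examples quoted above (hira first, then kata).
    """
    if not hira and not kata:
        raise ValueError("need a hiragana or katakana target")
    args = []
    if hira:
        args.append("hira=" + hira)
    if kata:
        args.append("kata=" + kata)
    return "\n".join([
        "==Japanese==",
        "===Romanization===",
        "{{ja-romaji|" + "|".join(args) + "}}",
    ])


print(romaji_entry(hira="りんご", kata="リンゴ"))
```

Converting existing entries is the harder half, since anything present only in the romaji entry would first have to be carried over to the kana entry.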
- Is there potentially information in romaji entries that would be lost if a bot went through and deleted everything? DTLHS (talk) 05:33, 20 March 2013 (UTC)
- In theory, no, as all the information on romaji entries is essentially duplicated in the corresponding kana entries. This was a large part of the decision to simplify, since romaji entries have basically just been disambiguation pages created as dupes of the kana pages to aid users who don't yet read kana.
- In practice, there may be cases where the romaji entry was developed but the kana entry has not been. Provided the romaji entry information is good, I think that wikicode can just be copy-pasted to the corresponding kana entry, and Bob's your uncle. -- Eiríkr Útlendi │ Tala við mig 05:41, 20 March 2013 (UTC)
- On the pinyin vote we also had a rule not to add any pinyin entry if hanzi didn't exist. This rule is followed. There are some entries in Category:Mandarin pinyin entries without Hanzi with both blue and red links but no "just red". It's a good idea not to create rōmaji before the real Japanese entry exists. I don't know if this rule should be enforced, but what's the point of a redirect to nothing, or of spending time adding all definitions and other info to a transliteration entry? The converted entries can be viewed in the history, if anything valuable is lost. Fine by me. Let's encourage work on real Japanese and save time. --Anatoli (обсудить/вклад) 05:50, 20 March 2013 (UTC)
- While cleaning up some categories (suffixes, counters: Done), found wa-ga without kana (わが) but kanji exists (我が). Will convert/create this one but no need to worry if some are lost. Pity the creator didn't bother to create a hiragana entry. --Anatoli (обсудить/вклад) 05:55, 20 March 2013 (UTC)
- wa-ga should be waga anyway... :) -- Eiríkr Útlendi │ Tala við mig 06:08, 20 March 2013 (UTC)
Sorry, whatever you proposed doesn't work. Entries must have definitions, otherwise AutoFormat will go and tag them as having no definition. -- Liliana • 16:08, 20 March 2013 (UTC)
- Really? What about thousands of Category:Mandarin pinyin entries? To avoid your KassadBot picking them up, "# See ..." on a new line is used.
- @Eirikr. I made waga as well. --Anatoli (обсудить/вклад) 20:12, 20 March 2013 (UTC)
- I agree with Liliana that each Romaji entry should have a line starting with "#" in the wiki code, which is currently not the case at tsuku. Unlike tsuku, Pinyin biǎomiàn does have a line starting with "#": # {{pinyin reading of|表面}} surface. With Romaji, it would be better to follow the model of Pinyin as closely as possible rather than introducing a different format that uses "See also". Moreover, this dramatic change of treatment of Romaji should go through a vote. I oppose making this dramatic change without a vote. --Dan Polansky (talk) 22:23, 20 March 2013 (UTC)
- The new line and # at the beginning are generated by the template. Mandarin and Gothic romanisation entries follow exactly the same pattern: they are soft redirects. The topic has been in the Beer Parlour for a long time with {{look}} to attract input, and the most active Japanese editors, User:Haplology and User:Eirikr, responded positively and are already using it. The rationale was explained but I repeat it briefly:
- The structure of using Romaji as an index (soft redirect) follows the structure of Japanese dictionaries. Users use "tsuku" to get to "つく". There is no duplication of information.
- All the information in the rōmaji entries is contained in hiragana and katakana entries, only one click away.
- Roman script is not the correct script for the Japanese language, it's only romanisation. No need to mislead users that romaji is a replacement for the Japanese writing system.
- Currently, Japanese romanisation is the only exception (to my knowledge) from other languages. All languages have entries in their native scripts only, if they are not used in other scripts - i.e. Russian is only in Cyrillic, Arabic - only in Arabic. Romanisation entries are helpful only to find entries in their proper form, they are not nouns, verbs, they are romanisation.
- Maintenance hell, mismatch between entries, missing Japanese entries when romanisation entries exist.
- Dan, if you wish, set up a vote, but since Japanese editors agreed to this method, I don't see a reason. When you opposed the vote on Mandarin pinyin you used Japanese romaji as a reason to vote against it; what's your reason this time? You're not going to maintain Japanese romaji entries, are you? --Anatoli (обсудить/вклад) 23:03, 20 March 2013 (UTC)
- Re Mandarin pinyin entries, the vote on pinyin explicitly disallowed any definitions (i.e. English translations in the entries), only links to hanzi (Chinese characters) - "a pinyin entry have only the modicum of information needed to allow readers to get to a traditional-characters or simplified-characters entry". (I was neutral on this rule). See "yánlì", which was used for the vote. This rule wasn't strictly followed in some cases but if it's causing confusion, then we might need to remove all English translations from Mandarin romanisation entries. Anyway, removing definitions was suggested by Eirikr, supported by Haplology and I agreed.
- "yánlì" entry from pinyin vote:
==Mandarin==
===Romanization===
{{cmn-pinyin}}
# {{pinyin reading of|trad=嚴厲|simp=严厉|lang=cmn}}
# {{pinyin reading of|trad=妍麗|simp=妍丽|lang=cmn}}
# {{pinyin reading of|trad=沿例|simp=沿例|lang=cmn}}
# {{pinyin reading of|trad=岩櫟|simp=岩栎|lang=cmn}}
# {{pinyin reading of|trad=沿歷|simp=沿历|lang=cmn}}
--Anatoli (обсудить/вклад) 23:10, 20 March 2013 (UTC)
I made a very basic IPA > X-SAMPA transliterator at Module:IPA. Needs work. Also relevant: Wiktionary:Beer_parlour/2013/January#(X)SAMPA —Michael Z. 2013-03-20 21:05 z
- I didn't know you could write table keys in that way... —CodeCat 21:37, 20 March 2013 (UTC)
- CodeCat: it's right here: mw:Extension:Scribunto/Lua_reference_manual#table.
- By the way, is the usefulness of X-SAMPA accepted here on en.wikt? On fr.wikt we chose to move everything into a gadget (even then I don't think anyone uses it).
- But if you need a list, check out the gadget list here: fr:MediaWiki:Gadget-APIversXSAMPA.js (not sure if it is complete though). Dakdada (talk) 21:43, 20 March 2013 (UTC)
- I think Lua is preferred to a gadget, though, because it runs on the server. —CodeCat 22:03, 20 March 2013 (UTC)
- Putting X-SAMPA in a Lua module would have a cost, as it would be loaded in every page with IPA. I'm not sure it is worth it, given that very few people actually use it (if any). Gadgets are a good way to give users the IPA to X-SAMPA conversion, since only the people who want to use it would load the gadget from the site. Although that way we assume that the people who absolutely want to read ASCII pronunciations have JavaScript enabled... Dakdada (talk) 23:16, 20 March 2013 (UTC)
- Has anyone figured out how to import a table with mw.loadData? This would let the server load the transliteration table once only, in read-only mode, even if there were many instances of IPA on a page. I couldn't get it to load a table with Unicode data.
- I did use it but I have to admit that I did not compare it to a simple require to see if the data was really cached. Dakdada (talk) 10:27, 21 March 2013 (UTC)
- Where can I see your code? —Michael Z. 2013-03-21 15:22 z
- The module is here (sorry it's in French): fr:Module:langues, with the table in fr:Module:langues/data. As an example, the page fr:Utilisateur:Darkdadaah/eau/Pamputt can be created within 0.5s with mw.loadData. When I replace it by require, the page is built in 8 seconds, with twice as much memory used. Dakdada (talk) 18:45, 21 March 2013 (UTC)
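The behaviour being asked for here, loading a data table once and sharing it read-only across many uses on a page, can be illustrated by analogy. The Python sketch below is not how Scribunto implements mw.loadData; it only shows the general idea of a cached, read-only table, and every name in it is hypothetical.

```python
import functools
import types

LOADS = []  # records each real load, so the caching can be observed


@functools.lru_cache(maxsize=None)
def load_table(name):
    """Load a data table once per name and hand back a read-only view."""
    LOADS.append(name)  # stands in for the expensive parsing step
    data = {"ə": "@", "ʃ": "S", "ŋ": "N"}  # placeholder contents
    return types.MappingProxyType(data)


# Three uses on the same "page" trigger only one real load.
for _ in range(3):
    table = load_table("ipa")

print(len(LOADS))   # 1
print(table["ʃ"])   # S
```

Handing out a read-only view is what makes sharing safe: no caller can change the table for the others.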
- X-SAMPA could be incorporated into {{IPA}}. I'd like to see a gadget that shows only IPA by default, and lets the reader toggle IPA/X-SAMPA display, or copy X-SAMPA. Less clutter on the page for the 99.999% of us who have no use for X-SAMPA. —Michael Z. 2013-03-21 01:00 z
- It would be easier if both IPA and X-SAMPA were created with a single template. Right now it's something like {{IPA|}}, {{X-SAMPA|}} so just hiding one or the other would leave an ugly comma. Dakdada (talk) 10:27, 21 March 2013 (UTC)
- Yes, exactly. If X-SAMPA can be reliably derived from IPA, then it can be there every time, and no need for a separate template. But seeing as we know of zero users of X-SAMPA, there's no need to show it to everyone at all. Any ideas for an unobtrusive interface? —Michael Z. 2013-03-21 15:22 z
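Deriving X-SAMPA from IPA, as proposed in this section, is at its simplest a character-by-character table lookup. The following Python sketch is not the code at Module:IPA; it covers only a small, assumed subset of symbols, and a complete transliterator would also need to handle diacritics and multi-character sequences.

```python
# Small subset of the IPA -> X-SAMPA correspondences; symbols that are
# the same in both systems (k, s, a, ...) are passed through unchanged.
IPA_TO_XSAMPA = {
    "ə": "@", "ʃ": "S", "ʒ": "Z", "θ": "T", "ð": "D", "ŋ": "N",
    "ɪ": "I", "ʊ": "U", "ɛ": "E", "ɔ": "O", "ɑ": "A", "æ": "{",
    "ː": ":", "ˈ": '"', "ˌ": "%",
}


def ipa_to_xsampa(ipa):
    """Transliterate one code point at a time (no diacritic handling)."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


print(ipa_to_xsampa("əˈbaʊt"))  # @"baUt
```

Because the output is fully determined by the input, a single template could emit both forms, which is the point made above about not needing a separate X-SAMPA template.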
Error when moving a page?
I'm trying to move avantpaísos to avantpaïsos without leaving a redirect. But when I try, I get an error like this: [6560d38b] 2013-03-20 21:31:29: Fatal exception of type MWException. Is anyone else able to do the move? —CodeCat 21:32, 20 March 2013 (UTC)
Apparently I'm getting the same error with other pages I try to move. —CodeCat 21:35, 20 March 2013 (UTC)
- Not me; I've tried and failed. Mglovesfun (talk) 21:53, 20 March 2013 (UTC)
- Same here. SemperBlotto (talk) 22:07, 20 March 2013 (UTC)
- avantpaís is displaying a script error. This needs fixing urgently. Mglovesfun (talk) 22:43, 20 March 2013 (UTC)
Bot to do this:
==English==
===Noun===
'''crossings'''
# {{plural of|crossing}}
to
==English==
===Noun===
{{head|en}}
# {{plural of|crossing}}
The regex is pretty simple. I can do it using the regex function on AWB but AWB also cuts off at 25,000 for categories so I could only go as far as that. Perhaps MewBot (talk • contribs) would like to take this one? Nevertheless, it could be done for other languages and also for verb forms, adjective forms and so on. Mglovesfun (talk) 11:08, 22 March 2013 (UTC)
- I would prefer another approach, which I was just about to suggest when I saw this. It's my preference that form-of templates like {{plural of}} don't add part-of-speech categories to the entries. It makes sense to me because we already have, as a rule, headword-line templates that add PoS categories, so this makes it more consistent. But there are other reasons as well. In many cases, the form-of templates end up being added to other kinds of entries and other languages, but in those cases it may not be appropriate to have a category. With {{plural of}} this is particularly noticeable because the category it places words in, Category:English plurals, isn't very clearly named because it doesn't say plurals of what. In a language like Catalan, such a name would not be appropriate, because Catalan also has plural adjectives and plural verbs. Yesterday I cleaned out Category:Catalan plurals, which (not surprisingly) contained several adjective plural forms as well. Some templates, including this one, allow you to suppress the category or change its name, but that seems like putting the cart before the horse. Catalan already has a {{ca-noun-form}} template which places the entry in the most appropriate category, so why would we need to add nocat=1 every time we use {{plural of}} for Catalan? That seems backwards. Therefore, I propose this replacement instead:
==English==
===Noun===
{{en-noun-form}} [or {{en-noun-plural}}]
# {{plural of|crossing|lang=en|nocat=1}}
- The headword-line template, which we would need to create, would add the plural category instead. So nocat=1 is added to suppress the category of {{plural of}}, which in turn would make it easier for us to find out how many entries still rely on its categorisation. It is my hope that once all instances of {{plural of}} have this parameter, we can remove the categorisation code from the template safely. —CodeCat 13:51, 22 March 2013 (UTC)
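The "pretty simple" regex for the first proposal could look something like the sketch below. It is a Python illustration, not the AWB or MewBot rule; the pattern and function name are assumptions. It replaces a bold-only headword line with {{head|en}} only when the next non-blank line is a {{plural of|...}} sense line, so lemma entries are left alone.

```python
import re

# A line consisting only of '''word''', followed (after optional blank
# lines) by a sense line that starts with "# {{plural of|".
HEADWORD_LINE = re.compile(
    r"^'''[^'\n]+'''[ \t]*(?=\n+#\s*\{\{plural of\|)",
    re.MULTILINE,
)


def use_head_template(wikitext, lang="en"):
    """Swap the plain bold headword line for {{head|lang}}."""
    return HEADWORD_LINE.sub("{{head|" + lang + "}}", wikitext)


before = (
    "==English==\n"
    "===Noun===\n"
    "'''crossings'''\n"
    "# {{plural of|crossing}}\n"
)
print(use_head_template(before))
```

The same pattern could be adapted to the alternative proposal by changing the replacement text and appending nocat=1 to the sense line; which of the two is preferable is the subject of the discussion that follows.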
- What is the advantage of doing either of these over the current situation of a plain wikitext inflection line and categorization by {{plural of}} (presumably eventually to be replaced by {{en-plural of}})? Uniformity? That seems like a positive hazard as it seems to lead folks to believe that they know how to make changes to English entries when the evidence leads me to believe they don't.
- There are quite a few cases where the inflection line is for a lemma and {{plural of}} does categorization at the sense line level. DCDuring TALK 15:39, 22 March 2013 (UTC)
- Why would you want to do this? Why replace simple code with a template that does nothing? Are you trying to make the wiki run even slower? SemperBlotto (talk) 15:41, 22 March 2013 (UTC)
- I can't see any advantages. Intentional redundancy? CodeCat, you're normally the first to want to get rid of redundancy (even before me). Mglovesfun (talk) 16:37, 22 March 2013 (UTC)
- Redundancy isn't really an issue here, it's about what is workable. If thousands and thousands of uses of a template need a nocat=1 parameter just to stop it from doing something, then that seems like bad design. And when people come across something that is badly designed, they're going to try to work around it, which may make things worse. For example, I've seen lots and lots of entries that have tried to avoid the categorisation of {{plural of}} by instead using {{form of|plural}}. Others, like I mentioned, ignored the category, with the result that at least for Catalan entries, Category:Catalan noun forms and Category:Catalan plurals contained almost the exact same entries. The only differences between them were either a few entries that lacked {{ca-noun-form}}, or entries that used {{plural of}} for adjectives (which is totally intuitive; it's the category that's wrong!). That can't be a good thing. My proposal helps to make things consistent by sticking to a simple rule that most non-form entries already adhere to: the headword-line template is responsible for the PoS category. {{head}} already works that way, as do the many language-specific templates like {{en-noun}}. I think that is a very simple rule, and if we can achieve a situation where our templates follow it, it will make things easier to understand because editors will know exactly which templates they can expect to add an entry to a category and which not, which avoids errors due to uncertainty. I mean, think about this yourself... would you rather have to remember for each template whether it categorises or not, or would you prefer learning a simple rule? —CodeCat 17:28, 22 March 2013 (UTC)
- I have never used "nocat=1" (it is not obvious to me what it is supposed to do), so I have never had to remember what to do. I have no idea which English, French, Italian, Latin or German templates allow such a keyword. SemperBlotto (talk) 17:33, 22 March 2013 (UTC)
- That is exactly what I am talking about above. Consistency in how similar templates work is good. It means that once we learn to expect certain behaviour, we can extend that expectation to new templates with reasonably safe knowledge that they will do as we think. Consider another example, overriding the headword of a headword-line template. The majority of our templates use head= for that, so many of us (myself included) would just use head= without even thinking about it. We expect it to work. The same goes for linking templates, which take a second parameter to change the displayed link text. Nobody thinks about it, everyone just expects it to work. And that is a good thing because it lessens the mental burden of remembering how all the templates work. My proposal is intended to be just one step towards that. —CodeCat 17:41, 22 March 2013 (UTC)
[Aside: here's a working link to this section: Bot to add {{head|en}} to Category:English plurals. —Michael Z. 2013-03-22 18:43 z]
- Why not modify {{en-noun}} so it works for plurals? Something like {{en-noun|pl}}. Every time I create an English plural entry, I spend five minutes previewing {{en-noun|-}}, {{en-noun|!}}, {{en-noun|?}}, read the docs again, and then give up and leave it for someone else to clean up. As an editor, I don't care which template adds the category.
- "Why replace simple code with a template that does nothing?" Code that is completely inconsistent with every other noun entry is not simple, it is obscure. —Michael Z. 2013-03-22 20:16 z
- There would need to be a way to distinguish a plurale tantum/plural-only noun (a lemma that happens to be plural) from a plural form of a regular singular noun. We wouldn't want pants categorised as a noun plural form, I think? I think adding such functionality to {{en-noun}} is dangerous, because with misuse we could end up with plurals categorised in Category:English nouns. Having a separate template seems like a safer option, and it also fits with the general idea that each part of speech has its own template (for categorisation purposes, noun forms are their own part of speech, distinct from nouns). Of course, just writing {{head|en|plural}} or {{head|en|noun form}} is a possibility too. —CodeCat 21:07, 22 March 2013 (UTC)
- Please don't mess with {{en-noun}}. A few templaters have said that {{en-noun}} was one that was sufficiently complicated already so that they didn't want to add features. We have {{en-plural noun}} already as an inflection-line template. It is also inappropriate in the cases mentioned above in which the same headword is both a plural-only and a simple plural. {{head|en}} and hard categorization seem adequate for that case and other exceptional cases that arise.
- I see some advantage to creating an English-specific direct sense-line replacement for {{plural of}}. If all other languages want to use an inflection-line approach or a language-specific sense-line approach, then by all means let there be language-specific and generic templates to do so. As the Little Red Book said, let a thousand flowers bloom!!! DCDuring TALK 22:27, 22 March 2013 (UTC)
- Maybe this is an opportunity to try out a single template for headword and sense line(s), incorporating HTML dfn and dl. —Michael Z. 2013-03-22 23:04 z
- Yes. It's probably long past time to kill off this idea of wiki-style participation here. I say let there be an apprenticeship period, no edits from non-whitelisted users without approval, etc., qualifying exams for would-be template writers, HTML and CSS qualifying exams for adminship. DCDuring TALK 23:34, 22 March 2013 (UTC)
- You don't think one well-designed template could be made more accessible for editors than two vaguely unrelated templates? —Michael Z. 2013-03-23 01:10 z
Reminder of Lua help session in a few hours
Hi! This is a reminder: today at 1800 UTC, in about three hours, there's a Lua/Scribunto help session on IRC; please see the IRC office hours page on meta for details. Thanks! Sharihareswara (WMF) (talk) 15:05, 22 March 2013 (UTC)