Wealth Maker: Wiktionary - Recent changes [en]: Wiktionary:Beer parlour/2013/April

Wiktionary:Beer parlour/2013/April

Apr 22nd 2013, 02:45

Revision as of 01:03, 22 April 2013 by CodeCat; latest revision as of 02:45, 22 April 2013 by Chuck Entz.
Line 538:
:::::: You can look in [[:Category:Pages with script errors]]. {{User:CodeCat/signature}} 01:03, 22 April 2013 (UTC)
'''Support''' making <tt>lang</tt> mandatory. It may also be possible to include automatic transliteration later. Perhaps rather than "???", it should say "which language???". --[[User:Atitarev|Anatoli]] <sup>([[User talk:Atitarev|обсудить]]</sup>/<sup>[[Special:Contributions/Atitarev|вклад]])</sup> 23:52, 21 April 2013 (UTC)
+ : I don't like calling it an error. For one thing, it's beside the point and adds extra verbiage, but mostly, it gives the impression that things are falling apart. I would suggest following the lead of some of our rf- templates: "This '''term''' template is lacking a language code. If you know it, please add it as a lang= parameter". Still verbose, but it would only show on hover. The symbol should be something small and innocuous, like the one Michael suggested above, or maybe a bullet (•). Even the question marks might not be so bad- as a trailing superscript. Or how about: <span title="This term template is lacking a language code. If you know it, please add it as a "lang=" parameter">{{term|όρος||term|tr=óros|lang=el}}'''''<sup>[→?]</sup>'''''</span> (I'm sure there are attributes that would make it look more like a live control, but you get the idea).

== Portuguese reflexive verbs ==

Latest revision as of 02:45, 22 April 2013

Idea for proper noun entries that belong in an encyclopedia

There seem to be a lot of proper nouns that show up on WT:RFD. Many of these have articles in the EN WP. Since people are clearly looking for these entries, and some editors mistakenly think such entries belong here, while some readers mistakenly think they can find those entries here, it's clear there's some demand for having proper noun entries here at EN WT.

What would folks say to allowing the creation of proper noun entries, such as Mona Lisa or Mini Cooper or Hound of the Baskervilles, but just as redirects (soft or hard, as deemed appropriate) to the corresponding EN WP article? This would meet the apparent demand for such entries, while not wasting EN WT editor time writing and maintaining them, and while avoiding the inclusion of encyclopedic material in this dictionary project. -- Eiríkr Útlendi │ Tala við mig 17:06, 2 April 2013 (UTC)

I don't think hard redirects to Wikipedia are even possible; they'd have to be soft. Wikipedia itself already has w:Template:Wiktionary redirect for pages that will only ever be dictionary entries; all we need to do is make a corresponding template here. —Angr 17:32, 2 April 2013 (UTC)

Sounds good to me. I see, however, that Semper Blotto deleted Template:Wikipedia redirect way back in 2006... -- Eiríkr Útlendi │ Tala við mig 17:35, 2 April 2013 (UTC)

I don't see why this is a good idea. How is it better than just not having the entries at all? How do we decide which entries need {{only in|{{in wikipedia}}}} and which are red links? Or do we create such redirects for all entry titles which have Wikipedia articles? Mglovesfun (talk) 17:38, 2 April 2013 (UTC)

It seems fine to create {{only in}} redirects to WP for all proper nouns (Why not all entries of any kind?) for which we do not have an entry. Editors can replace the redirect with an entry, which is subject to the usual reviews. At the very least we should use the redirects for proper noun entries that have failed RfD for whatever reason. DCDuring TALK 17:58, 2 April 2013 (UTC)

Sorry, I thought my initial comment explains the "why" -- users, both as editors and as readers, are clearly coming to Wiktionary in search of such entries.

As to which entries to convert, any proper noun entry that editors think should not be in Wiktionary would be a candidate for such redirection. If deemed necessary for clarity, the redirection template could include text explaining that Wikipedia might not yet have such an article, but that if anyone were to create such an article, it belongs in Wikipedia and not here.

I'm simply floating an idea about how to respond to apparent user demand for encyclopedic proper noun entries in a way that 1) meets that demand, 2) points users to the appropriate place for such entries, and 3) doesn't require much work from editors. -- Eiríkr Útlendi │ Tala við mig 17:59, 2 April 2013 (UTC)

A good idea, but there already is a page that comes up when someone goes to an undefined proper noun. See, for example Starry Night, Mini Cooper S, or A Study in Scarlet. It just doesn't serve the required needs. The page that comes up for starry night, mini cooper s, or a study in scarlet is a bit better, but still could be improved.

I wonder if there is a way to improve "perhaps there is a page xxx in our sister encyclopedia project, Wikipedia."

Anyway, let's improve the 404 page instead of reinventing the wheel. —Michael Z. 2013-04-02 18:18 z

The fact that people are searching for things doesn't mean we should include them, even as redirects to Wikipedia. The number one search on a user-generated replacement for Special:WantedPages was in fact the Mandarin for 'naked porno movies'. Mglovesfun (talk) 20:59, 2 April 2013 (UTC)

Well, dang it, someone should create that Wikipedia article already.

<ahem.> On a more serious note, the issue is not just that folks are searching for such pages, but that they are actually creating them. This generates maintenance overhead for WT editors. Redirecting users to Wikipedia might help reduce this overhead. -- Eiríkr Útlendi │ Tala við mig 21:03, 2 April 2013 (UTC)

But redirects are bluelinks. If we make tens of thousands of redirects, how will anyone notice the few bluelinks which have, wrongly, been created as full entries that we (by our current policies and culture) tend to subject to WT:RFD? I agree with Michael: improve the "404" that comes up when someone clicks on [[Some Proper Noun]], goes to [1] or uses the search bar to search for "Some Proper Noun". - -sche (discuss) 21:28, 2 April 2013 (UTC)

We already make color distinctions in our links: a lighter blue for links to other projects, orange for links with the wrong section. A bot could replace links to {{only in}} entries with {{w}} links or "w:" piped plainlinks. Improving the 404 only partially addresses the problem, though it has the enormous advantage of, in principle, being easier to implement. DCDuring TALK 22:04, 2 April 2013 (UTC)

I have never noticed light blue or orange, and as far as I know I have a good computer display and good color vision. —Michael Z. 2013-04-03 14:28 z

Orange links have to be turned on in your Per-browser preferences; as for light blue links, don't you see a difference between blue and blue? For me the difference is subtle but real. —Angr 15:24, 3 April 2013 (UTC)

Looks precisely like visited and unvisited links. Since external links are marked with a little arrow, nothing has ever prompted me to associate that colour variation with another class of external link. —Michael Z. 2013-04-03 19:52 z

Ah, I see. Red links and blue external links turn paler when visited, but blue internal links turn darker, or lighter when unvisited but pointing to other MW sites. Shoulda been obvious. —Michael Z. 2013-04-03 20:08 z

Huh. For me, both internal and external Wikimedia links turn purple when visited, again with a subtle but present difference in shade. —Angr 20:22, 3 April 2013 (UTC)

Exactly. We also have greenlinks for no page corresponding to inflected forms, if you have the gadget for accelerated creation of these selected on user preferences. (See conquest#Verb.) DCDuring TALK 15:33, 3 April 2013 (UTC)

I realize that I had possibly misinterpreted Mglovesfun's previous comment as suggesting we shouldn't even rework our 404. To clarify, I am not advocating that we start creating scores of pages solely for the purpose of redirecting to WP. My intent instead was originally just to ask if perhaps proper noun pages, particularly those that fail RFD (which I should have stated more specifically earlier), would benefit by having redirects to WP. Michael's suggestion of reworking our 404 sounds like a wonderful idea, either alongside specific redirects for pages that failed RFD, or as a replacement for that idea. -- Eiríkr Útlendi │ Tala við mig 22:19, 2 April 2013 (UTC)

Then I'll back Michael's idea as well. Mglovesfun (talk) 09:16, 3 April 2013 (UTC)

Does anyone know where to edit these pages, and how to create the special links on them? Are there docs? —Michael Z. 2013-04-03 14:28 z

MediaWiki:Noarticletext contains the "Wiktionary does not yet have a mediawiki page for Noarticletext" message; you can change the message by editing that page. (There's also MediaWiki:Noexactmatch, but I don't know that it's used anywhere.) MediaWiki:Searchmenu-new, and possibly other pages, control(s) what's displayed when someone searches for a term we don't have. - -sche (discuss) 20:22, 3 April 2013 (UTC)

Thanks. And do you know where to find the 404-from-a-link page, e.g. mini cooper s, and the additional wrong-case message added to Mini Cooper S? —Michael Z. 2013-04-03 21:09 z

I presume it's one of these pages, but I don't know which one. - -sche (discuss) 21:40, 4 April 2013 (UTC)

The easiest way to find out is to visit http://en.wiktionary.org/w/index.php?title=mini_cooper_s&action=edit&redlink=1&uselang=qqx and examine the indicated messages. For example, (creating: mini cooper s) holds the place of a message generated by MediaWiki:Creating with $1 set to mini cooper s. (qqx is in the "private use" range of language-codes, so some enterprising MediaWiki developer decided to appropriate it for this purpose. I'm guessing the feature's primary target audience was interface translators, so they could find the message that they need to translate, but I've found it very useful myself.) —Ruakh_TALK 04:39, 8 April 2013 (UTC)
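The URL trick described above can be sketched in a few lines of Python. This is only an illustration: the helper name `qqx_url` is made up here, and the query parameters are simply the ones visible in the link quoted above.

```python
from urllib.parse import urlencode

def qqx_url(title, base="http://en.wiktionary.org/w/index.php"):
    """Build an edit-a-redlink URL with uselang=qqx, so that interface
    messages are shown as their message names instead of their text.
    (Hypothetical helper; parameters copied from the link quoted above.)"""
    params = {
        "title": title.replace(" ", "_"),  # page titles use underscores in URLs
        "action": "edit",
        "redlink": "1",
        "uselang": "qqx",
    }
    return base + "?" + urlencode(params)

print(qqx_url("mini cooper s"))
```

Opening the printed URL in a browser and reading the parenthesised message names is then the manual step Ruakh describes.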

Gothic romanisation template

I have created Template:got-romanization (different from Template:got-romanization of!) and a sample entry. As with Template:ja-romaji, the definition line with # is generated by the template, so it has both the headword and a definition. It has the same look and feel as a new romaji entry. Like Japanese, the Gothic entries only link to the main entry, with no other information. --Anatoli (обсудить/вклад) 03:38, 4 April 2013 (UTC)

How is it different from {{got-romanization of}}? The output seems to be the same: they both say, "See XYZ" where XYZ is the spelling in the Gothic alphabet. I preferred it when it said "Romanization of", though. —Angr 10:52, 4 April 2013 (UTC)

It's an attempt to make romanisation entries of different languages more similar to each other. Template:ja-romaji is increasingly used for Japanese romaji entries and there are two votes Dan Polansky has created in protest of the change that was agreed on by JA editors after a very long discussion in BP. The votes: 1. Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content and 2. Wiktionary:Votes/pl-2013-03/Romanization and definition line. The second vote is specifically about how the definition line is added. Usually it's # on a new line in the wikitext. The new Japanese and the proposed Gothic templates generate the definition line, so it is not editable directly.

User:Mzajac raised a concern that Japanese and Gothic are different from each other. Both Japanese and Gothic by default don't produce any definition as such, only a link to the main entry. Using a template will enforce this rule. The definition line will still be there (thus complying with Wiktionary:ELE#Definitions) but a new definition line is only added when a new parameter is added. The suggested template is much shorter and as proved by the current work on Japanese romaji entries can be generated very quickly both by people and bots.

Re: "See" and "Romanization of". Again, just to make both templates (Gothic and Japanese) look similar. There's already the word "romanization" at the header level.

New:

  ==Gothic==

  ===Romanization===
  {{got-romanization|𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽}}

Old:

  ==Gothic==
  ===Romanization===
  {{got-rom}}

  # {{got-romanization of|𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽}}

--Anatoli (обсудить/вклад) 11:23, 4 April 2013 (UTC)

Transpondine Portuguese

There is nothing in Wiktionary:About Portuguese concerning spellings on opposite sides of the Atlantic. I have been adding Brazilian forms as "alternative forms" of the spelling used in Portugal. But often, I see that the Portuguese Wiktionary does the exact opposite. Does anyone have an opinion on what we should do - or should it be up to the personal preference of our editors? SemperBlotto (talk) 15:55, 4 April 2013 (UTC)

Personal preference. — Ungoliant (Falai) 00:12, 5 April 2013 (UTC)

In a discussion with User:Gdbf137, we discovered that Mac and MS seem to use different Cangjie input sequences. The Unihan database entry for 农 gives a Cangjie input sequence of LBV. Apparently, that works correctly on Mac OS X Lion. On Windows 7, however, MS's Changjie IME accepts HBV to input this character, while LBV just generates an error beep and no character is output.

Does anyone else have a handle on what's going on? Do we need someone to change {{Han char}} to allow for multiple Cangjie input strings, one per OS? Or, more frighteningly, has Microsoft and/or Apple been changing things willy-nilly, and we need to allow for multiple Cangjie input strings, one per OS version? -- Eiríkr Útlendi │ Tala við mig 17:35, 4 April 2013 (UTC)

Is this related to Cangjie_input_method#Versions_of_Cangjie? "Currently, version 3 (第三代倉頡) is the most common; it is the version of Cangjie supported natively by Microsoft Windows ... The Cangjie input method supported on the Mac OS is somewhat like Version 3 and somewhat like Version 5." I don't know what the solution to this would be other than to specify what version the template is referring to. DTLHS (talk) 04:49, 5 April 2013 (UTC)
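To make the "multiple Cangjie input strings" idea concrete, here is a rough Python sketch of what keeping one code per platform could look like. The table layout and the function name are assumptions for illustration only, not how {{Han char}} works; the single data point is the one reported in this thread for 农.

```python
# Hypothetical layout: per character, one Cangjie input string per source/platform.
# The only entry is the case reported above (Unihan and Mac OS X Lion: LBV; Windows 7: HBV).
CANGJIE_CODES = {
    "农": {
        "Unihan": "LBV",
        "Mac OS X Lion": "LBV",
        "Windows 7": "HBV",
    },
}

def cangjie_code(char, platform):
    """Return the input string recorded for this platform, or None if unknown."""
    return CANGJIE_CODES.get(char, {}).get(platform)

print(cangjie_code("农", "Windows 7"))
```

Keying by Cangjie version (3 versus 5) instead of by operating system, as DTLHS suggests, would use the same shape with different keys.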

Cross-script/mutated semi-borrowings

This seems to be a repeated question, and it's come up again at Wiktionary:Requests for deletion#da. What do we do with half-borrowed words? Stuff like "da", which is clearly a Russian word being used in English, or "si", which is clearly a Spanish word being used in English, even if both would never be spelled that way in their original language. google books:si senor gets a lot of English hits, even once we've excluded "Sí, Señor". Writings across the world are dropping a little bit of foreign language that their audience will understand in their text, and whenever there are orthographic differences, we'll probably see this type of change. "Da" can probably be attested in every major European language in this sense. Instead of creating senses under da for all the languages, maybe we could create an {{orthographically mangled (for foreigners) version of|да}} template (name to be changed, of course) and stick it under Russian. Same thing with si and danke schon and probably some mangled Latin we've deleted, etc. (This doesn't intend to change real borrowings, just one language stuck into another.) (The template could maybe use a foreign lang tag, so {{orthographically mangled (for foreigners) version of|old_lang=de|new_lang=en|[[danke schön#]]}}; I do suspect that da and friends are used in multiple Latin-script languages, but it's too common a particle to make that easy to check.)--Prosfilaes (talk) 05:17, 5 April 2013 (UTC)

There are so many edge cases that it's hard to draw the line. Da might be meant as transliterated Russian, in which case WT:ARU disallows its existence. But I have a Hispanophone friend who sometimes says /siː ˈsɛn.nɚ/ as a joke, and si senor might be a valid English entry. I don't think you've made it crystal clear when to use this hypothetical template and when to create a normal entry, so I can't really support it yet. —Μετάknowledge (discuss/deeds) 19:50, 7 April 2013 (UTC)

I'm confused too. What if we use the normal process of finding citations? Words from one language used in another could be labeled as such, using {{context}} or {{qualifier}}. It's dangerous if we go too far, e.g. if we start quoting all English words in Latin letters in another language, especially in non-Roman based languages. For the moment, I wouldn't go with romanised Russian either. --Anatoli (обсудить/вклад) 02:36, 8 April 2013 (UTC)

I'm not suggesting we don't find citations; I'm worried about the stuff where we can find plentiful citations that establish it's between two languages. What I'm most concerned about is stuff like "Do svidanya", which English speakers can find in English texts and want to look up, but which is likely to get treated as Russian and then deleted because it's romanized. There seems to be a hole here where things that can be cited, and might actually get looked up, are deleted because they aren't real Russian or Latin, etc. I think si senor is a good example; it's not English, it's clearly Spanish or at least pseudo-Spanish. But we deleted danke schon for the same reasons, as not German. I'm not confident that, if it's created as English, it will survive. I am sure that eminently citable words and phrases like that need to be stored on Wiktionary in the spelling that people will find them used under, and what language tags them is less important than that.--Prosfilaes (talk) 06:27, 9 April 2013 (UTC)

I'm not sure I understand your suggestion. We could have redirects for commonly known foreign words if they are incorrectly spelled or written in the wrong script: do svidanya -> до свидания, danke schon -> danke schön (danke schon previously failed RFD, but I see in the history that it was a full entry, not a redirect). Note: schon in German is a different word from schön. I don't think si senor or si señor merit an entry, English or Spanish. konnichi wa already exists as a romaji entry and can be looked up. --Anatoli (обсудить/вклад) 06:58, 9 April 2013 (UTC)

The {{sense}} template is used to label specific synonyms or antonyms. With antonyms that leads to problems though, like this edit shows: diff. People get confused because they expect that the sense being shown is the sense of the words listed after it. And that isn't really a strange assumption either, except that it's not how we use the template. So, would it be ok if some extra text were added to the template, so that it displays this instead: of the sense "(sense)" ? —CodeCat 14:19, 7 April 2013 (UTC)

What is the before and after of your proposal in the general case? DCDuring TALK 14:54, 7 April 2013 (UTC)

What do you mean? —CodeCat 16:05, 7 April 2013 (UTC)

This is a fairly well-known problem; what are you actually proposing? Mglovesfun (talk) 16:34, 7 April 2013 (UTC)

Um... I'm proposing to change the text that the template displays, like I said? —CodeCat 17:24, 7 April 2013 (UTC)

To exactly what? DCDuring TALK 18:51, 7 April 2013 (UTC)

Quote: "So, would it be ok if some extra text were added to the template, so that it displays this instead: of the sense "(sense)" ?" —CodeCat 18:58, 7 April 2013 (UTC)

If anything, it should display "definition" or "def". "Sense" communicates mostly to us, perhaps to linguists. DCDuring TALK 19:12, 7 April 2013 (UTC)

Status quo ante:

(a definition): word, another, more

CodeCat proposal (as I [incorrectly] understood it):

(sense) a definition: word, another, more

CodeCat proposal (from below):

(of the sense (a definition)): word, another, more

Alternative proposal 1:

(definition: "a definition"): word, another, more

Alternative proposal 2:

(def.: "a definition"): word, another, more

There are numerous other arrangements of brackets, font types, and wording possible. I don't know that any of these will solve the problem of communicating the intent of the antonym section (and the less familiar semantic relations) while simply providing a breadcrumb back to the definition. We could also try putting "NOT" in front of the gloss for the antonyms heading only or we could skip trying to communicate to ordinary users. DCDuring TALK 19:42, 7 April 2013 (UTC)
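For side-by-side comparison, the four arrangements listed above can be generated from one gloss and one term list. The Python below is only a sketch: the style names and the `render_sense` helper are invented for this illustration and have nothing to do with how {{sense}} is actually implemented.

```python
# Hypothetical style names mapped to the prefix each proposal would display.
SENSE_FORMATS = {
    "status_quo": "({gloss})",
    "codecat": "(of the sense ({gloss}))",
    "alternative_1": '(definition: "{gloss}")',
    "alternative_2": '(def.: "{gloss}")',
}

def render_sense(style, gloss, terms):
    """Render one synonym/antonym line in the given proposed style."""
    prefix = SENSE_FORMATS[style].format(gloss=gloss)
    return prefix + ": " + ", ".join(terms)

for style in SENSE_FORMATS:
    print(render_sense(style, "a definition", ["word", "another", "more"]))
```

Further proposals can be tried by adding a key to `SENSE_FORMATS`; nothing else needs to change.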

I didn't realise you wanted me to be that specific, because I felt that the spirit was more important than the letter. What I intended was for it to show as: (of the sense (sense)): word —CodeCat 19:54, 7 April 2013 (UTC)

@CodeCat you said you wanted to change the text of the template, just not what you wanted to change it to. Mglovesfun (talk) 19:48, 7 April 2013 (UTC)

No, she said: of the sense "(sense)". (Use of <tt> or italics might have made that harder to miss, but she did say.) - -sche (discuss) 20:09, 7 April 2013 (UTC)

Corrected CodeCat proposal now above. DCDuring TALK 21:23, 7 April 2013 (UTC)

I've proposed:

(of "gloss here"): foo, bar, spam

in a previous discussion. — Ungoliant (Falai) 11:40, 9 April 2013 (UTC)

Missed that. It has the advantage of brevity over the other proposals. And it makes sense if one reads linearly from the headings to the individual items: 'Antonyms of "definition"', 'Coordinate terms of "definition"' etc. How could a user misread it? Perhaps by ignoring the quotation marks and reading the "of" as part of the following text. Should "of" also be italicized? DCDuring TALK 11:57, 9 April 2013 (UTC)

If the gloss is italicised and the 'of' isn't, it will help prevent misreading. — Ungoliant (Falai) 12:05, 9 April 2013 (UTC)

Actually I meant to include quotes around the sense, but that kind of got lost in translation. —CodeCat 13:01, 9 April 2013 (UTC)

So some possibilities with "of" are:

(of definition):
(of "definition"):
(of definition):
(of "definition"):

Of these my favorite is the last, because: 1., we often put glosses in quotes, eg in {{term}}, 2., 'Of' needs to be distinguished, 3., the whole thing needs to be visually distinct from the terms following, including any that are not links, eg SoP circumlocutions. DCDuring TALK 14:00, 9 April 2013 (UTC)

Of course, {{term}} italicises, so it may not be as distinctive as you think. Chuck Entz (talk) 14:21, 9 April 2013 (UTC)

The standard practice is to italicise mentions, but this isn't a mention, more like a quotation, so {{term}} isn't appropriate here. —CodeCat 14:29, 9 April 2013 (UTC)

The wording "of [sense]" or "of the sense [sense]" works for 'nyms and pronunciations, but not for usage notes. I propose "in the sense '[sense]'" (or "in the sense of [sense]" or whatever), which is I think how normal people speak about a particular sense of a word. It works for 'nyms and pronunciations also: in fact, for me at least, it seems much more natural even for 'nyms and pronunciations.—msh210℠ (talk) 18:53, 9 April 2013 (UTC) ← Portions struck through at 04:45, 10 April 2013 (UTC).—msh210℠ (talk)

I was thinking of "of". Mglovesfun (talk) 22:06, 9 April 2013 (UTC)

How about allowing an alternative wording, specified, say, by an "alt=" parameter, for whatever cases are not well served by "of"? There are, in English at least, relatively few uses of {{sense}} in Usage notes AFAICT. Is it commonly used there in other languages? DCDuring TALK 22:52, 9 April 2013 (UTC)

I guess my issue is partially that {{sense}} is often used with not a gloss but a usage restriction or a field of endeavor as its parameter. For example, work (which currently has no 'nyms listed at all) might list 'nyms of the "Said of one's workplace (building), or one's department, or one's trade (sphere of business): He mostly works in logging, but sometimes works in carpentry" sense using {{sense|of a workplace or trade}} and 'nyms of the "(zymurgy) To cause to ferment" sense using {{sense|zymurgy}}. I've definitely seen examples of each of these types of uses of {{sense}}. Adding "of" would make no sense in those cases either. (The 'nyms aren't 'nyms of zymurgy.)

And even in the more common case, viz even when the parameter of {{sense}} is a gloss of the headword, what we're really listing aren't 'nyms of "to cause to ferment" — as the wording "of cause to ferment" (or the awkward "of to cause to ferment") would imply. Rather, what we're listing are 'nyms of work in the sense of "to cause to ferment". So adding "of" doesn't cut it, in my opinion — not even for 'nyms and pronunciations.

Perhaps best would be "for [pagename] in the sense of:" with a colon at the end and no quotation marks around what follows. Quotation marks (and even italicization if the prefatory text isn't italicized) wouldn't work in the zymurgy (or field of endeavor) case, as it'd seem like "zymurgy" is a gloss. The colon is then necessary, as "in the sense of [gloss]" doesn't flow. Using only "in the sense of:" is still slightly ambiguous, not solving the problem we started with here: it could be referring to the listed antonyms rather than the headword. I think "for [pagename] in the sense of:" takes care of all these issues — though of course there may be others I haven't thought of.—msh210℠ (talk) 04:45, 10 April 2013 (UTC)

Some small changes to Mandarin (also Cantonese, Min Nan) entry structure and about topic categories - suggestion

Input needed: This discussion needs further input in order to be successfully closed. Please take a look!

I will run this by all our active Chinese contributors but I'd like to suggest to dump the rs (radical sort) value in Chinese entries, e.g. {{cmn-noun}}.

The rationale is the following:

Finding the sorting order for the Chinese character entries is not straightforward, although Wiktionary itself has this info. Lack of this knowledge keeps casual editors, and anyone who is sure about the words but not about the structure, from adding new entries.
The mistakes are numerous; I have fixed some when I noticed them, but I'm sure I missed many.
Simplified and traditional topic categories are sorted differently but there is no real reason for it, e.g. 標準 (biāozhǔn) ("standard") is sorted by "木11標準" (so it will appear under the "木" (tree) radical), but its simplified equivalent 标准 is sorted by "biao1zhun3" and will appear under the letter "B".
A Chinese person who relies on radical sorting and is very familiar with radicals and their order would probably be better off just entering the word they are searching for in Chinese and finding it, rather than searching in the category listings.

Take a look at this Category:cmn:Intermediate_Mandarin_in_traditional_script:

You see, a small number are sorted by Latin letters, others by radicals. Those under Roman letters are incorrectly formatted. Errors are often introduced when a traditional entry is created by copying a simplified entry and the initial character is different.

I suggest to remove the "rs" from entries and from category sorting and just sort by numbered pinyin (e.g. "biao1zhun3"), perhaps stop splitting topical Mandarin categories into simplified/traditional. Serbo-Croatian entries don't separate Cyrillic/Latin entries into separate categories. Or we need to check/fix all incorrectly formatted entries, for which we just don't have enough resources.

I'm not insisting on this change, but User:A-cai, who did a great job, is no longer very active here, and we could get more people on board if Mandarin entries were simpler.

Just want to check the mood and get opinions. We have tens of thousands of entries in traditional script, so there needs to be an agreement before anything happens. --Anatoli (обсудить/вклад) 04:24, 8 April 2013 (UTC)

I have no strong opinion on this. The rs value is autogenerated when using {{cmn new}}, which relies on {{zh-sortkeys}} to produce the rs of the first character in the page title. So it doesn't really bother me. (I wish the language sections were just a single template, with various parameters included, eg.

{{language_name|標|準|p1=biāo|p2=zhǔn|jy1=biu1|jy2=zeon2|poj=piau-chún|n|[[standard]]|eg=|syn=基準|syn2=|ant=}} (effectively everything needed to generate 標準),

and all the rest (trad-simp detection/conversion, pinyin analysis, sort key, even generating pinyin for character) are automated.) Wyang (talk) 05:16, 8 April 2013 (UTC)

Thanks. You're well equipped, others are not so lucky. :)

What about maintenance of topic categories? Many have been moved or deleted, just because they don't follow the structure of other languages.

Category:Mandarin terms derived from English exists on its own (35 entries), although initially was meant to be split.

Category:Mandarin terms in simplified script derived from English (356)

Category:Mandarin terms in traditional script derived from English (301)

Category:Mandarin terms derived from Japanese is now a separate category (21) but Category:Mandarin terms in simplified script derived from Japanese and Category:Mandarin terms in traditional script derived from Japanese were deleted or moved (like many others, they are not empty!). It's a mess. Some long-time editors like Tooironic seem to be confused about categories in Mandarin, so people just stopped categorizing Mandarin entries or categorise them at random (with or without the words traditional/simplified). Well, the reason is simple - trad. and simpl. entries are sorted differently and therefore categorised differently. --Anatoli (обсудить/вклад) 05:54, 8 April 2013 (UTC)

I do like the idea of getting rid of the duplication in categories- it always struck me as rather kludge-y. The main drawbacks/issues I can see would be characters that have multiple pronunciations, and the fact that we would instantly increase the membership of most categories and decrease the number of distinct entries per page. Also, the difference between traditional and simplified characters isn't as easy to see for those who don't know one or the other as for the difference between Latin and Cyrillic. I can see how there might be confusion about which terms in a category are traditional, simplified, or the same in both, and even which ones are paired with which. I'm sure those aren't terribly difficult to deal with, so I'm in favor of changing the category sorting.

As for the rs parameter: we wouldn't have to get rid of it. It would be easier to just make it non-mandatory and ignore it in category sorting. Maybe someday we can give users the option of choosing which sort order to use, though we'd have to populate the rs parameters by bot, first. Chuck Entz (talk) 07:18, 8 April 2013 (UTC)

Of course we should keep separate entries for simplified and traditional characters and words. Wiktionary after all aims to catalogue all words in all languages, in whatever forms. However I too support the abandoning of the old system under A-cai. It's simply not worth the extra effort. At present I add about 50 or so Mandarin entries a week. I imagine I, along with other editors, could create double the number of entries if we didn't have to deal with the rs field. But now Wyang says the rs field is generated automatically. Is that really the case? I just created a new Mandarin entry at 扇贝 - where is this automatic rs field you speak of? Did I do it wrong? If so advise me how. Cheers. ---> Tooironic (talk) 09:49, 8 April 2013 (UTC)

When you create the entry, you can use the code {{subst:cmn new/a|p1=shàn|p2=bèi|n|[[scallop]]}} in both forms, and this will generate the entire content. Wyang (talk) 12:03, 8 April 2013 (UTC)

Wow, that script is powerful. I just created 拆開 and 拆开 in seconds. Wish someone had told me about that earlier. But is the IPA on those entries correct? It doesn't look right to me... ---> Tooironic (talk) 23:09, 8 April 2013 (UTC)

@Tooironic. Re: simplified/traditional separation. With Serbo-Croatian it's easier. The words in Cyrillic and Roman sort themselves differently automatically. As you know, the parameter "t" in {{cmn-noun}} is an indicator that the noun is traditional, "s" is simplified. They are automatically added to Category:Mandarin nouns in traditional script or Category:Mandarin nouns in simplified script, or both if the value is "ts". A word that is both simplified and traditional will appear in both categories, but if you just want Category:Mandarin nouns, both forms will appear in alphabetical order. We could apply the same sorting for both traditional and simplified noun categories but abandon the trad/simp approach for topical categories? What do you think?

In a nutshell - I don't suggest removing "t", "s" and "ts" params, so SoP will always be separated into trad/simp categories as parts of speech. I suggest sorting by numbered pinyin instead of radical + number of strokes, i.e. "biao1zhun3" ("pint" parameter) instead of "rs" - "木11" for both simplified and traditional entries and remove words traditional/simplified from topical categories.

--Anatoli ^{(обсудить}/^вклад) 13:15, 8 April 2013 (UTC)
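For illustration only, the numbered-pinyin sort key proposed above (e.g. "biao1zhun3") could be derived from tone-marked syllables roughly as follows. This is a minimal Python sketch, not how any Wiktionary template works: the function names are hypothetical, the syllables are assumed to be supplied separately (as with p1=/p2= in {{cmn new/a}}), and neutral-tone syllables are simply left without a number, which is an assumption rather than an agreed convention.

```python
import unicodedata

# Combining diacritics that mark the four Mandarin tones in pinyin.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}

def numbered_syllable(syllable: str) -> str:
    """Convert one tone-marked syllable (e.g. 'zhǔn') to numbered form ('zhun3')."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = ""
    letters = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]   # remember the tone, drop the diacritic
        else:
            letters.append(ch)
    # Recompose so that ü (u + combining diaeresis) stays a single character.
    base = unicodedata.normalize("NFC", "".join(letters))
    return base + tone

def pinyin_sort_key(*syllables: str) -> str:
    """Join the numbered syllables into one sort key."""
    return "".join(numbered_syllable(s) for s in syllables)

print(pinyin_sort_key("biāo", "zhǔn"))
```

Because the key depends only on pronunciation, the traditional and simplified forms of a word get the same key and would sort next to each other.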

I personally don't have any issues at finding the "rs" value, it only takes a few seconds longer to create a Mandarin entry and I have to open another tab. Don't get me wrong, guys. I am just worried that most templates we use for other languages don't work for Mandarin, like for example {{etyl}}. Japanese entries also use sorting parameters (hiragana) but it's more consistent. Consider entries like 傍晚. It's adding to Category:cmn:Elementary Mandarin using "人10" as "skey" and Category:cmn:Elementary Mandarin in simplified script using "bang4wan3" as the sorting key. Why is it not categorised as a traditional version? If we treat simplified and traditional categories equally (using one sorting key) and move all topic categories to match other languages, then it would be easier for everyone. Musical instruments categories - trad/simp and without suffix all seem independent from each other - these entries ended up belonging to three topic categories, obviously using whatever sort order.

Category:cmn:Capital cities in simplified script and Category:cmn:Capital cities in traditional script don't have a common supercategory, they go directly under generic Category:Capital cities. Whatever category you take, there are problems. I stopped categorising a while ago, except for HSK, which is still OK, sort of.

Allowing a bot to load rs value may not be such a bad thing but it's probably better to normalise categories (make them similar to other languages - no trad/simp suffixes) and use numbered pinyin or radical sort (whatever we decide) but equally for both trad and simp entries. --Anatoli ^{(обсудить}/^вклад) 13:03, 8 April 2013 (UTC)

@Tooironic. I have modified your 屌絲 and created 屌丝. With my suggested way of categorising - # {{slang|vulgar|lang=cmn|skey=diao3si1}}. Now both entries appear in Category:Mandarin slang and Category:Mandarin vulgarities sorted by diao3si1 (under letter "D"). Note that the categories are without the words "traditional"/"simplified".

They are still in Category:Mandarin nouns in traditional script and Category:Mandarin nouns in simplified script - not suggesting to change that but we could change the sorting of the traditional term to be the same as simplified (pinyin, not rs), if we are in agreement.

Please check whoever is interested, if this is worth attention. --Anatoli ^{(обсудить}/^вклад) 00:24, 9 April 2013 (UTC)

I don't have any problem with this. I've never liked the idea of separating categories based on script types, especially two that share some characters. I wasn't even aware that some traditional terms were sorted differently. If this goes ahead, you will get my support. Jamesjiao → ^{T ◊ C} 01:41, 9 April 2013 (UTC)

Great stuff. Will invite the creator - User:A-cai. I hope he will not be upset. We could still have some bots to do tricks with automatically adding rs values to Mandarin values, right?

Wyang, you expressed suggestions how to add rs automatically but have not expressed your opinion on categories and sorting. What do you say?

The hardest bit would be converting or automating this change but as I said, Mandarin topical categories are in a mess, anyway. --Anatoli ^{(обсудить}/^вклад) 01:48, 9 April 2013 (UTC)

I think simp/trad should be merged into one single category and sorted by pinyin. Adding the pinyins everywhere would be troublesome, but like I said I would prefer if all the templates in one language section are merged into one template {{language_name|..., with various things defined by various parameters, including definitions and context labels. But I can't see this being actualisable on Wiktionary any time soon, so... Wyang (talk) 04:24, 9 April 2013 (UTC)

Both entries, 動能 and 动能, belong to Category:cmn:Physics (not to Category:cmn:Physics in simplified script or Category:cmn:Physics in traditional script!) and are sorted by "dong4neng2", so they appear under letter "D", not under radical "力". If everyone is OK with this, I will update Wiktionary:About Sinitic languages. All entries in Mandarin categories with "...in simplified script" and "...in traditional script" should gradually be moved to categories without these suffixes, with the numbered pinyin sort order, e.g. skey=dong4neng2, or just by adding |dong4neng2 in the category name, e.g. [[Category:cmn:Physics|dong4neng2]]

It's a lot of work and I am currently busy with other things but will get to this eventually.

Parts of speech categories remain as they are for now, with the traditional/simplified distinction. We could change the sorting key for traditional entries to use pint rather than rs but I don't know how. Simplified entries are sorted by pinyin. --Anatoli ^{(обсудить}/^вклад) 00:17, 11 April 2013 (UTC)
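To make the proposed migration concrete, here is a small hedged Python sketch of what a bot step might do: drop the script suffix from a topical category name and emit the category link with a numbered-pinyin sort key. The function names are hypothetical, and the sketch only builds wikitext strings; it does not touch any real bot framework or page.

```python
SCRIPT_SUFFIXES = (" in simplified script", " in traditional script")

def merged_topic_category(category: str) -> str:
    """Return the category name without a trad/simp script suffix."""
    for suffix in SCRIPT_SUFFIXES:
        if category.endswith(suffix):
            return category[: -len(suffix)]
    return category

def category_link(category: str, sort_key: str = "") -> str:
    """Build the wikitext for a category link, with an optional sort key."""
    if sort_key:
        return f"[[Category:{category}|{sort_key}]]"
    return f"[[Category:{category}]]"

old = "cmn:Physics in traditional script"
print(category_link(merged_topic_category(old), "dong4neng2"))
```

Run on either the traditional or the simplified category name, this yields the same suffix-free link, which is the behaviour described for 動能 and 动能.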

En dash in {{was wotd}}?

Per user request at Template talk:was wotd#request to exchange hyphen for en dash, is it ok to change the hyphen "-" for an en dash "–" in {{was wotd}}?

This is a v. minor change, but it's highly visible, so I thought it best to ask.

—Nils von Barth (nbarth) (talk) 12:24, 8 April 2013 (UTC)

I support. Good thing you asked, as some editors seem to really hate the use of typographic characters instead of plain ASCII ones. — Ungoliant ^(Falai) 12:31, 8 April 2013 (UTC)

The stated justification is typographical correctness. Really? DCDuring TALK 21:45, 8 April 2013 (UTC)

I would like to know what kind of person (other than a trained Wikipedia pedant) actually writes Bose–Einstein condensate rather than Bose-Einstein condensate. Equinox ◑ 21:47, 8 April 2013 (UTC)

Writes? I don't think anyone uses a hyphen-minus in writing. People type it, but typesetters (who have nothing to do with Wikipedia) have always had to choose the correct dash-type character from the type tray or now character set. Pick up any properly typeset book, and you will find that Bose–Einstein condensate is typeset with an en dash.--Prosfilaes (talk) 01:35, 9 April 2013 (UTC)

I support.—msh210℠ (talk) 18:55, 9 April 2013 (UTC)

It's a weird world where we set type and publish it to the world on a typewriter keyboard. —Michael Z. 2013-04-09 02:58 z

DC, regarding typographical correctness: yes, an en dash is correct here, while a hyphen is incorrect; see hyphen and dash. The hyphen is reserved for intraword usage, such as line-wrapping and compounds (such as line-wrapping ;), while en dash is used in varied contexts, including interword use such as this. See Wikipedia:Manual of Style: Hyphens for usage at 'pedia.

Beyond correctness, there's also aesthetics – a hyphen jumps out at me here as conspicuously too short (it's sized for intraword use, and thus feels stubby surrounded by spaces), which is the standard typographical judgment.

The main objections to use of non-typewriter typographical characters I've heard are:

Rendering problems – non-ASCII characters render poorly on some computers, particularly older ones.
Input or editing difficulties – some editors have difficulty entering non-ASCII characters (due to needing to use a character picker) or editing entries with non-ASCII characters (esp. due to rendering issues).
Personal preference – some users prefer typewriter characters over book-style typographical characters.

Use of typewriter characters is naturally common online, due to ease of input, though we needn't be limited by it. In the case of templates (as opposed to use in entries), there aren't any editing difficulties, and we have lots of Unicode throughout Wiktionary, so I don't think there are significant problems, but want to check.

Sounds like people are generally supportive (or "meh"); will wait another few days for more comments.

—Nils von Barth (nbarth) (talk) 15:38, 9 April 2013 (UTC)

Just go ahead and change it. Why on earth start a discussion about using the correct character in a template, where it will never have to be re-entered?

Rendering problems – this keeps getting mentioned, but really? Give me a break! Netscape Navigator 4 had Unicode support. If you're reading a dictionary site with "over 500 languages" on a pre-1997 browser, maybe a dash out of place won't ruin your day. —Michael Z. 2013-04-09 21:51 z

There being no opposition, I have gone and dunnit. —Michael Z. 2013-04-09 22:02 z

Thanks Michael!

—Nils von Barth (nbarth) (talk) 14:35, 15 April 2013 (UTC)

Facebook

I set up this page on Facebook for promoting Wiktionary of all languages. You are welcome to become co-administrators of the page, so you can update the page with inspiring messages. --LA2 (talk) 20:55, 8 April 2013 (UTC)

Where do I apply? — Ungoliant ^(Falai) 21:29, 8 April 2013 (UTC)

I noisily hate social meeja and would prefer us to "promote" ourselves through just making a good dictionary that people want to use. But I suppose it can't hurt :) Equinox ◑ 21:34, 8 April 2013 (UTC)

I am boycotting Facebook, but why not promote Wiktionary there? DCDuring TALK 21:44, 8 April 2013 (UTC)

I'm using that page to pull people out of Facebook and into Wiktionary. Whether you boycott Facebook doesn't matter, since you are already here. However, if someone would like to help to pick a "word of the day" for the Facebook page, I think that could make the page quite popular. --LA2 (talk) 22:37, 8 April 2013 (UTC)

From what I know about Facebook, that's not going to happen. Facebook is all about making more Facebook... —CodeCat 22:48, 8 April 2013 (UTC)

That's not the first page on Wiktionary in Facebook. Earlier this one was advertised. I liked both. Don't see why not. Would also be useful if we could recruit some native speakers and talented editors but promoting among users is also important, Wiktionary is for users, not for editors :) --Anatoli ^{(обсудить}/^вклад) 00:57, 9 April 2013 (UTC)

I really don't understand the amount of hate here for Facebook. Use it wisely and use it to your advantage. Don't post info that you don't want others to see.... Simple... I will take a look at the page on my home btw. I liken this attitude to the one on StackExchange towards Wiktionary. Take a look at this: How much should I trust Wiktionary?. I tried to defend Wiktionary and provide my own arguments (thanks Hippietrail for chiming in), but I can't change everyone's mind I guess. Jamesjiao → ^{T ◊ C} 01:56, 9 April 2013 (UTC)

It is correct that in a Beer parlour discussion in March 2012, the existing Facebook page was mentioned, but that is a placeholder page that Facebook created based on a Wikipedia entry. That page doesn't get updated and there is no way to claim it, it's a dead end. The page I created now has a dozen co-administrators that are able to update the page and appoint more co-administrators. It's an anarchy of the same kind as the Wikisource page on Facebook, that I set up last year. It gets updated sometimes, but not very often. Right now, the Wikisource page has 418 fans and Wiktionary has 69. --LA2 (talk) 13:58, 9 April 2013 (UTC)

69? That's a good position to be in. Mglovesfun (talk) 22:04, 9 April 2013 (UTC)

*groans loudly* OK, seriously, Facebook pages have some sort of automated system in which you can write a bunch of posts and they'll come out on a schedule. Assuming somebody's willing to put some time in, we could easily have posts and to spare. —Μετάknowledge^{discuss/deeds} 15:11, 13 April 2013 (UTC)

We could have a Facebook widget on our Front Page, that users could click on. I think the code is something like <a title="Tell Facebook" href="http://www.facebook.com/sharer.php?u=http://en.wiktionary.org/;t=Wiktionary">Facebook</a> SemperBlotto (talk) 15:20, 13 April 2013 (UTC)

Proposal of a pronunciation recording tool

Hello, Rahul21, a developer, offers to develop a pronunciation recording tool for Wiktionary, helped by Michael Dale as part of GSoC. The tool would allow users to record and add audio pronunciations to Wiktionary entries while browsing them (see background discussion on Wiktionary-l). Please read and comment on the proposal! Regards, Nemo 22:37, 9 April 2013 (UTC)

A slightly different way to show etymologies derived from Latin verbs

Romance languages use the infinitive as the lemma, but for Latin we use the 1st person singular present. This means we can't write "from Latin cantō" in any of the etymologies at cantar, because the infinitive derives from cantāre. Most entries solve this by just saying "from Latin cantāre, present active infinitive of cantō". But that is rather wordy, more so than what's really needed to get the point across: the word cantar derives from cantāre, but its Latin lemma/paradigm entry is at cantō. For that reason I've started to use another approach, by writing {{term|canto|cantāre|lang=la}}. So it will show "cantāre", but link to canto. Since not many entries have this, I wondered if nobody had considered doing it that way yet, so I'm sharing the idea here. :) —CodeCat 02:22, 11 April 2013 (UTC)
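As a rough sketch of the approach just described, the {{term}} call can be generated from the macron-bearing lemma and the form to display: the link target is the lemma with its length marks stripped, while the displayed text keeps them. The helper names below are hypothetical, and the sketch assumes that only macrons and breves need removing to reach the page title.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, as in ō
BREVE = "\u0306"   # combining breve, as in ŏ

def strip_length_marks(word: str) -> str:
    """Remove macrons and breves so the word matches the entry title."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if ch not in (MACRON, BREVE))
    return unicodedata.normalize("NFC", stripped)

def latin_term(lemma: str, displayed: str) -> str:
    """Build a {{term}} call that links to the lemma but shows another form."""
    return "{{term|%s|%s|lang=la}}" % (strip_length_marks(lemma), displayed)

print(latin_term("cantō", "cantāre"))
```

For the example in the discussion this produces {{term|canto|cantāre|lang=la}}, i.e. the reader sees the infinitive and lands on the first-person lemma.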

That's a good idea. — Ungoliant ^(Falai) 02:37, 11 April 2013 (UTC)

I'd done that and waited for someone to complain about it. The case that CodeCat mentions seems ideal for that approach. What about derivations from participle forms? DCDuring TALK 03:30, 11 April 2013 (UTC)

Participles are considered separate lemmas as far as I know. They have their own declension tables too. —CodeCat 12:32, 11 April 2013 (UTC)

I've done this for months :) —Μετάknowledge^{discuss/deeds} 15:00, 13 April 2013 (UTC)

I just do "from Latin cantō", exactly as you say we "can't". (I guess I've found a way! :-P) The French verb chanter really does come from the Latin verb cantō, so it's straightforward and correct. It's only a problem when people try to gloss cantō as "I sing" (as though they were glossing the specific form) instead of the correct "to sing" (which is how we gloss verbs). —Ruakh_TALK 16:39, 14 April 2013 (UTC)

I like CodeCat's suggestion. Also, had I in the past noticed any entry glossing "canto" as "to sing" rather than "I sing", I would have changed it and (though this discussion informs me not to do so) I would have marked the edit as minor, assuming I was uncontroversially correcting a simple error by a random IP unfamiliar with Latin grammar. - -sche (discuss) 00:10, 15 April 2013 (UTC)

Another issue is the descendant section of Latin verbs. Should, say, video's descendants be linked to as {{l/pt|ver}} or {{l/pt|ver|vejo}}? — Ungoliant ^(Falai) 01:02, 15 April 2013 (UTC)

Appendix:1000 Japanese basic words

This may not be appropriate for the BP but since this is the most visible spot, I want to ask everyone their opinion about Appendix:1000 Japanese basic words and what to do with it. (I wrote something on the talk page too.) It's a good appendix now, but it's "1000 Japanese basic words" and the description is "This appendix is a specific list of one thousand basic words," and yet there are about 700 words in it.

Some background: I don't know the full story but as far as I can tell, in a nutshell, the Japanese Wiktionary was building the list ja:Wiktionary:日本語の基本語彙1000 some time ago, and the editors here decided to copy it. At the time the original list was incomplete. Since then, the original list has grown but en.WT's list has not been maintained. Now, ja.WT's list has surpassed 1000 words and their list says "作業中現在:989項目 2008年11月16日一旦、1,000以上挙げ、その後取捨選択するなり基本語彙2,000にタイトルを変更するなりする方針としたいと思います。" which means that their list broke 1000 entries and that they are considering changing the name to 2000 basic words.

We can go two routes: depart from ja.WT and keep it a list of 1000 basic words, or mirror their version, and exceed 1000 words in the process.

I don't have exact numbers, but if you search for "Japanese word list" on Google, our appendix is the first result. That suggests to me that the wider world is making use of it as a resource. While ja.WT's version is good, it lacks essential words such as 可愛い (kawaii), いっぱい (ippai, "very"), たくさん (takusan, "many"), or すごい (sugoi, "very/wow!"). You can't have a 30-second conversation with high school students without using those words. Conversely, ja.WT's appendix has quite specific words such as ミミズ (mimizu, "earthworm") and 十二指腸 (jūnishichō, "duodenum"). Duodenum is a basic word?

How about both routes? I would like to combine the most basic of the "basic" words and the Japanese Language Proficiency Test Level 5 appendix (the lowest level) for a "1000 basic Japanese words" appendix, and maybe mirror ja.WT's appendix on a different page. --Haplology (talk) 05:02, 12 April 2013 (UTC)

Your last paragraph sounds eminently reasonable, and I fully support that method (although I think perhaps mirroring ja.wikt's appendix is less important, because it would appear that we are a better arbiter of basicness than they are). —Μετάknowledge^{discuss/deeds} 14:59, 13 April 2013 (UTC)

This appendix is not a very scientific one and was made by amateurs. It's worth adding words to make a thousand, choosing carefully from the JLPT or a frequency list, and/or removing those that are identified as not being basic.

The valuable time could be spent on making Appendix:JLPT better - fixing the word format and choosing the spelling we actually have here, e.g. we have 上がる but not 上る, or create the alternative spellings.

JLPT appendices could be made similar to Appendix:HSK list of Mandarin words with new categories like Category:JLPT/N5 Category:ja:JLPT-5 or similar. --Anatoli ^{(обсудить}/^вклад) 01:53, 15 April 2013 (UTC)

I'm glad we all agree. I've been adding common words from the N5 list to the category, and once the category reaches 1000 items, I plan to add them to the appendix and add the sort keys to the categories. I've been through the whole N5 list once and added common words at my discretion (but not all of them), and there are now almost 900 words in the category. I plan to go through N5 again, and also look at the N4 list and try to find any other essential words that may have been missed. The original list is biased toward nouns, so other parts of speech would be good places to look for new candidates. It also ignores casual words like ちゃう, which is also essential to high school students, or pretty much anybody. To anyone who is so inclined, if you see anything that strikes you as essential in the real world, then please add it. --Haplology (talk) 05:42, 17 April 2013 (UTC)

I have just created new categories. What I meant is something like this: Category:Japanese by difficulty level with five categories. I only added two words as examples: 会う to level 5 (Category:ja:JLPT-5) and 安心 to level 4 (Category:ja:JLPT-4). The actual names of categories and templates, format and links can be discussed. The HSK categories provide a bit more info and look better. Please take a look. --Anatoli ^{(обсудить}/^вклад) 06:09, 17 April 2013 (UTC)

Sure, that sounds good. I just have a few questions. So basically this means completing the JLPT appendices project, as well as the 1000 basic words project, and having both exist in parallel? That's what I would hope for, as both projects have already been made, and they serve slightly different purposes. I assume that no new words would be added to the JLPT categories, only the ones already on the appendices? In the process of reviewing the appendices, it sounds like you want some revision to be done to them, such as adding more common forms like 上がる rather than 上る. I agree with that. I just changed "掃除　そうじする to clean" to "掃除そうじ cleaning", but perhaps "掃除するそうじする to clean" would be better, and have that link to 掃除? I think there is also 近く, so what should be done with that? In the past there was some opposition to creating pages like 近く, but I think there's precedent for pages like that in other languages and there's no policy against them. It's mainly just that the Japanese editors have enough work with lemmas, and if there are going to be forms like 近く with their own entries, I'd rather a bot add them. The L5 appendix was a bit slow to edit, but did not time out or have any problems like that, so I guess there's no need to break it up like L1 (which was too much for the server to display). What do you think about breaking up appendices? --Haplology (talk) 04:28, 18 April 2013 (UTC)

Yes, I think both templates and category groups could easily coexist.

する-verbs, I'd link to lemma but display lemma + する because they are verbs. Having "掃除 to clean" would look weird because 掃除 is a noun. I have adopted this for translations. Same thing for な-adjectives.

Cleaning sounds good but I don't know if JLPT would prescribe 上る for the tests, not 上がる. JLPT is a bit more strict in nature than 1000 basic words but I have no idea who made original lists, how accurate and up-to-date they are. Should students for level 5 know both forms? We can always have simple entries with links to main entries, even skipping conjugations, etc. to save time. What do you think?

No strong opinion on 近く but since く-adverbs are simple in structure, I don't see why we should discourage them, also for the sake of back translations from English. No need to create them, if a bot could do it but I wouldn't delete if they exist.

Breaking up appendices - OK. You already did one. --Anatoli ^{(обсудить}/^вклад) 04:53, 18 April 2013 (UTC)

A lot of editors are used to typing <tt> to make things look typewritery. In HTML5, tt is "entirely obsolete, and must not be used by authors."[2] The W3C suggests:

Where the tt element would have been used for marking up keyboard input, consider the kbd element; for variables, consider the var element; for computer code, consider the code element; and for computer output, consider the samp element.

It looks to me like code is a good general replacement. More specific semantics can be conveyed with samp, kbd, and var. Continuing to use tt in discussions won't break anything, but we should replace it in templates and entries, so we don't have to endure the shame of unnecessary validation errors after the MediaWiki software is brought up to par. —Michael Z. 2013-04-12 17:51 z
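A replacement along these lines could be scripted. The following Python sketch is only an illustration under stated assumptions: it swaps tt for code (the general replacement suggested above) with a regular expression, keeps any attributes, and makes no attempt to decide where samp, kbd, or var would be the better fit. The function name is hypothetical and nothing here is tied to an actual bot.

```python
import re

# Matches <tt>, </tt> and <tt ...attributes...>, case-insensitively,
# without touching other tags that merely start with "tt".
TT_TAG = re.compile(r"<(/?)tt(\s[^>]*)?>", re.IGNORECASE)

def replace_tt(text: str) -> str:
    """Rewrite <tt> elements as <code> elements, preserving attributes."""
    return TT_TAG.sub(lambda m: f"<{m.group(1)}code{m.group(2) or ''}>", text)

print(replace_tt("'''Support''' making <tt>lang</tt> mandatory."))
```

Since the choice between code, samp, kbd, and var is semantic, output from a script like this would still want a human look before being saved to templates or entries.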

By the way, also gone the way of the rotary dial are acronym, big, center, font, strike, and u, and all of those styling attributes on table elements. —Michael Z. 2013-04-12 17:59 z

What does "obsolete" mean in HTML-world? I went to an HTML class today, and we were using some of these (well, definitely font) without any indication that they could ever be a problem. —Μετάknowledge^{discuss/deeds} 03:39, 14 April 2013 (UTC)

Font? Ouch – I should have a word with your teacher.

During the 1990s' browser wars, every browser was making up new features and displaying them differently, and web development was a fragmented nightmare. Since then, the W3C approves the official open standards that make up the web based on feedback from browser developers, and we can mostly write HTML for one standard instead of for five current and twenty-seven past browsers (but don't get me started on MSIE 6). The wide adoption of CSS, which allows for the separation of presentation from document structure, has led to newer versions of HTML deprecating and obsoleting purely presentational elements.[3] Unfortunately, the nature of wikitext encourages editors to include lots of presentation guff repeated many times in every page, but this is bad practice because it bloats pages and makes maintenance difficult. Like templates, style sheets let us centralize presentation and reduce page bloat.</pedantry>

Browsers are built for backwards-compatibility, so most of the old elements will still work. But as an organization for openness, we should follow the recommendations of current open standards, and certainly abandon practices deprecated in the last century.

Specifically, HTML 4.01 (1999) deprecated center, font, s, strike, and u, and others.[4] HTML5, which MediaWiki is now specifying in the doctype at the top of every HTML page, has obsoleted these and other elements and attributes,[5] and redefined some others.[6] —Michael Z. 2013-04-14 15:28 z
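To get a feel for how much cleanup a template or entry would need, one could count the obsolete elements in its HTML. The sketch below uses Python's standard html.parser; the class and function names are hypothetical, and the element set is an assumption drawn from the names mentioned in this thread (acronym, big, center, font, strike, tt). It leaves out s and u, which HTML5 redefines rather than removes.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed subset of elements that HTML5 lists as obsolete.
OBSOLETE = frozenset({"acronym", "big", "center", "font", "strike", "tt"})

class ObsoleteTagCounter(HTMLParser):
    """Counts start tags whose element name is in OBSOLETE."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lower-cased.
        if tag in OBSOLETE:
            self.counts[tag] += 1

def count_obsolete(html: str) -> Counter:
    """Return a Counter of obsolete element names found in the markup."""
    parser = ObsoleteTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

sample = '<p><tt>lang</tt> <FONT color="red">x</FONT> <code>y</code></p>'
print(count_obsolete(sample))
```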

Thanks for that explanation. Specifically, my teacher recommended using CSS (which I'm learning now), but said that for basic formatting, just using the HTML tags is fine (although it may not be much faster than inline CSS). I agree with replacing them in templates but not giving a damn on discussion pages. —Μετάknowledge^{discuss/deeds} 04:44, 15 April 2013 (UTC)

Agreed, in principle. But I suggest you keep in the mindset that you are structuring HTML, not formatting as one does in MS Word, and the presentation is created by the browser's or website's default style sheet. —Michael Z. 2013-04-15 14:53 z

100 million edits

According to our sources, the 100 millionth edit was made to Wiktionary (all languages taken together, humans and bots included) during Friday April 12. Congratulations to us all! About 20% of the edits have gone into the English Wiktionary. --LA2 (talk) 02:05, 13 April 2013 (UTC)

I wonder which was the 100 millionth edit. — Ungoliant ^(Falai) 03:20, 13 April 2013 (UTC)

Probably me changing a shitty em dash to a beautifully appropriate en dash. —Michael Z. 2013-04-13 06:15 z

In the vote for creating the FWOTD feature, the points "eligibility of reconstructed languages" and "eligibility of constructed languages" didn't achieve consensus (except conlangs which don't meet CFI, which failed) by the end of the vote.

Also, we've had a few people complain about the name "Foreign word of the day," so if anyone wants to suggest a change feel free to do so.

Summarising, I'm consulting the community on:

whether terms in reconstructed languages (Proto-Indo-European, Vulgar Latin, Proto-Germanic, etc.) should be allowed to be foreign words of the day;
whether terms in constructed languages that meet CFI (Esperanto, Ido, Lojban, etc.) should be allowed to be foreign words of the day;
whether the feature's name should be changed.

— Ungoliant ^(Falai) 14:18, 13 April 2013 (UTC)

I support the eligibility of reconstructed languages, because they are some of our most interesting content. Naturally, for reconstructed terms we shouldn't require pronunciation and should require a reference from a trustworthy source instead of citations.

I support the eligibility of constructed languages that meet CFI. Don't see why not.

I oppose changing the name. I don't find it offensive in any way whatsoever.

— Ungoliant ^(Falai) 14:18, 13 April 2013 (UTC)

I support the first two, and I kind of oppose the third because I don't see anything wrong with the current name. In Dutch, there is a nice word anderstalig, but I don't know if English has an equivalent word. Maybe that would be a good word to feature? :) —CodeCat 14:43, 13 April 2013 (UTC)

I oppose the eligibility of reconstructed languages since they are by definition uncitable. That's why they're not in mainspace, too. I support the eligibility of constructed languages that meet CFI. I abstain on the issue of the name; I don't understand what could be offensive about it, though I can see it might be misleading, but I can't think of a better name besides "non-English word of the day" which sounds dumb. Incidentally, although you didn't ask, I also oppose allowing mentions rather than uses to count as cites in FWOTD nominations. I know that mentions are good enough for RFV when it comes to LDLs, but I think FWOTD ought to have higher standards than RFV/CFI. Note that FWOTD already requires pronunciations, even though nothing at CFI requires them. —Angr 14:51, 13 April 2013 (UTC)
- While I sympathise with your point, this would make it much harder to feature words from languages without contributors who speak them, like Kaingang and Quechua, and it's already Indo-European dominated enough as it is. — Ungoliant ^(Falai) 15:16, 13 April 2013 (UTC)
  - The trouble with allowing a single mention is that there's no protection against errors. If the single source we use for Kaingang or Quechua has a fictitious entry (whether deliberate or accidental) or even just a typo, then we are at risk of propagating that error if we don't confirm it elsewhere. Bad enough when that happens in any entry, but worse when it happens in an entry being featured on the main page. —Angr 17:08, 13 April 2013 (UTC)

I vote per Ungoliant, although I also support the eligibility of terms in conlangs, which Ungoliant took no stance on. —Μετάknowledge^{discuss/deeds} 14:54, 13 April 2013 (UTC)

I did. — Ungoliant ^(Falai) 15:16, 13 April 2013 (UTC)

Sorry. Rectified above. —Μετάknowledge^{discuss/deeds} 03:37, 14 April 2013 (UTC)

I vote per Angr. I'm undecided on whether the name needs to change; we don't have a great alternative, but I do understand why people might want to change it.--Prosfilaes (talk) 19:56, 13 April 2013 (UTC)

Not much point in voting against a title if there is no clear proposal for a replacement.

What exactly were the complaints against "foreign"? It's not exactly offensive, but kind of ignorant when it's a minority of English speakers who live in countries where other languages are truly foreign. Calling French a foreign language in Canada, for example, is incorrect and at least off-putting to a francophone Quebecker who takes his or her first or only language for granted as native.

What alternatives are there?

foreign-language word of the day
non-English word of the day
other-language word of the day
alterlingual word of the day (is there a real Latinate word?)
alloglossal word of the day (ditto Greek?)
interlingual word of the day
international word of the day
global word of the day
world word of the day
exotic word of the day
other word of the day

—Michael Z. 2013-04-13 19:21 z [updated list —Michael Z. 2013-04-14 14:40 z]

But suppose you are an anglophone Canadian who learned French. If someone asks you "do you speak any foreign language?", isn't "French" a correct answer? — Ungoliant ^(Falai) 19:45, 13 April 2013 (UTC)

No? I would regard it as sloppy usage of the word "foreign" = from a different country. In any case, suppose you are a francophone Frenchman; why would French be foreign?--Prosfilaes (talk) 19:50, 13 April 2013 (UTC)

Well, foreign also means "from a different language," and many Canadians live with only one of the official languages, which is why such misunderstandings can happen.

If you are from France though, wouldn't you understand what a "foreign word" is in the English-language Wiktionary? So, can anyone link to some complaints about the title, so we can replace speculation with evidence? —Michael Z. 2013-04-14 01:35 z

[7], [8], [9], possibly [10]. — Ungoliant ^(Falai) 01:46, 14 April 2013 (UTC)

"From a different language" is not listed as a definition at foreign, and it doesn't sit right with me when ASL or Native American languages get lumped in as foreign languages, though the lack of a better term often means they do. At Distributed Proofreaders, we got in a habit of using "languages other than English (LOTE)", precisely because they weren't foreign to our site or users.--Prosfilaes (talk) 10:40, 14 April 2013 (UTC)

The readers' feedback is convincing. I support changing the name FWOTD to anything else. —Michael Z. 2013-04-14 14:40 z

Of the four feedback comments linked to above, three explicitly recommend "non-English", so if we're going to discuss a new name, I guess that's the primary contender. —Angr 16:21, 14 April 2013 (UTC)

Like you, I think it sounds dumb. The best of Mzajac's suggestions is "foreign-language word of the day," though it might still offend people and I still oppose change. — Ungoliant ^(Falai) 17:34, 14 April 2013 (UTC)

Can we agree to enhance the name by moving WT:Foreign Word of the Day to WT:Foreign-Language Word of the Day? We can get used to that in a month or two and see if it still raises readers' ire. And reconsider renaming if it appears warranted later? —Michael Z. 2013-04-14 17:47 z

I seriously doubt anyone who objects to "Foreign Word of the Day" will be content with "Foreign-Language Word of the Day". —Angr 19:46, 14 April 2013 (UTC)

I support "non-English"; "foreign-language" strikes me as having pretty much all the problems "foreign" does.--Prosfilaes (talk) 07:10, 16 April 2013 (UTC)

Regarding the entry foreign, definition 2, example "eating with chopsticks was a foreign concept to him": Certainly, this use of "foreign" is not restricted to other cultures? Things can be "a foreign concept" to a person that has never met that idea before. I think good synonyms are "unfamiliar, unknown, strange", and that these should be added to the explanation. But English is not my native tongue. --LA2 (talk) 23:43, 14 April 2013 (UTC)

Increasing default font-size

I proposed this a couple of weeks ago, and had little feedback. Not sure if everyone doesn't care or just didn't notice. So I'm posting this reminder, and will change the site's default font-size, shortly. —Michael Z. 2013-04-14 02:02 z

It looks perfectly readable to me so I see no reason to change it. Why do you think it's too small? —CodeCat 03:04, 14 April 2013 (UTC)

As I wrote in the original post, editors have used Common.css to enlarge the font for 54 languages and scripts, affecting thousands of entries. The discrepancies bug me. —Michael Z. 2013-04-14 05:14 z

It is odd to me that the existing "default" font size for the site would not be the default for the user's browser, i.e. not medium. But Web designers seem to work upon contrarian principles of their own. Bigger is fine by me, but I hope it can be set to browser default rather than a hard-coded "what looks good on this year's monitors". Equinox ◑ 03:32, 14 April 2013 (UTC)

I did the math. Browser default. For a preview, copy the first bits from my vector.css. —Michael Z. 2013-04-14 05:19 z

I will update MediaWiki:Vector.css within the hour. Complaints welcome. —Michael Z. 2013-04-14 15:34 z

Done.[11] Force-reload to update the style sheet immediately. —Michael Z. 2013-04-14 15:46 z

I rolled back your edit; it looks terrible to me and there was not sufficient consensus IMO. —Μετάknowledge^{discuss/deeds} 15:54, 14 April 2013 (UTC)

I thought two BP discussions with no opposition would constitute consensus to try out a harmless improvement. Your single subjective opinion after a one-minute look at a major visual change doesn't constitute any consensus or evidence either. Thanks for speaking for everybody. —Michael Z. 2013-04-14 16:18 z

I'll be opting out anyway. I didn't like it at all. Mglovesfun (talk) 16:00, 14 April 2013 (UTC)

Could someone actually respond to the evidence I have cited, instead of blowing away a major change based on "I don't like it," without even using it? —Michael Z. 2013-04-14 16:19 z

Sorry, I don't see anything that I would call "evidence". In the previous discussion you gave a list of putative advantages, but seemingly no "evidence" for them. (Perhaps you and I define the term differently?) At any rate, if you want people to reply to something specific, please indicate what. In particular, if you could highlight some part of your argument that would justify increasing the font size even if no one liked the result, that would certainly be interesting! —Ruakh_TALK 16:45, 14 April 2013 (UTC)

The biggest objective evidence that our font-size is small is that other editors have been increasing it, to the tune of over 50 CSS declarations in our style sheet, the majority setting the font-size to the browser default. No one has mentioned any disadvantage of setting the font-size to the browser default.

I've put in significant time doing research and testing, tried to outline my reasoning, and did my best to get feedback. Not one objection was made. Now, could someone here at least do me the courtesy of actually trying to use this for an hour or a day, instead of taking one glance, blurting out "I don't like it" because it is different, and blowing off my effort completely? —Michael Z. 2013-04-14 17:00 z

Re: "could someone here at least do me the courtesy of actually trying to use this for an hour or a day": http://en.wiktionary.org/wiki/User:Ruakh/common.css?diff=20160735. —Ruakh_TALK 17:23, 14 April 2013 (UTC)

Thank you for that. Sorry to get cranky. I included a list of what I see as concrete advantages in my original proposal. I think things can be improved, and I would appreciate critical feedback. —Michael Z. 2013-04-14 17:50 z

I've been trying out the larger size for the past several days. While it's more legible, there are other drawbacks. While this larger size may correspond to the "de jure" default browser size, it doesn't correspond to the "de facto" default size for web pages. Almost every other text-based website I look at has smaller text, much closer to the "traditional" Vector size. People get used to one font size on webpages and when they encounter something noticeably smaller or (as in the proposed new Vector size) much larger, it looks absurd. And more urgently, if we change the default Vector size here at English Wiktionary we're out of sync with every other Wikimedia project's Vector skin. I know that perfect unity isn't possible across languages, but at least every English-language project's Vector should look like every other English-language project's Vector. If I'm looking at Wikipedia, then at Wikisource, then at Commons, and then at Wiktionary, it's startling when Wiktionary's text is so much larger than everyone else's. And if I didn't know that it's that way because I deliberately set it that way on my own CSS page, I would be baffled and put off by it. —Angr 21:31, 17 April 2013 (UTC)

Some good points I hadn't considered in detail.

Wikimedia branding. Indeed, most Wikimedia projects use 13px font size. I see that zh and ja Wiktionaries use 15px, Arabic, Pashto and Farsi 14px. However, explicit branding elements in the other projects vary a lot. Among Wiktionaries, even the site logos (!), home-page layout, use of tone and colour, icons, etc., vary wildly. The only thing all these sites have in common is the basic MediaWiki interface with grey and white background and blue rules. Also, the favicon is identical on all but cs. and en.Wiktionary. Choosing font-size for branding over readability would be poor prioritizing, when it would make an insignificant difference in the visual identity, but potentially a large one in readability. If we value our uniform branding at all, why don't we coordinate site design, or unify even the most basic branding elements before compromising readability?
The appearance of credibility. It's true that 13px may be the most popular font-size,[12] but that isn't a "de facto default" in any sense I can think of, nor does being widely used make it the best choice for anything specific.[13] A website doesn't look smart or credible by picking the most popular font size for no other reason. It does it by considering the factors that font size affects, and choosing an appropriate size for the particular site. Increasing font size for over 50 languages while sticking to a 13px default looks "absurd" to me.
Readability. As you say, a larger font than 13px is more legible. This is particularly true on both the extra-small and extra-large screens that more readers are using these days. Still more so for many of the language scripts we use, as we have concretely demonstrated in our style sheet.

I will also add that we have to overcome serious readability problems inherent in Vector, like the fact that text columns can be ridiculously long for readers who do not resize their window.[14] And 13px is not our smallest font size. —Michael Z. 2013-04-21 01:14 z
Accessibility. Overlaps with the above, but it should be mentioned that many of the designers of the average 13px websites have good eyes, good displays, and are poorly schooled in accessibility and internationalization. Many of these "average" websites are aimed at youthful or moneyed markets. Ours is the broadest possible audience, including non-native readers, aging, vision-impaired, impoverished, having only mobile internet access, etc. Failing to optimize readability harms segments of our audience that many other websites ignore.

I still think any disadvantages of increasing font size are minor at worst, and far outweighed by the concrete benefits. —Michael Z. 2013-04-21 00:52 z

Tracking category for missing inflected forms

Feel free to let me know if there is a better way of doing this already in place, but an idea struck me recently upon seeing red links in inflection lines. I think that we should have a system to track these links, since they are either valid missing entries for inflected forms of lemma entries or incorrect inflections being displayed on entries (for example, words lacking plurals or different feminine forms where the editor has not changed the template's default behavior). In both cases, they should be actively dealt with, either by creating pages for missing inflected forms or correcting the inflection templates. This seems like low-hanging fruit, since it is simple work and a motivated editor could do dozens of these in a sitting, or far more with acceleration. It would be relatively simple to use the ifexist parser function so that pages with red links in their inflection templates are put in a maintenance category recording that, so that editors can come along and address them.

As an example of what I am talking about, I made an edit to {{es-adj}}, so that it now puts entries with red-linked feminine singular forms in inflection templates into Category:Missing Spanish feminine adjectives. Have a look at that category to see what I mean. There are 884 of these (as of now) being detected, which means potentially 884 missing entries just in looking at Spanish singular feminine adjective forms alone. Ideally, I think this kind of category could be useful across all of the inflection templates and all of the inflected forms they output, but I wanted to raise the idea here for comment. We may want to have broader categories than the "Missing Spanish feminine adjectives" one I created; maybe all entries with missing inflected forms should go in a single big maintenance category. Is this a useful idea? Dominic·t 07:02, 16 April 2013 (UTC)

There is one major difficulty with that. To check whether a given page exists is considered "expensive" by the MediaWiki software, and we're limited to about 100 of those checks per page. Once a page reaches that limit, any remaining checks will return "does not exist". So, we can't use this too much on pages because there is a danger that it will break the page if overused. —CodeCat 12:43, 16 April 2013 (UTC)

Agreed. This is a bot job; we just need to convince somebody like SB to take it on. —Μετάknowledge^{discuss/deeds} 13:55, 16 April 2013 (UTC)

I would be more afraid of false positives from people not changing the inflection template defaults if we just created them all at once than I would be of pages which will hit the limit of parser function checks from adding this new one. Do we have any reason to think there would be many, or any, pages that would break? I am fairly sure the limit is actually 500 calls, not 100. That's a lot of inflection templates for one page. Also, once the limit is reached, it does not make the functions return false, creating false positives. It actually just refuses to expand the templates after the limit. Dominic·t 14:53, 16 April 2013 (UTC)

Can somebody delete 암글--wasn't sure how/where to ask? King jakob c 2 (talk) 20:47, 16 April 2013 (UTC)

Done. Thanks. Adding the {{delete}} template is enough. — Ungoliant ^(Falai) 20:58, 16 April 2013 (UTC)

Template term and lang parameter

I oppose template {{term}} requiring the "lang=" parameter, showing "???" before the term if the lang parameter is not provided. This change seems to have been introduced to the template today or yesterday by CodeCat (talk • contribs). An example of use of template "term" without lang parameter: physics. --Dan Polansky (talk) 08:07, 20 April 2013 (UTC)

Something like this seems to have been discussed at Template_talk:term#lang. People should not use such obscure pages to discuss significant changes! --Dan Polansky (talk) 08:09, 20 April 2013 (UTC)

I feel the same way. Ƿidsiþ 08:52, 20 April 2013 (UTC)

Why do you oppose it exactly? Not specifying the language leaves many problems: the link does not link to the correct section, the script template is not applied, and the word is marked in HTML as English (which creates usability problems). I wonder what justification there can be for ignoring those problems. —CodeCat 12:33, 20 April 2013 (UTC)

This change breaks many, many, many discussion pages. -- Liliana • 12:36, 20 April 2013 (UTC)

I don't think displaying a small notification really breaks anything. It's just a friendly reminder that something is missing and needs to be corrected. I don't know how to make it less obvious without making it so unobvious that nobody sees it. —CodeCat 12:39, 20 April 2013 (UTC)

Others' posts should never be edited, even in case of incorrect syntax and such. At best, this should be restricted to the main namespace. -- Liliana • 12:41, 20 April 2013 (UTC)

We've edited or broken people's posts in the past. Whenever a template is deleted, if that template is used in a past post, deleting it will break the page, but we do it anyway. In some cases we've replaced the template with an equivalent, but in other cases the pages remain broken. For example look at the transclusions of {{hr}}; some were replaced by "sh" but some still remain. Similar with {{zh}}. This isn't really any different. We can't always guarantee backwards compatibility, and indeed we shouldn't try to go too far out of our way for it. —CodeCat 12:52, 20 April 2013 (UTC)

@CodeCat: Naturally, I am not opposing using "lang=" for non-English languages to add script, and whatnot. I am opposing making "lang=en" mandatory for English. What you wrote does not seem to apply to English terms without lang=: "the link does not link to the correct section, the script template is not applied, and the word is marked in HTML as English". What I am saying is, if there is no lang=, let "term" template assume the term is English, as it did before your edits. --Dan Polansky (talk) 13:09, 20 April 2013 (UTC)

I think you are a bit mistaken. It always has been mandatory, because specifying lang=en has never, in the history of the template, been equivalent to specifying no language. So it never assumed that the term is English, not before my edits and not after them. That is one of the biggest flaws in this template in particular, which others (which do default to English) never had because they were created properly from the beginning. The result is that we now have thousands of entries that use this template both for English and for many other languages, without specifying which. Simply changing the template so that English is the default is therefore not an option, because it would not be correct for the many thousands of non-English words that lack a language. The only option that I know of is to mark lack of a language as an error so that it be corrected. I am currently running a bot to correct some of the most obvious ones (uses where the {{term}} template is preceded by {{etyl}}, which allows the bot to figure out the correct language), but there are still many many more that need to be fixed. —CodeCat 13:19, 20 April 2013 (UTC)

Re: "It always has been mandatory, ...": That seems incorrect. If lang= really were mandatory, the template would complain of a missing parameter. The parameter could only have been "mandatory" in a sense that I do not know. --Dan Polansky (talk) 13:24, 20 April 2013 (UTC)

What I meant is that the template doesn't do what it should do if the language is left out. The correct behaviour, when lang=en is given, is to use Latn as the script, "en" as the language, and link to the English section. But when no language is given, it uses None as the script, "" as the language, and links to no section. Therefore, to correctly link to English terms, the language is mandatory. —CodeCat 13:29, 20 April 2013 (UTC)

All of that is irrelevant. This is one of our most heavily-used templates, especially by our less-template-sophisticated editors. Changes that significantly affect its behavior should be discussed thoroughly in an appropriate venue before being implemented. Most of the people who use it aren't going to have a clue what the ??? means, and a good many won't know where to go to find out. There should have been some steps taken to educate people before implementing it. Chuck Entz (talk) 16:24, 20 April 2013 (UTC)

There is a help message when you hover the cursor over it. That may not be entirely obvious, but actually writing the message out would look really bad and would have made even more people angry. The real "education" has been in {{term}}'s documentation, which I presume is the proper place to put it. —CodeCat 17:03, 20 April 2013 (UTC)

Support the change.

No one has changed any discussion pages, but if you want your talk posts to continue looking the same, don't leave live templates in them. Use subst. —Michael Z. 2013-04-20 23:24 z

Totally support making lang= obligatory, but we should wait until the bot run is over before displaying the ???s, and not display them at all outside the content namespaces. — Ungoliant ^(Falai) 23:53, 20 April 2013 (UTC)

The bot doesn't really have anything to do with the ??? either; the bot works from a category that can be added or removed independent of the question marks. But from the way the bot is running now, it's not really making a serious dent in the number of pages. It is making the occasional change but it's skipping most of the pages in the list without doing anything (because it sees no change it can make). There were around 45 thousand pages in the list when it started, and I expect it won't be able to get rid of more than a few thousand of them currently; it's at 41 thousand now. —CodeCat 00:14, 21 April 2013 (UTC)

But in this revision of warlock the term lie has ???s, and after the bot edit it doesn't. — Ungoliant ^(Falai) 01:01, 21 April 2013 (UTC)

That's true, but that's only because the bot has changed something that happened to both remove the ??? and remove it from the category. What I am saying is, the bot works from the category, and the ??? doesn't influence that. If we removed the ??? the category would still be there, and we could also put in ??? and remove the category. —CodeCat 01:06, 21 April 2013 (UTC)

But the bot does influence the ???s. What I was saying is that we should wait for the bot run to be over before displaying them, because there would be no benefit displaying something that makes our entries look bugged when it's going to be automatically fixed soon enough. But I changed my mind, since the bot isn't going to make a serious dent (unfortunately). — Ungoliant ^(Falai) 01:18, 21 April 2013 (UTC)

Support considering lang= obligatory (meaning only that it must be present: I think it's fine for it to be explicitly blank, but English should be lang=en), probably oppose whatever "bot run" Ungoliant and CodeCat are referring to (it doesn't seem like it was ever discussed or approved?), weakly support some sort of visual indication of missing lang= once that's rare (though I'd strongly support such a visual indication if it were visible only to admins and opters-in), and oppose distinguishing content namespaces from non-content namespaces in this respect, since that will just make it harder for editors to learn what they're supposed to be doing. —Ruakh_TALK 00:30, 21 April 2013 (UTC)

The bot run is adding lang= to uses of {{term}} in etymologies where it can use a preceding {{etyl}} template to determine the correct language. Basically, it's replacing {{etyl|xx|yy}} {{term|word}} with {{etyl|xx|yy}} {{term|word|lang=xx}}. It didn't seem like a very controversial change. —CodeCat 00:35, 21 April 2013 (UTC)
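For readers who want to see the replacement concretely, here is a minimal Python sketch of that kind of edit. It is not the bot's actual code, which is not shown in this discussion. The function name and the regular expression are assumptions made for illustration, and it only handles the simple case where {{term}} directly follows {{etyl}} and contains no nested templates.

```python
import re

# Hypothetical pattern: an {{etyl|xx|...}} call, optional whitespace,
# then a {{term|...}} call with no nested templates inside it.
ETYL_TERM = re.compile(
    r"(\{\{etyl\|(?P<lang>[^|{}]+)(?:\|[^{}]*)?\}\}\s*)"
    r"\{\{term\|(?P<args>[^{}]*)\}\}"
)


def add_lang_from_etyl(wikitext):
    """Append lang=xx to a {{term}} call that follows {{etyl|xx|...}}."""

    def fix(match):
        args = match.group("args")
        # Leave the call alone if it already carries a lang= parameter.
        if any(part.strip().startswith("lang=") for part in args.split("|")):
            return match.group(0)
        lang = match.group("lang").strip()
        return match.group(1) + "{{term|" + args + "|lang=" + lang + "}}"

    return ETYL_TERM.sub(fix, wikitext)


print(add_lang_from_etyl("From {{etyl|la|en}} {{term|terminus}}."))
```

Calls to {{term}} that are not preceded by {{etyl}} are left untouched, which matches the observation above that such a bot skips most pages in the list.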

Ah, O.K., that's fine, then. :-) (I mean, I still think it should have been proposed in the BP first. But I agree with you that it probably wouldn't be controversial.) —Ruakh_TALK 02:29, 21 April 2013 (UTC)

Perhaps we should have a class of error messages that are hidden from readers but displayed for all logged-in editors. —Michael Z. 2013-04-21 01:18 z

That might both be a good idea and a detrimental one. {{nl-noun}} shows "error" messages when some of its parameters are missing, and calls on the viewer to provide them. Since those messages were added to the template, I have seen quite a lot of editors - IPs, newly registered and experienced alike - take the messages to heart and provide the forms. We even have an editor, User:DrJos, who registered specifically to provide the forms and has now made it his life's work to fix them all. :) So I would say that's first-hand evidence that this kind of notice not only works, but it even gets IPs to lend a hand. So if we decide to hide these requests from IPs, we will be losing some of the editors who might help out. —CodeCat 01:26, 21 April 2013 (UTC)

Don't forget that we also serve a lot of site visitors who don't edit and have no idea what "lang=" is. Why should some 10-year-old doing his or her homework have part of the content replaced by ??? so you can send a wake-up call to someone else? Are we the "dictionary that anyone can edit", or "the dictionary that everyone has to edit"? Chuck Entz (talk) 02:49, 21 April 2013 (UTC)

Re: " […] part of the content replaced by ??? […] ": That's a straw man, since the version with ??? still has all the same content. (The ??? appears before the term, not instead of the term.) Maybe you meant to say that the 10-year-old would think that the ??? had replaced actual content? —Ruakh_TALK 03:43, 21 April 2013 (UTC)

My mistake. I had already forgotten what the actual effect was, having only seen it on one page. Although I obviously overstated the effect, it still seems a bit much to clutter the main body of the text used by non-editors with stuff aimed strictly at editors. It might indeed cause concern among non-editors that something was broken that they didn't know how to fix.

I'm not opposing the eventual implementation of such a change, just the massive scale of the change combined with the lack of effort taken to get consensus and to get feedback about what effect it might have, let alone to prepare people for it. Something that noticeably changes the appearance of a significant percentage of our millions of entries should require more than a general mention of the principle behind it here and there, followed by a discussion on the template talk page that only a very few would even know about. Chuck Entz (talk) 04:15, 21 April 2013 (UTC)

I have rolled back CodeCat's edits to {{term}} because currently, it seems that only CodeCat and Michael support the ???s, whereas Dan, Widsith, Liliana, Chuck, and I oppose some aspect of CodeCat's edits altogether, and Ruakh and Ungoliant do not support putting in the ???s until non-lang-specified uses become much more rare. That's only 22% of editors in support so far. This is why we need to have BP discussions before making sweeping changes to the interface as readers view it. —Μετάknowledge^{discuss/deeds} 02:03, 21 April 2013 (UTC)

Not read the whole discussion, but do we need '???'? Is there any way of making these stick out less like a sore thumb? This is a dictionary after all; readers come here for lexical information, not to correct wiki syntax. PS there is a line in User:Mglovesfun/vector.js that converts {{term|foo}} into {{term|foo|lang=en}}. Mglovesfun (talk) 09:45, 21 April 2013 (UTC)

I think that so far, the majority of people in this discussion agree that it's a good idea to make sure {{term}} always has a language code. But that immediately brings up the question, how do we get there? Even if people want to add a language where it's missing, how can they do it? The reason why I added ??? was that it would make it obvious to editors that something needs fixing there. Making the problem visible and apparent is the first step towards fixing it, and that has been a real problem before. I also argued that showing a similar message on {{nl-noun}} has indeed helped to make the problem visible and therefore has led to more people fixing it. The bot I am running is helping, but it can only do so much; it has almost passed over all entries with a missing language but it has only managed to fix about 10% of the total (from 45 thousand to 40 thousand). A bot could never fix the majority of the entries that remain. So I suppose the real goal of this discussion is: if at least some of us agree that adding a language in all cases is a good thing, what can we do to make that happen and make it happen more quickly? If adding ??? to the entry is not the right way, then what is? —CodeCat 12:06, 21 April 2013 (UTC)

If there isn't a bot solution for the remaining 90%, then I guess we'll just have to use MG's JS (or a modified form of it) on every page we're already editing. The reason why the ???s don't work is that instead of solving the problem, they create a new one. It looks messy and unprofessional, and users have to go for an unintuitive tooltip to find what's gone wrong. (Don't get me wrong, I love xkcd, but tooltips are not what people try first upon seeing a cryptic message.) This is not an acute crisis, so if a chronic solution is the best we have, so be it. —Μετάknowledge^{discuss/deeds} 14:39, 21 April 2013 (UTC)

What exactly does the script do? Blindly adding lang=en is not correct... if it were, we probably would have done that already. I think there is one approach that we could try in the long term. If we could weed out all the uses that are not English (which are presumably a minority) then it becomes more feasible to add lang=en to the remainder. Using Lua, we might be able to recognise some of the languages, and we can use other means as well. For example, anything with {{polytonic}} as the script is bound to be Ancient Greek (and that template even sets lang="grc" if nothing is provided), so adding lang=grc whenever sc=polytonic is present is safe. Adding lang=got where sc=Goth is also safe, and many other scripts are only used for one language so we can derive the language from the script. We can also look at the characters in the term being linked to. Templates can't recognise which characters a word consists of, but Lua can. So if a word contains, say, Hiragana or Cyrillic, we can be pretty certain it's not English. We could also separate out calls to {{term}} that use Latin characters that are not used in English, like å. Granted, none of those approaches is absolutely failsafe, but it would probably be right more than 99% of the time, and it would make it much easier to chip away gradually at the number until it becomes more manageable. And making a few mistakes (marking a link with the wrong language) is not serious, especially not considering that currently 40000 are marked with the wrong language (it can only get better!). —CodeCat 14:55, 21 April 2013 (UTC)

I was imagining that most would be English, and then it would be easy for a human to scan it and fix the langcode if necessary. I don't know what percentage the script/character method can handle, but I'm sure it's noncontroversial for you to attempt it. —Μετάknowledge^{discuss/deeds} 14:59, 21 April 2013 (UTC)

I don't know how many there would be either, but I can add an invocation to a module (that needs to be made) which would add a category to the page when lang= is not present. That module can then decide to add the page to different categories depending on other factors like the script code or the characters in the word. The number of entries in each category would then be used to gauge what needs to be done. And even if one category contains only a few hundred entries, that's still a few hundred fixed and done. Every little bit helps, and we'll need to do this in little bits. :) —CodeCat 15:05, 21 April 2013 (UTC)
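The script-code and character heuristics discussed above could be prototyped offline before any module is written. The following is a rough Python sketch, not the proposed Lua module. The function name and bucket labels are invented for illustration, and the script-to-language table contains only the two pairs named in this thread (polytonic to grc, Goth to got).

```python
import unicodedata

# Only the two script/language pairs mentioned in the discussion;
# a real table would need to be compiled and reviewed separately.
SCRIPT_TO_LANG = {"polytonic": "grc", "Goth": "got"}


def guess_bucket(term, sc=None):
    """Sort a {{term}} call that lacks lang= into a rough maintenance bucket."""
    # A script code used by a single language settles the question.
    if sc in SCRIPT_TO_LANG:
        return "lang=" + SCRIPT_TO_LANG[sc]
    for ch in term:
        if not ch.isalpha():
            continue  # ignore digits, spaces, punctuation, combining marks
        name = unicodedata.name(ch, "")
        if not name.startswith("LATIN"):
            return "not English: non-Latin script"
        if ord(ch) > 127:
            return "probably not English: non-ASCII Latin letter"
    # Plain ASCII letters only: may be English, still needs a human check.
    return "possibly English"


for word, script in [("λόγος", "polytonic"), ("слово", None), ("gå", None), ("physics", None)]:
    print(word, "->", guess_bucket(word, script))
```

The last two buckets are deliberately hedged: English loanwords with diacritics would land in "probably not English", and many non-English words use only ASCII letters, so the output is a way to size the categories rather than a safe basis for automatic edits.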

How about an error message like the one after this term [! Editors: the preceding term template lacks a language code]. Visible to all, relatively unobtrusive, self-explanatory and ignorable. The copy could be made more accessible; it should convey that an improvement is needed but doesn't affect the accuracy of the information. —Michael Z. 2013-04-21 15:57 z

How would that look if there is also a translation or a gloss? —CodeCat 16:40, 21 April 2013 (UTC)

How about Greek όρος (óros, "term") [! Editors: the preceding term template lacks a language code]. I think it belongs after the whole template, because it refers to that construction. If it were before the brackets, it would look more like it was referring to the term itself.

Uh-oh! That probably can't work unless someone customizes or rewrites the javascript. Each collapsing element has to have a unique ID. In my browser, clicking any one of those examples expands them both. Even if we want that behaviour, duplicate IDs break the HTML. —Michael Z. 2013-04-21 23:04 z

If we wanted a bit more urgency and context, a background appearing on hover or expand could tie it all together like Greek όρος (óros, "term") ⊕ Editors: this term template lacks a language code. —Michael Z. 2013-04-21 23:27 z

Lua script errors show a floating window when you click on them. Maybe you can have a look at how they work, and copy that? —CodeCat 00:34, 22 April 2013 (UTC)

Can you point me to one, or tell me how to generate one? I remember something like that, but now when I try to save a module with a script error, I just see the big red box at the top of the page. —Michael Z. 2013-04-22 00:50 z

You can look in Category:Pages with script errors. —CodeCat 01:03, 22 April 2013 (UTC)

Support making lang mandatory. It may also be possible to include automatic transliteration later. Perhaps rather than "???", it should say "which language???". --Anatoli (обсудить/вклад) 23:52, 21 April 2013 (UTC)

I don't like calling it an error. For one thing, it's beside the point and adds extra verbiage, but mostly, it gives the impression that things are falling apart. I would suggest following the lead of some of our rf- templates: "This term template is lacking a language code. If you know it, please add it as a lang= parameter". Still verbose, but it would only show on hover. The symbol should be something small and innocuous, like the one Michael suggested above, or maybe a bullet (•). Even the question marks might not be so bad as a trailing superscript. Or how about: όρος (óros, "term")^[→?] (I'm sure there are attributes that would make it look more like a live control, but you get the idea).

Portuguese reflexive verbs

I have just added compadecer-se, but have no idea how to show its inflections. There is nothing in Wiktionary:About Portuguese and no obvious templates. The entry in Portuguese Wiktionary has no conjugation table. Any ideas? SemperBlotto (talk) 10:50, 21 April 2013 (UTC)

I don't know Portuguese, but in general, it is worth considering whether to direct the reader from compadecer-se to compadecer, along the lines of mračit se directing the reader to mračit. Nonetheless, as regards reflexive forms, different languages seem to use different approaches. The Portuguese entry dirigir-se directs the reader to dirigir for conjugation, as does encaminhar-se. --Dan Polansky (talk) 14:55, 21 April 2013 (UTC)

What do you do in cases where the non-reflexive verb doesn't exist? Then there is no entry to direct the reader to. On the other hand, let's imagine dirigir didn't exist and was not attestable, only dirigir-se. Then dirigir-se would have to have a conjugation table. But what should it contain? Suppose that it contains forms with the reflexive particle attached, so that it has te diriges. Then that would violate "all words in all languages" because diriges gets no entry, and that would confuse users who don't realise that "te diriges" is one term. Suppose on the other hand that the table instead displays te diriges, linked separately. Then we're faced with another dilemma: what would the entry diriges contain? It can't say "second person singular present of dirigir-se" because that's not correct, "te diriges" is the second person singular of dirigir-se, not "diriges". On the other hand, it can't be "second person singular present of dirigir" either, because dirigir doesn't exist. —CodeCat 15:37, 21 April 2013 (UTC)

(after edit conflict) For Czech, I always create a non-reflexive entry even if all its uses are reflexive. Thus, for "mračit se", the definition is at mračit, where "se" is stated on the definition line; "mračit" is always used with "se". As for inflected forms, there would be e.g. mračila. Note that, in Czech, the reflexive particle se (or whatever it is) is separated from its verb, as in "pořád se na něho mračila", so I do not see it as necessary to have mračila se as an inflected-form entry. --Dan Polansky (talk) 15:51, 21 April 2013 (UTC)

Yes, it's awkward, isn't it? In Italian, we hard code the pronoun in the inflection table (with no wikilink) and wikilink the inflected verb, even in the few cases in which the non-reflexive form doesn't exist (Hmm). In French, we redirect the "pronoun + infinitive" to "infinitive". SemperBlotto (talk) 15:42, 21 April 2013 (UTC) (See lavarsi and se laver as typical of these)

In Dutch we don't have separate entries for reflexive verbs either. But that may not really be the best idea for all languages, because in some there is no space to separate the particle from the verb. Spanish and Portuguese are examples, but Catalan also has many pronouns that contract with the verb when next to a vowel (like in French). So Catalan might have adormir-se with the form m'adormo, and the imperative of acostumar-se is acostuma't. —CodeCat 16:46, 21 April 2013 (UTC)

My practice has been using:

  ====Conjugation====
  See {{l/pt|compadecer}}.

Listing each combination would be too messy. A verb form like compadeceria can give se compadeceria, compadeceria-se and compadecer-se-ia. — Ungoliant (Falai) 17:09, 21 April 2013 (UTC)
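To show how quickly the combinations multiply, here is a small Python sketch that reproduces the three placements named above for one form. The function `clitic_variants` is hypothetical. It assumes the form is simply the infinitive plus an ending, as in the conditional compadecer + ia, so it does not handle verbs whose future and conditional use a shortened stem (such as dizer), other tenses, or pronouns that change the verb's spelling.

```python
def clitic_variants(infinitive, ending, pronoun="se"):
    """Return the three pronoun placements for an infinitive-based form.

    Only meant for forms built as infinitive + ending (hypothetical helper).
    """
    form = infinitive + ending
    return {
        # Pronoun before the verb, written as a separate word.
        "proclisis": f"{pronoun} {form}",
        # Pronoun attached after the whole form.
        "enclisis": f"{form}-{pronoun}",
        # Pronoun inserted between the infinitive and the ending.
        "mesoclisis": f"{infinitive}-{pronoun}-{ending}",
    }


for placement, text in clitic_variants("compadecer", "ia").items():
    print(placement, text)
```

Repeating this for every person, tense, and pronoun is what would make a full table unwieldy, which is the argument for pointing to the conjugation of compadecer instead.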



Sunday, April 21, 2013
