Monday, June 24, 2013

Wiktionary - Recent changes [en]: Wiktionary:Beer parlour/2013/June

Jun 24th 2013, 23:01

::We could have the dumps processed to count the lines in each Language's PoS sections starting with "#" (and not "#:" or "#*") and attempt to eliminate "form of" type definitions. That would yield "definitions", not lemmas, by language. If we use the dumps and keep growing, then we could honestly say something like "more than X,XXX,000 English definitions of English words" and update it periodically (monthly, quarterly?). Changes in the count between periods would be another way of monitoring activity. We could express gratitude to contributors in obscure languages, note unsanctioned reductions, etc. Whether it's worth the cycles and effort I don't know. DCDuring TALK 20:05, 24 June 2013 (UTC)
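::A rough sketch of the counting described above, in Python, for the wikitext of a single page. Everything here is an assumption made for illustration: the function name `count_definitions` is hypothetical, language sections are taken to begin with a level-2 heading such as `==English==`, no check is made that a "#" line sits under a PoS heading, nested markers such as "##:" are treated like "#:", and a "form of" definition is guessed at by looking for a template whose name ends in "of" (for example `{{plural of|...}}`). A real run over the dumps would need to handle many more cases.

```python
import re
from collections import Counter

# Assumed conventions (not taken from any official specification):
# a language section starts with a level-2 heading like "==English==".
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$")
# A definition line starts with one or more "#" not followed by ":" or "*".
DEFINITION_LINE = re.compile(r"^#+(?![#:*])")
# Heuristic for "form of" definitions: a template whose name ends in "of".
FORM_OF_TEMPLATE = re.compile(r"\{\{[^{}|]*\bof\s*[|}]")


def count_definitions(wikitext):
    """Count non-form-of definition lines per language in one page."""
    counts = Counter()
    language = None
    for line in wikitext.splitlines():
        heading = L2_HEADING.match(line)
        if heading:
            language = heading.group(1).strip()
            continue
        if language is None or not DEFINITION_LINE.match(line):
            continue  # outside any language section, or not a definition line
        if FORM_OF_TEMPLATE.search(line):
            continue  # skip "plural of", "past tense of", and similar
        counts[language] += 1
    return counts


sample_page = """==English==
===Noun===
# A small domesticated feline.
#: ''The cat sat on the mat.''
#* 1900, an example quotation
# {{plural of|cat|lang=en}}
==French==
===Noun===
# [[cat]]
"""

for language, total in sorted(count_definitions(sample_page).items()):
    print(language, total)
```

::Summing these per-page counters over every page in a dump would give the per-language totals, and comparing totals between two dumps would give the change over a period.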

 

::: Actually, I wasn't really thinking of senses, just words. "Set" has a large number of senses, but it is still a single "word". I was thinking that it would be nice to show how many English words we have definitions for, irrespective of the number of definitions per word. Of course, given the number of words with many senses (and words with senses in many languages), I'm sure that if we were to count all of the actual definitions here, that would put us in the tens of millions. bd2412 T 20:29, 24 June 2013 (UTC)

 

::::I don't see how we are not misleading folks when we say a form-of entry has a "definition". What we say has the sound of marketingspeak. Furthermore, the proportion of English L2 sections that have multiple definitions is less than 10% of the total, even excluding English form-of entries, so we cannot assume that we have "tens of millions" of definitions, if we exclude form-of entries. We can't really use "lemmas" and expect normal folks to understand. We could possibly count each L2 section as an "entry" without being misleading, except for the form-of problem. It seems to me that any honest count requires some work. If we only counted carefully once a year, but also counted some percentage increase of some meaningful measures of overall size since that last count, we would not be misleading folks and convey some idea of continued growth.

 

::::None of this gets to the real problem of quality, which is probably more important to keep users coming back, especially in the competitive areas, such as against English online monolingual dictionaries, where we do not excel. DCDuring TALK 22:59, 24 June 2013 (UTC)

 


Revision as of 23:01, 24 June 2013


Thoughts from an experienced outsider

As a nine-year Wikipedian and very occasional editor here at Wiktionary, I would like to offer insight into what this community's atmosphere feels like to an outsider. Here are a few thoughts from 42 short hours of interacting with six established contributors (on various pages) with respect to the subject of untranslatable terms:

  • Content upon which articles, books, research, and most members of the public express interest is useless
  • Content upon which articles, books, research, and most members of the public express interest is irrelevant
  • Content pertaining to knowable terms should be omitted because it could be contradicted by unknown information
  • Content should be omitted if inspired by the popular press
  • Content should be omitted if it requires substantial effort to produce
  • The most widely acclaimed translation of a nation's most seminal work is unacceptable as a reference
  • Scientific meta-analyses consist of subjective opinion and cannot be used as a reference
  • Two books and an expert's paper aren't to be considered durably published references
  • Verbal challenges to the veracity of references may not be overcome by verbal verification with one or more native speakers of an obscure language
  • Verbal challenges to the veracity of references may not be overcome by directly contacting its author to obtain explicit provenance

These observations are not intended to ridicule the associated contributors in any way as individuals – in fact, I hope to work productively with each and every one of them even on this very issue. However, most new contributors to Wiki-projects cannot endure more than two or three blows like these before getting a pretty sour taste in their mouth or just outright leaving forever.   — C M B J   09:08, 3 June 2013 (UTC)

I think you are from Wikipedia. Having to follow these rules doesn't seem much different from having to follow Wikipedia's rule about "no original research" (which talking to a native speaker would also be, unless it were published and peer-reviewed). Perhaps it's just a matter of understanding why these rules exist, and adjusting to them? Equinox 09:19, 3 June 2013 (UTC)
No original research is a fine policy, but that was not a point of concern for three very sound reasons: (a) the content was published by multiple sources and veritably by at least one, (b) its criteria for inclusion are lower due to special considerations for rare languages, and (c) past thinking on the matter was that attestation by a native speaker or knowledgeable individual would be considered appropriate. I am more than able to navigate policies foreign to me and I do not believe unfamiliarity or confusion on my part to have been a factor.   — C M B J   09:40, 3 June 2013 (UTC)
These are very quickly formed opinions and based on comments by individual users rather than the community as a whole. We cannot stop individual users having opinions, nor would we want to. Mglovesfun (talk) 09:21, 3 June 2013 (UTC)
'Source' here isn't a good word, we don't source entries, we cite them. If you see Appendix:English dictionary-only terms you'll see we can source quite a lot of words that don't appear to exist. Mglovesfun (talk) 09:22, 3 June 2013 (UTC)
As per above, I am here speaking in good faith and, frankly, I really don't appreciate being accused of trying to break the project's rules for personal reasons, of being ignorant and uninformed after providing this explanation, or being disparaged for identifying with Wikipedia, or being asked to stop talking (while simultaneously being told "we cannot stop individual users having opinions, nor would we want to"), all because I dared to disagree with unsubstantiated claims. These unprovoked and irrational ad hominem attacks are ironically representative of why I felt that it was necessary to go out on a limb and share my experience in the first place. This environment is toxic.   — C M B J   10:18, 3 June 2013 (UTC)
I think there are two issues going on here, and some of the disagreements may stem from different understandings of what's under discussion. The sources you've cited for these words are fine (at least as far as I'm concerned) for having an entry on words like tingo#Rapa Nui. The sources are perfectly adequate in terms of WT:CFI for less well attested languages. What's more problematic is putting them in the Category:Terms without an English counterpart (or even having that category) because deciding what words do and do not "really" have an English counterpart is highly subjective. At some level of philosophizing, no word in another language has an exact English counterpart because each word in another language will have slight shades of meaning and connotation that the English word doesn't have. I think all of the words listed over at WT:RFD/O#Category:Terms without an English counterpart are worthy of having a Wiktionary entry, provided it's in the correct script with the correct capitalization and as long as confirmation from some published work by a recognized expert in the language (as opposed to the popular press) is provided to confirm the term's existence in the case of less-attested languages, and as long as three cites from durably archived sources are provided in the case of well-attested languages like German. But deciding what goes into the category of "English doesn't have a word for this" is problematic because there are no objective criteria for it. I myself have worked on Sättigungsbeilage, a word with no obvious English translation, and brought it to the attention of word mavens by nominating it for WT:FWOTD, but even I'd be wary of putting it into that category. —Angr 10:37, 3 June 2013 (UTC)
I wish to associate myself with Angr's comments. DCDuring TALK 11:42, 3 June 2013 (UTC)
I agree with much of what Angr said and have replied on his talk page to help keep this thread on-topic.   — C M B J   12:34, 4 June 2013 (UTC)
CMBJ, please don't see disagreeing with you as a personal attack, as we can't agree with you purely so you don't feel attacked. Mglovesfun (talk) 11:29, 3 June 2013 (UTC)
To quote:
  • "…here's a tip. If you don't know what you're talking about, stop talking."
  • "…become better informed, please!"
  • "You come across as a Wikipedian trying to get round the rules for his own personal reasons."
  • "I'd suggest if you have nothing relevant to say, say nothing."
These are not spirited disagreements over the issue at hand. They're wanton personal attacks, and even worse, they're non sequiturs. I am willing to forgive and forget and move beyond them, but I will not for one second tolerate further denigration—especially in a safe place and from an administrator. It's doubly unacceptable.   — C M B J   12:05, 3 June 2013 (UTC)
I stand by all of that. What's wrong with being better informed or not making ill-informed comments? What exactly do you object to? Mglovesfun (talk) 17:02, 3 June 2013 (UTC)
The insulting tone, the implication that he doesn't know what he's talking about, has nothing relevant to say, and should shut up. I find these statements insulting too and I'm not even the one they're directed at. Any reasonable user would take these as personal attacks. —Angr 18:05, 3 June 2013 (UTC)
I wish to associate myself with Angr's comments immediately above, as well. Even though I sometimes have delivered abusive comments, I don't think it is good practice, especially directed at a new contributor who is making good faith efforts to contribute. DCDuring TALK 18:25, 3 June 2013 (UTC)
  • I mostly agree that WT editors and admins, myself included, sometimes come across too tersely or even insultingly, perhaps as we let our frustrations get the better of us.
However, the comment above that "You come across as a Wikipedian trying to get round the rules for his own personal reasons" does not itself strike me as all that accusatory -- it is simply a description of what CMBJ's push-back might be viewed as. Within the greater context of CMBJ's interactions, I can see how CMBJ might interpret it as inflammatory, however.
Online discourse can be difficult. Without all the visual social cues that humans have evolved to give and receive, intent is often hard to discern. -- Eiríkr Útlendi │ Tala við mig 18:39, 3 June 2013 (UTC)
Just to be clear here, these comments were not a clumsy exchange of benign text that just came out sounding wrong. Mglovesfun was not even a participant in the associated discussion prior to stating, with candor, that I was insidiously "trying to get round the rules" (what applicable rules?) for my "own personal reasons" (what possible reasons?) and that I should "stop talking" until I "become better informed" (about what subject?). There is also no reconciling "we cannot stop individual users having opinions, nor would we want to" with "I'd suggest if you have nothing relevant to say, say nothing" because they're contradictory advice in principle. Moreover, if dissecting and attempting to calmly refute unsubstantiated claims is perceived as frustrating push-back, then something's very wrong here, because that's how consensus is supposed to be developed.
For what it's worth, I'm thick skinned. I'm not here calling for his de-sysopping. I respect his right to say anything to me, even if accidentally misconstrued or necessarily offensive, and even if in violation of the letter of policy if he genuinely felt it was justifiable for some reason. But that doesn't mean that the other 99 contributors who don't have the nerve to speak up will stick around after such shoddy treatment, which is the focus of this thread.   — C M B J   04:26, 4 June 2013 (UTC)
Thanks for the suggestion. I seem to have already picked up just about everything it describes, but it would've undoubtedly been helpful to me not that long ago. Maybe an eventual goal should be to automatically display it in a dismissible sitenotice for unified accounts that have >500 Wikipedia edits and <25 Wiktionary edits.   — C M B J   03:35, 4 June 2013 (UTC)
  • CMBJ, you came in with your personal project. Many people who join Wiki-projects with broad ideas of how it should be leave frustrated. Your sources do not match up to what we expect for the project, and many of us don't think your new category is useful. Personally, the fact that your signature links back to Wikipedia doesn't inspire me to treat you as other than a Wikipedian tourist.--Prosfilaes (talk) 19:37, 3 June 2013 (UTC)
First of all, this is not my personal project and I did not come here with an ulterior agenda based on broad assumptions of how everything should be. I did, however, come here and find that an expected level of detail was missing in an area that is of particular interest to many readers, and whether that information is most appropriately presented in the form of a category or not is beside the point. The problem here is that, for a new user, participating on this project is painful, and not for reasons that can be explained away as normal responses to personal fault. This is a chronic problem and that is made abundantly clear by the reactions that articulating it has provoked.
Even in your case—and I stress that you're not even involved—the response has been to just further make this about me. The fact that the thought would even cross your mind to view cross-project editors as "tourists" is very telling of the climate here. The fact that you would for some reason consciously treat them differently, and feel comfortable and confident about stating that intention openly amongst peers and moderators—and to justify the behavior of others, no less—is even more telling because these are the very people who should sense the utmost of hospitality and collegiality and support while making their first contributions.
Further to that point, this "we're not Wikipedia" mantra does not resonate with me at all; both projects are funded by the WMF and both projects labor for the same central goal. The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration.   — C M B J   03:30, 4 June 2013 (UTC)
You can't say "I stress that you're not even involved" to any editor at this point, because you are making negative accusations about the entire project, i.e. all of us. Strictly regarding the project, if the issue is that the environment is toxic and painful, some users may experience that in isolated cases, but overall I think the editors do their best to be civil and helpful. When they fail, it's because of the limits of human nature and of communicating via online forum. That's my opinion. The empirical support would be that 1000 active editors made it through the supposed toxicity somehow. --Haplology (talk) 04:48, 4 June 2013 (UTC)
Actually, yes, I can, and previously did, and will continue to do so, because my perception of this problem is that it is systemic in nature. This is not an unreasonable assertion because attitudes and norms are contagious social factors. In this case, I am already familiar with the complications that you speak of from Wikipedia and other communities, but it is my view that, with respect to this particular community, they are above and beyond what would be considered normal. It is also more than possible for thousands to unknowingly endure such tendencies and then inadvertently and unintentionally perpetuate them forward without ever taking notice.   — C M B J   09:34, 4 June 2013 (UTC)
I don't view editors who also edit other projects as tourists. I view editors who set their signature to link to another project as tourists. It's an obnoxious habit, and it makes my eyes roll on any project I see it on. And anybody who waves a flag saying "I'm not interested in working with this project" is not really the person to devote extra care on.--Prosfilaes (talk) 04:29, 4 June 2013 (UTC)
"I don't view cross-project editors as tourists, I just view cross-project editors like you as tourists" isn't exactly making the premise any less malicious. For your information, I personally provide a link to my home wiki to centralize my identity and so that others can receive a timely response to messages. I consider this configuration to be of mutual benefit and so I utilize a dirty workaround to make possible what will likely be a standard MediaWiki feature at some point in the future. Regardless, this has yet once again devolved into ignoring the issue while making this about me.   — C M B J   09:34, 4 June 2013 (UTC)
Granted, Mglovesfun was pretty rude, and your experience overall hasn't exactly been pink horsies and rainbows. Still, you aren't completely blameless, either. Before you came along, the discussion consisted of 5 edits totalling 778 characters. SemperBlotto called it a "useless category", but otherwise the comments centered on practical issues. Pretty mild stuff.
A week later, you decided to weigh in. Ignoring the entire discussion, you set out to educate us about how your category was the only thing preventing us from descending into a morass of error and mediocrity. You started out with "This concept is itself independently notable and such categorization is necessary for the eventual completeness of our project".
From your very first sentence, you set yourself to the task of telling us in absolute terms what Wiktionary has to have in order to be any good at all. Your comment later on is telling: "The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration." No false modesty there. The fact that most of us also contribute to Wikipedia seems to have escaped you.
Except you don't seem to understand what you're proposing to change: notability is strictly a Wikipedia concept- our CFI center on usage. What's more, categories aren't content- they're tools for organizing and navigating through the dictionary entries, which are the real content. The completeness of the project has nothing to do with categories.
We do things differently than Wikipedia not because we don't know any better, but because Wiktionary is a dictionary, and Wikipedia is an encyclopedia. Dictionaries are highly structured and concise- we don't go into much detail, because people come to us for very specific types of information, and everything else is clutter. Your category has all the markings of a typical Wikipedia list article, starting with the interesting concept. As I mentioned above, our categories are mostly for organization and navigation- not for telling a story.
You then added almost 50 lines of unnecessary examples regurgitated from popular websites, complete with footnotes/bibliographic references, for a total of 6 edits and 5876 characters- 7 1/2 times the size of the entire discussion- before even starting to address so much as a word of what anyone else had said. I'm pretty verbose, myself, but that's a lot!
To sum it up: you tried to graft encyclopedic concepts onto a dictionary, jumped into the discussion about it without addressing anything already said, dumped huge amounts of verbiage on us while still missing the point, talked to us like you were introducing civilization to the heathens, and then wondered why everyone got annoyed at you.
What it boils down to, is this: the category that you thought of as the ideal way to dress up this nondescript little backwater of ours was nominated for deletion as useless by the locals. You seem to have taken this as a criticism of your judgment, and have a very strong emotional vested interest in fighting off the challenge. You don't want to hear that, so you've been repeatedly ignoring the issue while making this about Wiktionary. I would say more, but this has grown to almost half the size of your original post... Chuck Entz (talk) 09:52, 4 June 2013 (UTC)
We now have one account across Wikimedia Wikis, and come August, the chance that anyone might believe there's two CMBJs editing on Wikimedia will be removed with the renaming of unified accounts. It's not mutually beneficial; it left me on another wiki when I was trying to check your contributions, I would have had to deal with completely irrelevant material if I wanted to leave a message there, and certain users may not be able to leave a message at all. (I know of at least two major editors on Commons that are blocked on en.WP.)
You keep saying it's not about you, but you were one party in all these discussions. How could we have best informed you that we were deleting the category in all forms? If you can't think of a way, you're saying the members of this Wiki don't have the right to choose what content they find acceptable.--Prosfilaes (talk) 17:46, 4 June 2013 (UTC)
  • Discussion is rapidly becoming uncivil and unproductive. Caution is advised. --Yair rand (talk) 10:05, 4 June 2013 (UTC)
    • Don't feed the trolls! —This unsigned comment was added by 82.18.16.213 (talk • contribs).
    • Agreed. I spent so much time trying to come up with a coherent explanation of the problems I saw in all this, that I just ended up tired and grumpy. I take back the negative tone of my comments, but I don't have time or energy to rework everything right now. It will have to stand in its current ugliness until I can rework it and address the real issues I was trying to get across. Chuck Entz (talk) 12:15, 4 June 2013 (UTC)
Individually,
  • "Granted, Mglovesfun was pretty rude, and your experience overall hasn't exactly been pink horsies and rainbows. Still, you aren't completely blameless, either. Before you came along, the discussion consisted of 5 edits totalling 778 characters. SemperBlotto called it a "useless category", but otherwise the comments centered on practical issues. Pretty mild stuff. A week later, you decided to weigh in."
The reason that I weighed in a week later on this matter is because no one had the courtesy to notify me of the deletion discussion. I found it accidentally while navigating for unrelated reasons.
  • "Ignoring the entire discussion, you set out to educate us about how your category was the only thing preventing us from descending into a morass of error and mediocrity. You started out with "This concept is itself independently notable and such categorization is necessary for the eventual completeness of our project". From your very first sentence, you set yourself to the task of telling us in absolute terms what Wiktionary has to have in order to be any good at all."
I strongly disagree that I ignored this discussion and in fact my original response was intended to address prior concerns ("useless", "difficult to manage", "necessarily subjective") by presenting a cogent case otherwise ("independently notable", "necessary for completeness", and as a clarification, "scholarly examples exist"). Moreover, I do believe that this information is necessary for the eventual completeness of this project. I base that view on the observation that many publications have expressed interest in this particular area.
  • "Your comment later on is telling: "The fact that their content guidelines differ is not an excuse for abrasive and callous attitudes toward those who are stellar enough to contribute in multiple areas of concentration." No false modesty there. The fact that most of us also contribute to Wikipedia seems to have escaped you."
This was not false modesty and this assertion may very well be the most offensive remark made since this ordeal began. The comment does not refer to my self-image but my view of each and every individual who meets this description, many of whom are truly stellar in every sense of the word.
  • "Except you don't seem to understand what you're proposing to change: notability is strictly a Wikipedia concept- our CFI center on usage. What's more, categories aren't content- they're tools for organizing and navigating through the dictionary entries, which are the real content. The completeness of the project has nothing to do with categories. We do things differently than Wikipedia not because we don't know any better, but because Wiktionary is a dictionary, and Wikipedia is an encyclopedia. Dictionaries are highly structured and concise- we don't go into much detail, because people come to us for very specific types of information, and everything else is clutter. Your category has all the markings of a typical Wikipedia list article, starting with the interesting concept. As I mentioned above, our categories are mostly for organization and navigation- not for telling a story."
The only thing I was/am proposing is that this information—which, again, I believe to be necessary for the project's completion—not be needlessly eradicated. The way it is presented makes little difference in my mind, so long as it's easily accessible to readers.
  • "You then added almost 50 lines of unnecessary examples regurgitated from popular websites, complete with footnotes/bibliographic references, for a total of 6 edits and 5876 characters- 7 1/2 times the size of the entire discussion- before even starting to address so much as a word of what anyone else had said. I'm pretty verbose, myself, but that's a lot!"
These examples were preceded by the question of "what would go in these categories?" and I consider them to have been a decent response. The sources were presented in such a way that would convey their journalistic nature, attributions were provided to avoid plagiarism, and they were formatted in the usual way.
  • "To sum it up: you tried to graft encyclopedic concepts onto a dictionary, jumped into the discussion about it without addressing anything already said, dumped huge amounts of verbiage on us while still missing the point, talked to us like you were introducing civilization to the heathens, and then wondered why everyone got annoyed at you. What it boils down to, is this: the category that you thought of as the ideal way to dress up this nondescript little backwater of ours was nominated for deletion as useless by the locals. You seem to have taken this as a criticism of your judgment, and have a very strong emotional vested interest in fighting off the challenge. You don't want to hear that, so you've been repeatedly ignoring the issue while making this about Wiktionary. I would say more, but this has grown to almost half the size of your original post."
No, I simply tried to incorporate popular lexicographical information into Wiktionary. I found that information silently nominated for deletion and sprang into action to help save it. I attempted to address the other participants' concerns individually and have continued to do so as best possible. I did and still do take issue with the unwillingness of multiple participants to address cogent counterarguments.
Again, and as a final note, I want to reiterate and make unequivocal that this thread was not intended to be focused on the RfD. It is not about and was never about me or my opinions here. It is, however, about how toxic this environment feels from the perspective of a new user, which unfortunately has been further echoed by this discussion.   — C M B J   11:19, 4 June 2013 (UTC)
It seems most people here (especially judging from Mglovesfun's comments) just tried to bash the new contributor instead of taking the time to help him contribute what he wants in the "right" way (which differs in every project); of course, because it's the easiest way to deal with new users. The worst part was the comment by the idiot who accused him of being a troll. Chuck Entz is right about the purpose of the category namespace, and CMBJ is also right that this information is necessary and quite useful. The solution here is that this information should be put in the appendix namespace, as a list, and I think it would become a quite useful one. --Z 12:07, 4 June 2013 (UTC)

  • I hereby admit to being rude and promise to try to do better.
  1. Support DCDuring TALK 12:16, 4 June 2013 (UTC)
  • I wholeheartedly accept instances of this gesture as making amends. Additionally, if my own actions led to ill feelings for anyone involved at any point, then I ask forgiveness and offer my commitment to continued cooperation and respect in all efforts that contribute to the advancement of our common mission.   — C M B J   11:15, 5 June 2013 (UTC)

  • Silent deletions are really infuriating. Every {rfv, rfd}-ed page should have all of its respective contributors notified on their talk page (perhaps by a bot, it's easily automatable). Or even better - through a notification gadget like the one on Wikipedia. --Ivan Štambuk (talk) 14:07, 4 June 2013 (UTC)
That is apparently part of the basic software and is available in user "Preferences". I find it very useful to track the limited number of pages I watch in WP, Species, Commons, and MediaWiki. DCDuring TALK 16:05, 4 June 2013 (UTC)
I've created a proposal to help prevent this from happening to others in the future.   — C M B J   11:10, 5 June 2013 (UTC)
Lua-cising Template:context

See also Wiktionary:Beer parlour/2012/April#Renaming "context labels", and Wiktionary:Grease pit/2012/April#Rewrite {{context}}?

I imagine we want to convert {{context}} to Lua at some point, so I am wondering what the best way would be. Ruakh made a start with creating a replacement some time ago, {{label}}. It's used on a few pages but it's template-based and uses subtemplates instead of "raw" templates. One of the advantages of using subtemplates is that it eliminates any conflicts between context labels and other templates. {{context}} would use any template that had the same name as the label, which often causes problems (the recent issue with {{abbreviation}} is one example). On the other hand, because {{label}} doesn't use the "bare" template as the label, it's not possible to write something like {{intransitive}} by itself, you'd need {{label|intransitive}} instead.

The most straightforward way to convert these to Lua is probably something like Module:languages, with a single data module containing all the information for the context labels, and a separate module to handle the processing and display. I think that the approach used by {{label}}, in which labels always need {{context}} or {{label}} prefixed, is preferred for a Lua implementation. It would drastically reduce the number of context templates we need to maintain, it would remove any conflicts between labels and other templates with the same name ({{plural}} for example!), and it would also prevent any desynchronisation between the templates and the module. For example, if someone creates a new context label, they'd need to remember to also create a matching template, which would not really add much value to the system and just be there for convenience. It makes more sense to not create those templates in the first place and to always require the same template to "initiate" the process. Another advantage is that bots, if they want to parse entries, no longer need a long list of which templates can possibly be used as context labels, because there'd only be one.

Another change I would like to make while we're at it, is to use the first parameter to specify the language code. It's common for editors to forget to specify the lang= attribute because they're not aware that some context labels add categories. The problem is compounded by the fact that only some labels categorise while others do not so editors need to remember this for every label. {{intransitive}} does not categorise for example, so it's easy to miss this and, when you want to add a second label like {{rare}} (which does categorise), to forget the language. I believe that requiring the language as the first parameter will help with these problems because then it can never be forgotten or skipped, so it makes editors more aware that they need to put something there and that they need to change it when copying content to another language.

I'd like to hear what you think and if you have any specific points to raise. —CodeCat 16:18, 3 June 2013 (UTC)
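To make the proposal above concrete, here is a minimal sketch in Python (standing in for Lua) of the split between a label data table and a processing function, with the language code as the required first parameter. The label entries, the language table, and the category naming are all assumptions made for illustration, not the contents of any real module.

```python
# Hypothetical label data, standing in for a single Lua data module.
# Each entry gives the display text and, optionally, a category suffix.
LABELS = {
    "intransitive": {"display": "intransitive"},  # displays only, no category
    "rare": {"display": "rare", "category": "rare terms"},
    "plural": {"display": "in the plural"},
}

# Hypothetical mapping from language code to language name.
LANGUAGES = {"en": "English", "nl": "Dutch"}


def render_labels(lang_code, *labels):
    """Return (display_text, categories) for a call like {{context|en|rare}}."""
    if lang_code not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang_code!r}")
    shown, categories = [], []
    for name in labels:
        data = LABELS.get(name, {"display": name})  # unknown labels shown verbatim
        shown.append(data["display"])
        if "category" in data:
            categories.append(f"{LANGUAGES[lang_code]} {data['category']}")
    return "(" + ", ".join(shown) + ")", categories


text, cats = render_labels("en", "intransitive", "rare")
print(text)  # (intransitive, rare)
print(cats)  # ['English rare terms']
```

Because the first argument must be a known language code, a call that skips it, such as render_labels("rare"), fails immediately instead of silently leaving the entry uncategorised.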

I support everything. My only suggestion is that if we are going to make {{context}} obligatory, we should create a shorthand, like {{x}} or something, redirecting to it. — Ungoliant (Falai) 19:23, 3 June 2013 (UTC)
We could also use something a bit more descriptive like {{con}}. {{x}} is really vague. —CodeCat 19:27, 3 June 2013 (UTC)
That's a language code, and {{c}} is a grammatical label. What about {{ct}} or {{ctx}}? (But let's not forget that any 2 or 3 letter template is a timebomb waiting for ISO to release it as a language code.) — Ungoliant (Falai) 19:31, 3 June 2013 (UTC)
Well, we're phasing out the language templates, so we don't have to worry about that. And {{c}} may also be phased out if we decide to do so, since we now have a module to replace it. We could also decide to use Ruakh's {{label}} instead, which is a bit shorter. —CodeCat 19:34, 3 June 2013 (UTC)
In that case, I state my preference for {{c}}, since that's the smallest increase possible in the amount of characters one will need to type. — Ungoliant (Falai) 19:40, 3 June 2013 (UTC)

Sounds good, but let's abandon the misleading name "context." Labels like {{pejorative}}, {{plurale tantum}} and {{abbreviation}} are nothing to do with context.

Context means two different things in lexicography. One is the context a word appears in in its citation, esp. in corpus lexicography. The other is in something called discourse analysis, and seems to be only vaguely related to usage as we consider it here.

We are using this template for both usage and grammatical labels, or tags. Michael Z. 2013-06-03 20:34 z

[after edit conflict] We shouldn't be wedded to the somewhat misleading name "context" as we use the beginning-of-the-definition-line position for many things, including topical labels, sense-specific complement information, semantic-grammatical classification (eg, intensifier, modal adverb), as well as register and regional and other context.
One thing that might be very helpful in the long run would be to build in support for various default types of display for various types of tags. One useful thing would be to differentiate topic from usage context typographically. Another would be to allow semantic-grammatical tags to be non-displaying by default. This might also be useful for maintenance-related tags. I suppose such things could be done using CSS to make it easier for users to use common.css to customize display of such tags. DCDuring TALK 20:40, 3 June 2013 (UTC)
With all of the labels codified in a Lua table, it should be easier to categorize the labelled entries, as well as to inject CSS classes. Perhaps something like class="label-subject-history" or class="label-grammar-intensifier", so CSS can be used to style or hide individual labels, general classes, or all of them. Michael Z. 2013-06-03 22:22 z
We have 935 labels that might want to have their own CSS class or ID. For myself, I would rather be selecting groups, if at all possible. For some types I would think that we would not need individual CSS classes. I take it that CSS does not allow one to select members of a class equal to specific text. DCDuring TALK 01:32, 4 June 2013 (UTC)
Are you asking whether CSS can select and style based on the text of the content? No. But if we are putting that text into the page, then there's practically no overhead in also putting it into the class attribute.
Actually, using simple class selectors would require separate classes for the levels of categorization, as class="label label-subject label-subject-history", allowing one to style all labels, or labels in a category, or a specific label. Leaving out the individual label class would save a tiny bit of overhead in loading time and page weight, but I guess it would be insignificant.
If we used only class="label-subject-history", then we could use a substring selector in modern browsers (MSIE 7+), as in *[class*=label-subject] {. . .}, as long as we made sure that *label* didn't appear in any unrelated classes. Michael Z. 2013-06-04 16:28 z
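As an illustration of the layered class scheme described above, here is a small Python sketch that builds the class attribute from a label's category and name. The function names are hypothetical, and the real output would of course come from whatever module ends up rendering the labels.

```python
import html


def label_classes(category, name):
    """Build the class list for a label, from most general to most specific."""
    return " ".join(["label", f"label-{category}", f"label-{category}-{name}"])


def label_span(category, name, text):
    """Wrap the label text in a span carrying the layered classes."""
    return f'<span class="{label_classes(category, name)}">{html.escape(text)}</span>'


print(label_span("subject", "history", "history"))
# <span class="label label-subject label-subject-history">history</span>
```

With all three classes present, a user stylesheet can target every label (.label), one group (.label-subject), or one specific label (.label-subject-history) with plain class selectors, so no substring selector is needed.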
Why over-Luaize everything? {{label}} seems like a simple enough solution that should fit all our needs. -- Liliana 20:42, 3 June 2013 (UTC)
Template:label's subpages all have a common piece of code, which I don't like. It's harder to maintain: if one decides to make any change, then a lot of pages need to be changed. I support the proposed change. --Z 21:09, 3 June 2013 (UTC)
Both {{label}} and {{context}} have the problem that they don't properly separate code and data. {{context}} is impossible to modify thoroughly for that reason, but I don't know how much better {{label}} is. —CodeCat 21:11, 3 June 2013 (UTC)
  • Although I do think there's a risk of going too far with Lua, {{context}} is one template that absolutely should be Luacized, for performance, for readability, and for correctness. (The demo that I created, and that Liliana-60 copied illegally to {{label}}, is an improvement over {{context}} in all three respects, but a Lua module would be a much greater improvement. I would never have created that demo if I had known that we'd get Lua so soon.) —RuakhTALK 06:44, 4 June 2013 (UTC)
    • Illegally? Are you the dictator of Wiktionary? -- Liliana 09:41, 4 June 2013 (UTC)
      I think Ruakh is referring to the copyright violation. --Yair rand (talk) 09:42, 4 June 2013 (UTC)
      I believe that anything we save in our user space is released under the open licences. It can be republished freely, but does require attribution, e.g., linking to the source in an edit summary. Caveat: I might be wrong. Michael Z. 2013-06-04 16:34 z
      Exactly. Liliana-60 has a habit of ignoring the attribution requirement, and refuses to acknowledge that it's a problem. Frankly, I don't see how we can keep an administrator who insists on violating copyright, but wev. —RuakhTALK 17:46, 4 June 2013 (UTC)

Then if nobody minds, I will convert the few uses of {{label}} that are still present back to {{context}}, so that we can work on it and eventually convert {{context}} to the new Lua-powered {{label}} altogether. —CodeCat 11:52, 4 June 2013 (UTC)

I've been working on adding an explicit call to {{context}} to the labels, but there are a lot of them (160 thousand...) so it will take some time even with a bot. The progress is at Category:Context label called directly. I noticed that quite a few pages misuse the templates by using them as something other than a context (like where {{qualifier}} would be better). But I have also realised that there is a more fundamental problem with some of the labels we need to address. Labels can have different "scopes" so to say: it can be used to specify a topic, it can indicate restricted usage (by field, place), and so on. Currently, the labels are just names and do not distinguish between these types, but there can be some ambiguity in quite a lot of cases. For example, it could be desirable to use a label to restrict a term to the topic of a particular country, but all of our country labels are currently used for restricted usage (that is, dialectisms), so this is not possible. If you write {{context|Britain}} then the term is assumed to be a Britishism, even when you really want it to mean that the term pertains to Britain. So you may get something like Category:British Dutch when you really wanted Category:nl:Britain. I'm not really sure how to solve this currently, but I do think it's important. —CodeCat 15:51, 6 June 2013 (UTC)
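For readers wondering what such a bot pass might look like, here is a deliberately naive Python sketch. It is not the bot actually used; the list of label names is a hypothetical sample, and it rewrites any bare label template found on a definition line, so it would not by itself catch the misuses mentioned above.

```python
import re

# Hypothetical sample of label names; a real run would need the full list.
KNOWN_LABELS = {"vulgar", "rare", "intransitive"}

# Matches a simple template such as {{vulgar}} or {{rare|lang=nl}}.
BARE_TEMPLATE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")


def add_explicit_context(line):
    """Rewrite bare label templates on a definition line as {{context|...}} calls."""
    if not line.startswith("#") or line.startswith(("#:", "#*")):
        return line  # only touch definition lines, not examples or quotations

    def replace(match):
        name, params = match.group(1), match.group(2)
        if name in KNOWN_LABELS:
            return "{{context|" + name + params + "}}"
        return match.group(0)  # leave every other template alone

    return BARE_TEMPLATE.sub(replace, line)


print(add_explicit_context("# {{vulgar}} [[example]]"))
# # {{context|vulgar}} [[example]]
```

Lines that already call {{context}} are left unchanged, because "context" itself is not in the label list.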

Can you give an example? I'm not sure how the topic of a particular country is a usage, but it might be used only in the academic field of British studies, or have a special meaning when speaking about Britain, or only when referring to a sense of a thing that is in Britain (although the last is properly part of a definition and not a usage). This kind of usage categorization is problematic, because many editors start to categorize things with them rather than terms (like animal was being applied to names of animals). We once had label {{London}} and category:London, but got rid of them because they were just labelling the names of things in London. Michael Z. 2013-06-10 15:42 z
Aw, crap. Michael Z. 2013-06-10 15:45 z
I think bush would be an example. It has a meaning that originated in Australia, but is now used worldwide in that sense. Nevertheless, the word doesn't refer to that same thing outside the context of Australia, so only when speaking of Australia it has that specific meaning. —CodeCat 15:49, 10 June 2013 (UTC)
The definition already says "area of Australia," so it is clear what thing is the referent. This is not usage.
You could refine the usage label as {{chiefly|_|Australian}} or {{originally|_|Australian}} to indicate that it is not only used in Australian English.
(The Canadian and Australian usages look identical to me, except the Canadian is not widely popularized in phrases like bush tucker.) Michael Z. 2013-06-10 16:13 z
Of course, the lexicographer could also account for nuance: an Australian in Asia may refer to the local countryside as the bush. This could be analyzed as the Australian/Canadian sense of bush meaning "countryside," and the global sense meaning "Australian countryside." Michael Z. 2013-06-10 16:18 z
We do have a lot of labels like {{chess piece}}, {{enzyme}}, {{organic compound}}, {{logical fallacy}} and so on. Should we get rid of those? —CodeCat 16:20, 10 June 2013 (UTC)
I would say yes, with care. {{enzyme}} is used to label maltase with biochemistry, but the term is not restricted to biochem. {{organic compound}} labels the widely-known substances like amyl nitrate, ethanol and lactic acid, etc. as technical terms restricted to the field of organic chemistry and puts them into the non-lexicographical category:en:Organic compounds (why do we need to distract readers from the much better w:en:Category:Organic compounds?). Shortcut templates like these just encourage editors to categorize referents instead of labelling usage. A label should be explicitly what it is, so editors can understand what it is and does. Mzajac
Maybe what we really want is a template that works parallel to context but is explicitly intended for gloss tags instead, and placed after the word. So that something like {{gloss|organic compound}} will also categorize. —CodeCat 17:54, 10 June 2013 (UTC)
Yes, templates like those should be deleted. Some have already been deleted, or are orphans. (But some people like them, as I recall from one contentious RFDO...) - -sche (discuss) 17:56, 10 June 2013 (UTC)
  • For the record, I have a lot of doubts about bot edits in which "vulgar" is replaced with "context|vulgar". Making these edits from a thread entitled "Lua-cising Template:context" seems rather inadvisable, to say the least. I am disappointed. --Dan Polansky (talk) 18:10, 7 June 2013 (UTC)

Trademark discussion

Hi, apologies for posting this in English, but I wanted to alert your community to a discussion on Meta about potential changes to the Wikimedia Trademark Policy. Please translate this statement if you can. We hope that you will all participate in the discussion; we also welcome translations of the legal team's statement into as many languages as possible and encourage you to voice your thoughts there. Please see the Trademark practices discussion (on Meta-Wiki) for more information. Thank you! --Mdennis (WMF) (talk)

Universal Language Selector to replace Narayam and WebFonts extensions

On June 11, 2013, the Universal Language Selector (ULS) will replace the features of MediaWiki extensions Narayam and WebFonts. The ULS provides a flexible way of configuring and delivering language settings like interface language, fonts, and input methods (keyboard mappings).

Please read the announcement on Meta-Wiki for more information. Runab 14:07, 5 June 2013 (UTC) (posted via Global message delivery)

Excellent. We'll finally have an easy way of typing in different languages. --Yair rand (talk) 15:50, 5 June 2013 (UTC)

This seems to be breaking font specification. See Talk:Fraktur, where the specified font in the Fraktur sample only shows up if JavaScript is disabled, in Safari/Mac 6.0.5, Firefox/Mac 21.0, and Chrome 27.0. To sum up:

This works:

  style="font-family:UnifrakturMaguntia, UnifrakturCook, Unifraktur, serif;"  

Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeit­genoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.

This fails:

  lang="de" style="font-family:UnifrakturMaguntia, UnifrakturCook, Unifraktur, serif;"  

Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeit­genoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.

 Michael Z. 2013-06-14 19:00 z

Wasn't the point of WebFonts that the user didn't have to have the font installed in order to have it render correctly for him? Because what you have written above after "This works:" is perfectly legible for me, but doesn't appear in Fraktur. Maybe if I tracked down and installed UnifrakturMaguntia on my computer it would, but doesn't that defeat the purpose? —Angr 19:16, 14 June 2013 (UTC)
I don't know the precise point of the ULS, but in this case it steals control. If its point is readability, then it should add fallback fonts, not override the editor's choices. It isn't a case of the user didn't have to have the font installed, it's the user may as well not have it.
It is also not as smart as it should be, because it is stupid about script tags. If I correctly tag the language-script as German Fraktur with de-Latf, it still prevents correct rendering. Michael Z. 2013-06-14 20:59 z
Also drives me fucking nuts by making a stupid keyboard icon pop on and off constantly while I type in this edit field. And the pop-up menu is garbled in Safari, but at least it can be turned off by a preference. Michael Z. 2013-06-21 19:23 z

Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsste ihre Abſicht von den Zeit­genoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.

I have Unifraktur installed (it's great), but neither of the above sentences displays in Fraktur. lol - -sche (discuss) 19:36, 14 June 2013 (UTC)
UnifrakturMaguntia or UnifrakturCook, or some other version? Are you able to check your Unifraktur font's font name? (On the Mac, open your Font Book.app, select the font, get info, and see what it says under Family, I think.) Michael Z. 2013-06-14 20:59 z
I just installed UnifrakturMaguntia and the first text above does now display in Fraktur for me. Don't like the capital U though; it looks wrong. (And müßte/müſſte is spelled wrong.) —Angr 21:26, 14 June 2013 (UTC)
I corrected müste → müßte.
Now this is interesting: the first example above now renders in a Fraktur font on an iPad/iOS 5! I am certain that it did not when I started this thread. I assume that the ULS extension matches font names from Google Fonts and loads those? Too bad it barfs on language codes. Michael Z. 2013-06-21 16:39 z
Interesting because the iPad doesn't have installable fonts. Michael Z. 2013-06-21 19:23 z
I found the original quote on b.g.c.: it's müsse, and seyn rather than sehn. The sentence still makes no sense to me, though. Just figured it out. Angr 18:51, 21 June 2013 (UTC)
Curiously, the first example also displays in Fraktur for me now (in Windows), possibly because I re-installed UnifrakturMaguntia. - -sche (discuss) 18:49, 21 June 2013 (UTC)
Okay, I have disabled the Unifraktur fonts on my Mac, and my first example above appears to be displayed in UnifrakturMaguntia. Anyone else seeing this?
Just to diagnose this, here is a sample that prefers UnifrakturCook to UnifrakturMaguntia. It should still use one of these fonts if either is available, but it is failing for me. Perhaps the ULS only looks at the first-choice font. It didn't work on preview, but now works on both Mac and iOS, showing the second-choice UnifrakturMaguntia face.
Where the heck are the docs for ULS? Michael Z. 2013-06-21 19:32 z
  style="font-family: UnifrakturCook, UnifrakturMaguntia, Unifraktur, serif;"  

Keinen Unparteiiſchen wird der Einwand ungläubiger Theologen: wenn es Typen geben ſolle, ſo müsse ihre Abſicht von den Zeit­genoſſen ſchon erkannt worden ſeyn, ſonderlich beunruhigen können.

Format of articles, why not put the definition in a lede at the top?

Take a look at toches as just one example: wouldn't that article be a lot more useful (and nicer) if right at the top of the page there was a lede that gave the definition? As it is, the definition comes dead last, after many various and sundry less important things. I'm sure this question has come up, but I did some looking and couldn't begin to find it. 108.54.62.155 17:30, 5 June 2013 (UTC)

It comes up on WT:FEED quite a lot. There's an argument for it; one argument against it is that when you say 'the definition', many words have more than one definition, and some words have dozens of definitions. Since we're a multilingual dictionary we do need to put what the language is. The definition of lit changes a lot depending on what language you're speaking. Inflection is quite important but could I suppose go after the definitions, as could just about everything else. I've seen at least one suggestion to use the ===Definitions=== headers which I quite like. I suppose, one factor that shouldn't be underestimated is how users will get used to the format if they use Wiktionary enough. It's pretty simple; I'd've thought most people can learn it in a matter of minutes. Mglovesfun (talk) 17:43, 5 June 2013 (UTC)
It would be a good idea. If we were able to separate data from presentation (the two concepts are currently hideously intertwined here), we could easily generate custom layouts with javascript. Until then any change would be counterproductive. DTLHS (talk) 18:03, 5 June 2013 (UTC)
But remember that in most documents, certainly in web pages, the text is serialized – it has an inherent order. This order will manifest itself in many contexts – excerpts, search results, mobile view, in non-visual browsers including screen readers and braille readers, when our data is reused elsewhere, etc.
We could expand this to a discussion about whether an entry is a page or a database. Michael Z. 2013-06-06 17:11 z
  • As things currently stand, we've got data that's supposed to have a specific structure, but that structure is enforced manually by humans (and by bots), rather than by the system itself. This is rather horrible, in a number of different ways -- it's terribly inefficient, it's error prone, it unavoidably mixes data and presentation in ways that have been recognized to be huge no-nos, and it's very labor intensive. Building a database by hand is not the best way to go about it.  ;)
In a couple of my jobs now, I've had occasion to poke around looking at various terminology management apps. One that was quite interesting was TermWiki, which (as best I understand it) is mostly MediaWiki with the semantic extensions added in (http://semantic-mediawiki.org/, itself relying much upon mw:Extension:Semantic_Forms), one or two other openly-available extensions, and some custom whizbang. I have no idea how much pull or push we have with regard to how the Wiktionary back-end is set up, but Semantic MediaWiki is high on my wish list for this site.
If any MediaWiki extensions are entirely off the table, I think we should explore building tools ourselves to emulate that kind of automated structure building and integrity management. Why should I be expected to remember all the wrinkles of WT:ELE? That kind of structure is exactly what a database provides, automatically. Users shouldn't have to even be aware of this; it should just happen. We would avoid a huge class of entry maintenance problems if we could do this. -- Eiríkr Útlendi │ Tala við mig 17:31, 6 June 2013 (UTC)
Basically I agree. On the other hand you could look at it though as being similar to Wikipedia pages about people: they start out with the person's childhood, even though that is often not really what most people care about. Yet it's natural to begin at the beginning. The etymology is sort of a description of how the word grew up, if you will. --Haplology (talk) 15:32, 6 June 2013 (UTC)
Wikipedia's articles about people start with a summary of one's life and important achievements. It is strange to talk about the etymology of a word for which no definition has been given yet. I've also never understood why pronunciations are placed before the definitions. On fr.wikt we've moved the pronunciation some time ago, but the etymology remains at the top, too (but in our case we still have a general section for all homographs, which is both good and bad). Dakdada (talk) 16:16, 6 June 2013 (UTC)
I think the original reason for putting the etymology at the top was to distinguish words with different etymologies. I'm not really sure that is the best solution, though. -- Liliana 16:23, 6 June 2013 (UTC)
Also print dictionaries traditionally put etymology and pronunciation before the definition, so our order matches what you might see in one of them:

horse [OE. hors] /hɔrs/ n. 1. A hoofed mammal, Equus ferus caballus, often used throughout history for riding and draft work. 2. A piece of gymnastics equipment with a body on two or four legs, approximately four feet high with two handles on top. [...]

However, all the print dictionaries I currently have to hand do in fact put their etymologies at the end of the entry, not at the beginning. They do all put the pronunciation first, though. Instead of ===Etymology 1===, ===Etymology 2===, and the like, maybe we could find some other heading like ===Word 1=== or ===Form 1=== or ===Lexeme 1=== to use instead. One thing that's bothered me about separating by etymology is that people are often tempted to separate parts of speech this way, for example listing the noun house as ===Etymology 1=== and saying it comes from Old English hūs, and the verb house as ===Etymology 2=== and saying it comes from Old English hūsian. Which is strictly speaking true, but not the way I think we should be using those headings. —Angr 17:08, 6 June 2013 (UTC)
  • In Japanese, that (breaking an entry down by each etym) does actually seem to be the best organization -- sometimes you have a single "term" as written, but it's actually umpteen different "terms" as spoken, and each has its own etymology. Lumping all of those different readings and definitions together and then trying to explain the etymologies of each after the fact would be horribly confusing. Cf. 愛#Japanese, 大人#Japanese (multiple readings shown, but the entry still needs etyms & pronunciations, etc.), 目#Japanese, and so forth.
One suggestion would be to amend the CSS or JS to make etymology sections auto-collapse, so users only see the text if they want to (by either clicking or by customizing their CSS/JS). Just the etym text itself, not including the subsections thereunder. Inspecting the rendered page shows that the etym headers are <h3> elements, generally followed by a series of <p> elements until the next header. I know that some of our subsections use collapsible divs, like in {{der-top}}; I have no real idea how difficult it would be to auto-collapse a series of <p> elements based on their relative position in the page. -- Eiríkr Útlendi │ Tala við mig 17:18, 6 June 2013 (UTC)
I have always thought that XML would be more suited to making a dictionary, because it more strictly separates formatting and structure. It also has the advantage of being able to validate pages and reject them if they are invalid. That would allow the software itself to check the formatting, and add categories for missing inflections and so on. —CodeCat 17:40, 6 June 2013 (UTC)
That or JSON. Anyway, some things that would have to happen: 1. write a parser / validator (in Lua?) 2. create a new editing interface (javascript) 3. figure out how to make it work with our existing template, Lua and javascript infrastructure. 1 and 2 are easy enough but I'm not sure how 3 would work (can you call templates from a string in Lua?). DTLHS (talk) 19:51, 6 June 2013 (UTC)
Writing our own parser seems a bit pointless because we would want to make use of something like XML schema for validation, and XSL as well to transform the data into HTML. —CodeCat 19:56, 6 June 2013 (UTC)
Yes, you're right. How do you envision something like this working with our existing infrastructure? Or would we have to scrap everything? DTLHS (talk) 20:10, 6 June 2013 (UTC)
Just write a new mw:ContentHandler (though this would probably require some coöperation from the WMF…). As for scrapping everything, I would implement migration this way: first split pages into per-language elements containing blocks of wiki markup, and then successively refine the markup schema to capture more of the entry structure, and rewrite pages into the new schema. With current quite consistent usage of templates, I think Wiktionary markup is already quite machine-readable, so actually bots could be doing the conversion. And when they fail, they could just leave wiki markup in place as a fallback. Keφr 20:28, 6 June 2013 (UTC)
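The first migration step described above, splitting a page into per-language elements that still contain raw wiki markup, could be sketched roughly as follows in Python. The element names entry and language are hypothetical, and the sketch assumes that each language section starts with a level-2 heading such as ==English==.

```python
import re
import xml.etree.ElementTree as ET

# A level-2 heading such as "==English==" starts a language section;
# deeper headings such as "===Noun===" do not match.
LANGUAGE_HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_by_language(wikitext):
    """Return {language name: wikitext block} for each level-2 section."""
    sections = {}
    matches = list(LANGUAGE_HEADING.finditer(wikitext))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end].strip()
    return sections


def to_entry_xml(wikitext):
    """Wrap each language block, still as wiki markup, in a <language> element."""
    entry = ET.Element("entry")
    for name, block in split_by_language(wikitext).items():
        language = ET.SubElement(entry, "language", name=name)
        language.text = block  # unconverted wikitext, kept as the fallback
    return ET.tostring(entry, encoding="unicode")


page = "==English==\n===Noun===\n# a thing\n\n==Dutch==\n===Verb===\n# to do\n"
print(to_entry_xml(page))
```

Later passes could then replace the text inside each language element with finer-grained elements, leaving the wiki markup in place wherever the conversion fails.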
Something like that would work, yes. One effect of having a strict separation of content and presentation is that we would need to split our templates and modules in the same way. A template or module could, under this scheme, either generate content or display it, but not both. This would have quite a few consequences that we would need to work out. Inflection tables are probably the easiest to do, but other things will need more thought. —CodeCat 20:41, 6 June 2013 (UTC)
  • @CodeCat, we already have the database, no? Why recreate it using XML? And if we're about to embark on anything that "would probably require some coöperation from the WMF", shouldn't we first look at things like Semantic MediaWiki, given that they've already done a lot of the work? I know some folks like reinventing the wheel for the thrill of learning how it was done, but I'm more interested in having a back-end that works well, and sooner rather than later.  ;) -- Eiríkr Útlendi │ Tala við mig 21:39, 6 June 2013 (UTC)
    • XML and a relational database are very different things. Databases are meant for representing raw data without structure, whereas we definitely do want structure. XML is also much easier for people to edit because it's human-readable text, whereas for a database we'd need to make a whole interface as well. —CodeCat 21:50, 6 June 2013 (UTC)
      • Really, I think we need to look at what's already out there before embarking on anything this major. What you're proposing sounds to me an awful lot like something that's already been done. To wit:
      1. We already have the MW database. Readily available extensions already provide much of the functionality required to ensure structural consistency and integrity. See above about the semantic and form extensions.
      2. Wikitext is also human-readable, and it's much less verbose than XML, and it's already supported.
      3. Those same readily available extensions also already supply an interface.
      Jumping right into creating a whole huge infrastructure for reworking everything about Wiktionary to use XML and then reworking everything about the UI to deal with that XML, all pretty much from scratch, strikes me as potentially foolhardy. Again, I'm sitting here looking at something that looks an awful lot like a wheel, and that already exists. And then I hear you talking about plans to invent one.  :-/
      Note, I'm honestly not trying to be obstructionist, I'm really just trying to make sure that we proceed with our feet on the ground, and in possession of all the relevant facts. -- Eiríkr Útlendi │ Tala við mig 22:17, 6 June 2013 (UTC)
      Why would we need a whole new UI? My intention is that when you click edit, you are served with XML-ified page content instead of wikitext, and you edit it in that form. Then you save, and saving will validate the page and if it's valid, it's done. The editing process itself would not change at all. The only thing that would change is that there is an XML validation step followed by an XSL-driven pre-parser which converts the XML into wikitext (or directly to HTML cache). So from the wiki's point of view we would still store things in pages as we do now, only the source code of those pages would be the more data-oriented XML instead of the current presentation-oriented wikitext. —CodeCat 23:29, 6 June 2013 (UTC)
      • I'm confused -- it sounds like you're suggesting that editors would still have to remember everything about WT:ELE, and now a lot of stuff about XML, and would still be editing the raw code -- the only addition you describe that seems to make sense is the validation.
      Again, I think Semantic Wiki does everything good that you describe (enforcing structure, validating data, etc.), only without the ugliness -- data is entered using forms, not raw XML.
      Besides which, doesn't MediaWiki filter out or disallow certain kinds of markup in the raw wikitext? Or does that only apply to a small subset of HTML, like <a> tags? -- Eiríkr Útlendi │ Tala við mig 23:36, 6 June 2013 (UTC)
      No, the whole point is that we can ignore ELE because of the XSL processing step. The XSL style sheet will determine which parts of the XML tree go where, so it will rearrange the source to match whatever we like. The order of elements in the source would be dictated by the XML schema but it would not affect the end result. So you could put translations first or last in the source, or even reorder all the languages, and they'd still show up the same on the page thanks to XSL. —CodeCat 23:42, 6 June 2013 (UTC)
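      To illustrate the point about source order not affecting the output, here is a minimal sketch. It uses Python's standard XML parser rather than XSL, and the element names, the `lang` attribute, and the section order are all hypothetical; no schema has been proposed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical section names and display order; not an agreed schema.
SECTION_ORDER = ["etymology", "pronunciation", "definitions", "translations"]

def render_entry(xml_source: str) -> str:
    """Render one <entry> element as wikitext-like text in a fixed section order."""
    entry = ET.fromstring(xml_source)
    lines = ["==" + entry.get("lang", "Unknown") + "=="]
    # Walk the fixed display order, not the order found in the source.
    for name in SECTION_ORDER:
        for section in entry.findall(name):
            lines.append("===" + name.capitalize() + "===")
            lines.append((section.text or "").strip())
    return "\n".join(lines)

# Two sources with the same content in a different order.
source_a = '<entry lang="English"><translations>t</translations><definitions>d</definitions></entry>'
source_b = '<entry lang="English"><definitions>d</definitions><translations>t</translations></entry>'
```

      Both sources render to the same text, because the renderer, not the editor, decides where each section goes.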
      Still, 1) editing raw XML == yuck. No thank you. 2) And we couldn't ignore WT:ELE entirely, as we'd still have to be aware of what kinds of content were allowed. 3) Why reinvent the wheel? -- Eiríkr Útlendi │ Tala við mig 23:52, 6 June 2013 (UTC)
      I'm not trying to reinvent the wheel. It's just how I would envision a purely source-based dictionary wiki like we do now. Of course if we drop source editing altogether we can change a lot of things, but I feel that that would not actually be very practical in many situations. Being able to copy parts of a page in one go has its advantages. —CodeCat 23:55, 6 June 2013 (UTC)
      Yes, copying chunks is quite useful. I believe that's still possible using Semantic MediaWiki; I just popped over there, and you can still get at the raw source just fine. Their page on wiki/Semantic_Forms describes some more of what I was thinking -- directing user input in ways that hide away the minutiae of proper wikitext. -- Eiríkr Útlendi │ Tala við mig 00:14, 7 June 2013 (UTC)
  • We don't really have very structured data at this point. We really should be using things like {{senseid}} and (hopefully improved versions of) {{etymtree}}/{{findetym}}. Switching straight over to XML or anything else too far from our current format is probably not going to happen, at least not any time soon, for a number of fairly obvious reasons. That shouldn't stop us from making more usable data, or dealing with the main topic at hand, that the definition is too hard to find. We could put cognates in a collapsible box, preferably one much smaller than the normal full-sized 100%-width large-padding boxes that translation tables use. We could make the pronunciation section more compact. We could shrink down the ginormous headers and inflection lines and the space between them. All sorts of things would help. --Yair rand (talk) 00:40, 7 June 2013 (UTC)
    Thanks for mentioning {{senseid}}, I didn't know about that feature. That's exactly the type of thing I've been thinking WT ought to have. I tried it just now with ハウス. --Haplology (talk) 06:13, 7 June 2013 (UTC)
    Thank you Yair, yes, back to the main topic :) -- (rabbit hole well and duly fallen into for the day) -- in addition to the visual reworking of an entry page's style, which should be easy enough to do, another of the suggestions above that we could implement fairly easily would be to use a ===Definitions=== header, clearly pointing the user to the defs. -- Eiríkr Útlendi │ Tala við mig 05:03, 7 June 2013 (UTC)
    Definitions are put in PoS sections which use level >3 headings, so it should be a level >4 heading. I don't agree with this idea though; we are already over-sectionizing information, which works for bigger entries but just causes problems in most other cases. For example, having a separate section for pronunciation is a good idea for some languages, but doesn't work for most other ones (compare ܟܣܐ with this version). --Z 09:32, 7 June 2013 (UTC)
    • @Z, it would make much more sense to put ===Definitions=== at L3 with all the POS under that. That way you only need one defs header. -- Eiríkr Útlendi │ Tala við mig 15:24, 7 June 2013 (UTC)
    • @Z, part of the issue with Syriac (and presumably other Semitic languages that don't mark vowels) is that one written form can have multiple spoken forms. Japanese does this too, only in different ways. I've been treating each reading (i.e. spoken form) as a separate etymology, since, frankly, each spoken form has its own derivation, which users may want to know about (I do myself, which is partly why I'm happy to dig up the whys and wherefores and write it all down here). Have a look at 愛#Japanese for one example -- this single written form has three spoken forms (that I know about) that are regular words, each with its own derivation, and then tons of exceptional uses in names. 目#Japanese has four spoken forms, 盆#Japanese has three, and so forth. Hopefully food for thought, at any rate.  :) -- Eiríkr Útlendi │ Tala við mig 15:36, 7 June 2013 (UTC)
Moving pronunciation below definitions is doable, IMHO. However, when I made the proposal in Wiktionary:Beer_parlour/2012/April#Moving_.27pronunciation.27_down_in_ELE, there were no supporting posts. --Dan Polansky (talk) 17:08, 7 June 2013 (UTC)
  • Separating readings by etymologies sometimes results in repeating definitions word-for-word such as 旋風 where we have exactly the same definition all of four times. Meanwhile maybe only one of them is common and the others are freaks that only live in dictionaries, and the only note to this effect is hidden in a Usage Note. Sometimes I use {{ja-altread}} to avoid this, but it's cheating. --Haplology (talk) 16:49, 8 June 2013 (UTC)
    • That particular entry could certainly use some cleanup, expansion, and clarification (I'll add it to my list).
    In general, though, Japanese presents a bit of an odd case. For folks not familiar with the mechanical issues of Japanese, imagine that eat, consume, and ingest all had their individual pronunciations, etymological derivations, and meanings -- but all shared the same single spelling. Consequently, all would go under the same headword here at Wiktionary. So the question then becomes how to organize that entry.
    Duped defs in and of themselves I don't think are necessarily a problem; if the overlap is complete, a simple "see [some other reading elsewhere on the page]" could suffice. If the readings are really distinct, I think the etyms should be broken out, with proper etymological descriptions given. If the readings are only slightly different, and this produces no difference in meaning, I think {{ja-altread}} is probably the way to go. See 伊邪那岐 for one such example -- this can be read as Izanagi, or Izanaki, with no real change in meaning or derivation. The difference is explained by a simple sound shift, mentioned in the etym. Meanwhile, 紫苑 is a different example, where each of the three different readings has the same def for the proper noun, but other aspects are different -- the Shien reading is never used for the common noun, and shioni is never used in the derived term.
    The combination of the MW back end, the need to accommodate multiple languages per headword, and the oddities of Japanese orthography all lead to a bit of an inelegant matching, but there you have it.  :)  -- Eiríkr Útlendi │ Tala við mig 18:51, 8 June 2013 (UTC)
    This is a tangent, but one example of semi-synonymous English homographs that comes to my mind (though it isn't a very good example) is board+board. Amusingly, en.Wikt currently conflates the two separate but overlapping etymologies and groups of senses which that string of letters has, but if you can read German, you can take a look at how I handled them on de.Wikt. - -sche (discuss) 19:25, 8 June 2013 (UTC)
    That's interesting (though I struggle to follow German). There are indeed two separate origins, but I'm not convinced that such a clear case can be made for separation of modern meanings since the two words had already become conflated in Old English. Dbfirs 12:02, 10 June 2013 (UTC)
    There should at least be a note somewhere explaining how "a length of a piece of wood" came to mean "food". As the board entry currently stands, that development is completely unclear, probably leaving any reader quite puzzled. -- Eiríkr Útlendi │ Tala við mig 17:54, 10 June 2013 (UTC)

Make Template:alternative form of explicitly say that a US/UK spelling's definitions are found in the other entry

Continued from User talk:Dbfirs#Double_links.

Recently, I noticed several edits like this, where a second link to sense of humour is put immediately after the templatised one, explicitly stating that the definitions of sense of humor are found in [[sense of humour]]. Is this desirable? Dbfirs thinks it is, because "users will expect to see definitions for a valid word in their region, so the repetition makes is clearer that the definitions are only a click away". I think it isn't, because users are no more likely to expect content on favour/favor than on their preferred spelling of kinnikinnik/kinnikinnick/kinnickinnick/etc, and no less likely to understand how a soft redirect works on one of those pages than on the other. What do you think? If it is desirable, can the extra text be added by {{alternative form of}} itself (or by whichever other template we use to redirect US/UK spellings; e.g. {{form of|Standard form}} or *{{standard form of}}), rather than being added by hand? - -sche (discuss) 22:12, 6 June 2013 (UTC)

Redundant links with different link text are confusing for users.
The link text is also wrong, because English spellings cannot be divided into binary US and British sets. Humour is a British and Canadian spelling, while curb is a Canadian and US spelling. Unless you disavow the classic linguistic meaning of British English, in which case humour is a UK, Irish, Canadian, Indian, South African, Australian, and New Zealand spelling....  Michael Z. 2013-06-06 22:48 z
Yes, you're right, it's more complicated than I anticipated. Perhaps the only way to deal with the variations is by adding individual usage notes? Dbfirs 05:37, 7 June 2013 (UTC)
I hope you don't mean usage notes about humor placed on the entry for humour. Michael Z. 2013-06-07 13:59 z
No, those entries are perfectly clear as they stand. It's when we provide a soft redirect from a correct spelling (for one region) to an incorrect spelling (for that region) that we need clarification. Dbfirs 21:49, 7 June 2013 (UTC)
So how come we have a "soft redirect" from sense of humor to the simple definition at sense of humour? But full entries at humor and humour, where two separate, moderately complex pages for the spelling variations of a single actual English term will be forever impossible to keep in sync?
Not only is this inconsistent for no reason, but here we are discussing the idea of complicating the simple pages instead of merging the relatively complicated ones. Michael Z. 2013-06-08 01:43 z
Humour and humor haven't been merged yet only because no one has gotten to them yet. As I wrote in the Tea Room, I've been reducing such content duplication for about a year now, but it's hard to find entries: when I started, there were 13 pairs of supposedly synced entries (findable via Category:English synchronized entries and via HTML comments), and not a single one actually was or had even recently been synced. Unfortunately, there are many pairs like humo(u)r that are such terrible messes that they're not even categorised.
Postscript: I just consolidated humo(u)r and colo(u)rful, leaving only colour/color (which are only synced because I synced them) as a monument to failure, a reminder of naïveté, a proof for anyone who, in the future, ever finds it hard to believe that people would have attempted something as foolish as the total duplication and perfect synchronisation of content across dozens of pages. (Let all who doubt view the entries' edit histories and see that Wiktionary did once try that, and that the entries did indeed spend much of their time out-of-sync.) - -sche (discuss) 02:56, 8 June 2013 (UTC)
Bravo. If you have a free weekend, have a look at labour, Labour, labor and Labor. Michael Z. 2013-06-08 03:31 z
Blimey! Looks like you've already been through that one. Michael Z. 2013-06-08 03:32 z
I still prefer separate entries, but I have to admit that the problem of synchronisation is almost insurmountable until we find a way to include a single set of definitions in both entries. Meanwhile, sche's amendments have produced unambiguous entries. Is everyone happy that we standardise on this format, with the optional addition of individual usage notes where the spelling situation is complicated? Dbfirs 12:14, 8 June 2013 (UTC)
I hope that your thoughts allow for the occasional possibility that there are regional differences in meaning and usage that happen to correspond to the regional prevalence of the spelling. I hope we don't end up discouraging contributions because of seeking 'coordination'. DCDuring TALK 15:18, 8 June 2013 (UTC)
Indeed, there are subtleties that are best explained by separate entries, but the consensus of editors seems to be that the problem of trying to keep these synchronised is insurmountable. (I agree that synchronisation is a problem.) I like your suggestion below. Do you think the experts here would be happy with the links? Should I try a few to see what people think? Dbfirs 08:24, 10 June 2013 (UTC)
I'd try the experiment in a few pages mentioned in discussion pages (to speed the feedback). Widespread implementation would not be wise without some positive response, at least, and, at least surly, grudging silence from others. DCDuring TALK 12:59, 10 June 2013 (UTC)
There is American and British English spelling differences on Wikipedia. We could provide, under Usage notes, a section-specific link to the section covering specific classes of "UK-US"-type (pace MZ) spelling differences. We could even have templates like {{en-spelling -or-our}} that provided the section link and could be updated when we get our own vastly superior coverage of this and similar matters next month. DCDuring TALK 14:30, 7 June 2013 (UTC)

I can't imagine how "subtleties [...] are best explained by separate entries." Even a series of blatant contrasts in usage are difficult to assimilate while flipping between two, three, or four variant entries, or having them open in separate browser windows. And how is the reader under this heavy cognitive load supposed to discern "subtleties" from synchronization errors?

Labor, labor, Labour, and labour are not four different words. No other dictionary, print or electronic, considers moving them to separate pages, because that would be a disservice to their readers and reduce the value and utility of their resource. Michael Z. 2013-06-10 14:58 z

Explaining meaning is the most important task here. Usually differences are of interest to linguists. A meaning that is present in one spelling, but not in the other, belongs to the entry in that spelling in which it exists. Usage examples for a sense are often regional as well, though that would argue for revising them rather than duplicating content. DCDuring TALK 16:37, 10 June 2013 (UTC)
When a non-linguist hears an unfamiliar usage of /ˈleɪbər/ on the radio, she has to learn of the existence, locate, and click through four separate articles, then compare all their senses in her head before she can be confident that she has determined this one word's meaning. That's a failure of the dictionary. Michael Z. 2013-06-10 17:03 z
I don't disagree with the general advantage of combining senses across alternative spelling entries. I'm just suggesting that we need to not prevent or even discourage user input on whatever page they prefer. Their choice of one page rather than another might be useful data about the actual distribution of the sense or they might be knowledgeable about the term's usage. I fear the consequences of the accretion of layers of rigidity and complexity leading to a gradual and premature ossification of Wiktionary, especially in English.
It's an empirical matter as to which layout might give the best results for various classes of users with various look-up needs. I am extremely skeptical that we will manage a major advance with a part-time crew of amateurs and no empirical data on how our current and potential non-linguist users use and could use Wiktionary. Wikipedia hews close to the style of an encyclopedia in most regards. Just as the QWERTY and telephone-style keyboards largely define many, many user-input interfaces, the limited variation in styles of dictionaries provides the total range of base interfaces we can use. Feasible paths to desirable innovations seem to me to be limited to incremental changes.
For incremental changes, we don't even know whether our existing users would prefer that the main entries typically (subject to variation by type of alternative spellings and even by individual term) be the "US" or "UK" (or "North American") variants. Suggested proxies for the missing empirical data are the relative sizes of English-speaking populations in countries assumed to prefer one or the other, the preferences of our contributors generally, the preferences of our contributors who make the first entry in one of the main alternative forms, usage in the large controlled corpora like BNC and COCA, and usage in the various Googles, especially News, which enables regional differentiation. DCDuring TALK 17:32, 10 June 2013 (UTC)
(tl;dr summary: this is a non-issue that can be sidestepped anyway in most cases)
I doubt many senses are attested only in certain US or UK spellings. Some UK/US authors use US/UK spellings, and UK/US works are often copyedited when published in the US/UK, so even British senses of -our or -ise words are usually attested in American spelling, too.
Even if a sense is attested only in a word's US spelling, I doubt the same word has a second sense attested only in its UK spelling. Thus, in most cases, we can sidestep the issue by making whichever spelling has the peculiar sense the lemma: then the sense is on the "right" page, and all the senses are on one page.
If a word does exist which has some senses only when spelt -ise and other senses only when spelt -ize (can anyone think of such a word?), then I still think the content should be centralised: I agree with Michael that scattering not-entirely-overlapping sets of definitions over different pages obfuscates, rather than explains, a term's meaning. (It requires casual readers to, first of all, realise that the content is spread out across multiple pages, and then to compare the pages to see which spellings have which senses.)
Capitalisation is an at least slightly different issue. What I've most often seen done, and what I do, when a word has different (and overlapping) senses when capitalised or uncapitalised, is what is done on gypsy/Gypsy (each term has a definition line linking to the other). I try to employ similar linking from singular entries to plural-only senses (see message, messages). I suppose such linking could also be employed between -ise and -ize words, etc, but that seems unideal, less ideal than simply putting all the definitions on one page.
DCDuring's comment of 17:32, 10 June 2013 makes me think we may be talking past each other / about two different things, however. I tend to accept that new users should be allowed to add content to whichever page they like; if the content they add to an alt form is already found in the lemma entry, their edit can be rolled back; if the content isn't found in the lemma entry, it can be moved thither. (In this as in many other things, Wiktionary might benefit from adopting "Stabilversionen", so that we could clean up noobs' misformatted entries at leisure, and be less rigid about formatting in the meantime.) - -sche (discuss) 17:43, 10 June 2013 (UTC)
I see the basic issue as what is a term and its lemma (WT:Lemmas is sorely lacking). We have separate entries for spellings and capitalizations (while keeping together completely unrelated other-language terms that share an orthography). But a dictionary entry is properly for a term in a particular language, and offers a survey of its variable properties like spelling and especially capitalization.
From the readers' point of view, they may need to find a lemma entry based on any alternative, regional or historical spelling, and especially capitalization, which can be absent in their source – as in spoken form or most subtitles and captions – and can vary freely – as at the beginning of a sentence, in title caps, or in all-caps texts.
The one lemma entry should guide the reader as to the normal or preferred spelling and capitalization, or its range of usage. So in addition to usage and grammatical labels, either entries/headwords or particular senses may require an indication of the usual or preferred spelling and capitalization, plus the variations of these in different regions, over time, or in different contexts. Michael Z. 2013-06-10 19:32 z

So, can these notes – as found in entries favour and tumour – be removed? (Or at least replaced with a template so they can be easily found and altered?)

In "UK and Canada spelling of favor", it is clear what the link takes the reader to.

In "UK and Canada spelling of favor. (For definitions, see the American spelling.)" it is unclear what "the American spelling" is referring to. There is no indication to the reader why there are two choices, and no evidence of what two things they lead to. Michael Z. 2013-06-12 14:35 z

Also confusing: "the American spelling," when the target headword is labelled US, alternative in Canada. Labels on links should use the same terminology as labels in entries. Links should not be redundant, and their destination should be clear. A better alternative might be one of:
  1. UK and Canada spelling of favor (US).
  2. UK and Canada spelling of favor (US spelling).
But we have enough difficulty labelling headwords. It's a mistake to also start labelling links in headword lines too.
If you want to explicitly mark the relationship, then add an "Alternative forms" header. Michael Z. 2013-06-12 15:17 z
Yes, I'd already conceded to the majority view, so I've removed my addition. I've also replaced "Canada only" with "Commonwealth", even though I don't really like the label. What alternative would include NZ, Oz, SA etc? Dbfirs 16:13, 14 June 2013 (UTC)
I'm arguing for British, referring to the British English branch of the language. We were using that when the label text was changed to "UK" with no allowance for the difference in meaning. Michael Z. 2013-06-15 15:28 z
In that case, I fully agree with you. Can we change the template to say "British English" rather than "UK" when the parameter "from=British" is used? Dbfirs 08:02, 16 June 2013 (UTC)

Adding explicit "context" before context templates on definition lines

MewBot (talk • contribs) has started to replace {{vulgar}} with {{context|vulgar}} and the like on definition lines. An example edit: diff. Apparently, no one has protested so far. Let this be a record in Beer parlour that this is ongoing. I have doubts about advisability of these replacements, but do not see them as obviously wrong. I am disappointed that this was not expressly discussed.

Before the example edit:

  # {{pejorative}} A man who uses services rendered by [[whores]].  

After the example edit:

  # {{context|pejorative}} A man who uses services rendered by [[whores]].  

--Dan Polansky (talk) 18:23, 7 June 2013 (UTC)

Can you be specific about these doubts, or are you just concern trolling? DTLHS (talk) 18:26, 7 June 2013 (UTC)
I do not know what is "concern trolling". The new text is longer and looks more messy in the wiki text, for one doubt. WT:AGF. --Dan Polansky (talk) 18:28, 7 June 2013 (UTC)
Okay, as per concern troll: "Someone who posts to an internet forum or newsgroup, claiming to share its goals while deliberately working against those goals, typically, by claiming "concern" about group plans to engage in productive activity, urging members instead to attempt some activity that would damage the group's credibility, or alternatively to give up on group projects entirely." --Dan Polansky (talk) 18:29, 7 June 2013 (UTC)
Fair enough. For what it's worth, I also wonder about how productive it is to add everything under {{context}} if that template is eventually going to be orphaned or changed to something more broad like "label". DTLHS (talk) 18:37, 7 June 2013 (UTC)
Subjectively speaking, I just don't like the new wikitext. I suppose there must be some advantage, or else the bot would not be doing it. But I am not very clear about what the advantage is. --Dan Polansky (talk) 18:42, 7 June 2013 (UTC)
I am mainly doing it as a first step so that the initial problems have been worked out, and we can work out what (concretely) to do next. It may be somewhat redundant but it's not harmful either, and it eases the conversion later. One of the main problem points for having a template for every label (which I listed above) is that there is an automatic conflict between any label and a template with the same name. This created problems like {{acronym}} which was originally not a context template and which was causing script errors because some pages used {{context|acronym}} as a label (correctly). Another example is our inability to use "law" as a label because of {{law}}, having to resort to "legal" instead. Removing this point of conflict, even if we don't do anything else, is therefore a definite benefit.
The advantage of adding {{context}} now is that once all calls are through {{context}}, we can start editing the label templates themselves without worrying about breaking this backwards compatibility, because we can assume that they are always called by {{context}}. This makes it possible to do the transition in small steps gradually, without any fear of breaking anything with one huge step. Also, the next step I was intending to do was to add lang= to the template where it's missing, as the proposal above (the name of the final template hasn't been settled yet) would have a mandatory language code as the first parameter. It will also catch many errors. It is a lot easier to do this replacement if there is only one template to consider than if there are hundreds. —CodeCat 18:52, 7 June 2013 (UTC)
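As a rough illustration of the two steps described above (wrapping bare label templates in {{context}}, then adding lang=), here is a minimal sketch. It is not MewBot's actual code; the list of label names and the function name are assumptions made up for the example.

```python
import re

# Hypothetical sample of label template names; the real list is far longer.
KNOWN_LABELS = {"pejorative", "vulgar", "rare", "archaic"}

# A parameterless template at the start of a definition line ("# {{name}}").
# Lines starting with "#:" or "#*" do not match.
BARE_LABEL = re.compile(r"^(#+[ \t]*)\{\{([^{}|]+)\}\}", re.MULTILINE)

def wrap_labels(wikitext: str, lang: str = "") -> str:
    """Rewrite '# {{label}}' as '# {{context|label}}', optionally adding lang=."""
    def repl(match):
        name = match.group(2).strip()
        if name not in KNOWN_LABELS:
            return match.group(0)  # not a known label: leave untouched
        suffix = "|lang=" + lang if lang else ""
        return match.group(1) + "{{context|" + name + suffix + "}}"
    return BARE_LABEL.sub(repl, wikitext)
```

For example, `wrap_labels("# {{rare}} A sample definition.", lang="en")` gives `# {{context|rare|lang=en}} A sample definition.`, while templates that are not in the label list are left as they are.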
I see. The ultimate verbosity fest that you are planning is this:
# {{context|pejorative|lang=en}} A man who uses services rendered by [[whores]].
Yuck. --Dan Polansky (talk) 19:04, 7 June 2013 (UTC)
No, not the ultimate. It's just an intermediate step in the process. Once all labels are called via {{context}} and they all have a language, converting all of them to another call (using another template) becomes trivial. I am just doing groundwork right now. Also see my post below. —CodeCat 19:07, 7 June 2013 (UTC)
CodeCat's proposal included: "I think that the approach used by {{label}}, in which labels always need {{context}} or {{label}} prefixed, is preferred for a Lua implementation."
If people read and supported the proposal, and they did, they also supported mandatory {{context}}. — Ungoliant (Falai) 18:46, 7 June 2013 (UTC)
You seem to be referring to #Lua-cising Template:context. There, two people supported:
  • "I support everything. ..." --Ungoliant
  • "Sounds good, but let's abandon the misleading name "context." ... —Michael Z. 20
On the day on which the post was made, the bot started replacing. But whatever. The point right now is, if someone can explain the benefits, they should do so. --Dan Polansky (talk) 18:52, 7 June 2013 (UTC)
I don't really see the point of the effort under challenge except to remove an implementation barrier to deprecating all direct use of context tags. Presumably that is being done to bring Luacization within the capabilities of the technical resources that we have. If Luacization requires millions of extra keystrokes, thousands of them mine, then it doesn't seem like such a good idea to me. A philosophy of adding keystrokes where not necessarily essential seems exactly the opposite of what we need. I thought that Luacisation would not be used as a reason for such an obvious regression. I thought a basic principle of user interface design is: NO REGRESSIONS. DCDuring TALK 18:55, 7 June 2013 (UTC)
I reject any notion that simply adding more text is automatically a regression. Besides, "context" can be renamed to something shorter eventually. DTLHS (talk) 19:01, 7 June 2013 (UTC)
Indeed, and that was already discussed as well. Proposals were {{label}}, {{x}}, {{ctx}}, {{ct}} or {{c}}, although DCDuring you yourself said that we should abandon the name "context". Once the language code templates are out of the way, we will have a lot of freedom in naming because we can use almost any two-letter name. How would {{lb|nl|rare|archaic}} be in contrast with the current {{context|rare|archaic|lang=nl}}? It's quite a bit shorter. —CodeCat 19:05, 7 June 2013 (UTC)
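To make the comparison concrete, here is a minimal sketch of how a call in the current form could be rewritten into the shorter form. The name {{lb}} is only one of the proposals above, and falling back to "en" when lang= is missing is an assumption of this example, not a settled rule.

```python
import re

# Matches a {{context|...}} call that contains no nested templates.
CONTEXT_CALL = re.compile(r"\{\{context\|([^{}]*)\}\}")

def context_to_lb(wikitext: str, default_lang: str = "en") -> str:
    """Rewrite {{context|a|b|lang=xx}} as {{lb|xx|a|b}}."""
    def repl(match):
        labels, lang = [], default_lang
        for part in match.group(1).split("|"):
            part = part.strip()
            if part.startswith("lang="):
                lang = part[len("lang="):]
            elif part:
                labels.append(part)
        # The language code moves to the first positional parameter.
        return "{{lb|" + "|".join([lang] + labels) + "}}"
    return CONTEXT_CALL.sub(repl, wikitext)
```

With this sketch, `{{context|rare|archaic|lang=nl}}` becomes `{{lb|nl|rare|archaic}}`.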
You are misreading my lack of enthusiasm for the name "context" for some kind of implicit support for adding to the typing burden. One of the great virtues of the {{context}} system is that it allows folks to add whatever they thought was appropriate for the entry. The implementer of the system would periodically review the labels added and make a new direct context label where appropriate, sometimes making it a redirect. On the NO REGRESSION principle, I assumed that this behavior would be continued as a matter of course. That requires that there be a template to support labels not yet implemented as direct labels. That is the sole essential use at present for {{context}}. It seems that the current conception is to discard direct labeling for programming convenience. I would be inclined to add Category:Entries with redundant context templates and eliminate all those instances where context preceded a label which has its own template. DCDuring TALK 19:35, 7 June 2013 (UTC)
That part would always work. Any new proposal would certainly allow you to use "new" labels, as well as to make labels aliases of one another (something we currently use redirects for). The difference would be that the list of "recognised" labels (which get a category or a link or something else) is stored in a Lua module rather than in templates. So you could still type {{lb|nl|I'm a label}} and it would work. And you could still then, later on, define a new label called "I'm a label" and the entry would then use that. —CodeCat 19:41, 7 June 2013 (UTC)
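A minimal sketch of the lookup described above, written in Python rather than Lua for illustration. The table contents, the alias, and the category format are all hypothetical; the point is only that label names live in a data table, so they cannot clash with template names, and unknown labels still display.

```python
# Hypothetical label data, standing in for a data module.
# Keys are plain strings, so "law" is usable even if a template of that name exists.
LABELS = {"archaic": "archaic terms", "rare": "rare terms", "law": "law"}
ALIASES = {"legal": "law"}  # an alias resolves to a recognised label

def render_labels(lang, *labels):
    """Return (display text, list of category names) for a label call."""
    shown, categories = [], []
    for raw in labels:
        key = ALIASES.get(raw, raw)
        if key in LABELS:
            shown.append(key)
            categories.append(lang + ":" + LABELS[key])  # made-up category format
        else:
            shown.append(raw)  # unrecognised label: displayed as typed, no category
    return "(" + ", ".join(shown) + ")", categories
```

Calling `render_labels("nl", "I'm a label")` still displays the text; adding an `"I'm a label"` key to the table later would give it a category without touching the entry.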

I support this. CodeCat is eliminating all of the conflicts and open-ended problems with {{context}}. Huzzah! 🐱

DCD, the concept of invoking a template without typing its name is completely flawed. An open-ended set of keywords is not templates, it is parameters. Abandoning the idea is not a regression. Being able to move forward is progress. Michael Z. 2013-06-07 19:48 z

Another way to see it is in terms of "namespaces". Not namespaces as they are in the wiki but a bit more abstract, they are the set of all possible names. Currently, the namespace that contains all possible context labels is the set that is the union of: 1. all templates that are context labels, plus 2. all other names in the Template: space that do not exist. This has created many problems because as Michael said it's an open-ended set of names for a namespace that is not open-ended, because it already contains other templates. Every template you create restricts the space of possible context labels further and there have already been conflicts. {{plural}} is not a context label template, but most of its current transclusions are via {{context|plural}}, and they just happen to work, kind of, by chance (but {{context|plural|rare}} will break!). And then there is the problem I mentioned with {{law}} and {{acronym}}. Ruakh's original {{label}} proposal as well as my own both avoid this by making the namespace for context labels distinct from that of template names. This does mean that you have to call the template explicitly each time, but I think that is a small price to pay compared to the issues we otherwise have (and have had). —CodeCat 19:54, 7 June 2013 (UTC)
I understand the point about restrictions on available context names, though it seems rather rare in practice. I'm not in a position to evaluate what is or is not possible technically. What I see is simple regression from an entry-content-contributor PoV. For templates that are used commonly, like context or its successor, a one-letter name or alias, e.g. {{c}}, would seem best. At more than 300K uses, even before the conversion of all the directly called label templates, it would seem to have earned the right to usurp that name from the "common gender" template.
The principle of liberating text strings from their least efficient uses has even broader application. Why not reserve all one- and two-letter strings for things that are directly input by humans? The principle has already been established by such templates as {{m}}, {{l}}, {{g}}, {{a}}, etc. Perhaps it is time to revisit the use of two- rather than three-letter language codes to liberate more one- and two-letter codes for those doing manual data entry. {{n-g}} shows another approach for two-part names. DCDuring TALK 20:26, 7 June 2013 (UTC)
The gender and number templates may disappear in the future as well, as we now have a module for them. There are still many cases where they're present directly in entries, but we can find solutions for that. So that will free up {{c}}, {{f}}, {{n}}, {{p}} and so on. Currently, {{head}} already uses the module, but none of the others do, and it will take a while to migrate all of our headword-line templates over, so we can't use {{c}} just yet. I do prefer {{lb}} over {{c}} though, and we can't usurp {{l}} because that template is probably used even more widely than {{context}}. —CodeCat 20:48, 7 June 2013 (UTC)
  • It's slightly longer, but I'd like to make a plug for using {{lbl}} instead of {{lb}} -- {{lbl}} is an obvious mnemonic for "label" for English speakers, while {{lb}} is obtuse enough that I had to reread a few paras above to remind myself of what it was supposed to stand for. -- Eiríkr Útlendi │ Tala við mig 21:15, 7 June 2013 (UTC)
It should be possible to put Mewbot to work on renaming the 28K instances of {{c}} and get that done quickly. {{temp}} has fewer than 19K instances. All of these are far fewer than such templates as {{head}} and the reigning champion of both raw count and redundancy(?): {{Latn}} @ 2,958K. {{Latn}} is an example of something that could easily be much longer as it is not very commonly typed in by humans. DCDuring TALK 21:18, 7 June 2013 (UTC)
It's not that easy. Only a small proportion of all uses of {{c}} is actually from direct transclusions in entries. Most of them are called through other templates like {{t}}. So we would need to track down all the templates that use genders in such a way and change them, but that is quite a big task because it has to be done by hand, and there are many languages whose nouns might use gender templates. It would take a few weeks to track them all down and fix them, assuming of course that there's a consensus for such an operation. We can't rename {{Latn}} because its name follows an ISO standard for script codes. —CodeCat 21:55, 7 June 2013 (UTC)
Just to make it explicit: I, too, support what Mewbot is doing. It's good for the reasons Ruakh outlined earlier and the reasons CodeCat outlined recently. It's an important first step in updating our context templates, which currently have to use highly complex, recursive code to account for the possibility that they might be called directly, or called from {{context}}, or called from another template, or followed by |another template}}, or followed by |something that isn't a template}}, or... - -sche (discuss) 02:16, 8 June 2013 (UTC)
I don't see why it is that the replacement for context can't simply compare a given label with a list of pre-existing valid labels and operate as if context had been called explicitly. Lua should make that much easier. Furthermore and more importantly, I think more attention needs to be paid to the process by which new labels are added. Our systems are increasingly effectively closed to many types of user input due to creeping top-down templatism - exactly contrary to wikiness. DCDuring TALK 17:41, 10 June 2013 (UTC)
I'm not sure what you mean by your first point. A replacement for {{context}} can do just that, and it will have to because that's how it already works currently. If you call {{context|label}} then currently it will look for Template:label and transclude it if it exists, otherwise it will just show the text "label". The Lua form will do the same, except that instead of looking for Template:label it will look in a Lua table of labels. —CodeCat 17:49, 10 June 2013 (UTC)
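The lookup described above can be sketched roughly as follows. This is only an illustration written in Python rather than Lua; the table contents, the alias, the category name and the function names are all hypothetical, since the thread does not show the actual module.

```python
# Hypothetical label data standing in for the proposed Lua table.
# None of these entries are taken from the real module.
LABELS = {
    "rare": {"display": "rare", "category": "rare terms"},  # assumed category name
    "plural": {"display": "plural"},
}

# Hypothetical alias table: one label name pointing at another.
ALIASES = {"uncommon": "rare"}


def resolve_label(name):
    """Look a label up in the table; fall back to showing the raw text."""
    canonical = ALIASES.get(name, name)
    entry = LABELS.get(canonical)
    if entry is None:
        # Unrecognised label: still displayed, just without a category.
        return {"display": name, "category": None, "recognised": False}
    return {
        "display": entry["display"],
        "category": entry.get("category"),
        "recognised": True,
    }


def render(labels):
    """Join the displayed forms the way a context line might show them."""
    return "(" + ", ".join(resolve_label(label)["display"] for label in labels) + ")"
```

Because the label names live in their own table, a label called "plural" can never collide with a template of the same name, and an undefined label such as "I'm a label" simply passes through as text until someone adds it to the table.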

North Frisian (Mooring Dialect)

Is this an acceptable level 2 language header? See greewe as an example. If not, didn't we used to have a bot that sorted these things out? SemperBlotto (talk) 08:37, 8 June 2013 (UTC)

Wiktionary:Todo/invalid L2s picks them up. PS there is an unresolved one. Mglovesfun (talk) 08:39, 8 June 2013 (UTC)
Just as we use context labels to distinguish US English, UK English, Canadian English, etc., under the ==English== header, this entry should use one under the ==North Frisian== header. I'll go do that now. —Angr 10:44, 8 June 2013 (UTC)
Editing this entry, and leaving a comment on the creator's talk page, has raised a question in my mind. We are currently moving away from directly called context tags towards using parameters of {{context}} for everything. Does this mean that new directly called context tags are not to be created? If the creator of greewe wants to start a category for the Mooring dialect of North Frisian, can he start a new template {{Mooring}} to allow dialect terms to populate a Category:Mooring North Frisian? Or are such templates now deprecated, and everything is done by parameters of {{context}}? If the latter, what edits would have to be made to {{context}} to get it to know that anything labeled {{context|Mooring|lang=frr}} goes into the appropriate category? —Angr 11:27, 8 June 2013 (UTC)
Currently, {{context}} still relies on the presence of directly-named templates because that's how it works. The label templates always had this dual way of using them, and can still be used that way, except that it's now discouraged to use them directly. So you still need to create {{Mooring}} for now, but you should use it as {{context|Mooring|lang=frr}}. It's likely that sometime soon the templates will be changed so that only the latter works. —CodeCat 12:15, 8 June 2013 (UTC)
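If the labels do move into a data table later, a dialect label could in principle carry its category with it. The following is a rough Python sketch of that idea, not a description of how {{context}} works today; the table layout, the field name category_prefix and the function category_for are assumptions made for illustration.

```python
# Hypothetical data: language codes mapped to names, and dialect labels
# mapped to the prefix used in their category name.
LANGUAGE_NAMES = {"frr": "North Frisian"}
DIALECT_LABELS = {"Mooring": {"category_prefix": "Mooring"}}


def category_for(label, lang):
    """Return the category a dialect label would add, or None if unknown."""
    entry = DIALECT_LABELS.get(label)
    lang_name = LANGUAGE_NAMES.get(lang)
    if entry is None or lang_name is None:
        return None
    return f"Category:{entry['category_prefix']} {lang_name}"
```

Under these assumptions, category_for("Mooring", "frr") gives Category:Mooring North Frisian, and supporting a new dialect would mean adding one table row rather than creating a new template.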

Phonosemantic interpretation

Lawrence J. Howell (talk • contribs) is adding information under the nonstandard 'Phonosemantic interpretation' header. I don't know what to do; is this information relevant, if so, how do we include it? Mglovesfun (talk) 11:38, 9 June 2013 (UTC)

I would categorize it under "wild theories with nothing to back them up". -- Liliana 11:47, 9 June 2013 (UTC)
Perhaps under "Usage notes" provided those knowledgeable about the language/characters agree. DCDuring TALK 11:58, 9 June 2013 (UTC)
He's also linking copiously to his own site (with his name on it), which sells books. I smell spammer. Equinox 12:13, 9 June 2013 (UTC)
IMHO, these should not be added as long as they are sourced from no more than a single source. --Dan Polansky (talk) 12:20, 9 June 2013 (UTC)
He asked about it here, but none of you commented. I wasn't sure what to say, myself, so I kept waiting for someone to weigh in. Regardless of the merits, he did ask. Also, I wouldn't call it spamming, since we have been using information from his site. Chuck Entz (talk) 14:43, 9 June 2013 (UTC)
Yes, he has asked; I am not crying foul. I merely doubt that speculative information sourced from a single source is suitable for Wiktionary. --Dan Polansky (talk) 18:22, 9 June 2013 (UTC)
Hello all. Thank you for the comments above, which are of the sort I expected might be generated in response to my original inquiry. Allow me to address them here.

The major sticking point appears to be the one expressed by Dan Polansky, who believes that single sourcing for this data is inadequate. As it happens, this research overlaps with that of an unimpeachable source for Old Chinese studies, Axel Schuessler, who discusses the relation between particular sounds and meanings in his ABC Etymological Dictionary of Old Chinese. I've uploaded a brief article detailing these overlaps, which anyone interested can find by searching my name and the name of the book. If dual citations are necessary, there are, among the characters in common use today in China and Japan that I cover, approximately 2,500 corresponding to the half-dozen sound/meaning relations specifically described by Schuessler. This corpus would seem to be an unobjectionable starting point for the Wiktionary entries.

Turning to other concerns mentioned, it appears something needs to be said about the links. The editor who originally used my material linked with a (sitename).com format, providing unsolicited free advertising for my site. I was not comfortable with retaining that format for entries I was redoing, and chose to use only the family names of the individuals standing behind the claims, hiding the site name in the link coding. My intent was to be less, not more obtrusive, but the decision seems to have brought on the law of unintended consequences.

Finally, DCDuring mentioned the possibility of inserting data in a Usage Notes section, a suggestion that - -sche too had made earlier. I look forward to hearing from experienced editors about how best to go about doing this, especially in respect to coordinating the cites from my data and from Schuessler's. This would have the added benefit, I suppose, of allaying concerns about link spam. Thank you for your consideration. Lawrence J. Howell (talk) 07:16, 11 June 2013 (UTC)

This is utter phonesthemic nonsense, largely based on outdated reconstructions. I suggest simply reverting these additions. (reverting the reverting) Wyang (talk) 09:33, 11 June 2013 (UTC)
I agree with Wyang; these "phonosemantic interpretations" are linguistically nonsense. (I also disagree with reverting non-trolling/non-vandalism comments just because they're made by an anon.) —Angr 13:54, 11 June 2013 (UTC)
  • @Wyang, you say this is nonsense, but I don't really know how to weigh your comment -- based on what? Lawrence has pointed us to a scholarly work, providing a way for us to verify that someone out there has this theory, and ostensibly to follow up on that author's sources as well. (Google Books, for the interested.) On the opposing side, we have you and Angr, and while I respect both of you as WT editors, I have no real idea what your academic background might be, nor any real idea what underlies your calling this "nonsense". I don't have any solid basis for evaluating your comment.
Could you unpack things a bit? Maybe even point us to sources disagreeing with this Schuessler person? Curious, -- Eiríkr Útlendi │ Tala við mig 17:38, 11 June 2013 (UTC)
I don't have access to sources, but the statement "Old Chinese Initial /*k-/ lends semantic value Frame. Final consonant /*-t/ lends semantic value Cut/Divide/Reduce." at [1] was the first thing that set off my bullshit detector. It is axiomatic in phonological theory (the field my Ph.D. is in, since you asked about academic background) that phonemes have no inherent meaning (e.g. [2]), and although sound symbolism has some reality, it generally holds across languages rather than being language-specific, and it tends to correlate sounds with vague physical properties like "big/small", "pointy/rounded" etc. rather than very specific concepts like "frame" and "cut/divide/reduce". —Angr 19:39, 11 June 2013 (UTC)
Schuessler (2007) is a pioneering work in Chinese etymology and is in general reliable, but the "phonosemantic interpretations" posted here are far from what was written in that publication. Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants), some of which are reconstructable at the Proto-Sino-Tibetan level, and identified some phonesthemic patterns, such as *m– for "darkness", *–m/p for "closure", which are patterns generally found translingually, especially in the E/SE Asia region. However, the theory advanced by User:Lawrence J. Howell is that every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning, which is of course untrue for any language without an established pattern of such word formation. For example, for the word for "two" (二, *nij-s in Baxter-Sagart (2011) or *njis in Zhengzhang (2003)), based on the outdated reconstruction *ȵi̯ær by Karlgren (1957), he proposed that it can somehow be reanalysed as the diliteral root *n–r, which encoded meanings in its individual consonants (suppleness + continuum). This is obviously false since the word came from Proto-Sino-Tibetan *g/s-ni-s ("two"), and it's not even a marginally accepted theory in Chinese or Proto-Sino-Tibetan linguistics that the word for "two" was constructed from "phonemes expressing suppleness and continuum". Linguistics is not my profession; my area is in phylogenetics or hominid evolution rate. Wyang (talk) 23:48, 11 June 2013 (UTC)
  • Thank you both, that's exactly the kind of detail I felt I was missing previously. @Angr, your mention of specificity articulates an unvoiced worry in the back of my head about how over-specific a couple of these phonosemantic interpretations have been. @Wyang, your comments are particularly damning in pointing to where Lawrence is diverging from the actual sourced material.
@Lawrence, what can you say in your defense? As it stands, I am now strongly in support of removing your additions as apparent bogosity. -- Eiríkr Útlendi │ Tala við mig 00:09, 12 June 2013 (UTC)
@Eirikr: Thank you for steering the discussion in a productive direction. On the other hand: In my defense? Bogosity? Wow.

@Wyang: Your presentation of the contents of the ABC Etymological Dictionary of Old Chinese merits close inspection.

Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants)... (Tentatively? But let me not get sidetracked.) Please refer to p. 27 of the dictionary, Section 2.9 Meaning and sound. You'll find that your assertion does not square with the terms the author employs in his discussion of the nexus of particular sounds and meanings (quotation marks omitted): OC words; final -*p; final *-m; stem initial *-m; roots; stems; initial *w-; variants with other vowels; initial *l-; initial consonant; start with *n-.

... every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning. Regrettably, you misstate my theory, which properly accounts for the presence in OC of loan words and terms originating in onomatopoeia.

You take issue with my interpretation of initial *n- term 二. Perhaps my interpretation of a particular character is dissonant. But try coming at things from the opposite direction. In other words, begin with the normative and sort out the exceptional. You do not care for my data, so let's follow Schuessler who, on the page noted above, writes Words for 'soft, subtle, flexible', including 'flesh; female breast' start with *n-...

Turn to p. 395 of the dictionary. Over the following dozen pages, scattered among the loan words you will find enough examples of terms beginning with *n- and associated with soft/subtle/flexible that Schuessler felt justified in making the statement quoted immediately above. If you have a bone to pick with that conclusion, the author would be the person with whom to remonstrate.

As for me, among the thousands of characters I interpret, I may unwittingly be offering scores of problematic examples. I will readily acknowledge those when presented with compelling arguments. For one, I will revisit 二, and thank you for that.

The ultimate point here, however, is that Schuessler identifies what he calls phonesthemic or phonoaesthethic phenomena in OC, where certain meanings are associated with certain sounds. Nothing you have written counters that fact or undermines his finding.

Thank you, Wyang, for assisting the Wiktionary community to perceive where the lines are drawn in this matter. Lawrence J. Howell (talk) 05:51, 12 June 2013 (UTC)

I realised I missed the bit on phonesthemic patterns but you beat me by about ten minutes in submitting the reply. I've slightly revised my post. It's true that such patterns are found in Old Chinese, but these patterns are of limited derivational consequences in OC as the majority of OC lexicon does not conform to the semantic expectations from generalised phonesthemic patterns (as proposed in KN). In addition, these patterns are generally true for other E/SE languages as well (or even wider, see Phonosymbolism and the Verb cop). Postulating that words are consistently derived using this principle is a bit like positing a proto-phoneme initial *f– for English, denoting "movement", which is responsible for deriving Modern English words like fare, fast, fight, flee, flow, fly. Even if a large proportion of Old Chinese appears to be of non-Sino-Tibetan origin, a large part of which does not even seem to be related to anything else found in languages in the vicinity (!), there is no need to resort to such extensions of the sound symbolism principles, especially when the model is 1) unrecognised elsewhere; 2) nonspecific and ambiguous in the semantic descriptions; 3) relying on outdated reconstructions; 4) gives numerous misfits apart from the true sound symbolisms (Sorry). For OC words with established ST comparanda, listing the PST etymology is sufficient. Wyang (talk) 06:40, 12 June 2013 (UTC)
Thank you for your thoughtful, pointed response, Ywang. It'll be a day or two before I can reply properly. Lawrence J. Howell (talk) 07:49, 12 June 2013 (UTC)
Oops, talk about replying properly. Sorry, Wyang! Lawrence J. Howell (talk) 08:06, 12 June 2013 (UTC)
I have no knowledge of Chinese and can't make any judgement on this issue, but I'd like to point out that only a few days ago there was a rather heated discussion about bashing newbies - and utter phonesthemic nonsense, wild theories with nothing to back them up and spammer seem rather harsh judgements to make without actually asking the editor what he has to say. Isn't everyone supposed to AGF? Lawrence has been polite and has cited his source(s); so it would be good if we could be polite back. Hyarmendacil (talk) 05:52, 13 June 2013 (UTC)
@Hyarmendacil: Thank you for the call to civility. You missed my favorite, though: The implication that this is a tribunal, and that my options are to clear my name or face sanctions. I know it was written tongue-in-cheek, but still ... And bogosity? Perhaps after this thread winds down someone will be kind enough to assign it a bogosity level rating, so we all know where we stand.

As for the editors you quote: Wyang has managed not only to regain his equilibrium but to perform an immensely constructive service for the Wiktionary community: Confirming that a reputable authority maintains the existence in Old Chinese of phonoesthemic (phonosemantic) patterns. (That information, I might add, has been in circulation since 2007, contained in a readily available book that has received both scholarly and popular acclaim. For that reason, the knee-jerk rejections caught me by surprise.) His example holds out hope that the other editors too may eventually come around and offer positive contributions here.

@Wyang: OK, so we're agreed on the existence of phonesthemic patterns in OC. Great start.

Can you tell me the basis for your statement ... the majority of OC lexicon does not conform to the semantic expectations from generalised phonesthemic patterns ...? AFAICT, the majority of OC lexicon conforms surprisingly well.

... these patterns are generally true for other E/SE languages as well ... I understand you to be offering this as a rationale for not applying the patterns to Wiktionary's entries for Han Chinese characters. I take it to be, on the contrary, an excellent reason to apply the patterns across the Wiktionary board.

Skipping to your last point before circling back, I would second your idea about adding a PST etymology (source?) for each character. But that doesn't have to come at the expense of listing the true sound symbolisms.

Now to your four points, in order.

1) Your indication that the model is unrecognized elsewhere. (Shrugs shoulders.) Speaking here to the Wiktionary community: Bearing in mind that this discussion is, in the end, about whether or not to add certain data to Wiktionary, and presuming consensus can be obtained for adding consensus true sound symbolisms (minimal definition: Contained in both Schuessler and KN): Is lack of precedent a make-or-break issue?

2) ... nonspecific and ambiguous in the semantic descriptions ... Can you elaborate? Earlier in this thread certain aspects of my data came under suspicion for being overly precise.

3) Outdated reconstructions. I don't dispute your use of the descriptor outdated reconstructions; scholarly reconstructions of OC have progressed considerably since Karlgren. Nonetheless, the considerable handicap of ORs did not prevent my research collaborator and me from identifying the phonosemantic/phonesthemic patterns in OC noted at KN, which overlap with those of Schuessler. Also, it is entirely possible that the ORs should be credited for enabling us to see just a bit more deeply than Schuessler in certain cases. For example on p. 21 of the dictionary he presents a chart with a small number of examples of labial initials connected with the meanings swell, protrude, prominent, bloom, bud etc. These are all, the KN data indicates, manifestations of the single concept Spread, encompassing a number of related terms many times larger than the number of characters in the chart. Also in this context, I could discuss at some length the Cut/divide/reduce aspect of final *-t (ABC: sometimes transcribed as *-t, occasionally as *-ts) that was called into question earlier in this thread, but maybe some other time.

An additional point with regard to reconstructions is their mutable nature. It's 1992, and William Baxter has the stage pretty much to himself with A Handbook of Old Chinese Phonology. 1999, however, brings competition in the form of Laurent Sagart's The Roots of Old Chinese, a work glowingly reviewed by Wolfgang Behr. Sergei Starostin's reconstructions are coming out, too. Fast-forward to 2007 and here's Schuessler with his ABC Etymological Dictionary of Old Chinese, giving us four differing sets of OC reconstructions by contemporary scholars (not counting works published in China). What are conscientious editors of an online reference source such as Wiktionary to do? The solution seems to have been to ignore them all, a shrewd move as it turned out because just four years later Baxter and Sagart (neither scholar having been satisfied with his earlier work) bestow upon the world their collaborative Old Chinese reconstruction files. My point is, of course, that when it comes to OC reconstructions, the goalposts rarely remain in place for long. And so one may be excused for regarding them with a jaundiced eye.

4) Misfits. These are, as I see them, the undesirable flip side of the ORs which, as I describe above, have their strong point too. I look forward to identifying misfits and amending their interpretations as necessary. The process, if it is to be carried out on Wiktionary, depends on the issues presented below. (/@Wyang)

Now, as a practical matter, and returning to the topic that has brought us here, I'd like to ask the Wiktionary community whether there is a consensus to add OC reconstructions to the Han character entries. If so, whose? Baxter/Sagart's? Starostin's? Schuessler's? Someone else's? Some combination of two or more?

Scenario One: No consensus for adding scholarly OC reconstructions. In this case, I propose adding KN data as it stands, accompanied by footnotes or usage notes with verbiage such as: Reconstructions based on B. Karlgren; Interpretations by H & M; No scholarly approbation implied.

Scenario Two: Consensus to add scholarly OC reconstructions from one or more sources. In this case, what objection might there be to adding interpretations to the entries for those characters in which KN data overlaps with Schuessler's, providing two cites for each entry, which should satisfy the concerns of Dan Polansky and others who share them? Lawrence J. Howell (talk) 08:17, 13 June 2013 (UTC)

My point was that the majority of your additions could not be found anywhere other than your site, not in any publications, including Schuessler (2007). Looking at your most recent edits, almost none of these character etymologies can be backed by Schuessler, and neither can your methodology of phoneme decomposition and semantic value association. Even if a few are also identified by Schuessler as true sound symbolisms, the fundamental reliance of KN data on an outdated reconstruction has made such analyses unreliable. The multitude of reconstructions is not an excuse for choosing an obsolete one; in fact recent reconstructions have been surprisingly convergent (as evident in the case for "two" above), for example the reconstruction of the uvular series, lateral initials and voiceless sonorants, none of which is reflected in KN.
Let me use an example to illustrate this - 馬 ("horse", OC /*mˁraʔ/). Below is the passage from Schuessler (2007) (no copyright infringement intended):
1 馬 (maB) - LH maB, OCM *mrâʔ

'Horse' [OB]

[T] Sin Sukchu SR ma (上); ONW

[E] ST: PTB *mraŋ (STC no. 145): > OTib. rmaŋ, Kan. *s-raŋ, WB mraŋB, JP gum31-ra31 ~ raŋ; JR (m)bro < mraŋ). For the OC - TB difference in finals, see §3.2.4. STC (p. 43 n. 139) relates PTB *mraŋ to a PTB root *raŋ 'high' ( → líng6 陵).

Horse and chariot were introduced into Shang period China around 1200 BC from the west (Shaughnessy HJAS 48, 1988: 189-237). Therefore this word is prob. a loan from a Central Asian language, note Mongolian morin 'horse'. Either the animal has been known to the ST people long before its domesticated version was introduced; or OC and TB languages borrowed the word from the same Central Asian source. Middle Korean mol also goes back to the Central Asian word, as does Japanese uma, unless it is a loan from CH (Miyake 1997: 195). Tai maaC2 and similar SE Asian forms are CH loans.

and this is your added content:
Old Chinese Initial /*m-/ lends semantic value Concealment.

Pictogram (象形) of a horse. It is unclear whether this term is onomatopoeic. If so, there is no semantic role behind initial /*m-/. If not, and the term was devised in connection with "concealment," the precise nature of the link is uncertain. Source: Howell & Morimoto.

How is your theory that the OC word for "horse" was perhaps derived phonesthemically from the phoneme initial /*m-/, signifying "concealment", or your theory that the word was perhaps onomatopoeic in origin, backed by Schuessler (2007) or other publications? If not, isn't the above paragraph entirely your envisagement against established consensus? Wyang (talk) 02:12, 14 June 2013 (UTC)
ABC and KN both maintain the existence of phonesthemic patterns in OC. How Schuessler arrived at that conclusion and how he chooses to shape the material is his prerogative. Likewise for KN. Remember, the only reason Schuessler has been brought into the discussion is to address concerns about single sourcing (NB: sourcing of hermeneutic principles, not one-by-one interpretations).

More (much more) below. Lawrence J. Howell (talk) 04:43, 16 June 2013 (UTC)

@Wiktionary editors: Unless I am greatly mistaken about how things work around here, the community will at some point be shifting into decision-making mode to determine: Do KN interpretations belong in the dictionary? As a quick reference for those to be involved in that process, allow me to contribute a recapitulation.

・I have come to the dictionary with the intention of helping to improve the presentation of existing material.

・I stated my relation to that material, offered for inspection a sample of entries in a format I believe is compatible with Wiktionary style, and requested feedback.

・I waited until responses petered out, implemented formatting improvements that had been suggested for the sample entries, and uploaded similarly formatted new entries.

・Apparently unaware of the thread I had initiated here in the Beer Parlour, an editor called attention to my uploading activity and asked what should be done about it.

・A half-dozen other editors swiftly converged, casting aspersions on my motives and vilifying the idea of phonosemantic principles being operative in Old Chinese.

・One editor asked why none of these issues had been raised in the initial post. (No response.)

・Everyone disappeared save a single editor whose rhetoric thus far has included:

Vituperation: utter phonesthemic nonsense

Volte-face: It's true that (phonesthemic) patterns are found in Old Chinese ...

Untenable claim: Schuessler tentatively proposed some Old Chinese prefixes and suffixes (not initials, vowels and final consonants).

False attribution: ... the theory advanced by User:Lawrence J. Howell is that every Old Chinese monosyllabic word can be reanalysed in terms of initial, vowel and final consonant, in that all or part of its phonological shape resulted from its meaning ...

False analogy: Postulating that words are consistently derived using this principle is a bit like positing a proto-phoneme initial *f– for English, denoting "movement", which is responsible for deriving Modern English words like fare, fast, fight, flee, flow, fly. (No, actually, it is not a bit like it at all. The subject is OC, not Old English, and nobody is claiming that universals across languages maintain applicability along all categories.)

Non sequitur: ... almost none of these character etymologies can be backed by Schuessler ...

This is kettle logic, AKA chucking out arguments in hopes that one of them will stick (or create the desired impression, which can be almost as good).

This is not to assert that Wyang is completely off target. For example, with reference to OC reconstructions, and for what it's worth, it's safe to say that the opinion ... recent reconstructions have been surprisingly convergent ... is mainstream in Sinologic circles. S/he also notes the advances resulting in the reconstruction of the uvular series, lateral initials and voiceless sonorants; what if any influence they may bear on KN interpretations is a matter for study.

As for the thrust of Wyang's rhetoric, however, s/he appears to be arguing that the current state of OC reconstruction (partially/largely) invalidates the KN interpretations. Also (and Wyang will correct me if I'm wrong), it appears the intent is to persuade the community to exclude the interpretations from Wiktionary; expending such energy on the thread makes little sense otherwise.

I'll continue the debate with Wyang as long as necessary, but I wonder if it would be too much to request some form of indication that the community is working toward a resolution of this issue. To that purpose, interested editors may wish to look at the explanations found in the Etymology sections of Han Chinese characters (ones not taken from KN). A few dozen entries will be enough to create a valid if minimal sample. For each, do you find that the citations make the origin of the explanation evident? If not, is that not a problem? If so, do the sources conform with the inclusion standard to which KN data is being held? I refer especially though not exclusively to that prickly issue of multiple sourcing. I believe you'll agree that the answers to these questions are of great relevance.

There's more to say, but I think it's time for the community to return to the stage.

Real life considerations dictate that I'll be able to offer nothing beyond cursory remarks in the next few days, and none at all in the week following. I will however be back at the end of the month, so please do carry on without me. Thank you all for your consideration. Lawrence J. Howell (talk) 04:43, 16 June 2013 (UTC)

If I read it correctly, you didn't really answer my questions, did you? Your addition at the etymology of 馬 was unsubstantiated by publications. Do you agree? Wyang (talk) 04:36, 20 June 2013 (UTC)
If this thread reaches a conclusion, can someone tell me as I can't be bothered reading it all. Mglovesfun (talk) 08:46, 20 June 2013 (UTC)
Here you have unfortunately discovered the major flaw in Wiktionary's bureaucratic process; that discussions like this in the Beer Parlour never actually 'end' - they just. I'm afraid I don't think the community is going to return to the stage (cf. Mglovesfun), so if you're still interested in the whole affair, I suggest we just try to resolve the issue. To me it appears that the main points are as such:
- Both sides agree that phonosemantic interpretations can be valid under at least some cases.
- Neither side agrees on the extent to which phonosemantic interpretations apply.
In a case like this, the only way to resolve the argument is to cite everything with a reliable source[1]. Schuessler says Words for 'soft, subtle, flexible', including 'flesh; female breast' start with *n-...; this implies that the Chinese words for flesh and female breast (if you can find the words being referred to) are valid tender for a phonosemantic note; but not that all words starting with *n- are so valid. In other words; each phonosemantic note you make should be able to be cited to a statement in a reliable source that confirms that this specific entry has the phonosemantic etymology you attribute to it.
You note, correctly, that many other Chinese etymologies do not conform to the standards you are being asked for. Most etymologies (e.g. see basically every English etymology) are not cited because they are widely accepted by the community, and have not been challenged. Chinese Phonosemantic etymologies (however well established to the Sinologists) are unfamiliar to the general public; hence the disparaging remarks at the beginning of this chapter; hence the need for adequate citation. The point is that every entry should be able to be cited reliably when challenged.
So (if you're still interested) I would suggest you do the following:
  • Continue to add phonosemantic etymologies.
  • Classify them under the etymology header.
  • Make sure that they are cited thoroughly, as per above, using the references tool so that they appear as footnotes (this is the standard way of doing it, e.g. gonzo)[2]
  • Get Wyang to check over a few ones you have done, so that (s)he is happy.
  1. ^ Perhaps you haven't been made aware of a key point in this debate: most wiktionarians will not consider KanjiNetworks a 'reliable source'; certainly not for non-mainstream etymology content. Most websites are generally viewed with suspicion, unless they're academically affiliated (Perseus) or have been found to be reliable and uncontroversial (Online Etymology Dictionary).
  2. ^ You should note that, if you've cited Schuessler or someone, it is not actually necessary to cite KanjiNetworks for any phonosemantic etymologies. However, KanjiNetworks may be useful as a source for any general, mainstream etymologies (e.g. your note that the shell and bone character is a pictogram - I don't think anyone has challenged that etymology.)

Sorry that this has been a long and drawn-out process for you. Hyarmendacil (talk) 06:56, 22 June 2013 (UTC)

The following sentence has been out of date since before I started editing here: "Each definition may be treated as a sentence: beginning with a capital letter and ending with a full stop."

In fact I can trace it all the way back to User talk:Mglovesfun/Archives/1#formatting (2009). Definitions of non-English terms are formatted without a full stop or an initial capital letter (with the obvious exception of words that always require a capital letter like Spain) and English definitions have full stops and initial capitals. Can we finally update WT:ELE to cover this?

Furthermore, a separate but much more minor point:

  ::'''to end''' (''third-person singular simple present'' '''[[ends]]''', ''present participle'' '''[[ending]]''',   ''simple past'' '''[[ended]]''', ''past participle'' '''[[ended]]''')  

It shouldn't have ended twice as {{en-verb}} doesn't show that anymore. Since en-verb only categorizes in the main namespace, we can just use it directly. The entire section Headword line could use the templates directly, but I don't think I could do it and pass it off as an uncontroversial edit, so here it is. Mglovesfun (talk) 12:48, 9 June 2013 (UTC)

Also just noticed the particle 'to' in 'to end' which we no longer use. Mglovesfun (talk) 12:49, 9 June 2013 (UTC)
I noticed that quite a few pages have some kind of substituted version of templates. I don't think that's a good idea because then things like this happen. It's better to put the actual template in there, so that they always match whatever the template really looks like. —CodeCat 12:53, 9 June 2013 (UTC)
I vaguely remember a big controversy over this. I thought it was cap-and-period if the definition was an explanation, and neither if it was a simple gloss, as in most non-English terms. And there was argument over WTF "treated as" was supposed to mean. Michael Z. 2013-06-10 03:51 z
If anyone's drafting a vote, another problem is that ELE currently allows an explicitly "unlimited" variety of headers. - -sche (discuss) 04:01, 10 June 2013 (UTC)
I'd draft a vote if people said in this thread they'd broadly support it. Mglovesfun (talk) 08:36, 10 June 2013 (UTC)
*Nudge*. Mglovesfun (talk) 09:58, 11 June 2013 (UTC)
I'd support a comprehensive (or even partial) update of ELE. - -sche (discuss) 08:47, 13 June 2013 (UTC)
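
For anyone drafting the vote, the convention described at the top of this thread can be sketched as a small check. This is only an illustration under stated assumptions: the function name `definition_style_ok`, the `lang` argument, and the `always_capitalized` flag (for words like Spain) are hypothetical and do not come from WT:ELE or from any existing tool, and the sketch ignores the gloss-versus-explanation distinction raised above.

```python
def definition_style_ok(definition: str, lang: str,
                        always_capitalized: bool = False) -> bool:
    """Check a definition line against the convention described in the thread.

    Assumed rule (hypothetical helper, not an existing Wiktionary tool):
    - English ("en") definitions start with a capital and end with a full stop.
    - Non-English definitions have neither, unless the first word always
      takes a capital (e.g. "Spain"), signalled by always_capitalized.
    """
    text = definition.strip()
    if not text:
        return False
    starts_upper = text[0].isupper()
    ends_with_stop = text.endswith(".")
    if lang == "en":
        return starts_upper and ends_with_stop
    # Non-English gloss: no full stop, and no capital unless the word needs one.
    return (always_capitalized or not starts_upper) and not ends_with_stop
```

A check like this could only flag candidates for review; whether a given line is a gloss or a full explanation still needs a human editor.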

South Picene alphabet

The South Picene language (spx) has Old Italic (Ital) as its alphabet. However, not every character used to write it is encoded by Unicode; it lacks one of the characters transliterated as 'í' and the word separator (looks like a vertical ellipsis), and the characters for 'ú', 't', 'f', 'o' and the other 'í' are rather different. I propose we change its script to Latn until the Unicode coverage of the South Picene alphabet is adequate (compare how we treat Iberian and Egyptian.) — Ungoliant (Falai) 05:03, 10 June 2013 (UTC)

  • Support. The Noric language also uses a variety of Ital, but I didn't even try to use it at Artebudz as Ital is LTR by default, and the Noric inscription is RTL; also, the letter shapes are different. —Angr 08:24, 11 June 2013 (UTC)
    • Can't you just use the RLO character? Or is it no longer allowed in page titles? -- Liliana 14:25, 11 June 2013 (UTC)
      • No idea, but even if it is, it doesn't change the fact that (1) letter shapes are reversed (like mirror-writing) in RTL, and (2) the letter shapes in the Noric inscription are different from those provided by Old Italic fonts anyway. —Angr 19:49, 11 June 2013 (UTC)
  • If we had some kind of "Wiktionary font" (I think that was discussed previously) we wouldn't have to deal with this kind of problem, as we could just devise our own encodings in the PUA. But oh well. -- Liliana 19:31, 15 June 2013 (UTC)

bad-iw filter

Why not just block all bad interwiki edits (there aren't all that many) but with a note explaining why the edit has been blocked? It will stop good-faith bad edits and vandalism, and experienced editors who make a typo (I've done it) will be able to correct their work with minimal fuss. Mglovesfun (talk) 16:20, 11 June 2013 (UTC)

Good idea. — Ungoliant (Falai) 16:30, 11 June 2013 (UTC)
Here's an example of a bad-iw edit that isn't a bad-iw edit and shouldn't be blocked. —Angr 19:55, 11 June 2013 (UTC)
Can't we change the regex so it doesn't trigger a bad-iw for pages with the prefix "Unsupported titles/"? — Ungoliant (Falai) 10:31, 12 June 2013 (UTC)
The "bad-iw" filter already excludes those, since it only looks at mainspace entries. As far as I know, it pretty much exactly matches the rules that have been used by bots like Interwicket and Rukhabot to correct iw entries. There are a few cases such as straight-versus-curly apostrophes and variations in rules for representing Hebrew lemma entries that make for a few mismatches between WTs that might cause problems. I'm not sure how Rukhabot and the bad-iw filter deal with those. Chuck Entz (talk) 13:36, 12 June 2013 (UTC)
Oops! Didn't read closely enough. Yes, I'm sure we could exclude Unsupported titles (I thought we already did). Chuck Entz (talk) 13:43, 12 June 2013 (UTC)
If Unsupported titles were already excluded, I never would have found the example above, since I found it just by looking in Recent changes for edits tagged by the bad-iw filter. —Angr 14:14, 12 June 2013 (UTC)
The filter blocks all edits that add (or, in some cases where diff messes up, retain) bad interwikis. I would agree with blocking an edit that only adds a bad interwiki (and I doubt it'd be too hard to write one), but I can't agree with blocking an edit like diff, which adds a lot of info besides the bad interwiki.​—msh210 (talk) 07:23, 16 June 2013 (UTC)
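
To make the proposals in this thread concrete, here is a minimal sketch of the check being discussed. It is not the actual filter: it assumes that a "bad" interwiki is one whose target title differs from the page title, it uses a simplified language-code pattern, and the names `IW_LINK`, `EXEMPT_PREFIX` and `bad_interwikis` are made up for illustration. It also leaves out the apostrophe and Hebrew-lemma mismatches mentioned above.

```python
import re

# Simplified pattern for interwiki links such as [[fr:chien]].
# Real language codes and link syntax are more varied; this is an assumption.
IW_LINK = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z-]+)?):([^\[\]|]+)\]\]")

# Pages under this prefix are skipped, as suggested in the thread.
EXEMPT_PREFIX = "Unsupported titles/"


def bad_interwikis(page_title: str, wikitext: str) -> list[str]:
    """Return interwiki links whose target title differs from the page title."""
    if page_title.startswith(EXEMPT_PREFIX):
        return []
    return [
        match.group(0)
        for match in IW_LINK.finditer(wikitext)
        if match.group(2) != page_title
    ]
```

For a page titled "chien", the link `[[fr:chien]]` passes and `[[de:chein]]` is reported. Deciding whether to block an edit outright or only tag it, as debated above, would sit on top of a check like this.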

Problems w/recent changes?

Maybe it's just my internet, but I haven't been able to get to recent changes all day yesterday or today. At all, and it's the only page that's not loading (I'm on Chrome on Mac OSX 10.6.8). Anyone else having problems with it? Thanks. --Neskayagawonisgv? 17:14, 12 June 2013 (UTC)

OK for me (Win 7, FF 21, Vector). DCDuring TALK 17:22, 12 June 2013 (UTC)
Must just be me. Thanks though. --Neskayagawonisgv? 00:16, 17 June 2013 (UTC)

Watchlist wishlist

There is a WMF project called [Watchlist wishlist]. It may be a pipedream, but it is at least one step closer to reality than the complaints about our wishlist that sometimes surface here. I would like to collect any thoughts that folks here have about how to make watchlists more useful for us. I have already mentioned there the great utility of limiting watching to sections (in particular language L2s). We also already have the problem with editing the watchlist and even using large watchlists on record.

The project has a list of some suggestions that may be thought-starters. DCDuring TALK 18:13, 13 June 2013 (UTC)

I would like to be able to automatically watch all entries in a given language, and perhaps sort them under separate tabs within the list so I can quickly browse it. —CodeCat 18:42, 13 June 2013 (UTC)
Me too. --Haplology (talk) 02:58, 14 June 2013 (UTC)
Me three :).
Opening the Watchlist on mobile devices (excluding iPads) leaves much to be desired. Crashes in various browsers on simple operations, such as resizing. The Watchlist specific to mobile phones is useless, even if it doesn't crash.
My Watchlist is too big, so I can't edit it raw, even on a desktop computer. --Anatoli (обсудить/вклад) 03:05, 14 June 2013 (UTC)
I hadn't thought about mobile and I don't think anyone else had mentioned that yet.
Someone had suggested category-specific watchlists, which could include the language categories. That seems second-best to something "section"-specific, ie, only changes in L2 sections for one's selected languages. Our page architecture of having multiple languages on the same page does make things harder and out of sync with what WP typically needs.
@CodeCat: I don't understand "sort them under separate tabs within the list so I can quickly browse it". If you are watching an entire language, what sorting would you want? Where are the tabs coming from? DCDuring TALK 04:27, 14 June 2013 (UTC)
I think she means she wants to have one tab open with her Catalan watchlist, another with her Dutch watchlist, another with her Swedish watchlist, and so forth. —Angr 09:41, 14 June 2013 (UTC)
I see, I think. That would mean one would be "allowed" to have multiple watchlists, say, one for each language, category, or [] . The tabs are provided by one's browser? DCDuring TALK 12:22, 14 June 2013 (UTC)

Template:param

My post at Wiktionary:Beer parlour/2013/March#Use of Template:param in template documentation didn't garner any responses at all, so fair warning: unless someone complains, I will "soon" implement the recommendation I suggested at Template talk:param. - dcljr (talk) 23:44, 13 June 2013 (UTC)

Manual transliteration and transliteration from modules

After I've added manual transliteration to the translations (to a few selected languages where it's possible), some editors started removing previously added manual transliteration. I'm against this practice. The transliteration may be out of date but it can easily be updated from preview (User:Conrad.Irwin/editor.js). Note that auto-transliteration is only added when it's missing. If people wish to add the new transliteration, then perhaps a bot could do this - as a once-off job - overwrite existing transliteration and add where it's missing. Perhaps one of User:Conrad.Irwin/editor.js or User:Kephir/gadgets/xte could do that?

What's the general opinion about this? I also think there should be the transliteration written in entries and translations. How it gets there - manually or via a bot is another thing. Can someone create a bot to update/insert transliterations or modify the scripts, so that auto-translit is written to translations if it's not supplied manually (this condition is important)? --Anatoli (обсудить/вклад) 06:57, 14 June 2013 (UTC)

I think that auto-transliteration should always override manual transliteration. Manual transliteration will not coincide with auto-transliteration only if an editor made an error in transliterating. By forcing auto-transliteration we can neutralize such errors. Consider this: historically different editors have used different transliteration schemes for Armenian on Wiktionary. By adding auto-transliteration to {{hy-noun}} and the rest I made sure Armenian is transliterated consistently. We should do the same to {{t}}, {{l}}, {{term}} and others.
Anticipating your objection, that in Russian we show stress in the transliteration and so it does not coincide with auto-translit, I say we should show the stress on the Russian word (like this, {{ru-noun|head=соба́ка}}) and let auto-translit pick up the stress from there.
Using a bot to upload transliterations is not a good idea, IMO. The bot would need to rerun every time we decide to modify a transliteration scheme. On the other hand, with auto-transliteration you need only change Module:Armn-translit once. --Vahag (talk) 09:42, 14 June 2013 (UTC)
Transliteration is supposed to convey orthography. Maybe we should consider dropping foreign-style stress marks from transliterations in the few languages where we use them, and only indicate stress in the pronunciation, where it properly belongs. Michael Z. 2013-06-14 14:59 z
Perhaps, but it might come in handy in cases where there are homographs with differing stress, so it's obvious which one is meant. Chuck Entz (talk) 15:12, 14 June 2013 (UTC)
Right, as in горілки. Are there other ways to handle such entries? The important difference would be emphasized if stress were only indicated in such entries. Michael Z. 2013-06-14 15:39 z
Of course pronunciation can be indicated in language-specific form. Michael Z. 2013-06-14 15:08 z
Acute accents are the norm for Russian. Russian Wiktionary uses them as well. —CodeCat 15:13, 14 June 2013 (UTC)
Right. So pronunciation and stress could be indicated in the "Pronunciation" section, as in его, for example. (Russian is a special case because pronunciation is also entered where transliterations normally appear, and no transliteration appears in Russian entries.) Michael Z. 2013-06-14 15:39 z
Our transliteration system for Burmese is pronunciation-based, not orthography-based. In probably 75% of the cases the pronunciation-based transliteration can be correctly mechanically predicted from the orthography, but in the remaining 25% of the cases it can't and will need to be done manually. The alternative would be to switch over to an orthography-based romanization of Burmese, which I would actually be in favor of but which met some opposition a few years back. —Angr 16:03, 14 June 2013 (UTC)
I hope this is not intended to apply to uses of {{term}} in etymology sections. In etymology sections for terms mostly transmitted over time in writing I think a pronunciation-based transliteration system can be quite misleading. For example, the writers who took Greek terms into Latin followed practices that must have fit their modified pronunciations and created precedents that are followed to this day, ie υ (upsilon) -> "y", not "u". I don't know for how many situations this objection is relevant beyond what I've drawn from. DCDuring TALK 17:00, 14 June 2013 (UTC)
Actually, I think we're supposed to follow the standards for transliteration of entries for transliterations in etymologies. This isn't really followed all that much, because a lot of editors just use the transliteration in the source they got the etymology from, and have no clue what the Wiktionary practice is. Chuck Entz (talk) 17:13, 14 June 2013 (UTC)
What I wish we did in etymology sections is the same thing most English-language dictionaries do, namely present all foreign words in transliteration. We could still link to the original-script page, of course: if we say that raj comes "from Sanskrit {{term|राज्य|rājyá|lang=sa|sc=Latn}}" rather than "from Sanskrit {{term|राज्य|tr=rājyá|lang=sa}}", for example, it displays as "from Sanskrit rājyá", saving space and not confronting readers with possibly unfamiliar Devanagari, while still linking to the Sanskrit entry. I've tried that on one or two pages, but it always gets reverted. —Angr 17:53, 14 June 2013 (UTC)
I wonder what percent of our actual and likely future users prefer and find more useful English Etymology sections the way we do them to an alternative presentation having no non-English script, just transliterations (with no cognates visible by default). Are we doing all this just for a small population of scholars and for machines that will render it all more useful for humans? DCDuring TALK 18:03, 14 June 2013 (UTC)
I think it's a good idea. A better form is mentioning the transliteration first and putting the term in its native script(s) after it, in parentheses, e.g. "rājyá (राज्य)", as we do in Wikipedia. --Z 18:15, 14 June 2013 (UTC)
What group of users do you think prefer it that way, rather than the current way or a presentation with no non-Latin script? DCDuring TALK 18:26, 14 June 2013 (UTC)
English Wiktionary is for English-speaking readers, most of whom can't read non-Latin scripts. A small group who are familiar with that non-Latin script prefer to see the term in that way and others prefer Angr's suggestion I think. But I think saving space is not a big enough advantage to remove it completely. I think it would be even preferred by readers whose native script is not Latin; it's kinda hard for the reader to switch to a non-Latin script while reading an English text. --Z 18:47, 14 June 2013 (UTC)
It's bad enough already when a reader follows a link राज्य (rājyá) to a heading "Sanskrit," and has to read down past Etymology and Adjective to find the precise text "राज्य (rājyá)." In many entries they may have to scroll just to see the headword. (Maybe our headwords should be a head of a language entry instead of just of the page.)
Presenting transliterations only would force the reader to additionally extrapolate that what they clicked on is a derivative representation of राज्य. And I don't see the point of linking rājyá (राज्य) to represent राज्य (rājyá) – these should be consistent, and I think the current use of brackets to indicate that the transliteration is a representation derived from the original, in both entry and link, is clearest. Michael Z. 2013-06-15 16:55 z
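
The two policies argued over earlier in this thread (manual transliteration wins when supplied, versus auto-transliteration always overriding it) differ only in one branch, which a short sketch makes visible. Everything here is hypothetical: `TOY_TABLE` is a five-letter toy mapping and not any scheme actually used on Wiktionary, and `toy_translit` and `resolve_translit` are illustrative names, not functions from the transliteration modules.

```python
from typing import Callable, Optional

# Toy mapping for illustration only; NOT a real transliteration scheme.
TOY_TABLE = dict(zip("собак", "sobak"))


def toy_translit(term: str) -> str:
    """Transliterate letter by letter; unknown characters pass through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in term)


def resolve_translit(term: str,
                     manual: Optional[str],
                     auto: Callable[[str], str] = toy_translit,
                     auto_overrides: bool = False) -> str:
    """Pick the transliteration to display.

    auto_overrides=False: use the manual value when supplied, and fall back
    to auto-transliteration only when it is missing.
    auto_overrides=True: always use auto-transliteration, ignoring manual input.
    """
    if manual and not auto_overrides:
        return manual
    return auto(term)
```

Because unknown characters pass through unchanged, a combining stress mark placed on the headword would carry over into the automatic result, which is roughly the idea of letting auto-translit pick up the stress from the head parameter.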

Synonym internationalization

How do you translate a page, if each synonym should lead to a different word in the other language?

Say Foo has two meanings: (1) food, the other (2) disgusting.

But in OtherLang the word food is ol:Phould, while the disgusting is Fouyah.

What to do in those cases? Pashute (talk) 07:07, 14 June 2013 (UTC)

Are you talking about automated translation of entire pages? If so, you don't, it doesn't work yet. To get anything but a very poor translation, you need a human being in there somewhere. Tell me if I've missed the point. Mglovesfun (talk) 08:42, 14 June 2013 (UTC)
To see how we handle the translation of polysemous words (words with multiple meanings), look at get#Translations as an example. Each meaning has its own separate translation box. —Angr 09:59, 14 June 2013 (UTC)

Hebrew and Aramaic terms here inside English wiktionary

It seems there has been much work done on Hebrew and Aramaic here in the English Wiktionary. May I ask why? What is the rationale? (Perhaps exactly the above issue, but if so it is a very bad solution, and differs from all the other many languages referred to from here) ... Pashute (talk) 07:10, 14 June 2013 (UTC)

Second thought - perhaps it was meant for Talmudic and Kabbalistic or biblical translations - and if so: Why not open a separate wiktionary exactly for that, and move the terms to there? There are many benefits: There could be words specific to the time that are not used anymore, there could be a special entry for words that have changed meaning from the modern language, or from the other versions of the language (say between Biblical and Talmudic Hebrew - there are many examples of that...)

So actually this is another topic: Ancient languages...

Back to my question here: Why are there Hebrew and Aramaic terms here in the English Wiktionary, what is the rationale of those who worked on it extensively, and is it possible to change this without losing their obviously hard work? Pashute (talk) 07:17, 14 June 2013 (UTC)

The rationale is that these languages exist and contributors have decided to give their time and effort to make these entries, which are perfectly valid and so haven't been deleted. Mglovesfun (talk) 08:41, 14 June 2013 (UTC)
Pashute, the point of English Wiktionary is not only to list English words, but to list all words in all languages. Words in languages other than English are provided with English translations. We don't just have Hebrew and Aramaic, we have Swedish, French, German, Arabic, Hausa, Swahili, Zulu, Persian, Sanskrit, Burmese, Indonesian, Chinese, Japanese, Russian, Navajo, Quechua, and thousands of other languages. There is already a Hebrew Wiktionary, and its point is also to list all words in all languages—but with Hebrew glosses for words in languages other than Hebrew. —Angr 09:48, 14 June 2013 (UTC)
I agree with above, except that AFAICT hewikt lists only Hebrew words.​—msh210 (talk) 07:28, 16 June 2013 (UTC)
Good heavens, you're right. Is that their policy, or is it just that no one's gotten around to adding words in other languages yet? I thought "all words in all languages" was the goal of each Wiktionary and would be dismayed to learn that certain Wiktionaries had decided not to accept that. —Angr 14:19, 16 June 2013 (UTC)
Yes. (Specifically, it is their policy.) Keφr 15:04, 16 June 2013 (UTC)
Yes, en.Wiktionary has been visited by a few exiles from that Wiktionary who lamented that policy and the (in their opinion) unresponsive and unreasonable admins who used their power to ban anyone who opposed it. (e.g. in 2007) - -sche (discuss) 16:47, 16 June 2013 (UTC)
I am shocked. What an abominable policy! — Ungoliant (Falai) 16:56, 16 June 2013 (UTC)
I wonder if we could submit this to the Wikimedia foundation? We could do it on the grounds that they are discriminating against Hebrew speakers, since they do not have access to foreign translations in the way that speakers of other languages do. I doubt that the foundation wants to sponsor that. —CodeCat 17:03, 16 June 2013 (UTC)
It should be brought up at Meta (m:Requests for comment; if there is obvious misuse of sysop/crat rights, m:Stewards' noticeboard) (preferably, by active users of he.wikt). --Z 17:09, 16 June 2013 (UTC)
Yes, it would be preferable for active users of he.Wikt to open any RFC; they would also know better if there has been any recent admin/crat action (the discussion I linked to above being from 2007). Of course, if "Hebrew only" has been he.Wikt's policy long enough that opponents of it no longer edit there, that could complicate matters. - -sche (discuss) 17:57, 16 June 2013 (UTC)
I don't think it's necessary to find editors from he.wiktionary. After all, this concerns a policy that goes (IMO) against the neutral and open spirit of Wikimedia projects, and I think we, being also editors of Wikimedia projects, are entitled to have a say about it even if we do not edit there. Consider for comparison if Wikipedia adopted a policy stating that articles about things in the English-speaking sphere were inherently more notable than other things. I'm sure we'd have something to say about that even if we didn't edit there. —CodeCat 18:05, 16 June 2013 (UTC)
You wish to argue that hewikt — whose editors are Hebrew speakers — is discriminating in its editor-written policies against Hebrew speakers. Seriously?​—msh210 (talk) 04:33, 17 June 2013 (UTC)
Having a monolingual dictionary is abominable? It's a different goal than having a pan-lingual dictionary, is all.​—msh210 (talk) 02:31, 17 June 2013 (UTC)
Not a goal that a Wiktionary should force onto its contributors, IMO. You can still have a Hebrew dictionary and allow foreign language entries. Look at how good our coverage of English is. — Ungoliant (Falai) 02:42, 17 June 2013 (UTC)
And you can have a pan-lingual dictionary and allow a numerical (Roget-style) thesaurus. But we've decided not to, and that's something we force on our contributors. They've decided to limit the focus of their project, much as we have. I don't see the problem with it, at all.​—msh210 (talk) 04:14, 17 June 2013 (UTC)
Wiktionary projects are supposed to contain all words in all languages,[3] and I don't think a decision about changing this goal can be made by the user community. --Z 05:56, 17 June 2013 (UTC)
That text was added by Dominic, an enwikt (and enWP) denizen. AFAICT he acted alone (and from an enwikt perspective) in making that edit — though of course we can ask him. I have no reason to believe that it reflects the Foundation's official view.​—msh210 (talk) 06:48, 17 June 2013 (UTC)
It is still obviously against the nature of WMF projects. We may not force the contributors to focus their contributions on what we prefer; everyone must be free in contributing as far as possible. --Z 07:18, 17 June 2013 (UTC)
Allow me to introduce you to WT:CFI. I can't tell whether you're being disingenuous or you really don't see the analogy between us and hewikt.​—msh210 (talk) 16:22, 17 June 2013 (UTC)
The purpose of WT:CFI and whatever WT:... else is to indicate what is an improvement and what is not. Adding a non-Hebrew entry to he.wikt is nothing but improvement of this project of WMF, and people should be free to improve WMF wikis, that's all we tried to tell you. Anyway, let's stop this discussion, it doesn't belong here in en.wikt and is none of our business, he.wikt editors should decide about it. --Z 16:57, 17 June 2013 (UTC)
Arguably, including a numerical thesaurus is an improvement of enwikt, and including information about every startup music band is an improvement of enWP. Anyway, that's all I meant also: that hewikt editors should decide on this. That is, I didn't mean that I agree that hewikt should not have foreign entries: only that the outrage against it and comments denouncing it, above, are uncalled for. Glad we quasi-agree.  :-) ​—msh210 (talk) 17:04, 17 June 2013 (UTC)
I'm not familiar with Roget's Thesaurus, but it does sound like it's similar to our Wikisaurus project. In any case, I'm not sure submitting this to the WMF is a very good idea. It sets a bad precedent, and soon enough other Wiktionaries will start demanding that we change our practices too (and you know how some people feel about our logo and SOP-deletion.) — Ungoliant (Falai) 09:32, 17 June 2013 (UTC)
Request for comment sounds about right. No reason to prejudge or get ahead of ourselves. Mglovesfun (talk) 18:08, 17 June 2013 (UTC)

greetings from he.wiktionary.org

Sorry for joining in late. I'm user כחלון (sounds like "Kakhlon"/"Kahlon") from the Hebrew wiktionary.
I want to point out that he.wikt doesn't have a policy against non-Hebrew entries. As User:Angr suggested above, we just haven't gotten around to adding words in other languages. The Hebrew wiktionary is very small, and has about 5 ~constant editors, of which 1 is an administrator (not me, but I'm speaking on his behalf). We still haven't added many basic verbs and nouns, so the foreign-language words are quite far ahead in our plan...
Of course, if any of you wants to add any entry to he.wikt - be it in Swahili, Zulu, Persian... - you are welcome. However, since there are only a few of us, and a lot of the work is monitoring contributions/abuses, please do not add too much before we get to know you (-:
If there is anything I didn't clarify - feel free to ask. I'll be checking this page in the next days. Be well, 132.76.61.23 12:19, 22 June 2013 (UTC)

User:Kephir above linked to he:ויקימילון:עקרונות וקווים מנחים when saying that it is the policy of Hebrew Wiktionary to exclude non-Hebrew words. I don't read Hebrew, so I have to ask: what does that page say about non-Hebrew words? —Angr 12:29, 22 June 2013 (UTC)
I assume he's referring to the sentence: "ויקימילון העברי מציג ערכים עבריים בלבד..." etc.
It says: "the Hebrew wiktionary displays only entries in Hebrew. i.e. in the "translation" section of every entry, one should link the translation to the foreign-language wiktionary. e.g.: in the entry זאב (="wolf") the translation "wolf" should be linked to the entry wolf in the English wiktionary".
Now, all that is correct. The translations at he.wikt do link to the other wiktionaries (otherwise, all these links would be red). However, you can still add words to he.wikt in any other language. In particular, it seems the sentence "the Hebrew wiktionary displays only entries in Hebrew" was misread. I notified our administrator about that, and asked him for his opinion. 132.76.61.22 12:50, 22 June 2013 (UTC)
You probably lack the necessary templates for other languages, don't you? If that's the case, are there rules what foreign language entries should look like? We had some ridiculous, badly formatted entries like "ghar is a Hindi word for house", which was obviously deleted. Maybe that's why the rumour about the Hebrew Wiktionary? People didn't know how to create correct entries? --Anatoli (обсудить/вклад) 13:12, 22 June 2013 (UTC)
I'm not sure what "the necessary templates for other languages" means. I'm pretty weak in all the technical aspects of wiki...
Your guess may be correct. Every "regular" entry (=Hebrew word) should meet certain criteria: "grammatical analysis" template, good definition, etc. We have similar demands for non-Hebrew entries, but we never really defined them, as far as I know. We simply never got to that. We delete bad new entries every day, but I rarely see any contribution in foreign languages. 132.76.61.23 13:54, 22 June 2013 (UTC)
There's also this thread from 2007 where a user from Hebrew Wiktionary (who doesn't edit there or here anymore) said "In the past, users who tried to contribute entries for German and English words were ordered to stop, and their contributions were deleted" and "Yesterday [i.e. 2007-05-13] I intiated a discussion, trying to convince my fellow users to change the policy. Unsuprisingly, the idea was rejected." That user's he-wikt contributions for that day can be found at he:מיוחד:תרומות/שי. So was it true in 2007 that German and English words were deleted simply for not being Hebrew? And was there a proposal back then to accept non-Hebrew entries that was rejected? As recently as last August, the entry "dog" was deleted at Hebrew Wiktionary, though of course I can't tell if it was deleted for being badly formatted or simply for being English. (I did find the pages he:English and he:Hebrew, though, as well as 18 German words in he:קטגוריה:גרמנית, showing that at least a few non-Hebrew entries exist and haven't been deleted.) —Angr 14:02, 22 June 2013 (UTC)
I'm not sure what the policy was in 2007, if there was any. I joined he.wikt only 4 years ago, of which I'm active only about 1 year total. Nevertheless, I'm considered one of the experienced users (-:
I think "dog" was deleted because it was badly written. However, that only allowed us to stall. If someone was to start adding well-written English entries to he.wiktionary - we'd have been forced to phrase criteria for foreign-language entries. That might have taken some time, though.... There are several dozen (I think) German entries in the he.wiktionary. They are badly formatted, and give partial information. The current Status Quo is leaving them as is.
Finally, for those of you who read Hebrew, here's a link to the discussion we had on this subject (from 2009). It started at our parlour, and was moved to its own archive. Most of the editors supported "turning he.wiktionary to a multi-lingual dictionary", but thought the time hadn't yet come. I guess that situation hasn't changed since. 132.76.61.22 15:07, 22 June 2013 (UTC)

Ancient languages

Is it possible to open a wiktionary for an ancient language, that is now being studied extensively? Such as Medieval Latin, Talmudic Aramaic, Zohar Aramaic etc. ? Pashute (talk) 07:17, 14 June 2013 (UTC)

The short answer is yes, but this isn't really the place to ask. There's a Wikimedia incubator for wikis which are not ready to go 'live' yet. Mglovesfun (talk) 08:38, 14 June 2013 (UTC)
la: (Latin Wiktionary) already exists. Mglovesfun (talk) 09:52, 14 June 2013 (UTC)
The short answer is no; although there are a few Wikimedia projects in ancient languages, the only projects that can still be created for ancient languages are Wikisource and Wikiquote (since they don't provide original content). New projects that provide original content, such as Wikipedia and Wiktionary, can be created only in living languages with native speakers. But words in those languages can certainly be added to English Wiktionary, and indeed we already have many words in Category:Latin language and Category:Aramaic language (as you noticed in your post above this one). —Angr 09:57, 14 June 2013 (UTC)
Yes, for dead languages only Wikisource is allowed according to the new policy. --Z 10:15, 14 June 2013 (UTC)
Fair enough. Glad to hear it, actually! Mglovesfun (talk) 10:17, 14 June 2013 (UTC)
Not that new a policy; it's been that way for almost six years. —Angr 10:52, 14 June 2013 (UTC)

Statistics related to only one part of an entry

At per there are two etymologies, a preposition derived from Latin and a pronoun/adjective coined in 1979. The entry also includes a statistics section noting that it was the 760th most common word prior to 1923.

Obviously those statistics cannot be for the senses coined fifty-five plus years after 1923, so should the statistics section be moved to a L4 heading at the end of Etymology 1 rather than a L3 heading at the end of the English section? Thryduulf (talk) 07:37, 15 June 2013 (UTC)

If it's just purely based on frequency, then no. Because it's probably just searching the equivalent of the regex . In human terms, a space, the three letters per followed by a space, or a period, or a comma, or a semicolon, or a colon. Mglovesfun (talk) 10:58, 15 June 2013 (UTC)
I have yet to see any comprehensive statistics about the frequency of usage of meanings of spellings. That seems beyond the capability of corpus analysis at this time. It wouldn't even seem possible in principle without some kind of standardization of meaning. PoS-level and Etymology-level statistics might be possible. But, for example, COCA's PoS reporting seems not ready for prime time.
Perhaps we can find some studies of small sets of words that report such frequency information so that we have good reason to show additional statistics and decide how to show them. DCDuring TALK 12:14, 15 June 2013 (UTC)
I'd guess \b[Pp][Ee][Rr]\b. In any event, I agree with Mg: the frequency stats are independent of sense so should be listed ===thus===.​—msh210 (talk) 07:39, 16 June 2013 (UTC)
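For anyone who wants to try msh210's guess, here is a minimal Python sketch. The pattern is copied from the comment above; the function name and sample sentence are made up for illustration, and nothing here claims to be what the compilers of the frequency list actually ran. It shows why such a count cannot tell the preposition from the 1979 pronoun: it only sees the spelling.

```python
import re

# Pattern copied from the comment above: the spelling "per" in any
# capitalisation, delimited by word boundaries on both sides.
PER = re.compile(r"\b[Pp][Ee][Rr]\b")


def count_per(text: str) -> int:
    """Count standalone occurrences of the spelling, whatever the sense."""
    return len(PER.findall(text))


# Invented sample: three standalone uses, plus two words that merely
# contain the letters (and so are not counted).
sample = "Per the rules, he earns ten dollars per hour. Per liked the person, permanently."
print(count_per(sample))
```

A sense-level count would need some kind of sense tagging on top of this, which is the gap DCDuring describes.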

All uses of context labels have been converted so that they explicitly call {{context}} now. That has allowed me to significantly rework the template and, perhaps most importantly, to get rid of the recursion. So all the numbered context templates should now be orphaned (it will take the software a few days to catch up, I expect). The new version of the template now uses a few new helper templates, {{context/show}}, {{context helper}} and {{context test}}. {{context/show}} is called for every numbered parameter that is passed to {{context}}, and is responsible for showing the label and transcluding the label template when it exists. {{context helper}} is called by the labels themselves. Because the recursion is gone, the label templates no longer need to be passed all the remaining labels. However, they still need to know the next label, because that determines whether or not to show a comma after the label. Some labels explicitly omit the comma as well. So, labels are now passed only the next label that follows them, but they do not display it; they only use it to determine what separator to show.

The labels originally called {{context {{{sub|}}}| where the parameter "sub" was supplied by {{context}} or its numbered varieties. I thought it would be useful to co-opt that mechanism for another purpose. That is what {{context test}} is for. When {{context/show}} needs to see whether a template is indeed a context label (because of the naming conflict that still exists), it calls the label template and passes it sub=test. This causes the label to call {{context test}} rather than {{context helper}}, and it will return the text "valid context label". {{context/show}} then checks for that text and considers the label valid if so. —CodeCat 14:51, 16 June 2013 (UTC)
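The separator rule described above (each label sees only the next label and uses it to pick a separator) can be modelled outside the template language. The following is a rough Python sketch, not the template code itself: the set of comma-suppressing labels and the function names are assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed set of labels that suppress the comma next to them.
# The real templates decide this per label; this set is illustrative only.
OMIT_COMMA = {"and", "or", "chiefly"}


def separator(label: str, next_label: Optional[str]) -> str:
    """Pick the text shown after `label`, looking only at the next label."""
    if next_label is None:
        return ""      # last label: nothing follows
    if label in OMIT_COMMA or next_label in OMIT_COMMA:
        return " "     # no comma around modifier-like labels
    return ", "


def render(labels: Sequence[str]) -> str:
    """Join labels without recursion: each step needs only its successor."""
    parts = []
    for i, label in enumerate(labels):
        nxt = labels[i + 1] if i + 1 < len(labels) else None
        parts.append(label + separator(label, nxt))
    return "(" + "".join(parts) + ")"


print(render(["informal", "dated"]))
print(render(["chiefly", "British", "and", "Australian"]))
```

The point of the design is visible in `render`: no label ever needs the whole remaining list, only the one that follows it.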

Thanks for all your work, and, perhaps more importantly, for your initiative. One question: It looks as though {{context labelcat}} still works fine. Is that your intent for the future? If not, what are you thinking of doing with its current uses?​—msh210 (talk) 04:22, 17 June 2013 (UTC)
I don't really know. What purpose does it serve exactly? —CodeCat 12:23, 17 June 2013 (UTC)
[e/c] It displays the context template's label and categorizes the entry in the context template's category, but does so without parentheses or italics. It's used in usage notes ("Considered {{informal|sub=labelcat}} when construed with for" or whatever) and in some templates.​—msh210 (talk) 16:46, 17 June 2013 (UTC)
On pages like [[clamour]], it's used by {{alternative spelling of}}. On pages like extract the urine and [[enchanted]], it's used by context labels via syntax like {{obsolete|sub=labelcat}}, which allows people to put the labels in usage notes (and suppress their parentheses and italics) rather than on the definition lines and/or in the POS sections where the labels would in many cases be more at home. On pages like [[gáagii]], it's used (by context labels) in etymologies, where a dedicated, categorising etymology template vaguely like {{borrowing}} might be better. - -sche (discuss) 16:42, 17 June 2013 (UTC)
So... do we really need it? Or is there another, better way of doing it? —CodeCat 18:06, 17 June 2013 (UTC)
IMO, we don't need it. It would make more sense for {{alternative spelling of}} to apply categories and display text on its own, without invoking {{British}} etc via {{context labelcat}}. Many uses of the template in usage notes should be deleted in favour of regular uses of context templates on sense lines. Even if a few uses are left over once that's done, I expect they'll be very few (because the template only has <150 mainspace uses even now), and to replace them, people could just write out 'jocular' by hand without encasing it in brackets and appending |sub=labelcat, and add any necessary categories manually. (The few etymologies which use it to describe parts of words as onomatopoeia could use a dedicated onomatopoeia template and/or write out any necessary categories manually.) - -sche (discuss) 20:09, 17 June 2013 (UTC)
I agree with -sche (just above) that the uses in etymology sections can be converted to 'manual', though template use is certainly more editor-friendly and it'd be a shame to see the template go. But {{alternative spelling of}} and {{eye dialect of}}'s use of it is one that allows those templates to include any regional context tag, including such as are created in the future, and IMO that's an important feature of those templates which we should definitely not lose. So either {{context labelcat}} or an equivalent (i.e., some other template that does what it does, reading any (at least regional) context tag, displaying its label, and categorizing) is necessary.​—msh210 (talk) 05:49, 18 June 2013 (UTC)
Question: is there any reason one might want {{context labelcat}} to not work? I mean: It sounds from the above that {{context}} is now stable. In that case, {{context labelcat}} should be fine. Is that correct? If not, what further changes might be desired to {{context}}?​—msh210 (talk) 05:49, 18 June 2013 (UTC)
I did some checking and it turns out that {{rare|sub=labelcat}} produces the same as {{rare}} itself. So we don't really need the extra template. —CodeCat 12:26, 18 June 2013 (UTC)
Perfect. Thanks for checking; it looks like you're right. If that's to be true for the foreseeable future — is it? — then we can simply redirect {{context labelcat}} to {{context helper}} and no further work is necessary for this.​—msh210 (talk) 17:10, 18 June 2013 (UTC)
Previous discussion (example of non-free file): File talk:Far Side 1982-05-28 - Thagomizer.png.

In the past, one user proposed to delete the small number of "fairly used" copyrighted files en.Wiktionary hosts locally, citing the fact that en.Wikt did not have an EDP (exemption doctrine policy, allowing copyrighted images to be hosted locally and fairly used) of the nature required by the WMF. In response, I drafted Wiktionary:Non-free content criteria, based on Wikipedia's EDP, but heavily adapted to Wiktionary. The deletion discussions were closed with the files kept... and discussion of our EDP petered out. We are still listed on meta:Non-free content as having a "draft proposal only [on which] consensus has not been reached". So: do you support the non-free content policy I drafted? And/or would you propose a slightly or significantly different policy? Or do you think en.Wikt should not host non-free files locally under any circumstances? Let's see if we can get consensus for an EDP (whether it's my draft or not), or if consensus is that we shouldn't host non-free files. - -sche 21:17, 16 June 2013 (UTC)

Do we need a formal vote or will a poll on this page suffice? DCDuring TALK 22:16, 16 June 2013 (UTC)

Support

  1. Support DCDuring TALK 22:16, 16 June 2013 (UTC)
  2. Support. — Ungoliant (Falai) 22:51, 16 June 2013 (UTC)
  3. Support. Points 2 and 3 are important (no free equivalent; minimal usage). We ought to be able to do our job as a dictionary with little or no use of non-free content, and that will remove a source of possible trouble. Equinox 00:23, 17 June 2013 (UTC)
  4. Support (current version or substantively similar).​—msh210 (talk) 04:29, 17 June 2013 (UTC)
  5. Support. User: PalkiaX50 talk to meh 04:59, 17 June 2013 (UTC)
  6. S Michael Z. 2013-06-17 15:40 z
  7. Support. --Haplology (talk) 14:22, 18 June 2013 (UTC)
  8. Support. The fact that we may only very rarely have a reason to use such material is no reason not to have a policy for its use where such use is legal. bd2412 T 14:45, 18 June 2013 (UTC)
  9. Support. Much as BD2412 said, we don't do this much, which is probably a good thing, but sometimes there aren't really other options (c.f. File talk:Far Side 1982-05-28 - Thagomizer.png), so not having a clear policy would be a mistake, IMHO. -- Eiríkr Útlendi │ Tala við mig 18:28, 19 June 2013 (UTC)
  10. Support. It is very rare that Wiktionary needs to use non-free content but "very rare" != "never" so we need a policy to cover those situations. Thryduulf (talk) 12:27, 20 June 2013 (UTC)

Oppose

  1. Oppose In my opinion we don't need fair use images at all. From what I see, we only have one right now, so... yeah. -- Liliana 12:27, 17 June 2013 (UTC)
  2. Oppose We don't need non-free media here. Let's keep it "pure". This, that and the other (talk) 09:54, 19 June 2013 (UTC)
  3. Oppose There are more important things than putting lots of pictures. --KoreanQuoter (talk) 18:29, 22 June 2013 (UTC)
  4. Oppose Wiktionary should remain as free as possible. --Ivan Štambuk (talk) 19:14, 22 June 2013 (UTC)

Abstain

  1. Abstain no strong feelings. Minimal usage is a good idea in case there's a MediaWiki ban on all such images (or files, not necessarily images) so we can remove them quickly if we need to. Mglovesfun (talk) 11:29, 18 June 2013 (UTC)
  2. Abstain. --Dan Polansky (talk) 19:45, 20 June 2013 (UTC)

Arabic dictionary (Sakhr) down but its data can be useful

It used to be the best online Arabic dictionary. It was the only comprehensive dictionary that consistently provided pronunciation (with vowel points) for most of the words. Others have so far failed to do it. I've been in contact with them in the past. When it went down, I contacted them; they replied a year ago that they were still fixing it. So far, no progress. I hope we can get hold of the data and import it into Wiktionary. I made another contact today in the hope that they can release the data:

Dear Sir/Madam,

القواميس Under Construction

One of the links above says "under construction", the other never returns anything.

The dictionary has been offline for quite a long time. Will it ever be back up again? Is there another site where the dictionary is working?

If there are no resources to restore the dictionary, are you able to release the data, so that it can be used elsewhere?

There are two possibilities - the English Wiktionary:

English Wiktionary or OMITTED If your data is in a readable format, it can be reused, so that learners of Arabic could still use it.

Please let me know if you're able to release your data and on what terms or please advise about the progress with restoring the dictionary.

Signed --Anatoli (обсудить/вклад) 03:59, 18 June 2013 (UTC)

Who wrote this dictionary, and is it an original work? Or did they compile it from various sources? DTLHS (talk) 04:29, 18 June 2013 (UTC)
The approach to creating this dictionary was similar to that of Wiktionary, EDICT (ja), CEDICT (cmn). Various users added their contributions, but I'm not quite sure; the volume was quite big, so they may have had some initial data from somewhere. I could find a lot of various words there in their lemma form with Arabic short vowels written, so that a person knowing the letters could read. It wasn't too smart, as it didn't separate various senses but every word's translation was split into parts of speech (Arabic word). A user with very basic knowledge of Arabic could find what they were looking for.
I'm waiting for their response but wanted to advise that some major importing work may be forthcoming, in case there are any licensing issues, and in case anyone has found any other decent comparable resource (I doubt there is one). العربية (Arabic) - WordReference Forums is as close as you can get to it; it has sample sentences but only some words have marked pronunciation - the main hurdle in learning to read Arabic well is not the alphabet but missing letters, one has to know those words, grammar and patterns. --Anatoli (обсудить/вклад) 04:55, 18 June 2013 (UTC)

That's it really. Seems uncontroversial enough so I'll get on with it soon unless someone objects. Mglovesfun (talk) 11:28, 18 June 2013 (UTC)

How do you mean "misused"? Do you mean used without explicit {{context}}? Without lang= tag? Wrong section? Used other than in a definition line? DCDuring TALK 14:02, 18 June 2013 (UTC)
In this case, the context labels are being used as grammatical labels on the headword line. —CodeCat 14:04, 18 June 2013 (UTC)
Is that a misuse? I thought that usage and grammatical labels could be applied to entire entries or individual senses. Michael Z. 2013-06-18 15:26 z
Yes, that was my understanding as well. I put it on the headword line if there are multiple definitions/translations and they all have the same transitivity. SemperBlotto (talk) 15:30, 18 June 2013 (UTC)
Should headword templates incorporate this for basic info, like (in)transitivity of verbs? Or should they be able to accept any usage or grammar label as a parameter? I suspect the most important consideration here is a consistent UI for editors. Michael Z. 2013-06-18 15:50 z
That would be good. {{fr-verb}} accepts type= (but not for (in)transitive), {{it-verb}}, {{de-verb}}, {{es-verb}} and {{pt-verb}} don't (yet) support this. SemperBlotto (talk) 16:03, 18 June 2013 (UTC)
But verbs are not intransitive or transitive. Senses of verbs are. So this information belongs on the definition lines. —CodeCat 16:14, 18 June 2013 (UTC)
And so, presumably, nouns are not masculine, feminine or neuter, countable or uncountable, only their meanings are. SemperBlotto (talk) 16:17, 18 June 2013 (UTC)
I don't understand the nuance of the meta-semantics, but if a label applies to all senses of a term, then isn't it clearer for the reader if the label is applied at the headword? I believe print dictionaries do it thus. Are our usage and grammatical labels clearly in a different class from mainly-headword labels like m, uncountable, plural only, plural ---, or superlative most ---? Michael Z. 2013-06-18 18:03 z
Because no one has done so explicitly, I'm objecting to this "fix", whether by bot or not. (I agree with Mzajac and SB.)​—msh210 (talk) 17:13, 18 June 2013 (UTC)
Semper's right on this one, in French and I believe some other Romance languages, there are verbs that can only be used transitively or only used intransitively. I picked the wrong fix. Mglovesfun (talk) 18:06, 18 June 2013 (UTC)
How about just qualifier? The templates {{transitive}}, {{intransitive}} and {{reflexive}} don't categorize anyway. Mglovesfun (talk) 18:18, 18 June 2013 (UTC)
But other context templates are also used (and also correctly) on the headword line. And {{context|transitive}} is perfectly correct there: there's no need at all to change it to {{qualifier|transitive}}: I don't see why you're calling this a "fix".​—msh210 (talk) 18:29, 18 June 2013 (UTC)
If you consider {{context}} as a definition-line only template (which I do) it is a fix. If you don't, it isn't, I grant. Mglovesfun (talk) 13:40, 19 June 2013 (UTC)
Yes, but I don't really consider (in)transitive etc. to be context labels. SemperBlotto (talk) 13:43, 19 June 2013 (UTC)
Does that mean you would support {{qualifier|transitive}} and so on? Mglovesfun (talk) 13:46, 19 June 2013 (UTC)
As I understand the implicit total proposal, we are to ignore the names of these formatting/categorizing templates and instead treat {{context}} as if it were named {{definition-line context}} and {{qualifier}} as if it were named {{inflection-line context, et al}}. DCDuring TALK 15:48, 19 June 2013 (UTC)
Well, using them interchangeably would defeat the object of having two templates, not one. Mglovesfun (talk) 21:21, 19 June 2013 (UTC)
No one's saying they're interchangeable. {{qualifier}} — q.v.! — qualifies a synonym or relterm or the like with a register or region or the like. {{context}} does the same for definitions. That's per the template documentation and accepted practice. Obviously (from this conversation), I'm not the only editor who's extended {{context}} to apply not only to single definitions but to blocks of definitions — by putting context labels atop definition lists, on headword lines. That certainly seems more reasonable than applying a template to headword lines that's meant for relterms.​—msh210 (talk) 22:04, 19 June 2013 (UTC)

Okay, this is really confusing, and some of the conversation doesn't even make sense to me. The problem starts with "context labels are being used as grammatical labels." I don't know what that means. As far as I know, we have these kinds of labels:

  • Usage labels, including subject-area labels, using {{context}}. These indicate that a term, a sense of a term, or a spelling is restricted in usage to a particular period, genre, technical subject, region, social situation, or other. Some usage labels qualify others, like chiefly.
  • Grammatical labels, which indicate a grammatical quality of a term or sense. They are applied by headword templates (e.g., m, f, n, pl, sing) or using {{context}} in either the headword line or a sense line. They have nothing to do with "context," and the term "grammatical context label" appears to be nonsense. When a grammatical label is added to a headword line, it looks awkward because that usually creates two adjacent sets of round brackets.
  • Indicator labels, merely indicating which particular sense of a term is being referred to. They appear in the header of a translation section, or next to a linked term in a list, and consist of a concise gloss of the definition, or a copy of a sense's usage or grammatical label. They are sometimes enclosed by {{qualifier}} or {{sense}}, but I'm not clear on the difference (apparently they both "qualify").

Do we all see it the same way? Does anyone have a substantially different picture of all this? Did I miss anything?

If we don't get straight what we are talking about, then we can't understand what it is, or what we are saying to each other. I recommend we stop saying "context" at all. Michael Z. 2013-06-19 22:12 z

There are numerous instances in the entries for English polysemic terms of sense-specific information, such as concerning complements, which has, for some time, possibly ab ovo, used {{context}} to avoid multiple sets of parentheses and diverging styles of formatting for such information. We also have semantic scope, eg "(of animals)", indicated in the same way, for the same reasons. I am a little concerned that, at this late date, the legacy uses of {{context}} come as a surprise to anyone implementing what I'd supposed was an updating and performance improvement of the templates, not a simplifying reversion of capabilities. DCDuring TALK 23:16, 19 June 2013 (UTC)
Don't be surprised. CC is necessarily doing a lot of cleanup, and that requires dealing with every possible edge case, and ambiguous or rare situation. Admirable, considering that we've never been able to agree on what "context" means. So let's keep trying. Michael Z. 2013-06-19 23:34 z
Well, Conrad had a system of monitoring the counts of words used within {{context}} that did not have a specific template associated with them, which was used as a basis for creating new templates. Has that fallen into disuse, like so much of our infrastructure? It would seem that such a system would provide useful data about the needs that an improved "context" system would have to meet. DCDuring TALK 23:48, 19 June 2013 (UTC)
@Michael: the difference between {{sense}} and {{qualifier}} is that sense goes before words in ===Synonyms=== and ===Antonyms=== sections and indicates which sense (of the entry one is on) the following word is a synonym/antonym of. {{qualifier}} often goes after words and tells how their usage is restricted, although in some lists (e.g. of alternative forms), it's placed at the front of the list rather than the end, to better indicate that it applies to the whole list. For example, in the synonyms section of [[iron]], there's this: * {{sense|tool for pressing clothing}} [[flatiron]] {{qualifier|old-fashioned}}, [[smoothing iron]] {{qualifier|old-fashioned}}. That {{sense}} is intended to be followed by things rather than to follow them is evident from the colon that comes after the closing parentheses of the text it produces. - -sche (discuss) 00:30, 20 June 2013 (UTC)
Thank you. We should try to use the same text in {{sense}} and in translation section headers. E.g., in iron#Synonyms: "strong of will, inflexible," and "made of the metal iron," but in iron#Translations: "strong, inflexible," "made of iron." Michael Z. 2013-06-20 02:07 z
Why not make it fit in with {{senseid}} so that they share glosses? —CodeCat 02:15, 20 June 2013 (UTC)
Wiktionary:Context labels and Template:qualifier/documentation seem to clear this up nicely. Mglovesfun (talk) 11:10, 22 June 2013 (UTC)

Wikidata

There's a brand new draft proposal for support for the Wiktionaries from Wikidata: [4]. This one is different from previous proposals and it is quite concise so I urge everyone to take a look. --Haplology (talk) 17:49, 19 June 2013 (UTC)

A few questions:
  1. Can this be enforced upon English Wiktionary regardless of local community's consent?
  2. Are editors from all of the Wiktionaries supposed to get involved to iron out the flaws before this feature gets activated?
  3. Will importation of Wiktionary data to WikiData necessarily involve elimination of what is perceived as a "duplicate" from the main project, similar to what has been done with explicit interwikis on Wikipedia? Does that also imply that any kind of future editing/restructuring of content imported thusly will be taking place not on Wiktionary, but on a related WikiData page? --Ivan Štambuk (talk) 22:44, 19 June 2013 (UTC)
It seems to be a "it's-there-if-you-want-it" (sorry for all the hyphens, I know) approach: "The Wiktionaries would be able to access the data about words and meanings (and also items, actually, for what it's worth) through Lua. It would be completely up to the communities of how they want to use Wikidata data in their Wiktionaries." --Haplology (talk) 02:45, 20 June 2013 (UTC)
  1. As Haplology said, you'll be able to use it. Whether you do use it is up to you.
  2. If you want it to be great and useful for you then yes please help iron out all the issues with it.
  3. That'd be up to the local community to decide. --Lydia Pintscher (WMDE) (talk) 12:55, 20 June 2013 (UTC)

I, for one, think it would be nifty if we used Wikidata as the handy repository for all our transwiki linking needs. It would be much easier than manually adding every new link from a new language to every article in which it belonged. bd2412 T 20:32, 20 June 2013 (UTC)

Transwikis are the one item I've seen proposed that might actually be useful, but they're a long way from making it practical for Wiktionary. --EncycloPetey (talk) 21:45, 24 June 2013 (UTC)

Bad section nesting in template documentation

I just changed Help:Documenting templates and modules and Template:documentation/preloadTemplate to use level-two (==) section headings for "Usage" and (in the second case) "See also", since these sections should immediately follow the page title (a level-one header) on both the /documentation subpage and the template pages themselves. The third level would have been correct if the documentation was preceded by a level-two "Documentation" section heading, but (unless I have missed something important) that's not how it's being done. This has left a great number of template documentation subpages (and template pages into which they are transcluded) with badly nested section headings. (For example, Template:sense/documentation contains a level-three "Usage" section followed by 2 level-two sections. Madness! [g]) How should we fix this problem? A bot? Can someone at least generate a list of pages containing badly nested sections (only those of the form "Template:*/doc" or "Template:*/documentation", for what I'm talking about)? - dcljr (talk) 04:08, 21 June 2013 (UTC)
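As a starting point for generating such a list, here is a minimal Python sketch of the check itself. It assumes the wikitext of each documentation page has already been fetched somehow (from a dump or otherwise); the function name, the sample section titles and the rule "a heading may be at most one level deeper than the one before it" are my own assumptions about what counts as badly nested.

```python
import re
from typing import List, Tuple

# A wikitext heading line: the same run of "=" on both sides of the title.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def badly_nested(wikitext: str, top_level: int = 2) -> List[Tuple[int, str]]:
    """Return (level, title) for every heading that skips a level.

    The first heading is measured against the page title, taken here to be
    one level above `top_level` (i.e. level one by default).
    """
    problems = []
    previous = top_level - 1
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous + 1:
            problems.append((level, match.group(2)))
        previous = level
    return problems


# Invented page in the shape described above: a level-three "Usage"
# section followed by two level-two sections.
page = "===Usage===\nSome text.\n==Examples==\n==See also=="
print(badly_nested(page))
```

A bot could run this over every "Template:*/doc" and "Template:*/documentation" page and report those where the result is non-empty; whether to fix them automatically is a separate question.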

How should the "new" context labels work? Please discuss!

Almost all uses of {{context}} now have a language code specified. So the first stage is done and we can now start looking at further questions:

Name of the template

What do we want to call the new template? Do we want to keep the name {{context}}, or use {{label}}, or something else? We can also use one name as the main name, but another name as a shortcut for convenience. Because the language code templates are now mostly orphaned, we can use any of their names as well, so we have a lot more choices like {{lb}} or {{lbl}}. We also have a module to replace the gender templates, so we could also re-use {{c}} if we want to.

In the original proposal I made, I wanted to make the language code the first parameter of the template rather than named like it is now. So {{context|...|lang=en}} would become {{context|en|...}} (or whatever name we use for the template). To ease the transition, it may be beneficial to choose a different name for the new template, so that {{context}} still takes the same language parameter it always did, while the new template takes the new numbered parameter. On the other hand, we could also just convert {{context}} to work both ways for a while until everyone becomes accustomed to the new method. We can say "use lang= if it's present, otherwise use the first parameter". {{prefixcat}} also works that way currently. —CodeCat 19:54, 21 June 2013 (UTC)
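The transitional rule "use lang= if it's present, otherwise use the first parameter" is simple enough to sketch. This is a Python model rather than template or Lua code, and the function name and the way parameters are passed in are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def split_params(positional: List[str],
                 named: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (language code, labels) under the transitional rule.

    If lang= is given, every positional parameter is a label (old style).
    Otherwise the first positional parameter is the language code and the
    rest are labels (new style).
    """
    lang = named.get("lang")
    if lang:
        return lang, list(positional)
    if positional:
        return positional[0], list(positional[1:])
    return None, []


# Old style, {{context|informal|lang=en}}:
print(split_params(["informal"], {"lang": "en"}))
# New style, {{context|en|informal}}:
print(split_params(["en", "informal"], {}))
```

One thing the sketch makes visible: under this rule an old-style call that forgot lang= would have its first label silently read as a language code, so some validation of the code against known languages would probably be wanted.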

Please, not "context." It doesn't accurately describe usage labels. It is confusing. It has nothing to do with grammatical labels. No one has ever explained what "context" means in this context.
"Label" is fine. We are using this template for two rather different types of labels for terms and senses.
What if we had two different templates, {{label-usage}} or {{use}}, and {{label-grammar}} or {{gram}}? This might keep everything clearly ordered in the entry for readers, help separate the functions for editors, and facilitate categorization, and provide a sensible way to split up the database of labels. Michael Z. 2013-06-23 23:54 z
That last proposal would mean showing two separate pairs of brackets whenever an entry has both a usage and a grammar label. Is that what you want? —CodeCat 00:39, 24 June 2013 (UTC)
That could be improved. We already show two sets of brackets when a label is placed after a headword template, which doesn't look right. Grammatical labels should probably be rolled into headword templates, so they appear within those brackets. That's on the "later" list. Michael Z. 2013-06-24 03:22 z

Internal structure of the Lua data module

This will need some careful consideration because it will affect how flexible things become. We will want to keep the flexibility of our current system at the very least. That means the following things:

  • Unrecognised/undefined labels should be shown as given. But what if someone creates a new recognised label that just happens to be already used in another entry? How do we find out which entries use a certain label?
  • There should be support for "modifier" labels which cause the following comma to be omitted, such as _, and, or. It may be possible to extend this system in Lua so that it allows any label to "modify" the label that follows it in any way we choose. For example, there are many entries that specify "with dative" and "with accusative" and so on. We could make a label "preposition with" that automatically categorises the entry based on the label that follows it, so that {{context|preposition with|dative|lang=de}} adds the category Category:German prepositions that take the dative or something similar.
  • Multiple labels should be able to be treated as aliases, so that we can use multiple names for the same underlying label. "law" and "legal" should be the same, for example. We currently use redirects for this purpose, but we will need to find a substitute in Lua because it doesn't have redirects as such. I think it would be good to have one table to contain the actual labels, and another to contain aliases. That way we can keep them clearly separate while making it easy (programming-wise) for a module to find the "canonical" name of any alias.
  • Labels currently allow one topical category (en:Foos), one grammatical category (English foos), one regional category (Fooish English) and one "bare" category (Foo). We could keep this more or less the same, but it might also be desirable to allow more than one category for a single label. We could also decide to remove the distinction between the types of category, and specify the categories (in the Lua data module) as something like {{{lang}}}:Foos or {{{langname}}} foos, which gives us more freedom to format the category names the way we like.
  • We may want to make it possible for a single label to "expand" to multiple sub-labels. For example, {{ambitransitive}} is really two labels in one. This will add some complexity, so we can also decide that it's not worth it and simply encode the few labels that need this as if they were really one label that contains a comma (like now).

CodeCat 19:54, 21 June 2013 (UTC)
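
To make the alias and modifier points above more concrete, here is a minimal sketch in Python (not Lua, purely for illustration). The names `LABELS`, `ALIASES`, `MODIFIERS`, `canonical_label` and `format_labels` are hypothetical and do not exist in any module. The sketch assumes that a modifier is joined to its neighbours by plain spaces and that `_` shows no text of its own.

```python
# Hypothetical data tables; a real Lua data module could be laid out differently.
LABELS = {
    "legal": {"display": "legal"},
    "slang": {"display": "slang"},
}
# Aliases live in their own table and point at canonical label names.
ALIASES = {"law": "legal"}
# Labels that suppress the comma next to them.
MODIFIERS = {"_", "and", "or"}


def canonical_label(name):
    """Resolve an alias to its canonical name; unknown names pass through."""
    return ALIASES.get(name, name)


def format_labels(names):
    """Render a list of labels in brackets, separated by commas.

    Assumption: a modifier is joined by spaces on both sides, and "_"
    contributes no visible text.
    """
    parts = []
    omit_comma = True  # nothing precedes the first label
    for raw in names:
        name = canonical_label(raw)
        if name in MODIFIERS:
            if name != "_":
                parts.append(" " + name)
            omit_comma = True
            continue
        # Unrecognised labels are shown as given.
        text = LABELS.get(name, {}).get("display", name)
        if parts:
            parts.append(" " if omit_comma else ", ")
        parts.append(text)
        omit_comma = False
    return "(" + "".join(parts) + ")"
```

With these tables, `format_labels(["law", "slang"])` gives `(legal, slang)` and `format_labels(["chiefly", "_", "slang"])` gives `(chiefly slang)`. Keeping aliases in a separate table means the canonical name is a single dictionary lookup away.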

Regional labels

Regional labels are, in principle, very open-ended. It can be rather cumbersome to create a label for every dialect of every language we come across. There might eventually be thousands of them, and it can become hard to manage. I have thought of a way to mitigate this, by allowing a special prefix to specify that a label is a regional label. Something like {{context|r:British}}. The module will recognise this prefix and treat it specially; it will not need a label to have a category, but it will automatically be treated as a dialectal term. An alternative to this is to use the extended "modifier" labels above for this purpose, such as a "used in region" label that is then followed by another label to specify the region name. We would need to think of a nice way to say "used in region" when there are multiple regions, though.

Even if we don't do the above, we probably also want to get rid of {{British English}} and similar labels which contain the name of the language within them. That is really redundant because there is already a language code. So instead of {{context|Northern England|lang=en}} and {{context|Northern Dutch|lang=nl}}, why not just use {{context|Northern|lang=en}} and {{context|Northern|lang=nl}}? —CodeCat 19:54, 21 June 2013 (UTC)
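
The prefix idea can be sketched as follows, in Python for illustration only. The function name `regional_category`, the `LANGUAGE_NAMES` table and the category format (region name followed by the language name) are assumptions made for this sketch, not existing module code.

```python
# Illustrative subset of language codes; the real list is much longer.
LANGUAGE_NAMES = {"en": "English", "nl": "Dutch", "fr": "French"}

REGIONAL_PREFIX = "r:"


def regional_category(label, lang):
    """Return a category for an "r:"-prefixed label, or None for other labels.

    Assumption: the category is the region name followed by the language name.
    """
    if not label.startswith(REGIONAL_PREFIX):
        return None
    region = label[len(REGIONAL_PREFIX):]
    return "Category:" + region + " " + LANGUAGE_NAMES[lang]
```

Here `regional_category("r:Northern", "nl")` would give `Category:Northern Dutch` without anyone having to define a "Northern" label first, which is the point of the prefix.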

I'm not convinced it's necessary to prefix regional labels, or that 'regional labels' is a more open-ended class than 'non-regional context labels'... one equally open-ended class of labels that comes to mind is 'temporal labels' which modify {{historical}} or other templates such as {{military}}, for terms which are used e.g. "especially [in reference to] the Vietnam era" (which is not the same as {{defdate|1955–1975}}, the label for words which fell out of use after the Vietnam War). OTOH, I'm not necessarily opposed to it. I can see how it would be beneficial to store such large categories of labels in a different module or section of the module, just to make things more übersichtlich. - -sche (discuss) 21:17, 21 June 2013 (UTC)
PS, {{context|Northern|lang=en}} ≠ {{context|Northern England|lang=en}}. {{context|Northern|lang=en}} could be "Northern United States", "Northern Canada", "Northern Australia"... and recent discussions of how to categorise 'Commonwealth' English have suggested that 'British English' may not be the same as (and/or may be worth distinguishing from) 'British'. - -sche (discuss) 21:17, 21 June 2013 (UTC)
Ok, but the current way the template works, regional labels are formed by adding the name of the language after the name of the label (possibly modified). So, {{British}} creates its category as British + English. {{context|British|lang=fr}} creates Category:British French. In that respect, having "English" at the end of the label is redundant. —CodeCat 21:48, 21 June 2013 (UTC)
But {{context|British}} displays as "(UK)" which is not always helpful. Dbfirs 21:41, 22 June 2013 (UTC)
Indeed, it's distinctly unhelpful, and I thought someone was going to update it to display as "British" instead. (We are, however, straying from the topic at hand.) - -sche (discuss) 04:32, 23 June 2013 (UTC)
I thought so too, but nothing happened. I'm tempted to remove the "|UK" from label=British English|UK, but I don't know what effects it might have elsewhere, so I haven't tried it. Where are our template experts? (Apologies for straying off topic). Dbfirs 07:12, 23 June 2013 (UTC)
British and UK aren't the same thing, per pretty much everyone. Mglovesfun (talk) 09:06, 24 June 2013 (UTC)
Regional labels can't be completely separated from other usage labels. They overlap with socio-cultural, socio-ethnic, temporal, media type, and other usage. Overt examples include African American Vernacular English, British spelling (specific to written language), Helsinki slang, Multicultural London English. Many regionalisms are directly a result of other factors, like politics and administration, cultural and religious history, etc.
Most regional usage labels are naturally organized as a hierarchy, and the categorizing can be limited to a lower level. For example, even though jambuster is used in Manitoba and northwestern Ontario, the very-specific label is designed to classify it as Canadian English.
Ideally, each label or combination would carry a specific definition. For example, southern + US isn't just the southern half of the USA – it is the South, whose particular boundaries and definition result from its history. Michael Z. 2013-06-24 14:46 z

Automatic topical categorisation

Something we may consider for the more distant future is automatic recognition of topical labels. Our repertoire of topics is determined by {{topic cat}}, which we probably want to Lua-cise as well at some point. There is no reason that its data module could not be shared, though. We could use that module for labels, so that any label that matches a topical category name will automatically be categorised in that category. {{context|clothing|lang=en}} would then automatically add the entry to Category:en:Clothing if "clothing" is recognised as a valid topic by {{topic cat}}. This won't happen anytime soon but it is something we can consider. —CodeCat 20:00, 21 June 2013 (UTC)
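
A rough sketch of that lookup, in Python for illustration. The `TOPICS` set stands in for whatever data {{topic cat}} would share, and `topical_category` is a hypothetical name.

```python
# Hypothetical stand-in for the shared topic data.
TOPICS = {"clothing", "law"}


def topical_category(label, lang):
    """Return the topical category for a label that names a known topic.

    Labels that do not match a topic return None and would be handled
    by the ordinary label data instead.
    """
    if label.lower() not in TOPICS:
        return None
    # Category names capitalise the topic: "clothing" -> "Clothing".
    topic = label[0].upper() + label[1:]
    return "Category:" + lang + ":" + topic
```

So `topical_category("clothing", "en")` would return `Category:en:Clothing`, with no entry for "clothing" needed in the label data itself.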

New l

There's an upgraded version of {{l}} here, it has more features and is faster. The template is backward compatible, so nothing will break if we replace l's code with it, and I hereby suggest to do this. Any thoughts are welcomed. --Z 09:34, 22 June 2013 (UTC) (edited --Z 10:40, 22 June 2013 (UTC))

There's a script error at the bottom; everything else looks bloody marvelous. Mglovesfun (talk) 10:31, 22 June 2013 (UTC)
That's fixed, I don't know why it's still there, try opening Template:l/beta/documentation or click "Edit" > "Show preview". --Z 10:40, 22 June 2013 (UTC)
Bit of lag I suppose: yes it works fine now. Mglovesfun (talk) 11:08, 22 June 2013 (UTC)
I trust you mean "nothing will break if we replace the current code of {{l}} with its code" rather than the other way around. —Angr 10:36, 22 June 2013 (UTC)
Yes, thanks. --Z 10:40, 22 June 2013 (UTC)

BTW, there's a similar template for term, {{term/t}}. --Z 10:45, 22 June 2013 (UTC)

I hope we replace {{l}} and {{term}} with these excellent new templates soon. Will make editing for me much easier. --Vahag (talk) 11:02, 22 June 2013 (UTC)
The name {{term/t}} is confusing, though I suppose it's not permanent, because I assume t means translation as in {{t}}. Mglovesfun (talk) 11:12, 22 June 2013 (UTC)
Actually it stands for "temporary". :) --Z 11:16, 22 June 2013 (UTC)
What about existing terms that start with *? —CodeCat 12:01, 22 June 2013 (UTC)
Use &#42; -- {{l/beta|en|&#42;nix}} > *nix. --Z 12:11, 22 June 2013 (UTC)
Ok, so those need to be fixed first. —CodeCat 13:14, 22 June 2013 (UTC)
I took a look and there was nothing to fix, we only have 6-7 entries that start with "*", none of which are linked (and probably will be linked) by Template:l, they are English. --Z 16:58, 22 June 2013 (UTC)
Can this template be made to have the functionality of {{l-self}} in inflection tables, viz. a link that appears on its own page appears linkless and in bold instead of linked and in blue? —Angr 21:28, 22 June 2013 (UTC)
I added that to a test module, (if you are in WP:BP, go to Wiktionary:Beer parlour/2013/June to see this test correctly) {{l-self/sandbox|la|link to current title [[Wīktiōnary:Beer pārlour/2013/Jūne]]}} > link to current title Wiktiōnary:Beer pārlour/2013/Jūne but the feature makes language_link's code a bit ugly (I'm not sure where best to handle it: in language_link(), outside of it, or in a new function) so I didn't add it to Module:links. --Z 11:22, 23 June 2013 (UTC)

To update {{term}} with {{term/t}} and in a way that it takes the language code as the first parameter we first need to check for all usages of {{term}} without the "lang" parameter, and add "|lang=" to them, by bot, so that all lang parameters would have a value. --Z 13:57, 23 June 2013 (UTC)

Um... Category:term cleanup? The problem is what code to add; it's not possible for a bot to figure out what language a word is in. —CodeCat 14:53, 23 June 2013 (UTC)
I meant exactly "lang=", the value is an empty string (we can also add "und"). --Z 14:56, 23 June 2013 (UTC)
Actually it is also possible to guess the lang to some extent (for example when term is used just after etyl), but the change still needs to be checked by humans so it can be a JS tool or something. --Z 15:01, 23 June 2013 (UTC)
I'm not sure what the benefit of that would be, honestly. For the template, it doesn't matter whether the parameter is empty or just not there. And we did already try to guess the language in many cases. That category contains what is "left" after all of that. Still quite a lot to do. —CodeCat 15:15, 23 June 2013 (UTC)
The benefit would be that we can make term/t (and therefore term) backward compatible -- if the parameter "lang" is not provided, then the first parameter will be treated as the language code, otherwise it would be considered as the target page. --Z 15:20, 23 June 2013 (UTC)
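
The rule described here can be sketched like this, in Python for illustration. `parse_term_args` is a hypothetical name, and falling back to "und" for an empty "lang" is only one of the two options mentioned above.

```python
def parse_term_args(positional, named):
    """Return (lang, target) from a template call's arguments.

    positional: list of unnamed parameters, in order.
    named: dict of named parameters.
    """
    if "lang" in named:
        # Old style: {{term|target|lang=xx}}.
        # Assumption: an empty "lang=" is treated as "und".
        return named["lang"] or "und", positional[0]
    # New style: {{term|xx|target}}, language code first.
    return positional[0], positional[1]
```

Both `parse_term_args(["mot"], {"lang": "fr"})` and `parse_term_args(["fr", "mot"], {})` yield `("fr", "mot")`. The rule only works if every old-style call carries "lang", even an empty one, which is why the bot run would be needed first.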
We could also just make {{term}} call {{term/t}} with different parameters. —CodeCat 15:22, 23 June 2013 (UTC)
There's no need for that, the problem is not making term/t's parameters identical to term, we can do that right now. But I want to get rid of this horrible "lang" beside Lua-izing term. --Z 19:18, 23 June 2013 (UTC)
Excellently done. Support upgrading to l/beta. --Yair rand (talk) 21:40, 23 June 2013 (UTC)
Does it handle artificial languages ok when they belong in an appendix? Also, what happens when you specify a reconstructed language but the term is not preceded by *? This should trigger an error, ideally. —CodeCat 22:32, 23 June 2013 (UTC)
For what people are expecting from {{l}} in particular, yes it works ok. I was not sure what to do in that case (consider that a valid input and link to its appendix, or showing an error) but I think it should be considered an error, otherwise inputs would be inconsistent. I added that. --Z 07:53, 24 June 2013 (UTC)
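
A small sketch of the error check discussed here, in Python for illustration. The `RECONSTRUCTED` set is an illustrative subset and `check_reconstructed` is a hypothetical name; this is not the code in the actual module.

```python
# Illustrative subset of codes for reconstructed languages.
RECONSTRUCTED = {"ine-pro", "gem-pro"}


def check_reconstructed(lang, term):
    """Raise an error if a reconstructed-language term lacks the leading "*"."""
    if lang in RECONSTRUCTED and not term.startswith("*"):
        raise ValueError(
            "terms in reconstructed language %r must start with '*': %r"
            % (lang, term)
        )
    return term
```

Treating the missing asterisk as an error, rather than silently linking to the appendix, keeps the accepted inputs consistent.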

A code for Vulgar Latin

We have been creating more entries for reconstructed Vulgar Latin lately, but we still use the code "VL." for it. That is rather inconvenient and somewhat inconsistent as we are really treating it as a distinct language with its own name and language header. So I propose that we create a separate language code for this language. An obvious candidate would be "roa-pro", but other ideas are also welcome. —CodeCat 13:42, 22 June 2013 (UTC)

But it was contemporaneous with Classical Latin and spoken, in some dialect, in all the places where Classical Latin was and shared most grammar and a great deal of the vocabulary. And they shared the same army. How is it a separate language? DCDuring TALK 16:43, 22 June 2013 (UTC)
It's a separate language because we treat it as one. This request is a practical consideration, not a theoretical one. —CodeCat 16:50, 22 June 2013 (UTC)
I was thinking maybe we should go the other way; turn all the headers into ==Latin== and use {{context|Vulgar Latin|lang=la}} in the entries. Mglovesfun (talk) 17:04, 22 June 2013 (UTC)
That's also possible, but we probably don't want to use the same inflection tables for those entries, because use of cases and verb forms was somewhat different in Vulgar Latin, as were the endings themselves. The ablative case was no longer distinct for example, except maybe as a relic formation, and genitive and dative were merging or had merged. Some Romance pronouns also descend from case forms that were created analogically within VL and never made it into writing, such as French lui. There is more at Wiktionary:AVL. —CodeCat 17:29, 22 June 2013 (UTC)
I'd prefer to see it all under a single language header of ==Latin==, but do acknowledge the need for some different templates, almost as if it were a separate language from Classical Latin. The problem is that one can't completely separate the two, so that for example, some words in Classical Latin have a different inflection under Vulgar Latin. Do we then duplicate the contents of the Classical Latin under Vulgar Latin, only with different pronunciation and inflection? Wouldn't it be simpler to have a Vulgar Latin inflection table added? Further, it opens a can of worms concerning Ecclesiastical, Medieval, Renaissance, and Modern Latin, which also have differences from the Classical. Unless a proposal deals with the gamut of the language, I don't think I could see treating ==Vulgar Latin== entries as particularly feasible. --EncycloPetey (talk) 21:52, 24 June 2013 (UTC)

You know

I'm very afraid to think that the only moderator of the Korean Wiktionary is missing for quite a long time. Besides there are only two main contributors of the Korean Wiktionary and this includes me. --KoreanQuoter (talk) 18:32, 22 June 2013 (UTC)

Is there anything there that needs to be done that requires sysop rights? -- Liliana 18:37, 22 June 2013 (UTC)
Not at the moment but the templates need extensive updates. I don't know but the Korean Wiktionary has been stuck in the same state since 2011. Let's not forget that the whole Korean Wikipedia community is disintegrating due to some members most likely suing each other under the Korean laws. --KoreanQuoter (talk) 18:59, 22 June 2013 (UTC)
Haha wow. Is it really that bad over there? I am really interested in the background and how it came to that.
It just so happens that I do have sysop powers at Korean Wiktionary, though I'm not too fond of using them. But if there's anything urgent then I guess it's okay. -- Liliana 19:59, 22 June 2013 (UTC)

Do we want "imperfective" and "perfective" to be treated as genders?

I noticed that some translations of verbs into Russian use imperfective and perfective labels as the second parameter of {{t}}, indicating that they are treated like genders. This is not our usual practice, but is it something that we want to adopt? There is no technical restriction against it; we can add "impf" and "pf" to the list of valid gender codes in Module:gender and number and then these codes will work fine. So this is more a question of, do we want it that way? I am a bit unsure about it myself. While on one hand these labels are very useful for the Slavic languages (and maybe others as well), there is the danger that we might end up extending this into other types of verbs like frequentative, durative, stative, inchoative, causative and so on. And I'm not sure if we want to indicate just any arbitrary verb type on a translation or headword line. It could become quite messy if we did that. —CodeCat 20:51, 22 June 2013 (UTC)

I noticed that this method was used and I think that's a good idea to add impf. and pf. directly into {{t}}. (for some reason the second one is always with a dot, as in pf.). The problem is only that the template doesn't allow both impf. and pf., so such verbs are marked impf. / pf., which User:Kephir/gadgets/xte currently doesn't like. I'd like this to be adopted. It would definitely benefit all Slavic languages and seemingly Georgian. I don't know if any other language groups have imperfective/perfective pairs of verbs. All Slavic verbs are either imperfective, perfective or both (a smaller number), which affects their usage (sometimes complicated) and grammar (e.g. perfective verbs don't have a present tense).
I don't see any other verb labels being overused. Transitive and intransitive are usually marked with {{qualifier}}. Japanese causative verbs link to their normal lemma forms (often a noun). Slavic abstract and concrete verbs are only a small group of verbs, so they are also marked with {{qualifier}}. --Anatoli (обсудить/вклад) 22:45, 24 June 2013 (UTC)
Ok, I have added the gender codes "impf" and "pf" (without the dot), so they can now be used anywhere a gender can. You can also combine them with the other codes in silly ways like "m-p-impf" or "pr-pf-d" but of course you're not supposed to do that... —CodeCat 22:54, 24 June 2013 (UTC)
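
For illustration, here is how hyphen-combined codes such as "m-p-impf" might be split and checked, sketched in Python. The `VALID_CODES` set is an assumed subset and `parse_gender_spec` is a hypothetical name; the actual Module:gender and number may work differently.

```python
# Assumed subset of valid codes, including the two new aspect codes.
VALID_CODES = {"m", "f", "n", "c", "s", "d", "p", "pr", "impf", "pf"}


def parse_gender_spec(spec):
    """Split a specification like "m-p-impf" into its individual codes.

    Raises ValueError if any part is not a recognised code.
    """
    codes = spec.split("-")
    unknown = [code for code in codes if code not in VALID_CODES]
    if unknown:
        raise ValueError("unrecognised gender code(s): " + ", ".join(unknown))
    return codes
```

Note that a check like this accepts "pr-pf-d" just as readily as "impf": it validates each part on its own and does not judge whether the combination makes sense.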

(Sorry for writing in English. You can translate the proposal.)

Should X!'s edit counter retain the opt-in requirement? Your input is strongly encouraged. Voice your input here.——cyberpower ChatAutomation 04:22, 23 June 2013 (UTC)

Distributed via Global message delivery. (Wrong page? Fix here.)
No need to apologize, English is ok here. Mglovesfun (talk) 09:04, 24 June 2013 (UTC)

Can we indicate on the main page the number of English words defined?

Currently, the main page says:

Wiktionary, the free dictionary
3,444,617 entries with English definitions from over 500 languages

I understand this to mean that the 3,444,617 entries are entries for words in all different languages. I think it would be nice to indicate how many entries we have that define English words, i.e. adding text saying, "including 1,234,567 definitions of English words". Cheers! bd2412 T 18:46, 24 June 2013 (UTC)

Actually it's 3,444,617 main namespace pages, each of which can have many entries. If we add the number of English words, it would have to be updated manually. — Ungoliant (Falai) 18:57, 24 June 2013 (UTC)
We could have the dumps processed to count the lines in each Language's PoS sections starting with "#" (and not "#:" or "#*") and attempt to eliminate "form of" type definitions. That would yield "definitions", not lemmas, by language. If we use the dumps and keep growing, then we could honestly say something like "more than X,XXX,000 English definitions of English words" and update it periodically (monthly, quarterly?). Changes in the count between periods would be another way to monitor activity. We could express gratitude to contributors in obscure languages, note unsanctioned reductions, etc. Whether it's worth the cycles and effort I don't know. DCDuring TALK 20:05, 24 June 2013 (UTC)
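
A rough sketch of that count for the wikitext of a single language section, in Python. The function name `count_definitions` is hypothetical, and the "form of" filter is only a heuristic (any template whose name ends in " of"), so it would both miss and wrongly exclude some lines. Splitting a dump into language sections is not shown.

```python
import re

# Heuristic: a template whose name ends in " of", e.g. {{plural of|...}}.
FORM_OF = re.compile(r"\{\{[^{}|]*\bof\|")


def count_definitions(section_text):
    """Count definition lines in the wikitext of one language section.

    A definition line starts with "#" but not with "#:" (usage example)
    or "#*" (quotation). Lines that look like form-of definitions are skipped.
    """
    count = 0
    for line in section_text.splitlines():
        if not line.startswith("#"):
            continue
        if line.startswith(("#:", "#*")):
            continue
        if FORM_OF.search(line):
            continue
        count += 1
    return count
```

Run over every English section in a dump and summed, this would give the kind of "more than X,XXX,000 definitions" figure suggested above, to be refreshed with each new dump.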
Actually, I wasn't really thinking of senses, just words. "Set" has a large number of senses, but it is still a single "word". I was thinking that it would be nice to show how many English words we have definitions for, irrespective of the number of definitions per word. Of course, given the number of words with many senses (and words with senses in many languages), I'm sure that if we were to count all of the actual definitions here, that would put us in the tens of millions. bd2412 T 20:29, 24 June 2013 (UTC)
I don't see how we are not misleading folks when we say a form-of entry has a "definition". What we say has the sound of marketingspeak. Furthermore, the proportion of English L2 sections that have multiple definitions is less than 10% of the total, even excluding English form-of entries, so we cannot assume that we have "tens of millions" of definitions, if we exclude form-of entries. We can't really use "lemmas" and expect normal folks to understand. We could possibly count each L2 section as an "entry" without being misleading, except for the form-of problem. It seems to me that any honest count requires some work. If we only counted carefully once a year, but also counted some percentage increase of some meaningful measures of overall size since that last count, we would not be misleading folks and convey some idea of continued growth.
None of this gets to the real problem of quality, which is probably more important to keep users coming back, especially in the competitive areas, such as against English online monolingual dictionaries, where we do not excel. DCDuring TALK 22:59, 24 June 2013 (UTC)

Templates and categories for passive infinitives?

I've recently created templates and categories for (past and present, active are still to be done) following the Bulgarian way, using template boiler. What would be the correct way to categorise Russian passive infinitives (they are some of the Russian reflexive verbs with the suffix "-ся/-сь", e.g. делаться (passive infinitive of делать) or нестись (passive infinitive of нести))? Can someone help create a correct category boiler? --Anatoli (обсудить/вклад) 22:55, 24 June 2013 (UTC)
