Wiktionary:Beer parlour/2013/July

Jul 21st 2013, 23:38, by CodeCat

Revision as of 23:25, 21 July 2013 (Ivan Štambuk) compared with the latest revision as of 23:38, 21 July 2013 (CodeCat), line 594:
:::: Can you elaborate? {{User:CodeCat/signature}} 22:36, 21 July 2013 (UTC)
::::: Since you asked, CodeCat, you seem to have an agenda that everything must be made to work differently to how it works now, even things which work perfectly well. It worries me that a lot of good infrastructure will be thrown away for your personal reasons (whatever they are) and not for the good of the wiki. [[User:Mglovesfun|Mglovesfun]] ([[User talk:Mglovesfun|talk]]) 22:46, 21 July 2013 (UTC)
:::::: Maybe I should try to explain my reasons then. I am primarily concerned with consistency and with making things work in the most intuitive and sensible way. Some people, like DCDuring, complain about template-itis, and I do agree that the sheer number of templates we have is rather confusing. However, I argue that the confusion stems from how they all work differently from one another. If they all worked similarly, newcomers would carry less of a mental burden, because they would not need to learn every slight difference between all the templates, category names and so on. Instead they would be able to actually get things done, because it would be easier to remember how all of their tools work. In this particular case, try typing {{temp|term|tr{{=}}something|lang{{=}}ru}} in an entry. It will add the page to [[:Category:Russian entries which need Cyrillic script]]. Can you see what is wrong with that? That category name will be the same even if you put it in a German entry. So the "Russian entries" part is incorrect; it should be "Russian terms". And you can increase consistency further by changing "which need" to the more usual "needing" (which is used in a lot more cleanup categories). The goal is to work "minority" conventions out of the system in favour of the majority, so that people no longer have to think "was it this category that was called 'which needs' or was it that other one?". It will always be "needing", no more question needed, which leaves more mental room for questions that actually matter.
:::::: The other half of the reason is that I don't feel that the current structure of these categories makes sense. They may have made sense at one point, but there always comes a time when you need to re-evaluate things that you once took for granted and judge whether they really make as much sense as you thought they did. Let's say that the code above was placed on a German entry, and I am a Russian speaker who wants to fix any links to Russian terms that need the Russian spelling. Where do I go? Well, the place to start is [[:Category:Requests (Russian)]], but that category is already a horrible mess. Let's disregard that for a moment and assume that I make it to [[:Category:Russian terms needing attention]] (although there is nothing particularly intuitive about that category, since plenty of request categories are placed elsewhere in the tree). Then I see [[:Category:Russian entries which need Cyrillic script]]. Aha, I think, that is what I am looking for, so I work through that category. But then later on, I come across its second parent category, [[:Category:Entries which need Cyrillic script]], and I find more Russian entries there. Why were those not listed in the first category? Why do I need to look in two categories to fulfill essentially the same task? Request categories should be task-oriented, so that is bad organisation, and it's what I am trying to eliminate with this proposal. Under this proposal, [[:Category:Terms needing native script by language]] would contain subcategories organised ''by language'', including [[:Category:Russian terms needing native script]], and both [[:Category:Entries which need Cyrillic script]] and [[:Category:Entries needing various scripts]] would disappear. There would also be no [[:Category:Russian terms needing Cyrillic script]], because Russian terms are always in Cyrillic; the script is redundant. Maybe an exception could be made for those few languages that use multiple scripts, but for the majority of languages, the script is completely irrelevant to the task. People who can write in Russian don't go adding "Cyrillic" spellings to entries; they add Russian spellings and don't care about Ukrainian or Kazakh spellings, because they don't know those. {{User:CodeCat/signature}} 23:38, 21 July 2013 (UTC)
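CodeCat's proposal keys the cleanup category on the language of the linked term rather than on the entry it appears in, and it drops the script from the name. The following is a minimal Python sketch of that naming rule. `LANG_NAMES`, `native_script_category` and `categories_for_term_link` are hypothetical names. Wiktionary's real logic lives in its Lua templates and modules, which this thread does not show.

```python
from typing import Dict, List

# Hypothetical code-to-name table; Wiktionary keeps its real one in its own modules.
LANG_NAMES: Dict[str, str] = {"ru": "Russian", "de": "German", "uk": "Ukrainian"}

# Proposed umbrella category holding one subcategory per language.
PARENT_CATEGORY = "Terms needing native script by language"


def native_script_category(term_lang: str) -> str:
    """Name the per-language cleanup category; the script is deliberately omitted."""
    return f"{LANG_NAMES[term_lang]} terms needing native script"


def categories_for_term_link(term: str, tr: str, lang: str) -> List[str]:
    """Categories a {{term}}-style link would add.

    Only the term's own lang= value matters, so the result is the same
    whether the link sits in a German entry or a Russian one.
    """
    if not term and tr:  # transliteration given, native spelling missing
        return [native_script_category(lang)]
    return []
```

For example, `categories_for_term_link("", "something", "ru")` yields the single category "Russian terms needing native script", whichever entry the link is placed in.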


Latest revision as of 23:38, 21 July 2013


Alternative Wikidata proposal

Hi! A colleague and I have prepared an alternative proposal for Wiktionary support on Wikidata. It tries to address several problems that the former proposal had (see its comments). --Micru (talk) 15:15, 1 July 2013 (UTC)

I was under the impression that this vote had been effectively superseded by Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content, but Liliana and CodeCat have apparently decided that it's worth holding. Please either chime in and vote, or discuss whether this *was* actually superseded and should be shelved. -- Eiríkr Útlendi │ Tala við mig 20:21, 1 July 2013 (UTC)

The point I raised on that page is that these templates violate WT:ELE. So we should either modify that page to allow {{ja-romaji}} to take the place of a definition line (which requires a vote, like all significant changes to WT:ELE), or we should not use this template in its current form. —CodeCat 20:25, 1 July 2013 (UTC)

As I've pointed out before, in direct reply to you, WT:ELE makes no such explicit requirement. I can see how someone might read it that way, but the text of the relevant WT:ELE section is not explicit that the wikicode needs a #. -- Eiríkr Útlendi │ Tala við mig 20:36, 1 July 2013 (UTC)

My reading of CFI is the same as CodeCat's. ---Dan Polansky (talk) 20:38, 1 July 2013 (UTC)

It was not superseded. For one thing, Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content does not raise the issue of whether the wikitext should contain a definition line. For another thing, even if it did, a completed vote cannot supersede a vote that is just being started. --Dan Polansky (talk) 20:38, 1 July 2013 (UTC)
It looks like that vote doesn't even mention {{ja-romaji}} at all, even though it's the key to this whole discussion. So it doesn't seem very relevant. Besides, the vote did not pass, so it's moot. —CodeCat 20:42, 1 July 2013 (UTC)
This bifurcated discussion feels somewhat schizophrenic. I've replied to Dan on the vote talk page; where shall we continue discussion? Ideally in one location. -- Eiríkr Útlendi │ Tala við mig 20:46, 1 July 2013 (UTC)

Eirikr, should editors working with Japanese stop maintaining/adding romaji entries entirely and focus on kana and kanji? Convert romaji entries that have only one value into redirects, and let opponents maintain romaji if they oppose the new structure so badly and refuse to listen to editors who actually contribute in the Japanese space. It's quite discouraging and frustrating. Just a thought. Otherwise, I strongly oppose yet another proposal to undermine our efforts. --Anatoli (обсудить/вклад) 22:49, 1 July 2013 (UTC)

Actually, Anatoli, that might be exactly the answer. :)
@CodeCat, you've put together your module for deriving romaji from a given kana string, as used by {{ja-suru}}, for instance. Could you and/or Liliana leverage that to undertake the bot-driven creation and maintenance of JA romaji entries, in accordance with whatever the outcome is of the current formatting vote? Such entries are purely simple links with no gloss; theoretically, no human need intervene, once a bot is up and running. I've toyed with the idea of getting a bot going for this purpose before, but with the state of JA romaji entries in such flux, and with everything else going on in life, I haven't gotten around to learning everything required to make a bot.

FWIW, my opposition is entirely because this change impacts JA editors almost exclusively, and because this requires changes in what human editors do, as romaji entries are currently entirely human-created and human-maintained. If romaji entries are instead entirely bot-maintained, my concerns evaporate, as does my opposition to the vote. -- Eiríkr Útlendi │ Tala við mig 23:03, 1 July 2013 (UTC)
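Eiríkr's bot idea rests on deriving romaji mechanically from kana. This is not CodeCat's module, which is Lua and is not reproduced in this thread. It is a hypothetical Python illustration of the core step, and the hiragana table in modified Hepburn is deliberately partial. It handles two-kana combinations such as しゃ and the small っ, which doubles the next consonant or writes t before ch. It does not handle long vowels, katakana, particle readings, or the apostrophe some systems write after ん before a vowel.

```python
from typing import Dict, List, Optional, Tuple

# Partial hiragana-to-romaji table (modified Hepburn), for illustration only.
KANA: Dict[str, str] = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "が": "ga", "ぎ": "gi", "ぐ": "gu", "げ": "ge", "ご": "go",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "ざ": "za", "じ": "ji", "ず": "zu", "ぜ": "ze", "ぞ": "zo",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "だ": "da", "ぢ": "ji", "づ": "zu", "で": "de", "ど": "do",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ば": "ba", "び": "bi", "ぶ": "bu", "べ": "be", "ぼ": "bo",
    "ぱ": "pa", "ぴ": "pi", "ぷ": "pu", "ぺ": "pe", "ぽ": "po",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "を": "o", "ん": "n",
    # Two-kana combinations (yōon); matched before single kana.
    "きゃ": "kya", "きゅ": "kyu", "きょ": "kyo",
    "しゃ": "sha", "しゅ": "shu", "しょ": "sho",
    "ちゃ": "cha", "ちゅ": "chu", "ちょ": "cho",
    "にゃ": "nya", "にゅ": "nyu", "にょ": "nyo",
    "ひゃ": "hya", "ひゅ": "hyu", "ひょ": "hyo",
    "みゃ": "mya", "みゅ": "myu", "みょ": "myo",
    "りゃ": "rya", "りゅ": "ryu", "りょ": "ryo",
    "ぎゃ": "gya", "ぎゅ": "gyu", "ぎょ": "gyo",
    "じゃ": "ja", "じゅ": "ju", "じょ": "jo",
    "びゃ": "bya", "びゅ": "byu", "びょ": "byo",
    "ぴゃ": "pya", "ぴゅ": "pyu", "ぴょ": "pyo",
}


def _syllable_at(kana: str, i: int) -> Optional[Tuple[str, int]]:
    """Longest-match lookup at position i: returns (romaji, kana consumed) or None."""
    for length in (2, 1):
        chunk = kana[i:i + length]
        if len(chunk) == length and chunk in KANA:
            return KANA[chunk], length
    return None


def kana_to_romaji(kana: str) -> str:
    out: List[str] = []
    i = 0
    while i < len(kana):
        if kana[i] == "っ":  # small tsu: geminate the following consonant
            nxt = _syllable_at(kana, i + 1)
            if nxt is None:
                raise ValueError(f"っ not followed by a supported kana in {kana!r}")
            out.append("t" if nxt[0].startswith("ch") else nxt[0][0])
            i += 1
            continue
        syllable = _syllable_at(kana, i)
        if syllable is None:
            raise ValueError(f"unsupported character {kana[i]!r} in {kana!r}")
        romaji, consumed = syllable
        out.append(romaji)
        i += consumed
    return "".join(out)
```

For instance, `kana_to_romaji("しゃしん")` gives "shashin" and `kana_to_romaji("まっちゃ")` gives "matcha". A real bot would still need the full table and whatever entry format the formatting vote settles on.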

I fundamentally disagree with the notion that this "impacts JA editors almost exclusively". I think Liliana's comment on the vote talk page is apt: you want all bot operators to code their bots to account for Japanese entries using a different basic structure than other entries. - -sche (discuss) 23:17, 1 July 2013 (UTC)

How many bots look for the specific autoformatting issues presented by {{ja-romaji}}? The *only* concrete concern I've heard about is with regard to KassadBot. If other bots are choking on {{ja-romaji}} as well, by all means, clue me in. Otherwise, Liliana's comment is purely theoretical outside of KassadBot, and wildly exaggerated to boot -- I'm not asking 999 out of 1000 computer users to switch from Windows to Linux; I'm asking a specific subset of those 999 Windows users (some much smaller number) to use a specific program when working with specific files. More in the vein of this thread, I'm not asking all bot operators to completely rewrite their bots on a completely different platform with a completely different coding paradigm; instead, I'm asking those bot operators whose bots choke on {{ja-romaji}} to add a few lines of logic to handle this specific case. -- Eiríkr Útlendi │ Tala við mig 23:26, 1 July 2013 (UTC)

A couple of things:

If you pause contributing Romaji for a while, you may create more Japanese entries that have actual content.

Now that they are content-free, Romaji entries can be created by a bot, so if humans stop creating them manually, that will actually save human effort. If the editors previously contributing Romaji are no longer enthusiastic about Romaji, new editors will appear. After all, the only thing it takes to create Romaji with a definition line in the wiki text is to expand the current template with "subst:".

Having unified formatting helps all sorts of ad-hoc reporting over a dump using tools such as grep, sed, awk, Perl one-liners and the like, regardless of whether current bots choke on non-unified formatting. Reporting over a dump is often done without a connection to a MediaWiki server and without templates being expanded. I have done such reporting and have no idea how to expand templates in a dump. Arbitrarily breaking the assumptions that such ad-hoc reporting relies on is not a good thing. I am talking from actual experience. --Dan Polansky (talk) 16:24, 3 July 2013 (UTC)
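Dan's point is that dump reports read raw, unexpanded wikitext, so they assume every language section contains at least one line starting with "#". The sketch below runs that kind of line-based check in Python rather than grep or awk. It shows how an entry whose only content is a template such as {{ja-romaji}} gets reported as lacking a definition line. It assumes the dump has already been parsed into a hypothetical `pages` dictionary mapping titles to raw wikitext, which is not shown, and the sample pages in the usage note are made up.

```python
import re
from typing import Dict, List

# A level-2 language heading such as "==Japanese==" (not "===Noun===").
L2_HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def language_sections(wikitext: str) -> Dict[str, str]:
    """Split raw page wikitext into {language name: section body}, templates unexpanded."""
    headings = list(L2_HEADING.finditer(wikitext))
    sections: Dict[str, str] = {}
    for idx, match in enumerate(headings):
        end = headings[idx + 1].start() if idx + 1 < len(headings) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end]
    return sections


def missing_definition_lines(pages: Dict[str, str], language: str) -> List[str]:
    """Titles whose `language` section has no wikitext line starting with '#'."""
    report = []
    for title, text in pages.items():
        body = language_sections(text).get(language)
        if body is not None and not any(line.startswith("#") for line in body.splitlines()):
            report.append(title)
    return sorted(report)
```

With made-up pages such as `{"suru": "==Japanese==\n===Romanization===\n{{ja-romaji}}\n", "neko": "==Japanese==\n===Noun===\n# cat\n"}`, only "suru" is reported. This is exactly the case a bot or report would need special logic for.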

Re: leaving romaji entries be for the time being, I do hope these can be bot-maintained. I will be ignoring them for the foreseeable future, other than possibly inquiring about bots.
Re: dumps, this is very useful concrete information that either wasn't presented before, or that I missed. Without this information, I am left with the impression that this issue boils down to bot maintainers versus Japanese editors, which isn't a very useful mental model. Thank you for explaining. -- Eiríkr Útlendi │ Tala við mig 18:05, 3 July 2013 (UTC)

Here is a discussion regarding these two templates and their corresponding categories; it was suggested that we bring it up at BP. This type of categorization is useful because, in many cases, the only thing we know about the source language is which of these categories of the language family it belongs to: Old, Middle, or New. Note that this is not a purely chronological division, but also a linguistic one. Similar classification exists for other language families, such as Indo-Aryan (Old Indic, Middle Indic, Modern Indic). --Z 11:00, 2 July 2013 (UTC)

If it helps others to understand the issue, think of these categories as being analogous to "modern Germanic languages" or "Celtic from the 10th to 15th century". They are genetic groupings (like a real family) but with the addition of a time frame. —CodeCat 11:54, 2 July 2013 (UTC)

Is it ok to replace gender templates with Template:g?

A week ago I made a post at Wiktionary:Beer parlour/2013/June#Propose to use a single template for genders, but it didn't receive any attention, so I was worried that if I started replacing things, others would complain. So I am now explicitly asking whether it's OK to do this. Basically, all remaining transclusions of gender templates would have {{g}} prefixed to them, so that instead of the variety of templates we currently still have ({{m}}, {{f}} and so on), there would be only this one. This template would be used only in entries themselves; in templates you'd just invoke Module:gender and number directly. This change will have some consequences for scripts and bots that still use the old templates, but the old templates will probably not be orphaned and deleted for some time, so there is no sudden rush to fix everything yet. —CodeCat 14:11, 3 July 2013 (UTC)
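The proposed bot change amounts to a mechanical rewrite of bare gender templates in entry wikitext. This is a minimal Python sketch, assuming that only parameterless calls to {{m}}, {{f}}, {{n}}, {{c}} and {{p}} are converted. Any call with extra parameters is left untouched for a human, and `prefix_with_g` is a hypothetical name, not an existing bot function.

```python
import re

# Bare gender templates only; calls with extra parameters are deliberately skipped.
BARE_GENDER_TEMPLATE = re.compile(r"\{\{(m|f|n|c|p)\}\}")


def prefix_with_g(wikitext: str) -> str:
    """Rewrite {{m}} as {{g|m}}, {{f}} as {{g|f}}, and so on."""
    return BARE_GENDER_TEMPLATE.sub(r"{{g|\1}}", wikitext)
```

For example, `prefix_with_g("'''Haus''' {{n}}")` returns `'''Haus''' {{g|n}}`. A template such as {{mul}} is not matched, because the pattern requires the closing braces immediately after the single gender letter.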

What are the consequences to users, contributors, and template writers in terms of appearance, performance (e.g., download time), keystrokes required, etc.? If there are costs, what are the offsetting benefits, and to whom?

Also, do you have a system for converting wikicoded genders, e.g. ''f'', to whatever your desired gender formatting approach is right now? DCDuring TALK 14:23, 3 July 2013 (UTC)

There are no consequences at all to users or template writers. This change applies only to cases where these templates are placed directly in entries for some reason. All that changes is that you write {{g|f}} instead of {{f}} if you want to put a gender in an entry. But even that is pretty rare, because most of the time you will use a headword-line template like {{head}} or {{fr-noun}}, which have their own support for genders; in those cases the gender templates are not placed in the entry itself but are handled internally by the template. There is nothing to be done for things like ''f'', although of course it is probably desirable to convert those to another format (like the one I propose) at some point. But finding them will be hard; it would probably need someone to go through a dump (which I'm not experienced with) to find and list all the instances. —CodeCat 14:36, 3 July 2013 (UTC)

I've made a list of all pages that use ''f'', ''m'', ''n'', ''c'', ''p'' or ''pl'': Wiktionary:Todo/Non-templatised genders. There may be uses of those strings that shouldn't be templatised, so some human inspection of them is necessary before any mass/automated changes are undertaken. - -sche (discuss) 21:26, 3 July 2013 (UTC)
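A list like -sche's can be produced with a regular-expression pass over raw page text. The Python sketch below flags italicised ''m'', ''f'', ''n'', ''c'', ''p'' and ''pl'' while ignoring '''bold''' runs. As -sche notes, the hits are only candidates for human inspection, because these strings can be legitimate italics. The `pages` input, a mapping from titles to wikitext, is an assumed stand-in for parsed dump data.

```python
import re
from typing import Dict, List, Tuple

# Two apostrophes, a gender code, two apostrophes, with no third apostrophe on either side.
BARE_GENDER = re.compile(r"(?<!')''(?:m|f|n|c|p|pl)''(?!')")


def find_untemplatised_genders(pages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (title, line) pairs to review before any automated conversion."""
    hits = []
    for title, text in sorted(pages.items()):
        for line in text.splitlines():
            if BARE_GENDER.search(line):
                hits.append((title, line))
    return hits
```

The lookbehind and lookahead stop '''m''' (bold, three apostrophes) from counting as an italic gender. Alternation backtracking lets ''pl'' match even though p is tried first.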

No consequence = 2 extra keystrokes per insertion for the manual assertions, which are there "for some reason" that is evidently beyond understanding. I am getting weary of all the extra keystrokes. Where are the changes that save effort instead of costing effort? DCDuring TALK 00:03, 4 July 2013 (UTC)

If typing is so hard for you, why don't you save yourself some keystrokes and stop nitpicking? :/ —CodeCat 00:10, 4 July 2013 (UTC)

Because God knows where this will stop. You've caused a significant waste of keystrokes with the changes to "context". If you don't get some pushback from someone, we will end up with one baroque system after another and no actual content contributors. I feel that our previous tech-side contributors seemed to take more care to not constantly change the user interface in one petty way after another. They seemed to enjoy actually making the site easier to use. DCDuring TALK 01:43, 4 July 2013 (UTC)

I understand and partially share your frustration. I supported the change to {{context}}, because the old {{context}} was and is dilapidated, and changing how it is called from entries is a first step to rewriting it. I agree that all the added keystrokes are a hurdle, though. Is there any reason I shouldn't redirect {{cx}} so that it can be used as a shortcut? (Recall, if you favour a different abbreviation/shortcut, that we can create more than one.) And can the switch from the current {{context|archaic|lang=foo}} format to the shorter {{cx|foo|archaic}} format be prioritised?

As for these gender templates... I tend to agree that {{m}} and {{f}} should be left as redirects to the module, so that they can continue to be invoked in entries as they have been. I don't see any benefit to replacing them with {{g|m}}, and I see the same drawback that you (DCDuring) see. And as much as I'd like to see {{c}} commandeered for use as a redirect to {{context}}, it would be counter-intuitive for some but not all of the gender templates to be able to be used directly vs through {{g}}, so it's probably best to leave it (and {{n}} and {{p}}) as gender templates. Indeed, if Polish is ever moved out of the way, {{pl}} could be added as a redirect to {{p}} ({{g|p}}). - -sche (discuss) 02:03, 4 July 2013 (UTC)
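The switch -sche asks to prioritise, from {{context|archaic|lang=foo}} to {{cx|foo|archaic}}, is a parameter reordering that a bot could do. The following is a hedged Python sketch. It assumes {{cx}} would take the language code as its first positional parameter, as proposed in this thread, and converts only simple calls: positional labels plus lang=. Anything with other named parameters, whitespace around "=", or nested templates is left as-is for review. `context_to_cx` is a hypothetical name.

```python
import re

# {{context|...}} with no nested templates inside its parameters.
CONTEXT = re.compile(r"\{\{context\|([^{}]*)\}\}")


def _convert(match) -> str:
    lang = None
    labels = []
    for param in match.group(1).split("|"):
        if param.startswith("lang="):
            lang = param[len("lang="):]
        elif "=" in param:
            return match.group(0)  # other named parameter: leave for a human
        else:
            labels.append(param)
    if lang is None or not labels:
        return match.group(0)  # nothing safe to reorder
    return "{{cx|" + "|".join([lang] + labels) + "}}"


def context_to_cx(wikitext: str) -> str:
    """Rewrite simple {{context|label...|lang=xx}} calls as {{cx|xx|label...}}."""
    return CONTEXT.sub(_convert, wikitext)
```

For example, `context_to_cx("# {{context|archaic|lang=fr}} foo")` returns `# {{cx|fr|archaic}} foo`, and a call without lang= is returned unchanged.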

All of these templates really contain the same code, though. Just look at what is inside {{m}} and {{f}}. Why do we need many templates that do pretty much the same thing, when this can be unified more clearly into a single template? More to the point, {{g}} can do more than {{m}} can. If you want to display "masculine animate", for example, you cannot use {{m}}, and since there is no {{m-an}} template, you need {{g|m-an}}. Thus, we can never get rid of {{g}} without sacrificing functionality. That means these templates are already shortcuts to common cases, but they can never be exhaustive, because Module:gender and number and its companion template {{g}} are just that much more flexible than the other templates can ever hope to be. —CodeCat 02:09, 4 July 2013 (UTC)
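CodeCat's argument is that {{g}} accepts composite specifications such as m-an, which no single shortcut template covers. The Python sketch below shows why a single parser scales better than one template per combination: each hyphen-separated code is looked up and the parts are joined. The code table is a hypothetical, illustrative subset. Reading "an" as animate follows the "masculine animate" example, and "in" for inanimate is an assumption. The real inventory and display format of Module:gender and number are not reproduced here.

```python
from typing import Dict

# Illustrative subset of gender/number codes; not the real module's table.
GENDER_CODES: Dict[str, str] = {
    "m": "masculine", "f": "feminine", "n": "neuter", "c": "common",
    "an": "animate", "in": "inanimate", "s": "singular", "p": "plural",
}


def describe_gender(spec: str) -> str:
    """Expand a hyphen-joined spec such as "m-an" into "masculine animate"."""
    words = []
    for code in spec.split("-"):
        if code not in GENDER_CODES:
            raise ValueError(f"unknown gender/number code {code!r} in {spec!r}")
        words.append(GENDER_CODES[code])
    return " ".join(words)
```

Adding one code to the table makes every combination containing it available at once. A shortcut template like {{m}} corresponds to only one fixed input, "m".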

So what? We often use some templates for common cases and other templates for more complex and/or obscure cases (e.g., adjective declension templates). There's no reason not to keep {{m}} and {{f}} and other very common genders' templates as shortcuts, while using {{g}} in more complex (and obscure) cases. How many entries need to specify something outside of {{head}} or another headword or linking template as "masculine animate"? Probably not many, compared to how very many specify things as "masculine". Unlike contexts, which there are a very large (arguably infinite) number of, there are only very few genders which are common across the thousand languages we cover: m, f, n, c, p (and perhaps, once we greatly expand our hitherto tiny coverage of Native American languages, animate and inanimate... but masculine animate seems comparatively rare; how many languages use it, a few dozen?). - -sche (discuss) 02:21, 4 July 2013 (UTC)

Note that the change may eventually save characters: we could reuse the old gender template names for more important templates, e.g. {{m}} for {{term}} (mentioned term), or {{c}} for {{context}}. Moreover, there is usually no need to use gender templates directly. --Z 21:19, 5 July 2013 (UTC)

I suppose that's right. The gender templates do next to nothing. If the gender annotation isn't handled by the inflection-line template or a user doesn't figure out those templates, there's always good old wikitext formatting. And AF or bot runs can pick those up. I can't imagine that we will soon have a complete user-friendly, error-trapping interface for our entries by default. We just have to make sure that we don't delete contributions just because they don't use templates or have deficient formatting. DCDuring TALK 21:56, 5 July 2013 (UTC)

I've created this template now. It works the same as {{context}} with one difference: the language code is given as the first parameter rather than as lang=. Both templates will continue to work for now, so there is no immediate need to start using this, but it's now available. I am not sure whether to call it "label" or "labels", though. The module is already called Module:labels, and this template technically shows many labels rather than just one label. —CodeCat 16:38, 4 July 2013 (UTC)

Spelling variations by introduction of repetitive vowels for emphasis

It is trivially easy to find CFI-worthy citations for certain interjections containing the addition of repetitive vowels for emphasis. For example, a writer might write pleeeease with extra e's added to indicate the plaintive nature of the plea. See, e.g.:

1965, Kenneth Theodore Mackenzie, The Deserter, page 52:
"Please, Japie, pleeeese," said Gert, beginning to cry.
1996, Ginny Russell, Step by Step, page 24:
As soon as you get home, pleeeese put clean sheets on your bed, loan your snakes to Susie, scrub bath tub, hide the liquor, and call me at work.
2008, Ray Green, The Seventh Sense, page 139:
Oh, God, please, please, pleeeese help me. I'll be good. I won't do it again. I promise. Pleeeese let this fucking market fall.
2011, John Bowers, Star Marine, page 325:
"Oh, pleeeese!" she wailed. "Oh, sweet Jesus! Don't shoot me! Pleeeese don't shoot me!
2013, Sherry Schumann, The Christmas Bracelet, page 93:
"More juice pleeeese." John grabbed another carton of orange juice from the door of the refrigerator.

I have found multiple examples with up to eleven e's (but, oddly, no CFI-worthy hits with more than that). Other very common examples include the addition of o's to "love", "no", and "stop", of e's to "help", and of u's to "you". Should we include these? If so, should we redirect them to the correct spellings, list them as alternative spellings, perhaps as eye dialect spellings indicating that their use suggests an exceptionally forceful use of the word by the speaker? Or should we do something else entirely with them? bd2412 T 19:16, 4 July 2013 (UTC)

It seems help would just about pass CFI with 12 e's. I don't think we would be doing anyone any favours (or using our time productively) by including all such forms. It is to be understood that English virtually never has 3+ of the same letter together in a word, and such instances are probably indicating a vocal prolongation, rather than trying to coin a new "word". Equinox ◑ 19:26, 4 July 2013 (UTC)

It's one of those cultural things, like pig Latin and double Dutch, that turn up in speech, but aren't really lexical in nature (although some instances of those have become lexicalized and are included). A similar phenomenon is words that are truncated to indicate interruption: "I keep trying to tell y-" "I don't want to hear it!".

This can potentially occur with any word, with varying numbers of letters. It could get pretty involved documenting all the attestable ones. Perhaps we could create redirects for some of the more common ones, but some languages have orthographies that don't write phonemic glottal stops and/or show length by repetition, so there's at least the potential for conflicts on the shorter sequences. Chuck Entz (talk) 20:13, 4 July 2013 (UTC)

What do you mean by "potential for conflicts"? Also, I see that we have nooo as an "emphatic" form of "no". bd2412 T 20:18, 4 July 2013 (UTC)

We also already have variants of aaah and hmmm. - -sche (discuss) 20:35, 4 July 2013 (UTC)

I was referring to the possibility of a redirect having the same spelling as a regular word in another language. I'm thinking of creating an appendix to document things like this in one place: word games, truncation, abbreviation, sandhi, pronunciation spellings, eye dialect, etc. There are quite a few processes that create spellings that don't show up in dictionaries, but can be hard to find information on. I'm not sure what to call it, though. "Spelling distortions" is the first that came to mind, but that's a bit clunky. Chuck Entz (talk) 20:50, 4 July 2013 (UTC)

Others of "those cultural things" include Internet leetspeak (l33t), funny accents (ze = the), and attempts to spell out dialect (wonderfool). Equinox ◑ 20:40, 4 July 2013 (UTC)

Those, we have entries for. Supposing that I want to use some of my time unproductively, perhaps we should have entries for a few very, very common examples and redirects from less common examples to those very common ones (i.e., an entry for "pleeeease" to which "pleeeeeeeeeeease" redirects). bd2412 T 20:59, 4 July 2013 (UTC)

Redirects sound like the best solution. — Ungoliant (Falai) 21:48, 4 July 2013 (UTC)

Redirects can be accompanied by a usage note on the target page indicating that some writers may add an arbitrary number of extra instances of the primary vowel for emphasis (I note that pleeeease with 4 e's gets about 40 times as many hits as pleaaaase with 4 a's). Also, it is worth noting that this seems to be a fairly modern phenomenon, largely restricted to about the last century. bd2412 T 23:00, 4 July 2013 (UTC)
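The redirect scheme discussed here can be stated as a small rule. A run of three or more identical letters marks an emphatic spelling, following Equinox's observation that English virtually never has three of the same letter in a row. Any run longer than the chosen entry's run is capped to point at that entry. The cap of 4 follows bd2412's example of "pleeeease" as the entry that longer spellings redirect to, and is an assumption rather than agreed policy. The soft-redirect rule follows -sche's suggestion for titles that are real words in another language. All function names are hypothetical.

```python
import re
from typing import Set

# Three or more of the same lowercase Latin letter in a row.
LETTER_RUN = re.compile(r"([a-z])\1{2,}")


def is_emphatic_elongation(word: str) -> bool:
    return LETTER_RUN.search(word) is not None


def redirect_target(word: str, cap: int = 4) -> str:
    """Shorten every run longer than `cap` letters to exactly `cap` letters."""
    return LETTER_RUN.sub(
        lambda m: m.group(1) * cap if len(m.group(0)) > cap else m.group(0), word
    )


def redirect_kind(title: str, other_language_titles: Set[str]) -> str:
    """Use a soft redirect when another language already has a term spelled this way."""
    return "soft" if title in other_language_titles else "hard"
```

For example, a spelling with a dozen e's, `"pl" + "e" * 12 + "ase"`, maps to "pleeeease". A word with no run of three, such as "please", is not flagged at all.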

Sounds good, and soft redirects can be used instead of hard redirects in any instances where another language has a valid term spelt the same way (e.g., if "aaaah" is a Waray-Waray word for "the foobar bird"). And the text of the usage note can be transcluded from a master template, so that the language doesn't go out of sync. (Compare the templatised usage notes that a few Latin entries use.) - -sche (discuss) 23:08, 4 July 2013 (UTC)

I like the idea of including the most common as real entries, with all the others displayed as alternative forms of that one, but existing only as redirects. There are families of these, however. For example, puhleeze, puh-leeese, puh-leaze. It would be nice if we had only a single "canonical form" for each of the families.

But, maybe we should leave the redlinks, eg, at [[please]], as training exercises for new contributors. DCDuring TALK 23:28, 4 July 2013 (UTC)

I think we need a new langcode for the Griko language of Southern Italy. It's a Hellenic lect written in the Latin script, and sometimes considered a dialect of {{el}}, although its grammar is relatively different. Generally, there is mutual intelligibility with Greek, but I think the writing system should seal the deal. I would propose a code like grk-gri.

Incidentally, there is also the matter of considering whether Calabrian Greek is a language; some authors say it is, but it is still written in the Greek script and seems a bit less changed to me, so I'm less sure on this one. —Μετάknowledge (discuss/deeds) 06:52, 5 July 2013 (UTC)

The Wikipedia articles you linked to say that Calabrian Greek is written in the Latin alphabet and provide a Greek-alphabet sample of Griko, so unless you're getting your information from elsewhere, you may have gotten that backward. Can "Italy's toe" Greek and "Italy's heel" Greek be considered dialects of the same language, even if they're considered a separate language from Greece Greek? —Angr 16:07, 5 July 2013 (UTC)

Yeah, I think I just had them switched in my mind. Sorry about that inaccuracy. Anyway, some sources do seem to group them together as Italian Greek, but from a linguistic standpoint that doesn't seem (to me, at least) any better than grouping them with {{el}}. From an organizational standpoint, perhaps, although as we add inflectional templates and whatnot, it becomes even more of an organizational problem. —Μετάknowledge (discuss/deeds) 18:15, 5 July 2013 (UTC)

I can't find much German- or English-language literature on the admittedly specialist subject of the differences between Griko, Bova and standard Greek. There are plausible claims of Doric elements in Griko/Bova, but Wikipedia's note that the two are mostly based on the same Koine as other modern Greek varieties seems correct, and the lexical differences in the small sample provided seem no more pronounced than dreamed of a new parka vs dreamt of a new anorak. Besides, we combine (historical) Doric itself with Attic under the label of Ancient Greek, so Doric-ness would not in itself seem a reason to separate Griko or Bova from Greek. The difference in alphabet seems more significant: because "Greek" refers overwhelmingly to the Hellenic language now spoken in Greece and written using the Greek alphabet, it would be very confusing to have Latin-script "Greek" entries for Bova words, and hence it could be better to give that lect a separate code and L2, if it is typically written in the Latin alphabet.

The question of declension and conjugation tables is interesting, but IMO much broader than just "how should we list both the Griko and the Greek plurals of φοοβαρ?": not just with Greek but with most languages, we tend to show only standard inflections; that's not very descriptivist... but that's a subject for another BP thread, one I'll start soon. - -sche (discuss) 21:07, 9 July 2013 (UTC)

Is Bova preferred over Calabrian Greek? I think at the least we ought to have a code/L2 for "Greek written in the Latin alphabet and spoken in Calabria", and you seem to agree, but the nomenclature still remains to be decided. —Μετάknowledge (discuss/deeds) 03:10, 12 July 2013 (UTC)

Should we be acting as a database of written works?

Consider Category:Latin quotation templates and imagine this data being moved into a single module, able to be called from a single quotation template in any entry- something like {{quote|title|passage=...}}. And then extending this approach to all other languages, with all of the bibliographic data in a single place. This would undoubtedly simplify quoting from well known works, but is this within our scope, or is it something better suited for wikisource? Do they collect bibliographic data from copyrighted works? DTLHS (talk) 20:37, 5 July 2013 (UTC)
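DTLHS's idea keeps bibliographic data in one store and has a single quotation template look works up by key. The following is a hypothetical Python sketch of that shape: a Python dict stands in for a Lua data module, and one formatter stands in for a {{quote|title|passage=...}} template. The work keys are invented for illustration. The bibliographic data is taken from the pleeeese citations quoted elsewhere on this page, and the output follows their "year, author, title, page N:" layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Work:
    year: int
    author: str
    title: str


# Hypothetical central store of bibliographic data, keyed by an invented work ID.
WORKS: Dict[str, Work] = {
    "mackenzie-deserter": Work(1965, "Kenneth Theodore Mackenzie", "The Deserter"),
    "green-seventh-sense": Work(2008, "Ray Green", "The Seventh Sense"),
}


def format_quote(work_id: str, passage: str, page: Optional[int] = None) -> str:
    """Render a citation header from the shared store, followed by the passage."""
    work = WORKS[work_id]
    header = f"{work.year}, {work.author}, {work.title}"
    if page is not None:
        header += f", page {page}"
    return f"{header}:\n{passage}"
```

Entries would then pass only a key, a page and a passage, so a correction to a work's details is made once. That same aggregation is what raises the fair-use question discussed in this thread.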

This would be appropriate in Wikidata. Dakdada (talk) 20:41, 5 July 2013 (UTC)

Wikisource has no truck with copyrighted works at all, but Wikiquote does have quotes from copyrighted works. —Angr 21:01, 5 July 2013 (UTC)

We use quotes from copyrighted texts under fair use exemptions, which allow use of limited amounts of the text in question under the right circumstances. By aggregating these small pieces in one place, we risk assembling enough of the original there to go beyond the limits allowed. If we had a quote for each line of a short poem, for instance, we might conceivably end up with the whole thing in our database. I'm not saying this arrangement would inherently be a copyvio, but we need to know where the line is so we can take precautions to stay on the right side of it, if necessary. Chuck Entz (talk) 02:22, 10 July 2013 (UTC)

This is actually not an academic or abstract debate; we literally have this exact problem. Specific translations of the Bible are copyrighted, right? We essentially have a few chapters of Genesis in a couple of languages (I am to blame for some of this myself). That could conceivably be quite a problem, if your fears are warranted. —Μετάknowledge (discuss/deeds) 05:53, 11 July 2013 (UTC)

Inconsistent mention of inflected forms

For a descriptivist dictionary, we're rather inconsistent when it comes to mentioning, in the entries for lemmata, their attested inflected forms.

drown displays only its standard forms; it nowhere links to the past tense form it has in many dialects, drownded, nor to its obsolete past tense forms drownd and drowndd (our entry for drownded currently lists it only as the past tense of drownd, but it's trivial to find it used as a past tense of drown).
sit, on the other hand, does list its obsolete/dialectal past tense form sitten on its headword line... but doesn't list its obsolete third-person form sitteth.
forbid lists its past tense forms forbid, forbade and forbad, but not forbode.
laugh, among the many obsolete and modern forms on its headword line, including laugh'd, quite spectacularly lists "(obsolete)" set apart by commas as if it were itself a past tense form...twice. And given that some of our entries put obsolete tags in the headword line before the obsolete term, while others put the tag after the term, it's not even immediately obvious which term the tag is qualifying. It doesn't list laughest.
streak lists only its standard forms; it doesn't list e.g. streak'd.

Similarly, in other languages, we usually only list the standard, modern inflected forms and not the dialectal ones... which can raise questions if, as discussed in the section above this one on Griko and Bova, the dialectal forms are different from the standard ones.

I'd like us to come up with some standard way of both presenting all common inflected forms of each word in its entry, and not misleading anyone as to the standardness vs dialectality and archaicness vs modernity of each form. What I suggest is this: on the headword line and in the first inflection table, list only the standard, modern forms; then, explain all of the other (obsolete and dialectal) forms in their own inflection table or templatised usage note...possibly omitting the forms like laugh'd which are just spelling/typographical variants.

What do you think? Does that idea seem good enough to start fine-tuning it and making mock-ups? Or would you propose something else? - -sche (discuss) 22:06, 9 July 2013 (UTC)

Well drownd and drowndd seem to me to be just different ways of spelling the same word: drowned, while sitteth and sitten are forms in their own right. I support mentioning obsolete and dialectal forms, but oppose mentioning obsolete spellings. These should be listed in the Alternative forms heading of the standard spelling. — Ungoliant ^(Falai) 22:24, 9 July 2013 (UTC)

I would be in favor of consistently downplaying non-standard, literary, dialectal, obsolete, and archaic inflected forms by putting all of them under something that:

was only optionally displayed, and
didn't add to the vertical space taken up by the inflection line except to display something like "± other inflected forms".

I don't think that we need to dedicate precious screen space to usage notes on this matter, which is of interest mostly to specialists. I'd be happy if it was all on a subpage or an appendix, though these options would require changing all of the links to those forms.

The entries for the forms are already available to help someone decode them and provide a home for usage notes of arbitrary length. If we would like to enable comparison among the forms by having them on a single screen the optional display approaches would seem to offer a way of accommodating specialist needs without cluttering the screen and driving away what normal users we may still retain. DCDuring TALK 22:27, 9 July 2013 (UTC)

Mentioning the archaic and dialectal forms in the Inflection/Conjugation heading sounds nice. The lack of mention of -eth and -est forms is a major shortcoming of our English entries! — Ungoliant ^(Falai) 22:35, 9 July 2013 (UTC)

Yet, those forms aren't in current use, except possibly in dialectical pockets. Adding such content right into the inflection heading substantially increases the likelihood of user confusion.

Personally, I'd push for such archaic and alternate information to be 1) optionally displayed, if kept on each individual entry; and 2) more ideally, kept to an appendix. Endings like -eth and -est are largely irrelevant to the modern English language, and as such, I'd really like to avoid cluttering up modern English entries with these historical details.

To clarify, I'm not opposed to having this information somewhere in Wiktionary -- I just think we need to be careful about where we put it. :) -- Eiríkr Útlendi │ Tala við mig 22:42, 9 July 2013 (UTC)

-eth and -est are used in the KJV and in Shakespeare's works, which are still very widely read. Naturally, they should have qualifiers specifying that they are archaic. — Ungoliant ^(Falai) 22:45, 9 July 2013 (UTC)

I think that, as a rule of thumb, we should only include forms that might be at least in somewhat common use in the last 150 years maybe. So the -eth and -est forms would fall out of that, but Dutch plural imperatives (which were still used in written Dutch in WW2) would be shown. —CodeCat 22:47, 9 July 2013 (UTC)

I don't think it makes any sense to have an appendix of "-est" and "-eth" forms: such an appendix would basically be an enormous list of all verbs, with first "-est" and then "-eth" suffixed to them. What's the point of that? I think it makes more sense to list each form with the lemma it's a form of — and the regularity with which those forms are formed should also make it easy to add them to inflection tables. I do think they should be limited to those (collapsed) tables in an ====Inflection==== or ====Conjugation==== section beneath the definitions, though. They shouldn't be in headword lines, even marked as "archaic" or "obsolete", and neither should sitten and laught, because I agree with Eirikr that that is mostly just confusing. I have encountered people who used "-est" forms, because they couldn't stand that English didn't have as many person and tense suffixes as they were used to, and didn't understand/accept that saying "sittest" made them sound ridiculous! - -sche (discuss) 23:18, 9 July 2013 (UTC)

Appendices organized around the inflectional suffixes are not what I had in mind, rather a single appendix, section of an Appendix, or a subpage that contained full information on all inflected forms of a given verb (and its morphological relatives ?).

I don't know that we can deal with pathological cases such as those individuals you've encountered. Were they contributors we might have to, but otherwise we can ignore them!

I have the feeling that different languages are in different situations. English and Chinese (?) seem to have the worst problems of entry clutter. English does not have inflection tables in which many such complications could be concealed, etc. DCDuring TALK 01:27, 10 July 2013 (UTC)

My comments about an appendix were directed at Eirikr, who proposed one. - -sche (discuss) 01:52, 10 July 2013 (UTC)

Inflection tables for English verbs might be nice, but would they really give much additional value? —CodeCat 01:30, 10 July 2013 (UTC)

I think they would, especially since early Modern English has more complex inflection than current English. It would be very helpful to someone reading an old text if we helped them learn the whole paradigm rather than forcing them to come back and look up each form as they encountered it. I like the idea of a collapsible box labeled with something like "obsolete conjugation of <insert headword here>", so it doesn't get in the way for those who aren't interested. Chuck Entz (talk) 02:46, 10 July 2013 (UTC)

I think we should mention archaic/obsolete/whatever inflected forms, but I don't think we should list them in the headword line. In other languages, the headword line is for the principal parts, while the ===Conjugation=== section is for other inflected forms. I'd say laugh should just have "laugh, laughed, laughed" in its headword line and sit should just have "sit, sat, sat"; the other things like laughest and sitteth and whatnot—even the modern regular forms like laughs and sitting, which are not traditionally considered principal parts—should be in the Conjugation section, where their {{obsolete}} labels can be less ambiguous and where they don't clutter up the area where your average reader is looking to find out the most obvious information. —Angr 16:20, 10 July 2013 (UTC)

I agree. The headword line is meant for readers who want to learn the verb's basic inflections at a glance. It is suited primarily for users who already know how to conjugate in English and just want to know how this particular verb is conjugated. Apart from a few verbs, the 3rd person singular and the present participle are always perfectly regular, so adding them doesn't really provide this type of user with any more information than they already knew. An inflection table on the other hand can easily list all forms, old and modern, and show their proper relationship to one another, so it is more suited to the few users who want to know more details. —CodeCat 18:29, 10 July 2013 (UTC)

I see no reason why a modestly inflected language like English should be strait-jacketed into a mold that suits inflected languages. The obsolete and archaic inflected forms are not of great use for encoding, and if one has found one's way to the lemma from the inflected form, one has done most of the decoding. All that the presence of the form on the page does is confirm that one is on the right page. In addition, the dialectal and non-standard forms are not a particularly good fit with the format of a conjugation. Finally, I believe that this will push Wiktionary further in the direction of putting off would-be monolingual English users - and contributors. DCDuring TALK 17:12, 10 July 2013 (UTC)

@-sche, my thought about an appendix was to include not every single verb with such inflections, but rather to describe the process of inflection and provide a sample table or two showing the different forms. This would be much like what we already have at Appendix:Japanese_verbs or Appendix:Spanish_verbs. We could just expand Appendix:English_verbs to include a section on Elizabethan forms, for instance. Teaching the user how to inflect would be much more useful than requiring them to look up each verb to find the inflected forms.

In addition, I share DCDuring's concerns that we might alienate a portion of our user base if we clutter up our entries with too much ancillary information. I had thought that the Appendices were created precisely to offload such information from the main entries, hence my suggestion for including obsolete and/or dialectical inflection information there. ‑‑ Eiríkr Útlendi │ Tala við mig 18:01, 10 July 2013 (UTC)

I fail to see why monolingual English speakers would be put off by a headword line that is relatively tidy and a Conjugation section beneath the definition that they can easily ignore if they're not interested in it. —Angr 18:55, 10 July 2013 (UTC)

Provided the conjugation section is collapsible, and that all obsolete and dialectical forms are clearly indicated (possibly in separate "Historical forms" and "Dialectical forms" tables?), I might be persuadable. However, why duplicate obsolete and non-standard information on lots of pages, instead of just consolidating it in one place in an appendix? That seems like a lot of busywork for human editors. -- ‑‑ Eiríkr Útlendi │ Tala við mig 19:12, 10 July 2013 (UTC)

I guess for the same reason we duplicate modern and standard information on lots of pages, because we have room and there's no reason not to. Also, including the obsolete and dialectal (not dialectical, which means something quite different) forms that are attested prevents the implication that all verbs have these forms. In other words, by including writeth at write but excluding *faxeth at fax, readers know that fax is not attested with archaic endings. If all we have is an appendix saying that verbs have an archaic 3rd person singular present-tense form ending in -eth, readers don't know which verbs are actually attested with it and which aren't. —Angr 19:34, 10 July 2013 (UTC)

Incidentally, emaileth is citable, though most usage is incorrect (imperative, first-person indicative, noun plural, infinitive, future...) — Ungoliant ^(Falai) 19:41, 10 July 2013 (UTC)

Because we want people to be able to find the forms on the lemma page via the search box, in case someone doesn't want to bother creating a whole alt-form entry for every word that has attested archaic forms. You want to have the form somewhere that the search will see it: I doubt most people would go to the trouble of setting the search to include the appendices as well as mainspace (if they even knew how). Chuck Entz (talk) 03:05, 11 July 2013 (UTC)

I made an example inflection table:

This is the most irregular verb in English; most verbs will have several identical forms. —CodeCat 21:57, 10 July 2013 (UTC)

There's also wast as in google books:"wast thou", was as in google books:"was you there", wert as in google books:"you wert". Possibly more too, but I can't call them to mind. ‑‑ Eiríkr Útlendi │ Tala við mig 22:14, 10 July 2013 (UTC)

(after e/c) Observations:

If "thou"/"-est" and "you"/∅ forms get separate lines, "-eth" and "-s" (third-person) forms deserve separate lines. Alternatively, the "-est" and "-eth" forms could be listed after a <br> in the same cells as the ∅-second-person and "-s"-third-person forms, with superscript tagging them as archaic... but some users might take that to mean the whole cell was archaic, so perhaps separate lines are best.
We should almost certainly have a different table for be than for other verbs, in recognition of be's complexity and other verbs' simplicity. Consider that "be" has, in effect, two conjugations (one of them mostly obsolete); it has forms like "beest" beside "art" beside "are", "beeth" beside "is"; it has "wast" and "wert"; etc... and that's not to mention dialectal forms.
The past tense subjunctive of be is not archaic. If it were archaic, I would have said "if it was" rather than "if it were", right?

- -sche (discuss) 22:15, 10 July 2013 (UTC)

And, be it ever so seldom heard, the present tense subjunctive isn't archaic either -- just rare. :) ‑‑ Eiríkr Útlendi │ Tala við mig 22:18, 10 July 2013 (UTC)

I like the idea of a separate (collapsed) table for the entire archaic paradigm, so one can see when the archaic forms are the same as the modern ones: most people who try to imitate archaic speech use the -eth or the -est forms for everything. Perhaps we might think about having inflection tables directly under the headword, instead of under a separate header, with the collapsed table labeled as "Inflection of" or "Conjugation of" or "Declension of". I wonder if it's possible to have multiple collapsible tables nested inside a master collapsible table, so that an entry with alternative inflections would take up the same amount of space as one with only one. Chuck Entz (talk) 03:05, 11 July 2013 (UTC)

If any change to accommodate this takes on average a single additional line of vertical screen space, it is exactly the wrong direction for such changes. The screen device to display this material should be on the inflection line, not under a separate header. DCDuring TALK 22:42, 10 July 2013 (UTC)

To -sche: I guess you are right. "be" is probably the only verb with a past subjunctive that is still in use, probably because it has a distinct form in the singular. All other verbs have a past subjunctive that is identical to the indicative. The situation in Dutch is not so different, except that the identity is restricted to regular weak verbs in Dutch; irregular weak verbs still have a distinct past subjunctive singular. But that is mainly because Dutch still distinguishes singular from plural everywhere in the verb paradigm whereas in English they have fallen together. In fact, apart from that difference, Dutch and English conjugation really isn't that different, so Dutch can serve as a fairly decent model for a possible English conjugation table. And I think the Old English/Middle English tables were based on Dutch as well. —CodeCat 20:05, 11 July 2013 (UTC)

Here are some more examples using the same table (more or less): User:CodeCat/en-conj examples —CodeCat 20:19, 11 July 2013 (UTC)

I would suggest a table like this: User:-sche/en-conj-table/use. I think it's inaccurate to have a separate cell for the subjunctive plural archaic and then call it archaic. Given that statements like "if he were to walk in — if he walked in — right now, I would tell him" are plentiful, I think the most plausible analyses are (1) if it exists as such, the subjunctive plural form is still used, or (2) no English verb other than be has a subjunctive plural form (because that form, and the indicative third-person plural past tense form, merged with the indicative first- and second-person plural past tense forms into a generic past tense form). I think the second option is more sensible, and it's reflected in the first of these mocked-up tables. - -sche (discuss) 22:26, 11 July 2013 (UTC)

Take a look at this table and let me know what you think of what's displayed. We probably want to rewrite the template's guts; I copied the parameter names from CodeCat's table, but we probably want to use different names or make the parameters positional, we may want to code the -eth form as {{{pres}}}eth rather than a separate parameter, and we definitely need to add graceful support for multiple past tense and past participle forms, whether they are common (dreamt vs dreamed) or archaic/dialectal (laughen), since listing all forms is the point of the table. But I'm not asking about the template's guts at the moment, I want to know if the table as displayed is OK or if it's missing any fields. Note that it is intended only for use on most English verbs; complex cases like be and may need to be handled by different template(s). PS, for comparison, look at the small and large tables de.Wikt has of English verb forms. - -sche (discuss) 01:20, 18 July 2013 (UTC)

I made the parameters named because the template was not intended to be used directly in entries. Rather, the template would be filled with forms by another template. Keeping presentation (the table) separate from code/logic (the creation of the forms) is important. —CodeCat 01:39, 18 July 2013 (UTC)
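The presentation/logic split described above can be sketched in Python. This is a hypothetical illustration only: the function names, the form keys (pres, pres3_archaic, etc.) and the naive -est/-eth suffixing rule are assumptions, not the actual template or module code. Irregular spellings such as sitteth cannot be derived by simple suffixing and are supplied as overrides.

```python
# Hypothetical sketch: form generation (logic) kept apart from table rendering (presentation).
# Names and rules here are illustrative assumptions, not Wiktionary's real template code.
from typing import Dict, Optional


def make_forms(pres: str, past: str, past_ptc: str,
               overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build named forms from a few principal parts; returns data only, no markup."""
    # Drop a final -e before archaic endings, so "write" gives "writeth", not "writeeth".
    stem = pres[:-1] if pres.endswith("e") else pres
    forms = {
        "pres": pres,
        "pres3": pres + "s",            # naive; verbs like "go" or "fix" need more rules
        "pres2_archaic": stem + "est",  # naive archaic second-person singular
        "pres3_archaic": stem + "eth",  # roughly the {{{pres}}}eth idea
        "past": past,
        "past_ptc": past_ptc,
    }
    # Attested irregular spellings (e.g. "sitteth") replace the naive guesses.
    forms.update(overrides or {})
    return forms


def render_table(forms: Dict[str, str]) -> str:
    """Render named forms as wikitable rows; knows nothing about how forms are made."""
    labels = [
        ("pres", "present"),
        ("pres3", "third-person singular"),
        ("pres2_archaic", "second-person singular (archaic)"),
        ("pres3_archaic", "third-person singular (archaic)"),
        ("past", "past"),
        ("past_ptc", "past participle"),
    ]
    rows = ['{| class="wikitable"']
    for key, label in labels:
        rows.append("|-")
        rows.append(f"! {label}")
        rows.append(f"| {forms[key]}")
    rows.append("|}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_table(make_forms("sit", "sat", "sat",
                                  {"pres2_archaic": "sittest",
                                   "pres3_archaic": "sitteth"})))
```

Because render_table only consumes a dictionary of named forms, the generation rules (or a hand-filled set of forms for a verb like be) could change without touching the table layout, which is the separation being argued for.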

Clutter in our entries[edit]

@Angr: "because we have room and there's no reason not to"

Yes, our hosts don't seem to complain a bit about storage. We can find locations for any kind of semi-lexicographic context we care to contribute. But users are staying away in droves. Remarkably, some of those who come actually take the trouble to complain about our layout. Most complaints seem to be about how hard it is to find the content that they want, ie, definitions. How do we respond? We provide more and more content other than definitions and fail to significantly improve the definitions we have, even when the faults are obvious. We cater to those who could learn to navigate and customize their way around any complications and give the back of our hand to newbies, ignoring their expressed complaints. Most websites seek out user response and attempt to adjust the overt appearance of the site (landing pages, ie, our content pages and failed-search pages) to the expressed complaints. Not us; after all, we're volunteers, just doing what amuses us. Don't we have some obligation to serve the broad population of users? DCDuring TALK 20:17, 10 July 2013 (UTC)

We could solve all of these problems by putting everything (and I mean everything, including translations, pronunciations, etymologies, derived terms) under the individual definitions, collapsing sections as necessary and duplication be damned. Dividing entries by parts of speech is ridiculous. DTLHS (talk) 20:27, 10 July 2013 (UTC)

A while ago, someone (Ruakh?) proposed collapsing translations under each definition, just like quotations. It would make editing the entries harder—unless we used HTML comments to separate each definition from the others by a few lines of whitespace, it could be hard to find each definition amid the ensuing clutter—but it would stop people from changing and removing definitions without updating the translation table glosses, it would stop people having to scroll back and forth between the defs and the tables of long entries like [[line]], and it would reduce the amount of clutter that comes between definitions in multi-POS/multi-etym entries like [[line]]. While recognising that it would take massive effort and would be a great upheaval that it would take a while to get used to, I would support it. I oppose merging POS sections, and I couldn't support collapsing pronunciation info under individual senses, either (how would that even work, in an entry like [[line]])? - -sche (discuss) 20:46, 10 July 2013 (UTC)

@DTLHS: I mostly agree, but it's hard to find an English dictionary that doesn't respect parts of speech as an organizing principle and hard to find an "unabridged" one that doesn't differentiate by etymologies. But pronunciations, etymologies, usage notes, derived terms, synonyms, and translations, let alone antonyms, hyponyms, hypernyms, anagrams, descendants, et al, — though they may set us apart from some other dictionaries — can't overwhelm the core content. DCDuring TALK 20:53, 10 July 2013 (UTC)

You're right about pronunciations; that would get ridiculous very fast, and most words share pronunciations between all of their senses. I don't mean to suggest abandoning part of speech as an organizing principle, just that it shouldn't get an entire heading, more like a small label at the beginning of each definition line. DTLHS (talk) 20:56, 10 July 2013 (UTC)

@DCDuring: One thing we could do about semantic relations is move all of them to Wikisaurus, and then have only one ====Semantic relations==== header with a link to any relevant Wikisaurus pages. The way we currently duplicate lists of synonyms and antonyms in entries and in Wikisaurus results in both clutter and in the lists falling out of sync. - -sche (discuss) 21:02, 10 July 2013 (UTC)

For interest and comparison, here is how dog is presented in Chambers Dictionary (taken from the CD-ROM edition from 5-10 years ago, whose layout is more or less identical to the print edition — hence the abbreviations and general "tightness"). Related terms (dogged, dogger, doggess, etc.) are listed on separate lines underneath, but still within the dog entry and under that headword. Equinox ◑ 21:16, 10 July 2013 (UTC)

dog¹
n a wild or domestic animal of the genus Canis that includes the wolf and fox; the domestic species, diversified into a large number of breeds; a male of this and other species; a mean scoundrel; [...etc.]
adj and as combining form of dogs; male, opp to bitch; spurious, base, inferior (as in dog Latin).
adv esp as combining form utterly.
vt (dogging; dogged) to follow like a dog; to track and watch constantly; to worry, plague, infest; to hunt with dogs; to fasten with a dog.
[Late OE docga; cf Du dog a mastiff, and Ger Dogge]

I oppose putting obsolete English spellings such as hopeth on the inflection line. Thus, I support removing obsolete spellings from the headword lines of such entries as laugh. I imagine a heading somewhere down at the bottom of the entry like "Obsolete forms", where both obsolete inflected forms and obsolete alternative forms could be listed. Thus, in knowledge, "Obsolete forms" would list knolege, knowlage, knowleche, etc., while, in laugh, it would list laugh'd, low, and the like. If listing inflected forms, the list could start with "Inflected forms:" or the like, to make it clear, but still under "Obsolete forms" headings. --Dan Polansky (talk) 21:08, 10 July 2013 (UTC)

@-sche: If a large portion of the most polysemous English entries were in good shape, then we could use that reliable structure as an organizer for the rest of our material. I was recently lamenting to myself how hard it is to attempt to improve the structure of the definitions in English entries because they themselves are mostly unstructured lists, without even historical sequence or order of frequency as an organizing principle, let alone some kind of semantic principle.

I guess I misunderstood what "under the definitions" means to others. I was only thinking of pushing the content of non-definition sections below the associated Etymology-PoS groups of definitions in collapsible bars. As to the more radical approach of pushing all content under definitions: if such a thing could be made available optionally to users via different SQL and php from the server or by JS on the client side, that would be great.

Obviously, different editing tasks are facilitated by different UIs. Our current UI seems to facilitate correcting formatting "errors" and mistakes of omission. As soon as one is working on material that does not fit on one screen, the UI is deficient. It seems to me that, to a great extent, we don't want structural modification of polysemous English entries, no matter how poorly organized - and, therefore, hard to edit - they are. DCDuring TALK 21:15, 10 July 2013 (UTC)

Dan, what about something like what the Dutch entries use? See lopen. Singular and plural can be collapsed for English (except for the present singular indicative). —CodeCat 21:23, 10 July 2013 (UTC)

CodeCat, what part of lopen#Dutch entry do you mean? Where does it list obsolete inflected forms, and what are they? --Dan Polansky (talk) 17:42, 11 July 2013 (UTC)

The plural imperative and the subjunctive are archaic and no longer in use, at least not commonly. The subjunctive is found occasionally like in English, but nobody uses the plural imperative anymore except if they want to sound deliberately old-fashioned (like "thou" or "ye" would do in English). —CodeCat 18:42, 11 July 2013 (UTC)

So you are pointing to archaic forms being listed in a collapsible inflection table in the "Conjugation" section of lopen#Dutch. I would not like to see a Czech conjugation table flooded with obsolete spellings. Are you proposing to make an English collapsible conjugation table with all sorts of weird forms, as others have proposed? I am not very enthusiastic about that. In any case, I am enthusiastic about removing obsolete inflected forms from the headword line. --Dan Polansky (talk) 19:28, 11 July 2013 (UTC)

I posted an example table for English in the discussion just above. —CodeCat 19:53, 11 July 2013 (UTC)

Usage examples and italicization[edit]

Wiktionary:Example sentences states that all example sentences should be italicized, even those in non-Latin scripts such as Cyrillic. Should there be an exception for certain scripts, or should Module:usex simply apply italics to everything? DTLHS (talk) 16:50, 10 July 2013 (UTC)

Italicized Cyrillic is IMHO much less readable. I'd support showing usexes in different font colors, italicized or not. --Ivan Štambuk (talk) 17:28, 10 July 2013 (UTC)

Similarly, italicized Japanese is often illegible. The format Japanese editors have been using for usexes, when entered without the template, is:

First line using kanji (if any) -- not italicized

Second line giving the kana-only rendering (if different from first line) -- not italicized

Third line giving the romanized rendering -- italicized

Fourth line giving the English translation -- not italicized

When using the {{usex}} template:

{{usex|lang=ja|First line with kanji|tr=Second line in kana<br/>''Third line in romaji''|t=Fourth line in English}}

Adding automatic italicization to this template would be a substantial legibility problem for Japanese, unless the lang=ja argument turns off such italicization. ‑‑ Eiríkr Útlendi │ Tala við mig 17:52, 10 July 2013 (UTC)

I would prefer it if the parameters for this template were changed to match those of {{l}} more closely. Language code first, then the phrase, then the translation (optional if {{{1}}} is "en" and maybe "mul"). Transliteration would still use tr=. —CodeCat 18:33, 10 July 2013 (UTC)

@CodeCat, I'd be fine with that, provided a bot could be used to handle the conversion of existing wikicode. ‑‑ Eiríkr Útlendi │ Tala við mig 19:07, 10 July 2013 (UTC)

Typographically, italics are suitable only for Latin and Cyrillic; no other writing system should ever be put into italics. I disagree that italicized Cyrillic is much less readable; surely anyone who can read Cyrillic well enough to be reading whole sentences in it in the first place can read it in italics with no difficulty. —Angr 19:00, 10 July 2013 (UTC)

I would only leave Roman script italicised. Less than advanced learners are known to misread a few letters. Cursive Cyrillic looks markedly different from normal, e.g. cursive "т" - т (t) looks like Roman m. From Wikipedia: АВДЕИКНОРСУХавдезиопрстухч, cursive: АВДЕИКНОРСУХавдезиопрстухч looks almost like Roman ABDEUKHOPCYXabgezuonpcmyxr (it's about various hand-written styles but partially applies to computer italics as well). Of course, it doesn't apply to native Cyrillic users and advanced learners. --Anatoli ^{(обсудить}/^вклад) 12:47, 11 July 2013 (UTC)

Well I didn't know Cyrillic at all prior to coming to Wiktionary, and it took me some ~15 minutes to learn it (it is easy because letters are so similar to equivalent Latin and Greek script letters). However, several cursive Cyrillic letters are identical to completely different Latin and Cyrillic script letters, which kind of confuses your brain while reading it in running text, causing it to "stop" periodically when you subconsciously interpret e.g. т as /m/ instead of /t/. You need a lot of practice to achieve complete reading fluency (reading entire blocks of words at once without any misinterpretations of letters) in italicized Cyrillic script, which many learners of Russian, etc. with an English-language background don't have. It's not about ability to comprehend it, but it's a needless PITA. --Ivan Štambuk (talk) 22:47, 11 July 2013 (UTC)

I noticed that {{usex}} acts a bit strange when it comes to quotations. See 𐌷𐌰𐌻𐌻𐌿𐍃 for example. —CodeCat 12:13, 11 July 2013 (UTC)

usex assumes that you have used "#:" before it (the template is supposed to be used in usage examples). --Z 13:09, 11 July 2013 (UTC)

I see. It does seem like a shame not to use it for quotations as well; they use the same format, don't they? —CodeCat 13:14, 11 July 2013 (UTC)

Yes. Users should be able to change that default behaviour; even in usage examples we sometimes need to use "#::" etc. --Z 13:18, 11 July 2013 (UTC)

No. Actually usexes and quotations do not have the same format, and they should not. Quotations are preceded by citation information on the previous line, and so always have "#*:" before them, because the citation of the source has "#*". The distinction is deliberate so that (a) readers can distinguish the made-up examples from the published data, and so (b) quotations can be automatically collapsed while leaving usexes visible. If the two had the same format, then this wouldn't be possible. --EncycloPetey (talk) 18:53, 12 July 2013 (UTC)

But if the extra first line for quotations is the only difference, then everything aside from that can use the same format. A quotation is then nothing more than a usage example with extra info. —CodeCat 18:57, 12 July 2013 (UTC)

But that isn't the only difference: "#:" and "#*" are not the same characters; italicized and non-italicized text is not the same. And a quotation is not simply "nothing more than a usage example with extra info". A usex is a made-up example; a quotation is firm data that demonstrates sense and usage. They are fundamentally different things. --EncycloPetey (talk) 19:04, 12 July 2013 (UTC)

Using things like #:: doesn't allow for easy nesting of elements. But the wiki also supports HTML, so we can use <dd> instead of : . —CodeCat 13:29, 11 July 2013 (UTC)

Many readers of Cyrillic and RTL scripts don't like italics at all. If you are going to italicize them, I suggest doing that using a CSS class. --Z 13:24, 11 July 2013 (UTC)

Please notice two discussions above for voting: #Vote for interlink correction and the Vote for separation of Hebrew from other same alphabet languages. Thanks Pashute (talk) 15:36, 11 July 2013 (UTC)

I've noticed them, but I don't understand what either proposal is. Hence, you're the only person who has voted. Mglovesfun (talk) 16:43, 11 July 2013 (UTC)

Phrasebook development, organization, unification

I'm an old-hat Wikipedian checking in here to look into a curiosity of mine, namely phrasebooks and the state of their development across Wikimedia. I know that if I want a developed phrasebook I currently need to go to Wikibooks, which has quite a few, although they are not without their issues (distanced from Wiktionary, disorganized, uncorrelated, etc.). And I'm happy to see that Wiktionary has some phrasebooks listed here, although these are not without their issues either, and apparently they survived at least one attempt to abolish them altogether.

I'm suggesting that we form a policy on the phrases included in Wiktionary that is more liberal than what is currently allowed but still fits within the constraints of what we have traditionally called a "diction-ary." For example, in a recent proposed phrasebook CFI (criteria for inclusion), the proposed criterion was that each phrase be listed in at least three print dictionaries or phrasebooks.^* My line of thinking, and my line of inquiry here, deals more with the interlingual aspect of phrasebooks, so I might propose as an alternative that any phrase which has direct equivalents in two other languages satisfies the criteria for inclusion. I am tending less toward thinking in terms of the current organization of

...which is fine, but I would then add to that an extra categorical dimension, something like:

Phrases
- English phrases
- Chinese (Mandarin) phrases
  - Chinese (Mandarin) phrasebook
- Spanish phrases

etc.

At some point it would make sense to unify the existing phrasebooks, which is in fact what brings me here to Wiktionary to express my thoughts on the matter. I think the fact that phrasebooks are confined to Wikibooks does not help their development or their unification. Hence I think Wikibooks' phrasebooks could be ported here. But I would also agree with those who argue that a phrasebook can naturally be more developed than what a dictionary requires, and therefore there must be strict constraints on what is admitted and what isn't.^**

The idea I'm proposing is that Wiktionary undertake the creation of a Unified Phrasebook, in which every included phrase must exist in all (or most) languages, and in which any liberties taken with translation are explained.

Unified Phrasebook
- Common phrases
  - in English
  - in Chinese (Mandarin)
  - etc.
- Situational phrases
- Idiomatic phrases (with translated equivalents)
  - in English
  - in Mandarin
  - etc.

This would be in the spirit of a dictionary and not just a Wikibook, although there should always be some overlap between Wikimedia projects, and taking this on would no doubt help to unify and organize Wikibooks' divergent phrasebooks. Where words and translations are concerned, I think Wiktionary can play a vital role in creating a Unified Phrasebook, as well as various well-developed individual language phrasebooks, which would serve the world well. Respects, -Sativen Kuni (talk) 23:14, 11 July 2013 (UTC)

^* (I don't think this is unreasonable given all of the print material available, but the mechanics of such a hard rule tend to chill wiki development, so hard-and-fast rules should be replaced with something better. In practice, I would hope that if such a hard-and-fast rule were implemented, it would come only after such pages had been given time to develop and the matter had been deliberated on talk pages.)

^** The straightforward counterargument is that Wiktionary is not a traditional paper dictionary and therefore is not obliged to follow any of paper's limitations, except those that serve to constrain its purpose to the singular mission at hand. And in Wiktionary's case that mission seems for the most part accomplished. Hence it might serve this community to take on something that is related to its core task and yet reasonably new. -SK

I would suggest, in the other direction, that Wiktionary send all of its phrasebooks to Wikibooks, where books belong. - -sche (discuss) 23:21, 11 July 2013 (UTC)

This view is not necessarily shared by other Wiktionarians. The vote on removing the phrasebook failed, and the phrasebook continues to be developed. --Anatoli (обсудить/вклад) 23:56, 11 July 2013 (UTC)

A dictionary is also a book, isn't it? :) —CodeCat 23:24, 11 July 2013 (UTC)

Indeed. :) ‑‑ Eiríkr Útlendi │ Tala við mig 23:40, 11 July 2013 (UTC)

All we need is for someone to actually do the work required for a serious effort. The last major spurt of activity was embarrassing, as are many of the entries that remain. DCDuring TALK 00:15, 12 July 2013 (UTC)

A dictionary is indeed a "book!" Or rather it's a 'web'-'site!' (Or rather it may be either a book or a web-accessible database. Or rather it is what is contained in such mediums that is the diction-ary). The term "book" itself has some historical usage - the Bible itself was once called a "book" ;), even when it was all written in scrolls! Of course the term "phrasebook" has its own peculiarities, such that its essence is the term "phrase" and the term "book" is an affix that indicates that the object is 'a compendium of information' about phrases. Can a diction-ary be also a compendium that deals with phrases? No! A diction-ary must be about dic-tion, and not anything else. A phrasebook, or a compendium of individual language phrasebooks would by necessity belong at a phrase-ary. And in any case if Wiktionary were to do anything different - anything outside the strict confines of a diction-ary, we would have to call Wiktionary by a different name, perhaps even something better. Regards, :) -Sativen Kuni (talk) 00:45, 12 July 2013 (UTC) Oh, Jeez that Blackadder clip was good. -SK

Sorry, but your posts are a bit too wordy; they sound like slogans. It's not clear to me what you need. Importing whole phrasebooks may not sit well with opponents of the phrasebook and may go straight into the RFD (deletion) process, especially repetitive, vulgar, rarely used, unnatural or otherwise bad phrases. You can try adding appendices (carefully). We already have a phrasebook structure (not a perfect one), split by language, and existing appendices. We don't know you yet; we've seen you talking, but we haven't seen you working. --Anatoli (обсудить/вклад) 01:10, 12 July 2013 (UTC)

Sorry, that last bit was mostly having fun with the idea of what a "book" is and what a "dictionary" is. As for me, I have 45k edits on en.wiki since 2002. I have edited here over the years, usually without logging in, but not much overall: no more than 100 edits or so. As for the phrasebook, in short I am proposing that Wiktionary organize a core phrasebook that stays constrained by a formal approach, with Wikibooks assisting. At the same time, Wikibooks would build on that core phrasebook to develop its own phrasebooks, feeding improvements back into Wiktionary's core phrasebook(s). A synergetic idea, one where the overlapping purposes of the dictionary and the open book project come together. -Sativen Kuni (talk) 01:45, 12 July 2013 (UTC)

DC, I can step in and do what I can. Organization is a big part, and to do it right, the whole idea from conception to construction has to be sensible and agreeable. Once the structure is there, the rest is just filling things in as we go. I think things are halfway there: Wikibooks (whom I will invite) has a lot going for it, but note that each separate language book has a different set of goals and constraints. At least they all share the guiding principle of being a useful compendium. All we have to do is figure out how to put it together in a more organized and useful way. K'plah -Sativen Kuni (talk) 02:28, 12 July 2013 (UTC)

Job vacancy: FWOTD-setter

Foreign Word of the Day is now "hiring" an official co-FWOTD-setter. While anyone who knows what he is doing can set a FWOTD, we need someone who can commit to making sure every day has a FWOTD (and take the blame when things go wrong).

Requirements:

Must be able to cope with the fact that next to no one in the world cares about FWOTD.
Since not enough people nominate words, and even those who do often nominate entries that lack the requirements (pronunciation and a citation), you must be able to actively seek out, pronounce and cite words.
Thus, knowledge of phonetics is useful.
Must not be Wonderfool.

— Ungoliant (Falai) 04:20, 12 July 2013 (UTC)

I'll do it. Since I'm chiming in here I may as well do something useful, and curious and novel foreign words are quite interesting to me. Plus I can deal with IPA, adapted IPA (adapted to English phonology), some translation breakdowns, some etymology, etc. Hire me. -Sativen Kuni (talk) 02:10, 13 July 2013 (UTC)

Great! I'll send you a message explaining the deal. (You're not Wonderfool, are you?) — Ungoliant (Falai) 02:18, 13 July 2013 (UTC)

Treatment of Kurdish varieties

As far as I know, there are three or four recognised varieties of Kurdish, called Kurmanji (Northern), Sorani (Central), Kermanshahi (Southern) and sometimes Laki, which is also often grouped under Southern Kurdish. Currently, our Kurdish entries are all placed under a common Category:Kurdish language. However, ZxxZxxZ argues on User talk:ZxxZxxZ that the dialects are different enough to be considered separate languages, and the Wikipedia entry corroborates that as well. Furthermore, there is a difference in script: Kurmanji, the most spoken variety, uses the Latin script, while the others are written in a variant of the Arabic script. In practice, this means that different varieties necessarily require separate entries, because they are written in different scripts. So should we retire the code "ku" in favour of the codes for these varieties? —CodeCat 18:45, 12 July 2013 (UTC)

All this (the "different enough to be separate", different scripts) is analogous to Serbo-Croatian. I'd recommend that we treat Kurdish the same way we treat Bosnian/Croatian/Macedonian/Serbian. --EncycloPetey (talk) 18:48, 12 July 2013 (UTC)

The same is true of Hindi/Urdu, as well as Romanian/Moldovan, yet we treat them as "different" languages. In Serbo-Croatian it becomes problematic because Bosnian/Serbian/Montenegrin accept both Cyrillic and Latin script, and Croatian is Latin-only, so there would be massive duplication of content if we decided to split them. Hindustani and Romanian-Moldovan link to one another in the headword line, so they are de facto treated as a single language, even though they are under separate L2s. If the only major difference between these Kurdish varieties is script, then perhaps the best option (so as not to hurt anyone's feelings and all that) would be to treat them as different languages, but link them to each other in the headword line or in a box similar to {{fa-regional}}. --Ivan Štambuk (talk) 19:44, 12 July 2013 (UTC)

Moldavian was merged into Romanian by vote; its code and L2 header have been retired. No headwords should link to it... - -sche (discuss) 20:31, 12 July 2013 (UTC)

Whoops. But the point remains: in practice it makes no difference whether we treat them as "different languages" or not if the only major difference is script. --Ivan Štambuk (talk) 20:38, 12 July 2013 (UTC)

WP lists the following differences:

The passive conjugation: the Sorani passive morpheme -r-/-ra- corresponds to -y-/-ya- in Gorani and Zazaki, while Kurmanji employs the auxiliary verb "come";
a definite suffix -eke, also occurring in Zazaki;
an intensifying postverb -ewe, corresponding to Kurmanji preverbal ve-;
an 'open compound' construction with a suffix -e, for definite noun phrases with an epithet;
the preservation of enclitic personal pronouns, which have disappeared in Kurmanji and in Zazaki;
a simplified izāfa system.

These don't seem significant enough to consider them different languages. Analogous ones can be found between European and Brazilian Portuguese.

However, Kreyenbroek mentions: "For example, Sorani has neither gender nor case-endings, whereas Kurmanji has both". This seems more serious, but it might be just a characteristic of informal Sorani (compare how informal Brazilian Portuguese doesn't have plural nouns.) I note that دیوار and شیر have genders specified.

We really should wait for a native speaker's (User:George Animal) comment before doing anything. — Ungoliant (Falai) 19:18, 12 July 2013 (UTC)

That list doesn't contain all of the differences, and Sorani and Kermanshahi don't have gender at all. BTW, the differences between Kermanshahi and Kurmanji are even greater. --Z 19:22, 13 July 2013 (UTC)

All of the comments pointing out how similar the varieties of Kurdish are make me wonder how we justify considering Nynorsk and Bokmål separate languages... - -sche (discuss) 20:38, 12 July 2013 (UTC)

ZxxZxxZ has now started creating entries in Central Kurdish (Sorani), because the language code for it was never deleted or disputed. —CodeCat 14:55, 19 July 2013 (UTC)

Usexes from subtitles

Particularly those extracted from Glosbe. These contain many common situational dialogues that would be very useful to have in entries. The translations themselves are a community effort and thus free, but what about the original English subtitles? Are such sentences (presumably part of the movie script) themselves under some kind of copyright? Their database is huge (for Croatian alone there are 10 million en-hr usex pairs), and it could save editors a lot of time. --Ivan Štambuk (talk) 23:17, 12 July 2013 (UTC)

It would be the same situation as quoting from a book. As long as there is proper attribution and we don't host the entire thing on Wiktionary, it should be fine (subtitles definitely have some copyrighted status, enough that it would be impossible to host them on any Wikimedia project). DTLHS (talk) 23:59, 12 July 2013 (UTC)

Subtitles suffer from the need to translate words and phrases of another language without taking longer than the original, and without any explanatory text / footnotes. It takes someone fluent in both languages and especially skilled in the language of the subtitles to do it right. Instead, the economics of the movie industry dictates a result that butchers both the dialog and the language of the subtitles- the "all your base are belong to us" phenomenon (that's from a video game, but the same principles apply to both). I'm skeptical of using them to illustrate normal usage. Chuck Entz (talk) 00:46, 13 July 2013 (UTC)

Even English subtitles from English-language movies don't always match the spoken dialogue. Lines may be omitted, changed, or otherwise butchered. I've seen some truly hilarious examples. One egregious case that comes to mind (and pulled from the DVD for this comment): Spoken: "There must be about a dozen wrecked spaceships out there." Subtitled: "There must be about a dozen red spaceships out there." In another instance I've seen recently, "Moon water" became "Need water" because the person doing the subtitles didn't get the imagery and themes running through the program, but I have the advantage of additional resources that allowed me to verify that "Moon water" was correct. So, I agree with the general skepticism noted above, but also agree that a speaker fluent in both languages can better make the determination in a particular instance. --EncycloPetey (talk) 01:38, 13 July 2013 (UTC)

Yes, there's nothing wrong with using subtitles to illustrate usage, as long as they've been reviewed by a native speaker. Let's not go crazy and start importing the entire database automatically though. DTLHS (talk) 01:54, 13 July 2013 (UTC)

Glosbe seems like an amazing resource; one just needs to know how to use it properly. The translations are not literal, just conveying the mood, and there are also many mistranslations, as in Google Translate, so one needs to know the target foreign language to be able to use it correctly. --Anatoli (обсудить/вклад) 04:05, 13 July 2013 (UTC)

Well, you can download the entire OpenSubtitles database used by Glosbe in any number of language pairs; however, what you get is two giant text files in which each line of one file corresponds, by position, to its translation on the same line of the other file, without proper movie attribution. There is, though, a separate search interface that enables tracing any particular word/phrase. Yes, there are many errors in the translations, and everything needs to be checked first. I'm not saying that usexes should be imported en masse mechanically and without review. What I'm asking is whether it's appropriate (in terms of copyright and all) for me to cherry-pick several (perhaps a dozen at most) usages from the Glosbe/OpenSubtitles databases, correct and modify them appropriately, and use them without attribution as usexes in Wiktionary entries. It would save me a lot of time. --Ivan Štambuk (talk) 12:39, 13 July 2013 (UTC)
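The file layout described here (two plain-text files, where line N of one file is the translation of line N of the other) is simple enough to search locally before cherry-picking a handful of candidates. A minimal Python sketch, assuming UTF-8 files with one subtitle line per text line; the function names and the example sentences in the test are hypothetical, and every hit would still need the review by a fluent speaker that this thread calls for:

```python
import re
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple


def aligned_pairs(src: Iterable[str], tgt: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pair line N of the source file with line N of the target file."""
    for n, (s, t) in enumerate(zip_longest(src, tgt), start=1):
        if s is None or t is None:
            # The alignment is purely positional, so unequal lengths mean it is broken.
            raise ValueError(f"files are misaligned at line {n}")
        yield s.rstrip("\n"), t.rstrip("\n")


def find_usex_candidates(src: Iterable[str], tgt: Iterable[str],
                         term: str, limit: int = 12) -> List[Tuple[str, str]]:
    """Return up to `limit` pairs whose source line contains `term` as a whole word."""
    # \b keeps a search for "at" from matching inside "station" or "water".
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits: List[Tuple[str, str]] = []
    for s, t in aligned_pairs(src, tgt):
        if pattern.search(s):
            hits.append((s, t))
            if len(hits) >= limit:
                break  # "perhaps a dozen at most"
    return hits
```

Because iterating an open text file yields its lines, two file objects from `open(path, encoding="utf-8")` can be passed in directly, and the search stops reading as soon as the limit is reached.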

Thesaurus

Why not have one? At the bottom of each entry, there could be ideas, words and terms linked to this one or to its antonym. This would assist anyone who is trying to think of a word. The thesaurus section could be hidden and shown only on request (like the translation section).

Its use could be "voted" on with an "easy +/-" button process, the way categories are added and removed on some websites (and similar to the interlinks system), so that highly used terms are moved up. Terms could also be voted against if they have nothing to do with a word, or if the linked word is controversial in this context.

Has this been discussed before? Pashute (talk) 15:34, 11 July 2013 (UTC)

You mean Wiktionary:Wikisaurus? Also, it's July now. —CodeCat 15:47, 11 July 2013 (UTC)

Yes, thanks!! I started a discussion on that talk page. Why do you say 'it's July now'? Is there a problem with my signature as you see it? The previous entry says it was signed at 15:34 (that's 19:34 my local time) on 11 July 2013 (UTC). Pashute (talk) 17:48, 11 July 2013 (UTC)

CodeCat is referring to the fact that you've posted this to the June subpage of the BP, rather than the July subpage. - -sche (discuss) 18:16, 11 July 2013 (UTC)

Moved, thanks! Pashute (talk) 23:43, 13 July 2013 (UTC)

OK, I'm proposing an inline thesaurus, in which each definition has a new collapsible thesaurus section. Each term or phrase in the thesaurus will be preceded by 'TH:' and end with |n, where n is a number storing its "usage", as follows: users can "vote" for the word or phrase they were looking for, pushing up the usage of that thesaurus link and allowing for the creation of a web cloud view.

The 'Thesaurus' section will include words and phrases that are not necessarily defined in Wiktionary, possibly linked to other wikis. For example, the term Word in its software sense can link to 'word processor', to w:Microsoft Office or to w:Google Docs. In its sense of 'part of a sentence', it can link to non-synonymous terms such as sentence, discussion, write, saying, etc., which in turn can lead to the next level of links, such as write->pencil, discussion->newspaper, etc.

The Wikisaurus space could then use this information as its database and show the thesaurus entries for all the definitions of a term or phrase. Most of the benefits of the Wikisaurus described on Dan Polansky's page would be preserved, and all the information about words would be stored in one place. In the future, it could also show several levels in a "star view" or in a tag cloud, with the user selecting how many levels to include. It's easy to implement and, even before the views exist, it will immediately add to the popularity of Wiktionary in web searches. Pashute (talk) 10:41, 14 July 2013 (UTC)
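The proposal specifies only the line format (a 'TH:' prefix and a '|n' usage count) and the voting behavior, including votes against. A minimal Python sketch of how such lines could be parsed, re-ranked after votes and written back, assuming one TH: item per line; the function names and the rule that a count never drops below zero are illustrative assumptions, not part of the proposal:

```python
import re
from dataclasses import dataclass
from typing import List

# One thesaurus item per line: "TH:<term or phrase>|<usage count>".
TH_LINE = re.compile(r"^TH:(?P<term>.+?)\|(?P<votes>\d+)$")


@dataclass
class ThesaurusLink:
    term: str
    votes: int


def parse_thesaurus(text: str) -> List[ThesaurusLink]:
    """Collect TH: lines, skipping anything else in the section."""
    links = []
    for raw in text.splitlines():
        m = TH_LINE.match(raw.strip())
        if m:
            links.append(ThesaurusLink(m["term"].strip(), int(m["votes"])))
    return links


def vote(links: List[ThesaurusLink], term: str, delta: int = 1) -> None:
    """Apply an up-vote (+) or down-vote (-); counts are clamped at zero (assumption)."""
    for link in links:
        if link.term == term:
            link.votes = max(0, link.votes + delta)
            return
    raise KeyError(term)


def ranked(links: List[ThesaurusLink]) -> List[ThesaurusLink]:
    """Most-used first; ties broken alphabetically so the order is stable."""
    return sorted(links, key=lambda link: (-link.votes, link.term))


def serialize(links: List[ThesaurusLink]) -> str:
    """Write the section back in ranked order, in the same TH:term|n format."""
    return "\n".join(f"TH:{link.term}|{link.votes}" for link in ranked(links))
```

The ranked counts are also the natural input for the tag-cloud view mentioned above, where each count could set a term's display weight.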

Old Church Slavonic in Cyrillic/Glagolitic

Original discussion has been going on here.

Some background: Old Church Slavonic (OCS) is a language attested in a set of manuscripts usually called the OCS canon. These manuscripts are written in two scripts, Glagolitic and Early Cyrillic. The Glagolitic part of the canon is older and larger, and most of the Cyrillic monuments can be shown to stem from Glagolitic originals. Back in 2007/8 I created many OCS entries in Cyrillic and a few in Glagolitic that pointed to one another as mutual alternative forms. (NB: many of those Cyrillic-script entries are "wrong" because the later Unicode 5.1 added proper support for Old Cyrillic letters.) Recently CodeCat has started doing some cleanup and expansion of OCS entries, with Glagolitic spellings being redirected to Cyrillic (like this) as alternative forms via the {{cu-Glag spelling of}} template, in order to reduce duplication of meanings and etymologies. I object to that kind of redirection because:

neither script is more "proper"
OCS spellings in the MSS have many variations. Because the scripts do not map 1-to-1, dictionaries that transcribe Glagolitic texts into Cyrillic sometimes substitute special symbols by convention in order to preserve the original spelling. Each particular spelling is important, as it is attested, on linguistic and paleographic grounds. We should have normalized entries in each script, as well as all of the typographical variants in each. What is an original attestation and what is an unattested Glagolitic/Cyrillic transcription should be clearly marked.
We're dealing with a well-defined and limited vocabulary of a few thousand words, not an open-ended set with infinite combinations. Any kind of duplication would be finite in scope.
Giving priority to, e.g., Cyrillic could be seen as blasphemous by Slavicists who do not come from Orthodox countries where the Cyrillic script is native (and where transcription into Cyrillic, as opposed to Latin, is more common). Wiktionary should not be making value judgements about what the "true" spelling is. We must be neutral on such matters.
Mirroring the disputed content in entries (that would mostly be sections for etymologies, meanings with possible citations, as well as references) could be easily done automatically. I volunteer to do all that myself.

The question is whether to soft-redirect Glagolitic spellings to Cyrillic via the aforementioned template or not, with the issues I've listed in mind. Since CodeCat and I cannot agree on the matter (practicality and efficiency vs. cultural issues and precision), we are asking the community for more input. --Ivan Štambuk (talk) 20:52, 14 July 2013 (UTC)

I'm not sure I agree with point 2. Normalisation of spellings is quite common, and we don't make note of this anywhere for any language. Old English entries are placed on a normalised lemma with others linking as alternative spellings. In Old Norse, it's even the norm to normalise among scholars, and barely anyone even uses the originally attested spellings. I'm not saying we should be excluding attestable spellings of course. But for the sake of efficiency and consistency, we should place the definitions on the normalised/most common/etymologically most original variety. By that last point I mean that if a variety is attested spelled with both ь and є then we should normalise to ь. —CodeCat 21:05, 14 July 2013 (UTC)

Regrettably (in terms of practical application) we should probably have full entries for both as we do for Serbo-Croatian. Unless there's a convincing reason to prioritize Cyrillic. I also don't think that the fact that Cyrillic is now more widely understood is a valid reason. Mglovesfun (talk) 21:10, 14 July 2013 (UTC)

I agree with Ivan Štambuk, because Slavicists from different Slavic countries have different preferences in this matter. It's like other languages such as Serbian, Ojibwe, and Yupik, where there are two different alphabets in use, the alphabets do not map directly 1-to-1, many users insist on one or the other (and may be literate in only one of the alphabets), and especially since in this case there is a limited lexicon and Ivan Štambuk has volunteered to do it himself. —Stephen (Talk) 21:16, 14 July 2013 (UTC)

CodeCat, original spellings are very important to paleographers and linguists. That's why we have facsimile editions of OCS canon manuscripts that are read by every student taking OCS classes, and not merely scholarly transcriptions into Latin. Every little dot, dropped sound, preference for a particular typographical variant or even the shape of the letters (e.g. transitions from angular to round Glagolitic) tells us something about the document's history. There are many Slavic literary traditions, schools and cultural milieus, and each deserves equitable treatment. Dictionaries such as Старославянский словарь по рукописям X-XI веков (Old Church Slavonic Dictionary Based on the Manuscripts of the 10th-11th Centuries) come with very large introductions on the conventions used to lemmatize, and every headword carefully lists all of the variant forms and where they are attested. These manuscripts haven't survived for centuries just so that we can redirect an attested Glagolitic word to an artificially constructed Cyrillic equivalent. This efficiency and consistency argument is being (ab)used way too much. --Ivan Štambuk (talk) 00:11, 15 July 2013 (UTC)

What would a Lua-cized translation template look like?

Assume we merged {{trans-top}}, {{trans-mid}} and {{trans-bottom}} into a single Lua module. Some obvious features this would make possible are custom sorting of languages, automatic verification of language names and translation format, and automatic nesting of dialects and scripts. Technically we could keep the same format (with {{t}} et al.) and write the module around that, but if we're rethinking things, what format would be easily parseable (for Lua and humans), fast (faster than the current implementation?), and allow whatever additional features are desired? DTLHS (talk) 23:41, 14 July 2013 (UTC)

With Lua we can do things like this:

  {{translations|sense=Unit of language
  |it = [[parola]] ''f''
  |fr = [[mot]] ''m''
  }}

  {{translations|sense=Unit of language
  |Italian = [[parola]] ''f''
  |French = [[mot]] ''m''
  }}

This would be more efficient performance-wise than a list of independent template calls. Compare fr:User:Pamputt/eau (current wiki code, similar to en:) with fr:User:Darkdadaah/eau (Lua-cized table): the Lua version is at least 8 times faster in that case. Dakdada (talk) 08:04, 15 July 2013 (UTC)
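With a single call, the module sees every translation at once, so it can resolve codes to names, reject duplicates and alphabetize before emitting anything. A minimal sketch of that step, written in Python for illustration (a real version would be a Lua module); `LANG_NAMES` is a hypothetical three-entry stand-in for a full language-data table, and nesting (e.g. Ancient and Modern Greek), transliteration, gender templates and links to other Wiktionaries are deliberately not handled:

```python
from typing import Dict

# Hypothetical subset of a language-data table (code -> canonical name).
LANG_NAMES: Dict[str, str] = {"it": "Italian", "fr": "French", "de": "German"}


def canonical_name(key: str) -> str:
    """Accept either a language code ("it") or a canonical name ("Italian")."""
    key = key.strip()
    if key in LANG_NAMES:
        return LANG_NAMES[key]
    if key in LANG_NAMES.values():
        return key
    raise ValueError(f"unknown language: {key!r}")


def render_translations(sense: str, params: Dict[str, str]) -> str:
    """Render one table as a sense line plus wikitext bullets, sorted by language name."""
    rows: Dict[str, str] = {}
    for key, value in params.items():
        name = canonical_name(key)
        if name in rows:
            # e.g. both |it= and |Italian= were given
            raise ValueError(f"duplicate entry for {name}")
        rows[name] = value.strip()
    lines = [sense]
    for name in sorted(rows, key=str.casefold):
        lines.append(f"* {name}: {rows[name]}")
    return "\n".join(lines)
```

Sorting on the resolved name rather than on the parameter key is what makes the code form (`|it=`) and the name form (`|Italian=`) produce identical output, and it keeps the order correct no matter how editors arrange the parameters.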

More efficient at what it's doing, yes, but the above suggestion eliminates many of the features we currently use in Translations sections, such as external links to other Wiktionaries. The above example is also not alphabetized, an issue that (in order to keep it correct) would require rewriting many of the tools we currently have for checking this and for automated editing. It would also need to handle things like Ancient Greek and Modern Greek, which are grouped and indented when both are present. It would have to handle the automated transliterations of certain languages. It should use the standard gender calls, not simple italicization. It would need to link to the correct language section of the linked entry. Etc., etc., etc. It's an interesting idea, but the proposal would need a LOT more work to show that it's even feasible. --EncycloPetey (talk) 17:02, 20 July 2013 (UTC)

Here's a version with more features: User:DTLHS/sandbox (Module:User:DTLHS/translations). As you can see there is support for nesting both by script and by language. It also uses code from the existing translation module. DTLHS (talk) 03:22, 21 July 2013 (UTC)

The German Wiktionary long ago enabled a feature called "Stabilversionen". I think the official English name is "flagged revs", but I'm going to call it "patrolled revisions" because I think that's clearer. The point is this: as usual, anyone can edit any article, but when people go to an article, they are initially shown the last diff that a trusted user has marked as vandalism-free, rather than the most recent diff. Readers can optionally click to see the most recent diff, and if they go to edit the page, they are of course shown its current contents. Patrollers are shown the status of pages in their watchlists, in user contributions lists, and on pages themselves. All unpatrolled pages are also stored in a central log. The feature basically removes the urgency of patrolling and allows it to be done at leisure. As it is now on en.Wikt, if someone misses a bad edit while patrolling, it may be months or years before it is noticed. (I recall someone finding a months-old advert for a band in one of our entries.) So I ask: do we want to enable this feature here? - -sche (discuss) 02:08, 15 July 2013 (UTC)

It certainly does seem useful, but I hope there is also a way to keep track of the backlog. We don't want useful changes to entries to wait months for approval. —CodeCat 02:15, 15 July 2013 (UTC)

The log of all unpatrolled pages is sorted, with the oldest unpatrolled revisions listed first. On de.Wikt, 79 diffs have gone unapproved for more than a year, often because they are changes to Swahili noun classes or to Arabic vocalizations, etc.; these are things there aren't many people on de.Wikt to check. 94.09% of de.Wikt's entries have had their most recent diff patrolled. And en.Wikt has more active users than de.Wikt who could patrol, and has users from more linguistic backgrounds: you happen to know about Swahili noun classes, and several other people know about Arabic vocalization. Thus, we should be able to process things more quickly than de.Wikt. After all, we (read: SemperBlotto) already do(es) process most things using our current patrolling setup... this would just prevent things from slipping through the cracks. - -sche (discuss) 02:58, 15 July 2013 (UTC)
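The behavior described in this thread (readers see the newest patrolled revision; the log lists pages by their oldest unpatrolled change) can be modeled in a few lines. This is a hedged Python sketch of that behavior only, not of how the FlaggedRevs extension actually stores or computes it; the `Revision` type and the function names are hypothetical, and what to show on a page with no patrolled revision at all is left to the caller:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Revision:
    rev_id: int
    timestamp: datetime
    patrolled: bool


def displayed_revision(history: List[Revision]) -> Optional[Revision]:
    """Newest patrolled revision, i.e. what an ordinary reader would be shown."""
    patrolled = [r for r in history if r.patrolled]
    return max(patrolled, key=lambda r: r.timestamp) if patrolled else None


def pending_since(history: List[Revision]) -> Optional[datetime]:
    """Timestamp of the oldest revision newer than the displayed one, if any."""
    stable = displayed_revision(history)
    pending = [r.timestamp for r in history
               if stable is None or r.timestamp > stable.timestamp]
    return min(pending) if pending else None


def backlog(pages: Dict[str, List[Revision]]) -> List[Tuple[str, datetime]]:
    """Pages with unpatrolled changes, oldest waiting first."""
    waiting = []
    for title, history in pages.items():
        since = pending_since(history)
        if since is not None:
            waiting.append((title, since))
    return sorted(waiting, key=lambda item: item[1])
```

Ordering the backlog by the oldest pending change, rather than by the newest edit, is what keeps a page like the year-old Swahili noun-class diff at the top of the list instead of letting it sink.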

Strong support. And if you really do have Swahili noun class issues at de.wikt, I just started learning Swahili and I may be able to help... —Μετάknowledge (discuss/deeds) 03:52, 15 July 2013 (UTC)

The noun class that needed to be checked was here: the entry itself is actually Xhosa; the change (presumably) went unchecked because the template used to add classes was and is named as if only used for Swahili, though classes are (as Jcwf's summary notes) used in many languages. The oldest unchecked Arabic edit is this one; does anyone here know if it's OK? This Dutch pron also needs to be checked. - -sche (discuss) 04:10, 15 July 2013 (UTC)

Re Xhosa: I trust Jcwf, and it seems plausible in terms of Swahili. —Μετάknowledge (discuss/deeds) 04:46, 15 July 2013 (UTC)

The Dutch pronunciation looks OK, although I would put a length mark after the o, because Dutch has no phonemic distinction between those two sounds (unlike [ɔ] and [ɔː], which are marginally distinct). As for noun classes, Module:gender and number supports them, so you can use them as if they were genders. —CodeCat 11:55, 15 July 2013 (UTC)

Initial oppose. However, if megapatrollers such as SemperBlotto think the feature is worth it, I will reconsider. --Dan Polansky (talk) 16:52, 15 July 2013 (UTC)

Well, certainly I have seen it on other Wikis - but I haven't investigated it. Are our sysops any more likely to use this system than the current one? I was also worried about backlogs:- for instance, the current German word-of-the-day has 9 changes pending review (unchecked since August of last year). Would it be easily turnoffable? SemperBlotto (talk) 18:58, 15 July 2013 (UTC)

Suggestion for additional information on the landing page for a "deleted" entry

This thread on the Feedback page got me to thinking. I've seen similar complaints a number of times in the past, that an anon added an entry, someone deleted it, and the anon re-creates the entry in quick order, only to have that deleted again and then the anon gets blocked. This leads to confusion, alienation, and often the loss of a potential contributor.

Would it be possible to add some kind of additional information on the landing page for a deleted entry, to clue in anons as to what to do? I.e., hints for why the entry might have been deleted, links to relevant pages about format etc., and links to the fora and/or editors who can help newbies? ‑‑ Eiríkr Útlendi │ Tala við mig 17:42, 15 July 2013 (UTC)

Labels "poetic" and "poetry"

I posted my question at the page Template talk:poetic. -- Andrew Krizhanovsky (talk) 08:31, 16 July 2013 (UTC)

When should square brackets be used?

I was thinking. In definitions we generally only link to English words. So it makes sense to use {{l|en}} to link to the English, as words don't exist independently of language. I struggle to think of any time where it's best not to link to the language section but rather to link to the page as a whole. So the question is (getting to the section heading) where should square brackets such as [[]] be used? Under what circumstances are they better than a link template? Mglovesfun (talk) 10:29, 16 July 2013 (UTC)

A while ago I proposed using a special shortcut {{d|definition}} for linking to English terms in definitions rather than {{l|en}}, which is longer. But that never went anywhere. —CodeCat 11:54, 16 July 2013 (UTC)

In English and some Translingual sections in Latin characters, the benefit of using templated links instead of simple wikilinks seems non-existent, but their use does require template expansion. Any revision to any of the templates underlying their deployment requires many cycles to work through.

Is anything more at work in the global deployment of the descendants of {{l}} than an impulse to standardize? Are we preparing for a time when English may appear in a different font? Why? AFAICT links to English and Latin-character terms from any section should not be templated links either.

Should use of {{l/en}} be banned as a resource-consuming waste? DCDuring TALK 12:13, 16 July 2013 (UTC)

As I said, terms don't exist independently of language. When you use the word chair in a definition, it's not the string of five characters you wish to link to but the English word chair. Mglovesfun (talk) 09:44, 18 July 2013 (UTC)

I'm one of the few who always use {{l}} so I guess I should defend its use:

Using the {{l}} templates links to the correct section. You might say: "oh but there is only one language section, so it's unnecessary." But how often are you willing to check the page to make sure it still has only one section?
You might also say: "but it's linking to the English section, which is the first anyway." Even then, section linking will skip the upper content (user page link, page editing button, etc.) and the ToC, and the multilingual entry if any (again, even if there isn't one, how often are you willing to check?)
If you use tabbed languages and click a regular link, and the linked page has a section in the current language, you will remain in that language instead of going to English. Even if the page linked to looks like there's no way in hell it can have a non-English section, remember that recent loanwords tend to be unadapted (art dealer).
It tags the term with a span containing lang="langcode". This doesn't have much use now, but in the future it will be very useful. For example, if a script is made that colours a link green if the page exists but doesn't have a section in the given language, it will be much easier to do so if the link has a lang= parameter.
{{l/en}} and the other {{l}}/foos are so small their resource consumption is negligible compared to the advantages I list above.

— Ungoliant (Falai) 10:58, 18 July 2013 (UTC)
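To make the points above concrete, here is a minimal Python sketch of the two things a templated link adds over a bare [[chair]] wikilink: a jump to the right language section, and a lang tag that a script could later use, for example to flag a page that exists but lacks the section. The function names, the status strings and the tiny language table are hypothetical; this is not the code behind {{l}}.

```python
from __future__ import annotations

# Hypothetical language table; the real templates look names up elsewhere.
LANG_NAMES = {"en": "English", "pt": "Portuguese", "ru": "Russian"}


def make_link(term: str, lang: str) -> str:
    """Return wikitext linking to the `lang` section of `term`, tagged with its language."""
    section = LANG_NAMES[lang]
    # Point at the language section instead of the top of the page.
    link = f"[[{term}#{section}|{term}]]"
    # Wrap it in a span so that scripts can tell which language the link is in.
    return f'<span lang="{lang}">{link}</span>'


def link_status(page_sections: set[str] | None, lang: str) -> str:
    """Classify a link the way a hypothetical colouring script might.

    page_sections is None when the target page does not exist at all.
    """
    if page_sections is None:
        return "missing-page"      # ordinary red link
    if LANG_NAMES[lang] not in page_sections:
        return "missing-section"   # page exists, but not in this language
    return "ok"


print(make_link("chair", "en"))  # prints <span lang="en">[[chair#English|chair]]</span>
```

The second function only works because the link carries its language code; with a bare wikilink a script would have no way to know which section to look for.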

UserPage removed

Some time ago, someone had fun removing my user page. I just recreated it and will continue doing so. It's our right as registered users to create one, isn't it? Whoever thinks they stand above others is no better than a dictator. Thank you very much indeed for your attention, bureaucrats and other people with extra buttons that they love to abuse to scare users acting in good faith. Read the famous five "laws" of Wiki. Relax, I won't ask for a ban :-) |Klaas "Z4␟" V| 13:08, 16 July 2013 (UTC)

Why did I waste 10 seconds reading this? --Ivan Štambuk (talk) 13:12, 16 July 2013 (UTC)

I would hardly say it's a 'right', you make it sound like the right to a fair trial or the right to be free from inhumane punishment. Mglovesfun (talk) 13:23, 16 July 2013 (UTC)

This isn't Wikipedia. The consensus of the community here is that user pages are for dictionary business only. For instance, they have decided, by vote, to ban most user boxes (Babel boxes excepted). Users who have made substantial contributions to Wiktionary are given more leeway. Chuck Entz (talk) 13:40, 16 July 2013 (UTC)

{{head}} has used the new Module:headword for a little while, and the module has been extended and updated somewhat, so it has a few more features. In particular, headwords and inflected forms that contain wikilinks are now automatically linked to the correct language section, so you don't need to do this yourself. I have updated and reorganised the documentation of {{head}} to reflect these changes.

Perhaps more important is that a variety of functions used by {{head}} are now exported from Module:headword. These can (and probably should) be used in headword-line modules for individual languages, to reduce duplication of code and to make it easier to get them all working consistently and correctly. I have updated Module:nl-headword as an example, which you can base your own modules on. Notice in particular the four calls to m_headword.something at the bottom of the show function, and also the way in which inflected forms are specified (more or less like {{head}}'s parameters, but using Lua tables). The module's functions are now documented, so it's clear what can be used and what each function does. There is also a list of future changes, which would bring the module, and any other templates and modules that use it, in line with some of the features recently added to other templates like {{t}}. —CodeCat 18:46, 16 July 2013 (UTC)

Blocking new users from creating userpages

A conversation began on Thread:User_talk:CodeCat/Special:AbuseFilter/21 about whether or not to block new users from creating userpages. That conversation is hereby moved to this page. Billinghurst has imported Abuse Filter 23, which catches fewer good edits, but which also catches markedly fewer bad edits. - -sche (discuss) 23:30, 16 July 2013 (UTC)

From looking at my contribution history, creating a user page was the first thing I did on Wiktionary. --Dan Polansky (talk) 15:05, 17 July 2013 (UTC)

No, you had added several Czech translations before that. SemperBlotto (talk) 15:11, 17 July 2013 (UTC)

This page tells me otherwise. I might have added a couple of entries under an IP, but not under my user name, if the page is to be believed. Note that I am not opposing anything; I am merely providing input with some degree of relevance. If my attempt to edit my user page had been blocked back then, I would probably have gone on to create entries regardless. --Dan Polansky (talk) 15:28, 17 July 2013 (UTC)

You are right. I was reading the page from top to bottom instead of bottom to top. In that case you were incorrect to say that you were a "contributer" before you had contributed anything! SemperBlotto (talk) 15:36, 17 July 2013 (UTC)

The page I had created did not contain the misspelling "contributer"; it was this revision. You are kind of right that I was not really a contributor at the point at which I had created the page. Well, shame on me, I guess. --Dan Polansky (talk) 15:44, 17 July 2013 (UTC)

Anyway, the following are some of the users who have created their user page as their 1st edit:

--Dan Polansky (talk) 15:53, 17 July 2013 (UTC)

Well, DTLHS is a false positive, of course. As for me, that's because I was already a Wikipedia editor; I suspect it's the same for the others. That's why that global edit count business could help solve a lot of this. —Μετάknowledge (discuss/deeds) 16:06, 17 July 2013 (UTC)

This seems rather newbie-biting to me (and when have we ever been accused of that before?), but if it is implemented I certainly hope it won't apply to SUL accounts that have been around for a while on some other Wikimedia project. —Angr 17:46, 17 July 2013 (UTC)

Actually, this has already been implemented. We're having this discussion about un-implementing it precisely because we have been getting flak from veteran users of other wikis who come to this site and are blocked from making user pages. - -sche (discuss) 17:51, 17 July 2013 (UTC)

Just add a condition that checks for external links. This is what the spambots want to drop as their payload, and no experienced Wikipedian will generally have an external link on their user page. -- Liliana • 19:30, 17 July 2013 (UTC)

Not a regular user here, but I think that sounds reasonable. --Rschen7754 20:08, 17 July 2013 (UTC)

Sounds good to me, too. Ban new (to here) users from creating pages with external links, and any keywords we notice are common in spam (gucci, purse(s), shoe(s), sunglasses). - -sche (discuss) 21:01, 17 July 2013 (UTC)
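The rule being discussed (block only users new to this wiki whose new user page contains an external link or a common spam word) can be sketched in Python as below. It only illustrates the logic: AbuseFilter rules are written in the extension's own rule language, not Python, and the function names, the zero-local-edits threshold and the plain substring matching are all assumptions. Keeping each check as its own small rule and reporting which one fired also reflects the suggestion, made later in this thread, to use several small filters and to tell users why they were stopped.

```python
from __future__ import annotations

import re

# Keywords mentioned in the discussion; substring matching also catches plurals
# ("purses", "shoes") but can misfire on innocent words such as "horseshoe".
SPAM_KEYWORDS = ("gucci", "purse", "shoe", "sunglasses")
EXTERNAL_LINK = re.compile(r"https?://", re.IGNORECASE)


def has_external_link(text: str) -> bool:
    return EXTERNAL_LINK.search(text) is not None


def has_spam_keyword(text: str) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in SPAM_KEYWORDS)


# Several small rules instead of one large one.
RULES = (has_external_link, has_spam_keyword)


def check_new_user_page(local_edit_count: int, text: str) -> str | None:
    """Return the name of the first rule that blocks the page, or None to allow it.

    Users with any local edits are assumed to be established here and are never blocked.
    """
    if local_edit_count > 0:
        return None
    for rule in RULES:
        if rule(text):
            return rule.__name__
    return None
```

Returning the rule name rather than a bare yes/no is what would make it possible to show the user a specific explanation.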

But nobody (including me back then) needs a userpage at all if they are not a contributor. It wouldn't have stopped me from contributing all those years ago - why would it stop anyone now? SemperBlotto (talk) 06:47, 18 July 2013 (UTC)

Because some people like to be a person, instead of an anonymous pseudonym? Putting up a user page is a nice easy way to sign into a project, saying who you are, what skills you bring and maybe what you plan to do.--Prosfilaes (talk) 07:53, 18 July 2013 (UTC)

FWIW, I think the no-external-links restriction sounds reasonable for purposes of filtering newly created user pages. ‑‑ Eiríkr Útlendi │ Tala við mig 08:13, 18 July 2013 (UTC)

At the original point of discussion, there was some commentary that the filter I moved over let garbage through. The purpose of a single filter is not to be as effective as possible, but to have zero false positives. For that reason I would suggest a series of smaller, effective filters rather than one mega-filter that stops all garbage. The filter that I built was to stop the NTSAMR-type spam, nothing more, nothing less. If you want to stop new users from adding off-wiki links, then that is not unreasonable, as long as you tell users directly why, or you have good monitoring in place to welcome them. Just don't rely on the auto-filter: the spambot hackers clearly know to create an account and leave it, often for over 2 months. — billinghurst sDrewth 14:32, 18 July 2013 (UTC)

Judeo-Persian and Bukharic

Until yesterday, WT:LANGTREAT specified that Judeo-Persian was to be considered a dialect of Farsi and banned from having entries. Because I could find no discussion supporting that, and because the two lects differ in vocabulary and script, I concluded, following this convo with Metaknowledge, that it was a simple error similar to the erroneous ban on Tajik, some history of which you can read here. Hence, I updated WT:LANGTREAT to allow Judeo-Persian its own entries. But Bukhari also exists. Are Bukhari and Judeo-Persian distinct enough from each other that both should be allowed, or would it make sense to combine them, and if so, under which name? (We currently call Bukhari "Bukharic", but "Bukhari" is more common than any of its other names and than "Judeo-Persian", judging by Google and Google Books.) - -sche (discuss) 19:05, 17 July 2013 (UTC)

Did you try searching for "Bokhari" as well? In any case, on the merits of script alone, Persian is fa-Arab only, Judeo-Persian is Hebr only, but Bukhari is Hebr/fa-Arab/Cyrl. So the two could be a bit messy to merge, were we to have entries, but not too bad. —Μετάknowledge (discuss/deeds) 19:11, 17 July 2013 (UTC)

I compared Bukharic, Bukhari, Bukharian, Bukhori, Bukhoric, Bukhorian, Bokhari, Bokharic, Bokharian (and Judeo-Persian). - -sche (discuss) 19:53, 17 July 2013 (UTC)

Reviewing the literature, I find that jewish-languages.org (jewish-languages.org/judeo-persian.html) does call Bukhari a dialect of Judeo-Persian. On the other hand, a number of scholars, including Solomon Birnbaum, separate Bukhari (considering it more Tajik-ish) from Judeo-Persian / Jidi / Parsic (more Persian-ish). I think we can treat them as separate languages for now, and merge them later if we discover that's warranted. - -sche (discuss) 03:03, 20 July 2013 (UTC)

Users adding ky interwikis

Obviously we want all valid interwikis, but I suspect these accounts, a mixture of IPs and named accounts, are in fact bots. Furthermore, they're not even adding the interwikis in the right place. I think Rukhabot can now sort interwikis as well as add them? Other than this, what should we do? Mglovesfun (talk) 09:39, 18 July 2013 (UTC)

Related: I remember at least one IW being added when kywikt didn't have the page. — Ungoliant (Falai) 11:00, 18 July 2013 (UTC)

Here's one. Mglovesfun (talk) 11:08, 18 July 2013 (UTC)

German inflections again

A little while ago there was a discussion about the format of adjective forms in German; I don't remember where it was exactly. The problem is that a single form might have many different functions, which leads to a very long list of definitions. Here is an example of an adjective, for reference: breit. The form breiten for example appears many times in the tables, as does breiter. The alternative that SemperBlotto's bot seems to use is {{inflected form of}}, which is in fact used only for German entries (so it should really be moved to {{de-inflected form of}}). But that template is really a bad substitute because it just says "inflected form of" and doesn't give any other information. So both breiten and breiter would be called "inflected forms" by this template, which isn't terribly useful or informative. (Note that we had a similar debate about Dutch adjective forms some years ago. But in Dutch, "inflected form" is actually the established term for one specific form of the adjective, so it's as concise and accurate as it can be, at least for Dutch. Not so for German.) So what should we do about this? I think at the very least we should get rid of {{inflected form of}}; it's very vague. But how do we display the information in a format that's not too long, but still informative enough that the user knows just what form it actually is? —CodeCat 20:41, 18 July 2013 (UTC)

We already had this discussion at WT:RFDO#Template:de-form-adj, but did not definitively reach consensus (Angr and I were against long lists, -sche ambivalent, and Ungoliant opposed to removing them). —Μετάknowledge (discuss/deeds) 21:01, 18 July 2013 (UTC)

Perhaps a small collapsed box could be made for the definition line that simply displays Inflected form of Xxx, and when you click it, it expands to possibly dozens of lines containing more detailed information. I suspect that most users are in fact not interested in the specific details of inflected forms that we usually provide (for nouns that would be the case, gender, animacy, definiteness, possessive forms, etc.), but simply want to jump to the main lemma where they can deduce it themselves from the context (once they know the meaning), or look it up in the inflection table if need be. --Ivan Štambuk (talk) 21:02, 18 July 2013 (UTC)

We can also use an intermediate approach where we provide some details on a few definition lines. Or maybe find ways to combine definitions to make them more concise. For example, breiten appears in both the weak and mixed declensions in the same places, so we can just say "(definition) weak and mixed". We can also group definitions under subdefinitions, like this:

strong inflection of breit:
1. genitive masculine and neuter
2. accusative masculine
3. dative plural
weak and mixed inflection of breit:
1. genitive, dative and accusative masculine
2. genitive and dative feminine and neuter
3. plural of all cases

That's still quite a list, but at least it's a bit more manageable? —CodeCat 21:20, 18 July 2013 (UTC)
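The grouping in the breiten example above is mechanical enough to automate. A minimal Python sketch, assuming each use of a form is recorded as a (declension, description) pair, which is a hypothetical input format that no current template takes: it groups the descriptions by declension, then merges declensions whose description lists are identical, which is how "weak" and "mixed" end up sharing one heading.

```python
from __future__ import annotations


def group_definitions(lemma: str, uses: list[tuple[str, str]]) -> list[str]:
    """Turn (declension, description) pairs into grouped definition lines."""
    # 1. Collect the descriptions belonging to each declension, keeping input order.
    by_declension: dict[str, list[str]] = {}
    for declension, description in uses:
        by_declension.setdefault(declension, []).append(description)

    # 2. Merge declensions whose description lists are identical.
    merged: dict[tuple[str, ...], list[str]] = {}
    for declension, descriptions in by_declension.items():
        merged.setdefault(tuple(descriptions), []).append(declension)

    # 3. Render one heading per group, with numbered sub-definitions.
    lines = []
    for descriptions, declensions in merged.items():
        lines.append(f"{' and '.join(declensions)} inflection of {lemma}:")
        for number, description in enumerate(descriptions, start=1):
            lines.append(f"  {number}. {description}")
    return lines


WEAK_AND_MIXED = [
    "genitive, dative and accusative masculine",
    "genitive and dative feminine and neuter",
    "plural of all cases",
]
BREITEN = (
    [("strong", "genitive masculine and neuter"),
     ("strong", "accusative masculine"),
     ("strong", "dative plural")]
    + [("weak", d) for d in WEAK_AND_MIXED]
    + [("mixed", d) for d in WEAK_AND_MIXED]
)

print("\n".join(group_definitions("breit", BREITEN)))
```

This prints the same two groups shown above. Declensions are only merged when their lists match exactly and in the same order, and three or more merged declensions would be joined as "a and b and c", so a real version would want tidier list formatting.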

How would that sort of list accept the addition of supporting quotations for each form and definition? Have we decided not to do that? --EncycloPetey (talk) 17:09, 20 July 2013 (UTC)

As far as I know, we put those on the lemma entry. It would be silly to require all quotations on the lemma entry to be only of the lemma form itself. —CodeCat 17:13, 20 July 2013 (UTC)

Scripts and italics

A lot of scripts have italics disabled for {{term}}, and a few more had them disabled completely, for any kind of italics whatsoever. Which is more desirable? Should these scripts never be displayed in italics anywhere on Wiktionary, or should this be limited only to mentions? Or should some scripts never be italic, while for others it's only disabled for mentions? —CodeCat 23:28, 19 July 2013 (UTC)

No script except Latin and Cyrillic should ever be set in italics. There's some disagreement as to whether Cyrillic should ever be italicized at Wiktionary. (I'm in favor of italicizing it with {{term}}, but I don't feel particularly strongly about it.) —Angr 16:59, 20 July 2013 (UTC)
I'm among those who think Cyrillic should not be set in italics. The italicized versions of some Cyrillic characters are quite different from the standard font. The Cyrillic character that looks like a "T" in regular script looks like an "m" in italics, when they're handled correctly. Latin script should be italicized in the circumstances we're discussing, but in my opinion it's the only one that should. --EncycloPetey (talk) 17:12, 20 July 2013 (UTC)
Japanese in italics ranges from mildly funky-looking (like このサンプル this sample) to not-quite-legible (like 比喩的、龍、複雑怪奇、糞 these terms). YMMV, and it depends on what font your system uses. Typographically speaking, italics aren't used much in Japanese text anyway, at least in my experience. Changes in italicization can also make certain characters potentially more ambiguous, such as ソ (so, italicized) and ン (n, not italicized).

FWIW, I don't think italicized Cyrillic is all that problematic. But then again I'm not editing any entries using this script, so my opinion probably shouldn't carry much weight. ‑‑ Eiríkr Útlendi │ Tala við mig 22:28, 20 July 2013 (UTC)

I think scripts that are not italicised by native convention should not be italicised on Wiktionary either. I think that would include Japanese. But Cyrillic certainly does appear in italics natively, so that is a different matter, and we have to look at what we prefer. What we could do is use "font-style: oblique" instead. This tells browsers to just display the normal font slanted, without using any of the special letter forms used in italics. That might help with any confusion with Cyrillic characters, and maybe for other scripts as well. —CodeCat 22:39, 20 July 2013 (UTC)

Compare: (with italics) метить, (with oblique) метить. —CodeCat 22:43, 20 July 2013 (UTC)

Both samples here appear identically on my home machine -- Ubuntu 10.4, Chromium 25. Inspecting the elements shows that the browser thinks the former is indeed in italics and the latter in oblique, but the implementation shows up the same for me. ‑‑ Eiríkr Útlendi │ Tala við mig 22:55, 20 July 2013 (UTC)

It may be a matter of the fonts. Some fonts have special italic forms while others don't. For me, the default font for Wiktionary, as well as the one used for Cyrillic, both display italic and oblique the same. But when I add "font-family: serif" like I did above, they appear different to me. —CodeCat 22:58, 20 July 2013 (UTC)

It could be, but on my system they both look like the Latin letter string Memum with a soft sign after it. I use a Mac, and tried it with both Firefox and Safari. Chuck Entz (talk) 02:30, 21 July 2013 (UTC)

That's strange, it seems like your system isn't interpreting "oblique" the way it should. —CodeCat 02:32, 21 July 2013 (UTC)

Using Firefox on Windows, italics and oblique both look identical to me too, like memum. - -sche (discuss) 05:21, 21 July 2013 (UTC)

Also using Firefox on Windows, and they look identical for me too, like memumb. Same when I switch to Chrome. When I switch to IE, they still look identical but are in oblique, so they look like mеtиtb —Angr 09:21, 21 July 2013 (UTC)

I'm using Firefox on Linux Mint 15, and for me they differ. So either Mint must be doing correctly what all of those other systems are doing wrong, or the font being selected on my system has distinct italic forms for Cyrillic whereas yours doesn't? —CodeCat 11:00, 21 July 2013 (UTC)

In any case, this shows that we shouldn't use oblique styling, lol. - -sche (discuss) 19:03, 21 July 2013 (UTC)

The script request categories

I have often wondered just what the use is of having categories like Category:Entries which need Cyrillic script. We did somewhat mitigate that by adding the language to the name, but I feel that this was done without really looking at the nature of the problem. Generally, these categories are added to a page (by {{term}}, {{rfscript}} and others) when a term is given with only a transliteration, but not the term in the native script. Which script that is doesn't actually matter, because scripts are generally used differently enough from language to language that someone who can convert a Russian transliteration to Russian Cyrillic won't do all that well with converting transliterated Serbian into Serbian Cyrillic. And someone who knows how to write and transliterate Sanskrit won't generally understand how to do the same with a Hindi term. So what matters really is the language; the script is only secondary. So I propose to remove the name of the script from these categories altogether, and use a name like "(language) terms needing native script". {{rfscript}} would need to be converted so that it only takes languages as its parameter, rather than the script code. What do you think? —CodeCat 19:19, 21 July 2013 (UTC)

Isn't that what we already do? Category:Russian entries which need Cyrillic script DTLHS (talk) 19:28, 21 July 2013 (UTC)

No, because that still has the script name in the category. My proposal is Category:Russian entries needing native script (we might as well get rid of the awkward "which need" too). —CodeCat 19:38, 21 July 2013 (UTC)

The way things are in this case is good enough for me. The proposal seems to me not as good as the status quo, ergo oppose. Mglovesfun (talk) 22:34, 21 July 2013 (UTC)

Can you elaborate? —CodeCat 22:36, 21 July 2013 (UTC)

Since you asked, CodeCat, you seem to have an agenda that everything must be made to work differently to how it works now, even things which work perfectly well. It worries me that a lot of good infrastructure will be thrown away for your personal reasons (whatever they are) and not for the good of the wiki. Mglovesfun (talk) 22:46, 21 July 2013 (UTC)

Maybe I should try to explain my reasons then. I am primarily concerned with consistency and making things work in a way that is the most intuitive and sensible. Some people like DCDuring complain about template-itis, and I do agree that it is rather confusing with how many templates we have. However, I argue that the confusion stems from how they all work differently from one another. If they all worked similarly, then it would reduce the mental burden on newcomers because they would not need to learn every little slight difference about all the templates, category names and so on. Instead they would be able to actually get things done because it would be easier to actually remember how all of their tools work. In this particular case, try typing {{term|tr=something|lang=ru}} in an entry. It will add the page to Category:Russian entries which need Cyrillic script. Can you see what is wrong with that? That category name will be the same even if you put it in a German entry. So the "Russian entries" part is incorrect, it should be "Russian terms". And you can increase consistency further by changing "which need" to the more usual "needing" (which is used in a lot more cleanup categories). Working "minority" conventions out of the system in favour of the majority so that people no longer have to think "was it this category that was called 'which needs' or was it that other one?". It will always be "needing", no more question needed, which leaves more mental room for questions that actually matter.

The other half of the reason is that I don't feel that the current structure of these categories makes sense. They may have made sense at one point, but there is always a time when you need to re-evaluate things that you once took for granted, and judge whether they really make as much sense as you thought they did. Let's say that the code above was placed on a German entry, and I am a Russian speaker and I want to fix any links to Russian terms that need the Russian spelling. Where do I go? Well, the place to start is Category:Requests (Russian) but that category is already a horrible mess. Let's disregard that for a moment and assume that I make it to Category:Russian terms needing attention (although there is nothing particularly intuitive about that category, since plenty of request categories are placed elsewhere in the tree). Then I see Russian entries which need Cyrillic script. Aha, I think, that is what I am looking for, so I work through that category. But then later on, I come across its second parent category, Category:Entries which need Cyrillic script, and I find more Russian entries there. Why were those not listed in the first category? Why do I need to look in two categories to fulfill essentially the same task? Request categories should be task oriented, so that is bad organisation and it's what I am trying to eliminate with this proposal. Under this proposal, Category:Terms needing native script by language would contain subcategories organised by language, including Category:Russian terms needing native script, and both Category:Entries which need Cyrillic script and Category:Entries needing various scripts would disappear. There would also be no Category:Russian terms needing Cyrillic script because Russian terms are always in Cyrillic, the script is redundant. Maybe an exception could be made for those few languages that use multiple scripts, but for the majority of languages, the script is completely irrelevant to the task. 
People who can write in Russian don't go adding "Cyrillic" spellings to entries; they add Russian spellings and don't care about Ukrainian or Kazakh spellings, because they don't know those. —CodeCat 23:38, 21 July 2013 (UTC)
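The difference between the current and proposed naming schemes can be sketched as two small Python functions. The language-code table, the script table and the set of multi-script languages are hypothetical stand-ins for Wiktionary's real language data, and the multi-script branch only illustrates the possible exception mentioned above; none of this is existing template code.

```python
from __future__ import annotations

# Hypothetical stand-ins for Wiktionary's language and script data.
LANG_NAMES = {"ru": "Russian", "uk": "Ukrainian", "sh": "Serbo-Croatian"}
SCRIPT_NAMES = {"Cyrl": "Cyrillic", "Latn": "Latin"}
MULTI_SCRIPT_LANGS = {"sh"}  # languages where the script really matters

PARENT_CATEGORY = "Category:Terms needing native script by language"


def current_category(lang: str, script: str) -> str:
    """Current scheme: language plus script, 'entries which need'."""
    return f"Category:{LANG_NAMES[lang]} entries which need {SCRIPT_NAMES[script]} script"


def proposed_category(lang: str, script: str | None = None) -> str:
    """Proposed scheme: one task-oriented category per language."""
    name = LANG_NAMES[lang]
    if lang in MULTI_SCRIPT_LANGS and script is not None:
        # Possible exception for languages written in more than one script.
        return f"Category:{name} terms needing {SCRIPT_NAMES[script]} script"
    return f"Category:{name} terms needing native script"


print(current_category("ru", "Cyrl"))  # prints Category:Russian entries which need Cyrillic script
print(proposed_category("ru"))         # prints Category:Russian terms needing native script
```

Under the proposed scheme every per-language category would sit under the single parent category, so a Russian speaker would have exactly one place to look.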

British/American spelling and redirects

This edit popped up in my watchlist today, where -sche redirected defense as an alternative spelling of defence. The policy/guideline page WT:AEN, however, still has this:

Words that are commonly spelled differently in different countries are all considered valid entries that should not be shortcuts to other versions. Full entries exist for both color and colour.

-sche claims that merging is something that has been approved by the community. However, most such redirects are being done by a few people, who invoke their own previous mergers as precedents. I think that this is agenda-pushing and that we should have both spellings as equally valid entries, kept in sync for any changes. If any spelling should be the default one, that would be American English, which is the most widely spoken and the most influential variety of English. --Ivan Štambuk (talk) 21:27, 21 July 2013 (UTC)

I agree with removing this duplication, but I disagree that American English should be the default. —CodeCat 21:29, 21 July 2013 (UTC)

Then what should be the default spelling? Some random criteria like "which spelling was created first as an article" or "which was more often updated" ? --Ivan Štambuk (talk) 22:15, 21 July 2013 (UTC)

Except that "first created" is not random, and it's the only way to do redirects without favouring either side. —CodeCat 22:20, 21 July 2013 (UTC)

It's random in the meaning "not predictable". The time when the article was created was randomly chosen by mental processes of the editor, which are non-deterministic. It's inconsistent. Furthermore, it's not uniformly distributed due to the inherent bias of humans to prefer their own native (or taught) spelling. Just because in the early days of Wiktionary there were many American/British editors, it doesn't mean we should favor American/British spellings. Criteria of preference should be objective, universal and not subject to cultural prejudices of early editors. --Ivan Štambuk (talk) 22:40, 21 July 2013 (UTC)

Comments:

The line cited above, from the think-tank WT:AEN, hasn't been updated since 2009 (or earlier, since it seems to have been copied into AEN from somewhere else).

Does anyone dispute that syncing does not work? If so, can they point to any entry that has been correctly synced for any major portion of its history? Even colour and color are only synced because I synced them. Fewer than 1 in 20 of the entries I've seen where content was duplicated were actually synced; the rest contained differences of varying degrees of severity, with one entry or the other routinely missing common senses, and with all entries containing definitions different enough to wrongly imply that the terms had distinct meanings and could be contrasted with one another.

The class of entries which would require syncing is huge—indeed, it is open-ended: all verbs ending in -ise/-ize, nouns ending in -our/-or, adjectives ending in -ised/-ized or -oured/-ored... common entries with many senses, like realize, less common entries like paganize... and a limitless set of entries we don't have yet but would have to anticipate somehow so as to sync them once they appear (actuarialize, dictionarize, etc)...

Until now, pairs of entries have been merged by simply picking one spelling or the other. Other people have had different methods, but my method has been to edit two pairs of entries at a time, making the American spelling of one pair the lemma and the British spelling of the other pair the lemma. We could, however, adopt a policy like Wikipedia's, whereby the lemma is whichever spelling (i.e. whichever entry) was created first.

- -sche (discuss) 22:02, 21 July 2013 (UTC)
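The Wikipedia-style rule mentioned above, where the spelling whose entry was created first becomes the lemma, is simple enough to state as code. A minimal Python sketch; the function name is hypothetical, and the timestamps in the example are made up for illustration, not the real creation dates of these entries.

```python
from __future__ import annotations

from datetime import datetime


def pick_lemma(created: dict[str, datetime]) -> str:
    """Return the spelling whose entry was created earliest.

    The other spellings would become alternative-spelling entries pointing at it.
    On an exact tie, the spelling listed first in the dict wins.
    """
    if not created:
        raise ValueError("need at least one spelling")
    return min(created, key=lambda spelling: created[spelling])


# Made-up creation times, for illustration only.
example = {
    "defence": datetime(2005, 3, 1),
    "defense": datetime(2004, 7, 15),
}
print(pick_lemma(example))  # prints defense
```

The rule needs nothing but page history, which is why it avoids having to decide which variety of English is "default".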

The only reason why syncing doesn't "work" is that nobody has bothered to actually enforce it. Instead, aggressive POV-pushers such as yourself who favor British spellings use it as an excuse to subtly push their agenda. The only parts that need to be synced are definition lines; the rest can be transcluded from the same place. It's not like these particular entries are updated billions of times per day. At most we're dealing with a few dozen edits per day for words that need to be mirrored in the alternatively spelled entry. These could easily be logged by a bot. --Ivan Štambuk (talk) 22:15, 21 July 2013 (UTC)
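The kind of bot check proposed here could start as a simple comparison of definition lines. A minimal Python sketch, assuming definitions are the wikitext lines that start with "# " (so example lines beginning "#:" and quotation lines beginning "#*" are skipped); the function names and sample entries are hypothetical. It compares text literally, so a definition that spells defense/defence differently in each entry would also be reported as a difference.

```python
from __future__ import annotations


def definition_lines(wikitext: str) -> set[str]:
    """Return the definition lines of an entry, without the leading '# '."""
    return {
        line[2:].strip()
        for line in wikitext.splitlines()
        if line.startswith("# ")
    }


def sync_report(entry_a: str, entry_b: str) -> tuple[set[str], set[str]]:
    """Return (definitions only in entry_a, definitions only in entry_b)."""
    defs_a = definition_lines(entry_a)
    defs_b = definition_lines(entry_b)
    return defs_a - defs_b, defs_b - defs_a


# Hypothetical sample entries.
american = "===Noun===\n# protection from attack\n# the defending side in a game\n#: an example line"
british = "===Noun===\n# protection from attack"
only_american, only_british = sync_report(american, british)
print(only_american)  # prints {'the defending side in a game'}
```

A bot built on this would still need someone to act on each report, which is the part the replies below question.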

The answer to this is really "been there, done that". Yes, we know that we can do these things. The problem is that nobody actually does them. And in any case I am somewhat wary of relying on bots for the normal operation of the wiki. Just look at what has happened since a formatting dispute shut down Autoformat. —CodeCat 22:20, 21 July 2013 (UTC)

What has happened? What is "normal operation of the wiki"? Every entry is a continuous work in progress. Just because edits at defense and defence are not instantly mirrored to each other, it doesn't mean that we should abandon edit mirroring as a concept. Editing mistakes in templates or wiki code are minor technical issues. This has much bigger significance, IMHO. Alternatives should at least be openly discussed. --Ivan Štambuk (talk) 22:49, 21 July 2013 (UTC)

LOL. Considering that I've merged entries in pairs (one to British, one to American), and that when I've created new entries I've almost always made American spellings the lemma, the fact that you think I'm pushing British spellings shows how little you pay attention. Of course, so does everything else you've said so far. As CodeCat notes, we've "been there, done that" and it hasn't worked. - -sche (discuss) 22:23, 21 July 2013 (UTC)

I do want to note that -ize is also British, so we should put the main entry there at all times. —CodeCat 22:35, 21 July 2013 (UTC)

No, you haven't really done that. What you have done is the exact opposite: favoring a particular side on the basis of the argument that syncing doesn't work. The problem is that you are in principle against syncing, not because it won't scale (it would, with bot assistance), but because you want to reduce the amount of duplication that you perceive as redundant. The problem with that line of argument is evident in entries such as defense, the most common English spelling in the world, being redirected to a regional spelling, defence. --Ivan Štambuk (talk) 22:42, 21 July 2013 (UTC)

What's your evidence of that? DCDuring TALK 23:11, 21 July 2013 (UTC)

Evidence of what exactly? I haven't really seen any proposal for systematically keeping entries in sync. I've seen people discussing which entries should be the "main" ones, and the benefits of merging (less maintenance). --Ivan Štambuk (talk) 23:25, 21 July 2013 (UTC)

I think you need to look at the problem more closely. defense is used by the most native speakers, but defence is used more widely because the influence of British culture is more widespread. That may be changing gradually, yes, but the US is still a relative newcomer as far as cultural influence goes. A hundred years ago, the chances were that if you went anywhere outside the American continent and they spoke English, it would be some form of British-influenced English. To call defence merely a regional spelling would be like me trying to elevate Ijekavian to standard Serbo-Croatian and calling Ekavian merely regional (it's only used in Serbia!). All of this is beside the point, though, and not really relevant, because this debate isn't about deciding which variety of English to base Wiktionary on. You complain that the criteria that have now been applied are arbitrary, but I argue that they have to be. Any argument about the relative use of one variety or the other will become a dead end, so only a completely arbitrary decision that has no linguistic merit is going to break this impasse. That's exactly what -sche has done and I think it's very good. —CodeCat 23:08, 21 July 2013 (UTC)

The days of the British Empire are long gone. The US is by far the most dominant culture on the planet. From pole to pole, on every TV and radio station and in every newspaper, you'll see American movies, artists, celebrities, etc. American English is the global English. From the perspective of how things are today, the spelling defence is unfortunately a regionally confined variant. That doesn't mean it's any less "worthy", though. I was merely making the observation that redirecting a more common spelling to a less common one (from the perspective of the majority of native speakers and foreign-language learners) seems a bit problematic to me. The argument from the relative use of some variety (in terms of how widespread it is, number of speakers, cultural relevance, etc.) is indeed as arbitrary as "who created this entry first", but that is the whole point. The only way to resolve this is to keep both spellings as full-blown entries. --Ivan Štambuk (talk) 23:25, 21 July 2013 (UTC)

There are few things quite as irritating as being forced to an entry whose citations and usage examples are all culturally irrelevant to you. I'd expect that users who favor the "losing" spelling will be somewhat alienated by the experience, as I am. I'd rather that we had separate entries as long as there is some usage context, regional or otherwise, in which a given spelling is dominant. The same goes if there is even a single definition that is much more common in one context than in another. To avoid the truly pointless duplication, we can use {{trans-see}} to make sure that the translations are consolidated. If work is required, so be it. DCDuring TALK 23:11, 21 July 2013 (UTC)

Sunday, July 21, 2013