Wealth Maker: Wiktionary - Recent changes [en]: Wiktionary talk:Todo

Wiktionary - Recent changes [en]

Track the most recent changes to the wiki in this feed.

Wiktionary talk:Todo

Aug 31st 2012, 03:10

Revision as of 03:07, 31 August 2012 compared with the latest revision as of 03:10, 31 August 2012 (line 181): the only change is in -sche's follow-up comment of 03:07, where the unclosed <tt>* <tt> was corrected to <tt>* </tt>.

Latest revision as of 03:10, 31 August 2012

Explanation

This was basically an idea to get people coordinated on various little "projects" that have until now been on people's user pages. Just a note of common sense, don't remove stuff that isn't clearly wrong or is up for deletion but hasn't failed yet. If this page remains active for a period of time we might rename it, or at the very least get some more links pointing here. Mglovesfun (talk) 12:07, 26 November 2009 (UTC)

What do we think about clean-up tasks that require in-depth knowledge of a foreign language? If there aren't that many it's probably fine to have them here, but when there's a lot, I think they should be pushed out to the About Language space (eg WT:AJA#Additional help and Wiktionary:About Spanish/Todo). --Bequw → ¢ • τ 14:43, 14 January 2010 (UTC)

Noting progress

Do we think it'd be a good idea to note progress on this page as well? It might help with motivation. --Bequw → ¢ • τ 15:43, 27 November 2009 (UTC)

Noting completion

I'd say that single-time issues that have been dealt with can be removed completely from the page. Keep (as stricken) issues that are recurring cleanup tasks. Sound fine? --Bequw → ¢ • τ 01:47, 29 December 2009 (UTC)

Requests

I think anything using '''{{polytonic|}}''' should lose the bold, especially in etymologies. Normally we use italics in etymologies, not bold, and even then only for the Latin script. Mglovesfun (talk) 06:44, 5 January 2010 (UTC)
Japanese words in the Latin script that use {{infl|ja|part of speech}} but not sc=Latn. Mglovesfun (talk) 06:44, 5 January 2010 (UTC)

Agreed. The '''{{polytonic|}}''' ones will take more human formatting since usually the transliteration and definition would need to be put into a {{term|sc=polytonic|...}}. --Bequw → ¢ • τ 21:56, 5 January 2010 (UTC)

Here is a list of the matching polytonic usages in etymology sections:

Wiktionary:Todo/polytonic ety usage

As for the missing sc=Latn on Japanese entries in Latin script, does this actually cause a problem? We put the script in so that browsers will be able to match the font correctly for uncommon languages. Is there a chance the browser would pick a font that doesn't have glyphs for the Latin set? --Bequw → ¢ • τ 02:54, 18 January 2010 (UTC)

It defaults to {{Jpan}}. So it's very much needed unless you want people to see the headline in a Japanese font. -- Prince Kassad 02:57, 18 January 2010 (UTC)

I know, but don't most Japanese fonts have glyphs for the Latin blocks as well? --Bequw → ¢ • τ 07:36, 18 January 2010 (UTC)

How does our Wiktionary:Todo#Regular_tasks differ from WT:DW? Should we move regular tasks to that page and leave this page for problems that are still quite large? --Bequw → ¢ • τ 17:35, 8 January 2010 (UTC)

I think that page is inactive; Richard and Connel are on there quite a lot, and they hardly ever contribute now. Mglovesfun (talk) 08:23, 13 January 2010 (UTC)

WT:DW was also used primarily for regular and on-going tasks that would never be completed, such as responding to requests lists. --EncycloPetey 21:44, 31 January 2010 (UTC)

List refresh frequency

Once more of these generated lists become manageable it would be nice if refreshes could be synchronized. For instance the 12th of the month could be cleanup day (as Dec 12 is Wiktionary Day). --Bequw → ¢ • τ 17:48, 8 January 2010 (UTC)

Moved here, as no consensus exists Mglovesfun (talk) 10:43, 12 January 2010 (UTC)

Language subcategories need standardization to [[Category:xx:Cardinal numbers]] -- Prince Kassad 22:08, 30 December 2009 (UTC)

I don't think that cardinal number is a part of speech like noun or verb, so these should be topical categories. So it should be [[Category:de:Cardinal numbers]] just as we have [[Category:de:Fish]] not German fish. Mglovesfun (talk) 22:23, 30 December 2009 (UTC)

I don't think there's consensus on this. See Wiktionary:Beer parlour archive/2007/July#Numbers on Wiktionary. People (used to) have differing definitions of "number" & "numeral" (which is written-out and which is ciphered) as well as categorization (what should the PoS be? possibly even Determiner). This should be rediscussed before major cleanup is done (on this and Category:Numbers). --Bequw → ¢ • τ 22:51, 30 December 2009 (UTC)

I consider Numeral the part of speech. Mglovesfun has pointed out a key difficulty, in that many cardinals do not function grammatically like a separate part of speech, even in languages that have a separate PoS function for numerals. Worse, the function of a numeral differs depending on the class of numeral it is, so there aren't any overall guidelines for that part of speech except that "something numerical" is included in the meaning. In my own work on Latin, I've avoided adding the ordinals because I'm not sure whether they ought to be Numerals or Adjectives, even though they are certainly ordinals. Their function and inflection don't seem particularly different from adjectives. Leaving aside the Latin issue, my preferred solution in the matter is to have an overall "Category:Language numerals" within each language, where words can be listed, but would have separate topical subcategories for those collections of mathematical words that people think are cardinals, ordinals, etc., regardless of how they function in the language. A topical category can do that, where a grammatical category that grouped based on function would probably just confuse most users (even the grammatically-experienced ones). --EncycloPetey 21:42, 31 January 2010 (UTC)

English nouns without categories

Astonishingly enough, the current list (which is imperfect) contains 7230 English nouns that are not in the English nouns or English plurals categories! I have a text file with all of them in it. Even doing 100 per day, it's gonna take me until July to do them; anyone fancy helping me? Oh, and worryingly, this is just the nouns in English. It makes you think that at least 10% of our entries are missing PoS categories. Mglovesfun (talk) 15:15, 5 March 2010 (UTC)

My recommendation is to find the common patterns and use AWB to do a first pass to get most of them (Conrad can regenerate a list for the ones that need to be done manually). An easy pattern is where the page name is bolded on the inflection line by itself. To do this, you could replace

(==English==(?:[^=]|==+)+==+Noun==+\s)'''[^\n\r\[\]]+'''[\n\r]+

with this:

${1}{{infl|en|noun}}\n

Then just make sure the term that was bolded was actually the page name. After correcting a bunch I'm sure you're aware of the common patterns for simple plurals as well. --Bequw → τ 18:56, 5 March 2010 (UTC)
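A minimal Python sketch of the same substitution, for example for a dump-based or bot pass instead of AWB. One caveat, offered as an assumption worth checking: in the pattern above, (?:[^=]|==+)+ can also consume a later ==Language== header, so on a page with several languages it may land on another language's Noun section. The sketch swaps in a tempered token that stops at the next level-2 heading, uses Python's \g<1> in place of ${1}, and replaces only the first match. The function name add_noun_inflection is hypothetical.

```python
import re

# Hypothetical Python version of the AWB find/replace above.
# (?:(?!\n==[^=])[\s\S])*? advances one character at a time but refuses to
# step onto a following level-2 heading such as "\n==French==", so the match
# cannot spill into another language's Noun section.
BOLD_NOUN_HEADWORD = re.compile(
    r"(==English==(?:(?!\n==[^=])[\s\S])*?\n===+Noun===+[ \t]*\n)"
    r"'''[^\n\r\[\]]+'''[ \t]*\r?\n"
)

def add_noun_inflection(wikitext: str) -> str:
    """Replace a bare bolded headword under English ===Noun=== with {{infl|en|noun}}."""
    return BOLD_NOUN_HEADWORD.sub(r"\g<1>{{infl|en|noun}}\n", wikitext, count=1)
```

As in the original advice, the result still needs a check that the bolded term really was the page name.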

Redirects for macrons

Per the Grease Pit discussion a few weeks ago, how about a list of redirects from macroned forms to macronless forms, for example hūs > hus. Mglovesfun (talk) 16:05, 1 July 2010 (UTC)

Working on this, should be ready in about an hour. Here is my (hopefully complete) list of macrons: ĀāǟǡǢǣḆḇḎḏĒēḔḕḖḗḠḡẖĪīḴḵḺḻḸḹṈṉŌōṒṓṐṑȫǬǭȬȭȱṞṟṜṝṮṯŪūǕǖṺṻȲȳẔẕ Nadando 21:42, 1 July 2010 (UTC)
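A minimal sketch of computing the redirect targets, assuming a redirect should drop only the macron and keep any other diacritic (so ǟ maps to ä, not a). Several letters in the list above (ḇ, ḏ, ẖ, ṯ and others) carry a line below rather than a macron; in Unicode these decompose to U+0331 COMBINING MACRON BELOW, so the sketch strips that as well as U+0304 COMBINING MACRON. Whether those should get redirects is left to the discussion; the function names are hypothetical.

```python
import unicodedata

# U+0304 COMBINING MACRON (ā, ū, ǟ ...) and U+0331 COMBINING MACRON BELOW
# (ḏ, ẖ, ṯ ... decompose to a base letter plus this mark).
MACRONS = {"\u0304", "\u0331"}

def strip_macrons(title: str) -> str:
    """Remove macrons (above and below) but keep every other diacritic."""
    decomposed = unicodedata.normalize("NFD", title)
    kept = "".join(ch for ch in decomposed if ch not in MACRONS)
    return unicodedata.normalize("NFC", kept)  # recompose, e.g. a + diaeresis -> ä

def macron_redirects(titles):
    """Yield (macroned title, macronless target) pairs for titles that need a redirect."""
    for title in titles:
        target = strip_macrons(title)
        if target != title:
            yield title, target
```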

Numbered senses

List of entries where we use numbered sense glosses, e.g. (1). They should be turned into word glosses. Can anyone make up a good list? --Bequw → τ 16:20, 7 July 2010 (UTC)

Thanks Nadando. --Bequw → τ 21:33, 17 July 2010 (UTC)

Hi. Can someone please generate a list of all entries that call {{form of}} with the second parameter containing a # character? (This relates to a discussion on my talkpage.) I'd do it, but don't know how. Depending on how many there are, they might need automated fixing also, but we can cross that bridge once we know how many there are.—msh210℠ (talk) 19:27, 23 July 2010 (UTC)

There is a ridiculous number of these. Are we absolutely sure this can't be fixed in the template code? Nadando 20:18, 23 July 2010 (UTC)

We can revert my recent edit to the template, but I think it was a positive edit. I can't think of any other way, though obviously there may be one.—msh210℠ (talk) 20:20, 23 July 2010 (UTC)

I have found 29,691 pages with an unlinked hash mark as the second parameter. If someone wants to use a bot to fix them I can send them the list. Nadando 21:26, 23 July 2010 (UTC)
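A rough sketch of how such a list could be generated from dump text, following Nadando's criterion of an unlinked hash mark in the second parameter. It assumes simple positional calls such as {{form of|plural|foo#English}}; nested templates, named parameters and stray single brackets are not handled. The helper name is hypothetical, and this is not necessarily how the 29,691 pages were found.

```python
import re

# {{form of|FIRST|SECOND...}}: the second parameter may contain whole
# [[links]] (which can themselves contain "|"), but no bare pipes or braces.
FORM_OF = re.compile(
    r"\{\{\s*[Ff]orm of\s*\|[^|{}]*\|((?:\[\[[^\]]*\]\]|[^|{}\[])*)"
)
WIKILINK = re.compile(r"\[\[[^\]]*\]\]")

def unlinked_hash_params(wikitext: str) -> list[str]:
    """Second parameters of {{form of}} that contain '#' outside any [[...]] link."""
    hits = []
    for match in FORM_OF.finditer(wikitext):
        second = match.group(1)
        if "#" in WIKILINK.sub("", second):  # ignore hashes inside links
            hits.append(second)
    return hits
```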

reciprocal links

Sometimes existence of a link in one direction should imply existence in the other:

homophones — should always reciprocate, though this is not bottable, as it might be accent-specific
rhymes — the entry and the Rhymes: page should always reciprocate, though this is not bottable, as it might be accent-specific
{{also}} — should usually reciprocate (whether it links to another entry or to a forms-of appendix)
'nyms and related terms— should usually reciprocate, though this is not bottable, as it might be sense-specific
derived terms — where listed as derived at [[foo]] should also list foo in the etymology; not usually bottable, as explanation is needed in the etymology, but perhaps if there is no etymology section at all then one can be added listing just the word?

Any others?—msh210℠ (talk) 19:19, 2 September 2010 (UTC)
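Since most of these are not bottable, a first step could be a report of one-way links for humans to review. A minimal sketch, assuming the links of one kind (say, homophones) have already been extracted into a dict mapping each page to the set of pages it lists; the function name is hypothetical.

```python
def missing_reciprocals(links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (page, should_list) pairs: should_list lists page,
    but page does not list should_list back."""
    missing = []
    for source in sorted(links):
        for target in sorted(links[source]):
            # A target with no entry in `links` is treated as listing nothing.
            if source not in links.get(target, set()):
                missing.append((target, source))
    return missing
```

Each reported pair is then a candidate for manual checking, since the link may be accent- or sense-specific.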

(Note that although I said "also should usually reciprocate", I elsewhere questioned how usual the "usually" actually is.—msh210℠ (talk) 17:33, 3 September 2010 (UTC))

I'm struggling to think of specific examples, but for related terms I sometimes use "see {{term|foo|lang=foo}}" to avoid repetition; for instance, dogmatically could link to dogma that way. Mglovesfun (talk) 17:37, 3 September 2010 (UTC)

Transliterations for Turkish

At least twice now I've seen a transliteration for a Turkish word in the template {{t}}. Turkish uses a more extended version of the Latin alphabet than English, which sticks to 26 letters almost all of the time, but we don't want transliterations for Latin script languages, do we? Mglovesfun (talk) 20:57, 5 September 2010 (UTC)

That's of course nonsense. If a word is already in Latin script, there's no need to add a transliteration to the very same script. -- Prince Kassad 21:03, 5 September 2010 (UTC)

To clarify: stuff like ş = sh. Mglovesfun (talk) 21:21, 5 September 2010 (UTC)

So, can someone make a list, please? Mglovesfun (talk) 10:42, 13 September 2010 (UTC)

Do you want to look for other languages or inside other languages? --Bequw → τ 00:23, 14 September 2010 (UTC)

Done; see Wiktionary:Todo/Latin script transliterations. Nadando 03:19, 14 September 2010 (UTC)
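A minimal sketch of how such a list might be produced: flag translation templates that carry tr= for a language written in the Latin script. The set of Latin-script codes is a small hypothetical sample (a real run would need the full set), the pattern ignores templates nested inside a {{t}} call, and this is not necessarily how the list above was generated.

```python
import re

# Hypothetical sample of Latin-script language codes; not a complete list.
LATIN_SCRIPT_CODES = {"tr", "de", "fr", "sv", "pl"}

# {{t|CODE|...}}, {{t+|...}}, {{t-|...}} or {{tø|...}} without nested templates.
TRANSLATION = re.compile(r"\{\{t[+\-ø]?\|([^|{}]+)\|([^{}]*?)\}\}")

def latin_script_transliterations(wikitext: str) -> list[tuple[str, str]]:
    """(code, template) for Latin-script translations that still carry tr=."""
    hits = []
    for match in TRANSLATION.finditer(wikitext):
        code, rest = match.group(1), match.group(2)
        params = rest.split("|")  # params[0] is the term itself
        if code in LATIN_SCRIPT_CODES and any(p.startswith("tr=") for p in params):
            hits.append((code, match.group(0)))
    return hits
```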

Several languages, especially Swedish, have free-floating declension/conjugation/inflection tables that are not under any appropriate header. Mglovesfun (talk) 10:42, 13 September 2010 (UTC)

WT:Todo/templates with right-aligned elements. Please remove false positives such as templates we want to float right (e.g. {{wikipedia}}) and ones that merely have a right-aligned element in a table that is overall left-aligned. --Bequw → τ 00:27, 27 September 2010 (UTC)

CJKV Characters in translations

The Beer Parlour discussion seemed pretty favorable to dumping all of these. Mglovesfun (talk) 23:23, 26 September 2010 (UTC)

WT:Todo/CJK in translation sections. Smaller than I thought, only 277 entries. --Bequw → τ 00:03, 27 September 2010 (UTC)

Such a list would allow us to find a fair few of the entries lacking POS categories. Mglovesfun (talk) 14:42, 7 December 2010 (UTC)

Chinese translations

I note that best man has a 'Simplified Chinese' translation. Shouldn't this be just under Chinese, specifically Mandarin, Min Nan, Wu, Cantonese, etc.? Simplified Chinese isn't a language so much as a way of writing Chinese. So, should we ditch these (and hence make a list of them)? Also Traditional Chinese, obviously. Mglovesfun (talk) 10:28, 28 January 2011 (UTC)

I don't know if you've been noticing, but I've been standardising this very stuff for the past few years. How does one make a list of it? ---> Tooironic 22:27, 28 January 2011 (UTC)

By analysing dumps. If I knew how to do it, I would have done it already! Mglovesfun (talk) 22:29, 28 January 2011 (UTC)

You are looking for a list of all pages which have "Simplified Chinese" within a translation table? That seems doable. - TheDaveRoss 14:40, 31 January 2011 (UTC)

Also for "Traditional Chinese". Let us know how you go. ---> Tooironic 22:06, 31 January 2011 (UTC)

There are other variations, where it says "see Mandarin" under "Chinese" or Mandarin is separate.

The only agreed version is:

  * Chinese:
  *: Mandarin: {{t|zh|心理學|sc=Hani}}, {{t|zh|心理学|tr=xīnlǐxué|sc=Hani}}

--Anatoli 01:05, 18 February 2011 (UTC)

OK so one month has passed, has anything been done about this? ---> Tooironic 07:48, 18 March 2011 (UTC)

Sorry, totally forgot about this, I have been busy at work. I am running this right now and should have something ready in a short while. - TheDaveRoss 02:00, 19 March 2011 (UTC)

Here is the list. Let me know if there are any on there that shouldn't be, I can refine it. - TheDaveRoss 02:53, 19 March 2011 (UTC)

Wow is that all? I'm quite hopeful. :D ---> Tooironic 14:49, 19 March 2011 (UTC)

Those are all the ones which have translation tables (correctly formatted) which contain "Simplified Chinese", "Traditional Chinese" or "Mandarin" as a primary entry. There may be more in other formats which were not found. - TheDaveRoss 16:28, 19 March 2011 (UTC)

Thanks very much, that is extremely helpful. :) ---> Tooironic 08:44, 20 March 2011 (UTC)

I've been fixing a lot, haven't completed yet but could you make another dump, please? Thanks for your help! --Anatoli 01:21, 9 May 2011 (UTC)
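For reference, a minimal sketch of the check TheDaveRoss describes: report top-level translation-table lines whose language is "Simplified Chinese", "Traditional Chinese" or "Mandarin", while ignoring Mandarin correctly nested as *: under * Chinese: in Anatoli's agreed format. It only recognises that one line format, so, as TheDaveRoss notes, entries in other formats could slip through; the names are hypothetical.

```python
import re

# Top-level lines only: "* Name:". A nested "*: Name:" line does not match,
# because the captured name may not start with a colon.
TOP_LEVEL_LANGUAGE = re.compile(r"^\*[ \t]*([^:*\n]+?)[ \t]*:", re.MULTILINE)
MISPLACED = {"Simplified Chinese", "Traditional Chinese", "Mandarin"}

def misplaced_chinese(translations: str) -> list[str]:
    """Top-level languages in a translation table that belong under * Chinese:."""
    return [
        m.group(1)
        for m in TOP_LEVEL_LANGUAGE.finditer(translations)
        if m.group(1) in MISPLACED
    ]
```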

Robert Ullmann's lists

As I imagine RU's analyses won't be run anytime soon, I've looked through his subpages for cleanup lists that we might want to independently generate. He had other projects, several aimed at finding missing entries, but I'll leave those for others. I've made a rough list of those I think we should try and replicate, and those I'm not sure about.

Unsure:

A bunch of Han stuff that I don't think was current
User:Robert Ullmann/Context labels
User:Robert Ullmann/HTML entities
User:Robert Ullmann/Contexts
User:Robert Ullmann/Missing
User:Robert Ullmann/Missing forms
User:Robert Ullmann/Oldest redlinks
User:Robert Ullmann/Pronunciation exceptions
User:Robert Ullmann/Redirects
User:Robert Ullmann/t16
User:Robert Ullmann/t17 (entries with explicit table syntax)
User:Robert Ullmann/t19
User:Robert Ullmann/t23

Does anyone want to tackle any of these? I think I can do the L2/invalid one without too much hassle. (done) --Bequw → τ 15:37, 29 August 2011 (UTC)

Mglovesfun's lists

If anyone wants to tackle any of the subpages of User:Mglovesfun/to do, please do. I'll be around a lot less, so a lot of these lists may never get done unless someone else fixes a few entries. Mglovesfun (talk) 22:03, 24 September 2011 (UTC)

IPA cleanup things

Until just a moment ago, our edittools wrongly contained a non-IPA g in [g̊]. It could be corrected to [ɡ̊]. (It could be corrected straightaway without a list; there is no reason why a g in text should have a voiceless symbol.) A list could also be made of g (the non-IPA "g") in IPA sections (there may be valid uses of it, e.g. in refs). - -sche (discuss) 03:27, 29 August 2012 (UTC)
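A minimal sketch of both steps: the straight replacement for voiceless g (an ASCII g followed by a voicelessness ring can hardly be intended), and a review list of {{IPA}} calls whose unnamed parameters still contain ASCII g. It assumes the ring is either U+030A (above, the usual choice under descenders) or U+0325 (below), and that named parameters such as lang= should be skipped; the helper names are hypothetical.

```python
import re

# ASCII g followed by a voicelessness ring: U+030A (above) or U+0325 (below).
VOICELESS_G = re.compile("g(?=[\u030a\u0325])")
IPA_TEMPLATE = re.compile(r"\{\{IPA\|[^{}]*\}\}")

def fix_voiceless_g(text: str) -> str:
    """Replace ASCII g before a ring with IPA ɡ (U+0261), keeping the ring."""
    return VOICELESS_G.sub("\u0261", text)

def ipa_calls_with_ascii_g(wikitext: str) -> list[str]:
    """{{IPA|...}} calls whose unnamed parameters contain ASCII g, for manual review."""
    hits = []
    for match in IPA_TEMPLATE.finditer(wikitext):
        params = match.group(0)[2:-2].split("|")[1:]  # drop the template name
        if any("g" in p for p in params if "=" not in p):  # skip lang= etc.
            hits.append(match.group(0))
    return hits
```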

A bot could also convert instances of .ˈ and .ˌ (and, if they exist, even ˈ. and ˌ.) to plain ˈ and ˌ, if this is indeed policy (to not mark syllable breaks with dots where there is already a stress marker). - -sche (discuss) 03:30, 29 August 2012 (UTC)
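If that is policy, the conversion is a single substitution. A minimal sketch, assuming the stress marks are U+02C8 ˈ and U+02CC ˌ and that a dot on either side of one is always redundant; the function name is hypothetical.

```python
import re

# An optional dot, a stress mark (ˈ U+02C8 or ˌ U+02CC), an optional dot.
STRESS_WITH_DOTS = re.compile(r"\.?([ˈˌ])\.?")

def drop_redundant_dots(ipa: str) -> str:
    """Remove syllable dots directly before or after a stress mark; other dots stay."""
    return STRESS_WITH_DOTS.sub(r"\1", ipa)
```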

A bot could also convert diphthongs like /aɪ̯/ to /aɪ/ (especially in German entries?), if and only if the latter is (as I think) the preferred broad transcription format. - -sche (discuss) 07:06, 30 August 2012 (UTC)
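A minimal sketch, again conditional on /aɪ/ really being the preferred broad form: it removes U+032F (the non-syllabic mark) after ɪ, ʊ and ʏ, but only inside /slash/ transcriptions, leaving [narrow] ones and other uses such as ɐ̯ untouched. Limiting it to those three offglides is an assumption; the function name is hypothetical.

```python
import re

NON_SYLLABIC = "\u032f"  # COMBINING INVERTED BREVE BELOW, the mark in aɪ̯
OFFGLIDES = "ɪʊʏ"        # assumed set of offglide vowels to broaden

def broaden_diphthongs(text):
    """Drop the non-syllabic mark after ɪ, ʊ, ʏ inside /.../ transcriptions only."""
    def broaden(match):
        transcription = match.group(0)
        for vowel in OFFGLIDES:
            transcription = transcription.replace(vowel + NON_SYLLABIC, vowel)
        return transcription
    # Only /.../ spans are rewritten; [...] spans are never matched.
    return re.sub(r"/[^/\n]*/", broaden, text)
```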

Translation cleanup things

See [1]. A list of entries which contain Romani sublects not sorted under the macrolect could be made. (Does the trans-adder automatically nest the sublects? If not, it should.) - -sche (discuss) 21:58, 30 August 2012 (UTC)

Re: generating the list of entries: That sounds straightforward enough. What are all the sublects?

Re: trans-adder: If you mean the bot that converts between {{t}}, {{t+}}, {{t-}}, and {{tø}}, then no, it's nowhere near that smart. (Yet.) Maybe KassadBot?

—Ruakh_TALK 00:13, 31 August 2012 (UTC)

Re: trans-adder: I mean the JS that lets users easily add translations (User:Conrad.Irwin/editor.js).

Re: sublects: I'll check what names we give them and get back to you shortly. - -sche (discuss) 00:57, 31 August 2012 (UTC)

Oh, duh, sorry. It looks like that JS has no special understanding of the language code rmy, so it doesn't apply any special nesting rules for Vlax Romani. As for East Slovak and Kalderash, they don't seem to have any language codes at all, so unless I'm missing something, translations into them can't even be added by that JS. —Ruakh_TALK 02:05, 31 August 2012 (UTC)

I just searched the site for instances of each Romani lect's name ("Balkan Romani", etc). It turns out, the only ones* that appear anywhere outside of English entries about themselves and the external-links section of the entry Romani are: "Kalo Romani" (which also shows up as "Kalo Finnish Romani"), "Vlax Romani", "Kalderash" (a subdialect of Vlax I will suggest in the BP or RFM converting to Vlax+{{qualifier|Kalderash}}), and "East Slovak" which is a variant of the Eastern variety of the Northern subdialect of Carpathian Romani (that's splitting some hairs!). "Kalderash" and "East Slovak" only appear in one entry, anyway; "Kalo" is in nine, "Vlax" is in six... actually, it looks like I can just find and correct all these by hand, without calling you away from your great work on the new "T-Bot" to make a list. :) - -sche (discuss) 03:01, 31 August 2012 (UTC)

that is, the only ones I can find... there may be some more codeless, hairsplitting sublects I have no way of knowing about. Hmm, perhaps something TODO is to find all 'languages' in Translations sections (anything between * or *: and : {{t or : [[?) that isn't on a list we could make of (1) every language we have templates for, (2) the words "Latin" and "Cyrillic" and "Syriac" and other script-words, and (3) anything else we're aware of and allow (like "Mandarin", "Egyptian Arabic", etc., even if codeless). - -sche (discuss) 03:07, 31 August 2012 (UTC)
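A minimal sketch of that TODO, using -sche's own heuristic: take whatever sits between a leading * or *: and a following : {{t or : [[, and report names not in an allowed set. Building the allowed set (names from WT:LANGLIST, script words such as Latin, Cyrillic and Syriac, and known codeless names such as Mandarin) is assumed to happen elsewhere; the function name is hypothetical.

```python
import re

# "* Name: {{t..." or "*: Name: [[..." (the latter nested under a macrolanguage).
LANGUAGE_LINE = re.compile(
    r"^\*:?[ \t]*([^:\n]+?)[ \t]*:[ \t]*(?:\{\{t|\[\[)", re.MULTILINE
)

def unknown_language_names(translations: str, allowed: set[str]) -> list[str]:
    """Sorted, de-duplicated language names on translation lines that are not allowed."""
    names = {m.group(1) for m in LANGUAGE_LINE.finditer(translations)}
    return sorted(names - allowed)
```

A header line such as "* Romani:" with nothing after the colon is not matched, so only lines that actually carry a translation are checked.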


Thursday, August 30, 2012
