Danbooru

[bulk] nuking japanese tag

Posted under Tags

+1

I've not read the previous discussion, but the tag seems totally superfluous. It seems to be used interchangeably to describe either ethnicity or language. The general "anime-style" theme of the site implies that almost everything already resembles ethnic Japanese, and nearly all translation-related content is presumed to be Japanese. There's no reason to specify what is the vast majority, only the exceptions (e.g., german, caucasian, korean).

Although it would be nice to have some tag for characters with a stereotypically Japanese ethnic appearance.

-1 / +1

I agree that the japanese tag is useless for ethnicity purposes, especially because we already have a number of more detailed tags for that (e.g. japanese_clothes, east_asian_architecture, ...).

However, I do think that nuking the tag would be overkill because it is still useful for language purposes. Searching for posts containing Japanese text is fairly straightforward now ( text japanese ). Without this tag it would still be possible-ish with just the text tag (considering the proportion of posts containing Japanese vs. posts containing other languages), but it would give a lot of false positives that would then need to be weeded out with -english -german -french etc.

And conversely, searching for posts NOT containing Japanese text ( text -japanese ) would turn into this behemoth ( text ~english ~french ~german ~russian ~latin... )
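To make the difference between those queries concrete, here is a toy Python model of the search syntax as used in this thread (plain tag = required, -tag = excluded, ~tag = at least one of the ~ group). It is a simplified assumption, not Danbooru's actual search code, and the posts and their tag sets are invented for illustration.

```python
# Toy model of the tag-search semantics discussed above.
# Simplified assumption, NOT Danbooru's real search engine.

def matches(post_tags: set[str], query: str) -> bool:
    """Return True if a post's tag set satisfies a space-separated query."""
    required, excluded, any_of = set(), set(), set()
    for term in query.split():
        if term.startswith("-"):
            excluded.add(term[1:])      # -tag: post must NOT have it
        elif term.startswith("~"):
            any_of.add(term[1:])        # ~tag: post needs at least one of these
        else:
            required.add(term)          # plain tag: post must have it
    if not required <= post_tags:
        return False
    if excluded & post_tags:
        return False
    # An OR group, if present, needs at least one member on the post.
    return not any_of or bool(any_of & post_tags)

# Hypothetical posts; tag sets invented for illustration.
posts = {
    1: {"text", "japanese"},
    2: {"text", "english"},
    3: {"text", "german", "japanese"},   # mixed-language post
    4: {"text", "english", "french"},
    5: {"text"},                          # text nobody language-tagged
}

def search(query: str) -> list[int]:
    return [pid for pid, tags in posts.items() if matches(tags, query)]

print(search("text japanese"))                                  # → [1, 3]
print(search("text -japanese"))                                 # → [2, 4, 5]
print(search("text ~english ~french ~german ~russian ~latin"))  # → [2, 3, 4]
```

In this model the last two queries are not exact equivalents: the mixed German/Japanese post and the untagged-language post land on different sides, which is the kind of mismatch that only a fully populated japanese tag would remove.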

My opinion would be to change the definition of the japanese tag so that it is only used to clarify the language a post contains. This would also make the tag fit in with the other language tags.
Implementing this shouldn't be too much work, because the tag has already kinda been used like this for some time now. The bulk of the posts that would need cleaning up are those under japanese -comic -text, which is about 19 pages.

To further clarify these tags and prevent wrong usage in the future, we could rename them, for example:
japanese -> japanese_text or japanese_writing
german -> german_text or german_writing
...

chodorov said:

Although it would be nice to have some tag for characters with a stereotypically Japanese ethnic appearance.

Out of curiosity, how would you define a stereotypical Japanese ethnic appearance? It could be judged by clothing, hairstyle/color, environment/setting, and even whether it's realistically drawn or not.

GabrielWB said:

However, I do think that nuking the tag would be overkill because it is still useful for language purposes. Searching for posts containing Japanese text is fairly straightforward now ( text japanese ).

Given the ubiquity of the Japanese language on this website, populating the tag would be an extremely lengthy and laborious effort. If you look at translated and translation_request, there are around 350K entries in total. I'd be willing to bet that 99% of those posts involve Japanese text. The benefit gained from this would be minimal, just like with the female tag, as ghostrigger mentioned.

GabrielWB said:
Out of curiosity, how would you define a stereotypical Japanese ethnic appearance? It could be judged by clothing, hairstyle/color, environment/setting, and even whether it's realistically drawn or not.

I don't have a fixed definition; it's more of a blurry ideal. Here are a few criteria, though none is absolutely necessary:

  • Conservative/modest attire
  • Black hair and eyes
  • Conservative hairstyles, excluding curly, short, crazy, or other hairstyles that aren't long and straight
  • Light skin
  • Close conformity to human anatomy

Hosoo (more on Gelbooru) and hieda_yawe are artists who sometimes achieve my ideal here. There are other artists who do too, but I go through so many that I don't remember them. Other aspects are an implied understanding of traditionalism and of one's place in the world: sameness and subordination to the prevailing norms and expectations.

Pretty much a throwback to times gone by, as well as a desire to see something more grounded in reality than the oversexualization and crazily styled/colored hair. The native style of dress and appearance, largely uninfluenced by the West (the Prussian/prewar influence seems largely compatible, however, and is integrated into my perception of an ethnic Japanese look, e.g. serafuku rather than kimonos, or Western attire from the turn of the 20th century).


+1 for emptying out the tag.

(for tl;dr see end of post)

I don't view retaining it to distinguish Japanese from non-Japanese text as doable for a number of reasons.

1 - It'd require a huge amount of effort.

Right now, japanese is almost entirely useless, with all of 593 posts. Rendering it reliably usable would require comprehensively checking hundreds of thousands of posts for Japanese text and tagging those that apply.

Let's get a rough estimate of the number of posts containing text on Danbooru. text is useless for this purpose, for reasons I'll get into later, but for now we can approximate our total as follows. translated (216K) and translation_request (128K) total 344K; subtracting probable sources of overlap (check_translation, 13K, and partially_translated, 9.3K) leaves 322K. However, english on an English-language site takes none of these tags (except in edge cases of reverse_translation), so add back another 23.1K or so, for 345K* posts known to contain text: either it's English, or it's translated/untranslated and thus translatable. (Even this total is likely low, due to text that no one ever bothered tagging with translation_request or translated, as tends to happen with small amounts of text that visibly don't affect comprehension, e.g. many instances of character_name.)
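For anyone who wants to check the arithmetic, here it is as a few lines of Python. The counts are just the approximate tag counts quoted above (in thousands), and treating check_translation and partially_translated as the only overlap is the same rough assumption made in the paragraph, not a measured figure.

```python
# Rough estimate of posts known to contain text, in thousands of posts.
# Counts are the approximate tag counts quoted in the post, not live data.
translated = 216
translation_request = 128
check_translation = 13        # assumed overlap with the two tags above
partially_translated = 9.3    # assumed overlap with the two tags above
english = 23.1                # English text gets no translation tags

translation_tagged = translated + translation_request                         # 344
deduplicated = translation_tagged - check_translation - partially_translated  # ~321.7
known_text = deduplicated + english                                           # ~344.8

print(f"{translation_tagged}K tagged, ~{deduplicated:.0f}K after overlap, "
      f"~{known_text:.0f}K known to contain text")
```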

Of these 345K posts, english, chinese, korean, german, russian, french, finnish, spanish, italian, latin, and greek together account for ~32.3K. All other languages I checked accounted for <100 (and often <30) posts each, but assuming (very generously) that they together account for ~13K** more brings our total to ~45K posts with tagged non-Japanese text. Thus, 345K - 45K = 300K posts contain text but no language tags. Almost all of these are almost certainly in Japanese, but the whole point of this exercise is that almost isn't good enough, so all 300K posts must be checked to confirm whether or not they are, in fact, in Japanese. (The 593 posts currently tagged japanese are essentially irrelevant at this scale.)

Of course, this all assumes that each post contains text in only one language, which is obviously nonsense. This means the other 45K posts with language tags have to be checked anyway, to ensure that they need not also be tagged with japanese (or another language). That brings the total to 345K posts to check.
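The same kind of sketch for the split above, again in thousands of posts. The 13K allowance for other languages is the deliberately generous guess from the post, not a count. The last line shows why subtracting the language-tagged posts saves nothing once mixed-language posts are accounted for.

```python
# Split of the ~345K text-containing posts, in thousands (estimates from the post).
KNOWN_TEXT = 345                 # posts known to contain text
tagged_major_languages = 32.3    # english, chinese, korean, german, russian, ... combined
other_languages = 13             # generous allowance for every other language

non_japanese_tagged = tagged_major_languages + other_languages   # ~45K
untagged_text = KNOWN_TEXT - non_japanese_tagged                  # ~300K, presumably Japanese

# Posts can mix languages, so the language-tagged posts must be checked as well,
# which puts the workload right back at the full total.
must_check = untagged_text + non_japanese_tagged

print(f"~{non_japanese_tagged:.0f}K tagged non-Japanese, "
      f"~{untagged_text:.0f}K untagged, ~{must_check:.0f}K to check")
```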

2 - This huge amount of effort would be largely dependent on another, huger amount of effort.

All of the above, by itself, only makes japanese viable. Even after doing all of that, text -japanese will still be useless, because text is useless for checking whether a post contains any text at all. That's because, as defined by text's wiki page, that's not its purpose in the first place: it's for indicating that text is a "major" part of the image. Redefining text to indicate the presence of any text at all would require a forum consensus and, if deemed necessary, possibly a new tag for what text currently indicates, i.e. text as a major image element.

That would be, by a long shot, the easy part, because the other part of making text serve this purpose would be populating the tag. And since the entire point of doing this is that nothing else accurately accounts for images with text in them, comprehensively and accurately populating the redefined text would require checking every single one of the 2.4 million or so posts in the database.

Also, text -japanese could very well still be useless at that point, unless all 2 million+ posts reviewed for text were also reviewed for Japanese text, which is almost certainly present in posts beyond the ~345K checked under point 1. Otherwise, genuine non-Japanese text results could be mixed with false positives from Japanese posts tagged with text but not japanese, and again, the whole point is that error rates of this sort aren't good enough.

3 - The cost-benefit tradeoff of this effort would almost certainly be abysmal.

The Herculean efforts of Provence and others in depopulating ~53K tagme posts (I believe that's the correct starting number) might suggest otherwise, but the above is an unreasonably large amount of effort. Depopulating tagme also delivered immediate, definite benefits: it brought ~53K tagme posts up to an acceptable (and usually much more than acceptable) tagging standard, benefiting all users searching for the tags added.

For comparison, this project would involve checking at least 345K posts, and possibly up to ~2.4 million. This would allow the use of:

A. text and -text, indicating the presence or absence of any text at all in a post. Perhaps the most useful potential result (more so in combination with other tags), but also requires the most comprehensive checking of posts to be reliable.
B. japanese, indicating the presence of Japanese text. This would eliminate false positives on translation_request, etc. caused by non-Japanese text. However, from personal experience, this false-positive rate is already trivially low, and the false positives that do turn up are easily skipped over. I strongly doubt this would be worth the effort, even for JP-EN translators, who would likely be the most frequent users (via the combination translation_request japanese), much less for users looking for translated Japanese posts.
C. -japanese, indicating the absence of Japanese text. I see no practical use for this.
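To put the scale comparison in numbers, here is a crude ratio of post counts using the figures in this post (the ~53K tagme starting size is itself unconfirmed). It compares only how many posts are involved, not how much work each post takes.

```python
# Crude scale comparison using the post's own figures (counts of posts only).
TAGME_CLEARED = 53_000    # approximate starting size of tagme (unconfirmed)
LOWER_BOUND = 345_000     # posts known to contain text
UPPER_BOUND = 2_400_000   # roughly every post in the database

for label, posts in (("minimum", LOWER_BOUND), ("maximum", UPPER_BOUND)):
    ratio = posts / TAGME_CLEARED
    print(f"{label}: {posts:,} posts, about {ratio:.1f}x the tagme cleanup")
```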

*The main sources of possible error in this total are cases in which translators may have removed translation_request but forgotten to add translated, or left translation_request on after adding translated. I can't check exact numbers for either (nor can anyone in the former case without some coding work), but I figure they more or less cancel each other out, enough not to affect the gist of this post.

**Actually more, in the case of overlap between the given tagged languages.

tl;dr see 1st line of post

(also I'm ignoring posts containing solely numbers, gibberish, or other non-language text, but it's late, that's well on the margin, and I don't care)

As previously mentioned, Japanese is considered the default language in posts and the exceptions are tagged (even English), much like females are considered the default and male-focused images are tagged. Even if we wanted to keep japanese, making it usable is logistically nonviable, as 279okshap explained.
