Random 2-tag maintenance searches

Posted under General

ETA: Edited title to be less horribly long now that this is becoming like... a real thread instead of me just throwing things at the wall. Original title was "Random 2-tag maintenance searches to clear out for funsies if you're bored I guess"

Was going to title this along the lines of "for fun and profit" but then I remembered I can make zero promises as to how much this would help anything in regards to promotions or enjoyment.

Follow-up to forum #394713, where I invent two-tag searches for blue-level users to help clean up.

Tag cleanup

Tag cleanup, commentary edition

Commentary gauntlet (AKA ylimegirl's regex nonsense)

The rest of these are all languages that have characters often used as kaomoji (relative to their frequency as full text on danbooru), so lots of false positives:


Okay i think that's enough for now. Feel free to add other simple queries that could use some cleanup.

Updated by Ylimegirl

funnily enough, it seems like we independently came to a similar idea. a month or two ago, i noticed that shinjidude's tasks to do (topic #5076) had been removed from help:home, and in response i made help:tasks using some searches ported over from the thread, plus a few of my own.

i wanted to eventually add it to help:home and create a coordination thread on the forum for the article, but never got around to it. i still think it's better to keep this stuff organized on a wiki, but i can't blame anyone for not knowing about my project that i never promoted.

Here's one: duplicate (-is:child or is:parent) (two of these are metatags so this counts as a one-tag search)

  • Theoretically a duplicate should always be a child of another, earlier uploaded post that it is identical to or of worse quality than. Some misuses of the tag are when it's retroactively added to a post that was uploaded first, or added to later uploads that are superior quality.
  • And a duplicate should (almost) never be a parent of another post, such as a variant set that needs to be parented to the correct non-duplicate post. The only situation I could imagine where a duplicate could be a parent is if someone made a third-party edit where you can determine they specifically used the duplicate to edit from? So I'm not 100% certain about saying never but it's highly unlikely a duplicate would need to be a parent.
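
For anyone who wants to sanity-check results outside the site search, here's a rough sketch of the same two rules as a Python predicate. The `Post` class and its field names (`tags`, `parent_id`, `has_children`) are assumptions made up for illustration, not something taken from the site's code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Post:
    # Hypothetical, minimal stand-in for a post record.
    tags: Set[str] = field(default_factory=set)
    parent_id: Optional[int] = None   # None means the post is not a child
    has_children: bool = False        # True means the post is a parent


def duplicate_needs_review(post: Post) -> bool:
    """Mirror `duplicate (-is:child or is:parent)`."""
    if "duplicate" not in post.tags:
        return False
    not_a_child = post.parent_id is None
    is_a_parent = post.has_children
    return not_a_child or is_a_parent
```

A post tagged duplicate that has a parent and no children passes; anything else tagged duplicate gets flagged for a human to look at.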

Aight here's a new one, another wretched regex abomination.
https://danbooru.donmai.us/artist_commentaries?search[original_title_regex]=^[%20-%40\[-`{-~𝄀-𝇪🌀-🫶゠-ヿ　-〿0-9]%2B%24&search[post_tags_match]=-commentary / https://danbooru.donmai.us/artist_commentaries?search[original_description_regex]=^[%20-%40\[-`{-~𝄀-𝇪🌀-🫶゠-ヿ　-〿0-9]%2B%24&search[post_tags_match]=-commentary

This expression looks for commentary titles & descriptions with only katakana & symbols, which, depending on your pattern recognition/Japanese knowledge, can be easy enough to replace with character/copyright names. I'd recommend appending the tag for your fandom of specialty, e.g. https://danbooru.donmai.us/artist_commentaries?search[original_title_regex]=^[%20-%40\[-`{-~𝄀-𝇪🌀-🫶゠-ヿ　-〿0-9]%2B%24&search[post_tags_match]=-commentary+fate_(series)

Here's a version that also accounts for hiragana, but that one will definitely take more braincells to translate in more cases.
https://danbooru.donmai.us/artist_commentaries?search[original_title_regex]=^[%20-%40\[-`{-~𝄀-𝇪🌀-🫶ぁ-ゟ゠-ヿ　-〿0-9]%2B%24&search[post_tags_match]=-commentary / https://danbooru.donmai.us/artist_commentaries?search[original_description_regex]=^[%20-%40\[-`{-~𝄀-𝇪🌀-🫶ぁ-ゟ゠-ヿ　-〿0-9]%2B%24&search[post_tags_match]=-commentary
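
If you want to see what the character class actually lets through before running it on the site, here's a minimal sketch of the URL-decoded pattern in Python. The code point ranges are my reading of the URLs above (ASCII symbols and digits without letters, musical symbols, emoji, katakana, CJK punctuation, optional hiragana), and Python's `re` may not behave identically to whatever regex engine the site uses, so treat it as an approximation.

```python
import re

# Each piece is one range inside the character class (assumed reading of the URL).
ASCII_SYMBOLS = r" -@\[-`{-~"              # space..@, [..`, {..~ (no A-Z or a-z)
MUSICAL = "\U0001D100-\U0001D1EA"          # musical symbols
EMOJI = "\U0001F300-\U0001FAF6"            # most pictographic emoji
HIRAGANA = "\u3041-\u309F"
KATAKANA = "\u30A0-\u30FF"
CJK_PUNCT = "\u3000-\u303F"                # ideographic space, brackets, etc.


def build_pattern(include_hiragana: bool = False) -> "re.Pattern[str]":
    """Build ^[...]+$ so the whole string must consist of allowed characters."""
    body = ASCII_SYMBOLS + MUSICAL + EMOJI
    if include_hiragana:
        body += HIRAGANA
    body += KATAKANA + CJK_PUNCT + "0-9"
    return re.compile("^[" + body + "]+$")


katakana_only = build_pattern()
with_hiragana = build_pattern(include_hiragana=True)

for sample in ["セイバー", "セイバー!!", "Saber", "さくら", "織田信長"]:
    print(sample, bool(katakana_only.search(sample)), bool(with_hiragana.search(sample)))
```

Latin letters and kanji fall outside every range, which is why titles like "Saber" or 織田信長 never show up in these searches.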

Can also negate partial commentary in the search if you can only translate the title and want to avoid false positives. (or just toggle the "Translated?" option i guess that does something roughly equivalent)

double eta: added and then removed A-Za-z from the regex for resulting in too many false positives

Updated by Ylimegirl

Also just going to say that searching for specific charnames in artist commentaries and looking for untranslated commentaries can go a pretty long way. Lot of commentaries on Pixiv / Twitter / etc. are just the character's name (maybe with an honorific or brief description), and it's often left untranslated and forgotten about. I'd recommend also just searching for the character's family name / given name by itself as well. Currently absolutely killing it in the specific niche of "translating 織田信長 to Oda Nobunaga and also ruining my sleep schedule" challenge.

Ylimegirl said:

Also just going to say that searching for specific charnames in artist commentaries and looking for untranslated commentaries can go a pretty long way. Lot of commentaries on Pixiv / Twitter / etc. are just the character's name (maybe with an honorific or brief description), and it's often left untranslated and forgotten about. I'd recommend also just searching for the character's family name / given name by itself as well. Currently absolutely killing it in the specific niche of "translating 織田信長 to Oda Nobunaga and also ruining my sleep schedule" challenge.

I don't want to take opportunities away from blue users, but I feel like something like this could probably be handled by a bot. That bot could reference a dictionary of strings, look for exact matches (that is, the entire commentary title or description contains no more and no less than what's in the string), and replace them. If the list is big enough, it could translate a very large number of posts in a single run. The bot could translate not just names, but other common words such as 無題 (Untitled).

Blank_User said:

I don't want to take opportunities away from blue users, but I feel like something like this could probably be handled by a bot. That bot could reference a dictionary of strings, look for exact matches (that is, the entire commentary title or description contains no more and no less than what's in the string), and replace them. If the list is big enough, it could translate a very large number of posts in a single run. The bot could translate not just names, but other common words such as 無題 (Untitled).

Even then it wouldn't necessarily always catch cases like "the artist put the word (swimsuit) in parentheses after the character name" or "honorifics and emojis after character name" in titles and what have you, so I think there's still edge cases where we probably shouldn't just let the bot go wild. But yes, it could probably be somewhat automated. Unfortunately I have a mental illness that just makes me do repetitive tasks over and over manually instead of automating it because it makes my ADHD brain go brrrrrrrrr, so just throwing it out there if anyone else has the same sort of compulsion for other fandoms I am not as knowledgeable about.

Either way as of current time writing this reply this hypothetical bot doesn't exist, so blue users and insane purple users go wild, I guess. i gotta get up at like 9am tomorrow. i need to go to bed. i gotta. i dont even know if i can blame it on jetlag anymore.

Blank_User said:

I don't want to take opportunities away from blue users, but I feel like something like this could probably be handled by a bot. That bot could reference a dictionary of strings, look for exact matches (that is, the entire commentary title or description contains no more and no less than what's in the string), and replace them. If the list is big enough, it could translate a very large number of posts in a single run. The bot could translate not just names, but other common words such as 無題 (Untitled).

The problem with this is it is very common for a character's name to be the same as a real word (is 桜 a character called Sakura or talking about cherry blossoms or the cherry tree itself), or of a different name that is written the same but pronounced differently (is 千夏 Chinatsu as in aikawa chinatsu or Chika as in homura chika?)

Ylimegirl said:

Even then it wouldn't necessarily always catch cases like "the artist put the word (swimsuit) in parentheses after the character name" or "honorifics and emojis after character name" in titles and what have you, so I think there's still edge cases where we probably shouldn't just let the bot go wild.

That's why I said the bot should only look for exact matches. If the string has just the name but the commentary includes "swimsuit," it's not an exact match. To put it in terms of Python, the bot should be using "if A == B:", not "if A in B:".
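
To make the difference concrete, here's a tiny sketch. The dictionary contents and function names are just illustrative assumptions, not any existing bot's code.

```python
from typing import List, Optional

# Hypothetical translation table: whole-string source -> translation.
TRANSLATIONS = {
    "無題": "Untitled",
    "織田信長": "Oda Nobunaga",
}


def exact_translation(commentary: str) -> Optional[str]:
    """The 'A == B' approach: translate only if the whole string is a known entry."""
    return TRANSLATIONS.get(commentary)


def substring_hits(commentary: str) -> List[str]:
    """The looser 'A in B' approach, shown only for contrast."""
    return [source for source in TRANSLATIONS if source in commentary]


title = "織田信長(水着)"           # name plus "(swimsuit)"
print(exact_translation(title))   # exact match leaves this one alone
print(substring_hits(title))      # a substring check would have fired on it
```

With the exact check, anything with extra text (parentheses, honorifics, emoji) simply falls through to a human.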

WRS said:

hdk's bot already handles this.

Come to think of it, I do remember hearing about a bot that handles "untitled" commentaries. But does it also handle names like I'm proposing?

skylightcrystal said:

The problem with this is it is very common for a character's name to be the same as a real word (is 桜 a character called Sakura or talking about cherry blossoms or the cherry tree itself), or of a different name that is written the same but pronounced differently (is 千夏 Chinatsu as in aikawa chinatsu or Chika as in homura chika?)

I agree the bot shouldn't search for single names. The two would be handled as separate cases. For Aikawa Chinatsu and Homura Chika, the bot could be set to replace "相川千夏" or "穂村千夏," respectively, but not "千夏." Obviously, highly ambiguous strings like "桜" should not be used.

The only problem would be if there were another character whose full name uses the same kanji but a different romanization, but I don't think that's very likely. For extra security, the bot could be set so it also checks the copytags before translating.

I realize it would be better to discuss this in a separate thread, but I wanted to clear up any misconceptions about how my proposed bot would work.
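
Roughly, the full-name rule plus the copytag check could look like the sketch below. The copyright tag names are placeholders I made up for illustration (swap in the real copytags), and the function is hypothetical.

```python
from typing import Optional, Set

# Hypothetical table: full name -> (translation, copyright tag the post must have).
# The copyright tags are placeholders, not real tag names.
ENTRIES = {
    "相川千夏": ("Aikawa Chinatsu", "example_copyright_a"),
    "穂村千夏": ("Homura Chika", "example_copyright_b"),
    # Ambiguous strings such as "千夏" or "桜" are deliberately left out.
}


def translate(commentary: str, post_tags: Set[str]) -> Optional[str]:
    """Translate only on an exact full-name match AND a matching copytag."""
    entry = ENTRIES.get(commentary)
    if entry is None:
        return None
    translation, required_copyright = entry
    if required_copyright not in post_tags:
        return None  # same kanji, different series: leave it for a human
    return translation
```

The copytag check is what covers the unlikely case of two characters sharing the same kanji but a different romanization.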

Here are some other chores that I have bookmarked:

bad_twitter_id bad_pixiv_id status:any / bad_twitter_id pixiv:any status:any / bad_twitter_id -source:*twitter* -source:*x.com* -source:*twimg.com* -source:*t.co* status:any / bad_pixiv_id -pixiv:any -source:*pixiv.net* -source:*pximg.net* -source:*pixiv.org* status:any: Quite often people copy/paste all tags when 1-upping, but forget to remove bad ID tags, so a Pixiv post may have a bad twitter id tag which is incorrect. The first two searches fit the 2 tag search limit.

rating:g ~ai:cleavage ~cleavage limit:100: Some uploaders aren't conservative enough when they rate posts as G, so this is one quick way to clean up such posts. You can come up with more combinations than ~ai:cleavage ~cleavage. I added limit:100 because the search times out with the AI tag search.
Two tag search: rating:g ai:cleavage limit:100 / rating:g cleavage

gentags:<5 is:sfw -blank_page -non-web_source -self-upload -no_humans off-topic: Posts that lack tagme but likely need tagging. You can adjust this search to your preference. I added is:sfw because I don't have enough experience with NSFW posts to tag them properly.
Two tag search: gentags:<5 -no_humans

ribbon_hair -ribbon_girl_(arms) -min_min_(arms) status:any: People confuse this tag with hair ribbon. Ribbon hair is for hair made out of ribbon, like post #7710581.
Two tag search: ribbon_hair status:any

solo multiple_views / solo ai:multiple_views limit:10: These tags are mutually exclusive, but people often tag solo for pictures of one character without knowing it's not supposed to be used for multiple instances of one character. I had to add limit:10 to the AI search because it keeps timing out. Those searches fit the 2 tag search limit.

rating:e 1boy 1girl -futanari -hetero status:any: Posts that likely lack the hetero tag. Sadly there's no way to make this fit the 2 tag search limit.

Updated by HyphenSam

Here are some two-tag searches that I have done bulk edits on in the past. Obviously you will need to keep an eye out for exceptions with all of these (for some, exceptions are rare; for others, they are most of the images you will come across). A lot of them have more exceptions than you'd think. Be especially careful with the ones after the gap in the list, as too many people just blind-tag this stuff or add it to anything that looks about right.

facing_viewer -closed_eyes (both removing mistags of facing viewer and adding missing closed eyes)
monochrome -greyscale (adding 'greyscale' to posts missing it. Watch out for things that look greyscale but aren't, though)
tank_top -bare_shoulders, sleeveless -bare_shoulders, halterneck -bare_shoulders, swimsuit -bare_shoulders etc. (adding bare shoulders to posts missing it. You can do similar with bare arms)
bedroom -indoors, beach -outdoors etc. (adding indoors and outdoors to posts tagged with indoor or outdoor settings)
beach -ocean (adding ocean where it is present but isn't tagged)
ocean -water and ocean -horizon (again, most ocean images also want these tags but often don't have it tagged)
-blush -light_blush (adding blush to... well, most of the site to be honest).
gag -gagged (adding gagged to images where it is missed)
shibari -bondage (this is only not implied due to a handful of weird exceptions. Almost everything should be in there but a lot isn't)
If you want to continue down the bondage train, it's easy to find a lot of things that are missing bound_x style tags just by doing searches like bondage -breast_bondage
twintails two_side_up and side_ponytail one_side_up (tags shouldn't refer to the same character - often people will add the correct one when it's missing but not remove the wrong one, or just tag both from the start)
sunlight -day (watch out for dusk etc. pictures though)
night sunlight (except in comics or really weird situations these should be exclusive. Doesn't stop people tagging both on the same thing though - often on pictures that should be twilight rather than night)
feet_out_of_frame cowboy_shot, feet_out_of_frame full_body etc. (should not refer to the same character but often does)

character_tag -tag_for_a_feature_of_the_character. Loads of combinations here. Hair length tags are a common one for me personally that are missed way too often. Works best for characters with a consistent appearance or where a specific outfit has its own tag.
specific_uniform_tag -tag_for_a_feature_of_the_uniform. As per the character one. Probably not as good nowadays as uniform tags aren't as common as they used to be.
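
If you'd rather generate a batch of these "has X, missing Y" searches than type them out, here's a small sketch. The helper names are made up, and it assumes the usual `posts?tags=` search URL.

```python
from typing import List
from urllib.parse import quote_plus

POSTS_URL = "https://danbooru.donmai.us/posts?tags="


def missing_tag_queries(source_tags: List[str], missing_tag: str) -> List[str]:
    """Build 'source -missing' two-tag queries, one per source tag."""
    return [f"{tag} -{missing_tag}" for tag in source_tags]


def search_url(query: str) -> str:
    """Turn a query string into a search URL (spaces become '+')."""
    return POSTS_URL + quote_plus(query)


queries = missing_tag_queries(
    ["tank_top", "sleeveless", "halterneck", "swimsuit"], "bare_shoulders"
)
for query in queries:
    print(search_url(query))
```

This only builds the links; every result still needs eyeballing for the exceptions mentioned above.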

skylightcrystal said:
beach -ocean (adding ocean where it is present but isn't tagged)
ocean -water and ocean -horizon (again, most ocean images also want these tags but often don't have it tagged)

Be careful when tagging ocean though; people consistently overtag that. If the ocean is present, but not important to the image like in post #10362115 or post #10402217, then beach should be sufficient. ocean is a mess of a tag, really.

Updated by Placeholder1996

This isn't a 2-tag maintenance search, this is a wiki maintenance search, but I'm sure I'm not the only one who'd get use out of this.

These two, &search[tag][post_count]=0 and &search[tag][is_deprecated]=yes, have to be added manually to the URL since you can't access them from the regular wiki search, but the former gives you wikis that have no posts associated with them and the latter gives you deprecated wikis.

https://danbooru.donmai.us/wiki_pages?commit=Search&page=1&search[is_deleted]=no&search[tag][category]=3&search[tag][post_count]=0&search[tag][is_deprecated]=no - lets you search for wikis with no posts without running into deprecated wikis. It's great for finding premade wikis for things yet to be represented on Danbooru, and active orphaned wikis that no one checked before moving a tag (if a new wiki was made, then you'd have to turn the orphan into a deleted redirect). You have to manually change the category in the URL, though (it's currently on copytag wikis).

https://danbooru.donmai.us/wiki_pages?commit=Search&search[is_deleted]=true&search[tag][is_deprecated]=yes - lets you search for deprecated wikis which are deleted. Deprecated wikis should not be deleted unless the tag in question is genuinely that bad.
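
Since the category has to be changed by hand, here's a rough sketch that builds the first URL for any category number. The parameter names are copied from the URLs above; the function name is made up, and 3 is the only category value confirmed in this post (copytag wikis).

```python
from urllib.parse import urlencode

WIKI_SEARCH = "https://danbooru.donmai.us/wiki_pages"


def empty_wiki_search(category: int, deprecated: str = "no") -> str:
    """Build a wiki search for non-deleted wikis whose tag has zero posts."""
    params = {
        "commit": "Search",
        "search[is_deleted]": "no",
        "search[tag][category]": category,
        "search[tag][post_count]": 0,
        "search[tag][is_deprecated]": deprecated,
    }
    # safe="[]" keeps the square brackets readable instead of percent-encoding them.
    return WIKI_SEARCH + "?" + urlencode(params, safe="[]")


print(empty_wiki_search(3))
```

Pass a different number to `empty_wiki_search` to check another tag category.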
