Danbooru

Danbooru 2

Posted under General

I'm with glasnost on his proposed notes/translation/pools changes.

I'd also love it if the pools list had forum index-like features, namely:

  • modified pools since last viewed in bold
  • a direct link to the last page of posts for large pools

Kitsu~ said:
A feature that I would like to see implemented would be the ability to "group" attachments of tags, or perhaps to nest tag attachments within other tag attachments. The mechanism is not important; the desired effect is.

Log said:
We have discussed this before and it just can't be done because it would require retagging every single image on danbooru.

The "need" to retag existing images should not be an obstacle against enhancing the tagging system. Rather than completely overhaul everything, it is technically feasible to retain existing tags and add the ability to "attach" tags together (pairs, triples) to support more complex semantics, while not breaking the existing tag search, which becomes a fall-back option if the new 'enhanced' search doesn't return enough results.

It's not like all posts are currently "fully-tagged" to begin with, and many tags aren't as well-populated as they should be either. And we certainly don't go through all existing posts to try to apply newly-created tags. Stuff falls between the cracks, it can't be helped.

For something like this, you have got to start somewhere, or else the legacy content argument is simply going to gain greater and greater inertia and we won't get anywhere. Rewriting the site is as good a time as any to consider implementing such changes.

Now, that was for the sake of argument. While I do think richer semantics would be awesome, I also have reservations about ease of use and other design challenges, which ultimately affect how widely changes will be adopted by users.

I vote for the change in the tagging system. It'll be a pain to go back through old posts, but if there were a way of searching for posts that don't have any of these nested or attached tags, things could at least be fixed over time. There's a large userbase on danbooru, and I'm sure most of them don't know how to tag anything, but I think enough of them do, and over time things will get updated.

My other suggestion is related to installing and upgrading danbooru. It would be nice if it were easier to do so. I'm not sure about the state of things at the moment, but when I used to run danbooru's code (I use moe.imouto's fork now), things were a bit of a pain. The install scripts helped considerably, but they were often broken due to changes to danbooru that weren't reflected in the scripts. I'm not asking for WordPress-like installation, just something documented or easier for people who aren't Linux power users.

I'd like a feature that detects if someone is a humorless sperging git and bans them if so.

I'm nth-ing the nested/grouped tags suggestion as it's simply a good idea--there's nothing about it that's incompatible with the way things are done now, and saying that it would require retagging every image is stupid. Nobody ever suggested that all the images posted prior to tag OCD should be deleted just because nobody would ever get around to going back and adding tags indicating that there's a blob of cum on the brick wall the girl wearing pinku nekomimi is up against between (47,50) and (99,100).

Here's my wishlist. I know that some of these may not be feasible or you may not want to do them, but I'll just throw these all out there anyway and see what sticks. Sorry for the wall of text.

Security

XSS protection. Use SafeERB. Mark sensitive cookies as httpOnly. Require users to reenter their password before sensitive actions, such as resetting their email address, permanently deleting things, inviting users, etc.

CSRF protection. Currently Danbooru is completely vulnerable to CSRF attacks. Fortunately Rails has built-in protection against this. It basically just needs to be turned on. However, it only works for HTML requests. I think the JSON/XML API will have to be reworked to completely protect against CSRF.

Mass assignment protection. Using attr_accessible instead of attr_protected is much more secure. Also, avoid this idiom: Post.new(params[:post].merge(:updater_user_id => @current_user.id)). IIRC merging params in this way is insecure (updater_user_id can still be overwritten by the user).
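To make the merge concern a bit more concrete, here's a rough plain-Ruby sketch (no Rails involved). `ALLOWED_POST_KEYS`, `safe_post_attributes`, and the sample parameters are made up for illustration. With an ordinary Hash, a string key sent by the user and a symbol key merged in by the server are two different keys, so both survive the merge. The params hash Rails actually hands you may behave differently, so treat this as the general idea only: whitelist what users may set, then set server-controlled fields yourself.

```ruby
# Hypothetical whitelist of attributes a user is allowed to set on a post.
ALLOWED_POST_KEYS = %w[tags rating source].freeze

# Keep only whitelisted keys from user input, then set server-controlled
# fields afterwards so user input can never override them.
def safe_post_attributes(user_params, current_user_id)
  attrs = user_params.select { |key, _| ALLOWED_POST_KEYS.include?(key.to_s) }
  attrs = attrs.transform_keys(&:to_s)
  attrs.merge("updater_user_id" => current_user_id)
end

# Simulated request parameters carrying a forged updater_user_id.
params = { "tags" => "short_hair", "updater_user_id" => 666 }

# Naive merge on a plain Hash: the forged string key survives next to
# the symbol key added by the server.
naive = params.merge(:updater_user_id => 1)
puts naive.keys.inspect

# Whitelisted version: the forged value is dropped.
safe = safe_post_attributes(params, 1)
puts safe.inspect
```

The whitelist is the same principle as attr_accessible: anything not explicitly allowed is ignored.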

Better authentication / session handling. Don't send the user's password hash in plaintext along with every request for god's sake. Use real session tokens.
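As a rough idea of what "real session tokens" could look like, here's a minimal sketch using Ruby's standard `securerandom` library. The `SessionStore` class and its in-memory hash are assumptions for illustration only; an actual site would keep sessions in a database or cache and expire them.

```ruby
require "securerandom"

# In-memory session store, for illustration only.
class SessionStore
  def initialize
    @sessions = {}
  end

  # Issue a random, unguessable token that stands in for the user.
  # The cookie carries this token instead of anything derived from the password.
  def create(user_id)
    token = SecureRandom.hex(32)
    @sessions[token] = user_id
    token
  end

  # Return the user id for a token, or nil if the token is unknown.
  def lookup(token)
    @sessions[token]
  end

  # Logging out invalidates the token without touching the password.
  def destroy(token)
    @sessions.delete(token)
  end
end

store = SessionStore.new
token = store.create(42)
puts store.lookup(token).inspect
store.destroy(token)
puts store.lookup(token).inspect
```

A leaked token can be revoked on its own, which is not possible when the cookie is the password hash itself.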

Host uploaded files on a separate subdomain. This has security as well as performance benefits.

Miscellaneous

Switch to ImageMagick or something for resizing images. The current resizer has a long-standing bug where it sometimes generates corrupt thumbnails/samples. It doesn't handle transparent GIFs/PNGs well either. Also, turn on progressive JPEG encoding. It's a lossless option and last time I tested it generated on average ~10% smaller files.

Integrate piespy's image similarity search with the server. This way it would be easier to do automatic dupe checking on upload.

Make the tag edit history searchable by tags (Trac #429). Fixing this should also make it possible to order searches by when a tag was added (forum #33750).

Add some kind of versioning to the JSON/XML API. Currently scripts have no way to tell if the API has been changed.

Add tag autocompletion to the search box and the edit box.

DText kind of sucks. Specifically, it doesn't have an escape mechanism and it doesn't handle URLs well. I don't see why you don't just use something standard like Markdown.

Put the help files on the wiki so that it's easier to keep them up to date.

Use a simple custom formatting language for formatting translation notes instead of HTML. The problem with using HTML is that translators format things like sound effects inconsistently and they use all sorts of outdated HTML to do it. There are also security problems with allowing HTML in notes. The current filter doesn't filter out everything it should.

Replace the parent/child concept with anonymous pools. It's redundant to have two separate mechanisms (pools and parents) for grouping posts.

Make pools taggable.

I was tempted to mention a more in-depth semantic model based on RDF triples, but figured I'd be at least somewhat realistic.

In terms of things a refactor might be able to fix...how about the "Danbooru Bedtime" that happens every day beginning about twenty minutes ago?

Wishlist

I'm not going to hold my breath on these things, but they would be really nice to have.

An image recommendation system, a la Netflix. The site would analyze your votes and favorites and generate recommendations for you. This could be a list of new recommendations on your user page, or it could be a list of "users who liked this post also liked these other posts" on individual post pages.

A method to order search results by relevance. Let's say you search for short_hair. You really want to find posts focusing on short-haired girls, but instead half the results are just posts with short-haired girls somewhere in the background. Ideally posts that are most short-hair-oriented would appear in the results first, and ones where there's just incidentally a short-haired girl in the background would appear last. Overall this should help address the problem of mostly-irrelevant tags being added to posts and polluting the search results.

Obviously, determining the relevance a given tag has to a post isn't easy. I have a few ideas as to how this might be accomplished, but I'm not sure how workable they are.

What if we had a way to flag certain tags as "prominent" for any given post, kind of like ANN allows you to flag some roles as "primary" or "secondary"?

Again, current posts will have all of their tags at the lowest level of prominence by default, but this is something that can be fixed over time.
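Here's a rough sketch of how a per-post "prominence" flag could drive result ordering. Everything in it is hypothetical: the `PROMINENCE` levels, the `Post` struct, and `search_by_relevance` are invented for illustration, and a real implementation would do this in the database rather than in memory.

```ruby
# Hypothetical prominence levels; legacy posts would default to the lowest.
PROMINENCE = { background: 0, secondary: 1, primary: 2 }.freeze

# tags maps a tag name to its prominence level on this post.
Post = Struct.new(:id, :tags)

# Return posts carrying the tag, most prominent first.
# Ties fall back to newest first (highest id).
def search_by_relevance(posts, tag)
  matches = posts.select { |post| post.tags.key?(tag) }
  matches.sort_by { |post| [-PROMINENCE.fetch(post.tags[tag], 0), -post.id] }
end

posts = [
  Post.new(1, { "short_hair" => :primary }),
  Post.new(2, { "short_hair" => :background }),
  Post.new(3, { "long_hair" => :primary }),
  Post.new(4, { "short_hair" => :background }),
]

puts search_by_relevance(posts, "short_hair").map(&:id).inspect
```

Post 1 comes first because short_hair is flagged as primary there; posts 4 and 2 follow in the usual newest-first order.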

Some things I'd like to see:

  • Being able to use meta-tags for blacklists (see forum #7409). For example, I could add the line "pool:730" to my blacklist, and all images from the "Rule #34" pool would be filtered out from the paginator.
  • Moving the "Preview" feature so that it's implemented client-side. See trac #1117.
  • More metadata to describe pools (like evazion said). Currently, the only way to describe a pool is via its name and its description - and that can get a little clunky sometimes. See forum #34087.
  • Implementing logins/sessions in a more secure manner: salted hashes alone are not secure, since they are still vulnerable to dictionary attacks. Sending password hashes in plaintext (as evazion pointed out, again) is also something that makes me feel uneasy.
  • A short-hand way to link to user profiles using DText formatting. See forum #33926.
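For the blacklist item above, a minimal sketch of what meta-tag matching might look like. The `BlacklistPost` struct, the `user:` meta-tag, and the rule that a blacklist line hides a post only when every term on it matches are all assumptions made for the example, not a description of the current code.

```ruby
# Stand-in for a post: plain tags, the pools it belongs to, and its uploader.
BlacklistPost = Struct.new(:tags, :pool_ids, :uploader)

# True if a single blacklist term (plain tag or meta-tag) applies to the post.
def term_matches?(post, term)
  case term
  when /\Apool:(\d+)\z/ then post.pool_ids.include?($1.to_i)
  when /\Auser:(.+)\z/  then post.uploader == $1
  else post.tags.include?(term)
  end
end

# A blacklist line hides a post when every term on that line matches.
def blacklisted?(post, blacklist_lines)
  blacklist_lines.any? do |line|
    terms = line.split
    !terms.empty? && terms.all? { |term| term_matches?(post, term) }
  end
end

post = BlacklistPost.new(%w[short_hair glasses], [730], "someone")
puts blacklisted?(post, ["pool:730"])
puts blacklisted?(post, ["long_hair pool:730"])
```

Treating meta-tags as just another kind of term means they can be mixed with ordinary tags on the same blacklist line.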

Well, that's my short wishlist; I'll see if I can come up with some more ideas later.

I'm not too sure about this one, since it's not that big of a deal at all, but it's probably worth tossing the idea around anyway.

But would it be feasible to rig the forum Preview feature so that when you preview your reply, it's formatted exactly like published replies on the forum? What I mean by this is that when you preview your post, the text has a lot more horizontal space than when it's actually published to the forum (forum posts here have a "vertical" feel to them). This has caused some discord for me in the past.

All too many times, I've written something up on the forum, previewed it to make sure it's readable and formatted in a legible way, and submitted it - only to see that my post looks like a huge wall of text. I usually then edit my post, indent/refactor it a little, then resubmit.

Again, it's not a big deal, but it might possibly be something that'd be worth addressing.

evazion said:

Wishlist

I'm not going to hold my breath on these things, but they would be really nice to have.

An image recommendation system, a la Netflix. The site would analyze your votes and favorites and generate recommendations for you. This could be a list of new recommendations on your user page, or it could be a list of "users who liked this post also liked these other posts" on individual post pages.

A method to order search results by relevance. Let's say you search for short_hair. You really want to find posts focusing on short-haired girls, but instead half the results are just posts with short-haired girls somewhere in the background. Ideally posts that are most short-hair-oriented would appear in the results first, and ones where there's just incidentally a short-haired girl in the background would appear last. Overall this should help address the problem of mostly-irrelevant tags being added to posts and polluting the search results.

Obviously, determining the relevance a given tag has to a post isn't easy. I have a few ideas as to how this might be accomplished, but I'm not sure how workable they are.

I think the nested/grouped tag suggestion I posted in this thread could solve this issue. An additional tag could be added to a group "background_character" which would allow a search such as "[ short_hair female -background_character ]" to return the results you are looking for.

Since each group could describe each individual character in the picture, much more detailed tags could be provided. "[ background_character female short_hair blonde_hair red_eyes blushing glasses usagimimi ]".

It would be possible to provide many tags on a background character that, with the current tagging system would be far too verbose and would cause searches to be less useful.
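A rough sketch of how a grouped search like that might be evaluated. The `GroupedPost` struct and both helper functions are hypothetical, and the rule used here (some single group must contain every required tag and none of the excluded ones) is just one reading of the bracket syntax.

```ruby
# A post keeps its ordinary flat tags plus optional tag groups,
# one group per character in the picture.
GroupedPost = Struct.new(:id, :tags, :groups)

# Parse a bracketed query such as "[ short_hair female -background_character ]"
# into required and excluded tags.
def parse_group_query(query)
  terms = query.gsub(/[\[\]]/, " ").split
  excluded, required = terms.partition { |term| term.start_with?("-") }
  { required: required, excluded: excluded.map { |term| term[1..-1] } }
end

# A post matches when at least one of its groups has every required tag
# and none of the excluded ones.
def group_match?(post, query)
  parsed = parse_group_query(query)
  post.groups.any? do |group|
    (parsed[:required] - group).empty? && (parsed[:excluded] & group).empty?
  end
end

foreground = GroupedPost.new(1, %w[female short_hair], [%w[female short_hair]])
background = GroupedPost.new(
  2,
  %w[female short_hair long_hair background_character],
  [%w[female long_hair], %w[female short_hair background_character]]
)

query = "[ short_hair female -background_character ]"
puts group_match?(foreground, query)
puts group_match?(background, query)
```

Posts without any groups simply never match a grouped query, so the existing flat tag search would still be needed as the fallback for untouched legacy posts.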

About the tagging enhancement that r0d3n7z and drvink mentioned:

Instead of simple grouping tags I would love to see if the tags could be connected with a real semantic value.

My idea revolves around the use of simple sentences that describe the image and could be seen as an additional layer on top of the regular tags. The input for searching and tagging would either be regular tags or sentences like "izumi_konata hugs hiiragi_kagami.". A sentence would be detected by the presence of the full stop and the number of tags or "words" before the full stop. If the input doesn't match, it is handled as a set of tags. If a sentence is entered, the application would split the sentence into its components to extract the tags for tagging, and store both the tags and the semantic context of each tag for the image.

Please keep in mind that I am only talking about sentences with a simple "subject verb object." structure.

I can write down the concept in detail if there is some interest.
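To give an idea of the detection step, here's a minimal sketch. The function name and the returned structure are invented for illustration, and it assumes a sentence is exactly three words followed by a full stop, matching the example above.

```ruby
# Parse tagging/search input. Input that ends in a full stop and has exactly
# three words is treated as a simple sentence; anything else falls back to
# a plain set of tags.
def parse_tag_input(input)
  text = input.strip
  if text.end_with?(".")
    words = text.chomp(".").split
    if words.length == 3
      first, verb, second = words
      return {
        tags: [first, second],
        relations: [{ from: first, verb: verb, to: second }]
      }
    end
  end
  { tags: text.split, relations: [] }
end

puts parse_tag_input("izumi_konata hugs hiiragi_kagami.").inspect
puts parse_tag_input("izumi_konata hiiragi_kagami hug").inspect
```

The two tags are extracted either way, so ordinary tag searches keep working; the relation is the extra layer stored alongside them.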

Speaking as a non-power user, I'd like to see better ways to drill down to the different views from one single main page. What I mean by that is when I go to danbooru, I always begin from the front page and it requires several clicks in order to go somewhere, unless it is one of the very first options or a tag search. Actually, scratch that--except for a tag search, it will definitely take many clicks, since it's pretty rare (i.e., never) that I would want a "general" view of, say, all the wiki articles starting from "^ ^", or to just browse everything on danbooru from what's latest.

I suppose once Albert determines what views are being normalized, couldn't we update the front page with "most used" views? I guess popular posts is one of the views I use that isn't a "tag" of sorts. Another one is the personal info links that take you to your uploads, your favorites, and others. Even better, if someone has stats on what "views" or tags are very commonly used, the site could arrange itself to display those links more prominently.

Google gets away with this because their search bar searches everything, so I could just type what I want to see in there. Danbooru's doesn't, so maybe rethinking the front page UI is worthwhile.

I'd like some of the things I've had to customize to be exposed in the admin interface, i.e. custom tag types and the ability to add/remove ads without having to edit a bunch of page templates although I don't see any easy way to do it.

Oh and updated documentation.

And a quick and easy way for an admin to do a site announcement (like I'M GONNA BREAK THE SERVER NOW). It is possible some of this already exists and I am just unaware of it.

Being able to treat fav, pool, and user like any other tag (intersect them, add to my blacklist, etc) was high up on my wishlist. I'm glad to hear that's going to happen.

Lots of good ideas in this thread. I support whoever wished for something more standard than DText. Tag autocomplete is another one I'd like to see.

As for my own wishlist, PLEASE consider implementing trac #553: comment searching by tags. Comments as they are now are kind of fire-and-forget. You make one, and you're probably never going to see it again unless you explicitly search for it. If someone responds to it, you'll likely only see it if you happen to check the comments within an hour or two of it being posted. If we could filter them by post tags, that would allow us to monitor posts we're interested in a hell of a lot easier.

Coconut said:
Being able to treat fav, pool, and user like any other tag (intersect them, add to my blacklist, etc) was high up on my wishlist.

So when Danbooru 2 comes around we would have the ability to block users from seeing the blocker's uploads, prevent the blockee from commenting on any of the blocker's posts, etc? Not sure why this feature wasn't present on Danbooru before.

glasnost said:

albert said:

  • Using sampling to calculate related tags. That is, loading N random posts for a tag and calculating related tags based on that sample. The goal is to eliminate the posts_tags table. Related tag by type would probably be harder. But the join table is at 7 million records now and has all sorts of indexes and triggers associated with it. It'd be nice to eliminate this table entirely.

Go for it. I, for one, wouldn't be too terribly sad if the feature itself wasn't there; it took me a while to think of where it was actually used, and I don't think I've ever used it.

Just for the record, I use it regularly when tagging pics of characters I'm not overly familiar with, to see if I missed any obvious ones. Also I use "related characters" on the series tag when tagging a pic with many characters, so that I don't have to type out each of them.

But it sounds like the proposed change wouldn't affect my usage of it.

Though maybe it'd be better to sample the latest N posts, instead of N random posts, to better reflect recent tagging. An example might be when a character changes hairstyle halfway through the series, then using the last N posts would make the related tags reflect this more quickly.
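A rough sketch of the sampling idea, with both the random and the latest-N variants. The `related_tags` function and the in-memory post hashes are stand-ins for whatever the real query would be; nothing here reflects how Danbooru actually stores posts.

```ruby
# posts: array of hashes { id: Integer, tags: [String] }, standing in for rows
# loaded from the database. strategy is :random or :latest.
def related_tags(posts, tag, sample_size, strategy: :random, limit: 5)
  tagged = posts.select { |post| post[:tags].include?(tag) }
  sample =
    if strategy == :latest
      tagged.max_by(sample_size) { |post| post[:id] }
    else
      tagged.sample(sample_size)
    end

  # Count how often every other tag appears in the sample.
  counts = Hash.new(0)
  sample.each do |post|
    post[:tags].each { |other| counts[other] += 1 unless other == tag }
  end

  # Most frequent co-occurring tags first; ties broken alphabetically.
  counts.sort_by { |name, count| [-count, name] }.first(limit).map(&:first)
end

posts = [
  { id: 1, tags: %w[some_character long_hair] },
  { id: 2, tags: %w[some_character long_hair] },
  { id: 3, tags: %w[some_character short_hair] },
  { id: 4, tags: %w[some_character short_hair] },
]

puts related_tags(posts, "some_character", 2, strategy: :latest).inspect
```

With the latest two posts as the sample, only short_hair shows up, which is the hairstyle-change case described above; a random sample of the same size could return either tag.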

albert said:
But now is a good time to ask for some blue sky features, things that would require a major migration that could mean hours of downtime. I'll start by listing some stuff that has been on my wishlist for a while.

Huh. Well, I'll leave it up to you to decide which of my pages of trac requests meet that standard. Off the top of my head, though: