Danbooru

Starting to get search timeouts quite often again

Posted under Bugs & Features

I know that back at the beginning of the year the problem was caused by wiki pages not having a timeout, so it shouldn't be that, and I know it shouldn't be saved searches either, because of topic #25290.

As with last time, it tends to affect searches of older images more often, and usually only when you get a few hundred pages in. I'm also occasionally getting "ActiveRecord::QueryCanceled" error messages in some of my API-driven content scans, which makes me think there's something going on with the database itself.

One simple reason is that Danbooru is growing. Going back to older posts will become slower and slower. You can speed things up by restricting your searches to specific ID or date ranges. Instead of constantly going to the next page, you could add id:6000001..7000000 to the first search, then id:5000001..6000000 to the next search, etc. This is also very effective when running searches that yield very few results, because the database has to go very far into the past to fill one page of thumbnails. Instead of IDs, you can also search by dates, like date:2008-01-01..2009-01-01.
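If you script this, generating the chunked queries is easy. Below is a minimal Python sketch, assuming the id:LO..HI syntax described above; the function name and the one-million chunk size are illustrative choices, not anything built into Danbooru.

```python
def id_range_queries(tags: str, max_id: int, chunk: int = 1_000_000) -> list:
    """Split post IDs 1..max_id into id:LO..HI searches, newest range first."""
    queries = []
    # Round max_id up to a multiple of chunk so ranges line up (6000001..7000000, ...).
    hi = -(-max_id // chunk) * chunk
    while hi > 0:
        lo = hi - chunk + 1
        queries.append(f"{tags} id:{lo}..{hi}".strip())
        hi -= chunk
    return queries


print(id_range_queries("touhou", 7_000_000)[:2])
```

Each of these queries can then be paged through on its own, so no single search has to dig very far back.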

Another aspect of Danbooru growing is that more users are active at the same time, putting more stress on the database. There’s nothing you can do about that, though.

What type of searches are you doing? Things like going hundreds of pages deep into a search are inherently slow. Frankly, if you have a bot constantly doing searches like this, it doesn't help matters. There are lots of people who run bots that constantly do intensive searches, and it puts a nontrivial load on the database.

Instead of traversing pages like this:

You can do ID-based pagination like this:

This is much faster when going hundreds of pages deep, at the cost of not being able to see page numbers.
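For a script, ID-based pagination might look like the minimal Python sketch below. It assumes Danbooru accepts `page=b<ID>` to mean "posts with IDs below this one" and that `posts.json` is the endpoint; check the API help before relying on either. `fetch` is a hypothetical stand-in for whatever HTTP client you use.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Assumption: JSON endpoint for post searches.
BASE_URL = "https://danbooru.donmai.us/posts.json"


def page_url(tags: str, before_id: Optional[int] = None, limit: int = 20) -> str:
    """Build a search URL; page=b<ID> (assumed) requests posts with IDs below before_id."""
    params = {"tags": tags, "limit": limit}
    if before_id is not None:
        params["page"] = f"b{before_id}"
    return f"{BASE_URL}?{urlencode(params)}"


def iterate_posts(tags: str, fetch: Callable[[str], list], limit: int = 20) -> Iterator[dict]:
    """Walk a search newest-first by ID instead of by page number.

    `fetch` is hypothetical: it GETs a URL and returns the decoded JSON list
    of posts, each a dict with an "id" key.
    """
    before_id = None
    while True:
        posts = fetch(page_url(tags, before_id, limit))
        if not posts:
            return
        yield from posts
        # The next request starts just below the oldest post on this page.
        before_id = min(post["id"] for post in posts)
```

Each request only has to find the next `limit` posts below a known ID instead of skipping over every post on the earlier pages, which is why this stays fast deep into a search.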

evazion said:

You can do ID-based pagination like this:

Is this actually documented anywhere, or is there a way to switch to ID-based pagination without having to adjust the URL?

kittey said:

One simple reason is that Danbooru is growing. Going back to older posts will become slower and slower. You can speed things up by restricting your searches to specific ID or date ranges. Instead of constantly going to the next page, you could add id:6000001..7000000 to the first search, then id:5000001..6000000 to the next search, etc. This is also very effective when running searches that yield very few results, because the database has to go very far into the past to fill one page of thumbnails. Instead of IDs, you can also search by dates, like date:2008-01-01..2009-01-01.

Another aspect of Danbooru growing is that more users are active at the same time, putting more stress on the database. There’s nothing you can do about that, though.

I'm already doing searches that have specific date ranges. Some of the ones that are timing out are only for posts from this calendar year, or for 2022-01-01..2023-01-01.

evazion said:

What type of searches are you doing? Things like going hundreds of pages deep into a search are inherently slow.

As I said in my response to kittey, I'm doing date-bounded searches on some broad sets of tags. Most of the searches I've got going only have 400 to 700 pages at most.

The timeout issues only started up again in the past week or two, from what I've noticed. I first noticed the issue around the time people complained about the saved searches, which is why I linked to that topic.

Frankly, if you have a bot constantly doing searches like this, it doesn't help matters. There are lots of people who run bots that constantly do intensive searches, and it puts a nontrivial load on the database.

I've got an automated content scan script that I run once a week on all my searches, which is partly how I found the timeout issue. The API sometimes times out less often than the actual web page does, so the script will find a page containing posts I haven't seen yet and open it automatically, and then the web page hits a search timeout multiple times in a row.

The content scan script also has a built-in concurrency limit of 6 searches at the same time.
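A cap like this is commonly done with a semaphore. Below is a minimal asyncio sketch of the idea, not the script's actual code: `search` is a hypothetical async function that runs one query, and MAX_CONCURRENT_SEARCHES mirrors the limit of 6 mentioned above.

```python
import asyncio

MAX_CONCURRENT_SEARCHES = 6


async def run_searches(queries, search):
    """Run every query, but never more than MAX_CONCURRENT_SEARCHES at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_SEARCHES)

    async def limited(query):
        # Waits here whenever 6 searches are already in flight.
        async with semaphore:
            return await search(query)

    # Results come back in the same order as `queries`.
    return await asyncio.gather(*(limited(q) for q in queries))
```

Note that a cap like this limits how many searches hit the server at the same moment, not how many run in total.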

You can do ID-based pagination like this:

This is much faster when going hundreds of pages deep, at the cost of not being able to see page numbers.

I was unaware ID-based pagination was possible. I'm pretty sure I could get my API-driven content scan to use that instead with a little effort.


@evazion I know it would probably be controversial, but have you considered implementing API read rate limits? API reads are currently the only thing that's unlimited, and limiting them would improve the experience for those of us who rarely or never use the API by freeing up database time taken by people who are just spamming it.
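For context, read rate limits are often implemented as a token bucket: each client gets a budget that refills at a fixed rate, and requests beyond it are rejected. The sketch below is a generic Python illustration of that technique, not Danbooru's code; the rate and capacity values in the example are arbitrary, and the clock is injectable only so the behavior can be tested.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=10.0)  # arbitrary example values
```

A client that stays under the refill rate never notices the limit, while one hammering the API gets rejected once its burst allowance is used up.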


Getting a nasty search timeout that won't go away, even on page 1 of a pool search. Searching in the browser, not with an API program.

Specific query:
Searching for "touhou" post tags (nothing in the name/description box, FWIW), type is "series", sorting by post count.

Please advise.
~ Kurzov

Kurzov said:

Getting a nasty search timeout that won't go away, even on page 1 of a pool search. Searching in the browser, not with an API program.

Specific query:
Searching for "touhou" post tags (nothing in the name/description box, FWIW), type is "series", sorting by post count.

I just tried your search and it took about 13 seconds, so it’s basically unusable for all users, not just you. For reference, your search timeout is 3 seconds.

I think you’ll have to make do without the pool tag search. The problem is that the database has to check every pool to see whether it contains any post with the tag you’re searching for. That’s simply incredibly inefficient, and it isn’t related to any high load on the site.

Series pools usually have the copyright in the title, so if you’re looking for Touhou-related series pools, just search the title for "touhou", which only takes a second to load.
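A toy in-memory model of the difference described above (not Danbooru's actual SQL): the tag search has to look inside pools and check their posts' tags, and every pool without a match gets scanned in full, while the title search only compares one string per pool. The `checked` counts stand in for work done; the Pool class and the sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    post_ids: list


def pools_with_tag(pools, post_tags, tag):
    """Tag search: open every pool and test its posts until one has the tag."""
    checked = 0
    matches = []
    for pool in pools:
        for post_id in pool.post_ids:
            checked += 1
            if tag in post_tags[post_id]:
                matches.append(pool.name)
                break  # one matching post is enough for this pool
    return matches, checked


def pools_with_title(pools, word):
    """Title search: one string comparison per pool, no post lookups."""
    matches = [pool.name for pool in pools if word in pool.name.lower()]
    return matches, len(pools)
```

With two small pools the gap is tiny, but the tag search's work grows with the total number of posts in all pools, while the title search's work grows only with the number of pools.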

kittey said:

I just tried your search and it took about 13 seconds, so it’s basically unusable for all users, not just you. For reference, your search timeout is 3 seconds.

I think you’ll have to make do without the pool tag search. The problem is that the database has to check every pool to see whether it contains any post with the tag you’re searching for. That’s simply incredibly inefficient, and it isn’t related to any high load on the site.

Series pools usually have the copyright in the title, so if you’re looking for Touhou-related series pools, just search the title for "touhou", which only takes a second to load.

I see, thanks
