Slow SQL queries?

Posted under General

I've been experiencing problems when selecting images by multiple tags, and now I got this exception:

Error: Error: PGError: ERROR: canceling statement due to statement timeout : SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT id FROM tags WHERE name = E'tag1') JOIN posts_tags rpt1 ON rpt1.post_id = p.id AND rpt1.tag_id = (SELECT id FROM tags WHERE name = E'tag2') JOIN posts_tags rpt2 ON rpt2.post_id = p.id AND rpt2.tag_id = (SELECT id FROM tags WHERE name = E'tag3') WHERE true AND p.status <> 'deleted'

To me, this query is inefficient. Why join the same table posts_tags three times, each of which is formed by an additional select? One additional select would be enough, if it is needed at all. Consider this statement (I'm leaving true because it means nothing):

SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT id FROM tags WHERE name = E'tag1' or name = E'tag2' or name = E'tag3') WHERE true AND p.status <> 'deleted'

Or this:

SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT id FROM tags WHERE name in (E'tag1', E'tag2', E'tag3')) WHERE true AND p.status <> 'deleted'

I understand that often, queries are generated automatically, but sometimes it's just worth it to manually code them. :)

Updated by miton

Yes, I'm sorry, that was a mistake.

How about this one:

SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT DISTINCT id FROM ((SELECT id FROM tags WHERE name = E'tag1') INTERSECT (SELECT id FROM tags WHERE name = E'tag2') INTERSECT (SELECT id FROM tags WHERE name = E'tag3'))) WHERE true AND p.status <> 'deleted'

This way, only join is needed (perhaps, joins are what is slow).

Your query isn't equivalent.

SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT DISTINCT subquery.id FROM ((SELECT t.id FROM tags t WHERE t.name = E'ragnarok_online') INTERSECT (SELECT t.id FROM tags t WHERE t.name = E'panties') INTERSECT (SELECT t.id FROM tags t WHERE t.name = E'thighhighs')) subquery) WHERE true AND p.status <> 'deleted'; => 0

SELECT COUNT(*) FROM posts p JOIN posts_tags rpt0 ON rpt0.post_id = p.id AND rpt0.tag_id = (SELECT id FROM tags WHERE name = E'ragnarok_online') JOIN posts_tags rpt1 ON rpt1.post_id = p.id AND rpt1.tag_id = (SELECT id FROM tags WHERE name = E'panties') JOIN posts_tags rpt2 ON rpt2.post_id = p.id AND rpt2.tag_id = (SELECT id FROM tags WHERE name = E'thighhighs') WHERE true AND p.status <> 'deleted'; => 158

I have been experimenting with using Postgres's full text search to index a denormalized text field, but the way it normalizes text means I'd have to write a custom parser. It seems to be faster by about 30% though.

Don't have an access to a database from home, so I can't really experiment.. Sorry.. Apparently, my query is buggy. I just thought that it would be possible to improve productivity by reducing the number of joins. I'll think a bit more on it.

A gin index perform well (most queries being milliseconds, at worst ~1 second from what I've seen) even with millions of images and tags having hundreds of thousands of members. However, if your indexes do not reasonably fit in memory performance will suck.

1