Danbooru

Mass uploading revisions

Posted under General

Thought it might be best to actually ask before doing this.

A check I just ran on the database I'm using for PVDB shows there are "at least" ~2.3k revisions that haven't been uploaded.
This is using a newly downloaded database that currently has only 3M of the 31M pixiv images, so the number of revisions will most likely increase over time.

Considering it's a fair number of images, I was wondering whether these should actually be uploaded.
Would it be worth uploading them, or should I just leave them since they are only revisions (which seem to be rarely uploaded as it is)?

Updated by MyrMindservant

It's subjective whether a revision is noteworthy enough, or the original image interesting enough, to bother uploading, so it will have to be decided case by case. Generally, though, I'm in favor of revisions being uploaded.

Too many of the revisions are visually indistinguishable from the originals. Uploading revisions en masse would effectively flood Danbooru with duplicates.

The Pixiv API doesn't return filesize or dimensions, and parsing raw HTML will only net you the latter. That might suffice to find a few significant revisions through automation...
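A minimal sketch of that kind of automation, assuming the width and height of each original and revision have already been scraped from the pixiv HTML (the scraping itself is not shown, and the `Dims` type, `flag_by_dimensions` helper and post ids are hypothetical). It only flags pairs whose dimensions differ; a same-size revision can still contain real changes, so this catches a subset at best.

```python
from typing import Dict, List, NamedTuple, Tuple


class Dims(NamedTuple):
    width: int
    height: int


def flag_by_dimensions(
    pairs: Dict[int, Tuple[Dims, Dims]]
) -> List[Tuple[int, Dims, Dims]]:
    """Return (post_id, original, revision) for pairs whose sizes differ.

    `pairs` maps a (hypothetical) post id to (original_dims, revision_dims).
    Same-size pairs are skipped, even though they may still differ visually.
    """
    return [
        (post_id, orig, rev)
        for post_id, (orig, rev) in sorted(pairs.items())
        if orig != rev
    ]


pairs = {
    101: (Dims(800, 600), Dims(800, 600)),    # same size: not flagged
    102: (Dims(800, 600), Dims(1600, 1200)),  # resized: flagged for review
}
print(flag_by_dimensions(pairs))
```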

RaisingK said:
Too many of the revisions are visually indistinguishable from the originals. Uploading revisions en masse would effectively flood Danbooru with duplicates.

I'd think you must only be eyeballing the images to claim that, because I've always found the majority of revisions I've gone through to be distinguishable when comparing the two images (which is my standard practice when I find more than one version of an image here or see something labelled as a duplicate). If anything, I've too frequently found users mislabeling images as duplicates.

Although, I'll admit what one considers "distinguishable" is a subjective concept.

Frankly, I'm in favor of uploading revisions, especially if the author has removed the original. Not allowing the revision because an earlier version is already on here is no different in my mind than rejecting the complete version of an image simply because someone uploaded the in-progress version of it earlier.

NWF_Renim said:
Although, I'll admit what one considers "distinguishable" is a subjective concept.

You beat me to it; I was typing a response and habitually checked for new replies/edits while doing so. Take a recent example: post #1288608 and its revision parent, post #1288627. I am almost certain that at least 85% of revisions would be of this level or involve some minor background change.
DakuTree, if you've got too much time on your hands (which you clearly do), then try tag gardening; the site would benefit from that much more than from revision uploads.

Yes, you may take that as an "against" voice.

Edit:

NWF_Renim said:
Not allowing the revision because an earlier version is already on here is no different in my mind than rejecting the complete version of an image simply because someone uploaded the in-progress version of it earlier.

However, the "progress version" -> "final version" case is VERY easily distinguishable and is nowhere near comparable to the "final version" -> "some minor polishing" case. I just can't put them in the same category when the two concepts differ that much.

RaisingK said:
Too many of the revisions are visually indistinguishable from the originals. Uploading revisions en masse would effectively flood Danbooru with duplicates.

Well, I was planning to go through everything manually since, as you said, there will be duplicates. That's why I was asking whether it would be worth skipping minor revisions, which most of the time aren't noticeable (strangely enough, this is the type of revision I see getting uploaded the most).

Wypatroszony said:
Take a recent example: post #1288608 and its revision parent, post #1288627. I am almost certain that at least 85% of revisions would be of this level or involve some minor background change.

At least from what I've checked so far, there seem to be more revisions with major changes than minor ones. Even so, there is a fair number of minor revisions I would say are worth uploading.

DakuTree said:
At least from what I've checked so far, there seem to be more revisions with major changes than minor ones. Even so, there is a fair number of minor revisions I would say are worth uploading.

Since this seems to be pretty subjective, could you post a couple of examples of major and minor revisions worth uploading?

NWF_Renim said:
Frankly I'm in favor of uploading revisions, especially if the author has removed the original.

Just thought it's worth mentioning that the original version is still accessible in pixiv even if the author replaces it with a revised one. For example:

Revision: http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg?1352110052
Original: http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg

(My thanks to RaisingK for bringing this to my attention)
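For the example above, getting from the revision URL to the original is just a matter of dropping the query string. A small Python sketch, assuming the number after `?` is only a cache-busting timestamp (an assumption, not documented pixiv behaviour); whether the bare URL still serves the original depends entirely on pixiv's servers. The `original_url` helper is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def original_url(revision_url: str) -> str:
    """Drop the query string and fragment from an image URL."""
    parts = urlsplit(revision_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


rev = "http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg?1352110052"
print(original_url(rev))
# http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg
```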

Fred1515 said:
Since this seems to be pretty subjective, could you post a couple of examples of major and minor revisions worth uploading?

Major:
post #783951 / revision
post #783757 / revision
post #749926 / revision

Minor:
post #785869 / revision
post #785601 / revision
post #747792 / revision

Just thought it's worth mentioning that the original version is still accessible in pixiv even if the author replaces it with a revised one.

This doesn't always work.
See post #747636 / pixiv #13236076

In general, I'm not against revisions unless they are of lower quality than the originals or imperceptibly different from them. As a suggestion for how to triage things, I wonder if you were ever able to play around with IQDB's codebase, imgSeek, or ImageMagick at all? Something like that, able to do at least a gross similarity comparison, could probably help highlight the revisions most worthy of addition. Of course, it also goes without saying that any revisions need to be linked to their originals. I'd say we ought to have at least one set of eyeballs review anything before it is posted, though.
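A rough sketch of the "gross similarity comparison" idea in plain Python; this is not IQDB, imgSeek or ImageMagick. It assumes each image has already been decoded into rows of grayscale values 0-255 (decoding is not shown), shrinks both images of a pair to an 8x8 grid of block averages, and ranks pairs by mean absolute difference so the most-changed revisions come up for review first. The function names and the 8x8 size are illustrative choices.

```python
from typing import Dict, List, Sequence, Tuple

Gray = Sequence[Sequence[float]]  # rows of grayscale values, 0-255


def downscale(img: Gray, size: int = 8) -> List[List[float]]:
    """Block-average an image down to size x size (needs width, height >= size)."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the target grid")
    grid = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def gross_difference(a: Gray, b: Gray, size: int = 8) -> float:
    """Mean absolute difference (0-255 scale) between the shrunken images."""
    ga, gb = downscale(a, size), downscale(b, size)
    total = sum(abs(pa - pb) for ra, rb in zip(ga, gb) for pa, pb in zip(ra, rb))
    return total / (size * size)


def triage(pairs: Dict[int, Tuple[Gray, Gray]], size: int = 8) -> List[Tuple[int, float]]:
    """(post_id, score) pairs, most different first."""
    scores = {pid: gross_difference(a, b, size) for pid, (a, b) in pairs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


flat = [[100] * 16 for _ in range(16)]
touch_up = [row[:] for row in flat]
touch_up[0][0] = 110  # one retouched pixel
repaint = [[100 if x < 8 else 200 for x in range(16)] for _ in range(16)]  # right half redone
print(triage({1: (flat, touch_up), 2: (flat, repaint)}))
# [(2, 50.0), (1, 0.0390625)]
```

Shrinking first is what makes the comparison "gross": small retouches get averaged away, while large repaints still stand out.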

Shinjidude said:
In general, I'm not against revisions unless they are of lower quality than the originals or imperceptibly different from them. As a suggestion for how to triage things, I wonder if you were ever able to play around with IQDB's codebase or ImageMagick at all? Something like that, able to do at least a gross similarity comparison, could probably serve to triage and highlight the revisions most worthy of addition. Of course, it also goes without saying that any revisions need to be linked to their originals as well.

I haven't had any luck actually getting IQDB to run. Linux isn't exactly my forte, so fixing it is out of the question.

As for ImageMagick, I never knew it had such a feature.
Looks like it could be handy, although most differences can be found simply by opening the images in two tabs and flicking between them.

DakuTree said:
As for ImageMagick, I never knew it had such a feature.
Looks like it could be handy, although most differences can be found simply by opening the images in two tabs and flicking between them.

True, but running a simple subtraction between two images can find the differences even faster, especially when they are subtle or when you run it automatically in batch to prioritize images with large apparent differences.
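The subtraction idea can be sketched the same way: take the per-pixel absolute difference, ignore differences at or below a tolerance (loosely analogous to ImageMagick's `-fuzz` setting, though this is not ImageMagick code), and report how many pixels changed plus a bounding box around them. Images are again assumed to be same-sized grayscale rows; `subtract` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple

Gray = Sequence[Sequence[int]]
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def subtract(a: Gray, b: Gray, fuzz: int = 0) -> Tuple[int, Optional[BBox]]:
    """Count pixels whose absolute difference exceeds `fuzz`, plus their bounding box."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    xs: List[int] = []
    ys: List[int] = []
    for y, (ra, rb) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(ra, rb)):
            if abs(pa - pb) > fuzz:
                xs.append(x)
                ys.append(y)
    bbox = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return len(xs), bbox


orig = [[50] * 6 for _ in range(4)]
rev = [row[:] for row in orig]
rev[1][2] = 90  # a real touch-up
rev[2][4] = 53  # small noise, below the tolerance
print(subtract(orig, rev, fuzz=5))
# (1, (2, 1, 2, 1))
```

Run over a whole batch, sorting pairs by the changed-pixel count would give the kind of prioritization described above.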

From what little I've played with it, ImageMagick isn't super at detecting similarity except with near matches, which fortuitously is what most of these revisions are. http://www.imagemagick.org/Usage/compare/ has a bunch of potentially useful examples.

IQDB / imgSeek seems to do a much better job of judging similarity (I believe they take a wavelet approach rather than simply matching histograms or making direct pixel-to-pixel comparisons). Like you, though, I've had problems building it to play with.
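To illustrate what a wavelet approach can look like, here is a simplified Haar-wavelet signature in Python. It is a generic sketch of the technique, not code from IQDB or imgSeek, whose details (color handling, weighting, decomposition order) are not reproduced here. It assumes square grayscale images whose side is a power of two, keeps the signs of the k largest non-average coefficients, and scores two images by how many (position, sign) entries they share.

```python
import math
from typing import List, Sequence, Set, Tuple

Signature = Set[Tuple[int, int, int]]  # (row, col, sign) of kept coefficients


def haar_1d(values: Sequence[float]) -> List[float]:
    """Full 1-D Haar decomposition; the length must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = averages + details
        n = half
    return out


def haar_2d(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standard 2-D decomposition: transform every row, then every column."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def signature(img: Sequence[Sequence[float]], k: int = 20, eps: float = 1e-9) -> Signature:
    """Signs and positions of the k largest-magnitude non-average coefficients."""
    coeffs = haar_2d(img)
    ranked = sorted(
        ((abs(v), y, x, 1 if v > 0 else -1)
         for y, row in enumerate(coeffs)
         for x, v in enumerate(row)
         if (y, x) != (0, 0) and abs(v) > eps),  # skip the overall average and ~zeros
        reverse=True,
    )
    return {(y, x, sign) for _, y, x, sign in ranked[:k]}


def similarity(a, b, k: int = 20) -> float:
    """Fraction of shared (position, sign) entries; 1.0 means identical signatures."""
    sa, sb = signature(a, k), signature(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / max(len(sa), len(sb))


img = [[(x + y) * 10 for x in range(8)] for y in range(8)]
inverted = [[255 - v for v in row] for row in img]
print(similarity(img, img))       # 1.0
print(similarity(img, inverted))  # 0.0, every detail coefficient flips sign
```

Keeping only signs and positions of the strongest coefficients is what lets this kind of signature tolerate small retouches, rescaling noise and recompression better than a raw pixel subtraction.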

Fred1515 said:
Just thought it's worth mentioning that the original version is still accessible in pixiv even if the author replaces it with a revised one. For example:

Revision: http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg?1352110052
Original: http://i2.pixiv.net/img02/img/yuumikouki/30693602.jpg

(My thanks to RaisingK for bringing this to my attention)

DakuTree said:
This doesn't always work.

It only works for a short period of time following the revision. After about a day or two, the revision replaces the original.

DakuTree said:
Well, I was planning to go through everything manually since, as you said, there will be duplicates. That's why I was asking whether it would be worth skipping minor revisions, which most of the time aren't noticeable (strangely enough, this is the type of revision I see getting uploaded the most).

The "mass uploading" in the thread title made me think you meant some sort of automatic process. I favor skipping the minor revisions, but it's a fuzzy area and it's not a problem to push ahead anyway.

RaisingK said:
It only works for a short period of time following the revision. After about a day or two, the revision replaces the original.

This wasn't the case with your example in post #1240518, and there are images like post #1123864 where you can still access the original 7 months later or more. So it's definitely more than just a day or two.

To get back on topic though:

DakuTree said:
Minor:
post #785601 / revision
post #747792 / revision

This is where I would draw the line. The changes are barely visible, and the images aren't unique or well-made enough to be worth it IMHO.

I'd wager a lot of seriously minor revisions are already being uploaded because the image's MD5 changed and the uploader didn't use IQDB to check for a duplicate. There's little to no harm in having these extra revisions batch-uploaded (if only to prevent duplicate uploading and tagging mishaps), so long as they are properly parented.
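Why exact-hash checks miss revisions, in a few lines of Python: any change to a file's bytes produces a different MD5, so a check that only looks up MD5 digests (as implied above) treats a touched-up revision as a new file. The byte strings are stand-ins rather than real images, and `is_exact_duplicate` is a hypothetical helper.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def is_exact_duplicate(data: bytes, known_md5s: set) -> bool:
    """Exact-match check: only catches byte-for-byte identical files."""
    return md5_hex(data) in known_md5s


# Stand-in byte strings, not real image files.
original = b"image data v1"
revision = b"image data v2"  # a tiny touch-up still changes the bytes
known = {md5_hex(original)}

print(is_exact_duplicate(original, known))  # True
print(is_exact_duplicate(revision, known))  # False
```

Catching near-duplicates needs a similarity search such as IQDB on top of the hash lookup.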

To address the issue of revisions showing up as duplicates in a search query, we could have a tag that covers those prior images (old_version?) so people can blacklist it if they want to. I don't think one currently exists, but if it does, please correct me.

I'll chime in, although all I have to say is that the people who said "it's subjective" are right. It's just one of those things you need to use your judgment on. Hopefully people with Contrib+ status have good judgment, and people who are Priv- can hurt their upgrade odds if they show bad judgment on it.

Favor major revisions, but it's not a punishable offense to upload minor ones, especially if done unknowingly.
