Danbooru

Pixiv mismatch reports: 1086 + 3313 + 6689 posts

Posted under General

As brought up in topic #10206, here are my reports on pixiv md5 mismatches. Artist revisions are the major source of these. It was an interesting exercise, but not one worth repeating. Perhaps some of you will find the results (in CSV form) useful.

NOTE: Because of shortcuts I took and the volume of information, I can't guarantee the results are correct, but they're probably mostly correct. Maybe.

Cheers.

There is no consensus for such a drastic action. The only one in that thread rallying against them was you (and now BirdieRumia). OOZ662, for instance, turned out to actually be talking about how I was bumping my sample/thumbnail comments in the Comments index, and I've stopped that. There was no discussion on stopping or deleting existing comments; jxh2154 did not say that. Some users found the comments useful. Save your complaints for that thread, not this one.

jxh2154 said:

Ehhhhhh, I think the comments are useful but I kinda see the complaints about comment spam. I don't know, maybe it's because I almost never read comments, let alone use order:comment (which I didn't know existed until this thread), so I'm not a good person to ask. I don't really have a snap decision on this; it's purely a matter of opinion and what features you happen to use.

I don't think limiting the rate per day would help either, 774457 would require like two years even at 1000/day. And 1000 a day for 2 years would be a much bigger issue than one lump run.

When there's no clear answer I guess status quo wins by default, with negative effect to an existing feature being a bigger problem than not implementing a new feature (or expanding an existing one rather). So, I guess the comments should not be made (on the 700k legacy batch anyway, on newly uploaded stuff is fine).

RaisingK said:

Some users found the comments useful.

Also, just because some people find this information useful does not mean it has to be in the form of comments. There are tons of alternate ways to make this information available, most of which are less spammy than comments.

I support what RaisingK is doing, and I prefer them being in the comments. I prefer the fact that this info is right there with the image and not placed somewhere else.

Toks said:

Also, just because some people find this information useful does not mean it has to be in the form of comments.

I'm not sure how the personal opinion of "other some people" like yourself outweighs the personal opinions of the "some people" who support it. Just because you dislike it doesn't mean we have to change it to suit what you desire.

I'm torn on this issue - while it's informative for sure, it would make more sense to automate the UPLOADING of posts where the mismatch is caught.

Additionally, I think RaisingK's tool should be ignoring mismatches where the Danbooru image is larger than the current Pixiv image, to discount JPEG compression images.

NWF_Renim said:

I'm not sure how the personal opinion of "other some people" like yourself outweighs the personal opinions of the "some people" who support it. Just because you dislike it doesn't mean we have to change it to suit what you desire.

It doesn't, but it would be nice if a compromise could be reached that suits both sides.

NWF_Renim said:

I support what RaisingK is doing, and I prefer them being in the comments. I prefer the fact that this info is right there with the image and not placed somewhere else.

Then can I ask why you prefer it in comments?

As opposed to, for example, having the information in a separate tab and Ctrl+Fing for the post ID. Or having it embedded in the site itself somehow so that it is still "right there" (like how artist commentary was changed).

Kikimaru said:

Additionally, I think RaisingK's tool should be ignoring mismatches where the Danbooru image is larger than the current Pixiv image, to discount JPEG compression images.

I don't think Pixiv compresses JPEG images, and I don't think the metadata removal mentioned in topic #9745 was applied on images before it started happening. Even if that was the case, legitimate revisions could be smaller than the current Danbooru image, and I wouldn't want them ignored.

Kikimaru said:

I'm torn on this issue - while it's informative for sure, it would make more sense to automate the UPLOADING of posts where the mismatch is caught.

Uploading the revisions automatically is an interesting idea. For one, it makes things easier on uploaders so they don't have to do it manually. And I assume these comments and the information in them would no longer be necessary at all? The reason EB just said he liked them ("so I can check if I need to upload the revision") would seem to no longer apply.

Of course, we'd want to maintain quality standards, so deleted/pending/flagged posts would not be automatically uploaded even when a revision exists. And if a rare bad image or two manage to slip through anyway somehow they could just be flagged like normal.

I had hoped to keep this topic about these reports, but oh well.

Toks said:

Then why are you still posting them on the 700k legacy batch?

Yes, yes, I saw that bit, so I stopped the one-time scan over the remaining legacy posts. I still monitor recent changes in the post tag history.

Kikimaru said:

I'm torn on this issue - while it's informative for sure, it would make more sense to automate the UPLOADING of posts where the mismatch is caught.

Additionally, I think RaisingK's tool should be ignoring mismatches where the Danbooru image is larger than the current Pixiv image, to discount JPEG compression images.

Revisions are not the only reason for the md5 mismatch. A post could have been gotten from a non-pixiv source, then given the source of the pixiv version. That the pixiv version was revised at some point would be a coincidence.

And like I said the first time you brought this up, the metadata-removal issue doesn't apply here. If the difference is because the image wasn't from pixiv, then it shouldn't have the pixiv source. There is no match with any of the pixiv versions, so it isn't incorrect to call it a mismatch.
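To make the definition concrete: a post counts as a mismatch when its md5 equals the md5 of none of the Pixiv versions that are known. The sketch below is only an illustration, assuming the post's md5 is stored as a hex string and that the raw bytes of each Pixiv version are already downloaded; `md5_hex` and `is_md5_mismatch` are hypothetical helpers, not part of any Danbooru tool.

```python
import hashlib
from typing import Iterable


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def is_md5_mismatch(post_md5: str, pixiv_versions: Iterable[bytes]) -> bool:
    """True when the post's md5 matches none of the known Pixiv versions.

    A match with ANY version (original or revision) means it is not a mismatch;
    whether the difference is a revision or a non-Pixiv copy is a separate question.
    """
    known = {md5_hex(v) for v in pixiv_versions}
    return post_md5.lower() not in known
```

Note that this check alone can't tell a revised Pixiv image apart from a post that was taken from somewhere else and given the Pixiv source afterwards; both show up as a mismatch.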

RaisingK said:

Yes, yes, I saw that bit, so I stopped the one-time scan over the remaining legacy posts. I still monitor recent changes in the post tag history.

So you're saying the reason that post #219545, post #235808, post #281242, and dozens of other legacy posts are still receiving the comments is because of the recent change monitoring, and not the one-time scan?

But those posts haven't actually received any recent changes (sans your adding of the md5_mismatch tag)... now I'm confused.

Toks said:

So you're saying the reason that post #219545, post #235808, post #281242, and dozens of other legacy posts are still receiving the comments is because of the recent change monitoring, and not the one-time scan?

But those posts haven't actually received any recent changes (sans your adding of the md5_mismatch tag)... now I'm confused.

But those are all posts where I myself uploaded the revision. In those cases, the tag/comment comes from the function that uploads the pixiv image for me, which I run manually. I'm simply not running my scans over the images, which is all the "legacy post" comment was about.

Kikimaru said:

I'm torn on this issue - while it's informative for sure, it would make more sense to automate the UPLOADING of posts where the mismatch is caught.

RaisingK said:

Revisions are not the only reason for the md5 mismatch. A post could have been gotten from a non-pixiv source, then given the source of the pixiv version. That the pixiv version was revised at some point would be a coincidence.

And also, there are plenty of instances where the "revision" is much worse (significantly smaller, etc...) or even a completely different image, even when the Danbooru image is really what the link used to point to.

Toks said:

As opposed to, for example, having the information in a separate tab and Ctrl+Fing for the post ID. Or having it embedded in the site itself somehow so that it is still "right there" (like how artist commentary was changed).

And do that for every single image from pixiv, and keep track of which ones you've already checked? Comments are more convenient and immediate.

RaisingK said:

But those are all posts where I myself uploaded the revision. In those cases, the tag/comment comes from the function that uploads the pixiv image for me, which I run manually. I'm simply not running my scans over the images, which is all the "legacy post" comment was about.

Oh, I see. But what's the point of creating the comments after the revised version is already uploaded? I thought the purpose of the comments was to help uploaders with re-uploading the revised version? (Well, at least you don't seem to be creating 1000 comments per day now, which was the biggest issue.)

RaisingK said:

And also, there are plenty of instances where the "revision" is much worse (significantly smaller, etc...) or even a completely different image, even when the Danbooru image is really what the link used to point to.

It should be possible to programmatically detect those cases and skip automatic uploads for them.

Comparing file sizes is one simplistic approach, but actual image similarity algorithms would give more accurate results. For example: is the similarity of the two images below 50%? Then they're completely different images; don't auto-upload.

Similarly, is the width*height of the revised version below 80% of that of the one already on Danbooru? Then the revised version is significantly smaller, so don't auto-upload. That kind of thing.
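A rough sketch of those two checks, assuming the 50% similarity and 80% area thresholds suggested above. Average hashing is swapped in here as one simple example of an "image similarity algorithm"; it assumes both images have already been downscaled to small grayscale grids of the same size (resizing is left out). All function names are hypothetical.

```python
from typing import Sequence

# Thresholds from the suggestion above (assumed values, easy to tune).
MIN_SIMILARITY = 0.50   # below this, treat the images as different pictures
MIN_AREA_RATIO = 0.80   # revised width*height must be >= 80% of the current post's


def average_hash(gray: Sequence[Sequence[int]]) -> list[bool]:
    """One bit per pixel: True if the pixel is brighter than the grid's mean.

    Assumes `gray` is an already-downscaled grayscale grid (e.g. 8x8, values 0..255).
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [p > mean for p in pixels]


def hash_similarity(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Fraction of matching hash bits (1.0 means identical hashes)."""
    ha, hb = average_hash(a), average_hash(b)
    if len(ha) != len(hb):
        raise ValueError("grids must have the same size")
    matches = sum(x == y for x, y in zip(ha, hb))
    return matches / len(ha)


def should_auto_upload(current_size: tuple[int, int],
                       revised_size: tuple[int, int],
                       similarity: float) -> bool:
    """True only if the revision looks like the same picture and isn't much smaller."""
    if similarity < MIN_SIMILARITY:
        return False  # completely different image
    cur_w, cur_h = current_size
    rev_w, rev_h = revised_size
    if rev_w * rev_h < MIN_AREA_RATIO * cur_w * cur_h:
        return False  # revision is significantly smaller
    return True
```

For example, a 900x900 revision of a 1000x1000 post passes the size check (810,000 vs. the 800,000 cutoff), while a 700x700 one is rejected. Posts that are deleted, pending, or flagged would still need to be excluded separately.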

RaisingK said:

And do that for every single image from pixiv, and keep track of which ones you've already checked? Comments are more convenient and immediate.

Why would someone want to manually go through literally every single image from Pixiv? With or without comments, going through 700,000 images is not convenient at all.

Toks said:

Well, at least you don't seem to be creating 1000 comments per day now, which was the biggest issue.

For you, at least. I really don't see a brief logjam in order:comment as a big problem--at 1000/day, I could finish in less than a week. With the spreadsheets to assist me, I could finish in a day or two. It wasn't going to be a regular thing, it was just clearing a backlog.

BirdieRumia said:

There's another issue with the MD5 mismatch comments that I'd like to point out: it makes comment search by tag almost entirely useless.

For example, suppose I have a vague memory of a comic about Mystia that got some comments and I try to find it by searching for comments on posts that are tagged mystia_lorelei ... 95% of what I find is comments about MD5 mismatches. And there's no way to filter them out, either. Very frustrating.

See, Toks? This is a complaint I can actually understand. Maybe not do anything about, depending on user consensus or executive action, but at least understand.

Toks said:

It should be possible to programmatically[...]

Do whatever you want with that idea, but leave me out of it.

Toks said:

Why would someone want to manually go through literally every single image from Pixiv? With or without comments, going through 700,000 images is not convenient at all.

Not every image from Pixiv, just every image from Pixiv that you happen to look at, when you aren't specifically hunting for images to reupload or aren't so interested in the difference that you'd make the effort to load and inspect the image yourself.

RaisingK said:

For you, at least. I really don't see a brief logjam in order:comment as a big problem--at 1000/day, I could finish in less than a week. With the spreadsheets to assist me, I could finish in a day or two. It wasn't going to be a regular thing, it was just clearing a backlog.

But order:comment is not the only argument against the comments you created. I haven't even mentioned order:comment in this thread.

RaisingK said:

See, Toks? This is a complaint I can actually understand. Maybe not do anything about, depending on user consensus or executive action, but at least understand.

I apologize if my poor wording has caused you to misunderstand me. Let me try my argument from a different angle; I hope it will be clearer:

These comments are really not suited to be comments. Comments are generally for, well, commenting on parts of the image, what the characters in it are doing, and so on. Using comments to track metadata, not even about the image itself but about alternate versions of it, is bound to cause problems, because comments are fundamentally not designed to be used this way. When you work against the way the site is designed, you're going to run into problems. So I am against using comments like this.

Look at artist commentary for example - using comments for that technically worked, but it caused problems, which were fixed by finding an alternate way of dealing with it, and now everyone is better off, right? Or do you think using comments for artist commentary was better than the current system of having a dedicated field for it?

Toks said:

But order:comment is not the only argument against the comments you created. I haven't even mentioned order:comment in this thread.

Don't split hairs, that was your argument in the other thread.

Toks said:

Or do you think using comments for artist commentary was better than the current system of having a dedicated field for it?

Apples to oranges. Translating is a collaborative effort, and you aren't providing an equivalent alternative either.

RaisingK said:

Apples to oranges. Translating is a collaborative effort, and you aren't providing an equivalent alternative either.

No alternative? What about auto-uploading revisions? What about adding dedicated fields for this into the site? What about the other suggestions made in the other thread?

Saying that equivalent alternatives haven't been provided is not true at all; there have been plenty.