(Note: Nerdy stuff)
So, yesterday, I decided to use the random image feature to try to work out the composition of the images on this site: how many of each kind of image are there? This came to mind while tagging images, as I found a few that were untagged, improperly tagged, and/or outside their proper gallery, meaning that counting a kind of image through the search wouldn't be 100% accurate. I started out small, with a sample size of 100 images and only 18 categories (if I really get going I'd expect to have at least 25 categories and 500 images).
My methods:
I used the ability to get a random image on the site to record 100 images under 18 categories, being Demotivational Posters, RWBY, Pokemon, Super Smash Bros., Undertale, JoJo's Bizarre Adventure, Steven Universe, MLP, Reaction Images, Tumblr, Hentai Quotes, Rage Comics, Fanart, Twitter, Image Macro, and an Other category for images that don't fit into any of the others. I also recorded the number assigned to each image in the URL.
A few people in the IRC expressed concerns about a random sample of such a small size, so I included two methods to check the reliability. The first was averaging the numbers assigned to the images. If the average of all the image numbers wasn't close to 500k, then we'd know the sample was skewed toward either older or newer images. The second was taking some of the larger galleries and comparing the share of the site they actually make up with the percentages I got from the sample.
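For anyone who wants to repeat this, here is a rough sketch of how the tallying and the average-number check could be done in Python. The `summarize` function, the record layout, and the four sample images are all made up for illustration; they are not my actual data or the way I recorded it.

```python
from collections import Counter

def summarize(records):
    """records: list of (image_number, set_of_categories) pairs (hypothetical layout)."""
    n = len(records)
    # An image can sit in several categories, so the shares can sum to more than 100%.
    counts = Counter(cat for _, cats in records for cat in cats)
    shares = {cat: 100 * count / n for cat, count in counts.items()}
    # Average of the image numbers taken from the URLs.
    mean_id = sum(number for number, _ in records) / n
    return shares, mean_id

# Made-up sample of four images, for illustration only.
sample = [
    (120000, {"MLP", "Fanart"}),
    (480000, {"Other"}),
    (650000, {"Pokemon", "Fanart"}),
    (750000, {"Image Macro"}),
]
shares, mean_id = summarize(sample)
print(shares)
print(mean_id)
```

Keeping the categories as a set per image is what lets one image count toward several categories at once.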
Numbers:
(Note that the numbers add up to above 100% because images could be added to multiple categories.)
- 41% of images found did not fit into any of the categories of interest.
- 31% of the images were fanart.
- 25% of the images were MLP related.
- 14% were image macros.
- 5% were Pokemon related.
- Super Smash Bros., JoJo's Bizarre Adventure, Steven Universe, and Tumblr each were related to 2% of the images found.
- Reaction Images, Hentai Quotes, and Rage Comics each were related to 1% of the images found.
- The rest of the categories had 0%.
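To give a feel for how much these percentages could wobble with only 100 images, here is a small sketch of the usual normal-approximation margin of error for a proportion. The function name `margin_of_error` and the 95% level (z = 1.96) are my own choices for the example, and the approximation gets unreliable for the tiny categories (1% or 2%), so treat it as a ballpark only.

```python
import math

def margin_of_error(percent, n=100, z=1.96):
    """Approximate 95% margin of error, in percentage points,
    for a category seen in `percent` of `n` sampled images."""
    p = percent / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Percentages from the list above.
for name, percent in [("Fanart", 31), ("MLP", 25), ("Image macros", 14)]:
    print(f"{name}: {percent}% +/- {margin_of_error(percent):.1f} points")
```

Under those assumptions the 31% for fanart comes with a margin of roughly 9 points either way, which is a decent argument for moving up to the 500-image sample.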
Accuracy tests:
- MLP, Hentai Quotes, Steven Universe, Tumblr, Super Smash Bros., Pokemon, RWBY, and Undertale got about the percentage expected. All is good there.
- The average of the numbers came out to 578479, meaning the sample had a slight bias toward newer images. This means older ones, specifically rage comics, demotivational posters, and image macros, were likely slightly underrepresented.
- Reaction images were 2 percentage points lower than the minimum expected, which I'm guessing was caused by the newer image bias.
- I'm only human, and this is my first time doing something like this. It's unlikely but possible I mislabeled something, and with 100 images each mislabel would shift a category by a whole percentage point.
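One way to judge whether an average of 578479 is just bad luck is a quick z-score. This sketch assumes the image numbers are spread evenly between 0 and 1,000,000 (my guess, based on expecting an average near 500k) and that the random button picks uniformly; `mean_id_z_score` is a name made up for the example.

```python
import math

def mean_id_z_score(observed_mean, n, max_id):
    """How many standard errors the sample mean sits from the midpoint,
    assuming image numbers are uniform on [0, max_id] (an assumption)."""
    expected = max_id / 2
    # A uniform distribution on [0, max_id] has variance max_id**2 / 12.
    std_error = max_id / math.sqrt(12 * n)
    return (observed_mean - expected) / std_error

z = mean_id_z_score(578479, n=100, max_id=1_000_000)
print(round(z, 2))
```

A z-score well above 2 would suggest the tilt toward newer images is probably not just sampling noise, although that only holds if the assumptions about the numbering are right (deleted images or gaps in the numbers would change the picture).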
So those are my findings thus far. This is just the preliminary test, which I think went pretty well, so no hard numbers predicting how many images of a certain type there are quite yet.
If you have any suggestions for what I should do next time, I'd be glad to hear them. As I said, I'm new to this, so I might not be getting something right.
Mod edit: Removed personal info