You will get much better results if you don't just compare the average color of the "mosaic window" and the tile screenshot, but also consider image structure. A mosaic window that looks like
######
#####.
####..
###...
##....
#.....
is better approximated by a screenshot with diagonal features than by one that merely averages to the same gray.
You can simply scale the screenshots to match the resolution of the mosaic window, then compare differences for each pixel. Sum up the errors (or squared errors) of each pixel and pick the best tile.
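As a rough sketch of that idea, here is a dependency-free Python version. It assumes grayscale images stored as lists of rows of numbers; the names `downscale`, `squared_error` and `best_tile` are made up for illustration, and a real program would more likely use an imaging library and compare all color channels.

```python
def downscale(img, out_w, out_h):
    """Scale a grayscale image (list of rows) to out_w x out_h by box averaging."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            # Average every source pixel that falls into this output cell.
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def squared_error(a, b):
    """Sum of squared per-pixel differences between two same-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_tile(window, screenshots):
    """Return the index of the screenshot with the smallest squared error."""
    h, w = len(window), len(window[0])
    errors = [squared_error(window, downscale(s, w, h)) for s in screenshots]
    return min(range(len(screenshots)), key=errors.__getitem__)
```

With a window like the diagonal example above, a flat gray screenshot has nearly the same average but a much larger per-pixel error than a screenshot that is bright in the same corner, so the latter wins.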
Since this is computationally expensive, it may be wise to use your averaging method as a first step to filter out any screenshots that are way off, then use per-pixel matching on the best fits to determine the winner.
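The two-stage approach could look something like the following. This is only a sketch: it assumes grayscale tiles (lists of rows) that are already scaled to the window's resolution, and the function names and the `keep` cutoff are arbitrary choices of mine, not anything prescribed.

```python
def mean(img):
    """Average brightness of a grayscale image given as a list of rows."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))


def squared_error(a, b):
    """Sum of squared per-pixel differences between two same-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_tile_two_stage(window, tiles, keep=10):
    """Pick a tile index: cheap average filter first, per-pixel match second."""
    target = mean(window)
    # Stage 1: keep only the `keep` tiles whose average is closest to the window's.
    candidates = sorted(range(len(tiles)),
                        key=lambda i: abs(mean(tiles[i]) - target))[:keep]
    # Stage 2: full per-pixel comparison, but only on the survivors.
    return min(candidates, key=lambda i: squared_error(window, tiles[i]))
```

In a real program you would compute each tile's average once up front rather than for every window, since the tile library doesn't change. Note that a small `keep` can filter out the tile that would have won the per-pixel comparison, so it is a speed/quality trade-off.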
Further improvements can be made by disallowing repeating screenshots. That's easy to code, but obviously needs a larger library of screenshots to work.
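One simple way to do this is a greedy pass that remembers which screenshots are taken. The sketch below assumes grayscale tiles already scaled to the window size, and `assign_without_repeats` is a name I made up. Being greedy, the result depends on the order in which the windows are processed, so it isn't guaranteed to be the best overall assignment.

```python
def squared_error(a, b):
    """Sum of squared per-pixel differences between two same-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def assign_without_repeats(windows, tiles):
    """For each window in order, pick the best tile that hasn't been used yet."""
    if len(tiles) < len(windows):
        raise ValueError("need at least as many tiles as mosaic windows")
    used = set()
    chosen = []
    for window in windows:
        best = min((i for i in range(len(tiles)) if i not in used),
                   key=lambda i: squared_error(window, tiles[i]))
        used.add(best)
        chosen.append(best)
    return chosen
```

The `ValueError` check is where the need for a larger library shows up: with fewer screenshots than windows, there is no way to avoid repeats.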
(Of course, there are plenty of pre-made programs that can already do this, but where's the fun in that? ;))