I have a better idea for a captcha. Instead of using an image (which is what every spammer expects), just use text. Simply present the captcha as a sentence like:
Please write the string xyzzy7 in the textfield below to complete your registration.
Now, how many spambots will crack that? You could add an extra security measure by blocking the IP (at least for a while) if it makes several consecutive failed attempts (e.g. 10).
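Here is a minimal sketch of the text captcha plus IP blocking described above. The class name `IpGuard`, the 6-character code and the 15-minute block are assumptions for illustration. Only the "about 10 consecutive failures" figure comes from the post. State is kept in memory, so a real forum would have to persist it.

```python
import secrets
import string
import time

MAX_FAILURES = 10          # "like e.g. 10" from the post
LOCKOUT_SECONDS = 15 * 60  # assumption: block for 15 minutes

def make_code(length=6):
    """Random code word such as 'xyzzy7' (length is an assumption)."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def make_prompt(code):
    return f"Please write the string {code} in the textfield below to complete your registration."

class IpGuard:
    """Hypothetical helper: counts consecutive failures per IP and blocks temporarily."""

    def __init__(self, max_failures=MAX_FAILURES, lockout_seconds=LOCKOUT_SECONDS):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.failures = {}       # ip -> consecutive failed attempts
        self.blocked_until = {}  # ip -> time when the block expires

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        return self.blocked_until.get(ip, 0) > now

    def check(self, ip, expected, answer, now=None):
        """Return True if the answer is accepted."""
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return False  # blocked IPs are refused even with the right answer
        if answer.strip() == expected:
            self.failures.pop(ip, None)  # success resets the consecutive count
            return True
        count = self.failures.get(ip, 0) + 1
        if count >= self.max_failures:
            self.blocked_until[ip] = now + self.lockout_seconds
            count = 0  # start counting again after the block expires
        self.failures[ip] = count
        return False
```

Passing `now` explicitly is only there to make the timing easy to follow; a real handler would leave it out.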
Remember that university study that pretended it was big news that people could read sentences in which every letter except the first and last of each word was scrambled? You could probably combine that effect with Warp's idea to make something reasonably difficult for a computer to attack.
So, first, come up with 20 different ways of saying the instructions:
Hello potential user, would you please write xyzzy in the captcha box?
xyzzy is what you should definitely type into the following box if you want to register.
Welcome, please type xyzzy in the box should you desire to join these forums.
Etc.
Then pick one of the phrases at random and randomly scramble the letters of every word except the code word, keeping each word's first and last letters in place: "Hlelo pteotianl uesr, wulod you pelase wtire xyzzy in the catphca box?" It might be even harder for a bot to attack if the code word itself were just a scrambled real word. Of course, this scheme breaks the "shouldn't have to understand English" rule, but oh well.
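The scrambling step could be sketched like this. It assumes the phrase templates mark the code word with a `{code}` placeholder, which is a hypothetical convention and not something from the post. Only the inner letters of each word are shuffled, as in the study mentioned above. Words of three letters or fewer come out unchanged.

```python
import random
import re

# Hypothetical templates: three of the ~20 phrasings; "{code}" marks the code word.
PHRASES = [
    "Hello potential user, would you please write {code} in the captcha box?",
    "{code} is what you should definitely type into the following box if you want to register.",
    "Welcome, please type {code} in the box should you desire to join these forums.",
]

def scramble_word(word, rng):
    """Shuffle the inner letters, keeping the first and last letters in place."""
    if len(word) <= 3:
        return word  # at most one inner letter, nothing to shuffle
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, rng):
    """Scramble every alphabetic word; spaces and punctuation are untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)

def make_challenge(code, rng=None):
    """Pick a random template and scramble everything except the code word."""
    rng = rng if rng is not None else random.Random()
    before, after = rng.choice(PHRASES).split("{code}")
    return scramble_text(before, rng) + code + scramble_text(after, rng)
```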
I was hoping you wouldn't say that, so I wouldn't have to say that the cracking program could eventually store all the variations of the sentence, since scrambling the words wouldn't change where the password was located.
put yourself in my rocketpack if that poochie is one outrageous dude
Store all the variations? Okay, there are only 20 of them, but to tell them apart you would have to store all the scrambled possibilities, and that could be a lot...
I never sleep, 'cause sleep is the cousin of death - NAS
You could create a virtually limitless number of original sentences just by coming up with 20 or so base variations and then randomly substituting words like "hi/hello/howdy", "could/would/should/might/ought" and so on. You get the picture.
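Here is a sketch of the substitution idea. It uses a made-up `{a|b|c}` slot syntax for interchangeable words. Each slot multiplies the number of possible sentences, which is why the count grows so quickly.

```python
import itertools
import random
import re

# Hypothetical slot syntax: "{a|b|c}" means "pick one of a, b, c".
# "{code}" has no "|" and is left alone for the code word.
TEMPLATE = ("{Hi|Hello|Howdy} potential user, {could|would|might} you please "
            "{write|type|enter} {code} in the box?")
SLOT = re.compile(r"\{([^{}]*\|[^{}]*)\}")

def expand_random(template, rng):
    """Replace each slot with one randomly chosen option."""
    return SLOT.sub(lambda m: rng.choice(m.group(1).split("|")), template)

def count_variants(template):
    """Number of distinct sentences: the product of the option counts."""
    total = 1
    for m in SLOT.finditer(template):
        total *= len(m.group(1).split("|"))
    return total

def all_variants(template):
    """Enumerate every sentence the template can produce."""
    pieces = SLOT.split(template)  # literal, options, literal, ..., literal
    literals = pieces[0::2]
    options = [p.split("|") for p in pieces[1::2]]
    for combo in itertools.product(*options):
        yield literals[0] + "".join(c + lit for c, lit in zip(combo, literals[1:]))
```

This single template yields 3 × 3 × 3 = 27 sentences. Twenty templates with three 3-way slots each would give 540.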
But even so, once you've figured out all the possible sentence structures, all the bot has to do is jump to the correct word. And even if it just picked a random word from the sentence, there are so few words that it would take only a few dozen tries to stumble onto the right one.
A dictionary attack on a scrambled word is easy, and quite fast if the first and last letters are known. All a bot has to do is figure out which word in the sentence is not an English word.
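To show how cheap the dictionary attack is, the sketch below indexes a word list by first letter, sorted inner letters and last letter. With that index, a scrambled word can be looked up in one step. The tiny `WORDS` list is a stand-in, and a real bot would load a full English word list. Tokens that match no dictionary word are the candidates for the code word.

```python
import re
from collections import defaultdict

# Stand-in word list; a real bot would load a full English dictionary.
WORDS = ["hello", "potential", "user", "would", "you", "please",
         "write", "in", "the", "captcha", "box"]

def signature(word):
    """First letter + sorted inner letters + last letter; scrambling can't change it."""
    w = word.lower()
    if len(w) < 3:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]

def build_index(words):
    index = defaultdict(set)
    for word in words:
        index[signature(word)].add(word.lower())
    return index

def unscramble(token, index):
    """All dictionary words the scrambled token could be."""
    return sorted(index.get(signature(token), ()))

def code_word_candidates(challenge, index):
    """Tokens that match no dictionary word: the likely code word."""
    tokens = re.findall(r"[A-Za-z0-9]+", challenge)
    return [t for t in tokens if signature(t) not in index]
```

Applied to the scrambled example sentence with this word list, `code_word_candidates` returns only `xyzzy`.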
And even if you make the password a scrambled word as well, or include spelling mistakes in the other words, this captcha can still be guessed with probability 1/(number of words) per attempt. Since you can't IP-ban on the first failed attempt (humans make mistakes too), a bot has a good chance of succeeding.
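The guessing odds can be made concrete. This sketch assumes the bot submits one whole word from the sentence per attempt. With `distinct=True` it never repeats a word it has already tried. Otherwise it picks uniformly at random each time. `Fraction` keeps the results exact.

```python
from fractions import Fraction

def success_probability(n_words, attempts, distinct=True):
    """Chance that a bot submitting one word of the sentence per attempt
    hits the code word within `attempts` tries."""
    if distinct:
        # Bot never resubmits a word it already tried.
        return Fraction(min(attempts, n_words), n_words)
    # Bot picks uniformly at random each time, possibly repeating itself.
    return 1 - (1 - Fraction(1, n_words)) ** attempts
```

Take the 12-word "Hello potential user..." sentence and a block after 10 failures. `success_probability(12, 10)` is 5/6, and that is before the bot filters out dictionary words.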
What about a Magic Eye (stereogram) captcha? I doubt a computer could fuse the image and figure out which words/numbers are 3D.
Even if the computer is able to recognize how to organize the overlapping frames, how would it be able to read the resulting image?
But then there's the inherent problem of humans who can't do magic eye puzzles.
Plus, doesn't it take a computer to create a stereogram anyway? So if it could figure out the original pattern, it would just have to find the pixels that deviate from it to reconstruct the hidden image.
Actually, a computer should be able to do this much more reliably than a human. I don't know if it's been done yet, but I'm pretty sure it's possible.
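To support the claim that a computer can do this reliably, here is a minimal sketch of a toy random-dot stereogram. The generator copies each dot from (period - depth) columns to its left, and the decoder recovers the depth by checking which repeat distance each pixel obeys. It assumes the decoder knows the period and the set of depth levels. It is a simplified model, not how commercial Magic Eye images are produced.

```python
import random

def make_stereogram(depth, period, levels=256, rng=None):
    """Build a random-dot autostereogram from a depth map (list of rows of ints).

    Each pixel copies the pixel (period - depth) columns to its left, so nearer
    regions repeat with a slightly shorter period; that is what the eyes fuse.
    """
    rng = rng if rng is not None else random.Random()
    image = []
    for depth_row in depth:
        row = []
        for x, d in enumerate(depth_row):
            shift = period - d
            if x < shift:
                row.append(rng.randrange(levels))  # seed strip: fresh random dots
            else:
                row.append(row[x - shift])         # repeat the dot one shifted period back
        image.append(row)
    return image

def decode_depth(image, period, depths):
    """Recover the depth map by testing which repeat distance each pixel obeys.

    Candidate depths are tried shallowest first, so a chance color match at a
    shallower depth wins and a few pixels may be misread. The first `period`
    columns cannot be tested and are reported as depth 0.
    """
    decoded = []
    for row in image:
        out = []
        for x, value in enumerate(row):
            found = 0
            if x >= period:
                for d in sorted(depths):
                    if row[x - (period - d)] == value:
                        found = d
                        break
            out.append(found)
        decoded.append(out)
    return decoded
```

Every background pixel is recovered exactly, because it always matches at the full period and that candidate is tried first. Raised pixels are misread only when two random dots happen to share a color.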
Reading the hidden content is another matter, but the same applies here as with regular captchas:
- Either you generate a random sequence of letters. OCR software can read that too, unless it's heavily distorted, but distortion on top of a Magic Eye image will give humans a hard time registering.
- Or you pick images, e.g. the outline of a horse, for which the user types "horse" into the field. Those images need to be stored on the server. If somebody manages to retrieve and manually label all of them, the captcha is broken. So unless you want to spend a couple of hours per day replacing the images, that won't help either.
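The weakness of a stored image pool is easy to see in code. The attacker only needs a lookup table from image fingerprint to label, filled in once by a human. This sketch uses an exact SHA-256 of the image bytes, which assumes the server sends identical files every time. If the images were re-encoded or randomly perturbed per request, an attacker would need something like a perceptual hash instead.

```python
import hashlib

def fingerprint(image_bytes):
    """Exact fingerprint of an image file (SHA-256 of its bytes)."""
    return hashlib.sha256(image_bytes).hexdigest()

class LabelCache:
    """Attacker-side table: image fingerprint -> label typed in once by a human."""

    def __init__(self):
        self.labels = {}

    def learn(self, image_bytes, label):
        self.labels[fingerprint(image_bytes)] = label

    def solve(self, image_bytes):
        """Return the stored label, or None for an image not seen before."""
        return self.labels.get(fingerprint(image_bytes))
```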
Basically, you have two ways to survive:
a) Use a captcha that isn't widespread enough for anyone to bother breaking it, ideally custom-made and replaced regularly. This offers about the same level of security as the bot traps mentioned in the initial post, so I suggest spending your time on those non-annoying solutions instead.
b) Approach the topic with much more knowledge of the subject than we have. Knowledge of advances in image and text recognition is especially valuable. Sorry for the harsh words, but the concepts I've seen here were mostly wild ideas: maybe interesting, but ultimately worthless.
You need to find concepts where human and computer abilities differ. The trouble is that a computer should be unable to know the solution, yet it's still a computer that has to create the captcha and check whether the answer is correct.
That means the most interesting tests of human behavior can't be used, since a computer can't generate them randomly. Reliable tests could be:
- pictures of animals or other groups of objects that need to be labelled
- random words or images with the "what does not belong?" question
- a short story that has to be understood and summarized, or that comes with questions to answer (could be great for a forum, to check reading comprehension as well :p)
- melodies you have to classify by genre (although computers are getting better at this, so it's probably not a good idea)
But as I said, that's useless without random generation, since predefined questions can be answered from a database.
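Here is a sketch of randomly generating the "what does not belong?" question from word pools. `CATEGORIES` is hypothetical and deliberately tiny. The second function shows the catch with pool-based generation. A bot that has harvested the same pools answers every generated question with a lookup, unless the pools are too large or change too fast to harvest.

```python
import random

# Hypothetical, deliberately tiny word pools.
CATEGORIES = {
    "animal": ["horse", "dog", "cat", "cow", "sheep", "goat"],
    "fruit": ["apple", "pear", "plum", "grape", "cherry", "lemon"],
    "tool": ["hammer", "saw", "drill", "wrench", "chisel", "pliers"],
}
WORD_TO_CATEGORY = {w: c for c, ws in CATEGORIES.items() for w in ws}

def make_odd_one_out(rng, size=4):
    """Return (shuffled words, answer): size-1 words share a category, one does not.
    Needs 3 <= size <= 7 with these pools."""
    main, odd = rng.sample(sorted(CATEGORIES), 2)
    answer = rng.choice(CATEGORIES[odd])
    words = rng.sample(CATEGORIES[main], size - 1) + [answer]
    rng.shuffle(words)
    return words, answer

def solve_with_pools(words):
    """A bot that knows the pools: the answer is the word whose category appears once."""
    cats = [WORD_TO_CATEGORY[w] for w in words]
    return next(w for w, c in zip(words, cats) if cats.count(c) == 1)
```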
So, yeah, nobody will read this far anyway; I'll stop here.
Reverse engineering something is very difficult, and practically impossible in some cases.
For example, suppose you compile and link a C program without a symbol table recording the variable names, and then want to decompile it. It is practically impossible to recover the original program in its original syntax, including the variable names and sometimes the overall structure.
Recovering the original pattern is where we can fool spambots.
We somehow insert images that prevent reconstruction of the original image but still let a human see the stereogram, and we insert these "imperfections" at random.
It's a bit like dumping a bunch of identical Lego pieces on the ground and trying to figure out the order they originally came in the box. All the computer can do is guess. The human, however, can use a video to see which pieces went where, and the computer has no access to that video.
Speaking of random generation:
http://www.enweirdenment.org/cgi-bin/cube.cgi or GIFs here:
http://www.enweirdenment.org/cgi-bin/cube.gif
The concept could be useful.
>But then there's the inherent problem of humans who can't do magic eye puzzles.
Sign me up for that category. (I think it's all a lie and people are just pretending that they can see something in that complete mess of colors.)
Anyway, I don't get what the big deal is. I've never seen a single spambot post on this forum, so this mostly seems like a hypothetical discussion. But perhaps that's because they are deleted at once instead of being filtered at sign-up.
There has been at least one incident of a spammer posting advertisements on these forums (it was deleted at once), and about 50 incidents of spammers registering an account with a porn site or drug-selling site URL in their profile (activated or not). Naturally, I delete them as soon as I notice them.
Since I made the changes described in this post, there have been zero incidents of either kind, even though there have been about 30 new registrations.