Fighting Comment Spam With Project Honey Pot (Posted on May 11th, 2013)

Early on, when developing this blog, my goal was a commenting system that didn't require logging in to my site or to any third-party site in order to post. This has made me a huge target for spammers. In fact, when my site first launched I didn't use reCAPTCHA, so I was getting thousands of comments per day even though I didn't have much content. However, the majority of these posts never saw the light of day thanks to Project Honey Pot's HTTP Blacklist. Taken from their site:

The HTTP Blacklist, or "http:BL", is a system that allows website administrators to take advantage of the data generated by Project Honey Pot in order to keep suspicious and malicious web robots off their sites. Project Honey Pot tracks harvesters, comment spammers, and other suspicious visitors to websites. Http:BL makes this data available to any member of Project Honey Pot in an easy and efficient way.

The service works by tracking the IP addresses of known bad actors and giving each one a threat rating. With that rating, you can choose how content gets marked as spam. All I do is send the poster's IP address to the service and check how likely it is to belong to a spammer.

Using HTTP Blacklist

The first step is registering for an API key, which requires an account on the site. Once that's done, using the service from Python (or any language, really) takes only a few lines of code, which is pretty sweet. I've included some extra code for those using Django as well.

#Getting the user's IP from a Django request
x_forwarded_for = request.META.get('HTTP_X_FORWARDED_FOR')
if x_forwarded_for:
    #The header can hold a comma-separated proxy chain; the client comes first
    ip = x_forwarded_for.split(',')[0].strip()
else:
    ip = request.META.get('REMOTE_ADDR')

#Step 1: Reverse the order of the IP address octets
iplist = ip.split('.')
iplist.reverse()

#Step 2: Build the query
query = YOUR_HTTP_BL_API_KEY + '.' + '.'.join(iplist) + '.' + 'dnsbl.httpbl.org'

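#Step 3: Execute the query
#Note: gethostbyname raises socket.gaierror when the IP is not listed,
#because http:BL answers NXDOMAIN for addresses it has no record of
from socket import gethostbyname
httpbl_result = gethostbyname(query)
httpbl_resultlist = httpbl_result.split('.')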
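
To make the query format concrete, here is a minimal sketch of steps 1 and 2 wrapped in a function. The name build_httpbl_query and the sample API key are made up for illustration, and the IPv4 check is my own addition: the octet-reversing trick only makes sense for dotted IPv4 addresses, so this sketch rejects anything else.

```python
import ipaddress


def build_httpbl_query(api_key, ip):
    """Return the DNS name to look up for an IPv4 address.

    Hypothetical helper for illustration; not part of any library.
    """
    # IPv4Address raises ValueError for anything that is not dotted IPv4.
    address = ipaddress.IPv4Address(ip.strip())
    # Reverse the octets, then sandwich them between the key and the zone.
    reversed_octets = str(address).split('.')[::-1]
    return '.'.join([api_key] + reversed_octets + ['dnsbl.httpbl.org'])


# 'abcdefghijkl' is a placeholder, not a real key.
print(build_httpbl_query('abcdefghijkl', '203.0.113.7'))
# abcdefghijkl.7.113.0.203.dnsbl.httpbl.org
```

Validating the address up front also keeps malformed X-Forwarded-For values from being turned into DNS lookups.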

From there it's up to you to interpret the data. The first item in the result list should be 127, signaling that the query was successful. The second item is a value from 0-255 giving the number of days since that IP address was last seen by Project Honey Pot. The third item is a threat score, also from 0-255. I find this to be the most useful metric; in my experience, anything above 45 is usually spam. The fourth and final item is the visitor type, such as search engine, suspicious, harvester, or comment spammer. The API documentation goes into more detail on all of these fields. My code for marking comments as spam looks like this:

#Check if response is proper
if httpbl_resultlist[0] == "127":
    #Check threat level (the fields are strings, so convert before comparing)
    if int(httpbl_resultlist[2]) > 45:
        comment.spam = True
    else:
        comment.spam = False
else:
    comment.spam = True
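
For anyone who wants the lookup and the decision in one place, here is a rough sketch that also covers the case where the IP is not listed at all. The names parse_httpbl, is_spam, and SPAM_THRESHOLD are hypothetical, and two policy choices are assumptions of mine rather than anything from the snippets above: a failed lookup is treated as "not listed", and search engines (visitor type 0) are let through, because for them the third field is not a threat score.

```python
import socket

SPAM_THRESHOLD = 45  # the cutoff from this post; tune it for your own traffic


def parse_httpbl(response):
    """Split a response such as '127.3.50.4' into named integer fields."""
    status, days, threat, visitor_type = (int(p) for p in response.split('.'))
    return {'status': status, 'days': days,
            'threat': threat, 'type': visitor_type}


def is_spam(query, resolve=socket.gethostbyname):
    """Return True if the http:BL answer suggests the poster is a spammer.

    `resolve` is injectable so the logic can be tried without real DNS.
    """
    try:
        fields = parse_httpbl(resolve(query))
    except socket.gaierror:
        # No record found. Assumption: treat this as "not listed", although
        # it could also mean the DNS lookup itself failed.
        return False
    if fields['status'] != 127:
        return True  # unexpected answer: mark as spam, as in the post
    if fields['type'] == 0:
        return False  # search engine; third field is not a threat score
    return fields['threat'] > SPAM_THRESHOLD


# Canned responses stand in for real lookups here.
print(is_spam('example', resolve=lambda name: '127.3.50.4'))  # True
print(is_spam('example', resolve=lambda name: '127.3.10.1'))  # False
```

In a Django view this would boil down to something like comment.spam = is_spam(query), with query built as in step 2.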

Overall, I've found http:BL to be a super useful and FREE service for catching spammers. A lot of people also use Akismet, so definitely check that out to see which fits your needs better. As of now, I'm using reCAPTCHA and http:BL to filter comments. I'd say about 10 comments get past reCAPTCHA per day, and about 5% of those also get past http:BL. So about 2-3 comments make it to the site every week, which isn't bad for a userless commenting system. I'll definitely be looking to cut this number down with some more advanced filtering in the near future.

As always, if you have any feedback or questions, feel free to drop them in the comments below or contact me privately on my contact page. Thanks for reading!

Tags: Django, Python, Tools
