Fighting Comment Spam With Project Honey Pot (Posted on May 11th, 2013)

Early on when developing this blog, my goal was to offer a commenting system that didn't require logging in to my site or any third-party site to post. This has made me a huge target for spammers. In fact, when my site first launched I didn't use reCAPTCHA, so I was getting thousands of comments per day even though I didn't have all that much content. However, the majority of these posts never saw the light of day thanks to Project Honey Pot's HTTP Blacklist. Taken from their site:

The HTTP Blacklist, or "http:BL", is a system that allows website administrators to take advantage of the data generated by Project Honey Pot in order to keep suspicious and malicious web robots off their sites. Project Honey Pot tracks harvesters, comment spammers, and other suspicious visitors to websites. Http:BL makes this data available to any member of Project Honey Pot in an easy and efficient way.

The service works by tracking the IP addresses of known bad actors and giving each one a rating. With that rating you can decide how content gets marked as spam. All I do is send the poster's IP address to the service and then check how likely it is to belong to a spammer.

Using HTTP Blacklist

The first step is registering for an API key, which also requires an account on the site. Once you have that, using the service from Python (or any language, really) takes only a few lines of code, which is pretty sweet. I've included some extra code for those using Django as well.

#Getting the user's IP from a Django request
x_forwarded_for = request.META.get('HTTP_X_FORWARDED_FOR')
if x_forwarded_for:
    #The header can hold a comma-separated chain; the first entry is the client
    ip = x_forwarded_for.split(',')[0].strip()
else:
    ip = request.META.get('REMOTE_ADDR')

#Step 1: Reverse the order of the IP address octets
iplist = ip.split('.')
iplist.reverse()

#Step 2: Build the query (API key, reversed octets, then the http:BL DNS zone)
query = YOUR_HTTP_BL_API_KEY + '.' + '.'.join(iplist) + '.' + 'dnsbl.httpbl.org'

#Step 3: Execute the query
from socket import gethostbyname
httpbl_result = gethostbyname(query)
httpbl_resultlist = httpbl_result.split('.')
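As a rough sketch, the three steps above can be wrapped into a pair of helper functions. The names build_httpbl_query and lookup_httpbl, and the swappable resolver argument, are my own additions for illustration and not part of http:BL. One thing worth knowing is that an IP with no record in http:BL does not resolve at all, so gethostbyname raises socket.gaierror instead of returning an address; this sketch assumes you want to treat that case as "not listed".

```python
import socket

HTTPBL_ZONE = "dnsbl.httpbl.org"  # http:BL DNS zone


def build_httpbl_query(api_key, ip):
    """Return the DNS name to look up for an IPv4 address."""
    octets = ip.strip().split(".")
    if len(octets) != 4:
        raise ValueError("http:BL only supports IPv4 addresses")
    # Key first, then the octets in reverse order, then the zone.
    return ".".join([api_key] + octets[::-1] + [HTTPBL_ZONE])


def lookup_httpbl(api_key, ip, resolver=socket.gethostbyname):
    """Return the raw response string, or None if the lookup does not resolve."""
    try:
        return resolver(build_httpbl_query(api_key, ip))
    except socket.gaierror:
        # Resolution fails when http:BL has no record of the IP
        # (a DNS outage would also end up here).
        return None
```

Passing the resolver in as an argument is only there so the function can be exercised without touching the network; in normal use you would call lookup_httpbl(YOUR_HTTP_BL_API_KEY, ip) and skip the spam check when it returns None.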

From there it's up to you to interpret the data. The first item in the result list should be 127, signaling that the query was successful. The second item is a value from 0-255 giving the number of days since that IP address was last seen by the project. The third item is a threat score, which I find to be the most useful metric; in my experience anything above 45 is usually spam. The fourth and final item is the type of visitor, such as search engine, suspicious, harvester, or comment spammer. The API documentation goes into more detail on all of these fields. My code for marking comments as spam looks like this:

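#Check if response is proper
if httpbl_resultlist[0] == "127":
    #Check threat level (the fields are strings, so convert before comparing)
    if int(httpbl_resultlist[2]) > 45:
        comment.spam = True
    else:
        comment.spam = False
else:
    #Unexpected response, so err on the side of caution
    comment.spam = True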
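If you'd rather work with named fields than list indexes, the response can be unpacked along these lines. This is a minimal sketch: HttpblResult, parse_httpbl_response and looks_like_spam are names I made up for the example, and the flag values (1 for suspicious, 2 for harvester, 4 for comment spammer, with 0 meaning search engine) follow my reading of the http:BL API documentation. Skipping the threat check for search engines is my own assumption, based on the third field not being a threat score in that case.

```python
from collections import namedtuple

HttpblResult = namedtuple("HttpblResult", "days_since_seen threat_score visitor_types")

# Bit flags carried in the fourth field; a visitor can have several at once.
VISITOR_FLAGS = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


def parse_httpbl_response(response):
    """Split a response like '127.3.60.5' into named integer fields."""
    first, days, threat, type_code = (int(part) for part in response.split("."))
    if first != 127:
        raise ValueError("unexpected http:BL response: %r" % response)
    if type_code == 0:
        types = ["search engine"]
    else:
        types = [name for bit, name in sorted(VISITOR_FLAGS.items()) if type_code & bit]
    return HttpblResult(days, threat, types)


def looks_like_spam(result, threshold=45):
    """Apply the threat-score cutoff, never flagging search engines."""
    if "search engine" in result.visitor_types:
        return False
    return result.threat_score > threshold
```

With this in place, the check above could be written as comment.spam = looks_like_spam(parse_httpbl_response(httpbl_result)), and the threshold can be tuned without touching the parsing.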

Overall, I've found http:BL to be a super useful and free service for catching spammers. A lot of people also use Akismet, so definitely check that out to see which fits your needs better. As of now I'm using reCAPTCHA and http:BL together to filter comments. I'd say about 10 comments get past reCAPTCHA per day, and about 5% of those also get past http:BL. So about 2-3 comments make it to the site every week, which isn't bad for a userless commenting system. I'll definitely be looking to cut this number down with some more advanced filtering in the near future.

As always, if you have any feedback or questions, feel free to drop them in the comments below or contact me privately through my contact page. Thanks for reading!

Tags: Django, Python, Tools

