
Saturday, May 17, 2008

Blogger Comment Spam - Deleting it

Over recent months my blog has been getting comment spam. I imagine other bloggers out there experience the same thing, and it is a bit of a pain.

I have three immediate problems with this:

1. Blogger doesn't notify me of all comments at the time they are posted. I have of course configured it to notify me of every comment, but it only notifies me of some and seems to miss about 70%. So not only do I not notice the spam, I also miss a bunch of legitimate comments. Please get it together, Blogger! Ajax panel configuration is nice, but only if the core functions work.

2. Blogger should/could/might try to stop this spam before it happens. I am not going to guess how, but the company that runs it is much brighter than me, and I am sure they have a solution.

3. The interface for browsing comments and deleting many at a time simply does not exist. Having one would make the task of sifting through, identifying, and deleting spam much easier.

Now that I have had my grumble about it, I will offer my small solution. In praise of Google, they do provide a nice API and Python bindings to access all of their services, and Blogger is one of them. So I wrote a small script that goes through all the comments, does a little bit of flagging on dodgy-looking ones, and offers you the chance to delete them.

The script is uncommented, has no tests, and I don't plan in any way to maintain it or release it, but for those people suffering the same problems, I provide it here.

It is worth noting that the spam detection is really pathetic, and it could be vastly improved. I targeted it at my particular spam.
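As a rough illustration of one direction such an improvement could take, here is a minimal sketch of a slightly less naive scorer. It is not part of the script and does not touch the Blogger API: it works on plain strings, and the names `score_comment`, `looks_dodgy`, `SPAM_WORDS`, `TRUSTED_HOSTS`, and the threshold of 2 are all assumptions made up for the example.

```python
import re

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Hypothetical word list and trusted hosts; replace with your own.
SPAM_WORDS = ("cheap", "pills", "casino")
TRUSTED_HOSTS = ("example.org",)

LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def score_comment(text, author_uri=None):
    """Return a rough spam score; higher means dodgier."""
    lowered = text.lower()
    score = 0
    # Whole-word, case-insensitive matches, so "Pills" counts but
    # "spills" does not.
    for word in SPAM_WORDS:
        score += len(re.findall(r"\b%s\b" % re.escape(word), lowered))
    # Each link in the comment body adds a point.
    score += len(LINK_RE.findall(text))
    # An author URI that points outside the trusted hosts adds a point.
    if author_uri:
        host = urlparse(author_uri).netloc.lower()
        trusted = any(host == h or host.endswith("." + h)
                      for h in TRUSTED_HOSTS)
        if not trusted:
            score += 1
    return score


def looks_dodgy(text, author_uri=None, threshold=2):
    """Flag a comment once its score reaches the (assumed) threshold."""
    return score_comment(text, author_uri) >= threshold
```

Using a score with a threshold rather than a plain yes/no means a single innocent word or link does not flag a comment on its own; where to set the threshold depends entirely on the spam you actually get.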

Full script available here

(c) Ali Afshar 2008
MIT License

import sys, getpass

from gdata import service

def get_details():
    email = raw_input('email: ').strip()
    password = getpass.getpass()
    return email, password

def create_service(email, password):
    blogger_service = service.GDataService(email, password)
    blogger_service.source = 'blogger_spam_killer'
    blogger_service.service = 'blogger'
    blogger_service.server = ''
    return blogger_service

def get_all_blog_ids(svc):
    query = service.Query()
    query.feed = '/feeds/default/blogs'
    feed = svc.Get(query.ToUri())
    for entry in feed.entry:
        blog_id = entry.GetSelfLink().href.split("/")[-1]
        yield blog_id

def get_blog_comments(svc, blog_id):
    query = service.Query()
    query.feed = '/feeds/%s/comments/default' % blog_id
    query.max_results = sys.maxint
    feed = svc.Get(query.ToUri())
    for entry in feed.entry:
        yield entry

def get_all_comments(svc):
    for blog_id in get_all_blog_ids(svc):
        for comment in get_blog_comments(svc, blog_id):
            yield comment

def rank_comment(comment):
    # Count how often any of the spam words appears in the comment body.
    words = 0
    for word in spamwords:
        words += comment.content.text.count(word)

    # Atom entries carry a list of authors; use the first one.
    author = comment.author[0]
    has_uri = (author.uri is not None and
               # I figure no one who puts a URI would link to a blogger
               # profile. They would link to whatever they are spamming.
               '' not in author.uri.text)
    print 'Spam words: %s' % words
    print 'Dodgy author uri: %s' % has_uri
    return bool(words) or has_uri

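def delete_comment(svc, comment):
    # The body of this function is missing from this copy of the script.
    # Assumed reconstruction: delete the entry through its Atom edit link.
    svc.Delete(comment.GetEditLink().href)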

def filter_all_comments(svc):
    for comment in get_all_comments(svc):
        print '--'
        # Show the first and last 70 characters of the comment.
        t = comment.content.text
        print t[:70] + '...'
        print '...' + t[-70:]
        a = comment.author[0]
        print 'Author Info: ',
        if rank_comment(comment):
            print '**** LOOKS DODGY'
        else:
            print '==== OK'
        # Nothing is deleted without an explicit 'y'.
        s = raw_input('Delete? (y/N) ').strip()
        if s == 'y':
            print 'Deleting.'
            delete_comment(svc, comment)
        else:
            print 'Not deleting.'

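# The word list is missing from this copy of the script. Put your own
# spam words between the quotes, separated by whitespace.
spamwords = """
""".split()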

if __name__ == '__main__':
    em, pw = get_details()
    svc = create_service(em, pw)