Blogger Comment Spam - Deleting it

May 17, 2008

Over recent months my blog has been getting comment spam. I imagine other bloggers out there experience the same thing, and it is a bit of a pain.

I have three immediate problems with this and Blogger.com.

1. Blogger doesn't notify me of all comments at the time they are posted. I have of course configured it to notify me of every comment, but it only notifies me of some and seems to miss about 70%. So not only do I not notice the spam, I also miss a bunch of legitimate comments. Please get it together, Blogger! Ajax panel configuration is nice, but only if the core functions work.

2. Blogger should/could/might try to stop this spam before it happens. I won't guess how, but the people who run Blogger.com are much brighter than me, and I am sure they have a solution.

3. There is simply no interface for browsing comments and deleting many at a time. One would make the task of sifting through, identifying, and deleting spam much easier.

Now that I have had my grumble, I will offer my small solution. In praise of Google, they do provide a nice API with Python bindings for accessing their services, and Blogger is one of them. So I wrote a small script that goes through all the comments, flags the dodgy-looking ones, and offers you the chance to delete them.

The script is uncommented, has no tests, and I have no plans to maintain or release it, but for those suffering the same problem, I provide it here.

It is worth noting that the spam detection is really pathetic and could be vastly improved. I targeted it at my particular spam.
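
As one possible improvement, here is a minimal sketch that matches spam words on word boundaries and ignores case, so that a short entry like "4u" does not fire inside a longer word. The names build_spam_pattern, count_spam_words and SAMPLE_SPAMWORDS are made up for illustration and are not part of the script; the three sample words are just the first entries of the list the script uses.

```python
import re

# Hypothetical sample; the script itself uses a much longer word list.
SAMPLE_SPAMWORDS = ["4u", "adipex", "advicer"]


def build_spam_pattern(words):
    """Compile one case-insensitive regex matching any word as a whole word."""
    escaped = [re.escape(w.strip()) for w in words if w.strip()]
    if not escaped:
        # An empty alternation would match everywhere, so use a pattern
        # that can never match instead.
        return re.compile(r"(?!)")
    return re.compile(r"\b(?:%s)\b" % "|".join(escaped), re.IGNORECASE)


def count_spam_words(text, pattern):
    """Count whole-word spam hits; empty or missing text counts as zero."""
    if not text:
        return 0
    return len(pattern.findall(text))
```

A comment body could then be scored with count_spam_words(body, build_spam_pattern(SAMPLE_SPAMWORDS)). This is still only a word count, so it shares the basic weakness of the original check: it knows nothing about context.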

Full script available here


"""
(c) Ali Afshar 2008
MIT License
"""

import sys, getpass

from gdata import service


def get_details():
    email = raw_input('email: ').strip()
    password = getpass.getpass()
    return email, password


def create_service(email, password):
    blogger_service = service.GDataService(email, password)
    blogger_service.source = 'blogger_spam_killer'
    blogger_service.service = 'blogger'
    blogger_service.server = 'www.blogger.com'
    blogger_service.ProgrammaticLogin()
    return blogger_service


def get_all_blog_ids(svc):
    query = service.Query()
    query.feed = '/feeds/default/blogs'
    feed = svc.Get(query.ToUri())
    for entry in feed.entry:
        blog_id = entry.GetSelfLink().href.split("/")[-1]
        yield blog_id


def get_blog_comments(svc, blog_id):
    query = service.Query()
    query.feed = '/feeds/%s/comments/default' % blog_id
    query.max_results = sys.maxint
    feed = svc.Get(query.ToUri())
    for entry in feed.entry:
        yield entry


def get_all_comments(svc):
    for blog_id in get_all_blog_ids(svc):
        for comment in get_blog_comments(svc, blog_id):
            yield comment


def rank_comment(comment):
    """Return True if the comment looks like spam."""
    # An empty comment body has no text, so fall back to ''.
    text = comment.content.text or ''
    words = 0
    for word in spamwords:
        words += text.count(word)

    author = comment.author[0]
    has_uri = (author.uri is not None and
               author.uri.text is not None and
               # I figure no one who puts a URI would link to a blogger
               # profile. They would link to whatever they are spamming.
               'http://www.blogger.com/profile/' not in author.uri.text)
    print 'Spam words: %s' % words
    print 'Dodgy author uri: %s' % has_uri
    return bool(words) or has_uri


def delete_comment(svc, comment):
    svc.Delete(comment.GetEditLink().href)


def filter_all_comments(svc):
    for comment in get_all_comments(svc):
        print '--'
        t = comment.content.text or ''
        print t[:70] + '...'
        print '...' + t[-70:]
        a = comment.author[0]
        print 'Author Info: ', a.name.text
        if rank_comment(comment):
            print '**** LOOKS DODGY'
        else:
            print '==== OK'
        s = raw_input('Delete? (y/N) ').strip()
        if s == 'y':
            print 'Deleting.'
            delete_comment(svc, comment)
        else:
            print 'Not deleting.'


# Word list from http://codex.wordpress.org/Spam_Words, truncated in this
# post; the '...' line below stands for the omitted entries.
spamwords = """
4u
adipex
advicer
...
""".strip().splitlines()


if __name__ == '__main__':
    em, pw = get_details()
    svc = create_service(em, pw)
    filter_all_comments(svc)
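
The script asks about one comment at a time. To get closer to deleting many at once, one option would be to number the flagged comments, print them, and accept a selection such as "1,3,5-7" in a single prompt. Below is a rough sketch of just the selection parsing; parse_selection is a hypothetical helper that is not part of the script above, and it knows nothing about the Blogger API.

```python
def parse_selection(text, count):
    """Turn a string like '1,3,5-7' into sorted, unique 1-based indices.

    Indices outside 1..count are dropped. Non-numeric input raises
    ValueError, which the caller would need to handle.
    """
    chosen = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            start, end = int(start), int(end)
        else:
            start = end = int(part)
        if start > end:
            start, end = end, start
        # Clamp to the valid range so a huge range cannot loop for long.
        start = max(start, 1)
        end = min(end, count)
        chosen.update(range(start, end + 1))
    return sorted(chosen)
```

With the flagged comments held in a list, the returned indices (minus one) would pick out the entries to pass to delete_comment.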
