April 28, 2011

A Blog Analyzer Project

In the coming days, I will be writing about a project I’m working on that analyzes Andrew Sullivan’s The Dish, one of the most popular blogs on American politics. The project, which uses Spring 3, JSP/JSTL, JDBC, PostgreSQL, and jQuery/Ajax, will scrape the blog, extract key data elements, and reorganize and present that data in new and interesting ways. I will also create a bookmarklet that adds value to the blog site itself.

Development tools include NetBeans 7.0, Firefox with Firebug, and the always-handy psql command-line tool for PostgreSQL.

There are many interesting technical challenges involved, and I will write about them on this blog. There is also the question of copyright law, an unavoidable concern when building on third-party content. Copyright law was not meant to stifle innovation, though, provided certain criteria are met: the content originator must not be harmed in the marketplace, the repurposed content must be transformed into a novel work, and only small portions may be used. I believe my project fits these criteria.
