<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>A Web Coding Blog</title>
	<atom:link href="http://www.gotoquiz.com/web-coding/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.gotoquiz.com/web-coding</link>
	<description>Notes on Java, PHP, and web design, mostly.</description>
	<lastBuildDate>Sun, 26 Jun 2011 12:18:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Why are JSPs so slow? (Tomcat 7 vs. Apache)</title>
		<link>http://www.gotoquiz.com/web-coding/misc/why-are-jsps-so-slow-tomcat-7-vs-apache/</link>
		<comments>http://www.gotoquiz.com/web-coding/misc/why-are-jsps-so-slow-tomcat-7-vs-apache/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 20:02:15 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[jsp]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[tomcat]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=475</guid>
		<description><![CDATA[It would be unreasonable to]]></description>
			<content:encoded><![CDATA[<p>It would be unreasonable to expect JSPs to be served at the same rate as static files.  Even after being compiled, requests to JSPs must pass through layers of servlet container code, not to mention the latency of the JSP code itself.  However, the difference between static files and JSPs is much larger than I would have expected.</p>
<p>I ran a test on a 6KB HTML file using <a title="apache benchmark" href="http://httpd.apache.org/docs/2.0/programs/ab.html">ApacheBench (ab)</a> on my <a title="inspiron N7010" href="/web-coding/computers/dell-inspiron-17r-n7010-review/">Dell Inspiron N7010 laptop</a>, sending 1,000 requests at a concurrency of 10.  I also created a JSP by pasting the same HTML into a file and adding only a few dynamic elements: output of the host name, port, and app URI.  No calls to any beans, JDBC, or other potentially slow resources were made.  Default installations were used for both Apache and Tomcat 7.  The results were stark:</p>
<ul>
<li>Apache static HTML file: 971.25 pages/second (blazin&#8217;)</li>
<li>Tomcat 7 JSP file: 92.5 pages/second</li>
</ul>
<p>As you can see, the JSP is not merely slower.  It is a full order of magnitude slower.  Because the dynamic content in this test file is trivial, the overhead of the servlet container, rather than the page&#8217;s own code, accounts for this difference.</p>
<p>I thought I&#8217;d also test loading the JSP through Apache using mod_jk to see what kind of performance penalty this imposes.  Results:</p>
<ul>
<li>Tomcat 7 JSP file via Apache mod_jk: 74.80 pages/second</li>
</ul>
<p>So in this test case, fronting Tomcat with Apache cost about 19% in throughput (74.80 vs. 92.5 pages/second).  Of course, speed is not the only consideration when evaluating this type of setup, but it is something to weigh against other concerns.</p>
<p>There is also a debate about whether Tomcat or Apache is faster at serving static files.  This test is only one data point, but Apache easily wins here.  Tomcat serving the static 6KB file:</p>
<ul>
<li>Tomcat 7 static HTML file: 203.5 pages/second</li>
</ul>
<p>In this test, Apache was nearly five times faster than Tomcat at serving this file, though Tomcat still served the static file more than twice as fast as the JSP version.</p>
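<p>If you want to double-check the ratios quoted above, here is the arithmetic as a tiny Java sketch.  The class and method names are just illustrative, and the inputs are simply the pages/second figures from this test.</p>
<pre class="brush: java; title: ;">
import java.util.Locale;

// Recomputes the ratios quoted in this post from the measured pages/second figures.
final class BenchmarkRatios {

    // how many times faster "fast" is than "slow"
    static double speedup(double fast, double slow) {
        return fast / slow;
    }

    // fraction of throughput lost when a request is proxied (0.19 means 19%)
    static double penalty(double direct, double proxied) {
        return 1.0 - proxied / direct;
    }

    public static void main(String[] args) {
        double apacheStatic = 971.25;   // Apache, static HTML
        double tomcatJsp = 92.5;        // Tomcat 7, JSP
        double tomcatJspModJk = 74.80;  // Tomcat 7 JSP via Apache mod_jk
        double tomcatStatic = 203.5;    // Tomcat 7, static HTML

        // Locale.ROOT keeps the decimal point a "." regardless of the JVM default locale
        System.out.printf(Locale.ROOT, "Apache static vs. Tomcat JSP:    %.2fx%n", speedup(apacheStatic, tomcatJsp));
        System.out.printf(Locale.ROOT, "mod_jk penalty:                  %.1f%%%n", 100 * penalty(tomcatJsp, tomcatJspModJk));
        System.out.printf(Locale.ROOT, "Apache static vs. Tomcat static: %.2fx%n", speedup(apacheStatic, tomcatStatic));
        System.out.printf(Locale.ROOT, "Tomcat static vs. Tomcat JSP:    %.2fx%n", speedup(tomcatStatic, tomcatJsp));
    }
}
</pre>
<p>Running it prints 10.50x, 19.1%, 4.77x and 2.20x.  Those are the &#8220;order of magnitude,&#8221; &#8220;19%,&#8221; &#8220;nearly five times&#8221; and &#8220;more than twice&#8221; figures.</p>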
<p>And now, for people who like charts. =)</p>
<p><a href="http://www.gotoquiz.com/web-coding/wp-content/uploads/2011/06/html_v_jsp_chart.gif"><img class="aligncenter size-full wp-image-481" title="html vs jsp speed chart" src="http://www.gotoquiz.com/web-coding/wp-content/uploads/2011/06/html_v_jsp_chart.gif" alt="Chart shows Apache/HTML easily bests Tomcat/JSP" width="504" height="336" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/misc/why-are-jsps-so-slow-tomcat-7-vs-apache/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Configure Lucene IndexWriter and IndexSearcher in Spring applicationContext.xml</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/configure-lucene-indexwriter-and-indexsearcher-in-spring-applicationcontext-xml/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/configure-lucene-indexwriter-and-indexsearcher-in-spring-applicationcontext-xml/#comments</comments>
		<pubDate>Wed, 01 Jun 2011 17:13:06 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[spring]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=472</guid>
		<description><![CDATA[Problem: you want to define]]></description>
			<content:encoded><![CDATA[<p>Problem: you want to define Lucene IndexWriter and IndexSearcher as beans inside your Spring application to be injected/autowired into other beans.</p>
<p>Solution: follow these steps.</p>
<ul>
<li>define the Lucene version as a constant</li>
<li>define a Lucene analyzer (StandardAnalyzer) as a bean</li>
<li>define a Lucene directory as a bean, using a factory-method for instantiation</li>
<li>define an IndexWriter, wiring in the Lucene directory and an IndexWriterConfig set to use your previously-defined analyzer</li>
<li>define an IndexSearcher, wiring in the Lucene directory</li>
<li>also define a query parser (StandardQueryParser), wiring in the analyzer bean</li>
</ul>
<p>You can then wire/autowire these beans into your application beans, for example:<br />
<span id="more-472"></span></p>
<pre class="brush: java; title: ;">
@Autowired
private IndexWriter indexWriter;
</pre>
<p>Here is the resulting XML config after following the above steps, which you will place in your Spring applicationContext.xml:</p>
<pre class="brush: xml; title: ;">
&lt;!-- LUCENE SEARCH CONFIG --&gt;
&lt;!-- set the Lucene version --&gt;
&lt;util:constant id=&quot;LUCENE_VERSION&quot; static-field=&quot;org.apache.lucene.util.Version.LUCENE_31&quot; /&gt;
&lt;!-- set your analyzer, to be used by the IndexWriter and QueryParser --&gt;
&lt;bean id=&quot;luceneAnalyzer&quot; class=&quot;org.apache.lucene.analysis.standard.StandardAnalyzer&quot;&gt;
  &lt;constructor-arg ref=&quot;LUCENE_VERSION&quot;/&gt;
&lt;/bean&gt;
&lt;!-- set your Lucene directory --&gt;
&lt;!-- in this case I am pulling the location from a properties file --&gt;
&lt;!-- also, using the SimpleFSLockFactory --&gt;
&lt;bean id=&quot;luceneDirectory&quot; class=&quot;org.apache.lucene.store.FSDirectory&quot; factory-method=&quot;open&quot;&gt;
  &lt;constructor-arg&gt;
    &lt;bean class=&quot;java.io.File&quot;&gt;
      &lt;constructor-arg value=&quot;${config.search.indexDir}&quot; /&gt;
    &lt;/bean&gt;
  &lt;/constructor-arg&gt;
  &lt;constructor-arg&gt;
    &lt;bean class=&quot;org.apache.lucene.store.SimpleFSLockFactory&quot; /&gt;
  &lt;/constructor-arg&gt;
&lt;/bean&gt;
&lt;!-- now you're ready to define the IndexWriter --&gt;
&lt;bean id=&quot;indexWriter&quot; class=&quot;org.apache.lucene.index.IndexWriter&quot;&gt;
  &lt;constructor-arg ref=&quot;luceneDirectory&quot; /&gt;
  &lt;constructor-arg&gt;
    &lt;bean class=&quot;org.apache.lucene.index.IndexWriterConfig&quot;&gt;
      &lt;constructor-arg ref=&quot;LUCENE_VERSION&quot;/&gt;
      &lt;constructor-arg ref=&quot;luceneAnalyzer&quot; /&gt;
    &lt;/bean&gt;
  &lt;/constructor-arg&gt;
&lt;/bean&gt;

&lt;!-- define the IndexSearcher --&gt;
&lt;bean id=&quot;indexSearcher&quot; class=&quot;org.apache.lucene.search.IndexSearcher&quot; depends-on=&quot;indexWriter&quot;&gt;
  &lt;constructor-arg ref=&quot;luceneDirectory&quot; /&gt;
&lt;/bean&gt;
&lt;!-- also useful is to define a query parser --&gt;
&lt;bean id=&quot;queryParser&quot; class=&quot;org.apache.lucene.queryParser.standard.StandardQueryParser&quot;&gt;
  &lt;constructor-arg ref=&quot;luceneAnalyzer&quot; /&gt;
&lt;/bean&gt;
</pre>
<p>Now you&#8217;re ready to rock and roll with Lucene in your application.  There is one problem, however: on first execution, instantiation of the IndexSearcher will fail because no index exists yet.  I have not found an elegant solution to this problem, so I simply comment out the IndexSearcher bean definition on the first run.  This requires the bean to be optional in your application, so where I autowire the IndexSearcher, I set required=false:</p>
<pre class="brush: java; title: ;">
@Autowired(required=false)
private IndexSearcher indexSearcher;
</pre>
<p>Once a Lucene index exists, I uncomment the searcher bean.  If there is a better way to solve this problem, please share a comment!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/configure-lucene-indexwriter-and-indexsearcher-in-spring-applicationcontext-xml/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Simple pagination taglib for JSP</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/simple-pagination-taglib-for-jsp/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/simple-pagination-taglib-for-jsp/#comments</comments>
		<pubDate>Fri, 27 May 2011 17:39:08 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[jsp]]></category>
		<category><![CDATA[pagination]]></category>
		<category><![CDATA[taglib]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=464</guid>
		<description><![CDATA[Pagination is a common requirement]]></description>
			<content:encoded><![CDATA[<p>Pagination is a common requirement when writing JSPs.  Long sets of data must be broken up across multiple pages.  However, there is no standard way to implement pagination in JSP or JSTL, so a custom tag is a convenient way to generate the page links.</p>
<p>I&#8217;ve created a simple pagination taglib that generates the page links.  You can customize it to your needs.  Usage example below.  Here, I&#8217;m using it to paginate search results from Lucene:</p>
<pre class="brush: xml; title: ;">
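&lt;%@ taglib prefix=&quot;c&quot; uri=&quot;http://java.sun.com/jsp/jstl/core&quot; %&gt;
&lt;%@ taglib prefix=&quot;paginator&quot; uri=&quot;/WEB-INF/tlds/Paginator&quot; %&gt;
&lt;c:url var=&quot;searchUri&quot; value=&quot;/searchResults.html?s=${searchval}&amp;page=##&quot; /&gt;
&lt;paginator:display maxLinks=&quot;10&quot; currPage=&quot;${page}&quot; totalPages=&quot;${totalPages}&quot; uri=&quot;${searchUri}&quot; /&gt;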
</pre>
<p>The paginator:display tag produces this output:</p>
<p><a href="http://www.gotoquiz.com/web-coding/wp-content/uploads/2011/05/pagination.png"><img src="http://www.gotoquiz.com/web-coding/wp-content/uploads/2011/05/pagination.png" alt="shows links 1 through 10" title="pagination" width="306" height="37" class="size-full wp-image-465" /></a></p>
<p><span id="more-464"></span></p>
<p>This is obviously showing Page 4 of the search results.  As you can see, you specify the URI, and the tag expects a ## where the page number will go.  You also specify the current page, the total number of pages, and the maximum number of page links you want to show.</p>
<p>The paginator output is an unordered list, which you&#8217;ll want to style something like this:</p>
<pre class="brush: css; title: ;">
.paginatorList { margin: 2px 6px; list-style: none outside none; }
.paginatorList li { float: left; padding: 2px 4px; font-size: 1.2em; }
li.paginatorCurr { font-weight: bold; font-size: 1.5em; margin-top: -2px; }
li.paginatorLast { float: none; }
</pre>
<p>This CSS will style the page links just like the example above.</p>
<p>Here is the Java code for the paginator taglib:</p>
<pre class="brush: java; title: ;">
package my.taglibs;

import java.io.Writer;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.SimpleTagSupport;

public class Paginator extends SimpleTagSupport {
    private String uri;
    private int currPage;
    private int totalPages;
    private int maxLinks = 10;

    private Writer getWriter() {
        JspWriter out = getJspContext().getOut();
        return out;
    }

    @Override
    public void doTag() throws JspException {
        Writer out = getWriter();

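        boolean lastPage = currPage == totalPages;

        // center a window of maxLinks page numbers on the current page
        int pgStart = Math.max(currPage - maxLinks / 2, 1);
        int pgEnd = pgStart + maxLinks; // exclusive
        // if the window runs past the last page, slide it back (but never before page 1)
        if (pgEnd &gt; totalPages + 1) {
            int diff = pgEnd - totalPages;
            pgStart -= diff - 1;
            if (pgStart &lt; 1)
                pgStart = 1;
            pgEnd = totalPages + 1;
        }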

        try {
            out.write(&quot;&lt;ul class=\&quot;paginatorList\&quot;&gt;&quot;);

            if (currPage &gt; 1)
                out.write(constructLink(currPage - 1, &quot;Previous&quot;, &quot;paginatorPrev&quot;));

            for (int i = pgStart; i &lt; pgEnd; i++) {
                if (i == currPage)
                    out.write(&quot;&lt;li class=\&quot;paginatorCurr&quot;+ (lastPage &amp;&amp; i == totalPages ? &quot; paginatorLast&quot; : &quot;&quot;)  +&quot;\&quot;&gt;&quot;+ currPage + &quot;&lt;/li&gt;&quot;);
                else
                    out.write(constructLink(i));
            }

            if (!lastPage)
                out.write(constructLink(currPage + 1, &quot;Next&quot;, &quot;paginatorNext paginatorLast&quot;));

            out.write(&quot;&lt;/ul&gt;&quot;);

        } catch (java.io.IOException ex) {
            throw new JspException(&quot;Error in Paginator tag&quot;, ex);
        }
    }

    private String constructLink(int page) {
        return constructLink(page, String.valueOf(page), null);
    }

    private String constructLink(int page, String text, String className) {
        StringBuilder link = new StringBuilder(&quot;&lt;li&quot;);
        if (className != null) {
            link.append(&quot; class=\&quot;&quot;);
            link.append(className);
            link.append(&quot;\&quot;&quot;);
        }
        link.append(&quot;&gt;&quot;)
            .append(&quot;&lt;a href=\&quot;&quot;)
            .append(uri.replace(&quot;##&quot;, String.valueOf(page)))
            .append(&quot;\&quot;&gt;&quot;)
            .append(text)
            .append(&quot;&lt;/a&gt;&lt;/li&gt;&quot;);
        return link.toString();
    }

    public void setUri(String uri) {
        this.uri = uri;
    }

    public void setCurrPage(int currPage) {
        this.currPage = currPage;
    }

    public void setTotalPages(int totalPages) {
        this.totalPages = totalPages;
    }

    public void setMaxLinks(int maxLinks) {
        this.maxLinks = maxLinks;
    }
}
</pre>
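<p>The only non-obvious part of <code>doTag()</code> is the arithmetic that picks which page numbers to show.  Here it is pulled out into a standalone helper so it can be unit tested without a servlet container.  <code>PageWindow</code> is a hypothetical name and not part of the taglib above.  The slide-back step is written as a single <code>Math.max</code>, which is algebraically the same as the <code>diff</code> calculation in <code>doTag()</code>.</p>
<pre class="brush: java; title: ;">
// Hypothetical standalone copy of the page-window arithmetic in Paginator.doTag().
final class PageWindow {

    // Returns {firstPage, endPageExclusive} for the numbered links to render.
    static int[] compute(int currPage, int totalPages, int maxLinks) {
        // try to center the current page within the window
        int pgStart = Math.max(currPage - maxLinks / 2, 1);
        int pgEnd = pgStart + maxLinks;

        // window runs past the last page: slide it back, but never before page 1
        if (pgEnd > totalPages + 1) {
            pgStart = Math.max(totalPages + 1 - maxLinks, 1);
            pgEnd = totalPages + 1;
        }
        return new int[] { pgStart, pgEnd };
    }

    public static void main(String[] args) {
        // page 4 of a hypothetical 20 pages with 10 links, like the screenshot above
        int[] w = compute(4, 20, 10);
        System.out.println("links " + w[0] + " to " + (w[1] - 1)); // links 1 to 10
    }
}
</pre>
<p>For example, <code>compute(18, 20, 10)</code> yields pages 11 through 20, so near the end the window stops at the last page instead of running past it.</p>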
<p>And the .tld file:</p>
<pre class="brush: xml; title: ;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;taglib version=&quot;2.1&quot; xmlns=&quot;http://java.sun.com/xml/ns/javaee&quot; xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xsi:schemaLocation=&quot;http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-jsptaglibrary_2_1.xsd&quot;&gt;
  &lt;tlib-version&gt;1.0&lt;/tlib-version&gt;
  &lt;short-name&gt;paginator&lt;/short-name&gt;
  &lt;uri&gt;/WEB-INF/tlds/Paginator&lt;/uri&gt;

  &lt;tag&gt;
    &lt;name&gt;display&lt;/name&gt;
    &lt;tag-class&gt;my.taglibs.Paginator&lt;/tag-class&gt;
    &lt;body-content&gt;empty&lt;/body-content&gt;
    &lt;attribute&gt;
      &lt;name&gt;uri&lt;/name&gt;
      &lt;required&gt;true&lt;/required&gt;
      &lt;rtexprvalue&gt;true&lt;/rtexprvalue&gt;
      &lt;type&gt;java.lang.String&lt;/type&gt;
    &lt;/attribute&gt;
    &lt;attribute&gt;
      &lt;name&gt;currPage&lt;/name&gt;
      &lt;required&gt;true&lt;/required&gt;
      &lt;rtexprvalue&gt;true&lt;/rtexprvalue&gt;
      &lt;type&gt;int&lt;/type&gt;
    &lt;/attribute&gt;
    &lt;attribute&gt;
      &lt;name&gt;totalPages&lt;/name&gt;
      &lt;required&gt;true&lt;/required&gt;
      &lt;rtexprvalue&gt;true&lt;/rtexprvalue&gt;
      &lt;type&gt;int&lt;/type&gt;
    &lt;/attribute&gt;
    &lt;attribute&gt;
      &lt;name&gt;maxLinks&lt;/name&gt;
      &lt;rtexprvalue&gt;true&lt;/rtexprvalue&gt;
      &lt;type&gt;int&lt;/type&gt;
    &lt;/attribute&gt;
  &lt;/tag&gt;
&lt;/taglib&gt;
</pre>
<p>Now you can easily add pagination to any JSP.  Leave a comment if you found this useful!  If you have a blog as well, I wouldn&#8217;t mind a link. ^_^</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/simple-pagination-taglib-for-jsp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Convert between Java enums and PostgreSQL enums</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/convert-between-java-enums-and-postgresql-enums/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/convert-between-java-enums-and-postgresql-enums/#comments</comments>
		<pubDate>Sun, 15 May 2011 16:48:12 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[sql postgresql enum]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=455</guid>
		<description><![CDATA[PostgreSQL allows you to create]]></description>
			<content:encoded><![CDATA[<p>PostgreSQL allows you to create enum types using the following syntax:</p>
<pre class="brush: sql; title: ;">CREATE TYPE animal_type AS ENUM('DOG', 'CAT', 'SQUIRREL');</pre>
<p>You can now use &#8216;animal_type&#8217; as a datatype in your tables, for example:</p>
<pre class="brush: sql; title: ;">
create table pet (
                  pet_id        integer         not null,
                  pet_type      animal_type     not null,
                  name          varchar(20)     not null
                  );
</pre>
<p>In Java, you&#8217;d have a corresponding enum type:</p>
<pre class="brush: java; title: ;">
public enum AnimalType {
    DOG,
    CAT,
    SQUIRREL;
}
</pre>
<p>Converting between Java and PostgreSQL enums is straightforward.  For example, to insert or update an enum field you could use the CAST syntax in your SQL PreparedStatement:</p>
<pre class="brush: sql; title: ;">
INSERT INTO pet (pet_id, pet_type, name) VALUES (?, CAST(? AS animal_type), ?);

--or

INSERT INTO pet (pet_id, pet_type, name) VALUES (?, ?::animal_type, ?);
</pre>
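<p>Postgres will also accept a plain string literal for an enum column (for example, <code>VALUES (1, 'DOG', 'Rex')</code>).  Note, however, that when you bind the value with <code>setString</code>, the PostgreSQL JDBC driver sends it typed as <code>varchar</code>.  Without the cast, the insert will usually fail with a type mismatch unless you set the driver&#8217;s <code>stringtype=unspecified</code> connection property.</p>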
<p>Whether casting or not, the Java side is the same.  You would set the fields like this:</p>
<pre class="brush: java; title: ;">
stmt.setInt(1, 1);
// name() returns the constant exactly as declared, e.g. &quot;DOG&quot;
stmt.setString(2, AnimalType.DOG.name());
stmt.setString(3, &quot;Rex&quot;);
</pre>
<p>Retrieving the enum from the results of a SELECT statement looks like this (here <code>rs</code> is the <code>ResultSet</code>):</p>
<pre class="brush: java; title: ;">
AnimalType petType = AnimalType.valueOf(rs.getString(&quot;pet_type&quot;));
</pre>
<p>Take into consideration that enums are case-sensitive, so any case mismatches between your Postgres enums and Java enums will have to be accounted for.  Also note that the PostgreSQL enum type is non-standard SQL, and thus not portable.</p>
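<p>One way to account for case mismatches is to normalize labels on the way in.  Below is a minimal sketch.  It assumes the Java constants are all upper case (as in <code>AnimalType</code>) and that the database labels may not be.  The helper class and its method names are hypothetical, and the enum is redeclared inside it only so the example compiles on its own.</p>
<pre class="brush: java; title: ;">
import java.util.Locale;

// Hypothetical helper for converting between a Postgres enum label and a Java enum
// when the two spellings may differ only in letter case.
final class AnimalTypes {

    enum AnimalType { DOG, CAT, SQUIRREL }

    // Java -> database: name() always returns the constant as declared,
    // whereas toString() may be overridden
    static String toDb(AnimalType type) {
        return type.name();
    }

    // database -> Java: tolerate 'dog', 'Dog', etc.; a NULL column stays null
    static AnimalType fromDb(String label) {
        if (label == null)
            return null;
        // Locale.ROOT avoids locale-specific case rules (such as the Turkish dotted/dotless i)
        return AnimalType.valueOf(label.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fromDb("squirrel"));    // SQUIRREL
        System.out.println(toDb(AnimalType.CAT));  // CAT
    }
}
</pre>
<p>When reading from a <code>ResultSet</code>, you would call <code>AnimalTypes.fromDb(resultSet.getString("pet_type"))</code>.  Labels that match no constant still throw <code>IllegalArgumentException</code> from <code>valueOf</code>, which is usually what you want.</p>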
<p>Also, FYI, to view the set of values in a given Postgres enum type, you can use the following SQL query:</p>
<pre class="brush: sql; title: ;">
SELECT enumlabel FROM pg_enum
    WHERE enumtypid = 'your_enum'::regtype ORDER BY oid;
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/convert-between-java-enums-and-postgresql-enums/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web scraping in Java with Jsoup, Part 2 (How-to)</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-2-how-to/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-2-how-to/#comments</comments>
		<pubDate>Sat, 14 May 2011 15:17:18 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[blog analyzer]]></category>
		<category><![CDATA[jsoup]]></category>
		<category><![CDATA[spring]]></category>
		<category><![CDATA[web scraping]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=435</guid>
		<description><![CDATA[Web scraping refers to programmatically]]></description>
			<content:encoded><![CDATA[<p>Web scraping refers to programmatically downloading a page and traversing its DOM to extract the data you are interested in.  I wrote a parser class in Java to perform the web scraping for my <a href="/web-coding/programming/java-programming/a-blog-analyzer-project/">blog analyzer</a> project.  In <a href="/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-1/">Part 1</a> of this how-to I explained how I set up the calling mechanism for executing the parser against blog URLs.  Here, I explain the parser class itself.</p>
<p>But before getting into the code, it is important to take note of the HTML structure of the document that will be parsed.  The pages of <a href="http://andrewsullivan.thedailybeast.com/">The Dish</a> are quite heavy&#8211;full of menus and javascript and other stuff, but the area of interest is the set of blog posts themselves.  This example shows the HTML structure of each blog post on The Dish:<br />
<span id="more-435"></span></p>
<pre class="brush: xml; title: ;">
&lt;article&gt;
    &lt;aside&gt;
        &lt;ul class=&quot;entryActions&quot; id=&quot;meta-6a00d83451c45669e2014e885e4354970d&quot;&gt;
            &lt;li class=&quot;entryEmail ir&quot;&gt;
                &lt;div class=&quot;st_email_custom maildiv&quot; st_url=&quot;http://andrewsullivan.thedailybeast.com/2011/05/fac-5.html&quot; st_title=&quot;Face Of The Day&quot;&gt;email&lt;/div&gt;
            &lt;/li&gt;
            &lt;li class=&quot;entryLink ir&quot;&gt;
                &lt;a href=&quot;http://andrewsullivan.thedailybeast.com/2011/05/fac-5.html&quot; title=&quot;permalink this entry&quot;&gt;permalink&lt;/a&gt;
            &lt;/li&gt;
            &lt;li class=&quot;entryTweet&quot;&gt;&lt;/li&gt;
            &lt;li class=&quot;entryLike&quot;&gt;&lt;/li&gt;
        &lt;/ul&gt;

        &lt;time datetime=&quot;2011-05-12T23:37:00-4:00&quot; pubdate&gt;12 May 2011 07:37 PM&lt;/time&gt;
    &lt;/aside&gt;

    &lt;div class=&quot;entry&quot;&gt;
        &lt;h1&gt;
            &lt;a href=&quot;http://andrewsullivan.thedailybeast.com/2011/05/fac-5.html&quot;&gt;Face Of The Day&lt;/a&gt;
        &lt;/h1&gt;
        &lt;p&gt;
            &lt;a href=&quot;http://dailydish.typepad.com/.a/6a00d83451c45669e2014e885e4233970d-popup&quot; onclick=&quot;window.open( this.href, &amp;#39;_blank&amp;#39;, &amp;#39;width=640,height=480,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0&amp;#39; ); return false&quot; style=&quot;display: inline;&quot;&gt;
                &lt;img alt=&quot;GT_WWII-VET-JEWISH-110511&quot; class=&quot;asset  asset-image at-xid-6a00d83451c45669e2014e885e4233970d&quot; src=&quot;http://dailydish.typepad.com/.a/6a00d83451c45669e2014e885e4233970d-550wi&quot; style=&quot;width: 515px;&quot; title=&quot;GT_WWII-VET-JEWISH-110511&quot; /&gt;
            &lt;/a&gt;
        &lt;/p&gt;
        &lt;p&gt;
        A decorated  veteran takes part [truncated]
        &lt;/p&gt;
    &lt;/div&gt;
&lt;/article&gt;
</pre>
<p>Blog posts are each contained within an HTML5 <code>article</code> tag.  There is a <code>time</code> tag holding the date and time the post was published.  A <code>div</code> with class <code>entry</code> holds both the title and body of the post.  The title is within an <code>h1</code>, which also contains the permalink for the post.</p>
<p>Now, the code to parse this page.</p>
<p>The simple blog parser interface again:</p>
<pre class="brush: java; title: ;">
public interface BlogParser {
    public List&lt;Link&gt; parseURL(URL url) throws ParseException;
}
</pre>
<p>Now to talk about the implementation class: DishBlogParser.  The goal is to return a list of Link objects (a &#8220;Link&#8221; in this context represents one blog URL and its associated data).  DishBlogParser will extract the title and body text of each blog post along with the post date, images, videos, and links contained therein.  I&#8217;ll go through the class a section at a time.  Starting from the top:</p>
<pre class="brush: java; title: ;">
@Component(&quot;blogParser&quot;)
public class DishBlogParser implements BlogParser {

    @Value(&quot;${config.excerptLength}&quot;)
    private int excerptLength;
    @Autowired
    private DateTimeFormatter blogDateFormat;
    private final Cleaner cleaner;
    private final UrlValidator urlvalidator;

    public DishBlogParser() {
        Whitelist clean = Whitelist.simpleText().addTags(&quot;blockquote&quot;, &quot;cite&quot;, &quot;code&quot;, &quot;p&quot;, &quot;q&quot;, &quot;s&quot;, &quot;strike&quot;);
        cleaner = new Cleaner(clean);
        urlvalidator = new UrlValidator(new String[]{&quot;http&quot;,&quot;https&quot;});
    }
</pre>
<p>The excerptLength field defines the maximum length for post body excerpts.  The @Value annotation pulls in the value from a properties file configured in applicationContext.xml.</p>
<p>The blogDateFormat is a Joda formatter that matches the date/time format used on The Dish.  It will be used to parse dates from the HTML into Joda DateTime objects.  Like excerptLength, it is configured in applicationContext.xml:</p>
<pre class="brush: xml; title: ;">
&lt;bean id=&quot;blogDateFormat&quot;
         class=&quot;org.joda.time.format.DateTimeFormat&quot;
         factory-method=&quot;forPattern&quot;&gt;
    &lt;constructor-arg value=&quot;dd MMM yyyy hh:mm aa&quot;/&gt;
&lt;/bean&gt;
</pre>
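<p>To see exactly what that pattern accepts, here is the same format expressed with <code>java.time</code> (Java 8 and later).  This is a swapped-in alternative for illustration, not what <code>DishBlogParser</code> uses, and the class name is hypothetical.  Two differences from the Joda bean are deliberate.  First, <code>java.time</code> wants a single <code>a</code> for the AM/PM marker.  Second, the locale is pinned to English.  The Joda formatter as configured above uses the JVM default locale, so on a server with a non-English default, &#8220;May&#8221; or &#8220;PM&#8221; might not parse.  Calling <code>withLocale(Locale.ENGLISH)</code> on the Joda formatter would address that.</p>
<pre class="brush: java; title: ;">
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical java.time counterpart of the Joda "blogDateFormat" bean (Java 8+).
final class DishDates {

    // single "a" for AM/PM; English locale so "May" and "PM" parse on any server
    static final DateTimeFormatter BLOG_DATE_FORMAT =
            DateTimeFormatter.ofPattern("dd MMM yyyy hh:mm a", Locale.ENGLISH);

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text.trim(), BLOG_DATE_FORMAT);
    }

    public static void main(String[] args) {
        // text of the time tag in the sample post shown earlier
        System.out.println(parse("12 May 2011 07:37 PM")); // 2011-05-12T19:37
    }
}
</pre>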
<p>The Cleaner object is a Jsoup class that applies a whitelist filter to HTML.  In this case, the cleaner is used to whitelist tags that will be allowed to appear in blog body excerpts.</p>
<p>Finally, the UrlValidator comes from Apache Commons and will be used to validate the syntax of URLs contained within blog posts.</p>
<p>Now, for the parseURL method:</p>
<pre class="brush: java; title: ;">
    public List&lt;Link&gt; parseURL(URL url) throws ParseException {
        try {
            // retrieve the document using Jsoup
            Connection conn = Jsoup.connect(url.toString());
            conn.timeout(12000);
            conn.userAgent(&quot;Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)&quot;);
            Document doc = conn.get();

            // select all article tags
            Elements posts = doc.select(&quot;article&quot;);

            // base URI will be used within the loop below
            String baseUri = (new StringBuilder())
                .append(url.getProtocol())
                .append(&quot;://&quot;)
                .append(url.getHost())
                .toString();

            // initialize a list of Links
            List&lt;Link&gt; links = new ArrayList&lt;Link&gt;();
</pre>
<p>Here, Jsoup is used to connect to the URL.  I set a generous connection timeout, because at times The Dish server is not very snappy.  I also set a common user agent, just as a general practice when requesting a web page programmatically.</p>
<p>On Line 7 the Document is retrieved&#8211;this is a DOM representation of the entire page.  For this project, only the blog posts themselves are needed.  Because each blog post is contained in an <code>article</code> tag, the set of posts is obtained by calling <code>doc.select("article")</code> (Line 10).  We&#8217;re about to loop through them, but first we need to define the base URI of our URL for something a bit further down, and also initialize the <code>List</code> which will hold our extracted <code>Link</code> objects.</p>
<p>Now, the loop.  It starts like this:</p>
<pre class="brush: java; title: ;">
            // loop through, extracting relevant data
            for (Element post : posts) {
                Link link = new Link();

                // extract the title of the post
                Elements elms = post.select(&quot;.entry h1&quot;);
                String title = (elms.isEmpty() ? &quot;No Title&quot; : elms.first().text().trim());
                link.setTitle(title);
</pre>
<p>First, an empty <code>Link</code> object is initialized.  Then we extract the title. Recall that &#8220;post&#8221; is a Jsoup element pointing to the <code>article</code> tag in the DOM.  <code>post.select(".entry h1")</code> grabs the h1 title tag, from which we get the title string.</p>
<p>In a similar fashion, we grab the URL and the date:</p>
<pre class="brush: java; title: ;">
                // extract the URL of the post
                elms = post.select(&quot;aside .entryLink a&quot;);
                if (elms.isEmpty()) {
                    Logger.getLogger(DishBlogParser.class.getName()).log(Level.WARNING, &quot;UNABLE TO LOCATE PERMALINK, TITLE = &quot;+ title +&quot;, URL = &quot;+ url);
                    continue;
                }
                link.setUrl(elms.first().attr(&quot;href&quot;));

                // extract the date of the post
                elms = post.select(&quot;aside time&quot;);
                if (elms.isEmpty()) {
                    Logger.getLogger(DishBlogParser.class.getName()).log(Level.WARNING, &quot;UNABLE TO LOCATE DATE, TITLE = &quot;+ title +&quot;, URL = &quot;+ url);
                    continue;
                }
                // parse the date string into a Joda DateTime object
                DateTime dt = blogDateFormat.parseDateTime(elms.first().text().trim());
                link.setLinkDate(dt);
</pre>
<p>Note that failing to extract either the URL or the date is treated as unrecoverable for that post: a warning is logged and the post is skipped.  Note also that on Line 16, <code>blogDateFormat</code> is used to parse the date string from the HTML into a DateTime object.</p>
<p>Next, let&#8217;s grab the body of the post and create an excerpt from it:</p>
<pre class="brush: java; title: ;">
                // extract the body of the post (includes title tag at this point)
                Elements body = post.select(&quot;.entry&quot;);
                // remove the &quot;more&quot; link
                body.select(&quot;.moreLink&quot;).remove();

                // remove the title (h1) now from the body
                body.select(&quot;h1&quot;).remove();
                // set full text on link, used for indexing/searching (not stored)
                link.setFullText(body.text());

                // create a body &quot;Document&quot;
                Document bodyDoc = Document.createShell(baseUri);
                for (Element bodyEl : body)
                    bodyDoc.body().appendChild(bodyEl);
                // remove unwanted tags by applying a tag whitelist
                // the whitelisted tags will appear when displaying excerpts
                String bodyhtml = cleaner.clean(bodyDoc).body().html();

                if (bodyhtml.length() &gt; excerptLength) {
                    // we need to trim it down to excerptLength
                    bodyhtml = trimExcerpt(bodyhtml, excerptLength);
                    // we need to parse this again now to fix possible unclosed tags caused by trimming
                    bodyhtml = Jsoup.parseBodyFragment(bodyhtml).body().html();
                }
                link.setExerpt(bodyhtml);
</pre>
<p>Recall that the body is contained in a <code>div</code> with the class <code>entry</code>.  The body may contain a &#8220;read on&#8221; link that expands the content; that link, if present, is removed on Line 4.  The title <code>h1</code> tag is also removed, and the remaining text is set on the Link on Line 9.  This full text is not destined to be stored in the database; instead, it will be indexed by our search engine.</p>
<p>To create the excerpt, unwanted HTML tags must be removed.  This is where the Jsoup Cleaner comes in: it keeps only the tags allowed by its whitelist.  Because the Cleaner only processes Document objects, a shell Document is created to hold the post body (this is also where <code>baseUri</code> is used).</p>
<p>If, after the post body passes through the Cleaner, its length exceeds <code>excerptLength</code>, it must be trimmed down to size.  The <code>trimExcerpt</code> method does this.  Because trimming can cut off closing HTML tags, Jsoup is used once more to parse the excerpt string, which closes any unbalanced tags.  Finally, we have our excerpt.</p>
<p>This is the <code>trimExcerpt</code> method that is called on Line 21 above:</p>
<pre class="brush: java; title: ;">
    private String trimExcerpt(String str, int maxLen) {
        if (str.length() &lt;= maxLen)
            return str;

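        int endIdx = maxLen;
        // back up to the nearest space so that words are not cut in half
        while (endIdx &gt; 0 &amp;&amp; str.charAt(endIdx) != ' ')
            endIdx--;
        // no space found: fall back to a hard cut at maxLen rather than returning &quot;&quot;
        if (endIdx == 0)
            endIdx = maxLen;

        return str.substring(0, endIdx);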
    }
</pre>
<p>The idea is to use maxLen as a suggestion, and keep backing up until a space character is found. In this way, words will not be cut off in the middle.</p>
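<p>To see the word-boundary behavior in isolation, here is a standalone copy of the method with a few sample inputs. It is a sketch rather than the parser&#8217;s exact code: it includes a hard-cut fallback for the case where no space occurs before <code>maxLen</code>, since backing all the way up to index 0 would otherwise produce an empty excerpt.</p>

```java
public class ExcerptTrimmer {
    /**
     * Trims str to at most maxLen characters, backing up to the nearest space so a word
     * is not split. If no space is found, it cuts at maxLen instead of returning "".
     */
    public static String trimExcerpt(String str, int maxLen) {
        if (str.length() <= maxLen)
            return str;

        int endIdx = maxLen;
        while (endIdx > 0 && str.charAt(endIdx) != ' ')
            endIdx--;
        if (endIdx == 0)
            endIdx = maxLen;

        return str.substring(0, endIdx);
    }

    public static void main(String[] argv) {
        System.out.println(trimExcerpt("the quick brown fox", 12)); // the quick
        System.out.println(trimExcerpt("hello world", 5));          // hello
        System.out.println(trimExcerpt("abcdefghij", 4));           // abcd
        System.out.println(trimExcerpt("short", 10));               // short
    }
}
```

<p>Note that the excerpt may come out shorter than <code>maxLen</code> by up to one word, which is the trade-off for never splitting a word.</p>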
<p>Continuing the loop, next the links are extracted. They are represented by <code>InnerLink</code> objects.  Any invalid or self links are skipped.</p>
<pre class="brush: java; title: ;">
                // extract the links within the post
                List&lt;InnerLink&gt; inlinks = new ArrayList&lt;InnerLink&gt;();
                Elements innerlinks = body.select(&quot;a[href]&quot;);                

                // loop through each link, discarding self-links and invalids
                for (Element innerlink : innerlinks) {
                    String linkUrl = innerlink.attr(&quot;abs:href&quot;).trim();
                    if (linkUrl.equals(link.getUrl()))
                        continue;
                    else if (urlvalidator.isValid(linkUrl)) {
                        //System.out.println(&quot;link = &quot;+ linkUrl);
                        InnerLink inlink = new InnerLink();
                        inlink.setUrl(linkUrl);
                        inlinks.add(inlink);
                    }
                    else
                        Logger.getLogger(DishBlogParser.class.getName()).log(Level.INFO, &quot;INVALID URL: &quot;+ linkUrl);
                }
                link.setInnerLinks(inlinks);
</pre>
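<p>Jsoup&#8217;s <code>abs:href</code> resolves each link against the document&#8217;s base URI before the self-link and validity checks run. The sketch below approximates that order of operations using only <code>java.net.URI</code>. It is a simplified, hypothetical stand-in: the class and method names are made up, and its notion of &#8220;valid&#8221; (an absolute http or https URL with a host) is much simpler than a full URL validator.</p>

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class InnerLinkFilter {
    /**
     * Resolves each href against the post URL, then keeps only absolute http(s) links
     * that are not the post itself. The validity rule is a simplified assumption.
     */
    public static List<String> filterLinks(String postUrl, List<String> hrefs) {
        URI base = URI.create(postUrl);
        List<String> kept = new ArrayList<>();
        for (String href : hrefs) {
            URI resolved;
            try {
                // approximates Jsoup's abs:href: relative links become absolute
                resolved = base.resolve(new URI(href.trim()));
            } catch (URISyntaxException e) {
                continue; // malformed href; the parser above logs invalid URLs at INFO level
            }
            String scheme = resolved.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            if (!web || resolved.getHost() == null)
                continue; // mailto:, javascript:, and similar links
            String url = resolved.toString();
            if (url.equals(postUrl))
                continue; // self-link
            kept.add(url);
        }
        return kept;
    }
}
```

<p>Like the original loop, this detects self-links by exact string comparison, so a link that differs from the post URL only by a fragment or trailing slash would still be kept.</p>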
<p>Next, extract any images:</p>
<pre class="brush: java; title: ;">
                // extract the images from the post
                List&lt;Image&gt; linkimgs = new ArrayList&lt;Image&gt;();
                Elements images = body.select(&quot;img&quot;);
                for (Element image : images) {
                    Image img = new Image();
                    img.setOrigUrl(image.attr(&quot;src&quot;));
                    img.setAltText(image.attr(&quot;alt&quot;).replaceAll(&quot;_&quot;, &quot; &quot;));
                    linkimgs.add(img);
                }
                link.setImages(linkimgs);
</pre>
<p>Last, extract any YouTube or Vimeo videos (the two most popular providers).  This requires a more complex selector (Line 2), mainly because several different embed codes (<code>iframe</code>, <code>object</code>, and <code>embed</code>) have been used over the years:</p>
<pre class="brush: java; title: ;">
                // extract Youtube and Vimeo videos from the post
                elms = body.select(&quot;iframe[src~=(youtube\\.com|vimeo\\.com)], object[data~=(youtube\\.com|vimeo\\.com)], embed[src~=(youtube\\.com|vimeo\\.com)]&quot;);
                List&lt;Video&gt; videos = new ArrayList&lt;Video&gt;(2);
                for (Element video : elms) {
                    // Jsoup's attr() returns &quot;&quot; (never null) for a missing attribute
                    String vidurl = video.attr(&quot;src&quot;);
                    if (vidurl.trim().isEmpty())
                        vidurl = video.attr(&quot;data&quot;);
                    if (vidurl.trim().isEmpty())
                        continue;
                    Video vid = new Video();
                    vid.setUrl(vidurl);
                    if (vidurl.toLowerCase().contains(&quot;vimeo.com&quot;))
                        vid.setProvider(VideoProvider.VIMEO);
                    else
                        vid.setProvider(VideoProvider.YOUTUBE);
                    videos.add(vid);
                }
                link.setVideos(videos);
</pre>
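<p>The <code>src</code>/<code>data</code> fallback and the provider check can be exercised without Jsoup. In this sketch a plain <code>Map</code> stands in for an element&#8217;s attributes, and a helper returns an empty string for a missing key to mimic Jsoup&#8217;s <code>attr()</code>; the class, method, and enum names are hypothetical. The regex is the alternation from the selector above, applied with <code>Matcher.find()</code> so it matches anywhere in the URL.</p>

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

public class VideoClassifier {
    public enum Provider { YOUTUBE, VIMEO } // hypothetical stand-in for VideoProvider

    // Same alternation as the selector's attribute regex.
    private static final Pattern VIDEO_HOST = Pattern.compile("(youtube\\.com|vimeo\\.com)");

    // Mimics Jsoup's attr(): a missing attribute yields "" rather than null.
    private static String attr(Map<String, String> attrs, String key) {
        String value = attrs.get(key);
        return value == null ? "" : value;
    }

    /** Returns the provider for an embed's attributes, or null if it is not a recognized video. */
    public static Provider classify(Map<String, String> attrs) {
        String url = attr(attrs, "src");
        if (url.trim().isEmpty())
            url = attr(attrs, "data"); // <object data="..."> embeds have no src
        if (url.trim().isEmpty() || !VIDEO_HOST.matcher(url).find())
            return null;
        return url.toLowerCase(Locale.ROOT).contains("vimeo.com") ? Provider.VIMEO : Provider.YOUTUBE;
    }
}
```

<p>Had the helper returned <code>null</code> checks only, as in a null-based fallback, the <code>data</code> attribute would never be consulted, which is why the emptiness check matters.</p>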
<p>At this point all of the data for the post has been gathered, so the <code>Link</code> object is added to the <code>List</code>.  Once the loop ends, the list is returned:</p>
<pre class="brush: java; title: ;">
                links.add(link);
            }
            return links;
        }
        catch (IOException ex) {
            Logger.getLogger(DishBlogParser.class.getName()).log(Level.SEVERE, &quot;IOException when attempting to parse URL &quot;+ url, ex);
            throw new ParseException(ex);
        }
    }
</pre>
<p><b>In conclusion&#8230;</b></p>
<p>This post has demonstrated web scraping using the open-source Jsoup library.  Specifically, we loaded a page from a URL and used Jsoup&#8217;s selector syntax to extract the desired pieces of data.  In a future post, I will write about what happens next: the list of Links is processed by a service bean and stored in the database.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-2-how-to/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Disable URL session IDs (JSESSIONID) in Tomcat 7, Glassfish v3</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/disable-url-session-ids-jsessionid-in-tomcat-7-glassfish-v3/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/disable-url-session-ids-jsessionid-in-tomcat-7-glassfish-v3/#comments</comments>
		<pubDate>Thu, 12 May 2011 15:53:57 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[tomcat]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=429</guid>
		<description><![CDATA[URL-based session tracking is intended]]></description>
			<content:encoded><![CDATA[<p>URL-based session tracking is intended for web clients that do not support session cookies.  Every browser worth mentioning supports these cookies, and almost nobody surfs with them disabled.  Most web sites either state explicitly or assume that a user&#8217;s browser supports session cookies.  URL rewriting schemes that append the session ID to every URL thus provide very little benefit, if any.  Session IDs showing up in URLs are just bad form, and they may confuse search engine spiders.  Thankfully, the Servlet 3.0 specification gives you two ways to disable URL session rewriting.  This works in Tomcat 7, Glassfish v3, and any other Servlet 3.0-compliant servlet container.</p>
<p>First, you can add this inside the &lt;web-app&gt; element of your web.xml:</p>
<pre class="brush: xml; title: ;">
&lt;session-config&gt;
    &lt;tracking-mode&gt;COOKIE&lt;/tracking-mode&gt;
&lt;/session-config&gt;
</pre>
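<p>Or, programmatically, you can make this call during application startup, for example in the <code>contextInitialized</code> method of a <code>ServletContextListener</code> that is declared in web.xml or annotated with <code>@WebListener</code> (the call throws an <code>IllegalStateException</code> once the context has been initialized):</p>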
<pre class="brush: java; title: ;">servletContext.setSessionTrackingModes(EnumSet.of(SessionTrackingMode.COOKIE));</pre>
<p>I&#8217;ve used the web.xml method in Tomcat 7, and it works: there is no jsessionid in the URLs generated by &lt;c:url &#8230;&gt; in my JSPs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/disable-url-session-ids-jsessionid-in-tomcat-7-glassfish-v3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web scraping in Java with Jsoup, Part 1</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-1/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-1/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 18:55:22 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[blog analyzer]]></category>
		<category><![CDATA[jsoup]]></category>
		<category><![CDATA[spring]]></category>
		<category><![CDATA[web scraping]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=415</guid>
		<description><![CDATA[In order to obtain the]]></description>
			<content:encoded><![CDATA[<p>In order to obtain the data to feed into my <a href="/web-coding/programming/java-programming/a-blog-analyzer-project/">blog analyzer</a>, content must be parsed from the pages of the blog itself.  This is called &#8220;web scraping&#8221;.  <a href="http://jsoup.org/">Jsoup</a> will be used to parse the pages, and because this is a Spring project, <a href="http://static.springsource.org/spring/docs/current/spring-framework-reference/html/scheduling.html">Spring scheduling</a> will be used to invoke the parser.</p>
<p>The following classes were created:</p>
<ul>
<li>BlogRequest &#8211; invokes the parser on a given blog URL, passes parsed content to service layer</li>
<li>BlogRequestQueue &#8211; queues up and executes blog requests</li>
<li>BlogParser &#8211; interface with parseURL method</li>
<li>DishBlogParser &#8211; implements BlogParser, used to parse the blog <a href="http://andrewsullivan.thedailybeast.com/">The Dish</a></li>
</ul>
<p>Each of these (aside from the interface) is configured as a Spring-managed bean.  The code for BlogRequest:<br />
<span id="more-415"></span></p>
<pre class="brush: java; title: ;">
@Component(&quot;blogRequest&quot;)
public class BlogRequest {
    @Autowired(required=true)
    private BlogParser parser;
    @Autowired(required=true)
    private BlogService service;
    @Autowired(required=true)
    private BlogRequestQueue requestQueue;
    /**
     * blogUrl URL to pages of the blog, in a format like
     * http://andrewsullivan.thedailybeast.com/page/##/ where
     * ## stands in for the page number.
     */
    @Value(&quot;${config.blogUrl}&quot;)
    private String blogUrl;

    /**
     * Execute a blog request for a given page number, invoking the parser,
     * and passing the data to the service layer for further processing.
     * @param pageNumber
     */
    public void execute(int pageNumber) {
        System.out.println(&quot;executing blog request&quot;);
        try {
            List&lt;Link&gt; links = parser.parseURL(new URL(blogUrl.replace(&quot;##&quot;, String.valueOf(pageNumber))));
            for (Link link : links) {
                service.addLinkAsync(link);
            }
            //if page &gt; 1 and links not empty, queue a request for the next page
            if (pageNumber &gt; 1 &amp;&amp; !links.isEmpty())
                requestQueue.enqueue(pageNumber + 1);
        }

        catch (ParseException pe) {
            Logger.getLogger(BlogRequest.class.getName()).log(Level.SEVERE, null, pe);
            //reprocess
            requestQueue.enqueue(pageNumber);
        }
        catch (MalformedURLException mue) {
            Logger.getLogger(BlogRequest.class.getName()).log(Level.SEVERE, null, mue);
        }
    }
}</pre>
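<p>The @Component annotation lets Spring detect this class during component scanning and register it as a bean, which makes it available for autowiring into other beans.  Here, we autowire in a parser, a service bean, and the request queue bean.  blogUrl is configured in a properties file, and @Value is used to inject it (this requires a property placeholder configurer, such as &lt;context:property-placeholder&gt;, in the Spring config).</p>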
<p>The execute method invokes the parser, which returns a List of Link objects.  A &#8220;Link&#8221;, in the context of this project, is a blog entry: specifically, its URL, posted date, excerpt, and assorted other data.  Each Link is handed to the service bean (via addLinkAsync) for further processing.  If the page number is greater than 1 and the page yielded at least one link, the next page is added to the queue; if parsing fails, the same page is queued again.  In this way, older blog pages will be scraped one by one until the oldest is reached.</p>
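<p>The paging logic can be simulated in plain Java to see where the crawl stops. In this sketch (class and method names are hypothetical), an <code>IntFunction</code> stands in for the parser and returns the links found on a page. The real application processes one page per scheduled tick rather than looping, and it also re-queues a page after a ParseException, which is left out here.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.IntFunction;

public class PageCrawlSketch {
    /**
     * Simulates BlogRequestQueue plus BlogRequest.execute: seed the queue with pageScanStart
     * (if > 0); after each page, enqueue the next one only when pageNumber > 1 and links
     * were found. Returns the page numbers in the order they were processed.
     */
    public static List<Integer> crawl(int pageScanStart, IntFunction<List<String>> parser) {
        Queue<Integer> queue = new ArrayDeque<>();
        if (pageScanStart > 0)
            queue.add(pageScanStart);

        List<Integer> processed = new ArrayList<>();
        Integer page;
        while ((page = queue.poll()) != null) { // one iteration ~ one executeNext() tick
            processed.add(page);
            List<String> links = parser.apply(page);
            if (page > 1 && !links.isEmpty())
                queue.add(page + 1);
        }
        return processed;
    }
}
```

<p>For example, if only pages 1 through 5 contain posts, <code>crawl(2, ...)</code> processes pages 2, 3, 4, 5, and 6; page 6 comes back empty, so nothing further is queued. Starting at page 1 processes only that page, because of the <code>pageNumber &gt; 1</code> condition.</p>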
<p>The code for BlogRequestQueue is pretty simple:</p>
<pre class="brush: java; title: ;">
public class BlogRequestQueue {
    @Autowired(required=true)
    private BlogRequest blogRequest;
    private Queue&lt;Integer&gt; queue = new LinkedList&lt;Integer&gt;();
    /**
     * pageScanStart page number to initialize the queue to.
     * A value of zero or less will disable page scanning.
     */
    @Value(&quot;${config.pageScanStart}&quot;)
    private int pageScanStart = 0;

    @PostConstruct
    public void initializeQueue() {
        if (pageScanStart &gt; 0)
            queue.add(pageScanStart);
    }

    // synchronized, like executeNext(), because the LinkedList queue is not thread-safe
    public synchronized boolean enqueue(int pageNumber) {
        return queue.add(pageNumber);
    }

    /**
     * Execute the next request in the queue
     */
    //@Scheduled(fixedDelay=30000L)
    public synchronized void executeNext() {
        Integer pageNumber = queue.poll();
        if (pageNumber != null)
            blogRequest.execute(pageNumber);
    }
}
</pre>
<p>The value of pageScanStart is set via @Value from a properties file.  The @PostConstruct annotation marks initializeQueue() as an init method, which Spring calls after the bean&#8217;s dependencies and values have been injected; that is why the queue is seeded there rather than in a constructor, where pageScanStart would still be 0.  Note that the @Scheduled annotation on the <code>executeNext</code> method is commented out.  Rather than hard-code the interval, I moved the scheduling to applicationContext.xml to make it configurable.</p>
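<p>To see fixed-delay scheduling without Spring, the following sketch drives a queue with the JDK&#8217;s <code>ScheduledExecutorService.scheduleWithFixedDelay</code>, which, like Spring&#8217;s fixedDelay setting, waits a set delay after one run finishes before starting the next. The class is hypothetical and simply records page numbers instead of executing blog requests.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ScheduledQueueSketch {
    private final Queue<Integer> queue = new ArrayDeque<>();
    private final List<Integer> processed = new ArrayList<>();

    public synchronized void enqueue(int pageNumber) {
        queue.add(pageNumber);
    }

    /** Counterpart of executeNext(): handles at most one page per scheduled run. */
    public synchronized void executeNext() {
        Integer pageNumber = queue.poll();
        if (pageNumber != null)
            processed.add(pageNumber); // the real class calls blogRequest.execute(pageNumber)
    }

    /** Returns a snapshot of the pages handled so far. */
    public synchronized List<Integer> processed() {
        return new ArrayList<>(processed);
    }

    /** Runs executeNext() repeatedly, with delayMillis between the end of one run and the start of the next. */
    public ScheduledFuture<?> start(ScheduledExecutorService executor, long delayMillis) {
        return executor.scheduleWithFixedDelay(this::executeNext, 0, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

<p>Because each run handles a single page, the delay also acts as a crude rate limit on requests to the blog.</p>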
<p>BlogParser.java:</p>
<pre class="brush: java; title: ;">
public interface BlogParser {

    public List&lt;Link&gt; parseURL(URL url) throws ParseException;

}
</pre>
<p>The code for DishBlogParser is where all the interesting web scraping happens.  See my post <a href="/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-2-how-to/">web scraping Part 2</a> for the details.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/web-scraping-in-java-with-jsoup-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Blog Analyzer Project</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/a-blog-analyzer-project/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/a-blog-analyzer-project/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 17:51:35 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[blog analyzer]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=413</guid>
		<description><![CDATA[In the coming days, I]]></description>
			<content:encoded><![CDATA[<p>In the coming days, I will be writing about a project I&#8217;m working on which will perform analysis on Andrew Sullivan&#8217;s <a href="http://andrewsullivan.thedailybeast.com/">The Dish</a> blog, which is one of the most popular blogs on American politics.  The intent of the project, which will utilize such technologies as Spring 3, JSP/JSTL, JDBC, PostgreSQL, and jQuery/Ajax, is to web scrape the blog, extract key data elements, and reorganize and present this data in new and interesting ways.  Additionally, I will create a bookmarklet that will add value to the blog site itself.</p>
<p>Development tools used include NetBeans 7.0, Firefox with Firebug, and the always handy psql Postgres command line tool.</p>
<p>There are many interesting technical challenges involved, and I will write about them on this blog.  Additionally, there is the question of copyright law, which is an unavoidable concern when building on content from a third party.  Copyright law was not meant to stifle innovation, though, and fair use allows such building provided certain criteria are met: the content originator must not be harmed in the marketplace, repurposed content must be transformed into a novel work, and only small portions may be used.  I believe my project fits these criteria.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/a-blog-analyzer-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Profile SQL statements in Java / Spring</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/profile-sql-statements-in-java-spring/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/profile-sql-statements-in-java-spring/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 19:34:12 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[aspectj]]></category>
		<category><![CDATA[spring]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=392</guid>
		<description><![CDATA[Wouldn&#8217;t it be nice if]]></description>
			<content:encoded><![CDATA[<p>Wouldn&#8217;t it be nice if there were a way to time your application&#8217;s SQL statements unobtrusively? This information could give you insight into the performance of your queries and updates and help you identify slow, poorly performing SQL. Fortunately, you can add such SQL profiling to a Spring application by using AspectJ.</p>
<p>I use Spring JDBC and wanted to identify slow SQL queries in my application so that I could tune them and improve overall performance.  The execution time of each SQL statement can be captured with a pointcut on the methods of JdbcTemplate.  Here is what we need:</p>
<ol>
<li>aspectjrt.jar and aspectjweaver.jar from <a href="http://www.eclipse.org/aspectj/downloads.php">here</a>.</li>
<li>An aspect with a pointcut on the <a title="javadoc" href="http://static.springsource.org/spring/docs/3.0.x/api/org/springframework/jdbc/core/JdbcOperations.html">JdbcOperations</a> interface, which JdbcTemplate implements.</li>
<li>@Around advice that times the execution of JdbcTemplate methods and stores this data for later retrieval.</li>
<li>Configuration of Spring applicationContext.xml to get it working.</li>
</ol>
<p>The pointcut looks like this, with the String argument being the SQL statement:<br />
<span id="more-392"></span></p>
<pre>execution(* org.springframework.jdbc.core.JdbcOperations.*(String, ..))</pre>
<p>Here is the profiling Java class, SqlProfiler, with imports omitted:</p>
<pre class="brush: java; title: ;">
@Aspect
public class SqlProfiler {
    private final Map&lt;String, SqlTiming&gt; sqlTimings;

    public SqlProfiler() {
        sqlTimings = Collections.synchronizedMap(new HashMap&lt;String, SqlTiming&gt;());
    }

    @Pointcut(&quot;execution(* org.springframework.jdbc.core.JdbcOperations.*(String, ..))&quot;)
    public void sql() {}

    @Around(&quot;sql()&quot;)
    public Object profile(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.currentTimeMillis();
        Object obj = pjp.proceed();
        long time = System.currentTimeMillis() - start;
        String statement = pjp.getArgs()[0].toString();
        SqlTiming sqlTiming = null;
        synchronized(sqlTimings) {
            sqlTiming = sqlTimings.get(statement);
            if (sqlTiming == null) {
                sqlTiming = new SqlTiming(statement);
                sqlTimings.put(statement, sqlTiming);
            }
        }
        sqlTiming.recordTiming(time);

        return obj;
    }

    public List&lt;SqlTiming&gt; getTimings(final SortedBy sort) {
        List&lt;SqlTiming&gt; timings = new ArrayList&lt;SqlTiming&gt;(sqlTimings.values());
        Collections.sort(timings, new Comparator&lt;SqlTiming&gt;() {

            public int compare(SqlTiming o1, SqlTiming o2) {
                switch(sort) {
                    case AVG_EXECUTION_TIME:
                        // Float.compare keeps sub-millisecond differences that Math.round would collapse to 0
                        return Float.compare(o1.getAvgExecutionTime(), o2.getAvgExecutionTime());
                    case CUMULATIVE_EXECUTION_TIME:
                        long diff = o1.getCumulativeExecutionTime() - o2.getCumulativeExecutionTime();
                        if (diff &gt; 0L)
                            return 1;
                        else if (diff == 0)
                            return 0;
                        else
                            return -1;
                    case NUMBER_OF_EXECUTIONS:
                        return o1.getExecutionCount() - o2.getExecutionCount();
                }
                return 0;
            }

        });

        return timings;
    }
}
</pre>
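<p>To explain this a bit, <code>sqlTimings</code> is a map where keys are the SQL statements and values are of type <code>SqlTiming</code> (shown below). The @Around advice applies to every JdbcOperations method whose first parameter is the SQL string. The profile method times the call to <code>pjp.proceed()</code> and records the elapsed milliseconds under that statement, creating the map entry on first use inside a synchronized block so that two threads cannot create duplicate entries for the same statement. Calls that throw an exception are not recorded. The getTimings method is how clients retrieve the profiling data gathered so far, with several different sort options.</p>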
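<p>The same aggregate-by-statement idea can be sketched with only <code>java.util.concurrent</code>, which may help if you want to reuse it outside an aspect. In this hypothetical <code>TimingRegistry</code>, the <code>time</code> method plays the role of the @Around advice for a single call; <code>ConcurrentHashMap.computeIfAbsent</code> (Java 8+) replaces the synchronized get-or-create block, and atomic counters replace the synchronized <code>recordTiming</code>. Unlike the aspect above, it measures with <code>System.nanoTime()</code> and also records calls that throw.</p>

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class TimingRegistry {
    /** Per-statement counters; count and total are updated separately, so a reader may briefly see one ahead of the other. */
    public static final class Stats {
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong totalNanos = new AtomicLong();

        public long count() { return count.get(); }
        public long totalNanos() { return totalNanos.get(); }
    }

    private final ConcurrentMap<String, Stats> stats = new ConcurrentHashMap<>();

    /** Runs the call, then records its elapsed time under the given SQL string (even if it throws). */
    public <T> T time(String sql, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            Stats s = stats.computeIfAbsent(sql, key -> new Stats()); // atomic get-or-create
            s.count.incrementAndGet();
            s.totalNanos.addAndGet(elapsed);
        }
    }

    /** Returns the stats for a statement, or null if it has never been timed. */
    public Stats get(String sql) {
        return stats.get(sql);
    }
}
```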
<p>Here is the <code>SqlTiming</code> class, each instance of which holds the data for one SQL statement:</p>
<pre class="brush: java; title: ;">
public class SqlTiming {

    private final String statement;
    private int count;
    private long cumulativeMillis;

    SqlTiming(String statement) {
        this.statement = statement;
    }

    synchronized SqlTiming recordTiming(long time) {
        count++;
        cumulativeMillis += time;
        return this;
    }

    public String getSqlStatement() {
        return statement;
    }

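    // getters are synchronized so that readers see the values written by recordTiming()
    public synchronized int getExecutionCount() {
        return count;
    }

    public synchronized long getCumulativeExecutionTime() {
        return cumulativeMillis;
    }

    public synchronized float getAvgExecutionTime() {
        return (float)cumulativeMillis / (float)count;
    }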
}
</pre>
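<p>Finally, we need to add the following to the Spring applicationContext.xml. Because &lt;aop:aspectj-autoproxy/&gt; uses Spring&#8217;s proxy-based AOP, the advice only applies to calls made through Spring-managed beans, which is why the JdbcTemplate must be declared as a bean and injected into the DAO:</p>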
<pre class="brush: xml; title: ;">
&lt;!-- enable AspectJ support --&gt;
&lt;aop:aspectj-autoproxy/&gt;

&lt;!-- declare the profiler class (shown above) as a bean --&gt;
&lt;bean id=&quot;sqlProfiler&quot; class=&quot;my.profiler.SqlProfiler&quot; /&gt; 

&lt;!-- JdbcTemplate must be a Spring-managed bean --&gt;
&lt;bean id=&quot;jdbcTemplate&quot; class=&quot;org.springframework.jdbc.core.JdbcTemplate&quot; p:dataSource-ref=&quot;jdbcDataSource&quot; /&gt;

&lt;!-- wire the JdbcTemplate into your DAO --&gt;
&lt;bean id=&quot;myDao&quot; class=&quot;my.dao.ExampleDao&quot; p:jdbcTemplate-ref=&quot;jdbcTemplate&quot; /&gt;
</pre>
<p>It&#8217;s that simple. To view the results that have been gathered (cumulative execution time, average execution time, and number of executions), call the <code>getTimings</code> method. Here I&#8217;ve displayed some of the data via JSP:</p>
<p style="padding-left: 30px;">INSERT INTO link (link_id, url, link_date, title, exerpt, link_collection_id) VALUES (?, ?, ?, ?, ?, ?)<br />
<strong>Avg execution:</strong> 122.26ms<br />
<strong>Total # of executions:</strong> 34</p>
<p style="padding-left: 30px;">SELECT link_id, url, link_date, title, exerpt FROM link WHERE url = ?<br />
<strong>Avg execution:</strong> 123.39ms<br />
<strong>Total # of executions:</strong> 690</p>
<p style="padding-left: 30px;">UPDATE inner_link SET title = ?, refers_to_id = ? WHERE inner_link_id = ?<br />
<strong>Avg execution:</strong> 124.91ms<br />
<strong>Total # of executions:</strong> 526</p>
<p>And there you have it.  To review: we&#8217;ve created an Aspect to profile SQL statements in Java unobtrusively, wired up our profiler in the Spring config, and displayed the SQL timings for analysis.  Please leave a comment if this code can be improved!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/profile-sql-statements-in-java-spring/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Add custom annotation to Spring MVC controller</title>
		<link>http://www.gotoquiz.com/web-coding/programming/java-programming/add-custom-annotation-to-spring-mvc-controller/</link>
		<comments>http://www.gotoquiz.com/web-coding/programming/java-programming/add-custom-annotation-to-spring-mvc-controller/#comments</comments>
		<pubDate>Mon, 04 Apr 2011 23:07:39 +0000</pubDate>
		<dc:creator>James H. (admin)</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[mvc]]></category>
		<category><![CDATA[spring]]></category>

		<guid isPermaLink="false">http://www.gotoquiz.com/web-coding/?p=384</guid>
		<description><![CDATA[The question of how to]]></description>
			<content:encoded><![CDATA[<p>The question of how to add custom annotations to my Spring MVC controllers puzzled me for some time, because the documentation in this area is lacking. Even the bible of Spring 3.0, <a href="http://www.amazon.com/gp/product/1430224991/ref=as_li_ss_tl?ie=UTF8&amp;tag=4degreezcom&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1430224991">Spring Recipes by Gary Mak et al.</a>, did not address the topic.</p>
<p>But then I found <a href="http://karthikg.wordpress.com/2009/11/08/learn-to-customize-spring-mvc-controller-method-arguments/">this great blog post</a> detailing exactly what I was interested in.  In a nutshell, you need to implement WebArgumentResolver and set your class as a customArgumentResolver of the AnnotationMethodHandlerAdapter bean.  I wanted to add a @RequestAttribute annotation that would work like @RequestParam, but would pull the value from a request attribute rather than a request parameter.<br />
<span id="more-384"></span><br />
Following the tutorial in the link above, I created this annotation:</p>
<pre class="brush: java; title: ;">package my.web;

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.PARAMETER)
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface RequestAttribute {

    /**
     * The name of the request attribute to bind to.
     */
    String value() default &quot;&quot;;

    /**
     * Whether the attribute is required.
     * Default is true, leading to an exception being thrown if
     * the attribute is missing from the request. Switch this to
     * false if you prefer a null when the attribute is missing.
     * Alternatively, provide a {@link #defaultValue() defaultValue},
     * which implicitly sets this flag to false.
     */
    boolean required() default true;

    /**
     * The default value to use as a fallback. Supplying a non-empty
     * default value implicitly sets {@link #required()} to false.
     * An empty string (the default) means &quot;no default&quot;.
     */
    String defaultValue() default &quot;&quot;;
}</pre>
<p>And the corresponding argument resolver:</p>
<pre class="brush: java; title: ;">package my.web;

import java.lang.annotation.Annotation;
import javax.servlet.http.HttpServletRequest;
import org.springframework.core.MethodParameter;
import org.springframework.web.bind.support.WebArgumentResolver;
import org.springframework.web.context.request.NativeWebRequest;

public class RequestAttributeArgumentResolver implements WebArgumentResolver {

    public Object resolveArgument(MethodParameter param,
                                  NativeWebRequest request) throws Exception {

        Annotation[] paramAnns = param.getParameterAnnotations();

        Class&lt;?&gt; paramType = param.getParameterType();

        for (Annotation paramAnn : paramAnns) {
            if (RequestAttribute.class.isInstance(paramAnn)) {
                RequestAttribute reqAttr = (RequestAttribute) paramAnn;
                HttpServletRequest httprequest = (HttpServletRequest) request.getNativeRequest();
                Object result = httprequest.getAttribute(reqAttr.value());

                // an empty defaultValue means &quot;no default&quot;; otherwise a missing attribute
                // would always become &quot;&quot; and required() could never take effect
                if (result == null &amp;&amp; !reqAttr.defaultValue().isEmpty())
                    result = reqAttr.defaultValue();

                if (result == null &amp;&amp; reqAttr.required())
                    raiseMissingParameterException(reqAttr.value(), paramType);

                return result;
            }
        }

        return WebArgumentResolver.UNRESOLVED;
    }

    protected void raiseMissingParameterException(String paramName,
                                                  Class&lt;?&gt; paramType) throws Exception {
        throw new IllegalStateException(&quot;Missing request attribute '&quot; + paramName
                                        + &quot;' of type [&quot; + paramType.getName() + &quot;]&quot;);
    }
}</pre>
<p>Just wire it up in your Spring configuration:</p>
<pre class="brush: xml; title: ;">&lt;bean class=&quot;org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter&quot;&gt;
  &lt;property name=&quot;customArgumentResolver&quot;&gt;
    &lt;bean class=&quot;my.web.RequestAttributeArgumentResolver&quot; /&gt;
  &lt;/property&gt;
&lt;/bean&gt;</pre>
<p>You may already be configuring AnnotationMethodHandlerAdapter, for example to set a binding initializer; in that case, just add the customArgumentResolver property to your existing bean definition.</p>
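<p>The resolver only works because the annotation can be read at runtime, which is why it is declared with <code>@Retention(RetentionPolicy.RUNTIME)</code> and <code>@Target(ElementType.PARAMETER)</code>. The JDK-only sketch below shows that mechanism with plain reflection: it scans a method&#8217;s parameter annotations and fills each annotated argument from a <code>Map</code> standing in for the request attributes. The trimmed-down annotation (named <code>ReqAttr</code> to avoid clashing with the real one), the handler method, and the helper are all hypothetical; in the real setup, Spring&#8217;s <code>MethodParameter</code> and the <code>WebArgumentResolver</code> do this work.</p>

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

public class ParamAnnotationSketch {

    // Hypothetical, trimmed-down copy of @RequestAttribute; RUNTIME retention makes it visible to reflection.
    @Target(ElementType.PARAMETER)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface ReqAttr {
        String value();
    }

    // Hypothetical handler method: only the first parameter is annotated.
    public static String show(@ReqAttr("user") String user, int page) {
        return user + ":" + page;
    }

    /**
     * Builds an argument array for the method: annotated parameters are read from attrs,
     * other parameters are left null (a framework would hand them to other resolvers).
     */
    public static Object[] resolveArgs(Method method, Map<String, Object> attrs) {
        Annotation[][] paramAnns = method.getParameterAnnotations();
        Object[] resolved = new Object[paramAnns.length];
        for (int i = 0; i < paramAnns.length; i++) {
            for (Annotation ann : paramAnns[i]) {
                if (ann instanceof ReqAttr) {
                    resolved[i] = attrs.get(((ReqAttr) ann).value());
                }
            }
        }
        return resolved;
    }
}
```

<p>With <code>RetentionPolicy.CLASS</code> or <code>SOURCE</code> instead, <code>getParameterAnnotations()</code> would return no annotations for the parameter, and the attribute would never be bound.</p>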
]]></content:encoded>
			<wfw:commentRss>http://www.gotoquiz.com/web-coding/programming/java-programming/add-custom-annotation-to-spring-mvc-controller/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

