It began like this: Amazon.com’s vast product catalog contains many clever and unique items, the sort that you may not know you wanted until you’ve heard of it. Alternately, these items might make an ideal gift when shopping for the person “who already has everything”. So I figured it would be a neat idea to curate a collection of these items and build a gift recommendation site around them. Doing so would allow me to explore some new server-side technologies and help keep my skills fresh.
- Ubuntu VPS from Linode
- Apache2 HTTP server
- Apache Tomcat 7
- Apache ActiveMQ
- Spring 3.1 with Spring MVC
- PostgreSQL 9.3
- HTML5 + CSS3
- jQuery, flot, TinyMCE
- Java libraries such as Joda Time, Jsoup, Jackson, Logback, and Commons DBCP
- Amazon Product Advertising API
- Reddit REST API
To get some data populated in the database as a starting point, I set up a scheduled task to pull data from several Reddit forums where Amazon links are shared. Reddit conveniently makes this data available via their REST API. All products discovered in this way are set to unapproved status pending manual review.
Next, I set up another scheduled task to populate and refresh metadata about the products via Amazon’s Product Advertising API. Per Amazon’s terms in order to display price information, this data has to be refreshed hourly. For efficiency I request data on batches of ten products at a time, which is the maximum limit.
I created a manual process for reviewing and approving products to be shown. This process includes writing a custom description, adding relevant tags (e.g. “For Kids” or “Gadget Lovers”), and setting an age range and gender specificity (if applicable).
Spring 3.1 ties it all together. Spring MVC handles the front-end. Spring JDBC is used for interacting with PostgreSQL. I could have used Spring’s event system, but I wanted to get some experience with ActiveMQ. There are a number of message senders and listeners set up for events such as “price changed” or “product suggested”.
I’ll probably think of a snappier name eventually, but for now I registered http://spectacular.gift (new “.gift” TLD). Have a look if you like! It’s basically in beta, and I’m still adding new products and tags.
The adoption of CSS3 has provided many possibilities for creating beautiful visual effects without using image files. Here is an example: on an early iteration of the design of the various headings on GoToQuiz, I used this sprite combined with CSS to create the rounded color bar appearance.
Let’s say you have a set of URLs and you want the web page titles associated with them. Maybe you’ve data-mined a bunch of links from HTML pages, or acquired a flat file listing URLs. How would you go about getting the corresponding page titles, and associating them with the URLs using Java?
You could use an HTML parser such as Jsoup to request the HTML document associated with each URL and parse it into a DOM document. Once obtained, you could navigate the document and select the text from the title tag, like so:
String titleText = document.select("title").first().text();
Elegant, but a lot of overhead for such a simple task. You’d be loading the whole page into memory and parsing it into a DOM structure just to extract the title. Instead, you could use the Apache HTTP Client library, which provides a robust API for requesting resources over the HTTP protocol. But it would be unnecessary in this case. Let’s keep it simple and rely only on the java standard library.
It is especially important, if you allow any HTML at all in user-submitted content, to sanitize that content by actually parsing the HTML and filtering it for any tags or attributes you wish to exclude. If you fail to do so, your site may be vulnerable to XSS (cross-site scripting) attacks.
Q: “But isn’t it overkill to parse the HTML? Can’t I use other techniques, such as regular expressions or simple string replacement, to filter out dangerous tags and attributes?” A: No, and I’ll explain why. Read more