jQuery Differences in Firefox & Safari
I've had to do a fair amount of view layer work recently using the jQuery JavaScript library, which is really a brilliant piece of work. However, I experienced a few difficulties in achieving consistent cross-browser behavior, specifically between Firefox (2.0.x) and Safari (3.1.1). IE has been purposefully left alone for now. Nothing too egregious, but potentially time-consuming to resolve...especially for people with rusty JavaScript skills.
The attr function
The attr function allows for quickly updating a set of matched DOM elements. The following was my first attempt - it works without issue in Firefox:
jQuery("a[@id ^= '" + updateParentLink + "']")
.attr({class: "thickbox btn edit",title: "Edit Details"});
Running this in Safari caused a general and unfortunate JavaScript parse error. Through trial and error, I discovered that Safari apparently does not like the class attribute being set through the attr function in this way. Why it surfaces as a parse error is still not entirely clear to me; one likely explanation is that class is a reserved word in JavaScript, so the unquoted class: key in the object literal is rejected by stricter parsers, and quoting it ("class") may have been enough. I got around this by simply using the jQuery functions addClass and removeClass, and only used attr to change the 'title' attribute.
The post function
I'm using the jQuery post (ajax) function for quick authentication within a view. The response of the POST operation is obviously important here, and again I was seeing different behavior in Safari compared to Firefox. The callback function that handles the response expects a (valid) URI to initiate another request; in Safari, the response URI was not being resolved correctly. The problem was that the URI in the response had an encoded line break character, originating from the server page responsible for passing back the URI. Firefox knew to remove it; Safari did not.
Upgrading Typo 4 to 5
This site used to be hosted at bluehost.com, but is now running on my own reasonably speedy Mac Mini running OS X. Bluehost is not a bad hosting provider, for the most part. But they upgraded to Rails 2.0 without notice, which broke Typo 4.0. I should have frozen the Rails version into Typo, and perhaps the Bluehost fine print includes something about doing system upgrades without notice. I was also not pleased with having to run FastCGI.
So, I upgraded to Typo 5 and moved away from the shared hosting environment. I also used the opportunity to become familiar with configuring a Rails production environment using a few popular industrial strength components, namely: Nginx, mongrel, and mongrel cluster.
I chose Nginx as the web server/reverse proxy component because 1) it appears to be very lightweight and fast, and 2) configuring and managing Apache can be very tiresome. The basic set-up has the mongrel cluster managing the spawning of multiple mongrel processes ("mongrels"), with Nginx load balancing and round robin'ing requests to the mongrels.
Here are the steps I took to do it. On OS X, Xcode must be installed.
- Install Rails 2.0, mongrel, mongrel cluster, and the mysql native bindings gem. I used the standard 'gem install --include-dependencies {gem-name}' to get these. Rails should probably be frozen to the application itself.
- Download and build Nginx. Instructions and more info can be found here: Nginx, my new favorite front end for mongrel cluster. One issue I ran into was that PCRE could not be found during the configure step. This article helped me install it.
- Edit nginx.conf. Again, the article mentioned above is a good reference. The key section is where the "upstream mongrels" are declared. To properly configure Nginx to start on boot, etc., refer to this article.
- Download Typo 5. Configure the mongrel cluster at the application root:
  mongrel_rails cluster::configure -e production -p 3000 -a 127.0.0.1 -N 3
  This creates a config file (config/mongrel_cluster.yml by default) that is read when the cluster is started. The -p 3000 sets the first port; with three (-N 3) mongrel processes, the mongrels listen on ports 3000 through 3002, and these must match the servers listed in the "upstream mongrels" declaration in nginx.conf. The mongrel cluster is started at the application root with:
  mongrel_rails cluster::start
- There were some manual edits and stylesheet replacements I did to retain what this site looked like in Typo 4...not too interesting to include the mundane details.
- Import the existing Typo database: mysql -uroot -p ossolab_typo < ossolab_typo.sql
- Run a migration at the application root to reflect the changed model in Typo 5:
  rake db:migrate RAILS_ENV=production
  This actually failed for me with a strange MySQL-related error message:
  dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  I did some research and was able to proceed by creating a folder named 'mysql' in /usr/local/mysql/lib/ and then copying 'libmysqlclient.15.dylib' to it.
- The one disappointment I had was that managing the sidebars was broken after the upgrade. Not wanting to spend any more time on the issue, I got around it by getting rid of the sidebar records from Typo 4, since they aren't a huge deal to me. The recent comments section on the right was added by writing that code directly in one of the view files.
Pursuing an RDF Epiphany
I've worked with XML and related web technologies for a while now, and I've struggled to fully grasp or otherwise grok the Resource Description Framework. It seems to have an unfortunate reputation for being too complicated, esoteric, or impractical. At first glance, it can appear to be a solution looking for a problem.
The main question that came up for me has always been why one would use RDF (or RDF/XML) instead of a plain, non-RDF XML vocabulary for data transfer, sharing, integration, etc. I'll attempt to answer that question...primarily for my own edification...and then a real world example will be briefly discussed to hopefully dislodge the lingering RDF monkey from my back.
Flexibility: RDF is the Model
The plain XML vs. RDF question was posed by Leigh Dodds a while ago, with a few members of the RDF community responding. One of the most concise answers was from Shelley Powers:
RDF is based on a domain-neutral model that allows one set of statements to be merged with another set of statements, even though the information contained in each set of statements may differ dramatically. Plain XML is hierarchical and only needs to be well-formed (and hopefully valid against a schema); extracting anything semantically within the document is dependent upon some shared, explicit understanding between consumers and producers of the XML.
In contrast, RDF is composed of simple statements (subject-predicate-object triples) which facilitate immediate consumption without having to worry about structure or order (i.e., elements, child nodes, attributes, etc.).
RDF is the model. The processing of triples is highly predictable and static, reducing the effort involved when things change.
Plain XML has an ever varying model depending on the vocabulary - only its syntax remains the same.
What happens if a plain XML schema changes, structurally and/or semantically? In an environment of distributed data, with multiple parties owning or generating that data, the time and effort required to accommodate modifications could be quite high.
Efficient Integration of Decentralized Data
As alluded to above, perhaps the most significant aspect of RDF is how the basic triple model enables the merging and integration of decentralized data.
The processing of triples from two or more sources (and with different RDF vocabularies altogether) can occur immediately thanks to namespaces.
Integration of decentralized data also requires the ability to uniquely identify resources.
RDF's reliance on URIs (which by definition and nature of the Internet must be unique) provides this uniqueness in a simple and elegant manner.
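To make the merging argument a little more concrete, here is a minimal sketch in Java. It does not use any RDF library; the Triple class, the example.org URIs, and the vocabulary names are all made up for illustration. The point is only that merging two sets of statements is a plain set union that needs no knowledge of either vocabulary's structure.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, minimal triple type for illustration only; real RDF toolkits
// have richer models (typed literals, blank nodes, and so on).
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Triple)) {
            return false;
        }
        Triple t = (Triple) other;
        return subject.equals(t.subject)
                && predicate.equals(t.predicate)
                && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

class TripleMergeSketch {
    // Merging two graphs is a set union: identical statements collapse,
    // and statements about the same subject URI simply accumulate.
    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new LinkedHashSet<>(a);
        merged.addAll(b);
        return merged;
    }

    public static void main(String[] args) {
        String project = "http://example.org/projects/demo"; // made-up URI

        Set<Triple> sourceA = new LinkedHashSet<>();
        sourceA.add(new Triple(project, "http://example.org/vocab-a#name", "Demo"));

        Set<Triple> sourceB = new LinkedHashSet<>();
        sourceB.add(new Triple(project, "http://example.org/vocab-b#license", "GPL"));
        sourceB.add(new Triple(project, "http://example.org/vocab-a#name", "Demo"));

        // The duplicate name statement appears once in the merged set.
        for (Triple t : merge(sourceA, sourceB)) {
            System.out.println(t.subject + " " + t.predicate + " " + t.object);
        }
    }
}
```

Because the subject is a URI shared by both sources, the statements line up without any agreement on document structure, which is the property the plain XML approach lacks.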
Graph Based Data Models
Data sets that adhere to a basic graph model are especially well suited for representation in RDF.
The simple hyperlinked characteristics of RDF allow loose coupling and late binding of resources directly in the RDF model. The object in a triple statement is often another resource with its own URI, effectively creating a relationship between resources that may not reside in the same domain. In addition, the RDF construct rdfs:seeAlso offers extension and linking of other sets of RDF data that exist elsewhere on the web.
Plays Nicely with RESTful Web APIs
RDF fits nicely with web services adhering to a REST architecture.
In an article about connecting social content services, Leigh Dodds points out the complementary features of RDF and REST:
as RDF uses URIs as the means of identifying resources, the API URL structure and the response format can be closely related.
A plain XML vocabulary certainly has a place in RESTful web APIs, but both RDF and REST are inherently "resource" centric, and combining them can result in a more elegant and flexible service.
DOAP & the OSS Community
My interest in RDF has been piqued by the DOAP vocabulary created by Edd Dumbill, which describes open source software projects. A basic goal of DOAP is to allow people managing a project to maintain and control project metadata on their own terms and in one place...and, in theory, avoid the time and effort involved in notifying various repositories or services that updates have occurred.
An interesting project called DOAPspace started by Rob Cakebread is a DOAP repository being actively seeded from freely available project data from sources such as Freshmeat and SourceForge. Rob also has a solution (doapurl.org) to provide an authoritative source of DOAP project URLs following the model of Persistent URLs (PURLs). DOAP URLs will essentially be permanent, allowing authorized project members to edit the PURL-like DOAP URL if the actual project URL they control ever changes. DOAPspace can then reference doapurl.org managed URLs. These services are basically a platform for enabling the decentralized nature of DOAP and allowing project members to maintain their project data. However, there is one more critical piece here - notifying interested parties and services of DOAP updates. It involves an intermediary service called Ping the Semantic Web (PTSW). When DOAP is updated, the PTSW service can be pinged and the update event will be archived and time-stamped. DOAPspace (or any other service) can then use PTSW to learn of any DOAP update events.
ossmosis, a nascent web service I've briefly mentioned before, is in the same 'semantic' realm as it were and the role of DOAP with respect to ossmosis is evolving. The service is focused on contextual aspects of OSS projects and people, and we hope to contribute to (as well as benefit from) the emerging DOAP friendly OSS community. I hope to write a bunch more on this in the future when various pieces and thoughts have solidified.
More on the Software Stack & Components vs. Tools
Steve Parker has described a simple and logical classification scheme for components and tools related to software development. In essence, most software can be associated with a location on a vertical stack that represents low to high functionality. Low in this sense means closer to the hardware level (e.g. the Linux kernel), and high means a user-facing, fully functional application. The stack is composed of a limited number of generic "way-point" categories, such as data or middleware, to establish reference points as one moves up or down the stack. This single hierarchy classification scheme should be sufficient for organizing a foundation or substrate of software without attempting to create the perfect categorization scheme (which is arguably impossible for any non-trivial topic), or relying on complex or esoteric approaches to categorization.
We have designated the highest section of the stack as "packaged applications". This would be anything that is not a software component, and would not traditionally play a role in a living system. For example: Firefox, OpenOffice, etc. However, there is quite a bit of open source software that fits into that category of full user facing applications. It seems the stack would lose some of its value if all user-facing applications were at the top and not really differentiated in some way.
Consider MySQL Query Browser. It's a useful front-end tool for interacting with a MySQL database. If it were placed high on the stack in the miscellaneous range of "applications", its simple function as a database front-end would not be properly reflected. I would say that the MySQL Query Browser belongs near the MySQL database server itself in the "data" section, but with a slightly higher placement on the stack. This suggests that if a user-facing tool can be logically associated with a given range within the stack, then it belongs there. Ideally, the highest level "packaged applications" section would only contain applications that could not otherwise be reliably placed within lower ranges of the stack. The stack favors function over form.
There is one more simple addition to the stack model to help further classify software at all levels. We are viewing all software as either a component or a tool. A component is a piece of software that is directly involved in the development process and plays a role at runtime; components participate in a living system. Examples: database servers, web development frameworks, code libraries, etc. A tool is a stand-alone, fully functional application or utility, such as MySQL Query Browser, but of course tools do not have to be full GUI-based applications (e.g. mytop).
I have not yet been able to think of something that falls into a gray area of both component and tool. Eclipse came to mind, but it is usually viewed as individual pieces of software anyway. Please let me know if you can think of something that fits both roles...
Saxon SQL Extension: Importing XML into a Relational Database
I finally started making time for another project that Mr. Parker and I have been discussing now for well over a year. Currently we're calling it ossmosis, and the purposefully vague description of it is a contextual resource for open source software - targeting both developers and less technical project manager types.
Research has led me to discover a really useful project called FLOSSmole, which is described as a "collaborative collection and analysis of free/libre/open source project data". I was originally planning on developing some sort of a crawler to retrieve this type of data myself, but luckily came across FLOSSmole before writing a single line of code.
Part of the FLOSSmole data is all freshmeat.net projects and their associated "troves", or facets. This is what I'm currently most interested in. However, the actual trove hierarchy (facet names) is currently not available in FLOSSmole, but hosted by freshmeat.net here (this is a large file so think twice before telling your browser to view it!)
On to the specific topic of this article - importing an XML document (the trove hierarchy) into a relational database (MySQL). There are 1001 different ways to do this. You could probably do it in three lines of Ruby code. Maybe. I chose the approach of using XSLT and the SQL extension functions available within Saxon. Why? Because all that is needed is a single template matching on <descriminator> nodes, which then fires <sql:insert> calls. A script written in Perl, Ruby, Python, etc. would have to establish the database connection, parse the XML tree and find the <descriminator> nodes, extract the values of the child elements under <descriminator>, construct SQL insert statements using those values, and then interact with the database API to do the actual inserts. SAX is another option, but the SQL inserts would still have to be constructed, and a database API would still be involved.
Here's the stuff:
ImportFMTroveDefs.xslt (includes SQL to create target table)
Saxon jars available below:
saxon8.jar saxon8-sql.jar
You will also need a JDBC driver. The best one for MySQL is here.
Example command line java call to run this:
java -Xmx84M -cp saxon8.jar:saxon8-sql.jar:mysql-connector-java-5.0.7-bin.jar \
  net.sf.saxon.Transform fm-trove.rdf ImportFMTroveDefs.xslt \
  driver="com.mysql.jdbc.Driver" database="jdbc:mysql://localhost/ossmosis" \
  user="ossmosis" password="ossmosis" datasourceid="81"
The JVM memory argument was necessary to boost the default due to the size of the 'fm-trove.rdf' file.
Also - watch out for empty string database user passwords - passing a parameter to Saxon from the command line can be funky for empty strings.
Finally, here is the dump of the data once imported into MySQL.
Generating RSS in Java Web Frameworks
Sportsvite, a web start-up focused on 1) connecting recreational sports enthusiasts and 2) facilitating communication and scheduling of sports activities (e.g., team and league games), is expanding the role of RSS across the site. The most recent addition is a feed of the Sportsvite classifieds section. For example, if I wanted a list of all of the soccer teams looking for more players in my area by zip code, the relevant URL would be: http://www.sportsvite.com/xml/rss/listings?type=1&zip=20009&sportId=14
The task of generating and serving up RSS first involves initiating the appropriate query to the persistence layer from a given HTTP GET request. The query string parameters in the example above define the "type" of classified listing (teams looking for players), the zip code, and the identifier of the sport associated with the team. Sportsvite is built with the popular Java framework Struts, so this simply requires an Action class mapped to the endpoint URL.
Hibernate is the object-relational mapping and persistence layer of choice, so the query for soccer teams looking for players in northwest Washington DC will return a set of POJOs derived from the data model for Sportsvite classifieds. For each object in the result set list, an instance of a class representing an individual RSS "item" is instantiated. This RSS object is then populated with "get" method calls on the object from the result set list. The RSS class is very basic and defined as:
public class RSS {
    public RSS() {}

    private String title;
    private String link;
    private String guid;
    private String pubDate;
    private String description;

    // (getter and setter methods...)
}
As an example, populating the pubDate involves creating a date string in the RFC 822 format, as per the RSS specification.
Using java.text.SimpleDateFormat, an RFC 822 date string can be formatted with:
SimpleDateFormat RFC822DATEFORMAT =
    new SimpleDateFormat("EEE', 'dd' 'MMM' 'yyyy' 'HH:mm:ss' 'Z", Locale.US);
The RSS object would then be populated with:
Date createdOn = (Date)listing.getCreatedOn();
rssItem.setPubDate(RFC822DATEFORMAT.format(createdOn));
...where 'listing' is the name of the result set object and 'rssItem' is the instance of the RSS class.
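For anyone who wants to try the date formatting in isolation, here is a small self-contained sketch. The class and method names (RssDateSketch, toRfc822) are mine and not part of the Sportsvite code. It also builds a new formatter on every call, which is an assumption on my part about how one might guard against SimpleDateFormat not being thread-safe.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class RssDateSketch {
    // Same pattern as in the text. A new formatter is built per call because
    // SimpleDateFormat is not thread-safe and a web action may run on many threads.
    static String toRfc822(Date date, TimeZone zone) {
        SimpleDateFormat format =
                new SimpleDateFormat("EEE', 'dd' 'MMM' 'yyyy' 'HH:mm:ss' 'Z", Locale.US);
        format.setTimeZone(zone);
        return format.format(date);
    }

    public static void main(String[] args) {
        // The Unix epoch in UTC, used as a fixed, predictable input.
        System.out.println(toRfc822(new Date(0L), TimeZone.getTimeZone("UTC")));
        // prints "Thu, 01 Jan 1970 00:00:00 +0000"
    }
}
```

Pinning the time zone explicitly keeps the output independent of the server's default zone, which is handy when comparing feeds across environments.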
As RSS objects are created and populated, they are collected in a list. The next step is to convert this list to XML using the ever powerful Castor marshalling tool. Castorization will produce an XML representation of the list data structure containing the RSS objects, which will then be transformed to the actual RSS feed. Here is how Castor is called:
private Document marshallRSSXML(ArrayList RSSList) {
    // Empty DOM Document that receives the marshalled list
    Document doc = buildDocument();
    try {
        // Castor writes the XML form of the list directly into the DOM node
        Marshaller.marshal(RSSList, (Node)doc);
    } catch (MarshalException e) {
        e.printStackTrace();
    } catch (ValidationException e) {
        e.printStackTrace();
    }
    return doc;
}
The Document object is of type org.w3c.dom.Document, and the 'buildDocument' method simply creates an empty Document (using javax.xml.parsers.DocumentBuilderFactory) to accept the result of the 'marshal' call.
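The original 'buildDocument' method is not shown here, so the following is only a sketch of what such a helper might look like using the standard JAXP API. The class name and the choice to wrap the checked exception are assumptions.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class EmptyDocumentSketch {
    // Creates an empty DOM Document that a marshaller can write into.
    static Document buildDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.newDocument();
        } catch (ParserConfigurationException e) {
            // Assumption: treat a broken parser setup as unrecoverable.
            throw new IllegalStateException("No usable XML parser configuration", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDocument();
        // A freshly created Document has no root element yet.
        System.out.println(doc.getDocumentElement() == null);
    }
}
```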
The last step involves passing this XML Document in memory to a separate servlet of the web application that functions as an XSLT serialization service. In this case, it is serialization in the sense of sending the output of the XSLT engine over HTTP to the web client. The XSLT servlet is very straightforward and the key lines of code are below:
DOMSource source = new DOMSource(doc);
transformer.transform( source,
new StreamResult(response.getOutputStream()));
The javax.xml.transform.dom.DOMSource instance is created using the generated Document object available in memory
(stored in the session by the forwarding Action class).
The javax.xml.transform.Transformer instance is grabbed from a pre-compiled cache of XSLT modules, and using the
'getOutputStream' method on the javax.servlet.http.HttpServletResponse of the main 'service' method of the
servlet class allows us to send the transformation output to the requesting web client.
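The following is a stand-alone sketch of the same DOMSource-to-StreamResult hand-off, runnable outside a servlet container. It makes several substitutions that are not in the original: a ByteArrayOutputStream stands in for response.getOutputStream(), an identity Transformer stands in for the cached stylesheet, and the sample document (including its <title> child element) is a guess at the general shape of the Castor output rather than the real thing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToStreamSketch {
    // Builds a small stand-in for the marshalled list (element names are assumed).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("array-list");
            Element rss = doc.createElement("RSS");
            Element title = doc.createElement("title");
            title.setTextContent("Example listing");
            rss.appendChild(title);
            root.appendChild(rss);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the in-memory Document through a Transformer and returns the bytes
    // that a servlet would otherwise write to the HTTP response.
    static String serialize(Document doc) {
        try {
            // newTransformer() without a stylesheet is an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Swapping the identity Transformer for one created from a compiled stylesheet, and the byte array for the servlet response stream, gives the arrangement described above.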
The XSLT is not even worth including as it just matches on '/array-list' (the root of the Castor generated XML document), creates the top level RSS 'channel' elements from passed in parameters, and applies-templates on <RSS> elements to create <item> elements.
XML Modeling for an ESB - Part 2
In XML Modeling for an ESB - Part 1, a simple and modular approach to building a canonical XML vocabulary was described. Core schemas represent a data model across an organization or enterprise without worrying about specific systems or applications; this is essentially a data dictionary encoded in XML Schema, split up by logical and/or functional area. For example, a core schema in the city government case study I will be using is Property.xsd, which captures data items across systems that can be associated with the logical or functional area of "property" (as in physical buildings and houses that are owned by someone or some organization).
Derived from the core are the context schemas which are directly related to specific applications or systems (to be integrated as consumer and/or producer services within the enterprise service bus). The context schemas that define request and response instance documents for ESB clients are referred to as external schemas, and these would be referenced by WSDL (for fans of SOAP and related technologies). Internal schemas define the XML message documents that are pulled or pushed among specific systems on the bus itself. One simple example of this is to consider what represents a single record of interest from a producer system. This internal schema references a core schema to grab elements under the Permit.xsd core model, and in turn is included by its corresponding external schema to define a response document.
By the way, the basic concept of core vs. context is a document engineering fundamental - more information available at Doc or Die.
Despite no longer being convinced of the practicality of SOAP, SOAP over HTTP and WSDL are used in my current ESB architecture and development project. The reasons include: the specific ESB product being used does not allow for simple HTTP GET requests (!), the product seems to be geared towards SOAP client calls, and existing tools such as Axis WSDL2Java make it possible to rely directly on the schemas when interacting with the ESB.
This excellent write-up by Dennis Sosnoski concerning WSDL2Java, Castor, and Axis was the basis for how I developed client interfaces to the bus directly from the schemas.
The work-flow involved in building Java, SOAP based web service clients to interact with ESB exposed services consists of the following: ESB internal development, core schema modifications, external and internal context schema authoring, WSDL authoring, and client code generation. To avoid numerous schema inclusions in WSDL, a wrapper schema was created to allow the WSDL to include a single document.
Here is the full set of relevant files (the Ant script for the client stub generation is heavily based on available code from the previously mentioned article at sosnoski.com):
WSDL
External Context Schema wrapper (all schemas can be found from here)
build.xml
ERD
Removing Extraneous Namespace Declarations - By Force
In developing XML->XML transformations (XSLT 1.x), I've frequently experienced unnecessary ancestor namespace declarations propagating to child elements of the output tree (in Saxon, and Xalan too, I believe). The ultimate consumer of the resultant XML document usually wouldn't care; I just viewed it as annoying behavior...highly annoying, but not a show-stopping problem.
The exclude-result-prefixes attribute placed at the root of a stylesheet never seems to perform as advertised, that is, to exclude the namespace declarations for the space-separated list of prefixes given as the attribute's value. It is working as it should...but only for literal result elements. Literal result elements, by the way, are elements explicitly placed in the XSLT module to be written to the output (e.g., using <myelement> vs. <xsl:element>). There is a decent example here, with a nice comment from Michael Kay.
So, when using <xsl:copy> and <xsl:copy-of>, the processor is going to output any ancestor namespace declarations of the input document it feels it should copy (a more detailed explanation of why this is the case is out there, but I'm not going to touch on it for now).
For various reasons, let's say you really, really wanted to prevent the extraneous namespace declarations when transforming a particular XML document.
A template matching node() and two separate templates matching attributes and text() function as a modified version of the traditional identity template. The node() template creates entirely new elements using <xsl:element>, with an explicit namespace setting of the context node's namespace URI. Thus, the newly created element will be free of any propagated namespace declarations from ancestors.
The template matching/copying only attributes is separate, as the behavior/output of the node() template should not be triggered for attributes.
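The template matching text() is separate for the same reason, and requires a priority to resolve the conflict between the node() and text() templates when a text node is encountered. One caveat: node() also matches comments and processing instructions, for which local-name() does not yield a usable element name, so if the input contains them, matching on * instead of node() in the first template is the safer choice.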
<xsl:template match="node()">
  <xsl:element name="{local-name()}"
               namespace="{namespace-uri()}">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:element>
</xsl:template>

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>

<xsl:template match="text()" priority="1">
  <xsl:value-of select="."/>
</xsl:template>
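To see the effect without setting up a full pipeline, here is a rough Java harness that wraps the three templates in a stylesheet and runs them through whatever XSLT 1.0 processor the JDK provides. The class name, the sample input, and the urn:example:unused namespace are made up for the demonstration, and the exact serialization will vary by processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceStripSketch {
    // The three templates from the text, wrapped in a stylesheet element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='node()'>"
      + "<xsl:element name='{local-name()}' namespace='{namespace-uri()}'>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "<xsl:template match='@*'><xsl:copy/></xsl:template>"
      + "<xsl:template match='text()' priority='1'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the serialized result.
    static String transform(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The xmlns:unused declaration is never used by any element or attribute.
        String input =
            "<root xmlns:unused='urn:example:unused'><child id='1'>text</child></root>";
        System.out.println(transform(input));
    }
}
```

With a plain identity transform the unused declaration would be carried over to the output; with the templates above the elements are rebuilt from scratch, so it should no longer appear.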
On a slightly related note, here is a classic Michael Kay mailing list response.
XML Modeling for an ESB - Part 1
Enterprise Service Bus (ESB) technology has arguably become the latest and greatest approach to enterprise integration and SOA. An ESB provides a standards based solution for building a service oriented platform. Systems and applications targeted for integration (either as consumers, producers, or both) only need to worry about "getting on the bus".
David Chappell's book stresses the importance of a canonical XML format. An ESB domain, or instance of a bus implementation within a specific environment/organization, must speak a common language between all systems on the bus. This common language can be some existing XML vocabulary or standard, or modeled from scratch using sound document engineering (this is the approach I took for various reasons).
One of the most common problems that large organizations face is data redundancy or inefficient data sharing. New systems and applications can become information silos over time; the quickest and cheapest solution often forgoes the research and analysis needed to avoid creating redundant data.
The canonical XML format or vocabulary defining message content within an ESB IS the solution to the simple yet potentially severe data sharing problem. The vocabulary is a representation of the universal set of data items across the enterprise, agnostic of any specific system or application (i.e. the XML encoded model need not worry from which applications the elements were pulled). Commonalities across systems must be identified and defined only once, or else data redundancy or ambiguity could be exacerbated by the very solution trying to solve it.
This application-agnostic, lowest-level XML model is referred to as the core. The core model documents (I use XML Schema) should be 1) in the same namespace and 2) split up by functional or business area. Number one also implies actually using a namespace for your ESB domain's XML vocabulary. Why? Because of the most basic reasons namespaces exist - to avoid element collision (if and when inter-domain ESB integration occurs) and to associate some sort of an identity with the vocabulary. Number two just avoids creating an unwieldy and huge document - simple modularization. For the set of common data items that span functional areas (and thus your set of core schemas), a separate document should be created. An example of a data item that would be placed in this common schema is a globally unique identifier shared by two or more applications.
The next level of ESB vocabulary modeling above the core is the context in which elements from the core are referenced to interface with a specific system or application. The context can be further segmented into external and internal documents. External refers to ESB request & response message documents (and could be included in WSDL). Internal directly corresponds to a specific system or application's XML encoded data flowing through the bus.
The next write up on this will include a specific, real world example on how this approach was actually implemented.
XSLT 1.0 string reversal & tail recursion
The need or desire to reverse a string in XSLT (1.0) might be rare, but I have found it useful in the past to manipulate or deal with characters at the end. However, I will admit that I could not immediately determine the exact purpose for its use looking at some older code I had written - having the input XML would have helped.
Here was my first attempt at writing a named template to do reversal:
<xsl:template name="reverseString">
  <xsl:param name="inputStr"/>
  <xsl:variable name="strLength"
                select="string-length($inputStr)"/>
  <xsl:choose>
    <xsl:when test="$strLength &lt; 2">
      <xsl:value-of select="$inputStr"/>
    </xsl:when>
    <xsl:when test="$strLength = 2">
      <xsl:value-of select="substring($inputStr,2,1)"/>
      <xsl:value-of select="substring($inputStr,1,1)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="substring($inputStr,$strLength,1)"/>
      <xsl:call-template name="reverseString">
        <xsl:with-param name="inputStr"
                        select="substring($inputStr,1,$strLength - 1)"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
I was curious how XSLT 2.0 would accomplish this, and consulted my copy of O'Reilly's XSLT Cookbook, 2nd Edition by Sal Mangano (the solution, by the way, is only three lines, using codepoints-to-string and string-to-codepoints).
Included in the book is a section on string reversal - with a close to identical recursive template example to the one above that is deemed "an inefficient tail recursive implementation". This led me to look up tail recursion, as the term was not totally familiar to me. Tail recursion involves a function calling itself as the very last step in the function - allowing the existing underlying stack frame to be reused instead of creating a new one (the recursion is effectively handled by iteration). I believe most major XSLT engines are optimized for tail recursion - Saxon (my favorite) definitely is.
Doing the recursive, self call at the end of a named template seems natural and logical to me anyway - without knowing the processor is optimized for it.
Back to my inefficient reversal function - it's inefficient because each recursive call is only repositioning a single character. I failed to heed CS 101 (or maybe 201) basics of dividing and conquering, attempting to cut work in half.
Sal's most efficient solution to XSLT 1.0 string reversal is fairly similar to the one above, but with two tail recursive calls, each one reversing one-half of the input string.
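The difference between the two strategies is easier to see outside of XSLT. The sketch below is plain Java, not the template from the book, and the class and method names are mine. Note also that Java does not eliminate tail calls, so this only illustrates how the recursion depth changes, not the stack frame reuse an XSLT processor might perform.

```java
class ReverseSketch {
    // One character per call, like my first attempt above:
    // recursion depth grows linearly with the length of the input.
    static String reverseLinear(String s) {
        if (s.length() < 2) {
            return s;
        }
        int last = s.length() - 1;
        return s.charAt(last) + reverseLinear(s.substring(0, last));
    }

    // Divide and conquer: reverse each half and swap their order,
    // so the recursion depth grows with the logarithm of the length.
    static String reverseHalves(String s) {
        if (s.length() < 2) {
            return s;
        }
        int mid = s.length() / 2;
        return reverseHalves(s.substring(mid)) + reverseHalves(s.substring(0, mid));
    }

    public static void main(String[] args) {
        System.out.println(reverseLinear("abc"));    // prints "cba"
        System.out.println(reverseHalves("abcdef")); // prints "fedcba"
    }
}
```

For a 1,000 character string the linear version nests roughly 1,000 calls deep, while the halving version nests only about 10 deep, which is the practical payoff of cutting the work in half at each step.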