Posted by Ajay
Lately I have been instrumental in converting our web-based GIS functionality into more modular, service-based functionality. A year ago, our application had a very tightly coupled, custom-built WMS/WFS server. Imagine being constrained to using only the client that our web application provided; imagine a rogue application that did not follow the WMS/WFS specification. That’s what we had: a tightly coupled rogue. This was fine as long as we serviced just one client, but we could not hope to market our product unless we shifted to a more service-oriented framework, or at least a more modular one.
That is when we came up with the idea of making our GIS more service oriented and modular, which set us on our trailblazing path towards geoserverization. Now, after a couple of months of effort, we have a WMS/WFS server that can be used as a GIS service from any client the user prefers. All was nice and dandy with this world. Until the day the users wanted more.
One fine day our users decided they wanted to edit the layers on the fly. They wanted to change layer styles through an admin interface in our web application. They even went as far as asking us to provide functionality, in the admin GUI, to modify a layer’s data and update it on the fly while the application was running. Guess who came to my rescue? None other than the REST functionality packaged within GeoServer. The rest of this post gives an introduction to REST, a quick overview of GeoServer’s REST interface, and how I implemented our functionality using it.
What art thou, REST?
REST, or Representational State Transfer, is an architectural style. It is neither a design pattern nor a full architecture, just a style: a set of constraints that an architecture should conform to. It defines a uniform interface that any REST-based design can interpret and use. REST revolves around URIs and the HTTP protocol.
The HTTP method also acts as a built-in safeguard, since it indicates the kind of action being performed: a GET is safe (it only reads), whereas POST, PUT and DELETE change state on the server and are therefore unsafe. Authentication and authorization can be handled through the web server’s built-in functionality.
Key Concepts in REST
- Anything the application exposes is called a resource and is identified by a URI. Resources are accessed through a standard interface, HTTP. Ex: book is a resource concept; http://myserver.com/book is the URI that returns a file which is a representation of the book resource.
- Information about a resource is called its state, and an encoding of that state is called a representation of the resource. Ex: a request to http://myserver.com/book/book1 with an Accept header of text/xml asks for a particular representation of the resource, one that returns the data as XML.
- Representations are linked together through hypermedia or links. Ex: I can access readers of a book as follows: http://myserver.com/book/users
- Invoking a REST interface method transfers the state of the resource through representations.
- The key set of standard interactions is:
  - GET – give me the information in my chosen format
  - POST – add the information that I give you
  - PUT – replace the information you currently have with what I give you
  - DELETE – delete the information that you have
- REST can evolve while the standard interactions stay the same: new document types can be created, yet the interactions do not change.
- It builds on the statelessness of HTTP: every request carries everything needed to perform its task, independent of previous requests.
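As a rough illustration of the standard interactions listed above, here is a minimal Java sketch that records two properties the HTTP specification attaches to each method: whether it is safe (read-only) and whether it is idempotent (repeating it has no further effect). The enum and its method names are my own, made up for this example.

```java
import java.util.EnumSet;
import java.util.Set;

/** The four standard interactions, tagged with the properties HTTP gives them. */
enum HttpVerb {
    GET(true, true),      // reads a representation; no side effects
    POST(false, false),   // adds new information; sending it twice may add it twice
    PUT(false, true),     // replaces the stored state; a repeat changes nothing further
    DELETE(false, true);  // removes the resource; a repeat changes nothing further

    final boolean safe;
    final boolean idempotent;

    HttpVerb(boolean safe, boolean idempotent) {
        this.safe = safe;
        this.idempotent = idempotent;
    }

    /** Verbs that a client can blindly retry after a failure. */
    static Set<HttpVerb> retryable() {
        Set<HttpVerb> result = EnumSet.noneOf(HttpVerb.class);
        for (HttpVerb verb : values()) {
            if (verb.idempotent) {
                result.add(verb);
            }
        }
        return result;
    }
}
```

A client could consult HttpVerb.retryable() before resending a request that timed out, which is the practical payoff of keeping the interactions uniform.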
REST and Geoserver
GeoServer provides a REST-based interface that gives access to its admin functionality. GeoServer’s REST interface defines the following key concepts:
- workspace – A group of data stores and feature types. This is equivalent to a namespace.
- datastore – A source of vector based spatial data. It may be a PostGis datastore, shapefile or even a WFS server.
- featuretypes – A vector spatial resource that originates from a data store
- coveragestore – A source of raster based spatial data.
- coverages – A raster based dataset with a coveragestore as the data store
- styles – describe how a resource should be symbolized or rendered
- layers – A published resource, equivalent to a feature type or coverage
- layergroups – grouping of layers accessible under a single resource name
I used REST in my project to implement two major pieces of functionality:
Provide users a way to dynamically set user styles on the fly.
There are many ways to implement this. The WMS specification allows SLD and SLD_BODY parameters to be provided as part of the GetMap request. Unfortunately these two approaches would not work if users store their data in a database. The main problem with the SLD parameter is that it requires an SLD file to be created on the server on the fly, which was not an option for us. So, moving on to SLD_BODY. I was so happy when I implemented this and it worked, but I got happy a bit too early. I hit a restriction in IE, the dreaded limit on GET request length, something I had not even thought of since I primarily use Firefox, whereas our users use IE. Finally I decided to settle on the REST interface provided with GeoServer.
I decided to use Jersey as my RESTful client API. Jersey uses a builder pattern to simplify creating a Client, creating resources and invoking the RESTful methods, all in a couple of lines of code. It can be backed by Apache’s HttpClient, which lets it do its work efficiently across multiple threads.
A simple PUT did the work for me:
PUT http://myserver/geoserver/rest/styles/layerName
with the SLD string as the request body. This modifies the existing layer’s style with the new style information.
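For readers who want to see the shape of such a request in code, here is a minimal sketch. It uses the JDK’s built-in java.net.http classes (Java 11+) rather than Jersey, purely so that it has no dependencies. The content type application/vnd.ogc.sld+xml, the basic-auth header and the method and parameter names are my assumptions for illustration; check them against your GeoServer version.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class StyleUpdateSketch {
    /** Builds (but does not send) a PUT that replaces the named style with the given SLD. */
    static HttpRequest buildStylePut(String baseUrl, String styleName, String sldXml,
                                     String user, String password) {
        // Assumption: the server accepts HTTP basic authentication.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/styles/" + styleName))
                .header("Content-Type", "application/vnd.ogc.sld+xml") // assumed SLD media type
                .header("Authorization", "Basic " + credentials)
                .PUT(HttpRequest.BodyPublishers.ofString(sldXml))       // SLD document as the body
                .build();
    }
}
```

The request would then be sent with something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), after which the status code tells you whether the style was replaced.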
Status codes tell you how the request went. The major ones to be concerned about are:
- 200 – OK – the GET or PUT executed fine
- 201 – Created – the POST executed fine
- 403, 404, 405 – Forbidden, Not Found and Method Not Allowed – the resource could not be updated for some reason
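A small sketch of how a client might turn these codes into something readable. The class name and the message wording are mine; only the codes and their standard HTTP meanings come from the list above.

```java
final class RestStatusSketch {
    /** True for any 2xx status, which is how HTTP signals success. */
    static boolean succeeded(int status) {
        return status >= 200 && status < 300;
    }

    /** Maps the status codes discussed above to a short explanation. */
    static String describe(int status) {
        switch (status) {
            case 200: return "OK: the GET or PUT executed fine";
            case 201: return "Created: the POST created a new resource";
            case 403: return "Forbidden: the caller may not change this resource";
            case 404: return "Not Found: no such resource at this URI";
            case 405: return "Method Not Allowed: this URI does not support that method";
            default:  return (succeeded(status) ? "Success" : "Unexpected status") + ": " + status;
        }
    }
}
```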
Allow users to be able to modify a layer
Imagine that you have an application and the user wants to be able to modify a layer on the fly, maybe he has his own data that he wants to upload. Now imagine a restriction that you cannot use the geoserver admin interface. How would you do this? REST baby REST!!
My requirement was to change the originating datastore from PostGIS to a shapefile. This was as easy as a PUT to create a new datastore, a DELETE to remove the existing featuretype and layer, and a POST to create a new featuretype. Tadaaa… as easy as pie, all courtesy of the REST API.
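The sequence above can be sketched as a plan of method-and-path pairs. This is an illustration, not the exact calls I made: the paths follow GeoServer’s REST layout as I understand it (workspaces, datastores, featuretypes, layers), and the class, method and parameter names are hypothetical. Verify the paths against the REST documentation for your GeoServer version.

```java
import java.util.List;

final class DatastoreSwapSketch {
    /** One REST call: an HTTP method plus a path relative to the GeoServer base URL. */
    static final class Step {
        final String method;
        final String path;

        Step(String method, String path) {
            this.method = method;
            this.path = path;
        }

        @Override
        public String toString() {
            return method + " " + path;
        }
    }

    /** Lists the calls that move a layer from its old datastore to a new shapefile store. */
    static List<Step> plan(String workspace, String oldStore, String newStore, String layer) {
        String stores = "/rest/workspaces/" + workspace + "/datastores/";
        return List.of(
                new Step("PUT", stores + newStore + "/file.shp"),                   // create the shapefile datastore
                new Step("DELETE", "/rest/layers/" + layer),                        // unpublish the existing layer
                new Step("DELETE", stores + oldStore + "/featuretypes/" + layer),   // drop the old featuretype
                new Step("POST", stores + newStore + "/featuretypes"));             // create the new featuretype
    }
}
```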
How is REST different from RPC?
- An RPC application is exposed as a network object with one or more public functions, so the client must know about that network object before it can communicate. In REST, this is accomplished through hyperlink relations between resources. RPC uses named operations while REST uses named resources.
- RPC-style protocols such as SOAP can support multiple transport protocols, whereas REST in practice depends on HTTP.
- RPC stacks often involve some concept of transaction, like WS-AtomicTransaction. REST, on the other hand, follows the HTTP philosophy, i.e. it deals with failures through retries.
- REST allows more diverse formats, like JSON, and scales better since GET responses can be cached. SOAP requests are normally sent as POSTs, which are not cached.
References
- http://rest.blueoxen.net/cgi-bin/wiki.pl
- http://tomayko.com/writings/rest-to-my-wife
- http://en.wikipedia.org/wiki/Representational_State_Transfer
- http://www.infoq.com/articles/rest-introduction
Tags: gis, java, jersey, REST
Posted by Ajay
I have been working on an interesting application to manage all the photographs (I will talk more about this project in a later post). In the process, I have been thinking about a feature I want to add: alerting the users of my web application as and when new additions are made to my photo repository. I came across Grizzly Comet and felt an urge to try it out on my application, so here is what I learned from the experience.
Let’s talk NIO
Before we talk about Grizzly and Comet, let’s try to understand where this whole concept started. Java 1.4 introduced a collection of APIs called New I/O (NIO), developed as part of JSR 51, with features for intensive I/O operations. The main purpose of NIO is to let Java use the more efficient I/O facilities of the underlying platform. A single thread can operate on a bunch of connections instead of having one thread per connection, which allows a high degree of scalability. The API includes buffers for primitive data types, character set encoders and decoders, pattern matching, and channels. The details of NIO are probably for another post.
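To make the "one thread, many connections" idea concrete, here is a minimal sketch using plain java.nio. It uses in-process Pipe channels as stand-ins for network connections so that it runs without a server; the class and method names are made up for this example, and it assumes every message fits in one small buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class NioSelectorSketch {
    /** Sends each message down its own pipe, then reads them all back on the calling thread. */
    static List<String> readAll(List<String> messages) {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (String message : messages) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false);            // selectors need non-blocking channels
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(StandardCharsets.UTF_8.encode(message));
                pipe.sink().close();
            }
            while (received.size() < messages.size()) {
                selector.select();                                  // blocks until some channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(256);   // short messages only in this sketch
                    source.read(buffer);
                    buffer.flip();                                  // switch the buffer from filling to draining
                    received.add(StandardCharsets.UTF_8.decode(buffer).toString());
                    source.close();                                 // closing the channel also cancels its key
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }
}
```

The messages may come back in any order, since the selector reports channels as they become ready rather than in the order they were registered.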
Introducing the “Comet” philosophy
Another interesting concept is the Comet application model. The basic idea is that a web server pushes data to a browser over HTTP without the browser explicitly requesting it each time. The Comet approach typically uses Ajax with long polling to achieve this.
A simple diagram to illustrate the Comet methodology as opposed to the web application model is as below

Web applications have evolved through various stages, as depicted in the following diagram

In the page-by-page model, which most web applications started from and many still follow, every new browser page sends a request to the web server. So essentially we had to refresh the whole page every time a small bit of data on it changed, which became a real pain.
Thus began the era of Ajax, where the page asks for new information through Ajax requests and updates itself asynchronously. Web apps poll the server periodically for new information, but the point is that the client still has to keep hitting the server to ask for it.
Netscape had introduced a concept called server push early on, and that idea has evolved into the Comet programming model. Essentially, imagine having the benefit of server-to-client messaging without the issues of fat clients.
Imagine a real-time event like a baseball game or the stock market, where users want to be kept informed of events as they happen. Such scenarios are hard to implement well using traditional web application methodologies. This is where Comet, or Reverse Ajax, fits in.
Comet is a collection of technologies that provide web server push through a persistent HTTP connection. Comet can be implemented using streaming, where the browser opens a single persistent connection to the server for all server events, or using long polling, where the browser issues a request that the server holds open until it has something to send. Many applications use the Comet model, including Meebo, Gmail chat, Jotspot and the ICEFaces JSF framework.
The Comet approach involves a departure from the usual web platform approach. The server may need to store some kind of state about the clients who wish to receive notifications, similar to messaging systems. This is why most Comet-based approaches rely on custom-adapted application servers. In Java, Jetty and Grizzly support Comet-based approaches. This type of design is also being introduced through the continuation concept in other platforms.
But how does the browser itself stay in contact with the server application? Some approaches include long polling, dynamic script tags and iframes, none of which are standards. Many client-side Comet designs rely on frameworks to iron out incompatibilities; Dojo, for example, includes an Ajax/Comet implementation.
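Long polling is easier to picture with a toy model. The sketch below is not Grizzly’s or Jetty’s API; it is a hypothetical in-memory stand-in for a single client, where poll plays the part of a request that the server holds open until an event arrives or a timeout expires.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class LongPollSketch {
    // Events waiting to be delivered to the one client this sketch models.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Server side: something happened that the waiting client should hear about. */
    void publish(String event) {
        pending.add(event);
    }

    /** Stands in for one held-open request: waits for an event or gives up at the timeout. */
    Optional<String> poll(long timeoutMillis) {
        try {
            return Optional.ofNullable(pending.poll(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }
}
```

A real client would call poll again as soon as each call returns, whether it returned an event or timed out, so there is nearly always one request parked at the server.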
What is Grizzly?
Writing scalable server applications has always been a big task, and threading issues have tended to get in the way of scaling. This prompted Grizzly to make its appearance. It is an HTTP connector based on NIO that ships with GlassFish, where it is designed to replace Apache Tomcat’s Coyote connector. Java-based web connectors have had their scalability limited by the number of available threads. Grizzly improves on this by allowing any kind of thread pool to be plugged in.
Grizzly is essentially based on a task-based architecture where each task represents an operation. Every task executes on its own thread pool or a shared one. The main entry point is the Pipeline, which has nothing in common with the Catalina Pipeline. The Grizzly Pipeline is essentially a thread pool wrapper and is responsible for executing tasks. The SelectorThread is another important component; this is where the NIO selector is created. When processing a request, the SelectorThread creates Task instances and passes them on to the Pipeline. There are three types of tasks: Accept Task, which handles NIO OP_ACCEPT; ReadTask/AsyncReadTask/ReadBlockingTask, which handle OP_READ; and Processor Task, which handles OP_WRITE. The SelectorThread can either create one Pipeline per Task or share a Pipeline among tasks.
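The "thread pool wrapper" idea can be sketched with nothing but java.util.concurrent. To be clear, this is not Grizzly’s Pipeline class or its API; PipelineSketch and executeAll are hypothetical names, and the sketch only shows the general shape of handing tasks to a shared pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PipelineSketch {
    private final ExecutorService pool;

    /** Wraps a fixed-size thread pool; a different pool type could be swapped in here. */
    PipelineSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Runs every task on the shared pool and returns the results in submission order. */
    List<String> executeAll(List<Callable<String>> tasks) {
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```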
Grizzly’s Comet support is implemented on top of its Asynchronous Request Processing extension.
So that’s that for my technical sharing session for today. In my next post I will give a brief intro to the Comet API and talk about how I used it in my project. Ciao for now!
References
http://alex.dojotoolkit.org/2006/03/comet-low-latency-data-for-the-browser/
http://weblogs.java.net/blog/jfarcand/archive/2005/06/grizzly_an_http.html
http://www.pathf.com/blogs/2006/06/infrastructure_/
http://searchsoa.techtarget.com/tip/0,289483,sid26_gci1301487,00.html
http://weblogs.java.net/blog/jfarcand/archive/2006/01/introduction_to.html
Tags: comet, grizzly, java
Posted by Ajay
For the last few months I have been looking at performance improvements for my application on multiple fronts. One is the GIS end and the other is, of course, the database end. Caching is a great way to get a performance improvement. Caching on the GIS end was an interesting exercise, which is a story for another day.
Today let me pen my thoughts on improving performance at the Hibernate level. Hibernate has many performance improvement techniques, of which we have implemented a small subset for our task. Let me first talk about Hibernate’s performance improvement strategies. If you want to take it all in at a glance, take a look at this mind map image.

First we need to understand that
- SessionFactory is an immutable, thread-safe factory that initializes JDBC connections and connection pools and creates Sessions.
- Session is a non-thread-safe object that represents a single unit of work, typically one transaction.
Caching, a blessing in disguise
Caching reduces traffic between the database and the application by holding on to data that has already been loaded into the application. Caches store data that was already fetched so that repeated accesses to the same data take less time. Essentially, caching reduces disk access, reduces computation time and speeds up responses to users.
Hibernate uses three kinds of caching.
- The Level 1 cache works at the Session level.
- The Level 2 cache works at the SessionFactory level.
- The query cache stores the results of queries.
Hibernate uses the Level 1 cache mainly to reduce the number of SQL queries. It is always on. If there are several modifications to the same object within a session, Hibernate generates a single SQL statement for them when the session is flushed, and repeated loads of the same object are served from the cache. The Level 1 cache is confined to a single session, so it is short lived. Essentially, the general idea behind the first-level cache is that it batches up work within a session.
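Here is a toy model of the session-level cache, written in plain Java so it runs without Hibernate. It is not Hibernate’s implementation; SessionCacheSketch and its loader function are hypothetical, with the loader standing in for a SELECT against the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class SessionCacheSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for a database SELECT
    private int loads;                   // how many times we had to hit the "database"

    SessionCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached instance if there is one, loading it only on the first request. */
    V get(K id) {
        return cache.computeIfAbsent(id, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    /** Forgets one object, so the next get() goes back to the loader. */
    void evict(K id) {
        cache.remove(id);
    }

    int loads() {
        return loads;
    }
}
```

Asking for the same id twice returns the very same instance and triggers only one load, which is the behaviour the paragraph above describes for a single session.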
A Level 2 cache is shared between sessions. It is usually recommended when we are dealing with read-only objects. It is not enabled by default. Conceptually it is a map that has the id of the object as the key and the set of attributes of the entity as the value.
The query cache is not on by default either. It uses two cache regions:
- StandardQueryCache - stores query results keyed by the query together with its parameters, so any subsequent query with the same key hits the query cache and is answered from the cache
- UpdateTimestampsCache - tracks the timestamps of the most recent updates to particular tables, to identify stale results
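The interplay between the two regions can be modelled in a few lines of plain Java. This is a conceptual sketch, not Hibernate code: the class, its methods and the logical clock are all made up, with one map playing the role of StandardQueryCache and the other the role of UpdateTimestampsCache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class QueryCacheSketch {
    private static final class Entry {
        final List<Long> ids;
        final long cachedAt;

        Entry(List<Long> ids, long cachedAt) {
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, Entry> results = new HashMap<>();   // query + parameters -> cached result ids
    private final Map<String, Long> lastUpdate = new HashMap<>(); // table -> time of its latest update
    private long clock;                                           // logical clock, keeps the sketch deterministic

    /** The cache key is the query text together with its parameter values. */
    static String key(String query, Object... params) {
        return query + Arrays.toString(params);
    }

    void put(String key, List<Long> ids) {
        results.put(key, new Entry(ids, ++clock));
    }

    /** Called whenever a row of the table is inserted, updated or deleted. */
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    /** Returns the cached ids unless the table changed after they were cached. */
    Optional<List<Long>> get(String key, String table) {
        Entry entry = results.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (lastUpdate.getOrDefault(table, 0L) > entry.cachedAt) {
            results.remove(key); // stale: the table was modified after this result was stored
            return Optional.empty();
        }
        return Optional.of(entry.ids);
    }
}
```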
Remember all this caching will only be effective in reducing the number of queries if we use session.get to load the object. Using HQL to load the object may in fact create more queries.
Hibernate supports several cache providers:
- EHCache - fast and lightweight; read-only and read-write caching; memory and disk based caching; no clustering.
- OSCache - read-only and read-write caching; memory and disk based caching; clustering support via JMS or JavaGroups.
- SwarmCache - cluster-based caching built on JavaGroups; read-only and nonstrict read-write caching; usually used when there are more reads than writes.
- JBoss TreeCache - replicated and transactional cache.
- Tangosol Coherence Cache
The caching strategy is specified using the usage attribute of a <cache usage="..."/> element. The caching strategies may be:
- read-only - for data that is frequently read but never modified; simple, and the best performer.
- read-write - for data that needs to be updated; should never be used if serializable transaction isolation level is required; in a JTA environment you need to specify hibernate.transaction.manager_lookup_class.
- nonstrict-read-write - for data that is rarely updated; in a JTA environment you need to specify hibernate.transaction.manager_lookup_class.
- transactional - only used in a JTA environment.
If the hibernate.cache.provider_class property is set, the second-level cache is enabled. The cache can be configured within hibernate.cfg.xml. A cache’s usage pattern can be defined with the <cache> element in the hbm mapping file associated with each domain class. Enable query caching by setting hibernate.cache.use_query_cache to true and calling setCacheable(true) on the Query object. The query cache always uses the second-level cache.
An object is added to the session cache whenever it is passed to save(), update() or saveOrUpdate(), or retrieved using load(), get() or list(). Invoking flush() synchronizes the session’s state with the database. Use evict() to remove an object from the cache.
A CacheMode defines how a particular session interacts with the second-level cache:
- NORMAL - read from and write to the cache
- GET - read from the cache but don’t put
- PUT - write to the cache but don’t read
- REFRESH - force a refresh of the cache for all items read from the database
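The property names above can be collected programmatically as well as in hibernate.cfg.xml. The sketch below only builds a java.util.Properties object, so it runs without Hibernate on the classpath. The provider class name org.hibernate.cache.EhCacheProvider and the hibernate.cache.use_second_level_cache property are my assumptions for a Hibernate 3.x setup; adjust them to your version.

```java
import java.util.Properties;

final class CacheConfigSketch {
    /** Collects the cache-related settings discussed above into one Properties object. */
    static Properties cacheProperties() {
        Properties props = new Properties();
        // Assumed provider class for EHCache on Hibernate 3.x.
        props.setProperty("hibernate.cache.provider_class", "org.hibernate.cache.EhCacheProvider");
        // Assumed explicit switch for the second-level cache.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Turns on the query cache; each Query still needs setCacheable(true).
        props.setProperty("hibernate.cache.use_query_cache", "true");
        return props;
    }
}
```

These properties would then be handed to Hibernate’s Configuration before the SessionFactory is built.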
Fetching strategies
A fetching strategy determines how Hibernate fetches an object along with its associations once a query is executed. There are four strategies:
- Join fetching - All associated instances are retrieved in the same SELECT using an OUTER JOIN. Having too many of these can pull a huge chunk of the database into memory and cause performance problems of its own.
- Select fetching - This is the default strategy. A second SELECT retrieves the associated entity or collection. This is usually lazy unless specified otherwise. It is extremely vulnerable to the N+1 selects problem, in which case join fetching can be enabled instead.
- Subselect fetching - Similar to select fetching, but a second SELECT retrieves the associated collections for all entities fetched by the previous query.
- Batch fetching - An optimization of select fetching where a batch of entities or collections is retrieved in one SELECT.
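The difference between the strategies is easiest to see by counting SELECT statements. The sketch below is simple arithmetic, not Hibernate code. It assumes one query loads N parent objects and that the code then touches the associated collection of every one of them; the class and method names are invented for the example.

```java
final class FetchCountSketch {
    /** Select fetching: one query for the parents, then one per parent (the N+1 problem). */
    static int selectFetching(int parents) {
        return 1 + parents;
    }

    /** Join fetching: parents and associations arrive together in a single outer join. */
    static int joinFetching(int parents) {
        return 1;
    }

    /** Batch fetching: associations are loaded batchSize parents at a time. */
    static int batchFetching(int parents, int batchSize) {
        return 1 + (parents + batchSize - 1) / batchSize; // integer ceiling of parents / batchSize
    }

    /** Subselect fetching: one extra query covers every parent from the first query. */
    static int subselectFetching(int parents) {
        return parents == 0 ? 1 : 2;
    }
}
```

With 10 parents and a batch size of 4, select fetching issues 11 statements, batch fetching 4, subselect fetching 2 and join fetching 1.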
As far as we are concerned, we pretty much use EHCache as our cache provider and do a lot of join fetching or lazy select fetching, depending on our requirements.
Of course all these technical ideas are borrowed from these websites
http://acupof.blogspot.com/2008/01/background-hibernate-comes-with-three.html
http://www.devx.com/dbzone/Article/29685
http://www.hibernate.org/hib_docs/reference/en/html/
Tags: hibernate, java, performance
Posted by Ajay
Just the other day I started thinking about application state in our web application. HTTP is essentially a stateless protocol, which basically means that every request between the client and server is a self-contained unit, with all the information required for the transaction embedded within the request.

Consider the figure above, which represents a typical HTTP model. Every request encodes its information in the request headers and sends it over to the server, which does the processing and returns the response as HTML.
But then there is more to HTTP than meets the eye. It has been extended to allow state information to be stored. How many times has the word “session” been thrown around? What exactly is this session?
The servlet model has three major scoped containers: the request (HttpServletRequest), HttpSession and ServletContext. We will look at ServletContext some other time. The request of course works on a per-request basis: its variables have scope only for the lifetime of the request. HttpSession is a container whose variables stay in scope across all the requests of one user’s session. But how does HttpSession work? The magic behind it is cookies, or URL rewriting if cookies are disabled.
Essentially there are three major ways to maintain state across multiple requests of a stateless protocol.
- cookies - store state information on the client side and send it to the server every time a request is made
- hidden form fields - store data in hidden form fields and send the form every time a request is sent to the server
- URL rewriting - encode the state information within the URL. This is usually done using a jsessionid name-value pair that the server appends to the URLs it writes into each response.
Essentially, once a cookie is created with session information, it is stored at the client for future requests to send back and regain the state. Alternatively, using hidden form fields, the state itself can be transferred from one request to another. Last but not least, URL rewriting can also do the trick.
In practice, state is maintained by the servlet container, such as Tomcat. Tomcat maintains the HttpSession either using client-side cookies or, if cookies are disabled, through URL rewriting.
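To show what URL rewriting looks like, here is a small plain-Java sketch that adds a ;jsessionid=... segment to a URL and reads it back. In a real servlet application the container does this for you (through HttpServletResponse.encodeURL); the class and method names below are made up, and the sketch ignores details such as URL encoding of the id.

```java
import java.util.Optional;

final class UrlRewriteSketch {
    private static final String MARKER = ";jsessionid=";

    /** Inserts the session id after the path, before any query string or fragment. */
    static String encodeUrl(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = query;
        }
        if (fragment >= 0 && fragment < cut) {
            cut = fragment;
        }
        return url.substring(0, cut) + MARKER + sessionId + url.substring(cut);
    }

    /** Recovers the session id from a rewritten URL, if one is present. */
    static Optional<String> extractSessionId(String url) {
        int start = url.indexOf(MARKER);
        if (start < 0) {
            return Optional.empty();
        }
        start += MARKER.length();
        int end = start;
        // The id runs until the next delimiter or the end of the URL.
        while (end < url.length() && "?#;".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return Optional.of(url.substring(start, end));
    }
}
```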
Tags: application state, java
Posted by Ajay
So the other day someone asked me
“Does Java pass by reference or by value?”
Hey, I ought to know this, so I answered,
“By value of course.”
So imagine a Person object P passed into a function doModifyPerson(Person p). I first call P’s setter, say P.setName("initialName"), and then invoke p.setName("new Name") inside the function doModifyPerson.
Now the question is: does Person P get modified?
So I answered,
“No way, Jose, Java is pass by value, ain’t it?”
Guess what, amigos!! I was dead wrong. Java is pass by value, of course, but in this case it is the reference to the Person object that is passed by value, not the object itself. So we have two references pointing to the same Person object, and if you modify the object through either reference, of course the object’s value changes.
But then why does this not happen with Strings? Strings are objects too, but they don’t get modified inside the function. Guess what: Strings are immutable, so no method can change one in place; the function can only point its own copy of the reference at a new String.
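Here is the whole argument as a small runnable sketch. The Person class and doModifyPerson come from the story above; doReplacePerson and doModifyString are extra methods I have added to show the other two cases.

```java
final class PassByValueSketch {
    static final class Person {
        private String name;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }
    }

    /** Changes the object that both the caller's and the callee's references point to. */
    static void doModifyPerson(Person p) {
        p.setName("new Name");
    }

    /** Only rebinds the local copy of the reference; the caller's object is untouched. */
    static void doReplacePerson(Person p) {
        p = new Person("replaced");
    }

    /** toUpperCase() returns a new String; the caller's String cannot be changed in place. */
    static void doModifyString(String s) {
        s = s.toUpperCase();
    }
}
```

After doModifyPerson the caller sees the new name, but after doReplacePerson or doModifyString the caller’s Person and String are exactly what they were before the call.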
So another instance of not getting my concepts right.
Tags: java