Wednesday, August 30, 2006

EJB3 Wishlist

J2EE persistence has continued to mature over the past few years with the evolution of EJB3. EJB3 is a huge improvement over its predecessor, EJB2, in a number of areas. First, EJB3 has done away with the redundant class structures for remoting, as well as the requirement to declare metadata in deployment descriptors. Instead, EJB3 is designed around annotations that supply configuration such as transaction models or entity table affinities. Second, EJB3 has a sophisticated object-relational mapping layer capable of supporting a variety of inheritance strategies. This allows the business objects in an application to be modeled as hierarchies, which helps normalize data and enhance code reuse, and allows those models to be mapped to a database schema.

Some may disagree with the annotation model, but in our experience EJB3 has cut the effort required to implement middle-tier designs by about 50% compared to most EJB2 efforts. The simplification of the model has brought a comparable decrease in maintenance costs.

However, our team did run into some issues with the API, and these resulted in a small wish list that I'd like to put to the EJB3 standards committee.

Better Support for Large Batches

When persisting entities, the performance of EJB3 can be significantly compromised in certain situations. In our case, we used Oracle as the back-end database and were developing a system that, in some edge cases, needed to create 100k or more records in the span of a single HTTP transaction. Oracle does not feature auto-increment fields, so primary key generation is handled through a sequence, which is a synchronized resource of the Oracle server. This sequence ended up being the Achilles' heel of the system with regard to batching large sets of entities.

The reason was that two statements were issued for every entity sent to the EntityManager's persist method. The first was a "select [sequenceName].nextval from dual" to obtain the ID of the entity to be created. The second was the SQL insert statement that created the new record. For 100k entities, that adds up to 200k statements, half of which hit the sequence.

What is needed is an API built around the concept of batch processing, where entities submitted for creation (or other operations) are handled in a way that optimizes throughput. For entity creation, this might mean skipping the separate ID select in cases where auto-increment database fields aren't available.
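One way to picture the kind of optimization I have in mind is to reserve IDs in blocks, so that the sequence is consulted once per block rather than once per entity. The sketch below is plain Java and is not part of EJB3. BlockIdAllocator and FakeSequence are hypothetical names, the sequence is simulated in memory, and the approach assumes the real sequence would be created with an INCREMENT BY equal to the block size.

```java
import java.util.function.LongSupplier;

/**
 * Hands out IDs from a locally cached block so the sequence is consulted
 * once per block instead of once per entity.
 * ASSUMPTION: the database sequence increments by blockSize, so each
 * nextval reserves the range [value, value + blockSize).
 */
final class BlockIdAllocator {
    private final LongSupplier sequence; // one call = one simulated round trip
    private final int blockSize;
    private long next;   // next ID to hand out
    private long limit;  // exclusive upper bound of the current block

    BlockIdAllocator(LongSupplier sequence, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == limit) { // first call, or current block exhausted
            next = sequence.getAsLong();
            limit = next + blockSize;
        }
        return next++;
    }
}

/** In-memory stand-in for "select seq.nextval from dual"; counts round trips. */
final class FakeSequence implements LongSupplier {
    private long value;
    private final int increment;
    private int roundTrips;

    FakeSequence(long start, int increment) {
        this.value = start;
        this.increment = increment;
    }

    @Override
    public synchronized long getAsLong() {
        roundTrips++;
        long current = value;
        value += increment;
        return current;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}
```

With a block size of 1,000, handing out 100k IDs touches the sequence 100 times instead of 100,000. JPA's @SequenceGenerator has an allocationSize attribute aimed at the same idea; how a particular provider applies it is worth checking against that provider's documentation.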

I understand that EJB3 is really an API for the management of individual application objects, and that batch processing is probably outside of the initial intended scope. However, the reality is that many developers will run into this issue, costing unnecessary time and effort when Session APIs must be refactored from EntityManager calls to PreparedStatements.

Better Support for N:M Mapping Tables

The case is a little complex, but it is important. When a mapping table associates two entities in a many-to-many relationship, it is sometimes the practice to include additional fields in the mapping table to provide attributes on the relationship. If these fields should not be nullable, then powerful annotations like @ManyToMany can't be leveraged. The developer then has two choices remaining: either model the mapping table as an entity and provide mappings between all three, or forgo the mapping annotations altogether and manually derive the relationship through queries. On the various forums where this quandary has been posed, the former was preferred. It is an ungainly and inefficient approach, since traversing the relationship requires two separate queries.
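To make the first choice concrete, here is a minimal sketch of the object model in plain Java, with comments marking where the JPA annotations would go. Employee, Project, and Assignment are hypothetical names chosen for illustration, and the non-nullable "role" stands in for the extra field on the mapping table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Would be an @Entity in a real mapping.
final class Employee {
    private final String name;
    // Would be @OneToMany(mappedBy = "employee").
    private final List<Assignment> assignments = new ArrayList<>();

    Employee(String name) { this.name = name; }

    String name() { return name; }

    List<Assignment> assignments() { return Collections.unmodifiableList(assignments); }

    void register(Assignment assignment) { assignments.add(assignment); }

    /** Reaching the far side means going through the link object: two hops. */
    List<Project> projects() {
        List<Project> result = new ArrayList<>();
        for (Assignment assignment : assignments) {
            result.add(assignment.project());
        }
        return result;
    }
}

// Would be an @Entity in a real mapping.
final class Project {
    private final String title;
    // Would be @OneToMany(mappedBy = "project").
    private final List<Assignment> assignments = new ArrayList<>();

    Project(String title) { this.title = title; }

    String title() { return title; }

    List<Assignment> assignments() { return Collections.unmodifiableList(assignments); }

    void register(Assignment assignment) { assignments.add(assignment); }
}

// The mapping table promoted to an entity of its own.
final class Assignment {
    private final Employee employee; // would be @ManyToMany's replacement: @ManyToOne
    private final Project project;   // would be @ManyToOne
    private final String role;       // would be @Column(nullable = false)

    private Assignment(Employee employee, Project project, String role) {
        this.employee = Objects.requireNonNull(employee, "employee");
        this.project = Objects.requireNonNull(project, "project");
        this.role = Objects.requireNonNull(role, "role");
    }

    /** Creates the link and registers it on both sides. */
    static Assignment link(Employee employee, Project project, String role) {
        Assignment assignment = new Assignment(employee, project, role);
        employee.register(assignment);
        project.register(assignment);
        return assignment;
    }

    Employee employee() { return employee; }
    Project project() { return project; }
    String role() { return role; }
}
```

The projects() method shows the cost described above: getting from an Employee to its Projects always passes through Assignment, which in a mapped model corresponds to an extra query or join.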

The ideal would be a callback or initializer method that could be specified in a @ManyToMany annotation. This initializer would populate the extra fields as needed, and then return control to the ORM layer.

Support for Audit Logging

Given the growth of Federal legislation governing the auditing of information systems, it would seem a natural fit to provide an annotation API for designating both an audit logging schema as well as the tables or fields that would participate in the audit logging.

Currently, developers have a choice of either implementing an interceptor strategy or designing an encapsulating persistence API with a logging facade.

In the former, either AOP or vendor-specific APIs are engaged to intercept method calls in the persistence stack to detect loggable operations. This approach, while attractive in its transparency to the business method developer, is risky in that errors in the interceptor can be difficult to detect.
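As a rough illustration of the interceptor idea, the sketch below uses a JDK dynamic proxy rather than AOP or any vendor API, so it is a stand-in for the technique and not how a container does it. CustomerStore, InMemoryCustomerStore, and AuditInterceptor are hypothetical names, and the "anything not starting with find is loggable" rule is an assumption made for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical persistence interface that business code calls. */
interface CustomerStore {
    void save(String customerId, String name);
    String find(String customerId);
}

/** In-memory implementation so the sketch runs without a database. */
final class InMemoryCustomerStore implements CustomerStore {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String customerId, String name) { rows.put(customerId, name); }

    @Override
    public String find(String customerId) { return rows.get(customerId); }
}

/** Intercepts every call on the wrapped store and records the loggable ones. */
final class AuditInterceptor implements InvocationHandler {
    private final Object target;
    private final String user;
    private final List<String> auditLog;

    private AuditInterceptor(Object target, String user, List<String> auditLog) {
        this.target = target;
        this.user = user;
        this.auditLog = auditLog;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the target's own exception
        }
        // One generic rule applied to all operations (an assumption for this
        // sketch): skip Object methods and reads, log everything else.
        boolean objectMethod = method.getDeclaringClass() == Object.class;
        if (!objectMethod && !method.getName().startsWith("find")) {
            auditLog.add(user + " " + method.getName() + " " + Arrays.toString(args));
        }
        return result;
    }

    /** Wraps a target so callers keep using the plain interface. */
    static <T> T wrap(T target, Class<T> iface, String user, List<String> auditLog) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new AuditInterceptor(target, user, auditLog));
        return iface.cast(proxy);
    }
}
```

The business code only ever sees a CustomerStore, which is the transparency that makes this approach attractive. The single naming rule inside invoke is also where a mistake would silently skip or over-log operations.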

The latter option requires a customized logging API to sit as a facade over the actual persistence calls. While safer in that you're not attempting to apply a generic set of rules to all operations, the amount of coding required is greater than with the interceptor approach. Though more conservative, it is an error-prone strategy as well, since it requires the cooperation of each developer to ensure that all necessary data is logged.
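A minimal sketch of the facade option might look like the following. AccountDao, AuditEntry, and AuditedAccountFacade are hypothetical names, and the DAO is an in-memory stand-in for real persistence calls. Each facade method writes its own audit entry explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical low-level persistence calls (in-memory for this sketch). */
final class AccountDao {
    private final Map<String, Long> balances = new HashMap<>();

    Long findBalance(String accountId) { return balances.get(accountId); }

    void storeBalance(String accountId, long balance) { balances.put(accountId, balance); }
}

/** One audit record: who did what, with operation-specific detail. */
final class AuditEntry {
    final String user;
    final String operation;
    final String detail;

    AuditEntry(String user, String operation, String detail) {
        this.user = user;
        this.operation = operation;
        this.detail = detail;
    }
}

/** Facade over the DAO; business code calls this instead of the DAO. */
final class AuditedAccountFacade {
    private final AccountDao dao;
    private final List<AuditEntry> auditLog = new ArrayList<>();

    AuditedAccountFacade(AccountDao dao) { this.dao = dao; }

    void changeBalance(String user, String accountId, long newBalance) {
        Long oldBalance = dao.findBalance(accountId);
        dao.storeBalance(accountId, newBalance);
        // Explicit, per-operation logging: the old and new values are captured
        // because this method knows what matters for this operation.
        auditLog.add(new AuditEntry(user, "changeBalance",
                accountId + ": " + oldBalance + " -> " + newBalance));
    }

    /** Reads are deliberately not audited in this sketch. */
    Long balanceOf(String accountId) { return dao.findBalance(accountId); }

    List<AuditEntry> auditLog() { return Collections.unmodifiableList(auditLog); }
}
```

The entry can carry exactly the data that matters for the operation, but nothing forces the next facade method someone adds to call auditLog.add at all, which is the cooperation problem noted above.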

If the EJB3 spec can reduce inheritance strategies down to the choice between "SINGLE_TABLE", "TABLE_PER_CLASS", or "JOINED", then certainly the specification should be able to define best practices for audit logging, limiting the types of schema down to a few distinct forms.

Audit logging is an old problem that's only gotten more serious over the last few years. Instead of requiring development teams to re-invent the wheel at every shop, why not provide a single implementation that we can all leverage?


Anonymous said...

hi, great post. do you have any good articles you could point me to on audit logging? i've reached the same fork in the road re: implementation and don't have the experience to tell me how to proceed.

Julian Klappenbach said...

There are a few possible alternatives:

1. Develop your own logging services, with an API for explicit invocation.

2. Use EJB3 interceptors. There's a good article, though with little in terms of concrete examples, in JBoss' online docs. EJB3 Entity Interceptors are callbacks that are invoked when operations occur on an entity. Interceptors are passed an InvocationContext that tells you what is being done. You can record when it's being done by generating a timestamp. You'll need a SessionContext to figure out who is doing the work. With this data in hand, you should be able to implement an auditing strategy.

One strategy to look into would be basing your audited entities on a base class that established the interceptor, and utilized reflection to programmatically browse the entity and store necessary data. Reflection, however, might not be suitable in cases where scalability is a large consideration.

3. Use AOP. AOP is somewhat complex, but there's ample documentation on the technology.
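For the reflection idea mentioned under option 2, here is a minimal sketch of browsing an entity's fields to capture the data to store. It is plain Java and independent of any EJB3 API. EntitySnapshot and Invoice are hypothetical names, and a real audit record would also need the who and when described above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class EntitySnapshot {
    private EntitySnapshot() {}

    /**
     * Returns field name -> current value for every non-static field of the
     * entity, including fields inherited from superclasses. Keys are sorted
     * by name; if a subclass and superclass share a field name, the
     * subclass value wins.
     */
    static Map<String, Object> of(Object entity) {
        Map<String, Object> values = new TreeMap<>();
        for (Class<?> type = entity.getClass();
                type != null && type != Object.class;
                type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true); // read private fields too
                try {
                    values.putIfAbsent(field.getName(), field.get(entity));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field, e);
                }
            }
        }
        return values;
    }
}

/** Hypothetical entity used only to exercise the sketch. */
class Invoice {
    private static final String TABLE = "INVOICE"; // static, so skipped
    private final long id;
    private String status;

    Invoice(long id, String status) {
        this.id = id;
        this.status = status;
    }
}
```

Calling EntitySnapshot.of(new Invoice(7, "OPEN")) yields a map with the keys "id" and "status". Doing this on every operation is the reflection cost that may not suit systems where scalability is a large consideration.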

Hope this helps.