Thursday, May 10, 2007

Glassfish, JPA, and Compass


Never underestimate search. It's made an empire out of Google, and is possibly the single most valuable function that software performs in our society. Modern search engines have simplified and refined the approach to search to the point where applications consisting of millions of lines of code can be interfaced with a single text edit box, a blinking cursor that is the doorway to the combined knowledge of all of humanity.

Recently, I've wanted to bring this kind of functionality to the applications I design. Bringing "Google"-style search to an application domain is not as easy as one would think, especially since most applications store their data in relational database systems. While these systems excel at storing elements of data and the relationships between them (a schema), they are not nearly as good at indexing the text within those elements, or at providing an abstraction whereby a search term can easily be applied across an entire schema. SQL becomes impractical when asked to run a boolean keyword search over a schema of even a dozen domain objects: every term has to be tested against every searchable column, typically with LIKE '%term%' predicates that ordinary indexes cannot help with.
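To put a number on the SQL problem, here is a hedged sketch that builds the WHERE clause a naive keyword search would need, with one LIKE predicate per term per text column. The table and column names are made up for illustration. Three terms over six columns already yield 18 predicates, before any joins, stemming, or relevance ranking enter the picture.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveKeywordSql {

    /**
     * Builds a WHERE clause in which every search term (AND) must match at
     * least one text column (OR). Each "?" would be bound to a value such as
     * "%term%"; the leading wildcard generally defeats ordinary B-tree indexes.
     */
    public static String buildWhere(List<String> columns, int termCount) {
        List<String> perTerm = new ArrayList<>();
        for (int t = 0; t < termCount; t++) {
            List<String> anyColumn = new ArrayList<>();
            for (String column : columns) {
                anyColumn.add(column + " LIKE ?");
            }
            perTerm.add("(" + String.join(" OR ", anyColumn) + ")");
        }
        return String.join(" AND ", perTerm);
    }

    /** Counts the LIKE predicates in a generated clause. */
    public static int countPredicates(String where) {
        return where.split("LIKE", -1).length - 1;
    }

    public static void main(String[] args) {
        // Hypothetical text columns spread across a small schema
        List<String> columns = Arrays.asList(
            "customer.name", "customer.notes", "orders.comment",
            "product.title", "product.description", "supplier.name");
        String where = buildWhere(columns, 3);
        System.out.println(countPredicates(where) + " predicates"); // 18 predicates
        System.out.println(where);
    }
}
```

The predicate count grows as terms times columns, and each added domain object adds both columns and joins.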

To some extent, RDBMS providers have been reacting to the demand for more sophisticated indexing and search capabilities. MySQL has introduced the FULLTEXT index, which can be applied to MyISAM tables to index the contents of text fields. The index gets us part of the way, but doesn't provide the domain-wide search scope one would want. Even more depressing is the lack of InnoDB support: MyISAM is not transactional, so tables using this feature can't take part in transactions. And even if domain-wide indexing via a database became easy, it would be of little use to applications relying on flat files or alternative data sources. What is needed is a layer that sits on top. If you're in the Java realm, what you're looking for is an open source Apache project called Lucene.

Lucene has been around for several years, gradually evolving to the point where it now offers a sophisticated and powerful set of search features, including cross-language and cross-platform capabilities. But despite its sophistication and power, something was missing. With Lucene alone, whenever instances of domain objects are created, modified, or deleted within a transaction, those changes must be mirrored to Lucene by hand to keep its index synchronized with the application's data. This makes for redundant code and, with limited transaction support, some serious headaches.
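A minimal sketch of that problem, using only the JDK: a toy word index (ToyIndex, hypothetical, not Lucene's API) stands in for Lucene, and a HashMap stands in for the database table. Every write path has to mirror itself into the index by hand. One path that forgets (the hypothetical bulkUpdate) leaves the index stale, and nothing here would undo an index change if the database transaction later rolled back.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ManualIndexSync {

    /** Toy word index standing in for Lucene (hypothetical, not Lucene's API). */
    static class ToyIndex {
        private final Map<String, Set<Long>> postings = new HashMap<>();

        void add(long id, String text) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
                }
            }
        }

        void remove(long id) {
            for (Set<Long> ids : postings.values()) {
                ids.remove(id);
            }
        }

        Set<Long> find(String word) {
            return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
        }
    }

    /** Hypothetical repository: a map plays the table, and each write mirrors itself by hand. */
    static class Repository {
        final Map<Long, String> table = new HashMap<>();
        final ToyIndex index = new ToyIndex();

        void save(long id, String text) {
            table.put(id, text);
            index.remove(id);      // drop terms from the previous version
            index.add(id, text);   // and index the new one
        }

        void delete(long id) {
            table.remove(id);
            index.remove(id);
        }

        // A later write path whose author forgot the mirroring step.
        void bulkUpdate(long id, String text) {
            table.put(id, text);
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.save(1L, "Glassfish application server");
        repo.bulkUpdate(1L, "Compass search layer");
        System.out.println(repo.index.find("compass"));   // [] : the update was never indexed
        System.out.println(repo.index.find("glassfish")); // [1] : a stale hit
    }
}
```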

Fortunately, a genre of persistence frameworks known as Object Relational Mapping systems (ORMs) has enabled a simple and elegant enhancement for Lucene. These technologies, including Hibernate, the JDO and JPA specifications, and Spring's persistence support, all feature centralized objects to handle persistence management. Because of this architecture, these management objects provide ideal points for interceptors -- callbacks that are triggered when objects are created, modified, or destroyed. These callbacks allow a Lucene index to be updated auto-magically along with ORM operations, and let index changes take part in the same transaction.
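To make the interceptor idea concrete, here is a minimal JDK-only sketch. ToyEntityManager, EntityListener and IndexingListener are hypothetical stand-ins, loosely modeled on JPA lifecycle callbacks such as @PostPersist and @PostRemove; they are not Compass or JPA APIs. Every write goes through one manager, the listener buffers its index changes, and those changes are applied only when the transaction commits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CallbackIndexing {

    /** Hypothetical callbacks, loosely modeled on JPA's @PostPersist / @PostRemove events. */
    interface EntityListener {
        void postPersist(long id, String text);
        void postRemove(long id);
        void afterCompletion(boolean committed);
    }

    /** Hypothetical persistence manager: the one choke point every write passes through. */
    static class ToyEntityManager {
        final Map<Long, String> table = new HashMap<>();
        private final Map<Long, String> pending = new LinkedHashMap<>(); // null value = delete
        private final List<EntityListener> listeners = new ArrayList<>();

        void addListener(EntityListener listener) { listeners.add(listener); }

        void persist(long id, String text) {
            pending.put(id, text);
            for (EntityListener l : listeners) l.postPersist(id, text);
        }

        void remove(long id) {
            pending.put(id, null);
            for (EntityListener l : listeners) l.postRemove(id);
        }

        void commit() {
            for (Map.Entry<Long, String> e : pending.entrySet()) {
                if (e.getValue() == null) table.remove(e.getKey());
                else table.put(e.getKey(), e.getValue());
            }
            pending.clear();
            for (EntityListener l : listeners) l.afterCompletion(true);
        }

        void rollback() {
            pending.clear();
            for (EntityListener l : listeners) l.afterCompletion(false);
        }
    }

    /** Keeps a word index in step with the manager; application write paths never touch it. */
    static class IndexingListener implements EntityListener {
        private final Map<String, Set<Long>> index = new HashMap<>();
        private final List<Runnable> pending = new ArrayList<>();

        @Override public void postPersist(long id, String text) {
            pending.add(() -> { unindex(id); indexText(id, text); });
        }

        @Override public void postRemove(long id) {
            pending.add(() -> unindex(id));
        }

        @Override public void afterCompletion(boolean committed) {
            if (committed) pending.forEach(Runnable::run); // apply buffered changes
            pending.clear();                               // or drop them on rollback
        }

        Set<Long> find(String word) {
            return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
        }

        private void indexText(long id, String text) {
            for (String w : text.toLowerCase().split("\\W+")) {
                if (!w.isEmpty()) index.computeIfAbsent(w, k -> new TreeSet<>()).add(id);
            }
        }

        private void unindex(long id) {
            for (Set<Long> ids : index.values()) ids.remove(id);
        }
    }

    public static void main(String[] args) {
        ToyEntityManager em = new ToyEntityManager();
        IndexingListener search = new IndexingListener();
        em.addListener(search);

        em.persist(1L, "Compass bridges ORM and Lucene");
        em.commit();
        System.out.println(search.find("lucene")); // [1]

        em.persist(2L, "A draft about Lucene");
        em.rollback();
        System.out.println(search.find("lucene")); // still [1]
    }
}
```

The design point is that application code never mentions the index: the manager is the single choke point, and buffering until commit is what keeps a rolled-back write out of the search results.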

Identifying an elegant solution was noteworthy, but delivering an implementation would be a supreme labor of love. And that's exactly what Shay Banon set out to accomplish with a project called Compass.

As a bridge between ORM and search engine technologies, Compass could very well be one of the most influential concepts to enter the open source arena in years. With its support for Java annotations, it is now trivial to make domain classes available to search, attach additional metadata, or even specify indexable relationships. I had only one small problem: initializing Compass in Glassfish.
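The general mechanism behind annotation-driven mapping can be sketched with plain reflection. The annotations below (Indexed, IndexedField) are hypothetical and deliberately simpler than Compass's own (such as @Searchable); the sketch only shows how runtime-retained annotations let a framework decide which fields of a domain object become searchable, and under what names, without any per-class indexing code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationMapping {

    /** Hypothetical: marks a class as searchable under a short alias. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Indexed {
        String alias();
    }

    /** Hypothetical: marks a field for indexing; name() optionally renames it in the index. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface IndexedField {
        String name() default "";
    }

    @Indexed(alias = "author")
    static class Author {
        @IndexedField Long id;
        @IndexedField(name = "fullName") String name;
        String passwordHash;   // not annotated, so it never reaches the index

        Author(Long id, String name, String passwordHash) {
            this.id = id;
            this.name = name;
            this.passwordHash = passwordHash;
        }
    }

    /** Flattens an annotated object into the field map a search document would be built from. */
    static Map<String, String> toDocument(Object entity) {
        Class<?> type = entity.getClass();
        Indexed meta = type.getAnnotation(Indexed.class);
        if (meta == null) {
            throw new IllegalArgumentException(type.getName() + " is not @Indexed");
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("$alias", meta.alias());
        for (Field field : type.getDeclaredFields()) {
            IndexedField spec = field.getAnnotation(IndexedField.class);
            if (spec == null) continue;
            field.setAccessible(true);
            try {
                Object value = field.get(entity);
                if (value != null) {
                    doc.put(spec.name().isEmpty() ? field.getName() : spec.name(), value.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = toDocument(new Author(7L, "Ada Lovelace", "not-for-search"));
        System.out.println(doc.get("fullName"));             // Ada Lovelace
        System.out.println(doc.containsKey("passwordHash")); // false
    }
}
```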

I discussed the issue with Banon a few days ago, and he confessed that every Glassfish deployment of Compass was actually in concert with Spring. I like Spring, and have used it in the past. But using Spring within Glassfish just offended my sensibilities. Both offer similar APIs and feature sets, so why not go with one or the other? I am also a fan of JPA, and have had positive experiences with Oracle's TopLink libraries (TopLink Essentials), which provide the JPA backbone in Glassfish. Glassfish also offers a sophisticated management interface, and a growing legion of developers and users. There is, however, one thing Spring makes easy that Glassfish does not: the initialization of Compass with JPA.

To initialize Compass with JPA, one must obtain the container-managed instance of the EntityManagerFactory (EMF). This factory object, in turn, provides container objects with EntityManager objects, the gatekeepers of the JPA persistence API: with an EntityManager, you create, delete, find, and query for objects in a data source. By instrumenting the EntityManagerFactory, one can ensure that every EntityManager it creates will trigger the desired callbacks to Compass handlers.
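Compass does the real work here with its JPA device and lifecycle injector. As a hedged illustration of the decorator idea only, the sketch below wraps a factory with the JDK's java.lang.reflect.Proxy, so every manager it creates reports its persist/remove calls to a callback. Manager, ManagerFactory and PlainFactory are hypothetical stand-ins for the JPA interfaces, and this is not necessarily how Compass or TopLink implement the instrumentation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class InstrumentedFactory {

    /** Hypothetical, trimmed-down stand-in for EntityManager. */
    public interface Manager {
        void persist(Object entity);
        void remove(Object entity);
    }

    /** Hypothetical stand-in for EntityManagerFactory. */
    public interface ManagerFactory {
        Manager createManager();
    }

    /** A plain factory whose managers know nothing about search. */
    static class PlainFactory implements ManagerFactory {
        final List<Object> stored = new ArrayList<>();

        @Override
        public Manager createManager() {
            return new Manager() {
                @Override public void persist(Object entity) { stored.add(entity); }
                @Override public void remove(Object entity) { stored.remove(entity); }
            };
        }
    }

    /**
     * Decorates a factory so that every Manager it hands out is a JDK dynamic proxy:
     * the real call runs first, then the callback receives (operation, entity).
     */
    static ManagerFactory instrument(ManagerFactory target, BiConsumer<String, Object> callback) {
        return () -> {
            Manager real = target.createManager();
            InvocationHandler handler = (proxy, method, args) -> {
                Object result;
                try {
                    result = method.invoke(real, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();   // rethrow the manager's own exception
                }
                // Report only Manager operations, not equals/hashCode/toString
                if (method.getDeclaringClass() == Manager.class) {
                    callback.accept(method.getName(), args[0]);
                }
                return result;
            };
            return (Manager) Proxy.newProxyInstance(
                Manager.class.getClassLoader(), new Class<?>[] { Manager.class }, handler);
        };
    }

    public static void main(String[] args) {
        PlainFactory plain = new PlainFactory();
        List<String> events = new ArrayList<>();
        ManagerFactory factory = instrument(plain, (op, entity) -> events.add(op + ":" + entity));

        // Separately created managers are all instrumented, because the factory is.
        factory.createManager().persist("order-1");
        factory.createManager().persist("order-2");
        factory.createManager().remove("order-1");

        System.out.println(events);       // [persist:order-1, persist:order-2, remove:order-1]
        System.out.println(plain.stored); // [order-2]
    }
}
```

Instrumenting at the factory, rather than at each manager, is what makes the guarantee hold for managers created later by code that knows nothing about search.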

The JPA specification provides two ways of gaining access to an EMF. The first is by annotation, using the @PersistenceUnit declaration. The second is programmatic, using the static method Persistence.createEntityManagerFactory. In Glassfish, the annotation and programmatic approaches each give you an instance of an EMF, but not the same instance. That is a major problem for Compass, since callbacks installed on one instance never see operations performed through the other. Even more problematic is that the annotation approach gives you an EntityManagerFactoryWrapper, and not the real deal. Having a wrapper wouldn't be a problem, save for the fact that there's no API to access the contained EMF instance.

Together, these make it difficult to initialize Compass, but it gets worse. In Compass, initialization is a heavyweight activity, and should be executed only once. Since initialization requires a bean-managed transaction (BMT) context, it needs to happen within a call to a properly configured session bean. Neither @Stateful nor @Stateless session beans are suitable for holding the Compass object instances: stateful beans can be passivated, and stateless beans are pooled, so each pooled instance would end up with its own copy. I'm looking into Glassfish's management API, including the JMX specification, to see if Compass management can be handled in a more elegant manner. But for now, I hacked it with a Singleton.

The following classes represent a workable solution to establishing Compass with JPA on Glassfish. If I find a better solution, I'll update this article accordingly. Feel free to provide suggestions if you know of a better way to do this:


/**
 * File GlassfishTransactionManagerLookup
 */
package org.compass.core.transaction.manager;

// JNDITransactionManagerLookup lives in this same package, so no import is needed.

/**
 * @author Julian Klappenbach
 *
 * This class provides the JNDI names of the transaction components within Glassfish,
 * so that Compass can locate the container's TransactionManager and UserTransaction.
 */
public class GlassfishTransactionManagerLookup extends JNDITransactionManagerLookup
{
    /* (non-Javadoc)
     * @see org.compass.core.transaction.manager.JNDITransactionManagerLookup#getName()
     */
    @Override
    protected String getName()
    {
        // JNDI name under which Glassfish binds its JTA TransactionManager
        return "java:appserver/TransactionManager";
    }

    /* (non-Javadoc)
     * @see org.compass.core.transaction.TransactionManagerLookup#getUserTransactionName()
     */
    public String getUserTransactionName()
    {
        return "UserTransaction";
    }
}

/**
 * File GlassfishCompassSingleton
 */

package org.compass.gps;

import javax.persistence.EntityManagerFactory;
import org.compass.annotations.config.CompassAnnotationsConfiguration;
import org.compass.core.Compass;
import org.compass.core.config.CompassConfiguration;
import org.compass.gps.device.jpa.JpaGpsDevice;
import org.compass.gps.device.jpa.lifecycle.TopLinkEssentialsJpaEntityLifecycleInjector;
import org.compass.gps.impl.SingleCompassGps;
import com.sun.enterprise.ComponentInvocation;
import com.sun.enterprise.InvocationManager;
import com.sun.enterprise.Switch;
import com.sun.enterprise.util.EntityManagerFactoryWrapper;

/**
 * @author Julian Klappenbach
 *
 * The Singleton that establishes the one and only instance of the Compass object,
 * and the associated JpaGpsDevice, for the Glassfish Application Server environment.
 *
 * This is a hack. The instance is created by a static initializer, which the JVM runs
 * on the first call to getInstance(), so that first call must occur within a method
 * governed by a bean-managed (BMT) transaction scope. A better approach would be a
 * single-instance management bean associated with the application, provided the
 * bean had access to a valid transactional context.
 *
 * More to come, but this works for now.
 */
public class GlassfishCompassSingleton
{
    // Replace the persistence unit name with your own
    private static final String PERSISTENCE_UNIT = "PrototypePU";

    private static final GlassfishCompassSingleton theInstance = new GlassfishCompassSingleton();

    private JpaGpsDevice jpaDevice = null;
    private CompassGps gps = null;   // kept so it can be stopped on undeploy
    private Compass compass = null;
    private EntityManagerFactory emf = null;

    private GlassfishCompassSingleton()
    {
        // Hack to get the actual EntityManagerFactory, and not the wrapper.
        // Relies on Glassfish-internal com.sun.enterprise classes.
        InvocationManager invMgr = Switch.getSwitch().getInvocationManager();
        ComponentInvocation inv = invMgr.getCurrentInvocation();

        if (inv == null)
        {
            // No current component invocation means no EMF to instrument;
            // fail here rather than hand a null factory to JpaGpsDevice.
            throw new IllegalStateException(
                "GlassfishCompassSingleton must first be used from within a session bean call");
        }

        Object descriptor = Switch.getSwitch().getDescriptorFor(inv.getContainerContext());
        emf = EntityManagerFactoryWrapper.lookupEntityManagerFactory(inv.getInvocationType(), PERSISTENCE_UNIT, descriptor);

        CompassConfiguration conf = new CompassAnnotationsConfiguration();

        for (Class<?> c : getSearchableClasses())
        {
            conf.addClass(c);
        }

        // File-based index; can be changed to a JDBC data source
        conf.setConnection("compassIndex");
        conf.setSetting("compass.transaction.managerLookup", "org.compass.core.transaction.manager.GlassfishTransactionManagerLookup");
        conf.setSetting("compass.transaction.factory", "org.compass.core.transaction.JTASyncTransactionFactory");
        compass = conf.buildCompass();

        gps = new SingleCompassGps(compass);

        // Inject Compass lifecycle listeners into TopLink Essentials, so that
        // changes made through the EMF are mirrored into the index
        jpaDevice = new JpaGpsDevice("jpa", emf);
        jpaDevice.setInjectEntityLifecycleListener(true);
        jpaDevice.setLifecycleInjector(new TopLinkEssentialsJpaEntityLifecycleInjector());
        gps.addGpsDevice(jpaDevice);
        gps.start();

        // Causes the device to synchronize with the existing application state
        gps.index();
    }

    protected Class<?>[] getSearchableClasses()
    {
        return new Class<?>[] { /* TODO: Add the domain classes that will be indexed */ };
    }

    public Compass getCompass()
    {
        return compass;
    }

    public static GlassfishCompassSingleton getInstance()
    {
        return theInstance;
    }
}

/**
 * File SearchManager
 */

package com.acme.jpa.compass;

import java.io.Serializable;
import java.util.Collection;
import javax.ejb.Local;

/**
 * @author Julian Klappenbach
 * Provides the local interface for the SearchManagerBean
 */
@Local
public interface SearchManager extends Serializable
{
    /**
     * @param searchTerms The query string passed to the Compass search engine
     * @return The collection of objects retrieved by the Compass search engine
     */
    public Collection search(String searchTerms);
}

/**
 * File SearchManagerBean
 */

package com.acme.jpa.compass;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import javax.ejb.Stateless;
import javax.ejb.TransactionManagement;
import javax.ejb.TransactionManagementType;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.compass.core.CompassException;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTransaction;
import org.compass.gps.GlassfishCompassSingleton;

/**
 * @author Julian Klappenbach
 *
 * Stateless session bean with bean-managed transactions, which provides the
 * transactional context GlassfishCompassSingleton needs on first use.
 */
@Stateless
@TransactionManagement(value = TransactionManagementType.BEAN)
public class SearchManagerBean implements SearchManager
{
    private static final long serialVersionUID = 1L;
    private static final Log log = LogFactory.getLog(SearchManagerBean.class);

    /* (non-Javadoc)
     * @see com.acme.jpa.compass.SearchManager#search(java.lang.String)
     */
    public Collection search(String searchTerms)
    {
        CompassSession session = GlassfishCompassSingleton.getInstance().getCompass().openSession();

        CompassTransaction tx = null;
        List<Object> results = new ArrayList<Object>();
        try
        {
            tx = session.beginTransaction();
            CompassHits hits = session.find(searchTerms);
            // Copy the hits out while the transaction and session are still open
            for (int i = 0; i < hits.length(); i++)
            {
                Object data = hits.data(i);
                log.debug("Search found: " + data);
                results.add(data);
            }

            tx.commit();
        }
        catch (CompassException ce)
        {
            if (tx != null) tx.rollback();
            log.error("Search failed for: " + searchTerms, ce);
            throw ce;
        }
        finally
        {
            session.close();
        }
        return results;
    }
}


I hope that I have provided a clear and concise approach to configuring Compass within Glassfish with JPA. Some might complain that this is a bit of a hack (it is), while others would insist that Singletons are a bad idea within the realm of session beans. I would point out that, in this case, I'm using static initialization for the Singleton instance. The JVM guarantees that a class's static initializer runs exactly once, so synchronization is not an issue: getInstance() takes no lock, and I have avoided taking a carefully threaded and pooled model and running it through a single, global critical section.
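The argument above leans on the JVM's class-initialization guarantee, which is easy to check. The sketch below uses a hypothetical HeavyService built the same way as GlassfishCompassSingleton (an eager static final field and a private constructor): sixteen threads call getInstance() at the same moment, yet the constructor runs once and every thread sees the same object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitSingleton {

    /** Counts constructor runs so the "exactly once" claim can be checked. */
    static final AtomicInteger constructions = new AtomicInteger();

    /** Hypothetical stand-in for GlassfishCompassSingleton: eager static field, private constructor. */
    static final class HeavyService {
        // The JVM runs this initializer once, on first active use of the class,
        // while holding the class initialization lock (JLS 12.4.2).
        private static final HeavyService INSTANCE = new HeavyService();

        private HeavyService() {
            constructions.incrementAndGet();
            try {
                Thread.sleep(50);   // simulate slow setup, such as building an index
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        static HeavyService getInstance() {
            return INSTANCE;        // no lock on the call path
        }
    }

    /** Calls getInstance() from many threads at once; returns the number of distinct instances seen. */
    static int distinctInstances(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        try {
            List<Future<HeavyService>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await();  // release all threads together
                    return HeavyService.getInstance();
                }));
            }
            start.countDown();
            Set<HeavyService> seen =
                Collections.newSetFromMap(new IdentityHashMap<HeavyService, Boolean>());
            for (Future<HeavyService> f : results) {
                seen.add(f.get());
            }
            return seen.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int distinct = distinctInstances(16);
        System.out.println(distinct + " instance(s), constructor ran "
                + constructions.get() + " time(s)"); // 1 instance(s), constructor ran 1 time(s)
    }
}
```

One caveat worth knowing: if a static initializer throws (for example, because gps.index() fails when no transaction context is available), the JVM marks the class as erroneous. The first caller gets an ExceptionInInitializerError and later callers a NoClassDefFoundError, so the singleton cannot be retried without reloading the application.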

If you're new to Compass, take some time to read up on the project documentation, which is excellent. Banon and the rest of the team have provided a good picture of what can be accomplished when search meets applications.

11 comments:

Anonymous said...

Good post, and a very important & interesting one. However, I feel there should be two parts: 1) how to use Glassfish, JPA, and Compass together, and 2) why someone would want to do it.

I knew exactly why I wanted to use compass with JPA. I was bored out of my mind reading the first few paragraphs of opinions. I skipped over them and got to the juicy part.

Very good post.

Thanks,
Paul

Julian Klappenbach said...

Thanks for the compliment, and for putting up with the exposition. Some people, however, have no clue about Lucene or Compass. I figured I'd give them a little background before diving into the code.

Anjan Bacchu said...

Hi There,

nice post.

More people need to use Compass-like tools.

Does anyone know if Hibernate Search (http://www.hibernate.org/410.html) has any similarities with Compass?

thank you,

BR,
~A

Julian Klappenbach said...

Read more on the differences between Hibernate's internal Lucene integration and that provided by Compass

Emmanuel Bernard said...

Hibernate Search (formerly Hibernate Lucene) has evolved since then.

I would say the main differences are:
- the singleton hack is not needed in the first place ;-)
- the query API and semantic is fully consistent and integrated with Hibernate (and soon JPA)
- consequence is that there is no programmatic model rupture
- asynchronous cluster (avoiding contention lock and close to usual Lucene architectures)
- sync / async work even out of cluster
- "Correlated queries" (but I think compass have some sort of support, can't remember from memory)
- I do *not* require to store the data in the index since I see the index as a complement of the DB. If I need a DB backup there are other tools ;-)

Interestingly, I've just given a J1 talk where I explain what is needed in JPA 2.0 to integrate Hibernate Search.

Shay said...

Here we go again, throwing in disinformation about Compass and its Hibernate integration. I was hoping that the first time would teach you JBoss guys a lesson, but then again, you never seem to learn.

[Emmanuel] consequence is that there is no programmatic model rupture
- Well, again, this is really not what happens with Compass. But it does sound very frightening, way to go!
[Emmanuel] - asynchronous cluster (avoiding contention lock and close to usual Lucene architectures)
- Well, the first suggestion here is that Compass goes against typical Lucene architecture, which is wrong. Second, having an MDB or JMS message listener that index data is again, a single line of code with Compass (take the JMS message, and index it). This is just *one* of the options for clustering.
[Emmanuel]- "Correlated queries" (but I think compass have some sort of support, can't remember from memory)
- You are correct, Compass has had it for ages now. And it is much more powerful than the Hibernate one.
[Emmanuel]- I do *not* require to store the data in the index since I see the index as a complement of the DB. If I need a DB backup there are other tools ;-)
- This really pushes the limit here with the disinformation and bashing. Compass does not require storing the data in the index. It even has a special mode where things only get stored for search in the index. What Compass does provide is the option to choose. Many times, you would not want to *hit* the database to display search results, and Compass gives you that option.

I guess it is business as usual on your side Emmanuel. I trust Java developers to know better...

Emmanuel Bernard said...

Doh, I did not think of my post as FUD. Let me clarify, then.

1. "No programmatic model rupture"
Let me rephrase it so that it's clear to everybody.
When you switch a HQL or JPA-QL query to a Lucene query, you just change the way the Query object is created. session.createQuery() => session.createFullTextQuery
The rest of the code does not change, the same query interface is returned, the way to paginate is the same, the way to retrieve the results are the same.
If you change one of those returned objects, Hibernate Core will take care of the DB synchronization the exact same way it would have been with a HQL query returned object. Same for lazy loading. In other words, the Persistence Context (the session / entityManager) is the same.

And yes I make a big deal of that. ORM is all about that, Seam is all about that.

I also make a big deal about having a zero bootstrap solution leveraging the infrastructure of JPA and Hibernate.

2. Asynchronous cluster
You read too much in my words, stop the paranoia mode.
You are correct, receiving a message and triggering an indexation based on the message is simple (not one line of code though; it could be if JMS were not so overengineered)... provided that someone sends the message.
You are correct, that is one of the possible architectures for clustering, and it is one of the clustered architectures HSearch supports. Choosing one over another is mostly a matter of trade-offs.
I happen to have mostly seen this architecture (if we exclude batch crawling, which both our projects want to prevent, but not forbid).

3. Correlated queries
My bad, I haven't been able to find a query example in the Compass docs; as I said, off the top of my head I had the idea that Compass supported it, but I was unsure. "Much more powerful" is an interesting idea though.

4. Index and storage
My statement is missing some context (I assumed it was implicit with 1., and I shouldn't have).
When retrieving objects (graphs) (like JPA-QL does), Hibernate Search does not require the information to be stored in the index.
Projection (ie loading data from the index) is on the TODO list.
I should have made this one clearer, apologies. I have had a hell of a week.

You and I clearly have a different view on the integration between Lucene and the object model. It's all good, that's the reason why there are 2 projects.

"I guess it is business as usual on your side Emmanuel."
This one is funny. In my public career, esp in the ORM space, I am clearly not the one that people refers to as FUD. I admit I replied to the initial comment quickly, but that's inherent to the blog format. The conspiracy theory is ridiculous.

Shay said...

>> This one is funny. In my public career, esp in the ORM space, I am clearly not the one that people refers to as FUD. I admit I replied to the initial comment quickly, but that's inherent to the blog format. The conspiracy theory is ridiculous.

Well, thanks for clearing up your posts yet again. You say you never spread FUD, but somehow you manage to do that all the same.

Regarding your post, I don't think that there is much difference between Hibernate Search and the 5% of Compass it tries to compete with. Personally, I think that with Compass you have more control and more options for doing what you want. I agree with you that having the same ORM API might be sexy (though its absence is not a model rupture); I might implement it in Compass as well, as a Hibernate extension.

strator said...

Hi,

I'm not sure about GlassFish, but in Sun's SJAS I implemented the LifecycleListener interface, which allows you to write code that can use all Java EE services and gets executed only once at container startup. You can initialize your objects and register them in JNDI. This avoids the need for a "static singleton".

Bye,
Torsten

Julian Klappenbach said...

I'll take a look at the interface to see if it applies to Glassfish. I had posted a question about "a better way" to the Glassfish forums, but received no answer. I suppose this *is* more of a TopLink issue.

Thanks for your reply.

Sam D said...

Heh, EJB 3.1's support for Singletons will help alleviate your problem, that is, whenever we can use it.
