
Tuesday, July 20, 2010

How to access Jackrabbit content repository via JNDI?

In a previous post I described how a Jackrabbit content repository can be set up. Now I will show how it can be accessed via JNDI. I don't want to write a JNDI lookup by hand, although it's not difficult; instead I would like to use Google Guice, a dependency injection framework. First we need a Guice configuration module.
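import javax.jcr.Repository;
import javax.naming.Context;
import javax.naming.InitialContext;

import com.google.inject.AbstractModule;
import com.google.inject.Singleton;
import com.google.inject.assistedinject.FactoryProvider;
import com.google.inject.jndi.JndiIntegration;
import com.google.inject.name.Names;

public class DefaultConfigurationGuiceModule extends AbstractModule
{
    /** JNDI name of the repository (must match the resource configured in the application server) */
    private static final String REPOSITORY_JNDI_NAME = "jcr/repository";

    @Override
    protected void configure()
    {
        bind(String.class).annotatedWith(Names.named("repository name")).toInstance(REPOSITORY_JNDI_NAME);

        // bind naming context to the default InitialContext
        bind(Context.class).to(InitialContext.class);

        // bind to the repository from JNDI (the lookup uses the Context bound above)
        bind(Repository.class).toProvider(JndiIntegration.fromJndi(Repository.class, REPOSITORY_JNDI_NAME));

        // bind to the factory class for the creation of repository accessor
        // see http://code.google.com/docreader/#p=google-guice&s=google-guice&t=AssistedInject
        bind(RepositoryAccessorFactory.class).toProvider(FactoryProvider.newFactory(RepositoryAccessorFactory.class,
                                             JackrabbitRepositoryAccessor.class)).in(Singleton.class);
    }
}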
Guice provides the helpful class JndiIntegration, which creates a provider that looks up objects in JNDI by the given name. Furthermore, I use Guice's AssistedInject and define a factory interface that creates the object for repository access (JackrabbitRepositoryAccessor). The actual factory implementation is generated by AssistedInject.
public interface RepositoryAccessorFactory
{
    /**
     * Creates an instance of {@link RepositoryAccessor}. Sets the configurations (input streams of XML files) which
     * describe custom node types and the corresponding custom namespace mappings. Custom node types and namespaces are
     * registered once, when the JCR session is first requested, if they were not registered yet.
     *
     * @param  nodeTypeConfigs configurations of custom node types to be registered
     * @return RepositoryAccessor the instance of {@link RepositoryAccessor}
     */
    RepositoryAccessor create(@Assisted final NodeTypeConfig[] nodeTypeConfigs);
}
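To make clear what AssistedInject does here, the following is a rough, JDK-only sketch of the kind of factory it generates. The types Accessor, AccessorFactory, SimpleAccessor and DefaultAccessorFactory are hypothetical simplifications (a String stands in for the repository, java.util.function.Supplier for Guice's Provider); they are not part of Guice or of the classes in this post. The point is the split: dependencies known to the injector are held by the factory, while the caller passes only the assisted argument to create().

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for RepositoryAccessor. */
interface Accessor {
    String describe();
}

/** Hypothetical stand-in for RepositoryAccessorFactory: the caller passes only the assisted parameter. */
interface AccessorFactory {
    Accessor create(String[] customConfigs);
}

/** Hypothetical accessor with one injected dependency and one assisted parameter. */
class SimpleAccessor implements Accessor {
    private final Supplier<String> repositoryProvider; // known to the injector
    private final String[] customConfigs;              // supplied by the caller, may be null

    SimpleAccessor(Supplier<String> repositoryProvider, String[] customConfigs) {
        this.repositoryProvider = repositoryProvider;
        this.customConfigs = customConfigs;
    }

    @Override
    public String describe() {
        int count = customConfigs == null ? 0 : customConfigs.length;
        return repositoryProvider.get() + " with " + count + " custom config(s)";
    }
}

/** Roughly what a generated factory does: keep injected dependencies, merge in the assisted ones. */
class DefaultAccessorFactory implements AccessorFactory {
    private final Supplier<String> repositoryProvider;

    DefaultAccessorFactory(Supplier<String> repositoryProvider) {
        this.repositoryProvider = repositoryProvider;
    }

    @Override
    public Accessor create(String[] customConfigs) {
        return new SimpleAccessor(repositoryProvider, customConfigs);
    }
}
```

Writing such factories by hand for every class with mixed parameters quickly becomes boilerplate, which is exactly what AssistedInject saves.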
The class NodeTypeConfig is used to register custom node types. Node types are described here. For more about custom node types in XML notation, see my previous post.
import java.io.InputStream;

import org.apache.commons.lang.Validate;

/**
 * Configuration information about node types to be registered.
 */
public class NodeTypeConfig
{
    /** input stream of XML file which describes node types */
    private InputStream inputStream;

    /** namespace prefix of the node type */
    private String namespacePrefix;

    /** namespace uri of the node type */
    private String namespaceUri;

    /**
     * Creates a new NodeTypeConfig object.
     *
     * @param inputStream     input stream of XML file which describes node types
     * @param namespacePrefix namespace prefix of the node type
     * @param namespaceUri    namespace uri of the node type
     */
    public NodeTypeConfig(final InputStream inputStream, final String namespacePrefix, final String namespaceUri)
    {
        this.inputStream = inputStream;
        this.namespacePrefix = namespacePrefix;
        this.namespaceUri = namespaceUri;
    }

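    public InputStream getInputStream()
    {
        return inputStream;
    }

    public String getNamespacePrefix()
    {
        return namespacePrefix;
    }

    public String getNamespaceUri()
    {
        return namespaceUri;
    }

    // setters omitted for brevity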

    /**
     * Loads node type configuration from XML file in classpath.
     *
     * @param  fileName        file name
     * @param  namespacePrefix namespace prefix of the node type
     * @param  namespaceUri    namespace uri of the node type
     * @return NodeTypeConfig configuration
     */
    public static NodeTypeConfig getNodeTypeConfig(final String fileName, final String namespacePrefix, final String namespaceUri)
    {
        InputStream inputStream = getInputStreamConfig(fileName);
        return new NodeTypeConfig(inputStream, namespacePrefix, namespaceUri);
    }

    /**
     * Gets input stream from XML file in classpath.
     *
     * @param  fileName file name
     * @return InputStream input stream of the XML file, or <code>null</code> if the file was not found
     */
    public static InputStream getInputStreamConfig(final String fileName)
    {
        Validate.notNull(fileName, "XML file with node type configuration is null");

        ClassLoader classloader = Thread.currentThread().getContextClassLoader();
        if (classloader == null) {
            classloader = NodeTypeConfig.class.getClassLoader();
        }

        return classloader.getResourceAsStream(fileName);
    }
}
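Note that getInputStreamConfig() returns null if the file is not on the classpath, and getNodeTypeConfig() passes that null on; for custom node types, getSession() then simply skips the registration, because it only registers when getInputStream() is not null. If you prefer to fail fast, a variant could look like the following sketch. ClasspathResources and openRequired() are hypothetical names, and IllegalArgumentException is my choice of exception.

```java
import java.io.InputStream;

/** Hypothetical helper, not part of the original NodeTypeConfig. */
final class ClasspathResources {
    private ClasspathResources() {
    }

    /** Same class loader lookup as getInputStreamConfig(), but throws instead of returning null. */
    static InputStream openRequired(String fileName) {
        if (fileName == null) {
            throw new IllegalArgumentException("XML file with node type configuration is null");
        }

        // prefer the context class loader (web applications), fall back to this class's loader
        ClassLoader classloader = Thread.currentThread().getContextClassLoader();
        if (classloader == null) {
            classloader = ClasspathResources.class.getClassLoader();
        }

        InputStream in = classloader.getResourceAsStream(fileName);
        if (in == null) {
            throw new IllegalArgumentException("Node type definition '" + fileName + "' was not found in the classpath");
        }
        return in;
    }
}
```

A missing custom_node.xml then shows up at application startup instead of as silently unregistered node types.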
The most important class is JackrabbitRepositoryAccessor, the entry point into the content repository. It implements the interface RepositoryAccessor, which looks as follows:
public interface RepositoryAccessor
{
 /**
  * Gets the content repository. If the repository has not been started yet, it is started first.
  *
  * @see    #startRepository()
  * @return Repository repository {@link Repository}
  * @throws RepositoryException if the repository could not be acquired
  */
 Repository getRepository() throws RepositoryException;

 /**
  * Starts and initializes the content repository by the configured repository name via JNDI.
  *
  * @throws RepositoryException if the repository could not be acquired or an error occurred
  */
 void startRepository() throws RepositoryException;

 /**
  * Retrieves the JCR Session that is bound to the current thread for the given workspace. If no JCR Session is
  * open, opens a new JCR Session for the running thread (creating the workspace if it does not exist yet).
  *
  * @param  workspaceName name of the workspace (<code>null</code> is not allowed)
  * @param  viewerId      viewer id from {@link SecurityToken}
  * @return Session JCR session {@link Session}
  * @throws LoginException            if the login fails
  * @throws NoSuchWorkspaceException  if a specific workspace is not found
  * @throws AccessDeniedException     if the session associated with the workspace object does not have sufficient
  *                                   permissions to register core / custom namespaces, to create a new workspace or
  *                                   some access-related methods failed
  * @throws NamespaceException        if an illegal attempt is made to register a mapping
  * @throws RegisterNodeTypeException if registration of core / custom node type(s) failed
  * @throws RepositoryException       if the repository could not be acquired or an error occurred
  */
 Session getSession(final String workspaceName, final String viewerId)
     throws LoginException, NoSuchWorkspaceException, AccessDeniedException, NamespaceException,
            RegisterNodeTypeException, RepositoryException;

 /**
  * Closes all JCR Sessions local to the thread.
  */
 void releaseSession();

 /**
  * Releases the content repository
  */
 void releaseRepository();
}
And here is the implementation (a fairly large piece of code):
public class JackrabbitRepositoryAccessor implements RepositoryAccessor
{
 private static final Logger LOG = LoggerFactory.getLogger(JackrabbitRepositoryAccessor.class);

 private static final ThreadLocal<Map<String, Session>> THREAD_SESSION = new ThreadLocal<Map<String, Session>>();

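 /** repository instance (volatile because it is lazily initialized with double-checked locking) */
 private volatile Repository repository;

 /** default workspace (volatile for the same reason) */
 private volatile Workspace defaultWorkspace;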

 /** repository name (not mandatory) */
 @Inject(optional = true)
 @Named("repository name")
 private String repositoryName;

 /** flag whether the core namespace mapping and node types were already registered */
 private volatile boolean isRegistered = false;

 /** flag whether the custom namespace mapping and node types were already registered */
 private volatile boolean isCustomRegistered = false;

 /** configurations of custom node types (not mandatory) */
 private NodeTypeConfig[] customNodeTypeConfigs;

 /** provider for repository */
 private Provider<Repository> repositoryProvider;

 /**
  * Creates a new <code>JackrabbitRepositoryAccessor</code> object and sets repository providers.
  * Note: Custom node types and namespace will be registered one-time if the JCR session is
  * requested and they were not registered yet.
  *
  * @param repositoryProvider                  repository provider to get an access to the configured repository
  *                                            {@link Repository}
  * @param customNodeTypeConfigs               custom node types configurations (if <code>null</code> no custom node
  *                                            types will be registered)
  */
 @Inject
 public JackrabbitRepositoryAccessor(final Provider<Repository> repositoryProvider,
                                     @Assisted
                                     @Nullable
                                     final NodeTypeConfig[] customNodeTypeConfigs)
 {
  // set repository provider
  this.repositoryProvider = repositoryProvider;

  this.customNodeTypeConfigs = customNodeTypeConfigs;
 }

 //~ Methods ----------------------------------------------------------------

 /**
  * Gets the default workspace. If no default workspace has been yet created it will be created.
  *
  * @see    #startRepository()
  * @return Workspace default workspace {@link Workspace}
  * @throws RepositoryException if the repository or workspace could not be acquired
  */
 protected Workspace getDefaultWorkspace() throws RepositoryException
 {
  if (defaultWorkspace == null) {
   synchronized (JackrabbitRepositoryAccessor.class) {
    // local name "repo" avoids shadowing the field "repository"
    Repository repo = getRepository();
    if (defaultWorkspace == null) {
     defaultWorkspace = repo.login().getWorkspace();
     if (LOG.isDebugEnabled()) {
      LOG.debug("==> Default workspace '"
                + (defaultWorkspace != null ? defaultWorkspace.getName() : "null")
                + "' acquired.");
     }
    }
   }
  }

  return defaultWorkspace;
 }

 /**
  * Registers a node type.
  *
  * @param  jcrSession  current JCR session
  * @param  inputStream input stream of XML file which describes node types
  * @throws RegisterNodeTypeException if registration of core / custom node type failed
  * @throws RepositoryException       if an error occurred
  */
 @SuppressWarnings("unchecked")
 protected void registerNodeType(final Session jcrSession, final InputStream inputStream)
     throws RegisterNodeTypeException, RepositoryException
 {
  try {
   NodeTypeManagerImpl ntManager = (NodeTypeManagerImpl) jcrSession.getWorkspace().getNodeTypeManager();
   NodeTypeRegistry ntRegistry = ntManager.getNodeTypeRegistry();
   NodeTypeDefStore ntDefStore = new NodeTypeDefStore();

   ntDefStore.load(inputStream);

   Collection<NodeTypeDef> ntDefs = ntDefStore.all();
   Iterator<NodeTypeDef> iter = ntDefs.iterator();
   while (iter.hasNext()) {
    NodeTypeDef ntDef = iter.next();
    if (!ntRegistry.isRegistered(ntDef.getName())) {
     ntRegistry.registerNodeType(ntDef);
    }
   }
  } catch (IOException e) {
   throw new RegisterNodeTypeException(e);
  } catch (InvalidNodeTypeDefException e) {
   throw new RegisterNodeTypeException(e);
  } finally {
   IOUtils.closeQuietly(inputStream);
  }
 }
 
 /**
  * {@inheritDoc}
  */
 public Repository getRepository() throws RepositoryException
 {
  if (repository == null) {
   synchronized (JackrabbitRepositoryAccessor.class) {
    if (repository == null) {
     startRepository();
    }
   }
  }

  return repository;
 }

 /**
  * {@inheritDoc}
  */
 public void startRepository() throws RepositoryException
 {
  Repository repo;
  try {
   repo = repositoryProvider.get();
  } catch (RuntimeException e) {
   // a failed JNDI lookup surfaces as an unchecked exception from the Guice provider
   throw new RepositoryException("Unable to acquire Repository '" + repositoryName + "' via JNDI", e);
  }

  if (repo == null) {
   throw new RepositoryException("Unable to acquire Repository '" + repositoryName + "' via JNDI");
  }

  if (LOG.isDebugEnabled()) {
   LOG.debug("==> Repository started.");
  }

  // get default workspace (it's always available); its session is kept open because
  // a Workspace cannot be used after its Session has been logged out
  defaultWorkspace = repo.login().getWorkspace();
  if (LOG.isDebugEnabled()) {
   LOG.debug("==> Default workspace '" + (defaultWorkspace != null ? defaultWorkspace.getName() : "null")
             + "' acquired.");
  }

  // publish the repository only after it has been fully initialized
  repository = repo;
 }

 /**
  * {@inheritDoc}
  */
 public Session getSession(final String workspaceName, final String viewerId)
     throws LoginException, NoSuchWorkspaceException, AccessDeniedException, NamespaceException,
            RegisterNodeTypeException, RepositoryException
 {
  if (workspaceName == null) {
   throw new NoSuchWorkspaceException("Workspace name is null. JCR Session cannot be opened.");
  }

  Session jcrSession = null;
  Map<String, Session> workspace2Session = THREAD_SESSION.get();
  if (workspace2Session == null) {
   workspace2Session = new HashMap<String, Session>();
  } else {
   jcrSession = workspace2Session.get(workspaceName);
  }

  if (jcrSession != null && !jcrSession.isLive()) {
   jcrSession = null;
  }

  if (jcrSession == null) {
   if (LOG.isDebugEnabled()) {
    LOG.debug("==> Opening new JCR Session for the current thread.");
   }

   SimpleCredentials credentials = new SimpleCredentials(viewerId, "".toCharArray());
   try {
    // authentication to get jcr session
    jcrSession = getRepository().login(credentials, workspaceName);
   } catch (NoSuchWorkspaceException e) {
    // try to create new workspace with the given name because it doesn't exist yet
    Workspace workspace = getDefaultWorkspace();
    if (workspace == null) {
     throw new NoSuchWorkspaceException("Default workspace could not be acquired. JCR Session cannot be opened.");
    }

    if (LOG.isDebugEnabled()) {
     LOG.debug("==> Try to create workspace '" + workspaceName + "'.");
    }

    // create new workspace
    ((JackrabbitWorkspace) workspace).createWorkspace(workspaceName);
    if (LOG.isDebugEnabled()) {
     LOG.debug("==> Workspace '" + workspaceName + "' has been created.");
    }

    // authentication again to get jcr session
    jcrSession = getRepository().login(credentials, workspaceName);
   }

   if (jcrSession == null) {
    throw new LoginException("JCR Session could not be opened (null).");
   }

   workspace2Session.put(workspaceName, jcrSession);
   THREAD_SESSION.set(workspace2Session);
  }

  // register core namespace mapping and node types if they were not registered yet
  if (!isRegistered) {
   synchronized (JackrabbitRepositoryAccessor.class) {
    if (!isRegistered) {
     NamespaceRegistry namespaceRegistry = jcrSession.getWorkspace().getNamespaceRegistry();

     // check whether the namespace prefix or uri already exist
     if (!ArrayUtils.contains(namespaceRegistry.getPrefixes(), Constants.NAMESPACE_PREFIX)
         || !ArrayUtils.contains(namespaceRegistry.getURIs(), Constants.NAMESPACE_URI)) {
      // register namespace
      namespaceRegistry.registerNamespace(Constants.NAMESPACE_PREFIX, Constants.NAMESPACE_URI);
      if (LOG.isDebugEnabled()) {
       LOG.debug("Namespace prefix '" + Constants.NAMESPACE_PREFIX
                 + "' has been registered to the uri '"
                 + Constants.NAMESPACE_URI + "'");
      }
     }

     // register core node types!
     InputStream inputStream = NodeTypeConfig.getInputStreamConfig("core_node_types.xml");
     if (inputStream == null) {
      LOG.error("Node type definition 'core_node_types.xml' was not found");
      throw new RegisterNodeTypeException("Node type definition 'core_node_types.xml' was not found");
     }

     registerNodeType(jcrSession, inputStream);

     if (LOG.isDebugEnabled()) {
      LOG.debug("Register of core node types is ensured");
     }

     isRegistered = true;
    }
   }
  }

  // register custom namespace mappings and node types if they were not registered yet
  if (!isCustomRegistered) {
   synchronized (JackrabbitRepositoryAccessor.class) {
    if (!isCustomRegistered) {
     if (!ArrayUtils.isEmpty(customNodeTypeConfigs)) {
      NamespaceRegistry namespaceRegistry = jcrSession.getWorkspace().getNamespaceRegistry();

      for (NodeTypeConfig ndc : customNodeTypeConfigs) {
       if (ndc.getNamespacePrefix() != null && ndc.getNamespaceUri() != null) {
        // check whether the namespace prefix or uri already exist
        if (!ArrayUtils.contains(namespaceRegistry.getPrefixes(), ndc.getNamespacePrefix())
            || !ArrayUtils.contains(namespaceRegistry.getURIs(), ndc.getNamespaceUri())) {
         // register namespace
         namespaceRegistry.registerNamespace(ndc.getNamespacePrefix(),
                                             ndc.getNamespaceUri());
         if (LOG.isDebugEnabled()) {
          LOG.debug("Custom namespace prefix '" + ndc.getNamespacePrefix()
                    + "' has been registered to the custom uri '"
                    + ndc.getNamespaceUri() + "'");
         }
        }
       }

       if (ndc.getInputStream() != null) {
        registerNodeType(jcrSession, ndc.getInputStream());
       }
      }

      if (LOG.isDebugEnabled()) {
       LOG.debug("Register of " + customNodeTypeConfigs.length + " custom node types is ensured");
      }
     }

     isCustomRegistered = true;
    }
   }
  }

  return jcrSession;
 }

 /**
  * {@inheritDoc}
  */
 public void releaseSession()
 {
  Map<String, Session> workspace2Session = THREAD_SESSION.get();
  if (workspace2Session != null) {
   Collection<Session> sessions = workspace2Session.values();
   for (Session jcrSession : sessions) {
    if (jcrSession != null && jcrSession.isLive()) {
     if (LOG.isDebugEnabled()) {
      LOG.debug("==> Closing JCR Session for the current thread.");
     }

     jcrSession.logout();
    }
   }
  }

  // remove() instead of set(null): nothing stays attached to pooled server threads
  THREAD_SESSION.remove();
 }

 /**
  * {@inheritDoc}
  */
 public void releaseRepository()
 {
  // Jackrabbit specific
  if (repository instanceof JackrabbitRepository) {
   ((JackrabbitRepository) repository).shutdown();
  }

  repository = null;
 }
}
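The session handling above boils down to a per-thread map from workspace name to session. Stripped of JCR, the pattern can be sketched as follows. This is a hypothetical, simplified illustration, not a drop-in replacement: the type parameter S stands in for javax.jcr.Session, and the three functions stand in for login(), isLive() and logout(). It uses ThreadLocal.withInitial() and clears the thread's entry with ThreadLocal.remove() (rather than set(null)), so that no map stays attached to threads of a server's thread pool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical, JCR-free sketch of a per-thread, per-workspace session cache. */
final class ThreadSessionCache<S> {
    private final ThreadLocal<Map<String, S>> sessions = ThreadLocal.withInitial(HashMap::new);
    private final Function<String, S> opener; // stands in for repository.login(credentials, workspaceName)
    private final Predicate<S> isLive;        // stands in for Session.isLive()
    private final Consumer<S> closer;         // stands in for Session.logout()

    ThreadSessionCache(Function<String, S> opener, Predicate<S> isLive, Consumer<S> closer) {
        this.opener = opener;
        this.isLive = isLive;
        this.closer = closer;
    }

    /** Returns this thread's live session for the workspace, opening a new one if needed. */
    S get(String workspaceName) {
        if (workspaceName == null) {
            throw new IllegalArgumentException("Workspace name is null");
        }
        Map<String, S> byWorkspace = sessions.get();
        S session = byWorkspace.get(workspaceName);
        if (session == null || !isLive.test(session)) {
            session = opener.apply(workspaceName);
            byWorkspace.put(workspaceName, session);
        }
        return session;
    }

    /** Closes all live sessions of this thread and drops the thread's map. */
    void release() {
        for (S session : sessions.get().values()) {
            if (session != null && isLive.test(session)) {
                closer.accept(session);
            }
        }
        sessions.remove();
    }
}
```

In a web application, release() (like releaseSession()) would typically be called at the end of each request, for example in a finally block of a servlet filter, because request threads are reused.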
The RepositoryAccessor should be kept in application scope and can be instantiated during application startup (e.g. in ServletContextListener's contextInitialized() or in a JSF managed bean's method annotated with @PostConstruct). Now let's put all classes together! These are the typical steps to get an instance of JackrabbitRepositoryAccessor:
// create a google guice injector for the configuration module
Injector injector = Guice.createInjector(new DefaultConfigurationGuiceModule());

// create the factory instance to create a repository accessor instance
RepositoryAccessorFactory repositoryAccessorFactory = injector.getInstance(RepositoryAccessorFactory.class);

// create custom node type configurations from describing XML file and given namespace prefix / URI
NodeTypeConfig[] nodeTypeConfigs = new NodeTypeConfig[1];
nodeTypeConfigs[0] = NodeTypeConfig.getNodeTypeConfig("custom_node.xml", "xyz", "http://mysite.net/xyz");

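// create an instance of repository accessor (parameter can be null if no custom node types are available);
// repositoryAccessor is assumed to be a field of an application-scoped object, e.g. the listener or managed bean
repositoryAccessor = repositoryAccessorFactory.create(nodeTypeConfigs);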

// method and field injection
injector.injectMembers(repositoryAccessor);

// start and initialize the content repository
repositoryAccessor.startRepository();
Now you can access both the Repository and a JCR Session wherever you need them:
javax.jcr.Repository repository = repositoryAccessor.getRepository();
javax.jcr.Session session = repositoryAccessor.getSession(workspaceName, viewerId);
Don't forget to release the repository when the application shuts down (e.g. in ServletContextListener's contextDestroyed() or in a JSF managed bean's method annotated with @PreDestroy).
repositoryAccessor.releaseRepository();
repositoryAccessor = null;
That's all :-)

Setting up a shared Jackrabbit content repository

There are several deployment models, which are described in detail in the Jackrabbit documentation. For our purpose, "Model 2: Shared J2EE Resource" seems to be the most suitable. In this model the repository is made visible as a resource to all web applications running inside a servlet container by registering it as a resource adapter with the application server. The repository is started and stopped together with the application server. I'm going to describe all necessary steps for the GlassFish application server:

Open up "Resources/JNDI/Custom Resources" in the GlassFish administration console
  • Put in a JNDI name "jcr/repository"
  • Put in a resource type "javax.jcr.Repository"
  • Put in the factory class "org.apache.jackrabbit.core.jndi.BindableRepositoryFactory"
  • Create the property configFilePath, pointing to a configuration XML file with an absolute path on the server, e.g. "c:/repository/repository.xml"
  • Create the property repHomeDir pointing to the absolute filesystem path for the repository, e.g. "c:/repository"
Copy the Jackrabbit dependencies to GLASSFISH_HOME/glassfish/domains/domain1/lib/ext. For Jackrabbit 1.6.2 these are:
 commons-collections-3.2.1.jar
 commons-io-1.4.jar
 commons-lang-2.4.jar
 concurrent-1.3.4.jar
 derby-10.2.1.6.jar
 jackrabbit-api-1.6.2.jar
 jackrabbit-core-1.6.2.jar
 jackrabbit-jcr-commons-1.6.2.jar
 jackrabbit-spi-1.6.2.jar
 jackrabbit-spi-commons-1.6.2.jar
 jackrabbit-text-extractors-1.6.2.jar
 jcr-1.0.jar
 log4j-1.2.15.jar
 lucene-core-2.4.1.jar
 mysql-connector-java-5.1.12-bin.jar (if datastore is MySQL)
 pdfbox-0.7.3.jar
 poi-3.2-FINAL.jar
 poi-scratchpad-3.2-FINAL.jar
To configure the resource factory in Tomcat, add a <Resource> element to the <Context> element in conf/context.xml (applied to all web applications) or in server.xml (for a specific web application):
<Context ...>
  ...
  <Resource name="jcr/repository" auth="Container"
            type="javax.jcr.Repository"
            factory="org.apache.jackrabbit.core.jndi.BindableRepositoryFactory"
            configFilePath="c:/repository/repository.xml"
            repHomeDir="c:/repository"/>
  ...
</Context>
The Jackrabbit dependencies have to be copied to TOMCAT_HOME/lib. Now you can set up a JNDI reference to jcr/repository and use JCR to manage content in the application server. The configured resource needs to be declared in the web.xml file:
<web-app>
    ...
    <resource-ref>
        <description>JCR Repository</description>
        <res-ref-name>jcr/repository</res-ref-name>
        <res-type>javax.jcr.Repository</res-type>
        <res-auth>Container</res-auth>
    </resource-ref>
</web-app>
After these steps, the repository is created automatically by the factory class registered above.
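The Guice module from the post above looks up the plain name jcr/repository. Resources declared via resource-ref are bound in the web application's environment naming context, java:comp/env, so depending on the container (Tomcat in particular) the lookup may need the name java:comp/env/jcr/repository instead. Below is a minimal sketch of a lookup that tries both names, using only javax.naming; JndiLookup is a hypothetical helper, and trying the environment name first is my assumption, not something the containers require.

```java
import javax.naming.Context;
import javax.naming.NamingException;

/** Hypothetical helper: tries the component environment name first, then the plain (global) name. */
final class JndiLookup {
    /** Standard prefix of a Java EE component's environment naming context. */
    static final String ENV_PREFIX = "java:comp/env/";

    private JndiLookup() {
    }

    static <T> T lookup(Context context, String name, Class<T> type) throws NamingException {
        NamingException firstFailure = null;
        for (String candidate : new String[] {ENV_PREFIX + name, name}) {
            try {
                Object bound = context.lookup(candidate);
                if (type.isInstance(bound)) {
                    return type.cast(bound);
                }
            } catch (NamingException e) {
                // remember the first failure, then try the next candidate name
                if (firstFailure == null) {
                    firstFailure = e;
                }
            }
        }
        NamingException failure = new NamingException(
                "No " + type.getName() + " bound under '" + ENV_PREFIX + name + "' or '" + name + "'");
        if (firstFailure != null) {
            failure.setRootCause(firstFailure);
        }
        throw failure;
    }
}
```

In a container you would call it as JndiLookup.lookup(new InitialContext(), "jcr/repository", Repository.class). If you already know which name your container uses, the simplest alternative is to pass that full name to JndiIntegration.fromJndi() in the Guice module.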

Monday, June 28, 2010

How to configure Apache Jackrabbit for binary content search?

The configuration file repository.xml is described in detail in the Apache Jackrabbit documentation. To support binary content search, the SearchIndex element in the workspace and versioning configuration sections must be extended as follows:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="extractorPoolSize" value="2"/>
    <param name="supportHighlighting" value="true"/>
    <param name="textFilterClasses"
      value="org.apache.jackrabbit.extractor.PlainTextExtractor,
      org.apache.jackrabbit.extractor.MsWordTextExtractor,
      org.apache.jackrabbit.extractor.MsExcelTextExtractor,
      org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
      org.apache.jackrabbit.extractor.PdfTextExtractor,
      org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
      org.apache.jackrabbit.extractor.RTFTextExtractor,
      org.apache.jackrabbit.extractor.HTMLTextExtractor,
      org.apache.jackrabbit.extractor.XMLTextExtractor"/>
</SearchIndex>
With this configuration, the content of the following document types is full-text searchable:
  • Plain Text
  • MS Word
  • MS Excel
  • MS Powerpoint
  • PDF
  • OpenOffice
  • RTF
  • HTML
  • XML
There are two points to consider. First, after a document has been added, the content repository needs some time to parse its content and extract the text. In my tests I had to wait about 7 seconds:
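// add document and save changes
....

// sleep 7 seconds to give the repository time to index the document content
try {
    Thread.sleep(7000);
} catch (InterruptedException e) {
    // restore the interrupt flag instead of silently swallowing the interruption
    Thread.currentThread().interrupt();
}

// do full text search
....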
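A fixed sleep is fragile: 7 seconds may be too short on a loaded server and wastes time when indexing is faster. A sketch of polling instead, assuming you can express "the document is indexed" as a condition (e.g. by running the full-text query and checking for hits); Polling.waitUntil() is a hypothetical helper, and the timeout and interval values are placeholders.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition instead of sleeping for a fixed time. */
final class Polling {
    private Polling() {
    }

    /**
     * Re-evaluates the condition every intervalMillis until it returns true or timeoutMillis has elapsed.
     *
     * @return true if the condition became true in time, false on timeout
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // sleep for one interval, but not much longer than the remaining time
            Thread.sleep(Math.min(intervalMillis, remainingNanos / 1_000_000L + 1));
        }
    }
}
```

The condition would run the search and return true as soon as the new document is among the hits; the caller can then decide what a timeout means (retry, log, or fail the test).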
The second point concerns the configuration of node types. Full-text search works if you use the standard node type nt:file, which contains a child node jcr:content of type nt:resource. If you use custom node types, you must ensure that the node type describing binary content has at least two properties: jcr:data (where the content is stored) and jcr:mimeType. The mime type property is very important: without it there is no text extraction (which makes sense, since the extractor is chosen by mime type). Here is an example in XML notation:
<nodeType name="cssns:resource"
          isMixin="false"
          hasOrderableChildNodes="false"
          primaryItemName="jcr:data">
    <supertypes>
        <supertype>nt:base</supertype>
        <supertype>mix:referenceable</supertype>
    </supertypes>
    <propertyDefinition name="jcr:mimeType"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false">
    </propertyDefinition>
    <propertyDefinition name="jcr:data"
                        requiredType="Binary"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false">
    </propertyDefinition>
</nodeType>

<nodeType name="cssns:file"
          isMixin="false"
          hasOrderableChildNodes="false"
          primaryItemName="jcr:content">
    <supertypes>
        <supertype>mix:versionable</supertype>
        <supertype>cssns:hierarchyNode</supertype>
    </supertypes>
    <propertyDefinition name="cssns:size"
                        requiredType="Long"
                        autoCreated="true"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false">
        <defaultValues>
            <defaultValue>-1</defaultValue>
        </defaultValues>
    </propertyDefinition>
    <childNodeDefinition name="jcr:content"
                         defaultPrimaryType=""
                         autoCreated="false"
                         mandatory="true"
                         onParentVersion="COPY"
                         protected="false"
                         sameNameSiblings="false">
        <requiredPrimaryTypes>
            <requiredPrimaryType>cssns:resource</requiredPrimaryType>
        </requiredPrimaryTypes>
    </childNodeDefinition>
</nodeType>
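Since text extraction depends on jcr:mimeType, the application has to set a meaningful value whenever it stores a file. One simple option is the JDK's URLConnection.guessContentTypeFromName(); the wrapper below is a hypothetical helper with application/octet-stream as an assumed fallback. The JDK's built-in table may not cover every format you need (check it for the Office formats), so you may want your own mapping, and a document stored with the fallback type will presumably not be text-extracted.

```java
import java.net.URLConnection;

/** Hypothetical helper to pick a value for jcr:mimeType when storing a file. */
final class MimeTypes {
    /** Assumed fallback for unknown file extensions. */
    static final String DEFAULT_MIME_TYPE = "application/octet-stream";

    private MimeTypes() {
    }

    /** Guesses the MIME type from the file name using the JDK's built-in file name map. */
    static String guess(String fileName) {
        String mimeType = URLConnection.guessContentTypeFromName(fileName);
        return mimeType != null ? mimeType : DEFAULT_MIME_TYPE;
    }
}
```

The result would be written to the jcr:mimeType property of the cssns:resource node before the session is saved.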