Solr

Apache Solr indexing and search

I've installed solr on snapper and am trying to index the documentation on sockeye as part of the knowledge transfer from Doug.

I've written a script that walks the filesystem on sockeye below /home/ptagdev/ and, for each file it encounters, generates an XML file describing that file's content and metadata. Those XML files are then submitted to Solr.

The Solr tutorial is a useful reference: http://lucene.apache.org/solr/tutorial.html

create a custom schema

I started with the example schema and tried to turn it into a description of files in a filesystem. The result is here: solr schema.xml for sockeye. The schema sits here in the filesystem:

/home/rday/downloads/apache-solr-1.3.0/example/solr/conf/schema.xml

indexing files

The perl script that walks sockeye's filesystem and emits xml is here: solr-sockeye.pl

A sample output xml is here: sample solr xml for sockeye
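The Perl script is linked rather than reproduced here, so the following is only a rough Python sketch of the same idea, not the actual script. The field names extension, uid and mtime are taken from the facet queries further down this page; id and content are hypothetical names and would have to match whatever schema.xml really defines. It also assumes uid is the numeric owner id.

```python
import os
import re
import time
from xml.sax.saxutils import escape

# Control characters that are not allowed in XML 1.0 documents.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def file_to_solr_xml(path):
    """Return a Solr <add> message describing one file."""
    st = os.stat(path)
    # Solr date fields expect UTC timestamps in this ISO 8601 form.
    mtime = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(st.st_mtime))
    extension = os.path.splitext(path)[1].lstrip(".")
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        content = _INVALID_XML_CHARS.sub("", handle.read())
    fields = [
        ("id", path),              # hypothetical unique key
        ("extension", extension),
        ("uid", str(st.st_uid)),   # assumed to be the numeric owner id
        ("mtime", mtime),
        ("content", content),      # hypothetical full-text field
    ]
    lines = ["<add>", "  <doc>"]
    for name, value in fields:
        lines.append('    <field name="%s">%s</field>' % (name, escape(value)))
    lines += ["  </doc>", "</add>"]
    return "\n".join(lines) + "\n"
```

Calling file_to_solr_xml on every path yielded by os.walk("/home/ptagdev") and writing each result to its own .xml file would mimic the one-XML-file-per-source-file layout described above.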

posting files to solr

The solr package I downloaded included this tool for posting files:

/home/rday/downloads/apache-solr-1.3.0/example/exampledocs/post.jar

It accepts a list of files as arguments and inserts them into the index. It is invoked like this:

[rday@snapper exampledocs]$ java -jar post.jar /home/rday/bin/solr/usr/pit/ptagdev/ftp_test.txt.xml 
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file ftp_test.txt.xml
SimplePostTool: COMMITting Solr index changes..
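As the log shows, post.jar POSTs each file to the update URL and then commits. A minimal Python sketch of that same sequence follows, in case Java is not handy. It assumes Solr is listening on the URL printed in the log and that the files are UTF-8 XML; the function names are made up for illustration.

```python
import urllib.request

# URL reported by post.jar in the log above.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_bytes, url=UPDATE_URL):
    """Build (but do not send) a POST request carrying one XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(paths, url=UPDATE_URL):
    """POST each XML file to Solr, then commit once at the end."""
    for path in paths:
        with open(path, "rb") as handle:
            request = build_update_request(handle.read(), url)
        urllib.request.urlopen(request).close()
    # Changes only become searchable after a commit.
    urllib.request.urlopen(build_update_request(b"<commit/>", url)).close()
```

Separating request construction from sending keeps the network call in one place, so the request can be inspected without a running Solr instance.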



deleting files from solr

Submitting a document named delete.xml with the following content should remove every document from the index, because the query *:* matches all documents:

<delete><query>*:*</query></delete>

The post.jar tool in /home/rday/downloads/apache-solr-1.3.0/example/exampledocs should do the trick:

java -jar post.jar delete.xml
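Deleting everything is rarely what is wanted, so here is a small sketch that generates a delete message for an arbitrary query. The helper name is hypothetical, and the narrower example query assumes the extension field used in the facet queries on this page.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(query="*:*"):
    """Return a Solr delete-by-query message; the default matches every document."""
    # escape() protects the XML if the query contains &, < or >.
    return "<delete><query>%s</query></delete>" % escape(query)


def write_delete_file(path, query="*:*"):
    """Write the delete message to a file that post.jar can submit."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(delete_by_query_xml(query) + "\n")
```

For example, write_delete_file("delete.xml", "extension:tmp") would produce a file that removes only documents whose extension field is tmp, assuming that field is indexed.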

searching solr

The admin console, which includes a basic search form, is here:

http://localhost:8983/solr/admin/

custom output

Search output can be routed through a "response writer" and transformed by an XSLT stylesheet. In this example, wt=xslt selects the XSLT response writer and tr=example.xsl names the stylesheet:

http://localhost:8983/solr/select/?stylesheet=&q=pittag&wt=xslt&tr=example.xsl
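A wrapper such as the CGI script needs to build this URL from user input, which means the search terms must be URL-encoded. The following Python sketch shows one way to do that; it is not the code behind solr.pl, and the function name is made up.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select/"


def xslt_search_url(query, stylesheet="example.xsl", base=SELECT_URL):
    """Build a select URL whose results are transformed by an XSLT stylesheet."""
    params = [
        ("q", query),        # the user's search terms
        ("wt", "xslt"),      # use the XSLT response writer
        ("tr", stylesheet),  # stylesheet to apply
    ]
    # urlencode percent-encodes anything that is unsafe in a query string.
    return base + "?" + urlencode(params)
```

With these defaults, xslt_search_url("pittag") yields the example URL above minus the empty stylesheet= parameter.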

A cgi wrapper for that url is here:

http://snapper.psmfc.org/cgi-bin/solr.pl

schema browser

In the admin console, click "schema browser" and then "fields" to see each field, how it is configured, its top ten values, and a histogram of its value distribution.

solr statistics

The statistics page at http://localhost:8983/solr/admin/stats.jsp reports how many documents are currently in the index, so you can watch the progress of the indexing program.
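To watch progress from a script rather than a browser, one alternative to reading the statistics page is to run a match-all query with rows=0 and read the numFound attribute from the standard XML response. This is a different technique from stats.jsp, sketched here under the assumption that the select handler is at its default location; note that numFound only reflects committed documents.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select/"


def count_url(base=SELECT_URL):
    """URL for a match-all query that returns no rows, only the hit count."""
    return base + "?" + urlencode([("q", "*:*"), ("rows", "0")])


def parse_num_found(response_xml):
    """Extract numFound from a standard Solr XML response."""
    result = ET.fromstring(response_xml).find("result")
    if result is None:
        raise ValueError("no <result> element in Solr response")
    return int(result.get("numFound"))


def indexed_document_count(base=SELECT_URL):
    """Ask a running Solr instance how many documents are searchable."""
    with urllib.request.urlopen(count_url(base)) as response:
        return parse_num_found(response.read())
```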

faceted search results

This query returns the top results and facet information about the query:

http://snapper.psmfc.org:8983/solr/select/?&fl=*,score&q=rday&facet=true&facet.field=extension&facet.field=uid&facet.field=mtime

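This query adds facet.query parameters that count matches within date ranges on the mtime field. The URL is broken across lines here for readability; join the lines and URL-encode the spaces before using it: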

http://snapper.psmfc.org:8983/solr/select/?
  &fl=*,score
  &q=rday
  &facet=true
  &facet.field=extension
  &facet.field=uid
  &facet.field=mtime
  &facet.query=mtime:[* TO 2000-01-01T00:00:00Z]
  &facet.query=mtime:[2000-01-01T00:00:00Z TO 2002-01-01T00:00:00Z]
  &facet.query=mtime:[2002-01-01T00:00:00Z TO *]
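Writing the date ranges by hand gets tedious, so the sketch below generates the same URL from a list of boundary dates and takes care of the encoding. The function name and parameters are made up for illustration; the host, field names and query come from the example above.

```python
from urllib.parse import urlencode

SELECT_URL = "http://snapper.psmfc.org:8983/solr/select/"


def facet_search_url(query, facet_fields, date_field, boundaries, base=SELECT_URL):
    """Build a faceted search URL with one facet.query per date range."""
    params = [("fl", "*,score"), ("q", query), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    # Open-ended ranges at both ends, with the boundaries in between.
    edges = ["*"] + list(boundaries) + ["*"]
    for low, high in zip(edges, edges[1:]):
        params.append(("facet.query", "%s:[%s TO %s]" % (date_field, low, high)))
    # urlencode turns spaces into + and escapes brackets and colons.
    return base + "?" + urlencode(params)


url = facet_search_url(
    "rday",
    ["extension", "uid", "mtime"],
    "mtime",
    ["2000-01-01T00:00:00Z", "2002-01-01T00:00:00Z"],
)
```

Square brackets make both ends of a range inclusive, so a file modified exactly on a boundary timestamp is counted in two adjacent ranges.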