Search in C# with Lucene.Net / DotLucene

written by The Admin on Sunday, December 30 2007

For projects that require full-text search, it's worth taking a look at Lucene.Net, the .NET port of the Apache Lucene search library. It is open source, can be used in commercial projects, and is a useful tool to be familiar with. It powers high-traffic websites and some interesting open-source search projects. The code snippets below show the bare minimum needed to get search working. In another post we'll look at how Lucene.Net is used by the Cuyahoga website framework.

The first step is to index the data that is to be searched. Basically, we loop through all the files and add their data to an index, which Lucene.Net keeps as a set of files inside a directory (named "index" in the code below). The index can then be used for fast search and retrieval of data. Lucene.Net's IndexWriter class writes data to the index. Its constructor takes the index location, an Analyzer, and a flag that says whether to create a new index (overwriting any existing one). Here we pass a StandardAnalyzer, which splits text into tokens suited to English and other Latin-based languages, lowercases them, and removes common English stop words (analyzers are also available for other languages). Data is added to an index in the form of Lucene.Net Document objects. A Document is a set of named fields that describe an item, such as its author, file path, creation date, and text.
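
To make the analyzer's job more concrete, here is a rough sketch, using only the .NET base class library, of the kind of processing an analyzer performs: split text into tokens, lowercase them, and drop stop words. It is a simplification under stated assumptions, not StandardAnalyzer's actual implementation; the class name ToyAnalyzer and its short stop-word list are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified stand-in for an analyzer. This is NOT Lucene.Net's
// StandardAnalyzer, whose tokenizer grammar is considerably more detailed.
public static class ToyAnalyzer
{
    // Hypothetical, abbreviated stop-word list chosen for illustration only.
    private static readonly HashSet<string> StopWords = new HashSet<string>
    {
        "a", "an", "and", "are", "in", "is", "of", "the", "to"
    };

    // Splits on runs of characters that are neither letters nor digits,
    // lowercases each token, and removes stop words.
    public static List<string> Analyze(string text)
    {
        return Regex.Split(text, @"[^\p{L}\p{Nd}]+")
                    .Where(t => t.Length > 0)            // drop empty pieces at the edges
                    .Select(t => t.ToLowerInvariant())   // normalize case
                    .Where(t => !StopWords.Contains(t))  // filter stop words
                    .ToList();
    }
}
```

With this sketch, ToyAnalyzer.Analyze("The Quick brown fox, in 2007!") yields quick, brown, fox and 2007. Whatever analyzer is chosen, the same one should be used at indexing time and at query time so that query terms match the indexed terms.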

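using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

class IndexFiles
{
    static void Main(string[] args)
    {
        // Directory containing the files to index (first command-line argument)
        string path = args[0];

        // Create a new index (true = overwrite) in a directory called "index"
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        DirectoryInfo docDir = new DirectoryInfo(path);
        FileInfo[] files = docDir.GetFiles();

        foreach (FileInfo f in files)
        {
            Document doc = new Document();

            // Stored, and indexed as a single untokenized term
            doc.Add(new Field("path", f.FullName,
                Field.Store.YES, Field.Index.UN_TOKENIZED));

            // Last-modified time as a string at minute resolution
            doc.Add(new Field("modified",
                DateTools.TimeToString(f.LastWriteTime.Ticks,
                    DateTools.Resolution.MINUTE),
                Field.Store.YES, Field.Index.UN_TOKENIZED));

            // File text read through a TextReader: tokenized and indexed, not stored
            doc.Add(new Field("contents",
                new StreamReader(f.FullName, System.Text.Encoding.Default)));

            writer.AddDocument(doc);
            Console.Out.WriteLine("adding " + f.ToString());
        }

        // Merge the index segments for faster searching, then flush and release the index
        writer.Optimize();
        writer.Close();
    }
}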
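
Conceptually, the index written above is an inverted index: rather than scanning every file at query time, it maps each term to the documents that contain it. The standard-library-only sketch below illustrates that idea. The class ToyInvertedIndex is hypothetical and deliberately ignores what makes Lucene's on-disk format efficient (segments, term frequencies, positions, compression); its tokenizer is a crude lowercase-and-split assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy in-memory inverted index: term -> ids of documents containing it.
public class ToyInvertedIndex
{
    // Postings: each term maps to the sorted set of document ids that contain it
    private readonly Dictionary<string, SortedSet<int>> postings =
        new Dictionary<string, SortedSet<int>>();

    // Stored value per document id (here the file path), loosely like a stored field
    private readonly List<string> storedPaths = new List<string>();

    // Adds a document and returns the id assigned to it
    public int AddDocument(string path, string contents)
    {
        int docId = storedPaths.Count;
        storedPaths.Add(path);

        // Crude tokenization (an assumption): lowercase, split on non-alphanumerics
        foreach (string term in Regex.Split(contents.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+"))
        {
            if (term.Length == 0) continue;
            if (!postings.TryGetValue(term, out SortedSet<int> docs))
            {
                docs = new SortedSet<int>();
                postings[term] = docs;
            }
            docs.Add(docId);
        }
        return docId;
    }

    // Single-term lookup: one dictionary access instead of rescanning every document
    public IEnumerable<string> Search(string term)
    {
        if (!postings.TryGetValue(term.ToLowerInvariant(), out SortedSet<int> docs))
            return Enumerable.Empty<string>();
        return docs.Select(id => storedPaths[id]);
    }
}
```

For example, after adding "a.txt" ("Lucene is a search library") and "b.txt" ("Search engines use inverted indexes"), Search("search") yields a.txt and b.txt, while Search("Lucene") yields only a.txt. Keeping the path alongside the postings loosely mirrors Field.Store.YES: it is what lets a matching document id be turned back into something displayable.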

Something like the following can then be used to search the index:

using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

class SearchFiles
{
    static void Main(string[] args)
    {
        string index = "index";    // directory written by the indexer
        string field = "contents"; // default field for query terms

        IndexReader reader = IndexReader.Open(index);
        Searcher searcher = new IndexSearcher(reader);
        // Use the same analyzer at search time as at index time
        Analyzer analyzer = new StandardAnalyzer();

        // Read queries from standard input as UTF-8
        StreamReader input = new StreamReader(Console.OpenStandardInput(), Encoding.UTF8);

        QueryParser parser = new QueryParser(field, analyzer);

        while (true)
        {
            Console.Out.Write("Search: ");
            string line = input.ReadLine();

            // An empty line or end of input ends the session
            if (line == null || line.Length == 0)
                break;

            Query query = parser.Parse(line);
            Console.WriteLine("Searching for: " + query.ToString(field));

            // Hits are returned in order of decreasing relevance
            Hits hits = searcher.Search(query);
            Console.WriteLine(hits.Length() + " total matching documents");

            for (int hitIdx = 0; hitIdx < hits.Length(); hitIdx++)
            {
                Document doc = hits.Doc(hitIdx);
                string path = doc.Get("path");
                if (path != null)
                {
                    Console.WriteLine((hitIdx + 1) + ". " + path);

                    // The indexer above adds no "title" field, so this line
                    // only prints for indexes that include one
                    string title = doc.Get("title");
                    if (title != null)
                    {
                        Console.WriteLine("   Title: " + title);
                    }
                }
                else
                {
                    Console.WriteLine((hitIdx + 1) + ". " + "No path for this document");
                }
            }
        }

        // Close the index only after the last query has been run
        reader.Close();
    }
}
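
The search program relies on QueryParser to turn the user's text into a Query (by default, multiple terms are combined with OR) and on Hits to return matches ordered by relevance. The sketch below imitates that behavior over an in-memory collection using only the base class library. Its score is loosely inspired by the tf and idf factors of Lucene's classic DefaultSimilarity (sqrt(freq) and ln(numDocs / (docFreq + 1)) + 1); norms, coordination, boosts and the other factors of Lucene's real scoring are omitted, and the class ToyRankedSearch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy ranked search: OR semantics over query terms, results sorted by
// a simplified tf-idf score. Not Lucene's actual scoring formula.
public static class ToyRankedSearch
{
    // Crude tokenization (an assumption): lowercase, split on non-alphanumerics
    static string[] Tokenize(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(t => t.Length > 0)
             .ToArray();

    // Returns (path, score) pairs for documents matching any query term, best first
    public static List<KeyValuePair<string, double>> Search(
        IDictionary<string, string> docs, string query)
    {
        // Tokenize each document once up front
        var tokens = docs.ToDictionary(kv => kv.Key, kv => Tokenize(kv.Value));
        int numDocs = docs.Count;
        var scores = new Dictionary<string, double>();

        foreach (string term in Tokenize(query).Distinct())
        {
            // Document frequency: how many documents contain this term
            int docFreq = tokens.Count(entry => entry.Value.Contains(term));
            if (docFreq == 0) continue;

            // Rarer terms get a larger weight
            double idf = Math.Log(numDocs / (double)(docFreq + 1)) + 1.0;

            foreach (var doc in tokens)
            {
                int freq = doc.Value.Count(t => t == term);
                if (freq == 0) continue;
                scores.TryGetValue(doc.Key, out double s); // s is 0 if absent
                scores[doc.Key] = s + Math.Sqrt(freq) * idf;
            }
        }

        // Highest score first; ties broken by path for a stable order
        return scores.OrderByDescending(kv => kv.Value)
                     .ThenBy(kv => kv.Key, StringComparer.Ordinal)
                     .ToList();
    }
}
```

With documents "a.txt" ("lucene lucene search"), "b.txt" ("search engines") and "c.txt" ("cooking recipes"), the query "Lucene search" matches a.txt and b.txt, with a.txt ranked first because it contains both terms and the rarer term twice; c.txt is not returned at all.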

Similar Posts

  1. Cuyahoga's Search Implementation
  2. Breaking Apart at the Seams
  3. Project Management Software