zsmeijin
6/16/2019 - 2:01 PM

Lucene Configuration and Usage

[Spring Lucene] #Database #Spring

Maven

<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-queryparser</artifactId>
  <version>7.1.0</version>
</dependency>

Initialization

public void init() throws IOException {
    String indexpath = environment.getProperty("lucene.indexpath");
    logger.info("Initializing Lucene service, loading local index files from " + indexpath + " ...");
    File file = new File(indexpath);
    if (!file.exists()) {
      file.mkdirs();
    }
    directory = FSDirectory.open(Paths.get(indexpath));
    indexWriterConfig = new IndexWriterConfig(); // no-arg constructor uses StandardAnalyzer
    indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
    indexWriter = new IndexWriter(directory, indexWriterConfig);
    // Building the SearcherManager from the writer enables near-real-time search.
    // Searchers are acquired per query (see isNew()), not once here.
    searcherManager = new SearcherManager(indexWriter, new SearcherFactory());
    timer = new Timer();
    timerTask = new updateIndex(indexWriter, searcherManager);
    // No initial delay, then repeat every 10 ms (Timer periods are in milliseconds)
    timer.schedule(timerTask, 0, 10);
    logger.info("Lucene service initialized");
}

Inserting a record

// StringField indexes each id as a single untokenized term, suitable for exact-match TermQuery
public void insert(String id1, String id2) throws IOException {
    Document doc = new Document();
    doc.add(new StringField("id1", id1, Field.Store.YES));
    doc.add(new StringField("id2", id2, Field.Store.YES));
    // Add the document to the index; searches see it after the next refresh
    indexWriter.addDocument(doc);
}

Querying for a record

// Returns true if no indexed document matches both id1 and id2
private boolean isNew(String id1, String id2) throws IOException {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    builder.add(new TermQuery(new Term("id1", id1)), BooleanClause.Occur.MUST);
    builder.add(new TermQuery(new Term("id2", id2)), BooleanClause.Occur.MUST);
    BooleanQuery query = builder.build();

    // Acquire a searcher for this query only, and always release it, even if search() throws
    IndexSearcher indexSearcher = searcherManager.acquire();
    try {
      TopDocs result = indexSearcher.search(query, 1);
      return result.totalHits == 0;
    } finally {
      searcherManager.release(indexSearcher);
    }
}
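Every searcherManager.acquire() must be matched by exactly one release(), because the searcher's reader is reference-counted and is closed when the count reaches zero. Acquiring once in init() but releasing on every isNew() call decrements the count repeatedly, so the reader is soon closed while the manager still hands it out. The stdlib-only sketch below is a simplified, hypothetical model of that contract (`RefCounted` and `withAcquired` are illustrative names, not Lucene API, and unlike SearcherManager it is not thread-safe):

```java
import java.util.function.Function;

/**
 * Hypothetical, single-threaded model of a reference-counted searcher.
 * The count starts at 1 for the manager's own reference; at 0 the
 * underlying resource is considered closed.
 */
final class RefCounted<T> {
    private final T value;
    private int refs = 1;

    RefCounted(T value) {
        this.value = value;
    }

    T acquire() {
        if (refs <= 0) {
            throw new IllegalStateException("already closed");
        }
        refs++;
        return value;
    }

    void release() {
        refs--; // at 0, a real reader would close its index files here
    }

    int refs() {
        return refs;
    }

    boolean isClosed() {
        return refs <= 0;
    }

    /** The isNew() pattern: acquire, use, and release in finally. */
    <R> R withAcquired(Function<T, R> use) {
        T v = acquire();
        try {
            return use.apply(v);
        } finally {
            release();
        }
    }
}
```

With paired calls the count returns to its starting value after every query, including queries that throw; a single unmatched release() closes the resource.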

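Shutdown

public void end() throws IOException {
    // Stop the scheduled task first so it no longer commits/refreshes during shutdown
    // (Timer.cancel() does not interrupt a task that is already running)
    timerTask.cancel();
    timer.cancel();
    indexWriter.commit();
    indexWriter.close();
    searcherManager.close();
    logger.info("Lucene service stopped, indexWriter closed");
}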

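Index refresh

Documents added with indexWriter.addDocument() are not immediately visible to searches: a searcher sees the index only as of the moment its reader was opened. indexWriter.commit() makes changes durable on disk, and a refresh opens a new searcher that sees them; with a near-real-time SearcherManager built from the IndexWriter, searcherManager.maybeRefresh() makes new documents searchable even before they are committed. Committing and refreshing after every insert keeps searches consistent with writes, but commit() syncs index files to disk, which is costly when done per document.
A common compromise is a scheduled task that commits and refreshes periodically. searcherManager.maybeRefresh() opens a new searcher only when the index has actually changed, so ticks with no new documents do not reopen anything.
For maybeRefresh() to pick up the writer's uncommitted changes, create the SearcherManager from the indexWriter rather than from the Directory: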

searcherManager = new SearcherManager(indexWriter, new SearcherFactory());
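maybeRefresh() skips reopening when nothing changed, but a periodic indexWriter.commit() still runs on every tick. One way to skip that as well is a dirty flag set after each insert and cleared by the task. The stdlib-only sketch below assumes hypothetical names (`DirtyTrackingRefresher`, `markDirty()`, `tick()`), with a `Runnable` standing in for `indexWriter.commit(); searcherManager.maybeRefresh();`. IndexWriter also provides hasUncommittedChanges(), which could serve as the check instead of a hand-maintained flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: run the commit/refresh only when something was
 * inserted since the previous tick.
 */
final class DirtyTrackingRefresher {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final Runnable commitAndRefresh; // stands in for commit() + maybeRefresh()

    DirtyTrackingRefresher(Runnable commitAndRefresh) {
        this.commitAndRefresh = commitAndRefresh;
    }

    /** Call after each indexWriter.addDocument(...). */
    void markDirty() {
        dirty.set(true);
    }

    /** Timer task body; returns true if a commit/refresh was performed. */
    boolean tick() {
        // Clear the flag before committing: an insert that races with the
        // commit sets it again and is picked up on the next tick.
        if (!dirty.getAndSet(false)) {
            return false; // nothing new since the last tick
        }
        try {
            commitAndRefresh.run();
            return true;
        } catch (RuntimeException e) {
            dirty.set(true); // keep the flag so the next tick retries
            throw e;
        }
    }
}
```

Any number of inserts between two ticks results in a single commit.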

The scheduled index-update task:

class updateIndex extends TimerTask {
    private IndexWriter indexWriter;
    private SearcherManager searcherManager;

    public updateIndex(IndexWriter indexWriter, SearcherManager searcherManager){
      this.indexWriter = indexWriter;
      this.searcherManager = searcherManager;
    }

    @Override
    public void run(){
      process();
    }

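    // Commit and refresh from a separate scheduled task: doing it on every insert costs too much
    private void process(){
      try {
        if(indexWriter.isOpen()){
          indexWriter.commit();
          searcherManager.maybeRefresh();
        }
      } catch (Exception e) {
        // Catch everything: an exception escaping run() would kill the Timer thread.
        // Typically this is the writer being closed while the task runs.
        logger.info("Scheduled commit/refresh skipped (indexWriter may be closed): " + e);
      }
    }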
}
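The try/catch in process() matters because java.util.Timer runs all tasks on one thread: if a TimerTask's run() throws, that thread terminates and no scheduled task runs again. The stdlib-only sketch below shows the same wiring as init()/end() with that guard; `PeriodicRefresher` is a hypothetical name, and the `Runnable` stands in for the commit/refresh calls.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical wrapper around the Timer wiring used in init()/end(). */
final class PeriodicRefresher {
    // Daemon thread so a forgotten stop() does not keep the JVM alive
    private final Timer timer = new Timer("lucene-refresh", true);
    private final Runnable refreshAction; // stands in for commit() + maybeRefresh()

    PeriodicRefresher(Runnable refreshAction) {
        this.refreshAction = refreshAction;
    }

    /** Runs refreshAction after delayMs, then every periodMs (milliseconds). */
    void start(long delayMs, long periodMs) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    refreshAction.run();
                } catch (RuntimeException e) {
                    // Without this catch the Timer thread would die here
                    // and every later run would silently stop.
                    System.err.println("refresh failed: " + e);
                }
            }
        }, delayMs, periodMs);
    }

    /** Stops future runs; call before closing the IndexWriter. */
    void stop() {
        timer.cancel();
    }
}
```

Because of the guard, a failure on one tick (for example, a commit racing with shutdown) only skips that tick.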

References

https://segmentfault.com/a/1190000003101607
https://segmentfault.com/a/1190000011916639