{"id":571,"date":"2018-11-07T10:25:17","date_gmt":"2018-11-07T01:25:17","guid":{"rendered":"https:\/\/dong1lkim.oboki.net\/?p=571"},"modified":"2019-09-01T22:20:24","modified_gmt":"2019-09-01T13:20:24","slug":"elasticsearch-lucene","status":"publish","type":"post","link":"https:\/\/oboki.net\/workspace\/data-engineering\/elasticsearch\/elasticsearch-lucene\/","title":{"rendered":"[ElasticSearch] Lucene"},"content":{"rendered":"<h1>Lucene<\/h1>\n<blockquote><p>\n  OpenSource \uac80\uc0c9 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub85c\uc11c \ud558\ub461 \uac1c\ubc1c\uc790\ub85c \uc798 \uc54c\ub824\uc9c4 Doug Cutting\uc774 \uac1c\ubc1c\ud588\ub2e4. Lucene\uc774\ub77c\ub294 \uc774\ub984\uc740 \uadf8\uc758 \uc544\ub0b4 middle name \uc744 \ub530\uc11c \uc9c0\uc5c8\ub2e4\uace0.\n<\/p><\/blockquote>\n<p>Lucene\uc740 <a href=\"https:\/\/blog.naver.com\/ndb796\/220870218783\">Levenshtein distance<\/a>\uc5d0 \uae30\ubc18\ud55c <a href=\"https:\/\/whatis.techtarget.com\/definition\/fuzzy-search\">fuzzy search<\/a> \uae30\ub2a5\uae4c\uc9c0 \uc788\ub294 \uac80\uc0c9 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub85c \uac80\uc0c9 \ub2a5\ub825\uc774 \ub6f0\uc5b4\ub098\ub2e4\uace0 \ud55c\ub2e4.<\/p>\n<h2>Lucene-demo<\/h2>\n<ul>\n<li><a href=\"http:\/\/lucene.apache.org\/\">http:\/\/lucene.apache.org\/<\/a><\/li>\n<li><a href=\"http:\/\/mirror.navercorp.com\/apache\/lucene\/java\/7.5.0\/lucene-7.5.0.tgz\">apache\/lucene\/java\/7.5.0<\/a><\/li>\n<\/ul>\n<p>\uc704 \uacbd\ub85c\uc5d0\uc11c Lucene \ucd5c\uc2e0 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub97c \ub2e4\uc6b4\ub85c\ub4dc \ubc1b\uc73c\uba74 <code>demo\/lucene-demo-7.5.0.jar<\/code> \ub370\ubaa8 \ud074\ub798\uc2a4\uac00 \uc788\ub2e4.<\/p>\n<p>IndexFiles\ub97c \uc774\uc6a9\ud558\uba74 \ud30c\uc77c \uc2dc\uc2a4\ud15c\uc758 \ud2b9\uc815 \uacbd\ub85c \ub0b4\uc758 \ubaa8\ub4e0 \ubb38\uc11c(text,txt,html \ub4f1)\uc758 \ub0b4\uc6a9\uc744 \uc778\ub371\uc2f1\ud560 \uc218 \uc788\uace0 SearchFiles\ub97c \uc774\uc6a9\ud558\uba74 \ubb38\uc11c \ub0b4\uc758 
\ud0a4\uc6cc\ub4dc\ub4e4\uc744 \uc774\uc6a9\ud574\uc11c \ud30c\uc77c \uc704\uce58\ub97c \ucc3e\uc544\ub0bc \uc218 \uc788\ub2e4.<\/p>\n<h3>Lucene \ub77c\uc774\ube0c\ub7ec\ub9ac \uacbd\ub85c\uc758 \ubb38\uc11c \uc0c9\uc778\ud558\uae30<\/h3>\n<p>\uc544\ub798\uc640 \uac19\uc774 <code>IndexFiles<\/code>\ub97c \uc774\uc6a9\ud574\uc11c <code>Lucene<\/code> \uacbd\ub85c\uc758 \ubaa8\ub4e0 \ubb38\uc11c\ub97c <code>index<\/code> \ub77c\ub294 \uc778\ub371\uc2a4\ub85c \uc0dd\uc131\ud55c\ub2e4.<\/p>\n<p><code>java -cp Lucene\/7.5.0\/demo\/lucene-demo-7.5.0.jar:Lucene\/7.5.0\/core\/lucene-core-7.5.0.jar org.apache.lucene.demo.IndexFiles -index index -docs Lucene<\/code><\/p>\n<p>\ub2e4\uc74c\uacfc \uac19\uc774 \ubaa8\ub4e0 \ubb38\uc11c\ub4e4\uc744 \uc77d\uc5b4\ub4e4\uc774\uace0 \uc0c9\uc778\ud568.<\/p>\n<pre><code class=\"log\">Indexing to directory 'index'...\nadding Lucene\/lucene-7.5.0.tgz\nadding Lucene\/7.5.0\/licenses\/ant-1.8.2.jar.sha1\nadding Lucene\/7.5.0\/licenses\/ant-LICENSE-ASL.txt\n..\n.\nadding Lucene\/5.5.5\/suggest\/lucene-suggest-5.5.5.jar\n35647 total milliseconds\n<\/code><\/pre>\n<h4>\uc0c9\uc778\ub41c \ubb38\uc11c\uc5d0\uc11c &#8220;Apache Lucene is a high-performance, full-featured text search engine library.&#8221; \ubb38\uc790\uc5f4 \ucc3e\uae30<\/h4>\n<p>\ub2e4\uc74c\uacfc \uac19\uc774 <code>index<\/code> \uc778\ub371\uc2a4\ub97c \uc870\ud68c\ud558\uc5ec &quot;Apache Lucene is a high-performance, full-featured text search engine library.&quot; \ubb38\uc790\uc5f4\uc744 \uac80\uc0c9\ud55c\ub2e4.<\/p>\n<p><code>java -cp Lucene\/7.5.0\/queryparser\/lucene-queryparser-7.5.0.jar:Lucene\/7.5.0\/demo\/lucene-demo-7.5.0.jar:Lucene\/7.5.0\/core\/lucene-core-7.5.0.jar org.apache.lucene.demo.SearchFiles -query &quot;Apache Lucene is a high-performance, full-featured text search engine library.&quot;<\/code><\/p>\n<p>\uac80\uc0c9 \uacb0\uacfc 12967 \uac74\uc758 \uc77c\uce58\ud558\ub294 \ubb38\uc11c\uac00 \ubc1c\uacac\ub418\ub294\ub370 ..<\/p>\n<pre><code 
class=\"log\">Searching for: apache lucene high performance full featured text search engine library\n12967 total matching documents\n1. Lucene\/5.5.5\/docs\/core\/overview-summary.html\n2. Lucene\/7.5.0\/docs\/core\/overview-summary.html\n3. Lucene\/7.5.0\/docs\/index.html\n4. Lucene\/5.5.5\/docs\/index.html\n5. Lucene\/7.5.0\/README.txt\n6. Lucene\/5.5.5\/README.txt\n7. Lucene\/7.5.0\/docs\/highlighter\/org\/apache\/lucene\/search\/vectorhighlight\/package-summary.html\n8. Lucene\/5.5.5\/docs\/highlighter\/org\/apache\/lucene\/search\/vectorhighlight\/package-summary.html\n9. Lucene\/5.5.5\/CHANGES.txt\n10. Lucene\/5.5.5\/licenses\/javax.servlet-LICENSE-CDDL.txt\n<\/code><\/pre>\n<p>\uc704\uc640 \uac19\uc774 \uac80\uc0c9 \uc870\uac74\uc5d0 \ubd80\ud569\ud558\ub294 \uac83\ub4e4 \uc911\uc5d0 top 10 \uacb0\uacfc\ub9cc \ucd9c\ub825\ud574\uc900\ub2e4.<\/p>\n<p>\uc544\ub798\uc640 \uac19\uc774  \uba85\ub839\uc73c\ub85c 1\uc21c\uc704 \ubb38\uc11c\uc5d0 \ud574\ub2f9 \ubb38\uc790\uc5f4\uc744 \ucc3e\uc544\ubcf4\uba74<br \/>\n<code>grep &quot;Apache Lucene is a high-performance, full-featured text search engine library.&quot; Lucene\/5.5.5\/docs\/core\/overview-summary.html<\/code><\/p>\n<p>\ud574\ub2f9 \ubb38\uc790\uc5f4\uc774 1\uc21c\uc704 \ubb38\uc11c\uc5d0\uc11c \uc815\ud655\ud788 \ubc1c\uacac\ub41c\ub2e4.<\/p>\n<pre><code class=\"bash\">&lt;div class=\"block\"&gt;Apache Lucene is a high-performance, full-featured text search engine library.&lt;\/div&gt;\n&lt;div class=\"block\"&gt;&lt;p&gt;Apache Lucene is a high-performance, full-featured text search engine library.\n<\/code><\/pre>\n<p>\ub2e4\uc74c \uba85\ub839\uc73c\ub85c 3\uc21c\uc704 \ubb38\uc11c\uc5d0 \ud574\ub2f9 \ubb38\uc790\uc5f4\uc744 \ucc3e\uc544\ubcf4\uba74<br \/>\n<code>grep &quot;Apache Lucene is a high-performance, full-featured text search engine library.&quot; Lucene\/7.5.0\/docs\/index.html<\/code><\/p>\n<p>\uc548 \ub098\uc628\ub2e4.<\/p>\n<pre><code class=\"bash\"><br 
\/><\/code><\/pre>\n<p>\ub300\uc2e0 <code>Apache Lucene<\/code>\uc774\ub77c\ub294 \ubb38\uc790\uc5f4\ub9cc 3\uc21c\uc704 \ubb38\uc11c\uc5d0\uc11c \ucc3e\uc73c\uba74<br \/>\n<code>grep &quot;Apache Lucene&quot; Lucene\/7.5.0\/docs\/index.html<\/code><\/p>\n<p>\ub2e4\uc74c\uacfc \uac19\uc774 \uc870\ud68c\ub41c\ub2e4.<\/p>\n<pre><code class=\"bash\">&lt;title&gt;Apache Lucene 7.5.0 Documentation&lt;\/title&gt;\n&lt;a href=\"http:\/\/lucene.apache.org\/core\/\"&gt;&lt;img src=\"lucene_green_300.gif\" title=\"Apache Lucene Logo\" alt=\"Lucene\" border=\"0\"&gt;&lt;\/a&gt;\n&lt;h1&gt;Apache Lucene&lt;span style=\"vertical-align: top; font-size: x-small\"&gt;TM&lt;\/span&gt; 7.5.0 Documentation&lt;\/h1&gt;\n          This is the official documentation for &lt;b&gt;Apache Lucene 7.5.0&lt;\/b&gt;. Additional documentation is available in the\n        audiences: first-time users looking to install Apache Lucene in their\n<\/code><\/pre>\n<p>Exact-Match\uac00 \uc544\ub2cc, Full-Text Search \uae30\ubc95\uc744 \uc774\uc6a9\ud574\uc11c \ubb38\uc11c \ub0b4\uc6a9\uc744 \uc0c9\uc778\ud55c \uac83 \uac19\ub2e4. \uc815\ud655\ud558\uac8c\ub294 Lucene-demo \uc18c\uc2a4\ub97c \uc77d\uc5b4\ubcf4\uace0, Lucene-core \ud074\ub798\uc2a4\ub97c \uacf5\ubd80\ud574\ubd10\uc57c\ud560 \uac83 \uac19\ub2e4.<\/p>\n<h5>Exact-Match<\/h5>\n<ul>\n<li>SELECT SQL ( WHERE COL LIKE &#8216;%A%&#8217; )<\/li>\n<li>grep<\/li>\n<\/ul>\n<p>\uc704\uc640 \uac19\uc774 Exact-Match\ub294 \uacf5\ubc31\uc744 \ud3ec\ud568\ud558\uc5ec \ubaa8\ub4e0 \ub2e8\uc5b4\uc640 \uadf8 \ub2e8\uc5b4\uc758 \ubc30\uc5f4 \uc21c\uc11c\uac00 \uc644\uc804\ud788 \uc77c\uce58\ud558\ub294 \uac83\ub4e4\uc744 \ucc3e\uc544\ub0bc \uc218 \uc788\ub2e4. 
It is precise, but requires a full scan of every record.<\/p>\n<h5>Full-Text Search<\/h5>\n<ul>\n<li>email search<\/li>\n<li>search engines ( Google | Naver )<\/li>\n<\/ul>\n<p>Full-text search first determines which words occur in the documents, then indexes which documents each word appeared in.<\/p>\n<h2>Lucene-demo Source code<\/h2>\n<h3>IndexFiles.java<\/h3>\n<pre><code class=\"java\">\/*\n * Licensed to the Apache Software Foundation (ASF) under one or more\n * contributor license agreements.  See the NOTICE file distributed with\n * this work for additional information regarding copyright ownership.\n * The ASF licenses this file to You under the Apache License, Version 2.0\n * (the \"License\"); you may not use this file except in compliance with\n * the License.  
You may obtain a copy of the License at\n *\n *     http:\/\/www.apache.org\/licenses\/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an \"AS IS\" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n *\/\npackage org.apache.lucene.demo;\n\nimport java.io.BufferedReader;\nimport java.io.IOException;\nimport java.io.InputStream;\nimport java.io.InputStreamReader;\nimport java.nio.charset.StandardCharsets;\nimport java.nio.file.FileVisitResult;\nimport java.nio.file.Files;\nimport java.nio.file.Path;\nimport java.nio.file.Paths;\nimport java.nio.file.SimpleFileVisitor;\nimport java.nio.file.attribute.BasicFileAttributes;\nimport java.util.Date;\n\nimport org.apache.lucene.analysis.Analyzer;\nimport org.apache.lucene.analysis.standard.StandardAnalyzer;\nimport org.apache.lucene.document.LongPoint;\nimport org.apache.lucene.document.Document;\nimport org.apache.lucene.document.Field;\nimport org.apache.lucene.document.StringField;\nimport org.apache.lucene.document.TextField;\nimport org.apache.lucene.index.IndexWriter;\nimport org.apache.lucene.index.IndexWriterConfig.OpenMode;\nimport org.apache.lucene.index.IndexWriterConfig;\nimport org.apache.lucene.index.Term;\nimport org.apache.lucene.store.Directory;\nimport org.apache.lucene.store.FSDirectory;\n\n\/** Index all text files under a directory.\n * &lt;p&gt;\n * This is a command-line application demonstrating simple Lucene indexing.\n * Run it with no command-line arguments for usage information.\n *\/\npublic class IndexFiles {\n\n  private IndexFiles() {}\n\n  \/** Index all text files under a directory. 
*\/\n  public static void main(String[] args) {\n    String usage = \"java org.apache.lucene.demo.IndexFiles\"\n                 + \" [-index INDEX_PATH] [-docs DOCS_PATH] [-update]\\n\\n\"\n                 + \"This indexes the documents in DOCS_PATH, creating a Lucene index\"\n                 + \"in INDEX_PATH that can be searched with SearchFiles\";\n    String indexPath = \"index\";\n    String docsPath = null;\n    boolean create = true;\n    for(int i=0;i&lt;args.length;i++) {\n      if (\"-index\".equals(args[i])) {\n        indexPath = args[i+1];\n        i++;\n      } else if (\"-docs\".equals(args[i])) {\n        docsPath = args[i+1];\n        i++;\n      } else if (\"-update\".equals(args[i])) {\n        create = false;\n      }\n    }\n\n    if (docsPath == null) {\n      System.err.println(\"Usage: \" + usage);\n      System.exit(1);\n    }\n\n    final Path docDir = Paths.get(docsPath);\n    if (!Files.isReadable(docDir)) {\n      System.out.println(\"Document directory '\" +docDir.toAbsolutePath()+ \"' does not exist or is not readable, please check the path\");\n      System.exit(1);\n    }\n\n    Date start = new Date();\n    try {\n      System.out.println(\"Indexing to directory '\" + indexPath + \"'...\");\n\n      Directory dir = FSDirectory.open(Paths.get(indexPath));\n      Analyzer analyzer = new StandardAnalyzer();\n      IndexWriterConfig iwc = new IndexWriterConfig(analyzer);\n\n      if (create) {\n        \/\/ Create a new index in the directory, removing any\n        \/\/ previously indexed documents:\n        iwc.setOpenMode(OpenMode.CREATE);\n      } else {\n        \/\/ Add new documents to an existing index:\n        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);\n      }\n\n      \/\/ Optional: for better indexing performance, if you\n      \/\/ are indexing many documents, increase the RAM\n      \/\/ buffer.  
But if you do this, increase the max heap\n      \/\/ size to the JVM (eg add -Xmx512m or -Xmx1g):\n      \/\/\n      \/\/ iwc.setRAMBufferSizeMB(256.0);\n\n      IndexWriter writer = new IndexWriter(dir, iwc);\n      indexDocs(writer, docDir);\n\n      \/\/ NOTE: if you want to maximize search performance,\n      \/\/ you can optionally call forceMerge here.  This can be\n      \/\/ a terribly costly operation, so generally it's only\n      \/\/ worth it when your index is relatively static (ie\n      \/\/ you're done adding documents to it):\n      \/\/\n      \/\/ writer.forceMerge(1);\n\n      writer.close();\n\n      Date end = new Date();\n      System.out.println(end.getTime() - start.getTime() + \" total milliseconds\");\n\n    } catch (IOException e) {\n      System.out.println(\" caught a \" + e.getClass() +\n       \"\\n with message: \" + e.getMessage());\n    }\n  }\n\n  \/**\n   * Indexes the given file using the given writer, or if a directory is given,\n   * recurses over files and directories found under the given directory.\n   * \n   * NOTE: This method indexes one document per input file.  This is slow.  For good\n   * throughput, put multiple documents into your input file(s).  
An example of this is\n   * in the benchmark module, which can create \"line doc\" files, one document per line,\n   * using the\n   * &lt;a href=\"..\/..\/..\/..\/..\/contrib-benchmark\/org\/apache\/lucene\/benchmark\/byTask\/tasks\/WriteLineDocTask.html\"\n   * &gt;WriteLineDocTask&lt;\/a&gt;.\n   *  \n   * @param writer Writer to the index where the given file\/dir info will be stored\n   * @param path The file to index, or the directory to recurse into to find files to index\n   * @throws IOException If there is a low-level I\/O error\n   *\/\n  static void indexDocs(final IndexWriter writer, Path path) throws IOException {\n    if (Files.isDirectory(path)) {\n      Files.walkFileTree(path, new SimpleFileVisitor&lt;Path&gt;() {\n        @Override\n        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {\n          try {\n            indexDoc(writer, file, attrs.lastModifiedTime().toMillis());\n          } catch (IOException ignore) {\n            \/\/ don't index files that can't be read.\n          }\n          return FileVisitResult.CONTINUE;\n        }\n      });\n    } else {\n      indexDoc(writer, path, Files.getLastModifiedTime(path).toMillis());\n    }\n  }\n\n  \/** Indexes a single document *\/\n  static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException {\n    try (InputStream stream = Files.newInputStream(file)) {\n      \/\/ make a new, empty document\n      Document doc = new Document();\n\n      \/\/ Add the path of the file as a field named \"path\".  Use a\n      \/\/ field that is indexed (i.e. 
searchable), but don't tokenize \n      \/\/ the field into separate words and don't index term frequency\n      \/\/ or positional information:\n      Field pathField = new StringField(\"path\", file.toString(), Field.Store.YES);\n      doc.add(pathField);\n\n      \/\/ Add the last modified date of the file a field named \"modified\".\n      \/\/ Use a LongPoint that is indexed (i.e. efficiently filterable with\n      \/\/ PointRangeQuery).  This indexes to milli-second resolution, which\n      \/\/ is often too fine.  You could instead create a number based on\n      \/\/ year\/month\/day\/hour\/minutes\/seconds, down the resolution you require.\n      \/\/ For example the long value 2011021714 would mean\n      \/\/ February 17, 2011, 2-3 PM.\n      doc.add(new LongPoint(\"modified\", lastModified));\n\n      \/\/ Add the contents of the file to a field named \"contents\".  Specify a Reader,\n      \/\/ so that the text of the file is tokenized and indexed, but not stored.\n      \/\/ Note that FileReader expects the file to be in UTF-8 encoding.\n      \/\/ If that's not the case searching for special characters will fail.\n      doc.add(new TextField(\"contents\", new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))));\n\n      if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {\n        \/\/ New index, so we just add the document (no old document can be there):\n        System.out.println(\"adding \" + file);\n        writer.addDocument(doc);\n      } else {\n        \/\/ Existing index (an old copy of this document may have been indexed) so \n        \/\/ we use updateDocument instead to replace the old one matching the exact \n        \/\/ path, if present:\n        System.out.println(\"updating \" + file);\n        writer.updateDocument(new Term(\"path\", file.toString()), doc);\n      }\n    }\n  }\n}\n<\/code><\/pre>\n<h3>SearchFiles.java<\/h3>\n<pre><code class=\"java\">\/*\n * Licensed to the Apache Software Foundation 
(ASF) under one or more\n * contributor license agreements.  See the NOTICE file distributed with\n * this work for additional information regarding copyright ownership.\n * The ASF licenses this file to You under the Apache License, Version 2.0\n * (the \"License\"); you may not use this file except in compliance with\n * the License.  You may obtain a copy of the License at\n *\n *     http:\/\/www.apache.org\/licenses\/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an \"AS IS\" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n *\/\npackage org.apache.lucene.demo;\n\nimport java.io.BufferedReader;\nimport java.io.IOException;\nimport java.io.InputStreamReader;\nimport java.nio.charset.StandardCharsets;\nimport java.nio.file.Files;\nimport java.nio.file.Paths;\nimport java.util.Date;\n\nimport org.apache.lucene.analysis.Analyzer;\nimport org.apache.lucene.analysis.standard.StandardAnalyzer;\nimport org.apache.lucene.document.Document;\nimport org.apache.lucene.index.DirectoryReader;\nimport org.apache.lucene.index.IndexReader;\nimport org.apache.lucene.queryparser.classic.QueryParser;\nimport org.apache.lucene.search.IndexSearcher;\nimport org.apache.lucene.search.Query;\nimport org.apache.lucene.search.ScoreDoc;\nimport org.apache.lucene.search.TopDocs;\nimport org.apache.lucene.store.FSDirectory;\n\n\/** Simple command-line based search demo. *\/\npublic class SearchFiles {\n\n  private SearchFiles() {}\n\n  \/** Simple command-line based search demo. 
*\/\n  public static void main(String[] args) throws Exception {\n    String usage =\n      \"Usage:\\tjava org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-query string] [-raw] [-paging hitsPerPage]\\n\\nSee http:\/\/lucene.apache.org\/core\/4_1_0\/demo\/ for details.\";\n    if (args.length &gt; 0 &amp;&amp; (\"-h\".equals(args[0]) || \"-help\".equals(args[0]))) {\n      System.out.println(usage);\n      System.exit(0);\n    }\n\n    String index = \"index\";\n    String field = \"contents\";\n    String queries = null;\n    int repeat = 0;\n    boolean raw = false;\n    String queryString = null;\n    int hitsPerPage = 10;\n\n    for(int i = 0;i &lt; args.length;i++) {\n      if (\"-index\".equals(args[i])) {\n        index = args[i+1];\n        i++;\n      } else if (\"-field\".equals(args[i])) {\n        field = args[i+1];\n        i++;\n      } else if (\"-queries\".equals(args[i])) {\n        queries = args[i+1];\n        i++;\n      } else if (\"-query\".equals(args[i])) {\n        queryString = args[i+1];\n        i++;\n      } else if (\"-repeat\".equals(args[i])) {\n        repeat = Integer.parseInt(args[i+1]);\n        i++;\n      } else if (\"-raw\".equals(args[i])) {\n        raw = true;\n      } else if (\"-paging\".equals(args[i])) {\n        hitsPerPage = Integer.parseInt(args[i+1]);\n        if (hitsPerPage &lt;= 0) {\n          System.err.println(\"There must be at least 1 hit per page.\");\n          System.exit(1);\n        }\n        i++;\n      }\n    }\n\n    IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));\n    IndexSearcher searcher = new IndexSearcher(reader);\n    Analyzer analyzer = new StandardAnalyzer();\n\n    BufferedReader in = null;\n    if (queries != null) {\n      in = Files.newBufferedReader(Paths.get(queries), StandardCharsets.UTF_8);\n    } else {\n      in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));\n    }\n    
QueryParser parser = new QueryParser(field, analyzer);\n    while (true) {\n      if (queries == null &amp;&amp; queryString == null) {                        \/\/ prompt the user\n        System.out.println(\"Enter query: \");\n      }\n\n      String line = queryString != null ? queryString : in.readLine();\n\n      if (line == null || line.length() == -1) {\n        break;\n      }\n\n      line = line.trim();\n      if (line.length() == 0) {\n        break;\n      }\n\n      Query query = parser.parse(line);\n      System.out.println(\"Searching for: \" + query.toString(field));\n\n      if (repeat &gt; 0) {                           \/\/ repeat &amp; time as benchmark\n        Date start = new Date();\n        for (int i = 0; i &lt; repeat; i++) {\n          searcher.search(query, 100);\n        }\n        Date end = new Date();\n        System.out.println(\"Time: \"+(end.getTime()-start.getTime())+\"ms\");\n      }\n\n      doPagingSearch(in, searcher, query, hitsPerPage, raw, queries == null &amp;&amp; queryString == null);\n\n      if (queryString != null) {\n        break;\n      }\n    }\n    reader.close();\n  }\n\n  \/**\n   * This demonstrates a typical paging search scenario, where the search engine presents \n   * pages of size n to the user. The user can then go to the next page if interested in\n   * the next hits.\n   * \n   * When the query is executed for the first time, then only enough results are collected\n   * to fill 5 result pages. 
If the user wants to page beyond this limit, then the query\n   * is executed another time and all hits are collected.\n   * \n   *\/\n  public static void doPagingSearch(BufferedReader in, IndexSearcher searcher, Query query, \n                                     int hitsPerPage, boolean raw, boolean interactive) throws IOException {\n\n    \/\/ Collect enough docs to show 5 pages\n    TopDocs results = searcher.search(query, 5 * hitsPerPage);\n    ScoreDoc[] hits = results.scoreDocs;\n\n    int numTotalHits = Math.toIntExact(results.totalHits.value);\n    System.out.println(numTotalHits + \" total matching documents\");\n\n    int start = 0;\n    int end = Math.min(numTotalHits, hitsPerPage);\n\n    while (true) {\n      if (end &gt; hits.length) {\n        System.out.println(\"Only results 1 - \" + hits.length +\" of \" + numTotalHits + \" total matching documents collected.\");\n        System.out.println(\"Collect more (y\/n) ?\");\n        String line = in.readLine();\n        if (line.length() == 0 || line.charAt(0) == 'n') {\n          break;\n        }\n\n        hits = searcher.search(query, numTotalHits).scoreDocs;\n      }\n\n      end = Math.min(hits.length, start + hitsPerPage);\n\n      for (int i = start; i &lt; end; i++) {\n        if (raw) {                              \/\/ output raw format\n          System.out.println(\"doc=\"+hits[i].doc+\" score=\"+hits[i].score);\n          continue;\n        }\n\n        Document doc = searcher.doc(hits[i].doc);\n        String path = doc.get(\"path\");\n        if (path != null) {\n          System.out.println((i+1) + \". \" + path);\n          String title = doc.get(\"title\");\n          if (title != null) {\n            System.out.println(\"   Title: \" + doc.get(\"title\"));\n          }\n        } else {\n          System.out.println((i+1) + \". 
\" + \"No path for this document\");\n        }\n\n      }\n\n      if (!interactive || end == 0) {\n        break;\n      }\n\n      if (numTotalHits &gt;= end) {\n        boolean quit = false;\n        while (true) {\n          System.out.print(\"Press \");\n          if (start - hitsPerPage &gt;= 0) {\n            System.out.print(\"(p)revious page, \");  \n          }\n          if (start + hitsPerPage &lt; numTotalHits) {\n            System.out.print(\"(n)ext page, \");\n          }\n          System.out.println(\"(q)uit or enter number to jump to a page.\");\n\n          String line = in.readLine();\n          if (line.length() == 0 || line.charAt(0)=='q') {\n            quit = true;\n            break;\n          }\n          if (line.charAt(0) == 'p') {\n            start = Math.max(0, start - hitsPerPage);\n            break;\n          } else if (line.charAt(0) == 'n') {\n            if (start + hitsPerPage &lt; numTotalHits) {\n              start+=hitsPerPage;\n            }\n            break;\n          } else {\n            int page = Integer.parseInt(line);\n            if ((page - 1) * hitsPerPage &lt; numTotalHits) {\n              start = (page - 1) * hitsPerPage;\n              break;\n            } else {\n              System.out.println(\"No such page\");\n            }\n          }\n        }\n        if (quit) break;\n        end = Math.min(numTotalHits, start + hitsPerPage);\n      }\n    }\n  }\n}\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Lucene OpenSource \uac80\uc0c9 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub85c\uc11c \ud558\ub461 \uac1c\ubc1c\uc790\ub85c \uc798 \uc54c\ub824\uc9c4 Doug Cutting\uc774 \uac1c\ubc1c\ud588\ub2e4. Lucene\uc774\ub77c\ub294 \uc774\ub984\uc740 \uadf8\uc758 \uc544\ub0b4 middle name \uc744 \ub530\uc11c \uc9c0\uc5c8\ub2e4\uace0. 
Lucene\uc740 Levenshtein distance\uc5d0 \uae30\ubc18\ud55c fuzzy search \uae30\ub2a5\uae4c\uc9c0 \uc788\ub294 \uac80\uc0c9 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub85c \uac80\uc0c9 \ub2a5\ub825\uc774 \ub6f0\uc5b4\ub098\ub2e4\uace0 \ud55c\ub2e4. Lucene-demo http:\/\/lucene.apache.org\/ apache\/lucene\/java\/7.5.0 \uc704 \uacbd\ub85c\uc5d0\uc11c Lucene \ucd5c\uc2e0 \ub77c\uc774\ube0c\ub7ec\ub9ac\ub97c \ub2e4\uc6b4\ub85c\ub4dc \ubc1b\uc73c\uba74 demo\/lucene-demo-7.5.0.jar \ub370\ubaa8 \ud074\ub798\uc2a4\uac00 \uc788\ub2e4. IndexFiles\ub97c \uc774\uc6a9\ud558\uba74 \ud30c\uc77c \uc2dc\uc2a4\ud15c\uc758 \ud2b9\uc815 \uacbd\ub85c \ub0b4\uc758 [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[25,29],"class_list":["post-571","post","type-post","status-publish","format-standard","hentry","category-elasticsearch","tag-elasticsearch","tag-lucene"],"_links":{"self":[{"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/posts\/571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/comments?post=571"}],"version-history":[{"count":5,"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/posts\/571\/revisions"}],"predecessor-version":[{"id":1199,"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/posts\/571\/revisions\/1199"}],"wp:attachment":[{"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/media?parent=571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/categories?post=571"},{"taxonomy":"post_tag","embeddable":true,"hre
f":"https:\/\/oboki.net\/workspace\/wp-json\/wp\/v2\/tags?post=571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}