ElasticSearch Configuration and Management

elasticsearch.yml

The parameters below are the ones listed by default in the yml template bundled with the Elasticsearch binary distribution.

Parameter Name	Default Value
cluster.name	my-application
node.name	node-1
node.attr.rack	r1
path.data	/path/to/data
path.logs	/path/to/logs
bootstrap.memory_lock	true
network.host	192.168.0.1
http.port	9200
discovery.zen.ping.unicast.hosts	["host1", "host2"]
discovery.zen.minimum_master_nodes
gateway.recover_after_nodes	3
action.destructive_requires_name	true

cluster.name
The cluster name. Nodes can join a cluster only if they share this name, so make sure that nodes belonging to a different Elasticsearch system do not use the same cluster name.

node.name
Set this to ${HOSTNAME} or to an explicit node name.
Environment variables can be referenced with the ${…} syntax.
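
To make the substitution concrete, here is a minimal Python sketch of how a ${NAME} placeholder could be resolved against the environment. It only illustrates the idea and is not how Elasticsearch implements it; the function name and the placeholder pattern are assumptions made for this example.

```python
import os
import re

# Matches ${NAME} placeholders. The exact pattern is an assumption for this sketch.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value, env):
    """Replace each ${NAME} in value with env[NAME]; fail loudly on unknown names."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _PLACEHOLDER.sub(lookup, value)


if __name__ == "__main__":
    # HOSTNAME is overridden here so the example does not depend on the real machine.
    env = dict(os.environ, HOSTNAME="node-1")
    print(expand_placeholders("${HOSTNAME}", env))
```

Failing on an unset variable, rather than substituting an empty string, keeps a misconfigured node from starting with a blank name.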

path.data
The path where shard data files (primaries and replicas) are stored.
Multiple paths can be listed, as shown below.

path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3

path.logs
The path where the Elasticsearch node writes its logs.

bootstrap.memory_lock

network.host
By default the node binds only to loopback addresses. To form a cluster, set this to the IP address of a NIC that the other cluster nodes can reach.

http.port
The port that listens for HTTP requests.
Accepts a single value or a range.
Default: 9200-9300. When a range is given, the node uses the first port in the range that it manages to bind.

transport.tcp.port
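The port used for communication between nodes.
Accepts a single value or a range.
Default: 9300-9400. When a range is given, the node uses the first port in the range that it manages to bind.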
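
The "first port acquired from a range" rule for both port settings can be sketched in a few lines of Python. This is a simplified model rather than Elasticsearch code: the helper name is hypothetical, and a set of already-used ports stands in for real bind attempts.

```python
def first_available_port(setting, ports_in_use):
    """Return the first free port for a setting such as "9200" or "9200-9300".

    ports_in_use is a set of ports assumed to be taken already; a real node
    would discover this by trying to bind each port in order.
    """
    start, _, end = setting.partition("-")
    low = int(start)
    high = int(end) if end else low
    for port in range(low, high + 1):
        if port not in ports_in_use:
            return port
    raise RuntimeError(f"no free port in {setting}")


if __name__ == "__main__":
    # A second node on the same host finds 9200 taken and moves on to 9201.
    print(first_available_port("9200-9300", {9200}))
```

This is why several nodes started on one host with the default settings end up on consecutive ports.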

discovery.zen.ping.unicast.hosts
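Elasticsearch has a clustering and master-election mechanism called "Zen Discovery".
This setting lists the seed hosts that a node contacts at startup to discover the rest of the cluster. It does not have to include every node, only enough reachable ones to find the cluster.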

discovery.zen.minimum_master_nodes
Prevents a split-brain, the situation in which the cluster breaks into two separate clusters.

https://www.elastic.co/guide/en/elasticsearch/reference/current/discovery-settings.html
To avoid a split brain, this setting should be set to a quorum of master-eligible nodes
(master_eligible_nodes / 2) + 1
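
As a quick check of the quorum formula, the following Python sketch computes the value for a given number of master-eligible nodes. The function name is made up for this example; the division is integer division, as the formula implies.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (master_eligible_nodes / 2) + 1, rounded down."""
    if master_eligible_nodes < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for nodes in (1, 2, 3, 4, 5):
        print(nodes, "->", minimum_master_nodes(nodes))
```

With three master-eligible nodes the quorum is 2, so the cluster can lose one of them and still elect a master, while an isolated single node cannot.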

gateway.recover_after_nodes
Recovery starts only once at least this many nodes have joined the cluster.
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html

action.destructive_requires_name
When this is set to true, the delete index API rejects wildcard expressions and _all; every index to be deleted must be named explicitly.
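
The rule can be pictured with a small Python check. This is a hypothetical helper written for this post, not Elasticsearch's own validation logic, and it only models the wildcard and _all cases mentioned above.

```python
def is_delete_allowed(index_expression, destructive_requires_name):
    """Return True if a delete-index request for index_expression would be accepted.

    Models only the rule described above: with the setting enabled,
    wildcards and _all are rejected and explicit names are accepted.
    """
    if not destructive_requires_name:
        return True
    for part in index_expression.split(","):
        part = part.strip()
        if part == "" or part == "_all" or "*" in part:
            return False
    return True


if __name__ == "__main__":
    print(is_delete_allowed("twitter", True))
    print(is_delete_allowed("twi*", True))
```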

node.attr.rack

node.attr.zone

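These attributes let you define shard allocation rules across nodes located in different racks or zones. With a rack attribute, Elasticsearch places replicas in a different rack from their primary where possible, and falls back to the same rack when it is not.
When zones are specified with forced awareness, the shards of a failed zone are not reallocated to the remaining zones.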
https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html

cluster.routing.allocation.awareness.force.zone.values

cluster.routing.allocation.awareness.attributes
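
The two settings named above are what turn the node attributes into allocation rules. As a rough Python sketch, the body for the cluster settings API could be assembled like this. The helper name, the choice of "persistent", and the zone names zone1 and zone2 are assumptions for illustration only.

```python
import json


def awareness_settings(attribute, forced_values=None):
    """Build a cluster settings body that enables allocation awareness.

    attribute is the node attribute name without the node.attr. prefix
    (for example "rack" or "zone"). forced_values, if given, enables
    forced awareness for that attribute.
    """
    settings = {"cluster.routing.allocation.awareness.attributes": attribute}
    if forced_values:
        key = f"cluster.routing.allocation.awareness.force.{attribute}.values"
        settings[key] = ",".join(forced_values)
    return {"persistent": settings}


if __name__ == "__main__":
    # zone1 and zone2 are placeholder zone names.
    print(json.dumps(awareness_settings("zone", ["zone1", "zone2"]), indent=2))
```

The printed JSON could be sent with a PUT to _cluster/settings, in the same way as the logger example in the next section.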

log4j2.properties

Logger levels can be set in any of the following ways.
https://www.elastic.co/guide/en/elasticsearch/reference/current/logging.html

command-line
e.g. -E logger.org.elasticsearch.transport=trace

elasticsearch.yml

logger.org.elasticsearch.transport: trace

cluster setting API

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"transient": {
  "logger.org.elasticsearch.transport": "trace"
}
}
'

log4j2.properties
Edit this file directly for more fine-grained logging configuration.
```
logger.transport.name = org.elasticsearch.transport
logger.transport.level = trace
```

jvm.options

Option Name	Default Value
-Xms	-Xms1g
-Xmx	-Xmx1g

Index Settings

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html

Static index settings

Below is a list of all static index settings that are not associated with any specific index module:

Parameter Name	Default Value
index.number_of_shards	5
index.shard.check_on_startup	false
index.codec	LZ4
index.routing_partition_size	1

index.number_of_shards
Defines the number of primary shards per index. Default: 5 (in 6.x; from 7.0 onward the default is 1).

index.shard.check_on_startup
Checks shards for corruption when an index is opened (at startup, or when it is reopened after a close).

false
Default. No corruption check is performed.
checksum
Check for physical corruption.
true
Check for both physical and logical corruption.
fix
Check for both physical and logical corruption. Segments that were reported as corrupted will be automatically removed. This option may result in data loss. Use with extreme caution!

index.codec
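The compression codec for stored data.

default (setting left unset)
LZ4 compression
best_compression
DEFLATE, for a higher compression ratio at the cost of slower stored-field performance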

index.routing_partition_size
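When documents are indexed with a custom routing value, this setting is the number of shards that a single routing value can map to.
With the default of 1, every document with the same routing value lands on one shard, which keeps routed searches narrow but can leave shards unevenly loaded. A larger value spreads each routing value over that many shards, so a routed search still touches only a subset of the shards rather than all of them.
Default: 1. The value must be smaller than index.number_of_shards unless index.number_of_shards is also 1.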
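
The Elasticsearch documentation describes the shard choice for a partitioned index as shard_num = (hash(_routing) + hash(_id) % routing_partition_size) % num_primary_shards. The Python sketch below plays that formula through. Note the assumption: zlib.crc32 is used here as a stand-in hash, whereas Elasticsearch uses its own hash function, so the actual shard numbers will differ.

```python
import zlib


def _hash(value):
    # Stand-in hash for illustration only; not the hash Elasticsearch uses.
    return zlib.crc32(value.encode("utf-8"))


def shard_for(routing, doc_id, routing_partition_size, num_primary_shards):
    """Pick a shard from the routing value and the document id."""
    offset = _hash(doc_id) % routing_partition_size
    return (_hash(routing) + offset) % num_primary_shards


if __name__ == "__main__":
    # All documents share the routing value "user1" (a made-up example value).
    shards = {shard_for("user1", str(i), 2, 5) for i in range(100)}
    print(sorted(shards))
```

With a partition size of 2 and 5 primary shards, documents routed by the same value fall on at most 2 shards, so a search with that routing value has to visit only those.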

Dynamic index settings

Below is a list of all dynamic index settings that are not associated with any specific index module:

Parameter Name	Default Value
index.number_of_replicas	1
index.auto_expand_replicas	false
index.refresh_interval	1s
index.max_result_window	10000
index.max_inner_result_window	100
index.max_rescore_window	10000
index.max_docvalue_fields_search	100
index.max_script_fields	32
index.max_ngram_diff	1
index.max_shingle_diff	3
index.blocks.read_only	false
index.blocks.read_only_allow_delete	false
index.blocks.read	false
index.blocks.write	false
index.blocks.metadata	false
index.max_refresh_listeners
index.highlight.max_analyzed_offset	-1
index.max_terms_count	65536
index.routing.allocation.enable	all
index.routing.rebalance.enable	all
index.gc_deletes	60s

index.number_of_replicas
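Default: 1.
Can be changed dynamically through the API. When data has to be indexed quickly (for example, during an initial bulk load), it helps to set this to 0 for the load and raise it again later, when the cluster is under little load.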

curl -X PUT "localhost:9200/twitter/_settings" -H 'Content-Type: application/json' -d'
{
    "index" : {
        "number_of_replicas" : 2
    }
}
'
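
The same request can be prepared from Python with only the standard library. The sketch below builds the PUT request without sending it; the helper name is hypothetical, and the base URL and index name are simply the ones from the curl example.

```python
import json
import urllib.request


def replica_request(base_url, index, replicas):
    """Build (but do not send) a PUT request that sets index.number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Before a bulk load: drop replicas. Afterwards: restore them.
    before = replica_request("http://localhost:9200", "twitter", 0)
    after = replica_request("http://localhost:9200", "twitter", 2)
    print(before.get_method(), before.full_url, before.data.decode())
    print(after.get_method(), after.full_url, after.data.decode())
```

Passing either request to urllib.request.urlopen would send it to a running cluster.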

index.auto_expand_replicas
Setting this to "0-all" appears to create a replica on every node. The auto-expanded replica count reportedly ignores shard allocation awareness rules; this still needs to be tested after the rack and zone attributes are configured.
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index-modules.html#_static_index_settings

false
Default.
range
A lower and upper bound separated by a dash, e.g. 0-5 or 0-all.

index.refresh_interval
How often a refresh runs on the index, making recent changes visible to search. A document indexed just now becomes searchable only after the next refresh.
Default: 1s.
Set to -1 to disable refresh.

index.max_result_window
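Default: 10000.
A search fails with an error when from + size exceeds this value, which mainly affects deep pagination. Rather than raising the limit when paging runs into it, the documentation recommends the Scroll or Search After features.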

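For index.max_result_window, the from + size limit and the Search After alternative can be sketched in Python as follows. The helper names and the sort fields (timestamp, tie_breaker_id) are hypothetical; the sketch only builds request bodies and does not talk to a cluster.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def exceeds_result_window(from_, size, max_result_window=MAX_RESULT_WINDOW):
    """True if a from/size page would be rejected by the result window limit."""
    return from_ + size > max_result_window


def search_after_body(size, sort, last_sort_values=None):
    """Build a search body that pages with search_after instead of from.

    last_sort_values are the sort values of the last hit on the previous
    page; omit them for the first page.
    """
    body = {"size": size, "sort": sort}
    if last_sort_values is not None:
        body["search_after"] = last_sort_values
    return body


if __name__ == "__main__":
    print(exceeds_result_window(9990, 10))
    print(exceeds_result_window(9991, 10))
    sort = [{"timestamp": "asc"}, {"tie_breaker_id": "asc"}]
    print(search_after_body(10, sort, [1514764800000, "doc-10"]))
```

Because each page is anchored on the previous page's last sort values, the depth of the page never counts against the window.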
index.max_inner_result_window
The maximum value of from + size for inner hits definition and top hits aggregations to this index. Defaults to 100. Inner hits and top hits aggregation take heap memory and time proportional to from + size and this limits that memory.

index.max_rescore_window
The maximum value of window_size for rescore requests in searches of this index. Defaults to index.max_result_window which defaults to 10000. Search requests take heap memory and time proportional to max(window_size, from + size) and this limits that memory.

index.max_docvalue_fields_search
The maximum number of docvalue_fields that are allowed in a query. Defaults to 100. Doc-value fields are costly since they might incur a per-field per-document seek.

index.max_script_fields
The maximum number of script_fields that are allowed in a query. Defaults to 32.

index.max_ngram_diff
The maximum allowed difference between min_gram and max_gram for NGramTokenizer and NGramTokenFilter. Defaults to 1.

index.max_shingle_diff
The maximum allowed difference between max_shingle_size and min_shingle_size for ShingleTokenFilter. Defaults to 3.

index.blocks.read_only
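false (default) -> true: puts the index into a read-only state. (Comparable to the Warm state in AnyMiner?)

index.blocks.read_only_allow_delete
false (default) -> true: puts the index into a read-only state but still allows the index to be deleted.

index.blocks.read
false (default) -> true: disables read operations against the index.

index.blocks.write
false (default) -> true: disables write operations against the index. Unlike read_only, this does not affect the index metadata.

index.blocks.metadata
false (default) -> true: disables reads and writes of the index metadata.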

index.max_refresh_listeners
Maximum number of refresh listeners available on each shard of the index. These listeners are used to implement refresh=wait_for.

index.highlight.max_analyzed_offset
The maximum number of characters that will be analyzed for a highlight request. This setting is only applicable when highlighting is requested on a text that was indexed without offsets or term vectors. In 6.x this setting is unset by default, which corresponds to -1.

index.max_terms_count
The maximum number of terms that can be used in Terms Query. Defaults to 65536.

index.routing.allocation.enable
Controls shard allocation for this index. It can be set to:

all (default) – Allows shard allocation for all shards.
primaries – Allows shard allocation only for primary shards.
new_primaries – Allows shard allocation only for newly-created primary shards.
none – No shard allocation is allowed.

index.routing.rebalance.enable
Enables shard rebalancing for this index. It can be set to:

all (default) – Allows shard rebalancing for all shards.
primaries – Allows shard rebalancing only for primary shards.
replicas – Allows shard rebalancing only for replica shards.
none – No shard rebalancing is allowed.

index.gc_deletes
The length of time that a deleted document’s version number remains available for further versioned operations. Defaults to 60s.

delay allocation

https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html

Advanced cluster configuration

http://kimjmin.net/2018/01/2018-01-build-es-cluster-5/
