[Fluentd] Introduction to Fluentd

Implementation language

Written in CRuby. Ruby is an object-oriented, interpreted language.
https://www.ruby-lang.org/ko/about/

CRuby vs. JRuby

The default Ruby, the one people think of as "just Ruby," is CRuby.

JRuby is a Ruby interpreter written in Java. It’s written and maintained by a different team. It focuses hard on performance – especially for long-running servers like web servers. It’s far better for concurrency, especially multithreading. The garbage collection is more advanced, but JRuby uses far more memory and has much longer startup time. Don’t write your tiny command-line apps in it! It also takes more warmup to get to full speed. JRuby has great compatibility with Java libraries, but has more trouble with the C libraries CRuby is good with. It’s basically a whole different language project that happens to interpret exactly the same source code.
http://engineering.appfolio.com/appfolio-engineering/2017/12/28/cruby-mri-jruby-rubyspec-rubinius-yarv-a-little-bit-of-ruby-naming

Fluentd vs. Logstash

https://www.loomsystems.com/blog/single-post/2017/01/30/a-comparison-of-fluentd-vs-logstash-log-collector

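Data processing model

Fluentd structures data as JSON wherever possible and runs collection, filtering, buffering, and loading on those records. This makes it applicable to a wide range of data sources and targets. However, when all that is needed is to read a text file and pass it along (for example, reading a CSV file and storing it in Hadoop as CSV), the conversion to JSON and back to CSV is unnecessary work, and performance appears to suffer.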
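
To make the CSV case above concrete, here is a minimal Python sketch of the two paths: passing a line through untouched, and converting it to a JSON record and back. It does not model how Fluentd handles events internally. The function names, field names, and sample line are all made up for illustration.

```python
import csv
import io
import json


def passthrough(line):
    """Forward the CSV line untouched (no structuring at all)."""
    return line


def via_json(line, fields):
    """Parse a CSV line into a record, serialize it as JSON, then format it back to CSV."""
    values = next(csv.reader([line]))
    record = dict(zip(fields, values))        # structured event
    record = json.loads(json.dumps(record))   # JSON round trip (the extra work)
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow([record[f] for f in fields])
    return out.getvalue()


# Hypothetical field names and sample line.
fields = ["time", "level", "message"]
line = "2019-01-01T00:00:00,INFO,started"

# Both paths produce the same output; the second one just does more work.
same_output = via_json(line, fields) == passthrough(line)
```

The output is identical either way, so for a pure CSV-in, CSV-out job the structuring step only adds cost.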

Advantages

The open community provides many plugins, so a wide range of data sources and storage backends can be used. However, some plugins do not seem to work properly with the latest version, so you may need to manage plugin versions yourself.
File-based and memory-based buffering helps guard against inter-node data loss. In addition, the in/out forward plugins let you arrange collectors in various topologies and tiers.
Each Fluentd agent needs one config file. The configuration syntax is intuitive, and setup is easy once you know each plugin's parameters. fluentd-ui lets you edit the configuration, monitor the Fluentd logs, and start/stop the Fluentd daemon.
It seems lighter than collectors that run on Java. It does depend on ruby and openssl, though, and I am not sure whether they are installed by default on real service/production systems.
The https://www.fluentd.org/ site has well-organized guides & recipes for the core plugins.

Architecture and core components

Input

HTTP, tail, TCP/UDP, etc.

Parser

Optional. Used when the data collected by an input plugin is not in a form that Fluentd can use directly.
Parses the data into JSON using plugins such as regular expression, apache, nginx, and syslog.
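
As a rough illustration of what a regular-expression parser does, the Python sketch below turns one access-log-style line into a JSON-ready record. The pattern is a simplified, hypothetical one written for this example. It is not the pattern used by Fluentd's apache or nginx parsers.

```python
import json
import re

# Hypothetical, simplified pattern for a common-log-style access line.
ACCESS_LOG = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])  # cast the status code to a number
    return record


line = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
record = parse_line(line)
as_json = json.dumps(record)  # the structured form that later stages work on
```

Lines that do not match return None, so the caller decides whether to drop them or route them elsewhere.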

Filter

row filter
With the grep plugin, only records whose given field matches a given pattern are collected.
column add
Information such as the log source or the host where the log was generated can be added to the log data.
column del
Unnecessary fields can be removed.
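
The three filter operations above can be sketched in a few lines of Python. This only mirrors the idea of row filtering and adding or deleting columns on JSON-like records. The helper names, the sample records, and the host name "web-01" are assumptions made for the example, not Fluentd APIs.

```python
import re


def grep(records, key, pattern):
    """Row filter: keep only records whose `key` field matches `pattern`."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(str(r.get(key, "")))]


def add_fields(record, extra):
    """Column add: return a copy of the record with extra fields merged in."""
    return {**record, **extra}


def remove_fields(record, keys):
    """Column del: return a copy of the record without the given fields."""
    return {k: v for k, v in record.items() if k not in keys}


# Hypothetical sample records.
records = [
    {"level": "INFO", "message": "started", "debug_id": "a1"},
    {"level": "ERROR", "message": "disk full", "debug_id": "b2"},
]

errors = grep(records, "level", r"ERROR")
result = [
    remove_fields(add_fields(r, {"hostname": "web-01"}), {"debug_id"})
    for r in errors
]
```

Each helper returns a new record instead of mutating its input, which keeps the stages easy to chain in any order.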

Formatter

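Not needed when elasticsearch is the destination, because the data is always stored as JSON there. For destinations whose output format can be changed, such as file, hdfs, and stdout, the formatter controls the shape of the data inside the text output.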

csv
json
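
A small Python sketch of the same idea: one record rendered either as a JSON line or as a CSV line, depending on the formatter chosen. The function names and the sample record are hypothetical.

```python
import csv
import io
import json


def format_json(record):
    """Render the record as a single JSON line."""
    return json.dumps(record, ensure_ascii=False)


def format_csv(record, fields):
    """Render the selected fields, in the given order, as a single CSV line."""
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow([record.get(f, "") for f in fields])
    return out.getvalue()


# Hypothetical sample record.
record = {"host": "web-01", "code": 200, "path": "/index.html"}

json_line = format_json(record)
csv_line = format_csv(record, ["host", "code", "path"])
```

Note that the CSV form needs an explicit field order, while the JSON form carries the field names with every record.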

Buffer

Flush interval setting
Path where temp files are created when the file buffer is used
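
To show how a flush interval and a buffer file path work together, here is a deliberately simplified Python sketch. It appends records to a file and hands them to a sink once the interval has elapsed. It is not Fluentd's buffer implementation: the class name, its parameters, and the injectable clock are assumptions made so the example stays small and deterministic.

```python
import json
import os
import tempfile
import time


class FileBuffer:
    """Toy file-backed buffer that flushes to `sink` after `flush_interval` seconds."""

    def __init__(self, path, flush_interval, sink, clock=time.monotonic):
        self.path = path                    # where the temp chunk file lives
        self.flush_interval = flush_interval
        self.sink = sink                    # callable that receives a list of records
        self.clock = clock
        self.last_flush = clock()

    def append(self, record):
        # Persist first, so buffered records survive a process restart.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                chunk = [json.loads(line) for line in f]
            if chunk:
                self.sink(chunk)
            os.remove(self.path)
        self.last_flush = self.clock()


# Usage with a fake clock so the timing is deterministic.
delivered = []
now = [0.0]
path = os.path.join(tempfile.mkdtemp(), "buffer.chunk")
buf = FileBuffer(path, flush_interval=10, sink=delivered.extend, clock=lambda: now[0])

buf.append({"n": 1})   # interval not reached: the record stays in the file
pending_after_first = list(delivered)
now[0] = 10.0
buf.append({"n": 2})   # interval reached: both records go to the sink
```

A real buffer also has to handle retries and failed deliveries, which this sketch leaves out.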

Output

stdout/file/elasticsearch/webhdfs/s3/mongo/mysql…
Plugins are available for a wide range of storage backends. The parameters differ from plugin to plugin.