chore: sync local changes
@@ -1,111 +1,197 @@
|
||||
# ClickHouse + Fluent Bit 使用手册(Ubuntu 22.04 / Amazon Linux 2023)
|
||||
# ClickHouse + Fluent Bit 快速部署(Ubuntu 22.04 / Amazon Linux 2023)
|
||||
|
||||
## 1. 支持范围
|
||||
## 1. 脚本说明
|
||||
|
||||
- Ubuntu 22.04
|
||||
- Amazon Linux 2023(AWS)
|
||||
|
||||
安装脚本:`install_clickhouse_linux.sh`(自动识别上述系统)。
|
||||
|
||||
## 2. 安装 ClickHouse
|
||||
- `setup_clickhouse.sh`:一键入口(推荐),默认顺序执行 安装 ClickHouse -> 配置 HTTPS -> 应用运行参数 -> 初始化日志表。
|
||||
- `install_clickhouse_linux.sh`:安装 `clickhouse-server`、`clickhouse-client`,并启动服务。
|
||||
- `configure_clickhouse_https.sh`:生成自签名 `server.crt + server.key`,写入 HTTPS 配置并重启服务。
|
||||
- `configure_clickhouse_runtime.sh`:默认将日志级别设为 `warning`,并禁用高开销系统日志表(`text_log`、`part_log`、`metric_log`、`asynchronous_metric_log`、`trace_log`)。
|
||||
- `init_waf_logs_tables.sh`:执行建表脚本。
|
||||
- `init_waf_logs_tables.sql`:`logs_ingest`、`dns_logs_ingest` 表结构定义。
|
||||
|
||||
进入脚本所在目录
|
||||
```bash
|
||||
cd /path/to/waf-platform/deploy/clickhouse
|
||||
chmod +x install_clickhouse_linux.sh
|
||||
sudo ./install_clickhouse_linux.sh
|
||||
cd /opt/waf-platform/deploy/clickhouse
|
||||
chmod +x setup_clickhouse.sh
|
||||
```
|
||||
|
||||
可选:安装时初始化 `default` 用户密码:
|
||||
## 2. 一键部署
|
||||
|
||||
### 2.1 方式A:不设置 ClickHouse 密码(用户名固定 `default`)
|
||||
|
||||
```bash
|
||||
sudo CLICKHOUSE_DEFAULT_PASSWORD='YourStrongPassword' ./install_clickhouse_linux.sh
|
||||
```
|
||||
|
||||
## 3. Enable HTTPS (crt + key only by default)

By default the script generates `server.crt` and `server.key` (with SAN) and enables port 8443:
|
||||
|
||||
```bash
|
||||
cd /path/to/waf-platform/deploy/clickhouse
|
||||
chmod +x configure_clickhouse_https.sh
|
||||
sudo CH_HTTPS_PORT=8443 \
|
||||
CH_CERT_CN=clickhouse.example.com \
|
||||
CH_CERT_DNS=clickhouse.example.com \
|
||||
CH_CERT_IP=<CLICKHOUSE_IP> \
|
||||
./configure_clickhouse_https.sh
|
||||
```
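The helper that assembles the SAN list (`build_san_line` in `configure_clickhouse_https.sh`) is not shown in this diff. The following is a minimal sketch of how such a helper might turn `CH_CERT_CN`, `CH_CERT_DNS`, and `CH_CERT_IP` into a `subjectAltName` value. The function name `sketch_san_line`, the `localhost` fallback, and the rule that skips a DNS entry equal to the CN are assumptions for illustration, not the script's actual code.

```bash
# Hypothetical sketch, not the real build_san_line from the script.
# The body runs in a subshell "( ... )" so IFS and "set -f" do not leak.
sketch_san_line() (
  cn="${CH_CERT_CN:-localhost}"
  san="DNS:${cn}"
  set -f      # keep entries such as "*.example.com" literal (no globbing)
  IFS=','     # split the comma-separated lists
  for item in ${CH_CERT_DNS:-}; do
    # Skip empty items and a DNS name that only repeats the CN.
    if [ -n "${item}" ] && [ "${item}" != "${cn}" ]; then
      san="${san},DNS:${item}"
    fi
  done
  for item in ${CH_CERT_IP:-}; do
    if [ -n "${item}" ]; then
      san="${san},IP:${item}"
    fi
  done
  printf '%s\n' "${san}"
)
```

The result can be placed after `subjectAltName=` in an OpenSSL extension file. Note that the sketch does not trim spaces around commas.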
|
||||
|
||||
使用已有证书:
|
||||
|
||||
```bash
|
||||
sudo SRC_CERT=/path/to/server.crt \
|
||||
SRC_KEY=/path/to/server.key \
|
||||
CH_HTTPS_PORT=8443 \
|
||||
./configure_clickhouse_https.sh
|
||||
```
|
||||
|
||||
## 4. 初始化日志表(含优化)
|
||||
|
||||
```bash
|
||||
cd /path/to/waf-platform/deploy/clickhouse
|
||||
chmod +x init_waf_logs_tables.sh
|
||||
sudo CH_HOST=127.0.0.1 \
|
||||
CH_PORT=9000 \
|
||||
CH_USER=default \
|
||||
CH_PASSWORD='YourStrongPassword' \
|
||||
CH_DATABASE=default \
|
||||
./init_waf_logs_tables.sh
|
||||
sudo ./setup_clickhouse.sh
|
||||
```
|
||||
|
||||
说明:
|
||||
- `init_waf_logs_tables.sql` 已内置主要优化(`CODEC`、`LowCardinality`、跳数索引)。
|
||||
- `optimize_schema.sql` 主要用于历史表补齐优化,不是首次建表必需步骤。
|
||||
- ClickHouse 连接用户是 `default`
|
||||
- 未设置密码时,后续平台连接密码留空
|
||||
|
||||
## 5. 平台侧配置(EdgeAdmin)
|
||||
### 2.2 方式B:设置用户名/密码(示例使用 `default`)
|
||||
|
||||
在 ClickHouse 设置页配置:
|
||||
```bash
|
||||
sudo CH_USER='default' \
|
||||
CH_PASSWORD='YourStrongPassword' \
|
||||
CH_DATABASE='default' \
|
||||
./setup_clickhouse.sh
|
||||
```
|
||||
|
||||
- Host:ClickHouse 地址
|
||||
- Port:`8443`
|
||||
- Database:`default`
|
||||
- Scheme:`https`
|
||||
说明:
|
||||
- `CH_USER`/`CH_PASSWORD`:初始化日志表时用于连接 ClickHouse
|
||||
- 如果你使用自定义用户,把 `CH_USER` 改为你的用户名,并保证该用户已有对应数据库权限
|
||||
|
||||
当前实现说明:
|
||||
- 前端不再提供 `TLS跳过校验` 和 `TLS Server Name` 配置项。
|
||||
- 后端固定 `TLSSkipVerify=true`(默认不校验证书)。
|
||||
可选:单独应用运行参数(日志级别/系统日志表开关):
|
||||
|
||||
保存后点击“测试连接”。
|
||||
```bash
|
||||
sudo CH_LOG_LEVEL=warning ./setup_clickhouse.sh runtime
|
||||
```
|
||||
|
||||
## 6. Fluent Bit 配置方式
|
||||
## 3. ClickHouse 安装后关键目录
|
||||
|
||||
推荐平台托管模式(在线安装/升级 Node、DNS 时自动下发):
|
||||
- 配置目录:`/etc/clickhouse-server/`
|
||||
- 客户端配置目录:`/etc/clickhouse-client/`
|
||||
- 数据目录:`/var/lib/clickhouse/`
|
||||
- 日志目录:`/var/log/clickhouse-server/`
|
||||
- HTTPS 覆盖配置:`/etc/clickhouse-server/config.d/waf-https.xml`
|
||||
- 运行参数覆盖配置:`/etc/clickhouse-server/config.d/waf-runtime.xml`
|
||||
- HTTPS 证书和私钥:`/etc/clickhouse-server/server.crt`、`/etc/clickhouse-server/server.key`
|
||||
- 证书生成中间文件目录:`/etc/clickhouse-server/pki/`
|
||||
|
||||
- `/etc/fluent-bit/fluent-bit.conf`
|
||||
- `/etc/fluent-bit/.edge-managed.env`
|
||||
- `/etc/fluent-bit/.edge-managed.json`
|
||||
## 4. 管理平台配置(EdgeAdmin)
|
||||
|
||||
检查状态:
|
||||
页面路径:
|
||||
- 左侧菜单:`系统设置` -> `高级设置`
|
||||
- 顶部标签:`日志数据库(ClickHouse)`
|
||||
|
||||
表单填写:
|
||||
- `连接地址(Host)`:ClickHouse 地址(IP 或域名),如 `10.0.0.8` 或 `clickhouse.example.com`
|
||||
- `协议(Scheme)`:`https`
|
||||
- `端口(Port)`:`8443`
|
||||
- `用户名(User)`:`default`(或你自定义的用户名)
|
||||
- `密码(Password)`:对应用户密码
|
||||
- `数据库(Database)`:`default`(或你初始化日志表时使用的库名)
|
||||
|
||||
提交顺序:
|
||||
1. 点“测试连接”
|
||||
2. 连接成功后点“保存”
|
||||
|
||||
## 5. Fluent Bit(两种方式)
|
||||
|
||||
### 5.1 跟随节点在线自动安装(推荐)
|
||||
|
||||
说明:
|
||||
- Node / DNS 在线安装或升级时,平台会自动安装/升级 Fluent Bit 并下发配置。
|
||||
- 默认由平台托管,不需要逐台手改配置文件。
|
||||
|
||||
安装后所在节点关键文件:
|
||||
- `/etc/fluent-bit/fluent-bit.conf`:Fluent Bit 主配置(输入日志路径、输出 ClickHouse、性能参数)。
|
||||
- `/etc/fluent-bit/parsers.conf`:日志解析器定义(当前主要使用 JSON parser)。
|
||||
- `/etc/fluent-bit/.edge-managed.env`:平台下发的 ClickHouse 认证环境变量(`CH_USER`/`CH_PASSWORD`)。
|
||||
- `/etc/fluent-bit/.edge-managed.json`:平台下发的元数据(角色、配置哈希、版本、更新时间)。
|
||||
|
||||
|
||||
说明:
|
||||
- 在线安装时,节点上的 `/etc/fluent-bit/fluent-bit.conf` 会被平台下发覆盖。
|
||||
|
||||
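How ClickHouse credentials are delivered to Fluent Bit and updated:
- Source: the account and password saved on the `日志数据库(ClickHouse)` (Log Database) page of the admin platform.
- Destination: during online install or upgrade, the platform writes `CH_USER` and `CH_PASSWORD` to `/etc/fluent-bit/.edge-managed.env` on the node.
- Update trigger: after the ClickHouse account or password changes in the platform, trigger a node install/upgrade task so the new credentials are delivered.
- Common problem: if the password is changed only on the ClickHouse side and the platform configuration is not updated, Fluent Bit fails authentication (401/unauthorized).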
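When authentication fails, a quick first step is to confirm that the managed env file on the node actually holds both keys. The sketch below is a hypothetical helper (`check_managed_env` is not part of the platform) and assumes the file uses plain `KEY=VALUE` lines. It prints the user name only and never echoes the password.

```bash
# Hypothetical helper: verify that the managed env file has both keys.
# Assumes plain KEY=VALUE lines; runs in a subshell so "exit" only ends the function.
check_managed_env() (
  env_file="${1:-/etc/fluent-bit/.edge-managed.env}"
  if [ ! -r "${env_file}" ]; then
    echo "missing: ${env_file}"
    exit 1
  fi
  status=0
  for key in CH_USER CH_PASSWORD; do
    if ! grep -q "^${key}=" "${env_file}"; then
      echo "missing key: ${key}"
      status=1
    fi
  done
  # Show the user name only; the password is never printed.
  user="$(sed -n 's/^CH_USER=//p' "${env_file}" | head -n 1)"
  echo "CH_USER=${user}"
  exit "${status}"
)
```

Run it as root on the node, for example `sudo sh -c '. ./check.sh; check_managed_env'` if the function is saved in a file named `check.sh` (the file name is illustrative), then compare the printed user with the one saved in the platform.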
|
||||
|
||||
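Tuning for larger machines (the current defaults are sized for 4C8G):
- Current defaults: `Flush=1`, `storage.backlog.mem_limit=512MB`, `Mem_Buf_Limit=256MB`, `workers=2`.
- After upgrading the machine, adjust these four parameters first:
  - `storage.backlog.mem_limit`: overall buffer cap (raise it first to lower the risk of dropping logs when a burst piles up).
  - `Mem_Buf_Limit`: in-memory buffer of each tail input (change it in both the HTTP and DNS sections).
  - `workers`: number of concurrent output writer threads (change it in both the HTTP and DNS sections).
  - `Flush`: flush/send interval (smaller values are closer to real time but cost more CPU and network).
- For 8C16G, use `deploy/fluent-bit/fluent-bit-sample-8c16g.conf` as a reference:
  - `storage.backlog.mem_limit=1024MB`
  - `Mem_Buf_Limit=512MB`
  - `workers=4`
  - `Refresh_Interval=1`
- How to change them:
  1. Edit `renderManagedConfig()` in `EdgeAPI/internal/installers/fluent_bit.go`.
  2. Apply the parameters above to both the Node and DNS `[INPUT]` and `[OUTPUT]` sections.
  3. Re-release the API and trigger a node install/upgrade task to deliver the new configuration.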
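For side-by-side comparison, the two parameter sets listed above can be kept in a small lookup. This is only a sketch: the function name `fluent_bit_profile` and the profile labels `4c8g` / `8c16g` are made up here, and the values are copied from the lists above rather than derived from any sizing rule.

```bash
# Hypothetical lookup of the two documented parameter sets.
fluent_bit_profile() {
  case "${1:-4c8g}" in
    4c8g)
      printf '%s\n' \
        "Flush=1" \
        "storage.backlog.mem_limit=512MB" \
        "Mem_Buf_Limit=256MB" \
        "workers=2"
      ;;
    8c16g)
      printf '%s\n' \
        "storage.backlog.mem_limit=1024MB" \
        "Mem_Buf_Limit=512MB" \
        "workers=4" \
        "Refresh_Interval=1"
      ;;
    *)
      echo "unknown profile: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `fluent_bit_profile 8c16g` prints the 8C16G reference values one per line, which makes it easy to diff them against a rendered `fluent-bit.conf`.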
|
||||
|
||||
检查:
|
||||
|
||||
```bash
|
||||
sudo systemctl status fluent-bit --no-pager
|
||||
sudo cat /etc/fluent-bit/.edge-managed.json
|
||||
sudo journalctl -u fluent-bit -n 100 --no-pager
|
||||
```
|
||||
|
||||
## 7. 验证与排障
|
||||
### 5.2 手动安装(自动安装失败时)
|
||||
|
||||
查看 Fluent Bit 日志:
|
||||
说明:
|
||||
- 适合节点在线自动安装 Fluent Bit 失败的场景。
|
||||
- 采用在线安装方式,由你手动安装并维护配置。
|
||||
|
||||
步骤:
|
||||
|
||||
1. 在线安装 Fluent Bit。
|
||||
|
||||
Ubuntu 22.04:
|
||||
|
||||
```bash
|
||||
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
|
||||
sudo apt-get update -y
|
||||
sudo apt-get install -y fluent-bit
|
||||
```
|
||||
|
||||
Amazon Linux 2023:
|
||||
|
||||
```bash
|
||||
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
|
||||
sudo dnf makecache -y
|
||||
sudo dnf install -y fluent-bit
|
||||
```
|
||||
|
||||
2. 放置配置文件:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /etc/fluent-bit
|
||||
sudo cp /opt/waf-platform/deploy/fluent-bit/fluent-bit.conf /etc/fluent-bit/
|
||||
sudo cp /opt/waf-platform/deploy/fluent-bit/clickhouse-upstream.conf /etc/fluent-bit/
|
||||
sudo cp /opt/waf-platform/deploy/fluent-bit/parsers.conf /etc/fluent-bit/
|
||||
```
|
||||
|
||||
3. 修改 `/etc/fluent-bit/clickhouse-upstream.conf` 的 ClickHouse `Host`、`Port`(如 `8443`)。
|
||||
4. 配置认证环境变量(按需):
|
||||
|
||||
```bash
|
||||
sudo tee /etc/fluent-bit/fluent-bit.env >/dev/null <<'EOF'
|
||||
CH_USER=default
|
||||
CH_PASSWORD=YourStrongPassword
|
||||
EOF
|
||||
```
|
||||
|
||||
5. 让 systemd 读取环境变量:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /etc/systemd/system/fluent-bit.service.d
|
||||
sudo tee /etc/systemd/system/fluent-bit.service.d/override.conf >/dev/null <<'EOF'
|
||||
[Service]
|
||||
EnvironmentFile=/etc/fluent-bit/fluent-bit.env
|
||||
EOF
|
||||
```
|
||||
|
||||
6. 启动并检查:
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable fluent-bit
|
||||
sudo systemctl restart fluent-bit
|
||||
sudo systemctl status fluent-bit --no-pager
|
||||
sudo journalctl -u fluent-bit -n 100 --no-pager
|
||||
```
|
||||
|
||||
## 6. 验证
|
||||
|
||||
```bash
|
||||
sudo journalctl -u fluent-bit -f
|
||||
```
|
||||
|
||||
查看写入:
|
||||
|
||||
```sql
|
||||
SELECT count() FROM default.logs_ingest;
|
||||
SELECT count() FROM default.dns_logs_ingest;
|
||||
```
|
||||
|
||||
Common errors:
- `connection refused`: port 8443 is not listening, or the network does not allow the traffic.
- `legacy Common Name`: the certificate has no SAN and must be re-issued.
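The mapping from error text to likely cause can be written down as a small triage helper. This is a sketch under assumptions: `explain_fluent_bit_error` is a hypothetical name, the matched substrings are the ones quoted in this document, and `status=401` is a guess at how the log line may be worded.

```bash
# Hypothetical triage helper: map a log line to the hint documented above.
explain_fluent_bit_error() {
  case "$1" in
    *"connection refused"*)
      echo "port 8443 is not listening or the network path is blocked"
      ;;
    *"legacy Common Name"*)
      echo "certificate has no SAN; re-issue it with subjectAltName"
      ;;
    *[Uu]nauthorized*|*"status=401"*)
      # "status=401" is an assumed wording, adjust to the real log text.
      echo "node credentials do not match ClickHouse; re-sync them from the platform"
      ;;
    *)
      echo "no known hint"
      ;;
  esac
}
```

A possible use is `explain_fluent_bit_error "$(sudo journalctl -u fluent-bit -n 20 --no-pager)"`, which checks the most recent log lines against the known patterns.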
|
||||
|
||||
@@ -46,18 +46,12 @@ CH_CERT_CN="${CH_CERT_CN:-$(hostname -f 2>/dev/null || hostname)}"
|
||||
CH_CERT_DNS="${CH_CERT_DNS:-}"
|
||||
CH_CERT_IP="${CH_CERT_IP:-}"
|
||||
CH_CERT_DAYS="${CH_CERT_DAYS:-825}"
|
||||
CH_GENERATE_CA="${CH_GENERATE_CA:-false}"
|
||||
|
||||
SRC_CERT="${SRC_CERT:-}"
|
||||
SRC_KEY="${SRC_KEY:-}"
|
||||
SRC_CA="${SRC_CA:-}"
|
||||
|
||||
CH_DIR="/etc/clickhouse-server"
|
||||
CH_CONFIG_D_DIR="${CH_DIR}/config.d"
|
||||
PKI_DIR="${CH_DIR}/pki"
|
||||
SERVER_CERT="${CH_DIR}/server.crt"
|
||||
SERVER_KEY="${CH_DIR}/server.key"
|
||||
CA_CERT="${CH_DIR}/ca.crt"
|
||||
OVERRIDE_FILE="${CH_CONFIG_D_DIR}/waf-https.xml"
|
||||
|
||||
mkdir -p "${CH_CONFIG_D_DIR}" "${PKI_DIR}"
|
||||
@@ -117,72 +111,13 @@ EOF
|
||||
|
||||
cp -f "${server_crt}" "${SERVER_CERT}"
|
||||
cp -f "${server_key}" "${SERVER_KEY}"
|
||||
rm -f "${CA_CERT}"
|
||||
}
|
||||
|
||||
generate_cert_with_ca() {
|
||||
echo "[INFO] generating local CA and server certificate ..."
|
||||
local ca_key="${PKI_DIR}/ca.key"
|
||||
local ca_crt="${PKI_DIR}/ca.crt"
|
||||
local server_key="${PKI_DIR}/server.key"
|
||||
local server_csr="${PKI_DIR}/server.csr"
|
||||
local server_crt="${PKI_DIR}/server.crt"
|
||||
local ext_file="${PKI_DIR}/server.ext"
|
||||
local san_line
|
||||
san_line="$(build_san_line)"
|
||||
|
||||
openssl genrsa -out "${ca_key}" 4096
|
||||
openssl req -x509 -new -nodes -key "${ca_key}" -sha256 -days 3650 \
|
||||
-out "${ca_crt}" -subj "/CN=ClickHouse Local CA"
|
||||
|
||||
openssl genrsa -out "${server_key}" 2048
|
||||
openssl req -new -key "${server_key}" -out "${server_csr}" -subj "/CN=${CH_CERT_CN}"
|
||||
|
||||
cat >"${ext_file}" <<EOF
|
||||
subjectAltName=${san_line}
|
||||
keyUsage=digitalSignature,keyEncipherment
|
||||
extendedKeyUsage=serverAuth
|
||||
EOF
|
||||
|
||||
openssl x509 -req -in "${server_csr}" -CA "${ca_crt}" -CAkey "${ca_key}" -CAcreateserial \
|
||||
-out "${server_crt}" -days "${CH_CERT_DAYS}" -sha256 -extfile "${ext_file}"
|
||||
|
||||
cp -f "${server_crt}" "${SERVER_CERT}"
|
||||
cp -f "${server_key}" "${SERVER_KEY}"
|
||||
cp -f "${ca_crt}" "${CA_CERT}"
|
||||
}
|
||||
|
||||
if [[ -n "${SRC_CERT}" || -n "${SRC_KEY}" ]]; then
|
||||
if [[ -z "${SRC_CERT}" || -z "${SRC_KEY}" ]]; then
|
||||
echo "[ERROR] SRC_CERT and SRC_KEY must be provided together"
|
||||
exit 1
|
||||
fi
|
||||
echo "[INFO] using provided certificate files ..."
|
||||
cp -f "${SRC_CERT}" "${SERVER_CERT}"
|
||||
cp -f "${SRC_KEY}" "${SERVER_KEY}"
|
||||
if [[ -n "${SRC_CA}" ]]; then
|
||||
cp -f "${SRC_CA}" "${CA_CERT}"
|
||||
else
|
||||
rm -f "${CA_CERT}"
|
||||
fi
|
||||
else
|
||||
case "$(echo "${CH_GENERATE_CA}" | tr '[:upper:]' '[:lower:]')" in
|
||||
1|true|yes|on)
|
||||
generate_cert_with_ca
|
||||
;;
|
||||
*)
|
||||
generate_self_signed_cert
|
||||
;;
|
||||
esac
|
||||
fi
|
||||
generate_self_signed_cert
|
||||
|
||||
chown clickhouse:clickhouse "${SERVER_CERT}" "${SERVER_KEY}" || true
|
||||
chmod 0644 "${SERVER_CERT}"
|
||||
chmod 0640 "${SERVER_KEY}"
|
||||
if [[ -f "${CA_CERT}" ]]; then
|
||||
chown clickhouse:clickhouse "${CA_CERT}" || true
|
||||
chmod 0644 "${CA_CERT}"
|
||||
fi
|
||||
|
||||
echo "[INFO] writing ClickHouse HTTPS override config ..."
|
||||
cat >"${OVERRIDE_FILE}" <<EOF
|
||||
@@ -221,7 +156,3 @@ echo "[OK] ClickHouse HTTPS setup finished"
|
||||
echo " HTTPS port : ${CH_HTTPS_PORT}"
|
||||
echo " cert file : ${SERVER_CERT}"
|
||||
echo " key file : ${SERVER_KEY}"
|
||||
if [[ -f "${CA_CERT}" ]]; then
|
||||
echo " CA file : ${CA_CERT}"
|
||||
echo " import this CA file into API/Fluent Bit hosts if tls.verify=On"
|
||||
fi
|
||||
|
||||
50
deploy/clickhouse/configure_clickhouse_runtime.sh
Normal file
@@ -0,0 +1,50 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
if [[ "${EUID}" -ne 0 ]]; then
|
||||
echo "[ERROR] please run as root"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
CH_LOG_LEVEL="${CH_LOG_LEVEL:-warning}"
|
||||
CH_DIR="/etc/clickhouse-server"
|
||||
CH_CONFIG_D_DIR="${CH_DIR}/config.d"
|
||||
OVERRIDE_FILE="${CH_CONFIG_D_DIR}/waf-runtime.xml"
|
||||
|
||||
case "${CH_LOG_LEVEL}" in
|
||||
none|fatal|critical|error|warning|notice|information|debug|trace|test)
|
||||
;;
|
||||
*)
|
||||
echo "[ERROR] invalid CH_LOG_LEVEL: ${CH_LOG_LEVEL}"
|
||||
echo " allowed: none,fatal,critical,error,warning,notice,information,debug,trace,test"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
|
||||
mkdir -p "${CH_CONFIG_D_DIR}"
|
||||
|
||||
echo "[INFO] writing ClickHouse runtime override config ..."
|
||||
cat >"${OVERRIDE_FILE}" <<EOF
|
||||
<clickhouse>
|
||||
<logger>
|
||||
<level>${CH_LOG_LEVEL}</level>
|
||||
</logger>
|
||||
|
||||
<text_log remove="1"/>
|
||||
<part_log remove="1"/>
|
||||
<metric_log remove="1"/>
|
||||
<asynchronous_metric_log remove="1"/>
|
||||
<trace_log remove="1"/>
|
||||
</clickhouse>
|
||||
EOF
|
||||
|
||||
echo "[INFO] restarting clickhouse-server ..."
|
||||
systemctl restart clickhouse-server
|
||||
sleep 2
|
||||
|
||||
echo "[INFO] service status ..."
|
||||
systemctl --no-pager -l status clickhouse-server | sed -n '1,15p'
|
||||
|
||||
echo "[OK] ClickHouse runtime config applied"
|
||||
echo " file : ${OVERRIDE_FILE}"
|
||||
echo " logger level: ${CH_LOG_LEVEL}"
|
||||
@@ -1,123 +0,0 @@
|
||||
-- =============================================================================
|
||||
-- ClickHouse logs_ingest 表优化脚本
|
||||
--
|
||||
-- 说明:
|
||||
-- - 所有 ALTER 操作均为在线操作,无需停服
|
||||
-- - 建议按阶段顺序执行,每阶段执行后观察 system.parts 确认生效
|
||||
-- - 压缩编解码器变更仅影响新写入的 part,存量数据需等 merge 或手动 OPTIMIZE
|
||||
--
|
||||
-- 执行方式:
|
||||
-- clickhouse-client --host 127.0.0.1 --port 9000 --user default --password 'xxx' < optimize_schema.sql
|
||||
-- =============================================================================
|
||||
|
||||
-- =============================================
|
||||
-- 阶段 1:大字段压缩优化(效果最显著)
|
||||
-- =============================================
|
||||
|
||||
-- 大文本字段改用 ZSTD(3),对 JSON / HTTP 文本压缩率远优于默认 LZ4
|
||||
-- 预期效果:磁盘占用减少 40%-60%
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN request_headers String CODEC(ZSTD(3));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN request_body String CODEC(ZSTD(3));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN response_headers String CODEC(ZSTD(3));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN response_body String CODEC(ZSTD(3));
|
||||
|
||||
-- 中等长度文本字段用 ZSTD(1),平衡压缩率与 CPU 开销
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN ua String CODEC(ZSTD(1));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN path String CODEC(ZSTD(1));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN referer String CODEC(ZSTD(1));
|
||||
|
||||
-- 低基数字段改用 LowCardinality(内存+磁盘双降)
|
||||
-- method 的基数极低(GET/POST/PUT/DELETE 等),host 基数取决于站点数量
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN method LowCardinality(String);
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN log_type LowCardinality(String);
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN host LowCardinality(String);
|
||||
|
||||
-- 数值字段使用 Delta + ZSTD 编码(利用相邻行的时间/大小相关性)
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN bytes_in UInt64 CODEC(Delta, ZSTD(1));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN bytes_out UInt64 CODEC(Delta, ZSTD(1));
|
||||
ALTER TABLE logs_ingest MODIFY COLUMN cost_ms UInt32 CODEC(Delta, ZSTD(1));
|
||||
|
||||
-- =============================================
|
||||
-- 阶段 2:添加 Skipping Index(加速高频过滤查询)
|
||||
-- =============================================
|
||||
|
||||
-- trace_id 精确查找(查看日志详情 FindByTraceId)
|
||||
-- bloom_filter(0.01) = 1% 误判率,GRANULARITY 4 = 每 4 个 granule 一个 bloom block
|
||||
ALTER TABLE logs_ingest ADD INDEX IF NOT EXISTS idx_trace_id trace_id TYPE bloom_filter(0.01) GRANULARITY 4;
|
||||
|
||||
-- IP 精确查找
|
||||
ALTER TABLE logs_ingest ADD INDEX IF NOT EXISTS idx_ip ip TYPE bloom_filter(0.01) GRANULARITY 4;
|
||||
|
||||
-- host 模糊查询支持(tokenbf_v1 对 LIKE '%xxx%' 有效)
|
||||
ALTER TABLE logs_ingest ADD INDEX IF NOT EXISTS idx_host host TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;
|
||||
|
||||
-- firewall_policy_id 过滤(HasFirewallPolicy: > 0)
|
||||
ALTER TABLE logs_ingest ADD INDEX IF NOT EXISTS idx_fw_policy firewall_policy_id TYPE minmax GRANULARITY 4;
|
||||
|
||||
-- status 范围过滤(HasError: status >= 400)
|
||||
ALTER TABLE logs_ingest ADD INDEX IF NOT EXISTS idx_status status TYPE minmax GRANULARITY 4;
|
||||
|
||||
-- =============================================
|
||||
-- 阶段 3:物化索引到现有数据(对存量数据生效)
|
||||
-- =============================================
|
||||
-- 注意:MATERIALIZE INDEX 会触发后台 mutation,大表可能需要一定时间
|
||||
-- 可通过 SELECT * FROM system.mutations WHERE is_done = 0 监控进度
|
||||
|
||||
ALTER TABLE logs_ingest MATERIALIZE INDEX idx_trace_id;
|
||||
ALTER TABLE logs_ingest MATERIALIZE INDEX idx_ip;
|
||||
ALTER TABLE logs_ingest MATERIALIZE INDEX idx_host;
|
||||
ALTER TABLE logs_ingest MATERIALIZE INDEX idx_fw_policy;
|
||||
ALTER TABLE logs_ingest MATERIALIZE INDEX idx_status;
|
||||
|
||||
|
||||
-- =============================================================================
|
||||
-- dns_logs_ingest 表优化(DNS 日志表)
|
||||
-- =============================================================================
|
||||
|
||||
-- 大文本字段压缩
|
||||
ALTER TABLE dns_logs_ingest MODIFY COLUMN content_json String CODEC(ZSTD(3));
|
||||
ALTER TABLE dns_logs_ingest MODIFY COLUMN error String CODEC(ZSTD(1));
|
||||
|
||||
-- 低基数字段
|
||||
ALTER TABLE dns_logs_ingest MODIFY COLUMN question_type LowCardinality(String);
|
||||
ALTER TABLE dns_logs_ingest MODIFY COLUMN record_type LowCardinality(String);
|
||||
ALTER TABLE dns_logs_ingest MODIFY COLUMN networking LowCardinality(String);
|
||||
|
||||
-- request_id 精确查找
|
||||
ALTER TABLE dns_logs_ingest ADD INDEX IF NOT EXISTS idx_request_id request_id TYPE bloom_filter(0.01) GRANULARITY 4;
|
||||
|
||||
-- remote_addr 精确查找
|
||||
ALTER TABLE dns_logs_ingest ADD INDEX IF NOT EXISTS idx_remote_addr remote_addr TYPE bloom_filter(0.01) GRANULARITY 4;
|
||||
|
||||
-- question_name 模糊查询
|
||||
ALTER TABLE dns_logs_ingest ADD INDEX IF NOT EXISTS idx_question_name question_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;
|
||||
|
||||
-- domain_id 过滤
|
||||
ALTER TABLE dns_logs_ingest ADD INDEX IF NOT EXISTS idx_domain_id domain_id TYPE minmax GRANULARITY 4;
|
||||
|
||||
-- 物化索引到现有数据
|
||||
ALTER TABLE dns_logs_ingest MATERIALIZE INDEX idx_request_id;
|
||||
ALTER TABLE dns_logs_ingest MATERIALIZE INDEX idx_remote_addr;
|
||||
ALTER TABLE dns_logs_ingest MATERIALIZE INDEX idx_question_name;
|
||||
ALTER TABLE dns_logs_ingest MATERIALIZE INDEX idx_domain_id;
|
||||
|
||||
|
||||
-- =============================================================================
|
||||
-- 验证命令(执行完上述 ALTER 后运行)
|
||||
-- =============================================================================
|
||||
|
||||
-- 查看列的压缩编解码器
|
||||
-- SELECT name, type, compression_codec FROM system.columns WHERE table = 'logs_ingest' AND database = currentDatabase();
|
||||
|
||||
-- 查看表的压缩率
|
||||
-- SELECT table, formatReadableSize(sum(data_compressed_bytes)) AS compressed, formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed, round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio FROM system.columns WHERE table IN ('logs_ingest', 'dns_logs_ingest') GROUP BY table;
|
||||
|
||||
-- 查看各列占用的磁盘空间(找出最大的列)
|
||||
-- SELECT name, formatReadableSize(sum(data_compressed_bytes)) AS compressed, formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed FROM system.columns WHERE table = 'logs_ingest' GROUP BY name ORDER BY sum(data_compressed_bytes) DESC;
|
||||
|
||||
-- 查看 mutation 进度
|
||||
-- SELECT database, table, mutation_id, command, is_done, parts_to_do FROM system.mutations WHERE is_done = 0;
|
||||
|
||||
-- 强制触发 merge(可选,让压缩编解码器变更对存量数据生效)
|
||||
-- OPTIMIZE TABLE logs_ingest FINAL;
|
||||
-- OPTIMIZE TABLE dns_logs_ingest FINAL;
|
||||
108
deploy/clickhouse/setup_clickhouse.sh
Normal file
@@ -0,0 +1,108 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
INSTALL_SCRIPT="${SCRIPT_DIR}/install_clickhouse_linux.sh"
|
||||
HTTPS_SCRIPT="${SCRIPT_DIR}/configure_clickhouse_https.sh"
|
||||
RUNTIME_SCRIPT="${SCRIPT_DIR}/configure_clickhouse_runtime.sh"
|
||||
TABLES_SCRIPT="${SCRIPT_DIR}/init_waf_logs_tables.sh"
|
||||
|
||||
usage() {
|
||||
cat <<'EOF'
|
||||
Usage:
|
||||
sudo ./setup_clickhouse.sh [all|install|https|runtime|tables]
|
||||
|
||||
Modes:
|
||||
all Install ClickHouse, configure HTTPS, apply runtime config, init ingest tables (default)
|
||||
install Install ClickHouse only
|
||||
https Configure HTTPS only
|
||||
runtime Apply ClickHouse runtime config only
|
||||
tables Initialize ingest tables only
|
||||
|
||||
Common env vars:
|
||||
CLICKHOUSE_DEFAULT_PASSWORD Default user password set during install
|
||||
CH_HTTPS_PORT HTTPS port (default: 8443)
|
||||
CH_CERT_CN Certificate CN
|
||||
CH_CERT_DNS Certificate SAN DNS list (comma-separated)
|
||||
CH_CERT_IP Certificate SAN IP list (comma-separated)
|
||||
CH_CERT_DAYS Certificate validity days (default: 825)
|
||||
CH_LOG_LEVEL ClickHouse logger level (default: warning)
|
||||
CH_HOST ClickHouse host for table init (default: 127.0.0.1)
|
||||
CH_PORT ClickHouse port for table init (default: 9000)
|
||||
CH_USER ClickHouse user for table init (default: default)
|
||||
CH_PASSWORD ClickHouse password for table init
|
||||
CH_DATABASE Database for table init (default: default)
|
||||
EOF
|
||||
}
|
||||
|
||||
require_script() {
|
||||
local script="$1"
|
||||
if [[ ! -f "${script}" ]]; then
|
||||
echo "[ERROR] required file not found: ${script}"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
run_install() {
|
||||
echo "[INFO] step 1/4: install ClickHouse ..."
|
||||
bash "${INSTALL_SCRIPT}"
|
||||
}
|
||||
|
||||
run_https() {
|
||||
echo "[INFO] step 2/4: configure ClickHouse HTTPS ..."
|
||||
bash "${HTTPS_SCRIPT}"
|
||||
}
|
||||
|
||||
run_runtime() {
|
||||
echo "[INFO] step 3/4: apply ClickHouse runtime config ..."
|
||||
bash "${RUNTIME_SCRIPT}"
|
||||
}
|
||||
|
||||
run_tables() {
|
||||
echo "[INFO] step 4/4: initialize ingest tables ..."
|
||||
bash "${TABLES_SCRIPT}"
|
||||
}
|
||||
|
||||
MODE="${1:-all}"
|
||||
|
||||
case "${MODE}" in
|
||||
-h|--help|help)
|
||||
usage
|
||||
exit 0
|
||||
;;
|
||||
all|install|https|runtime|tables)
|
||||
;;
|
||||
*)
|
||||
echo "[ERROR] invalid mode: ${MODE}"
|
||||
usage
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
|
||||
require_script "${INSTALL_SCRIPT}"
|
||||
require_script "${HTTPS_SCRIPT}"
|
||||
require_script "${RUNTIME_SCRIPT}"
|
||||
require_script "${TABLES_SCRIPT}"
|
||||
|
||||
case "${MODE}" in
|
||||
all)
|
||||
run_install
|
||||
run_https
|
||||
run_runtime
|
||||
run_tables
|
||||
;;
|
||||
install)
|
||||
run_install
|
||||
;;
|
||||
https)
|
||||
run_https
|
||||
;;
|
||||
runtime)
|
||||
run_runtime
|
||||
;;
|
||||
tables)
|
||||
run_tables
|
||||
;;
|
||||
esac
|
||||
|
||||
echo "[OK] setup completed: mode=${MODE}"
|
||||
2
deploy/fluent-bit/.gitignore
vendored
@@ -1,2 +0,0 @@
|
||||
fluent-bit-windows.conf
|
||||
clickhouse-upstream-windows.conf
|
||||
@@ -1,471 +0,0 @@
|
||||
# 边缘节点日志链路部署(Fluent Bit + ClickHouse)
|
||||
|
||||
与 [日志链路调整方案](../../log-pipeline-migration-plan.md) 配套的配置与部署说明。本文档为 **Fluent Bit 部署教程**,按步骤即可在边缘节点或日志采集机上跑通采集 → ClickHouse 写入。
|
||||
|
||||
---
|
||||
|
||||
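## Where does Fluent Bit run?

**Deploy Fluent Bit on the node machines that write the log files** (the same hosts as EdgeNode / EdgeDNS), not on the EdgeAPI machine.

- HTTP log files default to `/var/log/edge/edge-node/*.log` and are written locally by **EdgeNode**; if the shared access log policy sets a file `path`, the node reuses the directory of that `path` first.
- DNS log files default to `/var/log/edge/edge-dns/*.log` and are written locally by **EdgeDNS**; if the shared access log policy sets a file `path`, the node reuses the directory of that `path` first.
- Fluent Bit reads local paths with the **tail** input, so it must run on the machine that holds these log files.
- The EdgeAPI machine mainly queries ClickHouse/MySQL and does not need to collect logs.
- In multi-machine deployments, run one Fluent Bit on every node that writes logs, all reporting to the same ClickHouse cluster.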
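Because Fluent Bit must tail the directory the node really writes to, it helps to resolve that directory the same way the node is described as doing: policy file `path` first, then `EDGE_LOG_DIR`, then the default. The sketch below only mirrors that documented precedence; `resolve_node_log_dir` is a hypothetical helper, not code from EdgeNode.

```bash
# Hypothetical helper mirroring the documented precedence for the HTTP log directory.
# $1: file path from the shared access log policy (may be empty).
resolve_node_log_dir() {
  policy_path="${1:-}"
  if [ -n "${policy_path}" ]; then
    # The policy names a file; the logs live in its directory.
    dirname "${policy_path}"
  elif [ -n "${EDGE_LOG_DIR:-}" ]; then
    printf '%s\n' "${EDGE_LOG_DIR}"
  else
    printf '%s\n' "/var/log/edge/edge-node"
  fi
}
```

The printed directory, with `/*.log` appended, is what the tail input `Path` in `fluent-bit.conf` should point at. For DNS nodes the same idea applies with `EDGE_DNS_LOG_DIR` and `/var/log/edge/edge-dns`.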
|
||||
|
||||
---
|
||||
|
||||
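## 一、Prerequisites

- **Edge node (EdgeNode)** has local log files enabled. The directory is taken first from the file `path` of the shared access log policy (its directory part); if that is empty it falls back to `EDGE_LOG_DIR`, then to the default `/var/log/edge/edge-node`. It produces `access.log`, `waf.log`, and `error.log` (JSON Lines).
- **DNS node (EdgeDNS)** has local log files enabled. The directory is taken first from the file `path` of the shared access log policy (its directory part); if that is empty it falls back to `EDGE_DNS_LOG_DIR`, then to the default `/var/log/edge/edge-dns`. It produces `access.log` (JSON Lines).
- **ClickHouse** is installed and reachable (single node or cluster), and the `logs_ingest` table already exists (see 「五、ClickHouse 建表」 below).
- If Fluent Bit and ClickHouse are on different machines, the network between them must be reachable (default HTTPS port 8443).
- Log rotation is handled by default by the `lumberjack` rotation built into Node/DNS:
  - `maxSizeMB=256`
  - `maxBackups=14`
  - `maxAgeDays=7`
  - `compress=false`
  - `localTime=true`
  These values can be adjusted through `file.rotate` in the shared log policy.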
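The rotation defaults also give a rough upper bound on disk usage, which is useful when sizing the log partition. The sketch below assumes each log file name rotates independently, keeps one active file plus `maxBackups` rotated files, and is not pruned earlier by `maxAgeDays`; `max_log_disk_mb` is a hypothetical helper.

```bash
# Hypothetical sizing helper: worst-case disk use of rotated logs, in MB.
# $1: maxSizeMB, $2: maxBackups, $3: number of log file names (access, waf, error = 3).
max_log_disk_mb() {
  max_size_mb="${1:-256}"
  max_backups="${2:-14}"
  log_files="${3:-3}"
  # One active file plus max_backups rotated files, per log file name.
  echo $(( max_size_mb * (max_backups + 1) * log_files ))
}
```

With the defaults, `max_log_disk_mb` prints `11520`, that is 256 MB × 15 files × 3 log names for an EdgeNode. For an EdgeDNS node with only `access.log`, `max_log_disk_mb 256 14 1` prints `3840`.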
|
||||
|
||||
---
|
||||
|
||||
## 二、安装 Fluent Bit
|
||||
|
||||
### 2.1 Ubuntu / Debian
|
||||
|
||||
```bash
|
||||
# 添加 Fluent Bit 官方源并安装(以 Ubuntu 22.04 为例)
|
||||
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
|
||||
sudo apt-get install -y fluent-bit
|
||||
|
||||
# 或使用 TD Agent Bit 源(若需 ClickHouse 等扩展)
|
||||
# 见:https://docs.fluentbit.io/manual/installation/linux/ubuntu
|
||||
```
|
||||
|
||||
### 2.2 CentOS / RHEL / Amazon Linux
|
||||
|
||||
```bash
|
||||
# 使用官方 install 脚本
|
||||
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
|
||||
|
||||
# 或 yum/dnf 安装(以提供的仓库为准)
|
||||
# sudo yum install -y fluent-bit
|
||||
```
|
||||
|
||||
### 2.3 使用二进制包
|
||||
|
||||
从 [Fluent Bit 官方 Release](https://github.com/fluent/fluent-bit/releases) 下载对应架构的 tarball,解压后将 `bin/fluent-bit` 放到 PATH,并确保其 **Output 插件支持 ClickHouse**(部分发行版或自编译需启用 `out_clickhouse`)。
|
||||
|
||||
---
|
||||
|
||||
## 三、部署配置文件
|
||||
|
||||
### 3.1 放置配置
|
||||
|
||||
将本目录下配置文件放到同一目录,例如 `/etc/fluent-bit/` 或 `/opt/edge/fluent-bit/`:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /etc/fluent-bit
|
||||
sudo cp fluent-bit.conf clickhouse-upstream.conf /etc/fluent-bit/
|
||||
```
|
||||
|
||||
两文件需在同一目录,因 `fluent-bit.conf` 中有 `@INCLUDE clickhouse-upstream.conf`。
|
||||
|
||||
### 3.2 修改 ClickHouse 地址(必做)
|
||||
|
||||
编辑 `clickhouse-upstream.conf`,按实际环境填写 ClickHouse 的 Host/Port:
|
||||
|
||||
- **单机**:保留一个 `[NODE]`,改 `Host`、`Port`(默认 8443)。
|
||||
- **集群**:复制多段 `[NODE]`,每段一个节点,例如:
|
||||
|
||||
```ini
|
||||
[UPSTREAM]
|
||||
Name ch_backends
|
||||
|
||||
[NODE]
|
||||
Name node-01
|
||||
Host 192.168.1.10
|
||||
Port 8443
|
||||
|
||||
[NODE]
|
||||
Name node-02
|
||||
Host 192.168.1.11
|
||||
Port 8443
|
||||
```
|
||||
|
||||
### 3.3 ClickHouse credentials (required when a password is set)

Do not put the password in `clickhouse-upstream.conf`; pass it to Fluent Bit through **environment variables**:

- `CH_USER`: ClickHouse user name (for example `default`).
- `CH_PASSWORD`: password of that user.

Set them in systemd or in the startup script (see 「四、以 systemd 方式运行」 below).
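Since the env file holds a password, it is reasonable to create it readable by its owner only. The following is a minimal sketch, assuming the `KEY=VALUE` format used elsewhere in this document; `write_fluent_bit_env` is a hypothetical helper and the `0600` mode is a suggestion, not something the platform is known to enforce.

```bash
# Hypothetical helper: write CH_USER / CH_PASSWORD to an env file with mode 0600.
# $1: target file, $2: user name, $3: password.
# Runs in a subshell so the umask change does not leak to the caller.
write_fluent_bit_env() (
  env_file="$1"
  ch_user="$2"
  ch_password="$3"
  umask 077   # a newly created file is readable by its owner only
  printf 'CH_USER=%s\nCH_PASSWORD=%s\n' "${ch_user}" "${ch_password}" > "${env_file}"
  chmod 600 "${env_file}"   # also tighten a file that already existed
)
```

As root, `write_fluent_bit_env /etc/fluent-bit/fluent-bit.env default 'YourStrongPassword'` produces a file that a systemd `EnvironmentFile=` line can reference.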
|
||||
|
||||
### 3.4 日志路径与 parsers.conf
|
||||
|
||||
- **日志路径**:`fluent-bit.conf` 里已同时配置 HTTP 与 DNS 两类路径:
|
||||
- HTTP:`/var/log/edge/edge-node/*.log`
|
||||
- DNS:`/var/log/edge/edge-dns/*.log`
|
||||
若你配置了公用访问日志策略的文件 `path`,或改了 `EDGE_LOG_DIR` / `EDGE_DNS_LOG_DIR`,请同步修改对应 `Path`。
|
||||
- **Parsers_File**:主配置引用了 `parsers.conf`。若安装包自带(如 `/etc/fluent-bit/parsers.conf`),无需改动;若启动报错找不到文件,可:
|
||||
- 从 Fluent Bit 官方仓库复制 [conf/parsers.conf](https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf) 到同一目录,或
|
||||
- 在同一目录新建空文件 `parsers.conf`(仅当不使用任何 parser 时)。
|
||||
|
||||
### 3.5 数据与状态目录
|
||||
|
||||
Fluent Bit 会使用配置里的 `storage.path` 和 DB 路径,需保证进程有写权限:
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /var/lib/fluent-bit/storage
|
||||
sudo chown -R <运行 fluent-bit 的用户>:<同组> /var/lib/fluent-bit
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 四、以 systemd 方式运行
|
||||
|
||||
### 4.1 使用自带服务(若安装包已提供)
|
||||
|
||||
若通过 apt/yum 安装,通常已有 `fluent-bit.service`。先改配置路径和环境变量:
|
||||
|
||||
```bash
|
||||
# 编辑服务文件(路径以实际为准,如 /lib/systemd/system/fluent-bit.service)
|
||||
sudo systemctl edit fluent-bit --full
|
||||
```
|
||||
|
||||
在 `[Service]` 中增加或修改:
|
||||
|
||||
- `EnvironmentFile` 指向你的环境变量文件,或直接写:
|
||||
- `Environment="CH_USER=default"`
|
||||
- `Environment="CH_PASSWORD=你的密码"`
|
||||
- `ExecStart` 中的配置文件路径改为你的 `fluent-bit.conf`,例如:
|
||||
- `ExecStart=/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf`
|
||||
|
||||
然后:
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable fluent-bit
|
||||
sudo systemctl start fluent-bit
|
||||
sudo systemctl status fluent-bit
|
||||
```
|
||||
|
||||
### 4.2 自定义 systemd 单元(无自带服务时)
|
||||
|
||||
新建 `/etc/systemd/system/fluent-bit-edge.service`:
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=Fluent Bit - Edge Node Logs to ClickHouse
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
ExecStart=/usr/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
|
||||
Restart=always
|
||||
RestartSec=5
|
||||
# ClickHouse 认证(按需修改)
|
||||
Environment="CH_USER=default"
|
||||
Environment="CH_PASSWORD=your_clickhouse_password"
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
若密码含特殊字符,建议用 `EnvironmentFile=/etc/fluent-bit/fluent-bit.env`,并在该文件中写:
|
||||
|
||||
```bash
|
||||
CH_USER=default
|
||||
CH_PASSWORD=your_clickhouse_password
|
||||
```
|
||||
|
||||
然后:
|
||||
|
||||
```bash
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable fluent-bit-edge
|
||||
sudo systemctl start fluent-bit-edge
|
||||
sudo systemctl status fluent-bit-edge
|
||||
```
|
||||
|
||||
### 4.3 前台调试
|
||||
|
||||
不依赖 systemd 时可直接前台跑(便于看日志):
|
||||
|
||||
```bash
|
||||
export CH_USER=default
|
||||
export CH_PASSWORD=your_clickhouse_password
|
||||
fluent-bit -c /etc/fluent-bit/fluent-bit.conf
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 五、ClickHouse 建表
|
||||
|
||||
平台(EdgeAPI)会查询两张表:
|
||||
- HTTP:`logs_ingest`
|
||||
- DNS:`dns_logs_ingest`
|
||||
|
||||
需在 ClickHouse 中先建表。库名默认为 `default`,若使用其它库,需与 EdgeAPI 的 `CLICKHOUSE_DATABASE` 一致。
|
||||
|
||||
在 ClickHouse 中执行(按需改库名或引擎):
|
||||
|
||||
```sql
|
||||
CREATE TABLE IF NOT EXISTS default.logs_ingest
|
||||
(
|
||||
timestamp DateTime,
|
||||
node_id UInt64,
|
||||
cluster_id UInt64,
|
||||
server_id UInt64,
|
||||
host String,
|
||||
ip String,
|
||||
method String,
|
||||
path String,
|
||||
status UInt16,
|
||||
bytes_in UInt64,
|
||||
bytes_out UInt64,
|
||||
cost_ms UInt32,
|
||||
ua String,
|
||||
referer String,
|
||||
log_type String,
|
||||
trace_id String,
|
||||
firewall_policy_id UInt64 DEFAULT 0,
|
||||
firewall_rule_group_id UInt64 DEFAULT 0,
|
||||
firewall_rule_set_id UInt64 DEFAULT 0,
|
||||
firewall_rule_id UInt64 DEFAULT 0,
|
||||
request_headers String DEFAULT '',
|
||||
request_body String DEFAULT '',
|
||||
response_headers String DEFAULT '',
|
||||
response_body String DEFAULT ''
|
||||
)
|
||||
ENGINE = MergeTree()
|
||||
ORDER BY (timestamp, node_id, server_id, trace_id)
|
||||
SETTINGS index_granularity = 8192;
|
||||
```

Create the DNS log table:

```sql
CREATE TABLE IF NOT EXISTS default.dns_logs_ingest
(
    timestamp DateTime,
    request_id String,
    node_id UInt64,
    cluster_id UInt64,
    domain_id UInt64,
    record_id UInt64,
    remote_addr String,
    question_name String,
    question_type String,
    record_name String,
    record_type String,
    record_value String,
    networking String,
    is_recursive UInt8,
    error String,
    ns_route_codes Array(String),
    content_json String DEFAULT ''
)
ENGINE = MergeTree()
ORDER BY (timestamp, request_id, node_id)
SETTINGS index_granularity = 8192;
```
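
- **log_type**: `access` / `waf` / `error`. To identify attack logs, also check whether **firewall_rule_id** or **firewall_policy_id** is greater than 0 (consistent with the existing MySQL approach of identifying attack logs by rule ID).
- **request_headers / response_headers**: JSON strings. **request_body / response_body**: the request/response body (limiting the length of each entry is recommended, for example to 512 KB).
- **Empty request_body**: the body is written to disk only after "Request Body" is checked in the "Access Log" policy of the site/service in the admin console; it is unchecked by default. The path is roughly: Site/Service → Access Log → Policy → Recorded Fields → check "Request Body". The body is also recorded when the WAF blocks a request and the policy has "Record Request Body" enabled.
- **Empty response_body**: not implemented in the current version (neither the proto nor the node supports writing the response body to disk). The column is reserved in the table for future extension.
- **Syncing existing MySQL logs to ClickHouse**: see [mysql-to-clickhouse-migration.md](mysql-to-clickhouse-migration.md).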
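
The attack-log rule in the first bullet can be turned into a query. The sketch below only prints the SQL; `attack_log_query` is a hypothetical helper, and the daily grouping is an illustrative choice rather than something the platform is known to run.

```bash
#!/usr/bin/env bash
# Sketch: print a query that counts attack logs per day.
# A row counts as an attack log when firewall_rule_id or firewall_policy_id
# is greater than 0, as described above. The database name is a parameter.
attack_log_query() {
    local db="${1:-default}"
    cat <<EOF
SELECT toDate(timestamp) AS day, count() AS attacks
FROM ${db}.logs_ingest
WHERE firewall_rule_id > 0 OR firewall_policy_id > 0
GROUP BY day
ORDER BY day
EOF
}
```

One way to run it is `clickhouse-client --query "$(attack_log_query)"`, adding connection options as your deployment requires.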

If the tables already exist but lack the newer columns, run:

```sql
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS firewall_policy_id UInt64 DEFAULT 0;
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS firewall_rule_group_id UInt64 DEFAULT 0;
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS firewall_rule_set_id UInt64 DEFAULT 0;
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS firewall_rule_id UInt64 DEFAULT 0;
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS request_headers String DEFAULT '';
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS request_body String DEFAULT '';
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS response_headers String DEFAULT '';
ALTER TABLE default.logs_ingest ADD COLUMN IF NOT EXISTS response_body String DEFAULT '';
ALTER TABLE default.dns_logs_ingest ADD COLUMN IF NOT EXISTS content_json String DEFAULT '';
```

Fluent Bit writes with `json_date_key timestamp` and `json_date_format epoch`, so the `timestamp` field reaches ClickHouse as Unix seconds and is stored as a `DateTime`.
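
When comparing a raw log line with a row in ClickHouse, it can be handy to convert the epoch value by hand. This is a small sketch that assumes GNU `date` (the `-d @SECONDS` form) and prints UTC; `epoch_to_utc` is a hypothetical helper, and ClickHouse itself may display the value in the server's time zone.

```bash
#!/usr/bin/env bash
# Sketch: convert Unix seconds to 'YYYY-MM-DD hh:mm:ss' in UTC.
# Assumes GNU date, which accepts -d "@SECONDS".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}
```

For example, `epoch_to_utc 0` prints `1970-01-01 00:00:00`.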

---

## 6. Verification and troubleshooting

1. **Check the Fluent Bit logs**
   - systemd: `journalctl -u fluent-bit-edge -f` (or your own service name)
   - Foreground: watch the terminal output directly.

2. **Check whether ClickHouse has data**
   ```sql
   SELECT count() FROM default.logs_ingest;
   SELECT * FROM default.logs_ingest LIMIT 5;
   SELECT count() FROM default.dns_logs_ingest;
   SELECT * FROM default.dns_logs_ingest LIMIT 5;
   ```

3. **Common problems**
   - **Connection refused**: check Host/Port in `clickhouse-upstream.conf`, the firewall, and ClickHouse's `listen_host`.
   - **Authentication failed**: check that `CH_USER` and `CH_PASSWORD` match the ClickHouse user and that systemd loads the environment variables correctly.
   - **parsers.conf not found**: see Section 3.4 above.
   - **No new data**: confirm that EdgeNode/EdgeDNS are writing logs under `Path` and that Fluent Bit has read permission on the directory; you can run `tail -f /var/log/edge/edge-node/access.log` and `tail -f /var/log/edge/edge-dns/access.log` respectively.
   - **`/var/log/edge/edge-node/access.log` does not exist on the node**: see "8. Log files missing on the node" below.
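
The first two problems above (connection refused, authentication failed) can be separated with two `curl` calls against the ClickHouse HTTP interface. The sketch below assumes `curl` is installed; the function names are hypothetical, and the scheme, host, and port are whatever your output section uses.

```bash
#!/usr/bin/env bash
# Sketch: tell connectivity problems apart from credential problems.

# Build the base URL of the ClickHouse HTTP interface.
ch_base_url() {
    local scheme="$1" host="$2" port="$3"
    printf '%s://%s:%s' "$scheme" "$host" "$port"
}

# Connectivity: /ping answers "Ok." without authentication.
ch_ping() {
    curl -sS --max-time 5 "$(ch_base_url "$@")/ping"
}

# Authentication: run a trivial query with the credentials Fluent Bit uses.
ch_auth_check() {
    curl -sS --max-time 5 -u "${CH_USER}:${CH_PASSWORD}" \
        --data-binary 'SELECT 1' "$(ch_base_url "$@")/"
}
```

If `ch_ping http 127.0.0.1 8123` fails, look at the network path and `listen_host`; if it succeeds but `ch_auth_check` returns an error, look at the credentials. For a self-signed HTTPS endpoint, `curl` would additionally need `--cacert`.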
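
---

## 7. Relationship to other components (brief)

| Component | Description |
|------|------|
| **EdgeNode** | The log directory is taken first from the file `path` of the shared access-log policy (its directory part); if that is empty it falls back to `EDGE_LOG_DIR`, then to the default `/var/log/edge/edge-node`. It writes `access.log`, `waf.log`, and `error.log`, with built-in lumberjack rotation (default 256 MB / 14 files / 7 days, adjustable by policy); rebuilding the writer on SIGHUP is still supported. |
| **EdgeDNS** | The DNS access-log directory is taken first from the file `path` of the shared access-log policy (its directory part); if that is empty it falls back to `EDGE_DNS_LOG_DIR`, then to the default `/var/log/edge/edge-dns`. It writes `access.log` (JSON Lines), which Fluent Bit collects and writes into `dns_logs_ingest`. |
| **logrotate** | Optional legacy-compatibility option (no longer required); the node's built-in lumberjack rotation is recommended by default. |
| **Platform (EdgeAPI)** | Configure a read-only ClickHouse connection (`CLICKHOUSE_HOST`, `CLICKHOUSE_PORT`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DATABASE`); when a request carries `Day` and ClickHouse is configured, the access-log list query goes to ClickHouse. |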
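
---

## 8. Log files missing on the node

If `tail -f /var/log/edge/edge-node/access.log` on the EdgeNode machine reports **No such file or directory**, check the following:

1. **EdgeNode version**
   Writing logs to local files is a relatively new feature and requires **an EdgeNode build that includes it** (the current repository version pre-creates the directory and three empty log files when it first loads its configuration).

2. **Pre-create the directory (optional)**
   If the process runs as a non-root user, create the directory and set its ownership manually first, so that it does not fail for lack of permission to create `/var/log/edge`:
   ```bash
   sudo mkdir -p /var/log/edge/edge-node
   sudo chown <user running edge-node>:<its group> /var/log/edge/edge-node
   ```

3. **Restart EdgeNode**
   After **the first successful load of the node configuration**, newer versions call `EnsureInit()`, which creates `/var/log/edge/edge-node` along with `access.log`, `waf.log`, and `error.log`. Restart edge-node once and check whether the files now exist.

4. **Custom path**
   If a file `path` is set in the shared access-log policy in the admin console, the node uses that directory first; only otherwise does it use `EDGE_LOG_DIR`. The Fluent Bit `Path` must match the actual directory.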
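
The lookup order for the log directory can be written down as a small function, which helps when deciding what to put in the Fluent Bit `Path`. This is a sketch of the documented order only, not the node's actual code: `resolve_edge_log_dir` is a hypothetical helper, the policy path is passed in by hand, and using `dirname` for "the directory part" is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: resolve the EdgeNode log directory in the documented order:
# policy file path (directory part) -> EDGE_LOG_DIR -> built-in default.
resolve_edge_log_dir() {
    local policy_path="$1"
    if [ -n "$policy_path" ]; then
        # Assumption: the policy value is a file path; keep its directory.
        dirname "$policy_path"
    elif [ -n "${EDGE_LOG_DIR:-}" ]; then
        printf '%s\n' "$EDGE_LOG_DIR"
    else
        printf '%s\n' /var/log/edge/edge-node
    fi
}
```

For instance, `resolve_edge_log_dir /data/logs/access.log` prints `/data/logs`, and with no policy path and no `EDGE_LOG_DIR` the function prints the default directory.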

This completes the deployment and verification of Fluent Bit.

---

## 9. HTTPS mode (ClickHouse)

When ClickHouse exposes only HTTPS (for example on 8443) or the link must be encrypted, use the templates added in this directory:

- `fluent-bit-https.conf`: Node and DNS on the same host (HTTP + DNS inputs)
- `fluent-bit-dns-https.conf`: DNS-only nodes
- `fluent-bit-windows-https.conf`: HTTPS collection on Windows nodes

### 9.1 When to use the HTTPS templates

- ClickHouse exposes only the HTTPS port;
- traffic from the node to ClickHouse crosses the public internet or must be encrypted in transit;
- you want certificate verification and SNI.

### 9.2 Minimal switchover steps (Linux)

1. Back up the current configuration:
   ```bash
   sudo cp /etc/fluent-bit/fluent-bit.conf /etc/fluent-bit/fluent-bit.conf.bak
   ```

2. Switch to the HTTPS template (example for Node and DNS on the same host):
   ```bash
   sudo cp /path/to/fluent-bit-https.conf /etc/fluent-bit/fluent-bit.conf
   ```

3. Set the credentials (in whatever way your service file expects):
   ```bash
   export CH_USER=default
   export CH_PASSWORD='your_password'
   ```

4. Edit the key settings in the template:
   - `Host` / `Port` (HTTPS commonly uses `8443`)
   - `tls.verify`: `On`/`Off`
   - `tls.ca_file`: for self-signed certificates, configuring a CA file is recommended
   - `tls.vhost`: the hostname matching the certificate CN/SAN (SNI)

5. Restart and check:
   ```bash
   sudo systemctl restart fluent-bit
   sudo systemctl status fluent-bit
   journalctl -u fluent-bit -f
   ```

### 9.3 Verification points

- `default.logs_ingest` receives new rows (HTTP)
- `default.dns_logs_ingest` receives new rows (DNS)
- The Fluent Bit log shows no TLS handshake failures (`certificate`, `x509`, `tls`)
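
The last check can be scripted by filtering recent log lines for the three keywords. The sketch below is a plain keyword filter, so a match is only a hint to read the line, not proof of a handshake failure; `tls_error_lines` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print log lines from stdin that mention the TLS-related keywords
# listed above. Exit status follows grep: 0 if any line matched, 1 if none.
tls_error_lines() {
    grep -Ei 'certificate|x509|tls'
}
```

A possible invocation is `journalctl -u fluent-bit --since '10 minutes ago' --no-pager | tls_error_lines`, using your own service name.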

### 9.4 Rollback

If a TLS misconfiguration interrupts ingestion, roll back quickly:

```bash
sudo cp /etc/fluent-bit/fluent-bit.conf.bak /etc/fluent-bit/fluent-bit.conf
sudo systemctl restart fluent-bit
```

The rollback restores the original HTTP mode and does not affect the platform API or admin console configuration.
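
---

## 10. Platform-managed mode (recommended)

Starting with `v1.4.7`, the Node/DNS online installation flow has the platform manage Fluent Bit, so hand-editing `/etc/fluent-bit/fluent-bit.conf` on each machine is no longer required by default.

### 10.1 Managed behavior

- The installer prefers the offline package bundled in the release (it does not use `curl | sh`).
- After the first installation it writes:
  - `/etc/fluent-bit/fluent-bit.conf`
  - `/etc/fluent-bit/parsers.conf`
  - `/etc/fluent-bit/.edge-managed.env`
  - `/etc/fluent-bit/.edge-managed.json`
- Configuration updates are idempotent by `hash`: the service is restarted only when the content actually changes.
- When Node and DNS are installed on the same machine, their roles are merged automatically into a single configuration.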
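
The hash-based idempotent update can be illustrated with a few lines of shell. This is a sketch of the general technique, not the installer's implementation: `apply_if_changed`, the file layout, and the choice of SHA-256 are all assumptions.

```bash
#!/usr/bin/env bash
# Sketch: install new_conf over target only when its SHA-256 differs from the
# digest recorded in hash_file. Prints "changed" or "unchanged" so the caller
# can restart the service only on "changed".
apply_if_changed() {
    local new_conf="$1" target="$2" hash_file="$3"
    local new_hash old_hash=""
    new_hash=$(sha256sum "$new_conf" | cut -d' ' -f1)
    if [ -f "$hash_file" ]; then
        old_hash=$(cat "$hash_file")
    fi
    if [ "$new_hash" = "$old_hash" ]; then
        echo "unchanged"
        return 0
    fi
    # Copy first, then record the digest, then report the change.
    cp "$new_conf" "$target" && printf '%s\n' "$new_hash" > "$hash_file" && echo "changed"
}
```

A caller could restart the service only when the function prints `changed`, which keeps repeated runs with identical input from causing restarts.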

### 10.2 Managed metadata

The platform maintains `/etc/fluent-bit/.edge-managed.json`. Its core fields are:

- `roles`: the roles enabled on this machine (`node`/`dns`)
- `hash`: digest of the current managed configuration
- `sourceVersion`: platform version
- `updatedAt`: timestamp of the most recent update
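
### 10.3 Support matrix (offline packages)

The following fixed set of platform keys is currently supported:

- `ubuntu22.04-amd64`
- `ubuntu22.04-arm64`
- `amzn2023-amd64`
- `amzn2023-arm64`

The build stage verifies that every package in the matrix is present; if any is missing, the build fails immediately and prints the expected file path.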
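
To see which key a given machine would map to, the values from `/etc/os-release` and `uname -m` can be combined as below. This is a sketch under the assumption that the keys are derived from those values; `platform_key` is a hypothetical helper and the installer's real detection logic may differ.

```bash
#!/usr/bin/env bash
# Sketch: build a platform key such as ubuntu22.04-amd64.
# Arguments: ID and VERSION_ID as found in /etc/os-release, then `uname -m`.
platform_key() {
    local id="$1" version="$2" machine="$3" arch
    case "$machine" in
        x86_64|amd64)  arch=amd64 ;;
        aarch64|arm64) arch=arm64 ;;
        *) echo "unsupported architecture: $machine" >&2; return 1 ;;
    esac
    case "$id:$version" in
        ubuntu:22.04) echo "ubuntu22.04-$arch" ;;
        amzn:2023*)   echo "amzn2023-$arch" ;;
        *) echo "unsupported OS: $id $version" >&2; return 1 ;;
    esac
}
```

On a real host this could be called as `. /etc/os-release && platform_key "$ID" "$VERSION_ID" "$(uname -m)"`.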
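
### 10.4 Compatibility with manual configuration

- If the existing `fluent-bit.conf` is not a platform-managed file (it lacks the `managed-by-edgeapi` marker), the installer does not overwrite it and instead returns an explicit error.
- To switch to managed mode, back up the old configuration first, then have the platform trigger an install/update task.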
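
Before switching a hand-configured machine to managed mode, the marker check and the backup can be done together. The sketch below only searches for the `managed-by-edgeapi` string anywhere in the file, since the exact marker format is not shown here; both function names and the `.manual.bak` suffix are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: detect the managed marker and back up an unmanaged config.

# Succeeds when the file contains the managed-by-edgeapi marker.
is_edge_managed() {
    grep -q 'managed-by-edgeapi' "$1"
}

# Copies an unmanaged config aside before handing the file to the platform.
backup_if_unmanaged() {
    local conf="$1"
    if is_edge_managed "$conf"; then
        echo "already managed"
    else
        cp -p "$conf" "$conf.manual.bak" && echo "backed up to $conf.manual.bak"
    fi
}
```

Running `sudo bash -c '. ./helpers.sh; backup_if_unmanaged /etc/fluent-bit/fluent-bit.conf'` (with `helpers.sh` standing in for wherever you save the functions) would leave a copy next to the original before the platform task is triggered.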

### 10.5 Resource Profile Notes (New)

- The managed default is now tuned for `2C4G` nodes (conservative and stable).
- Additional sample profiles are provided for larger nodes:
  - `deploy/fluent-bit/fluent-bit-sample-4c8g.conf`
  - `deploy/fluent-bit/fluent-bit-sample-8c16g.conf`
- These sample files are for benchmarking and reference only and are not applied automatically by the installer.
- To use the higher profiles in managed mode, sync their parameters into `EdgeAPI/internal/installers/fluent_bit.go` and then trigger a node reinstall/upgrade.

@@ -1,69 +0,0 @@
# Sample profile for 4C8G nodes (Node + DNS on same host).
# Replace Host/Port/URI and credentials according to your ClickHouse deployment.

[SERVICE]
    Flush 1
    Log_Level info
    Parsers_File parsers.conf
    storage.path /var/lib/fluent-bit/storage
    storage.sync normal
    storage.checksum off
    storage.backlog.mem_limit 512MB

[INPUT]
    Name tail
    Path /var/log/edge/edge-node/*.log
    Tag app.http.logs
    Parser json
    Refresh_Interval 2
    Read_from_Head false
    DB /var/lib/fluent-bit/http-logs.db
    storage.type filesystem
    Mem_Buf_Limit 256MB
    Skip_Long_Lines On

[INPUT]
    Name tail
    Path /var/log/edge/edge-dns/*.log
    Tag app.dns.logs
    Parser json
    Refresh_Interval 2
    Read_from_Head false
    DB /var/lib/fluent-bit/dns-logs.db
    storage.type filesystem
    Mem_Buf_Limit 256MB
    Skip_Long_Lines On

[OUTPUT]
    Name http
    Match app.http.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT%20INTO%20default.logs_ingest%20FORMAT%20JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    json_date_key timestamp
    json_date_format epoch
    workers 2
    net.keepalive On
    Retry_Limit False
    tls On
    tls.verify On

[OUTPUT]
    Name http
    Match app.dns.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT%20INTO%20default.dns_logs_ingest%20FORMAT%20JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    json_date_key timestamp
    json_date_format epoch
    workers 2
    net.keepalive On
    Retry_Limit False
    tls On
    tls.verify On

@@ -1,69 +0,0 @@
# Sample profile for 8C16G nodes (Node + DNS on same host).
# Replace Host/Port/URI and credentials according to your ClickHouse deployment.

[SERVICE]
    Flush 1
    Log_Level info
    Parsers_File parsers.conf
    storage.path /var/lib/fluent-bit/storage
    storage.sync normal
    storage.checksum off
    storage.backlog.mem_limit 1024MB

[INPUT]
    Name tail
    Path /var/log/edge/edge-node/*.log
    Tag app.http.logs
    Parser json
    Refresh_Interval 1
    Read_from_Head false
    DB /var/lib/fluent-bit/http-logs.db
    storage.type filesystem
    Mem_Buf_Limit 512MB
    Skip_Long_Lines On

[INPUT]
    Name tail
    Path /var/log/edge/edge-dns/*.log
    Tag app.dns.logs
    Parser json
    Refresh_Interval 1
    Read_from_Head false
    DB /var/lib/fluent-bit/dns-logs.db
    storage.type filesystem
    Mem_Buf_Limit 512MB
    Skip_Long_Lines On

[OUTPUT]
    Name http
    Match app.http.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT%20INTO%20default.logs_ingest%20FORMAT%20JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    json_date_key timestamp
    json_date_format epoch
    workers 4
    net.keepalive On
    Retry_Limit False
    tls On
    tls.verify On

[OUTPUT]
    Name http
    Match app.dns.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT%20INTO%20default.dns_logs_ingest%20FORMAT%20JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    json_date_key timestamp
    json_date_format epoch
    workers 4
    net.keepalive On
    Retry_Limit False
    tls On
    tls.verify On

@@ -1,62 +0,0 @@
[SERVICE]
    Flush 1
    Log_Level info
    Parsers_File parsers.conf
    storage.path ./storage
    storage.sync normal

[INPUT]
    Name tail
    Path E:\var\log\edge\edge-node\*.log
    Tag app.http.logs
    Parser json
    Refresh_Interval 1
    Read_from_Head true
    DB ./http-logs.db
    Mem_Buf_Limit 128MB
    Skip_Long_Lines On

[INPUT]
    Name tail
    Path E:\var\log\edge\edge-dns\*.log
    Tag app.dns.logs
    Parser json
    Refresh_Interval 1
    Read_from_Head true
    DB ./dns-logs.db
    Mem_Buf_Limit 128MB
    Skip_Long_Lines On

[OUTPUT]
    Name http
    Match app.http.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT+INTO+logs_ingest+FORMAT+JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    tls On
    tls.verify On
    # tls.ca_file C:\\path\\to\\ca.pem
    # tls.vhost clickhouse.example.com
    Json_Date_Key timestamp
    Json_Date_Format epoch
    Retry_Limit 10

[OUTPUT]
    Name http
    Match app.dns.logs
    Host 127.0.0.1
    Port 8443
    URI /?query=INSERT+INTO+dns_logs_ingest+FORMAT+JSONEachRow
    Format json_lines
    http_user ${CH_USER}
    http_passwd ${CH_PASSWORD}
    tls On
    tls.verify On
    # tls.ca_file C:\\path\\to\\ca.pem
    # tls.vhost clickhouse.example.com
    Json_Date_Key timestamp
    Json_Date_Format epoch
    Retry_Limit 10

@@ -1,20 +0,0 @@
# logrotate example: log rotation for edge nodes
# Install: place in /etc/logrotate.d/edge-node, or include it from the main configuration

/var/log/edge/edge-node/*.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}

/var/log/edge/edge-dns/*.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}

Binary file not shown.
Binary file not shown.
Binary file not shown.