gerald-zhang of 维稳小队
3/30/2018 - 4:42 PM

Alibaba Cloud Log Analysis

Basic syntax

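  1. Syntax: "Lucene-like query syntax | SQL-like analysis syntax". The query part runs first, and the analysis part then runs on the logs it matched.
  2. Monitoring: alerts can be configured based on the analysis results.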
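# Monitoring sudden traffic drops or spikes, using a comparison of network traffic against the previous time period as the example (comparing against yesterday, last week, and so on works the same way)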

1. Basics: define a 1-minute time window and compute the average per-second traffic within each window
* | select sum(inflow)/(max(__time__)-min(__time__)) as inflow, 
__time__-__time__%60  as window_time 
from log 
group by window_time 
order by window_time 
limit 15

Note: with 1-minute windows, limit 15 returns 15 consecutive minutes of data
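
To make the bucketing arithmetic concrete, here is a minimal offline sketch in Python of what the query above computes. It is only an illustration: the function name `window_averages` and the sample records are hypothetical, and the code skips any window whose records all share one timestamp, because the query's denominator `max(__time__)-min(__time__)` would be zero there.

```python
from collections import defaultdict


def window_averages(records, width=60):
    """Group (unix_time, inflow) records into fixed windows.

    Mirrors `__time__ - __time__ % 60 as window_time` and
    `sum(inflow) / (max(__time__) - min(__time__))` from the query.
    Returns {window_start: average inflow per second}.
    """
    buckets = defaultdict(list)
    for t, inflow in records:
        # Round the timestamp down to the start of its window.
        buckets[t - t % width].append((t, inflow))

    result = {}
    for start in sorted(buckets):
        times = [t for t, _ in buckets[start]]
        span = max(times) - min(times)
        if span == 0:
            # Assumption: skip windows where the denominator would be zero.
            continue
        result[start] = sum(v for _, v in buckets[start]) / span
    return result


# Made-up sample: three records in window 120, two in window 180.
sample = [(120, 100), (150, 200), (179, 300), (180, 50), (200, 150)]
averages = window_averages(sample)
```

Note that the denominator is the observed time span inside the window rather than a fixed 60 seconds, so a sparsely populated window can report a higher per-second rate than a full one.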

2. Use a subquery to compute the variation across the windows (peak ratio: the maximum window value divided by the average)
* | select max(inflow)/avg(inflow) as max_ratio 
from (select sum(inflow)/(max(__time__)-min(__time__)) as inflow , 
__time__-__time__%60  as window_time 
from log 
group by window_time 
order by window_time 
limit 15)
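
The outer `max(inflow)/avg(inflow)` step can be sketched offline as follows. This is a hedged illustration, not the service's implementation: `max_ratio` is a hypothetical helper, and the input list stands in for the per-window values produced by the subquery.

```python
def max_ratio(window_values):
    """Peak window value divided by the mean of all window values.

    A ratio close to 1 means traffic was flat; a large ratio means at
    least one window stood well above the average.
    """
    if not window_values:
        raise ValueError("need at least one window value")
    mean = sum(window_values) / len(window_values)
    if mean == 0:
        raise ValueError("average inflow is zero, ratio is undefined")
    return max(window_values) / mean


# Made-up per-second averages for five windows, with one spike.
flows = [100, 110, 90, 105, 400]
ratio = max_ratio(flows)
```

Because the peak can sit anywhere in the 15 windows, this ratio flags a spike that already happened; it does not say whether traffic is abnormal right now.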

3. Compute the variation across the windows (latest ratio: the most recent window value divided by the average)
 * | select max_by(inflow, window_time)/1.0/avg(inflow) as latest_ratio 
 from (select sum(inflow)/(max(__time__)-min(__time__)) as inflow , 
 __time__-__time__%60  as window_time 
 from log 
 group by window_time 
 order by window_time 
 limit 15)

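 Note: max_by(inflow, window_time) returns the inflow of the row with the largest window_time, i.e. the traffic of the most recent window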
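
As an offline illustration of the `max_by(inflow, window_time)/avg(inflow)` calculation, the sketch below picks the value belonging to the largest window time and divides it by the mean. The helper name `latest_ratio` and the sample dictionaries are hypothetical.

```python
def latest_ratio(windows):
    """windows: {window_time: inflow}.

    Equivalent in spirit to max_by(inflow, window_time) / avg(inflow):
    take the inflow of the newest window and compare it to the mean.
    """
    if not windows:
        raise ValueError("need at least one window")
    latest_time = max(windows)  # largest window_time = most recent window
    mean = sum(windows.values()) / len(windows)
    if mean == 0:
        raise ValueError("average inflow is zero, ratio is undefined")
    return windows[latest_time] / mean


# Made-up data: a spike and a drop in the newest window.
spike = latest_ratio({60: 100, 120: 100, 180: 400})
drop = latest_ratio({60: 300, 120: 300, 180: 0})
```

Unlike the peak ratio, this value moves in both directions: it rises above 1 on a spike and falls toward 0 on a drop, so an alert can put thresholds on either side.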
 
 4. Compute the variation across the windows (define a fluctuation rate: the relative change from each value to the next)
  * | select (inflow- lag(inflow, 1, inflow)over() )*1.0/inflow as diff, from_unixtime(window_time) 
  from (select sum(inflow)/(max(__time__)-min(__time__)) as inflow , 
  __time__-__time__%60  as window_time 
  from log 
  group by window_time 
  order by window_time 
  limit 15)
  
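  Notes:
  a. The calculation uses the window function lag. "lag(inflow, 1, inflow)over()" fetches the inflow of the previous period (falling back to the current inflow when there is no previous row); the query subtracts it from the current inflow and divides the difference by the current value to get a change ratio.
  b. To get an absolute change ratio regardless of direction, wrap the result in the abs function (absolute value).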
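
The `lag`-based calculation can be reproduced offline with the following sketch. It is an illustration under stated assumptions: `lag_diffs` is a hypothetical helper, the input list is assumed to be ordered by window time already, and every value is assumed to be nonzero because the query divides by the current value.

```python
def lag_diffs(values, absolute=False):
    """Relative change of each value against its predecessor.

    Mirrors (inflow - lag(inflow, 1, inflow)) * 1.0 / inflow:
    the first element has no predecessor, so it is compared with
    itself (the third argument of lag) and yields 0.
    """
    diffs = []
    for i, current in enumerate(values):
        previous = values[i - 1] if i > 0 else current
        change = (current - previous) / current
        # absolute=True corresponds to wrapping the expression in abs().
        diffs.append(abs(change) if absolute else change)
    return diffs


# Made-up per-window values: a doubling followed by a sharp drop.
signed = lag_diffs([100, 200, 50])
unsigned = lag_diffs([100, 200, 50], absolute=True)
```

Since the denominator is the current value, the ratio is asymmetric: doubling from 100 to 200 gives 0.5, while falling from 200 to 50 gives -3.0. A threshold chosen for spikes therefore does not translate directly into one for drops.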
# Checking for errors; the most important status code is 500
status:500 | select count(1) as c
# Response time distribution per second (p95)
* | select date_trunc('second',from_unixtime(__time__)) as t, 
avg(request_time) as average,
approx_percentile(request_time, 0.95) as p95 ,
approx_percentile(request_time, 0.90) as p90 ,
approx_percentile(request_time, 0.70) as p70 
group by t 
order by t
limit 1440
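
For readers who want to check the numbers on a small sample, here is a minimal offline sketch of the per-second statistics. It swaps in an exact nearest-rank percentile, whereas `approx_percentile` in the query is an approximation, so results on large data may differ slightly. The names `percentile` and `per_second_stats` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_values, q):
    """Exact nearest-rank percentile of an ascending list, 0 < q <= 1."""
    # round() guards against floating-point noise in q * n before ceil.
    rank = max(1, math.ceil(round(q * len(sorted_values), 9)))
    return sorted_values[rank - 1]


def per_second_stats(records):
    """records: iterable of (unix_time, request_time).

    Groups by whole second, like date_trunc('second', ...), and
    returns {second: {"average", "p95", "p90", "p70"}} ordered by time.
    """
    by_second = defaultdict(list)
    for t, request_time in records:
        by_second[int(t)].append(request_time)

    stats = {}
    for second in sorted(by_second):
        values = sorted(by_second[second])
        stats[second] = {
            "average": sum(values) / len(values),
            "p95": percentile(values, 0.95),
            "p90": percentile(values, 0.90),
            "p70": percentile(values, 0.70),
        }
    return stats


# Made-up sample: 20 requests in second 10, one request in second 11.
sample = [(10, x) for x in range(1, 21)] + [(11.4, 3.0)]
stats = per_second_stats(sample)
```

Comparing the average with p95 in the same second is the useful part: a stable average with a rising p95 points to a slow tail that the average alone would hide.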