postgresql building lock tree from text log

1) собрать все логи с бд по локам

[code language=»sql»]
[postgres@p38rmisdb01 postgres]$ grep -i ‘lock:’ postgresql-2018-04-06.log
[/code]

2) made a with query enclosing every row with ‘ and with ; at the end
—-best of all do this on different server
[code language=»sql»]
with b as (
select unnest(array[

‘2018-06-05 00:03:33 KRAT [22266]: [17-1] db=lsd,appname=[unknown],user=app_group_master,client=192.168.5.111 DETAIL: Process holding the lock: 24957. Wait queue: 22266.’,
‘2018-06-05 09:00:36 KRAT [5922]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 17343. Wait queue: 5922.’,
‘2018-06-05 09:01:10 KRAT [5922]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 13309. Wait queue: 5922.’,
‘2018-06-05 09:06:56 KRAT [9196]: [6-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 10136. Wait queue: .’,
‘2018-06-05 09:09:55 KRAT [28905]: [4-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL: Process holding the lock: 24188. Wait queue: 28905.’,
‘2018-06-05 09:10:08 KRAT [28905]: [11-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL: Process holding the lock: 24188. Wait queue: 28905.’,
‘2018-06-05 09:11:07 KRAT [28905]: [15-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL: Process holding the lock: 24188. Wait queue: 28905.’,
‘2018-06-05 10:00:06 KRAT [342]: [5-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 23155. Wait queue: 342.’,
‘2018-06-05 10:00:41 KRAT [342]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 9430. Wait queue: 342.’,
‘2018-06-05 10:01:21 KRAT [342]: [17-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 1941. Wait queue: 342.’,
‘2018-06-05 10:09:23 KRAT [27511]: [411-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 18429. Wait queue: .’,
‘2018-06-05 10:09:24 KRAT [18429]: [6-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 30049. Wait queue: .’,
‘2018-06-05 10:09:24 KRAT [30049]: [4-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 18429. Wait queue: 30049.’,
‘2018-06-05 11:00:43 KRAT [29866]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Processes holding the lock: 2667, 27024. Wait queue: 29866.’,
‘2018-06-05 11:00:56 KRAT [29866]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 29718. Wait queue: 29866.’,
‘2018-06-05 11:14:49 KRAT [13709]: [902-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 4172. Wait queue: .’,
‘2018-06-05 11:14:49 KRAT [4172]: [4-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 13719. Wait queue: .’,
‘2018-06-05 12:00:07 KRAT [24431]: [5-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 26119. Wait queue: 24431.’,
‘2018-06-05 12:00:41 KRAT [24431]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 23751. Wait queue: 24431.’,
‘2018-06-05 12:01:02 KRAT [24431]: [17-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 11215. Wait queue: 24431.’,
‘2018-06-05 13:00:43 KRAT [19265]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 7261. Wait queue: 19265.’,
‘2018-06-05 13:00:46 KRAT [19265]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 22841. Wait queue: 19265.’,
‘2018-06-05 14:00:02 KRAT [13762]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 32362. Wait queue: 13762.’,
‘2018-06-05 14:00:06 KRAT [13762]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 17264. Wait queue: 13762.’,
‘2018-06-05 14:00:34 KRAT [13762]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 28253. Wait queue: 13762.’,
‘2018-06-05 14:00:37 KRAT [13762]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 14070. Wait queue: 13762.’,
‘2018-06-05 14:12:13 KRAT [30079]: [702-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL: Process holding the lock: 4418. Wait queue: .’,
‘2018-06-05 15:00:02 KRAT [8188]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 17264. Wait queue: 8188.’,
‘2018-06-05 15:00:07 KRAT [8188]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 1981. Wait queue: 8188.’,
‘2018-06-05 15:00:40 KRAT [8188]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Processes holding the lock: 13534, 27378. Wait queue: 8188.’,
‘2018-06-05 15:01:16 KRAT [8188]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 16850. Wait queue: 8188.’,
‘2018-06-05 16:00:02 KRAT [2756]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Processes holding the lock: 1981, 25953. Wait queue: 2756.’,
‘2018-06-05 16:00:07 KRAT [2756]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 23839. Wait queue: 2756.’,
‘2018-06-05 16:00:34 KRAT [2756]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 8448. Wait queue: 2756.’,
‘2018-06-05 16:01:07 KRAT [2756]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 6456. Wait queue: 2756.’,
‘2018-06-05 17:00:02 KRAT [29439]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 23839. Wait queue: 29439.’,
‘2018-06-05 17:00:29 KRAT [29439]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL: Process holding the lock: 3378. Wait queue: 29439.’

])
as txt),
analyze_lock as (
select substring(txt, 1,19)::timestamp without time zone as ttime,
split_part(split_part(txt, ‘holding the lock:’, 2), ‘. Wait’, 1) as holding_lock,
split_part(split_part(txt, ‘Wait queue: ‘, 2), ‘.’, 1) as wait_queue,
txt from b
—where txt like ‘%3103%’
— здесь можно оганичить по пользователю, по процессу
),
analyze_lock_normalize as ( —нормализованное предсталвение блокировок (все перечисленные через запятую разбиты на отдельные строки)
SELECT ttime,
unnest(string_to_array(regexp_replace(regexp_replace(cast((holding_lock) AS CHARACTER VARYING), ‘^\(‘,»), ‘\)$’, »), ‘,’))::int AS holding_lock,
unnest(string_to_array(regexp_replace(regexp_replace(cast((wait_queue) AS CHARACTER VARYING), ‘^\(‘,»), ‘\)$’, »), ‘,’))::int AS wait_queue,
txt
from analyze_lock
—WHERE ttime::time between ’14:45:00′::time and ’15:15:00′::time
— здесь можно оганичить по времени
)
,
— обычная паррент таблица
pp as (
select distinct holding_lock::int as id, null::int as parent_id, null::timestamp without time zone as ttime from analyze_lock_normalize t where not exists (select from analyze_lock_normalize t2 where t2.wait_queue = t.holding_lock )
union all
select distinct wait_queue::int as h_id, holding_lock::int as parrent_id, ttime from analyze_lock_normalize t —where not exists (select from analyze_lock t2 where t2.wait_queue like ‘%’||t.holding_lock||’%’ )
),

ltr as (
WITH RECURSIVE ltr_p AS (
SELECT id,
parent_id,
ttime,
id::text::ltree as ltree_lock,
id::text::ltree as main_lock,
1 as llevel
FROM pp
WHERE parent_id IS NULL
UNION ALL
SELECT n.id,
n.parent_id,
n.ttime,
r.ltree_lock || (n.id::text::ltree),
r.main_lock,
r.llevel + 1
FROM pp n
JOIN ltr_p r ON n.parent_id = r.id
where r.llevel < 10 — ограничиваем глубину погружения в локи (главное понять от кого они идут а не кого блокируют)
)
SELECT *, min(ttime) over (partition by main_lock) as main_lock_time, max(ttime) over (partition by main_lock) as main_lock_time_max
FROM ltr_p
ORDER BY ltree_lock
)

select replace(ltree_lock::text,’.’,’ —> ‘) as ltree_lock, min(ttime)/*-INTERVAL ‘5h’*/ as time_lock, main_lock, main_lock_time/*-INTERVAL ‘5h’*/, main_lock_time_max/*-INTERVAL ‘5h’*/ from ltr group by 1, 3, 4, 5 order by 4, 3, min(ttime) nulls first, 1
— you can change " -INTERVAL ‘5h’" to any suitable value

[/code]

Result is:

[code]
10040
10040 06.04.2018 4:38 06.04.2018 4:38
10040 —> 10018 06.04.2018 4:38 10040 06.04.2018 4:38 06.04.2018 4:38
10040 —> 10100 06.04.2018 4:38 10040 06.04.2018 4:38 06.04.2018 4:38
10040 —> 27005 06.04.2018 4:38 10040 06.04.2018 4:38 06.04.2018 4:38
20930
20930 06.04.2018 5:07 06.04.2018 5:07
20930 —> 21075 06.04.2018 5:07 20930 06.04.2018 5:07 06.04.2018 5:07
8548
8548 06.04.2018 6:41 06.04.2018 6:41
8548 —> 30636 06.04.2018 6:41 8548 06.04.2018 6:41 06.04.2018 6:41
27429
27429 06.04.2018 7:01 06.04.2018 7:01
27429 —> 3837 06.04.2018 7:01 27429 06.04.2018 7:01 06.04.2018 7:01
30857
30857 06.04.2018 7:02 06.04.2018 7:02
30857 —> 3837 06.04.2018 7:02 30857 06.04.2018 7:02 06.04.2018 7:02
10219
10219 06.04.2018 7:49 06.04.2018 7:57
10219 —> 31116 06.04.2018 7:49 10219 06.04.2018 7:49 06.04.2018 7:57
10219 —> 31116 —> 9138 06.04.2018 7:53 10219 06.04.2018 7:49 06.04.2018 7:57
10219 —> 31116 —> 31119 06.04.2018 7:55 10219 06.04.2018 7:49 06.04.2018 7:57
10219 —> 31116 —> 31119 —> 3410 06.04.2018 7:56 10219 06.04.2018 7:49 06.04.2018 7:57
10219 —> 31116 —> 9138 —> 31118 06.04.2018 7:57 10219 06.04.2018 7:49 06.04.2018 7:57
30944
30944 06.04.2018 7:56 06.04.2018 7:56
30944 —> 31114 06.04.2018 7:56 30944 06.04.2018 7:56 06.04.2018 7:56
31110
31110 06.04.2018 7:56 06.04.2018 7:56
31110 —> 12204 06.04.2018 7:56 31110 06.04.2018 7:56 06.04.2018 7:56
31110 —> 12204 —> 3411 06.04.2018 7:56 31110 06.04.2018 7:56 06.04.2018 7:56
15286
15286 06.04.2018 8:03 06.04.2018 8:03
15286 —> 13436 06.04.2018 8:03 15286 06.04.2018 8:03 06.04.2018 8:03
18662
18662 06.04.2018 8:04 06.04.2018 8:04
18662 —> 13436 06.04.2018 8:04 18662 06.04.2018 8:04 06.04.2018 8:04
22066
22066 06.04.2018 8:04 06.04.2018 8:04
22066 —> 13436 06.04.2018 8:04 22066 06.04.2018 8:04 06.04.2018 8:04
28885
28885 06.04.2018 9:02 06.04.2018 9:02
28885 —> 22676 06.04.2018 9:02 28885 06.04.2018 9:02 06.04.2018 9:02

[/code]
as you can see the PID 10219 is the cause of cascade locks

4) also you can check what that PID do before locking

[postgres@p38rmisdb01 postgres]$ grep -i ‘\[10219\]’ postgresql-2018-04-06.log

ps: thx to Amir for material !

Оставьте комментарий

Ваш e-mail не будет опубликован. Обязательные поля помечены *

Этот сайт использует Akismet для борьбы со спамом. Узнайте как обрабатываются ваши данные комментариев.