The crash can be triggered on a completely empty accel-ppp under no load with this command:
while sleep 1; do accel-cmd reload; sleep 1; accel-cmd show sessions; done;
The config is not modified while this runs, and there are no pppoe connections.
Config:
[modules]
log_syslog
pppoe
auth_mschap_v2
auth_mschap_v1
auth_chap_md5
radius

[core]
thread-count=4

[common]
single-session=replace
single-session-ignore-case=1
max-starting=100

[ppp]
verbose=1
min-mtu=1152
mtu=1420
mru=1420
mppe=deny
ipv4=require
ipv6=deny
lcp-echo-interval=20
lcp-echo-failure=3
lcp-echo-timeout=120
unit-cache=1

[auth]

[pppoe]
verbose=1
called-sid=mac
interface=eth0

[dns]
dns1=198.18.255.254

[radius]
nas-identifier=MEGA
gw-ip-address=198.18.255.254
server=127.0.0.1,passwradius,auth-port=1812,acct-port=1813,req-limit=50,fail-timeout=0,max-fail=10,weight=1
verbose=1

[log]
syslog=accel-pppd,daemon
copy=1
level=5

[cli]
verbose=2
telnet=127.0.0.1:2000
tcp=127.0.0.1:2001
sessions-columns=type,ifname,username,ip,state,uptime-raw,calling-sid,called-sid,rx-bytes,tx-bytes
This is what usually happens:
Starting program: /usr/sbin/accel-pppd -c /etc/accel-ppp/accel-ppp.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb7ad9b40 (LWP 24563)]
[New Thread 0xb72d8b40 (LWP 24566)]
[New Thread 0xb6ad7b40 (LWP 24567)]
[New Thread 0xb60ffb40 (LWP 24569)]
[New Thread 0xb58feb40 (LWP 24570)]
[New Thread 0xb50fdb40 (LWP 24572)]
[New Thread 0xb48fcb40 (LWP 24573)]
conf_file:/etc/accel-ppp/accel-ppp.conf:93: no section opened
memory corruption:
malloc(10) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/conf_file.c:117
free at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/conf_file.c:193
*** Error in `/usr/sbin/accel-pppd': corrupted double-linked list: 0xb61018c8 ***

Thread 3 "accel-pppd" received signal SIGABRT, Aborted.
[Switching to Thread 0xb72d8b40 (LWP 24566)]
0xb7fdc428 in __kernel_vsyscall ()
(gdb) bt full
No symbol table info available.
No symbol table info available.
No symbol table info available.
    at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/memdebug.c:90
        mem = 0xb61018d0
        r = 0
        ctx = {fname = 0xb7fda1c4 <sections> "D\036ПЁт!\020╤╓R\005─\\m\005──", file = 0xfa8c7f2b, line = 108205909, items = 0x0}
        sect = 0x8002f1bf <log_switch>
        r = -2147097804
        sections_bak = {next = 0xb3d01554, prev = 0xb3d016ec}
        t = 0xb7ff2750
        r = 4
        set = {__val = {516, 0 <repeats 31 times>}}
        sig = 10
        need_free = 0
        stack = 0x0
No symbol table info available.
No symbol table info available.
This has already been patched here: https://github.com/anphsw/accel-ppp/commit/a0c08ce019cf88278f882d823f876c6edc2d5218
After the patch the memory is no longer corrupted, but the bug gets a little further. My guess is this: the configuration space is one and the same, but two independent processes try to write to it, so unpleasant things can happen and the configuration gets corrupted (sometimes fatally, sometimes not).
The non-fatal cases look like this:
[New Thread 0xb7ad8b40 (LWP 22088)]
[New Thread 0xb72d7b40 (LWP 22091)]
[New Thread 0xb6ad6b40 (LWP 22092)]
[New Thread 0xb60ffb40 (LWP 22093)]
[New Thread 0xb58feb40 (LWP 22094)]
[New Thread 0xb50fdb40 (LWP 22095)]
[New Thread 0xb48fcb40 (LWP 22096)]
conf_file:/etc/accel-ppp/accel-ppp.conf:59: no section opened
conf_file:/etc/accel-ppp/accel-ppp.conf:7: no section opened
Note that the errors are reported at different lines, even though nobody touches the config!
If a config with these errors does manage to get through the parser, random errors show up in the modules whose config sections were corrupted:
conf_file:/etc/accel-ppp/accel-ppp.conf:7: no section opened

Thread 6 "accel-pppd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb58feb40 (LWP 22094)]
0xb7ae6e25 in load_config () at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/radius/serv.c:903
903	/var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/radius/serv.c: No such file or directory.
(gdb) bt full
#0  0xb7ae6e25 in load_config () at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/radius/serv.c:903
        sect = 0x0
        opt = 0xb7aecfdf
        s = 0xb7af1224 <serv_list>
        r = 0x0
        pos = 0x3db
        n = 0x80053f78
        opt1 = 0x0
#1  0xb7fd6408 in triton_event_fire (ev_id=11, arg=0x0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/event.c:103
        ev = 0x80053f78
        h = 0x8005e2e8
#2  0x8001f43c in conf_reload_notify (r=0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/cli/std_cmd.c:319
No locals.
#3  0xb7fd1b7b in __config_reload (notify=0x8001f3f8 <conf_reload_notify>) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/triton.c:73
        t = 0xb7fda190 <threads>
        r = 0
#4  0xb7fd2097 in triton_thread (thread=0x8005d13c) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/triton.c:159
        set = {__val = {516, 0 <repeats 31 times>}}
        sig = 10
        need_free = 0
        stack = 0x0
#5  0xb7fae41b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#6  0xb7c5092e in clone () from /lib/libc.so.6
No symbol table info available.
(gdb) run
Starting program: /usr/sbin/accel-pppd -c /etc/accel-ppp/accel-ppp.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb7ad8b40 (LWP 10263)]
[New Thread 0xb72d7b40 (LWP 10266)]
[New Thread 0xb6ad6b40 (LWP 10267)]
[New Thread 0xb60ffb40 (LWP 10269)]
[New Thread 0xb58feb40 (LWP 10270)]
[New Thread 0xb50fdb40 (LWP 10271)]
[New Thread 0xb48fcb40 (LWP 10273)]
memory corruption:
free at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/logs/log_syslog.c:171

Thread 6 "accel-pppd" received signal SIGABRT, Aborted.
[Switching to Thread 0xb58feb40 (LWP 10270)]
0xb7fdc428 in __kernel_vsyscall ()
(gdb) bt full
#0  0xb7fdc428 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7b8b219 in raise () from /lib/libc.so.6
No symbol table info available.
#2  0xb7b8c9cd in abort () from /lib/libc.so.6
No symbol table info available.
#3  0x80005096 in md_free (ptr=0xb3d02fd4, fname=0xb7b1c6a4 "/var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/logs/log_syslog.c", line=171) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/memdebug.c:84
        mem = 0xb3d02fb0
#4  0xb7b1c4b9 in load_config () at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/logs/log_syslog.c:171
        opt = 0x0
        facility = 24
#5  0xb7fd6408 in triton_event_fire (ev_id=11, arg=0x0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/event.c:103
        ev = 0x80053f78
        h = 0x80054180
#6  0x8001f43c in conf_reload_notify (r=0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/cli/std_cmd.c:319
No locals.
#7  0xb7fd1b7b in __config_reload (notify=0x8001f3f8 <conf_reload_notify>) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/triton.c:73
        t = 0xb7fda190 <threads>
        r = 0
#8  0xb7fd2097 in triton_thread (thread=0x8005d264) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/triton.c:159
        set = {__val = {516, 0 <repeats 31 times>}}
        sig = 10
        need_free = 0
        stack = 0x0
#9  0xb7fae41b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#10 0xb7c5092e in clone () from /lib/libc.so.6
No symbol table info available.
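Both traces above end inside a module's load_config handler with a NULL in its locals (sect = 0x0 in radius/serv.c, opt = 0x0 in log_syslog.c). A minimal sketch of a reload handler that tolerates a missing section is given below. Everything prefixed with demo_ is a hypothetical stand-in, not accel-ppp's real types or functions, and a check like this would only hide the symptom, not the underlying race.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a parsed config section; NOT accel-ppp's types. */
struct demo_opt {
	const char *name;
	const char *val;
};

struct demo_sect {
	const char *name;
	const struct demo_opt *opts;
	size_t n_opts;
};

/* Current module setting; survives a failed reload untouched. */
static int demo_verbose = 0;

/* Look up an option by name; returns NULL if the section or option is absent. */
static const char *demo_get_opt(const struct demo_sect *sect, const char *name)
{
	size_t i;

	if (!sect)
		return NULL;
	for (i = 0; i < sect->n_opts; i++)
		if (strcmp(sect->opts[i].name, name) == 0)
			return sect->opts[i].val;
	return NULL;
}

/* Reload handler: returns 0 if settings were applied, -1 if the section
 * was missing and the previous settings were kept. */
static int demo_load_config(const struct demo_sect *sect)
{
	const char *opt;

	if (!sect) {
		fprintf(stderr, "demo: section missing after reload, keeping old settings\n");
		return -1;
	}

	opt = demo_get_opt(sect, "verbose");
	if (opt)
		demo_verbose = atoi(opt);
	return 0;
}
```

Called with a NULL section, the handler reports the problem and leaves demo_verbose unchanged instead of dereferencing the pointer.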
The next iteration: make it impossible for two different processes to parse the config at the same time:
diff --git a/accel-pppd/triton/conf_file.c b/accel-pppd/triton/conf_file.c
index e1d9650..fa4a56a 100644
--- a/accel-pppd/triton/conf_file.c
+++ b/accel-pppd/triton/conf_file.c
@@ -33,6 +33,8 @@ static int sect_add_item(struct conf_ctx *ctx, const char *name, char *val, char
 static struct conf_option_t *find_item(struct conf_sect_t *, const char *name);
 static int load_file(struct conf_ctx *ctx);
 
+int conf_loading = 0;
+
 static int __conf_load(struct conf_ctx *ctx, const char *fname)
 {
 	struct conf_ctx ctx1;
@@ -180,6 +182,12 @@ int conf_load(const char *fname)
 	int r;
 	struct conf_ctx ctx;
 
+	if (conf_loading) {
+		fprintf(stderr, "conf_file: loading already in progress\n");
+		return -1;
+	}
+	conf_loading = 1;
+
 	if (fname) {
 		if (conf_fname)
 			_free(conf_fname);
@@ -190,6 +198,8 @@ int conf_load(const char *fname)
 	ctx.items = NULL;
 	r = __conf_load(&ctx, fname);
 
+	conf_loading = 0;
+
 	return r;
 }
 
@@ -213,6 +223,12 @@ int conf_reload(const char *fname)
 {
 	struct sect_t *sect;
 	int r;
+
+	if (conf_loading) {
+		fprintf(stderr, "conf_file: reloading already in progress\n");
+		return -1;
+	}
+
 	LIST_HEAD(sections_bak);
 
 	list_splice_init(&sections, &sections_bak);
After that the behaviour becomes consistent: on an attempt to parse the config twice at the same time, we reliably crash in one and the same place:
conf_file: reloading already in progress

Thread 4 "accel-pppd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6ad5b40 (LWP 18176)]
0xb7fd19a1 in __list_del (prev=0x0, next=0x0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/list.h:85
85	/var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/list.h: No such file or directory.
(gdb) bt full
#0  0xb7fd19a1 in __list_del (prev=0x0, next=0x0) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/list.h:85
No locals.
#1  0xb7fd19fa in list_del (entry=0x80058e1c) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/list.h:96
No locals.
#2  0xb7fd1fe3 in triton_thread (thread=0x8005e454) at /var/tmp/portage/net-dialup/accel-ppp-9999/work/accel-ppp-9999/accel-pppd/triton/triton.c:145
        set = {__val = {516, 0 <repeats 31 times>}}
        sig = 10
        need_free = 0
        stack = 0x0
#3  0xb7fae41b in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0xb7c5092e in clone () from /lib/libc.so.6
No symbol table info available.
As far as I understand, the last patch does not suit xeb, so I would like to hear suggestions on how best to fix this.
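One detail that may be worth keeping in mind for any alternative: in the diff above, conf_loading is a plain int that is tested and set in separate steps, and in the hunks shown only conf_load sets it, so two threads could still both pass the check. Below is a minimal sketch of the same "refuse a concurrent reload" idea using pthread_mutex_trylock. It assumes POSIX threads; parse_config and guarded_conf_reload are hypothetical names, not accel-ppp functions, and the sketch does not address the list_del crash in triton_thread shown above.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the real parser; returns 0 on success. */
static int parse_config(const char *fname)
{
	(void)fname;
	return 0;
}

/* One lock shared by every path that rebuilds the config. */
static pthread_mutex_t conf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns -1 if another load/reload is already running,
 * otherwise the parser's result. The test and the acquisition
 * happen in one atomic step inside pthread_mutex_trylock. */
int guarded_conf_reload(const char *fname)
{
	int r;

	if (pthread_mutex_trylock(&conf_lock) != 0) {
		fprintf(stderr, "conf_file: reloading already in progress\n");
		return -1;
	}

	r = parse_config(fname);

	pthread_mutex_unlock(&conf_lock);
	return r;
}
```

Using a blocking pthread_mutex_lock instead would queue the second reload rather than reject it; which behaviour is preferable is a design choice.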