Tair dumped core again.

Core was generated by `sbin/tair_server -f etc/dataserver.conf'.
Program terminated with signal 11, Segmentation fault.
(gdb) f
#0  0x000000000048229b in request_processor::process () at request_processor.cpp:1340
1340        req->r->opacket = resp;
(gdb) p/a req
$36 = 0x2aadba4d0d80
(gdb) p/a resp
$37 = 0x2aadba4d7e60
(gdb) p/a req->r
$38 = 0x2aacf3986da0
(gdb) p/a &req->r->opacket
$39 = 0x2aacf3986de0
(gdb) disassemble $rip-9, +15
Dump of assembler code from 0x482292 to 0x4822a1:
   # load address of resp to rax
   0x0000000000482292 <request_processor::process()+98>:       mov    0x10(%rsp),%rax
   # load address of req->r to rdx
   0x0000000000482297 <request_processor::process()+103>:      mov    0x20(%r13),%rdx
   # assign address of resp to req->r->opacket
=> 0x000000000048229b <request_processor::process()+107>:      mov    %rax,0x40(%rdx)
   0x000000000048229f <request_processor::process()+111>:      mov    0x10(%rsp),%rdi
End of assembler dump.
(gdb) x/a $rsp+0x10
0x4f56fcd0:     0x2aadba4d7e60 # address of resp
(gdb) p/a $rax
$40 = 0x2aadba4d7e60 # address of resp
(gdb) p/a $r13 # address of req
$41 = 0x2aadba4d0d80
(gdb) x/a $r13+0x20 # address of req->r
0x2aadba4d0da0: 0x2aacf3986da0
(gdb) p/a $rdx
$42 = 0x0

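Another segmentation fault, and this time the core points at a plain assignment: req->r->opacket = resp;. Normally that means one of req, req->r, or the slot &req->r->opacket holds an invalid address, but gdb shows all three are valid. The disassembly tells a different story. The program crashed at the instruction mov %rax,0x40(%rdx), and %rdx holds NULL, i.e. the value the CPU loaded as req->r was NULL. Yet %rdx was loaded from %r13 + 0x20, and that memory now holds 0x2aacf3986da0, not NULL.
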
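There is only one explanation. When req->r was first loaded from (%r13+0x20) into %rdx, it was NULL. After that load, and before the core was written, req->r was assigned a non-NULL value. That is a concurrency problem: another thread wrote req->r while this one was using it.
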
A related bizarre symptom: assert(var != 0) fails, yet the core shows var as non-zero.

When a bug seems impossible, think concurrency.
