potential race condition with ProjDataInfoSubsetByView #1720

Description

@KrisThielemans

SyneRBI/SIRF#1379 flagged up issues in PETRIC2 with the subsets. I now experience occasional crashes in test_proj_data_info_subsets_pet on a system with 16 cores. At default settings this happens only occasionally, but setting OMP_NUM_THREADS=17 gives

------------------ TOF
        Generating default ProjData from D690
<snip>
        Setting up default projector pair, ProjectorByBinPairUsingProjMatrixByBin
        Testing unbalanced subset 9: views {9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143}

        Testing Subset forward projection is consistent
        Setting up default projector pair, ProjectorByBinPairUsingProjMatrixByBin
        Testing Subset forward projection is consistent with reduced segment range
        Setting up default projector pair, ProjectorByBinPairUsingProjMatrixByBin
corrupted size vs. prev_size
Aborted (core dumped)
malloc(): invalid next size (unsorted)

or

malloc(): invalid next size (unsorted)
malloc_consolidate(): invalid chunk size

It's not because it's running out of memory, as top gives

MiB Mem :  31846.2 total,  26778.8 free,   4576.2 used,    872.1 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  27270.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 257371 kris      20   0 1798752   1.2g  24960 R  1587   3.7  22:04.42 test_proj_data_

Note that increasing the number of threads beyond the default made the run a lot slower, and top reported an increasing amount of "system" time, which I guess indicates that most of the locking is doing what it is supposed to do.

Running with a high number of threads in gdb gave me

Thread 8 "test_proj_data_" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff1b2ce40 (LWP 257392)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44     ./nptl/pthread_kill.c: No such file or directory
(gdb) info stack
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff624527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff62288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff62297b6 in __libc_message_impl (fmt=fmt@entry=0x7ffff63ce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:134
#6  0x00007ffff62a8ff5 in malloc_printerr (str=str@entry=0x7ffff63cc5e8 "corrupted size vs. prev_size") at ./malloc/malloc.c:5775
#7  0x00007ffff62a9b96 in unlink_chunk (p=<optimized out>, av=<optimized out>) at ./malloc/malloc.c:1611
#8  0x00007ffff62a9d2b in malloc_consolidate (av=av@entry=0x7fffd8000030) at ./malloc/malloc.c:4876
#9  0x00007ffff62aba90 in _int_malloc (av=av@entry=0x7fffd8000030, bytes=bytes@entry=3784) at ./malloc/malloc.c:4041
#10 0x00007ffff62ad714 in __GI___libc_malloc (bytes=3784) at ./malloc/malloc.c:3336
#11 0x00007ffff66c17fc in operator new (sz=3784) at ../../../../libstdc++-v3/libsupc++/new_op.cc:50
#12 0x00007ffff66c1849 in operator new[] (sz=<optimized out>) at ../../../../libstdc++-v3/libsupc++/new_opv.cc:32
#13 0x000055555570573d in stir::Array<2, float>::Array (range=..., this=0x7ffff1b2b6c0) at /home/kris/devel/build_sirfetc/sources/STIR/src/include/stir/Array.inl:117
#14 stir::Viewgram<float>::Viewgram (this=0x7ffff1b2b7e0, pdi_sptr=..., ind=...) at /home/kris/devel/build_sirfetc/sources/STIR/src/buildblock/Viewgram.cxx:69
#15 0x0000555555680af4 in stir::ProjDataInfo::get_empty_viewgram (this=this@entry=0x7fffb001a170, ind=...)
    at /home/kris/devel/build_sirfetc/sources/STIR/src/buildblock/ProjDataInfo.cxx:375
#16 0x0000555555683c7b in stir::ProjDataInfo::get_empty_related_viewgrams (this=0x7fffb001a170, viewgram_indices=..., symmetries_used=...,
    make_num_tangential_poss_odd=make_num_tangential_poss_odd@entry=false, timing_pos_num=timing_pos_num@entry=0)
    at /home/kris/devel/build_sirfetc/sources/STIR/src/buildblock/ProjDataInfo.cxx:480
#17 0x0000555555673ff2 in stir::ProjData::get_empty_related_viewgrams (this=<optimized out>, view_segmnet_num=..., symmetries_used=...,
    make_num_tangential_poss_odd=make_num_tangential_poss_odd@entry=false, timing_pos=timing_pos@entry=0)
    at /home/kris/devel/build_sirfetc/sources/STIR/src/buildblock/ProjData.cxx:252
#18 0x00005555557bdca3 in _ZN4stir21ForwardProjectorByBin15forward_projectERNS_8ProjDataEiib._omp_fn.0(void) ()
    at /home/kris/devel/build_sirfetc/sources/STIR/src/recon_buildblock/ForwardProjectorByBin.cxx:219
#19 0x00007ffff799d0b6 in __kmp_GOMP_microtask_wrapper () from /home/kris/miniforge3/envs/sirfetc/lib/libgomp.so.1
#20 0x00007ffff79bb679 in __kmp_invoke_microtask () from /home/kris/miniforge3/envs/sirfetc/lib/libgomp.so.1
#21 0x00007ffff7933214 in __kmp_invoke_task_func () from /home/kris/miniforge3/envs/sirfetc/lib/libgomp.so.1
#22 0x00007ffff7935fa0 in __kmp_launch_thread () from /home/kris/miniforge3/envs/sirfetc/lib/libgomp.so.1
#23 0x00007ffff7998ea2 in __kmp_launch_worker(void*) () from /home/kris/miniforge3/envs/sirfetc/lib/libgomp.so.1
#24 0x00007ffff629caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#25 0x00007ffff6329c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
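
For readers less familiar with this failure signature: glibc only notices damaged heap metadata the next time it touches the affected chunk, so the allocation in frames #13 to #17 is most likely the victim and not the cause. The faulty write probably happened earlier, possibly in another thread. One common way to get there is unsynchronised lazy initialisation of state shared between OpenMP threads. The sketch below illustrates that pattern and one way to guard it. It is not STIR code and not a diagnosis of this issue: `LazyViewTable`, `original_view` and `sum_original_views` are hypothetical names made up for the illustration. The pragmas are simply ignored if the file is compiled without `-fopenmp`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that is shared by all OpenMP threads
// and fills an internal lookup table on first use. Not part of STIR.
class LazyViewTable
{
public:
  explicit LazyViewTable(std::vector<int> views) : views_(std::move(views)) {}

  // Returns the original view number for a given subset view index.
  int original_view(std::size_t subset_view)
  {
    // Without this critical section, two threads could both see an empty
    // table and fill it at the same time. That is a data race on the
    // vector's storage and can corrupt the heap well before malloc aborts.
#pragma omp critical(lazy_view_table_fill)
    {
      if (table_.empty())
        {
          table_.assign(views_.begin(), views_.end());
          ++fill_count_;
        }
    }
    // The table is never modified after the first fill, so reading it
    // outside the critical section is safe.
    return table_[subset_view];
  }

  // Number of times the table was filled (should be 1).
  int fill_count() const { return fill_count_; }

private:
  std::vector<int> views_;
  std::vector<int> table_;
  int fill_count_ = 0;
};

// Sums the original view numbers over all subset views in parallel.
long sum_original_views(LazyViewTable& table, std::size_t num_views)
{
  long total = 0;
#pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < static_cast<long>(num_views); ++i)
    total += table.original_view(static_cast<std::size_t>(i));
  return total;
}
```

Running `sum_original_views` with more threads than cores (for example `OMP_NUM_THREADS=17` on a 16-core machine) increases the chance that threads interleave inside an unguarded version of `original_view`, which would be consistent with the crash becoming more frequent at high thread counts.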
