]> code.ossystems Code Review - openembedded-core.git/commit
boost-build-native: workaround one rarely hang problem on fedora34
authorChangqing Li <changqing.li@windriver.com>
Thu, 8 Jul 2021 04:04:35 +0000 (12:04 +0800)
committerAlexandre Belloni <alexandre.belloni@bootlin.com>
Fri, 9 Jul 2021 09:38:37 +0000 (11:38 +0200)
commit7ff692d2e9787bb5b36929a208597595473db0c7
tree55b0798682714f041e79fe6c4b5e30529a3e5f25
parent8d04a6c01bf367eb1cb88fd34768a024c369216a
boost-build-native: workaround one rarely hang problem on fedora34

Reproduce scenes:
* On fedora34
* autofs.service is started
* test is nis user, which mounted at /nis by autofs
* under /nis/test, there are symlinks point to another nis mount point /nis/yan

Result:
task boost-build-native:do_install hang forever

NOTE: recipe ovmf-edk2-stable202102-r0: task do_package_write_rpm: Succeeded
NOTE: Running noexec task 8124 of 8152 (/layers/oe-core/meta/recipes-core/ovmf/ovmf_git.bb:do_build)
Bitbake still alive (5000s)
Bitbake still alive (10000s)
Bitbake still alive (15000s)
Bitbake still alive (20000s)
Bitbake still alive (25000s)
Bitbake still alive (30000s)
Bitbake still alive (35000s)
Bitbake still alive (40000s)
Bitbake still alive (45000s)
Bitbake still alive (50000s)

$ps aux | grep b2
test 2773444 0.0 0.0 13532 2748 ? D Jul01 0:00 ./b2 install --prefix=/build/tmp-glibc/work/x86_64-linux/boost-build-native/4.4.1-r0/recipe-sysroot-native/usr staging-prefix=/build/tmp-glibc/work/x86_64-linux/boost-build-native/4.4.1-r0/image/build/tmp-glibc/work/x86_64-linux/boost-build-native/4.4.1-r0/recipe-sysroot-native/usr

$ sudo cat /proc/2773444/stack
[<0>] autofs_wait+0x257/0x720
[<0>] autofs_mount_wait+0x49/0xf0
[<0>] autofs_d_manage+0x76/0x1a0
[<0>] __traverse_mounts+0xd9/0x220
[<0>] step_into+0x3ad/0x6d0
[<0>] walk_component+0x62/0x190
[<0>] link_path_walk.part.0.constprop.0+0x20d/0x350
[<0>] path_lookupat+0x3a/0x1b0
[<0>] filename_lookup+0x9b/0x180
[<0>] vfs_statx+0x64/0x100
[<0>] __do_sys_newfstatat+0x1e/0x40
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

$ dmesg
[1559743.424610] autofs4:pid:2773444:autofs_mount_wait: waiting for mount name=yan
[1559743.424621] autofs4:pid:2773444:autofs_wait: existing wait id = 0x00000056, name = yan, nfy=1
[1560001.400440] autofs4:pid:2774530:autofs_mount_wait: waiting for mount name=yan
[1560001.400452] autofs4:pid:2774530:autofs_wait: existing wait id = 0x00000056, name = yan, nfy=1
[1560022.493282] autofs4:pid:2774537:autofs_mount_wait: waiting for mount name=yan
[1560022.493292] autofs4:pid:2774537:autofs_wait: existing wait id = 0x00000056, name = yan, nfy=1
[1560122.076589] autofs4:pid:3979116:autofs_mount_wait: mount wait done status=-4
[1560162.222374] autofs4:pid:2774530:autofs_mount_wait: mount wait done status=-4
[1560167.116188] autofs4:pid:2774537:autofs_mount_wait: mount wait done status=-4
[1560188.140532] autofs4:pid:2774671:autofs_mount_wait: waiting for mount name=yan
[1560188.140540] autofs4:pid:2774671:autofs_wait: existing wait id = 0x00000056, name = yan, nfy=1
[1560189.651905] autofs4:pid:2774671:autofs_mount_wait: mount wait done status=-4

Analyzation:
b2 will walk the HOME dir, when access the symlink point to /nis/yan,
autofs hang at autofs_wait.  the process stay at D stat forever. This
maybe caused by abnormal status of autofs.service. The problem cannot
reproduce after restart autofs.service. There should be an autofs bug.
and there is an autofs hang problem bug on fedora34 on it's bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1953390

Workaround:
Since b2 don't actually write something to HOME dir, change HOME dir to
/var/run, a dir not mounted by autofs.

Signed-off-by: Changqing Li <changqing.li@windriver.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
meta/recipes-support/boost/boost-build-native_4.4.1.bb