Ticket #1082 (closed defect: fixed)

Opened 8 years ago

Last modified 8 years ago

--via-binary hangs

Reported by: filko_
Owned by:
Priority: Sometime
Milestone:
Component: clients/cave
Version: 0.58.0
Keywords:
Cc:
Blocked By:
Blocking:
Distribution: Gentoo

Description

After finishing some (5-10) packages, it hangs and just sits at this point:

=== Done builtin_tidyup
rm -fr /var/tmp/paludis/build/fonts-font-misc-misc-1.1.2
=== Completed ebuild phases loadenv tidyup

 * Regenerating environment...
Regenerating //etc/ld.so.cache...
 * Done regenerating environment
 * Updating CONFIG_PROTECT and CONFIG_PROTECT_MASK caches.

Done create binary in ::karambol-bin for fonts/font-misc-misc-1.1.2:0::x11 replacing 1.1.0:0::karambol-bin


8 of 40: Starting fetch for =fonts/font-misc-misc-1.1.2:0::karambol-bin...

I saw this before, but it was very rare. Now it happens more or less randomly, and I have never got it to go past 10 packages.

Attachments

gdb.log (3.5 KB) - added by filko_ 8 years ago.
cave execute-resolution --set --continue-on-failure if-satisfied --world-specs world world
gdb2.log (4.0 KB) - added by filko_ 8 years ago.
cave perform fetch --hooks --if-supported --managed-output --output-exclusivity with-others --no-terminal-titles =fonts/font-misc-misc-1.1.2:0::karambol-bin --x-of-y 8 of 40 --regulars-only --ignore-manual-fetch-errors
backtrace.log (89.9 KB) - added by replica 8 years ago.
backtrace2.log (31.0 KB) - added by replica 8 years ago.
cave perform fetch --hooks --if-supported --managed-output --output-exclusivity with-others --no-terminal-titles =sys-fs/lvm2-2.02.83:0::local-pbin --x-of-y 14 of 28 --regulars-only --ignore-manual-fetch-errors

Change History

comment:1 Changed 8 years ago by ciaranm

Hrm, the perform process is waiting for an IPC response. Wonder why it's not getting one.

comment:2 follow-up: ↓ 3 Changed 8 years ago by pioto

I've seen this sorta thing happen if, at the end of the phase, some child processes are still around. For example, if a test doesn't kill off any test servers it spawned. Not sure if this is an instance of that, or of something else.

comment:3 in reply to: ↑ 2 Changed 8 years ago by filko_

Replying to pioto:

some child processes are still around

Nothing like that here.

comment:4 Changed 8 years ago by ciaranm

Something replica noticed: it's showing =fonts/font-misc-misc-1.1.2:0::karambol-bin rather than the ID name, which means the ID doesn't exist. I suspect this is relevant.

comment:5 Changed 8 years ago by replica

I rebuilt paludis with -ggdb3 and was finally able to grab a full backtrace of all threads during a hung binary fetch.

The hang occurred with the message:

8 of 22: Starting fetch for =sys-libs/timezone-data-2011a:0::local-pbin...

The operation being performed was a cave resume after cave resolve world with the following set in the environment:

CAVE_RESOLVE_OPTIONS="--complete --continue-on-failure if-independent --reinstall-scm weekly --resume-file ${HOME}/.cave/cave-resume --via-binary */*"

comment:6 Changed 8 years ago by ciaranm

Think I've found a way of reproducing it consistently:

rm -fr /var/db/paludis/repositories/binaries/packages/app-*
sed -i -e 'd' /var/db/paludis/repositories/binaries/metadata/categories.conf
cave fix-cache
cave resolve hilite vim-runtime -b '*/*' -1x

The choice of packages seems to matter too.

comment:7 Changed 8 years ago by ciaranm

cave@1297546998: [DEBUG create_output_manager_info.package_id.deserialise.creating] (same context) Deserialising CreateOutputManagerForPackageIDActionInfo...

That's where it's locking up.

Uh oh.

comment:8 Changed 8 years ago by ciaranm

Oooooh. It's giving one of these:

Internal error at paludis::Thread::~Thread() at thread.cc:70: Exception 'When deserialising:: When finding the single version from packages matching =app-editors/vim-runtime-7.3.003:7.3::binaries with filter all matches:: Did not get unique result for 'the single version from packages matching =app-editors/vim-runtime-7.3.003:7.3::binaries with filter all matches' (got { }) (paludis::DidNotGetExactlyOneError)' uncaught in child thread

Which isn't getting shown.
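
To illustrate the failure mode described in comments 1 and 8, here is a minimal Python sketch. It is not Paludis code: the names `deserialise_request`, `serve_one`, `ask`, and the `reply_on_error` flag are all hypothetical, and in-process queues stand in for the real IPC channel. It only shows the general pattern: if the thread serving a request fails while deserialising and never sends a reply, the requester is left waiting. The sketch uses a timeout so that it terminates; a requester that waits without one would simply hang.

```python
import queue
import threading


def deserialise_request(raw, known_ids):
    """Hypothetical stand-in for turning a wire message back into an ID."""
    if raw not in known_ids:
        raise LookupError(f"did not get unique result for {raw!r} (got {{ }})")
    return raw


def serve_one(requests, replies, known_ids, reply_on_error):
    """Handle a single request on a worker thread."""
    raw = requests.get()
    try:
        replies.put(("ok", deserialise_request(raw, known_ids)))
    except LookupError as exc:
        if reply_on_error:
            # Report the failure back so the requester can stop waiting.
            replies.put(("error", str(exc)))
        # Otherwise nothing is sent: the requester never hears back.


def ask(raw, known_ids, reply_on_error, timeout=0.5):
    """Send one request and wait for the reply, giving up after `timeout`."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=serve_one, args=(requests, replies, known_ids, reply_on_error)
    )
    worker.start()
    requests.put(raw)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return ("timed out", None)
    finally:
        worker.join()
```

Calling `ask` with an ID the server does not know and `reply_on_error=False` ends in a timeout, which corresponds to the silent hang; with `reply_on_error=True` the requester receives the error text instead.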

comment:9 Changed 8 years ago by ciaranm

Right. Here's the problem: the 'perform' process asks the main 'execute-resolution' process to produce output managers for it. One of the bits of information passed to the output manager creation is a PackageID. This PackageID may sometimes refer to something that the execute-resolution process thinks doesn't exist, because it looked earlier and cached the result.

So... Why does it need the PackageID to create the output manager?

The only thing that needs it is output.conf's 'matches' rule, which isn't documented, and which isn't used for anything. So one option's to just kill that off.

Another option is to go back to invalidating things in the main process. But we were really trying to avoid doing that...
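
The stale-cache situation in comment 9 can be sketched as follows. This is a toy model under stated assumptions, not the Paludis implementation: `Repository`, `CachingView`, `fetch_unique`, and `invalidate` are hypothetical names, and a plain set of strings stands in for the repository contents. One view plays the main process, which looked early and cached an empty result; the other plays the child, which looks after the binary has been created.

```python
class Repository:
    """Hypothetical repository: just a mutable set of ID strings."""

    def __init__(self):
        self.ids = set()


class CachingView:
    """One process's view of a repository, cached after the first look."""

    def __init__(self, repo):
        self._repo = repo
        self._cache = None

    def ids(self):
        # Snapshot the repository contents on first use and reuse it after.
        if self._cache is None:
            self._cache = frozenset(self._repo.ids)
        return self._cache

    def invalidate(self):
        """Forget the snapshot so the next lookup re-reads the repository."""
        self._cache = None

    def fetch_unique(self, spec):
        """Return the single ID equal to `spec`, or raise if there is not one."""
        matches = [i for i in self.ids() if i == spec]
        if len(matches) != 1:
            raise LookupError(f"did not get unique result for {spec!r}")
        return matches[0]


repo = Repository()
main_view = CachingView(repo)
main_view.ids()  # The main process looks early and caches "nothing there".

# Later, a binary is created and a child process takes its own fresh look.
repo.ids.add("app-editors/vim-runtime-7.3.003")
child_view = CachingView(repo)
```

Here `child_view.fetch_unique("app-editors/vim-runtime-7.3.003")` succeeds, while the same call on `main_view` raises until `main_view.invalidate()` is called. The two options in the comment map onto this model as either not sending the ID to the main process at all, or invalidating the main process's snapshot.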

comment:10 Changed 8 years ago by ciaranm

  • Status changed from new to closed
  • Resolution set to fixed

I think I fixed it...
