Ticket #12 (closed defect: fixed)

Opened 8 years ago

Last modified 8 years ago

http://lists.humbug.org.au/mailman/admindb/mailman returns 500 Internal Server Error

Reported by: russell Owned by: russell
Priority: major Milestone: Sysadmin
Component: mailing-list Version: NA
Keywords: Cc:


The subject says it all, I think. Go to that URL, enter the password, click the "Let me in..." button, and the page returned is:

<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
 sysadmin@humbug.org.au and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
<address>Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny8 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_wsgi/2.5 mod_perl/2.0.4 Perl/v5.10.0 Server at lists.humbug.org.au Port 80</address>

If you want to have a look at this, email me or librarian@ humbug.xxxxxx for the password.

Change History

comment:1 Changed 8 years ago by russell

  • Component changed from dns to mailing-list

comment:2 Changed 8 years ago by raymond

I don't know enough about Mailman to be sure, but I think there is configuration lying around from the old lists.

/var/log/mailman/error: multiple mentions of admin.py access for non-existent lists, the earliest from yesterday.
/etc/mailman/config: contains .config, .regular-members, and .digest-members files for each of the "deleted" lists.

Date on all configuration files is "2009-05-03 07:00".

comment:3 Changed 8 years ago by stephen

  • Owner changed from somebody to stephen
  • Status changed from new to assigned

comment:4 Changed 8 years ago by stephen

I've been looking into this, but I'll need some help for it not to take me an eternity to figure out. I'll ask James to assist me this weekend, probably during the meeting.

comment:5 Changed 8 years ago by russell

  • Owner changed from stephen to russell

comment:6 Changed 8 years ago by russell

  • Status changed from assigned to closed
  • Resolution set to fixed


Chronology was thus:

  1. I recall someone getting sick of mailman spam notifications for general. It seems the solution was to send them to /dev/null.
  2. Mailman still carefully filed away all spam messages in its pending queue, waiting for someone who cared to look at them.
  3. 3 years later, someone who cared came along (me). The pending queue was held in a single Python pickle which by this stage contained some 16,000 messages. When I attempted to look at the queue, mailman died.
  4. Turns out this initial failure was caused by Python running out of memory when trying to process this pickle. (This much I had guessed.)
  5. When mailman died, it did so so horribly that it didn't clean up its locks.
  6. The next time you went via the web interface, mailman went into what was effectively an infinite loop, waiting to acquire the lock. Unfortunately, that loop contains a memory leak. So it again ran out of memory and, to all appearances, looked like the original problem.
  7. Turns out that our VM at that point (around 10 PM last night) ran out of memory. The Linux OOM killer fired up, but unfortunately chose the wrong process to kill. The process it did choose refused to die, so it went infinite trying to kill it.
  8. As a consequence of that, we lost control of the VM.
  9. Stephen Thorn wrested control of the VM back for us this morning.
  10. I have now run rm /var/lib/mailman/lists/mailman/request.pck. This file was the original trigger for the problem.
  11. I have restored normal processing of spam messages. All notifications are now being sent to list-bounces@…, which for now is aliased to president@….
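
For anyone who wants to catch this earlier next time, here is a rough sketch of a check for the two warning signs above: a request.pck that has grown very large, and lock files nobody has touched in a long time. It is not part of Mailman. The function names, the 50 MB and 6 hour thresholds, and the lock directory /var/lock/mailman are all assumptions for illustration; only the request.pck path comes from this ticket. An old lock is only a hint, so look before deleting anything.

```python
import os
import time


def oversized_files(paths, max_bytes):
    """Return (path, size) pairs for existing files larger than max_bytes."""
    found = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # missing or unreadable file: nothing to report
        if size > max_bytes:
            found.append((path, size))
    return found


def stale_files(directory, max_age_seconds, now=None):
    """Return sorted paths of regular files not modified for max_age_seconds."""
    now = time.time() if now is None else now
    try:
        names = os.listdir(directory)
    except OSError:
        return []  # directory missing or unreadable
    stale = []
    for name in names:
        path = os.path.join(directory, name)
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            continue  # file vanished between listdir and stat
        if os.path.isfile(path) and now - mtime > max_age_seconds:
            stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Path taken from this ticket.
    queue_files = ["/var/lib/mailman/lists/mailman/request.pck"]
    # Assumed lock directory; adjust for the local install.
    lock_dir = "/var/lock/mailman"

    for path, size in oversized_files(queue_files, 50 * 1024 * 1024):
        print("large pending queue: %s (%d bytes)" % (path, size))
    for path in stale_files(lock_dir, 6 * 3600):
        print("possibly stale lock: %s" % path)
```

Run from cron, this only prints warnings; it never loads the pickle, so it cannot hit the same out-of-memory failure.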

Thanks to Greg for loaning me his VM and time to help track down this problem. Thanks to Stephen for being patient with me when I rang him this morning.
