ProbLog PTTP 16Apr2010
This incident was reported by: Dave on 16 April 2010 14:30(PDT)
Synopsis
At 13:31:25 PDT on Apr. 16, 2010, Big Brother issued a CRITICAL (RED) alert for the pttp process on blueback: "PTTP Queue is greater than 20 (it is at 47)". I diagnosed that the PTTP_Dispatcher process was "hung", killed the process, moved the PTTP_Dispatcher.pid file aside, and submitted an empty TST interrogation file to get things restarted. I monitored the event queue and confirmed that the events were dispatched quickly once my new MiniMon file was received. The next hobbit poll, at 14:21:57, found the PTTP queue empty. Curiously, PTTP_reclaimBrokenTransactions.pl was launched at approximately 14:10, while PTTP was hibernating, and "reclaimed" two transactions, but this reclamation activity did not resolve the problem with the PTTP_Dispatcher process.
Details
The first step was to log into blueback and see if there was a ~pittag/etc/PTTP_Dispatcher.pid file. There was, and it had a 13:23 timestamp. The next step was to view any queued events in the '/home/ptagdev/ptagis3/source/pttp/source/prod_test/events' directory. I did this from Total Commander. It can also be done from a shell as ptagdev, using the see_events alias. Normally, there should be no persistent files in the "events" directory. By the time I checked this afternoon there were 65, and the list eventually grew to 85.
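The queue check described above can be sketched as a small shell helper. This is only an illustration, not the see_events alias or the Big Brother test itself: the function names count_queued_events and queue_is_backed_up are hypothetical, and the default threshold of 20 is taken from the alert text.

```shell
# Hypothetical helper: count the regular files directly inside a directory.
# Uses a plain glob rather than find options that older systems may lack.
count_queued_events() {
    dir=$1
    count=0
    for f in "$dir"/*; do
        # An unmatched glob stays literal and fails the -f test, so an
        # empty directory yields 0.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Hypothetical helper: succeed when the queue holds more files than the
# threshold (assumed default: 20, as quoted in the alert).
queue_is_backed_up() {
    [ "$(count_queued_events "$1")" -gt "${2:-20}" ]
}
```

For example, `count_queued_events /home/ptagdev/ptagis3/source/pttp/source/prod_test/events` would print the number of persistent event files, which should normally be 0.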
Then I checked the PTTP_Dispatcher.pl process on blueback, as pittag.
blueback:UI:pittag: > ps -ef | grep Disp
pittag 28638 1 9 12:23:19 ? 32:44 /usr/local/bin/perl /home/pit
It had accumulated almost 33 minutes of CPU time (the TIME column of ps), so I killed it and confirmed that no other Disp* processes were running.
blueback:UI:pittag: > kill -9 28638
blueback:UI:pittag: > ps -ef | grep [D]isp
blueback:UI:pittag: >
After killing PTTP_Dispatcher.pl, I moved the existing ~pittag/etc/PTTP_Dispatcher.pid file aside.
blueback:UI:pittag: > cd ~pittag/etc
blueback:UI:pittag: > ls -l PTTP_Dispatcher.pid
-rw-rw-rw- 1 pittag pitadmin 6 Apr 16 12:23 PTTP_Dispatcher.pid
blueback:UI:pittag: > mv PTTP_Dispatcher.pid PTTP_Dispatcher.pid_16apr10_1416
blueback:UI:pittag: >
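The kill, confirm, and move sequence above could be wrapped in a guard that refuses to move the PID file while the recorded process is still alive. This is a minimal sketch under stated assumptions: the function names pid_file_process_alive and set_aside_pid_file are hypothetical, the PID file is assumed to contain only the process ID, and the check is assumed to run as the user that owns the process (otherwise kill -0 can fail on permissions alone).

```shell
# Hypothetical helper: succeed if the PID recorded in the file names a
# process that can still be signalled. kill -0 sends no signal; it only
# tests whether the process exists and is signalable by this user.
pid_file_process_alive() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical helper: rename the PID file with a timestamp suffix in the
# style used above (e.g. _16apr10_1416), but only once the recorded
# process is gone.
set_aside_pid_file() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1
    if pid_file_process_alive "$pid_file"; then
        echo "process still running; not moving $pid_file" >&2
        return 1
    fi
    stamp=$(date +%d%b%y_%H%M | tr 'A-Z' 'a-z')
    mv "$pid_file" "${pid_file}_${stamp}"
}
```

Called as `set_aside_pid_file ~pittag/etc/PTTP_Dispatcher.pid` after the kill, it would keep the old file for later inspection rather than deleting it.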
After moving the blocking PID file aside, I emailed in an empty MiniMon TST09000.TST file and watched that file's transaction appear in the queue, after which the queue quickly emptied. That confirmed that the immediate issue was resolved.
Curiosity
At 14:10, while I was focused on resolving the PTTP_Dispatcher problem, I received an email from PTTP_EventAlerter@blueback.psmfc.org:
One or more interrupted PTTP transactions have been detected and repaired by ...
/home/pittag/bin/PTTP/PTTP_reclaimBrokenTransactions.pl
.. launched periodically by the 'pittag' cron. Excerpts from its log file ...
/home/ptagdev/ptagis3/source/pttp/source/prod_test/log/pttp_reclaimTx_log.txt
.. are presented below:
BeganTxReclamation:Fri Apr 16 13:10:01 PST 2010
TX_ID='TX97152CD4-5DBF-4040-B2DF-22432BB525E9'
FILES='FDC10106.B02'
Generating EVENTS ...
TX_ID='TX5A222E77-A314-4842-AB9E-43328B7BB276'
FILES='KCB10106.A'
Generating EVENTS ...
Starting PTTP_Dispatcher ...
EndedTxReclamation:Fri Apr 16 13:10:13 PST 2010
I don't know what effect this had on the disposition of the FDC10106.B02 and KCB10106.A files, but it didn't appear to have any impact on the PTTP_Dispatcher.pl process.
