Using a watchdog stored procedure to report the success of an OpenInsight process to an external monitoring service is very useful. Instead of clogging your inbox with notifications that only say a scheduled job finished, you can rely on the external monitoring service to alert you only when the watchdog fails to report the process status. This technique is demonstrated by checking the OpenInsight Dedicated Indexer Health using a Nagios Passive Check.
The watchdog technique can be adapted for any process, but to be effective you must call RTP27 to reload the watchdog stored procedure each time it runs.
I came across an implementation of the watchdog stored procedure and noticed that alerts weren't being generated when the process went down. Examining the OpenInsight program showed no errors until I interacted with OpenInsight, at which point the REV_LOADREC error appeared.
The watchdog stored procedure was put in place specifically to alert me when the process failed, before users started reporting problems. So why was it continuing to run even though the process had failed?
The answer lies in the fact that programs are cached by OpenInsight. The process and watchdog stored procedures were both running as scheduled even though OpenInsight had lost its connection to the LinearHash service. When the process being monitored had no work to do, it simply fired off the watchdog procedure to report back that it was still alive. When a network disconnect did occur, the program continued to run, finding no new work and triggering no programs that weren't already loaded in the OpenInsight program loader cache. No errors were generated until I connected to the OpenInsight instance and triggered something that wasn't already cached, at which point the REV_LOADREC error was raised.
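The masking effect described above can be reproduced outside OpenInsight. The following is a minimal Python analogy, not OpenInsight code: `ProgramLoader`, `StoreUnavailable`, and the program names are hypothetical stand-ins for the program loader cache, the lost LinearHash connection, and the stored procedures.

```python
class StoreUnavailable(Exception):
    """Raised when the backing store cannot be reached (stand-in for REV_LOADREC)."""


class ProgramLoader:
    """Toy loader: fetches programs from a store and caches them in memory."""

    def __init__(self, store):
        self.store = store    # name -> callable, stands in for the database
        self.online = True    # set to False to simulate a network disconnect
        self.cache = {}

    def load(self, name):
        if name in self.cache:
            return self.cache[name]       # cache hit: the store is never touched
        if not self.online:
            raise StoreUnavailable(name)  # only uncached loads notice the outage
        self.cache[name] = self.store[name]
        return self.cache[name]


loader = ProgramLoader({"WATCHDOG": lambda: "alive", "REPORT": lambda: "report"})
loader.load("WATCHDOG")()   # first call populates the cache
loader.online = False       # simulated disconnect

# The cached watchdog keeps answering as if nothing happened.
still_reporting = loader.load("WATCHDOG")()

# Only a program that was never cached exposes the failure.
try:
    loader.load("REPORT")
    uncached_failed = False
except StoreUnavailable:
    uncached_failed = True
```

In this sketch the watchdog goes on reporting after the disconnect, and the error appears only when something uncached is requested, which mirrors the behavior observed above.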
If you use a watchdog stored procedure to monitor processes, you should call the built-in OpenInsight function RTP27('YOUR_WATCHDOG_FUNC') before calling the watchdog function to ensure the watchdog is reloaded into memory each time. If a network disconnect happens, RTP27 will fail to load the watchdog and the watchdog function will fail to report in to the monitoring service, which should generate an alert before users notice anything is wrong.
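The reload-before-report pattern can be sketched in the same way. This is again a Python analogy under stated assumptions rather than OpenInsight code: the `force_reload` flag plays the role of the RTP27 call, and `heartbeat`, `ProgramLoader`, and `StoreUnavailable` are hypothetical names introduced for illustration.

```python
class StoreUnavailable(Exception):
    """Raised when the backing store cannot be reached."""


class ProgramLoader:
    """Toy loader with an in-memory cache and an optional forced reload."""

    def __init__(self, store):
        self.store = store
        self.online = True    # set to False to simulate a network disconnect
        self.cache = {}

    def load(self, name, force_reload=False):
        if force_reload:
            self.cache.pop(name, None)    # discard the cached copy first
        if name in self.cache:
            return self.cache[name]
        if not self.online:
            raise StoreUnavailable(name)
        self.cache[name] = self.store[name]
        return self.cache[name]


def heartbeat(loader, sent):
    """Reload the watchdog from the store, then report. Returns True if reported."""
    try:
        watchdog = loader.load("WATCHDOG", force_reload=True)
    except StoreUnavailable:
        return False              # nothing is sent, so the monitor sees silence
    sent.append(watchdog())       # stand-in for submitting the passive check result
    return True


sent = []
loader = ProgramLoader({"WATCHDOG": lambda: "alive"})
first = heartbeat(loader, sent)    # store reachable: report goes out
loader.online = False              # simulated disconnect
second = heartbeat(loader, sent)   # reload fails: no report goes out
```

Because the second heartbeat sends nothing, the alert depends on the monitoring service treating a missing report as a failure. With Nagios passive checks, that would typically mean enabling freshness checking on the service.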