Plan for installing Ingres 2006 Release 2 SP2 on sockeye

From PTAGISWiki

Jump to: navigation, search

Contents

Background

In response to a tagging file that crashes Ingres in production whenever we try to load it, we are bringing or Ingres installation up to the latest patch level in the hopes that the crash will no longer occur.

Overall plan

  • install Ingres 2006 Release 2 SP2
    • category selection is Product: Ingres Database > Release: 2006 > Platform: Sun Solaris SPARC
    • downloaded filename is ingres2006-9.1.2-100-com-sun-solaris-sparc-32-64bit.tar
    • md5sum: 9b85f3c4d4245fdd7dc341a76b8683f6 ingres2006-9.1.2-100-com-sun-solaris-sparc-32-64bit.tar
  • install patch 13548
  • perform installation on sockeye, pitblade, blueback, bay, jack
  • maintain documentation on sockeye:/home/ptagdev/ptagis3/dba_chores/InstallIngres/Install_Ingres2006_SP2 as copied from upgrade_to_Ingres2006

Email from Ingres tech support

Here is the relevant message from Ingres tech support on issue 139541:

You don't need to install the SP1 first, you can go straight to the SP2 and then install the patch 13548 just after.
Yes, you have a point, the service pack that you need is Ingres 2006 Release 2 SP2.

It is always strongly recommended that you install it first on a test environment before going through the production one.

The main difference of patch install and service pack install are:
1) The service pack works as an upgrade installation, it means a new version installed over a previous existent 
installation preserving all data and server configuration (CBF).
2) There is no formal uninstall procedure on service pack installation. If needed, an uninstall must be done by an 
OS copy of the II_SYSTEM directory without DATA, WORK, JNL, DMP, CKP and LOG (logfile) locations.

Installing a patch:
1) Have a ckpdb of all your database. If possible an offline one would be better.
2) ingstop
3) If you want you can have an OS copy at this point
4) Install the patch
5) ingstart
6) Do tests
7) If any problem, you can uninstall the patch by following the guide lines into patch.doc

Installing the Service Pack:
1) Have a ckpdb of all your database. If possible an offline one would be better.
2) ingstop
3) Have an OS copy of II_SYSTEM directory without DATA, JNL, DMP, WORK, CKP, LOG locations. If you want you can also 
have an OS copy of these locations separated.
4) Install the Service pack over your existent installation.
5) ingstart
6) Do tests
7) If any problem, you can uninstall the Service pack by coping back the files copied before on (3)

Documentation included in SP2

Documentation in patch 13585

Installation script/history

Development detailed plan

database checkpoint of sockeye

  • root crontab runs /usr/ingres/run_snapckp.sh snapckp
  • found that it was last run on Aug 29, 2009
  • activated it for running on sockeye Oct 20, 2009

filesystem backup of II_SYSTEM on sockeye

filesystem backup of II_SYSTEM on pitblade

install SP2 on sockeye

  • SP2 was installed and a test interrogation file was loaded
    • I copied a file from loaded/interrogation/TST/2007 to interrogation/hold
    • set up p3_daily_3.sh to run in pittag's crontab
/usr/pit/pittag/bin/p3_daily_3.sh: STARTED @ Mon Oct 26 09:00:01 PST 2009
idl_dea_noplot.sh: STARTED @ Mon Oct 26 09:00:02 PST 2009
/usr/pit/pittag/bin/idl_dea_noplot.sh: 26/10/09 09:00 495 tags or test tags in 2460 records to load..
idl_dea_noplot.sh: Running in SERIAL mode..
fdc_maintainCountSRRT.pl: STARTED @ Mon Oct 26 09:00:35 PST 2009
fdc_maintainCountSRRT.pl: FINISHED @ Mon Oct 26 09:00:36 PST 2009
/usr/pit/pittag/bin/idl_dea_noplot.sh: FINISHED @ Mon Oct 26 09:00:36 PST 2009
/usr/pit/pittag/bin/p3_daily_3.sh: FINISHED @ Mon Oct 26 09:00:36 PST 2009
  • Then ran an interrogation summary report on the load
cd /home/ptagdev/ptagis3/source/reports
report -upittag ptagis3 int_load_summary -fryantest.out
cat ryantest.out
             PACIFIC STATES MARINE FISHERIES COMMISSION
                          PIT Tag Database
           INTERROGATION DATA LOADER PERFORMANCE SUMMARY
          FOR INTERROGATION DATA FILES UPLOADED  10/26/09

               -0- The loader ran in SERIAL mode -0-

+----+-+--------------------------+---------------------+----------+
| S  |A|      Number of Rows      |         Time        | Average  |
| i  |B|        Inserted          |                     |   Rate   |
| t  |/+-------+--------+---------+----------+----------+          |
| e  |S|  Obs  |  Test  |  Fail   |   Start  |    End   | Rows/Min |
+----+-+-------+--------+---------+----------+----------+----------+
 TST  S      57       9        0    10:00:17   10:00:29      330
+------+-------+--------+---------+---------------------+
| Sum  |     57|      9 |      0  |       Elapsed       |
+------+-------+--------+---------+---------------------+==========+
| Overall:         66 rows        |        0.2 min      |    330   |
+---------------------------------+---------------------+==========+

Tested connectivity from pitblade via netutil and found that I had to reset the installation password on pitblade even though I set the password to the same value.

install patch on sockeye

start ingres on sockeye

install SP2 on pitblade

install patch on pitblade

start ingres on pitblade

Production detailed plan

database checkpoint of blueback

filesystem backup of II_SYSTEM on blueback

filesystem backup of II_SYSTEM on bay and jack

install SP2 on blueback

install patch on blueback

start ingres on blueback

install SP2 on bay

install patch on bay

start ingres on bay

install SP2 on jack

install patch on jack

start ingres on jack

Troubleshooting

The service pack and patch are installed on blueback and now jack is unable to connect. It returns this error when it attempts to run the query-runner.pl:

jack:D2:root: > ./query-runner.pl -u rday -r blah
DBI connect('blueback::ptagis3;-Gptagis_rpt','',...) failed: E_LQ0001 Failed to connect to DBMS session.
E_LC0001 GCA protocol service (GCA_REQUEST) failure.
    Internal service status E_GC0142 -- The installation password for the
    remote vnode is invalid.. at ./query-runner.pl line 72
Deep recursion on subroutine "main::catch" at ./query-runner.pl line 232, <DEF> line 41.
Deep recursion on subroutine "main::log_end" at ./query-runner.pl line 282, <DEF> line 41.
^C

However, testing the connection through netutil yields a success from jack to blueback.

Looking over the settings in netutil on bay and jack, they are identical. bay is able to run queries without difficulty using the same perl script.

I tried ingstop and ingstart on jack, but that didn't help anything.

I tried using netutil to set the installation password again, but that didn't help.

I found the user ingres on jack can connect without difficulty, but root cannot. Currently query-builder.pl is run as root. User rday can connect from jack to blueback just fine.

I tried running "mkvalidpw" as root on jack. It successfully reinstalled the binary. I then did an ingstop and ingstart on jack. The situation was unchanged after this: root cannot connect, but other users can.

checking environment

If I run on bay with the absolute minimum environment, it seems to work.

bay:D2:rday: > su - 
Password: 
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# cd /usr/local/pitweb/ptagis-1.0/bin
# ./query-runner.pl -u rday -r blah
# setenv
setenv: not found
# env
HOME=/
HZ=
LANG=en_US.UTF-8
LOGNAME=root
MAIL=/var/mail/root
PATH=/usr/sbin:/usr/bin
SHELL=/sbin/sh
TERM=xterm
TZ=US/Pacific

On jack, the experience is different:

jack:D2:rday: > su - 
Password: 
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
You have mail.
# cd /usr/local/pitweb/ptagis-1.0/bin
# ./query-runner.pl -u rday -r blah
DBI connect('blueback::ptagis3;-Gptagis_rpt','',...) failed: E_LQ0001 Failed to connect to DBMS session.
E_LC0001 GCA protocol service (GCA_REQUEST) failure.
    Internal service status E_GC0142 -- The installation password for the
    remote vnode is invalid.. at ./query-runner.pl line 72
Deep recursion on subroutine "main::catch" at ./query-runner.pl line 232, <DEF> line 28.
Deep recursion on subroutine "main::log_end" at ./query-runner.pl line 282, <DEF> line 28.
^C# env
HOME=/
HZ=
LANG=en_US.UTF-8
LOGNAME=root
MAIL=/var/mail/root
PATH=/usr/sbin:/usr/bin
SHELL=/sbin/sh
TERM=xterm
TZ=US/Pacific

I made a copy of query-runner.pl that displays the full perl environment before trying to connect to the database. Here is the run on bay:

bay:D2:root: > ./query-runner-test.pl -u rday -r blah
ENVIRONMENT
key: BAY_DBA_NAME   value: pittag
key: BAY_NODE_DB   value: blueback::ptagis3
key: BLUEBACK_DBA_NAME   value: pittag
key: BLUEBACK_NODE_DB   value: ptagis3
key: CEW_OUTPUT_ROOT   value: /home/pittag/out/cewOUT
key: CEW_SRC   value: /home/pittag/bin/CEW
key: CLASSPATH   value: /usr/ingres/ing2006r2/ingD2/ingres/lib/iijbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/iijbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/iijbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:
key: EDITOR   value: vi
key: HOME   value: /home/rday
key: II_SYSTEM   value: /usr/ingres/ing2006r2/ingD2
key: ING_SET   value: set lockmode session where readlock=nolock
key: JACK_DBA_NAME   value: pittag
key: JACK_NODE_DB   value: blueback::ptagis3
key: LANG   value: en_US.UTF-8
key: LD_LIBRARY_PATH   value: /lib:/usr/ccs/lib:/usr/ucblib:/usr/ingres/ing2006r2/ingD2/ingres/lib:/opt/SUNWspro:/usr/openwin/lib:/usr/dt/lib:/usr/ingres/SUNWspro
key: LOGNAME   value: root
key: MAIL   value: /var/mail//rday
key: OLDPWD   value: /usr/local/pitweb/ptagis-1.0/bin
key: PATH   value: /usr/ingres/ing2006r2/ingD2/ingres/bin:/usr/ingres/ing2006r2/ingD2/ingres/utility:/usr/ingres/ing2006r2/ingD2/ingres/files:/usr/ingres/ing2006r2/ingD2/ingres/lib:/usr/ingres/ing2006r2/ingD2/ingres/SUNWspro/bin:/usr/bin:/usr/local/bin:/usr/local/mysql/bin:/usr/ccs/bin:/usr/sbin:/usr/local/bin:/usr/ingres/ing2006r2/ingD2/bin:/usr/ingres/ing2006r2/ingD2/utility
key: PITBLADE_DBA_NAME   value: pittag
key: PITBLADE_NODE_DB   value: sockeye::ptagis3
key: PROMPT_COMMAND   value: echo -ne "\033]0;"; uname -n; echo -ne "\007"
key: PWD   value: /home/pittag/bin/CEW
key: SEBASTES_DBA_NAME   value: pittag
key: SEBASTES_NODE_DB   value: blueback::ptagis3
key: SHELL   value: /bin/bash
key: SHLVL   value: 2
key: SOCKEYE_DBA_NAME   value: pittag
key: SOCKEYE_NODE_DB   value: ptagis3
key: SSH_CLIENT   value: 10.1.2.59 42618 22
key: SSH_CONNECTION   value: 10.1.2.59 42618 172.16.100.41 22
key: SSH_TTY   value: /dev/pts/5
key: SUDO_COMMAND   value: /bin/bash
key: SUDO_GID   value: 30
key: SUDO_UID   value: 118
key: SUDO_USER   value: rday
key: TERM   value: xterm
key: TERM_INGRES   value: vt100fx
key: TZ   value: US/Pacific
key: USER   value: root
key: _   value: /bin/env

And here is the run on jack:

jack:D2:root: > ./query-runner-test.pl -u rday -r blah
ENVIRONMENT
key: BAY_DBA_NAME   value: pittag
key: BAY_NODE_DB   value: blueback::ptagis3
key: BLUEBACK_DBA_NAME   value: pittag
key: BLUEBACK_NODE_DB   value: ptagis3
key: CEW_OUTPUT_ROOT   value: /home/pittag/out/cewOUT
key: CEW_SRC   value: /home/pittag/bin/CEW
key: CLASSPATH   value: /usr/ingres/ing2006r2/ingD2/ingres/lib/iijdbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/iijdbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/iijdbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/iijdbc.jar:/usr/ingres/ing2006r2/ingD2/ingres/lib/edbc.jar:
key: EDITOR   value: vi
key: HOME   value: /home/rday
key: II_SYSTEM   value: /usr/ingres/ing2006r2/ingD2
key: ING_SET   value: set lockmode session where readlock=nolock
key: JACK_DBA_NAME   value: pittag
key: JACK_NODE_DB   value: blueback::ptagis3
key: LANG   value: en_US.UTF-8
key: LD_LIBRARY_PATH   value: /lib:/usr/ccs/lib:/usr/ucblib:/usr/ingres/ing2006r2/ingD2/ingres/lib:/opt/SUNWspro:/usr/openwin/lib:/usr/dt/lib:/usr/ingres/SUNWspro
key: LOGNAME   value: root
key: MAIL   value: /var/mail//rday
key: PATH   value: /usr/ingres/ing2006r2/ingD2/ingres/bin:/usr/ingres/ing2006r2/ingD2/ingres/utility:/usr/ingres/ing2006r2/ingD2/ingres/files:/usr/ingres/ing2006r2/ingD2/ingres/lib:/usr/ingres/ing2006r2/ingD2/ingres/SUNWspro/bin:/usr/bin:/usr/local/bin:/usr/local/mysql/bin:/usr/ccs/bin:/usr/sbin:/usr/local/bin:/usr/ingres/ing2006r2/ingD2/bin:/usr/ingres/ing2006r2/ingD2/utility
key: PITBLADE_DBA_NAME   value: pittag
key: PITBLADE_NODE_DB   value: sockeye::ptagis3
key: PROMPT_COMMAND   value: echo -ne "\033]0;"; uname -n; echo -ne "\007"
key: PWD   value: /home/pittag/bin/CEW
key: SEBASTES_DBA_NAME   value: pittag
key: SEBASTES_NODE_DB   value: blueback::ptagis3
key: SHELL   value: /bin/bash
key: SHLVL   value: 2
key: SOCKEYE_DBA_NAME   value: pittag
key: SOCKEYE_NODE_DB   value: ptagis3
key: SSH_CLIENT   value: 10.1.2.59 54410 22
key: SSH_CONNECTION   value: 10.1.2.59 54410 172.16.100.42 22
key: SSH_TTY   value: /dev/pts/1
key: SUDO_COMMAND   value: /bin/bash
key: SUDO_GID   value: 30
key: SUDO_UID   value: 118
key: SUDO_USER   value: rday
key: TERM   value: xterm
key: TERM_INGRES   value: vt100fx
key: TZ   value: US/Pacific
key: USER   value: root
key: _   value: /bin/env
DBI connect('blueback::ptagis3;-Gptagis_rpt','',...) failed: E_LQ0001 Failed to connect to DBMS session.
E_LC0001 GCA protocol service (GCA_REQUEST) failure.
    Internal service status E_GC0142 -- The installation password for the
    remote vnode is invalid.. at ./query-runner-test.pl line 76
Deep recursion on subroutine "main::catch" at ./query-runner-test.pl line 236, <DEF> line 40.
Deep recursion on subroutine "main::log_end" at ./query-runner-test.pl line 286, <DEF> line 40.
^C

classpath defines iijdbc.jar incorrectly on bay

A diff of those reveals that there is a difference in the classpath variable. Bay seems to have a typo in the path to the ingres DAS driver class and so is probably still using the EDBC driver. Jack doesn't have that typo and is probably using the IIJDBC driver.

The bay classpath includes this incorrect path:

/usr/ingres/ing2006r2/ingD2/ingres/lib/iijbc.jar

I need to track down where that is being defined. The correct path is:

/usr/ingres/ing2006r2/ingD2/ingres/lib/iijdbc.jar

The path typo is not in the weblogic start (/usr/local/bea81/user_projects/domains/mydomain/startManagedWeblogic.sh) file as I expected.

The CLASSPATH typo is in bay:/usr/ingres/set_ing_install.sh

The CLASSPATH typo is a problem, but is not related to this particular issue. The perl query-runner.pl does not use the CLASSPATH variable.

root@jack cannot connect to blueback, but ingres@jack can

As a temporary measure, I rewrote query-runner.pl to drop permissions to ingres from root before connecting to blueback. This causes the output of reports to be written as ingres instead of root, so I had to chmod -R ingres on all the files in /usr/local/pitweb/ptagis-1.0/cvswd.

This should allow both app servers to work until Ingres tech support can determine a better solution.

More details at Ingres Tech Support Issue 141589

As a temporary measure, I altered query-runner.pl to setuid to ingres. This required changing permissions on the files in the user's report directories from root to ingres. This enabled me to get jack back into service while the problem is being resolved.

Resolution: bad vnode on jack

Ingres tech support suggested running netutil as root to see if it had a vnode entry that was preventing it from connecting. I did so and found that it had a private vnode entry for user root only. I removed that private vnode entry leaving the normal global vnode. That immediately fixed the problem.

I left the modifications to query-runner.pl in place since it makes sense to run that process with the least privileges required.

Personal tools