
CRS and ASM cannot start because GNS is offline (VIP ran away)

Our RAC crashed when we had a storage issue. After the storage was fixed, starting RAC with "crsctl start crs" failed when it tried to start ASM. We did not think GNS could be the cause, but it looks like it was. GNS was offline because its VIP had somehow "run away" on one of the RAC nodes. I could ping and ssh to the VIP, but I could not run "srvctl stop gns" or "srvctl start gns".
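The manual "can I ping/ssh to the VIP?" check can also be scripted. Below is a minimal Python sketch. It assumes only that a VIP which is still plumbed on some node will accept a TCP connection on the ssh port (22). The function name `vip_answers` is hypothetical. If clusterware reports GNS as offline but this returns True, the address is probably still up on a node even though the GNS resource is not running.

```python
import socket


def vip_answers(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on host:port.

    Scripted version of "can I ssh to the VIP?". The ssh port is only an
    assumption: any port a live node listens on would work.
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, unreachable, or name not resolvable
        return False
```

For example, call `vip_answers("<gns-vip-address>")` with your own VIP address. To find which node still holds the address, run `ip addr show` on each node. It lists the addresses configured on that node's interfaces.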

This is the evidence that ASM relies on GNS being online. If GNS is not online, the SCAN name set in remote_listener in +ASM's parameter file cannot be resolved, so the parameter is treated as invalid and ASM is killed.
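A quick way to test this theory outside of ASM is to ask the OS resolver whether the remote_listener host name resolves. The Python sketch below does this. It assumes an EZConnect-style "host[:port]" value. It also assumes that in this GNS-based cluster the SCAN name is answered by GNS, so the name stops resolving when GNS is down. Oracle can also resolve the value through its own naming methods (for example tnsnames.ora), so this checks DNS only. `listener_host` and `resolves` are hypothetical helper names.

```python
import socket


def listener_host(remote_listener: str) -> str:
    """Return the host part of an EZConnect-style 'host[:port]' value."""
    host, _, _port = remote_listener.strip().partition(":")
    return host


def resolves(name: str) -> bool:
    """Return True if the OS resolver can turn name into an address."""
    if not name:
        return False
    try:
        socket.getaddrinfo(name, None)
        return True
    except (OSError, UnicodeError):
        # gaierror (an OSError subclass) covers NXDOMAIN and DNS unreachable;
        # UnicodeError covers malformed labels such as 'a..b'
        return False
```

For example, `resolves(listener_host("scan.prd01.rac.bcferries.corp"))` run on a cluster node should return False while GNS is offline, if the SCAN name is indeed served only by GNS.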

In alert_ASM1.log:
Using parameter settings in server-side spfile +OCR_VOTE/wwwracprdcrs00/asmparameterfile/registry.253.830014321
...
Sat Aug 03 00:49:10 2019
USER (ospid: 16060): terminating the instance due to error 119

The ASM alert log does not say why error 119 occurred, but ohasd_oraagent_grid.trc shows that remote_listener is causing the problem:

In ohasd_oraagent_grid.trc:

2019-08-03 00:49:10.295345 :CLSDYNAM:2912564992: [ ora.asm]{0:5:3} [start] ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'scan.prd01.rac.bcferries.corp'
...
2019-08-03 00:49:10.318762 :    AGFW:2916767488: {0:5:3} ora.asm 1 1 state changed from: STARTING to: OFFLINE
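After a crash these trace files can be long, so pulling out the relevant errors by hand is slow. The Python sketch below scans trace lines for the ORA-00119 and ORA-00132 messages shown above. It reports which parameter was rejected and which network name could not be resolved. The regular expressions are written against the message text in this excerpt only, and `scan_trace` is a hypothetical name.

```python
import re
from typing import Dict, Iterable, List

# Message formats as they appear in the ohasd_oraagent_grid.trc excerpt above
_ORA119 = re.compile(r"ORA-00119: invalid specification for system parameter (\w+)")
_ORA132 = re.compile(r"ORA-00132: .*?'([^']+)'")


def scan_trace(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect rejected parameters and unresolved names, first-seen order, no duplicates."""
    found: Dict[str, List[str]] = {"parameters": [], "names": []}
    for line in lines:
        for key, pattern in (("parameters", _ORA119), ("names", _ORA132)):
            m = pattern.search(line)
            if m and m.group(1) not in found[key]:
                found[key].append(m.group(1))
    return found
```

To use it on a real file, pass an open file object, for example `scan_trace(open(path, errors="replace"))`. Here `path` points at your ohasd_oraagent_grid.trc, whose location depends on the Grid Infrastructure version.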

sqlplus / as sysasm
SQL> show parameter remote

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_listener                      string      scan.prd01.rac.bcferries.corp

After we powered off the node that was still running the VIP, ASM and CRS could start.

I am doing more investigation because I can hardly believe this. I thought CRS/ASM should start before the GNS service. According to this link, GNS is at level 4 and ASM is at level 2:
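The surprise can be stated as an ordering check. A resource that needs something provided by a later startup level has an inverted dependency. The Python sketch below encodes only the two levels quoted from the linked post and assumes that a lower level starts earlier. The ASM to GNS edge is the dependency inferred in this incident: ASM's remote_listener needs the SCAN name, which GNS resolves. It is not a dependency that clusterware declares. `LEVELS`, `OBSERVED` and `inverted` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Startup levels as quoted from the linked troubleshooting post
LEVELS: Dict[str, int] = {"ora.asm": 2, "ora.gns": 4}

# (resource, what it needs): the ASM -> GNS edge observed in this incident,
# not one declared by clusterware
OBSERVED: List[Tuple[str, str]] = [("ora.asm", "ora.gns")]


def inverted(levels: Dict[str, int], deps: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return dependencies whose provider starts at a later level than its consumer."""
    return [(res, need) for res, need in deps if levels[need] > levels[res]]
```

With the values above, `inverted(LEVELS, OBSERVED)` flags `("ora.asm", "ora.gns")`. That is consistent with ASM failing at startup when nothing at level 2 or below can resolve the SCAN name.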

https://www.hhutzler.de/blog/troubleshooting-clusterware-startup-problems/
