Skip to main content

pathping to detect network timeout

Recently our BPC nightly job failed intermittently, a few database mirror servers report timeout as well, I need to prove to network team that it's not an application issue, it's a network issue, so I implement the a windows scheduled task to run pathping among those servers every 1 minute. it did prove that network timeout. here is the script and sample of logs.

pingnetwork.bat :
echo %date% %time%
pathping server1
echo %date% %time%
pathping server2
echo %date% %time%
pathping server3
echo %date% %time%
pathping server4

pingnetwork.bat >> e:\dropit\d.txt

the log d.txt looks like this

"
Thu 05/19/2011 3:20:15.19
E:\dba>pathping agenwi034
Tracing route to agenwi034.corp.ctv.ca [10.1.38.134]
over a maximum of 30 hops:
0 agends030.corp.ctv.ca [10.1.38.130]
1 * agenwi034.corp.ctv.ca [10.1.38.134]
Computing statistics for 25 seconds...
Source to Here This Node/Link
Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address
0 agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
Trace complete.
"

ftp the log file to unix and format it in a readable way.
grep -E "\/ 100" d.txt > d1.txt

the new log d1.txt looks like this:
"
E:\dba>echo Thu 05/19/2011 3:20:15.19
Thu 05/19/2011 3:20:15.19
0/ 100 = 0% |
1 0ms 0/ 100 = 0% 0/ 100 = 0% agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
65/ 100 = 65% |
1 1ms 65/ 100 = 65% 0/ 100 = 0% agenwi031.corp.ctv.ca [10.1.38.131]
60/ 100 = 60% |
1 1ms 60/ 100 = 60% 0/ 100 = 0% agenwi032.corp.ctv.ca [10.1.38.132]
39/ 100 = 39% |
1 0ms 39/ 100 = 39% 0/ 100 = 0% agenwi033.corp.ctv.ca [10.1.38.133]
"

now network is convinced of the network issue and is working on it.

Comments

Popular posts from this blog

Opatch apply/lsinventory error: oneoff is corrupted or does not exist

I am applying the quarterly patch for 19c RDBMS, I tried using napply but failed, but somehow it corrupted the inventory though nothing applied. further apply and lsinventory command ran into error like this: $ ./OPatch/opatch lsinventory Oracle Interim Patch Installer version 12.2.0.1.21 Copyright (c) 2020, Oracle Corporation.  All rights reserved. Oracle Home       : /u02/app/oracle/19.0.0 Central Inventory : /u01/app/oraInventory    from           : /u02/app/oracle/19.0.0/oraInst.loc OPatch version    : 12.2.0.1.21 OUI version       : 12.2.0.7.0 Log file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/opatch2020-09-08_13-35-59PM_1.log Lsinventory Output file location : /u02/app/oracle/19.0.0/cfgtoollogs/opatch/lsinv/lsinventory2020-09-08_13-35-59PM.txt -------------------------------------------------------------------------------- Inventory load failed... OPatch cannot load inventory ...

oracle dba_hist_sysmetric_summary

found this blog is helpful to get CPU and IO statistics on oracle database. http://shob-dbadmin.blogspot.ca/2012/12/how-to-find-total-io-of-database.html courtesy to  Shomil Bansal , below are hist writing, not mine. How to find total IO of the database instance Total IO of database instance is sum of the physical reads, physical writes and redo writes. There are several views to find these values. v$sysmetric  - Reports metric values for only the most current time sample 60 secs. v$sysmetric_summary  - Reports metric values for time sample of 1 hour. v$sysmetric_history  - Reports metric values every 60 sec from the time instance is up. Better way to analyse IO using this view to take deltas between two time periods. dba_hist_sysmetric_history  - All the above views are refreshed when the instance is restarted. This view, part of AWR, stores the historical stats. I have used this view for my report. Query: ====== set lines 350...

database not registered to scan_listener

created a new database COGSTD, but it's not registered on scan_listener, when connect, the error is: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor In the alert log, when the database started, I usually see the remote_listener and local_listener are set dynamically before database is mount, but on this problematic database, I only see local_listener are set, but not remote_listener ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=xx.xx.xx.xx0)(PORT=1521))' SCOPE=MEMORY SID='COGSTD3'; Troubleshooting: 1. The service is registered with local listener: node1>lsnrctl status listener|grep COGSTD Service "COGSTD" has 1 instance(s).   Instance "COGSTD1", status READY, has 1 handler(s) for this service... Similar output on node2 and node 3 with the local instnance name 2. The service is not registered with scan_listener: -bash-4.1>srvctl status scan_listener ...