
pathping to detect network timeout

Recently our BPC nightly job failed intermittently, and a few database mirror servers reported timeouts as well. I needed to prove to the network team that this was a network issue, not an application issue, so I set up a Windows scheduled task to run pathping among those servers every minute. It did confirm the network timeouts. Here is the script and a sample of the logs.

pingnetwork.bat:
echo %date% %time%
pathping server1
echo %date% %time%
pathping server2
echo %date% %time%
pathping server3
echo %date% %time%
pathping server4

The scheduled task runs the script and appends its output to a log file:
pingnetwork.bat >> e:\dropit\d.txt

The log d.txt looks like this:

"
Thu 05/19/2011 3:20:15.19
E:\dba>pathping agenwi034
Tracing route to agenwi034.corp.ctv.ca [10.1.38.134]
over a maximum of 30 hops:
0 agends030.corp.ctv.ca [10.1.38.130]
1 * agenwi034.corp.ctv.ca [10.1.38.134]
Computing statistics for 25 seconds...
Source to Here This Node/Link
Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address
0 agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
Trace complete.
"

FTP the log file to a Unix host and filter it down to the packet-loss lines so it is easier to read:
grep -E "\/ 100" d.txt > d1.txt

The new log d1.txt looks like this:
"
E:\dba>echo Thu 05/19/2011 3:20:15.19
Thu 05/19/2011 3:20:15.19
0/ 100 = 0% |
1 0ms 0/ 100 = 0% 0/ 100 = 0% agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
65/ 100 = 65% |
1 1ms 65/ 100 = 65% 0/ 100 = 0% agenwi031.corp.ctv.ca [10.1.38.131]
60/ 100 = 60% |
1 1ms 60/ 100 = 60% 0/ 100 = 0% agenwi032.corp.ctv.ca [10.1.38.132]
39/ 100 = 39% |
1 0ms 39/ 100 = 39% 0/ 100 = 0% agenwi033.corp.ctv.ca [10.1.38.133]
"
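As an alternative to the grep filter, here is a minimal awk sketch that prints one line per target with the timestamp of the run, the target host, and its "Source to Here" loss percentage. It is not the command used for the log above. The function name summarize_loss is hypothetical, and the sketch assumes the log layout shown earlier: a timestamp line in the "Ddd MM/DD/YYYY H:MM:SS.cc" form written by echo %date% %time% (this form depends on the Windows regional settings), followed by pathping hop lines that end in "host [ip]".

```sh
#!/bin/sh
# Sketch: summarise pathping loss per target from the raw log (e.g. d.txt).
# Assumptions: timestamps look like "Thu 05/19/2011 3:20:15.19" and each
# hop statistics line has the loss percentage in field 6 and ends in "[ip]".
summarize_loss() {
    awk '
        # Strip a trailing carriage return in case the file kept Windows line endings.
        { sub(/\r$/, "") }

        # Timestamp line written by "echo %date% %time%"; remember it for later hops.
        /^[A-Z][a-z][a-z] [0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / { ts = $0; next }

        # Hop statistics line: field 6 is the "Source to Here" loss percentage,
        # the last field is "[ip]" and the one before it is the host name.
        $6 ~ /%$/ && $NF ~ /^\[/ { print ts "\t" $(NF-1) "\t" $6 }
    ' "$1"
}
```

Calling summarize_loss d.txt prints tab-separated lines that can be sorted or loaded into a spreadsheet to show the network team when the loss occurred.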

Now the network team is convinced that it is a network issue and is working on it.
