
Using pathping to detect network timeouts

Recently our BPC nightly job failed intermittently, and a few database mirror servers reported timeouts as well. I needed to prove to the network team that this was a network issue, not an application issue, so I set up a Windows scheduled task to run pathping among those servers every minute. It did confirm the network timeouts. Here are the script and a sample of the logs.

pingnetwork.bat:
echo %date% %time%
pathping server1
echo %date% %time%
pathping server2
echo %date% %time%
pathping server3
echo %date% %time%
pathping server4

The scheduled task runs the script and appends its output to a log file:
pingnetwork.bat >> e:\dropit\d.txt

The log d.txt looks like this:

"
Thu 05/19/2011 3:20:15.19
E:\dba>pathping agenwi034
Tracing route to agenwi034.corp.ctv.ca [10.1.38.134]
over a maximum of 30 hops:
0 agends030.corp.ctv.ca [10.1.38.130]
1 * agenwi034.corp.ctv.ca [10.1.38.134]
Computing statistics for 25 seconds...
Source to Here This Node/Link
Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address
0 agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
Trace complete.
"

I then FTPed the log file to a Unix box and filtered it down to the packet-loss lines to make it easier to read:
grep -E "\/ 100" d.txt > d1.txt

The new log d1.txt looks like this:
"
E:\dba>echo Thu 05/19/2011 3:20:15.19
Thu 05/19/2011 3:20:15.19
0/ 100 = 0% |
1 0ms 0/ 100 = 0% 0/ 100 = 0% agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
65/ 100 = 65% |
1 1ms 65/ 100 = 65% 0/ 100 = 0% agenwi031.corp.ctv.ca [10.1.38.131]
60/ 100 = 60% |
1 1ms 60/ 100 = 60% 0/ 100 = 0% agenwi032.corp.ctv.ca [10.1.38.132]
39/ 100 = 39% |
1 0ms 39/ 100 = 39% 0/ 100 = 0% agenwi033.corp.ctv.ca [10.1.38.133]
"
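
To get the timestamp and each lossy target on one line, something like the following shell and awk sketch could work. It is only a sketch under assumptions: it expects the d.txt layout shown earlier (a date line written by echo %date% %time%, followed by pathping's statistics table), a date in the "Thu 05/19/2011" style, and Windows line endings that need stripping. The function name summarize_loss is made up for illustration.

```shell
#!/bin/sh
# Sketch: print "timestamp  target  loss%" for every target that lost packets.
# summarize_loss is a hypothetical helper name; the log layout is assumed
# to match the pathping sample shown in this post.
summarize_loss() {
    # Drop the carriage returns left over from the Windows file, then parse.
    tr -d '\r' < "$1" | awk '
        # Date line from "echo %date% %time%", e.g. "Thu 05/19/2011 3:20:15.19".
        # The echoed command line ("E:\dba>echo ...") does not match this.
        /^[A-Z][a-z][a-z] [0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / {
            stamp = $0
            next
        }
        # Hop lines carry two "Lost/Sent = Pct" groups and end in "host [ip]".
        # Link lines (ending in "|") carry only one group and are skipped.
        /\/ *[0-9]+ = *[0-9]+%.*\/ *[0-9]+ = *[0-9]+%.*\] *$/ {
            # The first percentage is the "Source to Here" loss.
            if (match($0, /= *[0-9]+%/)) {
                pct = substr($0, RSTART + 1, RLENGTH - 2) + 0
                if (pct > 0)
                    printf "%s  %s  %d%%\n", stamp, $(NF - 1), pct
            }
        }
    '
}
```

Calling summarize_loss d.txt > d2.txt would then write one line per target with non-zero loss, each tagged with the time of the run it came from. The output file name d2.txt is just an example.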

Now the network team is convinced that this is a network issue and is working on it.
