pathping to detect network timeout

Recently our BPC nightly job failed intermittently, a few database mirror servers report timeout as well, I need to prove to network team that it's not an application issue, it's a network issue, so I implement the a windows scheduled task to run pathping among those servers every 1 minute. it did prove that network timeout. here is the script and sample of logs.

pingnetwork.bat :
echo %date% %time%
pathping server1
echo %date% %time%
pathping server2
echo %date% %time%
pathping server3
echo %date% %time%
pathping server4

pingnetwork.bat >> e:\dropit\d.txt

the log d.txt looks like this

"
Thu 05/19/2011 3:20:15.19
E:\dba>pathping agenwi034
Tracing route to agenwi034.corp.ctv.ca [10.1.38.134]
over a maximum of 30 hops:
0 agends030.corp.ctv.ca [10.1.38.130]
1 * agenwi034.corp.ctv.ca [10.1.38.134]
Computing statistics for 25 seconds...
Source to Here This Node/Link
Hop RTT Lost/Sent = Pct Lost/Sent = Pct Address
0 agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
Trace complete.
"

ftp the log file to unix and format it in a readable way.
grep -E "\/ 100" d.txt > d1.txt

the new log d1.txt looks like this:
"
E:\dba>echo Thu 05/19/2011 3:20:15.19
Thu 05/19/2011 3:20:15.19
0/ 100 = 0% |
1 0ms 0/ 100 = 0% 0/ 100 = 0% agends030.corp.ctv.ca [10.1.38.130]
68/ 100 = 68% |
1 1ms 68/ 100 = 68% 0/ 100 = 0% agenwi034.corp.ctv.ca [10.1.38.134]
65/ 100 = 65% |
1 1ms 65/ 100 = 65% 0/ 100 = 0% agenwi031.corp.ctv.ca [10.1.38.131]
60/ 100 = 60% |
1 1ms 60/ 100 = 60% 0/ 100 = 0% agenwi032.corp.ctv.ca [10.1.38.132]
39/ 100 = 39% |
1 0ms 39/ 100 = 39% 0/ 100 = 0% agenwi033.corp.ctv.ca [10.1.38.133]
"

now network is convinced of the network issue and is working on it.

DBA

Search This Blog

pathping to detect network timeout

Comments

Post a Comment

Popular posts from this blog

Opatch apply/lsinventory error: oneoff is corrupted or does not exist

oracle dba_hist_sysmetric_summary

database not registered to scan_listener