January 22nd, 2013, 10:15 AM
Mind if I SCREAM first
Six very, very similar servers running Windows Server 2003.
Five do nothing but host SQL databases.
These five run an app my developers made that:
1) Backs up the DBs
2) Zips them up
3) FTPs the zipped files to server #6
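For anyone following along, steps 2 and 3 can be sketched roughly as below. This is only an illustration, not the developers' actual app: the function names `zip_backup` and `ftp_upload`, the host, and the credentials are all made up, and the database backup step itself is left out.

```python
import ftplib
import os
import zipfile


def zip_backup(backup_path, zip_path):
    """Compress one database backup file into a zip archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full directory path.
        archive.write(backup_path, arcname=os.path.basename(backup_path))
    return zip_path


def ftp_upload(zip_path, host, user, password, timeout=60):
    """Upload the archive to the collection server in binary mode (hypothetical helper)."""
    with ftplib.FTP(host, user, password, timeout=timeout) as ftp:
        with open(zip_path, "rb") as handle:
            ftp.storbinary("STOR " + os.path.basename(zip_path), handle)
```

Usage would look something like `ftp_upload(zip_backup("mydb.bak", "mydb.zip"), "server6.example", "user", "secret")`, with every one of those values being a placeholder.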
The three-week-old server (the newest of the five) is having issues. FTP from a CMD prompt worked for two weeks and then quit for five days. This morning it worked.
I spent all day yesterday chatting with the data center techs. They were leaning towards blaming server #6 as the culprit (permissions, etc.). Hopefully today they will be able to answer the question: why do the other four servers NOT have the same problem?
Related to the issue: the problem server takes 30 minutes to transfer a file that the other servers can do in 7 minutes.
Yesterday, they had me download an app called WinMTR, which looks like a fancy tracert-type app. They thought they detected an issue with a switch where server #6 is located...
Anyone have any thoughts on this? Thanks in advance.
January 22nd, 2013, 12:21 PM
That sounds like a network issue of some sort to me also. Intermittent connection issues should not be a permissions problem if the permissions are not changing, and the time delay has lost packets written all over it. A bad port on a switch could be the cause, or maybe a flaky network cable. I recently had a switch that was giving me similar issues. Sometimes it would work fine, then it would quit, then it was fine again. I replaced it and all the mysterious issues were gone.
January 22nd, 2013, 01:15 PM
Thanks as always JDC...
Just got this back from the techs....
Can't say that I understand all of it... But apparently they do.
I would like to change the speed and duplex settings at the switch.
Currently I see the switch is set to 100/Full duplex, and the server is set to Auto negotiate.
Currently, I see the FCS-Err, and the Rcv-Err increasing at the switch, which very well may be disrupting traffic.
Why I believe this may be related to Speed and Duplex: because the errors are aligning, matching 1 for 1, such as:
Port      Align-Err   FCS-Err   Xmit-Err   Rcv-Err   UnderSize
Gi0/36            0    211739          0    211739           0
about 20 minutes later:
Port      Align-Err   FCS-Err   Xmit-Err   Rcv-Err   UnderSize
Gi0/36            0    219181          0    219181           0
So as you can see the traffic is being affected, specifically the traffic outbound from the server to the switch, which is why you are having the issues with the FTP sessions.
Please advise a time when you wish to proceed. If you wish to proceed "Just after 20:00 CST on 01/22/2013", that is fine, and this will be scheduled for that time.
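For a rough sense of scale, the two FCS-Err readings the tech quoted can be turned into an error rate. This is just back-of-the-envelope arithmetic, assuming the gap between readings really was about 20 minutes; the helper name `error_rate` is made up for illustration.

```python
def error_rate(first_count, second_count, minutes_apart):
    """Return the number of new errors and the average errors per minute."""
    delta = second_count - first_count
    return delta, delta / minutes_apart


# Counter values taken from the two Gi0/36 readings above.
# The 20-minute interval is approximate ("about 20 minutes later").
new_errors, per_minute = error_rate(211739, 219181, 20)
print(new_errors)            # 7442
print(round(per_minute, 1))  # 372.1
```

That works out to roughly six bad frames per second on that one port, which fits with the slow and stalling FTP transfers.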
January 22nd, 2013, 01:41 PM
It would help if we knew what make and model of switch, and what make and model of NIC are being used. I have seen cases where one brand of switch did not get along well with a different brand of NIC. Matching the NIC and switch settings might help to resolve this. If "Auto-negotiate" does not fix the issue, you may need to change the switch and server to some fixed setting, at least for the specific port on the switch.
January 22nd, 2013, 03:39 PM
Sorry I wasn't more clear. These servers are located at a huge Data Center.
I wanted to tap the knowledge here in case they dragged their feet. I'll know in the morning if the work done tonight helps or not.
January 22nd, 2013, 03:50 PM
The data center gurus should know the makes and models of their equipment. If they are smart, they should be making sure they use compatible gear. However, if they decided to go for the cheapest gear instead, that is when problems can crop up.
January 22nd, 2013, 03:56 PM
The NIC is an Intel 82574L Gigabit.
I'm sure they could tell me the make of whichever one of their hundreds of switches this is, but I'd rather not go there...
January 22nd, 2013, 04:52 PM
That Intel NIC should work with just about anything. With luck, their adjustment should eliminate your issues.
January 23rd, 2013, 04:40 AM
Tech guy from data center wrote:
As far as I can see, both private and public interfaces are running on half duplex [rather than full], which would in effect cause the transmitted and received traffic to run into each other.
Looks like it might be fixed.
Yesterday I wrote: "However, the file transfer time was approx 30 minutes for both the automated and manual transfers."
Today - manual FTP took 12 SECONDS to transfer the same file... WOW
January 23rd, 2013, 10:06 AM
Nice. Nothing like an 80-car pileup on the network channel freeway. A more interesting question would be how the settings got like that in the first place. At least you are back in business for now.
January 23rd, 2013, 10:20 AM
OK - you asked for it:
Week number two - the machine started rebooting itself.
The three-week-old server (the newest of the five) is having issues.
First night repair - added thermal paste to the CPU
Second night repair - upgraded the BIOS and tweaked the temp controls
Third night - took the hard drive out and put it in another box. DONE
Boom - PC runs fine - network issues kick in.
In other words - I've had it up to HERE with the machine
But, we've been using the same DC for almost nine years. I can sleep good knowing they'll help out when needed. In part it took a while to fix these things because "I" had them do it after normal business hours.
I only got one call from my clients that access the machine during the whole deal...
January 23rd, 2013, 11:01 AM
Server Ops work is always a boatload of fun. It looks like you had some typical new equipment issues. Out of the box, they either work fine for years or you have issues in the first three months. With luck you should now be ready to go with no further problems for a few years at least.