Networks Windows 7x64 - loss of internet connection every few days ...

Discussion in 'Tech Support' started by DocJonz, 26 Jan 2011.

  1. Chicken76

    Chicken76 Minimodder

    Joined:
    10 Nov 2009
    Posts:
    952
    Likes Received:
    32
    You just posted while I was writing my previous post.
    7000 connections is HUUUUGE! Had it dropped a little later, you might have been able to say:
    My connection count, it's OVER NINE THOUSAND!!!111

    Jokes aside, my Win7 machine has less than a hundred connections right now, while running a torrent client!
     
  2. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    OK. I've run the script in a 'good' state ...... just need to wait 3-4 days now ;-)
    What should I be looking for?
     
  3. harlam357

    harlam357 What's a Dremel?

    Joined:
    4 Dec 2009
    Posts:
    17
    Likes Received:
    0
    Hey Doc... first I've heard of this. Saw the issue you entered on the Google Code site. I monitor 9 clients on my Win7 x64 machine and have never had any issue approaching this. All machines being monitored are Windows, though. I have users on my team that monitor 40-50 clients all day every day... some Windows, some Linux, and I've never had anyone report anything like this.

    Are the clients all local? Meaning monitored via local path or UNC path? If so, I really don't "open" a connection per se. I let Windows do it for me. It's done with a simple File.Copy() command... source path to target path. I'm not opening a TCP connection manually or anything like that. For HTTP or FTP clients I use the built-in .NET WebClient class. Again, it does all the TCP work under the hood and I'm sure the connections there are getting closed.

    I have to believe it's not my code doing this directly... however, all the traffic that HFM generates may be bringing out another issue. Also, I don't recommend such a low threshold between refresh cycles. It shouldn't be an issue... but a refresh every 5 minutes should be plenty.

    Second, I saw the other issue you entered regarding a version column. Hit F11 or see View > Toggle Version Information to get client and core version info in the grid and web output.
     
  4. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    @Harlam357 - thanks for the posts.
    In answer to your question - Yes, all my Clients are local path.
    I've never noted a problem before - it only cropped up in trying to investigate my dropped connection issues ....

    As for the F11 option - this shows the "core version" but not the "client version".
     
    Last edited: 25 Feb 2011
  5. harlam357

    harlam357 What's a Dremel?

    Joined:
    4 Dec 2009
    Posts:
    17
    Likes Received:
    0
    Hey Doc... yes, it does show both the core and client versions. Your client type column just may not be wide enough to see it. Look at your HFM summary page... it's there too. ;)
     
  6. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    It's been 4 days! The telly is boring, just been watching this thread ....:D
     
  7. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    Hee Hee!
    No show yet - which is good news, of course! - but there's still time for them to go down ... !
     
  8. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    I'm biting my nails now ......:lol:
     
  9. Chicken76

    Chicken76 Minimodder

    Joined:
    10 Nov 2009
    Posts:
    952
    Likes Received:
    32
    Changes in your IPv4 routes. See if the default gateway is still the same. Also look at the metrics of all the routes. Any changes there?
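
A rough way to spot those changes, assuming the output of `route print` was saved to text once in the 'good' state and once after the connection dropped: the sketch below returns only the lines that differ between the two snapshots. The function name `diff_snapshots` is made up for illustration, and it does not parse the route table, it only compares text.

```python
import difflib


def diff_snapshots(good: str, bad: str) -> list[str]:
    """Return the removed ('-') and added ('+') lines between two snapshots."""
    diff = list(difflib.unified_diff(
        good.splitlines(), bad.splitlines(),
        fromfile="good", tofile="bad", lineterm="", n=0,
    ))
    # The first two lines are the '---'/'+++' file headers; '@@' lines are
    # hunk markers. Everything else is an actual change.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

A changed default gateway would show up as a removed `0.0.0.0` line and an added one with a different gateway; a changed metric shows up the same way, with only the last column differing.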
     
  10. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    In some ways I feel I should apologise ... because they haven't gone down yet - still both running this morning!!

    The only changes I'd made were: installed Win7 SP1, set IPs back to static, added router to DNS list and made changes to HFM.NET time interval (in fact closing it for most of the day and decreasing the refresh rate by a factor of ten) - that's it.

    In regards to HFM.NET (I still have a suspicion that this might be the culprit ...), I used to have it refreshing very frequently because if one of my (many!) Folding@Home clients uploaded, it used to hog ALL the bandwidth so that the family couldn't go on-line - so I encouraged them to look at the HFM.NET status before doing any serious surfing to see if one of the clients was going to upload (its refresh interval was set at 2mins). SMP work units took 20mins to upload, and if another few clients were not far behind, they would queue and knock out the broadband for about an hour!
    I changed ISP in mid Jan, and this is no longer an issue - the WUs upload faster, and you can surf at the same time, though I'd left HFM.NET refreshing quickly - I've now changed this following the accumulation of open connections found when running netstat.
    If the PCs stay up for a couple of weeks without issue, I'll go back and play with the HFM.NET refresh settings to see if this can bring the machine down on its own.
     
    Last edited: 5 Mar 2011
  11. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    My HFM machine fell over Feb 28th 5:30am and then March 4th at 2am with 2 minute refresh on 12 clients. I have changed this to 10 minutes to see if anything changes. I usually concede defeat with a whole host of error messages, no internet and no access to any other machine and both local clients stopped. We'll see if the ~4 days can be extended!
     
  12. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    Machines have been up for 2 weeks now :D
    (HFM has been mainly off over this time)
    Restarted both today, and set HFM to 2mins on one of them ..... will it/won't it fall over in 3-4 days :worried:
     
  13. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    I vote YES :rock:
     
  14. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    ... and you are right :thumb:

    I will be running HFM.NET at less frequent intervals, and switching it off when not required - at least until there's a fix. I'll inform Harlam of the findings.
     
  15. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    I am still having similar issues :grr: I have 4 Win 7 machines being 'read' by HFM that resides on my main XP machine in my so-called study. It is falling over regularly just under every 4 days with a screen full of error messages and no access to other machines, internet and will not run apps. Only solution is a restart. Hope the source of the problem is found or I'll have to turn off HFM.

    Oddly, all other Win 7 machines just maintain local stats and none have ever had this issue!
     
  16. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    On the XP machine, the Event Viewer for Applications has a plethora of errors all the same! Source: "Windows Search Service", Category "gatherer", Event "3100". These occur EVERY time the machine falls over! The event properties are "Unable to initialize the filter host process. Terminating. Details: This operation returned because the timeout period expired. (0x800705b4)". It is also preceded by a system event error Source "sr" that is the System Restore Filter apparently restoring random files on my HDD named "HarddiskVolume1" by the OS.
     
  17. standinwave

    standinwave Folding in memory of my mum

    Joined:
    7 Jun 2009
    Posts:
    265
    Likes Received:
    0
    Ran full chkdsk on the HDD btw and all was just fine!
     
  18. Chicken76

    Chicken76 Minimodder

    Joined:
    10 Nov 2009
    Posts:
    952
    Likes Received:
    32
    If monitoring local files (using a local path) doesn't do any harm, then it's clear that the problem is network related. It could be:
    1. HFM (although, since it's written in a high-level language, it's very unlikely it's causing connections to pile up, as it doesn't have direct access to network packets)
    2. .Net
    3. Windows' TCP/IP stack.

    Ouch!

    The fact that it ends up triggering system restore to do its job on some files is troubling. I don't know about you, but I for one don't want any application to convince my operating system that some of its key files are damaged or corrupted and that they need to be overwritten with backed-up copies (which may not be up-to-date or for all I know may have been replaced with trojans or malware). Heck, why replace Windows' files when an application crashes?! Isn't it bad enough that a simple application that reads a few files over the network can cause the whole OS to crash? /ANGRY

    [SUPPOSITION]It may be more than just the TCP/IP stack running out of free ports in the dynamic range, it may also be that the OS itself is running out of file handles, and when it's trying to access some of its files/libraries, it can't, thus triggering System Restore on the assumption they have been deleted or corrupted. One would have thought they'd have handled running out of file handles better, and with more explicit error messages, though.[/SUPPOSITION]
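
To put a rough number on the "running out of ports" idea, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption, nothing in the thread confirms it) that each refresh leaves one connection per remote client stuck and never released. The default dynamic TCP port range on Windows 7 is 49152-65535; `netsh int ipv4 show dynamicport tcp` shows the range actually configured. The function name and parameters are made up for illustration.

```python
def days_until_exhausted(port_budget: int, clients: int,
                         refresh_minutes: float,
                         leaked_per_refresh: float = 1.0) -> float:
    """Days until `port_budget` ports are consumed by leaked connections.

    `leaked_per_refresh` is a hypothetical leak rate per client per refresh.
    """
    refreshes_per_day = 24 * 60 / refresh_minutes
    leaked_per_day = clients * leaked_per_refresh * refreshes_per_day
    return port_budget / leaked_per_day


# Default dynamic TCP port range on Windows 7: 49152 through 65535.
WIN7_DYNAMIC_PORTS = 65535 - 49152 + 1  # 16384 ports
```

For example, `days_until_exhausted(WIN7_DYNAMIC_PORTS, clients=3, refresh_minutes=2)` gives the time for three leaking clients on a 2-minute cycle. Under this model a longer refresh interval only stretches the time to failure proportionally; it does not remove the leak.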

    Standinwave, could you post more info on the error messages you get on both XP and 7? Also DocJonz could you do the same with your errors in the Event Log from your Win7 machines? If it turns out this problem is not isolated to one person or computer, I would advise people to stop using HFM, or at least stop using it 24/7 until the problem is fixed. I mean no disrespect to Harlam357 or his work and contribution to Folding, but any way I look at it, I see it as a security issue.
     
  19. harlam357

    harlam357 What's a Dremel?

    Joined:
    4 Dec 2009
    Posts:
    17
    Likes Received:
    0
    Hi DocJonz, standinwave, and Chicken76,

    I'm still at a loss here. I run HFM 24x7 for weeks at a time without shutting down and have had absolutely no issue with losing network connectivity; and until I can make a verifiable diagnosis, I'm not going to take any action towards resolving an issue I cannot replicate. It's strange, because there are several of you here on this same forum who seem to be having the same issue, yet I've had no reports of such problems from anywhere else. Now, granted, if someone were having this issue as well, and HFM IS indeed causing the problem, they may not have gone to the lengths you all have to try and troubleshoot the issue. In other words, I'm not discounting what you guys are saying and take the report seriously; otherwise I wouldn't be here.

    I ran a netstat -a on my machine where I run HFM 24x7 and have been for weeks and I have to guesstimate that it showed ~150 open connections... nowhere even approaching 1000, let alone 5, 7, 9 thousand.

    HFM simply makes a File.Copy(pathGivenInConfigDialog, toLocalPath). I open no TCP/IP socket directly, this is all handled by .NET and Windows. Since you all say true local monitoring is ok, then in this case it's a File.Copy() from a UNC path to a Local Path. It just really can't get any more simple than that.

    I am doing nothing with the Windows Search Service. I am doing nothing with System Restore. My application DOES NOT manipulate files outside of its user data folders. The only exception to that is the Auto-Run registry entry that can be set by the application. Let me just make that clear.

    Based on the initial issue report, are you sure it's only connections to Linux machines?

    The only other difference that just really jumps out at me is the use of frequent retrieval cycles (i.e. 1-2 minutes). I'm setting my machine up on a 2 minute cycle and will let you all know if it "falls over" or I end up with thousands of network connections. I am going to ask my beta testers to do the same so we can get some better coverage on the suspected problem. If Linux is the problem then that will preclude me having any issues since I have no Linux clients being monitored.

    Please do share any further diagnostic information in the issue found here: http://code.google.com/p/hfm-net/issues/detail?id=256
     
  20. DocJonz

    DocJonz Another CPC refugee .....

    Joined:
    24 Apr 2009
    Posts:
    1,202
    Likes Received:
    97
    @harlam357 - Thanks for the feedback.

    I had not had a problem till the last machine builds at the end of last year - my first Win7 machines - and hence I thought Win7 was the issue, and then ended up looking at pretty much everything to try to find the cause - it's taken nearly six months to find it, as, like you, I wasn't expecting any issues from HFM.

    Just checked netstat and it is CLOSE_WAIT connections to my three Linux machines that are filling up the log.
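
Rather than eyeballing thousands of lines, something like the sketch below could tally the entries per remote host and state. It assumes the four-column TCP rows that Windows `netstat -an` prints (protocol, local address, foreign address, state); other flags or platforms add columns and would need a different parser. The function name `count_tcp_states` is made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per (remote host, state) in `netstat -an` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, local address, foreign address, state.
        # Header rows and UDP rows (which have no state) are skipped.
        if len(fields) == 4 and fields[0].upper().startswith("TCP"):
            # Strip the port; rsplit also copes with bracketed IPv6 addresses.
            remote_host = fields[2].rsplit(":", 1)[0]
            counts[(remote_host, fields[3])] += 1
    return counts
```

The text can come from a saved file or from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`; `count_tcp_states(text).most_common(10)` then lists the biggest offenders, so a pile-up of CLOSE_WAIT against a few addresses stands out immediately.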

    In my case, the 1-2 minute refresh just brings the machine down quicker, i.e. 3-4 days. If the machine is rebooted, the queue clears and starts again.

    Not sure what other info I can provide that will help you, but if you think of something, let me know.
    Jon
     
