Does anyone have any ideas about a switch problem I have at work? We have two HP 1910-48G switches which we use for a Hyper-V cluster with an iSCSI SAN. The switches are linked together by 6 ports in a link aggregation group, and port 48 on each switch is used as an uplink to our existing Cisco LAN infrastructure. The Hyper-V cluster servers use network teaming and iSCSI MPIO so that everything is split evenly across both switches for fault tolerance. The problem is that the uplink on port 48 of the first switch seems to go down if you so much as blow on it, which takes down our entire cluster! It's as if the brief drop in the uplink is taking the switches down, even though they carry on running fine. The only event in the syslogs shows the uplink port going down and coming back up. I thought that the uplink on the second switch would take over through STP, or that the Hyper-V hosts would at least still see each other and the SAN even with no uplink to the rest of the LAN. This does appear to be an issue with the switches themselves. This was all configured before I started working here, but I've been through the configuration and haven't noticed any issues. Could this be an STP problem? Or maybe a bug? I'm waiting for a quiet Sunday to take this all down and do some proper testing and maybe a firmware update, but it's a strange problem that I've not come across before. I've been managing Hyper-V clusters for years and have never had this issue in the past.
Do you have both HPs connected to the core Cisco, and then 6 links in a link aggregation group between the two HPs? If that's the case, it may be that STP and the root bridge election are causing traffic to go HP1 -> Cisco -> HP2 (i.e. the link aggregation group is put into a blocking state to remove the loop). Removing the uplink port on one of them would then cause everything to go down until STP re-converges, which depending on the config could take 30 - 60 seconds. I would look at port-fast settings, change to rapid spanning tree, or look at the design to remove the loop, so this isn't an issue in the first place. I would assume that the Cisco can also do link aggregation, so perhaps look at changing to a setup with Cisco -> (multi-port link aggregation) -> HP1 -> (multi-port link aggregation) -> HP2. Mike
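To make the suspected STP behaviour concrete, here is a rough Python sketch of a simplified spanning tree calculation for the triangle Cisco / HP1 / HP2. It is only an illustration: the bridge priorities, MAC addresses and port costs are made up (assuming the Cisco core has the lowest bridge ID and so wins the root election), and the real values would have to be read from the switches. Under those assumptions the LAG between the two HPs ends up blocked, and it only starts forwarding after HP1 loses its uplink. The timer arithmetic at the end uses the classic 802.1D defaults (forward delay 15 s, max age 20 s), which may not match what is actually configured.

```python
import heapq

# Hypothetical bridge IDs as (priority, MAC); the lowest ID wins the root election.
BRIDGES = {
    "Cisco": (4096, "00:00:00:00:00:01"),
    "HP1": (32768, "00:00:00:00:00:02"),
    "HP2": (32768, "00:00:00:00:00:03"),
}

# Hypothetical port costs: 4 for each 1 Gbps uplink, 3 for the 6-port LAG.
LINKS = {
    ("Cisco", "HP1"): 4,
    ("Cisco", "HP2"): 4,
    ("HP1", "HP2"): 3,
}

INF = float("inf")


def blocked_links(bridges, links):
    """Return the links that a simplified spanning tree leaves blocked."""
    root = min(bridges, key=bridges.get)

    neighbours = {name: [] for name in bridges}
    for (a, b), cost in links.items():
        neighbours[a].append((b, cost))
        neighbours[b].append((a, cost))

    # Dijkstra: lowest path cost from every bridge to the root.
    cost_to_root = {name: INF for name in bridges}
    cost_to_root[root] = 0
    queue = [(0, root)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > cost_to_root[node]:
            continue
        for peer, cost in neighbours[node]:
            if dist + cost < cost_to_root[peer]:
                cost_to_root[peer] = dist + cost
                heapq.heappush(queue, (dist + cost, peer))

    # Each non-root bridge picks one root port: cheapest path, then lowest peer ID.
    root_port_peer = {}
    for name in bridges:
        if name == root or not neighbours[name]:
            continue
        peer, _ = min(
            neighbours[name],
            key=lambda pc: (cost_to_root[pc[0]] + pc[1], bridges[pc[0]]),
        )
        root_port_peer[name] = peer

    # A point-to-point link forwards only if it is the root port of one end.
    return [
        (a, b)
        for (a, b) in links
        if root_port_peer.get(a) != b and root_port_peer.get(b) != a
    ]


# Normal operation: the LAG is the redundant path, so it is blocked.
print(blocked_links(BRIDGES, LINKS))  # [('HP1', 'HP2')]

# HP1 loses its uplink: the LAG now has to start forwarding.
without_uplink = {k: v for k, v in LINKS.items() if k != ("Cisco", "HP1")}
print(blocked_links(BRIDGES, without_uplink))  # []

# Classic 802.1D default timers, in seconds.
FORWARD_DELAY = 15
MAX_AGE = 20
print(2 * FORWARD_DELAY)            # 30: listening + learning
print(MAX_AGE + 2 * FORWARD_DELAY)  # 50: stale BPDU must age out first
```

If the real topology matches this, the cluster would lose its inter-switch path for roughly 30 to 50 seconds on every uplink flap with classic STP, which fits the symptoms. Comparing the root bridge and the port states reported by the HPs against this picture would confirm or rule it out before the Sunday maintenance window.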