New ask Hacker News story: BuyVM: Example of refreshingly honest and transparent provider

3 by indigodaddy | 0 comments on Hacker News.
I'm not a shill for BuyVM; I've just always been impressed by the transparency and honesty in how they operate.

=====

Hello REDACTED,

To start, for the sake of the support staff, if your VPS is now humming along, we ask that you close any open tickets you have about this issue.

Short version: The network's been crapping itself for the past day. After 2 - 3 hard failures today we did an emergency JunOS downgrade and are cautiously optimistic that it's resolved.

Longer/technical version: For the past month or so we've been suffering a weird issue where random IP addresses would stop responding. The issue is that our router would lose/expire out the ARP entry for these IP addresses and wouldn't attempt to learn them again unless forced to. No amount of traffic getting sent to the IP externally (pings, random connection attempts, etc) would make the router issue an ARP "who-has" message. The only fix was to either ping the IP on the LAN (through our speedtest page for instance), forcing the router to see the ARP message on the network, or to log in to the router and ping from there. No amount of limits, tweaks, process restarts, etc, has helped with this. The only fixes to date have been the methods above.

This morning we had an issue where all users on REDACTED.x.x dropped offline. These subnets had been moved down to our new Cisco core switch to help resolve the issue. Our Juniper router dropped the ARP entry for the Cisco switch, killing BGP. After some time it re-learned the entry and reestablished connectivity, bringing those subnets back online. This happened three times in the past 24 hours, each outage lasting anywhere from 5 minutes to 30 minutes.

Las Vegas is the only location suffering this issue. The key difference between all locations is that Vegas was running JunOS 20.4, whereas everywhere else is 18.2~4. In JunOS 20.x they made changes to ARP to allow policing. While we did change this policy, it didn't help us any.

As of an hour ago we performed a downgrade to JunOS 18.4. It obviously went terribly. After some research we found the configuration entry causing issues (our 100G ports kept channelizing) and fixed it, bringing things back online. As of now, ARPs are resolving near instantaneously, even for new IP addresses. We took this as a good time to change the rest of our ports to 100G, giving us a 400Gbit interlink between our router and core switch.

We're cautiously optimistic that this issue is now resolved. We thank you for your patience.

BuyVM Team
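
For anyone unfamiliar with the ARP "who-has" message the letter mentions, here is a minimal Python sketch of what such a request looks like on the wire, following the standard Ethernet/IPv4 ARP layout. The function names, the MAC address, and the IP addresses are illustrative assumptions and have nothing to do with BuyVM's actual setup. It only builds the bytes; it does not send anything, and it is not a description of how a Juniper router generates the packet internally.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806          # EtherType value for ARP
BROADCAST_MAC = b"\xff" * 6     # who-has requests go to the Ethernet broadcast address


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request for target_ip.

    Hypothetical helper for illustration only.
    """
    sha = mac_to_bytes(sender_mac)
    # Ethernet header: destination MAC, source MAC, EtherType
    eth_header = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode 1 = request ("who-has")
        sha,                          # sender hardware address
        socket.inet_aton(sender_ip),  # sender protocol address
        b"\x00" * 6,                  # target hardware address: unknown, zeroed
        socket.inet_aton(target_ip),  # target protocol address: the IP being asked about
    )
    return eth_header + arp_payload


# Example addresses are made up (documentation range 192.0.2.0/24).
frame = build_arp_request("02:00:00:00:00:01", "192.0.2.1", "192.0.2.50")
print(len(frame), frame.hex())
```

The failure described in the letter is that the router stopped emitting this request for expired entries, so the target host never got the chance to answer with its MAC address until something else on the LAN triggered ARP traffic for it.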
