Dear Valued Fpweb.net Customer,
I am reaching out regarding the hosting service outage you and your customers experienced this past Thursday, June 23rd. First and foremost, I want to apologize personally and on behalf of the entire Fpweb.net team for the disruption to your business.
It is critical for Fpweb.net to rebuild your trust in us as a partner by transparently communicating the source of the outage and taking comprehensive measures to prevent a recurrence. Quality, dependable service is, and always will be, our #1 priority. Those of you who are longtime Fpweb.net customers hopefully know that we are dedicated to delivering on that promise.
On June 23rd at 10:53am CDT, our customers lost connectivity to their hosting services.
What caused it?
A routing error in the redundant management firewall at our data center in St. Louis, Missouri, caused the firewall to send traffic to the same MAC address through multiple ports. The management firewall handles communication between our backup systems, networking switches, and hosted SharePoint sites. MAC addresses serve as unique identifiers on the network so that data is delivered to the correct device. Because the firewall was reaching an identical MAC address through different ports, traffic entered a never-ending loop that drove the device's processor to 100% utilization and left it unable to process traffic.
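For technically minded readers, the condition described above (one MAC address appearing behind more than one port) can be illustrated with a minimal Python sketch. This is not the tooling our team used; the function name, the port labels, and the sample MAC addresses are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict


def find_macs_on_multiple_ports(observations):
    """Return {mac: sorted list of ports} for every MAC seen on more than one port.

    `observations` is an iterable of (mac_address, port_name) pairs, for
    example rows exported from a device's MAC address table.
    """
    ports_by_mac = defaultdict(set)
    for mac, port in observations:
        # Normalize case so "AA:BB" and "aa:bb" count as the same address.
        ports_by_mac[mac.lower()].add(port)
    return {
        mac: sorted(ports)
        for mac, ports in ports_by_mac.items()
        if len(ports) > 1
    }


# Hypothetical sample data, not taken from the incident.
observations = [
    ("00:1A:2B:3C:4D:5E", "port1"),
    ("00:1a:2b:3c:4d:5e", "port3"),
    ("00:1A:2B:3C:4D:5F", "port2"),
]
duplicates = find_macs_on_multiple_ports(observations)
print(duplicates)
```

In this made-up sample, the first address is reported because it appears behind both port1 and port3, which is the kind of ambiguity that can let traffic circulate in a loop.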
By 11:23am CDT, our networking team had resolved the loop. At that point, dedicated server customers could RDP into their servers but still could not reach their websites over HTTP/HTTPS. The team determined that the sites could not communicate with customer SQL servers across the management network, so no data was returned to users requesting it through their SharePoint sites. Throughout the day, our senior engineers worked diligently to bring all customers back online. By 6:00pm CDT, 99% of our customers' services were fully operational.
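To put the timeline in terms of elapsed time, the three timestamps given in this letter can be subtracted directly. The short Python sketch below does only that arithmetic; the variable names are illustrative, and because all three times fall on the same day in the same time zone (CDT), no time zone handling is assumed to be needed.

```python
from datetime import datetime

TIME_FORMAT = "%I:%M%p"  # 12-hour clock with am/pm, e.g. "10:53am"


def parse_time(text):
    """Parse a same-day clock time such as '10:53am'."""
    return datetime.strptime(text, TIME_FORMAT)


outage_start = parse_time("10:53am")    # connectivity lost
loop_resolved = parse_time("11:23am")   # firewall loop resolved
mostly_restored = parse_time("6:00pm")  # 99% of services operational

time_to_resolve_loop = loop_resolved - outage_start
time_to_restore_most = mostly_restored - outage_start

print("Loop resolved after:", time_to_resolve_loop)
print("99% of services restored after:", time_to_restore_most)
```

The loop itself lasted 30 minutes, while restoring 99% of services took 7 hours and 7 minutes from the start of the outage.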
Our engineering team takes preventative measures and architects hardware redundancy in all systems to minimize the risk of outages like this one. While no hosting company can prevent 100% of outages, we continually add the latest state-of-the-art fail-safes and redundancy into our infrastructure to make it stronger and better for the future.
Addressing the root cause of this specific incident:
- Immediately: Our team has pinpointed our VLAN architecture as an area we can address right away to ensure that this specific problem does not occur again. We will also thoroughly re-engineer the VLAN architecture based on what we learned while restoring services to our customers last Thursday.
- Moving Forward: Over the next few months, we are committed to making some major infrastructure investments at our primary data center to further safeguard against network-related service drops. Our senior engineers will also carry out a thorough audit and evaluation of our network architecture to ensure that the investments are specific and will produce measurable progress as we continue to engineer our SharePoint hosting network to be the best in the world.
We want to hear from you
You entrust us with your reputation and your customers’ critical business communications. We take this responsibility seriously. We will continue to work tirelessly to improve our network and serve you better.
As always, your questions and suggestions regarding this outage are encouraged.
Please send your feedback to firstname.lastname@example.org. This distribution list is monitored by the entire Fpweb.net senior leadership team. Other issues related to support and billing should continue to be communicated via your Fpweb.net Account Portal.
Rob LaMear IV
Founder & CEO