On July 8, 2022, a botched maintenance update on the Rogers ISP network in Canada crashed internet access across the country for at least 12 hours, with some customers experiencing problems for days afterward.
The impact was profound. The nationwide outage cut phone and internet service for roughly 12.2 million customers – about 25% of Canada’s internet capacity – halting point-of-sale debit payments on the Interac network, preventing Rogers mobile phone users from reaching 9-1-1 services, disrupting transit services that depend on online payment, and even wreaking havoc on Toronto traffic signals that rely on cellular GSM for timing changes.
Adding insult to injury, the outage even forced Canadian musician The Weeknd to postpone the first stop on his world tour at Toronto’s Rogers Centre.
The cause? As Rogers subsequently revealed in its submission to the Canadian Radio-television and Telecommunications Commission (CRTC), the country’s telecom regulator, the update “deleted a routing filter and allowed for all possible routes to the Internet to pass through the routers. … Certain network routing equipment became flooded, exceeded their capacity levels, and was then unable to route traffic, causing the common core network to stop processing traffic.”
Although Rogers – one of Canada’s major internet, broadcasting, and mobile wireless companies – restored service to most customers within a day, the catastrophic loss of service startled Canadian businesses. Some, like the approximately 100 outlets operated by farm and agriculture supply retailer Peavey Mart, had redundant access to other internet providers already in place.
As a result, “only two stores were directly impacted where they had no internet connectivity,” says Shaun Guthrie, the company’s Senior VP of Information Technology and VP of the CIO Association of Canada.
“However, we rely on Interac services for our customers to transact, which relies solely on Rogers, so we lost the ability to do debit card payments.”
Not just a domestic issue
“Some of the non-profits that I serve lost the ability to record meeting the needs of vulnerable people for a day or two,” says Helen Knight, Virtual CIO and Strategic Technology Consultant for Canadian non-profits. “Personally, my children and I had no way to communicate. My 13-year-old daughter was out until 10 p.m. and I was worried she had no way to get home.”
Others were not so fortunate. “As a global company producing waterslides and water park attractions, the Rogers network outage did affect us more than we originally thought,” says Chris Palsenbarg, Manager of IT Operations and Help Desk Support with WhiteWater West Industries. “Staff travelling overseas couldn’t even use their phones.”
Sapper Labs Group is a Canadian cybersecurity/cyberintelligence firm. “Although our company was not affected by the Rogers outage, many of our partners, clients, and competitors were,” says Dave McMahon, Sapper Labs’ Chief Intelligence Officer. “Some organizations have yet to fully recover. This has had a ripple effect through the market.”
In the wake of the Rogers outage, Canadian CIOs, IT executives, and experts are reviewing their readiness to cope with such failures in the future. Their conclusions are worth noting by CIOs everywhere, all of whom are at risk of encountering similar service outages in their own countries, whether from system issues, intrusions, or power failures due to environmental or other causes.
The Rogers outage underlined the value of having redundant ISP access, even though doing so costs more than relying on just one provider. Although some corporations balk at this extra expense, Peavey Mart accepts the value of paying for redundant internet access wherever possible. The company was rewarded for its farsightedness on July 8, 2022.
The failure of the Rogers ISP network didn’t blindside the company either, because “we proactively monitor the state of our data communications,” Guthrie says. “As a result, once the stores were impacted by the outage, they automatically failed over to their secondary ISPs through our SD-WAN enabled infrastructure.”
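The article does not describe how Peavey Mart’s SD-WAN decides when to fail over, but the general idea of monitoring each uplink and switching to a healthy one can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Link` type, the probe target, and the rule that a link counts as down only after several failed probes in a row. A real SD-WAN appliance applies its own, far richer policies.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Link:
    """One ISP uplink (hypothetical model, not from the article)."""
    name: str        # label such as "primary-isp"
    source_ip: str   # local address bound to that ISP's uplink (assumed)


def tcp_probe(link: Link, target=("1.1.1.1", 443), timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to `target` succeeds from the link's
    source address. The target is an assumed, publicly reachable host."""
    try:
        with socket.create_connection(
            target, timeout=timeout, source_address=(link.source_ip, 0)
        ):
            return True
    except OSError:
        return False


def choose_link(
    links: Sequence[Link],
    probe: Callable[[Link], bool] = tcp_probe,
    attempts: int = 3,
) -> Optional[Link]:
    """Return the first link, in priority order, that answers a probe.

    A link is treated as down only after `attempts` consecutive failures,
    which avoids failing over on a single dropped packet.
    """
    for link in links:
        if any(probe(link) for _ in range(attempts)):
            return link
    return None  # every uplink failed: time for the manual fallback process
```

Called periodically with the store’s links listed in priority order, `choose_link` returns the primary while it is healthy and the secondary once the primary stops responding. It says nothing about dependencies further upstream, which is exactly where the Interac failure hit.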
Non-profit organizations such as Canada’s Salvation Army can’t afford the kind of infrastructure used by Peavey Mart. But their CIOs are determined experts accustomed to “accomplishing amazing feats using free software and donated hardware,” says Knight. “They are accustomed to their aged IT infrastructure failing, so they usually have a manual process to fall back on,” she says.
As a result, Canadian non-profit CIOs can cope with ISP failures, at least at the time they actually occur. “The lost data from the outage will impact them later, when they don’t have correct records showing how many people they served to show their donors, potentially impacting future grants,” Knight says.
This being the case, Knight believes the Rogers outage could change non-profit attitudes to redundant ISP access for the better. “After all, it has been common practice for years to have a redundant connection for all critical business components, so the silver lining is that now non-profits understand a new risk area they may not have considered,” she says.
“So if this is the incident that allows non-profits to recognize the need to have a senior technology leader at the decision-making table, aligning their strategic plans to their technical roadmap, then this might well be the cheapest and easiest way to learn that lesson. It is much better than facing a cyber breach!”
Check your suppliers’ backup plans
For Sapper Labs, “the Rogers outage reinforced our confidence in our own architecture and mode of operation,” McMahon says. But the outage also drove home the point that a company’s IT infrastructure doesn’t exist in isolation. Instead, it is one link in a chain of ISPs, cloud platforms, and other parties that connect to the enterprise via the internet.
Thus, “the takeaway from the Rogers outage is to ensure that one’s supply chain, partners and clients are equally prepared and that there are contingencies in place to assist them in maintaining business operations,” he says. “What was enlightening was that the outage immediately revealed who was a Rogers customer, whether they have alternate means of communications, their level of cybersecurity maturity, and critical interdependencies across the ecosystem.”
Peavey Mart is equally diligent about checking for vulnerabilities in its data supply chain. “We ask all our cloud providers: do they have redundancy?” says Guthrie. “Do their systems have failovers to backup systems built in, and do they have things like business continuity plans in place so that when a failure occurs, their people know what to do? And we ask those questions up front.”
Unfortunately, retailers like Peavey Mart don’t have the clout to demand such answers from Canadian interbank megacorps like Interac. “As a result, we have no choice but to assume that Interac has such backup measures in place, which they clearly did not,” he says.
Expect more ISP failures
The resolution of the Rogers outage in Canada was followed by government investigations, negative media reports, and lots of predictable public outrage. But none of these reactions will be able to change a very simple fact: ISP networks are complex and vast systems made of many parts whose response to maintenance upgrades cannot be completely modeled in simulations.
As a result, even after all the improvements Rogers has promised to make and that other Canadian ISPs might copy out of a sense of prudence, “I have no doubt that we’ll probably see additional failures,” says Guthrie. “I don’t know who it will be, but I think we will likely see an additional failure within a year.”
This being the case, CIOs whose companies rely on ISP access need to take steps now to protect their enterprises against such outages. According to Dave McMahon, the path forward is clear: “Dual providers and redundant independent systems are best practices in industry,” he says.
“It is the very definition of a high-availability system. This is why all Sapper Labs employees already have multiple means of secure communications and abilities to collaborate online. We are currently assessing how best we can extend similar secure high-assurance solutions to our clients and partners.”
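A quick back-of-the-envelope calculation shows why dual providers matter so much for availability. The sketch below assumes the links fail independently of each other, which is the key condition in McMahon’s “redundant independent systems.” The 99% figures are illustrative only and do not come from Rogers or any other provider.

```python
from math import prod
from typing import Iterable

HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities: Iterable[float]) -> float:
    """Availability of a set of redundant links, assuming they fail
    independently: service is down only when every link is down at once."""
    return 1 - prod(1 - a for a in availabilities)


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


single = 0.99                                 # illustrative figure, one ISP
dual = combined_availability([0.99, 0.99])    # two independent ISPs
print(f"one link:  {expected_downtime_hours(single):.1f} h/year")
print(f"two links: {expected_downtime_hours(dual):.1f} h/year")
```

Under these assumptions, a second link cuts expected downtime by a factor of 100. The gain shrinks quickly when the two paths share a dependency, such as a common upstream carrier or a payment network tied to a single ISP.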
At the same time, CIOs need to remain humble and not overestimate their ability to plan for such events beforehand.
“Technology is so ubiquitous and so complex, with every person and every organization experiencing new and complex technical challenges over the last couple years, that although it is possible to protect companies against Rogers-style outages it isn’t possible or cost-effective to protect against all risk,” says Knight. “Instead, it is a matter of quantifying the impact and urgency of each risk and prioritizing organizational continuity plans for the most critical operational areas.”
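Knight does not prescribe a scoring method, but one simple way to quantify impact and urgency and then rank the results is a small risk register. The sketch below is a hypothetical illustration: the 1-to-5 scales, the multiplicative score, and the sample entries are all assumptions, not figures from any of the organizations quoted.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Risk:
    name: str
    impact: int    # 1 (minor) to 5 (severe); assumed scale
    urgency: int   # 1 (can wait) to 5 (immediate); assumed scale

    @property
    def score(self) -> int:
        # Simple multiplicative score; other weightings are equally valid.
        return self.impact * self.urgency


def prioritize(risks: Iterable[Risk]) -> List[Risk]:
    """Sort risks from highest to lowest score, breaking ties by impact."""
    return sorted(risks, key=lambda r: (r.score, r.impact), reverse=True)


# Hypothetical register entries, loosely inspired by the outage described here.
register = [
    Risk("Staff phones offline while travelling", impact=2, urgency=2),
    Risk("Debit payment network unavailable", impact=5, urgency=3),
    Risk("Primary ISP outage", impact=4, urgency=4),
]

for risk in prioritize(register):
    print(f"{risk.score:>2}  {risk.name}")
```

The top of the sorted list is where continuity planning effort goes first. Lower-ranked items may be accepted as risks that are not cost-effective to mitigate.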
The bottom line: A Rogers-style ISP outage is a crisis that can and likely will confront CIOs in companies around the world in the years to come. This is why boosting redundant systems and preparing contingency plans now is a must, to minimize and mitigate the inevitable impact of these communication failures on the enterprise.