TECHNOLOGY - Disaster recovery
Last Review Date: 22/12/21
Boatshed Hosting Overview
The Boatshed system sits on a cluster of cloud servers hosted within RackSpace and is designed to maximise uptime in the event of network or hardware failure.
In addition to the benefits already inherent in the RackSpace services (see next section), each cloud server runs on a separate host server to further reduce the impact of a cloud host hardware failure.
All web traffic (not just boatshed.com) is delivered via a RackSpace Cloud Load Balancer, which automatically monitors the application servers and removes them from the cluster on failure, ensuring visitors receive continuous service. When stability returns, the application servers are automatically re-added to the cluster.
The Load Balancer also protects against denial of service attempts by limiting the maximum user connections and directing traffic to the least busy server. In the rare event that all application servers are offline, the Load Balancer will gracefully show a maintenance message.
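The routing behaviour described above can be illustrated with a minimal sketch. It is not RackSpace's implementation; the AppServer class, the pick_server function and the server names are hypothetical, and "least busy" is assumed here to mean fewest active connections.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppServer:
    name: str
    healthy: bool = True          # result of the most recent health check
    active_connections: int = 0   # current load on this server


def pick_server(servers: List[AppServer]) -> Optional[AppServer]:
    """Return the healthy server with the fewest active connections.

    Returns None when every server is offline, which is the point at
    which a maintenance message would be shown instead.
    """
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.active_connections)


servers = [
    AppServer("app1", healthy=True, active_connections=12),
    AppServer("app2", healthy=False, active_connections=0),  # failed its health check
    AppServer("app3", healthy=True, active_connections=4),
]
chosen = pick_server(servers)
print(chosen.name if chosen else "maintenance message")
```

A server that fails its check is simply skipped until it is marked healthy again, which mirrors the automatic removal and re-adding described above.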
The database is exported nightly, compressed, encrypted using a 2048-bit key and stored on the Amazon Web Services S3 storage cloud in Ireland. This protects the database in the event that the RackSpace infrastructure in London suffers catastrophic failure or attack.
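As an illustration of the compress-and-store part of the nightly export, the sketch below gzips a dump file, computes a checksum and builds a date-stamped storage key. The function names, the "db-backups" prefix and the ".sql.gz.enc" suffix are assumptions made for the example. The encryption and upload steps are deliberately left out because this document does not name the tools used for them.

```python
import gzip
import hashlib
import shutil
from datetime import date
from pathlib import Path


def backup_key(day: date, prefix: str = "db-backups") -> str:
    """Build a date-stamped storage key (naming scheme is hypothetical)."""
    return f"{prefix}/{day:%Y}/{day:%m}/{day:%Y-%m-%d}.sql.gz.enc"


def compress_dump(dump_path: Path) -> Path:
    """Gzip the exported dump next to the original and return the new path."""
    gz_path = dump_path.with_name(dump_path.name + ".gz")
    with open(dump_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so a restore can verify it later."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Encryption with the 2048-bit key and the upload to S3 would follow here;
# both are omitted because the tooling is not specified in this document.
print(backup_key(date(2021, 12, 22)))
```

Recording a checksum alongside each backup is a design choice of this sketch, not something the current process is known to do.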
Boat images are also replicated to the Amazon S3 storage cloud, and newly uploaded images are automatically replicated when they are saved. Developers are emailed if the replication fails on save, but this does not interfere with the user's actions.
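The "notify but do not interrupt" behaviour can be sketched as follows. The save_image function and its three callables are hypothetical stand-ins for the real storage, replication and email code, which this document does not describe.

```python
from typing import Callable, List


def save_image(
    image_id: str,
    data: bytes,
    store_locally: Callable[[str, bytes], None],
    replicate: Callable[[str, bytes], None],
    notify_developers: Callable[[str], None],
) -> bool:
    """Save an image, then attempt replication without blocking the user.

    Returns True when the replica was written and False when replication
    failed. The primary save has succeeded in both cases.
    """
    store_locally(image_id, data)
    try:
        replicate(image_id, data)
    except Exception as exc:  # replication errors are reported, never raised to the user
        notify_developers(f"Replication failed for {image_id}: {exc}")
        return False
    return True


# Demonstration with a replica that is unreachable.
sent: List[str] = []


def failing_replica(image_id: str, data: bytes) -> None:
    raise ConnectionError("remote storage unreachable")


result = save_image("boat-123.jpg", b"...", lambda i, d: None, failing_replica, sent.append)
print(result, sent)
```

Because the exception is caught inside save_image, the user's save completes normally and only the developers see the failure.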
Services are monitored using RackSpace Cloud Monitoring, which checks for specific server responses relating to web server and database functionality. Automated alerts are sent to system administrators and Boatshed HQ when a monitoring event occurs.
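A check of this kind can be thought of as a test on the HTTP status and the response body. The sketch below is an assumption about how such a check might be evaluated, not the RackSpace Cloud Monitoring configuration; the "DB OK" marker stands for a hypothetical status page that prints it after a successful test query.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    reason: str


def evaluate_response(status: int, body: str, expected_text: str) -> CheckResult:
    """Pass only when the status is 200 and the expected marker is in the body."""
    if status != 200:
        return CheckResult(False, f"unexpected status {status}")
    if expected_text not in body:
        return CheckResult(False, f"expected text {expected_text!r} not found")
    return CheckResult(True, "ok")


# A page that loads but reports a database problem should still raise an alert.
print(evaluate_response(200, "database unavailable", "DB OK"))
```

Matching on body text as well as status is what lets a single check cover both web server and database functionality.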
In the event of total loss of the cloud server and its data, we can rebuild the platform using server images (stored separately) and data restored from Amazon Web Services.
RackSpace
RackSpace provide an OpenStack virtual server environment on which we run multiple servers within a major London data centre with excellent Internet interconnects and high resilience.
Rackspace data centres are accredited to PCI DSS, ISO27001, and ISAE 3402 Type II SOC 1 standards, ensuring our hosting is secured by the best processes and technologies.
Uptime reliability
Reliability is one of the key Rackspace tenets that our customers rave about the most. Intensive Hosting has been designed to meet the needs of companies whose hosted applications require 100% availability. Our 24/7/365 Fanatical Support™ and our 100% network and infrastructure uptime guarantees ensure that our customers' mission-critical applications are always up and running.
1. Proactive Monitoring and Support – 24/7/365
Rackspace will proactively monitor your sites and applications to detect security threats and address them in the fastest possible time. Rackspace has the largest global team of certified Microsoft and Linux engineers in the world. We can offer the fastest response and resolution to monitoring and customer alerts by level III certified technical support staff.
2. Secure Data Centres staffed around-the-clock
Rackspace has two data centres in the UK (a further 6 in the US and 1 in Hong Kong) which are all engineered with fully redundant connectivity, power and HVAC to avoid any single point of failure. Multi-level security systems ensure that only data centre Operations Engineers are physically allowed near your routers, switches and servers.
3. 100% Cisco Powered Network
Rackspace has the largest multi-homed self-healing network available in any UK data centre. The network also incorporates a patented Intrusion Detection System (IDS) to protect against external threats.
High Availability
High availability is a feature unique to Rackspace Cloud Monitoring. Because we provide Monitoring as a Service hosted in the cloud, we are able to keep that service up and running without any downtime. When implementing improvements to the system, we can take a region down for an upgrade, even lose another datacenter due to a localised disaster or event, and Rackspace Cloud Monitoring will continue to monitor your resources and send you notifications.
Service Level Agreement
Network
We guarantee that our data center network will be available 100% of the time in any given monthly billing period, excluding scheduled maintenance.
Data Center Infrastructure
We guarantee that data center HVAC and power will be functioning 100% of the time in any given monthly billing period, excluding scheduled maintenance. Infrastructure downtime exists when Cloud Servers™ downtime occurs as a result of power or heat problems.
Cloud Server Hosts
We guarantee the functioning of all cloud server hosts including compute, storage, and hypervisor. If a cloud server host fails, we guarantee that restoration or repair will be complete within one hour of problem identification.
Migration
If a cloud server migration is required because of cloud server host degradation, we will notify you at least 24 hours in advance of beginning the migration, unless we determine, in our reasonable judgment, that we must begin the migration sooner to protect your cloud server data. Either way, we guarantee that the migration will be complete within three hours of the time that we begin the migration.
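When reviewing an incident against these commitments, the time limits can be expressed as a simple check. The sketch below only encodes the one-hour repair and three-hour migration windows quoted above; the constant and function names are hypothetical and the timestamps are made-up examples.

```python
from datetime import datetime, timedelta

HOST_REPAIR_LIMIT = timedelta(hours=1)   # repair after problem identification
MIGRATION_LIMIT = timedelta(hours=3)     # migration after it begins


def within_limit(start: datetime, end: datetime, limit: timedelta) -> bool:
    """Return True when the elapsed time from start to end is within the limit."""
    return end - start <= limit


# Example: a host fault identified at 02:10 and repaired at 03:05 (55 minutes).
repair_ok = within_limit(
    datetime(2021, 12, 22, 2, 10), datetime(2021, 12, 22, 3, 5), HOST_REPAIR_LIMIT
)

# Example: a migration begun at 10:00 and completed at 13:30 (3.5 hours).
migration_ok = within_limit(
    datetime(2021, 12, 22, 10, 0), datetime(2021, 12, 22, 13, 30), MIGRATION_LIMIT
)

print(repair_ok, migration_ok)
```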
RackSpace Maintenance Policy
RackSpace perform maintenance to continue to deliver optimal performance, reliability and security. They strive to perform maintenance without impact to customers; however, the possibility always exists that a maintenance event can have unintended consequences up to and including unplanned or extended service interruption.
If we expect a maintenance event to cause service disruption we will use reasonable efforts to post advance notice of the maintenance. For example, if we have to remove a system element to perform maintenance, we will transfer the system to a redundant element to minimise downtime, but there may be some downtime while the system transfers to the redundant element.
If maintenance is urgently needed, we may not be able to give much, or any, advance notice; for example, we may need to install critical security updates or replace faulty system components.