Data Redundancy and Storage Architecture
Data redundancy is not a feature added to a self-hosted game-server estate after launch. It is a foundational engineering constraint applied before the first server process ever starts. For 57 Studios™, which operates persistent Unturned™ server estates with live player sessions and active commerce on the studio's Tebex storefront, the loss of a storage device without immediate failover translates directly to corrupted save states, dropped sessions, and interrupted transaction records. The documented professional baseline is to provision storage redundancy that prevents any single device failure from reaching the player or the commerce layer.
This article documents the reference storage architecture that 57 Studios deploys on its self-hosted estate. The architecture has four load-bearing components: a RAID 10 primary storage array built from enterprise-grade datacenter SATA SSDs, a synchronous local replica that writes in lockstep with the primary, a synchronous offsite replica located a minimum of 1,500 kilometers from the primary site, and a backup-storage tier built on 45Drives Storinator XL60 hardware for archive and snapshot retention. Together these components deliver a documented Recovery Time Objective (RTO) of 30 seconds or less and a documented Recovery Point Objective (RPO) of 5 seconds or less for any single-device or single-site failure.

Prerequisites
- A self-hosted server estate documented per the Recommended Server Hardware article
- Network connectivity documented per Internet Connectivity Requirements with the documented bandwidth provisioning for synchronous replication
- Power architecture documented per Power and UPS Configuration, including the documented UPS layer that protects write operations during power events
- A documented offsite facility with owned enterprise hardware at the documented minimum geographic separation
- Access to a documented hardware RAID controller or a software RAID stack (mdadm / ZFS) with documented journal placement
What you'll learn
- Why RAID 10 is the documented minimum for production Unturned hosting, and why RAID levels below 10 are unsuitable for this workload
- The reference build storage configuration: four Intel D3-S4510 7.68 TB datacenter SATA SSDs in RAID 10 yielding 15.36 TB usable
- The documented RTO and RPO targets and what they require architecturally
- The difference between synchronous and asynchronous replication, and why asynchronous replication does not meet the documented RPO
- The DRBD vs Ceph vs ZFS send/receive comparison for synchronous replication
- Bandwidth requirements for synchronous offsite replication
- The documented offsite minimum: 1,500 km from the primary site
- The 45Drives Storinator XL60 as the documented backup-storage tier, and the three EPYC-tier configurations
- The documented sequence of a synchronous write as it traverses the full storage stack
- Save-state recovery walkthrough
- Hardware RAID controllers vs software RAID, journal placement, and fsync semantics
Why storage redundancy is the documented professional baseline
Every Unturned game server maintains an on-disk save state that records player inventory, map changes, vehicle positions, and economic data. The game engine flushes this state to disk on a configurable interval. When a storage device fails mid-flush, the save state on that device is corrupted. When the device fails between flushes, the data written since the last flush is unrecoverable. For an active server estate with hundreds of concurrent players, either outcome is a documented operational failure.
The question storage architecture answers is not whether a device will fail but when. Enterprise datacenter SSDs publish an Annualized Failure Rate (AFR) of approximately 0.3 to 0.7 percent per device per year. A four-device array has a compound failure probability approximately four times the single-device rate. Over a three-year service window, a four-device array running at the midpoint AFR of 0.5 percent has a roughly 6 percent probability of losing at least one device (1 − 0.995^12 ≈ 5.8 percent); at the upper bound of 0.7 percent the figure rises to about 8 percent. The documented professional baseline provisions for that loss without any disruption to the player experience.
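The compound-probability arithmetic can be reproduced with a few lines of Python. This is a minimal sketch that assumes independent failures at a constant AFR, which real fleets only approximate (wear-out and correlated failures are ignored); the function name is illustrative, not part of any studio tooling.

```python
def prob_at_least_one_failure(afr: float, devices: int, years: float) -> float:
    """Probability that at least one of `devices` fails within `years`.

    Assumes independent failures at a constant annualized failure rate.
    `afr` is a fraction, e.g. 0.005 for 0.5 percent.
    """
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be a fraction in [0, 1)")
    if devices < 1 or years < 0:
        raise ValueError("devices must be >= 1 and years must be >= 0")
    # Probability that a single device survives the whole window.
    survive_one = (1.0 - afr) ** years
    # All devices must survive for the array to see no failure.
    return 1.0 - survive_one ** devices


if __name__ == "__main__":
    # Lower bound, midpoint, and upper bound of the AFR range quoted above.
    for afr in (0.003, 0.005, 0.007):
        p = prob_at_least_one_failure(afr, devices=4, years=3)
        print(f"AFR {afr:.1%}: P(at least one loss in 3 years) = {p:.2%}")
```

The same function can be run against the whole installed fleet by passing the total device count and a 12-month window.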
Best practice
The documented professional baseline is not designed for the common case. It is designed for the failure case. A storage architecture that functions correctly when nothing fails and fails visibly when something breaks is not a documented professional baseline. The documented professional baseline functions correctly through the failure — the player never knows a device was lost.
Pro tip
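Track AFR exposure across the entire installed device fleet. When the compound probability of at least one failure in the next 12 months exceeds 10 percent, schedule a documented device-refresh cycle. At a constant midpoint AFR, the reference four-device array carries only about a 2 percent 12-month exposure, so the threshold is crossed by larger fleets (roughly 22 or more devices at the midpoint AFR) or when aging devices push the observed failure rate above the published AFR.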
RAID level selection: the documented minimum is RAID 10
The RAID level is the first architectural decision in any storage design. Multiple RAID levels are in active deployment across the game-server community; not all of them are suitable for the Unturned hosting workload. The table below documents each level against the production hosting criteria.
| RAID level | Minimum devices | Usable fraction | Read speed | Write speed | Fault tolerance | Production suitable |
|---|---|---|---|---|---|---|
| RAID 0 | 2 | 100% | High | High | None (zero fault tolerance) | No |
| RAID 1 | 2 | 50% | High | Write-speed of single device | 1 device loss | No (insufficient usable fraction for multi-server estate) |
| RAID 5 | 3 | (N-1)/N | High | Degraded on rebuild | 1 device loss | No (write hole under power loss; rebuild window leaves estate unprotected) |
| RAID 6 | 4 | (N-2)/N | High | Lower than RAID 5 | 2 device losses | Marginal (write hole risk; rebuild window degrades write performance) |
| RAID 10 | 4 | 50% | High | High | 1 device per mirror pair | Yes (documented minimum) |
| RAID 50 | 6 | (N-2)/N | Very high | High | 1 device per RAID 5 stripe set | Yes (with documented journal protection) |
| RAID 60 | 8 | (N-4)/N | Very high | Moderate | 2 devices per RAID 6 stripe set | Yes (with documented journal protection) |
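The usable-fraction column can be turned into a small capacity calculator. The sketch below is illustrative only: it assumes two-way mirroring for RAID 1 and RAID 10, and two striped sub-arrays for RAID 50 and RAID 60 (matching the minimum device counts in the table). The function and parameter names are hypothetical.

```python
MIN_DEVICES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4, "50": 6, "60": 8}
PARITY_PER_GROUP = {"5": 1, "6": 2, "50": 1, "60": 2}


def usable_capacity_tb(level: str, devices: int, device_tb: float, groups: int = 2) -> float:
    """Usable capacity in TB for the RAID levels listed in the table.

    `groups` is the number of RAID 5/6 sub-arrays striped together in
    RAID 50/60 (assumed to be 2 unless stated otherwise).
    """
    if level not in MIN_DEVICES:
        raise ValueError(f"unsupported RAID level: {level}")
    if devices < MIN_DEVICES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DEVICES[level]} devices")
    if level == "0":
        data_devices = devices                      # pure striping, no redundancy
    elif level in ("1", "10"):
        if devices % 2:
            raise ValueError("two-way mirroring needs an even device count")
        data_devices = devices // 2                 # half the devices hold mirror copies
    elif level in ("5", "6"):
        data_devices = devices - PARITY_PER_GROUP[level]
    else:                                           # RAID 50 / RAID 60
        data_devices = devices - groups * PARITY_PER_GROUP[level]
    return data_devices * device_tb


if __name__ == "__main__":
    # Reference build: four 7.68 TB devices in RAID 10.
    print(f"{usable_capacity_tb('10', 4, 7.68):.2f} TB usable")
```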
Why RAID 5 and RAID 6 are not production-suitable for this workload
RAID 5 and RAID 6 both suffer from the documented RAID write hole: a partial-stripe write that is interrupted by a power event can leave the parity stripe inconsistent with the data stripes. When the array comes back online, parity rebuild based on the inconsistent parity stripe can silently propagate corrupt data. A UPS layer (documented in Power and UPS Configuration) reduces the probability of a mid-write power event but does not eliminate it across the full battery-discharge window.
Beyond the write hole, the RAID 5 and RAID 6 rebuild window is a documented operational risk. A single-device rebuild on a 7.68 TB enterprise SSD completes in approximately 4 to 8 hours depending on the rebuild rate and the concurrent I/O load. During that window, the array is in a degraded state. For RAID 5, a second device failure during the rebuild window is unrecoverable. The documented professional baseline does not accept an unrecoverable failure mode during any maintenance or rebuild window.
Common mistake
Provisioning RAID 5 and justifying it with "I have a UPS, so the write hole won't happen." A UPS protects against utility power loss. It does not protect against a kernel crash, a firmware bug that resets the RAID controller mid-write, or a battery-discharge event that outlasts the UPS runtime. The write hole is a documented risk that the UPS layer mitigates only partially and does not eliminate.
Why RAID 10 is the documented minimum
RAID 10 is a stripe of mirrors. Every block is written simultaneously to both physical devices in a mirror pair. A single device failure within a mirror pair leaves the other device in that pair fully intact; the array continues operating in degraded mode with no data loss and little impact on reads (some write performance degradation is expected on the affected mirror pair during rebuild). The rebuild is also less demanding than for parity-based RAID levels because the rebuild source is a single mirror device, not a parity computation across the entire stripe.
For the reference 57 Studios four-device RAID 10 array, a single device failure is recovered by copying the surviving mirror device to the replacement device. The copy is bounded by the sequential write speed of a SATA III device: at roughly 500 MB/s, a 7.68 TB rebuild takes a little over 4 hours at best, and longer under concurrent production I/O. The estate remains online throughout.
RAID 10 has one documented cost: the usable fraction is 50 percent of the raw capacity. Four 7.68 TB devices yield 30.72 TB raw and 15.36 TB usable. The studio accepts this cost as the documented price of the production fault tolerance and write performance required by the Unturned hosting workload.
Reference build storage configuration
The reference 57 Studios primary storage array is provisioned as follows.
| Parameter | Value |
|---|---|
| Device model | Intel D3-S4510 Series 7.68 TB |
| Device class | Datacenter SATA SSD |
| Device count | 4 |
| RAID level | RAID 10 (stripe of two mirrors) |
| Raw capacity | 30.72 TB |
| Usable capacity | 15.36 TB |
| Sequential read (array) | Up to 2,100 MB/s |
| Sequential write (array) | Up to 1,000 MB/s |
| Random 4K write (array) | Up to 200,000 IOPS |
| Endurance per device | 14,000 TBW |
| AFR per device | ≤ 0.35% (documented by Intel) |
| Interface | SATA III (6 Gb/s) |
| Controller | Hardware RAID controller (LSI MegaRAID or equivalent) |
| Write cache | Battery-backed write cache on controller |
| Journal placement | Dedicated journal partition on each device pair |
The Intel D3-S4510 is selected for its documented datacenter write endurance rating (14,000 TBW per device), its documented 24x7 duty cycle rating, and its documented power-loss data protection (capacitor-backed write cache at the device level that flushes in-flight writes on power loss). The combination of device-level power-loss protection and a battery-backed write cache on the hardware RAID controller provides two independent layers of protection against write-hole conditions.
Did you know?
The Intel D3-S4510 series carries a separate AFR specification for mixed-use workloads (read/write mix typical of a game-server estate) vs purely read-heavy workloads. The documented AFR of 0.35% applies to the mixed-use workload profile. Always verify the AFR specification against the workload class, not the headline figure.

RTO and RPO: the documented targets and what they require
Two metrics define the performance of a storage redundancy architecture against the failure scenarios it is designed to survive.
Recovery Time Objective (RTO) is the maximum documented time between a failure event and the return of fully operational service. The reference 57 Studios RTO is 30 seconds or less. This means that from the moment a device fails (or a site goes offline), the player experience must be fully restored within 30 seconds. A 30-second RTO rules out any recovery procedure that requires manual operator intervention before service is restored; the automated failover must complete within the documented window.
Recovery Point Objective (RPO) is the maximum documented data age at the point of recovery. The reference 57 Studios RPO is 5 seconds or less. This means that in the worst case, recovery returns the estate to the state it was in 5 seconds before the failure. For the Unturned save-state workload, 5 seconds of data loss is the documented maximum acceptable loss per the studio's published operational standards.
Best practice
RTO and RPO are not design targets that can be met through backup alone. A backup that runs every 15 minutes documents a 15-minute RPO. Snapshots that run every 5 seconds document a 5-second RPO, which just meets the documented target, but only for clean snapshot boundaries. Writes in flight at the moment the snapshot runs are not captured. Only synchronous replication meets the documented 5-second RPO with confidence across all write patterns.
What a 30-second RTO requires architecturally
A 30-second RTO for a storage failure means the failover target (the synchronous local or offsite replica) must be ready to serve I/O before the local RAID array finishes its degraded-mode detection and remount sequence. This requires:
- The replica is in a documented consistent state at all times (synchronous writes — not periodic snapshots).
- The replica is mounted or mountable without a filesystem check (clean journal state).
- The failover logic is automated (no manual steps in the critical path).
- The network path to the replica is documented and low-latency enough that the automated failover can complete before the 30-second window closes.
For the synchronous local replica, all four requirements are straightforward. For the synchronous offsite replica, the fourth requirement (the network path) imposes a documented network latency constraint: the replication link must complete round-trip acknowledgment within the documented synchronous write latency budget.
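One way to reason about the 30-second window is to sum the automated failover steps and compare the total against the RTO. The sketch below does only that. The step names and durations are hypothetical placeholders, not measured 57 Studios values; real figures should come from a failover drill.

```python
RTO_SECONDS = 30.0  # documented RTO from this article


def failover_budget(step_seconds, rto=RTO_SECONDS):
    """Return (fits, margin_seconds) for a set of automated failover steps.

    `step_seconds` maps a step name to its duration in seconds. Steps are
    assumed to run one after another, which is the conservative case.
    """
    total = sum(step_seconds.values())
    return total <= rto, rto - total


# Hypothetical step timings in seconds; replace with measured values.
example_steps = {
    "failure_detection": 5.0,
    "replica_promotion": 3.0,
    "filesystem_mount": 2.0,
    "game_server_restart": 12.0,
}

if __name__ == "__main__":
    fits, margin = failover_budget(example_steps)
    print(f"fits RTO: {fits}, margin: {margin:.1f} s")
```

Any manual step inserted into the dictionary (for example an operator acknowledgment measured in minutes) makes the budget fail immediately, which is the point of the automation requirement.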
What a 5-second RPO requires architecturally
A 5-second RPO means the replica cannot be more than 5 seconds behind the primary at any point. This rules out:
- Periodic snapshot replication (15-minute or even 1-minute intervals exceed the target)
- Asynchronous replication with a documented lag greater than 5 seconds
- Any replication strategy that defers writes to the replica until a buffer is flushed
Only synchronous replication — where the primary write is not acknowledged to the game server until the replica confirms the write — meets the documented 5-second RPO with confidence.
Common mistake
Treating snapshot backups as equivalent to synchronous replication for RPO purposes. Snapshot backups are a documented recovery floor: they define the worst-case data age after a catastrophic failure that destroys both the primary and the replica. Snapshots do not meet the documented RPO for a single-device or single-site failure. See the FAQ entry on snapshot backups for the full documentation.
Synchronous vs asynchronous replication
The distinction between synchronous and asynchronous replication is the most important architectural decision in the storage redundancy design.
In synchronous replication, the primary write operation is not acknowledged to the application (the game server) until the write has been committed to both the primary device and the replica device. The application's write is blocked during the replication round trip. The result is that the replica is always current to the last acknowledged write; at any point in time, the replica contains the exact same committed data as the primary.
In asynchronous replication, the primary write is acknowledged to the application as soon as it is committed to the primary device. The replica write happens in the background, without blocking the application. The replica lags behind the primary by a documented interval — typically between milliseconds and seconds depending on the replication technology and the network conditions. The lag is the documented RPO floor for asynchronous replication.
For the documented 5-second RPO, asynchronous replication with a documented lag consistently below 5 seconds might appear sufficient. The documented professional baseline rejects asynchronous replication for two reasons:
- The lag is not constant. Under high write load or transient network congestion, asynchronous replication lag can spike above the documented threshold without triggering an alarm. The RPO degradation is silent.
- The documented RPO is the maximum acceptable data age, not the average. A replication technology that holds a 5-second average lag under normal conditions does not guarantee a 5-second maximum lag under all conditions.
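The gap between average and maximum lag is easy to demonstrate on a series of lag measurements. The following sketch uses invented sample values purely for illustration; the function name and the report fields are assumptions, not part of any replication tool.

```python
from statistics import mean

RPO_SECONDS = 5.0  # documented RPO from this article


def rpo_report(lag_samples_s, rpo=RPO_SECONDS):
    """Summarize replication-lag samples against an RPO expressed in seconds.

    The RPO is judged against the worst sample, never the mean.
    """
    if not lag_samples_s:
        raise ValueError("at least one lag sample is required")
    worst = max(lag_samples_s)
    return {
        "mean_lag_s": mean(lag_samples_s),
        "max_lag_s": worst,
        "violations": sum(1 for lag in lag_samples_s if lag > rpo),
        "meets_rpo": worst <= rpo,
    }


if __name__ == "__main__":
    # Invented samples: steady sub-second lag with one congestion spike.
    samples = [0.4, 0.6, 0.5, 7.2, 0.5]
    print(rpo_report(samples))
```

With these samples the mean lag sits well under 5 seconds while the single spike still breaks the RPO, which is the silent degradation described above.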
The documented professional baseline is synchronous replication for both the local replica and the offsite replica. The performance cost of synchronous replication — the write latency added by the replication round trip — is absorbed by the enterprise SSD write performance and the documented network infrastructure.
Replication technology comparison: DRBD vs Ceph vs ZFS send/receive
Three replication technologies are in active use across the self-hosted Unturned hosting community for synchronous block-level or filesystem-level replication.
| Technology | Layer | Synchronous support | Typical use case | Documented pro | Documented con |
|---|---|---|---|---|---|
| DRBD (Distributed Replicated Block Device) | Block | Native (Protocol C) | Two-node primary/secondary HA | Mature, kernel-integrated, low overhead, widely documented | Two-node in DRBD 8.x; more than two nodes requires DRBD 9 multi-peer resources (optionally managed by LINSTOR) or stacked resources |
| Ceph (RADOS Block Device) | Object/Block | Native | Multi-node distributed storage | Scales to large node counts; built-in replication factor | Operational complexity; requires ≥3 OSD nodes for documented durability |
| ZFS send/receive | Filesystem | Near-synchronous (snapshot-based) | Dataset-level replication | Integrated with ZFS; efficient incremental sends | Snapshot interval defines RPO floor; not true synchronous at sub-second |
DRBD Protocol C (synchronous)
DRBD Protocol C is the documented synchronous replication mode for DRBD. In Protocol C, the primary write is not acknowledged to the application until the write has been committed to the local disk and the remote disk and acknowledged back to the primary DRBD node. The round-trip acknowledgment is what makes Protocol C synchronous.
The write latency addition from Protocol C replication is dominated by the network round-trip time (RTT) between the primary and the replica. For a local replica on the same LAN, the RTT is typically below 0.5 milliseconds; the latency addition is negligible. For an offsite replica at a 1,500 km geographic separation, light in fiber covers roughly 200 km per millisecond, so the 3,000 km round trip cannot complete in less than about 15 milliseconds; on a dedicated dark-fiber or low-latency leased-line connection the RTT is typically 15 to 25 milliseconds once routing and equipment delay are included. The latency addition is documented but manageable for the Unturned write workload.
DRBD is the reference 57 Studios replication technology for the primary-to-local-replica synchronous pair. The local replica is on a separate physical node in the same rack. Protocol C is selected in the DRBD configuration with the statement "protocol C;" in the resource definition.
Pro tip
Monitor DRBD replication lag continuously using the drbdsetup status or drbdadm status command output. The out-of-sync counter is the leading indicator of replication lag before it surfaces as an RPO violation. Add the DRBD status to the monitoring stack documented in Cellular Failover for ISP Redundancy.
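A monitoring check for the out-of-sync counter might look like the sketch below. It only parses text; the sample status text is an assumed layout modeled on DRBD 9 statistics output, and the resource and peer names are invented. Verify the exact field names and units against the output of the installed DRBD version before relying on it.

```python
import re

# Matches fields of the form "out-of-sync:<number>" in status text.
OUT_OF_SYNC = re.compile(r"out-of-sync:(\d+)")


def max_out_of_sync(status_text: str) -> int:
    """Return the largest out-of-sync value found in the status text.

    Returns 0 when no out-of-sync field is present.
    """
    values = [int(v) for v in OUT_OF_SYNC.findall(status_text)]
    return max(values, default=0)


# Assumed sample layout; resource and peer names are hypothetical.
SAMPLE_STATUS = """\
r0 role:Primary
  disk:UpToDate
  replica-local role:Secondary
    replication:Established peer-disk:UpToDate
    received:0 sent:1024 out-of-sync:0 pending:0 unacked:0
  replica-offsite role:Secondary
    replication:Established peer-disk:UpToDate
    received:0 sent:1024 out-of-sync:2048 pending:1 unacked:0
"""

if __name__ == "__main__":
    backlog = max_out_of_sync(SAMPLE_STATUS)
    print("ALERT" if backlog > 0 else "OK", backlog)
```

In a live check, the text would come from running the status command with statistics enabled and capturing its standard output; the alert threshold of zero is a placeholder.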
Ceph RADOS Block Device (RBD)
Ceph provides synchronous replication through its RADOS object layer. When a Ceph pool is configured with a replication factor of 3 (the documented minimum for production durability), every write is committed to three OSD nodes before being acknowledged to the client. The acknowledgment is synchronous: the client write call does not return until all three replicas confirm.
Ceph is operationally more complex than DRBD. It requires a minimum of three OSD nodes for documented fault tolerance, a separate Monitor cluster (minimum three Monitor nodes), and a documented network topology that separates the public client network from the cluster replication network. For a self-hosted estate that is not already running a Ceph cluster, the operational overhead of standing up a Ceph cluster for storage replication alone is significant.
The documented 57 Studios reference is to use Ceph for estates that exceed the two-node DRBD architecture — typically estates with more than four game-server nodes where the per-server local replica becomes unwieldy.
ZFS send/receive
ZFS send/receive replicates ZFS dataset snapshots from a source pool to a destination pool. The replication is incremental: only the blocks that changed since the last snapshot are transmitted. The RPO for ZFS send/receive is the snapshot interval: if snapshots are taken every 5 seconds, the documented RPO floor is 5 seconds.
ZFS send/receive is documented as near-synchronous rather than strictly synchronous because the replication model is snapshot-based, not write-based. A write committed to the primary ZFS pool between two snapshots does not reach the replica until the next snapshot send completes. For the documented 5-second RPO, a 5-second snapshot interval meets the target at steady state but does not guarantee sub-5-second RPO under all write patterns.
ZFS send/receive is the documented reference technology for the offsite backup-tier replication (the Storinator XL60 tier) where the RPO target is relaxed from 5 seconds (the synchronous replication RPO) to the snapshot interval (documented at 15 minutes for the backup tier, which serves as the recovery floor, not the production RPO).
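The incremental replication cycle described above reduces to three commands: take a new snapshot, send the delta since the previous snapshot, and receive it on the backup host. The sketch below only builds the shell pipeline as a string and does not execute it. The dataset, snapshot, and host names are hypothetical, and transport over ssh is an assumption.

```python
import shlex


def incremental_send_pipeline(dataset: str, prev_snap: str, new_snap: str,
                              remote_host: str, dest_dataset: str) -> str:
    """Build the shell pipeline for one incremental ZFS replication cycle."""
    # 1. Create the new point-in-time snapshot on the source pool.
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # 2. Stream only the blocks that changed since the previous snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # 3. Apply the stream to the destination dataset on the backup host.
    receive = ["ssh", remote_host, "zfs", "receive", dest_dataset]
    return f"{shlex.join(snapshot)} && {shlex.join(send)} | {shlex.join(receive)}"


if __name__ == "__main__":
    print(incremental_send_pipeline(
        dataset="tank/unturned",
        prev_snap="auto-0001",
        new_snap="auto-0002",
        remote_host="backup-xl60",
        dest_dataset="archive/unturned",
    ))
```

Scheduling this cycle at the documented 15-minute backup-tier interval gives the recovery floor described above; it does not replace the synchronous replicas.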
Bandwidth requirements for synchronous offsite replication
Synchronous replication over the WAN link between the primary site and the offsite replica consumes documented bandwidth proportional to the write throughput of the primary storage array. Sizing the bandwidth correctly is a documented prerequisite for meeting the synchronous RPO across the full range of write loads.
The documented bandwidth requirement is calculated from the reference write workload profile. The Unturned game server estate generates write traffic in three categories:
- Save-state flushes: periodic full save-state writes. At the documented flush interval, each active server node writes between 50 MB and 500 MB depending on the server population and the map complexity.
- Transaction log writes: continuous low-volume writes from the commerce layer and the audit log. Typically 5 to 20 MB/s sustained.
- Burst write events: player-triggered events (large inventory changes, vehicle spawns, base construction) that generate above-average write traffic. Documented peak burst: 200 MB/s for up to 30 seconds.
| Write category | Sustained MB/s | Peak MB/s | Documented share of write bandwidth |
|---|---|---|---|
| Save-state flushes | 20–80 | 500 | 45% |
| Transaction log writes | 5–20 | 30 | 15% |
| Burst write events | 10–50 | 200 | 40% |
| Aggregate (reference estate) | 35–150 | 500 | — |
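For synchronous replication, the offsite WAN link must sustain the documented aggregate write throughput with headroom for the replication protocol overhead (approximately 10 to 15 percent for DRBD Protocol C). The documented reference WAN provisioning for the offsite synchronous replica is a dedicated 2 Gbps symmetric link (250 MB/s). That provides headroom over the sustained aggregate write throughput plus protocol overhead (150 MB/s × 1.15 ≈ 173 MB/s, or about 1.4 Gbps). It does not cover the 500 MB/s peak, which corresponds to 4 Gbps before overhead; because each write waits for the remote acknowledgment, bursts above the link rate are paced at link speed rather than lost.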
Did you know?
Synchronous replication does not transmit the full data volume to the offsite replica. DRBD Protocol C transmits only the written blocks, not the entire device image. The bandwidth requirement is proportional to the write rate, not the total stored volume. A 15.36 TB primary array that writes 150 MB/s sustained requires approximately 170 MB/s of replication bandwidth (with overhead), regardless of how much of the 15.36 TB is in use.
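The sizing rule (write rate plus protocol overhead, converted to link speed) can be expressed as a short calculation. This sketch assumes decimal units (1 MB/s = 8 Mbps, 1 Gbps = 1,000 Mbps) and uses the upper end of the 10 to 15 percent overhead range as its default; the function names are illustrative.

```python
def required_link_gbps(write_mb_s: float, overhead: float = 0.15) -> float:
    """Link speed in Gbps needed to replicate `write_mb_s` of writes.

    `overhead` is the replication protocol overhead as a fraction.
    Decimal units are assumed: 1 MB/s = 8 Mbps, 1 Gbps = 1000 Mbps.
    """
    if write_mb_s < 0 or overhead < 0:
        raise ValueError("write rate and overhead must be non-negative")
    return write_mb_s * (1.0 + overhead) * 8.0 / 1000.0


def link_utilization(write_mb_s: float, link_gbps: float, overhead: float = 0.15) -> float:
    """Fraction of the link consumed by replication at the given write rate."""
    return required_link_gbps(write_mb_s, overhead) / link_gbps


if __name__ == "__main__":
    sustained = 150.0  # upper end of the sustained aggregate in the table above
    need = required_link_gbps(sustained)
    print(f"{sustained:.0f} MB/s sustained needs {need:.2f} Gbps "
          f"({link_utilization(sustained, 2.0):.0%} of a 2 Gbps link)")
```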
Offsite geographic separation: the 1,500 km minimum
The offsite replica is the documented protection against site-level failures: the primary site's building is damaged, the primary site's power is out beyond the generator's documented autonomous runtime (documented in Backup Generator Configuration), or the primary site's network connectivity is severed beyond the documented cellular failover coverage (documented in Cellular Failover for ISP Redundancy).
The documented minimum geographic separation between the primary site and the offsite replica is 1,500 km. The figure is selected against the documented geographic footprint of regional failure scenarios: major weather events (hurricanes, ice storms, tornado outbreaks), major seismic events, and regional utility disruptions. The documented historical record for the reference 57 Studios primary geography (Austin, Texas) shows that regional weather events have documented geographic footprints of up to 700 km. The 1,500 km minimum provides documented clearance above the historical regional footprint.
The offsite replica must be on owned enterprise hardware. The documented professional baseline does not provision the offsite replica on rented cloud storage because the documented RPO requires synchronous replication, and synchronous replication to a cloud storage target introduces documented variable-latency overhead that is inconsistent with the synchronous write acknowledgment model.
Common mistake
Provisioning the offsite replica at a data center in the same metropolitan area as the primary site. A co-location facility 20 km from the primary site does not meet the documented 1,500 km minimum. A regional event that affects the primary site — a major weather event, a regional power grid disruption — is likely to affect a co-location facility in the same metropolitan area.
45Drives Storinator XL60: the documented backup-storage tier
The backup-storage tier is the third redundancy layer in the documented architecture. The backup tier holds snapshot archives — point-in-time copies of the primary array at documented intervals — and serves as the recovery floor for catastrophic failures that destroy both the primary and the offsite synchronous replica. The backup tier is not the production RPO target; it is the worst-case recovery floor.
The documented reference backup-storage hardware is the 45Drives Storinator XL60. The Storinator XL60 is a 4U, 60-bay large-format-drive chassis designed for high-density storage. It is available in three documented configurations that differ in the EPYC processor tier.
Full configuration details and ordering information are available at: https://www.45drives.com/products/storinator-xl60-configurations.php
Base configuration (EPYC 8124P)
The Base configuration uses the AMD EPYC 8124P processor (16 cores, 32 threads, 2.45 GHz base / 3.0 GHz boost, 64 MB L3 cache). The EPYC 8124P is well-matched to the backup-tier workload: high sequential read and write throughput for snapshot ingestion, sufficient core count for parallel snapshot processing, and documented single-thread performance adequate for the ZFS send/receive pipeline.
The Base configuration is the documented reference for self-hosted estates where the backup-tier workload is dominated by sequential snapshot writes and does not require the higher-tier parallel processing capability of the Enhanced or Turbo configurations.
Enhanced configuration (EPYC 8224P)
The Enhanced configuration uses the AMD EPYC 8224P processor (24 cores, 48 threads, 2.55 GHz base / 3.0 GHz boost, 64 MB L3 cache). The increased core count provides documented additional headroom for parallel snapshot ingestion from multiple replication sources simultaneously, which is appropriate for estates with more than four replication sources writing to the backup tier concurrently.
Turbo configuration (EPYC 8534P)
The Turbo configuration uses the AMD EPYC 8534P processor (64 cores, 128 threads, 2.3 GHz base / 3.1 GHz boost, 128 MB L3 cache). The Turbo configuration is documented for large-scale backup-tier deployments where the snapshot ingestion workload is high enough to saturate the Base and Enhanced configurations, or where the backup-tier node also serves read workloads (snapshot verification, restore testing) concurrently with snapshot ingestion.
| Parameter | Base | Enhanced | Turbo |
|---|---|---|---|
| Processor | AMD EPYC 8124P | AMD EPYC 8224P | AMD EPYC 8534P |
| Cores / threads | 16 / 32 | 24 / 48 | 64 / 128 |
| Base clock | 2.45 GHz | 2.55 GHz | 2.3 GHz |
| Boost clock | 3.0 GHz | 3.0 GHz | 3.1 GHz |
| L3 cache | 64 MB | 64 MB | 128 MB |
| Chassis | 4U, 60-bay | 4U, 60-bay | 4U, 60-bay |
| Drive bays | 60 large-format | 60 large-format | 60 large-format |
| Reference use | Single-source backup tier | Multi-source backup tier (≤8 sources) | Large-scale multi-source backup tier |
| Documented configuration link | 45drives.com | 45drives.com | 45drives.com |
The Storinator XL60's 60-bay capacity is populated with high-density enterprise HDDs for the backup tier. The documented reference population for the backup tier is 20 to 40 drives depending on the retention window required, with drives installed in a RAID 6 or RAID 60 configuration (acceptable for the backup tier, where write performance is less critical than density and dual-device fault tolerance).
Did you know?
The backup tier uses RAID 6 or RAID 60 rather than RAID 10 because the write performance requirement is different. The backup tier writes snapshot data in large sequential chunks during defined snapshot windows — the workload does not have the random write pressure of the production RAID 10 array. RAID 6 and RAID 60 are documented as suitable for this sequential-dominant, density-prioritized workload profile.
Synchronous write path: sequence diagram
The following sequence diagram documents the path of a single write from the game server application through the full storage stack for a successful synchronous write.
The documented synchronous write path means the game server process does not receive a successful write acknowledgment until the write is committed to all three locations: the primary RAID 10 array, the local DRBD secondary, and the offsite DRBD secondary. The write latency is the maximum of the three write paths plus the DRBD Protocol C round-trip overhead.
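The acknowledgment rule can be modeled in a few lines: the three commits proceed in parallel, and the application is released only when the slowest one has confirmed. The sketch below is a simplified model, not a DRBD implementation. The commit and round-trip figures are hypothetical, and each path's cost is taken as its device commit time plus its network round trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePath:
    name: str
    commit_ms: float      # time for the target storage to commit the block
    rtt_ms: float = 0.0   # network round trip to reach the target and return the ack

    @property
    def ack_ms(self) -> float:
        """Time until this path's acknowledgment is back at the primary."""
        return self.commit_ms + self.rtt_ms


def synchronous_ack_ms(paths) -> float:
    """Latency seen by the application: the slowest of the parallel paths."""
    return max(path.ack_ms for path in paths)


# Hypothetical figures for illustration only.
PATHS = [
    WritePath("primary RAID 10 array", commit_ms=0.2),
    WritePath("local DRBD secondary", commit_ms=0.2, rtt_ms=0.5),
    WritePath("offsite DRBD secondary", commit_ms=0.2, rtt_ms=15.0),
]

if __name__ == "__main__":
    for path in PATHS:
        print(f"{path.name}: {path.ack_ms:.1f} ms")
    print(f"application sees: {synchronous_ack_ms(PATHS):.1f} ms")
```

In this model the offsite path always dominates, which is why the WAN round trip, not SSD speed, sets the write latency budget.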
Storage redundancy strategy distribution among professional operators
The following chart documents the distribution of primary storage redundancy strategies observed among professional self-hosted Unturned server operators in the 57 Studios community survey.
The 8 percent of operators using RAID 5 configurations or no RAID are documented as not meeting the 57 Studios documented professional baseline. The 14 percent using RAID 10 with local backup only are documented as meeting the primary array redundancy baseline but not the offsite replication baseline. The 22 percent using RAID 10 with asynchronous offsite replication are documented as approaching the baseline with the noted RPO caveat on asynchronous lag consistency.
Hardware RAID controllers vs software RAID
The choice between a hardware RAID controller and a software RAID stack (mdadm or ZFS) is a documented architectural decision with meaningful implications for the write-hole protection model, the battery-backed write cache, and the management interface.
Hardware RAID controller
A hardware RAID controller is a dedicated PCIe card that manages the RAID logic independently of the host CPU. The key documented advantages for production hosting are:
- Battery-backed write cache: the hardware controller maintains an on-board write cache backed by a battery (or supercapacitor). Writes are committed to the cache and acknowledged to the host; the cache is flushed to the physical devices on the next flush cycle. In the event of a power loss during a flush, the battery holds the cache contents for a documented period (typically 72 hours) until power is restored. The cache is then flushed to the devices cleanly, preventing a write-hole condition.
- Dedicated processing: RAID parity calculations and stripe management are offloaded to the controller's dedicated processor, reducing host CPU load.
- Documented management interface: hardware controllers provide a documented management interface (typically a web GUI or a CLI utility like storcli or arcconf) with documented SMART monitoring integration, predictive failure alerts, and documented rebuild progress reporting.
The documented reference 57 Studios hardware controller is the LSI MegaRAID 9560 series or equivalent, with a documented battery-backed cache module.
Software RAID (mdadm)
Linux mdadm provides software RAID managed by the host kernel. mdadm RAID 10 is a documented production-suitable configuration for the Linux host with one documented caveat: mdadm does not provide a battery-backed write cache. The write-hole risk for mdadm RAID 5 and RAID 6 is higher than for a hardware controller with battery-backed cache; for RAID 10 the write-hole risk is lower (because RAID 10 does not have a parity write), but the absence of a battery-backed cache means that writes in flight at the time of a power event may not reach both devices of a mirror pair.
The documented mitigation for mdadm RAID 10 without battery-backed cache is the UPS layer (documented in Power and UPS Configuration) and the device-level power-loss protection of the Intel D3-S4510 SSDs. The UPS layer reduces the likelihood that a power event interrupts writes in flight; the device-level capacitor-backed cache ensures that writes acknowledged to the OS are committed to the device's NAND before the device loses power.
ZFS
ZFS is a combined filesystem and volume manager that implements its own RAID equivalents (RAID-Z1, RAID-Z2, RAID-Z3, and mirror) and provides documented copy-on-write semantics that eliminate the write-hole. The copy-on-write model means that ZFS never overwrites existing data in place; a new version of a block is written to a new location before the old location is freed. A power event that interrupts a write leaves the old version of the block intact; ZFS rolls back to the last consistent transaction group on the next mount.
ZFS is the documented reference for the backup-storage tier (Storinator XL60) and for operators who prefer software-only RAID stacks. A ZFS pool built from striped mirror vdevs (the ZFS equivalent of RAID 10; a single mirror vdev is the equivalent of RAID 1) provides RAID 10 semantics together with the copy-on-write write-hole protection.
Journal placement
For mdadm RAID 5 and RAID 6 (documented as not production-suitable for this workload but included for completeness), a dedicated journal device can be added to the array to provide write-hole protection. The journal device receives all writes before they are committed to the array; in the event of a power event, the journal is replayed on remount. Journal placement on a separate physical device (not one of the RAID member devices) provides documented protection equivalent to the hardware RAID write cache for write-hole scenarios.
For ZFS, the ZFS Intent Log (ZIL) plays a comparable role for synchronous writes: it records them so that they can be replayed after a crash. The ZIL is best placed on a dedicated enterprise SSD (the SLOG device) separate from the pool's data vdevs.
Fsync semantics and the game server write path
The game server process must issue writes with documented fsync semantics to ensure the documented RPO is enforced at the application layer. Without fsync, the OS kernel may cache write data in memory and flush it to the storage subsystem asynchronously; the write is not durable until the kernel flush completes, and the documented RPO begins at the flush, not at the application write call.
The documented Unturned game server write path should either open the save-state files with O_DSYNC or follow each save-state write with an explicit fsync() call. Either configuration ensures that each save-state write traverses the full synchronous write path documented in the sequence diagram above before the write is considered complete.
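The game server's own I/O calls are not shown in this article, so the following is only a minimal sketch of the same durability semantics from a shell helper. It assumes GNU coreutils (`dd` with `oflag=dsync`, which opens the output with O_DSYNC); the function name `durable_write` and the temporary-file naming are hypothetical.

```shell
#!/bin/sh
# Sketch: write a save-state copy so the data is on stable storage before
# the final file name becomes visible. Helper name is hypothetical.
durable_write() {
    src="$1"
    dest="$2"
    tmp="${dest}.tmp.$$"

    # oflag=dsync opens the output with O_DSYNC: each write returns only
    # after the data has been handed to stable storage (GNU dd).
    dd if="$src" of="$tmp" bs=1M oflag=dsync status=none || return 1

    # rename is atomic within one filesystem: readers see the old file or
    # the new file, never a partial one.
    mv -f "$tmp" "$dest" || return 1

    # Flush the containing directory so the rename itself is durable.
    sync "$(dirname "$dest")"
}
```

A call such as `durable_write /tmp/save.dat /srv/game-data/save.dat` would return only after the data and the rename have been flushed; the paths in that call are illustrative.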
Pro tip
Verify the fsync semantics of the game server's save-state write path by monitoring the writeback and dirty page counts in /proc/meminfo during a save-state flush cycle. If the dirty count does not drop to near-zero within the documented flush window, the game server is buffering writes in the kernel page cache rather than flushing synchronously. Review the game server's file I/O configuration documentation.
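As a rough sketch of that check, the helper below sums the `Dirty` and `Writeback` fields from a meminfo-formatted file. The function name `dirty_kb` is hypothetical, and the acceptable "near-zero" value is left to the operator because the article does not state one.

```shell
#!/bin/sh
# Sketch: report Dirty + Writeback (in kB) from /proc/meminfo or from a
# file in the same format passed as the first argument.
dirty_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
```

Sampling `dirty_kb` once per second across a save-state flush cycle shows whether the total falls back toward zero inside the flush window.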
Save-state recovery walkthrough
The following walkthrough documents the procedure for recovering from a primary array device failure using the local DRBD replica. The procedure is documented at the operations console and signed off at each quarterly maintenance cycle.
Step 1: Confirm the failure and the replica state
drbdadm status all
The output documents the DRBD resource state, the connection state, and the disk state for each node. A healthy local replica in Protocol C synchronization reports UpToDate for the peer's disk state. Confirm that the local replica is UpToDate before proceeding.
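The confirmation in Step 1 could be scripted roughly as follows. This is a sketch that assumes the DRBD 9 status layout, in which the local backing device appears as a `disk:<state>` token and peers appear as `peer-disk:<state>`, and that the command is run on the node about to be promoted. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if every local "disk:" token on stdin is UpToDate.
# Assumes DRBD 9 style status output; peer-disk tokens are ignored.
local_disk_uptodate() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^disk:/) {
                    seen = 1
                    if ($i != "disk:UpToDate") bad = 1
                }
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

Used as `drbdadm status game-data | local_disk_uptodate && echo "safe to promote"`, it refuses promotion when the state is missing or anything other than UpToDate.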
Step 2: Promote the local replica to primary
drbdadm primary --force <resource-name>
This command promotes the local DRBD secondary to primary role. The promotion is immediate; the local replica becomes the active block device.
Step 3: Verify the filesystem
fsck -n /dev/drbd<N>
Run a read-only filesystem check to confirm the filesystem on the promoted replica is clean. A clean filesystem (no errors, no journal replay required) confirms that the documented synchronous write path maintained consistency.
Step 4: Mount the promoted replica and resume service
mount /dev/drbd<N> /srv/game-data
systemctl start unturned-server@<instance>
Mount the promoted replica at the documented game-data mount point and restart the game server instances. The game server reads the save state from the promoted replica and resumes operation. Elapsed time from Step 1 to server-up: documented at 20 to 25 seconds in the reference configuration, within the documented 30-second RTO.
Step 5: Replace the failed device and rebuild
Order the replacement Intel D3-S4510. When it arrives, hot-swap the failed device (the hardware RAID controller supports hot-swap without a service interruption), allow the RAID rebuild to complete, and re-establish the DRBD synchronization:
drbdadm connect <resource-name>
drbdadm verify <resource-name>
The DRBD synchronization from the promoted primary to the reconnected secondary completes at the documented rebuild rate. Monitor the progress with drbdsetup status --statistics.
Frequently asked questions
Are snapshot backups sufficient for the documented RPO?
Snapshot backups serve as a recovery floor; they do not meet the documented RPO. Only synchronous replication meets the documented standards for production Unturned hosting. A snapshot backup schedule, even one running every 5 minutes, captures point-in-time state at the snapshot boundary. Writes committed after the most recent snapshot are not captured by it, so a failure that destroys the primary loses everything written since that snapshot (up to one full snapshot interval). The documented RPO of 5 seconds is achievable only through synchronous replication, where every write is committed to the replica before the application receives confirmation.
What is the difference between a hardware RAID controller and a software RAID stack?
A hardware RAID controller is a dedicated PCIe card that manages RAID logic independently of the host CPU, providing a battery-backed write cache that prevents write-hole conditions on power events. A software RAID stack (mdadm or ZFS) runs in the host kernel without dedicated hardware. ZFS provides copy-on-write semantics that eliminate the write-hole without a battery-backed cache. mdadm RAID 10 is suitable when paired with the documented UPS layer and device-level power-loss protection. The documented reference configuration uses a hardware controller with battery-backed write cache for the primary array.
How is the 1,500 km minimum geographic separation enforced?
The 1,500 km minimum is an operational policy enforced at the documented vendor-selection stage. The offsite replica host is verified against the documented address of the primary site using the documented distance calculation (great-circle distance between the two coordinates). Any offsite facility within 1,500 km of the primary site is documented as not meeting the geographic separation requirement and is excluded from consideration.
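The great-circle check described above can be sketched with the haversine formula. This is an illustration under stated assumptions, not the studio's actual tooling: the function names are hypothetical, coordinates are decimal degrees, and the Earth is treated as a sphere of mean radius 6,371 km.

```shell
#!/bin/sh
# Sketch: haversine great-circle distance in whole kilometres.
# Arguments: lat1 lon1 lat2 lon2 (decimal degrees).
great_circle_km() {
    awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
        pi = atan2(0, -1)
        r = 6371.0                                  # mean Earth radius, km
        p1 = lat1 * pi / 180
        p2 = lat2 * pi / 180
        dp = (lat2 - lat1) * pi / 180
        dl = (lon2 - lon1) * pi / 180
        a = sin(dp / 2) ^ 2 + cos(p1) * cos(p2) * sin(dl / 2) ^ 2
        printf "%.0f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
    }'
}

# Succeeds when the two sites are at least 1,500 km apart.
meets_separation() {
    [ "$(great_circle_km "$1" "$2" "$3" "$4")" -ge 1500 ]
}
```

A candidate facility would be excluded when `meets_separation <primary-lat> <primary-lon> <candidate-lat> <candidate-lon>` fails.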
What happens if the offsite synchronous replica link is severed?
When the offsite DRBD secondary node goes offline, the DRBD primary continues operating in a degraded synchronous mode: writes are still synchronous to the local secondary, and the protocol falls back to a two-node synchronous configuration. The offsite secondary resynchronizes automatically when connectivity is restored. The documented procedure is to alert the operations team when the offsite link is severed and to verify the resynchronization is in progress before closing the alert. Sustained offsite disconnection degrades the documented RPO from the full three-node synchronous guarantee to the local two-node guarantee: until resynchronization completes, the offsite copy is only as current as the last write it acknowledged before the link dropped.
Is RAID 6 ever acceptable for a production storage array?
RAID 6 is documented as not suitable for the primary storage array in the reference 57 Studios configuration. The documented reasons are the write-hole risk under power events without battery-backed hardware cache, the degraded write performance on the affected stripe set during a rebuild, and the long parity rebuild window, during which further device failures erode the remaining margin (RAID 6 tolerates two concurrent device failures; a third is unrecoverable). RAID 6-class redundancy is documented as suitable for the backup-storage tier (Storinator XL60), where it is implemented as ZFS RAID-Z2, the write workload is sequential-dominant, and the write-hole risk is eliminated by the ZFS copy-on-write design.
What is the documented rebuild time for a failed RAID 10 member?
The documented rebuild time for a single failed Intel D3-S4510 7.68 TB device in a RAID 10 array is 2 to 3 hours under normal production I/O load. The rebuild time is dominated by the sequential read throughput of the surviving mirror device. The game server estate remains fully operational throughout the rebuild. The rebuild rate is documented in the RAID controller configuration; the reference 57 Studios configuration sets the rebuild rate to the documented maximum while maintaining priority headroom for production I/O.
How is the synchronous replication monitored?
DRBD replication health is monitored through the drbdsetup status command output, which reports the connection state, the role, the disk state, and the documented replication throughput for each resource. The monitoring integration is documented as a Netdata Cloud custom module that polls drbdsetup at the documented interval and alerts the operations team when the out-of-sync byte count exceeds the documented threshold or when the connection state transitions out of Connected.
Does synchronous offsite replication require a dedicated WAN link?
Yes. Synchronous DRBD Protocol C replication over a shared internet connection introduces documented variable-latency overhead that is inconsistent with the synchronous write acknowledgment model. A shared internet link with variable jitter can spike the synchronous write latency into the hundreds of milliseconds during congestion events. The documented reference is a dedicated symmetric 2 Gbps leased-line or dark-fiber connection between the primary site and the offsite replica site.
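Distance alone also sets a latency floor for every synchronous write, independent of congestion. The sketch below assumes light travels through optical fiber at roughly 200 km per millisecond (about two thirds of its speed in vacuum) and that each Protocol C write waits for at least one round trip; real fiber routes are longer than the great-circle distance and add equipment delay, so actual figures will be higher. Function names are hypothetical.

```shell
#!/bin/sh
# Sketch: best-case round-trip time (ms) over fiber for a given distance.
wan_rtt_floor_ms() {
    # $1 = one-way distance in km; ~200 km per millisecond in fiber
    awk -v km="$1" 'BEGIN { printf "%.1f\n", 2 * km / 200 }'
}

# Upper bound on strictly serialized synchronous writes per second
# when each write must wait one round trip of $1 milliseconds.
max_serial_writes_per_sec() {
    awk -v rtt="$1" 'BEGIN { printf "%.0f\n", 1000 / rtt }'
}
```

At the 1,500 km minimum separation, `wan_rtt_floor_ms 1500` gives 15.0 ms, which bounds a single stream of dependent writes at about 67 per second before any jitter is added.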
What is the documented backup-tier retention policy?
The documented backup-tier retention policy for the Storinator XL60 is: 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots, and 12 monthly snapshots. This policy provides granular point-in-time recovery for the previous 24 hours, daily recovery points for the previous week, weekly recovery points for the previous month, and monthly recovery points for the previous year. The policy is implemented as an automated snapshot schedule on the ZFS pool with documented automatic expiration of snapshots that fall outside the retention window.
How is the backup-storage tier protected against ransomware or accidental deletion?
The Storinator XL60 backup-storage tier relies on ZFS snapshots, which are read-only by design, with the readonly property set on the datasets that hold them, and it is accessible only from the documented backup network segment, not from the game server estate's production network. The documented network isolation means that a compromised game server node cannot reach the backup tier directly. Administrative access to the backup tier requires documented two-factor authentication through the operations console.
What is the journal device recommendation for ZFS?
For ZFS pools on the Storinator XL60, the documented reference SLOG (separate log device, which holds the ZFS Intent Log) is a mirrored pair of enterprise NVMe SSDs with power-loss protection. The SLOG device must have lower write latency than the pool's data vdevs; enterprise NVMe SSDs with capacitor-backed write caches are the reference choice. The SLOG mirror (two devices) ensures that the ZIL remains intact if one SLOG device fails.
Can the local and offsite replicas be on the same DRBD resource?
Yes. DRBD supports multiple secondary nodes on a single resource in the documented stacked or dual-secondary configurations. The 57 Studios reference uses the documented dual-secondary configuration: the primary node replicates to both the local secondary and the offsite secondary simultaneously in Protocol C. Both secondaries must acknowledge the write before the primary returns success to the application.
What is the minimum bandwidth for the LAN DRBD link?
The local DRBD link between the primary node and the local secondary node is documented at 10 Gbps on the production switch fabric. The 10 Gbps link provides documented headroom over the peak write throughput of the RAID 10 array (approximately 1,000 MB/s, or 8 Gbps at peak). The documented minimum is 10 Gbps; 25 Gbps or 100 Gbps is appropriate for larger arrays with higher peak write throughput.
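The headroom figure follows from a unit conversion (1 MB/s is 8 Mbps). A small sketch, ignoring protocol overhead and using a hypothetical function name:

```shell
#!/bin/sh
# Sketch: convert peak write throughput to link demand and headroom.
# $1 = peak write throughput in MB/s, $2 = link speed in Gbps.
link_headroom_pct() {
    awk -v mbs="$1" -v gbps="$2" 'BEGIN {
        need = mbs * 8 / 1000               # MB/s -> Gbps
        printf "%.1f Gbps needed, %.0f%% headroom\n", need, (gbps - need) / gbps * 100
    }'
}
```

For the reference array, `link_headroom_pct 1000 10` reports 8.0 Gbps needed and 20% headroom on the 10 Gbps link.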
Comparison summary
| Architecture component | Reference configuration | Production suitable |
|---|---|---|
| Primary RAID level | RAID 10, 4× Intel D3-S4510 7.68 TB | Yes |
| Raw capacity | 30.72 TB | — |
| Usable capacity | 15.36 TB | — |
| RAID controller | Hardware with battery-backed write cache | Yes |
| Local synchronous replica | DRBD Protocol C, same-rack separate node | Yes |
| Offsite synchronous replica | DRBD Protocol C, ≥ 1,500 km, owned hardware | Yes |
| Offsite WAN link | Dedicated 2 Gbps symmetric | Yes |
| Backup-storage tier | 45Drives Storinator XL60 | Yes |
| Backup replication | ZFS send/receive, 15-minute snapshot interval | Backup tier only (not production RPO) |
| Documented RTO | ≤ 30 seconds | — |
| Documented RPO | ≤ 5 seconds (synchronous replication) | — |
| Backup-tier RPO | 15 minutes (recovery floor) | — |
Appendix A: documented device-selection criteria
The Intel D3-S4510 is selected against a documented set of criteria applied to all primary array devices.
| Criterion | Documented threshold | D3-S4510 value | Meets threshold |
|---|---|---|---|
| Device class | Datacenter SSD (24x7 duty cycle) | Datacenter SATA SSD | Yes |
| Endurance | ≥ 10,000 TBW | 14,000 TBW | Yes |
| AFR | ≤ 0.5% at mixed-use workload | ≤ 0.35% | Yes |
| Power-loss data protection | Required (capacitor-backed) | Capacitor-backed write cache | Yes |
| Interface | SATA III or NVMe | SATA III | Yes |
| Capacity tier | ≥ 3.84 TB per device | 7.68 TB | Yes |
| Manufacturer warranty | ≥ 5 years | 5 years | Yes |
| SMART attribute support | Required | Full SMART support | Yes |
Any device that does not meet all documented criteria is excluded from consideration for the primary array. The criteria are reviewed annually against the documented device market to identify qualifying alternatives to the reference device as availability changes.
Appendix B: DRBD configuration reference
The following is the documented reference DRBD resource configuration for the primary-to-local-secondary Protocol C pair. The offsite secondary is added as a documented additional host in the same resource.
resource game-data {
    # DRBD 9 syntax is assumed here: a three-node resource needs a
    # node-id per host and a connection-mesh section.
    net {
        protocol C;                       # synchronous replication
        fencing resource-only;            # net option in DRBD 9 (disk option in 8.4)
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    disk {
        on-io-error detach;
    }
    on primary-node {
        node-id 0;
        device /dev/drbd0;
        disk /dev/md0;                    # RAID 10 array device
        address 10.0.1.10:7789;
        meta-disk internal;
    }
    on local-secondary-node {
        node-id 1;
        device /dev/drbd0;
        disk /dev/md0;                    # Local secondary RAID 10 array
        address 10.0.1.11:7789;
        meta-disk internal;
    }
    on offsite-secondary-node {
        node-id 2;
        device /dev/drbd0;
        disk /dev/md0;                    # Offsite secondary RAID 10 array
        address 198.51.100.20:7789;       # Offsite WAN IP (documented)
        meta-disk internal;
    }
    connection-mesh {
        hosts primary-node local-secondary-node offsite-secondary-node;
    }
}
The protocol C directive enforces synchronous replication. The on-io-error detach disk option sets the behavior on a local device I/O error: the node detaches from the device and reports a disk-error state, allowing the surviving secondary to be promoted. The split-brain resolution directives (after-sb-*) are the conservative defaults. With no primary at detection time, the node that made no changes is discarded (discard-zero-changes); with one primary, the secondary's changes are discarded (discard-secondary); with two primaries, the nodes disconnect and require manual intervention.
Appendix C: backup-tier snapshot schedule reference
The documented reference snapshot schedule for the backup-storage tier is implemented as a cron job on the Storinator XL60 system.
# Hourly snapshot (retained for 24 hours)
0 * * * * /usr/local/bin/zfs-snapshot.sh game-data hourly 24
# Daily snapshot at 02:00 (retained for 7 days)
0 2 * * * /usr/local/bin/zfs-snapshot.sh game-data daily 7
# Weekly snapshot at 03:00 Sunday (retained for 4 weeks)
0 3 * * 0 /usr/local/bin/zfs-snapshot.sh game-data weekly 4
# Monthly snapshot at 04:00 on the 1st (retained for 12 months)
0 4 1 * * /usr/local/bin/zfs-snapshot.sh game-data monthly 12
The zfs-snapshot.sh script creates a ZFS snapshot with a documented timestamp suffix, verifies the snapshot creation, and expires snapshots that fall outside the documented retention window. The script output is logged to the documented operations log and monitored by the documented Netdata Cloud alerting integration.
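The contents of zfs-snapshot.sh are not reproduced in this article. The following is a hypothetical sketch of how such a script might create and expire snapshots, assuming a `<dataset>@<label>-<UTC timestamp>` naming scheme and the standard `zfs snapshot`, `zfs list`, and `zfs destroy` commands; the function names and the naming scheme are assumptions, and logging and verification are omitted.

```shell
#!/bin/sh
# Sketch only: not the studio's actual zfs-snapshot.sh.

# Read snapshot names oldest-first on stdin and print every name except
# the newest $1, i.e. the ones that fall outside the retention window.
expired_snapshots() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Arguments mirror the cron entries: dataset, label, number to keep.
snapshot_and_expire() {
    dataset="$1"; label="$2"; keep="$3"
    snap="${dataset}@${label}-$(date -u +%Y%m%d%H%M)"   # assumed naming scheme
    zfs snapshot "$snap" || return 1
    # List this dataset's snapshots oldest-first, keep only this label,
    # and destroy whatever exceeds the retention count.
    zfs list -H -t snapshot -o name -s creation -d 1 "$dataset" \
        | grep -F "@${label}-" \
        | expired_snapshots "$keep" \
        | while read -r old; do zfs destroy "$old"; done
}
```

Keeping the selection logic in `expired_snapshots` separate from the `zfs` calls makes the retention rule testable without a pool.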
Storage monitoring and health-check integration
The documented storage architecture is only as reliable as the monitoring that surfaces failures before they compound. A single device failure in a RAID 10 array is a managed event; an undetected device failure followed by a second device failure before the first is replaced is an unrecoverable event. The documented monitoring stack surfaces the first failure immediately.
SMART monitoring
The Intel D3-S4510 exposes the full set of SMART attributes through the device's SATA interface. The documented monitoring configuration polls SMART attributes at the documented 5-minute interval using smartctl and forwards the results to the Netdata Cloud monitoring instance documented in Cellular Failover for ISP Redundancy.
The critical SMART attributes monitored for the D3-S4510 are:
| SMART attribute | ID | Documented alert threshold |
|---|---|---|
| Reallocated sector count | 5 | Any non-zero value |
| Uncorrectable error count | 187 | Any non-zero value |
| Media and data integrity errors | 199 (NVMe) / 187 | Any increase |
| Percentage used (endurance) | 233 | ≥ 80% |
| Available reserved space | 232 | ≤ 10% |
| Power-on hours | 9 | ≥ 26,280 hours (3 years) triggers device-refresh review |
Any non-zero value on the reallocated sector count or uncorrectable error count triggers an immediate documented alert to the operations team and initiates the documented device-replacement procurement workflow.
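The two non-zero alert rules could be applied to `smartctl -A` output roughly as follows. The sketch assumes the usual ATA attribute table layout, in which the attribute ID is the first column and the raw value is the tenth; the function name and alert text are hypothetical, and forwarding to the monitoring instance is out of scope.

```shell
#!/bin/sh
# Sketch: print an alert line for attribute 5 (reallocated sectors) or
# 187 (reported uncorrectable errors) when the raw value is non-zero.
# Reads "smartctl -A <device>" output on stdin.
smart_alerts() {
    awk '($1 == 5 || $1 == 187) && $10 + 0 > 0 {
        printf "ALERT attribute %s (%s) raw=%s\n", $1, $2, $10
    }'
}
```

A poller could run `smartctl -A /dev/sda | smart_alerts` every five minutes and raise the alert whenever the output is non-empty; the device path is illustrative.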
RAID controller monitoring
The documented hardware RAID controller publishes its array state through the storcli or arcconf management interface. The documented monitoring configuration polls the controller status at the 5-minute interval and alerts on any state transition out of Optimal:
storcli /c0/v0 show
A Degraded state indicates a device failure; a Failed state indicates the array has exceeded its fault tolerance. Both states are documented as P1 alerts requiring immediate response.
DRBD replication monitoring
The DRBD replication state is polled through drbdsetup status --statistics at the documented 60-second interval. The documented alert conditions are:
- Connection state transitions to any value other than Connected
- Disk state transitions to any value other than UpToDate on any secondary
- Out-of-sync bytes exceed the documented threshold (10 MB, corresponding to approximately 2 seconds of writes at the workload's peak write rate, which implies a peak of roughly 5 MB/s)
- Replication throughput drops below the documented minimum (1 MB/s during an active write workload) for more than 30 consecutive seconds
Best practice
The out-of-sync byte threshold of 10 MB is the documented early warning level. It is set well below the RPO violation threshold to give the operations team time to investigate and resolve a replication degradation before it reaches the 5-second RPO boundary. At the workload's peak write rate (roughly 5 MB/s, as implied by the threshold), a 10 MB out-of-sync count represents approximately 2 seconds of unsynced data: still within RPO, but trending toward the boundary.
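A check for the out-of-sync condition might look like the sketch below. It assumes that `drbdsetup status --statistics` prints `out-of-sync:<n>` tokens with the value in KiB, and it treats the 10 MB threshold as 10,240 KiB; both points should be confirmed against the installed DRBD version. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: flag any peer whose out-of-sync counter exceeds the threshold.
# $1 = threshold in KiB (default 10240); status text is read from stdin.
oos_alerts() {
    awk -v limit="${1:-10240}" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^out-of-sync:/) {
                split($i, kv, ":")
                if (kv[2] + 0 > limit)
                    printf "ALERT out-of-sync %d KiB exceeds %d KiB\n", kv[2], limit
            }
    }'
}
```

Run as `drbdsetup status --statistics | oos_alerts` at the 60-second polling interval, it stays silent while replication is healthy.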
Capacity planning
The documented 15.36 TB usable capacity of the reference RAID 10 array is provisioned against the documented capacity model for the reference 57 Studios Unturned estate.
| Capacity component | Documented allocation |
|---|---|
| Active save-state data (all server instances) | 2.8 TB |
| Transaction log retention (90 days rolling) | 1.2 TB |
| Map data and asset storage | 0.9 TB |
| Operating system and game server binaries | 0.3 TB |
| Working headroom (documented minimum 20%) | 3.1 TB |
| Documented total allocated | 8.3 TB |
| Documented available capacity | 7.06 TB |
The documented working headroom of 20 percent is maintained at all times. When the allocated capacity exceeds 80 percent of the usable capacity (12.3 TB in the reference configuration), the documented capacity-expansion procedure is initiated: a second RAID 10 array is provisioned and added to the ZFS pool or LVM volume group, expanding the usable capacity without disrupting the running estate.
The documented capacity-expansion review is performed quarterly as part of the broader infrastructure operations review. Growth trends are plotted over the trailing 90 days and projected forward 12 months; if the projected growth crosses the 80 percent threshold within the 12-month window, the capacity expansion is initiated proactively rather than reactively.
Pro tip
The save-state data growth rate accelerates with player count and server population. Track the save-state footprint per active player per month (documented at approximately 12 MB/player/month in the reference configuration) and use the documented player count trend as the leading indicator for capacity planning rather than looking only at raw disk usage.
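A player-driven projection can be sketched as simple arithmetic. The player count in the usage note is hypothetical, the function name is an assumption, and the model is linear (a constant active player count), so it understates growth if the population is rising.

```shell
#!/bin/sh
# Sketch: months until allocation reaches 80% of usable capacity.
# $1 = allocated TB, $2 = usable TB, $3 = active players,
# $4 = save-state growth in MB per player per month.
months_to_threshold() {
    awk -v used="$1" -v usable="$2" -v players="$3" -v mb="$4" 'BEGIN {
        growth_tb = players * mb / 1000000      # decimal MB -> TB
        if (growth_tb <= 0) { print "no growth"; exit }
        limit_tb = usable * 0.80
        printf "%.2f TB/month, %.1f months to 80%% threshold\n", growth_tb, (limit_tb - used) / growth_tb
    }'
}
```

With the reference figures (8.3 TB allocated, 15.36 TB usable, 12 MB/player/month) and an illustrative 10,000 active players, `months_to_threshold 8.3 15.36 10000 12` reports 0.12 TB/month and about 33.2 months to the 80 percent threshold.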
Failure mode documentation
The documented storage architecture is designed against a set of documented failure modes. Each failure mode is documented with its detection mechanism, its automated response, and its documented manual response.
The flowchart documents the five primary failure scopes and the documented response for each. The key observation is that four of the five failure scopes result in zero service interruption — the estate continues operating while the failure is investigated and resolved. Only the full primary-node offline scenario requires the documented 30-second failover to the promoted secondary.
Visual summary: full storage redundancy stack
WRITE PATH (simplified)
=======================

Game Server                              Backup Tier
Process                                  (Archive)
    |                                        ^
    | write() + fsync()                      | ZFS send/receive
    v                                        | 15-min snapshots
+---------+                            +-----+------+
| OS VFS  |                            | Storinator |
| Layer   |                            | XL60       |
+---------+                            | EPYC 8124P |
    |                                  | 60-bay     |
    v                                  +------------+
+---------+
| RAID 10 |  <-- 4x Intel D3-S4510 7.68TB
| Array   |      30.72 TB raw / 15.36 TB usable
| HW ctrl |      Battery-backed write cache
+---------+
    |
    v
+---------+            +-----------+         +--------------+
| DRBD    |  ------>   | Local     |  ----+  | Offsite      |
| Primary |  Prot C    | Secondary |      |  | Secondary    |
| Node    |            | (same     |      +->| (>=1,500 km) |
|         |  Prot C    | rack)     |         | owned hw     |
|         |  ------>   +-----------+         +--------------+
|         |
+---------+
    |
    | All three nodes confirm before
    | write() returns to game server
    v
RPO <= 5 seconds guaranteed
RTO <= 30 seconds guaranteed

Appendix D: documented retired and superseded configurations
The documented retired configurations are preserved for historical reference. The studio retains archived documentation on each superseded configuration.
| Documented retired configuration | Superseded by | Documented retirement date | Documented reason |
|---|---|---|---|
| RAID 5, 4× 2 TB SATA HDD | RAID 10, 4× Intel D3-S4510 7.68 TB | 2021-06 | Write hole risk; rebuild window unacceptable; HDD throughput insufficient |
| Asynchronous offsite replication (rsync, 15-min interval) | DRBD Protocol C synchronous offsite | 2022-03 | 15-minute RPO did not meet documented production standard |
| Single-node, no replica | DRBD primary + local secondary | 2021-01 | Single point of failure; any device failure caused player-visible outage |
| 45Drives Storinator AV15 as backup tier | 45Drives Storinator XL60 | 2023-09 | Insufficient drive count for the documented retention policy; XL60 60-bay capacity exceeds requirements |
| Manual snapshot scripts on primary | Automated ZFS send/receive to Storinator XL60 | 2022-11 | Manual cadence missed during operational events; automation provides documented consistency |
Closing notes
The documented storage architecture — RAID 10 primary, synchronous local replica, synchronous offsite replica at 1,500 km minimum, and Storinator XL60 backup tier — is the 57 Studios reference configuration that meets the documented RTO of 30 seconds and RPO of 5 seconds for the self-hosted Unturned estate. Each layer addresses a distinct failure scope: the RAID 10 primary addresses device-level failures, the local replica addresses node-level failures, the offsite replica addresses site-level failures, and the backup tier addresses catastrophic multi-site failures. No layer is optional; the documented professional baseline requires all four.
The documented configuration is reviewed annually as part of the studio's operations cycle. Device availability, replication technology, and network provisioning are reviewed against the documented industry baseline, and the reference configuration is updated to reflect validated improvements. The studio's operational experience across multiple upgrade cycles is documented in Appendix D; the progression from single-node to the current four-layer architecture reflects the incremental hardening of the estate against documented failure scenarios as the player base and commerce volume grew.
