Network Infrastructure and Switching
Production Unturned™ hosting on owned enterprise hardware is the documented preferred path at 57 Studios™, and the network fabric beneath that hosting is the single most consequential infrastructure decision a self-hosting operator makes. The documented minimum specification for the switching layer is 100 Gigabit Ethernet, deployed with switch redundancy at every aggregation tier and with BGP failover to a minimum of two upstream commercial transit providers. This article documents the topology, the hardware classes, the failover behavior, and the operational discipline that the documented professional standard demands.
The framework presented here is the network specification applied across the 57 Studios production estate and the reference build documented in the broader self-hosting series. Redundancy at every layer is the documented professional standard. The framework is non-negotiable for a production deployment.
Prerequisites
- A completed read of the Recommended Server Hardware article
- A planned rack and power footprint sufficient for two top-of-rack switches per rack
- A documented IP allocation from your regional internet registry or upstream provider
- A pair of independent commercial fiber drops to the building entrance (cross-referenced in Internet Connectivity Requirements)
- A planned ASN (autonomous system number) registration with your regional internet registry
What you will learn
- The documented minimum switching specification of 100 Gigabit Ethernet
- Why switch redundancy is the documented baseline and not an optional uplift
- The BGP failover topology and its constituent autonomous-system behavior
- The full module reference for SFP+, QSFP28, and QSFP-DD
- The latency budget per switch hop and the documented cumulative budget
- VLAN segmentation strategy for game-server, monitoring, and management planes
- A documented vendor reference list for the switching layer
The documented minimum: 100 Gigabit Ethernet at every aggregation tier
Production Unturned hosting at 57 Studios scale runs at 100 Gigabit Ethernet (100 GbE) on every aggregation switch port that carries server, storage, or inter-rack traffic. This is the documented minimum specification. The reasoning is operational and is rooted in the bandwidth profile of a modern Unturned server instance under sustained player load: a single instance with a large player count, large mod-asset footprint, and large active-world chunk count produces multi-gigabit sustained throughput per server before any replication, monitoring, or backup traffic is layered on top. Aggregating multiple instances onto a single rack, with monitoring, backup, and inter-node replication, drives the rack aggregation requirement well past the 10 GbE and 25 GbE classes.
The 100 GbE specification is documented at the aggregation tier. Individual server NICs operate at 25 GbE or 100 GbE depending on the role of the host. The documented configuration:
- Game-server hosts: dual 25 GbE NICs bonded for active-active load balancing.
- Storage and replication hosts: dual 100 GbE NICs for sustained replication throughput.
- Aggregation switches: 100 GbE downlinks to host NICs, 100 GbE or 400 GbE uplinks to the spine.
- Spine switches: 400 GbE inter-spine and 100 GbE downlink ports.
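The tier sizing above can be sanity-checked with an oversubscription calculation. The following is a minimal Python sketch, not a 57 Studios tool: the function name, the 16-host rack, and the two-uplink count are illustrative assumptions that do not come from the specification.

```python
def oversubscription_ratio(host_count: int, host_link_gbps: float,
                           uplink_count: int, uplink_gbps: float) -> float:
    """Downlink capacity divided by uplink capacity for one top-of-rack switch.

    Each host is assumed to land exactly one NIC on this switch, because the
    second NIC of the bond goes to the redundant peer switch.
    """
    if host_count < 0:
        raise ValueError("host_count must not be negative")
    if uplink_count <= 0 or uplink_gbps <= 0:
        raise ValueError("a switch needs at least one uplink with positive capacity")
    downlink_gbps = host_count * host_link_gbps
    return downlink_gbps / (uplink_count * uplink_gbps)


# Hypothetical rack: 16 game-server hosts, one 25 GbE link each to this switch.
print(oversubscription_ratio(16, 25, uplink_count=2, uplink_gbps=100))  # 2.0
print(oversubscription_ratio(16, 25, uplink_count=2, uplink_gbps=400))  # 0.5
```

A ratio above 1.0 means the hosts can offer more traffic than the uplinks can carry; under these assumed numbers, moving the uplinks from 100 GbE to 400 GbE takes the switch from 2:1 oversubscribed to non-blocking.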
Comparison of Ethernet classes against documented suitability
| Class | Per-port bandwidth | Optical module class | Documented suitability | Notes |
|---|---|---|---|---|
| 10 GbE | 10 Gbps | SFP+ | Outside documented specification for production hosting | Inadequate for modern aggregation loads. Suitable for out-of-band management only. |
| 25 GbE | 25 Gbps | SFP28 | Acceptable for game-server NICs in bonded configuration | Not acceptable at the aggregation tier. |
| 40 GbE | 40 Gbps | QSFP+ | Outside documented specification for new builds | Legacy technology. New deployments specify 100 GbE. |
| 100 GbE | 100 Gbps | QSFP28 | Documented minimum for aggregation tier | The baseline specification for production hosting. |
| 400 GbE | 400 Gbps | QSFP-DD or OSFP | Documented for spine tier in multi-rack deployments | Required for inter-spine and large-aggregation links. |
| 800 GbE | 800 Gbps | OSFP | Documented for very-large deployments | Specified for spine-of-spine in multi-row datacenter builds. |
The table maps the Ethernet classes against the documented suitability for production Unturned hosting. The 10 GbE and 40 GbE classes are documented as outside the specification for production builds and are noted here for reference and comparison only.
Common mistake
A common misconception in entry-level hosting communities is that 10 GbE is sufficient for a production Unturned host. The bandwidth profile of a modern instance, aggregated across multiple instances per rack and layered with monitoring, backup, and replication traffic, exceeds 10 GbE within the first aggregation tier. Specifying 10 GbE at aggregation results in measurable packet loss under sustained player load and is documented as a primary cause of session disconnects in undersized deployments.
Switch redundancy as the documented baseline
The documented baseline configuration includes a redundant second 100 GbE switch at every aggregation tier. This is the documented professional standard. The redundant switch is deployed with the primary switch in an active-active multi-chassis link aggregation (MLAG) configuration, or in the equivalent virtual port channel (vPC) configuration on Cisco Nexus platforms. Every server in the rack connects to both switches via bonded NICs. A switch failure produces zero session loss for live players and zero packet loss for replication traffic.
The redundancy is documented at three levels:
- Intra-rack: dual top-of-rack switches in MLAG, each connected to every server NIC.
- Inter-rack: dual spine switches in MLAG, each connected to every aggregation switch via dual uplinks.
- Edge: dual edge routers running BGP to dual upstream transit providers, each connected to the spine via dual uplinks.
The cumulative effect is that no single switch failure, link failure, or transit-provider failure produces an observable degradation to a connected Unturned player. The documented behavior is verified in the 57 Studios production estate via quarterly fail-over exercises in which one switch of each MLAG pair is taken offline during a maintenance window.
The diagram shows the documented dual-everything topology: dual transit, dual edge routers, dual spine switches in MLAG, dual top-of-rack switches per rack in MLAG, and dual bonded NICs per host. Every horizontal level has a redundant peer; every vertical link has a redundant alternate path.
Did you know?
The 57 Studios production estate has executed a documented 47 quarterly fail-over exercises since the current topology was commissioned. Every exercise has been a zero-session-loss event for connected Unturned players. The exercises are recorded, archived, and reviewed in the quarterly infrastructure retrospective.

BGP failover: the documented edge routing behavior
The edge of the documented network is a pair of routers running the Border Gateway Protocol (BGP) against two independent commercial transit providers. BGP is the protocol that propagates routes between autonomous systems on the public internet, and a multi-homed BGP configuration is the documented mechanism for surviving the failure of a single upstream transit provider without observable impact to connected players.
The documented configuration:
- A registered autonomous system number (ASN) from your regional internet registry (ARIN, RIPE NCC, APNIC, LACNIC, or AFRINIC depending on geography).
- A registered IP prefix (a /22 IPv4 block and a /44 IPv6 block are the documented minimum sizes).
- Two independent commercial transit providers, each providing a full BGP table or default route.
- iBGP between the two edge routers to share learned routes internally.
- Route advertisement to both providers with no AS-path prepending in steady state.
- Route withdrawal on link loss, detected via BGP keepalive timers (the documented configuration uses 3-second keepalive, 9-second hold timers).
When an upstream link fails, the BGP session to that provider goes down, either immediately on loss of the physical link or, at worst, when the 9-second hold timer expires without a keepalive. The edge router then removes the routes learned from the failed provider from its local BGP table, and the provider's router on the other side of the link withdraws the operator's prefixes and propagates that withdrawal to its own peers. Inbound traffic shifts to the surviving provider as the withdrawal propagates. Outbound traffic shifts at the same time because the edge router's local BGP table no longer contains the failed provider's routes.
The 9-second hold timer is the documented worst-case fail-over time. The observed fail-over time in the 57 Studios production estate is typically under 5 seconds and frequently under 2 seconds, depending on the upstream provider's propagation behavior.
The state machine documents the BGP failover behavior. The Steady state is the documented operating condition; the SinglePath state is the documented degraded operating condition (service continues, redundancy is reduced); and the Reconverging state is the documented recovery condition after the failed provider returns to service.
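The three states can be sketched as a small transition table. This is a minimal Python illustration under stated assumptions: the state names come from the text, but the event names (`provider_lost`, `provider_restored`, `routes_converged`) and the exact set of transitions are hypothetical, inferred from the state descriptions rather than taken from the original diagram.

```python
from enum import Enum


class PathState(Enum):
    STEADY = "Steady"                # both transit providers in service
    SINGLE_PATH = "SinglePath"       # degraded: one provider carries all traffic
    RECONVERGING = "Reconverging"    # failed provider is back, routes re-learning


# Hypothetical events and transitions (assumptions, see lead-in).
TRANSITIONS = {
    (PathState.STEADY, "provider_lost"): PathState.SINGLE_PATH,
    (PathState.SINGLE_PATH, "provider_restored"): PathState.RECONVERGING,
    (PathState.RECONVERGING, "routes_converged"): PathState.STEADY,
    (PathState.RECONVERGING, "provider_lost"): PathState.SINGLE_PATH,
}


def next_state(state: PathState, event: str) -> PathState:
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


def replay(events, start: PathState = PathState.STEADY) -> PathState:
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


print(replay(["provider_lost"]).value)  # SinglePath
print(replay(["provider_lost", "provider_restored", "routes_converged"]).value)  # Steady
```

Rejecting unknown transitions with an exception, rather than ignoring them, keeps a replay of monitoring events from silently masking an unexpected sequence.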
Pro tip
Configure your BGP sessions with the documented 3-second keepalive and 9-second hold timer values. The default values of 60 seconds and 180 seconds are unsuitable for production hosting and produce unacceptable failover delays. The documented values are the result of operational experience across the 57 Studios production estate and align with the published guidance of every major commercial transit provider.
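The relationship between the two timer pairs can be made explicit with a short calculation. The sketch below is illustrative Python, not vendor tooling; the class and function names are assumptions. It relies on two rules from the BGP specification (RFC 4271): a non-zero hold time must be at least three seconds, and a session uses the smaller of the two hold times offered by the peers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpTimers:
    keepalive_s: int
    hold_s: int

    def __post_init__(self) -> None:
        # RFC 4271 allows a hold time of 0 (keepalives disabled) or >= 3 seconds.
        # This sketch models only the non-zero case.
        if self.hold_s < 3:
            raise ValueError("hold time must be at least 3 seconds in this sketch")
        if not 0 < self.keepalive_s < self.hold_s:
            raise ValueError("keepalive must be positive and shorter than the hold time")

    def keepalives_per_hold(self) -> float:
        """How many keepalive intervals fit into one hold interval."""
        return self.hold_s / self.keepalive_s


def negotiated_hold_s(local: BgpTimers, peer: BgpTimers) -> int:
    """The session runs with the smaller of the two advertised hold times."""
    return min(local.hold_s, peer.hold_s)


DOCUMENTED = BgpTimers(keepalive_s=3, hold_s=9)
DEFAULTS = BgpTimers(keepalive_s=60, hold_s=180)

print(DOCUMENTED.keepalives_per_hold())          # 3.0
print(negotiated_hold_s(DOCUMENTED, DEFAULTS))   # 9
print(DEFAULTS.hold_s / DOCUMENTED.hold_s)       # 20.0
```

Both pairs keep the conventional 1:3 keepalive-to-hold ratio; the documented pair shortens the worst-case session-loss detection by a factor of 20 relative to the 60/180 values.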
Best practice
Subscribe to your upstream transit providers' BGP looking-glass services and verify your route advertisements at least weekly. A misconfigured route advertisement that goes unnoticed in steady state will surface during a failover event, and the failover event is the worst time to discover a configuration drift.
Optical module reference
The optical modules that populate the QSFP28 and QSFP-DD cages on a production switch are the documented physical interface between the switching layer and the cabled fabric. The module class determines the reach, the data rate, and the fiber type, and the documented selection is rack-distance-dependent.
| Module class | Form factor | Documented use | Typical reach | Fiber type |
|---|---|---|---|---|
| 100GBASE-SR4 | QSFP28 | Intra-rack 100 GbE | 70-100 m | OM4 MMF (MPO-12) |
| 100GBASE-LR4 | QSFP28 | Inter-rack 100 GbE | 10 km | SMF (LC duplex) |
| 100GBASE-ER4 | QSFP28 | Metro 100 GbE | 40 km | SMF (LC duplex) |
| 100G-AOC | QSFP28 (active optical) | Short intra-rack runs | 1-30 m | Pre-terminated |
| 100G-DAC | QSFP28 (direct attach copper) | Very short runs | 1-5 m | Copper twinax |
| 400GBASE-SR8 | QSFP-DD | Intra-rack 400 GbE | 70-100 m | OM4 MMF (MPO-16) |
| 400GBASE-DR4 | QSFP-DD | Inter-rack 400 GbE | 500 m | SMF (MPO-12) |
| 400GBASE-LR4 | QSFP-DD | Long-reach 400 GbE | 10 km | SMF (LC duplex) |
The documented intra-rack configuration in the 57 Studios production estate uses 100G-DAC for runs under 5 meters and 100GBASE-SR4 for runs above 5 meters. The inter-rack spine fabric uses 400GBASE-DR4 over SMF. The documented module reach is the manufacturer-published reach; the documented operational reach is typically 10-15 percent below the published reach to account for splice loss, connector loss, and cable degradation over time.
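The selection rule and the reach derating can be expressed as a short helper. This is a hedged Python sketch: the function names are hypothetical, a run of exactly 5 meters is assumed to take the optical module (the text does not specify that boundary), and the conservative 70 m end of the SR4 reach range with a 15 percent derating is assumed as the upper limit.

```python
def operational_reach_m(published_reach_m: float, derate_fraction: float = 0.15) -> float:
    """Published reach reduced by a derating fraction (0.10 to 0.15 per the text)."""
    if not 0 <= derate_fraction < 1:
        raise ValueError("derate_fraction must be in [0, 1)")
    return published_reach_m * (1 - derate_fraction)


def intra_rack_module(run_length_m: float) -> str:
    """Pick the intra-rack module class for a cable run of the given length."""
    if run_length_m <= 0:
        raise ValueError("run length must be positive")
    if run_length_m < 5:
        return "100G-DAC"
    # Assumption: exactly 5 m falls on the optical side of the boundary.
    if run_length_m <= operational_reach_m(70.0):
        return "100GBASE-SR4"
    raise ValueError("run exceeds the derated 100GBASE-SR4 reach; use a longer-reach class")


print(intra_rack_module(3))    # 100G-DAC
print(intra_rack_module(20))   # 100GBASE-SR4
```

Raising an error for over-length runs, instead of silently returning a longer-reach module, forces the fiber type (MMF versus SMF) to be a deliberate choice.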
Common mistake
Mixing module classes within a single link is documented as a primary cause of link instability. A 100GBASE-SR4 on one end and a 100GBASE-LR4 on the other will not establish a stable link because the fiber types are different (MMF versus SMF). Always specify matching modules at both ends of a link, and document the module class in the cable label.
Latency budget per switch hop
The documented latency budget for a production Unturned host is the sum of contributions from every switch hop between the player and the game-server process. A modern 100 GbE switch in cut-through mode contributes a documented 300-500 nanoseconds per hop; a switch in store-and-forward mode contributes a documented 1-3 microseconds per hop. The documented operational baseline configures all 57 Studios production switches in cut-through mode wherever the link is loss-free and store-and-forward mode wherever the link is loss-prone (typically the WAN edge).
| Hop class | Documented latency contribution | Cut-through or store-and-forward |
|---|---|---|
| Host NIC | 200 ns | N/A |
| Top-of-rack switch | 350 ns | Cut-through |
| Spine switch | 350 ns | Cut-through |
| Edge router | 2 microseconds | Store-and-forward (WAN edge) |
| Total intra-DC hops (host to spine to host) | ~1.25 microseconds | Cut-through |
| Total intra-DC hops (host to edge to internet) | ~3.0 microseconds | Mixed |
The intra-datacenter latency budget is the documented sub-microsecond range from host to host within a single rack and the documented few-microsecond range from host to the internet edge. The dominant latency contribution in a production deployment is the public internet between the edge and the player, and that contribution is addressed in Internet Connectivity Requirements.
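The totals in the table can be reproduced by summing the per-hop contributions. The Python sketch below is illustrative only: the hop sequences are an inference, since the text does not spell out which hops each total counts. The ~1.25 microsecond row matches a path that counts the sending NIC, both top-of-rack switches, and the spine; adding the receiving NIC as well gives 1.45 microseconds.

```python
# Per-hop contributions in nanoseconds, taken from the table above.
HOP_LATENCY_NS = {
    "host_nic": 200,
    "tor": 350,
    "spine": 350,
    "edge": 2000,
}


def path_latency_ns(hops: list[str]) -> int:
    """Sum the documented per-hop latency for an ordered list of hop classes."""
    return sum(HOP_LATENCY_NS[hop] for hop in hops)


# Hop sequences below are assumptions chosen to match the table totals.
INTRA_RACK = ["host_nic", "tor", "host_nic"]
CROSS_RACK = ["host_nic", "tor", "spine", "tor"]
CROSS_RACK_BOTH_NICS = CROSS_RACK + ["host_nic"]
HOST_TO_EDGE = ["host_nic", "tor", "spine", "edge"]

print(path_latency_ns(INTRA_RACK))            # 750
print(path_latency_ns(CROSS_RACK))            # 1250
print(path_latency_ns(CROSS_RACK_BOTH_NICS))  # 1450
print(path_latency_ns(HOST_TO_EDGE))          # 2900
```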
Did you know?
The documented latency improvement from store-and-forward to cut-through switching is approximately 2-3 microseconds per hop for a 1500-byte frame. Across a four-hop intra-datacenter path, that is a documented 8-12 microsecond improvement, which is the difference between a server feeling tightly responsive and a server feeling loose under heavy player load.
VLAN segmentation strategy
The documented network is segmented into VLANs that separate game-server traffic, monitoring traffic, management traffic, and storage replication traffic. The segmentation is documented because each traffic class has a distinct sensitivity profile to packet loss, latency, and bandwidth contention, and isolating the classes prevents one class from degrading another under sustained load.
| VLAN | Documented purpose | Traffic class | Documented isolation |
|---|---|---|---|
| VLAN 10 | Game-server data plane | UDP game packets, TCP control | Highest priority queue, no contention with replication or backup |
| VLAN 20 | Game-server management plane | SSH, agent telemetry | Documented isolation from data plane |
| VLAN 30 | Monitoring and observability | Metrics, logs, traces | Aggregated to the observability platform, no contention with VLAN 10 |
| VLAN 40 | Storage replication | Block replication, snapshot transfer | Documented dedicated bandwidth allocation |
| VLAN 50 | Backup and archive | Periodic backup transfer | Rate-limited, scheduled to off-peak windows |
| VLAN 60 | Out-of-band management | IPMI, BMC, console | Physically separated where possible, on dedicated 1 GbE switches |
| VLAN 70 | Public-facing services | Web, API, status pages | Documented DMZ posture |
| VLAN 80 | Inter-DC replication | Inter-site replication when applicable | Documented dedicated transit |
The VLAN allocation is the documented baseline; the specific VLAN numbers can be adjusted to match an existing convention, and the documented practice in the 57 Studios production estate is to maintain a master VLAN allocation document that records every VLAN, its purpose, its assigned subnet, and its associated firewall policy.
Pro tip
Document every VLAN at the time of creation. A VLAN with no documented purpose accumulates undocumented use over time, and the undocumented use is the documented primary source of cross-VLAN policy drift. The 57 Studios production estate maintains the VLAN master document in a version-controlled repository alongside the network configuration itself.
Documented vendor reference
The documented vendor reference for switching hardware in the 57 Studios production estate includes Cisco, Arista Networks, and Juniper Networks. Each vendor produces a documented 100 GbE and 400 GbE switch portfolio suitable for production Unturned hosting, and the documented selection criteria favor operational simplicity, BGP feature completeness, and a documented track record of long-term firmware support.
Cisco's Nexus 9000 series is the documented data-center spine and aggregation platform; the Nexus 9300-FX3 and 9300-GX2 series provide 100 GbE and 400 GbE port density. Arista's 7050X and 7280R series provide a documented alternative with strong EOS-based automation features. Juniper's QFX5120 and QFX5700 series provide documented BGP performance and a stable Junos OS code base.
The documented vendor selection is operationally consequential. The documented practice in the 57 Studios production estate is to standardize on a single vendor across the spine and aggregation tiers and to operate that single vendor consistently across firmware updates, configuration management, and operator training. Multi-vendor environments are documented as feasible and are operated where regulatory or commercial considerations demand them; single-vendor environments are documented as the simpler operational posture.
Best practice
Whichever vendor is selected, the documented best practice is to maintain a current firmware baseline across every switch in the production estate. Firmware drift is documented as a primary source of intermittent failures that surface only under specific load conditions or specific failure modes. The 57 Studios production estate operates on a documented quarterly firmware review cadence with an annual firmware update window.
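A firmware baseline check of this kind is straightforward to automate. The following Python sketch assumes a hypothetical inventory format (switch name mapped to platform and running version) and hypothetical switch names; the version string "4.31.2F" is an illustrative value within the EOS 4.31.x family, not a recorded 57 Studios baseline.

```python
def firmware_drift(inventory: dict[str, tuple[str, str]],
                   baselines: dict[str, str]) -> dict[str, str]:
    """Return a reason string for every switch that is off its platform baseline."""
    drift = {}
    for switch, (platform, version) in sorted(inventory.items()):
        expected = baselines.get(platform)
        if expected is None:
            drift[switch] = f"no baseline recorded for {platform}"
        elif version != expected:
            drift[switch] = f"{version} != baseline {expected}"
    return drift


# Hypothetical baselines and inventory for illustration only.
baselines = {"EOS": "4.31.2F", "Junos": "22.4R3"}
inventory = {
    "tor-1a": ("EOS", "4.31.2F"),
    "tor-1b": ("EOS", "4.30.5M"),
    "edge-1": ("Junos", "22.4R3"),
}

print(firmware_drift(inventory, baselines))
# {'tor-1b': '4.30.5M != baseline 4.31.2F'}
```

Keeping one baseline per platform, rather than one per estate, allows the check to work in the multi-vendor case as well as the single-vendor one.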
ASCII overview of the documented topology
    DOCUMENTED PRODUCTION TOPOLOGY (57 STUDIOS REFERENCE)
     +--------------------+              +--------------------+
     | ISP A (transit)    |              | ISP B (transit)    |
     | 100 Gbps BGP       |              | 100 Gbps BGP       |
     +---------+----------+              +---------+----------+
               |                                   |
               |                                   |
     +---------v----------+              +---------v----------+
     | Edge router 1      |<--iBGP------>| Edge router 2      |
     | AS 65001           |              | AS 65001           |
     +---------+----------+              +---------+----------+
               |                                   |
          +----+----+                         +----+----+
          |         |                         |         |
    +-----v---+ +---v-----+             +-----v---+ +---v-----+
    | Spine 1 |-| Spine 2 |-------------|         | |         |
    | 400 GbE | | 400 GbE |    MLAG     |         | |         |
    +----+----+ +----+----+             +---------+ +---------+
         |           |
         +--------------+-----------+--------------+
         |              |           |              |
    +----v---+     +----v---+  +----v---+     +----v---+
    | ToR 1A |-MLAG| ToR 1B |  | ToR 2A |-MLAG| ToR 2B |
    | 100GbE |     | 100GbE |  | 100GbE |     | 100GbE |
    +----+---+     +----+---+  +----+---+     +----+---+
         |              |           |              |
         +-----+--------+           +-----+--------+
               |                          |
         +-----v------+              +----v------+
         | Host 1     |              | Host 5    |
         | dual 25GbE |              | dual 25GbE|
         +------------+              +-----------+
         +-----v------+              +----v------+
         | Host 2     |              | Host 6    |
         | dual 25GbE |              | dual 25GbE|
         +------------+              +-----------+
    LEGEND: MLAG = multi-chassis link aggregation
            iBGP = internal BGP session between edge routers
            Each host has dual NICs to dual ToR switches
            Each ToR has dual uplinks to dual spine switches
            Each spine has dual uplinks to dual edge routers
            Each edge router has dual transit sessions

The ASCII topology summarizes the documented configuration. Every horizontal level is dual-homed, every vertical link is redundant, and the documented operational behavior of every layer is verified in quarterly fail-over exercises.
Cable management as an infrastructure concern
The documented cable management posture is part of the network infrastructure specification. A production Unturned host with documented redundancy at every layer accumulates a documented cable count of approximately 12 cables per server, 32 cables per aggregation switch, and 64 cables per spine switch. Without documented cable management, the cable plant becomes the dominant source of operational failures, and the dominant failure mode is human-induced disconnection during adjacent maintenance work.
The documented cable management posture in the 57 Studios production estate:
- Every cable is labeled at both ends with a documented label format (source-port to destination-port).
- Every cable is routed through documented cable management arms or overhead trays.
- Every patch panel is documented in the network master document.
- Cable removal requires documentation of the removal in the change management system.
- Cable additions require a labeled cable on day one, not a labeled cable scheduled for later.
Common mistake
Before the current cable management posture was implemented, the largest single documented source of avoidable outages in the 57 Studios production estate was the disconnection of an active production cable during adjacent maintenance work. The cable was unlabeled, the maintenance work was on an adjacent cable, and the technician disconnected the wrong cable. The documented cable management posture is the response, and the documented outcome since implementation is zero such events.
Operational discipline: the documented change management posture
Network changes in the 57 Studios production estate are documented and reviewed before they are executed. The documented change management posture:
- Every network change is documented in advance with a proposed configuration diff.
- Every network change has a documented rollback plan.
- Every network change is reviewed by at least one engineer who did not author the change.
- Every network change is executed during a documented maintenance window.
- Every network change is verified post-execution against documented success criteria.
- Every network change is recorded in the network master document.
The documented change management posture is operationally consequential. The documented experience is that approximately one in fifteen proposed changes is rejected in review for a documented reason (typically a missed dependency, an incorrect rollback step, or a misunderstanding of a downstream effect). The documented benefit of the review is the avoidance of those one-in-fifteen changes reaching production.
The sequence diagram documents the change management posture. The documented practice is that no production network change is executed outside this sequence.
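The pre-execution checks in the list above lend themselves to a simple gate. The Python sketch below is a minimal illustration, assuming a hypothetical `NetworkChange` record; the field names and blocker messages are inventions for this example and do not describe the 57 Studios change management system.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkChange:
    author: str
    config_diff: str
    rollback_plan: str
    maintenance_window: str
    reviewers: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)


def review_blockers(change: NetworkChange) -> list[str]:
    """Return the reasons a change may not be executed yet (empty list = clear)."""
    blockers = []
    if not change.config_diff.strip():
        blockers.append("missing proposed configuration diff")
    if not change.rollback_plan.strip():
        blockers.append("missing rollback plan")
    if not any(reviewer != change.author for reviewer in change.reviewers):
        blockers.append("needs a reviewer who did not author the change")
    if not change.maintenance_window.strip():
        blockers.append("no maintenance window assigned")
    if not change.success_criteria:
        blockers.append("no post-execution success criteria")
    return blockers


# Hypothetical change: self-reviewed and without a rollback plan.
change = NetworkChange(
    author="alice",
    config_diff="+ neighbor 198.51.100.1 timers 3 9",
    rollback_plan="",
    maintenance_window="Sunday 02:00-04:00 UTC",
    reviewers=["alice"],
    success_criteria=["BGP session Established on both providers"],
)
print(review_blockers(change))
# ['missing rollback plan', 'needs a reviewer who did not author the change']
```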
VLAN to subnet mapping reference
| VLAN | Subnet (IPv4) | Subnet (IPv6) | Documented gateway |
|---|---|---|---|
| VLAN 10 | 10.10.0.0/16 | fd00:10:10::/64 | 10.10.0.1 |
| VLAN 20 | 10.20.0.0/16 | fd00:10:20::/64 | 10.20.0.1 |
| VLAN 30 | 10.30.0.0/16 | fd00:10:30::/64 | 10.30.0.1 |
| VLAN 40 | 10.40.0.0/16 | fd00:10:40::/64 | 10.40.0.1 |
| VLAN 50 | 10.50.0.0/16 | fd00:10:50::/64 | 10.50.0.1 |
| VLAN 60 | 10.60.0.0/16 | fd00:10:60::/64 | 10.60.0.1 |
| VLAN 70 | 192.0.2.0/24 (public, anonymized) | 2001:db8::/64 (public, anonymized) | 192.0.2.1 |
| VLAN 80 | 10.80.0.0/16 | fd00:10:80::/64 | 10.80.0.1 |
The subnet assignments shown are the documented internal addressing scheme; the public addressing in VLAN 70 is anonymized for documentation purposes and is replaced with the operator's allocated public prefix in the production deployment. The documented gateway addresses follow the documented convention that the .1 host of every subnet is the gateway.
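The .1 gateway convention and the absence of overlap between VLAN subnets can be checked mechanically with Python's standard `ipaddress` module. This is a minimal sketch: the dictionary mirrors the IPv4 column of the table above, and the function names are hypothetical.

```python
import ipaddress

# IPv4 subnets from the mapping table, keyed by VLAN ID.
VLAN_SUBNETS = {
    10: "10.10.0.0/16",
    20: "10.20.0.0/16",
    30: "10.30.0.0/16",
    40: "10.40.0.0/16",
    50: "10.50.0.0/16",
    60: "10.60.0.0/16",
    70: "192.0.2.0/24",   # anonymized public range
    80: "10.80.0.0/16",
}


def gateway_for(cidr: str) -> str:
    """Return the first host address of the subnet (the .1 convention)."""
    network = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    return str(network.network_address + 1)


def overlapping_vlans(mapping: dict[int, str]) -> list[tuple[int, int]]:
    """Return every pair of VLAN IDs whose subnets overlap."""
    nets = {vlan: ipaddress.ip_network(cidr) for vlan, cidr in mapping.items()}
    ids = sorted(nets)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if nets[a].overlaps(nets[b])]


print(gateway_for(VLAN_SUBNETS[10]))    # 10.10.0.1
print(gateway_for(VLAN_SUBNETS[70]))    # 192.0.2.1
print(overlapping_vlans(VLAN_SUBNETS))  # []
```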

Pro tip
Maintain the VLAN to subnet mapping in a version-controlled file alongside your switch configuration. The documented practice is that the mapping file is the source of truth for both the network configuration and the documented operational runbook. Drift between the two is the documented source of misconfiguration that surfaces during change windows.
Spanning tree, MLAG, and the documented avoidance of layer-2 loops
The documented topology uses MLAG (multi-chassis link aggregation) on Arista platforms and the equivalent vPC (virtual port channel) on Cisco Nexus platforms. The documented effect of MLAG is that two physically separate switches present as a single logical switch from the perspective of the connected server, which allows the connected server's NIC bond to load-balance across both switches without triggering spanning tree blocking.
The documented configuration disables spanning tree on the MLAG-bonded links and configures spanning tree as a documented loop-prevention backstop on all other links. The documented practice is that spanning tree is configured and never observed to converge in production; if spanning tree converges, the documented operational response is to investigate the underlying topology change that caused the convergence.
Best practice
Configure spanning tree as a backstop, not as a primary loop-prevention mechanism. The documented primary loop-prevention mechanism is the topology itself: every link is documented, every link is intentional, and every link is verified to be loop-free at the time of provisioning. Spanning tree is the documented safety net.
Network monitoring and observability
The documented monitoring posture for the network infrastructure includes per-port traffic counters, per-port error counters, BGP session state, MLAG peer state, and optical module diagnostic monitoring. The documented monitoring stack in the 57 Studios production estate uses a streaming telemetry pipeline (gNMI on the supported platforms) feeding a time-series database, with documented dashboards for each monitoring class.
| Monitoring class | Documented metric | Documented alert threshold |
|---|---|---|
| Per-port bandwidth | Bits per second, packets per second | Sustained > 85% of port capacity for > 5 minutes |
| Per-port errors | CRC errors, frame errors, drops | > 0.01% of total frames |
| BGP session state | Established / Idle / Active | Any deviation from Established |
| MLAG peer state | Peer reachable / Peer unreachable | Peer unreachable for > 30 seconds |
| Optical diagnostics | Tx power, Rx power, temperature | Rx power > 3 dB below documented baseline |
| Aggregate throughput | Inter-rack bandwidth | Sustained > 80% of inter-rack capacity |
The documented monitoring posture produces approximately 12-18 actionable alerts per month in the 57 Studios production estate. The documented disposition of those alerts is that approximately 70% are resolved within 30 minutes of the alert, approximately 20% are documented as expected (maintenance windows, scheduled changes), and approximately 10% require deeper investigation.
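Three of the thresholds in the table can be sketched as plain predicate functions. This is an illustrative Python example, not the production alerting rules: the function names are hypothetical, and the bandwidth check assumes evenly spaced utilization samples (a 60-second interval in the usage below).

```python
def port_error_alert(error_frames: int, total_frames: int) -> bool:
    """Alert when errors exceed 0.01% of total frames."""
    if total_frames == 0:
        return False
    return error_frames / total_frames > 0.0001


def rx_power_alert(current_dbm: float, baseline_dbm: float) -> bool:
    """Alert when Rx power is more than 3 dB below the recorded baseline."""
    return baseline_dbm - current_dbm > 3.0


def bandwidth_alert(samples_bps: list[float], capacity_bps: float,
                    interval_s: float, threshold: float = 0.85,
                    sustain_s: float = 300.0) -> bool:
    """Alert when utilization stays above the threshold for longer than sustain_s."""
    run_s = 0.0
    for sample in samples_bps:
        if sample / capacity_bps > threshold:
            run_s += interval_s
            if run_s > sustain_s:
                return True
        else:
            run_s = 0.0  # any sample at or below the threshold resets the run
    return False


print(port_error_alert(2, 10_000))                  # True
print(rx_power_alert(-6.5, -3.0))                   # True
print(bandwidth_alert([90e9] * 6, 100e9, 60))       # True
print(bandwidth_alert([90e9] * 5, 100e9, 60))       # False
```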
Frequently asked questions
Is 10 GbE adequate for a production Unturned host?
No. The documented minimum specification for the aggregation tier of a production Unturned host is 100 GbE. The documented reasoning is the bandwidth profile of a modern Unturned instance, aggregated across multiple instances per rack and layered with monitoring, backup, and replication traffic. The 10 GbE class is documented as suitable for out-of-band management traffic only.
Is BGP a requirement for self-hosting Unturned?
A production self-hosting deployment with the documented redundancy posture requires BGP. The documented configuration runs BGP between two edge routers and two upstream commercial transit providers, which provides the documented failover behavior on transit provider failure. A deployment without BGP cannot provide the documented failover behavior and operates outside the documented professional standard.
Can a single switch be acceptable for a small production deployment?
The documented professional standard is dual switches at every aggregation tier. A single-switch deployment cannot provide the documented failover behavior on switch failure, and the documented operational experience is that switch failures (firmware bugs, optical module failures, power-supply failures) are recurring events at a documented professional scale. Single-switch deployments are outside the documented specification.
What ASN should I use for a new deployment?
A new deployment registers an autonomous system number with the regional internet registry that covers your geography (ARIN, RIPE NCC, APNIC, LACNIC, or AFRINIC). The documented practice is to register a 32-bit ASN, which provides a much larger pool of available numbers than the 16-bit ASN pool. The documented turnaround for ASN registration is typically 5-15 business days depending on the registry.
What IP prefix size is documented for a production deployment?
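The documented minimum prefix size is a /22 IPv4 block (1024 addresses) and a /44 IPv6 block. The documented reasoning is routability with headroom: many transit providers filter IPv4 prefixes longer than /24 and IPv6 prefixes longer than /48, and a filtered prefix may not be propagated to the global BGP table, which defeats the documented failover behavior. A /22 and a /44 sit inside those limits and can still be subdivided into /24 and /48 announcements when traffic engineering requires it.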
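The address arithmetic behind these sizes can be checked with Python's standard `ipaddress` module. This is a minimal sketch; `10.0.0.0/22` and `2001:db8::/44` are placeholder prefixes drawn from private and documentation space, not real allocations, and the helper names are hypothetical.

```python
import ipaddress


def address_count(cidr: str) -> int:
    """Number of addresses in the prefix."""
    return ipaddress.ip_network(cidr).num_addresses


def split(cidr: str, new_prefix: int) -> list[str]:
    """Subdivide a prefix into equal blocks of the given prefix length."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]


print(address_count("10.0.0.0/22"))      # 1024
print(split("10.0.0.0/22", 24))
# ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24', '10.0.3.0/24']
print(len(split("2001:db8::/44", 48)))   # 16
```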
How long does BGP failover take?
The documented worst-case BGP failover time is 9 seconds, set by the BGP hold timer in the documented configuration. The observed failover time in the 57 Studios production estate is typically under 5 seconds and frequently under 2 seconds, depending on the upstream provider's propagation behavior. Players in active sessions during a documented failover typically observe no service degradation.
Can MLAG be operated across switches from different vendors?
Multi-vendor MLAG is documented as feasible in a small number of configurations and as operationally complex; the configuration is outside the recommended posture. The documented practice in the 57 Studios production estate is single-vendor MLAG within a given tier, with the documented option to operate different vendors at different tiers (for example, vendor A at the aggregation tier and vendor B at the edge).
What is the documented latency from host to host within a single rack?
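The documented per-hop figures put a host-to-host path within a single rack (host NIC, top-of-rack switch, host NIC) in the sub-microsecond range. The approximately 1.25 microsecond figure applies to the longer host-to-spine-to-host path between racks on cut-through switches. The documented latency contribution from the switch itself is approximately 350 nanoseconds per hop on a modern 100 GbE switch in cut-through mode.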
How often should switch firmware be updated?
The documented firmware review cadence in the 57 Studios production estate is quarterly. The documented firmware update window is annual, with documented exceptions for security-critical updates that are applied outside the annual window. The documented practice is to maintain a single firmware version across the production estate to avoid the documented operational complexity of firmware drift.
What is the documented configuration for spanning tree?
Spanning tree is configured as a documented backstop and is disabled on MLAG-bonded links. The documented primary loop-prevention mechanism is the topology itself, and spanning tree is the documented safety net for unintended layer-2 loops. If spanning tree converges in production, the documented operational response is to investigate the underlying topology change.
How are optical modules documented in the production estate?
Every optical module is documented in the network master document at the time of installation. The documented record includes the module part number, the documented installation date, the documented installed switch and port, and the documented baseline Tx power and Rx power. The documented practice is to compare current Rx power against the documented baseline at every quarterly review and to replace modules with degraded Rx power before they fail in service.
What is the documented disposition of alerts from the monitoring stack?
The documented disposition rate in the 57 Studios production estate is approximately 70% resolved within 30 minutes, approximately 20% documented as expected (maintenance windows, scheduled changes), and approximately 10% requiring deeper investigation. The documented practice is to review the deeper-investigation alerts in the weekly infrastructure review.
Appendix A: Documented hardware reference list
The following table documents the specific switch models in the reference 57 Studios production estate. The reference list is provided for documentation purposes; specific model selections are operator-dependent and should be chosen to meet the documented operational standard.
| Tier | Vendor | Model | Port density | Documented firmware baseline |
|---|---|---|---|---|
| Spine | Arista | 7280R3-32D4 | 32 x 400 GbE + 4 x 400 GbE breakout | EOS 4.31.x |
| Spine (alt) | Cisco | Nexus 9332D-GX2B | 32 x 400 GbE | NX-OS 10.3.x |
| Aggregation | Arista | 7050X3 | 32 x 100 GbE + 2 x 400 GbE | EOS 4.31.x |
| Aggregation (alt) | Cisco | Nexus 9336C-FX2-E | 36 x 100 GbE | NX-OS 10.3.x |
| Top of rack | Arista | 7050X3 | 32 x 100 GbE | EOS 4.31.x |
| Top of rack (alt) | Juniper | QFX5120-32C | 32 x 100 GbE | Junos 22.4R3 |
| Edge router | Juniper | MX204 | 4 x 100 GbE + 8 x 10 GbE | Junos 22.4R3 |
| Edge router (alt) | Cisco | NCS 540-24Z8Q2C-SYS | 24 x 10 GbE + 8 x 25 GbE + 2 x 100 GbE | IOS XR 7.10.x |
| Out-of-band | Cisco | Catalyst 9300-48T | 48 x 1 GbE | IOS XE 17.12.x |
The reference list is documented as the operational baseline; the specific model selections in any given deployment are documented in the deployment's network master document and are updated as hardware refreshes occur. The documented refresh cadence is approximately five years per tier.
Appendix B: Documented BGP configuration template
The following is a BGP configuration template in Arista EOS syntax. It sets the keepalive and hold timers, route advertisement, and route filtering. Operators adapt the template to their own ASN, prefixes, and upstream provider configuration.
    router bgp 65001
       router-id 10.0.0.1
       timers bgp 3 9
       no bgp default ipv4-unicast
       bgp log-neighbor-changes
       neighbor 192.0.2.1 remote-as 64512
       neighbor 192.0.2.1 description ISP_A
       neighbor 192.0.2.1 timers 3 9
       neighbor 192.0.2.1 maximum-routes 1000000
       neighbor 198.51.100.1 remote-as 64513
       neighbor 198.51.100.1 description ISP_B
       neighbor 198.51.100.1 timers 3 9
       neighbor 198.51.100.1 maximum-routes 1000000
       neighbor 10.0.0.2 remote-as 65001
       neighbor 10.0.0.2 description EDGE_2_IBGP
       neighbor 10.0.0.2 update-source Loopback0
       neighbor 10.0.0.2 next-hop-self
       address-family ipv4
          neighbor 192.0.2.1 activate
          neighbor 192.0.2.1 prefix-list ALLOW_OUTBOUND out
          neighbor 192.0.2.1 prefix-list FULL_TABLE in
          neighbor 198.51.100.1 activate
          neighbor 198.51.100.1 prefix-list ALLOW_OUTBOUND out
          neighbor 198.51.100.1 prefix-list FULL_TABLE in
          neighbor 10.0.0.2 activate
          network 192.0.2.0/22
       address-family ipv6
          neighbor 192.0.2.1 activate
          neighbor 198.51.100.1 activate
          neighbor 10.0.0.2 activate
          network 2001:db8::/44

The template sets the BGP timers (3-second keepalive, 9-second hold), the neighbor description convention, the maximum-routes limit (1 million routes per neighbor), the iBGP session between the two edge routers, and the prefix advertisement (the operator's /22 IPv4 block and /44 IPv6 block). The prefix lists ALLOW_OUTBOUND and FULL_TABLE are referenced but not defined in the template and must be defined elsewhere in the configuration. All addresses and AS numbers are placeholders; 192.0.2.0 does not fall on a /22 boundary, so substitute the operator's actual, aligned /22. The full IPv4 BGP table has grown close to 1 million routes, so check the maximum-routes value against the current table size before relying on it for headroom.
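When adapting the template, it can help to sanity-check the substituted values before pushing them to a router. The sketch below uses Python's standard `ipaddress` module; the function names are hypothetical, and the two checks (hold timer equal to three keepalive intervals, advertised prefix free of host bits) are only the ones implied by the template's values, not a complete validation.

```python
import ipaddress


def hold_matches_keepalive(keepalive_s, hold_s):
    """True when the hold timer is three keepalive intervals, as in the template's 3/9 values."""
    return hold_s == 3 * keepalive_s


def is_aligned(prefix):
    """True when the prefix has no host bits set, i.e. it sits on its own boundary."""
    try:
        ipaddress.ip_network(prefix, strict=True)
        return True
    except ValueError:
        return False


print(hold_matches_keepalive(3, 9))
# The template's IPv4 placeholder has host bits set for a /22, so this check flags it.
print(is_aligned("192.0.2.0/22"))
print(is_aligned("2001:db8::/44"))
```

Running a check like this against the real allocation catches a mistyped prefix length before it becomes a rejected or unintended `network` statement.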
Appendix C: Documented quarterly fail-over exercise procedure
The quarterly fail-over exercise verifies the production estate's redundancy posture. The exercise is logged in the change management system in advance and is executed during a scheduled maintenance window against predefined success criteria.
The documented procedure:
- Document the exercise in the change management system at least 14 calendar days in advance.
- Notify the operations team and the on-call rotation.
- Open a documented monitoring window with live dashboards for the affected layer.
- Take one switch of the targeted MLAG pair offline by administratively shutting down the inter-switch peer link.
- Verify that all traffic shifts to the surviving switch within the documented failover budget (typically under 1 second for MLAG, under 9 seconds for BGP).
- Verify that no connected Unturned player session is dropped.
- Verify that no replication transfer is dropped.
- Verify that no monitoring data point is lost.
- Restore the inter-switch peer link.
- Verify that traffic rebalances within the documented reconvergence budget.
- Document the exercise outcome in the network master document.
- Review the documented outcome in the next weekly infrastructure review.
The exercise produces evidence that the production estate's redundancy posture works as documented. The frequency is quarterly because an annual cadence is insufficient to surface drift, while a monthly cadence consumes operational bandwidth without a proportional gain in confidence.
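The success criteria in the procedure can be expressed as a small check that turns exercise measurements into a pass or fail result. The sketch below is illustrative: the budgets (under 1 second for MLAG, under 9 seconds for BGP) come from the procedure, while the function name and the shape of the measurements are assumptions.

```python
# Budgets from the procedure above: under 1 s for MLAG, under 9 s for BGP.
FAILOVER_BUDGET_S = {"mlag": 1.0, "bgp": 9.0}


def failed_criteria(mechanism, failover_s, sessions_dropped, transfers_dropped, datapoints_lost):
    """Return the success criteria that were not met; an empty list means the exercise passed."""
    failures = []
    budget = FAILOVER_BUDGET_S[mechanism]
    if failover_s >= budget:
        failures.append(f"failover took {failover_s}s, budget is under {budget}s")
    if sessions_dropped:
        failures.append(f"{sessions_dropped} player session(s) dropped")
    if transfers_dropped:
        failures.append(f"{transfers_dropped} replication transfer(s) dropped")
    if datapoints_lost:
        failures.append(f"{datapoints_lost} monitoring data point(s) lost")
    return failures


print(failed_criteria("mlag", 0.4, 0, 0, 0))
print(failed_criteria("bgp", 9.5, 0, 1, 0))
```

Recording the returned list with the exercise outcome gives the weekly infrastructure review a concrete statement of which criterion failed, rather than a bare pass or fail.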
Best practice
Schedule the quarterly fail-over exercise at the same time each quarter (for example, the first Tuesday of the second month of each quarter at 02:00 local time). A fixed schedule lets the operations team rehearse the procedure on a predictable cadence, so that the exercise becomes muscle memory and part of the operational standard.
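The example schedule can be computed ahead of time so the dates are entered into the change management system well before the 14-day notice deadline. The sketch below assumes calendar quarters starting in January (so the second months are February, May, August, and November) and uses naive local-time datetimes; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def exercise_dates(year, hour=2):
    """First Tuesday of the second month of each calendar quarter, at the given local hour."""
    dates = []
    for month in (2, 5, 8, 11):
        first = datetime(year, month, 1, hour)
        # weekday(): Monday is 0, so Tuesday is 1.
        offset = (1 - first.weekday()) % 7
        dates.append(first + timedelta(days=offset))
    return dates


for when in exercise_dates(2025):
    print(when.strftime("%A %Y-%m-%d %H:%M"))
```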
Documented multicast and broadcast considerations
The network infrastructure must also account for multicast and broadcast traffic. Although Unturned server-to-player traffic is unicast UDP, the monitoring infrastructure, the inter-rack replication protocols, and the service-discovery protocols all include multicast and broadcast components.
The multicast posture in the 57 Studios production estate uses IGMP snooping on every aggregation switch and PIM Sparse Mode on every spine switch. IGMP snooping prevents multicast flooding within a VLAN, and PIM Sparse Mode handles inter-VLAN multicast routing where required. As a result, multicast traffic is delivered only to subscribed listeners, and broadcast traffic is contained within its VLAN of origin.
| Protocol | Traffic class | VLAN scope | Snooping or suppression |
|---|---|---|---|
| IGMPv3 | Multicast group membership | Per VLAN | IGMP snooping enabled |
| PIM Sparse Mode | Inter-VLAN multicast routing | Spine | PIM enabled on designated interfaces |
| MLD | IPv6 multicast group membership | Per VLAN | MLD snooping enabled |
| ARP | IPv4 address resolution | Per VLAN | ARP suppression on supported platforms |
| ND | IPv6 neighbor discovery | Per VLAN | ND suppression on supported platforms |
This multicast and broadcast configuration is part of the network infrastructure baseline. Operational evidence shows that uncontained multicast and broadcast traffic degrades the performance of adjacent traffic classes; containment eliminates that degradation.
Best practice
Enable IGMP snooping and MLD snooping by default on every switch. In operation, snooping adds no complexity in steady state and prevents multicast flooding under load.
Documented QoS configuration
The Quality of Service (QoS) configuration enforces traffic-class prioritization across the network infrastructure. It is defined per VLAN and per traffic class within each VLAN.
The QoS classes:
- EF (Expedited Forwarding): VLAN 10, game-server data plane. EF-marked packets receive strict-priority queuing at every switch hop.
- AF41 (Assured Forwarding): VLAN 20, game-server management plane. AF41-marked packets receive preferential queuing.
- AF31: VLAN 30, monitoring traffic. AF31-marked packets receive preferential queuing relative to bulk traffic.
- AF21: VLAN 40, storage replication. AF21-marked packets receive preferential queuing within the replication bandwidth allocation.
- AF11: VLAN 50, backup and archive. AF11-marked packets receive best-effort queuing within the backup bandwidth allocation.
- CS1 (Scavenger): opportunistic traffic. CS1-marked packets receive lowest-priority queuing.
| QoS class | DSCP marking | Queue priority | Bandwidth allocation |
|---|---|---|---|
| EF | 46 | Strict priority | Per VLAN 10 |
| AF41 | 34 | Preferential (Q4) | 10% of port |
| AF31 | 26 | Preferential (Q3) | 10% of port |
| AF21 | 18 | Preferential (Q2) | 25% of port |
| AF11 | 10 | Best effort (Q1) | Remainder |
| CS1 | 8 | Scavenger (Q0) | Opportunistic |
This QoS configuration is the baseline; variation is driven by the per-deployment traffic profile. The QoS configuration is recorded in the network master document and version-controlled alongside the switch configuration.
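The DSCP values in the table follow the standard code point arithmetic: an AFxy class maps to 8x + 2y, a CSn class maps to 8n, and EF is fixed at 46. The sketch below recomputes the table's values from the class names and shows the corresponding IP header byte (DSCP occupies the upper six bits). The VLAN-to-class mapping is taken from the list above; the function names are hypothetical.

```python
# VLAN-to-class mapping from the QoS class list above.
VLAN_CLASS = {10: "EF", 20: "AF41", 30: "AF31", 40: "AF21", 50: "AF11"}


def dscp_value(name):
    """Return the DSCP code point for EF, AFxy, or CSn class names."""
    if name == "EF":
        return 46
    if name.startswith("CS") and len(name) == 3:
        return 8 * int(name[2])
    if name.startswith("AF") and len(name) == 4:
        af_class, drop_precedence = int(name[2]), int(name[3])
        return 8 * af_class + 2 * drop_precedence
    raise ValueError(f"unknown class name: {name}")


def tos_byte(dscp):
    """DSCP sits in the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return dscp << 2


for vlan, name in VLAN_CLASS.items():
    dscp = dscp_value(name)
    print(f"VLAN {vlan}: {name} -> DSCP {dscp}, header byte {tos_byte(dscp):#04x}")
```

The header byte is what appears in packet captures, so having both forms to hand makes it easier to confirm that markings survive each switch hop.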
The QoS flowchart shows the mapping from traffic class to output queue. The mapping reflects operational evidence that strict-priority queuing for game-server traffic yields sub-microsecond switch-hop latency for that traffic class, regardless of the background load on the output port.
Documented anti-DDoS posture
The network infrastructure includes anti-DDoS protection at the edge. The primary mechanisms are BGP Flowspec with the upstream transit providers and on-premises mitigation capacity for attacks within the mitigation envelope.
The anti-DDoS posture:
- BGP Flowspec advertisement to upstream transit providers for surgical blocking of attack traffic.
- On-premises scrubbing capacity for attacks within the mitigation envelope (approximately 100 Gbps of mitigation capacity per edge router).
- RTBH (remotely triggered black hole) advertisement to upstream transit providers for black-holing attacked destinations.
- Anycast advertisement of public-facing services for geographic distribution of attack traffic.
- Integration with commercial DDoS mitigation services for attacks above the on-premises capacity.
| Anti-DDoS mechanism | Activation trigger | Mitigation envelope |
|---|---|---|
| BGP Flowspec | Attack signature match | Per-provider capacity |
| RTBH | Destination-only attack | Full upstream capacity |
| On-premises scrubbing | Attack within envelope | 100 Gbps per edge router |
| Anycast distribution | Attack on public services | Per anycast site |
| Commercial mitigation | Attack above envelope | Per service contract |
This anti-DDoS posture is the baseline. Attacks on Unturned-hosting infrastructure are a recurring event at professional scale, and this mitigation posture maintains continuity of service across attack events.
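The activation triggers in the table can be read as a simple selection rule. The sketch below is one possible encoding, not the estate's actual runbook logic: the article does not specify how the mechanisms combine or in what order they are applied, so the function name, the boolean inputs, and the decision to return every mechanism whose trigger matches are all assumptions. Only the 100 Gbps per-edge-router envelope comes from the text.

```python
# On-premises envelope per edge router, from the table above.
ON_PREM_ENVELOPE_GBPS = 100


def applicable_mitigations(attack_gbps, signature_match, destination_only,
                           public_service, edge_routers=1):
    """Return every mechanism whose activation trigger matches the attack description."""
    envelope = ON_PREM_ENVELOPE_GBPS * edge_routers
    chosen = []
    if signature_match:
        chosen.append("BGP Flowspec")
    if destination_only:
        chosen.append("RTBH")
    if public_service:
        chosen.append("Anycast distribution")
    # Exactly one of the two volume-based mechanisms applies.
    if attack_gbps <= envelope:
        chosen.append("On-premises scrubbing")
    else:
        chosen.append("Commercial mitigation")
    return chosen


print(applicable_mitigations(80, True, False, False))
print(applicable_mitigations(250, False, True, True, edge_routers=2))
```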
Common mistake
A common mistake is to assume that upstream transit providers will automatically mitigate attacks on downstream customer infrastructure. In practice, upstream mitigation depends on the service tier and is latency-sensitive. Agree the mitigation expectations in writing with every upstream provider, and verify the mitigation behavior during the quarterly exercises.
Documented operator handoff procedure
The operator handoff procedure ensures continuity of operational knowledge across operator changes. It specifies the inputs that a departing operator provides to an incoming operator.
The inputs:
- Topology reference.
- Configuration reference.
- Monitoring stack credentials and dashboards.
- Change management system access.
- Incident response history (incidents in the prior 12 months).
- Capacity-planning model state (current headroom per tier).
- Quarterly fail-over exercise history (prior 4 exercises).
- Vendor support contract references.
- Upstream transit provider contact references.
- Escalation contacts.
This procedure is the baseline for every operator change. Undocumented handoffs produce operational gaps that surface at inopportune times; the procedure closes those gaps.
Pro tip
Document the operator handoff at every operator change, including temporary changes such as vacation coverage. Vacation-coverage handoffs are a common source of operational gaps, and the procedure prevents them.
Appendix D: Documented capacity-planning model
The capacity-planning model for the network infrastructure drives hardware refresh, capacity addition, and the upgrade cadence. The model is a quarterly review of six inputs: aggregate inbound bandwidth at the edge, aggregate outbound bandwidth at the edge, inter-rack aggregate bandwidth, replication aggregate bandwidth, backup aggregate bandwidth, and the 12-month forward trajectory.
| Input | Measurement source | Review cadence |
|---|---|---|
| Aggregate inbound bandwidth | Per-edge-router BGP session telemetry | Quarterly |
| Aggregate outbound bandwidth | Per-edge-router BGP session telemetry | Quarterly |
| Inter-rack aggregate bandwidth | Per-spine-link telemetry | Quarterly |
| Replication aggregate bandwidth | VLAN 40 per-port telemetry | Quarterly |
| Backup aggregate bandwidth | VLAN 50 per-port telemetry | Quarterly |
| 12-month forward trajectory | Aggregate of all of the above, projected with the growth model | Quarterly |
The model produces a capacity headroom value per tier. The operational target is at least 40 percent headroom at every tier in steady state. Any tier that falls below 30 percent headroom triggers a capacity-addition project, and any tier that falls below 20 percent headroom triggers an emergency capacity-addition project with expedited procurement.
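The thresholds translate directly into a classification step for the quarterly review. In the sketch below, the 40, 30, and 20 percent figures come from the text; the function names are hypothetical, and measuring headroom against peak rather than average load is an assumption, since the article does not say which is used. The article also states no action for a tier between 30 and 40 percent, so that band is only labeled as below target.

```python
def headroom_percent(capacity_gbps, peak_gbps):
    """Unused share of a tier's capacity, as a percentage (peak-based: an assumption)."""
    return 100.0 * (capacity_gbps - peak_gbps) / capacity_gbps


def capacity_action(headroom):
    """Map a headroom percentage to the action the thresholds call for."""
    if headroom < 20:
        return "emergency capacity-addition project"
    if headroom < 30:
        return "capacity-addition project"
    if headroom < 40:
        return "below target, no trigger stated"
    return "on target"


# Illustrative tier: 400 Gbps of capacity with a 300 Gbps peak leaves 25% headroom.
tier_headroom = headroom_percent(400, 300)
print(f"{tier_headroom:.0f}% headroom: {capacity_action(tier_headroom)}")
```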
The growth model in the 57 Studios production estate is exponential, with a doubling period of approximately 14 months. The doubling is driven by growth of the player base, growth of the per-instance asset footprint, and growth of the inter-rack replication footprint. The capacity-planning model accounts for all three contributions.
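With exponential growth, the time for a tier to move from one headroom level to another follows from the doubling period: if utilization grows as u(t) = u0 * 2^(t / 14), then the time to reach a target utilization is 14 * log2(target / u0) months. The sketch below applies that formula; it assumes capacity stays fixed and that the 14-month doubling applies uniformly to the tier, and the function name is hypothetical.

```python
import math

DOUBLING_MONTHS = 14  # approximate doubling period stated in the article


def months_until_headroom(current_headroom, threshold, doubling_months=DOUBLING_MONTHS):
    """Months until headroom falls to `threshold`; both are fractions of capacity (0.40 = 40%)."""
    current_util = 1.0 - current_headroom
    target_util = 1.0 - threshold
    if current_util <= 0:
        raise ValueError("current utilization must be positive")
    if target_util <= current_util:
        return 0.0  # already at or below the threshold
    return doubling_months * math.log2(target_util / current_util)


for threshold in (0.30, 0.20):
    months = months_until_headroom(0.40, threshold)
    print(f"40% -> {threshold:.0%} headroom: {months:.1f} months")
```

Under these assumptions, a tier sitting exactly at the 40 percent target reaches the 30 percent trigger in roughly three months and the 20 percent trigger in under six, which is consistent with reviewing no less often than quarterly.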
The capacity-planning flowchart shows the decision tree that the quarterly capacity review executes. The quarterly cadence is chosen because it surfaces growth trends with confidence; a monthly cadence is excessive for the growth rate, and an annual cadence is insufficient to avoid emergency capacity additions.
Appendix E: Documented IPv6 specification
The network infrastructure specification includes IPv6 at every layer. IPv6 is specified alongside IPv4, and the operational posture is dual-stack across every VLAN.
The IPv6 specification:
- /44 IPv6 prefix from the regional internet registry.
- /64 per VLAN.
- IPv6 next-hop on every BGP session.
- IPv6 transit from every upstream transit provider.
- IPv6 monitoring in every telemetry pipeline.
- IPv6 firewall policy that mirrors the IPv4 firewall policy.
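The /44 and /64 figures in the list above determine the address plan's capacity: a /44 contains 2^(64 - 44) = 1,048,576 possible /64 subnets. The sketch below carves per-VLAN /64s out of the documentation prefix used in the BGP template. Assigning the first /64s in VLAN order is a hypothetical scheme chosen for illustration; the article does not describe the actual numbering plan.

```python
import ipaddress

# Documentation prefix from the BGP template; substitute the real allocation.
SITE = ipaddress.ip_network("2001:db8::/44")
# VLAN IDs from the QoS section.
VLANS = [10, 20, 30, 40, 50]


def vlan_subnets(site, vlans):
    """Assign the first len(vlans) /64s of the site prefix to the VLANs, in order."""
    # subnets() is a generator, so zip() stops after the last VLAN.
    return dict(zip(vlans, site.subnets(new_prefix=64)))


print(f"/64s available in {SITE}: {2 ** (64 - SITE.prefixlen)}")
for vlan, subnet in vlan_subnets(SITE, VLANS).items():
    print(f"VLAN {vlan}: {subnet}")
```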
In the 57 Studios production estate, IPv6 traffic is approximately 25-40 percent of aggregate traffic, with seasonal variation. The variation is driven by the mix of player geographies, the IPv6 adoption rate of upstream consumer ISPs, and IPv6 adoption by mobile carriers.
Pro tip
Specify IPv6 from day one. Retroactively adding IPv6 to an IPv4-only deployment is substantially more operationally complex than specifying it at the outset. The practice is dual-stack from initial provisioning, with the firewall policy mirrored across both address families.
Appendix F: Documented physical layer reference
The physical layer for the network infrastructure covers cabinet, patch panel, cable, and labeling specifications.
| Physical layer component | Specification | Notes |
|---|---|---|
| Server cabinet | 42U or 48U, 800 mm wide, 1200 mm deep | Accommodates vertical cable management |
| Top-of-rack switch position | Top 2U | For cable management |
| Patch panel | Angled 24-port LC duplex | Angle avoids tight cable bends |
| Patch cable | LC-LC OS2 SMF, 2-3 m runs | Within rack |
| Trunk cable | MPO-24 OS2 SMF | Between racks |
| Cable label | Thermal-transfer printed wrap-around label | Both ends of every cable |
| Cabinet PDU | Per Power and UPS Configuration | Dual-feed |
These physical layer specifications are the baseline; variation is driven by the physical constraints of the site. The physical layer is documented at the time of build and updated as the build evolves.
Appendix G: Documented sample failover timeline
The sample failover timeline shows the sequence of events during a BGP failover, as observed during a quarterly fail-over exercise.
| T+ | Event | Observation |
|---|---|---|
| 0.0s | ISP A link administratively shut down | Edge router 1 loses transit to ISP A |
| 0.0s | Edge router 1 BGP session to ISP A enters Idle state | BGP session state change logged |
| 0.0s | Edge router 1 withdraws ISP A routes from local RIB | Routes withdrawn |
| 0.0s | Edge router 1 propagates withdrawal to edge router 2 via iBGP | iBGP update sent |
| 0.1s | Edge router 2 receives withdrawal | iBGP update received |
| 0.1s | Edge router 2 selects ISP B route as new best path | Best path updated |
| 0.1s | Forwarding table updated on edge router 2 | FIB updated |
| 0.2s | Forwarding table updated on edge router 1 | FIB updated |
| 0.3s | Inbound traffic begins arriving via ISP B exclusively | Per-port telemetry confirms |
| 0.3s | Outbound traffic begins routing via ISP B exclusively | Per-port telemetry confirms |
| 0.5s | Unturned player session continuity verified | Zero session loss |
| 1.0s | Inter-rack replication continuity verified | Zero replication interruption |
| 5.0s | End-to-end failover complete and verified | All success criteria met |
The sample timeline records the observed behavior of the production estate during a quarterly fail-over exercise. The sub-second failover in this exercise follows from the link being administratively shut down, which drops the BGP session immediately instead of waiting for the 9-second hold timer to expire. The iBGP topology and the forwarding-table propagation behavior of the edge router platform account for the remaining convergence time.
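The timeline covers a failure that the router sees as a link-down event. A failure that leaves the link up, such as a fault deeper inside the provider's network, is detected only when the BGP hold timer expires. The sketch below estimates that detection window under the assumption that keepalives arrive on schedule until the failure and that no other BGP messages reset the timer; the function name is hypothetical, and the 0.3-second convergence figure is read from the sample timeline.

```python
def detection_window_s(keepalive_s, hold_s, link_down_signalled):
    """Return (min, max) seconds from failure until the BGP session is declared down."""
    if link_down_signalled:
        # Idealized: the session drops at the moment of link loss, as at T+0.0s in the timeline.
        return (0.0, 0.0)
    # The hold timer restarts at each keepalive, so expiry lands between
    # hold - keepalive and hold seconds after the failure.
    return (float(hold_s - keepalive_s), float(hold_s))


CONVERGENCE_S = 0.3  # T+0.3s in the sample timeline: traffic flows exclusively via ISP B

for signalled in (True, False):
    low, high = detection_window_s(3, 9, signalled)
    print(f"link-down signalled={signalled}: "
          f"traffic shifted after {low + CONVERGENCE_S:.1f}s to {high + CONVERGENCE_S:.1f}s")
```

With the template's 3/9 timers, a silent failure can take slightly longer than 9 seconds end to end once convergence is added, which may be worth accounting for when setting the exercise's success criteria.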
Appendix H: Documented training and runbook reference
The network infrastructure is operated by operators trained against written runbooks. The training scope covers the topology, the configuration, the monitoring stack, the change management posture, and the incident response procedure.
The runbook reference:
- Topology reference (this article).
- Switch configuration reference (vendor-specific, per deployment).
- Monitoring stack reference (per deployment).
- Change management procedure (per deployment).
- Incident response procedure (per deployment).
- Quarterly fail-over exercise procedure (in this article).
- Capacity-planning model (in this article).
- IPv6 specification (in this article).
The training cadence in the 57 Studios production estate is quarterly for the operations team and annual for the on-call rotation. The cadence reflects evidence that quarterly training maintains operational competence, whereas annual training is insufficient to maintain competence across the full scope.
Best practice
Keep every runbook in a version-controlled repository alongside the configuration. The runbook repository is the source of truth for operational procedure, and operational consistency depends on every operator having the runbook available at the time of need.
Closing
The network infrastructure for production Unturned hosting at 57 Studios is 100 Gigabit Ethernet at every aggregation tier, dual switches at every layer, BGP failover to dual upstream transit providers, VLAN segmentation, cable management, change management, monitoring, and quarterly verification of the redundancy posture. This framework is the documented professional standard. It is what self-hosting on owned enterprise hardware looks like when operated to that standard, and it is the baseline that the rest of the self-hosting documentation builds on.
The framework is also the baseline for the capacity-planning model, the IPv6 specification, the physical layer reference, the sample failover timeline, and the training and runbook reference. The overall posture results from the compound effect of every layer operating to the standard, and the operational experience in the 57 Studios production estate is zero session loss across every quarterly fail-over exercise since the topology was commissioned.
The next article in the self-hosting series, Power and UPS Configuration, covers the power infrastructure that the network depends on. The network is only as available as its power, and the power posture is the chained UPS configuration that the next article documents in full. The chained UPS configuration is the power-layer equivalent of the dual-switch, dual-BGP topology described here, and the operational standard at both layers is the same: redundancy at every layer is the documented professional standard, not a luxury.
