How to Save a DAT File with Correct Encoding
The Unturned™ engine reads .dat configuration files in a specific text encoding: UTF-8 without a byte-order mark. If a .dat file is saved with the wrong encoding, the Unturned engine fails to parse the file at server startup, and the item, vehicle, or asset defined in that file does not appear in the game. The error messages produced by the engine are not always clear about the cause, and many new modders spend hours debugging an item that fails to load when the underlying problem is a single hidden byte at the start of the file. This reference explains text encoding, the byte-order mark problem, and the exact procedure for saving a .dat file in the encoding that the Unturned engine expects.
The reference is the third and final reference in the Notepad++ section. The first reference covered installation, the second reference covered opening a .dat file, and this reference covers saving the file in the correct encoding. The three references together form the complete editor-side reference for Unturned mod development. Readers are encouraged to read the three references in order on first pass and to return to specific sections as needed during the project lifecycle.
The encoding requirement documented here is the single most important detail in the entire Notepad++ section. The Unturned engine has enforced the UTF-8 without BOM requirement since the early releases of the engine, and the requirement has not changed across engine updates. The procedures documented in this reference are expected to continue to apply to future engine releases. 57 Studios™ treats this reference as foundational training for every contributor to its mod portfolio.
Prerequisites
Before you begin, confirm the following:
- Notepad++ is installed on your Windows computer.
- You have a .dat file open in Notepad++ that you want to save.
- You have read the previous two articles in this section.
- You understand the structure of an Unturned .dat file from the previous reference.
- The file you want to save is in a writeable workspace folder, not the Steam-managed Unturned content folder.
The fifth prerequisite carries forward from the previous reference. The Steam-managed Unturned content folder is owned by Steam, and any edits made to files in that folder will be overwritten the next time Steam updates the Unturned game files. The expected practice across the 57 Studios™ ecosystem is to copy a .dat file from the Steam-managed folder to a workspace folder before opening it for editing, and to save the edited file back to the workspace folder rather than to the original location.
Common mistake
Saving an edited .dat file back to the Steam-managed Unturned content folder. The save itself may succeed, but the file will be overwritten the next time Steam updates the game. Always save edits to a workspace folder under the modder's home directory.
What you'll learn
By the end of this reference, you will know how to:
- Read the encoding indicator in the Notepad++ status bar.
- Use the Encoding menu to convert a file between encodings.
- Identify and remove a byte-order mark from a .dat file.
- Choose between Windows-style and Unix-style line endings.
- Save a .dat file in a format the Unturned engine accepts.
- Configure Notepad++ to default to UTF-8 without BOM for new files.
- Batch-convert a folder of .dat files between encodings.
- Verify the encoding of a .dat file from the PowerShell command line.
- Recognise the symptoms of an encoding-related parse failure in the Unturned engine.
Background: what is text encoding?
A text file is, at the lowest level, a sequence of bytes. An encoding is the rule that maps each byte (or group of bytes) to a character. The same bytes can mean different things depending on which encoding is used to interpret them. For English text using only standard letters and punctuation, most encodings produce the same result. For special characters, accented letters, and non-Latin scripts, different encodings produce different results.
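As a small illustration of the point above, the following PowerShell sketch decodes the same two bytes under two different encodings. ISO-8859-1 (Latin-1) is used as a stand-in for a single-byte Windows code page, an assumption made because it is available in every .NET runtime; for these two byte values, CP1252 maps to the same characters. The results are printed as Unicode code points rather than as characters so that the console's own encoding does not affect the output.

```powershell
# Two bytes: the UTF-8 encoding of U+00E9 (e with acute accent).
[byte[]]$bytes = 0xC3, 0xA9

# Decode the same bytes under two different encodings.
$asUtf8   = [System.Text.Encoding]::UTF8.GetString($bytes)
$asLatin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)

# Render each decoded string as a list of Unicode code points.
$utf8Points   = ($asUtf8.ToCharArray()   | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '
$latin1Points = ($asLatin1.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '

"UTF-8:   $utf8Points"
"Latin-1: $latin1Points"
```

Decoded as UTF-8, the two bytes form one character (U+00E9). Decoded as Latin-1, the same bytes form two characters (U+00C3 followed by U+00A9), which is the familiar garbled text seen when an encoding is misidentified.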
UTF-8 is the modern standard encoding used by almost all software, web browsers, and operating systems. It can represent every character in every written language. The Unturned engine uses UTF-8 to read configuration files.
Did you know?
UTF-8 was designed in 1992 by Ken Thompson and Rob Pike at Bell Labs. It is now the dominant text encoding on the internet, used by over 98 percent of all web pages. Most modern programming languages and game engines use UTF-8 by default.
Best practice
Treat UTF-8 without BOM as the universal default for every text file in a 57 Studios™ project, not just .dat files. The same encoding choice covers .dat files, Markdown documentation, plugin configuration files, build scripts, and any other text file in the project. A single project-wide encoding policy eliminates entire classes of cross-file integration problems.
A short history of text encoding
The text-encoding landscape is the result of decades of evolution from single-byte character sets to the universal Unicode standard. The history is worth understanding because every encoding name in the Notepad++ Encoding menu corresponds to one stage of this evolution.
| Era | Encoding family | Bytes per character | Character coverage |
|---|---|---|---|
| 1960s | ASCII | 1 (7-bit) | English only |
| 1980s | Code pages (CP1252, CP437, CP850, ...) | 1 (8-bit) | One language family per code page |
| Late 1980s | DBCS (Shift-JIS, GB2312, Big5, ...) | 1 or 2 | One regional family per encoding |
| 1991 | Unicode (UCS-2) | 2 | Most living languages |
| 1992 | UTF-8 | 1 to 4 | All Unicode characters |
| 1996 | UTF-16 (with surrogates) | 2 or 4 | All Unicode characters |
| 2000s onward | UTF-8 as the de facto standard | 1 to 4 | All Unicode characters |
The Unturned engine sits firmly in the UTF-8 era. The engine was authored in a period when UTF-8 had already become the dominant encoding, and the engine's text-parsing code assumes UTF-8 as the input encoding. The legacy encodings (code pages, DBCS, UTF-16) are not supported by the engine, and the engine produces no clear error message when a file in one of them is loaded. Pure ASCII files are the exception: every ASCII file is also a valid UTF-8 file, so it loads without conversion.
The byte-order mark problem
A byte-order mark, often abbreviated BOM, is a special invisible character that some applications add to the start of a UTF-8 file. UTF-8 has only one byte order, so the mark carries no byte-order information there; at most it acts as a signature that labels the file as UTF-8. It exists for historical reasons related to other encodings such as UTF-16, where it indicates the order in which the bytes of each code unit are stored.
The Unturned engine does not strip the byte-order mark when reading a .dat file. If a file starts with a byte-order mark, the engine treats the mark as part of the first key, which causes the file to fail to parse.
Critical warning
The byte-order mark is invisible in the normal text view of an editor. You cannot see it by looking at the file content. In Notepad++, it shows up only in the encoding indicator at the bottom of the window; outside the editor, it can be found by inspecting the first three bytes of the file. Always verify the encoding before saving a .dat file.
The byte-order mark itself is the byte sequence 0xEF 0xBB 0xBF at the start of a file. The three bytes encode the Unicode code point U+FEFF, which is the "zero-width no-break space" character. When Notepad++ reads a file that begins with this byte sequence, it identifies the file as UTF-8 with byte-order mark and displays the indicator UTF-8-BOM in the status bar. When Notepad++ reads a file without the byte sequence at the start, it identifies the file as plain UTF-8 and displays the indicator UTF-8.
The Unturned engine reads the byte sequence as if it were part of the first key on the first line. If the first line of the file is GUID 00112233..., the engine reads the line as <BOM>GUID 00112233.... The <BOM> prefix turns the key into a string that the engine does not recognise as GUID, and the parse fails on the first line of the file.
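The effect can be reproduced outside the game. The sketch below is an illustration only: the "parser" is a hypothetical one-line stand-in (the key is everything before the first space), not the engine's actual code. It relies on the fact that .NET's `Encoding.GetString` keeps a leading BOM as the character U+FEFF rather than stripping it.

```powershell
# Build the bytes of a file whose first line is "GUID 0011", preceded by a UTF-8 BOM.
[byte[]]$bom  = 0xEF, 0xBB, 0xBF
[byte[]]$line = [System.Text.Encoding]::ASCII.GetBytes('GUID 0011')
[byte[]]$file = $bom + $line

# GetString does not strip the BOM: it decodes it as the character U+FEFF.
$text = [System.Text.Encoding]::UTF8.GetString($file)

# Hypothetical parser step: the key is everything before the first space.
$key = ($text -split ' ', 2)[0]

# The key is five characters long, not four, because U+FEFF is counted.
$key.Length

# String.Equals compares code units exactly (ordinal), so the key is not 'GUID'.
$key.Equals('GUID')
```

The key prints identically to GUID on screen, yet the exact comparison reports that it is a different string. That is the same mismatch described above.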
Common mistake
Concluding that the file is correctly encoded because the file content looks correct in the editor. The byte-order mark is invisible, and the content of the file is unaffected by the byte-order mark's presence. Only the status bar encoding indicator reveals whether the mark is present.
Step 1: Check the current encoding
Open a .dat file in Notepad++. Look at the bottom-right corner of the window. You will see one of several values displayed in the status bar.
| Status bar text | Meaning | Compatible with Unturned? |
|---|---|---|
| UTF-8 | UTF-8 without byte-order mark | Yes, this is correct |
| UTF-8-BOM | UTF-8 with byte-order mark | No, the mark must be removed |
| UTF-16 LE | UTF-16 little-endian | No, must be converted |
| UTF-16 BE | UTF-16 big-endian | No, must be converted |
| UTF-16 LE BOM | UTF-16 little-endian with BOM | No, must be converted |
| UTF-16 BE BOM | UTF-16 big-endian with BOM | No, must be converted |
| ANSI | Windows code page (typically 1252) | Usually works for English-only files, but not recommended |
If the indicator already reads UTF-8, no change is needed. If it reads anything else, continue to the next step.

Pro tip
The status bar encoding indicator updates whenever the encoding changes. After issuing a conversion command from the Encoding menu, immediately re-check the indicator to confirm the conversion took effect. The indicator is the single source of truth for the file's current encoding.
Step 2: Open the Encoding menu
Click the Encoding menu in the Notepad++ top menu bar. The menu lists every encoding Notepad++ supports.
The Encoding menu is organised in two sections. The top section sets the encoding for the current file without rewriting the bytes (the "set encoding" commands). The bottom section converts the current file's bytes from the active encoding to a new encoding (the "convert" commands). The distinction matters: the set commands relabel the file without changing its bytes, and the convert commands rewrite the bytes.
| Command | Effect on bytes | Use case |
|---|---|---|
| Encoding → UTF-8 | Text bytes unchanged (relabel only); the BOM is left out on the next save | Remove BOM from an already-UTF-8 file |
| Encoding → UTF-8-BOM | None (relabel only) | Add BOM to a UTF-8 file (not for Unturned) |
| Encoding → Convert to UTF-8 | Rewrite bytes | Convert from non-UTF-8 encoding to UTF-8 |
| Encoding → Convert to UTF-8-BOM | Rewrite bytes | Convert and add BOM (not for Unturned) |
| Encoding → ANSI | None (relabel only) | Treat the file as ANSI (rarely useful) |
| Encoding → UTF-16 LE | None (relabel only) | Treat the file as UTF-16 (rarely useful) |
Common mistake
Selecting "Encoding → UTF-8" on a file that is currently UTF-16 or ANSI. The command relabels the file without converting the bytes, and the result is a file whose bytes do not match the declared encoding. The file appears garbled in the editor after the command. The correct command for a non-UTF-8 source file is "Convert to UTF-8."
Step 3: Convert to UTF-8 without BOM
The Encoding menu offers two relevant options:
- UTF-8, which sets the encoding to UTF-8 without a byte-order mark.
- Convert to UTF-8, which re-encodes the file from its current encoding to UTF-8 without a byte-order mark.
Use Convert to UTF-8 if the file is currently in a different encoding such as UTF-16 or ANSI. Use UTF-8 to remove the byte-order mark from a file that is already UTF-8.
Common mistake
Notepad++ also offers an option labeled "UTF-8-BOM" or "Convert to UTF-8-BOM." Do not select this option for Unturned .dat files. The Unturned engine cannot read files with a byte-order mark.
After selecting the correct option, check the status bar in the bottom-right corner. It should now read UTF-8.
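The choice between the two commands can also be sketched in PowerShell. The helper names `Get-BomKind` and `Get-RecommendedCommand` are hypothetical, introduced only for this illustration. The sketch inspects the leading bytes only, so it cannot tell plain UTF-8 from ANSI; the status bar remains the reference for that case.

```powershell
# Hypothetical helper: classify a file by the byte-order mark at its start.
# Limitation: a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE.
function Get-BomKind {
    param([byte[]]$Bytes)
    if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) { return 'UTF-8-BOM' }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFF -and $Bytes[1] -eq 0xFE) { return 'UTF-16 LE BOM' }
    if ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0xFE -and $Bytes[1] -eq 0xFF) { return 'UTF-16 BE BOM' }
    return 'No BOM'
}

# Hypothetical helper: map the result to the Encoding menu command from this step.
function Get-RecommendedCommand {
    param([string]$BomKind)
    switch ($BomKind) {
        'UTF-8-BOM' { 'Encoding > UTF-8' }
        'No BOM'    { 'No BOM to remove; check the status bar for ANSI' }
        default     { 'Encoding > Convert to UTF-8' }
    }
}

# Sample input: a UTF-8 BOM followed by the ASCII bytes of "GUID".
[byte[]]$sample = 0xEF, 0xBB, 0xBF, 0x47, 0x55, 0x49, 0x44
$kind = Get-BomKind -Bytes $sample
"$kind -> $(Get-RecommendedCommand -BomKind $kind)"
```

For a real file, the bytes could be supplied with `[System.IO.File]::ReadAllBytes('C:\path\to\Item.dat')`.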
Step 4: Verify line endings
Look at the line-ending indicator to the left of the encoding indicator. You will see one of three values:
| Indicator | Meaning | Compatible with Unturned? |
|---|---|---|
| Windows (CR LF) | Carriage return + line feed | Yes |
| Unix (LF) | Line feed only | Yes |
| Macintosh (CR) | Carriage return only (legacy) | No, must be converted |
The Unturned engine accepts both Windows and Unix line endings. The Macintosh (CR-only) format is a legacy format from classic Mac OS that is no longer used by any modern system. If your file uses this format, convert it before saving.
To convert line endings:
- Click the Edit menu in the top menu bar.
- Hover over EOL Conversion.
- Click Windows (CR LF) or Unix (LF) as appropriate.
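The line-ending style of a file can also be checked without the status bar. The sketch below counts each of the three styles in a string; `Get-LineEndingCounts` is a hypothetical helper name, and the sample text is made up for the demonstration.

```powershell
# Hypothetical helper: count each line-ending style in a string.
function Get-LineEndingCounts {
    param([string]$Text)
    [pscustomobject]@{
        CRLF = [regex]::Matches($Text, '\r\n').Count
        LF   = [regex]::Matches($Text, '(?<!\r)\n').Count   # LF not preceded by CR
        CR   = [regex]::Matches($Text, '\r(?!\n)').Count    # CR not followed by LF
    }
}

# Sample text containing one line ending of each style.
$sample = "GUID abc`r`nType Gun`nID 1`r"
$counts = Get-LineEndingCounts -Text $sample
$counts
```

For a real file, pass `[System.IO.File]::ReadAllText('C:\path\to\Item.dat')` as the text. More than one non-zero count means the file mixes line-ending styles; a non-zero CR count means the file contains legacy Macintosh line endings that should be converted.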
Pro tip
For consistency within a 57 Studios™ project, choose one line-ending style and use it for every .dat file in the project. Mixing line endings within a single file or project makes diffs harder to read and can cause unexpected behavior in some tools.
Did you know?
The line-ending convention has its own history. The CRLF convention (two bytes per line break) dates to the era of mechanical teletype machines, where the carriage return moved the print head back to the left edge of the page and the line feed advanced the paper by one line. The LF-only convention was introduced by Unix to save the second byte. The CR-only convention was used by classic Mac OS and is now considered a legacy format.
Step 5: Save the file
Press Ctrl+S or click File → Save to save the file. Notepad++ writes the file to disk using the encoding and line endings you selected.
The save action is atomic with respect to the file's content. Notepad++ writes the new bytes to a temporary file in the same directory, then renames the temporary file over the original. The two-step pattern ensures that an interrupted save (for example, by a power loss or a crash) cannot leave the original file in a partially-written state.
Best practice
After saving, re-open the file (close the tab and open it again from File Explorer) and re-check the encoding indicator. The re-open confirms that the on-disk encoding matches what the editor displays. The check is a one-time confirmation that the save procedure was successful, not a routine check that needs to be performed on every save.
Step 6: Verify the save succeeded
After saving, the modder should verify that the on-disk file has the expected encoding. The verification can be done in three ways: through Notepad++ (close and re-open the file), through PowerShell (read the first three bytes), or through the in-game test (load the mod and confirm the item appears).
Verification through Notepad++
- Close the file's tab in Notepad++.
- Re-open the file using one of the methods documented in the previous reference.
- Read the encoding indicator in the bottom-right corner.
- The indicator should read UTF-8 with no BOM suffix.
Verification through PowerShell
A short PowerShell command reads the first three bytes of the file and compares them against the byte-order mark sequence. If the first three bytes match 0xEF 0xBB 0xBF, the file has a byte-order mark. If they do not, the file has no UTF-8 byte-order mark, although this check alone does not rule out other encodings such as UTF-16 or ANSI.

```powershell
# Read the whole file as raw bytes so that no decoding hides the BOM.
$bytes = [System.IO.File]::ReadAllBytes('C:\path\to\Item.dat')
# A UTF-8 BOM is exactly EF BB BF at offset 0; shorter files cannot have one.
$bom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
if ($bom) { 'File has BOM (incorrect for Unturned)' } else { 'No UTF-8 BOM found' }
```

Verification through in-game test
The final verification is the in-game test. Load the mod in an Unturned development server, issue the /give @me <id> command for the item defined by the .dat, and confirm the item appears in the player's inventory. A successful in-game test confirms that the file is correctly encoded, is correctly placed in the mod folder, and is referenced by the correct GUID.
Decision flowchart: choosing the correct encoding
[Figure: decision flowchart for choosing the correct encoding]
Comparison: encoding options in Notepad++
The Encoding menu offers many options. The table below explains the five most common.
| Encoding | Bytes per character | Byte-order mark | Use case for Unturned |
|---|---|---|---|
| UTF-8 | 1 to 4 | No | Correct for all .dat files |
| UTF-8-BOM | 1 to 4, plus 3-byte BOM | Yes | Do not use |
| UTF-16 LE | 2 or 4 | Optional 2-byte BOM | Do not use |
| UTF-16 BE | 2 or 4 | Optional 2-byte BOM | Do not use |
| ANSI | 1 | No | Acceptable for English-only files but not recommended |
Best practice
Always use UTF-8 without BOM for every .dat file in a 57 Studios™ project. This applies to item .dat, vehicle .dat, animal .dat, and localization .dat files. The single encoding choice covers every case.
Side-by-side: how each encoding represents a single line
The same line of text is encoded differently in each encoding. The byte sequences below show the same line GUID abc encoded in five ways, with each byte represented as a two-digit hexadecimal value.
| Encoding | Bytes representing GUID abc |
|---|---|
| UTF-8 | 47 55 49 44 20 61 62 63 |
| UTF-8-BOM | EF BB BF 47 55 49 44 20 61 62 63 |
| UTF-16 LE | 47 00 55 00 49 00 44 00 20 00 61 00 62 00 63 00 |
| UTF-16 LE BOM | FF FE 47 00 55 00 49 00 44 00 20 00 61 00 62 00 63 00 |
| ANSI (CP1252) | 47 55 49 44 20 61 62 63 |
The UTF-8 and ANSI encodings produce identical byte sequences for this line because the line contains only standard ASCII characters. The two encodings differ only when the line contains characters outside the standard ASCII range (accented letters, special punctuation, non-Latin scripts). The UTF-8-BOM encoding adds three bytes at the start of the file (not at the start of each line). The UTF-16 LE encodings double the byte count for ASCII characters by appending a 0x00 byte to each character.
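The rows of the table can be reproduced with the .NET encoding classes. This is a sketch for checking the byte sequences by hand: `ConvertTo-HexString` is a hypothetical helper name, and the ANSI row is omitted because its bytes match the UTF-8 row for this ASCII-only line. In .NET naming, `Unicode` means UTF-16 LE, and `GetPreamble` returns the BOM bytes for an encoding.

```powershell
# Hypothetical helper: format a byte array as space-separated two-digit hex values.
function ConvertTo-HexString {
    param([byte[]]$Bytes)
    ($Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}

$line  = 'GUID abc'
$utf8  = [System.Text.Encoding]::UTF8      # UTF-8
$utf16 = [System.Text.Encoding]::Unicode   # UTF-16 LE in .NET naming

# One entry per table row: the BOM rows prepend the encoding's preamble bytes.
$rows = [ordered]@{
    'UTF-8'         = $utf8.GetBytes($line)
    'UTF-8-BOM'     = $utf8.GetPreamble() + $utf8.GetBytes($line)
    'UTF-16 LE'     = $utf16.GetBytes($line)
    'UTF-16 LE BOM' = $utf16.GetPreamble() + $utf16.GetBytes($line)
}

foreach ($name in $rows.Keys) {
    '{0,-14} {1}' -f $name, (ConvertTo-HexString -Bytes $rows[$name])
}
```

Changing `$line` to a string with an accented letter shows where the UTF-8 and single-byte encodings start to diverge.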
Did you know?
The byte-order mark in UTF-16 LE is FF FE, and in UTF-16 BE is FE FF. Both sequences encode the same code point, U+FEFF (the "zero-width no-break space"), which is also the code point behind the UTF-8 BOM; only the byte representation differs. The Unturned engine fails to parse UTF-16 files for two reasons: the byte-order mark corrupts the first key, and the 0x00 bytes interleaved with ASCII characters in UTF-16 are interpreted as a null terminator by the engine's string parser.
Advanced considerations
Setting a default encoding for new files
Notepad++ can be configured to use UTF-8 without BOM for every new file you create. This eliminates the chance of accidentally creating a file in the wrong encoding.
- Click Settings → Preferences.
- In the dialog that appears, click the New Document section in the left panel.
- Under Encoding, select UTF-8 and make sure Apply to opened ANSI files is unchecked.
- Click Close.
Pro tip
Confirm the new-document encoding setting on every workstation that touches the project, not just the modder's primary workstation. Different workstations may have been configured by different people, and a workstation with the wrong default produces wrongly-encoded files that the next person to edit them must reconvert.
Batch encoding conversion
If you have inherited a project with many .dat files in the wrong encoding, you can convert them all at once.
1. Open every .dat file in Notepad++ at the same time. You can do this by selecting them all in File Explorer and dragging them onto Notepad++.
2. Open the first tab.
3. Click Encoding → Convert to UTF-8.
4. Press Ctrl+S to save.
5. Press Ctrl+Tab to switch to the next tab.
6. Repeat steps 3 through 5 for every tab.
For very large projects, a scripted batch conversion using PowerShell is faster than manual conversion in Notepad++. The scripted approach is outside the scope of this reference, but the high-level pattern is to read each file's bytes, check for the byte-order mark, strip it if present, and write the bytes back. A community-maintained script that implements the pattern is linked in the 57 Studios™ contributor resources.
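The high-level pattern described above can be sketched as follows. This is a minimal illustration, not the community-maintained script: `Remove-Utf8Bom` is a hypothetical helper name, the sketch handles only the UTF-8 byte-order mark (it does not convert UTF-16 or ANSI files), and the demonstration runs against a throwaway folder in the temp directory rather than a real project.

```powershell
# Hypothetical helper: remove a leading UTF-8 BOM from one file, in place.
# Returns $true if the file was rewritten, $false if it had no BOM.
function Remove-Utf8Bom {
    param([string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    $hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
    if (-not $hasBom) { return $false }

    # Copy everything after the first three bytes and write it back.
    $rest = [byte[]]::new($bytes.Length - 3)
    [System.Array]::Copy($bytes, 3, $rest, 0, $rest.Length)
    [System.IO.File]::WriteAllBytes($Path, $rest)
    return $true
}

# Demonstration on a throwaway folder (assumed layout, not a real project).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('bom-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
[System.IO.File]::WriteAllBytes((Join-Path $root 'WithBom.dat'), [byte[]](0xEF, 0xBB, 0xBF, 0x47, 0x55, 0x49, 0x44))
[System.IO.File]::WriteAllBytes((Join-Path $root 'Clean.dat'), [byte[]](0x47, 0x55, 0x49, 0x44))

# Process every .dat file in the folder tree and report what happened.
$results = Get-ChildItem -Path $root -Filter '*.dat' -Recurse | ForEach-Object {
    [pscustomobject]@{ File = $_.Name; Stripped = (Remove-Utf8Bom -Path $_.FullName) }
}
$results
```

Because the helper rewrites files in place, it is safer to run it on a copy of the project, or on a folder that is under version control, so that the change can be reviewed and undone.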
Verifying encoding from the command line
If you need to confirm that a .dat file has the correct encoding without opening it in Notepad++, you can inspect the first bytes of the file from PowerShell. In Windows PowerShell 5.1, the Get-Content command accepts the -Encoding Byte parameter for this; PowerShell 6 and later replace it with -AsByteStream. The snippet below uses [System.IO.File]::ReadAllBytes instead, which behaves the same in both. A UTF-8 file without a byte-order mark starts with the first byte of the first key. A UTF-8 file with a byte-order mark starts with the bytes 0xEF 0xBB 0xBF.

```powershell
# Read the raw bytes; works in Windows PowerShell 5.1 and PowerShell 7.
$bytes = [System.IO.File]::ReadAllBytes('C:\path\to\Item.dat')
# Print the first three bytes as two-digit hexadecimal values.
'{0:X2} {1:X2} {2:X2}' -f $bytes[0], $bytes[1], $bytes[2]
```

The command prints the first three bytes of the file as hexadecimal values. For a correctly-encoded file, the output begins with the hex code of the first character of the first key (for example, 47 55 49 for a file that starts with GUID). An output of EF BB BF indicates that the byte-order mark is present.
Verifying encoding for an entire folder
A folder-wide encoding check confirms that every .dat file in the project is correctly encoded. The PowerShell snippet below reads the first three bytes of every .dat file in a folder tree and reports any file with a byte-order mark.

```powershell
Get-ChildItem -Path 'C:\path\to\project' -Filter '*.dat' -Recurse |
    ForEach-Object {
        $bytes = [System.IO.File]::ReadAllBytes($_.FullName)
        if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
            "BOM found in $($_.FullName)"
        }
    }
```

The snippet is useful for incoming mod-pack reviews. The 57 Studios™ contributor onboarding programme uses a variant of this snippet to verify that incoming contributions are correctly encoded before they are merged into the main project.

Common encoding-related parse failures
The Unturned engine fails to load an item, vehicle, or animal correctly when the .dat file has the wrong encoding. The table below lists the four common encoding errors, how each one presents, and how to detect it.
| Encoding error | Failure mode | Detection |
|---|---|---|
| UTF-8 with BOM | Item does not load, no clear error in engine log | Status bar reads UTF-8-BOM |
| UTF-16 LE | Item does not load, engine log shows null-byte error | Status bar reads UTF-16 LE |
| UTF-16 BE | Item does not load, engine log shows unrecognised-key error | Status bar reads UTF-16 BE |
| ANSI with non-ASCII chars | Item loads, but description text is corrupted | Status bar reads ANSI |
Critical warning
The Unturned engine does not log a clear "encoding error" message for any of the four failure modes. The engine logs a generic parse failure or a missing-item warning, and the cause must be diagnosed by checking the encoding indicator on each .dat file in the affected mod. The diagnostic workflow always starts with the encoding indicator.
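The ANSI row of the table can be illustrated with the .NET UTF-8 decoder. The sketch shows how that decoder treats a single-byte accented letter; whether the Unturned engine substitutes the same replacement character is an assumption, not something this reference confirms. The byte 0xE9 is the letter e with an acute accent in both CP1252 and Latin-1.

```powershell
# The word "cafe" with an accented final letter, as a single-byte code page stores it.
[byte[]]$ansiBytes = 0x63, 0x61, 0x66, 0xE9

# Decoded as UTF-8, 0xE9 starts a multi-byte sequence that never completes,
# so the decoder substitutes U+FFFD (the replacement character).
$decoded = [System.Text.Encoding]::UTF8.GetString($ansiBytes)

# Report the last character as a code point to avoid console encoding issues.
$lastCodePoint = 'U+{0:X4}' -f [int]$decoded[$decoded.Length - 1]
$lastCodePoint
```

The first three letters survive because they are ASCII. Only the accented letter is lost, which matches the "loads, but text is corrupted" symptom in the table.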
FAQ
Why does the Unturned engine not strip the byte-order mark automatically? The engine reads .dat files using a simple line-by-line parser that treats every byte as significant, and it does not special-case a leading byte-order mark. The practical rule is to save files without a byte-order mark in the first place.
Will saving a .dat file as ANSI work? For files containing only English letters, digits, and standard punctuation, ANSI encoding produces identical bytes to UTF-8 and works correctly. For files containing accented characters, special punctuation, or any non-English script, ANSI produces incorrect output. UTF-8 is the safe choice in all cases.
Can I tell if a .dat file has a byte-order mark by looking at it? No. The byte-order mark is invisible in every text editor. The only reliable way to check is the encoding indicator in the Notepad++ status bar or a hex viewer.
Does the line-ending choice affect game behavior? No. The Unturned engine reads both CRLF and LF without difference. The choice is purely about consistency for human readers and version-control diffs.
What happens if I save with the wrong encoding by accident? Reopen the file in Notepad++, convert to UTF-8 without BOM using the Encoding menu, and save again. The corruption is reversible as long as the file content itself was not damaged.
Can I open a UTF-16 file and convert it to UTF-8 directly? Yes. Open the file, confirm the status bar reads UTF-16 LE or UTF-16 BE, click Encoding → Convert to UTF-8, confirm the status bar now reads UTF-8, and save. The conversion is lossless because UTF-16 and UTF-8 both encode the full Unicode character set.
Why does an ANSI file with non-English characters show corrupted text in game even though Notepad++ shows the file correctly? Notepad++ displays the file using the active Windows code page (typically CP1252 on English Windows installations). The Unturned engine reads the file as UTF-8 and interprets the same bytes differently. The mismatch is the source of the corruption. The fix is to convert the file to UTF-8 (Encoding → Convert to UTF-8) and save it again.
Can I configure Notepad++ to always strip the BOM when saving? The Settings → Preferences → New Document → Encoding setting controls the default encoding for new files but does not strip the BOM from existing files. To strip the BOM from an existing file, use the Encoding → UTF-8 command. To prevent the BOM from being added in the first place, ensure the new-document encoding is set to UTF-8 (without BOM).
What is the difference between UTF-16 LE and UTF-16 BE? The two encodings store the same character set with different byte orderings. UTF-16 LE (little-endian) stores the low byte of each character first. UTF-16 BE (big-endian) stores the high byte first. The byte-order mark distinguishes the two when present. Neither encoding is supported by the Unturned engine.
My .dat file appears to load correctly in the editor but the Unturned engine reports a parse error. What is the cause? The most common cause is a byte-order mark at the start of the file. The mark is invisible in the editor but is read by the engine as part of the first key. Check the status bar encoding indicator; if it reads UTF-8-BOM, the mark is the cause.
Why is there a "UTF-8" option and a "Convert to UTF-8" option in the Encoding menu? The two options have different effects. "UTF-8" relabels the file as UTF-8 without changing the bytes. "Convert to UTF-8" rewrites the bytes from the current encoding to UTF-8. Use "Convert to UTF-8" when the file is not already UTF-8; use "UTF-8" when the file is already UTF-8 and only needs the byte-order mark removed.
Does the line-ending choice affect file size on disk? Yes, marginally. CRLF uses two bytes per line break; LF uses one byte. A .dat file with 50 lines saves 50 bytes by using LF instead of CRLF. The savings are not practically meaningful for .dat files, which are typically a few kilobytes.
Best practices
- Configure Notepad++ to use UTF-8 without BOM as the default for new files.
- Always check the encoding indicator in the bottom-right corner before saving.
- Pick one line-ending style for your entire project and use it consistently.
- Never use the UTF-8-BOM option in the Encoding menu for Unturned .dat files.
- When inheriting .dat files from another modder, convert every file to UTF-8 without BOM before editing.
- Run the folder-wide PowerShell BOM check on every incoming mod-pack contribution.
- Re-verify the encoding after saving by closing and re-opening the file.
- Confirm the new-document encoding preference is set correctly on every workstation that touches the project.
Appendix A: Encoding troubleshooting decision tree
The decision tree below covers the most common encoding-related issues encountered during Unturned mod development. The tree is the recommended diagnostic flow when an item fails to load and the modder suspects an encoding issue.
[Figure: encoding troubleshooting decision tree]
The decision tree resolves the four most common encoding-related issues. Each leaf node ends in a "save and retest" step, which is the canonical loop for resolving an encoding issue.
Appendix B: Encoding policy for the 57 Studios contributor workflow
The 57 Studios™ contributor workflow includes a small set of encoding-related policies that have evolved across several years of mod development. The policies are documented here for transparency and to give community contributors a reference for their own workflows.
| Policy | Rationale |
|---|---|
| UTF-8 without BOM is the only accepted encoding for .dat files | Matches the Unturned engine's expectation |
| Mixed CRLF / LF within a single project is rejected during code review | Diff readability and tool consistency |
| New-document encoding preference must be UTF-8 on every contributor workstation | Prevents accidental BOM addition on new files |
| Incoming contributions are checked with the folder-wide PowerShell BOM script | Catches encoding issues before merge |
| Localisation .dat files in non-English languages must be verified by a native speaker | Catches ANSI / UTF-8 mismatch that the BOM check does not detect |
| Workshop releases require a final encoding check before upload | Prevents player-side load failures |
| Every contributor receives a copy of the encoding-policy document during onboarding | Ensures policy awareness across the contributor base |
| Encoding-related issues are tagged in the project tracker with a dedicated label | Allows historical tracking of encoding-related bugs |
Best practice
Adopting a similar encoding policy in your own project is recommended. The policy does not need to be complex. A single sentence ("UTF-8 without BOM, CRLF line endings, verified on every commit") covers the essentials.
Appendix C: Glossary of encoding terminology
The following terms appear throughout this reference. The glossary is the recommended quick-reference for terms whose meaning may be unclear to readers new to text encoding.
- ANSI — A Notepad++ shorthand for the active Windows code page (typically CP1252 on English Windows installations). The label is misleading: ANSI is not actually an encoding name in any standards body. The label predates Notepad++'s adoption of explicit code-page names.
- ASCII — The American Standard Code for Information Interchange, a 7-bit character set covering English letters, digits, and standard punctuation. ASCII is a subset of UTF-8: every ASCII file is also a valid UTF-8 file without BOM.
- Big-endian — A byte order in which the most-significant byte of a multi-byte value is stored first. UTF-16 BE uses big-endian byte order.
- BOM — Byte-order mark, a special invisible character at the start of a file that signals the encoding. The Unturned engine does not strip the BOM and treats it as part of the first key.
- Code page — A legacy concept from Windows that mapped each byte to a character in a specific language family. CP1252 (Western European) and CP932 (Japanese Shift-JIS) are common code pages.
- CRLF — Carriage return + line feed, the Windows line-ending convention. Two bytes per line break.
- DBCS — Double-byte character set, a legacy encoding family for languages with large character sets (Chinese, Japanese, Korean). Predates Unicode.
- Encoding — The rule that maps a sequence of bytes to a sequence of characters. UTF-8 is the modern standard encoding.
- Endianness — The order in which bytes of a multi-byte value are stored. Relevant only to UTF-16 and UTF-32; UTF-8 has no byte-order ambiguity.
- LF — Line feed, the Unix line-ending convention. One byte per line break.
- Little-endian — A byte order in which the least-significant byte of a multi-byte value is stored first. UTF-16 LE uses little-endian byte order.
- Unicode — The universal character set covering every character in every written language. Unicode is not itself an encoding; UTF-8 and UTF-16 are two encodings that represent Unicode characters.
- UTF-8 — A variable-width encoding of Unicode that uses 1 to 4 bytes per character. UTF-8 is backward-compatible with ASCII and is the modern de facto standard.
- UTF-8 with BOM — UTF-8 with a three-byte byte-order mark at the start of the file. The byte-order mark serves no purpose in UTF-8 and is not supported by the Unturned engine.
- UTF-16 — A variable-width encoding of Unicode that uses 2 or 4 bytes per character. UTF-16 has two byte orders (LE and BE) and an optional byte-order mark.
Appendix D: Encoding-related historical context for Unturned
The Unturned engine's strict UTF-8 without BOM requirement has a small amount of historical context that is worth knowing for modders who have inherited very old .dat files.
| Era | Engine behaviour | Mod-author experience |
|---|---|---|
| Pre-2015 | Engine accepted ASCII only | Mods authored in English Notepad worked; non-English failed silently |
| 2015 - 2018 | Engine added UTF-8 support without BOM | Most authors switched to Notepad++ during this window |
| 2018 - 2022 | Engine became strict about BOM | The current behaviour was established |
| 2022 - present | Engine behaviour stable | Current expectations |
The engine's behaviour has not changed since 2018, and the encoding requirement documented in this reference is expected to remain stable across future engine releases. Mods authored in the pre-2015 era may use ASCII-only .dat files that work correctly with the current engine; the ASCII files are valid UTF-8 by construction and require no conversion.
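Whether an inherited file really is ASCII-only can be confirmed by inspecting its bytes: every ASCII byte is below 0x80, and any such file is already valid UTF-8 without a byte-order mark. The sketch below assumes nothing about the engine. Test-AsciiOnly is a hypothetical helper name.

```powershell
# Test-AsciiOnly is a hypothetical helper: $true when every byte is below 0x80.
function Test-AsciiOnly {
    param([Parameter(Mandatory)][string]$Path)
    # Resolve to a full path because .NET file APIs ignore PowerShell's current location.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)
    foreach ($b in $bytes) {
        if ($b -ge 0x80) { return $false }   # non-ASCII byte found
    }
    return $true
}
```

Running the helper over a legacy mod folder, for example with Get-ChildItem -Recurse piped through Where-Object, separates the files that need no conversion from the files whose encoding has to be checked in Notepad++.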
Did you know?
The transition from ASCII-only to UTF-8 in the 2015-2018 window was the source of a small wave of mod-update releases in which the only change was the encoding of the existing .dat files. The transition is documented in the Unturned community archives and is a good reminder that encoding choices have a long-term impact on a mod's portability across engine versions.
Cross-references
- How to Install Notepad++ — The first reference in this section. Required reading if Notepad++ is not already installed.
- How to Open a DAT File — The second reference in this section. Documents the structure of a typical Unturned .dat file.
- Why macOS is Preferred for Unturned Modding — The next reference, the first of the macOS modding guide.
Document history
| Version | Date | Author | Notes |
|---|---|---|---|
| 1.0 | 2024-04-02 | 57 Studios | Initial publication with the five-step save procedure and the encoding table. |
| 1.1 | 2024-08-15 | 57 Studios | Added decision flowchart and encoding-options comparison table. |
| 1.2 | 2024-11-22 | 57 Studios | Added the batch-conversion advanced consideration and the PowerShell verification snippet. |
| 1.3 | 2025-03-04 | 57 Studios | Added the folder-wide PowerShell BOM check snippet. |
| 2.0 | 2026-05-17 | 57 Studios | Major expansion: encoding history table, side-by-side byte-level encoding comparison, contributor-workflow policy table, glossary, troubleshooting decision tree, and extended FAQ. |
Appendix E: Encoding considerations for non-English localisation files
The Unturned engine supports localisation files in any language. The non-English files are subject to the same encoding requirement as the English file: UTF-8 without byte-order mark. The encoding requirement is more important for non-English files than for English files, because non-English files contain characters outside the standard ASCII range that depend on the correct encoding to display correctly.
The table below documents the encoding-related considerations for localisation files in several common languages.
| Language | Characters with non-ASCII codepoints | Encoding sensitivity |
|---|---|---|
| English | None typical | Low |
| Spanish | á, é, í, ó, ú, ñ, ü, ¿, ¡ | Medium |
| French | à, â, é, è, ê, ë, î, ï, ô, û, ù, ç, œ | Medium |
| German | ä, ö, ü, ß | Medium |
| Polish | ą, ć, ę, ł, ń, ó, ś, ź, ż | High |
| Russian | Full Cyrillic alphabet | High |
| Greek | Full Greek alphabet | High |
| Japanese | Full Kana and Kanji | Very high |
| Chinese (Simplified) | Full Hanzi character set | Very high |
| Korean | Full Hangul syllable set | Very high |
| Arabic | Full Arabic alphabet with diacritics | Very high |
Common mistake
Authoring a non-English localisation file in default Windows Notepad. Depending on the Windows version and the encoding chosen in the Save dialog, Notepad can save the file as ANSI or as UTF-8 with byte-order mark, and neither result loads correctly in the Unturned engine. Always use Notepad++ for localisation file authoring and verify the encoding indicator before saving.
For languages with very high encoding sensitivity (Japanese, Chinese, Korean, Arabic), the recommended practice is to have a native speaker review the file in Notepad++ after the modder saves it. A native speaker can catch mojibake (character corruption from encoding mismatch) that an automated check cannot. The 57 Studios™ contributor workflow includes a native-speaker review step for every non-English localisation contribution.
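An automated check cannot judge whether text reads correctly, but it can detect bytes that are not valid UTF-8, which is the usual signature of a file saved in a legacy code page. The sketch below uses a strict .NET decoder for that purpose and also reproduces a small mojibake case. Test-Utf8Strict is a hypothetical helper name, and ISO-8859-1 stands in for whatever legacy encoding a contributor's editor might have used.

```powershell
# Test-Utf8Strict is a hypothetical helper: $true when the bytes decode as valid UTF-8.
function Test-Utf8Strict {
    param([byte[]]$Bytes)
    # Second constructor argument $true makes the decoder throw on invalid sequences.
    $strict = [System.Text.UTF8Encoding]::new($false, $true)
    try {
        [void]$strict.GetString($Bytes)
        return $true
    } catch {
        return $false
    }
}

# Build a sample word from code points so this script stays pure ASCII.
$word        = 'Se' + [char]0x00F1 + 'or'   # n with tilde, U+00F1
$latin1      = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$utf8Bytes   = [System.Text.Encoding]::UTF8.GetBytes($word)
$latin1Bytes = $latin1.GetBytes($word)

Test-Utf8Strict -Bytes $utf8Bytes     # True
Test-Utf8Strict -Bytes $latin1Bytes   # False

# Mojibake: UTF-8 bytes read as ISO-8859-1 turn one character into two.
$mojibake = $latin1.GetString($utf8Bytes)
$mojibake.Length   # 6, one more than the 5 characters of the original word
```

A file that passes the strict check can still contain mojibake that was saved into it earlier, which is why the native-speaker review remains the final step.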
Appendix F: Encoding-related git hooks
For modders who maintain their project under git, a pre-commit hook that rejects files with a byte-order mark is a strong preventive measure. The hook applies the same byte-order mark test as the folder-wide PowerShell BOM check, restricted to staged files, before every commit and fails the commit if any staged .dat file has a byte-order mark.
The recommended hook implementation is a PowerShell script invoked from .git/hooks/pre-commit in the project repository. Git runs hook files through its bundled shell, so the pre-commit file is typically a short shell script that calls the PowerShell script. The script reads the list of staged files, checks each .dat file for the byte-order mark, and exits with a non-zero status if any file fails the check.
powershell
# .git/hooks/pre-commit (PowerShell wrapper, called from shell hook)
# Collect staged .dat files that were added, copied, or modified.
$staged = git diff --cached --name-only --diff-filter=ACM | Where-Object { $_ -like '*.dat' }
$failed = @()
foreach ($file in $staged) {
    if (Test-Path -LiteralPath $file -PathType Leaf) {
        # Resolve to a full path because .NET file APIs ignore PowerShell's current location.
        $full = (Resolve-Path -LiteralPath $file).ProviderPath
        $bytes = [System.IO.File]::ReadAllBytes($full)
        # EF BB BF is the UTF-8 byte-order mark.
        if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
            $failed += $file
        }
    }
}
if ($failed.Count -gt 0) {
    Write-Host 'BOM detected in:' -ForegroundColor Red
    $failed | ForEach-Object { Write-Host "  $_" -ForegroundColor Red }
    exit 1
}
exit 0
Best practice
A pre-commit hook is the most effective preventive measure against encoding regressions in a long-lived project. The hook catches encoding mistakes at commit time, before the wrongly-encoded file reaches the shared repository, which is several steps earlier than catching the mistake when the in-game test fails.
Pro tip
The hook can be extended to also check for inconsistent line endings, trailing whitespace, and other project conventions. The 57 Studios™ template projects include an extended hook that covers all the project's text-file conventions in a single check.
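One possible shape for such an extension is sketched below. It is not the 57 Studios template hook, whose contents are not reproduced here. Get-TextConventionReport is a hypothetical helper that reports mixed line endings and trailing whitespace for a single file.

```powershell
# Get-TextConventionReport is a hypothetical helper, not part of any published hook.
function Get-TextConventionReport {
    param([Parameter(Mandatory)][string]$Path)
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $text = [System.IO.File]::ReadAllText($full)

    # Count CRLF pairs and bare LF characters separately.
    $crlf = [regex]::Matches($text, '\r\n').Count
    $lf   = [regex]::Matches($text, '(?<!\r)\n').Count

    [pscustomobject]@{
        Path               = $full
        CrlfCount          = $crlf
        LfCount            = $lf
        MixedLineEndings   = ($crlf -gt 0 -and $lf -gt 0)
        # Spaces or tabs directly before a line break or the end of the file.
        TrailingWhitespace = [regex]::IsMatch($text, '[ \t]+(\r?\n|\z)')
    }
}
```

A hook could call the helper for each staged file and add the file to its failure list when MixedLineEndings or TrailingWhitespace is true.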
Appendix G: Encoding interaction with version control
Version-control systems treat text files and binary files differently. Most version-control systems (git, mercurial, subversion) auto-detect file type by inspecting the bytes of the file. The detection can be confused by files with unusual encodings, particularly UTF-16, which contains many 0x00 bytes that the auto-detection treats as a binary signal.
| Encoding | Git default behaviour | Recommended .gitattributes setting |
|---|---|---|
| UTF-8 | Treated as text | *.dat text eol=lf |
| UTF-8-BOM | Treated as text | Not recommended |
| UTF-16 LE | Treated as binary | Not recommended |
| UTF-16 BE | Treated as binary | Not recommended |
| ANSI | Treated as text | *.dat text eol=lf |
The recommended .gitattributes setting for an Unturned mod project is:
*.dat text eol=lf
*.dat working-tree-encoding=UTF-8
*.dat merge=binary
The first line declares .dat files as text with Unix (LF) line endings on disk. The second line declares the working-tree encoding as UTF-8; git already stores text content as UTF-8 internally, so this line mainly documents the project's intent. The third line is optional and stops git from attempting a textual merge of .dat files: when both sides change a file, git keeps the current branch's version in the working tree and marks the file as conflicted. The recommended merge strategy for .dat files is manual review rather than automatic merge.
Common mistake
Allowing git to auto-convert line endings on .dat files (the core.autocrlf=true setting on Windows). The setting silently rewrites line endings on checkout and check-in, which can produce a file in the working tree that does not match the file in the repository. The recommended setting for Unturned mod projects is core.autocrlf=false with explicit .gitattributes rules.
Appendix H: Encoding considerations for mod-pack distribution
The Unturned engine reads .dat files at server startup and at workshop-subscription load time. Both paths apply the same encoding requirement. The distribution-time considerations are documented below.
| Distribution channel | Encoding-related consideration |
|---|---|
| Steam Workshop | Encoding is preserved through workshop upload and download |
| Direct download | Encoding is preserved through standard archive formats (zip, 7z) |
| Email attachment | Encoding may be modified by email client; recommend archive format |
| Cloud-share link | Encoding is preserved through standard cloud-share platforms |
| In-Discord file attachment | Encoding is preserved through Discord's file-attachment system |
Did you know?
The Steam Workshop preserves file bytes exactly during upload and download. A .dat file that is correctly encoded at upload time will be correctly encoded at download time on every subscriber's workstation. The Workshop is the recommended distribution channel for Unturned mods specifically because of this preservation.
For direct distribution outside the Steam Workshop, the recommended archive format is .zip. The format is universally supported, preserves file bytes exactly, and is the format the Unturned engine itself uses for loose-file mod distribution. Other archive formats (.7z, .rar, .tar.gz) also preserve file bytes correctly, but .zip is the format that the largest fraction of Unturned modders can extract without installing additional software.
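Byte-exact preservation through an archive can be confirmed by comparing file hashes before and after a round trip. The sketch below uses the built-in Compress-Archive, Expand-Archive, and Get-FileHash cmdlets. Test-ArchiveRoundTrip is a hypothetical helper name, and the temporary folder layout is an assumption made for the illustration.

```powershell
# Test-ArchiveRoundTrip is a hypothetical helper: zips a file, unzips it, compares hashes.
function Test-ArchiveRoundTrip {
    param([Parameter(Mandatory)][string]$Path)
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Work in a throwaway folder under the system temp directory.
    $work = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
    New-Item -ItemType Directory -Path $work | Out-Null
    try {
        $zip = Join-Path $work 'check.zip'
        $out = Join-Path $work 'out'
        Compress-Archive -LiteralPath $full -DestinationPath $zip
        Expand-Archive -LiteralPath $zip -DestinationPath $out
        $copy = Join-Path $out (Split-Path -Leaf $full)
        # Identical SHA-256 hashes mean the bytes, and therefore the encoding, are unchanged.
        $before = (Get-FileHash -LiteralPath $full -Algorithm SHA256).Hash
        $after  = (Get-FileHash -LiteralPath $copy -Algorithm SHA256).Hash
        return ($before -eq $after)
    } finally {
        Remove-Item -LiteralPath $work -Recurse -Force
    }
}
```

The same hash comparison applies to any other channel in the table: hash the file before sending, hash the received copy, and compare the two values.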
Appendix I: Encoding regression test for ongoing projects
A small encoding regression test, run on a regular cadence, catches encoding regressions before they reach players. The test is a simple expansion of the folder-wide PowerShell BOM check.
The test runs as a scheduled task on the modder's primary workstation. The schedule is typically nightly during active development and weekly during maintenance. The task runs the folder-wide PowerShell check against every .dat file in the project and writes a one-line report to a log file. The log is reviewed by the modder on a regular cadence.
The test catches the following classes of regression:
- A contributor commits a wrongly-encoded file without running the pre-commit hook.
- A merge resolves to a file with the wrong encoding.
- A file-system corruption introduces a byte-order mark.
- A backup-restore operation restores a file from an old backup that has the wrong encoding.
- An external tool (such as a build script) rewrites a file with the wrong encoding.
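The scheduled check described above could look roughly like the sketch below. The function names, the log line format, and the use of absolute paths are assumptions made for this illustration and are not the 57 Studios implementation.

```powershell
# Test-HasUtf8Bom and Write-EncodingReport are hypothetical helper names.
function Test-HasUtf8Bom {
    param([Parameter(Mandatory)][string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    # EF BB BF is the UTF-8 byte-order mark.
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

function Write-EncodingReport {
    param(
        [Parameter(Mandatory)][string]$Root,     # project folder, absolute path assumed
        [Parameter(Mandatory)][string]$LogPath   # log file, absolute path assumed
    )
    # Collect every .dat file under the project folder.
    $files = @(Get-ChildItem -LiteralPath $Root -Recurse -File |
        Where-Object { $_.Extension -eq '.dat' })
    $bad = @($files | Where-Object { Test-HasUtf8Bom -Path $_.FullName })

    # Append a single summary line per run.
    $stamp = (Get-Date).ToString('yyyy-MM-dd HH:mm:ss')
    $line  = "$stamp checked=$($files.Count) bom=$($bad.Count)"
    [System.IO.File]::AppendAllText($LogPath, $line + [Environment]::NewLine)

    # Return the offending paths so a caller can act on them.
    return @($bad | ForEach-Object { $_.FullName })
}
```

A scheduled task would call Write-EncodingReport with the project folder and a log path. Any run that logs a bom value above zero needs follow-up in Notepad++.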
Pro tip
The encoding regression test should be paired with an in-game smoke test that loads the mod and confirms a small set of items appear correctly in the player inventory. The two tests together catch encoding-related issues at two different stages: the regression test catches the issue at file level, and the smoke test catches the issue at engine level.
Appendix J: Encoding-related recovery from corrupted files
If a .dat file is corrupted in a way that cannot be repaired through the standard encoding-conversion procedure, the recovery options are documented below.
| Corruption type | Recovery procedure |
|---|---|
| BOM only (file otherwise correct) | Encoding → UTF-8, save |
| Encoding wrong, content intact | Encoding → Convert to UTF-8, save |
| Content mojibake from wrong encoding | Open in original encoding, save as UTF-8 |
| File truncated mid-line | Restore from backup, or manually re-author the missing keys |
| File contains non-printable bytes | Restore from backup; the file is not recoverable |
| File replaced by zero bytes | Restore from backup; the file is not recoverable |
The first three recovery procedures are routine and require only Notepad++. The last three require restoration from backup or version control. The 57 Studios™ workflow recommends keeping a daily file-level backup of every project, in addition to the version-control history, to provide a second recovery path for file-system-level corruption.
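Before choosing a row in the table, it can help to triage the file by its bytes. The sketch below classifies a file into a few conditions that roughly correspond to the table rows. The function name and the condition labels are hypothetical, and a file truncated mid-line cannot be detected this way because its bytes remain valid.

```powershell
# Get-DatFileCondition is a hypothetical triage helper; the labels are illustrative.
function Get-DatFileCondition {
    param([Parameter(Mandatory)][string]$Path)
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)

    if ($bytes.Length -eq 0) { return 'Empty' }

    $allZero    = $true
    $hasControl = $false
    foreach ($b in $bytes) {
        if ($b -ne 0) { $allZero = $false }
        # Control bytes other than tab, line feed, and carriage return.
        if ($b -lt 0x20 -and $b -ne 0x09 -and $b -ne 0x0A -and $b -ne 0x0D) { $hasControl = $true }
    }
    if ($allZero) { return 'ZeroFilled' }

    # UTF-8 byte-order mark: EF BB BF.
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'Utf8Bom'
    }
    # UTF-16 byte-order marks: FF FE (LE) or FE FF (BE).
    if ($bytes.Length -ge 2 -and (($bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) -or
                                  ($bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF))) {
        return 'Utf16Bom'
    }
    if ($hasControl) { return 'NonPrintable' }

    # Strict decoder throws on byte sequences that are not valid UTF-8.
    $strict = [System.Text.UTF8Encoding]::new($false, $true)
    try { [void]$strict.GetString($bytes) } catch { return 'NotUtf8' }
    return 'Ok'
}
```

Utf8Bom, Utf16Bom, and NotUtf8 point to the Notepad++ conversion rows, while Empty, ZeroFilled, and NonPrintable point to restoration from backup.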
Best practice
The Notepad++ backup directory at %APPDATA%\Notepad++\backup\ is a useful secondary recovery source. When the session snapshot and periodic backup feature is enabled, the directory holds timestamped copies of files with unsaved changes, and a recent corruption can sometimes be recovered by copying a file from that directory back to the project folder.
Next steps
You now know how to install Notepad++, open a .dat file, and save it in the encoding the Unturned engine expects. The 57 Studios™ knowledge base continues with the macOS modding guide, which explains why many professional Unturned modders prefer macOS as their development environment despite Unturned being a Windows-first game. Continue to Why macOS is Preferred for Unturned Modding.
The macOS guide is a longer section that documents the platform-choice reasoning, the recommended MacBook configurations, the acquisition channels, and the thermal-output framework that informs the seasonal scheduling adopted across the 57 Studios™ project portfolio. The macOS guide is recommended reading even for modders who do not plan to switch platforms, because the platform-choice reasoning includes context on the broader Unturned modding ecosystem that informs decisions across every project.
After the macOS guide, the knowledge base continues with sections on Blender setup, Unity Editor configuration, server hosting, and the Tebex commerce integration that 57 Studios™ uses to monetise its mod releases. The full knowledge base is structured as a sequential read for new contributors and as a topic-indexed reference for returning contributors.
