File Formats
This document captures source-file-format notes that complement the normative
requirements in requirements/source_file_formats.md.
The first documented format is a Matroska source that carries styled ASS/SSA subtitle streams together with embedded font attachments.
Styled ASS In Matroska With Embedded Fonts
These files are typically .mkv releases where subtitle rendering quality
depends on keeping both parts of the subtitle package together:
- one or more subtitle streams with codec `ass`
- one or more attachment streams that embed font files used by those subtitles
This matters because ASS subtitles are not plain-text subtitles in the narrow WebVTT sense. They can carry layout, styling, positioning, karaoke, signs, and other typesetting effects. If the matching embedded fonts are lost, consumers can still see the subtitle text, but the intended styling, and sometimes glyph coverage, can be degraded.
For FFX this format is special because the ASS subtitle streams should remain normally editable and mappable, while the related font attachments should be transported unchanged.
Observed Sample
Assessment date: 2026-04-17
Observed sample file:
tests/assets/boruto_s01e283_ssa.mkv
Commands used for assessment:
ffprobe tests/assets/boruto_s01e283_ssa.mkv
ffprobe -hide_banner -show_format -show_streams -of json tests/assets/boruto_s01e283_ssa.mkv
Observed stream layout:
| Stream index | Kind | Key details |
|---|---|---|
| 0 | video | codec_name=h264 |
| 1 | audio | codec_name=aac, language=jpn |
| 2 | subtitle | codec_name=ass, language=ger, default |
| 3 | subtitle | codec_name=ass, language=eng |
| 4-13 | attachment | tags.mimetype=font/ttf, .ttf filenames |
Observed attachment filenames:
- AmazonEmberTanuki-Italic.ttf
- AmazonEmberTanuki-Regular.ttf
- Arial.ttf
- Arial Bold.ttf
- Georgia.ttf
- Times New Roman.ttf
- Times New Roman Bold.ttf
- Trebuchet MS.ttf
- Verdana.ttf
- Verdana Bold.ttf
Important probe behavior from the real sample:
- Plain `ffprobe` lists the font streams as `Attachment: none`.
- Plain `ffprobe` also prints warnings such as `Could not find codec parameters for stream 4 (Attachment: none): unknown codec` and later `Unsupported codec with id 0 for input stream ...`.
- The JSON produced by `FileProperties.FFPROBE_COMMAND_TOKENS` (`ffprobe -hide_banner -show_format -show_streams -of json`) still exposes the attachment streams clearly through `codec_type="attachment"` and the attachment tags.
- In that JSON, the attachment streams do not expose `codec_name`.
This last point is important for FFX: robust detection must not depend on
attachment codec_name being present.
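As a minimal sketch of what that tolerance can look like, the snippet below runs the same probe tokens and summarizes each stream without assuming `codec_name` exists. The names `FFPROBE_TOKENS`, `probe`, and `summarize_streams` are hypothetical and are not taken from the FFX code base.

```python
import json
import subprocess

# Assumption: mirrors the probe tokens quoted above; the constant name is hypothetical.
FFPROBE_TOKENS = ["ffprobe", "-hide_banner", "-show_format", "-show_streams", "-of", "json"]


def probe(path):
    """Run ffprobe on `path` and return the parsed JSON document."""
    result = subprocess.run(
        FFPROBE_TOKENS + [path], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def summarize_streams(probe_data):
    """Return (index, codec_type, codec_name) per stream.

    codec_name is None when the probe output omits it, as observed for
    attachment streams in the sample file.
    """
    return [
        (s.get("index"), s.get("codec_type"), s.get("codec_name"))
        for s in probe_data.get("streams", [])
    ]
```

Using `dict.get` instead of direct indexing keeps the summary usable for attachment streams, which would otherwise raise `KeyError` on the missing `codec_name` field.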
Detection Guidance
Current known indicators for this format are:
- one or more subtitle streams with `codec_type="subtitle"` and `codec_name="ass"`
- one or more attachment streams with `codec_type="attachment"`
- attachment tags that identify embedded fonts, especially `tags.mimetype="font/ttf"`
- attachment filenames that end in `.ttf`
The pattern can vary. FFX should therefore treat the above as a cluster of signals rather than an exact signature tied to one file.
Inference from the observed sample plus FFmpeg documentation:
- MIME matching should not be limited to `font/ttf` alone.
- The Boruto sample uses `font/ttf`.
- FFmpeg's Matroska attachment example uses `mimetype=application/x-truetype-font` for a `.ttf` attachment.
- Detection should therefore normalize multiple TTF-like MIME values rather than depend on a single exact string.
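One way to combine these signals is sketched below. It is an illustration under assumptions, not the FFX implementation: the function names and the `FONT_MIME_TYPES` set are hypothetical, and the set deliberately contains only the two MIME values named in this section, so it would need extending for other font types.

```python
# Assumption: only the two TTF-like MIME values mentioned in this document.
FONT_MIME_TYPES = {"font/ttf", "application/x-truetype-font"}
FONT_EXTENSIONS = (".ttf",)


def is_font_attachment(stream):
    """True if an ffprobe stream entry looks like an embedded font attachment."""
    if stream.get("codec_type") != "attachment":
        return False
    # Normalize tag keys and values so case differences do not defeat matching.
    tags = {str(k).lower(): v for k, v in (stream.get("tags") or {}).items()}
    mimetype = str(tags.get("mimetype", "")).strip().lower()
    filename = str(tags.get("filename", "")).strip().lower()
    # Either signal is enough; codec_name is intentionally never consulted.
    return mimetype in FONT_MIME_TYPES or filename.endswith(FONT_EXTENSIONS)


def has_styled_ass_with_fonts(probe_data):
    """True if the probe JSON shows ASS subtitles plus at least one font attachment."""
    streams = probe_data.get("streams", [])
    has_ass = any(
        s.get("codec_type") == "subtitle" and s.get("codec_name") == "ass"
        for s in streams
    )
    has_fonts = any(is_font_attachment(s) for s in streams)
    return has_ass and has_fonts
```

Accepting either the MIME tag or the filename extension reflects the "cluster of signals" idea: a file with a missing or unusual `mimetype` tag can still be recognized through its attachment filename.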
Processing Expectations In FFX
The format-specific requirements live in
requirements/source_file_formats.md. In practical terms, FFX should:
- recognize the ASS-plus-font-attachment pattern even when attachment probe data is incomplete
- tell the operator that the pattern was detected and that special handling is being used
- reject sidecar subtitle import for such sources, because converting or replacing these subtitle tracks with ordinary external text subtitles would break the intended subtitle package
- continue to allow normal manipulation of the ASS subtitle tracks themselves
- preserve the font attachment streams unchanged
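Two of these expectations can be sketched in a few lines: rejecting sidecar import once the pattern is detected, and mapping attachment streams through with stream copy. The class and function names below are hypothetical. The tokens rely on FFmpeg's `t` stream specifier for attachments; whether FFX assembles its command this way is an assumption.

```python
class SidecarImportRejected(ValueError):
    """Raised when sidecar subtitles are requested for an ASS-plus-fonts source."""


def check_sidecar_import(pattern_detected, sidecar_paths):
    """Refuse sidecar subtitle import when the styled-ASS pattern was detected."""
    if pattern_detected and sidecar_paths:
        raise SidecarImportRejected(
            "source carries styled ASS subtitles with embedded fonts; "
            "sidecar subtitle import is not allowed for this format"
        )


def attachment_copy_tokens(input_index=0):
    """ffmpeg tokens that map all attachment streams of one input and copy them unchanged."""
    # `t` is FFmpeg's stream specifier for attachment streams.
    return ["-map", f"{input_index}:t", "-c:t", "copy"]
```

For example, `attachment_copy_tokens()` yields `["-map", "0:t", "-c:t", "copy"]`, which can be appended after the video, audio, and subtitle mappings so the fonts travel with the subtitles into the Matroska output.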
FFmpeg Notes
Relevant FFmpeg documentation confirms several behaviors that line up with FFX's needs:
- FFmpeg documents `-attach` as adding an attachment stream to the output, and explicitly names Matroska fonts used in subtitle rendering as an example.
- FFmpeg documents attachment streams as regular streams that are created after the mapped media streams.
- FFmpeg documents `-dump_attachment` for extracting attachment streams, which is useful for debugging or validating a source file's embedded fonts.
- FFmpeg's Matroska example requires a `mimetype` metadata tag for attached fonts, which is consistent with using attachment tags as detection signals.
- FFmpeg also notes that attachments are implemented as codec extradata. That helps explain why probe output for attachment streams can look different from ordinary audio, video, and subtitle streams.
Implication for FFX:
- Attachment preservation is not an optional cosmetic feature for this format. It is part of preserving the subtitle package correctly.
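For debugging, the `-dump_attachment` option can be driven from a small helper such as the sketch below. The helper names are hypothetical. The command follows the FFmpeg documentation's pattern of passing an empty filename so that each attachment is written under its own `filename` tag; ffmpeg may still exit with a non-zero status because no output file is specified, so the return code is not checked here.

```python
import subprocess


def dump_attachments_tokens(source_path):
    """Build an ffmpeg command that dumps every attachment stream of `source_path`."""
    # An empty filename asks ffmpeg to use each attachment's `filename` tag.
    return ["ffmpeg", "-hide_banner", "-dump_attachment:t", "", "-i", source_path]


def dump_attachments(source_path, target_dir):
    """Write the embedded attachments of `source_path` into `target_dir`."""
    # check=False: ffmpeg may report a missing output file even though
    # the attachments were written.
    return subprocess.run(
        dump_attachments_tokens(source_path), cwd=target_dir, check=False
    )
```

Running `dump_attachments("tests/assets/boruto_s01e283_ssa.mkv", some_directory)` with an absolute source path would be one way to confirm that the ten observed `.ttf` files are actually present in the sample.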
Jellyfin Notes
Jellyfin's documentation also supports keeping this format intact:
- Jellyfin's subtitle compatibility table lists `ASS/SSA` as supported in `MKV` and not supported in `MP4`.
- Jellyfin notes that when subtitles must be transcoded, they are either converted to a supported format or burned into the video, and burning them in is the most CPU-intensive path.
- Jellyfin's subtitle-extraction example for `SSA/ASS` first dumps attachment streams and then extracts the ASS subtitle stream, which reflects the real relationship between ASS subtitles and embedded fonts in MKV releases.
- Jellyfin's font documentation says text-based subtitles require fonts to render properly.
- Jellyfin's configuration documentation says the web client uses configured fallback fonts for ASS subtitles when other fonts such as MKV attachments or client-side fonts are not available.
Inference from the Jellyfin compatibility tables:
- Keeping this subtitle format in Matroska is the safest interoperability choice for Jellyfin consumers.
- Converting the subtitle payload to WebVTT would lose styled ASS behavior.
- Dropping the attachment streams would force client or fallback font substitution and can change appearance or glyph coverage.
References
- FFmpeg documentation: https://ffmpeg.org/ffmpeg.html
- Jellyfin codec support: https://jellyfin.org/docs/general/clients/codec-support/
- Jellyfin configuration and fonts: https://jellyfin.org/docs/general/administration/configuration/