# File Formats

This document captures source-file-format notes that complement the normative requirements in `requirements/source_file_formats.md`. The first documented format is a Matroska source that carries styled ASS/SSA subtitle streams together with embedded font attachments.

## Styled ASS In Matroska With Embedded Fonts

These files are typically `.mkv` releases. Subtitle rendering quality depends on keeping both parts of the subtitle package together:

- one or more subtitle streams with codec `ass`
- one or more attachment streams that embed font files used by those subtitles

This matters because ASS subtitles are not plain text subtitles in the narrow WebVTT sense. They can carry layout, styling, positioning, karaoke, signs, and other typesetting effects. If the matching embedded fonts are lost, consumers can still see the subtitle text, but the intended styling and sometimes glyph coverage can be degraded.

For FFX, this format is special because the ASS subtitle streams should remain normally editable and mappable, while the related font attachments should be transported unchanged.
## Observed Sample

Assessment date: `2026-04-17`

Observed sample file:

- `tests/assets/boruto_s01e283_ssa.mkv`

Commands used for assessment:

```bash
ffprobe tests/assets/boruto_s01e283_ssa.mkv
ffprobe -hide_banner -show_format -show_streams -of json tests/assets/boruto_s01e283_ssa.mkv
```

Observed stream layout:

| Stream index | Kind | Key details |
| --- | --- | --- |
| `0` | video | `codec_name=h264` |
| `1` | audio | `codec_name=aac`, `language=jpn` |
| `2` | subtitle | `codec_name=ass`, `language=ger`, default |
| `3` | subtitle | `codec_name=ass`, `language=eng` |
| `4`-`13` | attachment | `tags.mimetype=font/ttf`, `.ttf` filenames |

Observed attachment filenames:

- `AmazonEmberTanuki-Italic.ttf`
- `AmazonEmberTanuki-Regular.ttf`
- `Arial.ttf`
- `Arial Bold.ttf`
- `Georgia.ttf`
- `Times New Roman.ttf`
- `Times New Roman Bold.ttf`
- `Trebuchet MS.ttf`
- `Verdana.ttf`
- `Verdana Bold.ttf`

Important probe behavior from the real sample:

- Plain `ffprobe` lists the font streams as `Attachment: none`.
- Plain `ffprobe` also prints warnings such as `Could not find codec parameters for stream 4 (Attachment: none): unknown codec` and later `Unsupported codec with id 0 for input stream ...`.
- The JSON produced by running `FileProperties.FFPROBE_COMMAND_TOKENS` (`ffprobe -hide_banner -show_format -show_streams -of json`) still clearly exposes the attachment streams through `codec_type="attachment"` and the attachment tags.
- In that JSON, the attachment streams do not expose `codec_name`.

This last point matters for FFX: robust detection must not depend on attachment `codec_name` being present.

## Detection Guidance

Currently known indicators for this format are:

- one or more subtitle streams with `codec_type="subtitle"` and `codec_name="ass"`
- one or more attachment streams with `codec_type="attachment"`
- attachment tags that identify embedded fonts, especially `tags.mimetype="font/ttf"`
- attachment filenames that end in `.ttf`

The pattern can vary.
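The indicators above can be checked directly against the parsed ffprobe JSON. The following is a minimal Python sketch, not FFX's actual implementation. `AssFontPackage`, `is_font_attachment`, `detect_ass_font_package`, and `TTF_MIME_TYPES` are hypothetical names. The MIME set contains only the two values cited in this document, and the sketch never reads `codec_name` on attachment streams.

```python
from dataclasses import dataclass, field

# Assumed set of TTF-like MIME values: "font/ttf" was observed in the Boruto
# sample; "application/x-truetype-font" appears in FFmpeg's Matroska example.
TTF_MIME_TYPES = {"font/ttf", "application/x-truetype-font"}


@dataclass
class AssFontPackage:
    """Hypothetical result type: stream indices for each part of the package."""
    ass_subtitle_indices: list[int] = field(default_factory=list)
    font_attachment_indices: list[int] = field(default_factory=list)

    @property
    def detected(self) -> bool:
        # Require both halves of the package: ASS subtitles and font attachments.
        return bool(self.ass_subtitle_indices and self.font_attachment_indices)


def is_font_attachment(stream: dict) -> bool:
    """Classify an attachment by codec_type and tags only, never by codec_name."""
    if stream.get("codec_type") != "attachment":
        return False
    tags = stream.get("tags") or {}
    mimetype = str(tags.get("mimetype", "")).strip().lower()
    filename = str(tags.get("filename", "")).strip().lower()
    # Either signal is enough: a normalized TTF MIME value or a .ttf filename.
    return mimetype in TTF_MIME_TYPES or filename.endswith(".ttf")


def detect_ass_font_package(probe: dict) -> AssFontPackage:
    """Scan ffprobe JSON (-show_streams -of json) for the ASS-plus-fonts pattern."""
    package = AssFontPackage()
    for stream in probe.get("streams", []):
        index = stream.get("index")
        if stream.get("codec_type") == "subtitle" and stream.get("codec_name") == "ass":
            package.ass_subtitle_indices.append(index)
        elif is_font_attachment(stream):
            package.font_attachment_indices.append(index)
    return package
```

To use it, parse the output of the JSON `ffprobe` command shown above with `json.loads` and pass the resulting dict. For the observed sample, this should report subtitle streams `2` and `3` and attachment streams `4` through `13`. Accepting either the MIME tag or the filename keeps detection working when one of the two signals is missing or uses an unexpected value.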
FFX should therefore treat the indicators above as a cluster of signals rather than an exact signature tied to one file.

Inference from the observed sample plus FFmpeg documentation:

- MIME matching should not be limited to `font/ttf` alone.
- The Boruto sample uses `font/ttf`.
- FFmpeg's Matroska attachment example uses `mimetype=application/x-truetype-font` for a `.ttf` attachment.
- Detection should therefore normalize multiple TTF-like MIME values rather than depend on a single exact string.

## Processing Expectations In FFX

The format-specific requirements live in `requirements/source_file_formats.md`. In practical terms, FFX should:

- recognize the ASS-plus-font-attachment pattern even when attachment probe data is incomplete
- tell the operator that the pattern was detected and that special handling is being used
- reject sidecar subtitle import for such sources, because converting or replacing these subtitle tracks with ordinary external text subtitles would break the intended subtitle package
- continue to allow normal manipulation of the ASS subtitle tracks themselves
- preserve the font attachment streams unchanged

## FFmpeg Notes

Relevant FFmpeg documentation confirms several behaviors that line up with FFX's needs:

- FFmpeg documents `-attach` as adding an attachment stream to the output, and explicitly names fonts used for rendering subtitles in Matroska as an example.
- FFmpeg documents that attachments are implemented as a specific type of stream, and that attachment streams created with `-attach` are placed after all other streams (those created with `-map` or by automatic mapping).
- FFmpeg documents `-dump_attachment` for extracting attachment streams, which is useful for debugging or validating a source file's embedded fonts.
- FFmpeg's Matroska example notes that a `mimetype` metadata tag must be set on attached fonts, which is consistent with using attachment tags as detection signals.
- FFmpeg also notes that attachments are implemented as codec extradata. That helps explain why probe output for attachment streams can look different from ordinary audio, video, and subtitle streams.
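Taken together, the processing expectations and FFmpeg notes above suggest a remux plan along the following lines. This is a minimal Python sketch under stated assumptions, not FFX's actual code. `build_remux_args` and `SidecarImportRejected` are hypothetical names, and `package_detected` is assumed to come from an earlier detection step. Rewriting per-stream language metadata stands in for "normal manipulation" of the ASS tracks.

```python
import logging

log = logging.getLogger(__name__)


class SidecarImportRejected(ValueError):
    """Hypothetical error: sidecar subtitles requested for an ASS-plus-fonts source."""


def build_remux_args(
    src: str,
    dst: str,
    package_detected: bool,
    sidecar_subtitles: list[str],
    subtitle_languages: dict[int, str],
) -> list[str]:
    """Build an ffmpeg argv; subtitle_languages maps output subtitle index -> language."""
    if package_detected:
        # Tell the operator that special handling is in effect.
        log.info("Styled ASS with embedded fonts detected; preserving font attachments")
        if sidecar_subtitles:
            raise SidecarImportRejected(
                "sidecar subtitle import is not allowed for ASS sources with embedded fonts"
            )
    args = ["ffmpeg", "-i", src]
    for path in sidecar_subtitles:
        args += ["-i", path]
    # Map every stream of the source, attachment streams included.
    args += ["-map", "0"]
    for input_index in range(1, len(sidecar_subtitles) + 1):
        args += ["-map", f"{input_index}:s"]
    # ASS tracks stay editable: per-stream metadata changes, payload is copied.
    for sub_index, language in sorted(subtitle_languages.items()):
        args += [f"-metadata:s:s:{sub_index}", f"language={language}"]
    # Stream copy carries the font attachments over without re-encoding.
    args += ["-c", "copy", dst]
    return args
```

`-map 0` maps every input stream, attachments included, and `-c copy` stream-copies them. When the other streams are mapped individually, `-map 0:t` selects all attachment streams explicitly. For debugging, `ffmpeg -dump_attachment:t "" -i INPUT` extracts every attachment to a file named after its `filename` tag.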
Implication for FFX:

- Attachment preservation is not an optional cosmetic feature for this format. It is part of preserving the subtitle package correctly.

## Jellyfin Notes

Jellyfin's documentation also supports keeping this format intact:

- Jellyfin's subtitle compatibility table lists `ASS/SSA` as supported in `MKV` and not supported in `MP4`.
- Jellyfin notes that when subtitles must be transcoded, they are either converted to a supported format or burned into the video, and that burning them in is the most CPU-intensive path.
- Jellyfin's subtitle-extraction example for `SSA/ASS` first dumps the attachment streams and then extracts the ASS subtitle stream, which reflects the real relationship between ASS subtitles and embedded fonts in MKV releases.
- Jellyfin's font documentation says text-based subtitles require fonts to render properly.
- Jellyfin's configuration documentation says the web client uses configured fallback fonts for ASS subtitles when other fonts, such as MKV attachments or client-side fonts, are not available.

Inference from the Jellyfin documentation above:

- Keeping this subtitle format in Matroska is the safest interoperability choice for Jellyfin consumers.
- Converting the subtitle payload to WebVTT would lose styled ASS behavior.
- Dropping the attachment streams would force client-side or fallback font substitution, which can change appearance or glyph coverage.

## References

- FFmpeg documentation: https://ffmpeg.org/ffmpeg.html
- Jellyfin codec support: https://jellyfin.org/docs/general/clients/codec-support/
- Jellyfin configuration and fonts: https://jellyfin.org/docs/general/administration/configuration/