File Formats
============

This document captures source-file-format notes that complement the normative
requirements in ``requirements/source_file_formats.md``.

The first documented format is a Matroska source that carries styled ASS/SSA
subtitle streams together with embedded font attachments.

Styled ASS In Matroska With Embedded Fonts
------------------------------------------

These files are typically ``.mkv`` releases where subtitle rendering quality
depends on keeping both parts of the subtitle package together:

* one or more subtitle streams with codec ``ass``
* one or more attachment streams that embed font files used by those subtitles

This matters because ASS subtitles are not plain text subtitles in the narrow
WebVTT sense. They can carry layout, styling, positioning, karaoke, signs, and
other typesetting effects. If the matching embedded fonts are lost, consumers
can still see subtitle text but the intended styling and sometimes glyph
coverage can be degraded.

For FFX this format is special because the ASS subtitle streams should remain
normally editable and mappable, while the related font attachments should be
transported unchanged.

Observed Sample
---------------

Assessment date: ``2026-04-17``

Observed sample file:

* ``tests/assets/boruto_s01e283_ssa.mkv``

Commands used for assessment:

.. code-block:: bash

   ffprobe tests/assets/boruto_s01e283_ssa.mkv
   ffprobe -hide_banner -show_format -show_streams -of json tests/assets/boruto_s01e283_ssa.mkv

Observed stream layout:

.. list-table::
   :header-rows: 1

   * - Stream index
     - Kind
     - Key details
   * - ``0``
     - video
     - ``codec_name=h264``
   * - ``1``
     - audio
     - ``codec_name=aac``, ``language=jpn``
   * - ``2``
     - subtitle
     - ``codec_name=ass``, ``language=ger``, default
   * - ``3``
     - subtitle
     - ``codec_name=ass``, ``language=eng``
   * - ``4``-``13``
     - attachment
     - ``tags.mimetype=font/ttf``, ``.ttf`` filenames

Observed attachment filenames:

* ``AmazonEmberTanuki-Italic.ttf``
* ``AmazonEmberTanuki-Regular.ttf``
* ``Arial.ttf``
* ``Arial Bold.ttf``
* ``Georgia.ttf``
* ``Times New Roman.ttf``
* ``Times New Roman Bold.ttf``
* ``Trebuchet MS.ttf``
* ``Verdana.ttf``
* ``Verdana Bold.ttf``

Important probe behavior from the real sample:

* Plain ``ffprobe`` lists the font streams as ``Attachment: none``.
* Plain ``ffprobe`` also prints warnings such as ``Could not find codec
  parameters for stream 4 (Attachment: none): unknown codec`` and later
  ``Unsupported codec with id 0 for input stream ...``.
* The JSON produced by ``FileProperties.FFPROBE_COMMAND_TOKENS``
  (``ffprobe -hide_banner -show_format -show_streams -of json``) still exposes
  the attachment streams clearly through ``codec_type="attachment"`` and the
  attachment tags.
* In that JSON, the attachment streams do not expose ``codec_name``.

This last point is important for FFX: robust detection must not depend on
attachment ``codec_name`` being present.

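As a minimal sketch of that rule, the following Python selects attachment
streams by ``codec_type`` alone and reads ``codec_name`` and the tags
defensively. The helper names ``probe``, ``attachment_streams``, and
``describe`` are illustrative assumptions and are not taken from the FFX code
base; only the ``ffprobe`` flags come from the assessment commands in this
section.

.. code-block:: python

   import json
   import subprocess

   # Same flags as the JSON assessment command in this section.
   FFPROBE_TOKENS = [
       "ffprobe", "-hide_banner", "-show_format", "-show_streams", "-of", "json",
   ]


   def probe(path):
       """Run ffprobe on ``path`` and return the parsed JSON document."""
       result = subprocess.run(
           FFPROBE_TOKENS + [path], capture_output=True, text=True, check=True
       )
       return json.loads(result.stdout)


   def attachment_streams(probe_data):
       """Select attachment streams by codec_type only, never by codec_name."""
       return [
           stream
           for stream in probe_data.get("streams", [])
           if stream.get("codec_type") == "attachment"
       ]


   def describe(stream):
       """Summarize one stream; missing keys become None instead of raising."""
       tags = stream.get("tags", {})
       return {
           "index": stream.get("index"),
           "codec_name": stream.get("codec_name"),
           "mimetype": tags.get("mimetype"),
           "filename": tags.get("filename"),
       }

Calling ``attachment_streams(probe("tests/assets/boruto_s01e283_ssa.mkv"))``
would be expected to return the font streams from the table above, each with
``codec_name`` reported as ``None`` by ``describe``.
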
Detection Guidance
------------------

Current known indicators for this format are:

* one or more subtitle streams with ``codec_type="subtitle"`` and
  ``codec_name="ass"``
* one or more attachment streams with ``codec_type="attachment"``
* attachment tags that identify embedded fonts, especially
  ``tags.mimetype="font/ttf"``
* attachment filenames that end in ``.ttf``

The pattern can vary. FFX should therefore treat the above as a cluster of
signals rather than an exact signature tied to one file.

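One way to read "cluster of signals" is to evaluate each indicator separately
and combine them afterwards. The sketch below does that over the ``streams``
list of the ffprobe JSON. The function names and the combination rule (ASS
subtitles, plus attachments, plus at least one font hint) are assumptions made
for illustration; the source does not prescribe a specific rule.

.. code-block:: python

   def collect_signals(streams, font_mimetypes=frozenset({"font/ttf"})):
       """Return each indicator as its own boolean instead of one verdict."""
       attachments = [s for s in streams if s.get("codec_type") == "attachment"]
       tags = [s.get("tags", {}) for s in attachments]
       return {
           "ass_subtitle": any(
               s.get("codec_type") == "subtitle" and s.get("codec_name") == "ass"
               for s in streams
           ),
           "attachment": bool(attachments),
           "font_mimetype": any(
               t.get("mimetype", "").lower() in font_mimetypes for t in tags
           ),
           "ttf_filename": any(
               t.get("filename", "").lower().endswith(".ttf") for t in tags
           ),
       }


   def is_styled_ass_package(signals):
       """Assumed rule: ASS plus attachments plus at least one font hint."""
       return (
           signals["ass_subtitle"]
           and signals["attachment"]
           and (signals["font_mimetype"] or signals["ttf_filename"])
       )

Keeping the signals separate also makes it easy to report to the operator which
indicators were present when a file only partially matches.
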
Inference from the observed sample plus FFmpeg documentation:

* MIME matching should not be limited to ``font/ttf`` alone.
* The Boruto sample uses ``font/ttf``.
* FFmpeg's Matroska attachment example uses
  ``mimetype=application/x-truetype-font`` for a ``.ttf`` attachment.
* Detection should therefore normalize multiple TTF-like MIME values rather
  than depend on a single exact string.

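A small normalization helper can cover both MIME values named above. This is a
sketch under assumptions: the set contains only the two values mentioned in
this document, the names ``TTF_LIKE_MIMETYPES``, ``normalize_mimetype``, and
``is_ttf_like`` are hypothetical, and real files may carry further variants
that would need to be added once observed.

.. code-block:: python

   # Only the two values named in this document; extend when more are observed.
   TTF_LIKE_MIMETYPES = frozenset({"font/ttf", "application/x-truetype-font"})


   def normalize_mimetype(value):
       """Lower-case a MIME value and drop whitespace and any parameters."""
       if not value:
           return ""
       return value.split(";", 1)[0].strip().lower()


   def is_ttf_like(value):
       """Return True when the normalized value is a known TTF-like MIME type."""
       return normalize_mimetype(value) in TTF_LIKE_MIMETYPES

Matching against a normalized set keeps the comparison tolerant of case and
stray whitespace without resorting to loose substring checks.
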
Processing Expectations In FFX
------------------------------

The format-specific requirements live in
``requirements/source_file_formats.md``. In practical terms, FFX should:

* recognize the ASS-plus-font-attachment pattern even when attachment probe data
  is incomplete
* tell the operator that the pattern was detected and that special handling is
  being used
* reject sidecar subtitle import for such sources, because converting or
  replacing these subtitle tracks with ordinary external text subtitles would
  break the intended subtitle package
* continue to allow normal manipulation of the ASS subtitle tracks themselves
* preserve the font attachment streams unchanged

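Two of these expectations translate fairly directly into code. The sketch below
shows a guard that rejects sidecar import and a token list that asks FFmpeg to
pass attachment streams through untouched. ``SidecarImportRejected``,
``check_sidecar_import``, and ``attachment_copy_tokens`` are hypothetical names,
and how FFX actually assembles its FFmpeg command is not described here. The
``t`` stream specifier for attachments is FFmpeg's own.

.. code-block:: python

   class SidecarImportRejected(ValueError):
       """Hypothetical error raised when sidecar import must be refused."""


   def check_sidecar_import(styled_ass_detected, sidecar_paths):
       """Refuse external subtitle files for styled-ASS-with-fonts sources."""
       if styled_ass_detected and sidecar_paths:
           raise SidecarImportRejected(
               "source carries styled ASS subtitles with embedded fonts; "
               "sidecar subtitle import is not allowed for this format"
           )


   def attachment_copy_tokens(input_index=0):
       """Map every attachment stream of one input and copy it unchanged."""
       return ["-map", f"{input_index}:t", "-c:t", "copy"]

The guard only fires when both conditions hold, so ordinary sources keep their
sidecar import path and ASS tracks remain otherwise editable.
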
FFmpeg Notes
------------

Relevant FFmpeg documentation confirms several behaviors that line up with
FFX's needs:

* FFmpeg documents ``-attach`` as adding an attachment stream to the output, and
  explicitly names Matroska fonts used in subtitle rendering as an example.
* FFmpeg documents attachment streams as regular streams that are created after
  the mapped media streams.
* FFmpeg documents ``-dump_attachment`` for extracting attachment streams, which
  is useful for debugging or validating a source file's embedded fonts.
* FFmpeg's Matroska example requires a ``mimetype`` metadata tag for attached
  fonts, which is consistent with using attachment tags as detection signals.
* FFmpeg also notes that attachments are implemented as codec extradata. That
  helps explain why probe output for attachment streams can look different from
  ordinary audio, video, and subtitle streams.

Implication for FFX:

* Attachment preservation is not an optional cosmetic feature for this format.
  It is part of preserving the subtitle package correctly.

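For debugging, the ``-dump_attachment`` option can be driven from Python. The
sketch below follows the form shown in the FFmpeg documentation, where an empty
filename makes FFmpeg use each attachment's ``filename`` tag. The function names
are hypothetical. The use of ``check=False`` rests on the assumption that
FFmpeg reports an error when no output file is given even though the
attachments have been written; verify this against the FFmpeg version in use.

.. code-block:: python

   import os
   import subprocess


   def dump_attachment_tokens(source):
       """Build the command; -dump_attachment is an input option, so it precedes -i."""
       return ["ffmpeg", "-hide_banner", "-dump_attachment:t", "", "-i", source]


   def dump_attachments(source, target_dir):
       """Write all attachments of ``source`` into ``target_dir``."""
       os.makedirs(target_dir, exist_ok=True)
       # Absolute path, because the command runs with target_dir as its cwd.
       tokens = dump_attachment_tokens(os.path.abspath(source))
       # check=False: a non-zero exit is assumed to be normal without an output file.
       return subprocess.run(tokens, cwd=target_dir, check=False)

Comparing the dumped filenames against the ``filename`` tags from the probe JSON
is a quick way to validate that a source's embedded fonts are intact.
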
Jellyfin Notes
--------------

Jellyfin's documentation also supports keeping this format intact:

* Jellyfin's subtitle compatibility table lists ``ASS/SSA`` as supported in
  ``MKV`` and not supported in ``MP4``.
* Jellyfin notes that when subtitles must be transcoded, they are either
  converted to a supported format or burned into the video, and burning them in
  is the most CPU-intensive path.
* Jellyfin's subtitle-extraction example for ``SSA/ASS`` first dumps attachment
  streams and then extracts the ASS subtitle stream, which reflects the real
  relationship between ASS subtitles and embedded fonts in MKV releases.
* Jellyfin's font documentation says text-based subtitles require fonts to
  render properly.
* Jellyfin's configuration documentation says the web client uses configured
  fallback fonts for ASS subtitles when other fonts such as MKV attachments or
  client-side fonts are not available.

Inference from the Jellyfin compatibility tables:

* Keeping this subtitle format in Matroska is the safest interoperability choice
  for Jellyfin consumers.
* Converting the subtitle payload to WebVTT would lose styled ASS behavior.
* Dropping the attachment streams would force client or fallback font
  substitution and can change appearance or glyph coverage.

References
----------

* FFmpeg documentation: https://ffmpeg.org/ffmpeg.html
* Jellyfin codec support: https://jellyfin.org/docs/general/clients/codec-support/
* Jellyfin configuration and fonts: https://jellyfin.org/docs/general/administration/configuration/