Convert docs to sphinx
192
docs/file_formats.rst
Normal file
@@ -0,0 +1,192 @@
File Formats
============

This document captures source-file-format notes that complement the normative
requirements in ``requirements/source_file_formats.md``.

The first documented format is a Matroska source that carries styled ASS/SSA
subtitle streams together with embedded font attachments.

Styled ASS In Matroska With Embedded Fonts
------------------------------------------

These files are typically ``.mkv`` releases where subtitle rendering quality
depends on keeping both parts of the subtitle package together:

* one or more subtitle streams with codec ``ass``
* one or more attachment streams that embed font files used by those subtitles

This matters because ASS subtitles are not plain text subtitles in the narrow
WebVTT sense. They can carry layout, styling, positioning, karaoke, signs, and
other typesetting effects. If the matching embedded fonts are lost, consumers
can still see subtitle text but the intended styling and sometimes glyph
coverage can be degraded.

For FFX this format is special because the ASS subtitle streams should remain
normally editable and mappable, while the related font attachments should be
transported unchanged.

Observed Sample
---------------

Assessment date: ``2026-04-17``

Observed sample file:

* ``tests/assets/boruto_s01e283_ssa.mkv``

Commands used for assessment:

.. code-block:: bash

   ffprobe tests/assets/boruto_s01e283_ssa.mkv
   ffprobe -hide_banner -show_format -show_streams -of json tests/assets/boruto_s01e283_ssa.mkv

Observed stream layout:

.. list-table::
   :header-rows: 1

   * - Stream index
     - Kind
     - Key details
   * - ``0``
     - video
     - ``codec_name=h264``
   * - ``1``
     - audio
     - ``codec_name=aac``, ``language=jpn``
   * - ``2``
     - subtitle
     - ``codec_name=ass``, ``language=ger``, default
   * - ``3``
     - subtitle
     - ``codec_name=ass``, ``language=eng``
   * - ``4``-``13``
     - attachment
     - ``tags.mimetype=font/ttf``, ``.ttf`` filenames

Observed attachment filenames:

* ``AmazonEmberTanuki-Italic.ttf``
* ``AmazonEmberTanuki-Regular.ttf``
* ``Arial.ttf``
* ``Arial Bold.ttf``
* ``Georgia.ttf``
* ``Times New Roman.ttf``
* ``Times New Roman Bold.ttf``
* ``Trebuchet MS.ttf``
* ``Verdana.ttf``
* ``Verdana Bold.ttf``

Important probe behavior from the real sample:

* Plain ``ffprobe`` lists the font streams as ``Attachment: none``.
* Plain ``ffprobe`` also prints warnings such as ``Could not find codec
  parameters for stream 4 (Attachment: none): unknown codec`` and later
  ``Unsupported codec with id 0 for input stream ...``.
* The JSON produced by ``FileProperties.FFPROBE_COMMAND_TOKENS``
  (``ffprobe -hide_banner -show_format -show_streams -of json``) still exposes
  the attachment streams clearly through ``codec_type="attachment"`` and the
  attachment tags.
* In that JSON, the attachment streams do not expose ``codec_name``.

This last point is important for FFX: robust detection must not depend on
attachment ``codec_name`` being present.
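
The following is a minimal sketch of reading such probe JSON without assuming
that ``codec_name`` exists. The embedded JSON is a hypothetical, heavily trimmed
stand-in for real ``ffprobe`` output, and ``summarize_streams`` is an
illustrative name, not an existing FFX function.

```python
from __future__ import annotations

import json

# Hypothetical, trimmed probe JSON modeled on the observed layout.
# Real ffprobe output carries many more fields per stream.
SAMPLE_PROBE_JSON = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "h264"},
    {"index": 2, "codec_type": "subtitle", "codec_name": "ass",
     "tags": {"language": "ger"}},
    {"index": 4, "codec_type": "attachment",
     "tags": {"filename": "Arial.ttf", "mimetype": "font/ttf"}}
  ]
}
"""


def summarize_streams(probe_json: str) -> list[tuple[int, str, str | None]]:
    """Return (index, codec_type, codec_name or None) for every stream."""
    probe = json.loads(probe_json)
    summary = []
    for stream in probe.get("streams", []):
        # dict.get() returns None for attachment streams, which lack
        # codec_name, instead of raising KeyError.
        summary.append(
            (
                stream["index"],
                stream.get("codec_type", "unknown"),
                stream.get("codec_name"),
            )
        )
    return summary


summary = summarize_streams(SAMPLE_PROBE_JSON)
```

Here the attachment entry comes back with ``None`` as its codec name, so a
caller has to classify it through ``codec_type`` and the tags.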

Detection Guidance
------------------

Current known indicators for this format are:

* one or more subtitle streams with ``codec_type="subtitle"`` and
  ``codec_name="ass"``
* one or more attachment streams with ``codec_type="attachment"``
* attachment tags that identify embedded fonts, especially
  ``tags.mimetype="font/ttf"``
* attachment filenames that end in ``.ttf``

The pattern can vary. FFX should therefore treat the above as a cluster of
signals rather than an exact signature tied to one file.
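
One way to treat the indicators as a cluster is to count each signal separately
and decide afterwards. The sketch below assumes the stream dictionaries come
from the probe JSON. ``AssFontSignals``, ``collect_signals``, and the decision
rule (ASS stream plus attachment plus at least one font hint) are illustrative
assumptions, not the rule FFX actually implements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssFontSignals:
    """Counts of the individual indicators found in one probe result."""

    ass_subtitle_streams: int
    attachment_streams: int
    font_mimetype_attachments: int
    ttf_filename_attachments: int

    @property
    def looks_like_ass_with_fonts(self) -> bool:
        # Assumed policy: either font hint is enough, so a missing
        # mimetype tag alone does not defeat detection.
        has_font_hint = (
            self.font_mimetype_attachments > 0
            or self.ttf_filename_attachments > 0
        )
        return (
            self.ass_subtitle_streams > 0
            and self.attachment_streams > 0
            and has_font_hint
        )


def collect_signals(streams: list[dict]) -> AssFontSignals:
    """Count the known indicators without relying on attachment codec_name."""
    ass = attachments = font_mime = ttf_name = 0
    for stream in streams:
        codec_type = stream.get("codec_type")
        tags = stream.get("tags") or {}
        if codec_type == "subtitle" and stream.get("codec_name") == "ass":
            ass += 1
        elif codec_type == "attachment":
            attachments += 1
            if str(tags.get("mimetype", "")).lower() == "font/ttf":
                font_mime += 1
            if str(tags.get("filename", "")).lower().endswith(".ttf"):
                ttf_name += 1
    return AssFontSignals(ass, attachments, font_mime, ttf_name)
```

Keeping the counts separate from the decision also gives the operator message
something concrete to report, for example how many font attachments were found.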

Inference from the observed sample plus FFmpeg documentation:

* MIME matching should not be limited to ``font/ttf`` alone.
* The Boruto sample uses ``font/ttf``.
* FFmpeg's Matroska attachment example uses
  ``mimetype=application/x-truetype-font`` for a ``.ttf`` attachment.
* Detection should therefore normalize multiple TTF-like MIME values rather
  than depend on a single exact string.
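
A minimal sketch of such normalization follows. It covers only the two MIME
values named above; the set would likely need extending as more samples are
assessed. ``TTF_LIKE_MIMETYPES`` and ``is_ttf_like_mimetype`` are hypothetical
names.

```python
from __future__ import annotations

# Only the two values documented above; extend as further samples appear.
TTF_LIKE_MIMETYPES = frozenset({"font/ttf", "application/x-truetype-font"})


def is_ttf_like_mimetype(value: str | None) -> bool:
    """Return True when a mimetype tag names a known TTF-like type."""
    if not value:
        return False
    # Drop any ";parameter" suffix, surrounding whitespace, and case
    # differences before comparing.
    normalized = value.split(";", 1)[0].strip().lower()
    return normalized in TTF_LIKE_MIMETYPES
```

Comparing against a normalized set rather than one literal string keeps the
check tolerant of casing and whitespace differences between muxers.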

Processing Expectations In FFX
------------------------------

The format-specific requirements live in
``requirements/source_file_formats.md``. In practical terms, FFX should:

* recognize the ASS-plus-font-attachment pattern even when attachment probe data
  is incomplete
* tell the operator that the pattern was detected and that special handling is
  being used
* reject sidecar subtitle import for such sources, because converting or
  replacing these subtitle tracks with ordinary external text subtitles would
  break the intended subtitle package
* continue to allow normal manipulation of the ASS subtitle tracks themselves
* preserve the font attachment streams unchanged
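
The expectations above can be sketched as a small planning step. Everything
named here (``SidecarSubtitleRejected``, ``plan_special_handling``, the message
text) is hypothetical, and the returned tokens only assume that FFX builds an
``ffmpeg`` command line: ``-map 0:t`` selects the attachment streams of the
first input and ``-c:t copy`` copies them without re-encoding.

```python
from __future__ import annotations


class SidecarSubtitleRejected(ValueError):
    """Raised when sidecar subtitles are requested for an ASS-plus-fonts source."""


def plan_special_handling(
    pattern_detected: bool, sidecar_subtitles: list[str]
) -> list[str]:
    """Return extra ffmpeg tokens, or raise if the request is incompatible."""
    if not pattern_detected:
        # Ordinary sources need no special handling in this sketch.
        return []
    if sidecar_subtitles:
        raise SidecarSubtitleRejected(
            "Source carries styled ASS subtitles with embedded fonts; "
            "sidecar subtitle import is not allowed for it."
        )
    # Tell the operator that special handling is active.
    print("Detected styled ASS subtitles with embedded fonts; "
          "font attachments will be copied unchanged.")
    # Map all attachment streams of input 0 and stream-copy them.
    return ["-map", "0:t", "-c:t", "copy"]
```

Raising before any command is built keeps the rejection visible to the operator
instead of silently dropping the sidecar files.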

FFmpeg Notes
------------

Relevant FFmpeg documentation confirms several behaviors that line up with
FFX's needs:

* FFmpeg documents ``-attach`` as adding an attachment stream to the output, and
  explicitly names Matroska fonts used in subtitle rendering as an example.
* FFmpeg documents attachment streams as regular streams that are created after
  the mapped media streams.
* FFmpeg documents ``-dump_attachment`` for extracting attachment streams, which
  is useful for debugging or validating a source file's embedded fonts.
* FFmpeg's Matroska example requires a ``mimetype`` metadata tag for attached
  fonts, which is consistent with using attachment tags as detection signals.
* FFmpeg also notes that attachments are implemented as codec extradata. That
  helps explain why probe output for attachment streams can look different from
  ordinary audio, video, and subtitle streams.

Implication for FFX:

* Attachment preservation is not an optional cosmetic feature for this format.
  It is part of preserving the subtitle package correctly.
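
For debugging, the ``-dump_attachment`` option mentioned above can be driven
from a short script. This is a sketch under assumptions: the helper names are
hypothetical, and it relies on the documented form
``ffmpeg -dump_attachment:t "" -i INPUT``, where the empty filename asks FFmpeg
to name each file after the attachment's ``filename`` tag.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def dump_attachment_tokens(source: str) -> list[str]:
    """Build the ffmpeg tokens that extract every attachment of a source."""
    # -dump_attachment is an input option, so it precedes -i.
    return ["ffmpeg", "-hide_banner", "-dump_attachment:t", "", "-i", source]


def dump_attachments(source: str, target_dir: str) -> int:
    """Extract attachments into target_dir and return ffmpeg's exit code."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # Resolve the source first because ffmpeg runs inside target_dir.
    absolute_source = str(Path(source).resolve())
    # check=False: no output file is given, so ffmpeg may exit non-zero
    # even after extracting; inspect the written files, not the exit code.
    completed = subprocess.run(
        dump_attachment_tokens(absolute_source), cwd=target_dir, check=False
    )
    return completed.returncode
```

Comparing the extracted filenames with the attachment tags from the probe JSON
is a quick way to confirm that a processed file still carries the same fonts.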

Jellyfin Notes
--------------

Jellyfin's documentation also supports keeping this format intact:

* Jellyfin's subtitle compatibility table lists ``ASS/SSA`` as supported in
  ``MKV`` and not supported in ``MP4``.
* Jellyfin notes that when subtitles must be transcoded, they are either
  converted to a supported format or burned into the video, and burning them in
  is the most CPU-intensive path.
* Jellyfin's subtitle-extraction example for ``SSA/ASS`` first dumps attachment
  streams and then extracts the ASS subtitle stream, which reflects the real
  relationship between ASS subtitles and embedded fonts in MKV releases.
* Jellyfin's font documentation says text-based subtitles require fonts to
  render properly.
* Jellyfin's configuration documentation says the web client uses configured
  fallback fonts for ASS subtitles when other fonts such as MKV attachments or
  client-side fonts are not available.

Inference from the Jellyfin compatibility tables:

* Keeping this subtitle format in Matroska is the safest interoperability choice
  for Jellyfin consumers.
* Converting the subtitle payload to WebVTT would lose styled ASS behavior.
* Dropping the attachment streams would force client or fallback font
  substitution and can change appearance or glyph coverage.

References
----------

* FFmpeg documentation: https://ffmpeg.org/ffmpeg.html
* Jellyfin codec support: https://jellyfin.org/docs/general/clients/codec-support/
* Jellyfin configuration and fonts: https://jellyfin.org/docs/general/administration/configuration/