Optimizes ffprobe usage

This commit is contained in:
Javanaut
2026-04-11 16:00:01 +02:00
parent c384d54c12
commit 0939a0c6c2
5 changed files with 161 additions and 59 deletions

5
.gitignore vendored
View File

@@ -16,3 +16,8 @@ dist/
.venv/ .venv/
venv/ venv/
.codex .codex
*.mkv
*.webm
ffmpeg2pass-0.log

View File

@@ -11,6 +11,7 @@
- A first modern integration slice now exists under [`tests/integration/subtrack_mapping`](/home/osgw/.local/src/codex/ffx/tests/integration/subtrack_mapping). Remaining test-suite cleanup is now mostly about migrating and shrinking the legacy harness surface under [`tests/legacy`](/home/osgw/.local/src/codex/ffx/tests/legacy). - A first modern integration slice now exists under [`tests/integration/subtrack_mapping`](/home/osgw/.local/src/codex/ffx/tests/integration/subtrack_mapping). Remaining test-suite cleanup is now mostly about migrating and shrinking the legacy harness surface under [`tests/legacy`](/home/osgw/.local/src/codex/ffx/tests/legacy).
- The CLI root now lazy-loads heavy runtime dependencies so lightweight commands such as `version`, `help`, `configure_workstation`, and `upgrade` stay import-light. - The CLI root now lazy-loads heavy runtime dependencies so lightweight commands such as `version`, `help`, `configure_workstation`, and `upgrade` stay import-light.
- Shared CLI defaults for container/output tokens now live outside [`src/ffx/ffx_controller.py`](/home/osgw/.local/src/codex/ffx/src/ffx/ffx_controller.py), and a focused unit test locks in the lazy-import contract. - Shared CLI defaults for container/output tokens now live outside [`src/ffx/ffx_controller.py`](/home/osgw/.local/src/codex/ffx/src/ffx/ffx_controller.py), and a focused unit test locks in the lazy-import contract.
- `FileProperties` now uses one cached `ffprobe -show_format -show_streams -of json` call per source file, and the combined payload was confirmed against the Dragonball asset to satisfy both previous probe call sites fully.
- FFX logger setup now reuses named handlers, and fallback logger access no longer mutates handlers in ordinary constructors and helpers. - FFX logger setup now reuses named handlers, and fallback logger access no longer mutates handlers in ordinary constructors and helpers.
- The process wrapper now uses `subprocess.run(...)` with centralized command formatting plus stable timeout and missing-command error mapping. - The process wrapper now uses `subprocess.run(...)` with centralized command formatting plus stable timeout and missing-command error mapping.
- Active ORM controllers now use single-query accessors instead of paired `count()` plus `first()` lookups. - Active ORM controllers now use single-query accessors instead of paired `count()` plus `first()` lookups.
@@ -19,8 +20,7 @@
## Focused Snapshot ## Focused Snapshot
- Highest-leverage application optimizations: - Highest-leverage application optimizations:
- Collapse repeated `ffprobe` calls into a single probe result per source file. - Revisit crop detection cost now that the probe path is consolidated.
- Revisit crop detection cost after the probe path is consolidated.
- Highest-leverage repo and workflow optimizations: - Highest-leverage repo and workflow optimizations:
- Consolidate setup and upgrade tooling to reduce overlapping shell-script responsibilities. - Consolidate setup and upgrade tooling to reduce overlapping shell-script responsibilities.
@@ -28,16 +28,7 @@
## Optimization Candidates ## Optimization Candidates
1. Media probing does two separate `ffprobe` subprocesses per file 1. Crop detection is always a full extra ffmpeg scan
- [`src/ffx/file_properties.py`](/home/osgw/.local/src/codex/ffx/src/ffx/file_properties.py) calls `ffprobe` once for format data and once for stream data.
- Optimization:
- Use one probe call that requests both format and streams.
- Cache that result inside `FileProperties`.
- Expected value:
- Less subprocess overhead.
- Faster inspect and convert flows.
2. Crop detection is always a full extra ffmpeg scan
- [`src/ffx/file_properties.py`](/home/osgw/.local/src/codex/ffx/src/ffx/file_properties.py) runs a dedicated `ffmpeg -vf cropdetect` pass for each file when crop detection is requested. - [`src/ffx/file_properties.py`](/home/osgw/.local/src/codex/ffx/src/ffx/file_properties.py) runs a dedicated `ffmpeg -vf cropdetect` pass for each file when crop detection is requested.
- Optimization: - Optimization:
- Cache crop results for repeated runs on the same source. - Cache crop results for repeated runs on the same source.
@@ -45,7 +36,7 @@
- Expected value: - Expected value:
- Lower latency on repeated experimentation. - Lower latency on repeated experimentation.
3. Tooling overlap and naming drift 2. Tooling overlap and naming drift
- There are still overlapping workstation-setup entrypoints across [`tools/configure_workstation.sh`](/home/osgw/.local/src/codex/ffx/tools/configure_workstation.sh), [`tools/setup.sh`](/home/osgw/.local/src/codex/ffx/tools/setup.sh), and newer CLI maintenance commands. - There are still overlapping workstation-setup entrypoints across [`tools/configure_workstation.sh`](/home/osgw/.local/src/codex/ffx/tools/configure_workstation.sh), [`tools/setup.sh`](/home/osgw/.local/src/codex/ffx/tools/setup.sh), and newer CLI maintenance commands.
- Optimization: - Optimization:
- Decide which scripts remain canonical. - Decide which scripts remain canonical.
@@ -55,7 +46,7 @@
- Less operator confusion. - Less operator confusion.
- Fewer duplicated procedures to maintain. - Fewer duplicated procedures to maintain.
4. Placeholder UI surfaces should either ship or disappear 3. Placeholder UI surfaces should either ship or disappear
- [`src/ffx/help_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/help_screen.py) and [`src/ffx/settings_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/settings_screen.py) are placeholders. - [`src/ffx/help_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/help_screen.py) and [`src/ffx/settings_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/settings_screen.py) are placeholders.
- Optimization: - Optimization:
- Either remove them from the active UI surface or complete them. - Either remove them from the active UI surface or complete them.
@@ -64,7 +55,7 @@
- Leaner interface. - Leaner interface.
- Lower UX ambiguity. - Lower UX ambiguity.
5. Large Textual screens repeat configuration and controller loading 4. Large Textual screens repeat configuration and controller loading
- Screens such as [`src/ffx/media_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/media_details_screen.py), [`src/ffx/pattern_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/pattern_details_screen.py), and [`src/ffx/show_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/show_details_screen.py) repeat setup patterns and local metadata filtering extraction. - Screens such as [`src/ffx/media_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/media_details_screen.py), [`src/ffx/pattern_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/pattern_details_screen.py), and [`src/ffx/show_details_screen.py`](/home/osgw/.local/src/codex/ffx/src/ffx/show_details_screen.py) repeat setup patterns and local metadata filtering extraction.
- Optimization: - Optimization:
- Extract a shared screen base or helper for common config/controller/bootstrap logic. - Extract a shared screen base or helper for common config/controller/bootstrap logic.
@@ -73,7 +64,7 @@
- Lower maintenance overhead. - Lower maintenance overhead.
- Easier UI iteration. - Easier UI iteration.
6. Several helper functions are unfinished or dead-weight 5. Several helper functions are unfinished or dead-weight
- [`src/ffx/helper.py`](/home/osgw/.local/src/codex/ffx/src/ffx/helper.py) contains `permutateList(...): pass`. - [`src/ffx/helper.py`](/home/osgw/.local/src/codex/ffx/src/ffx/helper.py) contains `permutateList(...): pass`.
- There are many combinator and conversion placeholders across tests and migrations. - There are many combinator and conversion placeholders across tests and migrations.
- Optimization: - Optimization:
@@ -83,7 +74,7 @@
- Smaller mental model. - Smaller mental model.
- Less time spent re-evaluating inactive paths. - Less time spent re-evaluating inactive paths.
7. Test suite shape is expensive to understand and likely expensive to run 6. Test suite shape is expensive to understand and likely expensive to run
- The project still carries a large legacy matrix of combinator files under [`tests/legacy`](/home/osgw/.local/src/codex/ffx/tests/legacy), several placeholder `pass` implementations, and at least one suspicious filename with an embedded space: [`tests/legacy/disposition_combinator_2_3 .py`](/home/osgw/.local/src/codex/ffx/tests/legacy/disposition_combinator_2_3 .py). - The project still carries a large legacy matrix of combinator files under [`tests/legacy`](/home/osgw/.local/src/codex/ffx/tests/legacy), several placeholder `pass` implementations, and at least one suspicious filename with an embedded space: [`tests/legacy/disposition_combinator_2_3 .py`](/home/osgw/.local/src/codex/ffx/tests/legacy/disposition_combinator_2_3 .py).
- A first focused replacement slice now exists in [`tests/integration/subtrack_mapping/test_cli_bundle.py`](/home/osgw/.local/src/codex/ffx/tests/integration/subtrack_mapping/test_cli_bundle.py), so the remaining work is migration and consolidation rather than creating the modern test shape from scratch. - A first focused replacement slice now exists in [`tests/integration/subtrack_mapping/test_cli_bundle.py`](/home/osgw/.local/src/codex/ffx/tests/integration/subtrack_mapping/test_cli_bundle.py), so the remaining work is migration and consolidation rather than creating the modern test shape from scratch.
- Optimization: - Optimization:
@@ -94,7 +85,7 @@
- Faster contributor onboarding. - Faster contributor onboarding.
- Easier CI adoption later. - Easier CI adoption later.
8. Process resource limiting semantics could be clearer 7. Process resource limiting semantics could be clearer
- [`src/ffx/process.py`](/home/osgw/.local/src/codex/ffx/src/ffx/process.py) prepends `nice` and `cpulimit` directly when values are set. - [`src/ffx/process.py`](/home/osgw/.local/src/codex/ffx/src/ffx/process.py) prepends `nice` and `cpulimit` directly when values are set.
- Optimization: - Optimization:
- Validate and document effective behavior for combined `nice` + `cpulimit`. - Validate and document effective behavior for combined `nice` + `cpulimit`.
@@ -103,7 +94,7 @@
- Fewer surprises in production-like runs. - Fewer surprises in production-like runs.
- Easier support for user-reported performance behavior. - Easier support for user-reported performance behavior.
9. Regex and string utility cleanup 8. Regex and string utility cleanup
- [`src/ffx/helper.py`](/home/osgw/.local/src/codex/ffx/src/ffx/helper.py) still has repeated string-replacement churn in filename/TMDB normalization helpers, and regex handling in helpers is easy to regress quietly. - [`src/ffx/helper.py`](/home/osgw/.local/src/codex/ffx/src/ffx/helper.py) still has repeated string-replacement churn in filename/TMDB normalization helpers, and regex handling in helpers is easy to regress quietly.
- Optimization: - Optimization:
- Keep regex literals raw and centralized where appropriate. - Keep regex literals raw and centralized where appropriate.
@@ -112,7 +103,7 @@
- Cleaner runtime output. - Cleaner runtime output.
- Less warning noise during dry-run maintenance commands. - Less warning noise during dry-run maintenance commands.
10. Database startup always runs schema creation and version checks 9. Database startup always runs schema creation and version checks
- [`src/ffx/database.py`](/home/osgw/.local/src/codex/ffx/src/ffx/database.py) runs `Base.metadata.create_all(...)` and version checks every time a DB-backed context is created. - [`src/ffx/database.py`](/home/osgw/.local/src/codex/ffx/src/ffx/database.py) runs `Base.metadata.create_all(...)` and version checks every time a DB-backed context is created.
- Optimization: - Optimization:
- Measure startup cost and consider separating bootstrapping from ordinary command execution. - Measure startup cost and consider separating bootstrapping from ordinary command execution.
@@ -136,7 +127,6 @@
1. Triage the list into quick wins, medium refactors, and long-horizon cleanup. 1. Triage the list into quick wins, medium refactors, and long-horizon cleanup.
2. Tackle the cheapest high-impact items first: 2. Tackle the cheapest high-impact items first:
- single-call `ffprobe` refactor.
- crop detection sampling or caching pass. - crop detection sampling or caching pass.
3. Decide which setup and upgrade entrypoints stay canonical before adding more maintenance surface. 3. Decide which setup and upgrade entrypoints stay canonical before adding more maintenance surface.

View File

@@ -7,9 +7,16 @@ Detailed product rules for source-to-target subtrack mapping live in
`requirements/subtrack_mapping.md`. This file describes only how tests cover `requirements/subtrack_mapping.md`. This file describes only how tests cover
that area. that area.
## Interpreter Requirement
- Agents shall run Python-side test commands with `~/.local/share/ffx.venv/bin/python`.
- This applies to the legacy harness, `unittest`, `pytest`, helper scripts, and `python -m ffx ...` test invocations.
- Agents shall not silently substitute `python`, `python3`, or another interpreter for Python-side test work.
- If `~/.local/share/ffx.venv/bin/python` is missing or not executable, agents shall stop and report the missing venv instead of continuing with Python-side test execution.
## Current Harness ## Current Harness
- Entrypoint: `python tests/legacy_runner.py run` - Entrypoint: `~/.local/share/ffx.venv/bin/python tests/legacy_runner.py run`
- Runner style: custom Click CLI, not `pytest` or `unittest` - Runner style: custom Click CLI, not `pytest` or `unittest`
- Commands: - Commands:
- `run`: discover scenario files, instantiate each scenario, run yielded jobs - `run`: discover scenario files, instantiate each scenario, run yielded jobs
@@ -35,7 +42,7 @@ that area.
- inputs per job: `1` - inputs per job: `1`
- jobs: `140` - jobs: `140`
- expected failures: `0` - expected failures: `0`
- execution: build one synthetic source file, run `python -m ffx convert`, assert filename selectors only - execution: build one synthetic source file, run `~/.local/share/ffx.venv/bin/python -m ffx convert`, assert filename selectors only
- selectors executed: `B`, `L`, `I` - selectors executed: `B`, `L`, `I`
- selectors defined but not executed: `S`, `R` - selectors defined but not executed: `S`, `R`
- `2`: `tests/legacy/scenario_2.py` - `2`: `tests/legacy/scenario_2.py`
@@ -43,7 +50,7 @@ that area.
- inputs per job: `1` - inputs per job: `1`
- jobs: `8193` - jobs: `8193`
- expected failures: `3267` - expected failures: `3267`
- execution: build one synthetic source file, run `python -m ffx convert`, probe result with `FileProperties`, assert track layout and selected audio and subtitle metadata - execution: build one synthetic source file, run `~/.local/share/ffx.venv/bin/python -m ffx convert`, probe result with `FileProperties`, assert track layout and selected audio and subtitle metadata
- selectors executed: `M`, `AD`, `AT`, `SD`, `ST` - selectors executed: `M`, `AD`, `AT`, `SD`, `ST`
- selectors defined but not executed: `MT`, `AP`, `SP`, `J` - selectors defined but not executed: `MT`, `AP`, `SP`, `J`
- `4`: `tests/legacy/scenario_4.py` - `4`: `tests/legacy/scenario_4.py`
@@ -51,7 +58,7 @@ that area.
- inputs per job: `6` - inputs per job: `6`
- jobs: `768` - jobs: `768`
- expected failures: `336` - expected failures: `336`
- execution: build six synthetic preset files, recreate temp SQLite DB, insert show and pattern, run one batch convert command, query TMDB during assertions - execution: build six synthetic preset files, recreate temp SQLite DB, insert show and pattern, run one batch convert command via `~/.local/share/ffx.venv/bin/python`, query TMDB during assertions
- selectors executed: `M`, `AD`, `AT`, `SD`, `ST` - selectors executed: `M`, `AD`, `AT`, `SD`, `ST`
- selectors defined but not executed: `MT`, `AP`, `SP`, `J` - selectors defined but not executed: `MT`, `AP`, `SP`, `J`
- notes: - notes:

View File

@@ -13,6 +13,7 @@ from ffx.model.pattern import Pattern
class FileProperties(): class FileProperties():
FILE_EXTENSIONS = ['mkv', 'mp4', 'avi', 'flv', 'webm'] FILE_EXTENSIONS = ['mkv', 'mp4', 'avi', 'flv', 'webm']
FFPROBE_COMMAND_TOKENS = ["ffprobe", "-hide_banner", "-show_format", "-show_streams", "-of", "json"]
SE_INDICATOR_PATTERN = '([sS][0-9]+[eE][0-9]+)' SE_INDICATOR_PATTERN = '([sS][0-9]+[eE][0-9]+)'
SEASON_EPISODE_INDICATOR_MATCH = '[sS]([0-9]+)[eE]([0-9]+)' SEASON_EPISODE_INDICATOR_MATCH = '[sS]([0-9]+)[eE]([0-9]+)'
@@ -78,6 +79,26 @@ class FileProperties():
self.__season = -1 self.__season = -1
self.__episode = -1 self.__episode = -1
self.__ffprobeData = None
def _getFfprobeData(self):
if self.__ffprobeData is not None:
return self.__ffprobeData
ffprobeOutput, ffprobeError, returnCode = executeProcess(
FileProperties.FFPROBE_COMMAND_TOKENS + [self.__sourcePath]
)
if 'Invalid data found when processing input' in ffprobeError:
raise Exception(f"File {self.__sourcePath} does not contain valid stream data")
if returnCode != 0:
raise Exception(f"ffprobe returned with error {returnCode}")
self.__ffprobeData = json.loads(ffprobeOutput)
return self.__ffprobeData
def getFormatData(self): def getFormatData(self):
""" """
@@ -99,22 +120,7 @@ class FileProperties():
} }
} }
""" """
return self._getFfprobeData()['format']
# ffprobe -hide_banner -show_format -of json
ffprobeOutput, ffprobeError, returnCode = executeProcess(["ffprobe",
"-hide_banner",
"-show_format",
"-of", "json",
self.__sourcePath]) #,
#context = self.context)
if 'Invalid data found when processing input' in ffprobeError:
raise Exception(f"File {self.__sourcePath} does not contain valid stream data")
if returnCode != 0:
raise Exception(f"ffprobe returned with error {returnCode}")
return json.loads(ffprobeOutput)['format']
def getStreamData(self): def getStreamData(self):
@@ -159,24 +165,7 @@ class FileProperties():
} }
} }
""" """
return self._getFfprobeData()['streams']
# ffprobe -hide_banner -show_streams -of json
ffprobeOutput, ffprobeError, returnCode = executeProcess(["ffprobe",
"-hide_banner",
"-show_streams",
"-of", "json",
self.__sourcePath]) #,
#context = self.context)
if 'Invalid data found when processing input' in ffprobeError:
raise Exception(f"File {self.__sourcePath} does not contain valid stream data")
if returnCode != 0:
raise Exception(f"ffprobe returned with error {returnCode}")
return json.loads(ffprobeOutput)['streams']

View File

@@ -0,0 +1,111 @@
from __future__ import annotations
import json
import logging
from pathlib import Path
import sys
import unittest
from unittest.mock import patch
SRC_ROOT = Path(__file__).resolve().parents[2] / "src"
if str(SRC_ROOT) not in sys.path:
sys.path.insert(0, str(SRC_ROOT))
class StaticConfig:
def getData(self):
return {}
class DummyPatternController:
def __init__(self, context):
self.context = context
def matchFilename(self, filename):
return {}
def make_logger(name: str) -> logging.Logger:
logger = logging.getLogger(name)
logger.handlers = []
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.NullHandler())
return logger
class FilePropertiesProbeTests(unittest.TestCase):
def import_module(self):
try:
import ffx.file_properties as file_properties_module
except ModuleNotFoundError as ex:
if ex.name == "sqlalchemy":
self.skipTest("sqlalchemy is not installed in this environment")
raise
return file_properties_module
def make_context(self):
return {
"logger": make_logger("ffx-test-file-properties-probe"),
"config": StaticConfig(),
"database": {"session": object()},
"use_pattern": False,
}
def sample_probe_data(self):
return {
"format": {
"filename": "/tmp/example_s01e01.mkv",
"nb_streams": 2,
"format_name": "matroska,webm",
},
"streams": [
{
"index": 0,
"codec_name": "h264",
"codec_type": "video",
"disposition": {"default": 1},
"tags": {},
},
{
"index": 1,
"codec_name": "aac",
"codec_type": "audio",
"channel_layout": "stereo",
"channels": 2,
"disposition": {"default": 0},
"tags": {"language": "eng"},
},
],
}
def test_format_and_stream_accessors_share_one_combined_probe(self):
file_properties_module = self.import_module()
probe_output = self.sample_probe_data()
with (
patch.object(file_properties_module, "PatternController", DummyPatternController),
patch.object(
file_properties_module,
"executeProcess",
return_value=(json.dumps(probe_output), "", 0),
) as mocked_execute,
):
file_properties = file_properties_module.FileProperties(
self.make_context(),
"/tmp/example_s01e01.mkv",
)
self.assertEqual(probe_output["format"], file_properties.getFormatData())
self.assertEqual(probe_output["streams"], file_properties.getStreamData())
mocked_execute.assert_called_once_with(
file_properties_module.FileProperties.FFPROBE_COMMAND_TOKENS
+ ["/tmp/example_s01e01.mkv"]
)
if __name__ == "__main__":
unittest.main()