ffx/SCRATCHPAD.md
2026-04-09 12:46:24 +02:00

Scratchpad

Goal

  • Capture a compact, project-wide list of optimization candidates after a broad scan of the current FFX codebase, tooling, and requirements.

Settled

  • The biggest near-term wins are in startup cost, repeated subprocess work, repeated database query patterns, and general repo hygiene.
  • This list is intentionally optimization-oriented rather than bug-oriented. Some items below also improve correctness or maintainability, but they were selected because they can reduce runtime cost, operator friction, or iteration overhead.
  • A first modern integration slice exists under tests/integration/subtrack_mapping. The remaining test-suite cleanup is mostly a matter of migrating and shrinking the legacy harness surface under tests/legacy.
  • FFX logger setup now reuses named handlers, and fallback logger access no longer mutates handlers in ordinary constructors and helpers.

Focused Snapshot

  • Highest-leverage application optimizations:

    • Lazy-load CLI command dependencies so lightweight commands do not import most of the app.
    • Collapse repeated ffprobe calls into a single probe result per source file.
    • Replace query.count() plus first() patterns with single-query ORM accessors.
    • Cache or precompile filename pattern regexes instead of scanning every pattern for every file.
  • Highest-leverage repo and workflow optimizations:

    • Consolidate setup and upgrade tooling to reduce overlapping shell-script responsibilities.
    • Continue migrating the oversized legacy test/combinator surface into focused modern tests so it is easier to run, debug, and extend.

Optimization Candidates

  1. CLI startup and import cost
  • src/ffx/cli.py imports a large portion of the application at module import time, even for cheap commands such as version, help, setup_dependencies, and upgrade.
  • Optimization:
    • Move heavy imports into the commands that actually need them.
    • Keep the CLI root importable with only core stdlib and Click dependencies.
  • Expected value:
    • Faster startup for scripting and tooling commands.
    • Less coupling between maintenance commands and the runtime stack.
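
A minimal sketch of the deferred-import idea, using stdlib stand-ins rather than the real FFX modules or Click decorators. The names `cmd_version`, `cmd_inspect`, `COMMANDS`, and `main` are hypothetical, and `sqlite3` only plays the role of a heavy dependency such as the ORM or Textual stack.

```python
def cmd_version() -> str:
    """Cheap command: needs nothing beyond what is already imported."""
    return "ffx 0.0.0 (placeholder version string)"


def cmd_inspect(path: str) -> str:
    """Heavier command: pays for its dependencies only when it is invoked."""
    # Deferred import; sqlite3 is a stand-in for ORM/Textual/TMDB modules.
    import sqlite3

    return f"would inspect {path} using sqlite {sqlite3.sqlite_version}"


# Hypothetical dispatch table; with Click, the same deferred import would
# simply live inside the body of the decorated command function.
COMMANDS = {"version": cmd_version, "inspect": cmd_inspect}


def main(argv: list) -> str:
    name, *args = argv
    return COMMANDS[name](*args)
```

With this shape, `main(["version"])` never touches the heavy import, so a broken optional dependency would only affect the commands that actually use it.
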
  1. Repeated database queries via count() plus first()
  1. Filename pattern matching scales linearly across all patterns
  • src/ffx/pattern_controller.py loads every pattern and runs re.search against each filename on every lookup.
  • Optimization:
    • Cache compiled regexes in process memory.
    • Return the first match in a defined order instead of silently letting the last matching pattern win.
    • Consider explicit pattern priority if overlapping rules are valid.
  • Expected value:
    • Faster per-file setup when many patterns exist.
    • More predictable matching behavior.
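
The caching and first-match behavior could look roughly like the sketch below. It assumes patterns arrive as an ordered sequence of regex strings (highest priority first); `compile_pattern` and `first_match` are hypothetical names, not the existing pattern_controller API.

```python
import re
from functools import lru_cache
from typing import Optional, Sequence


@lru_cache(maxsize=None)
def compile_pattern(expression: str) -> "re.Pattern[str]":
    """Compile each distinct expression once per process."""
    return re.compile(expression)


def first_match(filename: str, expressions: Sequence[str]) -> Optional[str]:
    """Return the first expression, in priority order, that matches."""
    for expression in expressions:
        if compile_pattern(expression).search(filename):
            return expression  # stop at the first hit instead of scanning on
    return None
```

For example, with `[r"S\d+E\d+", r"\.mkv$"]` as the ordered list, a file named `show.S01E02.mkv` resolves to the episode pattern even though both expressions match.
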
  1. Media probing runs two separate ffprobe subprocesses per file
  • src/ffx/file_properties.py calls ffprobe once for format data and once for stream data.
  • Optimization:
    • Use one probe call that requests both format and streams.
    • Cache that result inside FileProperties.
  • Expected value:
    • Less subprocess overhead.
    • Faster inspect and convert flows.
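
One possible shape for the single-call probe is sketched below. The ffprobe options used (`-v error`, `-print_format json`, `-show_format`, `-show_streams`) are standard; everything else, including the `ProbeCache` class and the injectable `runner` parameter, is a hypothetical illustration and not the current FileProperties code.

```python
import json
import subprocess


def build_probe_command(path: str) -> list:
    """One ffprobe invocation that requests format and stream data together."""
    return [
        "ffprobe",
        "-v", "error",
        "-print_format", "json",
        "-show_format",
        "-show_streams",
        path,
    ]


def parse_probe_output(raw: str) -> dict:
    """Split the combined JSON into the two parts callers fetched separately."""
    data = json.loads(raw)
    return {"format": data.get("format", {}), "streams": data.get("streams", [])}


class ProbeCache:
    """Hypothetical per-file cache: probes lazily and only once."""

    def __init__(self, path: str, runner=subprocess.run):
        self._path = path
        self._runner = runner  # injectable so the sketch can run without ffprobe
        self._result = None

    def get(self) -> dict:
        if self._result is None:
            completed = self._runner(
                build_probe_command(self._path),
                capture_output=True, text=True, check=True,
            )
            self._result = parse_probe_output(completed.stdout)
        return self._result
```

Callers that need only streams or only format data would both go through `get()`, so the subprocess cost is paid at most once per source file.
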
  1. Crop detection is always a full extra ffmpeg scan
  • src/ffx/file_properties.py runs a dedicated ffmpeg -vf cropdetect pass for each file when crop detection is requested.
  • Optimization:
    • Cache crop results for repeated runs on the same source.
    • Consider exposing shorter sampling windows or probe presets for large files.
  • Expected value:
    • Lower latency on repeated experimentation.
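
A rough sketch of crop-result caching, assuming a result stays valid as long as the file's path, size, and modification time are unchanged. `CropCache` and the `detect` callable are hypothetical; the real cropdetect pass would be passed in as `detect`.

```python
import os
from typing import Callable, Dict, Tuple

CropKey = Tuple[str, int, int]  # (absolute path, size in bytes, mtime in ns)


class CropCache:
    """Hypothetical in-process cache keyed on file identity."""

    def __init__(self, detect: Callable[[str], str]):
        self._detect = detect  # the expensive ffmpeg cropdetect pass
        self._results: Dict[CropKey, str] = {}

    def _key(self, path: str) -> CropKey:
        stat = os.stat(path)
        return (os.path.abspath(path), stat.st_size, stat.st_mtime_ns)

    def get(self, path: str) -> str:
        key = self._key(path)
        if key not in self._results:
            self._results[key] = self._detect(path)
        return self._results[key]
```

This version only helps within a single process. Reusing results across separate invocations would additionally require persisting the mapping somewhere, which is left open here.
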
  1. Process wrapper lacks stronger execution controls
  • src/ffx/process.py uses Popen(...).communicate() without timeout handling, structured error mapping, or direct missing-command handling.
  • Optimization:
    • Add timeout support and clearer FileNotFoundError handling.
    • Consider subprocess.run(..., check=False, text=True) where streaming is not required.
    • Centralize return/error formatting.
  • Expected value:
    • Better failure diagnosis.
    • Cleaner process management semantics.
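
The timeout and missing-command handling could be centralized along these lines. This is a sketch for the non-streaming case only; `ProcessResult`, `run_command`, and the `error` labels are hypothetical names, not the existing process.py interface.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProcessResult:
    returncode: Optional[int]  # None when the process never completed
    stdout: str
    stderr: str
    error: Optional[str] = None  # "missing-command", "timeout", or None


def run_command(command: Sequence[str],
                timeout: Optional[float] = None) -> ProcessResult:
    """Run a command and map common failures to one structured result."""
    try:
        completed = subprocess.run(
            list(command), capture_output=True, text=True,
            check=False, timeout=timeout,
        )
    except FileNotFoundError as exc:
        # The executable itself could not be found.
        return ProcessResult(None, "", str(exc), error="missing-command")
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising on timeout.
        return ProcessResult(None, "", f"timed out after {exc.timeout}s",
                             error="timeout")
    return ProcessResult(completed.returncode, completed.stdout, completed.stderr)
```

Callers can then branch on `result.error` and `result.returncode` instead of each call site formatting its own failure message.
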
  1. Tooling overlap and naming drift
  • There are still overlapping prep and setup entrypoints across tools/prepare.sh, tools/setup.sh, and newer CLI maintenance commands.
  • Optimization:
    • Decide which scripts remain canonical.
    • Replace or remove legacy wrappers once equivalent CLI commands exist.
    • Keep CLI maintenance commands and shell wrappers aligned.
  • Expected value:
    • Less operator confusion.
    • Fewer duplicated procedures to maintain.
  1. Placeholder UI surfaces should either ship or disappear
  • src/ffx/help_screen.py and src/ffx/settings_screen.py are placeholders.
  • Optimization:
    • Either remove them from the active UI surface or complete them.
    • Avoid paying ongoing maintenance cost for unfinished navigation targets.
  • Expected value:
    • Leaner interface.
    • Lower UX ambiguity.
  1. Large Textual screens repeat configuration and controller loading
  1. Several helper functions are unfinished or dead-weight
  • src/ffx/helper.py contains permutateList(...): pass.
  • There are many combinator and conversion placeholders across tests and migrations.
  • Optimization:
    • Remove dead code, finish it, or isolate it behind a clearly dormant area.
    • Avoid carrying stubbed utility surface that looks reusable but is not.
  • Expected value:
    • Smaller mental model.
    • Less time spent re-evaluating inactive paths.
  1. Test suite shape is expensive to understand and likely expensive to run
  • The project still carries a large legacy matrix of combinator files under tests/legacy, several placeholder pass implementations, and at least one suspicious filename with an embedded space: `tests/legacy/disposition_combinator_2_3 .py`.
  • A first focused replacement slice now exists in tests/integration/subtrack_mapping/test_cli_bundle.py, so the remaining work is migration and consolidation rather than creating the modern test shape from scratch.
  • Optimization:
    • Continue replacing broad combinator matrices with focused parametrized integration and unit tests.
    • Retire the bespoke legacy discovery and runner path once equivalent coverage exists.
    • Normalize file naming and test discovery conventions.
  • Expected value:
    • Faster contributor onboarding.
    • Easier CI adoption later.
  1. Process resource limiting semantics could be clearer
  • src/ffx/process.py prepends nice and cpulimit directly when values are set.
  • Optimization:
    • Validate and document effective behavior for combined nice + cpulimit.
    • Consider explicit no-limit vs configured-limit states in the CLI and requirements.
  • Expected value:
    • Fewer surprises in production-like runs.
    • Easier support for user-reported performance behavior.
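
To make the no-limit versus configured-limit distinction concrete, a command builder might treat `None` as "not configured" and validate anything else. The function name, the validation ranges, and the wrapper order (nice outermost, then cpulimit) are assumptions for illustration; the actual order and flags used in process.py should be checked against the code and the installed cpulimit version.

```python
from typing import List, Optional, Sequence


def with_limits(command: Sequence[str],
                niceness: Optional[int] = None,
                cpu_percent: Optional[int] = None) -> List[str]:
    """None means no limit configured; only set values add a wrapper."""
    if niceness is not None and not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    if cpu_percent is not None and cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")

    wrapped = list(command)
    if cpu_percent is not None:
        # Assumed invocation form: cpulimit -l <percent> -- <command...>
        wrapped = ["cpulimit", "-l", str(cpu_percent), "--"] + wrapped
    if niceness is not None:
        wrapped = ["nice", "-n", str(niceness)] + wrapped
    return wrapped
```

Building the argument list in one place also gives a single spot to document what the combined nice + cpulimit behavior is expected to be.
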
  1. Import-time dependency coupling makes maintenance commands brittle
  • Even after recent CLI maintenance additions, the top-level CLI module still imports most application modules before Click dispatch.
  • Optimization:
    • Push imports for ORM, Textual, TMDB, ffmpeg helpers, and descriptors behind the commands that actually need them.
  • Expected value:
    • Maintenance commands such as setup and upgrade stay usable when optional runtime dependencies are broken.
    • Better separation between media runtime code and maintenance tooling.
  1. Regex and string utility cleanup
  • src/ffx/helper.py still emits a SyntaxWarning for RICH_COLOR_PATTERN.
  • Optimization:
    • Convert regex literals to raw strings where appropriate.
    • Review filename and TMDB substitution helpers for repeated string churn.
  • Expected value:
    • Cleaner runtime output.
    • Less warning noise during dry-run maintenance commands.
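
The raw-string fix is small. The pattern below is a hypothetical stand-in, since the real RICH_COLOR_PATTERN is not shown here; it only illustrates the kind of literal that triggers the warning and its raw-string form.

```python
import re

# Hypothetical stand-in for the real pattern.
# Written as a plain literal such as "\[/?(\w+)\]", the "\[" and "\w" escapes
# are invalid string escapes and produce a warning on recent Python versions.
# The raw-string form passes the backslashes through to the regex engine.
RICH_COLOR_PATTERN = re.compile(r"\[/?(\w+)\]")


def strip_markup(text: str) -> str:
    """Remove [tag] and [/tag] markers matched by the pattern."""
    return RICH_COLOR_PATTERN.sub("", text)
```
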
  1. Database startup always runs schema creation and version checks
  • src/ffx/database.py runs Base.metadata.create_all(...) and version checks every time a DB-backed context is created.
  • Optimization:
    • Measure startup cost and consider separating bootstrapping from ordinary command execution.
    • Keep schema migration/version enforcement explicit.
  • Expected value:
    • Faster command startup.
    • Clearer operational boundaries.

Open

  • Should optimization work focus first on operator-perceived latency, internal maintainability, or correctness-risk cleanup that also has performance upside?
  • Is the long-term supported model still “local Linux workstation plus Textual UI,” or should optimization decisions bias toward a more scriptable/headless CLI?

Gaps Right Now

  • No explicit prioritization owner or milestone for the optimization backlog.
  • No benchmark or timing harness exists for startup, probe, DB, or conversion orchestration overhead.
  • Repo hygiene is still mixed with generated artifacts and some clearly unfinished files.
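
As a starting point for the missing timing harness, a small context manager around `time.perf_counter` may be enough to get first numbers. The names `timed`, `TIMINGS`, and `summary` are hypothetical, and the labels are whatever the caller chooses.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

TIMINGS: Dict[str, List[float]] = {}  # label -> elapsed seconds per sample


@contextmanager
def timed(label: str) -> Iterator[None]:
    """Record wall-clock time spent inside the with-block under a label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS.setdefault(label, []).append(time.perf_counter() - start)


def summary() -> Dict[str, float]:
    """Mean elapsed seconds per label."""
    return {label: sum(samples) / len(samples)
            for label, samples in TIMINGS.items()}
```

Wrapping the probe, database setup, and conversion orchestration steps in `with timed("probe"):` style blocks would give comparable before/after numbers. For import cost specifically, running the interpreter with `-X importtime` reports per-module import times without any code changes.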

Next

  1. Triage the list into quick wins, medium refactors, and long-horizon cleanup.
  2. Tackle the cheapest high-impact items first:
    • regex raw-string warning cleanup,
    • count() plus first() query cleanup,
    • single-call ffprobe refactor.
  3. Decide whether maintenance/tooling command imports should be split from media-runtime imports before adding more CLI maintenance surface.

Delete When

  • Delete this scratchpad once the optimization backlog is either converted into issues/work items or distilled into durable project guidance.