Opt pattern matching

Javanaut
2026-04-09 16:11:51 +02:00
parent be0f4b4c4e
commit d19e69990a
16 changed files with 1248 additions and 522 deletions


@@ -0,0 +1,68 @@
# Pattern Management
This file defines the behavioral contract for managing shows, patterns, and
pattern-backed filename matching.
Primary source: actual tool code in `src/ffx/`.
Secondary source: operator intent captured in task discussion.
## Scope
- The show, pattern, and track hierarchy stored in SQLite.
- The role of a pattern as a reusable normalization definition for related media files.
- Filename-driven assignment of a scanned media file to one show through one matching pattern.
- Duplicate-match handling when more than one pattern matches the same filename.
## Terms
- `show`: logical series identity such as one TV show entry in the database.
- `pattern`: regex-backed normalization definition attached to one show.
- `track`: one persisted target-track definition attached to one pattern.
- `scanned media file`: one source file currently being inspected or converted.
- `duplicate pattern match`: a filename state where more than one stored pattern matches the same scanned media file.
- `pattern-backed target schema`: the combination of one pattern's stored media tags and stored track definitions.
## Rules
- `PATTERN_MANAGEMENT-0001`: The domain model shall treat a show as the parent entity for patterns that describe distinct release families or normalization schemas for that show. A show may temporarily exist without patterns during editing or initial TUI creation.
- `PATTERN_MANAGEMENT-0002`: Each persisted pattern shall belong to exactly one show.
- `PATTERN_MANAGEMENT-0003`: The domain model shall treat a pattern as the reusable normalization definition for a series of media files expected to share the same internal track layout and materially similar stream and container metadata.
- `PATTERN_MANAGEMENT-0004`: Each persisted track definition shall belong to exactly one pattern.
- `PATTERN_MANAGEMENT-0005`: A pattern may also carry pattern-level media tags. The pattern's media tags plus its track definitions together form the pattern-backed target schema.
- `PATTERN_MANAGEMENT-0006`: A scanned media file shall resolve to at most one pattern and therefore at most one show.
- `PATTERN_MANAGEMENT-0007`: If no pattern matches a filename, the file shall remain unmatched rather than being assigned implicitly.
- `PATTERN_MANAGEMENT-0008`: If more than one pattern matches the same filename, the system shall raise a duplicate pattern match error instead of silently selecting one.
- `PATTERN_MANAGEMENT-0009`: Duplicate-match detection shall apply regardless of whether the competing patterns belong to the same show or to different shows.
- `PATTERN_MANAGEMENT-0010`: Exact duplicate pattern definitions for the same show should not create multiple persisted pattern rows.
- `PATTERN_MANAGEMENT-0011`: A persisted pattern shall define one or more tracks. Creating or retaining a zero-track pattern in the database is invalid managed state and shall be prohibited.
- `PATTERN_MANAGEMENT-0012`: A show may exist without patterns as an intermediate editing state, for example when a user creates the show first in the TUI and adds patterns later.
- `PATTERN_MANAGEMENT-0013`: Operator-facing pattern management should expose the owning show, regex pattern, stored track set, and stored media-tag set so a user can reason about matching and normalization behavior.
- `PATTERN_MANAGEMENT-0014`: Matching semantics shall be deterministic and documented. Implicit "last matching pattern wins" behavior is not acceptable released behavior.
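A minimal sketch of the matching rules above (0006 through 0009 and 0014), not the code in `src/ffx/`. The names `StoredPattern`, `DuplicatePatternMatchError`, and `match_filename` are hypothetical, the use of `re.search` is an assumption about how a regex is applied to a filename, and the sketch returns `None` for the unmatched state where the real `matchFilename(...)` is described as returning `{}`.

```python
import re
from dataclasses import dataclass


class DuplicatePatternMatchError(Exception):
    """Raised when more than one stored pattern matches one filename."""


@dataclass(frozen=True)
class StoredPattern:
    # Hypothetical stand-in for a persisted pattern row.
    pattern_id: int
    show_id: int
    regex: str


def match_filename(filename, patterns):
    """Return the single matching pattern, or None when nothing matches."""
    # Sort by id so the scan order does not depend on the caller's ordering.
    matches = [
        p for p in sorted(patterns, key=lambda p: p.pattern_id)
        if re.search(p.regex, filename)
    ]
    if len(matches) > 1:
        # Applies whether the competitors share a show or not (rule 0009).
        ids = ", ".join(str(p.pattern_id) for p in matches)
        raise DuplicatePatternMatchError(
            f"{filename!r} matches patterns {ids}"
        )
    return matches[0] if matches else None
```

Collecting every match before deciding, instead of returning on the first hit, is what rules out an implicit "first match wins" or "last match wins" outcome.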
## Acceptance
- A filename that matches exactly one pattern yields one matched pattern and one show identity.
- A filename that matches no pattern yields no matched pattern and an unmatched state.
- A filename that matches more than one pattern yields an explicit duplicate-match error.
- A pattern-backed target schema can be reconstructed from one pattern's stored media tags and stored track definitions.
- A show may be stored before any patterns are attached to it.
- A pattern cannot be stored or retained as a valid managed pattern unless at least one track is defined for it.
- Pattern-backed conversion never proceeds with two competing matching patterns for the same input filename.
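The schema-related acceptance criteria can be illustrated with a short sketch. It assumes a simplified in-memory shape for patterns and tracks; `TrackDefinition`, `PatternRecord`, and `build_target_schema` are hypothetical names and do not reflect the signature of `Pattern.getMediaDescriptor(...)`.

```python
from dataclasses import dataclass, field


@dataclass
class TrackDefinition:
    # Hypothetical, reduced track definition.
    index: int
    kind: str  # for example "video", "audio", or "subtitle"


@dataclass
class PatternRecord:
    # Hypothetical, reduced pattern with its tags and tracks.
    regex: str
    media_tags: dict = field(default_factory=dict)
    tracks: list = field(default_factory=list)


def build_target_schema(pattern):
    """Combine one pattern's media tags and tracks into a target schema."""
    if not pattern.tracks:
        # A zero-track pattern is invalid managed state (rule 0011).
        raise ValueError("a managed pattern needs at least one track")
    ordered = sorted(pattern.tracks, key=lambda t: t.index)
    return {
        "tags": dict(pattern.media_tags),
        "tracks": [(t.index, t.kind) for t in ordered],
    }
```

Rejecting the trackless case at reconstruction time mirrors the requirement that such a pattern must not participate silently.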
## Current Code Fit
- `src/ffx/model/show.py` implements a one-to-many `Show -> Pattern` relationship.
- `src/ffx/model/pattern.py` implements `Pattern.show_id`, a one-to-many `Pattern -> Track` relationship, a one-to-many `Pattern -> MediaTag` relationship, and a unique `(show_id, pattern)` constraint for freshly created databases.
- `src/ffx/model/track.py` implements `Track.pattern_id`, so each persisted track belongs to one pattern.
- `src/ffx/model/pattern.py` reconstructs a pattern-backed target schema through `Pattern.getMediaDescriptor(...)`, combining stored media tags and stored tracks.
- `src/ffx/file_properties.py` assumes a scanned file resolves to at most one pattern, because it stores only one `self.__pattern` and derives one `show_id` from it.
- `src/ffx/pattern_controller.py` prevents exact duplicate `(show_id, pattern)` definitions during create and update flows, and it refreshes cached compiled regexes when stored pattern expressions change.
- `src/ffx/pattern_controller.py` now enforces duplicate-match safety. `matchFilename(...)` scans deterministically, returns the match when exactly one pattern matches, returns `{}` when no pattern matches, and raises an explicit duplicate-pattern-match error when more than one pattern matches the same filename.
- The current persistence layer already aligns with the intended empty-show workflow because a show can exist without patterns.
- New pattern creation and schema replacement flows now require at least one track, and `TrackController.deleteTrack(...)` prevents deleting the last persisted track from a pattern.
- Trackless legacy rows can still exist in preexisting databases, but matching now rejects them explicitly instead of letting them participate silently.
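The unique `(show_id, pattern)` constraint noted above can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions: the table names `shows` and `patterns`, the column layout, and the helpers `make_db` and `add_pattern` are hypothetical and are not taken from the model code in `src/ffx/model/`.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a reduced show/pattern schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE shows (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE patterns (
            id INTEGER PRIMARY KEY,
            show_id INTEGER NOT NULL REFERENCES shows(id),
            pattern TEXT NOT NULL,
            UNIQUE (show_id, pattern)
        );
        """
    )
    return conn


def add_pattern(conn, show_id, pattern):
    """Insert a pattern; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO patterns (show_id, pattern) VALUES (?, ?)",
                (show_id, pattern),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The constraint only blocks exact duplicates within one show. Two different regexes that happen to match the same filename still have to be caught at match time.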
## Risks
- The intended "release family" meaning of a pattern is a domain assumption, not something the code verifies automatically across all files matching that pattern.
- Preexisting databases created before the newer validation rules may still contain invalid rows, so upgrade and cleanup paths should continue to treat explicit validation failures as recoverable conditions that are reported to the operator.


@@ -47,6 +47,7 @@
- per-pattern stream definitions,
- shifted-season mappings,
- internal database version properties.
- Detailed show, pattern, and duplicate-match management rules live in `requirements/pattern_management.md`.
- The system shall inspect source media using `ffprobe` and derive a structured description of container metadata and streams.
- The system shall optionally open a Textual UI to browse shows, inspect files, and create, edit, or delete shows, patterns, stream definitions, tags, and shifted-season rules.
- The system shall match filenames against stored regex patterns to decide whether an input file should inherit a target stream and metadata schema.