Classifier

LLM-powered item classification with taxonomy, confidence scoring, and caching.

LLMClassifier

LLMClassifier

LLMClassifier(model=None, fallbacks=None, cache=None, concurrency=None, prompt_version=CURRENT_PROMPT_VERSION, taxonomy=None, prompt_template=None)

Production-ready LLM classifier with batching, caching, and cost control.

Usage

Default: in-memory cache (no disk)

classifier = LLMClassifier(model="anthropic/claude-sonnet-4-6")
await classifier.enrich(doc)

Persistent: SQLite cache (opt-in)

from pygaeb.cache import SQLiteCache
classifier = LLMClassifier(cache=SQLiteCache("~/.pygaeb/cache"))

enrich async

enrich(doc, on_progress=None, force_reclassify=False)

Classify all items in the document (async).

Works for both procurement and trade documents via doc.iter_items().

enrich_sync

enrich_sync(doc, on_progress=None, force_reclassify=False)

Synchronous convenience wrapper that manages the event loop internally.
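The general pattern behind a synchronous wrapper like this can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `fake_enrich` and `run_sync` are hypothetical stand-ins, and the "document" is just a list of strings.

```python
import asyncio


async def fake_enrich(items):
    # Hypothetical stand-in for an async classification call.
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return [f"classified:{item}" for item in items]


def run_sync(items):
    # asyncio.run creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop again.
    return asyncio.run(fake_enrich(items))


result = run_sync(["Interior wall, plastered", "Steel beam"])
```

Note that `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop. Whether `enrich_sync` handles that case differently is not stated here, so inside async code the safe choice is to await `enrich` directly.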

estimate_cost async

estimate_cost(doc)

Estimate the cost of classifying all items in the document.

Taxonomy

taxonomy

Three-level classification taxonomy for construction element types.

TAXONOMY module-attribute

TAXONOMY = {
    'Structural': {
        'Wall': ['Interior Wall', 'Exterior Wall', 'Curtain Wall', 'Partition Wall', 'Retaining Wall'],
        'Floor': ['Ground Floor Slab', 'Suspended Floor', 'Screed', 'Raised Floor'],
        'Roof': ['Flat Roof', 'Pitched Roof', 'Green Roof', 'Roof Structure'],
        'Foundation': ['Strip Foundation', 'Pad Foundation', 'Pile Foundation', 'Raft Foundation'],
        'Column': ['Concrete Column', 'Steel Column', 'Timber Column'],
        'Beam': ['Concrete Beam', 'Steel Beam', 'Timber Beam', 'Lintel'],
    },
    'Finishes': {
        'Door': ['Single Door', 'Double Door', 'Fire Door', 'Sliding Door', 'Revolving Door'],
        'Window': ['Fixed Window', 'Opening Window', 'Skylight', 'Curtain Wall Panel'],
        'Ceiling': ['Suspended Ceiling', 'Plasterboard Ceiling', 'Acoustic Ceiling'],
        'Cladding': ['External Cladding', 'Internal Cladding', 'Render', 'Natural Stone'],
        'Flooring': ['Tile', 'Carpet', 'Vinyl', 'Wood Flooring', 'Epoxy'],
    },
    'Roofing': {
        'Roof Covering': ['Flat Roof Membrane', 'Tiles', 'Metal Sheets', 'Slate'],
        'Insulation': ['Thermal Insulation', 'Acoustic Insulation', 'Waterproofing'],
        'Drainage': ['Gutter', 'Downpipe', 'Roof Drain', 'Overflow'],
        'Flashing': ['Lead Flashing', 'Zinc Flashing', 'Aluminium Flashing'],
    },
    'MEP-Mechanical': {
        'Duct': ['Supply Duct', 'Extract Duct', 'Flexible Duct'],
        'Air Handling Unit': ['AHU', 'Rooftop Unit', 'Fan Coil Unit'],
        'Fan': ['Centrifugal Fan', 'Axial Fan', 'Inline Fan'],
        'Diffuser': ['Supply Diffuser', 'Return Grille', 'Linear Diffuser'],
    },
    'MEP-Electrical': {
        'Cable': ['Power Cable', 'Data Cable', 'Fibre Optic', 'Control Cable'],
        'Panel': ['Distribution Board', 'Main Switchboard', 'Sub-Panel'],
        'Luminaire': ['Recessed Light', 'Surface Light', 'Emergency Lighting', 'External Light'],
        'Socket': ['Power Socket', 'Data Socket', 'Floor Box'],
        'Conduit': ['Metal Conduit', 'PVC Conduit', 'Cable Tray', 'Cable Ladder'],
    },
    'MEP-Plumbing': {
        'Pipe': ['Supply Pipe', 'Drain Pipe', 'Vent Pipe', 'Rainwater Pipe'],
        'Valve': ['Gate Valve', 'Ball Valve', 'Check Valve', 'Pressure Reducing Valve'],
        'Pump': ['Circulating Pump', 'Booster Pump', 'Sump Pump'],
        'Sanitary Fixture': ['WC', 'Washbasin', 'Shower', 'Bath', 'Urinal', 'Sink'],
    },
    'Sitework': {
        'Excavation': ['Topsoil Strip', 'Bulk Excavation', 'Trench Excavation'],
        'Paving': ['Asphalt', 'Concrete Paving', 'Block Paving', 'Kerb'],
        'Landscaping': ['Planting', 'Turf', 'Irrigation', 'Tree'],
        'Fence': ['Timber Fence', 'Metal Fence', 'Security Fence', 'Gate'],
    },
    'Preliminaries': {
        'Site Setup': ['Site Hut', 'Hoarding', 'Temporary Fencing', 'Site Signage'],
        'Scaffolding': ['Independent Scaffold', 'System Scaffold', 'Mobile Tower'],
        'Welfare': ['Welfare Unit', 'Toilet', 'Canteen', 'Drying Room'],
        'Temp Works': ['Tower Crane', 'Hoist', 'Temporary Propping', 'Dewatering'],
    },
    'Other': {
        'Unclassifiable': [],
    },
}

ALL_TRADES module-attribute

ALL_TRADES = list(TAXONOMY.keys())

ALL_ELEMENT_TYPES module-attribute

ALL_ELEMENT_TYPES = [et for trade in TAXONOMY.values() for et in trade]
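To make the two derived attributes concrete, the snippet below evaluates the same expressions on a small excerpt of the taxonomy. It assumes both attributes are built from the keys and values of `TAXONOMY`; `TAXONOMY_EXCERPT`, `all_trades`, and `all_element_types` are illustrative names, not part of the module.

```python
# A two-trade excerpt of the taxonomy shown above (illustrative only).
TAXONOMY_EXCERPT = {
    "Structural": {
        "Wall": ["Interior Wall", "Exterior Wall"],
        "Beam": ["Steel Beam", "Lintel"],
    },
    "Other": {"Unclassifiable": []},
}

# Level 1: the top-level keys are the trades.
all_trades = list(TAXONOMY_EXCERPT.keys())

# Level 2: iterating over each trade's dict yields its keys,
# i.e. the element types, flattened across all trades.
all_element_types = [et for trade in TAXONOMY_EXCERPT.values() for et in trade]
```

On this excerpt, `all_trades` is `["Structural", "Other"]` and `all_element_types` is `["Wall", "Beam", "Unclassifiable"]`. The third level (subtypes) does not appear in either list.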

get_subtypes

get_subtypes(trade, element_type)

is_valid_trade

is_valid_trade(trade)

is_valid_element_type

is_valid_element_type(trade, element_type)
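The three helpers above are listed without descriptions. The following sketch shows one plausible reading of them as plain lookups into the nested dictionary. The `_sketch` functions and the excerpt are hypothetical, and the real functions may treat unknown keys differently (for example by raising instead of returning an empty list).

```python
# Illustrative excerpt; the real TAXONOMY has many more entries.
TAXONOMY_EXCERPT = {
    "MEP-Plumbing": {
        "Valve": ["Gate Valve", "Ball Valve", "Check Valve"],
        "Pump": ["Circulating Pump", "Booster Pump"],
    },
    "Other": {"Unclassifiable": []},
}


def is_valid_trade_sketch(trade):
    # A trade is valid if it is a top-level key.
    return trade in TAXONOMY_EXCERPT


def is_valid_element_type_sketch(trade, element_type):
    # An element type is only valid within its own trade.
    return element_type in TAXONOMY_EXCERPT.get(trade, {})


def get_subtypes_sketch(trade, element_type):
    # Assumption: unknown trades or element types yield an empty list.
    return list(TAXONOMY_EXCERPT.get(trade, {}).get(element_type, []))
```

For example, `get_subtypes_sketch("MEP-Plumbing", "Pump")` returns the two pump subtypes, while `is_valid_element_type_sketch("Other", "Valve")` is `False` because `Valve` belongs to a different trade.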

Confidence Scoring

apply_confidence_flag

apply_confidence_flag(result)

Apply confidence-based flag to a classification result.
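Neither the thresholds nor the flag names are given here, so the following is only a sketch of what threshold-based flagging can look like. The cut-offs (0.85 and 0.60), the flag values, and the `ResultSketch` type are all invented for illustration and should not be read as the library's behaviour.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen for illustration only.
HIGH_CONFIDENCE = 0.85
LOW_CONFIDENCE = 0.60


@dataclass
class ResultSketch:
    # Hypothetical result type; the real result model is not shown here.
    element_type: str
    confidence: float
    flag: str = ""


def apply_flag_sketch(result):
    # Map the numeric confidence onto one of three coarse flags.
    if result.confidence >= HIGH_CONFIDENCE:
        result.flag = "auto"
    elif result.confidence >= LOW_CONFIDENCE:
        result.flag = "review"
    else:
        result.flag = "manual"
    return result


flagged = [apply_flag_sketch(ResultSketch("Wall", c)).flag for c in (0.95, 0.70, 0.30)]
```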

merge_with_override

merge_with_override(llm_result, override)

Prefer manual override over LLM result.
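The precedence rule can be sketched in a few lines. This assumes a missing override is represented as `None` and that results are plain dictionaries; both are assumptions for the example, and `merge_sketch` is a hypothetical name.

```python
from typing import Optional


def merge_sketch(llm_result: dict, override: Optional[dict]) -> dict:
    # A manual override, when present, wins over the LLM result.
    return override if override is not None else llm_result


llm = {"trade": "Finishes", "element_type": "Door", "source": "llm"}
manual = {"trade": "Finishes", "element_type": "Window", "source": "manual"}

with_override = merge_sketch(llm, manual)
without_override = merge_sketch(llm, None)
```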

Classification Cache

ClassificationCache

ClassificationCache(backend=None)

Caches classification results keyed by (text_hash, prompt_version).

Wraps any CacheBackend (default: InMemoryCache, no disk I/O).
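A small sketch of the keying scheme may help. It assumes the text hash is a SHA-256 digest of the item text and uses a plain dictionary in place of a cache backend; the hash function actually used is not stated here, and `cache_key` and `store` are hypothetical names.

```python
import hashlib


def cache_key(text, prompt_version):
    # Assumption: SHA-256 over the UTF-8 encoded item text.
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return (text_hash, prompt_version)


# A plain dict standing in for an in-memory backend.
store = {}
store[cache_key("Interior wall, plastered", "v1")] = {"element_type": "Wall"}

hit = store.get(cache_key("Interior wall, plastered", "v1"))
miss = store.get(cache_key("Interior wall, plastered", "v2"))
```

Because the prompt version is part of the key, changing the prompt produces a miss for text that was already classified, so results from an older prompt are not reused.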

save_override

save_override(item)

Save a manual override for an item.

stats

stats()

Aggregate counts by prompt version.

clear

clear()

Remove all non-override entries.
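The behaviour described for `stats` and `clear` can be illustrated on a list of records. The record layout (`prompt_version`, `is_override`) and the function names are assumptions made for this sketch; the real storage format is not shown here.

```python
from collections import Counter

# Hypothetical cache records.
entries = [
    {"text_hash": "a1", "prompt_version": "v1", "is_override": False},
    {"text_hash": "b2", "prompt_version": "v1", "is_override": True},
    {"text_hash": "c3", "prompt_version": "v2", "is_override": False},
]


def stats_sketch(records):
    # Count entries per prompt version.
    return dict(Counter(r["prompt_version"] for r in records))


def clear_sketch(records):
    # Keep manual overrides, drop everything else.
    return [r for r in records if r["is_override"]]


counts = stats_sketch(entries)
remaining = clear_sketch(entries)
```

Keeping overrides through a clear means manual corrections survive when cached LLM results are discarded.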