Changelog
All notable changes to pyGAEB are documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.14.0] - 2026-05-26
Added
- Item.full_oz: Complete ordinal number (e.g. "01.02.0004"), joining the ancestor category/lot chain with the item's leaf RNoPart. Item.full_oz_with(separator) for a custom separator; backed by the new Item.oz_path field.
- BoQTree.find_item() now resolves by either the leaf RNoPart or the full OZ.
- CSV export gains a full_oz column.
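As a rough illustration of how a full OZ can be assembled from an ancestor chain, the sketch below joins the parts with a configurable separator. The helper name join_oz and the list-of-strings shape of the path are assumptions made for this example; they are not pyGAEB's actual implementation.

```python
from typing import Sequence


def join_oz(oz_path: Sequence[str], separator: str = ".") -> str:
    """Join ancestor category/lot numbers and the leaf number into one OZ.

    Hypothetical helper: oz_path holds the ancestor chain followed by
    the item's leaf RNoPart, e.g. ["01", "02", "0004"].
    """
    # Skip empty parts so a missing ancestor number does not leave "..".
    return separator.join(part for part in oz_path if part)


full_oz = join_oz(["01", "02", "0004"])
custom_oz = join_oz(["01", "02", "0004"], separator="/")
```

A lookup that accepts either form could index items under both the leaf number and the joined string.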
Fixed
- Crash parsing files with XML comments (iTWO / RIB Software): comments embedded inside elements no longer raise TypeError; the v3 parser now skips comment/processing-instruction nodes during iteration. Applies to DA XML 2.x and 3.x.
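The failure mode behind this fix can be reproduced in miniature. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, so it is only an analogy to the parser described above: in both libraries, comment and processing-instruction nodes carry a non-string tag, and filtering on that is one way to skip them. The helper name element_children is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = "<Item><!-- vendor note --><Qty>3</Qty><?tool hint?><QU>m2</QU></Item>"

# Keep comments and processing instructions in the tree so the problem
# case is reproduced (ElementTree drops them by default).
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(XML, parser=ET.XMLParser(target=builder))


def element_children(parent):
    """Yield only real elements; comment/PI nodes have a non-string tag."""
    for child in parent:
        if isinstance(child.tag, str):
            yield child


all_nodes = len(list(root))  # counts the comment and PI nodes too
tags = [child.tag for child in element_children(root)]
```

Code that calls string methods on child.tag without such a filter would fail on the comment node, which is consistent with the TypeError mentioned in the entry.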
[1.11.0] - 2026-03-24
Added
- Excel Export: to_excel() exports any GAEB document to structured Excel workbooks with hierarchy-aware layout and phase-specific columns.
- Two export modes: structured (single sheet) and full (BoQ + Items + Summary + Info sheets).
- All document kinds supported with phase-appropriate columns.
- Optional columns via include_long_text, include_classification, include_bim_guid.
- 34 new tests.
[1.10.0] - 2026-03-24
Added
- BoQ Builder API: BoQBuilder provides programmatic construction of GAEB documents from scratch with a fluent, explicit-object API.
- Auto OZ generation: Ordinal numbers auto-generated from category rno + sequence when oz is omitted.
- Decimal convenience: int/float/str auto-converted to Decimal; auto-computes total_price when missing.
- Field name validation: Unknown kwargs raise ValueError with typo suggestions.
- Phase-aware rules: Warns or errors when items violate exchange phase semantics.
- Version compatibility checks: Detects fields incompatible with the target DA XML version.
- Duplicate OZ detection, auto totals & BoQBkdn, implicit lot shortcut, ItemHandle for long text/attachments.
- Optional XSD validation via build(xsd_dir=...).
- 46 new tests.
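Two of the builder conveniences listed above (Decimal coercion with an auto-computed total, and typo suggestions for unknown keyword arguments) can be sketched with the standard library alone. Everything here is an assumption for illustration: make_item, to_decimal, the KNOWN_FIELDS set, and the choice to route floats through str() are not taken from pyGAEB.

```python
from decimal import Decimal
from difflib import get_close_matches

# Hypothetical field set; the real builder's field names may differ.
KNOWN_FIELDS = {"oz", "short_text", "qty", "unit", "unit_price", "total_price"}


def to_decimal(value):
    """Coerce int/float/str to Decimal; floats go through str() to avoid binary noise."""
    if isinstance(value, Decimal):
        return value
    return Decimal(str(value))


def make_item(**kwargs):
    """Validate field names, coerce numeric fields, and fill in total_price."""
    for name in kwargs:
        if name not in KNOWN_FIELDS:
            hint = get_close_matches(name, sorted(KNOWN_FIELDS), n=1)
            suffix = f" Did you mean '{hint[0]}'?" if hint else ""
            raise ValueError(f"Unknown field '{name}'.{suffix}")
    item = dict(kwargs)
    for key in ("qty", "unit_price", "total_price"):
        if key in item:
            item[key] = to_decimal(item[key])
    # Compute the total only when it was not supplied explicitly.
    if "total_price" not in item and {"qty", "unit_price"} <= item.keys():
        item["total_price"] = item["qty"] * item["unit_price"]
    return item


item = make_item(oz="01.0010", qty=3, unit_price="12.50")
```

Calling make_item(unit_prise=1) raises a ValueError whose message suggests unit_price.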
[1.9.0] - 2026-03-24
Added
- Document Diff Engine: BoQDiff.compare(doc_a, doc_b) performs a deterministic, field-by-field comparison of two GAEB procurement documents with structured results.
- OZ-based item matching: Lot-aware matching by OZ (ordinal number) with global fallback for items that moved between lots.
- Field-level change detection: Each changed field carries a Significance level (CRITICAL, HIGH, MEDIUM, LOW) based on construction context impact.
- Structural diff: Detects added, removed, and renamed sections (categories), as well as items that moved between categories or lots.
- DiffMode enum: DEFAULT (warnings for mismatched projects), STRICT (raises ValueError), FORCE (suppresses warnings).
- DiffResult Pydantic model: Complete comparison output with summary, items, structure, metadata, and warnings sections. Fully serializable to JSON.
- Financial impact: Automatic computation of net financial impact (grand_total_b - grand_total_a).
- Match ratio & compatibility warnings: Low match ratio detection, different project warnings, currency mismatch alerts, version difference notices.
- Result filtering: ItemModified.filter_changes(min_significance) and ItemDiffSummary.filter_modified(min_significance) for targeted reporting.
- New exports: BoQDiff, DiffMode, DiffResult, DiffSummary, DiffDocInfo, Significance, FieldChange, ItemAdded, ItemRemoved, ItemModified, ItemMoved, ItemDiffSummary, MetadataChange, SectionChange, SectionRenamed, StructureDiffSummary.
- 53 new tests covering item matching, field comparison, structural diff, integration, models, lazy imports, and edge cases.
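The core idea of OZ-based matching and the net financial impact can be shown on plain dictionaries. This is a simplified sketch under stated assumptions: diff_items and net_impact are hypothetical helpers, items are modelled as dicts keyed by OZ, and lot awareness, significance levels, and move detection are left out.

```python
from decimal import Decimal


def diff_items(items_a, items_b):
    """Match items by OZ and report added, removed, and modified entries.

    items_a / items_b: dict mapping OZ -> dict of field values.
    """
    added = sorted(items_b.keys() - items_a.keys())
    removed = sorted(items_a.keys() - items_b.keys())
    modified = {}
    for oz in sorted(items_a.keys() & items_b.keys()):
        fields = sorted(items_a[oz].keys() | items_b[oz].keys())
        changes = {
            name: (items_a[oz].get(name), items_b[oz].get(name))
            for name in fields
            if items_a[oz].get(name) != items_b[oz].get(name)
        }
        if changes:
            modified[oz] = changes
    return added, removed, modified


def net_impact(items_a, items_b):
    """Net financial impact as grand total of B minus grand total of A."""
    total_a = sum((i["total"] for i in items_a.values()), Decimal("0"))
    total_b = sum((i["total"] for i in items_b.values()), Decimal("0"))
    return total_b - total_a


doc_a = {
    "01.0010": {"qty": Decimal("10"), "total": Decimal("100.00")},
    "01.0020": {"qty": Decimal("5"), "total": Decimal("50.00")},
}
doc_b = {
    "01.0010": {"qty": Decimal("12"), "total": Decimal("120.00")},
    "01.0030": {"qty": Decimal("1"), "total": Decimal("45.00")},
}
added, removed, modified = diff_items(doc_a, doc_b)
impact = net_impact(doc_a, doc_b)
```

Sorting the keys keeps the output order stable between runs, which is one simple way to get the deterministic behavior the entry describes.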
[1.8.0] - 2026-03-24
Added
- Read-only BoQ Tree API: BoQTree adapter wraps an existing BoQ and builds a navigable node graph with parent references, depth tracking, and indexed lookups. The underlying Pydantic models are not modified.
- BoQNode: Lightweight tree node with O(1) parent, children, depth, index, siblings, ancestors, path, next_sibling, prev_sibling, is_leaf, is_root properties.
- Type-safe model accessors: node.boq, node.lot, node.category, node.item raise TypeError if accessed on the wrong node kind.
- Unified convenience properties: node.label, node.rno, node.label_path work across all node kinds (root, lot, category, item).
- Subtree queries: node.iter_descendants(), node.iter_items(), node.iter_categories(), node.find(predicate), node.find_all(predicate).
- BoQTree lookups: tree.find_item(oz) (O(1) via index), tree.find_category(rno), tree.find_all_categories(rno).
- Tree traversal: tree.walk() (depth-first) and tree.walk_bfs() (breadth-first) over all nodes.
- NodeKind enum: ROOT, LOT, CATEGORY, ITEM discriminator for node types.
- New exports: BoQTree, BoQNode, NodeKind.
- New guide: Tree Navigation.
- 87 new tests covering root, lots, categories, items, children ordering, parent chains, ancestors, siblings, lookups, iteration, predicate search, DFS/BFS traversal, counts, multi-lot, type-safe accessors, empty categories, model identity, and repr.
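The difference between the two traversal orders is easiest to see on a tiny tree. The Node class and the free functions below are a standalone sketch, not the BoQTree or BoQNode classes themselves; only the names walk and walk_bfs are borrowed from the entry above.

```python
from collections import deque


class Node:
    """Minimal tree node with a parent reference and ordered children."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

    def add(self, label):
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def walk(root):
    """Depth-first, pre-order traversal using an explicit stack."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))


def walk_bfs(root):
    """Breadth-first traversal, level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


root = Node("BoQ")
cat1 = root.add("01")
cat2 = root.add("02")
cat1.add("01.0010")
cat2.add("02.0010")

dfs_order = [n.label for n in walk(root)]
bfs_order = [n.label for n in walk_bfs(root)]
```

Depth-first order visits each category together with its items, while breadth-first order visits all categories before any item.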
[1.7.1] - 2026-03-15
Fixed
- Procurement long text parsing: Items in DA XML 3.x procurement files (X80–X89) now correctly extract long text from the <Description>/<CompleteText>/<DetailTxt> structure, matching the behavior already present in trade and cost parsers.
- OWN (owner/client) parsed from wrong element: <OWN> is now correctly located as a child of <Award> instead of <AwardInfo>, producing a full Address with all tgAddress fields instead of a bare string.
Added
- AwardInfo metadata: 13 new fields on AwardInfo (category, open_date, open_time, eval_end, submit_location, construction_start, construction_end, contract_no, contract_date, accept_type, warranty_duration, warranty_unit, and award_no).
- AwardInfo.owner_address: Full Address model for the <OWN>/<Address> structure, including award_no from <OWN>/<AwardNo>.
- Address model extended to match tgAddress XSD: Added name3, name4, contact, iln, and vat_id fields. The name field now maps to <Name1> (with <Name> fallback for older files).
- _parse_address consolidated: Moved from TradeParser and QtyParser into BaseV3Parser so all phases (procurement, trade, cost, quantity) share the same XSD-complete address parsing logic.
- Writer updated: _add_address emits XSD-canonical <Name1> through <Name4>, <Contact>, <ILN>, and <VATID>. _add_award serializes all new AwardInfo fields and the proper <OWN>/<Address>/<AwardNo> structure.
- German element map: Added DA XML 2.x mappings for new AwardInfo tags (Vergabekategorie, Eroeffnungsdatum, Baubeginn, Bauende, Vertragsnummer, Vertragsdatum, Gewaehrleistungsdauer, etc.).
- 48 new tests covering long text fallback, AwardInfo metadata, OWN address with full tgAddress fields, model defaults, round-trip serialization, and integration against the tender.X81 fixture.
[1.7.0] - 2026-03-15
Added
- Custom validator registry: register_validator() / clear_validators() for project-specific validation rules that run after the built-in pipeline. Per-call validators via extra_validators= on GAEBParser.parse().
- Post-parse hook: post_parse_hook= callback on GAEBParser.parse() / parse_bytes() / parse_string() receives (item, source_element) for each parsed item. Auto-enables keep_xml and discards afterwards when needed.
- collect_raw_data: GAEBParser.parse(..., collect_raw_data=True) populates item.raw_data with XML child elements the parser did not consume.
- Custom LLM taxonomy & prompt: LLMClassifier(taxonomy=..., prompt_template=...) for per-instance overrides. register_prompt() for reusable prompt templates.
- log_level applied: The log_level setting is now applied to the pygaeb logger on get_settings() and configure().
- New exports: register_validator, clear_validators, reset_settings, register_prompt.
- New guide: Extensibility.
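The registry pattern described above (global validators plus per-call extras, run after the built-in checks) can be sketched as follows. The functions here are standalone stand-ins that reuse the names from the entry; their signatures, the dict-shaped document, and the list-of-messages return value are guesses for illustration and may not match pyGAEB's real API.

```python
from typing import Callable, Iterable, List

# Assumed validator shape: takes a document, returns a list of messages.
Validator = Callable[[dict], List[str]]

_VALIDATORS: List[Validator] = []


def register_validator(func: Validator) -> Validator:
    """Add a validator to the global registry; usable as a decorator."""
    _VALIDATORS.append(func)
    return func


def clear_validators() -> None:
    """Remove all globally registered validators."""
    _VALIDATORS.clear()


def run_validators(doc: dict, extra_validators: Iterable[Validator] = ()) -> List[str]:
    """Run registered validators first, then the per-call extras."""
    messages: List[str] = []
    for validator in [*_VALIDATORS, *extra_validators]:
        messages.extend(validator(doc))
    return messages


@register_validator
def require_unit(doc: dict) -> List[str]:
    return [f"{i['oz']}: missing unit" for i in doc["items"] if not i.get("unit")]


def require_qty(doc: dict) -> List[str]:
    return [f"{i['oz']}: missing qty" for i in doc["items"] if i.get("qty") is None]


doc = {"items": [{"oz": "01.0010", "unit": "", "qty": None}]}
messages = run_validators(doc, extra_validators=[require_qty])
```

Keeping per-call validators out of the global list means one parse call cannot change the behavior of later ones.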
[1.6.0] - 2026-03-15
Security
- XXE prevention: All XML parsing now uses hardened lxml.XMLParser with resolve_entities=False, no_network=True, and huge_tree=False. External entity injection and Billion Laughs attacks are blocked.
- File size guard: New max_file_size parameter on GAEBParser.parse(), parse_bytes(), and parse_string(). Default limit: 100 MB (configurable via max_file_size_mb setting). Prevents memory exhaustion from oversized inputs.
- ReDoS removal: Removed unused _UNCLOSED_TAG_RE regex that had catastrophic backtracking potential.
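A file size guard of the kind described above amounts to checking the size before any parsing starts. The sketch below is a generic version: check_file_size is a hypothetical helper, and treating one megabyte as 1024 * 1024 bytes is an assumption, since the entry does not say how the limit is converted.

```python
import os


def check_file_size(path, max_file_size_mb=100):
    """Return the file size in bytes, or raise if it exceeds the limit.

    Assumes 1 MB = 1024 * 1024 bytes; the real conversion may differ.
    """
    size = os.path.getsize(path)
    limit = max_file_size_mb * 1024 * 1024
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the {max_file_size_mb} MB limit"
        )
    return size
```

Checking the size from file metadata avoids reading an oversized input into memory just to reject it.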
Fixed
- Recursion depth limits: Hierarchy walkers (_walk_ctgy, _walk_ec_ctgy, _walk_qty_ctgy) now cap at 50 levels to prevent stack overflow on malicious/deep structures. BoQCtgy.iter_items(), QtyBoQCtgy.iter_items(), and CostElement.iter_cost_elements() converted from recursive to iterative.
- InMemoryCache bounded: Now uses LRU eviction with a default maxsize=10,000 entries to prevent unbounded growth in long-running processes.
- SQLiteCache resource cleanup: Added __del__ fallback to close connections. Cursors are now explicitly closed after each query.
- XSD validation memory: Reuses the parsed XML tree (when keep_xml=True) instead of reparsing. XSD files opened with explicit file handles.
Added
- GAEBDocument.discard_xml(): Releases the retained lxml tree and all source_element references to free memory after XPath/raw-element work is done.
- pygaeb.parser._xml_safety: Shared module with the SAFE_PARSER and SAFE_RECOVER_PARSER constants and the safe_iterparse() helper.
- max_file_size_mb setting in PyGAEBSettings (default 100).
[1.5.0] - 2026-03-15
Added
- Procurement Totals + VAT: Totals and VATPart models for authoritative financial summaries from <Totals> elements
- totals field on BoQInfo, BoQCtgy, and Lot, parsed from and written back to XML
- Full <Totals> schema coverage: Total, DiscountPcnt/DiscountAmt/TotAfterDisc, TotalLSUM, VAT, TotalNet, TotalNetUpComp (UpComp1–6), VATPart (multiple VAT rates with per-rate breakdown), VATAmount, TotalGross
- Item-level VAT: vat field on Item for per-item VAT rate percentage
- Complete PrjInfo: AwardInfo now exposes all <PrjInfo> fields (prj_id, lbl_prj, description, currency_label, bid_comm_perm, alter_bid_perm, up_frac_dig, ctlg_assigns)
- <PrjInfo> serialization in writer: round-trips project metadata correctly
- UPFracDig (unit price decimal places): parsed and exposed on AwardInfo.up_frac_dig
- Procurement Item Attachments: URI references (<attachment>) and embedded base64 images (<image>) from <DetailTxt> are now parsed into Item.attachments
- DocumentAPI.summary() now includes total_net, total_gross, vat_rate, vat_amount, and up_frac_dig for procurement documents
- Totals and VATPart exported from top-level pygaeb module
- 40 new tests for totals parsing/writing, VATPart, item VAT, PrjInfo, and attachments
[1.4.1] - 2026-03-15
Added
- Shared Catalog Module: CtlgAssign and Catalog moved to pygaeb.models.catalog for cross-package reuse
- ctlg_assigns field on Item, BoQCtgy, BoQInfo, CostElement, and ECCtgy
- CtlgAssign parsing in procurement (_parse_item, _parse_ctgy, _parse_boq_info), cost (CostElement, ECCtgy), and trade (TradeOrder, OrderInfo, OrderItem)
- CtlgAssign serialization in procurement writer (items, categories, BoQInfo) and trade writer (Order, OrderInfo, OrderItem)
- ctlg_assigns field on OrderItem, OrderInfo, and TradeOrder
- Phase-specific procurement namespace: procurement_namespace() helper; writer output now uses DA83/3.3 for X83, DA84/3.3 for X84, etc. (was hardcoded DA86)
- DA XML 3.0 and earlier now correctly fall back to the fixed 200407 namespace
- MarkupItem support (X52): <MarkupItem> elements parsed as Item with ItemType.MARKUP
- MarkupSubQty model for markup sub-quantity references
- markup_type and markup_sub_qtys fields on Item
- _add_markup_item() writer function for round-trip <MarkupItem> serialization
- 33 new tests covering all findings (including trade CtlgAssign)
- MarkupSubQty added to top-level __all__ exports
Fixed
- Procurement writer namespace no longer hardcodes DA86; each exchange phase gets its correct namespace
- CtlgAssign and Catalog no longer tied to pygaeb.models.quantity; re-exported for backward compatibility
[1.4.0] - 2026-03-15
Added
- GAEB Quantity Determination Phase Support (X31): first-class parsing, writing, and API support for quantity take-off data
- New DocumentKind.QUANTITY for X31 quantity determination documents
- ExchangePhase.X31 is_quantity property and _QUANTITY_PHASES frozenset
- QtyDetermination root model with QtyBoQ, QtyBoQCtgy, QtyItem hierarchy
- QtyItem: thin BoQ position with OZ, measurement data, and catalog assignments (no text/prices)
- QDetermItem and QTakeoffRow for REB 23.003 measurement row data
- Catalog and CtlgAssign for DIN 276, BIM, locality, and other catalog systems
- QtyAttachment for base64-encoded attachments (photos, sketches, PDFs) at BoQ level
- QtyDetermInfo metadata (REB method, dates, creator/profiler addresses)
- PrjInfoQD for external project references
- QtyParser for <QtyDeterm>/<BoQ>/<QtyItem> XML structure
- GAEBWriter support for X31 document output with quantity-specific namespace (DA31)
- DocumentAPI quantity-aware: is_quantity, qty_determination, get_qty_item(), updated summary() and iter_hierarchy()
- GAEBDocument.is_quantity, qty_determination property, updated iter_items(), grand_total, item_count, memory_estimate_mb
- Cross-referencing capability between X31 quantity items and procurement BoQ via OZ matching
- Quantity determination documentation guide
- 71 new tests for quantity parsing, models, writer round-trip, API, and enums
[1.3.0] - 2026-03-15
Added
- GAEB Cost & Calculation Phase Support (X50, X51, X52): first-class parsing, writing, and LLM support for cost estimation workflows
- New DocumentKind.COST for X50/X51 elemental costing documents
- ExchangePhase.X50, X51, X52 enum values with is_cost property
- ElementalCosting root model with recursive ECBody/ECCtgy/CostElement hierarchy
- CostElement with LLM-compatible interface (short_text, long_text, qty, unit, classification, extractions)
- CostProperty for BIM integration (cad_id, arithmetic_qty_approach, value_qty_approach)
- RefGroup for cross-references between cost elements, BoQ items, and dimension elements
- DimensionElement and CategoryElement support
- ECInfo metadata (ec_type, ec_method, breakdowns, consortium members, totals)
- CostParser for <ElementalCosting>/<ECBody>/<CostElement> XML structure
- X52 extensions: CostApproach, CostType, up_components (UPComp1-6), discount_pct on Item
- GAEBWriter support for cost document output (X50/X51) and X52 fields
- DocumentAPI cost-aware: is_cost, elemental_costing, get_cost_element(), updated summary() and iter_hierarchy()
- GAEBDocument.is_cost, elemental_costing property, updated iter_items(), grand_total, item_count
- LLM classifier and extractor _item_label updated for ele_no
- Cost phases documentation guide
- 65 new tests for cost parsing, models, writer round-trip, and API
[1.2.0] - 2026-03-15
Added
- GAEB Trade Phase Support (X93-X97): first-class parsing, writing, and LLM support for trade workflows
- New models: TradeOrder, OrderItem, OrderInfo, SupplierInfo, CustomerInfo, DeliveryPlaceInfo, PlannerInfo, InvoiceInfo, Address
- DocumentKind enum (PROCUREMENT/TRADE) and ExchangePhase.X93/X94/X96/X97
- GAEBDocument.is_trade, is_procurement, document_kind properties for explicit discrimination
- GAEBDocument.iter_items() universal iteration across both document kinds
- TradeParser for <Order>/<OrderItem> XML structure
- GAEBWriter support for trade document output with phase-specific namespaces
- DocumentAPI trade-aware methods: order, get_order_item(), updated summary()
- LLMClassifier and StructuredExtractor now work with both procurement and trade documents
- Trade phases guide in documentation
- 38 new tests for trade parsing, models, writer round-trip, and API
[1.0.1] - 2026-03-14
Added
- Version Conversion: GAEBConverter for converting between DA XML 2.0–3.3 with ConversionReport
- GAEBWriter.write() now accepts target_version parameter for multi-version output (2.0–3.3)
- GAEBWriter.to_bytes() for in-memory serialization to any version
- Version-aware field dropping with warnings (e.g., bim_guid dropped in pre-3.3 output)
- DA XML 2.x output with automatic English-to-German element translation
- Custom & Vendor Tag Access: opt-in raw XML retention via keep_xml=True
- source_element field on Item, BoQCtgy, AwardInfo, GAEBInfo for raw lxml element access
- GAEBDocument.xpath() for XPath queries against the full XML tree with auto-mapped namespace prefix
- DocumentAPI.xpath() and DocumentAPI.custom_tag() convenience helpers
Improved
- GAEB v3.2 parsing: text extraction from <span> wrappers, <PrjInfo> fallback, dual-format BoQBkdn support
- Test suite expanded to 248 tests
[1.0.0] - 2026-03-14
Added
- Multi-version GAEB DA XML parser (versions 2.0 through 3.3)
- Unified Pydantic v2 domain model (GAEBDocument, Item, BoQ, etc.)
- Automatic version/format/encoding detection
- XML recovery mode for malformed real-world files
- GAEBParser.parse(), parse_bytes(), parse_string() entry points
- Lenient and strict validation modes
- Structural, numeric, item, and phase-specific validation
- Cross-phase validation (CrossPhaseValidator)
- GAEBWriter for round-trip GAEB DA XML output
- JSON and CSV export (to_json, to_csv, to_json_string)
- LLM-powered item classification via LiteLLM (100+ providers)
- Three-level taxonomy (Trade > Element Type > Sub-Type)
- Confidence flags and manual overrides
- Structured extraction into user-defined Pydantic schemas
- Built-in schemas: DoorSpec, WindowSpec, WallSpec, PipeSpec
- Pluggable cache architecture: InMemoryCache (default), SQLiteCache (opt-in)
- DocumentAPI for advanced filtering and navigation
- py.typed marker for PEP 561 compliance
- Comprehensive test suite (193 tests)