Skip to main content
Version: 3.1.X

Dublin Core Repository Model

The dubcore database is an implementation of the Dublin Core Metadata Initiative (DCMI) standard.

Unlike the library catalogs (MARC) or archival bases (ISAD-G), this model is optimized for Digital Repositories. It is designed to store not just the description of a resource, but the resource itself (PDFs, Images, Office documents) and, crucially, to index the full text of these documents.

Database Definition

  • Internal Name: dubcore
  • Standard: Simple Dublin Core (15 Elements) + Technical Metadata extensions.
  • Key Feature: Full-text indexing and Batch Import capabilities.

1. Field Structure (FDT)

The FDT is divided into three logical blocks: the standard descriptive metadata, technical metadata for images, and system control fields for the digital library.

Core Elements (Tags 1-15)

These follow the international standard 1:1.

TagNameDescription
1TitleA name given to the resource.
2CreatorEntity primarily responsible for making the resource.
3SubjectThe topic of the resource (Keywords).
4DescriptionAn account of the resource (Abstract).
7DateA point or period of time associated with an event in the lifecycle.
9FormatThe file format, physical medium, or dimensions (e.g., application/pdf).
10IdentifierAn unambiguous reference to the resource within a given context.
15RightsInformation about rights held in and over the resource.

Technical Extensions (Images)

If the record is an image, the system (via extraction tools) can populate EXIF data.

TagNameDescription
50-51Height/WidthImage dimensions in pixels.
52-53ResolutionDPI (X/Y).
61-66GPS DataLatitude, Longitude, and Altitude (for georeferenced photos).
59-60CameraMake and Model of the device.

System & Digital Library (Tags 90+)

Fields used by ABCD to manage the file storage.

TagNameUsage
95HTML URLPath to the converted HTML version of a document.
96Record TypeClassification (e.g., text, image) used to select the worksheet.
97SectionVirtual collection folder (e.g., "Thesis", "Photos").
98Document URLPath to the original downloaded file (PDF, DOCX).
99Doc TextHidden Field. Contains the extracted full text of the document for indexing.

2. Document Types (Worksheets)

The system automatically selects the worksheet based on the type of file being imported or cataloged.

  • text.fmt: For textual documents. Focuses on Title, Author, and the Full Text content.
  • image.fmt: For visual assets. Hides text-specific fields and displays the EXIF/GPS data tags.

3. Indexing Strategy (FST)

The dubcore.fst is aggressive. It indexes almost everything to ensure retrievability, including the contents of the files.

PrefixNameTechniqueScope
TW_Text WordWord (8)Global Search. Indexes Title, Description, Subject, AND Tag 99 (Full Text).
TI_TitleWord (5)Search within titles.
CR_CreatorWord (5)Search for authors/photographers.
SU_SubjectWord (5)Search within keywords.
IMS_Image SizePrefixAllows filtering by dimensions.

4. Full Text Extraction (Apache Tika)

The power of the dubcore model relies on Apache Tika.

  • The Concept: When you upload a PDF or Word document to ABCD using this model, the system calls a Java process (tika.jar).
  • The Process: Tika reads the binary file, extracts all readable text, and injects it into Tag 99 of the record.
  • The Result: When a user searches for a phrase in the OPAC (e.g., "Quantum Mechanics"), ABCD finds the record even if those words only appear on page 45 of the attached PDF, not in the Title or Subject.
Server Requirement

For full-text extraction to work, Java (JRE) must be installed on the server, and the tika.jar file must be present in the cgi-bin or configured utilities folder.

5. Visualizing Data (PFT)

  • dubcore.pft: Designed to be media-aware.
    • If the record is an Image: It displays a thumbnail and the technical specs (Camera, GPS).
    • If the record is a Document: It displays the metadata and a "Download" icon linking to the file (Tag 98).