RetroCat

RetroCat is a combined database and blob store intended for large catalogs of similarly-formatted media objects. It was originally intended as a deduplicating blob store for video game ROMs, but I also intend to use it for archival data in general.

What's the point?

The goal is to decompose and recompose the contents of objects ingested into its blob storage, so that the contents can be queried individually, or the entire object reconstructed 1:1 to its original file. In addition, if you had two related files with minor changes to an object inside them, RetroCat would only store those changes.

Object contents would be reversibly deconstructed, decrypted/decompressed, and loaded into a content-addressable blob storage, backed by a database, which could then deduplicate and differentially compress across multiple items of the same type.
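A minimal sketch of the content-addressed part, to make the deduplication concrete. None of this is RetroCat's actual design: BlobStore, import_object and export_object are hypothetical names, SHA-256 and zlib are assumed choices, and the "database" is just an in-memory dict and a list of (name, key) pairs.

```python
import hashlib
import zlib


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, not RetroCat's API)."""

    def __init__(self):
        self._blobs = {}  # hex digest -> compressed bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is only stored once.
        if key not in self._blobs:
            self._blobs[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self._blobs[key])

    def __len__(self) -> int:
        return len(self._blobs)


def import_object(store, members):
    """Store each (name, data) member; return a manifest of (name, key) pairs."""
    return [(name, store.put(data)) for name, data in members]


def export_object(store, manifest):
    """Rebuild the (name, data) members of an object from its manifest."""
    return [(name, store.get(key)) for name, key in manifest]
```

Importing two objects that share a member (say, the same SDK library) adds that member to the store once; each object's manifest is enough to get its members back byte for byte.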

In a video game library, for example, this would let common things that are part of the original ROM, such as SDK libs, assets, or update files, be deduplicated away as your library grows. Carrying multiple regional or version revisions of the same title would only store the differences (e.g., patches to the code, or string and gfx changes). Items would be imported into and exported from the blob store, and the database would track which items changed, and their deltas.

Say, for instance, you have a bunch of ISO images of operating systems: different languages, different system builds, but a lot of the same stuff in all of them. You would just put them all in RetroCat, and it would splay out each ISO filesystem into a set of independently catalogued objects. Then, if you had, say, Windows XP in English and in Portuguese, it would pick one 'dominant' file, and every 'sub' file would be stored as differences applied to the dominant one.
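The dominant/sub idea could look roughly like the sketch below. It is an illustration only: make_delta and apply_delta are hypothetical names, and Python's difflib stands in for whatever delta encoder RetroCat ends up using (difflib is far too slow for real ISO-sized files).

```python
from difflib import SequenceMatcher


def make_delta(dominant: bytes, sub: bytes):
    """Describe `sub` as copies from `dominant` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, dominant, sub, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the dominant file: store only the range.
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": bytes that exist only in the sub file.
            ops.append(("data", sub[j1:j2]))
        # "delete" needs no entry; those dominant bytes are simply not copied.
    return ops


def apply_delta(dominant: bytes, ops) -> bytes:
    """Rebuild the sub file from the dominant file and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += dominant[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Only the "data" entries carry bytes of their own, so a Portuguese file that differs from the English one in a few strings would cost roughly the size of those strings.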

RetroCat will have an API where, say, a video game emulator could simply request “what block where” over some IPC, and RetroCat would do the heavy lifting of decompressing and reconstructing the filesystem on-the-fly.
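As a rough sketch of what answering “what block where” might involve, the class below maps a block index onto a list of stored segments and stitches the bytes together. It leaves the IPC out entirely, and VirtualImage, read_block and the 2048-byte BLOCK_SIZE are assumptions made for illustration, not part of any published RetroCat API. A real implementation would presumably fetch and decompress segments lazily rather than hold them in memory.

```python
from bisect import bisect_right
from itertools import accumulate

BLOCK_SIZE = 2048  # assumed block size, for illustration only


class VirtualImage:
    """An image presented as one flat file but stored as separate segments."""

    def __init__(self, segments):
        self._segments = list(segments)
        ends = list(accumulate(len(s) for s in self._segments))
        # _starts[i] is the image offset at which segment i begins.
        self._starts = [0] + ends[:-1]
        self.size = ends[-1] if ends else 0

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        out = bytearray()
        # Find the segment that contains `offset`.
        i = bisect_right(self._starts, offset) - 1
        while offset < end:
            start = offset - self._starts[i]
            chunk = self._segments[i][start:start + (end - offset)]
            out += chunk
            offset += len(chunk)
            i += 1
        return bytes(out)

    def read_block(self, index: int) -> bytes:
        """Return block `index`; the last block may be short."""
        return self.read(index * BLOCK_SIZE, BLOCK_SIZE)
```

A caller such as an emulator would only ever see read_block; which segments a block spans stays hidden behind it.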

  • RetroCat does deduplication, except with delta-tracking of familial files rather than block-by-block changes
  • The idea is that you would use this in tandem with a 'known-good dump' tracker, and it checks object hashes against known-good media for a media library, such as OS install images, arcade machine ROMs or video games.
    • Patches are “children of” a media item. You can patch individual files inside an ISO or ROM, rather than patching the media at large.
  • Intended to be used either as a “file manager” to retrieve or put data into, or with a library, libretrocat, which can natively provide data from, and browse through, the catalog as if it were a filesystem or file chooser.
  • It maintains filial relationships between media by version or region, and only keeps the differences between those, rather than having five different ISOs taking up 5x the space.
  • If something cannot be recomposed, it will not be decomposed.
    • That means, if a step in the import process could lose information, that step is left out.
  • Inspired by the Venti block storage system
  • Takes some hints from https://github.com/mhx/dwarfs
  • Meant to use a WEMI-like cataloguing scheme to denote familial relationships between items in the catalog, for deduplication and reconstitution
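The "if it cannot be recomposed, it will not be decomposed" rule from the list above can be sketched as a round-trip check at import time. This example uses zlib streams because recompressing does not always reproduce the original bytes; try_decompose and recompose are hypothetical names, and trying each compression level in turn is just one assumed way to search for a reversible encoding.

```python
import zlib


def try_decompose(raw: bytes):
    """Return ("zlib", level, payload) if reversible, else ("opaque", None, raw)."""
    try:
        payload = zlib.decompress(raw)
    except zlib.error:
        # Not a zlib stream at all: keep the bytes untouched.
        return ("opaque", None, raw)
    for level in range(10):
        # Accept the decomposed form only if recompressing it is byte-identical.
        if zlib.compress(payload, level) == raw:
            return ("zlib", level, payload)
    # Decompressible, but no level reproduces it exactly: store it as-is.
    return ("opaque", None, raw)


def recompose(kind, level, data: bytes) -> bytes:
    """Invert try_decompose, giving back the original bytes."""
    return zlib.compress(data, level) if kind == "zlib" else data
```

Whichever branch is taken, recompose(*try_decompose(raw)) returns raw unchanged, which is the property the rule asks for.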
© 2025 significant bit - cc-by
CC Attribution 4.0 International