Sepia: an object-hierarchy-to-disk serializer shard

I have not seen this around but it’s a pattern I use a lot in my own coding:

  • I want data structured in a hierarchy
  • I want to save it to disk
  • I don’t want to use a database, I want to use plain old files and directories

Why?

  • Because I don’t want to lock the user in
  • Because I want to manipulate the data using unixy tools
  • Because I want to version-control it

So, that’s Sepia. You lightly annotate your classes and define how they convert to and from a string. Then you can round-trip your data tree to a directory tree and back.

Here’s a small example:

require "sepia"

# Configure Sepia to use a local directory for storage.
Sepia::Storage::INSTANCE.path = "./_data"

# A Postit is a simple Serializable object.
class Postit
  include Sepia::Serializable

  property text : String

  def initialize(@text); end
  def initialize; @text = ""; end

  # The to_sepia method defines the content of the serialized file.
  def to_sepia : String
    @text
  end

  # The from_sepia class method defines how to deserialize the object.
  def self.from_sepia(sepia_string : String) : self
    new(sepia_string)
  end
end

# A Board is a Container that can hold other Boards and Postits.
class Board
  include Sepia::Container

  property boards : Array(Board)
  property postits : Array(Postit)

  def initialize(@boards = [] of Board, @postits = [] of Postit); end
end

# --- Create and Save ---

# A top-level board for "Work"
work_board = Board.new
work_board.sepia_id = "work_board"

# A nested board for "Project X"
project_x_board = Board.new
project_x_board.sepia_id = "project_x" # This ID is only used for top-level objects

# Create some Post-its
postit1 = Postit.new("Finish the report")
postit1.sepia_id = "report_postit"
postit2 = Postit.new("Review the code")
postit2.sepia_id = "code_review_postit"

# Assemble the structure
project_x_board.postits << postit2
work_board.boards << project_x_board
work_board.postits << postit1

# Save the top-level board. This will recursively save all its contents.
work_board.save

# --- Load ---

loaded_work_board = Board.load("work_board").as(Board)

puts loaded_work_board.postits[0].text # => "Finish the report"
puts loaded_work_board.boards[0].postits[0].text # => "Review the code"

And it produces this tree on disk:

./_data
├── Board
│   └── work_board
│       ├── boards
│       │   └── project_x
│       │       └── postits
│       │           └── 0 -> ./_data/Postit/code_review_postit
│       └── postits
│           └── 0 -> ./_data/Postit/report_postit
└── Postit
    ├── code_review_postit
    └── report_postit

You can nest containers as deeply as you want. Some bits are still to be implemented (like preserving order when round-tripping an array). Files with data are deduplicated: each one has a canonical location and is symlinked into every place that references it.
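
To make the deduplication idea concrete, here is a minimal sketch that uses only Crystal's standard library. It is not Sepia's actual code: `store_with_link` is a hypothetical helper and the temporary directory layout is an assumption that loosely mirrors the tree above. It writes a file once to a canonical path and symlinks it into two different boards.

```crystal
require "file_utils"

# Hypothetical helper (not part of Sepia): write `content` once to a canonical
# path under root/kind/id, then expose it at `link_path` through a symlink.
def store_with_link(root : String, kind : String, id : String,
                    content : String, link_path : String) : String
  canonical = File.join(root, kind, id)
  Dir.mkdir_p(File.dirname(canonical))
  # Only the first reference writes the data; later ones reuse the file.
  File.write(canonical, content) unless File.exists?(canonical)

  Dir.mkdir_p(File.dirname(link_path))
  File.delete(link_path) if File.symlink?(link_path)
  File.symlink(File.expand_path(canonical), link_path)
  canonical
end

# Assumed scratch location, so the sketch does not touch real data.
root = File.join(Dir.tempdir, "sepia_sketch_#{Process.pid}")
FileUtils.rm_rf(root)

# The same post-it is referenced from two boards but stored only once.
2.times do |i|
  store_with_link(root, "Postit", "report_postit", "Finish the report",
                  File.join(root, "Board", "board_#{i}", "postits", "0"))
end

link = File.join(root, "Board", "board_1", "postits", "0")
is_link = File.symlink?(link)
text = File.read(link)
stored = Dir.children(File.join(root, "Postit"))

puts is_link     # => true
puts text        # => Finish the report
puts stored.size # => 1

FileUtils.rm_rf(root)
```

Because the links are ordinary symlinks, tools like ls -l or find can follow them without knowing anything about the library.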

UPDATE: And I forgot to add the link, of course. https://github.com/ralsina/sepia

Is there a link to a repo?

Repo seems to be at https://github.com/ralsina/sepia ("A serializer focused on storing a tree of objects to disk in an intuitive way")

I love this!

So many things absolutely do not need the performance of a database, but would benefit greatly by simple access to data via 5 decades of filesystem tools.

Absolutely brilliant. (idea that is, can’t speak for the implementation, have not looked at it :) )

The implementation still needs work and to be used in anger by someone other than me at some point :-D

Sepia has recently gained some new capabilities:

  • If you have an object that is a Sepia::Container and has properties, it serializes to a folder with a JSON file in it containing the properties.
  • If any of those properties is a Sepia::Serializable, it’s serialized to a separate file according to its own configuration and linked into the container’s directory.
  • There is a garbage collector
  • Several bugs fixed
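
As a rough illustration of the first point, the sketch below writes a container's plain properties to a JSON file inside the container's folder and reads them back. It uses only Crystal's standard library rather than Sepia itself, so the `BoardProperties` class, the `title` and `archived` properties, and the `data.json` file name are all assumptions made for the example, not Sepia's documented layout.

```crystal
require "json"
require "file_utils"

# Hypothetical stand-in for the plain (non-Sepia) properties of a container.
class BoardProperties
  include JSON::Serializable

  property title : String
  property archived : Bool

  def initialize(@title, @archived = false); end
end

# Assumed scratch location that imitates a container's folder.
base = File.join(Dir.tempdir, "sepia_props_#{Process.pid}")
dir = File.join(base, "Board", "work_board")
Dir.mkdir_p(dir)

# "data.json" is an assumed file name for this sketch.
json_path = File.join(dir, "data.json")
File.write(json_path, BoardProperties.new("Work").to_pretty_json)

# Reading the folder back restores the properties.
restored = BoardProperties.from_json(File.read(json_path))
puts restored.title    # => Work
puts restored.archived # => false

FileUtils.rm_rf(base)
```

Keeping the properties in a readable JSON file fits the original goals: the data stays easy to diff, grep, and version-control.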

New experimental features:

  1. Caching so I/O usage is lower
  2. Generations for stored items, so you have version history
  3. Disk watching support, so you could theoretically have two processes using the same store and they get notified when the other modifies things