For my work I use Crystal to pre-process data for statistical analysis in R. However, Crystal lacks support for some formats commonly used in analytics (e.g., Parquet and Arrow), so I often have to resort to CSV, which is inefficient for larger datasets.
Recently I discovered DuckDB, which seemed well suited to my use case: Online Analytical Processing (OLAP) on larger-than-memory datasets. DuckDB fills a gap. It is an in-process relational database, like SQLite, but designed for OLAP workloads, like ClickHouse.
Like most database engines intended for analytics, DuckDB stores data in columnar form, and it is one of the fastest options for medium-sized data, beating even Pandas and Spark on certain operations. In addition, it offers an appender that efficiently adds rows to the database directly from the host application, orders of magnitude faster than INSERT statements. In short, it's a relatively young but exciting project.
For these reasons, I created crystal-duckdb, which provides a driver for crystal-db plus some DuckDB-specific features. This initial implementation is compatible with the recently released DuckDB v0.2.8, and it should already cover many use cases.
If you need to do analytics and would like to use Crystal, give it a try. As always, PRs are welcome!