9. Understanding the file formats
The gene expressions
The database.sqlite file is, as the name suggests, an SQLite database. It stores only the non-zero gene expression values, as a sparse matrix would. The database contains three tables: genes holds gene names and gene ids, cells holds cell names and cell ids, and datavalues holds gene ids, cell ids and gene expression values. The CREATE statements that define the database are below.
CREATE TABLE datavalues ('gene_id' REAL, 'cell_id' REAL, 'value' REAL);
CREATE TABLE genes ('id' INTEGER NOT NULL UNIQUE,'gname' varchar(20) COLLATE NOCASE);
CREATE TABLE cells ('id' INTEGER NOT NULL UNIQUE,'cname' varchar(20) COLLATE NOCASE);
CREATE UNIQUE INDEX gnameIDX ON genes (gname);
CREATE UNIQUE INDEX cnameIDX ON cells (cname);
CREATE INDEX gene_id_data ON datavalues ('gene_id');
CREATE INDEX cell_id_data ON datavalues ('cell_id');
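
Because only non-zero values are stored, reading one gene's expression across all cells means joining the three tables and filling in zeros for cells that have no row in datavalues. The following is a minimal sketch of that lookup using Python's standard sqlite3 module. It assumes that datavalues.gene_id and datavalues.cell_id refer to genes.id and cells.id. The helper names (make_demo_db, gene_expression) and the toy gene and cell names are hypothetical and only serve the illustration. The indexes are left out because they affect speed, not results.

```python
import sqlite3

# Table definitions taken from the CREATE statements above (indexes omitted).
SCHEMA = """
CREATE TABLE datavalues ('gene_id' REAL, 'cell_id' REAL, 'value' REAL);
CREATE TABLE genes ('id' INTEGER NOT NULL UNIQUE,'gname' varchar(20) COLLATE NOCASE);
CREATE TABLE cells ('id' INTEGER NOT NULL UNIQUE,'cname' varchar(20) COLLATE NOCASE);
"""

# One row per cell; cells without a stored value for the gene get 0.0.
QUERY = """
SELECT c.cname, COALESCE(d.value, 0.0)
FROM cells AS c
LEFT JOIN datavalues AS d
       ON d.cell_id = c.id
      AND d.gene_id = (SELECT id FROM genes WHERE gname = ?)
ORDER BY c.id
"""


def make_demo_db():
    """Build a small in-memory database with hypothetical toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genes VALUES (?, ?)",
                     [(1, "ACTB"), (2, "GAPDH")])
    conn.executemany("INSERT INTO cells VALUES (?, ?)",
                     [(1, "cell_A"), (2, "cell_B"), (3, "cell_C")])
    # Only non-zero entries are stored, as in a sparse matrix.
    conn.executemany("INSERT INTO datavalues VALUES (?, ?, ?)",
                     [(1, 1, 5.0), (1, 3, 2.5), (2, 2, 1.0)])
    conn.commit()
    return conn


def gene_expression(conn, gene_name):
    """Return {cell name: value} for one gene, with 0.0 for absent entries."""
    return dict(conn.execute(QUERY, (gene_name,)).fetchall())


if __name__ == "__main__":
    demo = make_demo_db()
    print(gene_expression(demo, "ACTB"))
```

To read a real file, replace make_demo_db() with sqlite3.connect("database.sqlite"). Since gname is declared COLLATE NOCASE, the lookup by gene name is case-insensitive. In this sketch a gene name that is not in the genes table returns 0.0 for every cell rather than raising an error.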