Genome

The Genome class is the main object you’ll use to connect to and query a database.

class cruzdb.Genome(db='', user='genome', host='genome-mysql.cse.ucsc.edu', password='', engine=None)

Connect to a particular database

Returns a new Genome object

Parameters :

db : str

either an SQLAlchemy dburl or just the database name.

user : str

not needed if db is a dburl; otherwise, the database user

host : str

not needed if db is a dburl; otherwise, the database host

password : str

not needed if db is a dburl; otherwise, the database password

engine : sqlalchemy.engine

if specified, all other parameters must be left unset; this forces use of an existing engine
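To make the dburl versus database-name distinction concrete, here is a minimal sketch of how these arguments could collapse into a single connection URL. The `resolve_db_url` helper is hypothetical and not part of cruzdb, and the `mysql://` scheme is an assumption; the driver prefix cruzdb actually builds may differ.

```python
# Hypothetical helper, not cruzdb's code: resolve the constructor's
# db/user/host/password arguments to one SQLAlchemy-style URL.
# Assumptions: a value containing "://" is already a full dburl, and plain
# database names refer to a MySQL server reachable with the "mysql://" scheme.


def resolve_db_url(db="", user="genome", host="genome-mysql.cse.ucsc.edu",
                   password=""):
    """Return a database URL for the given connection arguments."""
    if "://" in db:
        # Already a dburl (e.g. "sqlite:///hg19.db"); other arguments are unused.
        return db
    # Only add ":password" when one was supplied.
    credentials = user + (":" + password if password else "")
    return "mysql://%s@%s/%s" % (credentials, host, db)


print(resolve_db_url("hg19"))               # mysql://genome@genome-mysql.cse.ucsc.edu/hg19
print(resolve_db_url("sqlite:///hg19.db"))  # sqlite:///hg19.db
```

Either form can be passed to the constructor as described above, e.g. Genome('hg19') or Genome('sqlite:///hg19.db').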

Methods

annotate
bin_query
bins
commit
connection
create_url
dataframe
david_go
delete
downstream
entity
execute
expunge
expunge_all
flush
join
knearest
load_file
map
map_to
mirror
rollback
save_bed
sql
upstream
with_labels
annotate(fname, tables, feature_strand=False, in_memory=False, header=None, out=sys.stdout, parallel=False)

annotate a file with a number of tables

Parameters :

fname : str or file

file name or file-handle

tables : list

list of tables with which to annotate fname

feature_strand : bool

if this is True, then the up/downstream designations are based on the features in tables rather than the features in fname

in_memory : bool

if True, then tables are read into memory. This usually makes the annotation much faster if there are more than 500 features in fname and fewer than 100K features in the table.

header : str

header to print out (if True, use existing header)

out : file

where to print output

parallel : bool

if True, use multiprocessing library to execute the annotation of each chromosome in parallel. Uses more memory.
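To make the tables and in_memory options concrete, here is a simplified, self-contained sketch of the in-memory path: each table is grouped by chromosome once, then every input row gets the nearest feature from each table and its distance. It does not use cruzdb; plain tuples stand in for fname and the tables, the output columns are hypothetical, and feature_strand, header, and parallel are left out.

```python
import io
import sys
from collections import defaultdict


def load_in_memory(features):
    """Group (chrom, start, end, name) tuples by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))
    return by_chrom


def distance(start, end, fstart, fend):
    """Gap between half-open [start, end) and [fstart, fend); 0 if they overlap or touch."""
    return max(0, fstart - end, start - fend)


def annotate(rows, tables, out=sys.stdout):
    """Write each row followed by the nearest feature name and distance per table.

    rows:   iterable of (chrom, start, end) tuples, standing in for fname
    tables: dict mapping a table name to a list of (chrom, start, end, name) tuples
    """
    indexed = {tname: load_in_memory(feats) for tname, feats in tables.items()}
    for chrom, start, end in rows:
        fields = [chrom, str(start), str(end)]
        for by_chrom in indexed.values():
            candidates = by_chrom.get(chrom, [])
            if not candidates:
                fields += [".", "."]  # no feature on this chromosome
                continue
            fstart, fend, name = min(
                candidates, key=lambda f: distance(start, end, f[0], f[1]))
            fields += [name, str(distance(start, end, fstart, fend))]
        out.write("\t".join(fields) + "\n")


genes = [("chr1", 1000, 5000, "geneA"), ("chr1", 9000, 12000, "geneB")]
buf = io.StringIO()
annotate([("chr1", 5500, 5600), ("chr2", 10, 20)], {"refGene": genes}, out=buf)
print(buf.getvalue(), end="")  # first line ends with geneA and a distance of 500
```

Grouping by chromosome up front is what makes the in-memory route cheap per query; a database-backed version would instead issue one query per row and table.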

bin_query(table, chrom, start, end)

perform an efficient spatial query using the bin column if available. The possible bins are calculated from the start and end passed to this function.

Parameters :

table : str or table

table to query

chrom : str

chromosome for the query

start : int

0-based start position

end : int

0-based end position

static bins(start, end)

Get all the bin numbers for a particular interval defined by (start, end]
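A sketch of the idea behind bins and bin_query, assuming the bin column follows UCSC's standard binning scheme from the kent source and coordinates are 0-based and half-open as in the UCSC tables. The helper names, the genes table, and the chromStart/chromEnd column names are illustrative; real UCSC tables may name their coordinate columns differently (e.g. txStart/txEnd in refGene).

```python
import sqlite3

# Assumption: the bin column follows UCSC's standard binning scheme
# (kent source binRange.c), valid for coordinates below 512 Mb.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # the smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level's bins are 8x larger than the level below


def ucsc_bins(start, end):
    """Every bin that may hold a feature overlapping [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    bins = set()
    for offset in BIN_OFFSETS:
        bins.update(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


def bin_from_range(start, end):
    """Smallest single bin containing [start, end); the value stored per row."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval too large for the standard binning scheme")


def bin_query(conn, table, chrom, start, end):
    """Overlap query: narrow candidate rows by bin, then compare coordinates."""
    bins = sorted(ucsc_bins(start, end))
    placeholders = ",".join("?" * len(bins))
    sql = ("SELECT * FROM %s WHERE chrom = ? AND bin IN (%s) "
           "AND chromStart < ? AND chromEnd > ?" % (table, placeholders))
    return conn.execute(sql, [chrom] + bins + [end, start]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (bin INT, chrom TEXT, chromStart INT, "
             "chromEnd INT, name TEXT)")
for chrom, s, e, name in [("chr1", 1000, 5000, "a"),
                          ("chr1", 300000, 400000, "b"),
                          ("chr2", 1000, 5000, "c")]:
    conn.execute("INSERT INTO genes VALUES (?, ?, ?, ?, ?)",
                 (bin_from_range(s, e), chrom, s, e, name))
print(bin_query(conn, "genes", "chr1", 0, 10000))  # [(585, 'chr1', 1000, 5000, 'a')]
```

The bin filter only narrows the candidate rows (and lets an index on bin do the work); the coordinate comparison still decides overlap, which is why both appear in the WHERE clause.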

create_url(db='', user='genome', host='genome-mysql.cse.ucsc.edu', password='')

internal: create a dburl from a set of parameters or the defaults on this object

dataframe(table, limit=None, offset=None)

create a pandas dataframe from a table or query

Parameters :

table : table

a table in this database or a query

limit : int

an integer limit on the query

offset : int

an offset for the query
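The limit and offset arguments map onto SQL's LIMIT and OFFSET clauses. The following standard-library sketch, assuming an SQLite database and a hypothetical table_rows helper that takes a table name (the real method takes a table or query), shows that mapping and returns the column names and rows a dataframe would be built from.

```python
import sqlite3


def table_rows(conn, table, limit=None, offset=None):
    """Fetch column names and rows of a table, honoring limit and offset."""
    sql = "SELECT * FROM %s" % table
    params = []
    if limit is not None or offset is not None:
        # SQLite only accepts OFFSET after a LIMIT; -1 means "no limit".
        sql += " LIMIT ? OFFSET ?"
        params = [limit if limit is not None else -1, offset or 0]
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (chrom TEXT, chromStart INT, name TEXT)")
conn.executemany("INSERT INTO genes VALUES (?, ?, ?)",
                 [("chr1", 100, "a"), ("chr1", 200, "b"), ("chr1", 300, "c")])
columns, rows = table_rows(conn, "genes", limit=1, offset=1)
print(columns, rows)  # ['chrom', 'chromStart', 'name'] [('chr1', 200, 'b')]
```

With pandas installed, pandas.DataFrame(rows, columns=columns) turns the result into a dataframe.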

static david_go(refseq_list, annot=('SP_PIR_KEYWORDS', 'GOTERM_BP_FAT', 'GOTERM_CC_FAT', 'GOTERM_MF_FAT'))

open a web browser to the DAVID online enrichment tool

Parameters :

refseq_list : list

list of refseq names to check for enrichment

annot : list

iterable of DAVID annotations to check for enrichment
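Opening DAVID amounts to building a URL and handing it to the browser. A sketch, assuming DAVID's URL-based API (api.jsp with type, ids, tool, and annot parameters) and the REFSEQ_MRNA identifier type; the endpoint, the parameter values, and the helper names david_url and open_david are assumptions, not taken from cruzdb's source.

```python
import webbrowser
from urllib.parse import urlencode

# Assumption: DAVID's URL-based API endpoint and parameters (not from cruzdb).
DAVID_API = "https://david.ncifcrf.gov/api.jsp"


def david_url(refseq_list, annot=("SP_PIR_KEYWORDS", "GOTERM_BP_FAT",
                                  "GOTERM_CC_FAT", "GOTERM_MF_FAT")):
    """Build a DAVID URL listing RefSeq mRNA IDs and the annotations to test."""
    query = urlencode({
        "type": "REFSEQ_MRNA",
        "ids": ",".join(refseq_list),
        "tool": "term2term",
        "annot": ",".join(annot),
    }, safe=",")  # keep commas readable instead of encoding them as %2C
    return DAVID_API + "?" + query


def open_david(refseq_list, **kwargs):
    """Open the enrichment page in the default web browser."""
    webbrowser.open(david_url(refseq_list, **kwargs))


print(david_url(["NM_000546", "NM_007294"]))
```

Keeping URL construction separate from webbrowser.open makes the URL easy to inspect or log without launching a browser.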

downstream(table, chrom_or_feat, start=None, end=None, k=1)

Return k-nearest downstream features

Parameters :

table : str or table

table against which to query

chrom_or_feat : str or feat

either a chromosome, e.g. ‘chr3’, or a feature with .chrom, .start, .end attributes

start : int

if chrom_or_feat is a chrom, then this must be the integer start

end : int

if chrom_or_feat is a chrom, then this must be the integer end

k : int

number of downstream neighbors to return

knearest(table, chrom_or_feat, start=None, end=None, k=1, _direction=None)

Return k-nearest features

Parameters :

table : str or table

table against which to query

chrom_or_feat : str or feat

either a chromosome, e.g. ‘chr3’, or a feature with .chrom, .start, .end attributes

start : int

if chrom_or_feat is a chrom, then this must be the integer start

end : int

if chrom_or_feat is a chrom, then this must be the integer end

k : int

number of nearest neighbors to return

_direction : (None, “up”, “down”)

internal (don’t use this)
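upstream, downstream, and knearest share one idea: rank features by their gap to the query and keep the k smallest, optionally restricted to one side. Below is a pure-Python sketch over a list of dicts standing in for a table. The direction parameter mirrors _direction, and the strand convention (downstream means higher coordinates on "+" and lower on "-", taken from the query's strand) is an assumption, as is the tie handling.

```python
import heapq


def gap(start, end, feature):
    """Distance from [start, end) to a feature; 0 when they overlap or touch."""
    return max(0, feature["start"] - end, start - feature["end"])


def knearest(features, chrom, start, end, k=1, direction=None, strand="+"):
    """Return up to k features on chrom closest to [start, end).

    direction: None for either side, or "up"/"down" to keep only features
    upstream/downstream of the query, oriented by the query's strand.
    """
    hits = [f for f in features if f["chrom"] == chrom]
    if direction is not None:
        # Assumed convention: on "+" downstream means higher coordinates,
        # on "-" it means lower coordinates.
        downstream = (direction == "down") == (strand == "+")
        if downstream:
            hits = [f for f in hits if f["start"] >= end]
        else:
            hits = [f for f in hits if f["end"] <= start]
    # nsmallest keeps input order among ties (same as sorted(...)[:k]).
    return heapq.nsmallest(k, hits, key=lambda f: gap(start, end, f))


genes = [
    {"chrom": "chr3", "start": 100, "end": 200, "name": "a"},
    {"chrom": "chr3", "start": 500, "end": 600, "name": "b"},
    {"chrom": "chr3", "start": 1000, "end": 1100, "name": "c"},
]
print([f["name"] for f in knearest(genes, "chr3", 400, 450, k=2)])  # ['b', 'a']
print([f["name"] for f in knearest(genes, "chr3", 400, 450, direction="up")])  # ['a']
```

This scans every row on the chromosome; against a database, the same filtering would be pushed into SQL (for example with the bin column) rather than done in Python.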

load_file(fname, table=None, sep='\t', bins=False, indexes=None)

use some of the machinery in pandas to load a file into a table

Parameters :

fname : str

filename or filehandle to load

table : str

table to load the file to

sep : str

column separator (a tab by default)

bins : bool

add a “bin” column for efficient spatial queries.

indexes : list[str]

list of columns to index
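load_file relies on pandas; the sketch below swaps in the standard-library csv module and sqlite3 to show the same steps: read a header, create the table, insert the rows, and add the requested indexes. The conn argument, the integer conversion, and the omission of the bins option are simplifications of this sketch, not cruzdb's behavior.

```python
import csv
import io
import sqlite3


def convert(value):
    """Store integer-looking fields as integers so range comparisons are numeric."""
    try:
        return int(value)
    except ValueError:
        return value


def load_file(conn, fh, table, sep="\t", indexes=()):
    """Create `table` from a delimited file whose first row is a header, then index it."""
    reader = csv.reader(fh, delimiter=sep)
    header = next(reader)
    columns = ", ".join('"%s"' % name for name in header)
    conn.execute("CREATE TABLE %s (%s)" % (table, columns))
    marks = ", ".join("?" * len(header))
    conn.executemany("INSERT INTO %s VALUES (%s)" % (table, marks),
                     ([convert(v) for v in row] for row in reader))
    for col in indexes:
        conn.execute('CREATE INDEX "%s_%s" ON %s ("%s")' % (table, col, table, col))
    conn.commit()


conn = sqlite3.connect(":memory:")
data = io.StringIO("chrom\tstart\tend\tname\nchr1\t100\t200\ta\nchr1\t300\t400\tb\n")
load_file(conn, data, "peaks", indexes=["chrom"])
print(conn.execute("SELECT name FROM peaks WHERE start > 150").fetchall())  # [('b',)]
```

Column names are double-quoted so headers such as end, an SQL keyword, still work as identifiers.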

mirror(tables, dest_url)

mirror a set of tables to dest_url

Returns a new Genome object

Parameters :

tables : list

an iterable of tables

dest_url : str

a dburl string, e.g. ‘sqlite:///local.db’
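Mirroring boils down to recreating each table's schema at the destination and copying its rows. A minimal sketch using two sqlite3 connections; it reads the original CREATE TABLE statement from sqlite_master, which only works when the source is SQLite, so a MySQL source such as the UCSC server would need schema reflection instead (for example via SQLAlchemy). The mirror function here and the NM_1 row are hypothetical.

```python
import sqlite3


def mirror(src, dest, tables):
    """Copy the schema and rows of each named table from src to dest."""
    for table in tables:
        row = src.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,)).fetchone()
        if row is None:
            raise ValueError("no such table: %s" % table)
        dest.execute(row[0])  # re-run the original CREATE TABLE statement
        cursor = src.execute("SELECT * FROM %s" % table)
        marks = ", ".join("?" * len(cursor.description))
        dest.executemany("INSERT INTO %s VALUES (%s)" % (table, marks), cursor)
    dest.commit()
    return dest


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE refGene (name TEXT, chrom TEXT, txStart INT, txEnd INT)")
src.execute("INSERT INTO refGene VALUES ('NM_1', 'chr1', 10, 20)")
local = mirror(src, sqlite3.connect(":memory:"), ["refGene"])
print(local.execute("SELECT * FROM refGene").fetchall())  # [('NM_1', 'chr1', 10, 20)]
```

Returning the destination handle parallels the documented behavior of returning a new Genome object bound to the copy.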

classmethod save_bed(query, filename=sys.stdout)

write a BED12 file of the query

Parameters :

query : query

a table or query to save to file

filename : file

string or filehandle to write output
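BED12 extends the basic BED columns with thickStart/thickEnd, itemRgb, and per-block (exon) sizes and offsets. A sketch assuming the query rows look like UCSC genePred records such as refGene (txStart, txEnd, cdsStart, cdsEnd, and comma-separated exonStarts/exonEnds); the score and color are placeholders, and to_bed12 and the NM_X record are hypothetical.

```python
import sys


def to_bed12(gene, score=0, item_rgb="0"):
    """Format one genePred-style record (like a UCSC refGene row) as a BED12 line."""
    starts = [int(x) for x in gene["exonStarts"].rstrip(",").split(",")]
    ends = [int(x) for x in gene["exonEnds"].rstrip(",").split(",")]
    sizes = [e - s for s, e in zip(starts, ends)]
    # BED12 blockStarts are offsets from chromStart, not absolute positions.
    offsets = [s - gene["txStart"] for s in starts]
    fields = [
        gene["chrom"], gene["txStart"], gene["txEnd"], gene["name"], score,
        gene["strand"], gene["cdsStart"], gene["cdsEnd"], item_rgb, len(sizes),
        ",".join(map(str, sizes)) + ",", ",".join(map(str, offsets)) + ",",
    ]
    return "\t".join(map(str, fields))


def save_bed(genes, out=sys.stdout):
    """Write one BED12 line per record to a file handle (stdout by default)."""
    for gene in genes:
        out.write(to_bed12(gene) + "\n")


save_bed([{"chrom": "chr1", "name": "NM_X", "strand": "+",
           "txStart": 100, "txEnd": 1000, "cdsStart": 150, "cdsEnd": 900,
           "exonStarts": "100,500,", "exonEnds": "200,1000,"}])
```

The coding region maps onto thickStart/thickEnd, so genome browsers draw the CDS thicker than the UTRs.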

sql(query)

show the sql of a query

upstream(table, chrom_or_feat, start=None, end=None, k=1)

Return k-nearest upstream features

Parameters :

table : str or table

table against which to query

chrom_or_feat : str or feat

either a chromosome, e.g. ‘chr3’, or a feature with .chrom, .start, .end attributes

start : int

if chrom_or_feat is a chrom, then this must be the integer start

end : int

if chrom_or_feat is a chrom, then this must be the integer end

k : int

number of upstream neighbors to return
