
Suppose you want to download some raw sequence data in fastq format from GEO/SRA and run it through an appropriate aligner (BWA, TopHat, STAR, etc.) and then a variant caller (Strelka, etc.) or other analysis pipeline. How do you get started? First things first: you need the sequence data. I will use the data released along with the following publication as an example:

Daemen A, Griffith OL et al. Modeling precision treatment of breast cancer. 14:R110.

Data were deposited at GEO/SRA and are accessible through the GEO data set super-series GSE48216, which comprises a sub-series for RNA-seq at GSE48213 and one for Exome-seq at GSE48215. From there you can link to the relevant SRA projects for RNA-seq at SRP026537 and Exome-seq at SRP026538. You can download the raw data using the SRA toolkit.

For example, to get fastq files for the T47D exome cell line data you could do something like the following. Find the appropriate GEO record for T47D from the GEO data set sub-series page for GSE48215 listed above. Under 'Relations' is a link to the corresponding SRA page. Note: you can also find this SRX record page directly from the SRA project page for SRP026538 listed above. Determine the SRR number and then download the data at the command line with:

    prefetch -v SRR925811

Note where the sra file is downloaded (by default to /home//ncbi/public/sra/) and then convert it to fastq with something like the following:

    fastq-dump --outdir /opt/fastq/ --split-files /home//ncbi/public/sra/SRR925811.sra

This should produce two fastq files (one for R1 and one for R2). That will give you the raw exome sequence data for the T47D cell line.

A very similar process should work for any RNAseq samples that you want. If you want to start with sam/bam files you can use sam-dump instead of fastq-dump.
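
If you need to repeat the prefetch and fastq-dump steps for several runs, they can be wrapped in a couple of small shell functions. This is only a rough sketch, not part of the SRA toolkit: the function names (sra_commands, fetch_run) and the SRA_DIR and FASTQ_DIR variables are made up for illustration, and the default download location under $HOME is an assumption that may not match how your toolkit is configured.

```shell
# Hypothetical helpers around the SRA toolkit (prefetch, fastq-dump).
# SRA_DIR and FASTQ_DIR are illustrative variables, not toolkit settings.
# The $HOME/ncbi/public/sra default is an assumption; check where prefetch
# actually writes .sra files on your system.

# Print the two commands for one run accession without executing anything.
sra_commands() {
    printf 'prefetch -v %s\n' "$1"
    printf 'fastq-dump --outdir %s --split-files %s/%s.sra\n' \
        "${FASTQ_DIR:-/opt/fastq}" "${SRA_DIR:-$HOME/ncbi/public/sra}" "$1"
}

# Download one run and convert it to fastq (one file per read for paired data).
# Requires prefetch and fastq-dump on PATH; stops if the download fails.
fetch_run() {
    prefetch -v "$1" &&
        fastq-dump --outdir "${FASTQ_DIR:-/opt/fastq}" --split-files \
            "${SRA_DIR:-$HOME/ncbi/public/sra}/$1.sra"
}
```

Running sra_commands SRR925811 first lets you check the paths before anything is downloaded; fetch_run SRR925811 then does the actual work. For paired-end data like this, the converted files are named after the run with _1 and _2 suffixes.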
