The first step of RNA-Seq analysis is aligning your sequencing reads to a reference genome, but before you can do that you need to get your data onto a Linux server where the analysis tools run. This guide introduces the Linux command line and shows you how to get your data onto the server.
1. Get Access to a Linux Server
Some alignment tools need large amounts of RAM, and sequencing files are large, so you’ll want to use a high-performance Linux server. Helpfully, many universities make these available to their students and staff and keep them updated with useful software. Some labs instead maintain their own computers. Find out what your options are.
2. Connect to That Computer
You probably won’t sit in front of the server that you’re using and type on it. Instead, you’ll connect through your own computer. Doing so is easy from a Mac, and not very complicated from a Windows machine.
The server that you’re using will have its own IP address. You’ll get a user name and password for it.
For a Mac, open the Terminal application and type “ssh <username>@<IP address>”, replacing the placeholders with your own details. You will then be asked for your password.
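As a minimal sketch of that command, the lines below assemble the connection target from its two parts. The user name `alice` and the address `203.0.113.10` are made-up placeholders (the address comes from a range reserved for documentation), so substitute the details you were given. The real `ssh` call is left commented out so the snippet can be run safely without a server.

```shell
# Hypothetical placeholders: replace with the user name and address you were given.
SERVER_USER="alice"
SERVER_ADDR="203.0.113.10"   # documentation-only example address
SSH_PORT=22                  # the usual SSH port

# Build the "<username>@<IP address>" target once so it can be reused.
SSH_TARGET="${SERVER_USER}@${SERVER_ADDR}"
echo "Connecting with: ssh -p ${SSH_PORT} ${SSH_TARGET}"

# Uncomment to actually connect; ssh will then prompt for your password.
# ssh -p "${SSH_PORT}" "${SSH_TARGET}"
```

The `-p` option is only needed if your server uses a port other than 22.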
For Windows, you’ll need to download an SSH client, which makes the connection for you. The most common one is PuTTY. After installing and opening PuTTY, type the server’s IP address into the host name field. The port number is almost always 22. Click Open; PuTTY will connect and then ask for your user name and password.
You’re now connected!
3. Adjust to Linux
If you’re new to Linux, getting used to the command line can take some time. Instead of clicking on folders or files to open them, you navigate by typing into the terminal. Some useful commands are:
pwd – print the path of your current directory
ls – list all contents of your current directory
ls -lh – list contents of directory with their size, when last modified, and more information
rm <file name> – remove specified file
cd <location> – change directory to a specified directory
cd ../ – change directory to one level higher
mkdir <name> – make a new directory with the specified name
cp <source> <destination> – copy file from source to destination
mv <source> <destination> – move file from source to destination
head <file name> – see first ten lines of a file
tail <file name> – see last ten lines of a file
cat <file name> – see entire file
more <file name> – scroll through entire file
gunzip <file name> – unzip a file with the .gz extension
gzip <file name> – zip a file to the .gz extension
tar -xzvf <file name> – extract and unzip files from an archive ending in .tar.gz or .tgz
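One way to get comfortable with these commands is to try them in a throwaway directory. The session below is only a sketch: the directory and file names (`practice`, `numbers.txt`, and so on) are invented for illustration, and `mktemp`, `seq`, `tar -czvf`, and the `-C` option are extras not covered in the list above, used here just to create a scratch space, a sample file, and an archive to extract.

```shell
# Work in a throwaway directory so nothing important is touched.
SCRATCH="$(mktemp -d)"
cd "$SCRATCH"
pwd                                   # print where we are

mkdir practice                        # make a new directory
cd practice
seq 1 25 > numbers.txt                # hypothetical sample file: the numbers 1 to 25

ls -lh                                # list contents with sizes and dates
head numbers.txt                      # first ten lines (1 to 10)
tail numbers.txt                      # last ten lines (16 to 25)

cp numbers.txt copy.txt               # copy a file
mv copy.txt renamed.txt               # move (here: rename) a file

gzip renamed.txt                      # creates renamed.txt.gz, removes renamed.txt
gunzip renamed.txt.gz                 # restores renamed.txt

rm renamed.txt                        # remove a file (there is no undo)
cd ../                                # go up one level

tar -czvf practice.tar.gz practice    # bundle and compress the directory
mkdir restored
tar -xzvf practice.tar.gz -C restored # extract the archive into restored/
ls restored/practice                  # shows numbers.txt
```

Note that `rm` deletes files immediately, with no recycle bin, so it is worth double-checking the file name before pressing Enter.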
4. Download Your Data
There are two options for downloading data. One is to download it to your own computer, then transfer it to the Linux server. The other is to download the data directly to the server.
If you download to your own computer first, you’ll automatically have a backup of your data there. An easy way to get the data from your computer to the server is the FileZilla client.
If you download directly to the server, you save time (but remember to make a backup!). Whoever did the sequencing will provide a link to where the data are stored. While connected to the server, type in:
wget -c <link>
If you disconnect from the server during the download, type that command again, and your download will continue from where it ended.
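If the download itself keeps failing, for example because of a flaky network, the retyping can be automated. The function below is a rough sketch rather than a recommended tool: `download_with_resume` is a made-up helper name, the default of five attempts is arbitrary, and the link in the example call is a placeholder. It only retries when `wget` exits with an error; it will not survive losing your connection to the server, in which case you still need to log back in and run the command again.

```shell
# Retry "wget -c" until it succeeds or we run out of attempts.
# Usage: download_with_resume <link> [max_attempts]
download_with_resume() {
    link="$1"
    max_attempts="${2:-5}"    # arbitrary default; adjust to taste
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c continues a partial download instead of starting over
        if wget -c "$link"; then
            echo "Download finished on attempt $attempt"
            return 0
        fi
        echo "Attempt $attempt failed; retrying..." >&2
        attempt=$((attempt + 1))
        sleep 1               # brief pause before the next attempt
    done
    echo "Giving up after $max_attempts attempts" >&2
    return 1
}

# Example call (hypothetical link; use the one from your sequencing provider):
# download_with_resume "https://example.org/run1/sample_R1.fastq.gz"
```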
At this point, you have your data and are ready to start analyzing it! Stay tuned for more advice on choosing a genome alignment tool, detecting differential gene expression, and more!