apoinformatica: Coworkers

Friday, August 15, 2008

Coworkers

Tips, advice, and news in bioinformatics that are useful in applied programs.
Welcome!

16 comments:

AA2 said...: From Varinia Lopez

I'm sending you the command for substituting a set of characters in a multi-FASTA file:

perl -i.bak -pe 's/\|.*\[/-/; s/\]//' filename

My sequences look like this:
>gi|21219042|ref|NP_624821.1| DEAD-box RNA-helicase [Streptomyces coelicolor A3(2)]
MNAKPTASFDGLGLPPVLVETMTSLGVTRPFPIQAATLPEALAGRDVLGRARTGSGKTLAFGLALLAGTA
GRRAEPKRPLALVLVSTRELAQQVSDALAPYARALGVRLTTVVGGLSINRQTQALRDGAEVVVATPGRLT
DLVSRRDCHLNQVRITVLDEADQMCDLGFLPQVSGILDQVPSDGQRLLFSATLDGDVDQLVRDHLHDPVP
VSVDPASASVSTMEHHVLTVHPADKYATATEIAARDGRVLMFLDTKAGVDRFTRELRAAGVSAGALHSGK
SQPQRTHTLARFVEGGVTVLVATNVAARGIHVDDLDLVVNVDPPADAKDYLHRGGRTARAGRAGSVVTLV
TPDQRREVNRMMSEAGIRPTVTPVRSGEQKLTDLTGAKRPPAGRGKESGNAPFRGMGTRPAGAAKGSRKA
VEARRAAEARAAARVRKGR

With the sed command I delete gi| (replace it with nothing), and after the Perl command (the one I'm sending you) the sequences end up as follows:

>21219042-Streptomyces coelicolor A3(2)
MNAKPTASFDGLGLPPVLVETMTSLGVTRPFPIQAATLPEALAGRDVLGRARTGSGKTLAFGLALLAGTA
GRRAEPKRPLALVLVSTRELAQQVSDALAPYARALGVRLTTVVGGLSINRQTQALRDGAEVVVATPGRLT
DLVSRRDCHLNQVRITVLDEADQMCDLGFLPQVSGILDQVPSDGQRLLFSATLDGDVDQLVRDHLHDPVP
VSVDPASASVSTMEHHVLTVHPADKYATATEIAARDGRVLMFLDTKAGVDRFTRELRAAGVSAGALHSGK
SQPQRTHTLARFVEGGVTVLVATNVAARGIHVDDLDLVVNVDPPADAKDYLHRGGRTARAGRAGSVVTLV
TPDQRREVNRMMSEAGIRPTVTPVRSGEQKLTDLTGAKRPPAGRGKESGNAPFRGMGTRPAGAAKGSRKA
VEARRAAEARAAARVRKGR

I think it's a very powerful command.

Varinia; August 15, 2008 at 12:41 PM
Juan said...: And why use sed? It's enough to do that substitution in Perl first:

perl -i.bak -pe 's/gi\|//; s/\|.*\[/-/; s/\]//' file

August 18, 2008 at 1:39 PM
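For readers who prefer Python, the same header rewrite can be sketched with a regular expression. This is an illustrative equivalent, not Varinia's or Juan's code. It assumes every header follows the `>gi|GI|db|accession| description [organism]` layout of the sample header above and leaves any other line (including sequence lines) untouched. The function names and the script name in the usage comment are hypothetical.

```python
import re
import sys

# Assumed header layout: >gi|<GI>|<db>|<accession>| <description> [<organism>]
# The greedy ".*" runs to the last "[", like the Perl s/\|.*\[/-/ substitution.
HEADER_RE = re.compile(r"^>gi\|(\d+)\|.*\[(.+)\]\s*$")


def rewrite_header(line: str) -> str:
    """Turn a matching header into '>GI-organism'; return other lines unchanged."""
    m = HEADER_RE.match(line)
    if m is None:
        return line
    return f">{m.group(1)}-{m.group(2)}"


def rewrite_fasta(lines):
    """Yield each line of a FASTA file (newline stripped) with headers rewritten."""
    for line in lines:
        yield rewrite_header(line.rstrip("\n"))


if __name__ == "__main__":
    # Usage (hypothetical names): python rewrite_headers.py file.fasta > renamed.fasta
    with open(sys.argv[1]) as handle:
        for out in rewrite_fasta(handle):
            print(out)
```

Unlike `perl -i.bak`, this writes to standard output instead of editing the file in place, so the original file is never overwritten.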
AA2 said...: It's a useful expression: we use it to discard all the sequence lines and keep only the description and GI from a FASTA file.

$ grep '^>' file.pep | sed 's/^>//' > file.out

It was described by Juan Caballero and was used in a Perl program that filtered BLAST results. I'd like to know whether it is possible to apply this expression to generate a small database containing only the descriptions. Would that be useful?; August 18, 2008 at 3:45 PM
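On the question of building a small database from the descriptions alone, one possible approach, sketched here in Python with the standard-library `sqlite3` module, is to parse each header into its GI and description and store them in a table. The table name `descriptions`, its columns, and the database path are hypothetical choices for illustration. Headers that do not start with `gi|` are stored under their whole identifier.

```python
import sqlite3


def iter_headers(lines):
    """Yield (identifier, description) pairs from FASTA header lines.

    For NCBI-style identifiers such as 'gi|21219042|ref|NP_624821.1|' the
    GI number is used; any other identifier is kept as-is.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines, like grep '^>'
        ident, _, desc = line[1:].rstrip("\n").partition(" ")
        fields = ident.split("|")
        key = fields[1] if len(fields) > 1 and fields[0] == "gi" else ident
        yield key, desc


def build_db(lines, db_path=":memory:"):
    """Load the header descriptions into an SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS descriptions "
        "(gi TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO descriptions (gi, description) VALUES (?, ?)",
        iter_headers(lines),
    )
    conn.commit()
    return conn
```

For example, `with open("file.pep") as fh: conn = build_db(fh, "descriptions.db")` followed by `conn.execute("SELECT gi FROM descriptions WHERE description LIKE '%helicase%'")` would list the GIs whose description mentions a helicase.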
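AA2 said...: Some useful expressions. I used this one to write a FASTA name (header line) at the top of a sequence file (GNU sed):

$ sed '1i >name_sequence' file.txt > file.fasta

Another expression keeps only the first space-delimited field of each line, so each header is cut down to its GI identifier and the description after it is dropped:

$ cut -f1 -d " " file.fasta > file_2.fasta

The next expressions are useful for deleting a specific sequence, such as a poly-A tail or a vector sequence (trimming), where "sequence" stands for the sequence to remove. The first deletes it together with everything before it on the line; the second deletes it together with everything after it:

$ sed 's/.*sequence//' file.fasta

$ sed 's/sequence.*$//' file.fasta

August 19, 2008 at 9:41 AM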
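The shell operations in this comment can also be sketched as small Python helpers, which may be handy inside a larger script. These are illustrative equivalents written for this page, and the function names are hypothetical. Like the sed and cut commands, they work on one line or one sequence string at a time, so a sequence wrapped over several FASTA lines would need to be joined first. Unlike sed, they treat the motif as a literal string rather than a regular expression.

```python
def add_header(seq_lines, name):
    """Prepend a FASTA header '>name' to raw sequence lines (like sed '1i >name')."""
    return [f">{name}"] + list(seq_lines)


def first_field(line):
    """Keep only the text before the first space (like cut -f1 -d ' ')."""
    return line.split(" ", 1)[0]


def cut_before(seq, motif):
    """Drop everything up to and including the last occurrence of motif
    (like sed 's/.*motif//', whose greedy .* reaches the last match)."""
    i = seq.rfind(motif)
    return seq if i == -1 else seq[i + len(motif):]


def cut_after(seq, motif):
    """Drop the first occurrence of motif and everything after it
    (like sed 's/motif.*$//')."""
    i = seq.find(motif)
    return seq if i == -1 else seq[:i]
```

For a poly-A tail, `cut_after(seq, "A" * 15)` would cut at the first run of 15 A's; the run length here is an arbitrary example, not a recommended threshold.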
Juan said...: Perl one-liners:
http://sial.org/howto/perl/one-liner/

(to understand how Varinia's code works).

P.S.:
Please define the language for this blog.
Por favor definir el idioma de este blog.
Veuillez définir la langue de ce blog.; August 19, 2008 at 2:09 PM
AA2 said...: The language, bro, is bioinformatics; everything else is accessory!!

Cheers, and see you soon!!

Cesare; August 19, 2008 at 2:19 PM
AA2 said...: Thanks for http://sial.org/howto/perl/one-liner/

We were wondering about the reasoning behind that language.

Thanks, bro!!

Cesar; August 19, 2008 at 2:21 PM
AA2 said...: To print a file from the terminal, use the lpr command:

$ lpr file.pdf

August 26, 2008 at 4:12 PM
AA2 said...: I found the following script on Juan Caballero's blog (linxe-eye), useful for mapping big sequences onto a chromosome:

http://linxe-eye.blogspot.com/2008/08/maping-big-sequences.html; August 28, 2008 at 11:35 AM
AA2 said...: I was looking for a way to make some scripts executable from any directory. Some people tried changing .bash_profile, but Juan Caballero said to add this line to the end of the .bashrc file:

export PATH=$PATH:/my/dir/scripts/bin

then close the terminal or open a new shell, and check the result with echo $PATH. (The scripts themselves also need execute permission, e.g. chmod +x on each one.)

See you; September 3, 2008 at 1:50 PM
AA2 said...: Gustavo Hernandez sent the following link, which has some Perl one-liner scripts useful for managing BLAST and FASTA files:

FAS Center for Systems Biology:
http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html

Cesar; September 9, 2008 at 2:17 PM
AA2 said...: Gustavo Hernandez and Andres Zurita recommended the following Firefox plugins, useful in research. Thanks!!!

There are two Firefox plugins: one is The Molecular Biologist's Toolbar and the other is called Zotero.

The Molecular Biologist's Toolbar
http://molecularbiology.toolbar.fm/

Zotero
http://www.zotero.org/

Searching for PDFs becomes easier.

Regards

César; October 31, 2008 at 6:02 PM
AA2 said...: Nidia wrote us the following MySQL expression, for when you need to compare two tables and find the elements of table2 that are not present in table1:

mysql> select gene2 from table2 left join table1 on gene1 = gene2 where gene1 is null;

January 7, 2009 at 10:41 AM
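The anti-join pattern (`LEFT JOIN ... WHERE ... IS NULL`) can be tried without a MySQL server by using Python's built-in `sqlite3` module, which supports the same construct. The table and column names follow the query (`table1.gene1`, `table2.gene2`); the gene names are made-up sample data, and the helper names are hypothetical.

```python
import sqlite3

# Anti-join: keep the rows of table2 that have no matching gene in table1.
QUERY = """
    SELECT gene2
    FROM table2
    LEFT JOIN table1 ON table1.gene1 = table2.gene2
    WHERE table1.gene1 IS NULL
    ORDER BY gene2
"""


def make_demo_db():
    """Build an in-memory database with two small, made-up gene tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (gene1 TEXT)")
    conn.execute("CREATE TABLE table2 (gene2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [("geneA",), ("geneB",)])
    conn.executemany("INSERT INTO table2 VALUES (?)", [("geneB",), ("geneC",), ("geneD",)])
    return conn


def genes_missing_from_table1(conn):
    """Return the table2 genes that do not appear in table1."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(genes_missing_from_table1(make_demo_db()))  # ['geneC', 'geneD']
```

The unmatched rows of the left table come back with NULL in the right table's columns, which is why the `IS NULL` test must be on the column of the table being joined in (here `table1.gene1`).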
AA2 said...: I suggest the following link, http://www.mendeley.com/, which is useful for building our database of papers. It works like EndNote, but this software extracts metadata from PDF files, and it has a Word plugin (I hope an OpenOffice plugin appears soon) for formatting a text document with a reference style.; July 22, 2009 at 7:45 AM
AA2 said...: The following was sent by Linxe. It is about analyzing the 5' and 3' regions as well as the intergenic regions, and the work takes the human genome as its example. Regards, and enjoy! :)

Use the UCSC Genome Browser data; the latest stable assembly is hg18
(http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/). You can get
the 5' and 3' regions from knownGene.txt, or from foldUtr3.txt and
foldUtr5.txt. For the intergenic regions, you just need to extract the
sequences between the genes in knownGene.txt using their coordinates and
the genome FASTA. What I do is take the chromosome and, using the
knownGene coordinates, mask the regions with
"substr($chr, $ini, $len) = 'N' x $len", and then simply split the
chromosome into blocks with "split(/N+/, $chr)".

The other option is to use the biomaRt package from R/Bioconductor:
http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html; October 8, 2009 at 11:38 AM
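As an illustration of pulling the 5' and 3' UTRs out of the knownGene table file, here is a Python sketch. It assumes the usual UCSC knownGene column order (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and UCSC's 0-based, half-open coordinates, so check the table schema for your assembly before relying on it. It returns genomic intervals only (fetching the bases from the genome FASTA is left out), keeps only the exonic parts so that introns inside a UTR are excluded, and swaps the two sides for minus-strand transcripts. The function names are hypothetical.

```python
def parse_coords(field):
    """Parse a UCSC comma-separated coordinate list such as '100,400,700,'."""
    return [int(x) for x in field.split(",") if x]


def clip(exons, start, end):
    """Return the parts of the exon intervals that fall inside [start, end)."""
    pieces = []
    for ex_start, ex_end in exons:
        lo, hi = max(ex_start, start), min(ex_end, end)
        if lo < hi:
            pieces.append((lo, hi))
    return pieces


def utr_intervals(line):
    """Return (utr5, utr3) lists of (start, end) genomic intervals for one knownGene row."""
    f = line.rstrip("\n").split("\t")
    strand = f[2]
    tx_start, tx_end, cds_start, cds_end = map(int, f[3:7])
    exons = list(zip(parse_coords(f[8]), parse_coords(f[9])))
    if cds_start == cds_end:
        return [], []  # assumed non-coding: no CDS, so no UTR split
    left = clip(exons, tx_start, cds_start)  # exonic bases left of the CDS
    right = clip(exons, cds_end, tx_end)     # exonic bases right of the CDS
    # On the minus strand the transcript's 5' end lies on the genomic right.
    return (left, right) if strand == "+" else (right, left)
```

A transcript whose cdsStart equals its cdsEnd is treated here as non-coding and yields no UTRs; that convention is an assumption of this sketch.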
Linxe said...: More detail on how I extract the intergenic regions (Perl style).

1. Suppose we have a variable holding a sequence or chromosome ($seq) and the gene coordinates in a hash like this:
my %genes = ();
$genes{'gen1'}{'ini'} = 100;
$genes{'gen1'}{'end'} = 200;
$genes{'gen2'}{'ini'} = 400;
$genes{'gen2'}{'end'} = 550;
2. We use a loop to "mask" the genes:
foreach my $gen (keys %genes) {
    my $ini = $genes{$gen}{'ini'};   # no quotes around $gen: '$gen' is the literal string, not the gene name
    my $end = $genes{$gen}{'end'};
    my $len = $end - $ini;           # 0-based offset, end-exclusive length
    substr($seq, $ini, $len) = 'N' x $len;
}

3. Finally, we split the sequence into blocks, using the "masks" as the delimiter:
my @intergenic = split(/N+/, $seq);

Notes:
- Be careful with numeric coordinates: remember that Perl starts counting from ZERO, while many databases use ONE. For 1-based, inclusive coordinates use $ini - 1 as the offset and $end - $ini + 1 as the length.
- Overlaps don't matter; the regions are masked either way.
- Loading a large sequence into memory will very likely saturate your RAM, so do it under adult supervision.; October 8, 2009 at 12:19 PM
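For comparison, the same mask-and-split idea can be sketched in Python. This is not Linxe's script but an illustrative translation with a hypothetical function name. It takes gene coordinates as (start, end) pairs and, by default, assumes they are 1-based and inclusive as in many databases; with `one_based=False` it reproduces the Perl loop's arithmetic (offset `ini`, length `end - ini`). A `bytearray` is used so the mask is applied in place rather than rebuilding a chromosome-sized string for every gene. Two caveats carry over from the Perl version: runs of N already present in the genome (assembly gaps) also act as split points, and the blocks lose their coordinates. Empty blocks, for example when a gene starts at the first base, are dropped here.

```python
import re


def intergenic_regions(seq, genes, one_based=True):
    """Mask every gene span with 'N' and return the blocks left between them.

    genes maps a gene name to a (start, end) pair. With one_based=True the
    pair is read as 1-based and inclusive; otherwise as 0-based, end-exclusive.
    Overlapping genes are fine: their spans are simply masked twice.
    """
    masked = bytearray(seq, "ascii")
    for start, end in genes.values():
        lo = max(start - 1 if one_based else start, 0)
        hi = min(end, len(masked))  # clamp so the slice keeps its length
        if lo < hi:
            masked[lo:hi] = b"N" * (hi - lo)
    return [block for block in re.split(r"N+", masked.decode("ascii")) if block]
```

For example, with a 30-base sequence of 10 A's, 10 C's and 10 G's, `intergenic_regions(seq, {"gen1": (11, 20)})` masks the C's and returns the A block and the G block.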
