bioawk RPM build for: CentOS 6.

Name : bioawk
Version : 030715
Vendor : obs://build_opensuse_org/home:halocaridina
Release : 2.4
Date : 2018-07-04 04:07:10
Group : Other
Source RPM : bioawk-030715-2.4.src.rpm
Size : 0.17 MB
Packager : (none)
Summary : Bioawk is an extension to Brian Kernighan's awk for dealing with several common biological data formats
Description :
Bioawk is an extension to Brian Kernighan's awk acquired from [1], adding the
support of several common biological data formats, including optionally gzip'ed
BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with the column names. It
also adds a few built-in functions including, as of now, and(), or(), xor(),
reverse() and revcomp(). The following are a few examples demonstrating the new
functionality:

1. Extract unmapped reads without header:

bioawk -c sam 'and($flag,4)' aln.sam.gz

2. Extract mapped reads with header:

bioawk -c sam -H '!and($flag,4)'

3. Reverse complement FASTA:

bioawk -c fastx '{print ">"$name;print revcomp($seq)}' seq.fa.gz

4. Create FASTA from SAM (uses revcomp if FLAG & 16):

samtools view aln.bam | \
bioawk -c sam '{s=$seq; if(and($flag, 16)) {s=revcomp($seq)} print ">"$qname"\n"s}'

5. Get the %GC from FASTA:

bioawk -c fastx '{print ">"$name;print gc($seq)}' seq.fa.gz

6. Get the mean Phred quality score from FASTQ:

bioawk -c fastx '{print ">"$name;print meanqual($qual)}' seq.fq.gz

7. Take column name from the first line (where "age" appears in the first line
of input.txt):

bioawk -c header '{print $age}' input.txt
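
Examples 1, 2 and 4 all test single bits of the SAM FLAG field: 4 marks an
unmapped read and 16 marks a read aligned to the reverse strand. The sketch
below shows the same bit test in plain POSIX shell arithmetic, so it runs
without bioawk installed. The helper names has_flag and describe_flag are
hypothetical and exist only for this illustration; they are not part of
bioawk.

```shell
#!/bin/sh
# has_flag FLAG BIT: succeed when BIT is set in the SAM FLAG value.
# Hypothetical helper; mirrors what and($flag, BIT) tests in bioawk.
has_flag() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# describe_flag FLAG: print mapping status and strand for one FLAG value.
describe_flag() {
    if has_flag "$1" 4; then
        status=unmapped
    else
        status=mapped
    fi
    if has_flag "$1" 16; then
        strand=reverse
    else
        strand=forward
    fi
    echo "$status $strand"
}

describe_flag 4     # prints "unmapped forward"
describe_flag 16    # prints "mapped reverse"
describe_flag 99    # prints "mapped forward"
```

The strand reported for an unmapped read carries no meaning; the sketch
prints it only to keep the two bit tests independent.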


Note that when "-c" is not specified and the new built-in functions are not
used, bioawk should behave exactly the same as the original BWK awk. At least
this is the intention. Bioawk also tries to minimize the modification to the
original code base such that improvements in the future versions of BWK awk
can be readily incorporated into bioawk (yes, Brian Kernighan is still
maintaining his code).


Bioawk may have the following limitations:

1. To parse FASTA and FASTQ formats, bioawk replaces the line reading module of
awk, which also allows bioawk to seamlessly parse gzip'ed files. However,
the new line reading code does not fully mimic the original code. It may
fail in corner cases. Thus when "-c" is not specified, awk falls back to the
original line reading code and does not support gzip'ed input.

2. When "-c" is in use, several strings allocated in the new line reading
module are not freed in the end. These will be reported by valgrind as
"still reachable". To some extent, these are not memory leaks.
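
Because gzip'ed input is not read directly when "-c" is omitted, one possible
workaround is to decompress upstream and pipe plain text into the awk program.
The sketch below uses the system awk as a stand-in, on the assumption that
bioawk without "-c" behaves like the original awk as stated above. The
function name count_records and the file demo.bed.gz are made up for the
illustration.

```shell
#!/bin/sh
# count_records FILE: count the lines of a gzip'ed text file.
# Hypothetical helper; gzip -dc decompresses to stdout so that awk only
# ever sees plain text.
count_records() {
    gzip -dc "$1" | awk 'END { print NR }'
}

# Build a two-line gzip'ed BED-like file in a scratch directory.
tmp=$(mktemp -d)
printf 'chr1\t10\t20\nchr1\t30\t40\n' | gzip -c > "$tmp/demo.bed.gz"

count_records "$tmp/demo.bed.gz"   # prints 2

rm -rf "$tmp"
```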


[1] http://www.cs.princeton.edu/~bwk/btl.mirror/

RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/home:/halocaridina/CentOS_CentOS-6/i686


Hmm ... It's impossible ;-) This RPM doesn't exist on any FTP server

Provides :
bioawk
bioawk(x86-32)

Requires :
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
libc.so.6(GLIBC_2.1)
libc.so.6(GLIBC_2.0)
libz.so.1
rpmlib(PayloadIsXz) <= 5.2-1
rtld(GNU_HASH)
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
libc.so.6(GLIBC_2.7)
libc.so.6
libc.so.6(GLIBC_2.3)
libm.so.6(GLIBC_2.0)
libm.so.6


Content of RPM :
/usr/bin/bioawk
/usr/share/doc/bioawk-030715
/usr/share/doc/bioawk-030715/FIXES
/usr/share/doc/bioawk-030715/README.awk
/usr/share/doc/bioawk-030715/README.md
/usr/share/man/man1/bioawk.1.gz
