Name : bioawk
| |
Version : 030715
| Vendor : obs://build_opensuse_org/home:halocaridina
|
Release : 2.4
| Date : 2018-07-04 04:07:10
|
Group : Other
| Source RPM : bioawk-030715-2.4.src.rpm
|
Size : 0.17 MB
| |
Packager : (none)
| |
Summary : Bioawk is an extension to Brian Kernighan\'s awk for dealing with several common biological data formats
|
Description :
Bioawk is an extension to Brian Kernighan\'s awk acquired from [1], adding the support of several common biological data formats, including optionally gzip\'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with the column names. It also adds a few built-in functions including, as of now, and(), or(), xor(), reverse() and revcomp(). The following are a few examples demonstrating the new functionality:
1. Extract unmapped reads without header:
awk -c sam \'and($flag,4)\' aln.sam.gz
2. Extract mapped reads with header:
awk -c sam -H \'!and($flag,4)\'
3. Reverse complement FASTA:
awk -c fastx \'{print \">\"$name;print revcomp($seq)}\' seq.fa.gz
4. Create FASTA from SAM (uses revcomp if FLAG & 16)
samtools view aln.bam | \\ awk -c sam \'{s=$seq; if(and($flag, 16)) {s=revcomp($seq)} print \">\"$qname\"\ \"s}\'
5. Get the %GC from FASTA:
awk -c fastx \'{print \">\"$name;print gc($seq)}\' seq.fa.gz
6. Get the mean Phred quality score from FASTQ:
awk -c fastx \'{print \">\"$name;print meanqual($qual)}\' seq.fq.gz
7. Take column name from the first line (where \"age\" appears in the first line of input.txt):
awk -c header \'{print $age}\' input.txt
Note that when \"-c\" is not specified and the new built-in functions are not used, bioawk should behave exactly the same as the original BWK awk. At least this is the intention. Bioawk also tries to minimize the modification to the original code base such that improvements in the future versions of BWK awk can be readily incorporated into bioawk (yes, Brian Kernighan is still maintaining his code).
Bioawk may have the following limitations:
1. To parse FASTA and FASTQ formats, bioawk replaces the line reading module of awk, which also allows bioawk to seamlessly parse gzip\'ed files. However, the new line reading code does not fully mimic the original code. It may fail in corner cases. Thus when \"-c\" is not specified, awk falls back to the original line reading code and does not support gzip\'ed input.
2. When \"-c\" is in use, several strings allocated in the new line reading module are not freed in the end. These will be reported by valgrind as \"still reachable\". To some extend, these are not memory leaks.
[1] http://www.cs.princeton.edu/~bwk/btl.mirror/
|
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/home:/halocaridina/CentOS_CentOS-6/i686 |
Hmm ... It's impossible ;-) This RPM doesn't exist on any FTP server
Provides :
bioawk
bioawk(x86-32)
Requires :