#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB    subtopic.hlp
#SUB    parser.hlp
#SUB    regexpr.hlp
SUB     exec_bug.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE           ARB Command Interpreter (ACI)

OCCURRENCE      NDS
                [ export db ]
                [ ARB_NT/Species/search/parse_fields ]

DESCRIPTION

        The command interpreter is a simple interpreter.
        Every command takes its data from the input streams, modifies it and
        writes it to the output streams (which may be the input of the next
        command). The first input stream is normally the value of a database
        field (see NDS for more information).

        e.g. count("a") counts every 'a' in each input stream and generates
             an output stream (== the number of 'a's) for every input stream.

        Many commands have command modifiers which are appended to the command.

        Different commands can be separated by:

                ';'     every command takes all the input streams, and each
                        command generates its own output streams
                '|'     the output of the commands on the left is used as the
                        input of the commands on the right.

        e.g.
                count("A");count("AG")

                        creates two streams:
                                1. how many A's
                                2. how many A's and G's

                count("A");count("G")|per_cent

                        per_cent is a command that divides two numbers
                        (number of 'A's / number of 'G's) and returns the
                        result as percent.

        Finally all output streams are concatenated and

                - NDS:                   printed at the tips of the tree.
                - MODIFY DATABASE FIELD: stored in the destination field.

DESCRIPTION

        eg: count("A");count("G")|"a/g = "; per_cent

        input          left of '|'                    right of '|'

                 +--> count("A") --> "1" --+     +--> "a/g = " --> "a/g = " --+
        "AGG" ---+                         +-->--+                            +--> 'a/g = 50'
                 +--> count("G") --> "2" --+     +--> per_cent --> "50" ------+

SECTION COMMANDLIST

        If not otherwise mentioned every command creates one output stream
        for each input stream.

    STREAM HANDLING

        echo(x1;x2;x3...)
                            creates an output stream for each parameter 'x'
                            and writes 'x' onto it.

        "text"              == echo("text")

        dd                  copies all input streams to output streams

        cut(N1,N2,N3)       copies the Nth input stream(s)

        drop(N1,N2)         copies all but the Nth input stream(s)

        dropempty           drops all empty input streams

        dropzero            drops all non-numeric or zero input streams

        swap(N1,N2)         swaps two input streams
                            (w/o parameters: swaps last two streams)

        toback(X)           moves the Xth input stream to the end of output streams

        tofront(X)          moves the Xth input stream to the start of output streams

        merge([sep])        merges all input streams into one output stream.
                            If 'sep' is specified, it is inserted between them.
                            If no input streams are given, it returns one empty
                            output stream.

        split([sep[,mode]]) splits all input streams at separator string 'sep'
                            (default: split at linefeed).

                            Modes:
                                0       remove found separators (default)
                                1       split before separator
                                2       split after separator

        streams             returns the number of input streams

    STRING

        head(n)             the first n characters
        left(n)             the first n characters

        tail(n)             the last n characters
        right(n)            the last n characters

        len                 the length of the input

        len("chr")          the length of the input excluding the characters in 'chr'

        mid(x,y)            the string from x to y
                            y < 0 means a position relative to the end

        crop("str")         removes characters of 'str' from both ends of the input

        remove("str")       removes all characters of 'str'
                            e.g. remove(" ") removes all blanks

        keep("str")         the opposite of remove:
                            removes all characters that are not a member of 'str'

        srt("orig=dest",...)
                            replace command, invokes SRT (see LINK{parser.hlp})

        translate("old","new"[,"other"])

                            replaces every input character that occurs in the
                            first argument ("old") with the corresponding
                            character of the second argument ("new").

                            An optional third argument (one character only)
                            means: replace all other characters with the third
                            argument.
                            Example:

                                Input:                        "--AabBCcxXy--"
                                translate("abc-","xyz-")      "--AxyBCzxXy--"
                                translate("abc-","xyz-",".")  "--.xy..z...--"

                            This can be used to replace illegal characters in
                            sequence data (see predefined expressions in
                            'Modify fields of listed species').

        tab(n)              append n-len(input) spaces

        pretab(n)           prepend n-len(input) spaces

        upper               converts string to upper case
        lower               converts string to lower case
        caps                capitalizes string

        format(options)     takes a long string and breaks it into several lines

                option          (default)   description
                ==========================================================
                width=#         (50)        line width
                firsttab=#      (10)        first line left indent
                tab=#           (10)        left indent (not first line)
                "nl=chrs"       (" ")       list of characters that mark a
                                            possible position for a line break;
                                            the matched character is deleted!
                "forcenl=chrs"  ("\n")      force a newline at these characters.

        extract_words("chars",val)

                            Search for all words (separated by ',' ';' ':' ' '
                            or 'tab') that contain more characters of type
                            'chars' than 'val', sort them alphabetically and
                            write them separated by ' ' to the output

    STRING COMPARISON

        compare(a,b)        return -1 if a<b, 0 if a=b, 1 if a>b
        equals(a,b)         return 1 if a=b, 0 otherwise
        contains(a,b)       if a contains b, this returns the position of b
                            inside a (1..N) and 0 otherwise.
        partof(a,b)         if a is part of b, this returns the position of a
                            inside b (1..N) and 0 otherwise.

        The above functions are binary operators (see below).
        For each of them a case-insensitive alternative exists
        (icompare, iequals, ...).

    CALCULATOR

        plus                add arguments
        minus               subtract arguments
        mult                multiply arguments
        div                 divide arguments
        per_cent            divide arguments * 100 (rounded)
        rest                divide arguments, take the remainder

        The above functions work as binary operators (see below).
        To avoid 'division by zero' errors, the operators 'div', 'per_cent'
        and 'rest' return 0 if the second argument is zero.

        Calculation is performed with integer numbers.

    BINARY OPERATORS

        Several operators work as so-called 'binary operators'.
        These operators may be used in various ways, which are shown using
        the operator 'plus':

                ACI                     OUTPUT          STREAMS
                plus(a,b)               a+b             input:0 output:1
                a;b|plus                a+b             input:2 output:1
                a;b;c;d|plus            a+b;c+d         input:4 output:2
                a;b;c|operator(x)       a+x;b+x;c+x     input:3 output:3

        That means, if the binary operator

                - has no arguments, it expects an even number of input streams.
                  The operator is applied to the first two streams, then to
                  the next two streams and so on. The number of output streams
                  is half the number of input streams.
                - has 1 argument, it accepts one to many input streams.
                  The operator is applied to each input stream together with
                  the argument. For each input stream one output stream is
                  generated.
                - has 2 arguments, it is applied to these. The arguments are
                  interpreted as escaped ACI commands and are applied for each
                  input stream. The results of the commands are passed as
                  arguments to the binary operator. For each input stream one
                  output stream is generated.

    CONDITIONAL

        select(a,b,c,...)   each input stream is converted into a number
                            (non-numeric text converts to zero). That number is
                            used to select one of the given arguments:
                                0 selects 'a',
                                1 selects 'b',
                                2 selects 'c' and so on.

                            The selected argument is interpreted as an ACI
                            command and is applied to an empty input stream.

    DEBUGGING

        trace(onoff)        toggle tracing of ACI actions to standard output.
                            Start arb from a terminal to see the output.
                            Parameter: 0 or 1 (switch off or on)

                            All streams are copied (like 'dd').

    DATABASE AND SEQUENCE

        readdb(field_name)  the contents of the field 'field_name'

        sequence            the sequence in the current alignment.

                            Note: older ARB versions returned 'no sequence'
                            if the current alignment contained no sequence.
                            Now it returns an empty string.

                            For genes it returns only the corresponding part
                            of the sequence. If the field complement = 1 then
                            the result is the reverse-complement.

        sequence_type       the default sequence's type (rna/dna..)
        sequence_name       the default sequence name (ali_16s,..)
        Note: The commands above only work at the beginning of the ACI
              expression.

        checksum(options)   calculates a CRC checksum
                            options:
                                "exclude=chrs"  remove 'chrs' before calculation
                                "toupper"       make everything uppercase first

        gcgchecksum         a gcg compatible checksum

        format_sequence(options)

                            takes a long string (sequence) and breaks it into
                            several lines

                option          (default)   description
                =============================================================
                width=#         (50)        sequence line width
                firsttab=#      (10)        first line left indent
                tab=#           (10)        left indent (not first line)
                numleft         (NO)        numbers on the left side
                gap=#           (10)        insert a gap every # seq. characters.

        extract_sequence("chars",rel_len)

                            like extract_words, but the words are not sorted,
                            and rel_len is the minimum percentage of characters
                            of a word that must match a character in 'chars'
                            for the word to be taken. All words are separated
                            by white space.

        taxonomy([treename,] depth)

                            Returns the taxonomy of the current species or
                            group as defined by a tree.

                            If 'treename' is specified, it is used as tree,
                            otherwise the 'default tree' is used (which in most
                            cases is the tree displayed in the ARB_NT main
                            window).

                            'depth' specifies how many "levels" of the taxonomy
                            are used.

    FILTERING

        There are several functions to filter sequential data:

                - filter
                - diff
                - gc

        All these functions use the following COMMON OPTIONS to define
        what is used as filter sequence:

                - species=name      Use species 'name' as filter.
                - SAI=name          Use SAI 'name' as filter.
                - first=1           Use 1st input stream as filter for all
                                    other input streams.
                - pairwise=1        Use 1st input stream as filter for 2nd
                                    stream, 3rd stream as filter for 4th
                                    stream, and so on.
                - align=ali_name    Use alignment 'ali_name' instead of the
                                    current default alignment (only meaningful
                                    together with 'species' or 'SAI').

        Note: Only one of the parameters 'species', 'SAI', 'first' or
              'pairwise' may be used.
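
        The pairing rules behind 'first' and 'pairwise' can be illustrated
        with a short Python sketch. It is not ARB code: the function name
        pair_filter_streams and its error handling (rejecting both options
        together, rejecting an odd stream count for 'pairwise') are
        assumptions made for illustration only. The help text itself just
        states which stream filters which, and that only one of the options
        may be used.

```python
def pair_filter_streams(streams, first=False, pairwise=False):
    """Return (filter, data) pairs for the given input streams.

    Hypothetical helper, not part of ARB: it only models which stream
    acts as filter for which other stream.
    """
    if first and pairwise:
        # The help text allows only one of these options at a time.
        raise ValueError("use either 'first' or 'pairwise', not both")
    if first:
        # The 1st stream is the filter for all remaining streams.
        flt, rest = streams[0], streams[1:]
        return [(flt, data) for data in rest]
    if pairwise:
        # Assumption: an odd number of streams is treated as an error.
        if len(streams) % 2 != 0:
            raise ValueError("'pairwise' needs an even number of streams")
        # 1st filters 2nd, 3rd filters 4th, and so on.
        return [(streams[i], streams[i + 1]) for i in range(0, len(streams), 2)]
    raise ValueError("neither 'first' nor 'pairwise' was given")


if __name__ == "__main__":
    streams = ["f1", "s2", "f3", "s4"]
    print(pair_filter_streams(streams, first=True))
    print(pair_filter_streams(streams, pairwise=True))
```

        With four input streams, 'first' yields three filtered streams while
        'pairwise' yields two.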

        diff(options)       Calculates the difference between the filter (see
                            common options above) and the input stream(s) and
                            writes the result to the output stream(s).

                            Additional options:

                            - equal=x   Character written to output if filter
                                        and stream are equal at a position
                                        (defaults to '.'). To copy the stream
                                        contents for equal columns, specify
                                        'equal=' (directly followed by ',' or
                                        ')')
                            - differ=y  Character written to output if filter
                                        and stream don't match at one column
                                        position. Default is to copy the
                                        character from the stream.

        filter(options)     Filters only specified columns out of the input
                            stream(s). You need to specify either

                            - exclude=xyz   to use all columns where the filter
                                            (see common options above) has none
                                            of the characters 'xyz'
                            or
                            - include=xyz   to use only columns where the
                                            filter has one of the characters
                                            'xyz'

                            All used columns are concatenated and written to
                            the output stream(s).

        change(options)     Randomly modifies the content of columns selected
                            by the filter (see common options above).
                            The options 'include=xyz' and 'exclude=xyz' work
                            like with 'filter()', but here they select the
                            columns to modify - all other columns get copied
                            unmodified.

                            How the selected columns are modified is specified
                            by the following parameters:

                            - change=percent    percentage of changed columns
                                                (default: silently change
                                                nothing, to make it more
                                                difficult for you to ignore
                                                this helpfile)
                            - to=xy             randomly change to one of the
                                                characters 'xy'.

                            Hints:

                            - Use 'xyy' to produce 33% 'x' and 66% 'y'
                            - Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
                            - Use 'x' to replace all matching columns by 'x'

                            I think the intention for this (long undocumented)
                            command is to easily generate artificial sequences
                            with different GC-content, in order to test
                            treeing-software.

    SPECIALS

        exec(command,var1,...)

                            Execute external (unix) command

                            WARNING: Do not use this command for NDS!
                            Any slow command will block all editing, so you
                            may never be able to remove the command from the
                            NDS again.
                            Even arb_panic will not easily help you.

        command(escapedCommand)

                            applies 'escapedCommand' to all input streams using

                            - ACI,
                            - SRT (if starts with ':') (see LINK{parser.hlp})
                            - or as REG (if starts with '/') (see LINK{regexpr.hlp}).

                            In escapedCommand you have to escape '\' and '"'
                            by preceding them with a '\'.

                            If you nest calls you have to use multiple escapes
                            (e.g. inside an export filter - which is in fact an
                            SRT expression - you'll have to use double escapes).

        eval(escapedCommand)

                            the 'escapedCommand' is evaluated (using an empty
                            string as input) and the result is interpreted as
                            a command, which is then applied to all input
                            streams.

                            Example:

                            Say you have two numeric positions stored in the
                            database fields 'pos1' and 'pos2' for each species.
                            Then the following command extracts the sequence
                            data from pos1 to pos2:

                                sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")

                            How the example works:

                            The argument to eval is the escaped version of the
                            command '"mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"'.

                            If pos1 contains '10' and pos2 contains '20' that
                            command evaluates to 'mid(10;20)'.

                            The resulting ACI for the example species is
                            'sequence|eval("mid(10;20)")' which is equivalent
                            to 'sequence|mid(10;20)'.

        define(name,escapedCommand)

                            defines an ACI macro 'name'. 'escapedCommand'
                            contains an escaped ACI command sequence.
                            This command sequence can be executed with do(name).

        do(name)            applies a previously defined ACI macro to all
                            input streams (see 'define').

                            'define' followed by 'do' works similarly to
                            'command'.

                            See embl.eft for an example using 'define' and 'do'.

        origin_organism(escapedCommand)
        origin_gene(escapedCommand)

                            like command() but readdb() etc. read all data
                            from the origin organism/gene of a gene-species
                            (not from the gene-species itself).

                            This function applies only to gene-species!
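
        The stream-count rules listed under BINARY OPERATORS in this command
        list can be summarized in a short Python sketch. It is an
        illustration, not ARB's implementation: apply_binary, plus and div
        are hypothetical names, streams are modelled as Python integers
        rather than text, and the two-argument form (which evaluates escaped
        ACI commands) is deliberately left out.

```python
def apply_binary(op, streams, *args):
    """Model how a binary operator consumes input streams.

    Hypothetical helper: 'streams' is a list of integers, 'op' a
    two-argument function, 'args' the operator's own arguments.
    """
    if len(args) == 0:
        # No argument: streams are consumed in pairs, so the number of
        # outputs is half the number of inputs.
        if len(streams) % 2 != 0:
            raise ValueError("expected an even number of input streams")
        return [op(streams[i], streams[i + 1]) for i in range(0, len(streams), 2)]
    if len(args) == 1:
        # One argument: it is combined with every input stream.
        return [op(s, args[0]) for s in streams]
    raise NotImplementedError("the two-argument form is not modelled here")


def plus(a, b):
    return a + b


def div(a, b):
    # As described for the calculator commands, a zero divisor yields 0.
    # Assumption: plain integer division; only non-negative values are shown.
    return 0 if b == 0 else a // b


if __name__ == "__main__":
    print(apply_binary(plus, [1, 2, 3, 4]))   # models a;b;c;d|plus
    print(apply_binary(plus, [1, 2, 3], 10))  # models a;b;c|plus(x)
    print(apply_binary(div, [7, 0]))          # zero divisor
```

        The first call corresponds to the table row 'a;b;c;d|plus' (four
        inputs, two outputs), the second to the one-argument row (three
        inputs, three outputs).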

SECTION Future features

        statistic           creates a character statistic of the sequence
                            (not implemented yet)

EXAMPLES

        sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd

                fetches the default sequence, formats it, and prepends
                'SEQUENCE_'.

        sequence|remove(".-")|format_sequence

                fetches the default sequence, removes all '.' and '-'
                characters and formats it

        sequence|remove(".-")|len

                the number of symbols other than '.' and '-'
                (i.e. the sequence length)

        "[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"

                shows for each species how its taxonomy differs between
                "tree_other" and the current tree

        equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)

                returns 'tmp and acc differ' if the contents of the database
                fields 'tmp' and 'acc' differ. Returns an empty result
                otherwise.

        readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))

                returns the content of the 'full_name' database entry if it
                contains the substring 'bacillus'. Otherwise returns '..'

BUGS

        The output of taxonomy() is not always instantly refreshed.
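
SECTION Sketch of the last example

        The last entry under EXAMPLES chains icontains, compare and select.
        The following Python sketch traces that chain step by step. It is
        only a model under stated assumptions: the three functions are
        hypothetical stand-ins for the ACI commands of the same name,
        compare is applied to integers here, and the arguments of select are
        passed as already-evaluated values instead of ACI commands.

```python
def icontains(a, b):
    """1-based position of b inside a, ignoring case; 0 if b is absent."""
    # str.find returns -1 when nothing is found, so adding 1 gives 0.
    return a.lower().find(b.lower()) + 1


def compare(a, b):
    """Return -1, 0 or 1 for a<b, a=b, a>b."""
    return (a > b) - (a < b)


def select(index, *choices):
    """0 selects the first choice, 1 the second, and so on."""
    return choices[index]


def show_name(full_name):
    """Model of: readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))"""
    pos = icontains(full_name, "bacillus")   # position, or 0 if absent
    idx = compare(pos, 0)                    # 1 if found, 0 if not
    return select(idx, "..", full_name)


if __name__ == "__main__":
    print(show_name("Bacillus subtilis"))
    print(show_name("Escherichia coli"))
```

        compare(0) is what turns an arbitrary position (1..N) into the index
        1, so that select only needs two arguments.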