#Please insert up references in the next lines (line starts with keyword UP)
UP      arb.hlp
UP      glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
#SUB    subtopic.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real helpfile ********
TITLE           Searching

OCCURRENCE      ARB_NT/Species/Search and Query
                ARB_NT/Genome/Search and Query
                ARB_NT/Tree/Search groups..

DESCRIPTION     This describes the search feature in ARB as used in the following
                search and query modules:

                    * LINK{sp_search.hlp}
                    * LINK{group_search.hlp}
                    * LINK{gene_search.hlp}

                When we talk about 'items' below, we mean e.g. 'species', 'genes',
                'taxonomic groups' etc., depending on which search tool you are
                currently using.

SECTION SEARCH FIELD

                Each search expression applies either

                    - to a specific item field (e.g. 'full_name') or
                    - to some criterion calculated on the fly (e.g. the number of
                      marked species inside a taxonomic group) or
                    - to any or all item fields, if you select one of the entries
                      in "[...]".

                The following special search fields may be available:

                    * "[any field]" reports a match if any direct field matches
                      the expression.
                    * "[all fields]" reports a match if all direct fields match
                      the expression.
                    * "[any recursive]" reports a match if any direct or
                      hierarchical field matches the expression.
                    * "[all recursive]" reports a match if all direct and
                      hierarchical fields match the expression.

                Notes:

                    * search is much slower using one of the 'recursive' fields,
                      mostly because sequence data is searched as well.
                    * "[all fields]" is often used together with "not equal"
                      (see below), making it equivalent to
                      "no field matches expression".

SECTION SEARCH OPERATORS

                There are two kinds of search operators directly available for queries:

                1. the "equal" sign between the field and the match expression
                   means that the selected field (or any field) should match
                   the expression.
                   Clicking on the sign inverts it into a "not equal" sign,
                   which means the selected field shall not match the expression.

                2. the search operators at the beginning of the 2nd and 3rd line
                   allow you to connect the 3 search expressions available for
                   each query. Possible values are 'and', 'or' or 'ign'.

                     - 'ign' stands for "ignore" (the rest of the line will be ignored)
                     - selecting 'and' means the preceding expression and the
                       expression behind it both have to match
                     - selecting 'or' means the preceding expression or the
                       expression behind it has to match

                   There is no operator precedence, i.e.

                     - "1st and 2nd or 3rd" is interpreted as "(1st and 2nd) or 3rd" AND
                     - "1st or 2nd and 3rd" is interpreted as "(1st or 2nd) and 3rd"

                More search operators are available to connect multiple
                (consecutive) queries:

                     - using 'Add species' provides a global OR operator (uniting
                       the results of the preceding and the next query),
                     - using 'Keep species' provides a global AND operator
                       (intersecting the results of the preceding and the next
                       query) and
                     - using "that don't match the q." provides a global NOT
                       operator for the next query

                Results of queries can be transformed into a set of 'marked species'
                using "Mark listed unmark rest" and the marked species can be stored
                as LINK{species_configs.hlp}. Multiple stored configurations can be
                logically combined into new sets of marked species.

                To create a query result from all marked species again, simply use
                "Search species ... that are marked".

SECTION MATCH EXPRESSION

                - Each expression tries to match the complete field content (or
                  the result of the underlying calculation), i.e. searching for
                  'test' will match only fields whose content is exactly 'test'
                  (not 'my test' or 'testing').

                - If you search for '' (empty expression), all fields without
                  data, i.e. all non-existing fields, will be found.

                - if you want to match all fields that contain some substring
                  then use wildcards:

                    - '*' will match any number of characters (including no characters).
                    - '?'
                      will match exactly one character

                  If the whole search expression is '*', then it is handled like
                  '?*' (which means 'at least one character'). That means
                  searching for '*' will match any non-empty field.

                  Examples:

                    '*pseu*'        matches all fields with the substring 'pseu'
                    'pyrococcus*'   matches all fields starting with 'pyrococcus'
                    '*bact*ther*'   matches all fields with the substring 'bact'
                                    followed by 'ther' (there may be many characters
                                    in-between or none, i.e. it does match 'bactther'
                                    as well as 'Corynebacterium diphtheriae')

                - if the first character is '<' or '>' and the rest is a number,
                  then a numerical comparison is performed:

                    - '<7' matches all fields containing a number smaller than 7
                    - '>10' matches all fields containing a number greater than 10

                  Be careful: Negating '<7' does NOT only match numbers greater
                  than or equal to seven. It also matches all non-numeric
                  contents. Use something like '>6.999' instead.

                - if the first character is '/' then the rest of the expression
                  is used as a regular expression for the query (see LINK{reg.hlp}).

                - if the first character is '|' then the rest of the expression
                  is evaluated as an ACI expression and the query hits if the
                  result is not "0". See LINK{aci.hlp}.

                - if the query string is completely empty, it hits if the
                  selected field does not exist (or if a calculation produces
                  no/empty result).

SECTION SORTING RESULTS

                Search results are displayed unsorted by default. You can sort them
                by selecting a different order with the sort radio button.

                The provided sort criteria depend on the kind of query.
                The following list shows the sort criteria available in
                LINK{sp_search.hlp}:

                    unsorted        display items as they are stored in the database
                    by value        sort by content of first query field
                    by number       same as "by value", but sort numerically
                                    (for string-type fields this sorts multiple
                                    columns of numbers)
                    by id           sort by unique item id (e.g. 'name' for species)
                    by parent       sort by globally unique id of parent item (e.g.
                                    'name' of organism for genes)
                    by marked       sort marked before unmarked items
                    by hit          sort by (and display) hit description (the hit
                                    description tells you why an item was hit by
                                    the query)
                    reverse         reverses the previously selected sort order

                ARB remembers and uses all the sort criteria you apply.

                Example: Selecting 'by id' will sort the items by their id
                (e.g. 'name'). If you select 'by value' afterwards, ARB will sort
                items by the content of the first query field. If the contents of
                some items are equal, it will still sort them by name.

NOTES           Wildcarded or exact search is always case-insensitive.

                Regular expression search is always case-sensitive.

EXAMPLES        see LINK{sp_search.hlp}

WARNINGS        Using ACI is a bit tricky here, because you cannot see what happens.
                Using 'trace(1)' somewhere in the ACI expression starts printing
                an ACI trace to the console. To view the console refer to
                LINK{console.hlp}.

BUGS            No bugs known
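
SECTION SKETCH OF THE MATCHING RULES

                The following Python sketch only illustrates the rules described
                in this help text; it is not ARB's implementation. The function
                names 'matches' and 'combine' are hypothetical, and it is an
                assumption of this sketch that a non-existing field is passed as
                None and treated like an empty one. Regular expression ('/') and
                ACI ('|') queries are not covered.

```python
import re


def _to_number(text):
    """Return text as a float, or None if it is not a number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def matches(expression, content):
    """Illustrative match of one expression against one field content.

    content is None for a non-existing field (assumption of this sketch).
    """
    # Empty expression: hits only fields without data.
    if expression == "":
        return content is None or content == ""
    if content is None:
        return False

    # '<' or '>' followed by a number: numerical comparison.
    if expression[0] in "<>":
        limit = _to_number(expression[1:])
        if limit is not None:
            value = _to_number(content)
            if value is None:
                return False  # non-numeric content never satisfies '<' or '>'
            return value < limit if expression[0] == "<" else value > limit

    # A lone '*' is handled like '?*' (at least one character).
    if expression == "*":
        expression = "?*"

    # Translate wildcards; everything else is matched literally.
    pattern = "".join(
        ".*" if char == "*" else "." if char == "?" else re.escape(char)
        for char in expression
    )
    # The complete content has to match, ignoring case.
    return re.fullmatch(pattern, content, re.IGNORECASE | re.DOTALL) is not None


def combine(first, rest):
    """Combine up to three results strictly left to right (no precedence).

    rest is a list of (operator, result) pairs, operator being
    'and', 'or' or 'ign'.
    """
    result = first
    for operator, value in rest:
        if operator == "and":
            result = result and value
        elif operator == "or":
            result = result or value
        elif operator != "ign":
            raise ValueError("unknown operator: " + operator)
    return result
```

                In this sketch 'not matches("<7", "abc")' is true, which mirrors
                the warning above about negating '<7'. Likewise,
                'combine(True, [("or", False), ("and", False)])' is evaluated as
                "(1st or 2nd) and 3rd" and therefore yields False.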