#Please insert up references in the next lines (line starts with keyword UP) UP arb.hlp UP glossary.hlp #Please insert subtopic references (line starts with keyword SUB) #SUB subtopic.hlp # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} #************* Title of helpfile !! and start of real helpfile ******** TITLE Chimera check OCCURRENCE ARB_NT/Sequence/Chimera check DESCRIPTION Takes sequences, a tree and a column statistic as input, and generates a short sequence quality output string, which will be stored into the database under a user defined key. First the sequences are split into different slices: - 2 pieces (front and back half) - 5 equally sized pieces - user defined pieces The programs sums up the weighted mutations for each sequence slice using a maximum likelihood technique. For each slice a students t-test (see LINK{http://en.wikipedia.org/wiki/T-test}) is performed and its result is written into the XXX portions of the entries mentioned below. The t-test tests whether the likelihood of a specific sequence slice (of one species) follows a t-distribution of the likelihoods for that sequence slice in all examined species. The meaning of each X contains the result of the t-test (the "t-value") as follows: * if the t-test succeeds the value of X is '1' up to '8' (where '5' is shown as '-'). * if the t-test fails '0' or '9' is written to the X's * if there is not enough data to perform the t-test, '.' is written to the X's Rule of thumb: Values near '0' or '9' indicate regions with an abnormal, values near '5' ('-') regions with a normal (i.e. expectable) number of weighted mutations. The sequence quality string written into a user-definable species field has the following format: MED SUM aXX bXXXXX cXXXXX...XXXX where: * MED is the median of all t-values (0.0 = normal; <5.0 = succeeds t-test (mean); >5.0 = abnormal) * SUM is the sum of all t-values * aXX shows the quality for 2 pieces * bXXXXX shows the quality for 5 pieces * cXXXXX...XXXX shows the quality for user defined slices Optionally a 'quality' entry may be written to the alignment, allowing to display it in EDIT4 below the sequence. That quality entry simply is a "blown up" version of the "cXXXXX...XXXX" part of the sequence quality field. If 'Keep existing reports' is unchecked, the selected 'species field' and the quality entry in the alignment are removed from ALL species when running a (new) chimera check. By clicking the 'Remove them now' button they will be removed w/o running a new check. Make sure the selected 'species field' is correct! NOTES Only sequences which are in the tree are used. Slices in high variance regions more easily pass the t-test. Slices from sequences with higher overall variance more easily pass the t-test. WARNINGS Needs really a lot of memory! BUGS None