#Please insert up references in the next lines (line starts with keyword UP) UP arb.hlp UP arb_ntree.hlp UP e4.hlp UP arb_edit4.hlp UP glossary.hlp #Please insert subtopic references (line starts with keyword SUB) # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} #************* Title of helpfile !! and start of real strunk ******** TITLE Protein Alignments OCCURRENCE ARB_EDIT4 ARB_NTREE DESCRIPTION Protein gene sequences and (predicted) protein primary structures (= amino acid sequences) as well as protein secondary structures can be stored in the ARB database and protein alignments can be created. Using import filters amino acid sequences and/or protein secondary structures can be imported from DSSP files. Refer to LINK{arb_import.hlp} and especially LINK{dssp_ift.hlp} for information on how this is done, please. Description of the DSSP code and format as well as an example file can be found there, too. Once a protein secondary structure is present as species in the database it can be converted to an SAI (see LINK{sp_sp_2_ext.hlp}) to use it as reference for comparing other protein secondary structures or amino acid sequences. SAIs can be created from the protein secondary structure information in a special field named 'sec_struct', too (see LINK{pfold_sai.hlp}). This is useful, if one has a protein secondary structure aligned along with the amino acid sequence. An approach for visualizing matches between protein structures has been incorporated in ARB. The match computation for sequences and secondary structures is based on the Chou-Fasman algorithm (see below) or adaptions to it and depends on the used match method. The match methods are described in detail in LINK{pfold_props.hlp} along with all other related settings that can be configured via the 'Properties' menu. SECTION Overview of the Chou-Fasman Algorithm The Chou-Fasman algorithm is a statistical method for predicting a protein secondary structure from its amino acid sequence. It is based on the fact that certain amino acids tend to form or break alpha-helices ('H'), beta-sheets ('E') and beta-turns ('T'). The experimentally obtained Chou-Fasman parameters (former and breaker values) are used to predict the possible occurrence of the individual structure types which can then be merged to create a secondary structure summary. Further information on how this approach is used for protein structure match computation can be found in LINK{pfold_props.hlp} in section 'Description of Match Methods'. SECTION REFERENCES [1] Chou-Fasman Algorithm Details on the Chou-Fasman algorithm can be found in the original paper: "Chou, P. and Fasman, G. (1978). Prediction of the secondary structure of proteins from their amino acid sequence. Advanced Enzymology, 47, 45-148.". [2] DSSP The DSSP program was developed to standardize secondary structure assignment. It assigns protein secondary structures to amino acid sequences from the amino acids' crystallographic atom coordinates as specified by protein entries in the Protein Data Bank (PDB). The program can be found on the web at "LINK{http://swift.cmbi.ru.nl/gv/dssp/}". Details on the algorithm can be found in "Kabsch, W. and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22 (12), 2577-2637. PMID: 6667333; UI: 84128824." NOTES The used method for protein secondary structure prediction, i.e. the Chou-Faman algorithm, is fast which was the main reason for choosing it. Performance is important for a large number of sequences loaded in the editor. However, it is not very accurate and should only be used as rough estimation. Thus, the match computation can only give an approximate overview if a given amino acid sequence matches a certain secondary structure. EXAMPLES None WARNINGS Protein secondary structure in the field 'sec_struct' is not aligned automatically with the sequence (yet). It has to be aligned manually! BUGS The editor might be unstable and may crash if sequences are not formatted.