#Please insert up references in the next lines (line starts with keyword UP)
UP	arb.hlp
UP	arb_ntree.hlp
UP	e4.hlp
UP	arb_edit4.hlp
UP	glossary.hlp

#Please insert subtopic references  (line starts with keyword SUB)

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#************* Title of helpfile !! and start of real strunk ********
TITLE		Protein Alignments

OCCURRENCE	ARB_EDIT4
		ARB_NTREE

DESCRIPTION

	Protein-coding gene sequences, (predicted) protein primary structures
	(= amino acid sequences) and protein secondary structures can be stored in
	the ARB database, and protein alignments can be created. Amino acid
	sequences and/or protein secondary structures can be imported from DSSP
	files using import filters. Refer to LINK{arb_import.hlp} and especially
	LINK{dssp_ift.hlp} for information on how this is done. A description of
	the DSSP code and format as well as an example file can be found there, too.
	
	Once a protein secondary structure is present as a species in the database,
	it can be converted to an SAI (see LINK{sp_sp_2_ext.hlp}) and used as a
	reference for comparing other protein secondary structures or amino acid
	sequences. SAIs can also be created from the protein secondary structure
	information stored in the special field 'sec_struct' (see
	LINK{pfold_sai.hlp}). This is useful if a protein secondary structure is
	aligned along with the amino acid sequence.
	
	ARB can visualize matches between protein structures. The match
	computation for sequences and secondary structures is based on the
	Chou-Fasman algorithm (see below) or on adaptations of it, depending on
	the selected match method. The match methods are described in detail in
	LINK{pfold_props.hlp} along with all other related settings that can be
	configured via the 'Properties' menu.

SECTION Overview of the Chou-Fasman Algorithm

	The Chou-Fasman algorithm is a statistical method for predicting a protein
	secondary structure from its amino acid sequence. It is based on the
	observation that certain amino acids tend to form or break alpha-helices
	('H'), beta-sheets ('E') and beta-turns ('T'). The experimentally obtained
	Chou-Fasman parameters (former and breaker values) are used to predict the
	possible occurrence of each structure type. The individual predictions can
	then be merged to create a secondary structure summary. Further information
	on how this approach is used for protein structure match computation can be
	found in LINK{pfold_props.hlp} in section 'Description of Match Methods'.
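
	The idea of propensity-based prediction can be illustrated with a minimal
	Python sketch. It is not the ARB implementation and not the full
	Chou-Fasman procedure (nucleation, extension and turn prediction are
	omitted). It only averages helix and sheet propensities over a sliding
	window and merges them into a summary string. The function names, the
	window size, the threshold of 1.0 and the '-' symbol for "no prediction"
	are assumptions made for this example. The propensity values are
	approximate, commonly quoted figures for a few residues and should be
	checked against [1] before any real use.

```python
# Approximate (helix, sheet) propensities for a few residues only.
# Illustrative values; verify against the original paper [1].
PROPENSITY = {
    "A": (1.42, 0.83),
    "E": (1.51, 0.37),
    "L": (1.21, 1.30),
    "V": (1.06, 1.70),
    "I": (1.08, 1.60),
    "G": (0.57, 0.75),
    "P": (0.57, 0.55),
}
# Residues missing from the table are treated as neutral (assumption).
NEUTRAL = (1.0, 1.0)


def window_average(seq, pos, half_width, index):
    """Mean propensity (index 0 = helix, 1 = sheet) in a window around pos."""
    start = max(0, pos - half_width)
    end = min(len(seq), pos + half_width + 1)
    values = [PROPENSITY.get(aa, NEUTRAL)[index] for aa in seq[start:end]]
    return sum(values) / len(values)


def predict_summary(seq, half_width=2, threshold=1.0):
    """Return one symbol per residue: 'H', 'E' or '-' (no prediction)."""
    summary = []
    for pos in range(len(seq)):
        helix = window_average(seq, pos, half_width, 0)
        sheet = window_average(seq, pos, half_width, 1)
        if helix > threshold and helix >= sheet:
            summary.append("H")
        elif sheet > threshold and sheet > helix:
            summary.append("E")
        else:
            summary.append("-")
    return "".join(summary)


if __name__ == "__main__":
    for sequence in ("AEAEAEA", "VIVIVIV", "GPGPGPG"):
        print(sequence, predict_summary(sequence))
```

	In this sketch a helix-forming stretch such as "AEAEAEA" is summarized
	as helix, a sheet-forming stretch such as "VIVIVIV" as sheet, and a
	stretch of breaker residues such as "GPGPGPG" receives no prediction.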

SECTION REFERENCES

	[1] Chou-Fasman Algorithm
	
		Details on the Chou-Fasman algorithm can be found in the original
		paper: "Chou, P. and Fasman, G. (1978). Prediction of the secondary
		structure of proteins from their amino acid sequence. Advances in
		Enzymology and Related Areas of Molecular Biology, 47, 45-148."

	[2] DSSP
	
		The DSSP program was developed to standardize secondary structure
		assignment. It assigns protein secondary structures to amino acid
		sequences from the amino acids' crystallographic atom coordinates
		as specified by protein entries in the Protein Data Bank (PDB). The
		program can be found on the web at
		"LINK{http://swift.cmbi.ru.nl/gv/dssp/}". Details on the algorithm
		can be found in "Kabsch, W. and Sander, C. (1983). Dictionary of
		protein secondary structure: pattern recognition of hydrogen-bonded
		and geometrical features. Biopolymers, 22 (12), 2577-2637.
		PMID: 6667333; UI: 84128824."

NOTES

	The method used for protein secondary structure prediction, i.e. the
	Chou-Fasman algorithm, is fast, which was the main reason for choosing it.
	Performance is important when a large number of sequences is loaded in the
	editor. However, the method is not very accurate and should only be used
	as a rough estimate. Thus, the match computation can only give an
	approximate indication of whether a given amino acid sequence matches a
	certain secondary structure.

EXAMPLES 	None

WARNINGS        Protein secondary structure in the field 'sec_struct' is not aligned
                automatically with the sequence (yet). It has to be aligned manually!

BUGS            The editor might be unstable and may crash if sequences are not formatted.