#Please insert up references in the next lines (line starts with keyword UP) UP arb.hlp UP arb_ntree.hlp UP arb_import.hlp UP pfold.hlp UP glossary.hlp #Please insert subtopic references (line starts with keyword SUB) # Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain} #************* Title of helpfile !! and start of real helpfile ******** TITLE NOTES: dssp OCCURRENCE ARB_IMPORT DESCRIPTION See NOTES in LINK{arb_import.hlp} for HOWTO reactivate disabled import filters. The three filters 'dssp_all.ift', 'dssp_2nd_struct.ift' and 'dssp_sequence.ift' import protein secondary structure information and/or amino acid sequences from DSSP files. In addition, some of the associated information is extracted, too. The following fields are created (see also example below): - name: [PDB ID]_[Chain char] (extracted from 'HEADER' and the optional chain character in 'RESIDUE') - full_name: [PDB ID] (extracted from 'HEADER') Chain [Chain char] (extracted from the optional chain character in 'RESIDUE'); [Description] (extracted from 'HEADER' and 'COMPND') - tax: [Organism description] (extracted from 'SOURCE') - author: [Author(s)] (extracted from 'AUTHOR') - date: [Date] (extracted from 'HEADER') - remark: [Remark] (extracted from headline and 'REFERENCE') - ali_[alignment name]/data: [Amino acid sequence or secondary structure] (extracted from 'AA' or 'STRUCTURE') - sec_struct: [Secondary structure] (extracted from 'STRUCTURE') SECTION The DSSP code - H = alpha helix - B = residue in isolated beta-bridge - E = extended strand, participates in beta ladder - G = 3-helix (3/10 helix) - I = 5-helix (pi helix) - T = hydrogen bonded turn - S = bend NOTES - If a protein consists of several chains these are extracted individually and stored as different species. - The filter 'dssp_2nd_struct.ift' fills 'ali_[alignment name]/data' with the protein secondary structure and 'dssp_sequence.ift' as well as 'dssp_all.ift' fill it with the amino acid sequence. - The field 'sec_struct' is only used by the filter 'dssp_all.ift'. - Gaps-characters ('-') are inserted where no secondary structure is present. - The DSSP files are first piped through the script 'format_dssp.pl' (in "$ARBHOME/ARB/PERL_SCRIPTS/ARBTOOLS/IFTHELP") to format the files for use with the filters 'dssp_all2.ift2', 'dssp_2nd_struct2.ift2' and 'dssp_sequence2.ift2'. - Reference to DSSP can be found in LINK{pfold.hlp} in section 'REFERENCES' [2]. EXAMPLES The DSSP format looks like this: ==== Secondary Structure Definition by the program DSSP, updated CMBI version by ElmK / April 1,2000 ==== DATE=27-JUN-2003 . REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 . HEADER RNA BINDING PROTEIN 22-NOV-99 1DG1 . COMPND 2 MOLECULE: ELONGATION FACTOR TU; . SOURCE 2 ORGANISM_SCIENTIFIC: ESCHERICHIA COLI; . AUTHOR K.ABEL,M.YODER,R.HILGENFELD,F.JURNAK . ... ... # RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA 1 9 G K 0 0 143 0, 0.0 65,-0.2 0, 0.0 64,-0.1 0.000 360.0 360.0 360.0 143.2 13.7 48.3 -15.2 2 10 G P - 0 0 38 0, 0.0 65,-2.6 0, 0.0 2,-0.6 -0.404 360.0-137.6 -64.4 148.4 12.2 51.7 -14.1 3 11 G H E -a 67 0A 88 63,-0.2 2,-0.3 191,-0.1 65,-0.2 -0.949 24.0-180.0-114.6 117.8 10.1 51.4 -10.9 4 12 G V E -a 68 0A 0 63,-2.0 65,-2.3 -2,-0.6 2,-0.5 -0.855 18.2-141.7-116.0 149.5 6.8 53.4 -10.8 5 13 G N E +a 69 0A 36 -2,-0.3 86,-2.5 63,-0.2 87,-1.2 -0.949 31.8 154.0-113.1 127.7 4.2 53.5 -8.0 6 14 G V E -ab 70 92A 0 63,-2.5 65,-2.0 -2,-0.5 2,-0.3 -0.820 16.5-171.5-139.8-179.7 0.5 53.6 -8.8 7 15 G G E -ab 71 93A 0 85,-0.5 87,-2.2 63,-0.3 2,-0.3 -0.969 28.1-103.8-167.9 164.5 -2.7 52.6 -7.2 8 16 G T E + b 0 94A 0 63,-1.9 65,-0.4 -2,-0.3 2,-0.3 -0.735 37.7 175.7 -97.6 147.4 -6.4 52.2 -7.9 9 17 G I E + b 0 95A 3 85,-1.9 87,-2.6 -2,-0.3 2,-0.2 -0.962 21.1 98.6-147.2 156.0 -8.8 54.9 -6.7 10 18 G G - 0 0 0 -2,-0.3 87,-0.1 85,-0.2 97,-0.1 -0.829 69.1 -41.6 148.3 177.9 -12.6 55.4 -7.1 11 19 G H S > S- 0 0 35 85,-0.4 3,-1.4 -2,-0.2 5,-0.3 -0.330 72.6 -81.6 -74.1 153.0 -15.9 54.9 -5.4 12 20 G V T 3 S+ 0 0 58 1,-0.2 -1,-0.1 2,-0.1 94,-0.1 -0.094 111.7 15.7 -52.8 150.0 -16.8 51.7 -3.5 13 21 G D T 3 S+ 0 0 145 1,-0.1 -1,-0.2 -3,-0.1 -2,-0.1 0.575 91.3 115.4 59.3 12.4 -17.9 48.6 -5.4 14 22 G H S < S- 0 0 4 -3,-1.4 -2,-0.1 82,-0.1 85,-0.1 0.789 92.2 -97.1 -80.7 -25.6 -16.7 50.1 -8.6 15 23 G G S > S+ 0 0 11 -4,-0.2 4,-2.5 81,-0.1 5,-0.2 0.577 75.0 138.4 123.2 16.6 -14.0 47.5 -9.1 16 24 G K H > S+ 0 0 12 -5,-0.3 4,-2.7 1,-0.2 5,-0.1 0.931 82.3 40.1 -55.3 -48.3 -10.7 48.7 -7.8 17 25 G T H > S+ 0 0 17 2,-0.2 4,-2.3 1,-0.2 -1,-0.2 0.887 114.0 51.1 -70.8 -40.4 -9.8 45.4 -6.2 18 26 G T H > S+ 0 0 30 2,-0.2 4,-2.4 1,-0.2 -1,-0.2 0.899 113.8 47.6 -64.1 -39.4 -11.1 43.1 -9.0 19 27 G L H X S+ 0 0 0 -4,-2.5 4,-2.6 2,-0.2 -2,-0.2 0.955 107.8 53.8 -66.2 -49.6 -9.1 ... ... The extracted ARB database entry looks like this (for alignment with the name 'ali_prot' and imported with 'dssp_all.ift'): name S6: 1DG1_G full_name S0: 1DG1 Chain G; RNA BINDING PROTEIN; MOLECULE: ELONGATION FACTOR TU tax S0: ORGANISM_SCIENTIFIC: ESCHERICHIA COLI author S0: K.ABEL,M.YODER,R.HILGENFELD,F.JURNAK date S0: 22-NOV-99 ali_prot %0: ali_prot/data S0: KPHVNVGTIGHVDHGKTTL... sec_struct S0: --EEEEEEE-STTSSHHHH... remark S0: === Secondary Structure Definition by the program DSSP, updated CMBI version by ElmK / April 1,2000 ==== DATE=22-FEB-2008 DSSP program by: W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 ... ... WARNINGS None BUGS No bugs known