------------------------------------------------------

EMBL NUCLEOTIDE SEQUENCE DATABASE SUBMISSION FORM

HOW TO USE THIS FORM - PLEASE READ FIRST

1) WEBIN: THE WORLD WIDE WEB SUBMISSION TOOL
============================================
If you have access to the World Wide Web then DO NOT use this form. Use the
WebIn form on the World Wide Web at 

           ##############################################
           # http://www.ebi.ac.uk/submission/webin.html #
           ##############################################

If you do not have access to the World Wide Web then please use this form
and email it to DATASUBS@EBI.AC.UK. 

It is only necessary to submit to one database. Public data are exchanged
between EMBL, GenBank and DDBJ on a daily basis. 

2) MULTIPLE SUBMISSIONS
=======================
If you have more than one but fewer than 25 sequences to submit, copy this
form and send all the submissions together in one email with a note saying
how many sequences you are sending.

3) BULK SUBMISSIONS
===================
If you have more than 25 related sequences to submit DO NOT send them all
using this form. Instead email DATASUBS@EBI.AC.UK and include the following
information:
a) how many sequences you are going to submit
b) a short explanation of how the sequences are related
c) what type of differences there are between the entries (e.g. isolate)
d) one completed email submission form as an example
You will be contacted by a curator who will create a template for you which
you should then use to submit all of the sequences.

4) UPDATES
==========
DO NOT use this form for submitting updates or corrections.
If you are sending an update please complete the update form available on
the web at: http://www.ebi.ac.uk/ebi_docs/update.html or get a copy of the
update form via anonymous FTP: 
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/update.doc
If you need help with updates contact UPDATE@EBI.AC.UK

5) PROTEIN SEQUENCES
====================
DO NOT use this form to submit protein sequences.
For submissions to the SWISS-PROT protein sequence databank access the
World Wide Web at http://www.ebi.ac.uk/ebi_docs/swissprot_db/swisshome.html
or email DATALIB@EBI.AC.UK

6) ACCESSION NUMBERS AND CONFIDENTIALITY
========================================
Your data can be made public immediately, or they can be kept confidential
until a release date which you provide. Confidential data are ALWAYS made
available to the public after publication.

If your data contain all the information we require we will assign unique
accession numbers within two working days. We will email you to tell you
the new accession numbers.

You should submit your sequence data BEFORE you have galley proofs. We
suggest that the following text be used to cite the accession number(s) in
publication(s): "The nucleotide sequence data reported in this paper will
appear in the DDBJ/EMBL/GenBank Nucleotide Sequence Database under the
accession number(s) ________"

7) FORM FILLING INSTRUCTIONS
============================

<============== DO NOT EXCEED THIS LINE WIDTH IN YOUR REPLY ==============>

To display this form properly choose a fixed width font (e.g. Courier) in
your editor. If you are saving files in a word processing program then
please save the file as TEXT ONLY WITH LINE BREAKS. (To do this in
Microsoft Word you will need to choose File, Save as, Save file type as,
and select Text only with line breaks.) Please do not send files that are
saved in Word or WordPerfect format. Processing of the submission may be
delayed if your email is text wrapped, encoded or binhexed.
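
A minimal sketch of a pre-send check for the line width rule above, written
in Python. It assumes the ruler line marks the maximum width; the names
RULER, MAX_WIDTH and find_long_lines are illustrative and are not part of
any EBI tool.

```python
# The ruler line from this form; its length is taken as the width limit
# (an assumption based on the ruler's wording).
RULER = ("<============== DO NOT EXCEED THIS LINE WIDTH "
         "IN YOUR REPLY ==============>")
MAX_WIDTH = len(RULER)


def find_long_lines(text, max_width=MAX_WIDTH):
    """Return (line number, length) for every line wider than max_width."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > max_width
    ]
```

Running find_long_lines over the text of a completed form before mailing
lists the lines that would need to be shortened.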

  ########################################################################
  # Fill in the form as follows:                                         #
  # a) if there is a colon : then enter text (e.g. Last name    : Smith) #
  # b) if there is an empty box [ ] and if the answer is yes then fill   #
  #    the box with an X (e.g. Genomic DNA     [X])                      #
  # c) if the option is not relevant then do not enter any text and/or   #
  #    do not write an X in the box.                                     #
  # d) DO NOT delete lines from this form.                               #
  ########################################################################
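
Rules a) to c) can be sketched in Python as follows. The helper names and
the label column width of 21 characters are assumptions read off the layout
of this form (the feature blocks use a wider label column); they are not an
official specification.

```python
LABEL_WIDTH = 21  # assumed from the layout of the contact section


def text_field(label, value="", width=LABEL_WIDTH):
    """Rule a): padded label, a colon, then the text (empty if not relevant)."""
    return f"{label:<{width}}:{value}"


def check_box(label, checked=False, width=LABEL_WIDTH):
    """Rules b) and c): an X in the box for yes, an empty box otherwise."""
    mark = "X" if checked else " "
    return f"{label:<{width}}[{mark}]"


print(text_field("Last name", "Smith"))
print(check_box("Genomic DNA", checked=True))
print(check_box("Circular"))
```

Leaving value empty or checked false keeps the line in place, which is in
keeping with rule d).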

8) ENTERING FEATURES AND LOCATIONS
==================================
Enter the feature key from the list given in Appendix I at the end of this
document. Enter the locations, gene name, product name, and EC number,
where appropriate. Use < and > in the locations to show whether the feature
is partial at the 5' end and/or the 3' end. Mark the corresponding box [ ]
with an X if the feature is on the complementary strand, and likewise if
you have experimental evidence for the feature.

If you do not provide any features or adequate locations and names for the
features you will be contacted for more information before an accession
number is assigned to the sequence. For CDS features you must provide a gene
name AND a product name, even if the product name is putative.

If a CDS is partial at the 5' end then write the codon start number. This
is the position (1, 2 or 3) within the feature of the first base of the
first complete codon of the translation. For example, the following CDS is
partial and the codon start is 2 because the first complete codon, aca
(translated as T), starts with the base a, which is the second base in the
feature.
DNA         tacatcgatg...
Translation  T  S  M...
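
The codon start rule can be checked with a short Python sketch. The
function name and the three-entry codon table are illustrative only; the
table covers just the codons in the example above and is not a full genetic
code.

```python
# Only the three codons used in the example above are listed here;
# a real translation needs the full genetic code table.
EXAMPLE_CODONS = {"aca": "T", "tcg": "S", "atg": "M"}


def complete_codons(feature, codon_start):
    """Split a 5'-partial CDS into its complete codons."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon start must be 1, 2 or 3")
    offset = codon_start - 1            # bases before the first full codon
    usable = len(feature) - offset
    end = offset + usable - usable % 3  # drop an incomplete trailing codon
    return [feature[i:i + 3] for i in range(offset, end, 3)]


codons = complete_codons("tacatcgatg", codon_start=2)
translation = "".join(EXAMPLE_CODONS[c] for c in codons)
print(codons, translation)
```

With codon start 2 the leading t is skipped and the remaining bases split
into aca, tcg and atg, matching the translation T S M shown above.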

FEATURE EXAMPLE NO.1
Feature key           :CDS
From                  :201
To                    :500
Gene name             :abcD
Product name          :ABC repressor protein
Codon start 1,2 or 3  :
EC number             :
Complementary strand  [ ]
Experimental evidence [X]

FEATURE EXAMPLE NO.2
Feature key           :rRNA
From                  :<1
To                    :>1500
Gene name             :16S rRNA
Product name          :16S ribosomal RNA
Codon start 1,2 or 3  :
EC number             :
Complementary strand  [ ]
Experimental evidence [ ]
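
As a hedged illustration of the < and > convention used in the two examples
above, the Python sketch below builds the text for the two location fields.
The function name is hypothetical, and the check that positions are 1-based
with start not greater than end is an assumption, not a stated rule of this
form.

```python
def location_fields(start, end, partial_5=False, partial_3=False):
    """Return the text for the two location fields of a feature block."""
    if not 1 <= start <= end:
        # Assumed sanity check: 1-based positions, start before end.
        raise ValueError("expected 1 <= start <= end")
    from_text = ("<" if partial_5 else "") + str(start)
    to_text = (">" if partial_3 else "") + str(end)
    return from_text, to_text


# Example 1: complete CDS; example 2: rRNA partial at both ends.
print(location_fields(201, 500))
print(location_fields(1, 1500, partial_5=True, partial_3=True))
```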

If you have further questions after reading this form please contact
DATASUBS@EBI.AC.UK

I.  CONFIDENTIAL STATUS

Enter an X if you want these data to be confidential    [ ] 
If confidential write the release date here :
(Date format DD-MMM-YYYY e.g. 30-JUN-1998)


II.  CONTACT INFORMATION

Last name            :$(LAST_NAME)            
First name           :$(FIRST_NAME)          
Middle initials      :
Department           :$(DEPT)
Institution          :$(INSTITUTION)
Address              :$(ADDRESS)
                     :
                     :
Country              :$(COUNTRY)
Telephone            :$(PHONE)
Fax                  :$(TELEFAX)
Email                :$(MAIL)                                                         


III. CITATION INFORMATION

Author 1             :$(author_1)
Author 2             :$(author_2)
Author 3             :$(author_3)
Author 4             :$(author_4)
Author 5             :$(author_5)
Author 6             :$(author_6)
Author 7             :$(author_7)
Author 8             :$(author_8)
Author 9             :$(author_9)
Author 10            :$(author_10)
Author 11            :$(author_11)
Author 12            :$(author_12)
(e.g. Smith A.B.)
(Copy line for extra authors)
Title                :$(title)
Journal              :$(journal)
Volume               :$(volume)
First page           :$(page_1)
Last page            :$(page_2)
Year                 :$(year_pub)
Institute (if thesis):

Publication status
Mark one of the following
In preparation       [ ]
Accepted             [x]
Published            [ ]
Thesis/Book          [ ]
No plans to publish  [ ]


IV. SEQUENCE INFORMATION

Sequence length (bp) :$(SEQ_LEN)

Molecule type
Mark one of the following
Genomic DNA          [ ]
cDNA to mRNA         [ ]
rRNA                 [x]
tRNA                 [ ]
Genomic RNA          [ ]
cDNA to genomic RNA  [ ]

Mark if either of these apply
Circular             [ ]
Checked for vector
contamination        [ ]


V. SOURCE INFORMATION

Organism             :$(full_name)
Sub species          :
Strain               :$(strain)
Cultivar             :
Variety              :
Isolate/individual   :
Developmental stage  :
Tissue type          :
Cell type            :
Cell line            :
Clone                :$(clone)
Clone (if >1)        :
Clone library        :
Chromosome           :
Map position         :
Haplotype            :
Natural host         :
Laboratory host      :
Macronuclear         [ ]

Mark one if immunoglobulin
or T cell receptor 
Germline             [ ]
Rearranged           [ ]

Mark one if viral
Proviral             [ ]
Virion               [ ]

Mark one if from an organelle
Chloroplast          [ ]
Mitochondrion        [ ]
Chromoplast          [ ]
Kinetoplast          [ ]
Cyanelle             [ ]
Plasmid (not clone)  [ ]

Further source information
(e.g. taxonomy, specimen voucher etc)
Note                 :$(tax)


VI. FEATURES OF THE SEQUENCE


YOU MUST DESCRIBE AT LEAST ONE FEATURE OF THE SEQUENCE OR THERE WILL BE A
DELAY IN THE PROCESSING OF YOUR SUBMISSION


Complete the block below for every feature you need to describe. If you
have more than one feature copy the block as many times as you require. For
help see 8) ENTERING FEATURES AND LOCATIONS above.

 
FEATURE NO.1
Feature key           :$(seq_type)
From                  :$(start)
To                    :$(end)
Gene name             :$(gene)
Product name          :$(gene_prod)
Codon start 1,2 or 3  :
EC number             :
Complementary strand  [ ]
Experimental evidence [ ]


VII. SEQUENCE INFORMATION 

Enter the sequence data below
(IUPAC nucleotide base codes, Nucl. Acids Res. 13: 3021-3030, 1985)

BEGINNING OF SEQUENCE:
$(SEQUENCE)

END OF SEQUENCE


Include the translation for each CDS feature below.


BEGINNING OF TRANSLATION:


END OF TRANSLATION


---------------------------------------------------------------------------
These data will be shared among the following databases: DDBJ Database
(DNA Data Bank of Japan; Mishima, Japan); EMBL Nucleotide Sequence Database
(EBI, Cambridge, UK); GenBank (NCBI, Bethesda, USA); SWISS-PROT Protein
Sequence Database (Geneva, Switzerland and Heidelberg, FRG); International
Protein Information Database in Japan (JIPID; Noda, Japan); Martinsried
Institute For Protein Sequence Data (MIPS; Martinsried, FRG); National
Biomedical Research Foundation Protein Identification Resource (NBRF-PIR;
Washington, D.C., USA).

EMBL Data Submissions                     E-mail    datasubs@ebi.ac.uk
European Bioinformatics Inst.             Telephone +44 (0)1223 494499
Hinxton Hall, Hinxton                     Telefax   +44 (0)1223 494472
Cambridge CB10 1SD, UK
---------------------------------------------------------------------------

































APPENDIX I FEATURE KEYS
=======================
A full description of features is found in the DDBJ/EMBL/GenBank Feature
Table Definition Document at
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/ftable.doc
and on the EBI website at
http://www.ebi.ac.uk/ebi_docs/embl_db/ft/feature_table.html
An abbreviated list of feature keys is given below:

C_region         constant region of immunoglobulin light and heavy chain,
                 and T-cell receptor alpha, beta and gamma chains
CAAT_signal      eukaryotic promoter element; consensus=GG(C or T)CAATCT
CDS              protein coding sequence (includes stop codon)
conflict         independent determinations of the "same" sequence differ
                 at this site or region
D_segment        diversity segment of immunoglobulin heavy chain and
                 T-cell receptor beta-chain
enhancer         cis-acting enhancer of eukaryotic promoter function
exon             region that codes for part of spliced mRNA
GC_signal        eukaryotic promoter element; consensus=GGGCGG 
intron           transcribed region excised by mRNA splicing
J_segment        joining segment of immunoglobulin light and heavy chains,
                 and T-cell receptor alpha, beta and gamma chains
LTR              long terminal repeat
mat_peptide      mature peptide coding region (does not include the stop
                 codon or the signal peptide)
misc_feature     region of biological interest which cannot be described
                 by any other known feature
mRNA             messenger RNA
mutation         a related strain has an abrupt, inheritable change in the
                 sequence
polyA_signal     polyadenylation signal recognition region
polyA_site       polyadenylation site to which adenine residues are added 
primer_bind      non-covalent primer binding site
promoter         promoter region involved in transcription initiation
protein_bind     non-covalent protein binding site on DNA or RNA
RBS              ribosome binding site
rep_origin       origin of replication 
repeat_region    region of genome containing repeating units
repeat_unit      single repeat element 
rRNA             ribosomal RNA
S_region         switch region of immunoglobulin heavy chains
satellite        many tandem repeats of a short basic repeating unit 
sig_peptide      signal peptide coding region
stem_loop        hair-pin loop structure in DNA or RNA
STS              sequence tagged site
TATA_signal      eukaryotic promoter element; consensus=TATA(A or T)A(A or T) 
terminator       transcription termination signal
transit_peptide  transit peptide coding region
tRNA             transfer RNA
V_region         variable region of immunoglobulin light and heavy chains, 
                 and T-cell receptor alpha, beta, and gamma chains
V_segment        variable segment of immunoglobulin light and heavy chains, 
                 and T-cell receptor alpha, beta, and gamma chains.
variation        a related strain contains stable mutations from the same
                 gene (e.g., RFLPs, polymorphisms) 
3'UTR            region at the 3' end of a mature transcript, following the
                 stop codon
5'UTR            region at the 5' end of a mature transcript, preceding the
                 initiation codon
-10_signal       prokaryotic promoter element, consensus=TAtAaT 
-35_signal       prokaryotic promoter element, consensus=TTGACa or TGTTGACA

(Last change: 08-DEC-1998) 
(Wendy Baker, EMBL nucleotide sequence database curator)





Agnes Leyen
EMBL Outstation -  The European Bioinformatics Institute
Wellcome Trust Genome Campus
Cambridge CB10 1SD
UK


DATASUBMISSIONS:
+44 1223 494499 
datasubs@ebi.ac.uk

UPDATES:
+44 1223 494499
updates@ebi.ac.uk

PERSONAL:
+44 1223 494411 
leyen@ebi.ac.uk