UP	arb.hlp
UP	glossary.hlp
UP	save.hlp

SUB	arb_edit.hlp
SUB	ale.hlp

TITLE		GDE Interface and Editor

DESCRIPTION	Starts the GDE Editor designed by Steven Smith.
		See next chapter of this text for the original help text.
		As GDE originally used its own built-in database, it had to be
		slightly modified to run under ARB. So

		**** READ THE WARNINGS/BUGS CAREFULLY ****

WARNINGS	As soon as you start GDE, it creates a copy of the selected
		sequences. That means that you may change the sequences
		with either GDE or ARB, but not both. Therefore, if you have started
		GDE, do nothing but sequence editing in GDE till you quit GDE.
		To really save sequences to disc, you have to send the sequence
		changes to ARB and then use ARB to save the ARB database.


BUGS		Many functions that operate on species do not work
		correctly, especially:

					-deleting,
					-moving,
					-duplicating,
					-creating,
					-importing.


        ********* Part of the Original GDE HELPTEXT ******************

SECTION Introduction

        The Genetic Data Environment is part of a growing
        set of programs for manipulating and analyzing
        "genetic" data. It differs in design from other
        analysis programs in that it is intended to be an
        expandable and customizable system, while still
        being easy to use.

        There are a tremendous number of publicly available
        programs for sequence analysis. Many of these
        programs have found their way into commercial
        packages which incorporate them into integrated,
        easy to use systems. The goal of the GDE is to
        minimize the amount of effort required to integrate
        sequence analysis functions into a common
        environment. The GDE takes care of the user
        interface issues, and allows the programmer to
        concentrate on the analysis itself. Existing programs
        can be tied into the GDE in a matter of hours (or
        minutes) as opposed to days or weeks. Programs
        may be written in any language, and still seamlessly
        be incorporated into the GDE.

        These programs are, and will continue to be,
        available at no charge. It is the hope that this
        system will grow in functionality as more and more
        people see the benefits of a modular analysis
        environment. Users are encouraged to make
        modifications to the system, and forward all changes
        and additions to Steven Smith at
        smith@bioimage.millipore.com.

SECTION What's New for this Release

        GDE 2.2 represents a maintenance release. Several
        small bugs have been fixed, and new editing
        features and user interface elements have been
        added. Also, I have tried to update all of the
        contributed external programs to their latest
        release. Updated programs include:

                - Phylip
                - Treetool
                - LoopTool
                - Readseq
                - Blast
                - Fasta

        Improved versions of printing and translate are
        included as well. As for new editing features, a
        useful "yanking" feature has been added by Scott
        Ferguson from Exxon Research, along with the
        capability to export the colormap for a sequence
        (see appendices A/C). Among the bugs fixed in this
        release are:

                - Selection mask problems when exporting to
                  Genbank (fixed in 2.1)
                - Memory leaks (fixed in 2.1)
                - Correct handling of circular sequences
                - More liberal interpretation of Genbank formatted
                  files (not column dependent)


SECTION System Requirements

        GDE 2.2 currently runs on the Sun family of
        workstations. This includes the Sun3 and Sun4
        (Sparcstation) systems. It was written in XView,
        and runs on Suns using OpenWindows 3.0 or MIT's
        X Windows. It runs in both monochrome and color,
        and can be run remotely on any system capable of
        running X Windows Release 4. You should have at
        least 15 meg of free disk space available. The binary
        release for SparcStations was compiled under
        SunOS 4.1.2 and OpenWindows 3.0.

        We are also supporting a DECStation version of
        GDE. This is running under XView 3.0/X11R5. We
        encourage interested people to port the programs to
        their favorite Unix platform. There are informal
        ports to the SGI line of Unix machines.

SECTION Note to Motif users

        GDE2.2 can be run using different window
        managers. The most common alternative to olwm is
        the Motif window manager (mwm). The only
        problem in using another window manager is that
        the status line is not displayed. We have added a
        "Message panel" as an option under
        "File->Properties" which displays all of the
        information contained on the status line.

        People using other window managers may also
        prefer using xterm and xedit as the default terminal
        and file editor. This can be accomplished by replacing
        all occurrences of 'shelltool' and 'textedit' with
        'xterm -e' and 'xedit' in the
        $GDE_HELP_DIR/.GDEmenus file.


        FastA and Blast need to have the properly formatted
        databases installed in the $GDE_HELP_DIR under
        the directories FASTA/PIR, FASTA/GENBANK,
        BLAST/pir and BLAST/genbank. For FASTA, simply
        copy a version of PIR and Genbank into the proper
        directory. Alternatively, the PIR and GENBANK
        files can be symbolic links to copies of Genbank
        held elsewhere on your system. You may need to
        look at the .GDEmenus file in $GDE_HELP_DIR to
        verify that you are using the same divisions for
        these databases.

        Blast installation involves converting PIR and
        GENBANK to a temporary FASTA format (using
        pir2fasta and gb2fasta) and then using pressdb for
        nucleic acid, and setdb for amino acid to reformat the
        databases again into blast format. The .GDEmenus
        file is currently set up to search with blast using the
        following databases: pir, genpept, genupdate, and
        genbank. If you wish to divide these into
        subdivisions, then the .GDEmenus file will have to
        be edited.

        The most up to date release of blast can be obtained
        via anonymous ftp to ncbi.nlm.nih.gov. The most
        recent release of FASTA can be obtained via
        anonymous ftp to uvaarpa.virginia.edu. It is
        strongly recommended that you retrieve these copies,
        and become familiar with their setup.

SECTION Using the GDE

        It is assumed that the user is familiar with the Unix
        and OpenWindows/X Windows environments. It is
        also assumed that people running standard MIT
        X Windows will be using the OpenLook window
        manager (olwm). Other window managers work
        with varied success. If you are not certain as to how
        your system is set up, please contact your systems
        administrator.


        The GDE uses a menu description language to
        define what external programs it can call, and what
        parameters and data to pass to each function. This
        language allows users to customize their own
        environment to suit individual needs.

        The following is how the GDE handles external
        programs when selected from a menu:

        Each step in this process is described in a file
        .GDEmenus in the user's current or home directory.

        The language used in this file describes three phases
        to an external function call. The first phase
        describes the menu item as it will appear, and the
        Unix command line that is actually run when it is
        selected. The second phase describes how to prompt
        for the parameters needed by the function. The third
        phase describes what data needs to be passed as
        input to the external function, and what data (if any)
        needs to be read back from its output.

        The form of the language is a simple keyword/value
        list delimited by the colon (:) character. The
        language retains old values until new ones are set.
        For example, setting the menu name is done once for
        all items in that menu, and is only reset when the
        next menu is reached.

        The keywords for phase one are:

                menu:menu name      Name of current menu
                item:item name      Name of current menu item
                itemmeta:meta_key   Meta key equivalence (quick keys)
                itemhelp:help_file  Help file (either full path, or in GDE_HELP_DIR)
                itemmethod:         Unix command

        The itemmethod command is a bit more involved. It
        is the Unix command that will actually run the
        intended external program. It is one line long, and
        can be up to 256 characters in length. It can have
        embedded variable names (starting with a '$') that
        will be replaced with appropriate values later on. It
        can consist of multiple Unix commands separated by
        semi-colons (;), and may contain shell scripts and
        background processes as well as simple command
        names. Examples will be given later.
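
        The rule that values are retained until new ones are
        set can be illustrated with a short Python sketch.
        This is not GDE's own parser: the function name
        parse_menus, the sample menu entries and the
        commands they call are assumptions made for
        illustration, and only the phase one keywords are
        handled.

```python
def parse_menus(text):
    """Parse a simplified .GDEmenus description into a list of items."""
    items = []
    current_menu = None   # retained until the next "menu:" line
    current_item = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only; the value may contain colons.
        keyword, value = line.split(":", 1)
        keyword, value = keyword.strip(), value.strip()
        if keyword == "menu":
            current_menu = value
        elif keyword == "item":
            current_item = {"menu": current_menu, "item": value}
            items.append(current_item)
        elif current_item is not None:
            current_item[keyword] = value
    return items


# Hypothetical menu description, not taken from a real .GDEmenus file.
SAMPLE = """
menu:File
item:Show
itemmethod:cat INPUT
item:Count lines
itemmethod:wc -l INPUT
menu:Edit
item:All caps
itemmethod:tr '[a-z]' '[A-Z]' <INPUT > OUTPUT
"""

for entry in parse_menus(SAMPLE):
    print(entry["menu"], "/", entry["item"], "->", entry["itemmethod"])
```

        Both items that follow "menu:File" inherit that menu
        name, while "All caps" is filed under "Edit" because
        the menu value was reset before it.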

        The keywords for phase two are:

            arg:argument_variable_name

            		Name of this variable. It will appear
            		in the itemmethod: line with a dollar
            		sign ($) in front of it.

            argtype:slider,chooser,choice_menu or text

            		The type of graphic object
            		representing this argument.

            arglabel:descriptive label

            		A short description of what this
            		argument represents

            argmin:minimum_value (integer)

            		Used for sliders.

            argmax:maximum_value (integer)

            		Used for sliders.

            argvalue:default_value (integer)

            		It is the numeric value associated with
            		sliders or the default choice in
            		choosers, choice_menus, and choice_lists
            		(the first choice is 0, the second is 1 etc.)

            argtext:default value

            		Used for text fields.

            argchoice:displayed value:passed value

            		Used for choosers and
            		choice_menus. The first value is
            		displayed on screen, and the second
            		value is passed to the itemmethod
            		line.
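
        As a rough illustration of how the phase two
        keywords feed the itemmethod line, the following
        Python sketch fills in the '$' variables from the
        default values. The substitution logic, the function
        name expand_itemmethod, the dictionary layout and
        the program name "align" are assumptions made for
        illustration; the GDE's actual implementation may
        differ.

```python
import re


def expand_itemmethod(itemmethod, arguments):
    """Replace $name placeholders with the value of each argument."""
    values = {}
    for argument in arguments:
        kind = argument["argtype"]
        if kind == "slider":
            # argvalue is the numeric value itself
            values[argument["arg"]] = str(argument["argvalue"])
        elif kind == "text":
            values[argument["arg"]] = argument["argtext"]
        else:
            # chooser / choice_menu: argvalue is the index of the default
            # choice (the first choice is 0); the passed value is used.
            displayed, passed = argument["argchoice"][argument["argvalue"]]
            values[argument["arg"]] = passed

    def substitute(match):
        # Leave unknown $names untouched.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$(\w+)", substitute, itemmethod)


# Hypothetical menu item: "align" is not a real program.
ARGUMENTS = [
    {"arg": "ktuple", "argtype": "slider",
     "argmin": 1, "argmax": 10, "argvalue": 2},
    {"arg": "weight", "argtype": "chooser", "argvalue": 0,
     "argchoice": [("Yes", "-t"), ("No", "")]},
    {"arg": "outname", "argtype": "text", "argtext": "result.aln"},
]

command = expand_itemmethod("align -k $ktuple $weight -o $outname", ARGUMENTS)
print(command)  # align -k 2 -t -o result.aln
```

        Note that the chooser contributes its passed value
        ("-t"), not the label shown on screen ("Yes").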

        The keywords for phase three are as follows:

            in:input_file

            		GDE will replace this name with a
            		randomly generated temporary file
            		name. It will then write the selected
            		data out to this file.

            informat:file_format

            		Write data to the input file in this
            		format. Currently supported values
            		are Genbank and flat.

            inmask:

            		This data can be controlled by a
            		selection mask.

            insave:

            		Do not remove this file after running
            		the external function. This is useful
            		for functions put in the background.

            out:output_file

            		GDE will replace this name with a
            		randomly generated temporary file
            		name. It is up to the external function
            		to fill this file with any results that
            		might be read back into the GDE.

            outformat:file_format

            		The data in the output file will be in
            		this format. Currently supported
            		values are colormask, Genbank, and
            		flat.

            outsave:

            		Do not remove this file after reading.
            		This is useful for background tasks.

            outoverwrite:

            		Overwrite existing sequences in the current
            		GDE window. Currently supported with
            		"gde" format only.


        Here is a sample dialog box, and its entry in the
        .GDEmenus file:

        Using the default parameters given in the dialog
        box, the executed Unix command line would be:

             (tr '[a-z]' '[A-Z]' < .gde_001 >.gde_001.tmp ; mv .gde_001.tmp CAPS ; gde CAPS -Wx medium ; rm .gde_001 ) &

        where .gde_001 is the name of the temporary file
        generated by the GDE which contains the selected
        sequences in flat file format. Since the GDE runs
        this command in the background ('&' at the end) it
        is necessary to specify the insave: line, and to
        remove all temporary files manually. There is no
        output file specified because the data is not loaded
        back into the current GDE window, but rather a new
        GDE window is opened on the file. A simpler
        command that reloads the data after conversion
        might be:

              item:          All caps
              itemmethod:    tr '[a-z]' '[A-Z]' <INPUT > OUTPUT
              in:            INPUT
              informat:      flat
              out:           OUTPUT
              outformat:     flat

        In this example, no arguments are specified, and so
        no dialog box will appear. The command is not run
        in the background, so the GDE can clean up after
        itself automatically. The converted sequence is
        automatically loaded back into the current GDE
        window.
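
        The in:/out: handling for this simpler entry can be
        approximated in Python as follows. The sketch is an
        assumption about the overall flow (write the
        selection to a temporary file, run the command, read
        the result back, clean up) and is not the GDE's
        code. The function run_external and the sample data
        are hypothetical, and a Unix shell with tr is
        assumed.

```python
import os
import shlex
import subprocess
import tempfile


def run_external(itemmethod, in_name, out_name, data):
    """Run itemmethod with the in:/out: names replaced by temporary files."""
    workdir = tempfile.mkdtemp(prefix="gde_sketch_")
    in_path = os.path.join(workdir, "in.tmp")
    out_path = os.path.join(workdir, "out.tmp")
    try:
        # in: write the selected data to the temporary input file
        with open(in_path, "w") as handle:
            handle.write(data)
        command = (itemmethod
                   .replace(in_name, shlex.quote(in_path))
                   .replace(out_name, shlex.quote(out_path)))
        subprocess.run(command, shell=True, check=True)
        # out: read the results back
        with open(out_path) as handle:
            return handle.read()
    finally:
        # No insave:/outsave: in this entry, so remove the files.
        for path in (in_path, out_path):
            if os.path.exists(path):
                os.remove(path)
        os.rmdir(workdir)


result = run_external("tr '[a-z]' '[A-Z]' <INPUT > OUTPUT",
                      "INPUT", "OUTPUT", "acgtacgt\n")
print(result, end="")
```

        Because the command runs in the foreground, the
        cleanup step can safely remove both temporary files
        as soon as the output has been read.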

        In general, the easiest type of program to integrate
        into the GDE is a program completely driven from a
        Unix command line. Interactive programs can be
        tied in (MFOLD for example), however shell scripts
        must be used to drive the parameter entry for these
        programs. Programs of the form:

                program_name -a1 argument1 -a2 argument2 -f inputfile -er errorfile > outputfile

        can be specified in the .GDEmenus file directly. As
        this is the general form of most Unix commands,
        these tend to be simpler to implement under the
        GDE.

        As functions grow in complexity, they may begin to
        need a user interface of their own. In these cases, the
        command line calling arguments are still necessary
        in order to allow the GDE to hand them the
        appropriate data, and possibly retrieve results after
        some external manipulation.


SECTION Appendix C, External functions

    ClustalV - Cluster multiple sequence alignment

        Author: Des Higgins.

        Reference: Higgins,D.G. Bleasby,A.J. and Fuchs,R. (1991)

        CLUSTAL V: improved software for multiple sequence alignment. ms. submitted to CABIOS

        Parameters:

        	k-tuple pairwise search	Word size for pairwise comparisons
        	Window size		Smaller values give faster alignments,
        				larger values are more sensitive.
        	Transitions weighted	Can weight transitions twice as high as
                                        transversions (DNA only).
        	Fixed gap penalty	Gap insertion penalty, lower value, more gaps
        	Floating gap penalty	Gap extension penalty, lower value, longer gaps


        Comments:

        		ClustalV is a directed multiple sequence alignment algorithm that
        		aligns a set of sequences based on their level of similarity. It first
        		uses a Lipman-Pearson pairwise similarity scoring to find "clusters"
        		of similar sequences, and pre-aligns those sequences. It then adds
        		other sequences to the alignment in the order of their similarity so as
        		to produce the cleanest alignment.

        Warning:

                        ClustalV only uses unambiguous character codes. It will also
        		convert all sequences to upper case in the process of aligning. Clustal
        		does not pass back comments, author etc. Be sure to keep copies of your
        		sequences if you do not wish to lose this information.


    MFOLD - RNA secondary structure prediction

        Author: Michael Zuker

        Reference:

                        M. Zuker
        		On Finding All Suboptimal Foldings of an RNA Molecule.
        		Science, 244, 48-52, (1989)

        		J. A. Jaeger, D. H. Turner and M. Zuker
        		Improved Predictions of Secondary Structures for RNA.
        		Proc. Natl. Acad. Sci. USA, BIOCHEMISTRY, 86, 7706-7710, (1989)

        		J. A. Jaeger, D. H. Turner and M. Zuker
        		Predicting Optimal and Suboptimal Secondary Structure for RNA.
        		in "Molecular Evolution: Computer Analysis of Protein and
        		Nucleic Acid Sequences", R. F. Doolittle ed.
        		Methods in Enzymology, 183, 281-306 (1989)

        Parameters:

        		Linear/circular RNA fold

        		ct File to save results

        Comments:

        		MFOLD passes its output to a program Zuk_to_gen that translates the secondary
        		structure prediction to a nested bracket ([]) notation.
                        This notation can then be used in the Highlight Helix, and Draw
                        Secondary structure (LoopTool) functions.
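
                        A nested bracket notation can be read back into base-pair positions
                        with a simple stack. The Python sketch below is illustrative only: the
                        function name paired_positions and the use of '.' for unpaired
                        positions are assumptions, not documented Zuk_to_gen output.

```python
def paired_positions(structure):
    """Return sorted (open, close) index pairs for a nested [] string."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "[":
            stack.append(index)
        elif symbol == "]":
            if not stack:
                raise ValueError("unmatched ']' at position %d" % index)
            # The most recent unmatched '[' pairs with this ']'.
            pairs.append((stack.pop(), index))
        # Any other character is treated as an unpaired position.
    if stack:
        raise ValueError("unmatched '[' at position %d" % stack[-1])
    return sorted(pairs)


print(paired_positions("[[..]]..[]"))  # [(0, 5), (1, 4), (8, 9)]
```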

        		MFOLD currently does not support much in the way of additional parameters.
        		We hope to have all additional parameters available soon.


    Blast - Basic Local Alignment Search Tool

        Reference:

        		Karlin, Samuel and Stephen F. Altschul (1990). Methods for
        		assessing the statistical significance of molecular sequence
        		features by using general scoring schemes, Proc. Natl. Acad.
        		Sci. USA 87:2264-2268.

             		Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W.
             		Myers, and David J. Lipman (1990). Basic local alignment
             		search tool, J. Mol. Biol. 215:403-410.

           		Altschul, Stephen F. (1991). Amino acid substitution
           		matrices from an information theoretic perspective. J. Mol.
             		Biol. 219:555-565.


        Parameters:

        		Which Database		Which nucleic or amino acid database
        					to search.

        		Word Size		Length of initial hit. After locating a match of
        					this length, alignment extension is attempted.

        		Blastn:
        		Match score		Score for matches in secondary alignment extension
        		Mismatch score		Score for mismatches in secondary alignment extension

        		Blastx, tblastn, blastp, blast3:
        		Substitution Matrix	PAM120 or PAM250

        Comments:

                        The report is loaded into a text editor. This should be saved as a new file
                        as the default file is removed after execution. The latest version of blast
                        can be obtained via anonymous ftp to ncbi.nlm.nih.gov.

    FastA - Similarity search

        	Reference:

        		W. R. Pearson and D. J. Lipman (1988),
        		"Improved Tools for Biological Sequence Analysis", PNAS 85:2444-2448

        		W. R. Pearson (1990) "Rapid and Sensitive Sequence
        		Comparison with FASTP and FASTA" Methods in Enzymology 183:63-98

        	Parameters:

        		Database        Which database to search
        		Number of alignments to report
        		SMATRIX         Which similarity matrix to use


        	Comments:

          		The FastA package includes several additional programs for pairwise alignment.
        		We have only included a bare bones link to FastA. We hope to include a more
        		complete setup for the actual 2.2 release.


    Assemble Contigs - CAP Contig Assembly Program

        	Author

                        Xiaoqiu Huang
        		Department of Computer Science
        		Michigan Technological University
        		Houghton, MI 49931
        		E-mail: huang@cs.mtu.edu

        		Minor modifications for I/O by S. Smith

        	Reference

        		"A Contig Assembly Program Based on Sensitive Detection of
         		Fragment Overlaps" (submitted to Genomics, 1991)

        	Parameters:

        		Minimum overlap                 Number of bases required for overlap
        		Percent match within overlap    Percentage match required in the overlap
                                                        region before merge is allowed.

        	Comments:

        		CAP returns the aligned sequences to the current editor window. The sequences are
        		placed into contigs by setting the groupid. CAP does not change the order of the
        		sequences, and so the results should be sorted by group and offset (see sort under
                        the Edit menu).


    Lsadt - Least squares additive tree analysis

        Author:

                Geert De Soete,
                'C' implementation by Mike Maciukenas,
                University of Illinois

        Reference:

                LSADT, 1983 Psychometrika, 1984,
                Quality and Quantity

        Parameters:

        		Distance correction to use in distance matrix calculations (see count below).
        		What should be used for initial parameter estimates.
        		Random number seed.
        		Display method (See TreeTool below).

        Comments:

        		The program has been rewritten in 'C' and will be included with the rRNA Database
        		phylogenetic package being written at the University of Illinois Department of
        		Microbiology.

        		Count is a short program to calculate a distance matrix from a sequence
        		alignment (see below).


    Count - Distance matrix calculator

        Author: Steven Smith

        Parameters:

        		Correction method	Currently Jukes-Cantor or none
        		Include dashed columns
        		Match upper case to lower


        Comments:

        		Passes back a distance matrix in a format readable by LSADT.
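
        		For reference, the Jukes-Cantor correction converts the observed fraction p
        		of differing positions into a distance d = -3/4 * ln(1 - 4p/3). The Python
        		sketch below shows the calculation for one pair of aligned sequences. It is
        		not Count's source code: the function name and the include_dashes and
        		match_case options are assumptions modeled on the parameter names above,
        		and the exact way Count treats dashes and ambiguity codes is not documented
        		here.

```python
import math


def jukes_cantor_distance(seq_a, seq_b, include_dashes=False, match_case=True):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if match_case:
        seq_a, seq_b = seq_a.upper(), seq_b.upper()
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: columns with a dash are skipped unless requested.
        if not include_dashes and (a == "-" or b == "-"):
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor_distance("ACGTACGT", "ACGTACGA"), 4))  # 0.1367
```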


    Treetool - Tree drawing/manipulation

        Author: Michael Maciukenas, University of Illinois

        Comments: See included documentation for TreeTool usage.


    Readseq - format conversion program

        Author: Don Gilbert

        Parameters: Many, but can easily be run in interactive mode.

        Comments:

        		Readseq is a very useful program for format conversion.
                        The latest version supports over a dozen different file formats, as
                        well as formatting capabilities for publication. GDE makes use of Readseq
                        for importing and exporting sequences and as a filtering tool for
                        some external functions.


SECTION Copyright Notice

        The Genetic Data Environment (GDE) software and
        documentation are not in the public domain.
        Portions of this code are owned and copyrighted by
        the Board of Trustees of the University of
        Illinois and by Steven Smith. External functions
        used by GDE are the property of their respective
        authors. This release of the GDE program and
        documentation may not be sold, or incorporated into
        a commercial product, in whole or in part without
        the expressed written consent of the University of
        Illinois and of its author, Steven Smith.

        All interested parties may redistribute the GDE as
        long as all copies are accompanied by this
        documentation, and all copyright notices remain
        intact. Parties interested in redistribution must do
        so on a non-profit basis, charging only for cost of
        media. Modifications to the GDE core editor should
        be forwarded to the author Steven Smith. External
        programs used by the GDE are copyrighted by, and are
        the property of their respective authors unless
        otherwise stated.


        While all attempts have been made to ensure the
        integrity of these programs:

SECTION Disclaimer

        THE UNIVERSITY OF ILLINOIS, HARVARD
        UNIVERSITY AND THE AUTHOR, STEVEN
        SMITH GIVE NO WARRANTIES, EXPRESSED
        OR IMPLIED FOR THE SOFTWARE AND
        DOCUMENTATION PROVIDED, INCLUDING,
        BUT NOT LIMITED TO WARRANTY OF
        MERCHANTABILITY AND WARRANTY OF
        FITNESS FOR A PARTICULAR PURPOSE.
        User understands the software is a research tool for
        which no warranties as to capabilities or accuracy are
        made, and user accepts the software "as is." User
        assumes the entire risk as to the results and
        performance of the software and documentation. The
        above parties cannot be held liable for any direct,
        indirect, consequential or incidental damages with
        respect to any claim by user or any third party on
        account of, or arising from the use of software and
        associated materials. This disclaimer covers both the
        GDE core editor and all external programs used by
        the GDE.