BLAST3(1)                  USER COMMANDS                  BLAST3(1)

NAME
     blast3 - protein database search for three-way alignments,
     using the BLAST pairwise search algorithm

SYNOPSIS
     blast3 database query [E=#] [S=#] [T=#] [X=#] [W=#]
     [M=subfile] [Y=#] [Z=#] [F=#] [L=#] [R=#] [N=#] [U=#] [K=#]
     [L=#] [H=#] [V=#] [B=#] [D=#]

DESCRIPTION
     BLAST (Basic Local Alignment Search Tool) is the heuristic
     search algorithm employed by the program blast3. This program
     is used to compare an amino acid query sequence against a
     protein sequence database. The principal use of this program
     is to identify statistically significant three-way sequence
     alignments in which the component pairwise alignments are
     statistically insignificant.

     The output from this program may be of four different types.

     The first type of output consists of statistically
     significant High-scoring Segment Pairs (HSP), where a segment
     is an arbitrarily long run of contiguous residues. An HSP is
     a pair of segments, one from the query sequence and one from
     a database sequence, where the score of their ungapped
     alignment meets or exceeds a parametrized cutoff value. A set
     of zero or more HSPs is thus defined by two sequences, a
     scoring scheme, and a cutoff score. Depending on the
     parameters used, there is a non-zero probability of missing
     one or more HSPs.

     The second type of output consists of a one-line description
     for each diagonal on which at least one HSP was found. This
     list accounts for all diagonals--those due to HSPs that are
     statistically significant, as well as those that are
     statistically insignificant.

     The third type of output consists of a summary of the
     three-way alignments, which contain one segment from the
     query sequence and two from the database. A given HSP may
     occur at most once in this list, represented by the highest
     scoring three-way alignment in which it participates that has
     greater statistical significance than any of its component
     pairwise alignments.

     The fourth type of output consists of individual three-way
     alignments with scores greater than or equal to a
     parametrized cutoff value.

PARAMETERS
     Parameters are modified using a name=value syntax, e.g.,
     S=100.

     S is the cutoff score for recording HSPs. Unless explicitly
     specified on the command line, the cutoff score is determined
     by the setting of E, which is an estimate for the number of
     HSPs found in the search of a random database. For a fixed
     value of E (e.g., the default value of 1000) and a given
     query sequence, the calculated value of S will be different
     when searching databases of different lengths. When
     calculated automatically, S is rounded down to the nearest
     integer.

     W is the word size for finding initial hits against the
     database. These hits are extended in both directions along
     the diagonal until the segment score drops off by the
     quantity X (see below). The default W is 4 amino acids. The
     value of W should not be changed.

     T is the threshold for generating neighborhood words from the
     query sequence prior to scanning the database. Raising the
     value of T increases the likelihood of completely missing
     HSPs, but can decrease the search time and memory
     requirements of the program. If not explicitly specified, a
     value suitable for use with the default PAM120 substitution
     matrix will be calculated at run time.

     X is a positive integer representing the maximum permissible
     drop-off of the cumulative score during word-hit extension.
     Raising X may decrease the chance that the BLAST algorithm
     overlooks an HSP, but raising X may significantly increase
     the search time, as well. If computation time is of little
     concern, X might be increased a few points. The default value
     of X is 20, which is intended to complement the default
     PAM120 substitution matrix.

     M is the name of a file containing the substitution matrix.
     At the present time, only the PAM120 matrix is available.

     For the purpose of calculating significance levels, Y is the
     effective length of the query sequence and Z is the effective
     length of the database, both measured in residues. The
     default values for these parameters are the actual lengths of
     the query and database used.

     Pairwise alignments with an Expect-value (the expected number
     of alignments with equivalent or greater score) less than N
     are considered significant in their own right and, therefore,
     are not used to form three-way alignments. The default value
     for N is 0.1. This may be overridden indirectly by specifying
     a value for U, the highest pairwise score to be used to form
     three-way alignments.

     Three-way alignments with an Expect-value greater than F are
     not reported. The default value for F is 5.0. This may be
     overridden indirectly by specifying a value for L, the lowest
     three-way alignment score to be reported. A three-way
     alignment is also not reported if the ratio of its
     Expect-value to the smallest Expect-value associated with any
     of its component pairwise alignments exceeds the value of R.
     The default value for R is 2.0, but the user may well want to
     use a value of 1.0 or less.

     The command line parameters K and L can be used to set fixed
     values for the Karlin statistics' K and lambda parameters,
     respectively. Users should generally avoid setting these
     parameters unless the full ramifications of doing so are
     understood. As an example of one of the less obvious effects
     of manually setting these parameters, the value of the H
     statistic reported at the end of each program's output (which
     is distinct from the command line parameter of the same name)
     is a function of lambda; and the default value for the
     neighborhood word-score threshold parameter T is in turn a
     function of H.
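
     The relationships among E, S, Y, Z, N, F and R described
     above can be illustrated with a minimal Python sketch. It
     assumes the standard pairwise estimate of Karlin and Altschul
     (1990), E = K * Y * Z * exp(-lambda * S); the statistics that
     blast3 applies internally, in particular to three-way
     alignments, may differ. The function names are hypothetical,
     the values of K and lambda are illustrative placeholders
     rather than the PAM120 values, and F is assumed to act as an
     upper bound on the Expect-value of reported alignments.

```python
import math


def expect(score, K, lam, Y, Z):
    """Expected number of chance HSPs scoring at least `score`.

    Pairwise Karlin-Altschul estimate, where Y and Z are the
    effective query and database lengths in residues.
    """
    return K * Y * Z * math.exp(-lam * score)


def cutoff_score(E, K, lam, Y, Z):
    """Solve the estimate for S given E, rounding down to an integer."""
    return math.floor(math.log(K * Y * Z / E) / lam)


def usable_pairwise(pair_expect, N=0.1):
    """A pairwise alignment with Expect below N is significant on its
    own, so it is not used to build three-way alignments."""
    return pair_expect >= N


def report_three_way(three_expect, pair_expects, F=5.0, R=2.0):
    """Apply the F and R rules to one candidate three-way alignment.

    `pair_expects` holds the Expect-values of its component
    pairwise alignments.
    """
    if three_expect > F:  # assumption: F is an upper bound on Expect
        return False
    if three_expect / min(pair_expects) > R:
        return False
    return True


# Illustrative placeholder statistics (NOT the PAM120 values).
K, LAM = 0.1, 0.3

# The same E and query give different cutoffs for different database sizes.
print(cutoff_score(1000, K, LAM, Y=250, Z=10_000_000))  # 41
print(cutoff_score(1000, K, LAM, Y=250, Z=20_000_000))  # 43

# Reported: Expect is within F and better than its best component pair.
print(report_three_way(0.5, [1.0, 2.0, 3.0]))  # True
```

     Because S is rounded down, the Expect-value at the computed
     cutoff is slightly larger than the requested E. Calling
     report_three_way with R=1.0 keeps only three-way alignments
     that are at least as significant as their best component
     pairwise alignment.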

REGULATING OUTPUT
     By default, blast3 reports the pairwise alignments found
     during the initial BLAST portion of the search.

     Parameter H regulates the display of a histogram of the
     scores of the highest-scoring hit extensions for each
     database sequence. As long as H has a non-zero value, the
     histogram will be displayed. The default value for H is 1.

     Parameter V is the maximum number of database sequences for
     which one-line descriptions will be reported. The default
     value for V is 500. A warning message is prominently
     displayed at the end of the one-line descriptions section if
     more sequences than V yielded HSPs. When V is zero, no
     one-line descriptions are reported and no warning is given.
     Negative values for V are undefined and disallowed.

     Parameter B regulates the display of the high-scoring segment
     pairs. For positive values, B is the maximum number of
     database sequences for which high-scoring segment pairs will
     be reported. This may be much smaller than the actual number
     of high-scoring segment pairs reported, since any given
     database sequence may yield several HSPs. For negative
     values, no limit is imposed on the number of HSPs that will
     be reported. The default value for B is 250. Negative values
     for B are undefined and disallowed.

     If parameter D is made non-zero, the program reports concise
     information about all of the pairwise diagonals remembered
     for the three-way search.

SUPPORT UTILITIES
     Databases to be searched by blast3 must first be processed by
     the setdb program.

SEE ALSO
     blast(1), blastp(1), blastn(1), blastx(1), tblastn(1).

REFERENCES
     Karlin, Samuel and Stephen F. Altschul (1990). Methods for
     assessing the statistical significance of molecular sequence
     features by using general scoring schemes, Proc. Natl. Acad.
     Sci. USA 87:2264-2268.

     Altschul, Stephen F. and David J. Lipman (1990). Protein
     database searches for multiple alignments, Proc. Natl. Acad.
     Sci. USA 87:5509-5513.

     Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W.
     Myers, and David J. Lipman (1990). Basic local alignment
     search tool, J. Mol. Biol. 215:403-410.

Sun Release 4.1        Last change: 29 December 1991