JDSA is a Java program for degenerate searching of DNA sequences. It can search sequences while allowing multiple nucleotides at a given position (positional degeneracy), limited overall sequence accuracy (group degeneracy), or variable spacing between multiple DNA elements (spacing degeneracy). The program was designed to search the S. pombe or D. melanogaster genome, but custom searches can be performed provided the input is in the correct format. It was written in Java 2 (SDK 1.4.2_03) and has a graphical interface (Swing v1.1). It has been tested on Win98/XP PCs, Mac OS 9.1 and Mac OS X (10.1.4).
Necessary downloads (Win 9x/2000/XP, Mac OS 9.x, Mac OS X):
Additional Files: To search an entire genome whose files are not stored locally, you must create a file containing the GI numbers of every genome sequence file. You then point the file chooser to that file. Here are the Drosophila melanogaster Release v3.2 and S. pombe genome GI lists. These may not reflect the most up-to-date annotations of the sequencing results.

1. Positional degeneracy:
Suppose you want to search for a DNA element that contains degeneracy at a given position, such as the initiator (Inr) region of Drosophila. The Drosophila initiator is the consensus sequence at which transcription starts (the +1 of an RNA transcript). This sequence is T-C-A-(G or T)-T-(T or C). Searching each permutation separately (by BLAST, for example) is very inefficient. Instead, you can use the standard IUPAC designations for degenerate nucleotides, and JDSA will return matches to every permutation of the Inr sequence. For example, the Inr sequence above can be written TCAKTY. The Downstream Promoter Element (Burke & Kadonaga (1996); Kutach & Kadonaga (2000)) would be written RGWYG. The IUPAC degeneracy codes are listed in the table below:
Any Java application needs the proper virtual machine installed before it will run. Virtual machines are platform (operating system) specific and can be obtained by following the links below.

Win 9x/2000/XP:
1. Download and install the Java Runtime Environment (JRE). It is available here.
2. Download the program file (JDSAv01.jar) and run it (see the Instructions section below).
3. You may also need the Java Foundation Classes (JFC)/Swing package. That is available here.

Mac OS 9.x:
1. Download and install the Macintosh Runtime for Java (MRJ) 2.2.5. It is available here.
2. Download the file swingall.jar and place it in the folder System Folder:Extensions:MRJ Libraries:MRJClasses.
3. Download the program file and run it (see the Instructions section below).

Mac OS X:
1. Download the file JDSAv01.jar, then double-click its icon and go.
Principle:
This project was originally started to search the Drosophila genome for promoter regions that contain the Downstream Promoter Element (DPE) or the Motif Ten Element (MTE). The goal was to create a search algorithm that could search for DNA sequences using the following three kinds of degeneracy:
IUPAC | Nucleotide(s) | Complement (IUPAC code) |
A | A | T |
C | C | G |
G | G | C |
T | T | A |
M | A OR C | K |
R | A OR G | Y |
W | A OR T | W |
S | C OR G | S |
Y | C OR T | R |
K | G OR T | M |
V | A OR C OR G | B |
H | A OR C OR T | D |
D | A OR G OR T | H |
B | C OR G OR T | V |
N | A OR C OR T OR G | N |
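The table above can be expressed directly as data. The sketch below is a minimal, hypothetical illustration: the class IupacCodes and its methods are not taken from JDSA's source. It maps each code to the nucleotides it allows. It also uses the complement column to reverse-complement a degenerate pattern, which is one plausible way to look for the complement-strand ([C]) results described under Parsing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the IUPAC table as data; not JDSA's actual source.
final class IupacCodes {
    // IUPAC code -> nucleotides it stands for (second column of the table).
    static final Map<Character, String> BASES = new HashMap<>();
    // IUPAC code -> IUPAC code of its complement (third column of the table).
    static final Map<Character, Character> COMPLEMENT = new HashMap<>();

    static {
        String[][] rows = {
            {"A", "A", "T"}, {"C", "C", "G"}, {"G", "G", "C"}, {"T", "T", "A"},
            {"M", "AC", "K"}, {"R", "AG", "Y"}, {"W", "AT", "W"}, {"S", "CG", "S"},
            {"Y", "CT", "R"}, {"K", "GT", "M"}, {"V", "ACG", "B"}, {"H", "ACT", "D"},
            {"D", "AGT", "H"}, {"B", "CGT", "V"}, {"N", "ACGT", "N"}
        };
        for (String[] row : rows) {
            BASES.put(row[0].charAt(0), row[1]);
            COMPLEMENT.put(row[0].charAt(0), row[2].charAt(0));
        }
    }

    // True if the concrete nucleotide `base` is allowed by the IUPAC `code`.
    static boolean matches(char code, char base) {
        String allowed = BASES.get(Character.toUpperCase(code));
        return allowed != null && allowed.indexOf(Character.toUpperCase(base)) >= 0;
    }

    // Reverse complement of a (possibly degenerate) pattern.
    static String reverseComplement(String pattern) {
        StringBuilder sb = new StringBuilder(pattern.length());
        for (int i = pattern.length() - 1; i >= 0; i--) {
            Character c = COMPLEMENT.get(Character.toUpperCase(pattern.charAt(i)));
            if (c == null) {
                throw new IllegalArgumentException("Not an IUPAC code: " + pattern.charAt(i));
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matches('K', 'G'));           // true: K = G or T
        System.out.println(matches('K', 'A'));           // false
        System.out.println(reverseComplement("TCAKTY")); // RAMTGA
    }
}
```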
2. Group degeneracy: When searching for a DNA binding element, a perfect match may not be required, because the protein may not need the entire element in order to bind. For example, if an element is 6 nucleotides long, a protein that normally binds to this region may still bind when only 5 of the 6 nucleotides match the consensus sequence. Binding to a perfect versus an imperfect match could be an integral part of regulating activity. Again using the Inr region, the transcriptional machinery might bind to an Inr region in which only 5 of 6 nucleotides match. In this case, we would say that the maximum allowable mismatch is 1. If you ask JDSA to search for a sequence with a maximum mismatch of 1, the mismatch may occur at any position within the sequence.
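Here is a minimal sketch of how a maximum allowable mismatch could be applied, assuming a simple sliding-window comparison. The MismatchSearch class is hypothetical, and JDSA's own algorithm may differ. A position counts as a mismatch whenever the sequence nucleotide is not among those allowed by the IUPAC code. A window is reported if it has no more than maxMismatch such positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sliding-window sketch of group degeneracy; not JDSA's source.
final class MismatchSearch {
    // Nucleotides allowed by each IUPAC code (same content as the table above).
    private static String basesFor(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': return "A";   case 'C': return "C";
            case 'G': return "G";   case 'T': return "T";
            case 'M': return "AC";  case 'R': return "AG";
            case 'W': return "AT";  case 'S': return "CG";
            case 'Y': return "CT";  case 'K': return "GT";
            case 'V': return "ACG"; case 'H': return "ACT";
            case 'D': return "AGT"; case 'B': return "CGT";
            case 'N': return "ACGT";
            default: throw new IllegalArgumentException("Not an IUPAC code: " + code);
        }
    }

    // 0-based start positions where the pattern matches with <= maxMismatch mismatches.
    static List<Integer> find(String sequence, String pattern, int maxMismatch) {
        List<Integer> hits = new ArrayList<>();
        String seq = sequence.toUpperCase();
        for (int start = 0; start + pattern.length() <= seq.length(); start++) {
            int mismatches = 0;
            // Stop comparing early once the window already has too many mismatches.
            for (int i = 0; i < pattern.length() && mismatches <= maxMismatch; i++) {
                if (basesFor(pattern.charAt(i)).indexOf(seq.charAt(start + i)) < 0) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatch) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String seq = "GGTCAGTTAATCAGAC";
        System.out.println(find(seq, "TCAKTY", 0)); // [2]
        System.out.println(find(seq, "TCAKTY", 1)); // [2, 10]
    }
}
```

With a maximum mismatch of 1, the second hit (TCAGAC at position 10) differs from TCAKTY only at the fifth position. This shows that the single permitted mismatch can fall anywhere in the element.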
3. Spacing degeneracy: This type of degeneracy concerns the spacing between two DNA elements. Returning to the promoter example, you can use JDSA to look for a TATA box that lies close to an initiator. A TATA box is most biologically pertinent when it lies no more than 30 and no fewer than 10 nucleotides upstream of the initiator. To have JDSA search for such results, you would enter the settings listed in the TATA box/initiator example below.

Please note: the distance values give the number of nucleotides between the given elements. For example, suppose you search for element A and element B, and the end of A may sit directly against the start of B (0 nucleotides apart). The minimum distance would then be 0.

Searching for a sequence: To search for a given sequence or set of sequences, click "New..." in the menu bar, then "New JDSA Search..." from the pull-down menu. A pop-up window will appear and ask "How many fragments?" From its pull-down menu, select the number of separate DNA elements that you wish to search for. In the promoter example, you would enter 1 to search for the initiator alone, 2 to search for a TATA box and an initiator, and so on. Then click "OK". Clicking "Cancel" will abort the search.

A new window titled "JDSA input" will appear. Its layout depends on how many fragments you asked to search for. In the separate top subpanels, you enter the needed information: the sequence of each fragment, how many mismatches you will allow, how far it is from the previous fragment, and so on.

Please note that when you start, the OK button (bottom right) is disabled. It only becomes enabled once you have entered enough valid information to proceed. If you enter a character that the program does not recognize (a non-IUPAC character, or a letter where a number is expected), the OK button is disabled until the problem is corrected.
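The OK-button behavior described above amounts to re-checking every field whenever it changes. The following is a hedged sketch of the checks themselves. InputValidator and its methods are hypothetical, and accepting lower-case sequence letters is an assumption.

```java
// Hypothetical sketch of the input checks behind the OK button; not JDSA's source.
final class InputValidator {
    private static final String IUPAC_CODES = "ACGTMRWSYKVHDBN";

    // Non-empty and made only of IUPAC codes (accepting lower case is an assumption).
    static boolean isValidSequence(String s) {
        if (s == null || s.isEmpty()) return false;
        for (char c : s.toUpperCase().toCharArray()) {
            if (IUPAC_CODES.indexOf(c) < 0) return false;
        }
        return true;
    }

    // Mismatch and distance fields: a non-empty run of digits (non-negative whole number).
    static boolean isValidCount(String s) {
        if (s == null || s.isEmpty()) return false;
        for (char c : s.toCharArray()) {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // OK should be enabled only when the sequence and every numeric field are valid.
    static boolean okEnabled(String sequence, String... numericFields) {
        if (!isValidSequence(sequence)) return false;
        for (String field : numericFields) {
            if (!isValidCount(field)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(okEnabled("TCAKTY", "1"));   // true
        System.out.println(okEnabled("TCAXTY", "1"));   // false: X is not an IUPAC code
        System.out.println(okEnabled("TCAKTY", "one")); // false: letter where a number is expected
    }
}
```

In a Swing interface, a check like okEnabled would typically be called from a javax.swing.event.DocumentListener attached to each text field. Its result would then be passed to the OK button's setEnabled method.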
Parsing and Filtering:
Several options are available to cut down on unwanted results and to maximize the information returned, so that the results are more meaningful. These options are in the lower-right corner, just above the Cancel/Proceed buttons.

Parsing: This is an attempt to place the results of a genomic search in their genomic context. If the "Parse the results?" checkbox is checked, the results will look like this:
1. ggatggattgatttgcctattgcatttata
[C]SPAC7D4 {5646}:In ORF: SPAC7D4.12c; Start of exon <-- {1373 bp} SEQUENCE FOUND complement strand {906bp} --> end of exon

Both example results (this one and the extragenic result numbered 6) show how the output is formatted. Each result lists the following:
- the sequence;
- the strand (complement-strand results are marked with [C]);
- the file name (here the pombe file SPAC7D4);
- the nucleotide number (5646);
- where the sequence lies in the genome: in an ORF, extragenic, intronic, or some basic combination of these.

It also lists how far the sequence is from the surrounding elements. In the extragenic example, the sequence is 649bp from gene SPAC7D4.15c and 355bp from gene SPAC7D4.08.

Parse Filter: set to No Parse Filter by default, but can be changed to extragenic only, in ORF only, in intron only, or extragenic/in intron.

Strand Filter: set to No Strand Filter by default, but can be changed to Forward Strand Only (especially useful for custom searches, see below) or Complement Strand Only.
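The two filter menus can be modeled as enums applied to each parsed result. In this sketch, the Hit class, its fields and the enum constant names are hypothetical. For simplicity each hit has a single location, whereas real parsed results can combine locations. The example data reuse the two parsed results shown on this page. The extragenic result carries no [C] prefix, so it is assumed to be on the forward strand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Parse Filter and Strand Filter options; not JDSA's source.
final class ResultFilters {
    // Simplification (assumption): one location per hit.
    enum Location { EXTRAGENIC, IN_ORF, IN_INTRON }

    enum ParseFilter { NO_PARSE_FILTER, EXTRAGENIC_ONLY, IN_ORF_ONLY, IN_INTRON_ONLY, EXTRAGENIC_OR_INTRON }

    enum StrandFilter { NO_STRAND_FILTER, FORWARD_ONLY, COMPLEMENT_ONLY }

    // Hypothetical parsed result: file name, nucleotide number, strand, location.
    static final class Hit {
        final String file;
        final int position;
        final boolean complement; // true for results marked [C]
        final Location location;

        Hit(String file, int position, boolean complement, Location location) {
            this.file = file;
            this.position = position;
            this.complement = complement;
            this.location = location;
        }
    }

    static boolean passes(Hit h, ParseFilter pf, StrandFilter sf) {
        boolean locationOk;
        switch (pf) {
            case EXTRAGENIC_ONLY:      locationOk = h.location == Location.EXTRAGENIC; break;
            case IN_ORF_ONLY:          locationOk = h.location == Location.IN_ORF; break;
            case IN_INTRON_ONLY:       locationOk = h.location == Location.IN_INTRON; break;
            case EXTRAGENIC_OR_INTRON: locationOk = h.location == Location.EXTRAGENIC
                                               || h.location == Location.IN_INTRON; break;
            default:                   locationOk = true; // NO_PARSE_FILTER keeps everything
        }
        boolean strandOk = sf == StrandFilter.NO_STRAND_FILTER
                || (sf == StrandFilter.FORWARD_ONLY && !h.complement)
                || (sf == StrandFilter.COMPLEMENT_ONLY && h.complement);
        return locationOk && strandOk;
    }

    static List<Hit> filter(List<Hit> hits, ParseFilter pf, StrandFilter sf) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (passes(h, pf, sf)) kept.add(h);
        }
        return kept;
    }

    // The two parsed results shown on this page.
    static List<Hit> exampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("SPAC7D4", 5646, true, Location.IN_ORF));
        hits.add(new Hit("SPAC7D4", 13918, false, Location.EXTRAGENIC));
        return hits;
    }

    public static void main(String[] args) {
        for (Hit h : filter(exampleHits(), ParseFilter.EXTRAGENIC_ONLY, StrandFilter.NO_STRAND_FILTER)) {
            System.out.println(h.file + " {" + h.position + "}"); // SPAC7D4 {13918}
        }
    }
}
```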
Starting the search: After you have entered all of the information, click Proceed. That window will disappear and a small window will appear saying "Click here to START". Click it when you are ready to begin. The program will search for your query and return the results when it is done.

Three types of files are allowed as valid inputs (see Input Source Parameters below).

The speed of this program depends on several factors. Because it is a degenerate search, it may have to do the functional equivalent of searching the genome several times to find what you are asking for, which takes time. The program also downloads the genome files from NCBI as it needs them, so Internet traffic and the load on NCBI will affect performance. Under heavy traffic, a complex search of the Drosophila genome using the Internet as the source has taken up to an hour to run. A status bar shows progress, but be forewarned.
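To see why a degenerate search can cost the equivalent of several genome passes, count the concrete sequences a pattern stands for. That count is the product, over all positions, of the number of nucleotides each IUPAC code allows. The sketch below counts and explicitly expands a pattern. PatternExpansion is a hypothetical helper, not JDSA's implementation. It illustrates the cost of searching each permutation separately (the approach described under positional degeneracy), not how JDSA actually scans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how many concrete sequences a pattern represents.
final class PatternExpansion {
    private static final String CODES = "ACGTMRWSYKVHDBN";
    private static final String[] BASES = {
        "A", "C", "G", "T", "AC", "AG", "AT", "CG", "CT", "GT",
        "ACG", "ACT", "AGT", "CGT", "ACGT"
    };

    private static String basesFor(char code) {
        int i = CODES.indexOf(Character.toUpperCase(code));
        if (i < 0) throw new IllegalArgumentException("Not an IUPAC code: " + code);
        return BASES[i];
    }

    // Product of the number of allowed nucleotides at each position.
    static long countPermutations(String pattern) {
        long n = 1;
        for (char c : pattern.toCharArray()) n *= basesFor(c).length();
        return n;
    }

    // Every concrete sequence matched by the pattern, built position by position.
    static List<String> expand(String pattern) {
        List<String> partial = new ArrayList<>();
        partial.add("");
        for (char c : pattern.toCharArray()) {
            List<String> next = new ArrayList<>();
            for (String prefix : partial) {
                for (char b : basesFor(c).toCharArray()) next.add(prefix + b);
            }
            partial = next;
        }
        return partial;
    }

    public static void main(String[] args) {
        System.out.println(countPermutations("TCAKTY"));     // 4
        System.out.println(expand("TCAKTY"));                // [TCAGTC, TCAGTT, TCATTC, TCATTT]
        System.out.println(countPermutations("RGWYG"));      // 8
        System.out.println(countPermutations("NNNNNNNNNN")); // 1048576
    }
}
```

TCAKTY expands to only 4 sequences, but each N multiplies the count by 4. A pattern of ten Ns therefore already stands for over a million sequences, and allowing mismatches enlarges the set further.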
Example (TATA box 10 to 30 nucleotides upstream of an initiator):
- New Search, 2 DNA elements.
- Element #1: TATAAA (the TATA box consensus), with any desired maximum allowable mismatch (see group degeneracy).
- Element #2: TCAKTY (the Drosophila initiator sequence), with the desired maximum allowable mismatch.
- Element #2: maximum distance from element #1 of 30.
- Element #2: minimum distance from element #1 of 10.
If the distance between two DNA elements is fixed, simply enter the same number for both the maximum and minimum distance.
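The two-element example above can be sketched as follows. Distances count the nucleotides between the end of element #1 and the start of element #2, as described under spacing degeneracy. SpacedSearch is hypothetical. It scans the forward strand only and checks each allowed gap explicitly, which keeps it simple rather than fast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-element spacing search (forward strand only).
// Distances count the nucleotides BETWEEN the end of element A and the start of B.
final class SpacedSearch {
    private static final String CODES = "ACGTMRWSYKVHDBN";
    private static final String[] BASES = {
        "A", "C", "G", "T", "AC", "AG", "AT", "CG", "CT", "GT",
        "ACG", "ACT", "AGT", "CGT", "ACGT"
    };

    // True if pattern matches seq at start with at most maxMismatch mismatches.
    private static boolean matchesAt(String seq, int start, String pattern, int maxMismatch) {
        int mismatches = 0;
        for (int i = 0; i < pattern.length(); i++) {
            int code = CODES.indexOf(Character.toUpperCase(pattern.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an IUPAC code: " + pattern.charAt(i));
            }
            if (BASES[code].indexOf(seq.charAt(start + i)) < 0) {
                mismatches++;
                if (mismatches > maxMismatch) return false;
            }
        }
        return true;
    }

    // Returns {startOfA, startOfB} pairs (0-based) with minGap <= gap <= maxGap.
    static List<int[]> find(String sequence, String a, int mismatchA,
                            String b, int mismatchB, int minGap, int maxGap) {
        String seq = sequence.toUpperCase();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i + a.length() <= seq.length(); i++) {
            if (!matchesAt(seq, i, a, mismatchA)) continue;
            int endA = i + a.length(); // first position after element A
            for (int gap = minGap; gap <= maxGap; gap++) {
                int j = endA + gap;    // candidate start of element B
                if (j + b.length() > seq.length()) break;
                if (matchesAt(seq, j, b, mismatchB)) pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String tata = "TATAAA";
        // Made-up test sequence: TATAAA, 12 spacer nucleotides, then TCAGTT (matches TCAKTY).
        String seq = "CC" + tata + "GGGCCCGGGCCC" + "TCAGTT" + "AA";
        for (int[] p : find(seq, tata, 0, "TCAKTY", 0, 10, 30)) {
            int gap = p[1] - (p[0] + tata.length());
            System.out.println("TATA at " + p[0] + ", Inr at " + p[1] + ", gap " + gap);
        }
        // prints: TATA at 2, Inr at 20, gap 12
    }
}
```

Setting minGap equal to maxGap gives the fixed-distance case.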
Instructions:
Starting the program: On a Win 9x/XP machine, open a DOS prompt in the directory containing JDSAv01.jar and type:
java -jar JDSAv01.jar
On a Mac, double-click the JDSA icon.
Example of a parsed extragenic result:
6. tataaactgcatatttatactccttttccaatt
SPAC7D4 {13918}:Extragenic: Previous gene:SPAC7D4.15c [C]<-- {649 bp} SEQUENCE FOUND {355bp} --> SPAC7D4.08 [C]
Input Source Parameters/Selecting a Source File:
The source file lists GI numbers, with only one entry per line. The GI number must come before any description you wish to provide. Descriptions are optional, but if you include one, it must be in double quotation marks. This way, when JDSA returns the results of the search, each result can be listed along with its title. For example:
23095176 "AE003474"
23092840 "AE003475"
2894275 "Pombe cosmid c6B1"
6689257
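Here is a minimal sketch of reading the GI list format above: a GI number, then an optional description in double quotes, one entry per line. GiListReader and Entry are hypothetical names. Skipping blank lines and rejecting any other line format are assumptions, not documented JDSA behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for a GI list: one entry per line, GI number first,
// optional description in double quotes. Skipping blank lines and rejecting
// other formats are assumptions, not documented JDSA behavior.
final class GiListReader {
    static final class Entry {
        final long gi;
        final String description; // null when the line has no description

        Entry(long gi, String description) {
            this.gi = gi;
            this.description = description;
        }
    }

    // Digits, then optionally whitespace and a "quoted description".
    private static final Pattern LINE =
            Pattern.compile("\\s*(\\d+)(?:\\s+\"([^\"]*)\")?\\s*");

    static List<Entry> parse(BufferedReader in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            if (line.trim().isEmpty()) continue; // assumption: ignore blank lines
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Line " + lineNo + " is not a valid GI entry: " + line);
            }
            entries.add(new Entry(Long.parseLong(m.group(1)), m.group(2)));
        }
        return entries;
    }

    // Convenience overload for in-memory text.
    static List<Entry> parse(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String example = "23095176 \"AE003474\"\n"
                + "23092840 \"AE003475\"\n"
                + "2894275 \"Pombe cosmid c6B1\"\n"
                + "6689257\n";
        for (Entry e : parse(example)) {
            System.out.println(e.gi + " -> " + (e.description == null ? "(no description)" : e.description));
        }
    }
}
```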
Contact info:
This project is under intermittent development, and any input is welcome. Please send feature suggestions (additions or removals), bug reports and offers of support to: tboulay at biomail dot ucsd dot edu.
Last updated: October 2004