70 - SOL:SOURCES SOUGHT FOR AUTOMATED BIOTECHNOLOGY SEQUENCE (02/24/97)

COMMERCE BUSINESS DAILY ISSUE OF FEBRUARY 24, 1997 PSA#1788

Patent & Trademark Office, Office of Procurement, Box 6, Washington, D.C. 20231

70 -- SOURCES SOUGHT FOR AUTOMATED BIOTECHNOLOGY SEQUENCE SEARCH (ABSS) SYSTEM POC Henrietta V. Brox, Contract Specialist, 703-305-8016, Michael J. Anastasio, Jr., Contracting Officer. The U.S. Patent and Trademark Office (USPTO) is considering alternatives to enhance its capabilities for automated molecular sequence searching in support of USPTO examiners. The requirement for improving and streamlining the examination of sequence patent applications has been identified as a critical requirement by the USPTO. The USPTO has concluded that there is a need to evaluate alternatives for responding to these issues. In order to handle the present workload and future increases in biotechnology applications, the USPTO is considering enhancements to their existing hardware/software platform or replacement of their existing hardware/software platform with additional tools and computer processing power. Currently, the USPTO computing platform, the Automated Biotechnology Sequence Search (ABSS) system, consists of Sun Microsystems workstations which access RNA/DNA and protein sequence databases resident on the Sun Microsystems database server, supplemented by two (2) MasPar massively-parallel computing platforms. IBM compatible computers and Macintosh computers are used for input data preparation and administrative functions. The system is configured as a distributed processing system. The ABSS System has one (1) major application function: automated sequence search processing. This major application is further divided into three (3) functional subsystems: Computer Readable Forms (CRF), Sequence Search subsystem, and Sequence Dissemination subsystem. However, from a hardware configuration viewpoint, the ABSS may be viewed as two (2) separate subsystems: the CRF and Sequence Search subsystems. The CRF subsystem has Sun SPARC workstations that are attached to a LAN. 
The CRF is responsible for the pre-processing of all computer readable format data received from patent applications. Data from patent applicants may be submitted on magnetic media in MS-DOS format, Macintosh format, or UNIX format. After a standalone personal computer has been used to verify that the incoming data does not contain a virus, the data is transferred to a LAN-connected PC for validation processing. Validation processing includes steps to convert the data, edit checks to ensure that the data is compliant with CRF standards, correction of obvious format errors, and reformatting of the data into standard IntelliGenetics (IG) sequence search format. The Sequence Search subsystem has two (2) Sun SPARC servers acting as database servers for performing sequence searches. Sun SPARC workstations are connected to the database servers with an Ethernet LAN and are used to enter sequence search information. In addition, the MasPar system provides a searching platform for extensive and sensitive searches that would take much longer to process on the Sun workstations. Software used to perform the sequence search processing is the IG Suite (primarily the FASTDB program) and MPSRCH developed by IG, Inc. (now the Oxford Molecular Group), and the Genetics Computer Group (GCG) Wisconsin package. 
The MPSRCH software incorporates provisions for: (1) support of affine gap penalties to maximize sensitivity; (2) support of standard nucleotide and amino acid codes as well as IUPAC ambiguity codes; (3) reporting of alignment, prediction values, and annotations for each result, and reporting of percent matching residues for each result in the list of top scores and in individual alignment output for each match; (4) support of standard and user-defined matrices such as PAM, BLOSUM, etc.; (5) support of automated batch submission of multiple searches using defined parameters; (6) support of a graphical user interface (GUI) that automates the sequence selection and searching process, including performing multiple database searches of both commercial and/or in-house (non-commercial) databases; (7) support of searching sequence ranges and oligomer searches; (8) support of reverse translation in all six (6) frames (for comparing DNA queries with protein databanks and for comparing protein queries with DNA databanks); (9) support of search procedures for both DNA strands; and (10) support of creation of output which can be further processed analytically. Commercially available databases such as Genbank, EMBL, PIR, etc. are received from IG, Inc. and GCG on a regular basis as part of their maintenance contract with USPTO. The data is used to update the commercial databases on the Sun SPARC servers. The Sequence Dissemination subsystem processes sequence data in the Issued databases and creates output on a magnetic tape which is then distributed to Genbank. Examiners can have the sequences searched against the commercially available databases or the in-house pending sequence databases. The choice of which software package is used is based upon the nature of each application and its claimed sequences. After processing on the Sun workstations or on the MasPar, the search results files are downloaded to disk or printed and forwarded to the examiner. 
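As an illustration of the affine gap penalties listed among the MPSRCH provisions above, the following minimal sketch computes a best local alignment score in which opening a gap costs more than extending one. It is not MPSRCH code; the function name and the match, mismatch, gap_open, and gap_extend values are hypothetical and chosen only for the example, and a flat match/mismatch score stands in for a PAM or BLOSUM matrix.

```python
def affine_local_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b using affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values are illustrative assumptions, not MPSRCH defaults.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # h: best score of an alignment ending at (i, j)
    # e: best score ending at (i, j) with a gap that skips a residue of b
    # f: best score ending at (i, j) with a gap that skips a residue of a
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap or extend an existing one.
            e[i][j] = max(h[i][j - 1] + gap_open, e[i][j - 1] + gap_extend)
            f[i][j] = max(h[i - 1][j] + gap_open, f[i - 1][j] + gap_extend)
            step = match if a[i - 1] == b[j - 1] else mismatch
            # The floor of 0 makes the alignment local rather than global.
            h[i][j] = max(0, h[i - 1][j - 1] + step, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    # The second sequence carries a two-residue insertion.
    print(affine_local_score("AAAATTTT", "AAAACCTTTT"))
```

Because extending a gap is cheaper than opening one, a single long insertion or deletion is penalized less than several scattered ones, which is the sensitivity benefit the provision refers to.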
Results files contain a predefined number of highest scoring sequence alignments and associated annotation information. The examiner opens the files on his/her workstation and examines the results. Results returned from database searching often identify literature references pertinent to sequences matched. The examiner can review files on his/her workstation, isolating pertinent material for printout and storage for the record for that particular application. The USPTO is interested in obtaining information technology desktop tools which can improve the sequence examination process by facilitating analysis (and utilization) of search results. Current sequence search tools produce voluminous textual results which require considerable examiner analysis to manually isolate and summarize the results most pertinent to a given application. Specifically, the USPTO is interested in emerging or existing software tools which can aid examiners in improving the quality and enhancing the efficiency of sequence examinations. Such tools could include pre-processing tools for expediting the setup of sequence searches, search engines which meet or exceed the performance of existing search system capabilities, and post-processing tools which could help examiners in reviewing raw sequence search results to best meet patentability decision-making needs. Of special interest would be the ability to output the results into a database format such that one could sort by different fields. Potentially useful fields would include author/inventor, gene, locus, date, best score, etc. Also of interest would be the ability to merge various results files into one file containing non-redundant data which can then be further incorporated into a database. The ability to hyperlink search output directly to full-text online documents would also be of special interest for future application. 
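The post-processing capabilities described above (merging several results files into one non-redundant set and sorting by a chosen field) can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any existing ABSS tool: the record layout, the field names ("locus", "gene", "author", "date", "score"), the use of the locus as the redundancy key, and both function names are hypothetical.

```python
def merge_hits(*result_sets, key="locus"):
    """Merge several lists of hit records into one non-redundant list.

    Two hits are treated as redundant when they share the same value for
    `key` (an assumption); the higher-scoring copy is kept.
    """
    best = {}
    for hits in result_sets:
        for hit in hits:
            k = hit[key]
            if k not in best or hit["score"] > best[k]["score"]:
                best[k] = hit
    return list(best.values())


def sort_hits(hits, field, descending=False):
    """Return the hits ordered by any one field, e.g. 'date' or 'score'."""
    return sorted(hits, key=lambda h: h[field], reverse=descending)


if __name__ == "__main__":
    # Hypothetical records standing in for parsed search results.
    run_a = [
        {"locus": "LOC1", "gene": "geneA", "author": "Smith", "date": "1995-03-01", "score": 120},
        {"locus": "LOC2", "gene": "geneB", "author": "Jones", "date": "1996-07-15", "score": 95},
    ]
    run_b = [
        {"locus": "LOC1", "gene": "geneA", "author": "Smith", "date": "1995-03-01", "score": 140},
        {"locus": "LOC3", "gene": "geneC", "author": "Adams", "date": "1994-11-30", "score": 80},
    ]
    merged = merge_hits(run_a, run_b)
    for hit in sort_hits(merged, "score", descending=True):
        print(hit["locus"], hit["score"], hit["date"])
```

Dates are stored here as ISO-style strings so that a plain sort orders them chronologically; a real tool would first need to parse whatever date format the results files use.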
Parties interested in responding to this announcement should send as full and specific a description of proposed hardware and software tools as possible, including a description of the potential and methods for integrating the proposed tools into the existing ABSS platform, as described above. For example, proposed software should be fully described functionally and should include technical specifications such as programming languages, hardware platform necessary, size, level of support available, costs, and installed base. Interested sources should send their responses to Henrietta V. Brox within fifteen (15) days of this publication. (0051)

Loren Data Corp. http://www.ld.com (SYN# 0350 19970224\70-0008.SOL)
