Plasmid replication in thermophilic bacteria Tom Hargreaves, February 2010

Plasmid replication in thermophilic bacteria
Tom Hargreaves
200374823
Submitted in part fulfilment for the requirements of the degree of Natural Sciences
March 2010
Supervisor: Dr. C. D. Thomas
Faculty of Biological Sciences Undergraduate School
University of Leeds, Leeds LS2 9JT

Abstract

pSTK1 is a 1883 bp cryptic plasmid found in Bacillus stearothermophilus. A putative replication-initiation (Rep) protein has been identified in an open reading frame and expressed in E. coli. Previous topoisomerase assays have cast doubt on how effectively this protein binds pSTK1, which prompted this study. SELEX was used to select high-affinity DNA sequences for both the pSTK1 Rep protein and the well-characterized RepD from pC221. A YF mutant, which results in an inability to nick without harming DNA binding, was used. Electrophoretic mobility shift assays were performed to confirm binding of the selected sequences. These assays were inconclusive, and the DNA sequences produced were hard to interpret.

Abbreviations used

aa: amino acid
Absx: absorbance at x nm
bp: base pair
ds: double-stranded
dso: double-strand origin
EtBr: ethidium bromide
ICR: inverted complementary repeat
MW: molecular weight
ODx: optical density (absorbance) at x nm
ORF: open reading frame
ori: origin of replication
SELEX: Systematic Evolution of Ligands by Exponential Enrichment
sso: single-strand origin

Acknowledgements

Thanks are due to everyone who helped to make this project possible. It wasn't their fault.

Introduction and previous work

The pT181 family comprises high-copy-number plasmids found in Gram-positive bacteria (notably Staphylococcus aureus) whose rolling-circle replication mechanism has been extensively studied (Khan, 2005).
Each pT181-family plasmid encodes a Rep protein responsible for initiation of replication by nicking of the plasmid and recruitment of PcrA helicase.

pSTK1 (figure 2) is a 1883 bp cryptic plasmid found in Bacillus stearothermophilus. pSTK1 has an open reading frame that codes for a 269-aa protein fragment with sequence similarity to the active site of pT181-family Rep proteins (table 1). When expressed in E. coli, this protein was insoluble, but extending the ORF at the N-terminal end resulted in a 343-aa soluble protein, hereafter called BstRep.

The interest in this particular plasmid comes from its origin in a thermophile. No Rep protein structures are currently known. Any pSTK1 Rep protein must be sufficiently robust to function at the high temperatures favoured by the host organism (60-65 °C), which might make it amenable to crystallization for subsequent X-ray diffraction experiments.

In 1993, Narumi et al. constructed a shuttle vector, pSTE33, from pSTK1 and the common E. coli vector pUC19 (Narumi et al., 1993), with the intention that it should be capable of replication in both B. stearothermophilus and E. coli. However, a copy of pSTE33 acquired from RIKEN was found not to be viable in B. stearothermophilus, and furthermore showed slight deviations from the published pSTK1 sequence, including a missing GT adjacent to the nick site. Nonetheless, BstRep was found to nick both the published sequence and the truncated pSTE33 sequence (Thomas, 2009).

Protein      Plasmid        Host                    Sequence
RepD,E,I,N   pC221 et al.   S. aureus               TKYFGVRDSDRFIRIYNKKQE
RepC         pT181          S. aureus               TKYFGVRDSNRFIRIYNKKQE
RepJ         pUB112         S. aureus               TKYFGSRDSNRFIRIYNKKKE
BstRep       pSTK1          B. stearothermophilus   TLYFGAPSSDIQVRFYEKNVQ
Identity                                            T.YFG...S....R.Y.K...

Table 1: Sequences of the active sites of some pT181-family Rep proteins, with BstRep for comparison.
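The "Identity" row of table 1 can be reproduced mechanically from the four motifs. The following is a minimal Python sketch, not part of the original analysis; the function name identity_line and the dictionary ACTIVE_SITES are illustrative names introduced here, and the motifs are assumed to be pre-aligned exactly as printed in the table.

```python
# Active-site motifs exactly as listed in Table 1 (assumed pre-aligned, no gaps).
ACTIVE_SITES = {
    "RepD,E,I,N": "TKYFGVRDSDRFIRIYNKKQE",
    "RepC": "TKYFGVRDSNRFIRIYNKKQE",
    "RepJ": "TKYFGSRDSNRFIRIYNKKKE",
    "BstRep": "TLYFGAPSSDIQVRFYEKNVQ",
}


def identity_line(seqs, gap="."):
    """Return the residue at each column where all sequences agree, else `gap`."""
    seqs = list(seqs)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned and of equal length")
    # zip(*seqs) walks the alignment column by column.
    return "".join(col[0] if len(set(col)) == 1 else gap for col in zip(*seqs))


if __name__ == "__main__":
    line = identity_line(ACTIVE_SITES.values())
    print(line)  # T.YFG...S....R.Y.K...
    print(sum(c != "." for c in line), "of", len(line), "positions identical")
```

On these inputs 8 of the 21 positions are identical across all four proteins.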
Rep proteins nick the circular dsDNA plasmid and recruit the host-encoded PcrA helicase. The origin of replication (ori) contains the nick site, which is flanked by a region of inverted complementary repeats known as ICR II. ICR II is highly conserved among pT181-family plasmids. Specificity is determined by a nearby ICR region called ICR III, to which the Rep protein binds prior to nicking at ICR II. Each plasmid type contains a different ICR III (and a gene for a Rep protein that binds to it), but all of them have another ICR region immediately adjacent to ICR II. That ICR III is the binding site for pT181-family Rep proteins has been confirmed by mutagenesis (Thomas et al., 1995).

Unlike these plasmids, pSTK1 does not have an ICR region immediately adjacent to the nick site (figure 1). It is not immediately obvious what the expected binding site of BstRep would be; while pSTK1 does not have a good adjacent ICR region, there is a widely spaced ICR region somewhat further away that is a potential candidate for Rep binding. It is also unclear whether binding to pSTK1 occurs with any reasonable efficiency, given that pSTE33 has not been successfully reintroduced into B. stearothermophilus.

Thus, this study aims to determine the binding site of BstRep. SELEX (Stoltenburg et al., 2007) was used to find dsDNA aptamers that BstRep will bind to. A Y191F [1] BstRep mutant was used. In RepD, such a mutation prevents nicking at the active site Y191 but not (non-covalent) DNA binding. It is hoped that a similar prevention of nicking in BstRep would allow useful aptamer selection.

(a) Annotated regions, left to right: ICR I; nick site (ICR II); binding site (ICR III).
    AATTACTTACAAAATAAGGATTTAGACAATTTTTCTAAAACCGGCTACTCT'AATAGCCGGTTAAGTGGTAATTTTTTTACCACCCCTCAACCAGAATT
(b) Annotated regions, left to right: nick site; possible binding site?
    AAATTGGAAAAAATTCAGAGTCTACCCCCGTTGTGT'AACACGGGGGTAGAAAGTACAAGTCAAAGTGGTCTGAAGCCTTGTGTAGACTGGCTTCAAGT

Figure 1: Nick site and surrounding ICR regions for (a) pC221 oriD; (b) putative pSTK1 ori. The nick position is marked with '; the arrows marking each inverted repeat are omitted.

[1] Residue 191 refers not to the location in BstRep (or RepD!) but to the analogous location in other Rep proteins. See Appendix A.

Purpose      5'                                                    3'
SELEX        AATTCTAATACGACTCACTATAGGGAGAAGGGCACGGCACGTAGGCAACTA   TTATGAGTGCACGGGCGGCA
Sequencing   GATGTGCTCCATGGCGATTAAGTTGG                            TCATTAATGCAGCTGGCACG

Table 2: PCR primers used.

[Map labels: orf3 (rep); sso; (rep?); dso; pSTK1, 1883 bp]

Figure 2: Map of pSTK1. Open reading frames are represented by arrows. The putative double-strand origin (dso) is the location of the BstRep nick site.

Figure 3: Overview of experimental procedures

Materials and methods

Experiments were carried out as shown in figure 3. Buffers used were as follows:

• “K0”: 50 mM Tris (pH 7.5).
• “K200”: 50 mM Tris, 200 mM KCl (pH 7.5).
• “K600”: 50 mM Tris, 600 mM KCl (pH 7.5), filtered under vacuum with 1 μm filter.
• “K1M”: 50 mM Tris, 1 M KCl (pH 7.5).
• TAE (50x): 242.2 g/l Tris base, 50 ml/l glacial acetic acid, 18.6 g/l disodium EDTA, pH 8.3. Filtered under vacuum with 0.2 μm filter.
• Te buffer: 10 mM Tris, 1 mM EDTA, pH 8.0.
• NEBuffer 4: 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol (New England Biolabs).

LB was used as a growth medium for agar plates: 10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, pH 7.5.

2YT was used as a growth medium for cell culture solutions: 16 g/l tryptone, 10 g/l yeast extract, 5 g/l NaCl, pH 7.5.

1. Binding of protein to magnetic beads.
2 ml of 40 mg/ml 1 μm-diameter Talon beads (Invitrogen) were used.
The beads were first washed with buffer; a magnet was placed against the tube and the supernatant removed. The protein was washed twice with 1 ml of K600 buffer to remove preservatives, then added to the beads. The following steps were repeated three times:
   1. The tube was rolled gently for 20 minutes.
   2. The magnet was placed against the tube, the unbound supernatant was collected and centrifuged for 2 minutes, and the Abs260 and Abs280 were checked.
   3. 300 μl of K600 buffer was added (200 μl in the final iteration).

2. Extraction of protein on heparin sepharose column.
The column volume was 0.3 ml. The salt concentration of the protein solution was adjusted to 200 mM, and the protein was applied to the column. The unbound fraction was collected and its Abs280 measured. The column was washed twice with 2 ml K200; the wash fractions were collected and their absorbance measured. The protein was eluted with 2 ml K600.

3. DNA SELEX.
15 rounds of SELEX were carried out. A pool of synthetic oligonucleotides with a random 50 bp region was used (the same pool was used for all experiments). The sequences of the flanking regions were the same as those of the PCR primers (see Table 2). SELEX services were kindly provided by Dr. David Bunka.

4. Preparation of competent E. coli cells.
A single colony of Escherichia coli strain DH5α was picked and incubated in 5 ml 2YT broth overnight at 37 °C. A ~1 ml aliquot corresponding to an OD600 of 0.05 was removed, placed in a flask containing 50 ml 2YT broth, and incubated at 37 °C. Growth was monitored at 20-minute intervals until the OD600 reached 0.5. The doubling time was calculated. The cell culture solution was cooled to 4 °C and centrifuged at 3000 rpm for 10 minutes, and the supernatant was discarded. The pellet was resuspended in 25 ml sterile 100 mM CaCl2, kept at 4 °C for 30 minutes, then centrifuged at 3000 rpm for 10 minutes. The supernatant was discarded.
The pellet was resuspended in 5 ml sterile 100 mM CaCl2 and stored overnight at 4 °C.

5. Ligation of DNA into plasmid vector.
pGEM T-Easy (Promega) was used as a vector. Four ligations were carried out as per the manufacturer's recommendations: a positive control with the provided test insert, a negative control with no insert, and two reaction mixtures with varying amounts of sample DNA.

6. Transformation of E. coli.
The four 10 μl ligation mixes and a fifth control mix containing 2 ng supercoiled plasmid DNA from pCER19 were placed in microcentrifuge tubes. 10 μl Te buffer and 200 μl of competent cells were added, and the tubes were incubated on ice for 30 minutes. The cells were heat shocked for 2 minutes at 42 °C. 500 μl of 2YT broth was added, and the tubes were inverted once and then incubated at 37 °C for 90 minutes. 550 μl was carefully removed from the top of the liquid, and the remaining liquid was agitated. The cells were transferred to agar plates containing 50 μg/ml ampicillin and incubated overnight at 37 °C.

7. Plasmid miniprep.
1.5 ml of bacterial culture was collected in a microcentrifuge tube. Plasmid DNA was prepared using a Qiagen QIAprep Spin Miniprep kit as per the manufacturer's instructions. Two modifications to the protocol were made. Firstly, after the wash buffer (the ethanol-containing “buffer PE”) was removed by centrifugation, the spin columns were placed under a fan for 3 minutes to evaporate any residual wash buffer. Secondly, DNA was eluted in a final volume of 100 μl instead of 50 μl.

8. Agarose gel electrophoresis.
A solution of 1.2% agarose in TAE buffer was prepared (approx. 100 ml per gel) and heated until fully dissolved. Ethidium bromide was added to a final concentration of 1 μg/ml (from a 10 mg/ml stock). The agarose was allowed to cool to 55 °C, then poured and allowed to set.
After samples were loaded, 100 V DC (current limited to 100 mA) was applied across the gel for 80-90 minutes. The gel was removed and photographed under UV light.

9. DNA sequencing.
30 μl of DNA at a concentration of approximately 50 ng/μl was used. Primers used were from oriDF and CER0372R; sequences are in Table 2. DNA sequencing services were provided by the Faculty of Biological Sciences, University of Leeds, using an ABI 3130xl capillary sequencer.

10. Restriction digest with PvuII.
5 μl of each DNA sample was incubated with 0.155 μl PvuII-HF (New England Biolabs) in NEBuffer 4 (total volume 10 μl) for 60 minutes at 37 °C, followed by inactivation of the enzyme by incubation at 80 °C for 20 minutes.

11. Electrophoretic mobility shift assay (EMSA).
10 μl of each of the PvuII restriction digest products was incubated with 10 μl protein or K0 (control) for 60 minutes at 60 °C (BstRep) or 37 °C (RepD). The samples were run on an agarose gel without EtBr to avoid interference with bound protein during electrophoresis; the gel was subsequently immersed in a 1 μg/ml EtBr solution for 30 minutes before being photographed.

12. Quantitation of DNA.
A 5 μl aliquot of DNA sample was run on an agarose gel along with reference samples containing 25 ng, 50 ng, 100 ng and 200 ng of supercoiled plasmid DNA. The relative intensities of the bands were compared quantitatively using ImageQuant software.

Results

BstRep is 343 amino acids long, but was prepared with a 20-aa N-terminal His6 tag for a total molecular weight of 42.1 kDa (Appendix A). The stock BstRep was at a concentration of 0.64 mg/ml, stored in 1 mM Tris, 1 mM EDTA, 10% ethanediol, 500 mM KCl at pH 7.5. The protein was bound to beads. The absorbance of the unbound fractions indicated that very little protein had bound (data not shown).
This was surmised to be due to the competitive chelation of the Co2+ ions in the beads by the EDTA present in the buffer. Thus the unbound fractions were pooled and the EDTA was removed by elution on a heparin sepharose column. The protein was again bound to the beads. The beads were submitted for SELEX.

The SELEX procedure, as performed by Dr. David Bunka (Bunka and Stockley, 2006), was as follows:
1. Immobilize the protein of interest on beads.
2. Add solution containing random oligonucleotide pool.
3. Repeat multiple times:
   1. Add protein.
   2. Allow DNA to bind to protein immobilized on beads.
   3. Remove unbound solution.
   4. Perform PCR on remaining oligonucleotides.

The PCR products after 15 rounds of SELEX were run (also by Dr. Bunka) on a polyacrylamide gel, and the ~120 bp products extracted. The result is a selection of “aptamers”: oligonucleotides that exhibit unusually high affinity for the protein under consideration.

The PCR product after 15 rounds of SELEX contained approximately 19 ng of DNA in a 5 μl aliquot, for a concentration of 3.75 ng/μl. This was ligated into the plasmid vector pGEM T-Easy, which was used to transform E. coli cells. Two ligations were carried out, containing 1 μl and 3 μl of DNA for an approximate ratio of DNA to vector of 1:1 and 3:1 respectively. 24 colonies were picked and grown; the first 12 (A-L) were from the 1:1 ligation and the other 12 (M-X) were from the 3:1 ligation. Plasmid DNA was extracted.

Figure 4: Restriction digest of pGEM vector with inserts from BstRep SELEX.

Of the 24 plasmid DNA samples, restriction digest analysis with PvuII (figure 4) showed that 16 contained an insert (furthermore, an insert that lacked a PvuII site). These 16 were sequenced. The result was 16 pairs of DNA sequences, one from each primer. Having two versions of each sequence, one in each direction, allowed for confirmation of the fidelity of the sequencing.
Only one of the 16 sequences (“A”) had any significant dropouts or mismatches between versions, and it was nonetheless possible to deduce the correct sequence of A with reasonable certainty.

Figure 5: multiple sequence alignment of the random region of the 16 BstRep SELEX sequences (A-X) and their reverse complements (-rev suffix). Sequence length is in the right-hand column. The forward and reverse sequences are nearly disjoint, indicating that the primer direction is significant. The markedly skewed proportion of AC bases is also apparent.

Sequence analysis. The sequences were oriented so that the SELEX primer sequences were all facing in the same direction, and multiple alignment was performed using ClustalX (figure 5). No obvious consensus sequence is apparent from the sequence alignment; nonetheless, the sequences are decidedly non-random.

The possibility that BstRep's region of specificity is non-contiguous was tested by supposing that it encompasses exactly N bases in a contiguous M-base region. An exhaustive search of all possibilities was undertaken by assigning a score to each possibility P according to how well it fits with each sequence S. Two different scoring algorithms were used in an attempt to reduce artifacts.
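The size of this exhaustive search can be estimated with a short Python sketch. It assumes, following the enumeration visible in Listing 1 (Appendix B), that the first position of each M-base window is always significant, that the remaining N-1 significant positions are chosen from the other M-1 positions, and that each significant position takes one of four bases. The function name n_patterns is introduced here for illustration only.

```python
from math import comb


def n_patterns(n_sig, width):
    """Number of candidate patterns with n_sig significant bases in a width-base window.

    Assumption (read from Listing 1): position 0 is always significant, so only
    n_sig - 1 positions are chosen among the remaining width - 1; each
    significant position is one of A, C, G or T.
    """
    return comb(width - 1, n_sig - 1) * 4 ** n_sig


if __name__ == "__main__":
    print(n_patterns(8, 16))  # 421724160 candidate patterns for N=8, M=16
```

Each candidate is then compared with every sequence and its reverse complement at every offset, which goes some way towards explaining the long running time noted under Listing 1.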
(a) additive scoring           (b) exponential scoring
Sequence           Score       Sequence           Score
CACCACNNNCCNNNNN   104         CACCACNNNCCNNNNN   1888
CCNNCACNNNCCNANN   104         ACNCNNCCNNCNNANA   1880
CACNNNCAACCNNNNN   104         ACNCNACCNNCNNNNA   1872
ACACNNNCAANCNNNN   103         CCNNCACNNNCNNAAN   1856
CACNACNNACCNNNNN   103         ACNCCNCCNNCNNNNA   1848
CACCNNNNACCNNCNN   103         CNCCACNNNNCNCCNN   1824
ACNCNACCNNCNNNAN   103         CCNCCACNNNCCNNNN   1824
ACNCNACCNNCNNANN   103         CCNNCACNNNCCNANN   1808
CNCCNCCANNNNCCNN   103         ACNCNACCNNCNNANN   1808
CNNNCACCACNNNCNN   103         ACNCNACNNNCNNANA   1792
CACCNNCANANNNCNN   103         CACNCNNCCNNCNNAN   1792
CNCCANCANNNNCCNN   103         ACNCCACCNNNNNNNA   1792
ACACNNNCAACNNNNN   103         ACNCCACCNNCNNNNN   1776
CNCNACCNNCNNAANN   103         ACNCNANCNNCNNANA   1776
CNCCNNCANANNCCNN   103         ACNNNNCACNNACNNA   1768

Table 3: Top-matching consensus sequences for the forward-direction BstRep SELEX sequences, for N=8, M=16 (i.e. 8 significant bases are chosen out of 16). (a) additive scoring; (b) exponential scoring (see text for details of scoring algorithms). With 16 sequences, perfect scores would be 128 (16 × 8) and 4096 (16 × 2^8) respectively.

A possibility is scored as follows. For each S, P is matched with both S and its reverse complement at every position. Each matching base scores 1, up to a maximum of N for a perfect match. Let P(S) be the highest score obtained when P is matched with S. Then, for additive scoring, P scores Σ P(S), the sum of the highest scores over all sequences. For exponential scoring, P scores Σ 2^P(S). Additive scoring ranks reasonable overall matches highly, whereas exponential scoring prefers matches that are perfect for some sequences but that fare poorly on others.

The same consensus sequence is ranked first by both algorithms; however, a large number of unrelated sequences rank similarly, so this may be misleading. All of the top ten sequences contain only A and C.

Since AC-richness is the only clear trend, pSTK1 was searched for AC-rich (or GT-rich) regions. While there are such regions, there are none in the vicinity of the nick site.
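The two scoring schemes described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the code used for the results (that is the Perl/C program in Appendix B): the function names are introduced here, 'N' positions in a pattern are assumed never to score, and the three toy sequences are invented purely to exercise the code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_window_score(pattern, seq):
    """Highest count of matching significant bases over every offset of pattern on seq."""
    best = 0
    for start in range(len(seq)):
        window = seq[start:start + len(pattern)]  # may be shorter near the 3' end
        score = sum(1 for p, s in zip(pattern, window) if p != "N" and p == s)
        best = max(best, score)
    return best


def score_pattern(pattern, seqs):
    """Return (additive, exponential) scores: sum of P(S) and sum of 2**P(S)."""
    additive = exponential = 0
    for seq in seqs:
        p_s = max(best_window_score(pattern, seq),
                  best_window_score(pattern, reverse_complement(seq)))
        additive += p_s
        exponential += 2 ** p_s
    return additive, exponential


def perfect_scores(n_seqs, n_sig):
    """Scores a pattern would receive if it matched every sequence perfectly."""
    return n_seqs * n_sig, n_seqs * 2 ** n_sig


if __name__ == "__main__":
    # Invented toy data: a perfect forward match, a perfect reverse-strand
    # match, and a poly-G sequence that matches only partially.
    TOY_SEQS = ["TTCACCACGTACCTT", "AAGGTACGTGGTGAA", "GGGGGGGGGGGGGGG"]
    print(score_pattern("CACCACNNNCC", TOY_SEQS))  # (22, 576)
    print(perfect_scores(10, 8))                   # (80, 2560)
```

In the toy run, the two perfect matches contribute 2 × 256 of the exponential total of 576 but only 16 of the additive total of 22, which illustrates why exponential scoring favours patterns that fit a few sequences exactly.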
Figure 6: restriction map of pGEM T-Easy vector showing PvuII sites.

Restriction digests and gel shift assay. The restriction endonuclease PvuII cuts at CAG'CTG sites; the resulting fragments have blunt ends. pGEM T-Easy is 3015 bp long prior to insertion and has two PvuII sites positioned as shown in figure 6. Since the inserts used were about 120 bp, this resulted in fragments of ~2500 bp and ~570 bp, with the insert located within the 570 bp fragment.

An electrophoretic mobility shift assay (EMSA) was used to test binding of BstRep protein to the restriction fragments (figure 7). Such a large insert-containing fragment was desirable because the behaviour of the Rep proteins when binding small DNA fragments was unknown. 330 nM BstRep was used, with an ionic strength of ~45 mM. No shifts were observable; however, the lanes containing protein showed bands still in the well, indicating total immobility.

Figure 7: BstRep mobility shift assay. Each sample occupies two adjacent lanes, without and with protein added respectively.

RepD. The entire experiment was repeated using RepD instead of BstRep. The RepD used was wild-type with a 20-aa N-terminal tag, with a molecular weight of 39.6 kDa. EDTA extraction and SELEX were performed as before. Two ligations were carried out, but both used 3 μl of DNA because the concentration of the PCR product was substantially lower. 24 colonies were picked; the first 12 (A-L) were taken from the PCR product after 10 rounds of SELEX, and the other 12 (M-X) from the PCR product after the full 15 rounds as before [2].

17 colonies out of 24 showed an insert after digestion with PvuII (figure 8), and were sequenced. Only 10 out of 17 DNA sequences were viable.
Figure 9 shows the 10 sequences and their reverse complements aligned with ClustalX as before. Table 4 shows the results of a consensus sequence search.

Gel mobility shift assays (figure 10) did not show any observable shift; however, as before, samples with protein contained species that were effectively immobile in the gel, perhaps pointing to aggregation. Furthermore, control experiments (figure 10(d)) showed that no shift was observable even with a sequence that RepD is known to bind to (pCER19 plasmid containing the oriD sequence), and similarly with BstRep. Even without any DNA present, BstRep was immobile (and visible!) in the gel, thus casting doubt over all previous mobility shift assays.

[2] It seemed like a good idea at the time.

Figure 8: PvuII restriction digest of vector containing RepD SELEX inserts.

Figure 9: multiple sequence alignment of the 10 RepD SELEX sequences (A-X) and their reverse complements (-rev suffix). Sequence length is in the right-hand column. A repeated CCGG motif is apparent.

(a) additive scoring           (b) exponential scoring
Sequence           Score       Sequence           Score
CCNNGNANNNCANCNG   67          CNNGNNANANNNCCGG   1536
CNNGNNANANANNCGG   67          CNNGNNANNCNCCNGG   1456
CNNGGNANANNNNCGG   67          CNNGNAANANNNNCGG   1456
CNGNANNNNACCNGGN   67          CNNGNNANANANNCGG   1440
CNNGNNANNCANNCGG   67          GCAANNNACCNGNNNN   1440
GCAANNNACCNGNNNN   67          CNNGNNANANNCNCGG   1424
CNNNGNANNNCACCNG   67          CCNGNNANANNNNCGG   1424
CNNGNNANANNNCCGG   67          CNNGNNANNCANCNGG   1424
CNNNNNANANANCCGG   67          CNNGNANNNCACCNGN   1424
GNAANNNANCCGGNNN   67          CNNGNNANACNNNCGG   1408
GNAANNCANCNGGNNN   67          CCNGNNNNNCANCNGG   1408
CNNNGNANNANACCNG   67          GNAANANANNCGGNNN   1408
CNNGNNNNNCANCCGG   67          CNNGNANNNNACCNGG   1408
CNNNNNANNCANCCGG   67          CCNGNANNNNANCNGG   1408
GNAANNNACCNGGNNN   67          CNNGNANNNCANCNGG   1408
GNAANANACNCGNNNN   67          GNNANANNCCCGGNNN   1408

Table 4: top-matching consensus sequences for RepD SELEX sequences, for N=8, M=16 (i.e. 8 significant bases are chosen out of 16). (a) additive scoring; (b) exponential scoring (see text for details of scoring algorithms). Perfect scores would be 80 and 2560 respectively. Note that in (a) there were 41 joint-top-scoring sequences: the selection is illustrative.

Figure 10: mobility shift assay for RepD. (a) lanes 1 and 14, size markers; lanes 2-13, samples A-F, without and with RepD respectively. (b) lanes 5 and 16, size markers; lanes 1 and 2, sample G, with and without RepD respectively; lanes 3 and 4, sample H, without and with RepD respectively; lanes 6-15, samples J, L, M, O and P, without and with RepD respectively. (c) lanes 1 and 12, size markers; lanes 2-9, samples Q, S, T and U, without and with RepD respectively; lanes 10 and 11, positive control sequence from pCER19 containing oriD, without and with RepD respectively. (d) lanes 1 and 14, size markers; lanes 2-9, samples B, D, F and P, without and with RepD respectively; lanes 10-11, positive control sequence from pCER19 containing oriD, without and with RepD respectively; lanes 12-13, negative control sequence from pCER19 without oriD, without and with RepD respectively; lane 15, negative control sequence from pCER19 without oriD, with BstRep; lane 16, BstRep alone (no DNA).

Discussion

Plasmids are a major vector for the spread of antibiotic resistance within pathogenic bacteria. Emerging bacterial strains such as methicillin-resistant Staphylococcus aureus (MRSA) are a growing clinical problem. Structural studies of the methods of replication and conjugation of resistance plasmids are of potential interest as ways of inhibiting the proliferation of such strains are sought.

BstRep is an interesting target for structural determination due to its thermophilic nature. If the structure of BstRep can be elucidated, it may shed light on the action of Rep proteins in general.

Without confirmed protein binding, the DNA sequences presented here are meaningless. However, assuming that binding is nonetheless occurring, it is possible to attempt some inferences. The BstRep experiments show no clear consensus sequence and a very puzzling orientation correlation with respect to the primers (perhaps BstRep is binding to the primers?). The RepD experiments show a slightly clearer consensus sequence, but not one that matches well with the known binding site of RepD.

Obvious further work would include making the mobility shift assay work.
Beyond that, further work could include mutagenesis of pSTK1 to determine the nature of the BstRep binding site.

Conclusion

SELEX produced some DNA sequences, but their meaning is deeply unclear. Mobility shift assays were inconclusive; thus it is still unknown whether these sequences are actually bound by their respective proteins.

References

BUNKA, D. H. J. & STOCKLEY, P. G. 2006. Aptamers come of age - at last. Nature Reviews Microbiology, 4, 588-596.
KHAN, S. A. 2005. Plasmid rolling-circle replication: highlights of two decades of research. Plasmid, 53, 126-136.
NAKAYAMA, N., NARUMI, I., NAKAMOTO, S. & KIHARA, H. 1993. Complete nucleotide sequence of pSTK1, a cryptic plasmid from Bacillus stearothermophilus TK015. Biotechnology Letters, 15, 1013-1016.
NARUMI, I., NAKAYAMA, N., NAKAMOTO, S., KIMURA, T., YANAGISAWA, T. & KIHARA, H. 1993. Construction of a new shuttle vector pSTE33 and its stabilities in Bacillus stearothermophilus, Bacillus subtilis, and Escherichia coli. Biotechnology Letters, 15, 815-820.
STOLTENBURG, R., REINEMANN, C. & STREHLITZ, B. 2007. SELEX--A (r)evolutionary method to generate high-affinity nucleic acid ligands. Biomolecular Engineering, 24, 381-403.
THOMAS, C. D., NIKIFOROV, T. T., CONNOLLY, B. A. & SHAW, W. V. 1995. Determination of Sequence Specificity between a Plasmid Replication Initiator Protein and the Origin of Replication. Journal of Molecular Biology, 254, 381-391.
THOMAS, C. D. & LYNCH, G. P. 2009. Unpublished observations.
Appendix A

Sequence of BstRep (363 aa). The N-terminal tag is residues 1-20; the “Y191F” substitution is the F at residue 199 in this numbering:

  1 MGSSHHHHHH SSGLVPRGSH MSGLKPCVDW LQVTFKTGQD SVKKCVEKLE KVFEILGLNE
 61 AEFLPLKNGK YGYKQGVAFQ GNPVLAVYYD GADDMGIHVE MTGQGCRLFE LHTSINWYEL
121 FYRLVYEYEV NITRLDVAVD DFKGYFKINT LVKKLKDDEV TSRFKKARHI ENIVIEGGET
181 IGHTLYFGAP SSDIQVRFFE KNVQMGMDID VWNRTEIQLR DDRAHVVAQI IADDVLPLGE
241 IVAGLLRNYI QFRTRKATDK NKKRWPLARF WLNFLGDVQP LRIAKQMPKT SIEKKYRWID
301 SQVSKSFFMI YYCLNEEEKQ RFIDDVLAEG ASKLTKADLQ VINQFKSKNI TYDEMIKIIR
361 QSK

Predicted MW is 42085.40 Da.

Appendix B

Source code for the brute-force consensus search (“consensus.pl”), window overlap determination (“window.pl”), and other custom software used in this study is available at http://sphere.chronosempire.org.uk/~HEx/bioc3160/, as well as being reproduced (in part) below.

    #!/usr/bin/perl -wl
    use strict;
    use Bio::SeqIO;
    use Getopt::Long;
    use Math::Combinatorics;

    my $format = 'fasta';
    GetOptions('f|format:s' => \$format);

    my @bases = qw[A C G T N];
    my @tests;
    use constant MAXRESULTS => 100;

    use Inline C => Config => CCFLAGS => '-O3';
    use Inline C => <<'EOF';
    /* Slide test along seq and return the best count of matching bases.
       The inner loop is reconstructed from the scoring description in the
       text; 'N' in test never matches A, C, G or T. */
    int run_test (char *test, char *seq) {
        int len=strlen(test);
        int maxscore=0;
        int seqlen=strlen(seq);
        while (*seq) {
            int score=0;
            int n;
            for (n=0; n<len && n<seqlen; n++)
                if (test[n]==seq[n]) score++;
            if (score > maxscore) maxscore = score;
            seq++;
            seqlen--;
        }
        return maxscore;
    }
    EOF

    my $len  = shift;  # total bases specified (4^n combinations)
    my $size = shift;  # space in which the bases occur

    # Decode an integer into a string of $len bases (base-4 digits).
    sub num2seq {
        my $a;
        my $num = $_[0];
        for (0..$len-1) {
            my $n = $num % 4;
            $a .= $bases[$n];
            $num = int($num / 4);
        }
        $a;
    }

    # Read every sequence and store it alongside its reverse complement.
    my %seq;
    my %revseq;
    foreach my $f (@ARGV) {
        my $in = Bio::SeqIO->new(-file => $f, -format => $format);
        while (my $seq = $in->next_seq) {
            my $name = $seq->primary_id;
            my $s = $seq->primary_seq->seq;
            $seq{$name} = $s;
            my $rev = $seq->revcom();
            my $r = $rev->seq;
            $revseq{$name} = $r;
        }
    }

    my (%results, %results2);
    my (@results, @results2);
    my $num   = 4**$len;
    my $tests = 0;

    # Position 0 is always significant; choose the other $len-1 positions.
    my $m = Math::Combinatorics->new(count => $len-1, data => [1..$size-1]);
    while (my @combo = $m->next_combination) {
        $tests++;
        for (0..$num-1) {
            my @test = ('N') x $size;
            @test[0,@combo] = split //, num2seq($_);
            my $test = join "", @test;
            my ($r1, $r2) = test($test);
            # Keep the MAXRESULTS best patterns, sorted ascending by score.
            if (@results < MAXRESULTS || $r1 > $results{$results[0]}) {
                $results{$test} = $r1;
                push @results, $test;
                @results = sort { $results{$a} <=> $results{$b} } @results;
                if (@results > MAXRESULTS) { delete $results{shift @results}; }
            }
            if (@results2 < MAXRESULTS || $r2 > $results2{$results2[0]}) {
                $results2{$test} = $r2;
                push @results2, $test;
                @results2 = sort { $results2{$a} <=> $results2{$b} } @results2;
                if (@results2 > MAXRESULTS) { delete $results2{shift @results2}; }
            }
        }
    }

    print "Len: $len size: $size";
    print "Total tests: ". $tests*$num;
    print "Additive scoring (top ".MAXRESULTS."):";
    print map {"$_\t$results{$_}\n"} reverse @results;
    print "Exponential scoring (top ".MAXRESULTS."):";
    print map {"$_\t$results2{$_}\n"} reverse @results2;

    # Score one pattern against every sequence, taking the better strand.
    sub test {
        my $test = shift;
        my ($score1, $score2) = (0, 0);
        for my $name (keys %seq) {
            my ($seq, $rev) = ($seq{$name}, $revseq{$name});
            my $sscore = run_test($test, $seq);
            my $rscore = run_test($test, $rev);
            $sscore = $rscore if $rscore > $sscore;
            $score1 += $sscore;        # additive
            $score2 += 1 << $sscore;   # exponential
        }
        return ($score1, $score2);
    }

Listing 1: consensus.pl. Note that this code is naïve and egregiously inefficient (running time for the results shown in the text was approximately 24 hours).

List of COSHH forms signed

[this page intentionally left blank]