
Commit 547c168

new extra
1 parent 5d194c6 commit 547c168

7 files changed, +280 -0 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ tex2pdf*
 .log
 .coverage
 .idea
+.vscode

extra/08_rna/Makefile

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
.PHONY: test pdf clean

pdf:
	asciidoctor-pdf README.adoc

test:
	pytest -xv test.py

clean:
	rm -rf __pycache__

extra/08_rna/README.adoc

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Transcribing DNA into RNA

For this exercise, we'll apply what we learned about modifying strings to a variation on the Rosalind exercise that transcribes DNA into RNA:

http://rosalind.info/problems/rna/

You will write a Python program called `transcribe.py` that will accept:

* One or more positional arguments, which must be readable files
* An optional `-o` or `--outdir` argument that names an output directory (default `'out'`)

You can use the `os.path.isdir` function to check if the output directory exists.
It works just like the `os.path.isfile` function we've used, which returns `True` or `False` depending on whether a given string names an existing file, only this one checks for a directory.
For example, assuming that "blargh" does not exist on your system:

----
>>> import os
>>> os.path.isdir('blargh')
False
----

If the directory does not exist, you should use the `os.makedirs` function to create it.
Here is a bit of code you can put into your program:

----
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
----
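As a possible shortcut, and assuming Python 3.2 or later, `os.makedirs` also accepts `exist_ok=True`, which makes the explicit `isdir` check unnecessary. The helper name `ensure_dir` below is hypothetical and used only for illustration:

```python
import os
import tempfile


def ensure_dir(out_dir):
    """Create out_dir (and any parents) if needed; hypothetical helper."""
    # exist_ok=True suppresses the error raised when the directory exists
    os.makedirs(out_dir, exist_ok=True)
    return out_dir


# Demonstrate inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'out')
    ensure_dir(target)
    ensure_dir(target)  # a second call is harmless
    print(os.path.isdir(target))
```

Either form should satisfy the exercise; the two-line `if` version shown earlier makes the check more explicit.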

Your program will read each of the input files, each of which contains one DNA sequence per line.
In each sequence, every `T` base must be replaced with `U`.
For instance, the `input1.txt` file contains a single sequence `'GATGGAACTTGACTACGTAAATT'` which will become `'GAUGGAACUUGACUACGUAAAUU'`.
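One way to sketch the substitution is with the built-in `str.replace` method. The function name `transcribe` is an assumption made for this example, not something the exercise requires:

```python
def transcribe(dna):
    """Return the RNA version of a DNA sequence (hypothetical helper)."""
    # str.replace returns a new string with every 'T' swapped for 'U'
    return dna.replace('T', 'U')


print(transcribe('GATGGAACTTGACTACGTAAATT'))  # GAUGGAACUUGACUACGUAAAUU
```

This sketch assumes the input is uppercase, as it is in the provided input files.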

The new sequences from each input file will be written to a new output file in the `--outdir` directory.
The name of the file will be the "basename" of the input file, which you can get by using the `os.path.basename` function.
For instance, the "basename" of `'./inputs/input1.txt'` is `'input1.txt'`:

----
>>> base = os.path.basename('./inputs/input1.txt')
>>> base
'input1.txt'
----

If the output directory is `'out'`, you can create a new path for the output file by passing the output directory and the input file's basename to the `os.path.join` function:

----
>>> out_dir = 'out'
>>> os.path.join(out_dir, base)
'out/input1.txt'
----

If you declare your `args.file` parameter using `type=argparse.FileType('r')`, then you'll be iterating over a list of _open file handles_.
You can use `fh.name` to get the name of the file:

----
for fh in args.file:
    out_file = os.path.join(out_dir, os.path.basename(fh.name))
    out_fh = open(out_file, 'wt')
----

You will have two levels of iteration:

* Each `file` argument
* Each line in each file

You will need to `open` the output file for writing text, iterate over each line in the input file, and print the transcribed sequences to the output file.
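The two levels of iteration might be sketched as follows. This is only one possible arrangement: the function name `transcribe_files` is hypothetical, and it takes file paths rather than the open file handles that `argparse.FileType` would provide, so that the example can run on its own:

```python
import os
import tempfile


def transcribe_files(in_paths, out_dir):
    """Write RNA versions of each input file to out_dir; return sequence count.

    Hypothetical helper: the names are illustrative, not part of the exercise.
    """
    os.makedirs(out_dir, exist_ok=True)
    num_seqs = 0
    for path in in_paths:  # outer loop: each file
        out_file = os.path.join(out_dir, os.path.basename(path))
        with open(path) as in_fh, open(out_file, 'wt') as out_fh:
            for line in in_fh:  # inner loop: each line (one sequence)
                num_seqs += 1
                print(line.rstrip().replace('T', 'U'), file=out_fh)
    return num_seqs


# Small demonstration using a temporary directory and made-up sequences
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'demo.txt')
    with open(in_path, 'wt') as fh:
        fh.write('GATTACA\nTTT\n')
    demo_out = os.path.join(tmp, 'out')
    print(transcribe_files([in_path], demo_out))
```

Counting sequences inside the inner loop keeps the total available for the final summary message.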

Your program should print a brief usage statement when given no arguments:

----
$ ./transcribe.py
usage: transcribe.py [-h] [-o DIR] FILE [FILE ...]
transcribe.py: error: the following arguments are required: FILE
----

And a longer usage statement for the `-h` and `--help` flags:

----
$ ./transcribe.py -h
usage: transcribe.py [-h] [-o DIR] FILE [FILE ...]

Transcribing DNA into RNA

positional arguments:
  FILE                  Input file(s)

optional arguments:
  -h, --help            show this help message and exit
  -o DIR, --outdir DIR  Output directory (default: out)
----
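A usage statement of this shape can be produced with `argparse`. The following is a sketch under stated assumptions rather than the reference solution: the function name `get_args` and its optional `argv` parameter are choices made for this example, and `ArgumentDefaultsHelpFormatter` is used to append the `(default: out)` text to the help message:

```python
import argparse


def get_args(argv=None):
    """Parse command-line arguments (a sketch, not the reference solution)."""
    parser = argparse.ArgumentParser(
        description='Transcribing DNA into RNA',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # One or more readable files; argparse opens each one for reading
    parser.add_argument('file',
                        metavar='FILE',
                        nargs='+',
                        type=argparse.FileType('r'),
                        help='Input file(s)')

    # Optional output directory, defaulting to 'out'
    parser.add_argument('-o',
                        '--outdir',
                        metavar='DIR',
                        default='out',
                        help='Output directory')

    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

Because `argparse.FileType('r')` tries to open each file, a missing input file is rejected with a usage message and a "No such file or directory" error before any of your own code runs.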

The output from the program should summarize how many sequences and files were processed.
For example, the `input1.txt` file contains a single line/sequence, so the result should be this:

----
$ ./transcribe.py inputs/input1.txt
Done, wrote 1 sequence in 1 file to directory "out".
----

The `input2.txt` file, by contrast, contains two lines/sequences:

----
$ ./transcribe.py inputs/input2.txt
Done, wrote 2 sequences in 1 file to directory "out".
----

When you process both together, it should summarize for all the inputs:

----
$ ./transcribe.py inputs/*
Done, wrote 3 sequences in 2 files to directory "out".
----

Note that you must use the correct singular/plural for both "sequence(s)" and "file(s)."
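The singular/plural choice could be handled with a conditional expression. The function name `summary` and its parameters are hypothetical, introduced only for this sketch:

```python
def summary(num_seqs, num_files, out_dir):
    """Build the final report line with correct singular/plural forms."""
    seq_word = 'sequence' if num_seqs == 1 else 'sequences'
    file_word = 'file' if num_files == 1 else 'files'
    return (f'Done, wrote {num_seqs} {seq_word} in {num_files} {file_word} '
            f'to directory "{out_dir}".')


print(summary(1, 1, 'out'))
print(summary(3, 2, 'out'))
```

Note that the tests compare the whole message exactly, including the quotes around the directory name and the final period.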

Many elements of this program are almost identical to the `wc.py` program, so I recommend you revisit it.

A passing test suite looks like this:

----
$ make test
pytest -xv test.py
============================= test session starts ==============================
...
collected 7 items

test.py::test_exists PASSED [ 14%]
test.py::test_usage PASSED [ 28%]
test.py::test_no_args PASSED [ 42%]
test.py::test_bad_file PASSED [ 57%]
test.py::test_good_input1 PASSED [ 71%]
test.py::test_good_input2 PASSED [ 85%]
test.py::test_good_multiple_inputs PASSED [100%]

============================== 7 passed in 0.36s ===============================
----

extra/08_rna/README.pdf

70.9 KB
Binary file not shown.

extra/08_rna/inputs/input1.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
GATGGAACTTGACTACGTAAATT

extra/08_rna/inputs/input2.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
CTTAGGTCAGTGGTCTCTAAACTTTCGGTTCTGTCGTCTTCATAGGCAAATTTTTGAACCGGCAGACAAGCTAATCCCTGTGCGGTTAGCTCAAGCAACAGAATGTCCGATCTTTGAACTTCCTAACGAACCGAACCTACTATAATTACATACGAATAATGTATGGGCTAGCGTTGGCTCATCATCAAGTCTGCGGTGAAATGGGAACATATTCGCATTGCATATAGGGCGTATCTGACGATCGATTCGAGTTGGCTAGTCGTACCAAATGATTATGGGCTGGAGGGCCAATGTATACGTCAGCCAGGCTAAACCACTGGACCGCTTGCAATCCATAGGAAGTAAAATTACCCTTTTTAAACTCTCTAAGATGTGGCGTCTCGTTCTTAAGGAGTAATGAGACTGTGACAACATTGGCAAGCACAGCCTCAGTATAGCTACAGCACCGGTGCTAATAGTAAATGCAAACACCGTTTCAAGAGCCGAGCCTTTTTTTAATGCAAGGTGACTTCAGAGGGAGTAAATCGTGGCCGGGGACTGTCCAGAGCAATGCATTCCCGAGTGCGGGTACCCGTGGTGTGAGAGGAATCGATTTCGCGTGTGATACCATTAATGGTCCTGTACTACTGTCAGTCAGCTTGATTTGAAGTCGGCCGACAAGGTTGGTACATAATGGGCTTACTGGGAGCTTAGGTTAGCCTCTGGAAAACTTTAGAATTTATATGGGTGTTTCTGTGTTCGTACAGGCCCCAGTCGGGCCATCGTTGTTGAGCATAGACCGGTGTAACCTTAATTATTCACAGGCCAATCCCCGTATACGCATCTGAAAGGCACACCGCCTATTACCAATTTGCGCTTCCTTACATAGGAGGACCTGTTATCGTCTTCTCAATCGCTGAGTTACCTTAAAACTAGGATC
ACCGAGTAAAAGGCGACGGTTCGTTTCCGAACCTATTTGCTCTTATTTCTACGGGCTGCTAGTGTTGTAGGCTGCAAAACCTACGTAGTCCCATCTATCATGCTCGACCCTACGAGGCTAATGTCTTGTCAGAGGCCCGTCATGTGCCACGTACATACACCAATGTATACCGCTCTAGCGGTTTGGTGTAGTAGGACTTGTGTATGCACGCTACAGCGAACAACGTTGATCCCTAACTGAAGTCGGGCTCCGCAGGCCTACTCACGCCGTTTCTATAGGTTGAGCCGCATCAAACATTGGGTTGAGTCTCGAGTATAGAGGAAGGCTCTGGTGGCAGGCGCGACGTTGATCGGGAGGAGTATGGATGGTGATCAATCCCCGTGCCAATCGCGAGTACTACAGGAGGAGGGGGCGGCTCTGTTCAATCATCACCCGTTCCATCACACGGGCAGCACAGTTGACCTCCCGAGCCGTCTCACGGACCTAGTGGCAACAGGTGTATTGAAGCGCCGGGAATAGTCATACCCGTGGGCTTGATTGAGAGACCGAAATTCCGACCGCCAAAACTGCTGATATCGTACGCCTTACTACAAAACAAATGACGTCACTACCGGCCAGGGACAAGCTTATTAATTAAGTAGGAACCCTATACCTTGCACATCCTAAATCTAGCAGCGGGTCCAGGATTGGTTCCAGTCCAACGCGCGATGCGCGTCAAGCTAGGCGAATGACCACGGTCGAAACACCACTTATGTGACCCACCTTGGCCAACTCTCCCGATTCTCCTCGCTACTATCTTGAAGGTCACTGAGAATATCCCTTATGGGTCGCATACGGAGACAGCCGCAGGAGCCTTAACGGAGAATACGCCAATACTATGTTCTGGGTCGGTGGGTGTAATGCGATGCAATCCGATCGTGCGAACGTTCCCTTTGATGACTATAGGGTCTAGTGATCGTACATGTGC

extra/08_rna/test.py

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
#!/usr/bin/env python3
"""tests for transcribe.py"""

from subprocess import getstatusoutput
import os.path
import re
import string
import random
from shutil import rmtree

prg = './transcribe.py'
input1 = './inputs/input1.txt'
input2 = './inputs/input2.txt'


# --------------------------------------------------
def random_filename():
    """generate a random filename"""

    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=5))


# --------------------------------------------------
def test_exists():
    """program exists"""

    assert os.path.isfile(prg)


# --------------------------------------------------
def test_usage():
    """usage"""

    for flag in ['-h', '--help']:
        rv, out = getstatusoutput('{} {}'.format(prg, flag))
        assert rv == 0
        assert re.match("usage", out, re.IGNORECASE)


# --------------------------------------------------
def test_no_args():
    """die on no args"""

    rv, out = getstatusoutput(prg)
    assert rv != 0
    assert re.match("usage", out, re.IGNORECASE)


# --------------------------------------------------
def test_bad_file():
    """die on missing input"""

    bad = random_filename()
    rv, out = getstatusoutput(f'{prg} {bad}')
    assert rv != 0
    assert re.match('usage:', out, re.I)
    assert re.search(f"No such file or directory: '{bad}'", out)


# --------------------------------------------------
def test_good_input1():
    """runs on good input"""

    out_dir = 'out'
    try:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

        rv, out = getstatusoutput(f'{prg} {input1}')
        assert rv == 0
        assert out == 'Done, wrote 1 sequence in 1 file to directory "out".'
        assert os.path.isdir(out_dir)
        out_file = os.path.join(out_dir, 'input1.txt')
        assert os.path.isfile(out_file)
        assert open(out_file).read().rstrip() == 'GAUGGAACUUGACUACGUAAAUU'

    finally:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

# --------------------------------------------------
def test_good_input2():
    """runs on good input"""

    out_dir = random_filename()
    try:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

        rv, out = getstatusoutput(f'{prg} -o {out_dir} {input2}')
        assert rv == 0
        assert out == f'Done, wrote 2 sequences in 1 file to directory "{out_dir}".'
        assert os.path.isdir(out_dir)
        out_file = os.path.join(out_dir, 'input2.txt')
        assert os.path.isfile(out_file)
        assert open(out_file).read().rstrip() == output2().rstrip()

    finally:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

# --------------------------------------------------
def test_good_multiple_inputs():
    """runs on good input"""

    out_dir = random_filename()
    try:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

        rv, out = getstatusoutput(f'{prg} --outdir {out_dir} {input1} {input2}')
        assert rv == 0
        assert out == f'Done, wrote 3 sequences in 2 files to directory "{out_dir}".'
        assert os.path.isdir(out_dir)
        out_file1 = os.path.join(out_dir, 'input1.txt')
        out_file2 = os.path.join(out_dir, 'input2.txt')
        assert os.path.isfile(out_file1)
        assert os.path.isfile(out_file2)
        assert open(out_file1).read().rstrip() == 'GAUGGAACUUGACUACGUAAAUU'
        assert open(out_file2).read().rstrip() == output2().rstrip()

    finally:
        if os.path.isdir(out_dir):
            rmtree(out_dir)

# --------------------------------------------------
def output2():
return """CUUAGGUCAGUGGUCUCUAAACUUUCGGUUCUGUCGUCUUCAUAGGCAAAUUUUUGAACCGGCAGACAAGCUAAUCCCUGUGCGGUUAGCUCAAGCAACAGAAUGUCCGAUCUUUGAACUUCCUAACGAACCGAACCUACUAUAAUUACAUACGAAUAAUGUAUGGGCUAGCGUUGGCUCAUCAUCAAGUCUGCGGUGAAAUGGGAACAUAUUCGCAUUGCAUAUAGGGCGUAUCUGACGAUCGAUUCGAGUUGGCUAGUCGUACCAAAUGAUUAUGGGCUGGAGGGCCAAUGUAUACGUCAGCCAGGCUAAACCACUGGACCGCUUGCAAUCCAUAGGAAGUAAAAUUACCCUUUUUAAACUCUCUAAGAUGUGGCGUCUCGUUCUUAAGGAGUAAUGAGACUGUGACAACAUUGGCAAGCACAGCCUCAGUAUAGCUACAGCACCGGUGCUAAUAGUAAAUGCAAACACCGUUUCAAGAGCCGAGCCUUUUUUUAAUGCAAGGUGACUUCAGAGGGAGUAAAUCGUGGCCGGGGACUGUCCAGAGCAAUGCAUUCCCGAGUGCGGGUACCCGUGGUGUGAGAGGAAUCGAUUUCGCGUGUGAUACCAUUAAUGGUCCUGUACUACUGUCAGUCAGCUUGAUUUGAAGUCGGCCGACAAGGUUGGUACAUAAUGGGCUUACUGGGAGCUUAGGUUAGCCUCUGGAAAACUUUAGAAUUUAUAUGGGUGUUUCUGUGUUCGUACAGGCCCCAGUCGGGCCAUCGUUGUUGAGCAUAGACCGGUGUAACCUUAAUUAUUCACAGGCCAAUCCCCGUAUACGCAUCUGAAAGGCACACCGCCUAUUACCAAUUUGCGCUUCCUUACAUAGGAGGACCUGUUAUCGUCUUCUCAAUCGCUGAGUUACCUUAAAACUAGGAUC
ACCGAGUAAAAGGCGACGGUUCGUUUCCGAACCUAUUUGCUCUUAUUUCUACGGGCUGCUAGUGUUGUAGGCUGCAAAACCUACGUAGUCCCAUCUAUCAUGCUCGACCCUACGAGGCUAAUGUCUUGUCAGAGGCCCGUCAUGUGCCACGUACAUACACCAAUGUAUACCGCUCUAGCGGUUUGGUGUAGUAGGACUUGUGUAUGCACGCUACAGCGAACAACGUUGAUCCCUAACUGAAGUCGGGCUCCGCAGGCCUACUCACGCCGUUUCUAUAGGUUGAGCCGCAUCAAACAUUGGGUUGAGUCUCGAGUAUAGAGGAAGGCUCUGGUGGCAGGCGCGACGUUGAUCGGGAGGAGUAUGGAUGGUGAUCAAUCCCCGUGCCAAUCGCGAGUACUACAGGAGGAGGGGGCGGCUCUGUUCAAUCAUCACCCGUUCCAUCACACGGGCAGCACAGUUGACCUCCCGAGCCGUCUCACGGACCUAGUGGCAACAGGUGUAUUGAAGCGCCGGGAAUAGUCAUACCCGUGGGCUUGAUUGAGAGACCGAAAUUCCGACCGCCAAAACUGCUGAUAUCGUACGCCUUACUACAAAACAAAUGACGUCACUACCGGCCAGGGACAAGCUUAUUAAUUAAGUAGGAACCCUAUACCUUGCACAUCCUAAAUCUAGCAGCGGGUCCAGGAUUGGUUCCAGUCCAACGCGCGAUGCGCGUCAAGCUAGGCGAAUGACCACGGUCGAAACACCACUUAUGUGACCCACCUUGGCCAACUCUCCCGAUUCUCCUCGCUACUAUCUUGAAGGUCACUGAGAAUAUCCCUUAUGGGUCGCAUACGGAGACAGCCGCAGGAGCCUUAACGGAGAAUACGCCAAUACUAUGUUCUGGGUCGGUGGGUGUAAUGCGAUGCAAUCCGAUCGUGCGAACGUUCCCUUUGAUGACUAUAGGGUCUAGUGAUCGUACAUGUGC
"""
