A Scheme implementation of the DNA to Protein conversion program given in Python 3 at: https://www.geeksforgeeks.org/dna-protein-python-3/
The program defines two functions: one reads a sequence of text from a file and returns the entire contents as one string without line breaks; the other converts each DNA triple (codon) in the string into the single-letter code of the amino acid it encodes, producing the protein sequence.
For the first function, we use read-line, which reads all the text on a line and returns it without the line break, so our read-sequence-file function can append all the lines together to make a complete string.
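As a small illustration of that behaviour, the sketch below calls read-line on a string port instead of a file, so it needs nothing on disk. The sample contents and the variable names are made up for this example; only read-line, open-input-string and eof-object? come from R7RS-small.

```scheme
(import (scheme base)
        (scheme write))

;; A string port stands in for a file here. The two lines of "DNA"
;; are invented sample data, not taken from the real input file.
(define port (open-input-string "ATGGCC\nATTGTA\n"))

(define first-line (read-line port))   ; "ATGGCC", line break dropped
(define second-line (read-line port))  ; "ATTGTA", line break dropped
(define third (read-line port))        ; nothing left: end-of-file object

(write (string-append first-line second-line)) ; prints "ATGGCCATTGTA"
(newline)
(write (eof-object? third))                    ; prints #t
(newline)
```

Because the line breaks are already gone, appending the lines is enough to rebuild the sequence, and the end-of-file object signals when to stop.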
For the second function, we use an association list for the table. We could use a hash table or map, but association lists are built into R7RS-small, and the example is small enough that efficiency does not matter.
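To make the lookup concrete, here is a minimal sketch using a three-entry excerpt of the codon table. The name small-table is hypothetical and exists only for this example. The point is that assoc compares keys with equal?, so string keys match by content, and that it returns the whole pair, or #f when the key is absent.

```scheme
(import (scheme base)
        (scheme write))

;; A three-entry excerpt of the codon table, for illustration only.
(define small-table
  '(("ATG" . #\M) ("GCC" . #\A) ("TAA" . #\_)))

;; assoc returns the matching (key . value) pair.
(write (assoc "GCC" small-table))        ; prints ("GCC" . #\A)
(newline)

;; cdr extracts the amino acid letter from that pair.
(write (cdr (assoc "GCC" small-table)))  ; prints #\A
(newline)

;; A key that is not in the list gives #f rather than a pair.
(write (assoc "NNN" small-table))        ; prints #f
(newline)
```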
Final Program
(import (scheme base)
        (scheme file)
        (scheme read)
        (scheme write))
(define (read-sequence-file filename)
  (with-input-from-file                                  ; (1)
    filename
    (lambda ()
      (do ((line (read-line) (read-line))                ; (2)
           (sequence "" (string-append sequence line)))  ; (3)
        ((eof-object? line) sequence)))))                ; (4)
(define (translate sequence)
  (let ((table                                           ; (5)
          '(("ATA" . #\I) ("ATC" . #\I) ("ATT" . #\I) ("ATG" . #\M)
            ("ACA" . #\T) ("ACC" . #\T) ("ACG" . #\T) ("ACT" . #\T)
            ("AAC" . #\N) ("AAT" . #\N) ("AAA" . #\K) ("AAG" . #\K)
            ("AGC" . #\S) ("AGT" . #\S) ("AGA" . #\R) ("AGG" . #\R)
            ("CTA" . #\L) ("CTC" . #\L) ("CTG" . #\L) ("CTT" . #\L)
            ("CCA" . #\P) ("CCC" . #\P) ("CCG" . #\P) ("CCT" . #\P)
            ("CAC" . #\H) ("CAT" . #\H) ("CAA" . #\Q) ("CAG" . #\Q)
            ("CGA" . #\R) ("CGC" . #\R) ("CGG" . #\R) ("CGT" . #\R)
            ("GTA" . #\V) ("GTC" . #\V) ("GTG" . #\V) ("GTT" . #\V)
            ("GCA" . #\A) ("GCC" . #\A) ("GCG" . #\A) ("GCT" . #\A)
            ("GAC" . #\D) ("GAT" . #\D) ("GAA" . #\E) ("GAG" . #\E)
            ("GGA" . #\G) ("GGC" . #\G) ("GGG" . #\G) ("GGT" . #\G)
            ("TCA" . #\S) ("TCC" . #\S) ("TCG" . #\S) ("TCT" . #\S)
            ("TTC" . #\F) ("TTT" . #\F) ("TTA" . #\L) ("TTG" . #\L)
            ("TAC" . #\Y) ("TAT" . #\Y) ("TAA" . #\_) ("TAG" . #\_)
            ("TGC" . #\C) ("TGT" . #\C) ("TGA" . #\_) ("TGG" . #\W))))
    (do ((i 0 (+ i 3))                                   ; (6)
         (result '() (cons (cdr (assoc (substring sequence i (+ i 3))
                                       table))           ; (7)
                           result)))
      ((>= i (string-length sequence))                   ; (8)
       (list->string (reverse result))))))
(let* ((dna-sequence (read-sequence-file "dna_sequence.txt"))
       (protein-sequence (translate dna-sequence))
       (target-sequence (read-sequence-file "amino_acid_sequence.txt")))
  (display "Comparing translated with target: ")
  (display (equal? protein-sequence target-sequence))
  (newline))
Notes on the marked lines of the program, in order:
(1) Opens the given file and makes it the current input port.
(2) Reads each line from the current input port as a string.
(3) Joins the strings together in turn ...
(4) ... until the end of file is reached, when the accumulated sequence is returned.
(5) The conversion table is stored in an association list.
(6) An index takes us through the string, one triple at a time ...
(7) ... looking up each codon in turn, and recording its amino acid letter.
(8) When the string is fully processed, the list of letters is reversed into reading order and turned into a string to return.
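The translate function above assumes that the sequence length is a multiple of three and that every triple appears in the table. Otherwise substring is asked for an out-of-range slice, or cdr is applied to the #f that assoc returns. The sketch below shows one possible way to relax both assumptions. The names codon->letter and translate-lenient, the #\? placeholder, and the shortened table are all hypothetical choices for this example, not part of the original program.

```scheme
(import (scheme base)
        (scheme write))

;; Excerpt of the codon table; the full program uses all 64 entries.
(define small-table
  '(("ATG" . #\M) ("GCC" . #\A) ("ATT" . #\I) ("TAA" . #\_)))

;; Hypothetical helper: look up one codon, returning #\? when it is
;; missing from the table instead of signalling an error.
(define (codon->letter codon table)
  (let ((entry (assoc codon table)))
    (if entry (cdr entry) #\?)))

;; Hypothetical variant of translate: stops once fewer than three
;; characters remain, so a trailing partial codon is ignored.
(define (translate-lenient sequence table)
  (do ((i 0 (+ i 3))
       (result '() (cons (codon->letter (substring sequence i (+ i 3))
                                        table)
                         result)))
    ((> (+ i 3) (string-length sequence))
     (list->string (reverse result)))))

;; "NNN" is not in the table and the final "AT" is an incomplete codon.
(write (translate-lenient "ATGGCCNNNTAAAT" small-table)) ; prints "MA?_"
(newline)
```

Whether silently skipping bad input is preferable to an error depends on the data. For a check against a known target sequence, as in the program above, failing loudly may be the better default.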