Archive
PyLaTeX: Python + LaTeX
“PyLaTeX is a Python library for creating LaTeX files. The goal of this library is being an easy, but extensible interface between Python and LaTeX.”
I haven’t tried it yet, but since I work with LaTeX a lot, it could be useful in the future. If you need to generate a PDF report, for instance, it may be a good way to go.
Generating pseudo random text using Markov chains
The following entry is based on this post: Generating pseudo random text with Markov chains using Python (by Shabda Raaj).
Problem
I’ve long been interested in generating “random” text from a given corpus. A naive approach is to pick words at random and string them together, but the result is unreadable. The words in the generated text should come in an order that gives the impression that the text is more or less legit :)
Solution
We will use Markov chains to solve this problem. In short, a Markov chain is a stochastic process with the Markov property: the next state of the system depends only on its current state, not on the states it went through before.
The algorithm for generating pseudo random text is the following:
- Take two consecutive words from the corpus. We will build a chain of words and the last two words of the chain represent the current state of the Markov chain.
- Look up all occurrences of the last two words (the current state) in the corpus. Pick one occurrence at random (if there is only one, use it) and append the word that follows it to the end of the chain. The current state is now updated: it consists of the second word of the former state and the new word.
- Repeat the previous step until you reach the desired length of the generated text.
When reading the corpus and splitting it into words, don’t remove commas, periods, or other punctuation. This way the generated text looks more realistic.
Example
Let’s see this text:
A is the father of B. C is the father of A.
From this we can build the following dictionary:
{('A', 'is'): ['the'],
('B.', 'C'): ['is'],
('C', 'is'): ['the'],
('father', 'of'): ['B.', 'A.'],
('is', 'the'): ['father', 'father'],
('of', 'B.'): ['C'],
('the', 'father'): ['of', 'of']}
The key is a tuple of two consecutive words. The value is the list of words that follow those two words in the corpus. The value is a multiset, i.e. duplicates are allowed. This way, if a word appears several times after the key, it is selected with a higher probability.
Let’s start the generated sentence with “A is”. “A is” is followed by “the” (“A is the”). “is the” is followed by “father” (“A is the father”). “the father” is followed by “of” (“A is the father of”). At “father of” we have a choice between “B.” and “A.”: let’s pick “A.”, for instance. Since “A.” ends with a period, the sentence is complete. The end result is “A is the father of A.”.
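As a quick check of the table above, the same dictionary can be built in a few lines by zipping three shifted copies of the word list. This is my own compact variant, not the script’s code; the name build_markov_dict is hypothetical. It should behave like build_dict in the script below:

```python
from collections import defaultdict


def build_markov_dict(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    d = defaultdict(list)
    # zip stops at the shortest sequence, so the last pair gets no entry
    for first, second, third in zip(words, words[1:], words[2:]):
        d[(first, second)].append(third)  # duplicates kept: a multiset
    return dict(d)


text = "A is the father of B. C is the father of A."
d = build_markov_dict(text.split())
print(d[("father", "of")])  # ['B.', 'A.']
print(d[("is", "the")])     # ['father', 'father']
```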
Python code
This is a basic version of the algorithm. Since the input corpus may be a UTF-8 file, I wrote it in Python 3 to make Unicode handling less painful.
#!/usr/bin/env python3
# encoding: utf-8

import sys
from pprint import pprint
from random import choice

EOS = ['.', '?', '!']


def build_dict(words):
    """
    Build a dictionary from the words.

    (word1, word2) => [w1, w2, ...]  # key: tuple; value: list
    """
    d = {}
    for i, word in enumerate(words):
        try:
            first, second, third = words[i], words[i+1], words[i+2]
        except IndexError:
            break
        key = (first, second)
        if key not in d:
            d[key] = []
        #
        d[key].append(third)

    return d


def generate_sentence(d):
    # start from a random key whose first word is capitalized
    li = [key for key in d.keys() if key[0][0].isupper()]
    key = choice(li)

    li = []
    first, second = key
    li.append(first)
    li.append(second)
    while True:
        try:
            third = choice(d[key])
        except KeyError:
            # dead end: this state is never followed by a word in the corpus
            break
        li.append(third)
        if third[-1] in EOS:
            # the new word ends a sentence
            break
        # else: shift the state by one word
        key = (second, third)
        first, second = key

    return ' '.join(li)


def main():
    fname = sys.argv[1]
    with open(fname, "rt", encoding="utf-8") as f:
        text = f.read()

    words = text.split()
    d = build_dict(words)
    pprint(d)
    print()
    sent = generate_sentence(d)
    print(sent)
    if sent in text:
        # the generated sentence already occurs verbatim in the corpus
        print('# existing sentence :(')

####################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide an input corpus file.")
        sys.exit(1)
    # else
    main()
Tips
Try to choose a long corpus to work with.
In our version the current state consists of two words. If you put more words in the state (three, for instance), the generated text will look less random but also less gibberish, and it becomes more likely to reproduce whole sentences of the corpus (see also this gist).
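The tip above generalizes to a state of any length n. Here is a minimal sketch under my own assumptions; build_dict_n and generate_sentence_n are hypothetical names, not part of the script above. The state is stored as a tuple of n words and shifted by one word at each step:

```python
import random
from collections import defaultdict

EOS = ('.', '?', '!')


def build_dict_n(words, n=3):
    """Map each tuple of n consecutive words to the words that follow it."""
    d = defaultdict(list)
    for i in range(len(words) - n):
        d[tuple(words[i:i + n])].append(words[i + n])
    return dict(d)


def generate_sentence_n(d, rng=random):
    # start from a state whose first word is capitalized, as in generate_sentence
    starts = [key for key in d if key[0][0].isupper()]
    state = rng.choice(starts)
    chain = list(state)
    # stop at a dead end or when the last word ends a sentence
    while state in d and chain[-1][-1] not in EOS:
        nxt = rng.choice(d[state])
        chain.append(nxt)
        state = state[1:] + (nxt,)  # drop the oldest word, add the new one
    return ' '.join(chain)


words = "A is the father of B. C is the father of A.".split()
d3 = build_dict_n(words, n=3)
print(generate_sentence_n(d3, random.Random(0)))
```

Passing a seeded random.Random makes the output reproducible, which helps when comparing different values of n on the same corpus.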
Links
pythonium: a Python to JavaScript translator
Pythonium is a Python 3 to JavaScript translator written in Python that produces fast, portable JavaScript code.
Example:
$ echo "for i in range(10): print(i)" >> loop.py
$ pythonium -V loop.py
var iterator_i = range(10);
for (var i_iterator_index=0; i_iterator_index < iterator_i.length; i_iterator_index++) {
    var i = iterator_i[i_iterator_index];
    console.log(i);
}
I haven’t tried it yet, so this post is a reminder for me to check it out.
What is a BDFL?
Python news in French
I just came across the site http://news.humancoders.com, a French-language news aggregator where users can submit and discuss news. It has a subpage dedicated to Python.
Human Coders News is a service for sharing the best resources found on the web about a given topic. You can browse all the news on the home page, or click on a topic to filter them.
monkeypatching the string type
Problem
“A monkey patch is a way to extend or modify the run-time code of dynamic languages without altering the original source code.” (via wikipedia) That is, we want to add new features to existing code, such as the built-in types. For instance, a built-in string cannot tell whether it is a palindrome, but we would like to extend the string type to support this feature:
>>> s = "racecar"
>>> print(s.is_palindrome())    # Warning! It won't work.
True
Is it possible in Python?
Solution
As pointed out in this thread, built-in types are implemented in C and cannot be modified at runtime. As far as I know, Ruby allows this, but Python doesn’t.
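You can check this yourself in CPython 3. The minimal sketch below tries to attach a function to str and catches the error. The exact wording of the error message differs between Python versions:

```python
def is_palindrome(self):
    """Return True if the string reads the same backwards."""
    return self == self[::-1]


try:
    # Try to attach a new method to the built-in str type at runtime.
    str.is_palindrome = is_palindrome
    patched = True
except TypeError as exc:
    # CPython refuses: attributes of built-in types cannot be set.
    patched = False
    print("TypeError:", exc)
```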
However, if you really want something like this, there is a workaround: make a subclass of the built-in type and extend the subclass as you wish. Example:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)


class MyStr(unicode):
    """
    "monkeypatching" the unicode class

    It's not real monkeypatching, just a workaround.
    """
    def is_palindrome(self):
        return self == self[::-1]


def main():
    s = MyStr("radar")
    print(s.is_palindrome())

####################

if __name__ == "__main__":
    main()
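The example above is Python 2 code. In Python 3 the unicode type no longer exists and str plays its role, so a sketch of the same workaround subclasses str. There is one caveat with this approach: the inherited string methods return plain str objects, so the extra method disappears after operations like upper() or concatenation.

```python
class MyStr(str):
    """A str subclass with an extra method (a workaround, not real monkeypatching)."""

    def is_palindrome(self):
        return self == self[::-1]


s = MyStr("radar")
print(s.is_palindrome())            # True

# Caveat: inherited str methods return plain str objects, not MyStr.
t = s.upper()
print(type(t).__name__)             # str
print(hasattr(t, "is_palindrome"))  # False
```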
Capture the exit code, the stdout, and the stderr of an external command
Update (20140110): The get_exitcode_stdout_stderr function was improved. Thanks to @Rhomboid for the tip. You will find the old version at the end of the post.
Problem
You want to run an arbitrary program as an external command from a wrapper script. In the wrapper you want to get the exit code, the stdout, and the stderr of the executed program.
Solution
Let the external command be this simple C program saved as test.c:
#include <stdio.h>

int main()
{
    printf("go 2 stdout\n");
    fprintf(stderr, "go 2 stderr\n");
    return 3;
}
Compile and run it:
$ gcc test.c
$ ./a.out
go 2 stdout
go 2 stderr
Note that this external command can be anything, written in any language.
After executing it, we pose the following questions:
- What was its exit code?
- What did it send to the stdout?
- What did it send to the stderr?
Here is a wrapper that can capture all these three things. Thanks to @Rhomboid for his tip on improving the get_exitcode_stdout_stderr function.
#!/usr/bin/env python
# encoding: utf-8

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

import sys
import shlex
from subprocess import Popen, PIPE


def frame(text):
    """
    Put the text in a pretty frame.
    """
    result = """
+{h}+
|{t}|
+{h}+
""".format(h='-' * len(text), t=text).strip()
    return result


def get_exitcode_stdout_stderr(cmd):
    """
    Execute the external command and get its exitcode, stdout and stderr.
    """
    args = shlex.split(cmd)

    proc = Popen(args, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    exitcode = proc.returncode
    #
    return exitcode, out, err


def main(params):
    cmd = ' '.join(params)
    exitcode, out, err = get_exitcode_stdout_stderr(cmd)
    print(frame("EXIT CODE"))
    print(exitcode)
    print(frame("STDOUT"))
    print(out)
    print(frame("STDERR"))
    print(err)

####################

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('Usage: {} <external_command>'.format(sys.argv[0]))
        sys.exit(1)
    # else
    main(sys.argv[1:])
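On Python 3.7+ the same function can be written with subprocess.run. The capture_output and text parameters both need 3.7, and text=True decodes the output with the locale’s default encoding. This is a minimal sketch, not a replacement tested against the judge. So that it doesn’t depend on gcc, the usage part runs a small Python child process that mimics test.c:

```python
import shlex
import subprocess
import sys


def get_exitcode_stdout_stderr(cmd):
    """Run cmd and return (exit code, stdout, stderr) as text (Python 3.7+)."""
    args = shlex.split(cmd)
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# A child process behaving like test.c: one line to each stream, exit code 3.
child = 'import sys; print("go 2 stdout"); print("go 2 stderr", file=sys.stderr); sys.exit(3)'
cmd = "{} -c {}".format(shlex.quote(sys.executable), shlex.quote(child))
code, out, err = get_exitcode_stdout_stderr(cmd)
print(code)         # 3
print(out, end="")  # go 2 stdout
print(err, end="")  # go 2 stderr
```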
Running example
$ ./wrapper.py ./a.out
+---------+
|EXIT CODE|
+---------+
3
+------+
|STDOUT|
+------+
go 2 stdout
+------+
|STDERR|
+------+
go 2 stderr
The script wrapper.py can be called in various ways:
$ ./wrapper.py python print.py      # pass the command in several pieces
...
$ ./wrapper.py "python print.py"    # you can pass the command as one argument
...
Fine, but what is it good for?
I’m working on an online judge whose job is to execute a program with different parameters and decide whether it’s correct or not. I want to execute the programs in a sandboxed (secure) environment and I need to get the output (and the errors) of the programs to analyze how they ran. The wrapper script above is a first step in this direction.
Update (20140110)
First I solved this problem by writing the stdout and stderr to temporary files. As it turned out, temporary files are not necessary, so the post above was updated accordingly. Here is my old solution:
import shlex
import tempfile
from subprocess import call

# Warning! The code above is better! This is the old version!

def get_exitcode_stdout_stderr(cmd):
    """
    Execute the external command and get its exitcode, stdout and stderr.
    """
    args = shlex.split(cmd)

    try:
        fout = tempfile.TemporaryFile()
        ferr = tempfile.TemporaryFile()
        exitcode = call(args, stdout=fout, stderr=ferr)
        fout.seek(0)
        ferr.seek(0)
        out, err = fout.read(), ferr.read()
    finally:
        fout.close()
        ferr.close()
    #
    return exitcode, out, err
My original reasoning for the temporary files (no longer needed with the current version):
You might be tempted to redirect stdout and stderr in call(...) to a string (a StringIO) instead of a file. Unfortunately, this doesn’t work. Although a StringIO is a file-like object, it isn’t a real file, so it has no usable fileno() method, and you get an error because of this (see this thread, for instance).
So we must redirect the outputs to files. Once their content has been read, the files can be removed, which is why we use temporary files from the standard library. A tempfile.TemporaryFile is removed as soon as it is closed, so we don’t need to unlink it ourselves (more info @pymotw and @docs).
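The StringIO pitfall is easy to reproduce in Python 3. There, io.StringIO does have a fileno() method, but calling it raises io.UnsupportedOperation. Popen needs a real OS-level file descriptor, so the error happens before the child process is even started. A minimal sketch:

```python
import io
import subprocess
import sys

buf = io.StringIO()
try:
    # Popen asks stdout for a real file descriptor via fileno(),
    # which io.StringIO cannot provide.
    subprocess.call([sys.executable, "-c", "print('hi')"], stdout=buf)
    failed = False
except io.UnsupportedOperation as exc:
    failed = True
    print("error:", exc)
```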
Python equivalent of Java .jar files
Problem
In Java, you can distribute your project in JAR format. It is essentially a ZIP file with some metadata. The project can be launched easily:
$ java -jar project.jar
What is its Python equivalent? How do you distribute a Python project (with several modules and packages) as a single file?
Solution
The following is based on this post, written by bheklilr. Thanks for the tip.
Let’s see the following project structure:
MyApp/
    MyApp.py          <--- Main script
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json
Rename the main script to __main__.py and compress the project into a zip file. Python doesn’t care about the extension; here we use .egg:
myapp.egg/            <--- technically, it's just a zip file
    __main__.py       <--- Renamed from MyApp.py
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json
How to zip it? Enter the project directory (MyApp/) and use this command:
zip -r ../myapp.egg .
Now you can launch the .egg file just like you launch a Java .jar file:
$ python myapp.egg
You can also use command-line arguments that are passed to __main__.py.
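Since Python 3.5, the standard library’s zipapp module can do the zipping step for you, and .pyz is the extension it suggests. Below is a minimal sketch that builds a throwaway project in a temporary directory, packs it with zipapp.create_archive, and runs the archive. The greet function and the tiny alibrary contents are made up for the demo:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "MyApp")
    pkg = src / "alibrary"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    # hypothetical library code for the demo
    (pkg / "alibrary.py").write_text("def greet(name):\n    return 'Hello, ' + name\n")
    # the entry point inside the archive must be called __main__.py
    (src / "__main__.py").write_text(
        "import sys\n"
        "from alibrary.alibrary import greet\n"
        "print(greet(sys.argv[1]))\n"
    )
    target = pathlib.Path(tmp, "myapp.pyz")
    zipapp.create_archive(src, target)  # zips MyApp/ into myapp.pyz
    # run the archive like `python myapp.pyz World`
    result = subprocess.run([sys.executable, str(target), "World"],
                            capture_output=True, text=True)
    print(result.stdout, end="")  # Hello, World
```

One thing to watch: files bundled in the archive (like configs/config.json) live inside the zip, so a plain open() on a filesystem path won’t find them. Reading them through the zipfile module, or with pkgutil.get_data() for files inside a package, is one way around this.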
2014: moving towards Python 3
My new year’s resolution is to move (slowly) towards Python 3. I don’t want to switch yet, but I will write my code in a way that makes the future transition easier. Thus, from now on I will use this skeleton for my new scripts:
#!/usr/bin/env python
# encoding: utf-8

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)


def main():
    pass

####################

if __name__ == "__main__":
    main()
“A future statement must appear near the top of the module. The only lines that can appear before a future statement are:
- the module docstring (if any),
- comments,
- blank lines, and
- other future statements.“
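The compiler enforces the placement rule quoted above. The small check below runs under Python 3, where these imports are still accepted but no longer change anything. It compiles two hypothetical module sources: one where the future imports come only after a docstring, a comment, and a blank line, and one where they come after ordinary code.

```python
ok_source = (
    '"""Module docstring."""\n'
    "# a comment\n"
    "\n"
    "from __future__ import division\n"
    "from __future__ import print_function\n"
    "x = 1 / 2\n"
)
bad_source = (
    "x = 1\n"
    "from __future__ import division\n"
)

compile(ok_source, "<ok>", "exec")  # accepted: only allowed lines precede it
try:
    compile(bad_source, "<bad>", "exec")
    bad_rejected = False
except SyntaxError as exc:
    # a future statement after ordinary code is a compile-time error
    bad_rejected = True
    print("SyntaxError:", exc.msg)
```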
More info on the __future__ imports
Compatibility layers
