python | Python Adventures

PyLaTeX: Python + LaTeX

January 25, 2014 Jabba Laci

“PyLaTeX is a Python library for creating LaTeX files. The goal of this library is being an easy, but extensible interface between Python and LaTeX.”

I haven’t tried it yet, but since I work with LaTeX a lot, it could be useful later. If you need to generate a PDF report, for instance, it could be a good way to go.

Categories: python Tags: latex, pylatex

Generating pseudo random text using Markov chains

January 23, 2014 Jabba Laci

The following entry is based on this post: Generating pseudo random text with Markov chains using Python (by Shabda Raaj).

Problem
I’ve long been interested in generating “random” texts from a given corpus. A naive way is to pick words at random and string them together, but the result would be unreadable. The words in the generated text should come in an order that gives the impression that the text is more or less legit :)

Solution
We will use Markov chains to solve this problem. In short, a Markov chain is a stochastic process with the Markov property: the next state of the system depends only on its current state, not on the states at earlier steps.

The algorithm for generating pseudo random text is the following:

Take two consecutive words from the corpus. We will build a chain of words and the last two words of the chain represent the current state of the Markov chain.
Look up in the corpus all the occurrences of the last two words (current state). If they appear more than once, select one of them randomly and add the word that follows them to the end of the chain. Now the current state is updated: it consists of the 2nd word of the former tail of the chain and the new word.
Repeat the previous step until you reach the desired length of the generated text.

When reading the corpus and splitting it into words, don’t remove commas, periods, or other punctuation. This way you get a more realistic text.

Example
Let’s see this text:

A is the father of B.
C is the father of A.

From this we can build the following dictionary:

{('A', 'is'): ['the'],
 ('B.', 'C'): ['is'],
 ('C', 'is'): ['the'],
 ('father', 'of'): ['B.', 'A.'],
 ('is', 'the'): ['father', 'father'],
 ('of', 'B.'): ['C'],
 ('the', 'father'): ['of', 'of']}

The key is a tuple of two consecutive words. The value is a list of words that follow the two words in the key in the corpus. The value is a multiset, i.e. duplications are allowed. This way, if a word appears several times after the key, it will be selected with a higher probability.

Let’s start the generated sentence with “A is”. “A is” is followed by “the” (“A is the”). “is the” is followed by “father” (“A is the father”). “the father” is followed by “of” (“A is the father of”). At “father of” we have a choice between “B.” and “A.”: let’s pick “A.” for instance. The end result is: “A is the father of A.”

Python code
This is a basic version of the algorithm. Since the input corpus can be a UTF-8 file, I wrote it in Python 3 to suffer less with Unicode.

#!/usr/bin/env python3
# encoding: utf-8

import sys
from pprint import pprint
from random import choice

EOS = ['.', '?', '!']


def build_dict(words):
    """
    Build a dictionary from the words.

    (word1, word2) => [w1, w2, ...]  # key: tuple; value: list
    """
    d = {}
    for i, word in enumerate(words):
        try:
            first, second, third = words[i], words[i+1], words[i+2]
        except IndexError:
            break
        key = (first, second)
        if key not in d:
            d[key] = []
        #
        d[key].append(third)

    return d


def generate_sentence(d):
    li = [key for key in d.keys() if key[0][0].isupper()]
    key = choice(li)

    li = []
    first, second = key
    li.append(first)
    li.append(second)
    while True:
        try:
            third = choice(d[key])
        except KeyError:
            break
        li.append(third)
        if third[-1] in EOS:
            break
        # else
        key = (second, third)
        first, second = key

    return ' '.join(li)


def main():
    fname = sys.argv[1]
    with open(fname, "rt", encoding="utf-8") as f:
        text = f.read()

    words = text.split()
    d = build_dict(words)
    pprint(d)
    print()
    sent = generate_sentence(d)
    print(sent)
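    # compare against the whitespace-normalized corpus, since the generated
    # sentence is joined with single spaces
    if sent in ' '.join(words):
        print('# existing sentence :(')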

####################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide an input corpus file.")
        sys.exit(1)
    # else
    main()

Tips
Try to choose a long corpus to work with.

In our version the current state consists of two words. If you put more words (3 for instance) in the current state, the generated text will stay closer to the corpus: it will look less random, but it will also read less like gibberish (see also this gist).
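
To make this tip concrete, here is a minimal sketch that generalizes the dictionary to a state of any length. The names build_chain, generate and the order parameter are assumptions made for this illustration; they do not come from the gist mentioned above.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, max_words=50, rng=random):
    """Walk the chain from a random state until a dead end or max_words."""
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < max_words and state in chain:
        nxt = rng.choice(chain[state])
        out.append(nxt)
        # slide the window: drop the oldest word, append the new one
        state = state[1:] + (nxt,)
    return ' '.join(out)


corpus = "A is the father of B. C is the father of A."
chain2 = build_chain(corpus.split(), order=2)
chain3 = build_chain(corpus.split(), order=3)
print(chain2[('father', 'of')])
print(chain3[('the', 'father', 'of')])
print(generate(chain3, max_words=12))
```

With order=2 this builds the same dictionary as in the example above. A larger order leaves fewer choices per state, so a short corpus mostly reproduces itself.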

Links

Generating pseudo random text with Markov chains using Python (this present post is based on this)
Mark V. Shaney at Your Service
Generate random words based on markov chains rather than random sentences

Categories: python Tags: Markov, python3, random text, text generator

pythonium: a Python to JavaScript translator

January 17, 2014 Jabba Laci

Pythonium is a Python 3 to JavaScript translator written in Python that produces fast, portable JavaScript code.

Example:

$ echo "for i in range(10): print(i)" >> loop.py
$ pythonium -V loop.py
var iterator_i = range(10);
for (var i_iterator_index=0; i_iterator_index < iterator_i.length; i_iterator_index++) {
    var i = iterator_i[i_iterator_index];
    console.log(i);
}

I haven’t tried it yet, so this post is a reminder for me to check it out.

Categories: python Tags: javascript

What is a BDFL?

January 14, 2014 Jabba Laci

“A BDFL, a term originally used by Python creator Guido van Rossum, is basically a leader of an open-source project who resolves disputes and has final say on big decisions.” (source)

Categories: python Tags: BDFL, guido

Python news in French

January 13, 2014 Jabba Laci

I just came across the site http://news.humancoders.com which is a news collector in French. Users can submit and discuss news here. It has a subpage dedicated to Python.

The site describes itself as follows (translated from the French): “Human Coders News is a service for sharing the best resources found on the web about a specific topic. You can browse all the news on the home page, or click on a topic to filter.”

Categories: python Tags: French, news

a command line progress bar for your loops

January 10, 2014 Jabba Laci

Problem
When I was working on Project Euler, I solved several problems with a brute-force approach, so the runtime was several hours. To see that the program was still making progress and to estimate when it would finish, I added a simple counter to the main loop that showed the progress as a percentage.
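
A hand-rolled counter of this kind might look like the sketch below. The helper name with_percent and its parameters are made up for this illustration, and it assumes the total number of iterations is known in advance.

```python
import sys


def with_percent(iterable, total, out=sys.stderr):
    """Yield the items of iterable and report progress as a percentage."""
    last = -1
    for done, item in enumerate(iterable, start=1):
        yield item
        percent = done * 100 // total
        if percent != last:  # write only when the integer percentage changes
            out.write('\r{:3d}%'.format(percent))
            out.flush()
            last = percent
    out.write('\n')


total = sum(i * i for i in with_percent(range(1000), 1000))
print(total)
```

The progress goes to stderr so that it doesn’t mix with the real output of the program.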

Is there a simpler way to add a progress bar to a loop?

Solution
The project tqdm addresses this problem. Reddit discussion is here.

Usage example:

import time
from tqdm import tqdm

def main():
    # wrap the iterable in tqdm() to get a progress bar
    for i in tqdm(range(10000)):
        time.sleep(.005)

if __name__ == "__main__":
    main()

Notes:

in Python 2 you can use xrange too: tqdm(xrange(10000))
there is also a shortcut, trange: trange(10000) (import it with: from tqdm import trange)

On the GitHub page of the project you will find an animated GIF too.

Categories: python Tags: Arabia, command-line, progress bar, tqdm

monkeypatching the string type

January 8, 2014 Jabba Laci

Problem
“A monkey patch is a way to extend or modify the run-time code of dynamic languages without altering the original source code.” (via wikipedia) That is, we have the standard library, and we want to add new features to it. For instance, in the stdlib a string cannot tell whether it is a palindrome or not, but we would like to extend the string type to support this feature:

>>> s = "racecar"
>>> print(s.is_palindrome())    # Warning! It won't work.
True

Is it possible in Python?

Solution
As pointed out in this thread, built-in types are implemented in C and you cannot modify them at runtime. As far as I know, Ruby allows this, but it doesn’t work in Python.
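
A small sketch of what happens if you try anyway, assuming CPython. The class Word is a made-up example, included only to show that an ordinary user-defined class can be patched while the built-in str cannot.

```python
def is_palindrome(self):
    return self == self[::-1]


try:
    str.is_palindrome = is_palindrome  # attempt to patch the built-in type
except TypeError as exc:
    patched = False
    print('cannot patch str:', exc)
else:
    patched = True


class Word(object):
    """Hypothetical user-defined class; these can be patched freely."""
    def __init__(self, text):
        self.text = text


# adding a method to a class written in Python works at runtime
Word.is_palindrome = lambda self: self.text == self.text[::-1]
print(Word("racecar").is_palindrome())
```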

However, there is a workaround if you really want to do something like this. You can make a subclass of the built-in type and then you can extend it as you want. Example:

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

class MyStr(unicode):
    """
    "monkeypatching" the unicode class

    It's not real monkeypatching, just a workaround.
    """ 
    def is_palindrome(self):
        return self == self[::-1]

def main():
    s = MyStr("radar")
    print(s.is_palindrome())

####################

if __name__ == "__main__":
    main()
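
The listing above targets Python 2 (it subclasses unicode). Under Python 3 the same workaround could be sketched as follows, assuming str as the base class:

```python
class MyStr(str):
    """A str subclass with one extra method (not real monkeypatching)."""

    def is_palindrome(self):
        return self == self[::-1]


s = MyStr("radar")
print(s.is_palindrome())
# methods inherited from str return plain str objects, not MyStr
print(type(s.upper()).__name__)
```

One caveat of this workaround: the result of an inherited method such as upper() is a plain str, so it has to be wrapped in MyStr again before is_palindrome() can be called on it.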

Categories: python Tags: monkeypatch, palindrome, unicode

Capture the exit code, the stdout, and the stderr of an external command

January 8, 2014 Jabba Laci

Update (20140110): The get_exitcode_stdout_stderr function was improved. Thanks to @Rhomboid for the tip. You will find the old version at the end of the post.

Problem
You have an arbitrary program that you want to execute as an external command, i.e. you have a wrapper around it. In the wrapper script you want to get the exit code, the stdout, and the stderr of the executed program.

Solution
Let the external command be this simple C program saved as test.c:

#include <stdio.h>

int main()
{
    printf("go 2 stdout\n");
    fprintf(stderr, "go 2 stderr\n");
    return 3;
}

Compile and run it:

$ gcc test.c
$ ./a.out
go 2 stdout
go 2 stderr

Note that this external command can be anything, written in any language.

After executing it, we pose the following questions:

What was its exit code?
What did it send to the stdout?
What did it send to the stderr?

Here is a wrapper that can capture all these three things. Thanks to @Rhomboid for his tip on improving the get_exitcode_stdout_stderr function.

#!/usr/bin/env python
# encoding: utf-8

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

import sys
import shlex
from subprocess import Popen, PIPE


def frame(text):
    """
    Put the text in a pretty frame.
    """
    result = """
+{h}+
|{t}|
+{h}+
""".format(h='-' * len(text), t=text).strip()
    return result


def get_exitcode_stdout_stderr(cmd):
    """
    Execute the external command and get its exitcode, stdout and stderr.
    """
    args = shlex.split(cmd)

    proc = Popen(args, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    exitcode = proc.returncode
    #
    return exitcode, out, err


def main(params):
    cmd = ' '.join(params)
    exitcode, out, err = get_exitcode_stdout_stderr(cmd)

    print(frame("EXIT CODE"))
    print(exitcode)

    print(frame("STDOUT"))
    print(out)

    print(frame("STDERR"))
    print(err)

####################

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('Usage: {} <external_command>'.format(sys.argv[0]))
        sys.exit(1)
    # else
    main(sys.argv[1:])

Running example

$ ./wrapper.py ./a.out
+---------+
|EXIT CODE|
+---------+
3
+------+
|STDOUT|
+------+
go 2 stdout

+------+
|STDERR|
+------+
go 2 stderr

The script wrapper.py can be called in various ways:

$ ./wrapper.py python print.py    # pass the command in several pieces
...
$ ./wrapper.py "python print.py"    # you can pass the command as one argument
...

Fine, but what is it good for?
I’m working on an online judge whose job is to execute a program with different parameters and decide whether it’s correct or not. I want to execute the programs in a sandboxed (secure) environment and I need to get the output (and the errors) of the programs to analyze how they ran. The wrapper script above is a first step in this direction.

Update (20140110)
At first I solved this problem by writing the stdout and stderr to temporary files. As it turned out, temporary files are not necessary, so the post above was updated accordingly. I leave my old solution here:

import shlex
import tempfile
from subprocess import call

# Warning! The code above is better! This is the old version!

def get_exitcode_stdout_stderr(cmd):
    """
    Execute the external command and get its exitcode, stdout and stderr.
    """
    args = shlex.split(cmd)

    # The with statement closes (and thereby deletes) both temporary files,
    # even if call() raises an exception.
    with tempfile.TemporaryFile() as fout, tempfile.TemporaryFile() as ferr:
        exitcode = call(args, stdout=fout, stderr=ferr)
        fout.seek(0)
        ferr.seek(0)
        out, err = fout.read(), ferr.read()
    #
    return exitcode, out, err

<old>
You might be tempted to redirect the stdout and stderr in the function call(...) to a string (to a StringIO) instead of a file. Unfortunately it doesn’t work. Although a StringIO behaves like a file-like object, it’s not a file, thus it doesn’t have a fileno() method and you would get an error because of this (see this thread for instance).

So, we must redirect the outputs to files. After reading the content of these files, they can be removed, thus we use tempfiles from the standard library. When a tempfile.TemporaryFile is closed, it is removed, so we don’t need to unlink them (more info @pymotw and @docs).
</old>
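
The StringIO limitation can be reproduced with a short sketch. It assumes Python 3, where io.StringIO raises io.UnsupportedOperation when asked for a file descriptor; the child command is just a placeholder.

```python
import io
import subprocess
import sys

cmd = [sys.executable, "-c", "print('hi')"]  # placeholder external command

buf = io.StringIO()
try:
    # subprocess needs a real file descriptor, which StringIO cannot provide
    subprocess.call(cmd, stdout=buf)
except io.UnsupportedOperation as exc:
    failed = True
    print("StringIO rejected:", exc)
else:
    failed = False

# capturing through a pipe works without any file on disk
data = subprocess.check_output(cmd)
print(data.decode().strip())
```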

Categories: python Tags: exit code, redirect, sandbox, stderr, stdout, wrapper

Python equivalent of Java .jar files

January 5, 2014 Jabba Laci

Problem
In Java, you can distribute your project in JAR format. It is essentially a ZIP file with some metadata. The project can be launched easily:

$ java -jar project.jar

What is its Python equivalent? How to distribute a Python project (with several modules and packages) in a single file?

Solution
The following is based on this post, written by bheklilr. Thanks for the tip.

Let’s see the following project structure:

MyApp/
    MyApp.py          <--- Main script
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json

Rename the main script to __main__.py and compress the project to a zip file. The extension can be .egg:

myapp.egg/             <--- technically, it's just a zip file
    __main__.py        <--- Renamed from MyApp.py
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json

How to zip it? Enter the project directory (MyApp/) and use this command:

zip -r ../myapp.egg .

Now you can launch the .egg file just like you launch a Java .jar file:

$ python myapp.egg

You can also use command-line arguments that are passed to __main__.py.
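
Since Python 3.5 the standard library also ships the zipapp module, which builds such an archive for you. The sketch below is an illustration only: it creates a throwaway MyApp directory with a one-line __main__.py, packs it, and runs the result. The file name myapp.pyz and the demo script are assumptions of this example.

```python
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # build a tiny stand-in project containing only __main__.py
    src = os.path.join(tmp, "MyApp")
    os.mkdir(src)
    with open(os.path.join(src, "__main__.py"), "w") as f:
        f.write("import sys\nprint('hello from', sys.argv[1:])\n")

    # pack the directory into a single executable archive
    target = os.path.join(tmp, "myapp.pyz")
    zipapp.create_archive(src, target)

    # launch it like a .jar; the extra arguments reach __main__.py
    output = subprocess.check_output([sys.executable, target, "a", "b"],
                                     universal_newlines=True)
    print(output, end="")
```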

Categories: python Tags: .egg, .jar, distribution, java, zip

2014: moving towards Python 3

January 3, 2014 Jabba Laci

My new year’s resolution is to move (slowly) towards Python 3. I don’t want to switch yet but I will use things in my code that will facilitate the transition in the future. Thus, from now on I will use this skeleton for my new scripts:

#!/usr/bin/env python
# encoding: utf-8

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

def main():
    pass

####################

if __name__ == "__main__":
    main()

“A future statement must appear near the top of the module. The only lines that can appear before a future statement are:

the module docstring (if any),
comments,
blank lines, and
other future statements.“

More info on the __future__ imports

Compatibility layers

future (reddit discussion)
pies

Categories: python Tags: absolute import, division, print function, python3, unicode literals, __future__
