Introduction
This document (notebook) discusses the Python package “DSLExamples”, [AAp1], a “data package” with examples of translations of DSL commands into programming code. The DSL examples are suitable for LLM few-shot training.
The function llm_example_function provided by “LLMFunctionObjects”, [AAp2], can be effectively used to create translation functions utilizing those examples. The utilization of such LLM-translation functions is exemplified below. A table with the results of correctness and speed experiments is also shown.
The presentation “Robust LLM pipelines (Mathematica, Python, Raku)”, [AAv1], discusses the use of LLM-example functions in more general settings.
Similar translations, with far fewer computational resources, are achieved with grammar-based DSL translators; see the Raku package “DSL::Translators”, [AAp3]. The Raku package “LLM::Resources”, [AAp4], has LLM-graphs for code generation that utilize the DSL examples of this package.
Usage examples
Get all examples:
from DSLExamples import *
from DataTypeSystem import *
import pandas as pd

t = dsl_examples()
deduce_type(t)
# Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'str'>), Atom(<class 'str'>), 20), 7), 4)
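The deduced type above describes three levels of nested dictionaries that end in string-to-string pairs. As a minimal sketch of that shape, the snippet below builds a small stand-in dictionary and inspects it. The name `toy_examples`, its entries, and the helpers `nesting_depth` and `count_pairs` are hypothetical illustrations; they are not part of “DSLExamples” or “DataTypeSystem”.

```python
# Toy stand-in for the result of dsl_examples(); the entries are illustrative
# and the structure (language -> workflow -> command -> code) is an assumption
# read off the deduced type shown above.
toy_examples = {
    "Python": {
        "LSAMon": {
            "load the package": "from LatentSemanticAnalyzer import *",
            "use dfTemp": "LatentSemanticAnalyzer(dfTemp)",
        },
    },
    "WL": {
        "LSAMon": {"use dfTemp": "LSAMonUnit[dfTemp]"},
    },
}


def nesting_depth(obj):
    """Count how many dictionary levels are nested along the first branch."""
    depth = 0
    while isinstance(obj, dict) and obj:
        depth += 1
        obj = next(iter(obj.values()))
    return depth


def count_pairs(examples):
    """Return {(language, workflow): number of command-code pairs}."""
    return {
        (lang, workflow): len(pairs)
        for lang, workflows in examples.items()
        for workflow, pairs in workflows.items()
    }


depth = nesting_depth(toy_examples)
pair_counts = count_pairs(toy_examples)
print(depth)
print(pair_counts)
```

The same two helpers could be applied to the real result of `dsl_examples()`, assuming it is a plain nested `dict` as the deduced type suggests.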
Tabulate all translation languages and available workflow examples:
rows = [
    {"language": lang, "workflow": workflow}
    for lang, workflows in dsl_examples().items()
    for workflow in workflows.keys()
]
pd.DataFrame(rows).sort_values(["language", "workflow"]).reset_index(drop=True)
| language | workflow |
|---|---|
| Python | LSAMon |
| Python | QRMon |
| Python | SMRMon |
| Python | pandas |
| R | DataReshaping |
| R | LSAMon |
| R | QRMon |
| R | SMRMon |
| Raku | DataReshaping |
| Raku | SMRMon |
| Raku | TriesWithFrequencies |
| WL | ClCon |
| WL | DataReshaping |
| WL | LSAMon |
| WL | QRMon |
| WL | SMRMon |
| WL | Tabular |
| WL | TriesWithFrequencies |
Note that for dsl_examples the language to translate from can also be specified (with the argument from_lang). Currently, the package has DSL examples for the from-languages Bulgarian, English, Portuguese, and Russian.
Get the examples for Latent Semantic Analysis (LSA) Monadic pipeline segments in Python:
pd.DataFrame([{"command": k, "code": v} for k, v in dsl_examples('Python', 'LSAMon').items()])
| command | code |
|---|---|
| load the package | from LatentSemanticAnalyzer import * |
| use the documents aDocs | LatentSemanticAnalyzer(aDocs) |
| use dfTemp | LatentSemanticAnalyzer(dfTemp) |
| make the document-term matrix | make_document_term_matrix() |
| make the document-term matrix with automatic s… | make_document_term_matrix[stemming_rules=None,… |
| make the document-term matrix without stemming | make_document_term_matrix[stemming_rules=False… |
| apply term weight functions | apply_term_weight_functions() |
| apply term weight functions: global IDF, local… | apply_term_weight_functions(global_weight_func… |
| extract 30 topics using the method SVD | extract_topics(number_of_topics=24, method='SVD') |
| extract 24 topics using the method NNMF, max s… | extract_topics(number_of_topics=24, min_number… |
| Echo topics table | echo_topics_interpretation(wide_form=True) |
| show the topics | echo_topics_interpretation(wide_form=True) |
| Echo topics table with 10 terms per topic | echo_topics_interpretation(number_of_terms=10,… |
| find the statistical thesaurus for the words n… | echo_statistical_thesaurus(terms=stemmerObj.st… |
| show statistical thesaurus for king, castle, p… | echo_statistical_thesaurus(terms=stemmerObj.st… |
Make an LLM example function for translation of LSA workflow building commands:
from LLMFunctionObjects import *
from LLMPrompts import *

conf = llm_configuration('ChatGPT', model="gpt-5.5")
llm_pipeline_segment = llm_example_function(dsl_examples('WL', 'LSAMon'), llm_evaluator=conf)
Run the LLM function over a list of DSL commands:
commands = [
    "use the dataset aAbstracts",
    "make the document-term matrix without stemming",
    "extract 42 topics using the method non-negative matrix factorization",
    "show the topics",
]
gen_code = "⟹\n".join([llm_pipeline_segment(x).replace('Output:', '').strip() for x in commands])
print(gen_code)
# LSAMonUnit[aAbstracts]⟹
# LSAMonMakeDocumentTermMatrix["StemmingRules" -> None]⟹
# LSAMonExtractTopics["NumberOfTopics" -> 42, Method -> "NNMF"]⟹
# LSAMonEchoTopics[]
Same workflow specified in Bulgarian:
llm_pipeline_segment_bg = llm_example_function(
    dsl_examples(lang='WL', workflow='LSAMon', from_lang='Bulgarian'),
    llm_evaluator=conf,
)
commands = [
    "използавай данните aAbstracts",
    "направи документ-терм матрицата без да използаваш стъблата на думите",
    "намери 40 теми ползвайки методата не-отрицателна матрична факторизация",
    "покажи темите",
]
gen_code = "⟹\n".join([llm_pipeline_segment_bg(x).replace('Output:', '').strip() for x in commands])
print(gen_code)
# LSAMonUnit[aAbstracts]⟹
# LSAMonMakeDocumentTermMatrix["StemmingRules" -> None]⟹
# LSAMonExtractTopics[40, Method -> "NNMF"]⟹
# LSAMonEchoTopicsTable[]
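To make the idea of translation "by example" more concrete, here is a minimal sketch of how command-code pairs could be turned into a few-shot prompt and how the per-command results could be cleaned and joined into a pipeline. It is not the implementation of llm_example_function in “LLMFunctionObjects”: the names `make_example_function`, `clean`, and `fake_evaluator`, the "Input:/Output:" prompt layout, and the toy pairs are all assumptions made for illustration. The stand-in evaluator only looks up identical commands, so no LLM is called.

```python
def make_example_function(examples, evaluator):
    """Return a function that translates one command via a few-shot prompt.

    `examples` maps DSL commands to code; `evaluator` is any callable that
    takes a prompt string and returns text (an assumed, simplified interface).
    """
    shots = "\n\n".join(
        f"Input: {cmd}\nOutput: {code}" for cmd, code in examples.items()
    )

    def translate(command):
        prompt = f"{shots}\n\nInput: {command}\nOutput:"
        return evaluator(prompt)

    return translate


def clean(text):
    """Drop an echoed 'Output:' label and the surrounding whitespace."""
    return text.replace("Output:", "").strip()


def fake_evaluator(prompt):
    """Stand-in for an LLM call: return the shot whose input equals the query."""
    *shots, query = prompt.split("\n\n")
    command = query.split("\n")[0][len("Input: "):]
    for shot in shots:
        shot_input, shot_output = shot.split("\n", 1)
        if shot_input == f"Input: {command}":
            return shot_output  # still carries the "Output:" label
    return "Output: (no matching example)"


# Toy pairs (illustrative, not copied from the package).
toy_pairs = {
    "use the dataset aAbstracts": "LSAMonUnit[aAbstracts]",
    "show the topics": "LSAMonEchoTopicsTable[]",
}

translate = make_example_function(toy_pairs, fake_evaluator)
commands = ["use the dataset aAbstracts", "show the topics"]
gen_code = "⟹\n".join(clean(translate(c)) for c in commands)
print(gen_code)
```

With a real model in place of `fake_evaluator`, commands that do not appear verbatim among the examples would be generalized by the model, which is where the correctness differences discussed in the next section come from.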
Correctness and speed of different models
The performance table below, covering both correctness and speed, was derived with the following code:
# LLM access configuration
conf = llm_configuration(<Provider>, model = <model>)

# Main testing function
llm_pipeline_segment = llm_example_function(dsl_examples('WL', 'LSAMon'), llm_evaluator=conf)

# Secondary testing function (for ≈ 1/4th of the models)
llm_pipeline_segment = llm_example_function(
    dsl_examples('WL', 'LSAMon'),
    llm_evaluator = conf,
    prompts = 'Do the generation by example as quickly as possible. Do not overthink it.')
A few observations:
- For a few of the models the secondary function produced different results, but for most of them no difference in results or timings was observed.
- Some of the models that produced correct results were executed multiple times, and occasionally different (and wrong) code was generated.
- The tests are for line-by-line generation; en-bloc tests can produce different results.
  - Hopefully, results that are both fast and correct.
- Of course, a more extensive benchmark study can be staged for different combinations of models, programming languages, natural languages, and workflows.
  - But that would require more planning, time, and resources.
| Provider | Model | Approx. time (s) | Approx. time (min) | In words | Success | Comments |
|---|---|---|---|---|---|---|
| Ollama | gemma4:26b | 240 | 4 | Very slow | Correct | |
| Ollama | gemma4:26b | 190 | 3.2 | Very slow | Correct | max_tokens = 500 |
| Ollama | gemma4:26b | 240 | 4 | Very slow | Wrong | with the do-not-overthink-it prompt |
| Ollama | gemma3:12b | 240 | 4 | Very slow | Wrong | generates lots of redundant code and explanations |
| Ollama | qwen3-coder:30b | 25 | 0.5 | Fast | Wrong | generates a bunch of explanations |
| Ollama | gpt-oss:20b | 90 | 1.5 | Slow | Wrong | generates a bunch of explanations |
| Ollama | qwen3.5:9b | 630 | 10.5 | Very, very slow | Correct | Correct code followed by a Markdown table |
| Ollama | qwen3.6:35b | 480 | 8 | Very, very slow | Correct | |
| Ollama | qwen3.6:35b | 300 | 5 | Slow | Correct | with the do-not-overthink-it prompt |
| Ollama | llama3.2:latest | 25 | 0.5 | Fast | Wrong | generates a bunch of explanations |
| Ollama | gemma3:1b | 15 | 0.4 | Fast | Wrong | |
| Ollama | gemma3:4b | 60 | 1 | Slowish | Wrong | generates a bunch of code |
| Ollama | deepseek-r1:latest | 250 | 4.5 | Slow | Wrong | generates a bunch of explanations and JSON structures |
| Ollama | lfm2 | 40 | 0.67 | Fast | Wrong | generates a bunch of explanations |
| Ollama | lfm2 | 2400 | 40 | Very, very slow | Interrupted | with the do-not-overthink-it prompt |
| Ollama | granite4.1:3b | 12 | 0.2 | Fast | Wrong | generates a bunch of explanations |
| Ollama | granite4.1:8b | 60 | 1 | Slowish | Wrong | generates a bunch of explanations |
| Ollama | medgemma1.5:4b | 1640 | 24 | Very, very slow | Interrupted | |
| ChatGPT | gpt-4.1-mini | 4 | 0.1 | Fast | Wrong | |
| ChatGPT | gpt-4.1 | 5 | 0.1 | Fast | Wrong | |
| ChatGPT | gpt-5.3-chat-latest | 13 | 0.2 | Fast | Correct | |
| ChatGPT | gpt-5.4-mini | 4 | 0.1 | Fast | Wrong | |
| ChatGPT | gpt-5.5 | 45 | 0.75 | Slowish | Correct | |
| Gemini | gemini-3.5-flash | 40 | 0.67 | Slowish | Correct | |
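Measurements like the ones in the table could be scripted along the following lines. This is only a sketch under stated assumptions: the function `benchmark`, the stand-in translator `toy_translate`, and the test cases are hypothetical, and the exact string comparison is a simplification. How correctness was actually judged for the table is not specified here.

```python
import time


def benchmark(translate, cases):
    """Time line-by-line translation and compare results with expected code.

    `cases` is a list of (command, expected_code) pairs.
    Returns (elapsed_seconds, all_correct, generated_lines).
    """
    start = time.perf_counter()
    generated = [translate(command).strip() for command, _ in cases]
    elapsed = time.perf_counter() - start
    all_correct = all(
        got == expected for got, (_, expected) in zip(generated, cases)
    )
    return elapsed, all_correct, generated


# Stand-in translator; a real run would pass an LLM-backed function instead.
toy_lookup = {
    "use the dataset aAbstracts": "LSAMonUnit[aAbstracts]",
    "show the topics": "LSAMonEchoTopicsTable[]",
}


def toy_translate(command):
    return toy_lookup.get(command, "")


cases = list(toy_lookup.items())
elapsed, all_correct, generated = benchmark(toy_translate, cases)
print(f"{elapsed:.4f} s, correct: {all_correct}")
```

Since repeated executions can occasionally give different code for the same model, as noted in the observations, running `benchmark` several times per model and recording every outcome would give a fuller picture than a single run.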
Testing for speed and correctness helps in designing more adequate agentic translation architectures, such as the ones described in the article “Day 6 – Robust code generation combining grammars and LLMs”, [AA1, AAn1].
Implementation details
There are several ways to organize the DSL examples with respect to the from-languages:
| Type | Comment | Currently used |
|---|---|---|
| Have a separate file for each from-language | Convenient editing and refinement | Yes |
| One file of all examples; from-language is a key for each workflow | Can be produced from the separate files | No |
| Keep English-only DSL examples and use dictionaries of command translations to English | Does not train the LLM directly with the from-language | Dictionaries are kept for reference |
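The second row says that a single combined structure can be produced from the separate per-language files. A minimal sketch of that step follows, assuming each file has already been loaded into a nested dictionary of the form language -> workflow -> {command: code}. The function name `merge_by_from_language`, the target layout, and the toy entries are assumptions for illustration; the package's actual file format is not described here.

```python
def merge_by_from_language(per_from_language):
    """Combine per-from-language example sets into one nested dictionary.

    Input:  from_language -> language -> workflow -> {command: code}
    Output: language -> workflow -> from_language -> {command: code}
    """
    merged = {}
    for from_lang, by_language in per_from_language.items():
        for lang, workflows in by_language.items():
            for workflow, pairs in workflows.items():
                slot = merged.setdefault(lang, {}).setdefault(workflow, {})
                slot[from_lang] = dict(pairs)  # copy, so the inputs stay untouched
    return merged


# Toy contents standing in for two separately maintained files.
separate_sets = {
    "English": {"WL": {"LSAMon": {"show the topics": "LSAMonEchoTopicsTable[]"}}},
    "Bulgarian": {"WL": {"LSAMon": {"покажи темите": "LSAMonEchoTopicsTable[]"}}},
}

merged = merge_by_from_language(separate_sets)
print(sorted(merged["WL"]["LSAMon"]))
```

Keeping the separate files as the source of truth and deriving the combined form on demand preserves the convenient per-language editing noted in the first row.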
This Jupyter notebook has a workflow for the translation of the English DSL examples into other languages.
References
Articles, blog posts
[AA1] Anton Antonov, “Day 6 – Robust code generation combining grammars and LLMs”, (2025), Raku Advent Calendar.
Notebooks
[AAn1] Anton Antonov, “Robust code generation combining grammars and LLMs”, (2025), Wolfram Community.
Packages
[AAp1] Anton Antonov, DSLExamples, Python package, (2026), GitHub/antononcube. (PyPI.org.)
[AAp2] Anton Antonov, LLMFunctionObjects, Python package, (2023-2026), GitHub/antononcube. (PyPI.org.)
[AAp3] Anton Antonov, DSL::Translators, Raku package, (2020-2026), GitHub/antononcube.
[AAp4] Anton Antonov, LLM::Resources, Raku package, (2026), GitHub/antononcube.
Videos
[AAv1] Anton Antonov, “Robust LLM pipelines (Mathematica, Python, Raku)”, (2024), YouTube/AAA4prediction.




