sequgen

Purpose

Programmatically generate synthetic sequence data such as time series, strings, DNA, etc. Sequence data generation is fully controlled by the user. sequgen does not build models from real-world sequence data.

Install

pip3 install sequgen

Usage example

This usage example generates time series data with three channels: (1) a normal (bell-shaped) peak, (2) Gaussian noise, and (3) the sum of the first two channels. The time axis is abstract: it runs from 0 to 20 and is divided into 100 intervals (101 points). The peak is centered at a location between 8 and 12, has a standard deviation between 1 and 2, and has a height between 4 and 5; the actual values are sampled from the parameter space. The Gaussian noise uses the default values (mean 0 and standard deviation 1). After creating the three channels, the example plots each one:

from matplotlib import pyplot as plt
import numpy
from sequgen.deterministic.normal_peak import normal_peak
from sequgen.stochastic.gaussian import gaussian
from sequgen.parameter_space import ParameterSpace
from sequgen.dimension import Dimension

time_axis = numpy.linspace(start=0, stop=20, num=101)
parameter_space_0 = ParameterSpace([
    Dimension("location", 8, 12),
    Dimension("stddev", 1, 2),
    Dimension("height", 4, 5),
])

channel_1 = normal_peak(time_axis, **parameter_space_0.sample())
channel_2 = gaussian(time_axis)
channel_3 = channel_1 + channel_2
channels = { "channel 1: normal peak": channel_1,
             "channel 2: gaussian noise": channel_2,
             "channel 3: combined": channel_3 }

# Plot each channel in its own subplot, stacked vertically.
for i, (title, channel) in enumerate(channels.items(), start=1):
    plt.subplot(len(channels), 1, i)
    plt.plot(time_axis, channel)
    plt.title(title, y=0.75, x=0.01, loc="left")
plt.show()

And these are the results:

(Figure: usage example, one plot per channel.)

You can find more usage examples in the notebooks repository on GitHub: https://github.com/sequgen/notebooks.
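
To make the description of the three channels concrete, here is a minimal sketch that reproduces the same idea with plain NumPy, without calling sequgen. It rests on assumptions: that parameter values are drawn uniformly between their bounds, and that the peak is a bell curve scaled so its maximum equals the height. The helper names `sample_parameters` and `bell_peak` are hypothetical and are not part of the sequgen API; sequgen's own implementation may differ.

```python
import numpy as np


def sample_parameters(bounds, rng):
    """Draw one value per parameter, uniformly between (low, high).

    Assumption: uniform sampling; this is not taken from sequgen itself.
    """
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}


def bell_peak(time_axis, location, stddev, height):
    """Bell-shaped curve whose maximum value is `height` at `location`."""
    return height * np.exp(-0.5 * ((time_axis - location) / stddev) ** 2)


rng = np.random.default_rng(seed=0)

# 101 points from 0 to 20 give 100 equally sized intervals.
time_axis = np.linspace(start=0, stop=20, num=101)

params = sample_parameters(
    {"location": (8, 12), "stddev": (1, 2), "height": (4, 5)}, rng
)

channel_1 = bell_peak(time_axis, **params)                         # peak
channel_2 = rng.normal(loc=0.0, scale=1.0, size=time_axis.shape)   # noise
channel_3 = channel_1 + channel_2                                  # combined
```

Because the parameters are sampled, each run with a different seed yields a peak at a different position, width, and height within the stated bounds.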

Contributing

For developer documentation, go to the developer's README.

If you want to contribute to the development of sequgen, have a look at the contribution guidelines.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.
