
Software Development Kit (SDK)

With a subscription to the Cebule Tier, you can access the API through the Python SDK, a thin, user-friendly layer of well-defined functions for interacting with the Cebule engine.

This page first describes how the API is structured and then presents the SDK's functions and their arguments.

Application Programming Interface (API)

The API can be used to search for molecules in a database holding more than 200,000,000 molecules. The search method includes the following features:

  • Ranking of molecules based on their identifiers (e.g. SMILES, formula).
  • An exact number of molecules to return can be specified. This applies to the GET /search endpoint.
  • The query parameters for the GET /search endpoint are:
      • q - the keyword to search for (required)
      • start - offset of the first result to return (default: 0)
      • limit - number of results to return (default: 50)
      • cursor - value of nextCursorMark from the previous response
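As a rough illustration of how these parameters fit together, the sketch below assembles a GET /search URL and walks through pages using the cursor. The base URL, the "results" key and the stopping rule (stop when no new nextCursorMark is returned) are assumptions made for this example and are not taken from the MQS API documentation. The page source is a local stub, so nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://api.example.com"


def build_search_url(q, start=0, limit=50, cursor=None):
    """Assemble a GET /search URL from the documented query parameters."""
    if not q:
        raise ValueError("q is required")
    params = {"q": q, "start": start, "limit": limit}
    if cursor is not None:
        params["cursor"] = cursor
    return f"{BASE_URL}/search?{urlencode(params)}"


def iterate_pages(fetch_page):
    """Yield pages until no new nextCursorMark is returned.

    fetch_page(cursor) is a caller-supplied function (hypothetical) that
    returns one response as a dict.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield page
        next_cursor = page.get("nextCursorMark")
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor


url = build_search_url("aspirin", limit=10)

# Stub responses standing in for the API; the "results" key is an assumption.
stub_pages = {
    None: {"results": [1, 2], "nextCursorMark": "a"},
    "a": {"results": [3], "nextCursorMark": "a"},
}
collected = [item for page in iterate_pages(stub_pages.__getitem__)
             for item in page["results"]]
print(url)
print(collected)
```

In real use, fetch_page would issue the HTTP request (or the SDK call) and pass the cursor from the previous response.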

The current documentation of the MQS API is also available under the corresponding tab in the Dashboard:

Figure 1: Documentation page of the API within the Dashboard.

Software Development Kit (SDK)

Compared with the bare MQS REST API, the SDK offers a simpler way to integrate the MQS tool stack into your own coding projects or software.

The Python SDK currently provides an additional layer on top of the API, with tailored functions that make the API easier to use.

As an example, the following snippet imports mqsdk, sets up an authenticated session, and runs a search:

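import mqsdk
# Authenticate with your account credentials to open a session
session = mqsdk.Session("<email address>", "<password>")
# Search for "aspirin" and return the first 10 results
results = session.data.search("aspirin", start=0, limit=10)
print(results)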

Atom Order

Several Cebule tasks in the SDK (GEOMETRY_OPT, GROUP_CONTRIBUTION, FORCE_FIELD_MD) rely on a fixed atom order. To interpret the outputs of those tasks, use the ATOM_ORDER task in Cebule.

Cebule SDK TaskType: ATOM_ORDER

  • Determine the Cebule atom order for a given molecule specified as a SMILES string, a list of atomic symbols, or a polymer chain
  • Inputs:
      • smiles: str - SMILES representation of the molecule, OR
      • smiles: list[str] - List of pSMILES (polymer SMILES) for polymer chain with connection tags, OR
      • symbols: list[str] - List of atomic symbols (e.g., ["C", "H", "O"])
      • geometry: list[list[list[float]]] - (Optional) List of 3D geometries for each polymer unit when using polymer chain SMILES. When provided, returns both atom order and stitched geometry.
  • Cebule max_processors: N/A (lightweight operation)
  • Output:
      • Without geometry: List of element symbols in the order they appear across Cebule task outputs
      • With geometry: Dictionary with "atom_order" (list of symbols) and "geometry" (stitched polymer 3D coordinates)

Example with SMILES:

session.cebule.create_task("order", TaskType.ATOM_ORDER, smiles="CCO")

# Result:
["H", "H", "H", "H", "H", "H", "C", "C", "O"]

Example with symbols:

session.cebule.create_task("order", TaskType.ATOM_ORDER, symbols=["C", "O", "H", "N", "H"])

# Result:
["H", "H", "C", "N", "O"]
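Both results above are consistent with sorting the atoms by atomic number. The following sketch reproduces the second result locally with a small lookup table. It is only an illustration of that ordering rule, not the Cebule implementation, and any tie-breaking Cebule applies among atoms of the same element is not covered here.

```python
# Atomic numbers for a few common elements (extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17}


def sort_by_atomic_number(symbols):
    """Return the symbols sorted by increasing atomic number (stable sort)."""
    return sorted(symbols, key=lambda symbol: ATOMIC_NUMBER[symbol])


ordered = sort_by_atomic_number(["C", "O", "H", "N", "H"])
print(ordered)  # ['H', 'H', 'C', 'N', 'O']
```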

Example with polymer chain:

# Without geometry - just get atom order
session.cebule.create_task("order", TaskType.ATOM_ORDER, smiles=["[*]CC[*:1]", "[*:1]CCC[*]"])

# With geometry - get atom order and stitched geometry
session.cebule.create_task("order", TaskType.ATOM_ORDER,
                          smiles=["[*]CC[*:1]", "[*:1]CCC[*]"],
                          geometry=[unit1_coords, unit2_coords])

# Result (with geometry):
{
    "atom_order": ["H", "H", "H", ..., "C", "C", "C"],
    "geometry": [[x1, y1, z1], [x2, y2, z2], ...]
}

The ATOM_ORDER result tells us which atomic indices are used in other tasks. For instance, for the SMILES example above (smiles="CCO"), the first XYZ coordinate in the GEOMETRY_OPT output list corresponds to a hydrogen atom and the last coordinate corresponds to the oxygen atom. Likewise, if a group in the GROUP_CONTRIBUTION output indices contains the atom at index 6 (counting from 0), that atom is a carbon.
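A minimal sketch of this bookkeeping is shown below. It pairs the atom order from the "CCO" example with a list of coordinates and looks up the elements behind a set of group indices. The coordinates and the group indices are placeholders invented for the example, not real task output, and label_atoms is a hypothetical helper rather than an SDK function.

```python
# Atom order as returned for smiles="CCO" in the example above.
atom_order = ["H", "H", "H", "H", "H", "H", "C", "C", "O"]

# Placeholder coordinates, NOT real GEOMETRY_OPT output.
coords = [[float(i), 0.0, 0.0] for i in range(len(atom_order))]


def label_atoms(atom_order, coords):
    """Pair each coordinate with its zero-based index and element symbol."""
    if len(atom_order) != len(coords):
        raise ValueError("atom order and coordinates differ in length")
    return [(index, symbol, tuple(xyz))
            for index, (symbol, xyz) in enumerate(zip(atom_order, coords))]


labelled = label_atoms(atom_order, coords)

# Hypothetical group indices, as they might appear in a GROUP_CONTRIBUTION output.
group = [6, 7]
group_symbols = [atom_order[i] for i in group]

print(labelled[0][1], labelled[-1][1])  # H O
print(group_symbols)                    # ['C', 'C']
```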

The Python SDK can be found at https://gitlab.com/mqsdk/python-sdk; see also the Molecule Search Jupyter Notebook.

You need a subscription to the Quantum & Machine Learning Tier to access the API, which the Python SDK depends on.

All example notebooks for the Python SDK can be found in the notebooks folder of the repository, and example scripts in the scripts folder.

Polymers

The Cebule SDK supports polymer calculations through a specialized polymer SMILES (pSMILES) notation that uses dummy atoms (*) to represent repeating units and connection points. This enables quantum chemical calculations on polymer systems while maintaining computational tractability.

pSMILES Format

Polymer structures are specified using exactly two dummy atoms (*) that mark the connection points of the repeating unit:

  • *CC* - Ethylene repeating unit (polyethylene)
  • *CC(C)* - Propylene repeating unit (polypropylene)
  • *C1=CC=CC=C1* - Benzene repeating unit (polyphenylene)

The dummy atoms indicate where the polymer chain continues, defining the periodic boundary of the repeating unit.
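A quick pre-check of the "exactly two dummy atoms" rule can be done before submitting a task. The helper below is a hypothetical convenience function, not part of the SDK. It only counts the * characters and does not verify that the rest of the string is valid SMILES.

```python
def has_two_dummy_atoms(psmiles):
    """Return True if the string contains exactly two dummy atoms (*)."""
    return psmiles.count("*") == 2


print(has_two_dummy_atoms("*CC*"))           # True
print(has_two_dummy_atoms("*C1=CC=CC=C1*"))  # True
print(has_two_dummy_atoms("*CC"))            # False
```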

Polymer Chains

For specifying multi-unit polymer chains (oligomers), Cebule supports a list of pSMILES with connection tags that define how units link together:

Format:

["[*]CC[*:1]", "[*:1]CCC[*]"]

["[*]C1OCCC(O)C1CC1OC([*:1])C(O)CC1", "[*:1]C1OCC(O)CC1CC1OCCC(O)C1CC1OC([*:2])C(O)CC1", "[*:2]C1OCC(O)CC1CC1OC([*])C(O)CC1"]

Connection Rules:

  • Tagged dummy atoms ([*:1], [*:2], etc.) specify connection points between units
  • Each tag must appear exactly twice across the chain: once on each of the two units to be connected
  • Plain dummy atoms ([*]) mark the chain ends (exactly two required per chain)
  • Units are automatically stitched together based on matching tags
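The connection rules can be checked locally before a chain is submitted. The sketch below is a hypothetical validator, not an SDK function. It assumes dummy atoms in chains are always written in brackets ([*] or [*:n]) as in the format examples, and it checks only the tag and chain-end counts, not the chemistry.

```python
import re
from collections import defaultdict

# Matches "[*]" (no tag captured) and "[*:n]" (tag n captured).
DUMMY = re.compile(r"\[\*(?::(\d+))?\]")


def check_chain(units):
    """Return a list of rule violations for a list of pSMILES units."""
    plain_ends = 0
    tag_units = defaultdict(list)  # tag -> indices of the units carrying it
    for index, unit in enumerate(units):
        for match in DUMMY.finditer(unit):
            tag = match.group(1)
            if tag is None:
                plain_ends += 1
            else:
                tag_units[tag].append(index)

    errors = []
    if plain_ends != 2:
        errors.append(f"expected 2 plain [*] chain ends, found {plain_ends}")
    for tag, where in sorted(tag_units.items()):
        if len(where) != 2 or where[0] == where[1]:
            errors.append(
                f"tag {tag} must appear once on each of two units, "
                f"found on units {where}"
            )
    return errors


print(check_chain(["[*]CC[*:1]", "[*:1]CCC[*]"]))  # []
print(check_chain(["[*]CC[*:1]", "[*:2]CCC[*]"]))  # two violations
```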

Example polymer chain:

# 3-unit alkane chain:
# Unit 1: [*]CC[*:1]  (plain end, connects with tag 1 to the next unit)
# Unit 2: [*:2]CCC[*:1] (connects with tag 2 to the next unit, with tag 1 to the previous unit)
# Unit 3: [*:2]CC[*]  (connects with tag 2 to the previous unit, plain end)

smiles_chain = ["[*]CC[*:1]", "[*:2]CCC[*:1]", "[*:2]CC[*]"]

Use Cases:

  • Geometry optimization: Optimize individual units (monomers or smaller oligomers), then stitch them into the complete chain geometry using ATOM_ORDER with the geometry parameter
  • Molecular dynamics: Simulate entire polymer chain as a single molecule with FORCE_FIELD_MD
  • Property prediction: Include polymer chains in mixtures for GNN_TRAIN and GNN_PREDICT
  • Group contribution: Calculate thermodynamic properties of polymer-containing mixtures with GROUP_CONTRIBUTION

GNN Periodicity Support

Mixture property prediction (viscosity and surface tension) supports polymer molecules through the allow_polymer parameter in GNN training, enabling accurate modeling of polymer-containing mixtures with pSMILES notation.

When using polymer SMILES with Graph Neural Network (GNN) models, the periodicity is automatically incorporated into the molecular graph representation, so the model understands the extended polymer structure beyond just the repeating unit.

Hydrogen Capping for Quantum Chemistry

For quantum chemistry calculations (GEOMETRY_OPT, ATOM_ORDER, FORCE_FIELD_MD, and GROUP_CONTRIBUTION), the polymer SMILES are automatically processed to create finite, computable molecular models:

  1. Dummy atom removal: The * atoms are removed from the structure
  2. Hydrogen capping: Auxiliary hydrogen atoms are added to cap the dangling bonds where the dummy atoms were removed
  3. Canonical ordering: Atoms are reordered with auxiliary hydrogens first, followed by the original polymer atoms sorted by atomic number
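The three steps can be mimicked on a plain list of element symbols, which shows where the capping hydrogens end up in the final order. This is only a bookkeeping sketch under stated assumptions: the unit is given as a hypothetical symbol list with explicit hydrogens and "*" for each dummy atom, and one capping hydrogen is added per removed dummy atom. The actual Cebule processing operates on molecular structures, not symbol lists.

```python
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def capped_atom_order(unit_symbols):
    """Return the symbol order after dummy removal, H capping and sorting."""
    n_caps = unit_symbols.count("*")                     # one cap per dummy atom
    real_atoms = [s for s in unit_symbols if s != "*"]   # step 1: drop dummy atoms
    caps = ["H"] * n_caps                                # step 2: capping hydrogens
    # Step 3: caps first, then the original atoms sorted by atomic number.
    return caps + sorted(real_atoms, key=ATOMIC_NUMBER.__getitem__)


# Ethylene repeating unit *CC* written with explicit hydrogens.
order = capped_atom_order(["*", "C", "H", "H", "C", "H", "H", "*"])
print(order)  # ['H', 'H', 'H', 'H', 'H', 'H', 'C', 'C']
```

In this sketch the first two entries are the capping hydrogens; they are indistinguishable from the other hydrogens by symbol alone, so only their position identifies them.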

One can also specify repeating oligomers instead of monomers when running quantum chemistry simulations on the oligomer rather than the monomer is advantageous for observing the properties of interest:

  • Monomer: *CC* (ethylene unit)
  • Dimer: *CCCC* (two ethylene units)
  • Trimer: *CCCCCC* (three ethylene units)
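For simple units such as the ones listed above, the oligomer pSMILES can be generated by repeating the unit body between the two dummy atoms. The helper below is a hypothetical convenience function, not part of the SDK, and assumes the unit body is written so that its first and last atoms are the connection points.

```python
def oligomer_psmiles(unit_body, n_units):
    """Build a pSMILES string with n_units copies of unit_body between two dummy atoms."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return "*" + unit_body * n_units + "*"


for n in (1, 2, 3):
    print(n, oligomer_psmiles("CC", n))
# 1 *CC*
# 2 *CCCC*
# 3 *CCCCCC*
```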