Skip to content

Chemical Reaction Networks Analysis and Catalyst Design

Reaction Networks

Chemical reaction networks have numerous different pathways to produce target molecules of interest. Solving the constrained combinatorial optimization problem of finding optimal reaction pathways has critical applications in synthesis planning and metabolic pathway analysis. This problem is generally a NP-hard problem due to the combinatorial explosion from the network graph size. A pre-print publication published by us, MQS, presents a benchmark between different solvers for reaction network analysis and the CP-SAT based approach has been implemented in the Cebule SDK which shows almost the same performance with respect to computational time (identical optimal minimum) in comparison to the best performing solver (Gurobi).

Mathematical Formulation

A chemical reaction network (CRN) can be represented as a bipartite directed graph with two types of nodes: reactions and species.

Each reaction in the network has associated costs:

  • Unit cost (unit_cost): Cost per quantity of the reaction (e.g., substrate purchase, byproduct disposal)
  • Fixed cost (fixed_cost): One-time cost incurred if the reaction occurs at all (e.g., equipment preparation)
  • Bounds: Lower (low) and upper (high) limits on reaction quantities

The optimization problem, as presented by Mizuno and Komatsuzaki, can be formulated as:

minimize: Σ(r∈R) [c_r^unit × x_r + c_r^fixed × p(x_r)]

subject to: 
  ∀s∈S: Σ(r∈R) v_{s,r} × x_r = 0  (mass balance)
  ∀r∈R: l_r ≤ x_r ≤ u_r           (reaction bounds)

Where:

  • R is the set of reactions, S is the set of species
  • x_r is the quantity of reaction r (optimization variable)
  • v_{s,r} is the signed stoichiometric coefficient of species s in reaction r
  • p(x_r) is the positivity indicator function (1 if x_r > 0, 0 otherwise)

Task Types

RXN_NETWORK_OPT

  • Optimize chemical reaction network pathways to minimize total cost while satisfying mass balance constraints
  • Inputs: crn_data: Dict[str, List[Dict]] containing the reaction network definition with keys reaction_list and species_list, time_limit: int maximum solve time in seconds
  • Cebule max_processors: Specify the number of cores to use
  • Output: Dict[str, Union[Dict[str, int], float, bool, Dict[str, float]]] containing the keys reaction_quantities (mapping reaction names to optimal quantities), total_cost (minimum total cost achieved), solve_time (total time taken in seconds), and optimal (whether optimal solution was found)

Input Format

Within crn_data dictionary:

  • reaction_list: Array of reaction definitions
  • name: Unique identifier for the reaction
  • fixed_cost: One-time cost if reaction occurs (≥0)
  • unit_cost: Cost per unit quantity of reaction (≥0)
  • low: Minimum allowed reaction quantity (≥0)
  • high: Maximum allowed reaction quantity (≥low)
  • species_list: Array of chemical species definitions
  • name: Unique identifier for the species
  • stoich: Dictionary mapping reaction names to signed stoichiometric coefficients
    • Positive values: species is produced by the reaction
    • Negative values: species is consumed by the reaction

Inflow and Outflow Reactions

Chemical reaction networks require boundary conditions to represent material inputs and product outputs. These are typically modeled as special reactions:

  • Inflow reactions: Represent external material supply (e.g., raw material purchases). These reactions have no reactants and produce species needed by the network. They should positive low bounds if the material is a required input to ensure minimum supply, but if the material is optional (ex. one pathway to produce the desired target includes it but not another) a positive lower bound should not be given.

  • Outflow reactions: Represent product extraction or waste disposal. These reactions consume species produced by the network and have no products. They may have positive low bounds to ensure minimum production targets are met. For species that require waste disposal (are byproducts or reactions but not required target chemicals) they should not have a positive lower bound.

Without inflow and outflow reactions, the mass balance constraints would force all reaction quantities to zero, as no net production or consumption would be possible in a closed system.

Example: Complete Solvay Process CRN

The following example details the Solvay Process CRN, shown in the below figure: Figure 1: Solvay Reaction Network with Optimal Solution Quantities

The input crn_data would be in the below format:

{
  "reaction_list": [
    {
      "name": "inflow1",
      "fixed_cost": 2,
      "unit_cost": 4,
      "low": 0,
      "high": 5
    },
    {
      "name": "inflow2",
      "fixed_cost": 7,
      "unit_cost": 3,
      "low": 1,
      "high": 5
    },
    {
      "name": "1",
      "fixed_cost": 2,
      "unit_cost": 1,
      "low": 0,
      "high": 5
    },
    {
      "name": "2",
      "fixed_cost": 4,
      "unit_cost": 2,
      "low": 0,
      "high": 5
    },
    {
      "name": "3",
      "fixed_cost": 9,
      "unit_cost": 3,
      "low": 0,
      "high": 5
    },
    {
      "name": "4",
      "fixed_cost": 1,
      "unit_cost": 8,
      "low": 0,
      "high": 5
    },
    {
      "name": "5",
      "fixed_cost": 7,
      "unit_cost": 2,
      "low": 0,
      "high": 5
    },
    {
      "name": "outflow1",
      "fixed_cost": 2,
      "unit_cost": 3,
      "low": 1,
      "high": 5
    },
    {
      "name": "outflow2",
      "fixed_cost": 4,
      "unit_cost": 4,
      "low": 1,
      "high": 5
    }
  ],
  "species_list": [
    {
      "name": "NaCl",
      "stoich": {
        "inflow1": 1,
        "1": -1
      }
    },
    {
      "name": "CO2",
      "stoich": {
        "1": -1,
        "2": 1,
        "3": 1
      }
    },
    {
      "name": "NaHCO3",
      "stoich": {
        "1": 1,
        "2": -2
      }
    },
    {
      "name": "Na2CO3",
      "stoich": {
        "2": 1,
        "outflow1": -1
      }
    },
    {
      "name": "H2O",
      "stoich": {
        "1": -1,
        "2": 1,
        "4": -1,
        "5": 2
      }
    },
    {
      "name": "NH3",
      "stoich": {
        "1": -1,
        "5": 2
      }
    },
    {
      "name": "NH4Cl",
      "stoich": {
        "1": 1,
        "5": -2
      }
    },
    {
      "name": "CaCO3",
      "stoich": {
        "inflow2": 1,
        "3": -1
      }
    },
    {
      "name": "CaO",
      "stoich": {
        "3": 1,
        "4": -1
      }
    },
    {
      "name": "Ca(OH)2",
      "stoich": {
        "4": 1,
        "5": -1
      }
    },
    {
      "name": "CaCl2",
      "stoich": {
        "5": 1,
        "outflow2": -1
      }
    }
  ]
}

This represents the complete Solvay process with:

  • inflow1: NaCl supply (salt input)
  • inflow2: CaCO₃ supply (limestone input)
  • Reaction 1: Main ammonia-soda reaction: NaCl + CO₂ + H₂O + NH₃ → NaHCO₃ + NH₄Cl
  • Reaction 2: Sodium carbonate formation: 2 NaHCO₃ → Na₂CO₃ + CO₂ + H₂O
  • Reaction 3: Limestone calcination: CaCO₃ → CaO + CO₂
  • Reaction 4: Lime slaking: CaO + H₂O → Ca(OH)₂
  • Reaction 5: Ammonia recovery: Ca(OH)₂ + 2 NH₄Cl → 2 NH₃ + 2 H₂O + CaCl₂
  • outflow1: Na₂CO₃ product extraction
  • outflow2: CaCl₂ byproduct removal

Optimize reaction network with Cebule

# Create CRN optimization task
task_crn_opt = session.cebule.create_task(
    "Solvay Process Optimization",
    TaskType.RXN_NETWORK_OPT,
    crn_data=solvay_crn,
    time_limit=10
)

# Wait for result
result = session.cebule.wait_for_result(task_crn_opt.id, "result")
print(f"Optimal cost: {result['total_cost']}")
print(f"Reaction quantities: {result['reaction_quantities']}")
print(f"Optimization completed in {result['solve_time']} seconds")
print(f"Optimal solution found: {result['optimal']}")

Catalyst Design

Task Types

MAKE_SURF

  • Inputs:

element1: str first atomic element

element2: str second atomic element

surface_type: str

n_surfaces: int number of surfaces to generate

reaction_network: reaction_network definition

vacuum: number of vacuum layers over surface with length

dataset_tag: dataset tag/name for the dataset which is being generated

supercell_size_y: int or float?

layers: int

  • Optional Inputs:

support_structure: str

support_supercell_size_x: float

support_supercell_size_y: float

support_layers: int

support_lattice_const: float

support: str

  • Output:

GAS_SPECIES_ENERGY

energy_calculator: string

optimizer_type: string

unitcell_length:

energy_offsets_path: string

  • Optional Inputs:

dataset_tag: string

num_cores: int

SURFACE_REACTION_ENERGIES

reaction_type:

temperature: float;

pressure: float; Total pressure

adsorbate_positions_path: string

  • Optional Inputs:

single_surface: True or False

backup_site:

only_aggregate:

uncertainty: True or False; Enable kinetic uncertainty quantification using ensemble DFT calculations.

gfn: True or False

electronic_temperature: float

geometry_opt: True or False

fmax: float

dataset_tag: string

num_cores: int

GAN_TOF

reaction_type

dataset_tag

surf_data_tag

rx_energy_data_tag

epochs

nclass

score_max

score_min

scaler

min_loss_diff

WULFF_CONSTRUCTION

VISUALIZE_SURF

GAN_TOF_PLOT

POTENTIAL_ENERGY_DIAGRAM

FRACTIONAL_COVERAGES

REACTION_ENERGIES_COMPARISON