Chemical Reaction Networks Analysis and Catalyst Design¶

Reaction Networks¶

Chemical reaction networks have numerous different pathways to produce target molecules of interest. Solving the constrained combinatorial optimization problem of finding optimal reaction pathways has critical applications in synthesis planning and metabolic pathway analysis. This problem is generally a NP-hard problem due to the combinatorial explosion from the network graph size. A pre-print publication published by us, MQS, presents a benchmark between different solvers for reaction network analysis and the CP-SAT based approach has been implemented in the Cebule SDK which shows almost the same performance with respect to computational time (identical optimal minimum) in comparison to the best performing solver (Gurobi).

Mathematical Formulation¶

A chemical reaction network (CRN) can be represented as a bipartite directed graph with two types of nodes: reactions and species.

Each reaction in the network has associated costs:

Unit cost (unit_cost): Cost per quantity of the reaction (e.g., substrate purchase, byproduct disposal)
Fixed cost (fixed_cost): One-time cost incurred if the reaction occurs at all (e.g., equipment preparation)
Bounds: Lower (low) and upper (high) limits on reaction quantities

The optimization problem, as presented by Mizuno and Komatsuzaki, can be formulated as:

minimize: Σ(r∈R) [c_r^unit × x_r + c_r^fixed × p(x_r)]

subject to: 
  ∀s∈S: Σ(r∈R) v_{s,r} × x_r = 0  (mass balance)
  ∀r∈R: l_r ≤ x_r ≤ u_r           (reaction bounds)

Where:

R is the set of reactions, S is the set of species
x_r is the quantity of reaction r (optimization variable)
v_{s,r} is the signed stoichiometric coefficient of species s in reaction r
p(x_r) is the positivity indicator function (1 if x_r > 0, 0 otherwise)

Task Types¶

RXN_NETWORK_OPT¶

Optimize chemical reaction network pathways to minimize total cost while satisfying mass balance constraints
Inputs: crn_data: Dict[str, List[Dict]] containing the reaction network definition with keys reaction_list and species_list, time_limit: int maximum solve time in seconds
Cebule max_processors: Specify the number of cores to use
Output: Dict[str, Union[Dict[str, int], float, bool, Dict[str, float]]] containing the keys reaction_quantities (mapping reaction names to optimal quantities), total_cost (minimum total cost achieved), solve_time (total time taken in seconds), and optimal (whether optimal solution was found)

Input Format¶

Within crn_data dictionary:

reaction_list: Array of reaction definitions
name: Unique identifier for the reaction
fixed_cost: One-time cost if reaction occurs (≥0)
unit_cost: Cost per unit quantity of reaction (≥0)
low: Minimum allowed reaction quantity (≥0)
high: Maximum allowed reaction quantity (≥low)
species_list: Array of chemical species definitions
name: Unique identifier for the species
stoich: Dictionary mapping reaction names to signed stoichiometric coefficients
- Positive values: species is produced by the reaction
- Negative values: species is consumed by the reaction

Inflow and Outflow Reactions¶

Chemical reaction networks require boundary conditions to represent material inputs and product outputs. These are typically modeled as special reactions:

Inflow reactions: Represent external material supply (e.g., raw material purchases). These reactions have no reactants and produce species needed by the network. They should positive low bounds if the material is a required input to ensure minimum supply, but if the material is optional (ex. one pathway to produce the desired target includes it but not another) a positive lower bound should not be given.
Outflow reactions: Represent product extraction or waste disposal. These reactions consume species produced by the network and have no products. They may have positive low bounds to ensure minimum production targets are met. For species that require waste disposal (are byproducts or reactions but not required target chemicals) they should not have a positive lower bound.

Without inflow and outflow reactions, the mass balance constraints would force all reaction quantities to zero, as no net production or consumption would be possible in a closed system.

Example: Complete Solvay Process CRN¶

The following example details the Solvay Process CRN, shown in the below figure: Figure 1: Solvay Reaction Network with Optimal Solution Quantities

The input crn_data would be in the below format:



href="#__codelineno-1-1">{ "reaction_list": [ { "name": "inflow1", "fixed_cost": 2, "unit_cost": 4, "low": 0, "high": 5 }, { "name": "inflow2", "fixed_cost": 7, "unit_cost": 3, "low": 1, "high": 5 }, { "name": "1", "fixed_cost": 2, "unit_cost": 1, "low": 0, "high": 5 }, { "name": "2", "fixed_cost": 4, "unit_cost": 2, "low": 0, "high": 5 }, { "name": "3", "fixed_cost": 9, "unit_cost": 3, "low": 0, "high": 5 }, { "name": "4", "fixed_cost": 1, "unit_cost": 8, "low": 0, "high": 5 }, { "name": "5", "fixed_cost": 7, "unit_cost": 2, "low": 0, "high": 5 }, { "name": "outflow1", "fixed_cost": 2, "unit_cost": 3, "low": 1, "high": 5 }, { "name": "outflow2", "fixed_cost": 4, "unit_cost": 4, "low": 1, "high": 5 } ], "species_list": [ { "name": "NaCl", "stoich": { "inflow1": 1, "1": -1 } }, { "name": "CO2", "stoich": { "1": -1, "2": 1, "3": 1 } }, { "name": "NaHCO3", "stoich": { "1": 1, "2": -2 } }, { "name": "Na2CO3", "stoich": { "2": 1, "outflow1": -1 } }, { "name": "H2O", "stoich": { "1": -1, "2": 1, "4": -1, "5": 2 } }, { "name": "NH3", "stoich": { "1": -1, "5": 2 } }, { "name": "NH4Cl", "stoich": { "1": 1, "5": -2 } }, { "name": "CaCO3", "stoich": { "inflow2": 1, "3": -1 } }, { "name": "CaO", "stoich": { "3": 1, "4": -1 } }, { "name": "Ca(OH)2", "stoich": { "4": 1, "5": -1 } }, { "name": "CaCl2", "stoich": { "5": 1, "outflow2": -1 } } ] class="p">}
This represents the complete Solvay process with:

inflow1: NaCl supply (salt input)
inflow2: CaCO₃ supply (limestone input) 
Reaction 1: Main ammonia-soda reaction: NaCl + CO₂ + H₂O + NH₃ → NaHCO₃ + NH₄Cl
Reaction 2: Sodium carbonate formation: 2 NaHCO₃ → Na₂CO₃ + CO₂ + H₂O
Reaction 3: Limestone calcination: CaCO₃ → CaO + CO₂
Reaction 4: Lime slaking: CaO + H₂O → Ca(OH)₂
Reaction 5: Ammonia recovery: Ca(OH)₂ + 2 NH₄Cl → 2 NH₃ + 2 H₂O + CaCl₂
outflow1: Na₂CO₃ product extraction
outflow2: CaCl₂ byproduct removal

Optimize reaction network with Cebule¶
# Create CRN optimization task
task_crn_opt = session.cebule.create_task(
    "Solvay Process Optimization",
    TaskType.RXN_NETWORK_OPT,
    crn_data=solvay_crn,
    time_limit=10
)

# Wait for result
result = session.cebule.wait_for_result(task_crn_opt.id, "result")
print(f"Optimal cost: {result['total_cost']}")
print(f"Reaction quantities: {result['reaction_quantities']}")
print(f"Optimization completed in {result['solve_time']} seconds")
print(f"Optimal solution found: {result['optimal']}")

Catalyst Design¶
Task Types¶
MAKE_SURF¶

Inputs:

element1: str first atomic element
element2: str second atomic element
surface_type: str
n_surfaces: int number of surfaces to generate
reaction_network: reaction_network definition
vacuum: number of vacuum layers over surface with length
dataset_tag: dataset tag/name for the dataset which is being generated
supercell_size_y: int or float?
layers: int

Optional Inputs:

support_structure: str
support_supercell_size_x: float
support_supercell_size_y: float
support_layers: int
support_lattice_const: float
support: str

Output:

GAS_SPECIES_ENERGY¶
energy_calculator: string
optimizer_type: string
unitcell_length:
energy_offsets_path: string

Optional Inputs:

dataset_tag: string
num_cores: int
SURFACE_REACTION_ENERGIES¶
reaction_type:
temperature: float;
pressure: float; Total pressure
adsorbate_positions_path: string

Optional Inputs:

single_surface: True or False
backup_site:
only_aggregate:
uncertainty: True or False; Enable kinetic uncertainty quantification using ensemble DFT calculations.
gfn: True or False
electronic_temperature: float
geometry_opt: True or False
fmax: float
dataset_tag: string
num_cores: int
GAN_TOF¶
reaction_type
dataset_tag
surf_data_tag
rx_energy_data_tag
epochs
nclass
score_max
score_min
scaler
min_loss_diff
WULFF_CONSTRUCTION¶
VISUALIZE_SURF¶
GAN_TOF_PLOT¶
POTENTIAL_ENERGY_DIAGRAM¶
FRACTIONAL_COVERAGES¶
REACTION_ENERGIES_COMPARISON¶