Chemical Reaction Networks Analysis and Catalyst Design¶
Reaction Networks¶
Chemical reaction networks have numerous different pathways to produce target molecules of interest. Solving the constrained combinatorial optimization problem of finding optimal reaction pathways has critical applications in synthesis planning and metabolic pathway analysis. This problem is generally a NP-hard problem due to the combinatorial explosion from the network graph size. A pre-print publication published by us, MQS, presents a benchmark between different solvers for reaction network analysis and the CP-SAT based approach has been implemented in the Cebule SDK which shows almost the same performance with respect to computational time (identical optimal minimum) in comparison to the best performing solver (Gurobi).
Mathematical Formulation¶
A chemical reaction network (CRN) can be represented as a bipartite directed graph with two types of nodes: reactions and species.
Each reaction in the network has associated costs:
- Unit cost (
unit_cost): Cost per quantity of the reaction (e.g., substrate purchase, byproduct disposal) - Fixed cost (
fixed_cost): One-time cost incurred if the reaction occurs at all (e.g., equipment preparation) - Bounds: Lower (
low) and upper (high) limits on reaction quantities
The optimization problem, as presented by Mizuno and Komatsuzaki, can be formulated as:
minimize: Σ(r∈R) [c_r^unit × x_r + c_r^fixed × p(x_r)]
subject to:
∀s∈S: Σ(r∈R) v_{s,r} × x_r = 0 (mass balance)
∀r∈R: l_r ≤ x_r ≤ u_r (reaction bounds)
Where:
Ris the set of reactions,Sis the set of speciesx_ris the quantity of reaction r (optimization variable)v_{s,r}is the signed stoichiometric coefficient of species s in reaction rp(x_r)is the positivity indicator function (1 if x_r > 0, 0 otherwise)
Task Types¶
RXN_NETWORK_OPT¶
- Optimize chemical reaction network pathways to minimize total cost while satisfying mass balance constraints
- Inputs:
crn_data: Dict[str, List[Dict]]containing the reaction network definition with keysreaction_listandspecies_list,time_limit: intmaximum solve time in seconds - Cebule max_processors: Specify the number of cores to use
- Output:
Dict[str, Union[Dict[str, int], float, bool, Dict[str, float]]]containing the keysreaction_quantities(mapping reaction names to optimal quantities),total_cost(minimum total cost achieved),solve_time(total time taken in seconds), andoptimal(whether optimal solution was found)
Input Format¶
Within crn_data dictionary:
reaction_list: Array of reaction definitionsname: Unique identifier for the reactionfixed_cost: One-time cost if reaction occurs (≥0)unit_cost: Cost per unit quantity of reaction (≥0)low: Minimum allowed reaction quantity (≥0)high: Maximum allowed reaction quantity (≥low)species_list: Array of chemical species definitionsname: Unique identifier for the speciesstoich: Dictionary mapping reaction names to signed stoichiometric coefficients- Positive values: species is produced by the reaction
- Negative values: species is consumed by the reaction
Inflow and Outflow Reactions¶
Chemical reaction networks require boundary conditions to represent material inputs and product outputs. These are typically modeled as special reactions:
-
Inflow reactions: Represent external material supply (e.g., raw material purchases). These reactions have no reactants and produce species needed by the network. They should positive
lowbounds if the material is a required input to ensure minimum supply, but if the material is optional (ex. one pathway to produce the desired target includes it but not another) a positive lower bound should not be given. -
Outflow reactions: Represent product extraction or waste disposal. These reactions consume species produced by the network and have no products. They may have positive
lowbounds to ensure minimum production targets are met. For species that require waste disposal (are byproducts or reactions but not required target chemicals) they should not have a positive lower bound.
Without inflow and outflow reactions, the mass balance constraints would force all reaction quantities to zero, as no net production or consumption would be possible in a closed system.
Example: Complete Solvay Process CRN¶
The following example details the Solvay Process CRN, shown in the below figure:
Figure 1: Solvay Reaction Network with Optimal Solution Quantities
The input crn_data would be in the below format:
{
"reaction_list": [
{
"name": "inflow1",
"fixed_cost": 2,
"unit_cost": 4,
"low": 0,
"high": 5
},
{
"name": "inflow2",
"fixed_cost": 7,
"unit_cost": 3,
"low": 1,
"high": 5
},
{
"name": "1",
"fixed_cost": 2,
"unit_cost": 1,
"low": 0,
"high": 5
},
{
"name": "2",
"fixed_cost": 4,
"unit_cost": 2,
"low": 0,
"high": 5
},
{
"name": "3",
"fixed_cost": 9,
"unit_cost": 3,
"low": 0,
"high": 5
},
{
"name": "4",
"fixed_cost": 1,
"unit_cost": 8,
"low": 0,
"high": 5
},
{
"name": "5",
"fixed_cost": 7,
"unit_cost": 2,
"low": 0,
"high": 5
},
{
"name": "outflow1",
"fixed_cost": 2,
"unit_cost": 3,
"low": 1,
"high": 5
},
{
"name": "outflow2",
"fixed_cost": 4,
"unit_cost": 4,
"low": 1,
"high": 5
}
],
"species_list": [
{
"name": "NaCl",
"stoich": {
"inflow1": 1,
"1": -1
}
},
{
"name": "CO2",
"stoich": {
"1": -1,
"2": 1,
"3": 1
}
},
{
"name": "NaHCO3",
"stoich": {
"1": 1,
"2": -2
}
},
{
"name": "Na2CO3",
"stoich": {
"2": 1,
"outflow1": -1
}
},
{
"name": "H2O",
"stoich": {
"1": -1,
"2": 1,
"4": -1,
"5": 2
}
},
{
"name": "NH3",
"stoich": {
"1": -1,
"5": 2
}
},
{
"name": "NH4Cl",
"stoich": {
"1": 1,
"5": -2
}
},
{
"name": "CaCO3",
"stoich": {
"inflow2": 1,
"3": -1
}
},
{
"name": "CaO",
"stoich": {
"3": 1,
"4": -1
}
},
{
"name": "Ca(OH)2",
"stoich": {
"4": 1,
"5": -1
}
},
{
"name": "CaCl2",
"stoich": {
"5": 1,
"outflow2": -1
}
}
]
}
This represents the complete Solvay process with:
- inflow1: NaCl supply (salt input)
- inflow2: CaCO₃ supply (limestone input)
- Reaction 1: Main ammonia-soda reaction: NaCl + CO₂ + H₂O + NH₃ → NaHCO₃ + NH₄Cl
- Reaction 2: Sodium carbonate formation: 2 NaHCO₃ → Na₂CO₃ + CO₂ + H₂O
- Reaction 3: Limestone calcination: CaCO₃ → CaO + CO₂
- Reaction 4: Lime slaking: CaO + H₂O → Ca(OH)₂
- Reaction 5: Ammonia recovery: Ca(OH)₂ + 2 NH₄Cl → 2 NH₃ + 2 H₂O + CaCl₂
- outflow1: Na₂CO₃ product extraction
- outflow2: CaCl₂ byproduct removal
Optimize reaction network with Cebule¶
# Create CRN optimization task
task_crn_opt = session.cebule.create_task(
"Solvay Process Optimization",
TaskType.RXN_NETWORK_OPT,
crn_data=solvay_crn,
time_limit=10
)
# Wait for result
result = session.cebule.wait_for_result(task_crn_opt.id, "result")
print(f"Optimal cost: {result['total_cost']}")
print(f"Reaction quantities: {result['reaction_quantities']}")
print(f"Optimization completed in {result['solve_time']} seconds")
print(f"Optimal solution found: {result['optimal']}")
Catalyst Design¶
Task Types¶
MAKE_SURF¶
- Inputs:
element1: str first atomic element
element2: str second atomic element
surface_type: str
n_surfaces: int number of surfaces to generate
reaction_network: reaction_network definition
vacuum: number of vacuum layers over surface with length
dataset_tag: dataset tag/name for the dataset which is being generated
supercell_size_y: int or float?
layers: int
- Optional Inputs:
support_structure: str
support_supercell_size_x: float
support_supercell_size_y: float
support_layers: int
support_lattice_const: float
support: str
- Output:
GAS_SPECIES_ENERGY¶
energy_calculator: string
optimizer_type: string
unitcell_length:
energy_offsets_path: string
- Optional Inputs:
dataset_tag: string
num_cores: int
SURFACE_REACTION_ENERGIES¶
reaction_type:
temperature: float;
pressure: float; Total pressure
adsorbate_positions_path: string
- Optional Inputs:
single_surface: True or False
backup_site:
only_aggregate:
uncertainty: True or False; Enable kinetic uncertainty quantification using ensemble DFT calculations.
gfn: True or False
electronic_temperature: float
geometry_opt: True or False
fmax: float
dataset_tag: string
num_cores: int
GAN_TOF¶
reaction_type
dataset_tag
surf_data_tag
rx_energy_data_tag
epochs
nclass
score_max
score_min
scaler
min_loss_diff