DFT, Semi-empirical Methods & Tight-binding Models
Geometry optimization via semi-empirical methods and machine-learned interatomic potentials (MLIPs)
Semi-empirical methods and machine-learned interatomic potentials (MLIPs) serve as computationally efficient alternatives to density functional theory (DFT), offering rapid yet reasonably accurate predictions of molecular structures and energetics.
Semi-empirical methods, such as the g-xTB, GFN2-xTB, and AM1 models, balance quantum mechanical accuracy against computational cost by employing parameterized approximations derived from experimental or higher-level theoretical data. g-xTB and GFN2-xTB are modern tight-binding approaches with anisotropic electrostatics and built-in dispersion corrections, offering excellent geometry predictions and non-covalent interaction energies; in most cases g-xTB can substitute for low- to mid-accuracy DFT calculations. AM1 (Austin Model 1) is a classic method that uses modified core-core repulsion functions and was originally parameterized for organic molecules.
Similarly, MLIPs such as the ORB model use machine learning (graph neural networks, transformers) to infer interatomic interactions from reference datasets, enabling accurate force and energy evaluations for geometry optimizations at a fraction of DFT's computational expense. The newer ORB V3 offers over 10x lower latency and 8x lower memory requirements than ORB V2, while maintaining or improving accuracy across a range of chemical systems.
Both ORB versions were trained on extensive datasets covering diverse chemical space. ORB V2 was trained on a combination of the MPtrj and Alexandria datasets (approximately 30 million calculations on crystalline materials) at the DFT PBE level of theory. ORB V3 incorporates additional data from the OMAT24 dataset, which includes high-energy configurations, molecular dynamics trajectories, and relaxation paths, giving a more comprehensive representation of potential energy surfaces and out-of-equilibrium structures. OMAT24 is particularly valuable because it contains DFT calculations for over 110 million structures with diverse elemental compositions covering most of the periodic table, with energy, force, and stress distributions much wider than those of previous datasets.
The MACE-MP (Multi-Atomic Cluster Expansion - Materials Project) model targets crystalline materials. It combines a complete many-body expansion with equivariant message passing and was trained on the MPtrj database, which contains roughly 1.6 million bulk crystal structures from DFT relaxation trajectories.
Meta AI's UMA (Universal Models for Atoms) is trained on diverse datasets including crystalline materials, catalysts with adsorbed species, and molecular systems (over 30 billion atoms across all training data, drawn from Meta datasets released in the last 5 years). It provides specialized task heads for different applications, including catalysis (oc20), inorganic materials (omat), metal-organic frameworks (mof), molecules (omol), and molecular crystals (omc), enabling unified modeling across multiple domains of materials science.
Cebule SDK TaskType: GEOMETRY_OPT
- Semi-empirical/MLIP geometry optimization of molecule 3D coords after either initial force field optimization or user-defined geometry
- Inputs:
  - optimization_method: str from [g_xtb, gfn2_xtb, am1, uma]
  - either smiles_list: List[str] of SMILES and force_field: str from [mmff94, ghemical] for initial optimization
  - or geometry_list: List[List[List[float]]] of 3D coordinates and symbols_list: List[List[str]] of atomic symbols
- Cebule max_processors: Used to limit concurrency of optimization
- Output: List containing each molecule's optimized 3D coords (see Atom Order which defines the order of atoms in this outputted geometry list)
Example (SMILES Input):
task_geometry_opt = session.cebule.create_task(
    "Geometry Opt Example",
    TaskType.GEOMETRY_OPT,
    smiles_list=["CCO", "O"],
    # Optimize with MMFF94 force field followed by GFN2-xTB.
    force_field="mmff94",
    optimization_method="gfn2_xtb",
    max_processors=4)
Example (Geometry and Symbols Input):
task_geometry_opt_coords = session.cebule.create_task(
    "Geometry Opt Coords Example",
    TaskType.GEOMETRY_OPT,
    geometry_list=[[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0]]],
    symbols_list=[["O", "O"]],
    optimization_method="gfn2_xtb",
    max_processors=4)
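When the starting structure is available as an XYZ file rather than as Python lists, it first has to be reshaped into the nested geometry_list / symbols_list layout described above. The following is a minimal sketch of that conversion; parse_xyz and WATER_XYZ are hypothetical names introduced for illustration and are not part of the Cebule SDK, and the water coordinates are only an approximate starting geometry.

```python
from typing import List, Tuple


def parse_xyz(xyz_text: str) -> Tuple[List[str], List[List[float]]]:
    """Parse one standard XYZ block into (symbols, coordinates).

    Hypothetical helper, not part of the Cebule SDK. Expects the usual
    layout: atom count, comment line, then one "symbol x y z" line per atom.
    """
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols: List[str] = []
    coords: List[List[float]] = []
    # Line 0 is the atom count, line 1 is a free-form comment.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coords.append([float(x), float(y), float(z)])
    if len(symbols) != n_atoms:
        raise ValueError("XYZ block has fewer atom lines than declared")
    return symbols, coords


# Approximate water geometry, used only as an illustrative input.
WATER_XYZ = """3
water
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""

symbols, coords = parse_xyz(WATER_XYZ)

# One entry per molecule: the outer list indexes molecules.
geometry_list = [coords]   # List[List[List[float]]]
symbols_list = [symbols]   # List[List[str]]
```

The resulting geometry_list and symbols_list have the same shape as the arguments in the "Geometry and Symbols Input" example and could be passed to create_task in their place.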
Cebule SDK TaskType: PERIODIC_GEOMETRY_OPT
- MLIP/semi-empirical geometry optimization of periodic systems (crystals, surfaces, interfaces) using specialized calculators designed for extended systems
- Inputs:
  - optimization_method: str from [mace_mp, orb_v2, orb_v3, uma, gfn1_xtb, gfn2_xtb]
  - geometry: List[List[float]] of atomic coordinates
  - symbols: List[str] of atomic symbols
  - cell: List[List[float]] as 3×3 cell parameter matrix
  - pbc: List[bool] for periodic boundary conditions (UMA and ORB require [True, True, True])
  - optional fmax: float force convergence (default 0.10)
  - optional fixed: List[bool] to specify which atoms should be fixed during optimization (same length as geometry/symbols, default allows all atoms to move)
  - structure_type: str, required for UMA only, from [catalysis, inorganic_material, metal_organic_framework, molecular_crystal]
- Cebule max_processors: Used to limit concurrency of optimization
- Output: Dictionary with geometry (optimized atomic coordinates) and symbols (preserved atomic symbols) keys
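The fmax and fixed inputs interact: a force-based stopping criterion is normally evaluated only over atoms that are allowed to move. The sketch below illustrates one common convention (the largest per-atom force norm must drop below fmax, with fixed atoms ignored). This is an assumption for illustration: this page does not state the exact criterion or the force units Cebule uses, and max_force and is_converged are hypothetical helper names.

```python
import math
from typing import List, Optional


def max_force(forces: List[List[float]],
              fixed: Optional[List[bool]] = None) -> float:
    """Largest force norm over the atoms that are free to move.

    Assumed convention (not confirmed for Cebule): fixed atoms do not
    contribute to the convergence check.
    """
    if fixed is None:
        fixed = [False] * len(forces)
    norms = [
        math.sqrt(fx * fx + fy * fy + fz * fz)
        for (fx, fy, fz), is_fixed in zip(forces, fixed)
        if not is_fixed
    ]
    return max(norms, default=0.0)


def is_converged(forces: List[List[float]],
                 fmax: float = 0.10,
                 fixed: Optional[List[bool]] = None) -> bool:
    """True when every free atom feels a force smaller than fmax."""
    return max_force(forces, fixed) < fmax


# Illustrative forces for three atoms; the first atom is held fixed.
forces = [[0.30, 0.0, 0.0], [0.03, 0.04, 0.0], [0.0, 0.0, -0.05]]
fixed = [True, False, False]

converged_with_mask = is_converged(forces, fmax=0.10, fixed=fixed)
converged_without_mask = is_converged(forces, fmax=0.10)
```

With the mask applied, the large residual force on the fixed atom no longer blocks convergence, which is why fixing a bottom slab layer does not prevent the optimization from terminating.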
Examples:
- Ni(111) surface with MACE-MP:
    # 4-atom Ni(111) surface slab
    geometry = [[0.0, 0.0, 7.5], [2.49, 0.0, 7.5], [1.245, 2.156, 7.5], [3.735, 2.156, 7.5]]
    symbols = ["Ni", "Ni", "Ni", "Ni"]
    cell = [[4.98, 0.0, 0.0], [0.0, 4.312, 0.0], [0.0, 0.0, 15.0]]
    pbc = [True, True, False]
    task_geometry_opt_slab = session.cebule.create_task(
        "Periodic Geometry Opt Ni Example",
        TaskType.PERIODIC_GEOMETRY_OPT,
        geometry=geometry,
        symbols=symbols,
        cell=cell,
        pbc=pbc,
        optimization_method="mace_mp",
        fmax=0.10,
        max_processors=4)
- Pt(111) surface with adsorbed H using UMA:
    # 6-atom Pt(111) surface with H adsorbate
    geometry = [
        [0.0, 0.0, 5.0], [2.77, 0.0, 5.0], [1.385, 2.40, 5.0],     # Bottom Pt layer
        [0.0, 0.0, 7.77], [2.77, 0.0, 7.77], [1.385, 2.40, 7.77],  # Top Pt layer
        [1.385, 0.80, 9.0],                                        # H adsorbate
    ]
    symbols = ["Pt", "Pt", "Pt", "Pt", "Pt", "Pt", "H"]
    cell = [[5.54, 0.0, 0.0], [0.0, 4.80, 0.0], [0.0, 0.0, 15.0]]
    pbc = [True, True, True]  # UMA requires full periodicity so it can self-select pbc
    fixed = [True, True, True, False, False, False, False]  # Fix bottom layer for optimization
    task_geometry_opt_catalyst = session.cebule.create_task(
        "Periodic Geometry Opt Pt Example",
        TaskType.PERIODIC_GEOMETRY_OPT,
        geometry=geometry,
        symbols=symbols,
        cell=cell,
        pbc=pbc,
        optimization_method="uma",
        structure_type="catalysis",
        fmax=0.10,
        fixed=fixed,
        max_processors=4)
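Several of the PERIODIC_GEOMETRY_OPT inputs constrain one another (matching lengths, a 3×3 cell, full periodicity for UMA and ORB, structure_type for UMA). A small client-side check before submitting a task can catch mismatches early. The sketch below encodes only the rules listed in the Inputs section above; check_periodic_inputs is a hypothetical helper, not an SDK function, and the service may apply additional validation of its own.

```python
from typing import List, Optional

# Allowed values copied from the Inputs list above.
METHODS = {"mace_mp", "orb_v2", "orb_v3", "uma", "gfn1_xtb", "gfn2_xtb"}
FULL_PBC_METHODS = {"uma", "orb_v2", "orb_v3"}  # require pbc = [True, True, True]
UMA_STRUCTURE_TYPES = {
    "catalysis", "inorganic_material",
    "metal_organic_framework", "molecular_crystal",
}


def check_periodic_inputs(geometry: List[List[float]],
                          symbols: List[str],
                          cell: List[List[float]],
                          pbc: List[bool],
                          optimization_method: str,
                          fixed: Optional[List[bool]] = None,
                          structure_type: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    if optimization_method not in METHODS:
        problems.append(f"unknown optimization_method: {optimization_method!r}")
    if len(geometry) != len(symbols):
        problems.append("geometry and symbols differ in length")
    if any(len(xyz) != 3 for xyz in geometry):
        problems.append("every atom needs exactly three coordinates")
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        problems.append("cell must be a 3x3 matrix")
    if len(pbc) != 3:
        problems.append("pbc must have three entries")
    elif optimization_method in FULL_PBC_METHODS and not all(pbc):
        problems.append(f"{optimization_method} requires pbc = [True, True, True]")
    if fixed is not None and len(fixed) != len(symbols):
        problems.append("fixed must have one entry per atom")
    if optimization_method == "uma" and structure_type not in UMA_STRUCTURE_TYPES:
        problems.append("uma requires a valid structure_type")
    return problems


# Inputs from the Pt(111) + H example above.
pt_geometry = [[0.0, 0.0, 5.0], [2.77, 0.0, 5.0], [1.385, 2.40, 5.0],
               [0.0, 0.0, 7.77], [2.77, 0.0, 7.77], [1.385, 2.40, 7.77],
               [1.385, 0.80, 9.0]]
pt_symbols = ["Pt"] * 6 + ["H"]
pt_cell = [[5.54, 0.0, 0.0], [0.0, 4.80, 0.0], [0.0, 0.0, 15.0]]
pt_fixed = [True, True, True, False, False, False, False]

problems = check_periodic_inputs(pt_geometry, pt_symbols, pt_cell,
                                 [True, True, True], "uma",
                                 fixed=pt_fixed, structure_type="catalysis")
print(problems or "inputs look consistent")
```

Returning a list of messages instead of raising on the first failure lets all inconsistencies be reported in one pass before any compute is requested.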
This section is still under development. For more information see the following pages and reference papers:
- Density Functional Theory (DFT)
- Semiempirical tight binding (TB) method GFN-xTB
- Extended semiempirical tight-binding method/model GFN2-xTB
We have worked on various projects utilizing these models and their extensions/modifications/re-parameterizations.
Let us know if you want to discuss your use-case with us and evaluate which models should be used with respect to the molecules you work with (contact [at] mqs (dot) dk).