DFT, Semi-empirical Methods & Tight-binding Models

Geometry optimization via semi-empirical methods and machine-learned interatomic potentials (MLIPs)

Semi-empirical methods and machine-learned interatomic potentials (MLIPs) serve as computationally efficient alternatives to density functional theory (DFT), offering rapid yet reasonably accurate predictions of molecular structures and energetics.

Semi-empirical methods, such as the g-xTB, GFN2-xTB, and AM1 models, balance quantum mechanical accuracy and computational cost by employing parameterized approximations derived from experimental or higher-level theoretical data. g-xTB and GFN2-xTB are modern tight-binding approaches with anisotropic electrostatics and built-in dispersion corrections, offering excellent geometry predictions and non-covalent interaction energies; in most cases g-xTB can substitute for low- to mid-accuracy DFT calculations. AM1 (Austin Model 1) is a classic method using modified core-core repulsion functions, originally parameterized for organic molecules.

Similarly, MLIPs such as the ORB model use machine learning (graph neural networks, transformers) to infer interatomic interactions from reference datasets, enabling accurate force and energy evaluations for geometry optimizations at a fraction of DFT's computational expense. The newer ORB V3 offers over 10x lower latency and 8x lower memory requirements than ORB V2, while maintaining or improving accuracy across a range of chemical systems. Both ORB versions were trained on large datasets covering diverse chemical space: ORB V2 was trained on a combination of the MPtrj and Alexandria datasets (approximately 30 million calculations on crystalline materials) at the DFT PBE level of theory, while ORB V3 additionally uses the OMAT24 dataset, which includes high-energy configurations, molecular dynamics trajectories, and relaxation paths, giving a more comprehensive representation of potential energy surfaces and out-of-equilibrium structures. OMAT24 is particularly valuable because it contains DFT calculations for over 110 million structures with diverse elemental compositions covering most of the periodic table, with energy, force, and stress distributions much wider than those of previous datasets.

The MACE-MP (Multi-Atomic Cluster Expansion - Materials Project) model targets crystalline materials, combining a many-body cluster expansion with equivariant message passing. It was trained on the MPtrj database, which contains roughly 1.6 million bulk crystal structures from DFT relaxation trajectories.

The PET-MAD (Point Edge Transformer trained on Massive Atomic Diversity Dataset) model combines transformer architectures with physics-informed learning to achieve high accuracy across molecular and materials systems. It was trained on 95,595 structures, including 3D and 2D inorganic crystals, surfaces, molecular crystals, nanoclusters, and molecules.

Meta AI's UMA (Universal Models for Atoms) is trained on diverse datasets including crystalline materials, catalysts with adsorbed species, and molecular systems (over 30 billion atoms across all training data, drawn from Meta datasets released in the last 5 years). It provides specialized task heads for different applications: catalysis (oc20), inorganic materials (omat), metal-organic frameworks (mof), molecules (omol), and molecular crystals (omc), enabling unified modeling across multiple domains of materials science.

Cebule SDK TaskType: GEOMETRY_OPT

Semi-empirical/MLIP geometry optimization of molecular 3D coordinates, starting from either an initial force-field optimization or a user-defined geometry
Inputs: optimization_method: str from [g_xtb, gfn2_xtb, am1, uma], plus one of two input forms: either smiles_list: List[str] of SMILES together with force_field: str from [mmff94, ghemical] for the initial optimization, or geometry_list: List[List[List[float]]] of 3D coordinates together with symbols_list: List[List[str]] of atomic symbols.
Cebule max_processors: Used to limit concurrency of optimization
Output: List containing each molecule's optimized 3D coords (see Atom Order which defines the order of atoms in this outputted geometry list)

Example (SMILES Input):

task_geometry_opt = session.cebule.create_task("Geometry Opt Example",
                                               TaskType.GEOMETRY_OPT, 
                                               smiles_list=["CCO", "O"], 
                                               # Optimize with MMFF94 force field followed by GFN2-xTB.
                                               force_field="mmff94", 
                                               optimization_method="gfn2_xtb",
                                               max_processors=4)

Example (Geometry and Symbols Input):

task_geometry_opt_coords = session.cebule.create_task("Geometry Opt Coords Example",
                                                     TaskType.GEOMETRY_OPT,
                                                     geometry_list=[[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0]]],
                                                     symbols_list=[["O", "O"]],
                                                     optimization_method="gfn2_xtb",
                                                     max_processors=4)
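
The two input forms described above are easy to mix up, so a local check before submitting a task can catch shape errors early. The following is a minimal sketch only: build_geometry_opt_kwargs is a hypothetical helper, not part of the Cebule SDK, and it assumes that force_field is mandatory whenever smiles_list is used. The allowed method and force-field names are copied from the Inputs line above.

```python
from typing import List, Optional

# Names copied from the GEOMETRY_OPT "Inputs" description.
ALLOWED_METHODS = {"g_xtb", "gfn2_xtb", "am1", "uma"}
ALLOWED_FORCE_FIELDS = {"mmff94", "ghemical"}


def build_geometry_opt_kwargs(
    optimization_method: str,
    smiles_list: Optional[List[str]] = None,
    force_field: Optional[str] = None,
    geometry_list: Optional[List[List[List[float]]]] = None,
    symbols_list: Optional[List[List[str]]] = None,
) -> dict:
    """Hypothetical helper: validate inputs locally, return create_task kwargs."""
    if optimization_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown optimization_method: {optimization_method!r}")

    has_smiles = smiles_list is not None
    has_coords = geometry_list is not None or symbols_list is not None
    if has_smiles == has_coords:
        raise ValueError("give either smiles_list or geometry_list + symbols_list")

    if has_smiles:
        # Assumption: a force field is always needed for the SMILES route.
        if force_field not in ALLOWED_FORCE_FIELDS:
            raise ValueError(f"unknown force_field: {force_field!r}")
        return {
            "smiles_list": smiles_list,
            "force_field": force_field,
            "optimization_method": optimization_method,
        }

    if geometry_list is None or symbols_list is None:
        raise ValueError("geometry_list and symbols_list must be given together")
    if len(geometry_list) != len(symbols_list):
        raise ValueError("geometry_list and symbols_list differ in length")
    for coords, symbols in zip(geometry_list, symbols_list):
        if len(coords) != len(symbols):
            raise ValueError("each molecule needs one symbol per coordinate triple")
        if any(len(xyz) != 3 for xyz in coords):
            raise ValueError("each coordinate must have exactly three components")
    return {
        "geometry_list": geometry_list,
        "symbols_list": symbols_list,
        "optimization_method": optimization_method,
    }
```

The returned dictionary could then be unpacked into the call shown in the examples, for instance session.cebule.create_task("Geometry Opt Example", TaskType.GEOMETRY_OPT, max_processors=4, **kwargs).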

Cebule SDK TaskType: PERIODIC_GEOMETRY_OPT

MLIP/semi-empirical geometry optimization of periodic systems (crystals, surfaces, interfaces) using specialized calculators designed for extended systems
Inputs: optimization_method: str from [mace_mp, orb_v2, orb_v3, uma, pet_mad, gfn1_xtb, gfn2_xtb]; geometry: List[List[float]] of atomic coordinates; symbols: List[str] of atomic symbols; cell: List[List[float]] as a 3×3 cell parameter matrix; pbc: List[bool] of periodic boundary conditions, one flag per cell vector (UMA and ORB require [True, True, True]); optional fmax: float force convergence threshold (default 0.10); optional fixed: List[bool] marking which atoms are held fixed during optimization (same length as geometry/symbols; by default all atoms may move); and structure_type: str from [catalysis, inorganic_material, metal_organic_framework, molecular_crystal], required for UMA only
Cebule max_processors: Used to limit concurrency of optimization
Output: Dictionary with geometry (optimized atomic coordinates), symbols (preserved atomic symbols), and energy (final energy in eV) keys
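
The constraints listed under Inputs (matching lengths, a 3×3 cell, full periodicity for UMA and ORB, structure_type for UMA) can be checked locally before a task is created. This is a minimal sketch under stated assumptions: check_periodic_inputs is a hypothetical helper and not an SDK function, and it encodes only the rules written in the description above.

```python
from typing import List, Optional

# Values copied from the PERIODIC_GEOMETRY_OPT "Inputs" description.
PERIODIC_METHODS = {"mace_mp", "orb_v2", "orb_v3", "uma", "pet_mad", "gfn1_xtb", "gfn2_xtb"}
FULL_PBC_METHODS = {"uma", "orb_v2", "orb_v3"}
UMA_STRUCTURE_TYPES = {
    "catalysis", "inorganic_material", "metal_organic_framework", "molecular_crystal",
}


def check_periodic_inputs(
    optimization_method: str,
    geometry: List[List[float]],
    symbols: List[str],
    cell: List[List[float]],
    pbc: List[bool],
    fixed: Optional[List[bool]] = None,
    structure_type: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if optimization_method not in PERIODIC_METHODS:
        problems.append(f"unknown optimization_method: {optimization_method!r}")
    if len(geometry) != len(symbols):
        problems.append("geometry and symbols must have the same length")
    if any(len(xyz) != 3 for xyz in geometry):
        problems.append("every atomic coordinate needs three components")
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        problems.append("cell must be a 3x3 matrix")
    if len(pbc) != 3:
        problems.append("pbc needs three flags")
    elif optimization_method in FULL_PBC_METHODS and not all(pbc):
        problems.append(f"{optimization_method} requires pbc = [True, True, True]")
    if fixed is not None and len(fixed) != len(geometry):
        problems.append("fixed must have one flag per atom")
    if optimization_method == "uma" and structure_type not in UMA_STRUCTURE_TYPES:
        problems.append("uma requires a valid structure_type")
    return problems
```

Returning a list instead of raising on the first failure lets all problems with a structure be reported at once, which is convenient when inputs are generated by a script.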

Examples:

Ni(111) surface with MACE-MP:

# 4-atom Ni(111) surface slab
geometry = [[0.0, 0.0, 7.5], [2.49, 0.0, 7.5], [1.245, 2.156, 7.5], [3.735, 2.156, 7.5]]
symbols = ["Ni", "Ni", "Ni", "Ni"] 
cell = [[4.98, 0.0, 0.0], [0.0, 4.312, 0.0], [0.0, 0.0, 15.0]]
pbc = [True, True, False]

task_geometry_opt_slab = session.cebule.create_task("Periodic Geometry Opt Ni Example",
                                             TaskType.PERIODIC_GEOMETRY_OPT,
                                             geometry=geometry,
                                             symbols=symbols, 
                                             cell=cell,
                                             pbc=pbc,
                                             optimization_method="mace_mp",
                                             fmax=0.10,
                                             max_processors=4)
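
The coordinates in the Ni(111) example follow from simple geometry: a close-packed (111) layer in an orthogonal cell holds four atoms, with in-plane cell lengths of 2d and √3·d, where d is the nearest-neighbor distance (2.49 Å here). The sketch below regenerates such a layer; fcc111_orthogonal_layer is a hypothetical helper written for illustration, not an SDK function, and it builds a single layer only.

```python
import math
from typing import List, Tuple


def fcc111_orthogonal_layer(nn_distance: float, z: float) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (positions, in-plane cell rows) for one 4-atom close-packed layer."""
    d = nn_distance
    row_height = d * math.sqrt(3) / 2  # spacing between close-packed rows
    positions = [
        [0.0, 0.0, z],
        [d, 0.0, z],
        [d / 2, row_height, z],       # second row is shifted by half a spacing
        [3 * d / 2, row_height, z],
    ]
    # Two in-plane cell vectors; the third (vacuum) vector is chosen separately.
    cell_xy = [[2 * d, 0.0, 0.0], [0.0, 2 * row_height, 0.0]]
    return positions, cell_xy


positions, cell_xy = fcc111_orthogonal_layer(nn_distance=2.49, z=7.5)
cell = cell_xy + [[0.0, 0.0, 15.0]]  # same 15 Å cell height as the example above
```

Up to rounding in the third decimal place, this gives the geometry and cell used in the Ni(111) example, and changing nn_distance adapts the layer to another fcc metal.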

Pt(111) surface with adsorbed H using UMA:

# 6-atom Pt(111) surface with H adsorbate
geometry = [[0.0, 0.0, 5.0], [2.77, 0.0, 5.0], [1.385, 2.40, 5.0],     # Bottom Pt layer
            [0.0, 0.0, 7.77], [2.77, 0.0, 7.77], [1.385, 2.40, 7.77],   # Top Pt layer  
            [1.385, 0.80, 9.0]]                                           # H adsorbate
symbols = ["Pt", "Pt", "Pt", "Pt", "Pt", "Pt", "H"]
cell = [[5.54, 0.0, 0.0], [0.0, 4.80, 0.0], [0.0, 0.0, 15.0]]
pbc = [True, True, True]  # UMA requires periodicity in all three directions
fixed = [True, True, True, False, False, False, False]  # Fix bottom layer for optimization

task_geometry_opt_catalyst = session.cebule.create_task("Periodic Geometry Opt Pt Example",
                                                TaskType.PERIODIC_GEOMETRY_OPT,
                                                geometry=geometry,
                                                symbols=symbols,
                                                cell=cell,
                                                pbc=pbc,
                                                optimization_method="uma",
                                                structure_type="catalysis",
                                                fmax=0.10,
                                                fixed=fixed,
                                                max_processors=4)
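
Once a result dictionary with the geometry, symbols, and energy keys described under Output is available, a few quick checks help judge the relaxation. The sketch below is illustrative only: summarize_relaxation is a hypothetical helper, how the dictionary is retrieved from the task is not covered here, and the displacement is a plain Cartesian distance that ignores any wrapping of atoms across periodic boundaries.

```python
import math
from typing import Dict, List, Optional


def summarize_relaxation(
    initial_geometry: List[List[float]],
    result: Dict,
    fixed: Optional[List[bool]] = None,
    tol: float = 1e-6,
) -> Dict:
    """Hypothetical helper: per-atom energy, largest displacement, fixed-atom check."""
    final_geometry = result["geometry"]
    n_atoms = len(final_geometry)
    if n_atoms != len(initial_geometry):
        raise ValueError("initial and optimized geometries differ in atom count")
    if fixed is None:
        fixed = [False] * n_atoms

    # Straight-line distance each atom moved (no minimum-image correction).
    displacements = [math.dist(a, b) for a, b in zip(initial_geometry, final_geometry)]
    moved_fixed = [i for i, (d, f) in enumerate(zip(displacements, fixed)) if f and d > tol]

    return {
        "energy_per_atom_eV": result["energy"] / n_atoms,
        "max_displacement": max(displacements),
        "moved_fixed_atoms": moved_fixed,  # should be empty if constraints held
    }
```

A non-empty moved_fixed_atoms list would suggest that the fixed mask was not applied as intended, and an unusually large max_displacement is a hint to inspect the starting structure.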

This section is still under development. For more information, see the following pages and reference papers:

Density Functional Theory (DFT)

Kohn-Sham equations

Plane-Wave DFT (PW-DFT)

Semi-empirical methods

PM6

PM7

Semiempirical tight binding (TB) method GFN-xTB

Extended semiempirical tight-binding method/model GFN2-xTB

We have worked on various projects utilizing these models and their extensions/modifications/re-parameterizations.

Let us know if you want to discuss your use-case with us and evaluate which models should be used with respect to the molecules you work with (contact [at] mqs (dot) dk).