
DFT, Semi-empirical Methods & Tight-binding Models

DFT fundamentals: from Hohenberg-Kohn to practical calculations

Density Functional Theory represents a paradigm shift from traditional wavefunction methods by expressing all ground-state properties through the electron density n(r), a function of only 3 spatial coordinates rather than 3N coordinates for N electrons.

The theoretical foundation rests on two Hohenberg-Kohn theorems: the first establishes that the external potential (and hence total energy) is a unique functional of electron density, while the second proves that the correct ground-state density minimizes the energy functional.

The Kohn-Sham framework makes DFT computationally tractable by mapping the intractable many-body problem onto a fictitious system of non-interacting electrons moving in an effective potential that reproduces the real ground-state density.

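The Kohn-Sham equation takes the familiar eigenvalue form \(\left[-\frac{\hbar^2}{2m}\nabla^2 + v_{\mathrm{eff}}(\mathbf{r})\right]\phi_i(\mathbf{r}) = \varepsilon_i\,\phi_i(\mathbf{r})\), where the many-body physics is contained in the exchange-correlation potential \(v_{\mathrm{xc}}(\mathbf{r})\), one of the contributions to \(v_{\mathrm{eff}}(\mathbf{r})\).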
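For reference, the effective potential is conventionally split into an external, a Hartree, and an exchange-correlation part, and the density is rebuilt from the occupied orbitals. The block below writes out this standard textbook decomposition; the Hartree term is given in atomic units and the occupation numbers \(f_i\) are notation introduced here, not taken from the text above.

```latex
v_{\mathrm{eff}}(\mathbf{r}) = v_{\mathrm{ext}}(\mathbf{r})
  + \int \frac{n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,\mathrm{d}\mathbf{r}'
  + v_{\mathrm{xc}}(\mathbf{r}),
\qquad
v_{\mathrm{xc}}(\mathbf{r}) = \frac{\delta E_{\mathrm{xc}}[n]}{\delta n(\mathbf{r})},
\qquad
n(\mathbf{r}) = \sum_{i}^{\mathrm{occ}} f_i\,|\phi_i(\mathbf{r})|^2
```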

The self-consistent field (SCF) procedure iteratively solves these equations: an initial density guess generates an effective potential, which yields new orbitals, which produce a new density—repeating until convergence.
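The loop can be sketched on a toy problem. The example below is not a real DFT code: it assumes a 1D grid, a harmonic external potential, and a purely illustrative local mean-field term (`coupling * density`) standing in for the Hartree and exchange-correlation potentials. The function name `toy_scf` and all parameter values are hypothetical; only the structure of the loop (density, potential, orbitals, new density, mixing) mirrors the procedure described above.

```python
import numpy as np


def toy_scf(n_electrons=2, n_grid=201, box=10.0, coupling=1.0,
            mixing=0.3, tol=1e-8, max_iter=500):
    """Toy 1D SCF loop: density -> potential -> orbitals -> new density."""
    x = np.linspace(-box / 2, box / 2, n_grid)
    dx = x[1] - x[0]

    # Kinetic energy -1/2 d^2/dx^2 by central finite differences (atomic units).
    kinetic = (np.diag(np.full(n_grid, 1.0 / dx**2))
               - np.diag(np.full(n_grid - 1, 0.5 / dx**2), 1)
               - np.diag(np.full(n_grid - 1, 0.5 / dx**2), -1))
    v_ext = 0.5 * x**2  # harmonic external potential (assumed for illustration)

    # Initial guess: uniform density normalised to the electron count.
    density = np.full(n_grid, n_electrons / (n_grid * dx))
    n_occ = n_electrons // 2  # doubly occupied orbitals (even electron count)

    for iteration in range(1, max_iter + 1):
        # 1. Density -> effective potential (toy local mean-field term).
        v_eff = v_ext + coupling * density
        # 2. Effective potential -> orbitals and orbital energies.
        energies, orbitals = np.linalg.eigh(kinetic + np.diag(v_eff))
        orbitals /= np.sqrt(dx)  # normalise so that sum(|phi|^2) * dx == 1
        # 3. Orbitals -> new density.
        new_density = 2.0 * np.sum(orbitals[:, :n_occ] ** 2, axis=1)
        # 4. Stop when input and output densities agree, otherwise mix them.
        if np.max(np.abs(new_density - density)) < tol:
            return x, new_density, energies[:n_occ], iteration
        density = (1.0 - mixing) * density + mixing * new_density

    raise RuntimeError("SCF did not converge")
```

Calling `toy_scf()` returns the grid, the converged density, the occupied orbital energies, and the number of iterations. Simple linear mixing is used here because it is the easiest scheme to read; production codes typically rely on more elaborate convergence accelerators.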

The Jacob's ladder: choosing the right exchange-correlation functional

The approximations in practical DFT stem from the unknown exchange-correlation functional. John Perdew organized the available approximations into a hierarchy called Jacob's Ladder, with sophistication and, typically, accuracy increasing from rung to rung.

Rung 1 (LDA) depends only on local electron density and is exact for the uniform electron gas, but overbinds molecules. Rung 2 (GGA) adds density gradients, with PBE being non-empirical and widely used for solids, while BLYP suits organic molecules. Rung 3 (meta-GGA) includes kinetic energy density; r2SCAN retains nearly all of the 17 exact constraints satisfied by SCAN while offering better numerical stability.
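To make "depends only on local electron density" concrete, the exchange part of LDA for a spin-unpolarized system has a closed form (Dirac exchange). The sketch below evaluates it in atomic units; the function name is hypothetical, and the correlation part, which requires a fitted parametrization, is deliberately left out.

```python
import math


def lda_exchange_energy_density(n):
    """Dirac exchange energy per unit volume (Hartree/bohr^3).

    n is the local electron density in bohr^-3 (spin-unpolarized case).
    """
    if n < 0:
        raise ValueError("density must be non-negative")
    c_x = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)
    # The value at a point depends only on the density at that point.
    return -c_x * n ** (4.0 / 3.0)
```

Integrating this quantity over a density grid gives the LDA exchange energy; GGA and meta-GGA functionals extend the same idea with additional local ingredients (gradient, kinetic energy density).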

Hybrid functionals (Rung 4) incorporate Hartree-Fock exchange. B3LYP remains the most widely used functional in computational chemistry with 20% HF exchange, though it must always be paired with dispersion corrections. PBE0 (25% HF exchange) provides better barrier heights. M06-2X (54% HF exchange) excels for thermochemistry and noncovalent interactions but requires fine integration grids.

Range-separated hybrids vary HF exchange with distance. ωB97X-D (22% short-range → 100% long-range HF exchange) includes built-in dispersion and represents an excellent general-purpose choice. For highest accuracy, double-hybrid functionals like revDSD-PBEP86-D4 incorporate MP2-like correlation but cost 10-100× more than standard hybrids.
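The distance dependence is usually implemented by splitting the Coulomb operator with an error function. As a rough sketch, the fraction of HF exchange at interelectronic distance r can be written as a short-range value plus an erf-weighted ramp to the long-range value. The 22% and 100% limits come from the text above; the range-separation parameter `omega=0.2` (in inverse bohr) is an assumed illustrative value, and the function name is hypothetical.

```python
import math


def hf_exchange_fraction(r, short_range=0.22, long_range=1.0, omega=0.2):
    """Fraction of HF exchange at interelectronic distance r (bohr)."""
    if r < 0:
        raise ValueError("distance must be non-negative")
    # erf(omega * r) goes smoothly from 0 at r = 0 to 1 at large r.
    return short_range + (long_range - short_range) * math.erf(omega * r)
```

Evaluating the function at a few distances (for example 0, 2, 5, and 20 bohr) shows how quickly the HF share rises; a larger `omega` shortens the crossover distance.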

Dispersion corrections: essential for modern DFT

Standard DFT functionals fundamentally cannot describe London dispersion forces—the long-range correlation responsible for van der Waals interactions.

Dispersion corrections are mandatory for non-covalent complexes, conformational energies, crystal structures, and any system larger than ~10 atoms.

DFT-D3(BJ) uses coordination-number-dependent C₆ coefficients and includes R⁻⁸ terms. Its Becke-Johnson damping keeps the dispersion energy finite as R→0, which is more physical than zero-damping, where the correction vanishes at short range.
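The short-range behaviour is easiest to see in the pairwise energy expression. The sketch below evaluates the two-body D3(BJ) term for a single atom pair. The default values of `s6`, `s8`, `a1`, and `a2` are placeholders chosen for illustration, not the fitted parameters of any functional, and the coordination-number dependence of the C₆ coefficients is not modelled; `c6` and `c8` are simply passed in.

```python
import math


def d3bj_pair_energy(r, c6, c8, s6=1.0, s8=1.0, a1=0.4, a2=4.0):
    """Two-body dispersion energy of one atom pair with Becke-Johnson damping.

    r, a2 in bohr; c6 in Hartree*bohr^6; c8 in Hartree*bohr^8.
    """
    r0 = math.sqrt(c8 / c6)   # pair-specific cutoff radius
    damp = a1 * r0 + a2       # damping length entering both terms
    e6 = s6 * c6 / (r**6 + damp**6)
    e8 = s8 * c8 / (r**8 + damp**8)
    # The denominators never vanish, so the energy stays finite at r = 0.
    return -(e6 + e8)
```

At large separation the expression reduces to the familiar -C₆/R⁶ tail, while at R→0 it approaches a finite negative constant set by the damping length.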

Parameters exist for over 60 functionals. DFT-D4 represents the current state-of-the-art, using charge-dependent polarizabilities that improve performance for metal-containing systems, with 3.8% mean relative deviation for C₆ coefficients versus 4.7% for D3.

VV10 non-local correlation takes a fundamentally different approach as a true density functional rather than atom-pairwise correction. Functionals ending in "-V" (ωB97X-V, ωB97M-V) have built-in VV10 and should never be combined with D3/D4 corrections.

Basis set selection for production calculations

Karlsruhe def2 basis sets are recommended for general-purpose DFT work. def2-SVP (split-valence polarized) suits initial optimizations, def2-TZVP (triple-zeta polarized) represents the production standard, and def2-QZVP approaches complete basis set accuracy. Adding "D" provides diffuse functions essential for anions, excited states, and hydrogen bonding.

Dunning correlation-consistent basis sets (cc-pVXZ) enable systematic CBS extrapolation—crucial for coupled-cluster benchmarks—with aug- prefixes adding diffuse functions.
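One common form of CBS extrapolation is the two-point inverse-cubic scheme, which assumes the correlation energy converges as E(X) = E_CBS + A/X³ with the cardinal number X (3 for triple-zeta, 4 for quadruple-zeta). The helper below is a minimal sketch of that formula; the function name is hypothetical, and the scheme is meant for the correlation energy only, with the Hartree-Fock component normally treated separately.

```python
def cbs_two_point(e_x, e_y, x, y):
    """Two-point X^-3 extrapolation of correlation energies.

    e_x, e_y: correlation energies obtained with cardinal numbers x and y.
    """
    if x == y:
        raise ValueError("two different cardinal numbers are required")
    # Eliminates A from E(X) = E_CBS + A / X**3 written for X = x and X = y.
    return (x**3 * e_x - y**3 * e_y) / (x**3 - y**3)
```

For example, `cbs_two_point(e_tz, e_qz, 3, 4)` combines aug-cc-pVTZ and aug-cc-pVQZ correlation energies (placeholder variable names) into an estimate of the basis set limit.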

Pople basis sets (6-31G, 6-311++G*) remain common for historical reasons but have known issues: the 6-311G family contains functions that are too tight, so they describe core rather than valence regions.

Application                 Recommended basis
Geometry screening          def2-SVP
Production optimization     def2-TZVP
Accurate energies           def2-TZVPP(D) or aug-cc-pVTZ
Anions/weak interactions    def2-SVPD with counterpoise

PM6 and PM7: workhorse semi-empirical methods

GFN-xTB: fast and accurate for large systems

DFTB and specialized methods

Method selection guide

Use case                                     Primary recommendation    Alternative
Small molecules (<50 atoms), high accuracy   ωB97X-D/def2-TZVP         r2SCAN-D4/def2-TZVP
Large molecules (50-1000 atoms)              GFN2-xTB                  PM7
High-throughput screening                    GFN2-xTB or PM6-D3H4      GFN-FF
Conformer sampling                           GFN2-xTB + CREST          PM7 + screening
Reaction barriers                            g-xTB or PBE0-D3(BJ)      ωB97X-D
Noncovalent interactions                     ωB97M-V or GFN2-xTB       PM6-D3H4X
Transition metals                            GFN1-xTB or PBE0-D3       BP86
Periodic solids                              PW-DFT or PM7             DFTB3

Computational complexity/scaling behaviour of methods:

Method               Scaling   System size
CCSD(T)              O()       <20 atoms
DFT                  O()       50-200 atoms
GFN2-xTB or g-xTB    O()       100-1000 atoms
PM6/PM7              O()       100-1000 atoms
GFN-FF               O()       1000+ atoms

GPU accelerated methods

SCF

Geometry optimization via semi-empirical methods and machine-learned interatomic potentials (MLIPs)

Semi-empirical methods and machine-learned interatomic potentials (MLIPs) serve as computationally efficient alternatives to density functional theory (DFT), offering rapid yet reasonably accurate predictions of molecular structures and energetics.

Semi-empirical methods, such as the g-xTB, GFN2-xTB, and AM1 models, balance quantum mechanical accuracy and computational cost by employing parameterized approximations derived from experimental or higher-level theoretical data. g-xTB and GFN2-xTB are modern tight-binding approaches with anisotropic electrostatics and built-in dispersion corrections, offering excellent geometry predictions and non-covalent interaction energies; g-xTB can in most cases substitute for low- to mid-accuracy DFT calculations. AM1 (Austin Model 1) is a classic method using modified core-core repulsion functions, originally parameterized for organic molecules.

MLIPs play a similar role. The models covered here are:

  • The ORB models apply machine learning (graph neural networks, transformers) to infer interatomic interactions from reference datasets, enabling accurate force and energy evaluations for geometry optimizations at a fraction of DFT's computational expense. The newer ORB V3 offers over 10x lower latency and 8x reduced memory requirements compared to ORB V2, while maintaining or improving accuracy across a range of chemical systems. Both ORB versions were trained on extensive datasets covering diverse chemical space: ORB V2 was trained on a combination of the MPtrj and Alexandria datasets (containing approximately 30 million calculations on crystalline materials) at the DFT PBE level of theory, while ORB V3 incorporates additional data from the OMAT24 dataset, which includes high-energy configurations, molecular dynamics trajectories, and relaxation paths for a more comprehensive representation of potential energy surfaces and out-of-equilibrium structures. The OMAT24 dataset is particularly valuable as it contains DFT calculations for over 110 million structures with diverse elemental compositions covering most of the periodic table, with energy, force, and stress distributions much wider than previous datasets.

  • The MACE-MP (Multi-Atomic Cluster Expansion - Materials Project) model targets crystalline materials, combining a many-body atomic cluster expansion with equivariant message passing. It was trained on the MPtrj database of DFT relaxation trajectories, which contains roughly 1.6 million bulk crystal structures.

  • The PET-MAD (Point Edge Transformer trained on Massive Atomic Diversity Dataset) model combines transformer architectures with physics-informed learning to achieve high accuracy across molecular and materials systems, trained on 95,595 structures, including 3D and 2D inorganic crystals, surfaces, molecular crystals, nanoclusters, and molecules.

  • Meta AI's UMA (Universal Models for Atoms) is trained on diverse datasets including crystalline materials, catalysts with adsorbed species, and molecular systems (over 30 billion atoms across all training data from Meta datasets released in the last 5 years), with specialized task heads for different applications including catalysis (oc20), inorganic materials (omat), metal-organic frameworks (mof), molecules (omol), and molecular crystals (omc), enabling unified modeling across multiple domains of materials science.

  • The GRACE-2L-OMAT model is a two-layer machine learning interatomic potential that was pre-fitted on the Meta Open Materials 2024 dataset and fine-tuned on the sAlex and MPtrj datasets. The two-layer models include semi-local interactions mediated by equivariant message passing and employ chemical embedding to condense chemical interactions efficiently into low-rank representations.

Cebule SDK TaskType: GEOMETRY_OPT

  • Semi-empirical/MLIP geometry optimization of molecular 3D coordinates, starting either from an initial force-field optimization or from a user-defined geometry

  • Inputs:

optimization_method: str from [g_xtb, gfn2_xtb, am1, uma]

smiles_list: List[str] SMILES list

force_field: str from [mmff94, ghemical] for initial optimization

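geometry_list: List[List[List[float]]] of 3D coordinates, one list per molecule (can be provided instead of smiles_list)

symbols_list: List[List[str]] of atomic symbols, one list per molecule in the same atom order as geometry_list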

  • Cebule max_processors: Used to limit concurrency of optimization

  • Output:

List containing each molecule's optimized 3D coordinates (see Atom Order, which defines the order of atoms in this output geometry list)

Example (SMILES Input):

task_geometry_opt = session.cebule.create_task("Geometry Opt Example",
                                               TaskType.GEOMETRY_OPT, 
                                               smiles_list=["CCO", "O"], 
                                               # Optimize with MMFF94 force field followed by GFN2-xTB.
                                               force_field="mmff94", 
                                               optimization_method="gfn2_xtb",
                                               max_processors=4)

Example (Geometry and Symbols Input):

task_geometry_opt_coords = session.cebule.create_task("Geometry Opt Coords Example",
                                                     TaskType.GEOMETRY_OPT,
                                                     geometry_list=[[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0]]],
                                                     symbols_list=[["O", "O"]],
                                                     optimization_method="gfn2_xtb",
                                                     max_processors=4)
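Because a task only fails after submission if its inputs are inconsistent, it can help to check them locally first. The helper below is a hypothetical sketch, not part of the Cebule SDK: it validates the GEOMETRY_OPT inputs listed above and returns them as keyword arguments. It assumes that SMILES input and geometry/symbols input are alternatives and that force_field only applies to SMILES input, which the text implies but does not state outright.

```python
ALLOWED_METHODS = {"g_xtb", "gfn2_xtb", "am1", "uma"}
ALLOWED_FORCE_FIELDS = {"mmff94", "ghemical"}


def build_geometry_opt_kwargs(optimization_method, smiles_list=None,
                              force_field=None, geometry_list=None,
                              symbols_list=None, max_processors=None):
    """Validate GEOMETRY_OPT inputs and return them as keyword arguments."""
    if optimization_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown optimization_method: {optimization_method!r}")
    if force_field is not None and force_field not in ALLOWED_FORCE_FIELDS:
        raise ValueError(f"unknown force_field: {force_field!r}")

    kwargs = {"optimization_method": optimization_method}
    if geometry_list is not None or symbols_list is not None:
        # User-defined geometries: coordinates and symbols must line up.
        if geometry_list is None or symbols_list is None:
            raise ValueError("geometry_list and symbols_list must be given together")
        if len(geometry_list) != len(symbols_list):
            raise ValueError("one symbols list is needed per geometry")
        for coords, symbols in zip(geometry_list, symbols_list):
            if len(coords) != len(symbols):
                raise ValueError("each atom needs exactly one symbol")
            if any(len(xyz) != 3 for xyz in coords):
                raise ValueError("each atom needs three coordinates")
        kwargs.update(geometry_list=geometry_list, symbols_list=symbols_list)
    elif smiles_list:
        # SMILES input: optional force-field pre-optimization.
        kwargs["smiles_list"] = list(smiles_list)
        if force_field is not None:
            kwargs["force_field"] = force_field
    else:
        raise ValueError("provide smiles_list or geometry_list with symbols_list")

    if max_processors is not None:
        kwargs["max_processors"] = max_processors
    return kwargs
```

The returned dictionary can then be unpacked into the call shown in the examples, for instance `session.cebule.create_task("Geometry Opt Example", TaskType.GEOMETRY_OPT, **kwargs)`.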

Cebule SDK TaskType: PERIODIC_GEOMETRY_OPT

  • Geometry optimization of periodic systems (crystals, surfaces, interfaces) using models suited to extended systems, such as MLIPs and tight-binding methods.

  • Inputs:

optimization_method: str from [mace_mp, orb_v2, orb_v3, uma, pet_mad, grace-2l-omat, gfn1_xtb, gfn2_xtb]

geometry: List[List[float]] of atomic coordinates

symbols: List[str] of atomic symbols

cell: List[List[float]] as 3×3 cell parameter matrix

pbc: List[bool] for periodic boundary conditions (UMA and ORB require [True, True, True])

  • Optional Inputs:

fmax: float force convergence threshold (default 0.10)

fixed: List[bool] specifying which atoms are held fixed during optimization (same length as geometry/symbols; by default all atoms are free to move)

structure_type: str required for UMA only from [catalysis, inorganic_material, metal_organic_framework, molecular_crystal]

max_processors: int Used to limit concurrency of optimization

  • Output:

geometry: dict optimized atomic coordinates

symbols: list preserved atomic symbols

energy: dict final energy in eV
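The interplay of the fmax and fixed inputs can be illustrated with a small convergence check. How the service applies fmax internally is not stated here; the sketch assumes the convention common in atomistic optimizers, where a structure counts as converged once the largest force norm on any free atom drops below fmax. Both function names are hypothetical.

```python
import math


def max_free_force(forces, fixed=None):
    """Largest force norm among atoms that are allowed to move."""
    if fixed is None:
        fixed = [False] * len(forces)
    if len(fixed) != len(forces):
        raise ValueError("fixed must have one entry per atom")
    norms = [math.sqrt(fx * fx + fy * fy + fz * fz)
             for (fx, fy, fz), is_fixed in zip(forces, fixed)
             if not is_fixed]
    # With every atom fixed there is nothing left to relax.
    return max(norms, default=0.0)


def is_converged(forces, fmax=0.10, fixed=None):
    """True when every free atom feels a force smaller than fmax."""
    return max_free_force(forces, fixed) < fmax
```

Under this assumption, forces on fixed atoms never block convergence, which is why constraining a bottom slab layer does not prevent the optimization from finishing.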

Examples:

  1. Ni(111) surface with MACE-MP:

    # 4-atom Ni(111) surface slab
    geometry = [[0.0, 0.0, 7.5], [2.49, 0.0, 7.5], [1.245, 2.156, 7.5], [3.735, 2.156, 7.5]]
    symbols = ["Ni", "Ni", "Ni", "Ni"] 
    cell = [[4.98, 0.0, 0.0], [0.0, 4.312, 0.0], [0.0, 0.0, 15.0]]
    pbc = [True, True, False]
    
    task_geometry_opt_slab = session.cebule.create_task("Periodic Geometry Opt Ni Example",
                                                 TaskType.PERIODIC_GEOMETRY_OPT,
                                                 geometry=geometry,
                                                 symbols=symbols, 
                                                 cell=cell,
                                                 pbc=pbc,
                                                 optimization_method="mace_mp",
                                                 fmax=0.10,
                                                 max_processors=4)
    

  2. Pt(111) surface with adsorbed H using UMA:

    # Two-layer Pt(111) slab (6 Pt atoms) with one H adsorbate
    geometry = [[0.0, 0.0, 5.0], [2.77, 0.0, 5.0], [1.385, 2.40, 5.0],     # Bottom Pt layer
                [0.0, 0.0, 7.77], [2.77, 0.0, 7.77], [1.385, 2.40, 7.77],   # Top Pt layer  
                [1.385, 0.80, 9.0]]                                           # H adsorbate
    symbols = ["Pt", "Pt", "Pt", "Pt", "Pt", "Pt", "H"]
    cell = [[5.54, 0.0, 0.0], [0.0, 4.80, 0.0], [0.0, 0.0, 15.0]]
    pbc = [True, True, True]  # UMA requires periodicity in all three directions
    fixed = [True, True, True, False, False, False, False]  # Fix bottom layer for optimization
    
    task_geometry_opt_catalyst = session.cebule.create_task("Periodic Geometry Opt Pt Example",
                                                    TaskType.PERIODIC_GEOMETRY_OPT,
                                                    geometry=geometry,
                                                    symbols=symbols,
                                                    cell=cell,
                                                    pbc=pbc,
                                                    optimization_method="uma",
                                                    structure_type="catalysis",
                                                    fmax=0.10,
                                                    fixed=fixed,
                                                    max_processors=4)
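The constraints listed for PERIODIC_GEOMETRY_OPT (3×3 cell, full periodicity for UMA and ORB, structure_type for UMA, fixed matching the atom count) can be checked before a task is created. The function below is a hypothetical local helper rather than an SDK call; it assumes that "ORB" covers both orb_v2 and orb_v3 and simply collects human-readable problems instead of raising.

```python
PERIODIC_METHODS = {"mace_mp", "orb_v2", "orb_v3", "uma", "pet_mad",
                    "grace-2l-omat", "gfn1_xtb", "gfn2_xtb"}
FULL_PBC_METHODS = {"uma", "orb_v2", "orb_v3"}
UMA_STRUCTURE_TYPES = {"catalysis", "inorganic_material",
                       "metal_organic_framework", "molecular_crystal"}


def check_periodic_inputs(optimization_method, geometry, symbols, cell, pbc,
                          fixed=None, structure_type=None):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if optimization_method not in PERIODIC_METHODS:
        problems.append(f"unknown optimization_method {optimization_method!r}")
    if len(geometry) != len(symbols):
        problems.append("geometry and symbols differ in length")
    if any(len(xyz) != 3 for xyz in geometry):
        problems.append("every atom needs three coordinates")
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        problems.append("cell must be a 3x3 matrix")
    if len(pbc) != 3:
        problems.append("pbc needs three entries")
    elif optimization_method in FULL_PBC_METHODS and not all(pbc):
        problems.append(f"{optimization_method} requires pbc=[True, True, True]")
    if fixed is not None and len(fixed) != len(geometry):
        problems.append("fixed must have one entry per atom")
    if optimization_method == "uma" and structure_type not in UMA_STRUCTURE_TYPES:
        problems.append("uma requires a valid structure_type")
    return problems
```

Running the check on the inputs of the two examples above returns an empty list in both cases, while switching the Ni(111) example to `optimization_method="uma"` without changing `pbc` reports the periodicity requirement.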
    

This section is still under development. For more information see the following pages and reference papers:

Density Functional Theory (DFT)

Kohn-Sham equations

Plane-Wave DFT (PW-DFT)

Semi-empirical methods

PM6

PM7

Semiempirical tight binding (TB) method GFN-xTB

Extended semiempirical tight-binding method/model GFN2-xTB

We have worked on various projects utilizing these models and their extensions/modifications/re-parameterizations.

Let us know if you want to discuss your use-case with us and evaluate which models should be used with respect to the molecules and structures you work with (contact [at] mqs (dot) dk).