
Statistical Thermodynamic Models

UNIFAC models: theory, variants, and implementation

The UNIFAC (Universal Quasichemical Functional-group Activity Coefficients) method enables prediction of activity coefficients in liquid mixtures without requiring experimental data for the specific mixture of interest—a capability that has made it indispensable in chemical process design. By treating molecules as assemblies of functional groups rather than unique entities, UNIFAC reduces the parameter space from millions of potential molecule pairs to roughly 50-60 functional groups, making thermodynamic predictions tractable for any combination of molecules containing tabulated groups.

This documentation covers the theoretical foundations, mathematical formulations, and practical implementation of Original UNIFAC, Modified UNIFAC (Dortmund), PSRK, and the recent UNIFAC 2.0 development incorporating machine learning. Cebule also supports two specialized UNIFAC parameterizations: UNIFAC for surfactant solutions and UNIFAC for ionic liquid viscosity prediction (UNIFAC-VISCO-IL).


Theoretical foundations and the group contribution concept

Traditional activity coefficient models such as NRTL and Wilson require binary interaction parameters fitted to experimental data for each molecular pair. Given the astronomical number of possible binary combinations in organic chemistry, this approach creates an insurmountable data collection problem.

UNIFAC sidesteps this limitation through a key insight: molecular interactions can be approximated as the sum of contributions from constituent functional groups. A hydroxyl group (-OH) behaves similarly whether attached to methanol or ethanol, so any molecule of interest can be described by fragmenting it into a set of tabulated functional groups. This "solution-of-groups" concept originated with the paper by Wilson and Deal in 1962 1 and was formalized by Fredenslund, Jones, and Prausnitz in their landmark 1975 AIChE Journal paper 2.

UNIFAC builds directly on UNIQUAC

UNIFAC extends the UNIQUAC (Universal Quasi-Chemical) model developed by Abrams and Prausnitz 3, which generalizes Guggenheim's quasi-chemical theory using local surface area fractions. The relationship is straightforward: UNIFAC = UNIQUAC + functional group contribution. Whereas UNIQUAC requires molecular parameters and binary interaction data fitted per molecule pair, UNIFAC calculates molecular parameters from group contributions and uses tabulated group interaction parameters.

| Feature | UNIQUAC | UNIFAC |
| --- | --- | --- |
| Parameters | Molecular \(r\), \(q\) + binary interactions | Group \(R_k\), \(Q_k\) + group interactions |
| Data required | Experimental for each molecule pair | Tabulated group parameters |
| Capability | Correlation | Prediction |

The UNIFAC Consortium manages the parameter database for the original UNIFAC, modified UNIFAC, PSRK and VTPR models. The supplementary material of the following reference documents the UNIFAC and PSRK parameters as of its publication in 2005: https://doi.org/10.1016/j.fluid.2004.11.002


Original UNIFAC: complete mathematical formulation

The activity coefficient γᵢ for molecule i decomposes into combinatorial and residual contributions:

\[\ln \gamma_i = \ln \gamma_i^C + \ln \gamma_i^R\]

The combinatorial term accounts for entropic effects from molecular size and shape differences; the residual term captures energetic interactions between functional groups.

Combinatorial contribution (Staverman-Guggenheim)

The combinatorial contribution derives from lattice theory 4:

\[\ln \gamma_i^C = \ln\frac{\Phi_i}{x_i} + \frac{z}{2}q_i\ln\frac{\theta_i}{\Phi_i} + L_i - \frac{\Phi_i}{x_i}\sum_j x_j L_j\]

where the component fractions and parameters are defined as:

Volume fraction: \(\Phi_i = \frac{r_i x_i}{\sum_j r_j x_j}\)

Surface area fraction: \(\theta_i = \frac{q_i x_i}{\sum_j q_j x_j}\)

Bulk factor: \(L_i = \frac{z}{2}(r_i - q_i) - (r_i - 1)\)

The coordination number is set to z = 10 as a standard value.
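As a minimal sketch, the combinatorial term can be evaluated directly from mole fractions and the molecular \(r_i\), \(q_i\) values. The function name `ln_gamma_combinatorial` is illustrative only; the ethanol and n-hexane values are the sums of the \(R_k\), \(Q_k\) group values tabulated later in this document. The ratios \(\Phi_i/x_i\) and \(\theta_i/x_i\) are computed without dividing by \(x_i\), so the infinite-dilution limit stays finite.

```python
import math

Z = 10  # coordination number, as in the text

def ln_gamma_combinatorial(x, r, q):
    """Return ln(gamma_i^C) for every component of a mixture."""
    sum_rx = sum(ri * xi for ri, xi in zip(r, x))
    sum_qx = sum(qi * xi for qi, xi in zip(q, x))
    # Bulk factor L_i for each component
    bulk = [Z / 2 * (ri - qi) - (ri - 1) for ri, qi in zip(r, q)]
    sum_xl = sum(xi * li for xi, li in zip(x, bulk))
    result = []
    for ri, qi, li in zip(r, q, bulk):
        phi_over_x = ri / sum_rx      # Phi_i / x_i, finite even when x_i = 0
        theta_over_x = qi / sum_qx    # theta_i / x_i
        result.append(
            math.log(phi_over_x)
            + Z / 2 * qi * math.log(theta_over_x / phi_over_x)
            + li
            - phi_over_x * sum_xl
        )
    return result

# Ethanol (1) + n-hexane (2); r and q summed from group R_k, Q_k values
r = [2.5755, 4.4998]
q = [2.5880, 3.8560]
ln_gc = ln_gamma_combinatorial([0.3, 0.7], r, q)
```

Two quick sanity checks follow from the formula: the term vanishes for a pure component, and it vanishes for a mixture of components with identical \(r\) and \(q\).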

Molecular volume (\(r_i\)) and surface area (\(q_i\)) parameters are additive over groups:

\[r_i = \sum_k \nu_k^{(i)} R_k \qquad q_i = \sum_k \nu_k^{(i)} Q_k\]

where νₖ⁽ⁱ⁾ is the count of group k in molecule i, and Rₖ, Qₖ are group-specific parameters derived from van der Waals volumes and surface areas using Bondi's method with normalization constants Vws = 15.17 cm³/mol and Aws = 2.5 × 10⁹ cm²/mol.
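The normalization and the additivity rule can be sketched as follows. The Bondi van der Waals volumes and areas for CH3 and CH2 are quoted here as assumptions and should be checked against Bondi's tabulation; the helper names `group_rq` and `molecule_rq` are hypothetical.

```python
V_WS = 15.17   # cm^3/mol, normalization volume from the text
A_WS = 2.5e9   # cm^2/mol, normalization area from the text

def group_rq(v_w, a_w):
    """Convert a van der Waals volume and area into (R_k, Q_k)."""
    return v_w / V_WS, a_w / A_WS

def molecule_rq(counts, params):
    """Sum group contributions: r_i = sum nu_k R_k, q_i = sum nu_k Q_k."""
    r = sum(n * params[g][0] for g, n in counts.items())
    q = sum(n * params[g][1] for g, n in counts.items())
    return r, q

# Assumed Bondi values: (V_w in cm^3/mol, A_w in cm^2/mol)
bondi = {"CH3": (13.67, 2.12e9), "CH2": (10.23, 1.35e9)}
params = {g: group_rq(v_w, a_w) for g, (v_w, a_w) in bondi.items()}

# n-Hexane = 2 x CH3 + 4 x CH2
r_hex, q_hex = molecule_rq({"CH3": 2, "CH2": 4}, params)
```

With these inputs the group values come out close to the tabulated R = 0.9011, Q = 0.848 for CH3 and R = 0.6744, Q = 0.540 for CH2.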

Residual contribution (solution-of-groups)

The residual term sums group activity coefficient differences:

\[\ln \gamma_i^R = \sum_k \nu_k^{(i)}\left[\ln \Gamma_k - \ln \Gamma_k^{(i)}\right]\]

Here Γₖ is the group activity coefficient in the mixture and Γₖ⁽ⁱ⁾ is the same quantity in pure component i. The subtraction ensures proper normalization (γᵢ → 1 as xᵢ → 1).

Group activity coefficient: \(\ln \Gamma_k = Q_k\left[1 - \ln\sum_m \Theta_m \Psi_{mk} - \sum_m \frac{\Theta_m \Psi_{km}}{\sum_n \Theta_n \Psi_{nm}}\right]\)

Group surface area fraction: \(\Theta_m = \frac{Q_m X_m}{\sum_n Q_n X_n}\)

Group mole fraction: \(X_m = \frac{\sum_j \nu_m^{(j)} x_j}{\sum_j \sum_n \nu_n^{(j)} x_j}\)

Group interaction parameters

The group-group interaction parameter matrix is defined as:

\[\Psi_{mn} = \exp\left(-\frac{a_{mn}}{T}\right)\]

where aₘₙ has units of Kelvin and represents (Uₘₙ - Uₙₙ)/R. These parameters are asymmetric: aₘₙ ≠ aₙₘ, and self-interactions give aₘₘ = 0 (hence Ψₘₘ = 1). The aₘₙ values are regressed from experimental VLE data and tabulated.
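The residual term can be sketched end to end from the equations above. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the interaction table is keyed by subgroup name for brevity (published tables are keyed by main group), and the CH2/OH values are quoted from memory and should be verified against a published parameter table.

```python
import math

def group_fractions(x, nu):
    """Group mole fractions X_m from mole fractions x and group counts nu."""
    groups = sorted({g for counts in nu for g in counts})
    total = sum(xj * sum(counts.values()) for xj, counts in zip(x, nu))
    return {m: sum(xj * counts.get(m, 0) for xj, counts in zip(x, nu)) / total
            for m in groups}

def ln_group_gamma(X, Q, psi):
    """ln Gamma_k for every group, given group mole fractions X."""
    groups = list(Q)
    denom = sum(Q[n] * X[n] for n in groups)
    theta = {m: Q[m] * X[m] / denom for m in groups}
    # s[m] = sum_n Theta_n * Psi_nm
    s = {m: sum(theta[n] * psi[n][m] for n in groups) for m in groups}
    return {k: Q[k] * (1 - math.log(s[k])
                       - sum(theta[m] * psi[k][m] / s[m] for m in groups))
            for k in groups}

def ln_gamma_residual(x, nu, Q, a, T):
    """ln(gamma_i^R) for every component at temperature T (K)."""
    groups = sorted({g for counts in nu for g in counts})
    psi = {m: {n: math.exp(-a[m][n] / T) for n in groups} for m in groups}
    q_used = {g: Q[g] for g in groups}
    mix = ln_group_gamma(group_fractions(x, nu), q_used, psi)
    out = []
    for i, counts in enumerate(nu):
        # Reference state: group activity coefficients in pure component i
        pure_x = [1.0 if j == i else 0.0 for j in range(len(nu))]
        pure = ln_group_gamma(group_fractions(pure_x, nu), q_used, psi)
        out.append(sum(n * (mix[k] - pure[k]) for k, n in counts.items()))
    return out

nu = [{"CH3": 1, "CH2": 1, "OH": 1},   # ethanol
      {"CH3": 2, "CH2": 4}]            # n-hexane
Q = {"CH3": 0.848, "CH2": 0.540, "OH": 1.200}
# Assumed illustrative a_mn values in K (CH3 and CH2 share one main group)
a = {"CH3": {"CH3": 0.0, "CH2": 0.0, "OH": 986.5},
     "CH2": {"CH3": 0.0, "CH2": 0.0, "OH": 986.5},
     "OH":  {"CH3": 156.4, "CH2": 156.4, "OH": 0.0}}
ln_gr = ln_gamma_residual([0.3, 0.7], nu, Q, a, 298.15)
```

The pure-component reference is recomputed for each molecule, which is what makes the residual term vanish as \(x_i \to 1\). Setting every \(a_{mn}\) to zero also gives a zero residual, since all \(\Psi_{mn}\) then equal 1.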


Modified UNIFAC (Dortmund): enhanced accuracy across conditions

Developed by Weidlich and Gmehling (1987) at the University of Dortmund, Modified UNIFAC addresses systematic limitations of the original model through three key modifications that collectively reduce prediction errors by 50-70% for certain properties.

Modified combinatorial term with 3/4 exponent

The original Staverman-Guggenheim term systematically underpredicts activity coefficients for size-asymmetric systems. The Dortmund modification introduces a modified volume fraction with exponent 3/4:

\[\ln \gamma_i^C = 1 - V_i' + \ln V_i' - 5q_i\left(1 - \frac{V_i}{F_i} + \ln\frac{V_i}{F_i}\right)\]

where: \(V_i' = \frac{r_i^{3/4}}{\sum_j x_j r_j^{3/4}}\)

\[V_i = \frac{r_i}{\sum_j x_j r_j} \qquad F_i = \frac{q_i}{\sum_j x_j q_j}\]

This modification reduces the Flory-Huggins overcorrection, improving predictions for mixtures such as ethanol + hexadecane where molecular sizes differ substantially.
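A minimal sketch of the modified combinatorial term, written directly from the three definitions above. The function name and the \(r\), \(q\) inputs are hypothetical placeholders, chosen only to represent a small and a large molecule; they are not Modified UNIFAC (Dortmund) table values.

```python
import math

def ln_gamma_comb_dortmund(x, r, q):
    """Return ln(gamma_i^C) with the 3/4-exponent volume fraction."""
    sum_r = sum(xj * rj for xj, rj in zip(x, r))
    sum_q = sum(xj * qj for xj, qj in zip(x, q))
    sum_r34 = sum(xj * rj ** 0.75 for xj, rj in zip(x, r))
    out = []
    for ri, qi in zip(r, q):
        v_mod = ri ** 0.75 / sum_r34   # V_i'
        v = ri / sum_r                 # V_i
        f = qi / sum_q                 # F_i
        out.append(1 - v_mod + math.log(v_mod)
                   - 5 * qi * (1 - v / f + math.log(v / f)))
    return out

# Hypothetical size-asymmetric pair (small molecule + long chain)
r = [2.0, 11.0]
q = [2.0, 9.0]
ln_gc_mod = ln_gamma_comb_dortmund([0.5, 0.5], r, q)
```

As with the original form, the term is zero for a pure component and for components of identical size and shape.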

Temperature-dependent interaction parameters

Original UNIFAC cannot simultaneously describe VLE and excess enthalpies because the Gibbs-Helmholtz relation requires proper temperature dependence. Modified UNIFAC (Dortmund) introduces a quadratic temperature dependence:

\[\Psi_{mn} = \exp\left(-\frac{a_{mn,1} + a_{mn,2} \cdot T + a_{mn,3} \cdot T^2}{T}\right)\]

This formulation uses up to six parameters per group pair (three each for m→n and n→m), enabling:

  • Correct activity coefficient temperature dependence

  • Accurate excess enthalpy (hᴱ) predictions

  • Improved extrapolation to temperatures above 140°C or below 0°C
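The two temperature dependences can be compared with a short sketch. The function names are illustrative; the only point shown is that the Dortmund form reduces to the original one when the linear and quadratic coefficients are zero.

```python
import math

def psi_original(a, T):
    """Original UNIFAC: Psi = exp(-a / T)."""
    return math.exp(-a / T)

def psi_dortmund(a1, a2, a3, T):
    """Modified UNIFAC (Dortmund): Psi = exp(-(a1 + a2*T + a3*T^2) / T)."""
    return math.exp(-(a1 + a2 * T + a3 * T ** 2) / T)

# With a2 = a3 = 0 both forms agree at any temperature
same_at_300 = psi_dortmund(300.0, 0.0, 0.0, 300.0) == psi_original(300.0, 300.0)
```

Because the extra terms change how \(\ln \Psi_{mn}\) varies with temperature, they directly affect the excess enthalpy obtained through the Gibbs-Helmholtz relation.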

Training data and parameter development

Unlike Original UNIFAC (fitted primarily to VLE data), Modified UNIFAC (Dortmund) parameters are fitted simultaneously to:

  • Vapor-liquid equilibrium (VLE)

  • Activity coefficients at infinite dilution (γ∞)

  • Excess enthalpies (hᴱ)

  • Liquid-liquid equilibrium (LLE)

  • Solid-liquid equilibrium (SLE)

The Dortmund Data Bank, containing over 11 million data tuples from 103,800+ references, provides the training foundation. The UNIFAC Consortium (established 1996) coordinates systematic parameter development; current parameter matrices cover 1,675+ group pairs with 6,308+ individual parameters.


PSRK: extending UNIFAC to high pressures

Traditional UNIFAC assumes ideal gas vapor phase and low pressures (typically <10 bar). The Predictive Soave-Redlich-Kwong (PSRK) model, developed by Holderbaum and Gmehling (1991), bridges activity coefficient predictions to equation of state calculations, extending applicability to hundreds of bar and enabling predictions involving supercritical components.

The SRK equation of state foundation

PSRK uses the Soave-Redlich-Kwong cubic equation:

\[P = \frac{RT}{v - b} - \frac{a\alpha(T)}{v(v + b)}\]

Pure component parameters derive from critical properties: \(a_i = 0.42748 \frac{R^2 T_{c,i}^2}{P_{c,i}} \qquad b_i = 0.08664 \frac{R T_{c,i}}{P_{c,i}}\)
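The pure-component relations can be sketched in a few lines. The text does not define \(\alpha(T)\), so the classic Soave (1972) expression is used here as an assumption; PSRK implementations commonly use a Mathias-Copeman form instead. The CO2 critical constants are approximate literature values, and all function names are illustrative.

```python
import math

R_GAS = 8.314462618  # J/(mol K)

def srk_pure(Tc, Pc):
    """SRK a_i (Pa m^6/mol^2) and b_i (m^3/mol) from Tc (K) and Pc (Pa)."""
    a = 0.42748 * R_GAS ** 2 * Tc ** 2 / Pc
    b = 0.08664 * R_GAS * Tc / Pc
    return a, b

def soave_alpha(T, Tc, omega):
    """Assumed alpha function (Soave 1972); not specified in the text."""
    m = 0.480 + 1.574 * omega - 0.176 * omega ** 2
    return (1 + m * (1 - math.sqrt(T / Tc))) ** 2

def srk_pressure(T, v, a, b, alpha):
    """Pressure (Pa) from the SRK equation at molar volume v (m^3/mol)."""
    return R_GAS * T / (v - b) - a * alpha / (v * (v + b))

# Approximate CO2 constants: Tc = 304.13 K, Pc = 7.3773 MPa, omega = 0.2239
a_co2, b_co2 = srk_pure(304.13, 7.3773e6)
p = srk_pressure(300.0, 1.0, a_co2, b_co2, soave_alpha(300.0, 304.13, 0.2239))
```

At a molar volume of 1 m³/mol the result is within a fraction of a percent of the ideal-gas pressure, which is a convenient check on units.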

The PSRK mixing rule

The innovation lies in connecting excess Gibbs energy from UNIFAC to the EOS attractive parameter through a modified Huron-Vidal mixing rule:

\[\frac{a}{bRT} = \sum_i x_i \frac{a_i}{b_i RT} + \frac{g^E_{UNIFAC}/RT + \sum_i x_i \ln(b/b_i)}{-0.64663}\]
\[b = \sum_i x_i b_i\]

The constant -0.64663 lies between the MHV1 value (-0.593) and the original Huron-Vidal value (-0.693). The gᴱ_UNIFAC term is calculated with the original UNIFAC equations, using temperature-dependent group interaction parameters and a parameter table extended with gas main groups (CH₄, CO₂, N₂, H₂, CO, O₂, H₂S, NH₃, etc.); the table now covers over 60 main groups and 900+ parameter pairs.
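A minimal sketch of the mixing rule, assuming the pure-component \(a_i\) values passed in already include their \(\alpha_i(T)\) factor and that \(g^E/RT\) has been computed separately by a UNIFAC routine. The function name and the numerical inputs are hypothetical.

```python
import math

A1 = -0.64663  # PSRK mixing-rule constant from the text

def psrk_mixture(x, a, b, ge_over_rt, T, R=8.314462618):
    """Return mixture (a, b) from pure-component a_i, b_i and g^E/RT."""
    b_mix = sum(xi * bi for xi, bi in zip(x, b))
    # Dimensionless a_i / (b_i R T) for each pure component
    alpha_pure = [ai / (bi * R * T) for ai, bi in zip(a, b)]
    alpha_mix = (
        sum(xi * al for xi, al in zip(x, alpha_pure))
        + (ge_over_rt + sum(xi * math.log(b_mix / bi) for xi, bi in zip(x, b))) / A1
    )
    return alpha_mix * b_mix * R * T, b_mix

# Hypothetical pure-component values (SI units) and g^E/RT
a_mix, b_mix = psrk_mixture([0.4, 0.6], [0.4, 0.9], [3.0e-5, 6.0e-5], 0.15, 300.0)
```

In the pure-component limit with \(g^E = 0\) the rule returns the pure-component \(a_i\) and \(b_i\) unchanged.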

Applications and advantages

PSRK excels at:

  • High-pressure VLE (tested to 500+ bar)

  • Gas solubility predictions (H₂, N₂, O₂, CH₄, CO₂ in organic solvents)

  • Supercritical extraction calculations

  • Natural gas processing (H₂S removal, CO₂ capture)

Known limitations include the "double combinatorial term" issue affecting size-asymmetric systems, and SRK's inherent underprediction of liquid densities. The successor model VTPR (Volume-Translated Peng-Robinson) addresses these issues.


UNIFAC 2.0: machine learning completes the parameter matrix

A fundamental limitation of traditional UNIFAC is parameter incompleteness: Modified UNIFAC (Dortmund) 1.0 has 63 main groups creating 1,953 possible group pairs, but only 756 pairs (39%) have published parameters. If even one parameter is missing, UNIFAC cannot be applied to that system.
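The pair count and coverage quoted above follow from simple combinatorics, as this short check shows (variable names are illustrative).

```python
from math import comb

n_main_groups = 63
possible_pairs = comb(n_main_groups, 2)   # unordered pairs of distinct main groups
published_pairs = 756
coverage_percent = 100 * published_pairs / possible_pairs
```

Here `possible_pairs` evaluates to 1953 and `coverage_percent` to about 38.7, which rounds to the 39% stated in the text.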

Matrix completion methodology

UNIFAC 2.0, developed by Hayer, Wendel, Mandt, Hasse, and Jirasek at RPTU Kaiserslautern (2024-2025), embeds machine learning matrix completion methods (MCM) into the UNIFAC framework. Rather than fitting parameters pair-by-pair as traditionally done, MCM:

  1. Arranges interaction parameters in sparse matrices
  2. Decomposes matrices into learnable feature representations for each main group
  3. Trains end-to-end on experimental ln γᵢ and hᴱ data
  4. Predicts all missing matrix entries simultaneously
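The idea of learning one feature vector per group and predicting unobserved entries can be illustrated with a toy rank-1 factorization. This is only a sketch of generic matrix completion: it fits matrix entries directly by gradient descent, whereas the method described above trains end-to-end on experimental ln γᵢ and hᴱ data. The matrix values are made-up numbers, not UNIFAC parameters, and `complete_rank1` is a hypothetical helper.

```python
def complete_rank1(A, observed, lr=0.01, steps=5000):
    """Fit A[i][j] ~ u[i] * v[j] on observed cells, then predict every cell."""
    n, m = len(A), len(A[0])
    u = [0.1] * n   # one learnable feature per row group
    v = [0.1] * m   # one learnable feature per column group
    for _ in range(steps):
        grad_u = [0.0] * n
        grad_v = [0.0] * m
        for i, j in observed:
            err = u[i] * v[j] - A[i][j]   # error on an observed cell only
            grad_u[i] += err * v[j]
            grad_v[j] += err * u[i]
        u = [ui - lr * g for ui, g in zip(u, grad_u)]
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
    return [[ui * vj for vj in v] for ui in u]

# Toy 4 x 4 matrix that is exactly rank 1
row = [1.0, 0.5, 2.0, 1.5]
col = [1.0, 2.0, 0.5, 1.0]
truth = [[a * b for b in col] for a in row]

# Pretend two entries were never measured
missing = {(0, 3), (2, 1)}
observed = [(i, j) for i in range(4) for j in range(4) if (i, j) not in missing]
filled = complete_rank1(truth, observed)
```

Because every row and column still has observed entries, the two withheld cells are recovered from the learned features. Real parameter matrices are not exactly low-rank, so practical methods use richer feature representations and regularization.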

Training data (Modified UNIFAC 2.0; 2024):

  • 500,000+ experimental data points from Dortmund Data Bank

  • 27,035 binary systems

No equation changes required

Critically, UNIFAC 2.0 uses the exact same equations as Modified UNIFAC (Dortmund). The innovation is entirely in parameter estimation: users simply replace parameter tables for a drop-in upgrade. Complete parameter matrices (aₘₙ, bₘₙ) are freely available as supplementary material.

Performance improvements

Validation shows:

  • Mean MAE nearly halved compared to Modified UNIFAC 1.0

  • Significantly fewer outlier predictions

  • Outperforms previous versions even in true predictive tests (data withheld during training)

  • 100% applicability—no missing parameters

The companion neural network model HANNA achieves even higher accuracy (MAE reduced to ~1/3 of UNIFAC's) but sacrifices the interpretable group-contribution framework.


Practical implementation guide

Molecular fragmentation into groups

UNIFAC uses a two-level hierarchy: main groups define interaction parameters, while subgroups specify R and Q values. Example fragmentation:

| Molecule | Groups |
| --- | --- |
| n-Hexane | 2×CH3, 4×CH2 |
| Ethanol | 1×CH3, 1×CH2, 1×OH |
| Acetone | 1×CH3, 1×CH3CO |
| Toluene | 5×ACH, 1×ACCH3 |
| Phenol | 5×ACH, 1×ACOH |

Key rules: identify largest specific groups first (ACOH takes priority over AC+OH separately), ensure every atom is assigned to exactly one group, and verify parameter availability for all group pairs present.
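The last rule, checking parameter availability, can be sketched as a lookup over main-group pairs. The subgroup-to-main-group mapping follows the sample parameter table in this document; the `available` set is a hypothetical stand-in for a real parameter table and deliberately omits one pair to show the check failing.

```python
from itertools import combinations

# Subgroup -> main group (interaction parameters are defined per main group)
MAIN_GROUP = {"CH3": "CH2", "CH2": "CH2", "OH": "OH",
              "CH3CO": "CH2CO", "H2O": "H2O"}

def missing_pairs(molecules, available):
    """List main-group pairs in the mixture that have no parameters."""
    mains = sorted({MAIN_GROUP[g] for counts in molecules for g in counts})
    return [pair for pair in combinations(mains, 2)
            if frozenset(pair) not in available]

# Ethanol + acetone, fragmented as in the table above
mixture = [{"CH3": 1, "CH2": 1, "OH": 1}, {"CH3": 1, "CH3CO": 1}]

# Hypothetical availability table, for illustration only
available = {frozenset({"CH2", "OH"}), frozenset({"CH2", "CH2CO"})}
gaps = missing_pairs(mixture, available)
```

With this made-up table, `gaps` contains the single pair `("CH2CO", "OH")`. An empty list means every required group pair has parameters.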

Parameter table structure

Sample R and Q values:

| Subgroup | Main Group | R | Q |
| --- | --- | --- | --- |
| CH3 | CH2 | 0.9011 | 0.8480 |
| CH2 | CH2 | 0.6744 | 0.5400 |
| OH | OH | 1.0000 | 1.2000 |
| H2O | H2O | 0.9200 | 1.4000 |
| CH3CO | CH2CO | 1.6724 | 1.4880 |

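Interaction parameters (aᵢⱼ) are tabulated in asymmetric matrices; Modified UNIFAC adds bᵢⱼ and cᵢⱼ columns for temperature dependence. In the notation of the Modified UNIFAC \(\Psi_{mn}\) expression, aᵢⱼ, bᵢⱼ and cᵢⱼ correspond to \(a_{mn,1}\), \(a_{mn,2}\) and \(a_{mn,3}\).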

Temperature and pressure limitations

  • Original UNIFAC: 275-425 K, low pressure (<10 bar)
  • Modified UNIFAC (Dortmund): Extended temperature range, but accuracy degrades outside 273-413 K without appropriate training data
  • PSRK: Extends pressure range to hundreds of bar; suitable for supercritical components

Systems requiring special treatment

Electrolytes: Standard UNIFAC cannot model ion-ion interactions. Use LIFAC, Extended UNIQUAC, or eNRTL instead.

Polymers: Original UNIFAC shows ~18.7% average deviation due to missing free-volume effects. UNIFAC-FV adds a Flory equation of state term, reducing errors to ~7.4%.

Strong hydrogen bonding: Activity coefficients often overpredicted for water-alcohol, carboxylic acid, and amine systems. Validate against experimental data for critical applications.

Large molecules (>10 functional groups): Accuracy degrades; consider molecular simulation alternatives.


UNIFAC remains the workhorse method for predicting liquid-phase activity coefficients in chemical engineering practice. The progression from Original UNIFAC (1975) through Modified UNIFAC (Dortmund) to UNIFAC 2.0 represents steady improvements in accuracy and applicability while preserving the intuitive group-contribution framework.

For most applications, Modified UNIFAC (Dortmund) offers the best balance of accuracy and parameter availability through the UNIFAC Consortium. UNIFAC 2.0 eliminates parameter gaps entirely through machine learning, though parameters are newer and less extensively validated in industrial practice. PSRK extends predictions to high-pressure systems involving supercritical components.

COSMO-RS and COSMO-SAC form another family of workhorse predictive methods. Benchmarking them against UNIFAC 2.0 helps identify which model predicts best for a given class of chemical compounds.

Key practical guidance: always verify parameter availability before relying on predictions, use Modified UNIFAC when possible for improved temperature extrapolation, validate against experimental data for final designs, and recognize model limitations for electrolytes, polymers, and strongly associating systems.

Group Contribution Models in Cebule SDK

Cebule currently supports two specialized UNIFAC parameter sets:

  1. UNIFAC Surfactant Parameters: These parameters are specifically optimized for systems containing surfactants, allowing for better prediction of properties in detergents, cosmetics, and micellar systems. These parameters are based on the research presented in "Thermodynamic Modelling of Surfactant Solutions".

  2. UNIFAC Viscosity Ionic Liquid Parameters (UNIFAC-VISCO-IL): This parameter set is designed for predicting viscosities of ionic liquid mixtures and their solutions. The model has been adapted to account for the unique structure and interactions present in ionic liquids, as detailed in "New Method for the Estimation of Viscosity of Pure and Mixtures of Ionic Liquids Based on the UNIFAC–VISCO Model".

Cebule SDK TaskType: GROUP_CONTRIBUTION

  • Perform group contribution analysis on molecules to obtain functional group parameters
  • Inputs:
      • smiles_list: List[Union[str, List[str]]] - List of SMILES representations of molecules. Each element can be a SMILES string or a list of SMILES strings (polymer chain). See Polymer Chains for polymer chain format.
      • gc_type: str - Type of group contribution method from [unifac_surfactant, unifac_visco_il]
      • batch: bool = False - If True, treats input as list of mixtures for batch processing. If False, treats as single mixture.
  • Cebule max_processors: Not used
  • Output: Dictionary (single mixture) or List of Dictionaries (batch processing) containing:
      • counts: List of dictionaries mapping group names to their occurrences in each molecule
      • indices: List of dictionaries mapping group names to atom indices for each occurrence
      • params: Dictionary mapping group names to their volume (R) and surface area (Q) parameters
      • interaction: Dictionary containing group-group interaction parameters between all groups present among all molecules

Example with regular molecules:

session.cebule.create_task("gc", TaskType.GROUP_CONTRIBUTION, smiles_list=["CCO", "[Na+].[Cl-]"], gc_type="unifac_surfactant")

# Result:
{
  "counts": [{"CH3": 1, "CH2": 1}, {"Na+": 1}],
  "indices": [{"CH3": [[3]], "CH2": [[4]]}, {"Na+": [[0]]}],
  "params": {
    "CH3": {"R": 0.9011, "Q": 0.848},
    "CH2": {"R": 0.6744, "Q": 0.54},
    "Na+": {"R": 1.0612, "Q": 1.0404}
  },
  "interaction": {
    "CH3": {"CH3": 0.0, "CH2": 0.0, "Na+": 0.0},
    "CH2": {"CH3": 0.0, "CH2": 0.0, "Na+": 2500.0},
    "Na+": {"CH3": 0.0, "CH2": -129.4, "Na+": 0.0}
  }
}

Example with polymer chains:

# Mixture containing a polymer chain and solvent
session.cebule.create_task("gc", TaskType.GROUP_CONTRIBUTION,
                          smiles_list=[["[*]CC[*:1]", "[*:1]CCC[*]"], "O"],  # polymer chain + water
                          gc_type="unifac_surfactant")

The results provide:

  1. counts: Number of each functional group in each molecule
  2. indices: Atom indices corresponding to each functional group (see Atom Order)
  3. params: Volume (R) and surface area (Q) parameters for each group
  4. interaction: Interaction parameters between different functional groups
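As a hedged sketch of how such a result might be post-processed, the snippet below sums the group parameters into molecular \(r\) and \(q\) values. It assumes the result is available as a plain Python dictionary with the layout shown in the first example; how the result is retrieved from the task is not covered here, and `molecular_rq` is a hypothetical helper, not part of the Cebule SDK.

```python
# Subset of the example result shown above, copied as a plain dictionary
result = {
    "counts": [{"CH3": 1, "CH2": 1}, {"Na+": 1}],
    "params": {
        "CH3": {"R": 0.9011, "Q": 0.848},
        "CH2": {"R": 0.6744, "Q": 0.54},
        "Na+": {"R": 1.0612, "Q": 1.0404},
    },
}

def molecular_rq(result):
    """Return one (r, q) tuple per molecule by summing group contributions."""
    out = []
    for counts in result["counts"]:
        r = sum(n * result["params"][g]["R"] for g, n in counts.items())
        q = sum(n * result["params"][g]["Q"] for g, n in counts.items())
        out.append((r, q))
    return out

rq = molecular_rq(result)
```

For the example data this gives r = 1.5755 and q = 1.388 for the first entry, and the Na+ group values unchanged for the second.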


  1. G. M. Wilson and C. H. Deal. Activity coefficients and molecular structure. Activity coefficients in changing environments-solutions of groups. Industrial & Engineering Chemistry Fundamentals, 1(1):20–23, 1962. doi:10.1021/i160001a003

  2. Aage Fredenslund, Russell L. Jones, and John M. Prausnitz. Group-contribution estimation of activity coefficients in nonideal liquid mixtures. AIChE Journal, 21(6):1086–1099, 1975. doi:10.1002/aic.690210607

  3. Denis S. Abrams and John M. Prausnitz. Statistical thermodynamics of liquid mixtures: a new expression for the excess Gibbs energy of partly or completely miscible systems. AIChE Journal, 21(1):116–128, 1975. doi:10.1002/aic.690210115

  4. E. A. Guggenheim. Statistical thermodynamics of co-operative systems (a generalization of the quasi-chemical method). Trans. Faraday Soc., 44:1007–1012, 1948. doi:10.1039/TF9484401007