Skip to content

Implicit Solvation Models: COSMO (CPCM), PCM (DPCM), IEFPCM

Solvation models

Historically, quantum chemistry was developed for isolated molecular systems in vacuum, which corresponds well to molecules in the gas phase. But the vast majority of technical and biological chemistry does not happen in gas phase, but in liquids, where the solute under consideration is closely surrounded by other molecules, which interact with the solute by dispersive, electrostatic, and hydrogen bond interactions. While the dispersive interactions have little influence on the electronic state and can energetically reasonably well be approximated by simple, surface proportional contributions, electronic and hydrogen bond interactions strongly depend on and interact with the electron distribution of the solute. This is the reason, why the crude, but somehow successful dielectric continuum approximation has become widely used. It assumes, that the electrostatic and partly the hydrogen bond interactions between solute and solvent can be described approximating the solvent by a macroscopic dielectric continuum. All of these models are based on the assumption that the solute under consideration is separated from the solvent by a closed surface, the cavity, and that inside the cavity the solute is described quantum chemically, and outside the cavity there is a homogeneous dielectric of permittivity \(\epsilon\).

While the dielectric continuum models originally were formulated assuming simple spherical cavities allowing for analytical expressions for the interaction energy of solute and continuum, nowadays molecular shaped cavities are used, mostly in the apparent surface charge (ASC) formulation. In these the polarization response of the continuum is described by the apparent surface charges appearing on the boundary of the dielectric continuum, i.e. on the cavity. The corresponding screening charge (surface) density \(\sigma\)(r) can be calculated from the dielectric boundary condition, once the electric potential and field generated by the solute on the cavity boundary are known, usually from a quantum chemical calculation, e.g. via a DFT model. The interaction energy of the solute with the dielectric continuum can then be calculated as integral of the product of electrostatic potential and screening charge density over the surface of the solute. It needs to be said, that the existence of a cavity separating the solute and the solvent is a theoretical artifact, and the definition and construction of the cavities is subject to empiricism and differs in detail in the different models. Nevertheless it is common sense, that such cavities roughly have an average distance of 1.2 Bondi radius from the atoms of the solute.

The polarizable continuum model (PCM) was the first of these ASC-models, originally using the exact dielectric boundary condition. Later it turned out, that the conductor-like screening model (COSMO), which uses a scaled conductor boundary condition, is not only algorithmically much simpler, but also considerably reduces the artifacts resulting from the small part of the solute electron density, which penetrates out of the cavity, the so-called outlying charge error, compared to the exact dielectric boundary conditions. Therefore, nowadays either COSMO or its PCM re-implementation C-PCM are widely used in quantum chemical programs.

The integral equation formulation of PCM (IEFPCM) achieves an as good reduction of the outlying charge error as COSMO, but is more complicated than COSMO and never proofed to be more accurate 1. In practice, the cavities of ASC models usually are represented by surface segments, also called tesserae, and the screening charges induced on the segments are calculated by matrix algebra, which in essence turns the interaction of the solute with the solvent into an additional, continuum mediated, electrostatic interaction of the nuclei and electrons of the solute, i.e. into a modification of the Coulomb operator, and thus the dielectric screening charges and interaction energies can be calculated without much overhead in the quantum chemical SCF cycle, e.g. the SCF cycle within a DFT model being solved. Due to its algorithmic simplicity, robustness, and good quality, we offer the COSMO model in the Cebule platform while the C-PCM and IEFPCM models can be provided for research and interfacing purposes.

Cebule Python SDK

The Cebule Python SDK Gitlab Repository provides a documentation with example notebooks and Python scripts. For a full documentation of the API endpoints please find the documentation in the Cebule Engine Dashboard.

Let's start with where actual data is stored before we then present how you can run solvation model calculations with Cebule.

S3 buckets (Object storage)

All tasks and results data are stored in a S3 bucket and can be directly accessed through the cloud provider S3 bucket page (see Figure 2) or one can directly request the individual tasks' data via this Python snippet for an AWS S3 bucket (the boto3 package has to be installed):

import boto3

s3_client = boto3.client('s3',
                         aws_access_key_id='YOUR_ACCESS_KEY',
                         aws_secret_access_key='YOUR_SECRET_KEY')
response = s3_client.get_object(Bucket='YOUR_BUCKET_NAME',
                                Key='YOUR_OBJECT_KEY')
object_content = response['Body'].read().decode('utf-8')
print(object_content)

YOUR_OBJECT_KEY has the following path: "tasks/TASK_ID/TASK_NAME_cosmo.json"

One can also make use of the AWS CLI via the command line:

aws s3 cp s3://YOUR_BUCKET_NAME/TASK_ID/TASK_NAME_cosmo.json .

If no TASK_NAME was defined in the input then the results can be retrieved as follows:

s3://YOUR_BUCKET_NAME/TASK_ID/TASK_ID_cosmo.json

In the AWS dashboard the individual tasks folder view is available (Figure 2).

Figure 2: S3 bucket view of all task folders for specific S3 bucket. The bucket name is defined by the user on the cloud provider platform of choice.

COSMO results data

You can get full access to all variables and parameters in the COSMO model. The results include for example:

  • the number of cosmo segments and their coordinates
  • the molecular surface area
  • the molecular COSMO volume
  • the cosmo surface charges for each segment

When retrieving the results of your calculation tasks you can get a complete and extensive overview which data is currently retrievable from the high performance COSMO implementation.

How to send off a task for a COSMO calculation via the Cebule Python SDK

Cebule SDK TaskType: COSMO

  • Performs COSMO (COnductor-like Screening MOdel) solvation calculations to model molecule-solvent interactions
  • Inputs:
  • geometry: List[float] - List of 3D coordinates in Angstrom (flattened x,y,z format)
  • symbols: List[str] - List of chemical element symbols for each atom
  • method: str - Quantum chemistry method (e.g., "SCF", "DFT")
  • basis: str - Basis set (e.g., "6-31g**")
  • dielec: float - Dielectric constant of the solvent (e.g., 78.0 for water, 1e6 to simulate a perfect conductor for COSMO-RS/SAC)
  • radius: List[float] - Van der Waals radii for each atom in Angstrom
  • Optional: optimize: bool - Whether to perform geometry optimization (default: False)
  • Optional: driver_convergence: str - Convergence criteria for geometry optimization (e.g., "loose", "default")
  • Optional: dft_energy_convergence: float - Energy convergence threshold for DFT calculations (default: 1e-7)
  • Cebule max_processors: Used to limit concurrency of quantum calculations
  • Output: Dictionary containing COSMO surface charge density and geometry.

Example:

from dotenv import load_dotenv
import mqsdk
from mqsdk.core.cebule import TaskType
import os

# Initialize the SDK with user credentials
load_dotenv()
session = mqsdk.Cebule(os.environ['EMAIL'], os.environ['PASSWORD'])

input_data = {
    "geometry": [
        0.0,  0.0000, -0.1294,
        0.0, -1.4941,  1.0274,
        0.0,  1.4941,  1.0274
        ],
    "symbols": ["O", "H", "H"],
    "method": "SCF",
    "basis": "6-31g**",
    "dielec": 78.0,
    "radius": [1.40, 1.16, 1.16],
    "optimize": True,
    "driver_convergence": "loose",
    "dft_energy_convergence": 1e-5
    }
task = session.cebule.create_task("COSMO Water Test", TaskType.COSMO,
                              input_data, nprocs=16)

# Result will contain COSMO surface charges and other information in a dictionary.

The COSMO task can also be connected to a previous GEOMETRY_OPT task to use optimized structures:

geometry_task_water = session.cebule.create_task("Geometry Opt water",
                            TaskType.GEOMETRY_OPT, 
                            smiles_list=["O"], 
                            force_field="mmff94",
                            optimization_method="g_xtb",
                            max_processors=4)

# The connected_task_id automatically uses the geometry from the geometry optimization task
cosmo_task_water = session.cebule.create_task("cosmo water", TaskType.COSMO,
    connected_task_id=geometry_task_water.id,
    method="dft",
    basis="def2-tzvp",
    dielec=1e6, # Simulate perfect conductor
    optimize=False,
    driver_convergence="loose",
    xc="becke88 perdew86",
    disp="vdw 4",
    dft_energy_convergence=1e-7,
    max_processors=64)

Find the whole Jupyter notebook under the following link: https://gitlab.com/mqsdk/python-sdk/-/blob/main/notebooks/2_COSMO_task_Cebule.ipynb


  1. A. Klamt, C. Moya, and J. Palomar. A comprehensive comparison of the iefpcm and ss(v)pe continuum solvation methods with the cosmo approach. Journal of Chemical Theory and Computation, 11(9):4220–4225, 2015. doi:10.1021/acs.jctc.5b00601