Configuration Files in Python Using Dataclasses

How to use the dataconf library to parse HOCON, JSON, YAML, and properties files directly into Python dataclasses with full type safety.


Managing configuration files in Python has traditionally been a pain point. You parse a YAML or JSON file, get a dictionary, and then extract values by hand with no type safety or IDE autocompletion. The dataconf library changes this by parsing configuration files directly into Python dataclasses and validating field types at load time.

The Problem with Traditional Config Parsing

Consider a typical approach to loading configuration:

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# No type safety, no autocompletion
database_host = config["database"]["host"]  # Hope it exists!
port = config["database"]["port"]  # Hope it's an int!

This approach has several issues:

  • No IDE autocompletion
  • Runtime errors if keys are missing
  • No type validation
  • Difficult to refactor
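
These failure modes are easy to reproduce without any file on disk. The sketch below uses a hand-written dictionary as a stand-in for the result of yaml.safe_load. The describe_failures helper and the sample values are illustrative assumptions, not part of any library.

```python
# Stand-in for the dictionary that yaml.safe_load would return.
# The port is quoted by mistake and the "name" key is missing.
config = {"database": {"host": "localhost", "port": "5432"}}


def describe_failures(cfg):
    """Hypothetical helper: collect problems a plain dict hides until runtime."""
    failures = []
    try:
        cfg["database"]["name"]
    except KeyError as exc:
        failures.append(f"missing key: {exc}")
    port = cfg["database"]["port"]
    if not isinstance(port, int):
        failures.append(f"port is {type(port).__name__}, expected int")
    return failures


for problem in describe_failures(config):
    print(problem)
```

Neither problem surfaces when the file is loaded. Both appear only when the offending value is finally read or used.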

Enter dataconf

The dataconf library provides type-safe parsing of configuration files into Python dataclasses. For those coming from Scala, this mirrors the experience of using case classes with PureConfig.

Installation

pip install dataconf
# or
poetry add dataconf

Note: dataconf requires Python >= 3.9 due to typing features not available in earlier versions. I’m also a contributor to this library.

Defining Your Configuration Schema

First, define your configuration structure using dataclasses:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DatabaseConfig:
    host: str
    port: int
    name: str
    user: str
    password: str

@dataclass
class CacheConfig:
    enabled: bool
    ttl_seconds: int
    max_size: int

@dataclass
class AppConfig:
    database: DatabaseConfig
    cache: CacheConfig
    debug: bool = False
    allowed_origins: Optional[List[str]] = None
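
Keep in mind that a plain dataclass does not enforce its annotations at runtime, which is why a loader that checks types is useful. The following is a minimal sketch using a trimmed-down DatabaseConfig and a hypothetical mistyped_fields helper. The helper only handles simple annotations such as str and int, not generics like List[str].

```python
from dataclasses import dataclass, fields


@dataclass
class DatabaseConfig:
    host: str
    port: int


# The dataclass accepts a string for an int field without complaint.
cfg = DatabaseConfig(host="localhost", port="5432")


def mistyped_fields(obj):
    """Hypothetical helper: names of fields whose value does not match the annotation."""
    return [
        f.name
        for f in fields(obj)
        if not isinstance(getattr(obj, f.name), f.type)
    ]


print(mistyped_fields(cfg))
```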

Loading Configuration Files

dataconf supports multiple formats: HOCON, JSON, YAML, and properties files.

HOCON Example

# config.hocon
database {
    host = "localhost"
    port = 5432
    name = "myapp"
    user = "admin"
    password = ${DB_PASSWORD}  # Environment variable substitution
}

cache {
    enabled = true
    ttl_seconds = 3600
    max_size = 1000
}

debug = false

Load it with a single call:

import dataconf

config = dataconf.load("config.hocon", AppConfig)

# Full type safety and autocompletion
print(config.database.host)  # "localhost"
print(config.cache.ttl_seconds)  # 3600

JSON Example

{
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "myapp",
        "user": "admin",
        "password": "secret"
    },
    "cache": {
        "enabled": true,
        "ttl_seconds": 3600,
        "max_size": 1000
    }
}

The loading call is the same:

config = dataconf.load("config.json", AppConfig)

YAML Example

database:
  host: localhost
  port: 5432
  name: myapp
  user: admin
  password: secret

cache:
  enabled: true
  ttl_seconds: 3600
  max_size: 1000

Loading works the same way:

config = dataconf.load("config.yaml", AppConfig)

Advanced Features

Environment Variable Substitution

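HOCON supports environment variable substitution in two forms, required and optional. Use one or the other for a given key:

database {
    # Required: resolution fails if DB_PASSWORD is not set
    password = ${DB_PASSWORD}

    # Optional: the assignment is skipped if DB_PASSWORD is not set
    password = ${?DB_PASSWORD}
}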
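
The difference between the two forms is roughly the difference between a strict and a lenient lookup in Python. The sketch below is only an analogy built on os.environ, with hypothetical helper names. It does not use or describe dataconf's own resolution code.

```python
import os


def required(name):
    """Like ${NAME}: raises KeyError when the variable is not set."""
    return os.environ[name]


def optional(name, current=None):
    """Like ${?NAME}: keeps the current value when the variable is not set."""
    return os.environ.get(name, current)


# With the variable unset, the optional form keeps the earlier value.
os.environ.pop("DB_PASSWORD", None)
print(optional("DB_PASSWORD", current="dev-only"))

# Once the variable is set, both forms return it.
os.environ["DB_PASSWORD"] = "s3cret"
print(required("DB_PASSWORD"))
```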

Nested Configurations

dataconf handles deeply nested structures naturally:

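from dataclasses import dataclass

# Define the innermost class first: each annotation is evaluated
# when its class is created, so the referenced class must already exist.
@dataclass
class Level2Config:
    value: str

@dataclass
class Level1Config:
    level2: Level2Config

@dataclass
class NestedConfig:
    level1: Level1Config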
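
To make the idea of nested parsing concrete, here is a stdlib-only sketch of how a nested dictionary can be turned into nested dataclasses by recursion. The build function is a hypothetical helper written for this illustration. It is not dataconf's implementation and skips everything a real loader does, such as type checks, defaults, and generics.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Level2Config:
    value: str


@dataclass
class Level1Config:
    level2: Level2Config


@dataclass
class NestedConfig:
    level1: Level1Config


def build(cls, data):
    """Hypothetical helper: recursively build dataclass `cls` from a dict."""
    kwargs = {}
    for f in fields(cls):
        raw = data[f.name]
        # Recurse when the field is itself a dataclass, else pass the value through.
        kwargs[f.name] = build(f.type, raw) if is_dataclass(f.type) else raw
    return cls(**kwargs)


config = build(NestedConfig, {"level1": {"level2": {"value": "deep"}}})
print(config.level1.level2.value)
```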

Writing Configurations

You can also serialize dataclasses back to configuration files:

import dataconf

config = AppConfig(
    database=DatabaseConfig(
        host="localhost",
        port=5432,
        name="myapp",
        user="admin",
        password="secret"
    ),
    cache=CacheConfig(
        enabled=True,
        ttl_seconds=3600,
        max_size=1000
    )
)

dataconf.dump("config.json", config, out="json")
dataconf.dump("config.yaml", config, out="yaml")
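
For comparison, the standard library can already turn a dataclass into a dictionary and then into JSON. The sketch below is a stdlib-only illustration of that round trip, using trimmed-down versions of the classes above. It is not a description of how dataconf.dump works internally.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CacheConfig:
    enabled: bool
    ttl_seconds: int
    max_size: int


@dataclass
class AppConfig:
    cache: CacheConfig
    debug: bool = False


config = AppConfig(cache=CacheConfig(enabled=True, ttl_seconds=3600, max_size=1000))

# asdict converts nested dataclasses into nested dictionaries.
as_dict = asdict(config)
text = json.dumps(as_dict, indent=4)

# Reading the JSON back gives a plain dict again, not an AppConfig.
restored = json.loads(text)
print(text)
```

The last step is where the type information is lost: json.loads returns a dictionary, so getting an AppConfig back requires a typed loader.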

Benefits Over Traditional Approaches

Feature              Traditional    dataconf
Type safety          None           Full
IDE autocompletion   No             Yes
Validation           Manual         Automatic
Refactoring          Error-prone    Safe
Documentation        External       In code

Immutable Configurations

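For configurations that should not change after loading (like Scala case classes), use frozen dataclasses. Freezing only blocks attribute reassignment, so prefer immutable field types such as tuples over lists: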

@dataclass(frozen=True)
class ImmutableConfig:
    host: str
    port: int
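
A short sketch of what freezing means in practice, using only the standard library. Assigning to a field raises FrozenInstanceError, and dataclasses.replace returns a modified copy. The values are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ImmutableConfig:
    host: str
    port: int


config = ImmutableConfig(host="localhost", port=5432)

try:
    config.port = 6543  # not allowed on a frozen dataclass
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# To change a value, derive a new instance instead of mutating the old one.
updated = replace(config, port=6543)
print(reassigned, config.port, updated.port)
```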

Conclusion

Using dataconf with Python dataclasses provides:

  • Type safety: Catch configuration errors at load time
  • IDE support: Full autocompletion and type hints
  • Validation: Automatic type checking and required field validation
  • Maintainability: Configuration schema is self-documenting
  • Flexibility: Support for multiple file formats

For anyone coming from Scala or TypeScript, this brings the configuration experience you expect to Python.


Originally published on Towards Data Science