How to Integrate MinIO with Geyser Data for S3-Compatible Cold & Transitory Storage

  • Aug 15
  • 4 min read

Updated: Oct 13

Overview of Integration

When your MinIO cluster fills up with cold or infrequently accessed data, it’s time to think about tiering. By integrating MinIO with Geyser Data’s Tape-as-a-Service, you can move that data to a low-cost, S3-compatible archive tier. The best part? You don’t have to change how you work. Geyser can serve as a permanent cold archive or as a transitory tier for overflow and lifecycle management.


This guide walks you through the integration process step by step. Let's get started on optimizing your storage.


Architecture of the System

```
MinIO Cluster (Primary Hot Tier)
        |
        |  Lifecycle Rule / ILM Tiering
        v
Geyser Data (Cold/Transitory Tier - S3-compatible)
```


MinIO continues to serve hot data with high performance. Cold objects are migrated to Geyser Data using:

  • S3-compatible bucket replication

  • Lifecycle rules (automatic tiering based on age or prefix)

  • External scripting via MinIO SDKs (Go, Python, etc.)


Prerequisites for Integration

Before diving into the setup, ensure you have the following:

  • A running MinIO deployment (standalone or distributed mode)

  • Access to a Geyser Data account and credentials

  • mc (MinIO Client) installed on your admin workstation

  • Python or Go SDK (optional, for advanced control)

  • IAM user credentials from Geyser with access to a bucket (Access Key ID / Secret Access Key)


Step-by-Step Guide to Integration


Step 1: Set Up Geyser Data Bucket

  1. Log in to the Geyser Console (or use the API if your workflow is automated).

  2. Request credentials if not already provisioned.

  3. Note the endpoint URL (e.g., https://la1.geyserdata.com).

  4. Create a bucket for cold storage, e.g. cold-archive-bucket.

  5. Ensure versioning is enabled (optional, but recommended for replication).
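
If you prefer the CLI to the console, the bucket from the steps above can also be created with mc once the geyser alias from Step 2 is in place (a sketch; bucket name as above):

```shell
# Create the cold-storage bucket on Geyser
mc mb geyser/cold-archive-bucket

# Enable versioning (recommended if you plan to replicate into it)
mc version enable geyser/cold-archive-bucket
```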


Step 2: Add Geyser Data Endpoint to MinIO Client (mc)

```shell

mc alias set geyser https://la1.geyserdata.com <ACCESS_KEY_ID> <SECRET_ACCESS_KEY>

```

Example:

```shell

mc alias set geyser https://la1.geyserdata.com geyseruser123 geysersecret456

```

Validate:

```shell

mc ls geyser

```


Step 3: Configure Lifecycle Rules on MinIO

Use MinIO’s lifecycle configuration feature to automatically transition data based on object age or prefix.


Example: Archive all objects older than 30 days to Geyser.

  1. Create a lifecycle JSON config file (lifecycle.json):

```json
{
  "Rules": [
    {
      "ID": "TransitionToGeyser",
      "Status": "Enabled",
      "Prefix": "",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}
```

Note: Geyser Data exposes an S3-compatible API rather than a native "GLACIER" storage class, so the StorageClass value above is a placeholder. Depending on your MinIO version, tiering to a custom S3 endpoint either requires registering Geyser as a remote tier (recent releases support this via mc ilm tier add) or must be simulated with replication or scripting, as described in the next steps.
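
However the rule is ultimately honored, the lifecycle JSON above still has to be applied to the hot bucket. Recent mc releases can import it from stdin (subcommand names vary somewhat between mc versions):

```shell
# Apply lifecycle.json to the hot bucket
mc ilm import local/my-hot-bucket < lifecycle.json

# Confirm the rule is registered
mc ilm ls local/my-hot-bucket
```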


Step 4: Setup Bucket Replication (Simulating Archive Tiering)

  1. Create the destination bucket in Geyser (Step 1).

  2. Enable versioning on the source MinIO bucket:

```shell

mc version enable local/my-hot-bucket

```

  3. Create a replication configuration JSON (replication.json):

```json
{
  "Role": "arn:minio:replication::myminio:replication-role",
  "Rules": [
    {
      "ID": "replicate-to-geyser",
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": {
        "Status": "Disabled"
      },
      "DeleteReplication": {
        "Status": "Disabled"
      },
      "Destination": {
        "Bucket": "arn:aws:s3:::cold-archive-bucket"
      },
      "Filter": {
        "Prefix": ""
      },
      "SourceSelectionCriteria": {}
    }
  ]
}
```

  4. Apply the replication rule using mc:

```shell

mc replicate import local/my-hot-bucket < replication.json

```
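
To confirm the rule was accepted, list the bucket's replication configuration and check its status (exact flags and output vary by mc version):

```shell
# List active replication rules on the source bucket
mc replicate ls local/my-hot-bucket

# Check replication status and any backlog
mc replicate status local/my-hot-bucket
```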


Step 5: Scripted Archive Using MinIO SDK (Optional)

For tighter control, use MinIO’s SDKs (Go or Python) to script archival logic.


Example (Python with boto3):

```python
import boto3
from datetime import datetime, timedelta, timezone

# MinIO client (source)
minio_s3 = boto3.client(
    's3',
    endpoint_url='http://minio.local:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin'
)

# Geyser client (destination)
geyser_s3 = boto3.client(
    's3',
    endpoint_url='https://la1.geyserdata.com',
    aws_access_key_id='geyseruser123',
    aws_secret_access_key='geysersecret456'
)

# Move objects older than 30 days.
# LastModified is timezone-aware, so the cutoff must be too.
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
bucket_name = 'my-hot-bucket'

response = minio_s3.list_objects_v2(Bucket=bucket_name)
for obj in response.get('Contents', []):
    if obj['LastModified'] < cutoff:
        key = obj['Key']
        # Stream from MinIO and upload to Geyser; a server-side
        # copy_object cannot span two different S3 endpoints
        body = minio_s3.get_object(Bucket=bucket_name, Key=key)['Body']
        geyser_s3.put_object(Bucket='cold-archive-bucket', Key=key, Body=body.read())
        # Delete from MinIO
        minio_s3.delete_object(Bucket=bucket_name, Key=key)
```
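
Note that a single list_objects_v2 call returns at most 1,000 keys. For larger buckets, one option is to factor the age check into a small pure helper and drive it from a boto3 paginator; select_archivable below is a hypothetical name, not part of any SDK:

```python
from datetime import datetime, timedelta, timezone

def select_archivable(objects, cutoff):
    """Return the keys of listed objects last modified before `cutoff`."""
    return [o['Key'] for o in objects if o['LastModified'] < cutoff]

# Driving it from a paginator (sketch; `minio_s3` as defined above):
# paginator = minio_s3.get_paginator('list_objects_v2')
# for page in paginator.paginate(Bucket='my-hot-bucket'):
#     for key in select_archivable(page.get('Contents', []), cutoff):
#         ...  # copy to Geyser, then delete from MinIO

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
listing = [
    {'Key': 'old.log', 'LastModified': datetime.now(timezone.utc) - timedelta(days=45)},
    {'Key': 'new.log', 'LastModified': datetime.now(timezone.utc) - timedelta(days=1)},
]
print(select_archivable(listing, cutoff))  # ['old.log']
```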


Step 6: Validate and Monitor

  • Use `mc ls geyser/cold-archive-bucket` to confirm successful archival.

  • Monitor logs and object versions.

  • Optionally implement event notifications (MinIO supports webhook and AMQP triggers).
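
For the last point, a webhook notification on uploads might be wired up roughly like this (the ARN below assumes a webhook target named primary has already been configured on the MinIO server):

```shell
# Fire a webhook whenever an object lands in the hot bucket
mc event add local/my-hot-bucket arn:minio:sqs::primary:webhook --event put
```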


Advanced: Using Geyser as a Transitory Tier

For workflows where Geyser is used as a staging/overflow zone:

  • Write directly to Geyser via S3 client SDKs or backup tools.

  • Use metadata tagging (x-amz-meta-archive-reason) for traceability.

  • Retrieve when needed into MinIO using parallel `mc mirror` or boto3.


Example:

```shell

mc mirror --overwrite geyser/cold-archive-bucket local/my-hot-bucket

```

Or via Python:

```python

geyser_s3.download_file('cold-archive-bucket', 'object-key', 'downloads/object-key')

minio_s3.upload_file('downloads/object-key', 'my-hot-bucket', 'object-key')

```


Security & Access Control

  • Use IAM policies (on Geyser side) to restrict access by prefix, IP, or time.

  • Use TLS/SSL for all S3 traffic.

  • Enable object lock/versioning if using Geyser for compliance storage.
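
As a sketch of the first point, an S3-style policy limiting a user to read/write under a single prefix might look like the following (the prefix archive/ is illustrative; adapt to the IAM dialect Geyser supports):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::cold-archive-bucket/archive/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::cold-archive-bucket",
      "Condition": { "StringLike": { "s3:prefix": "archive/*" } }
    }
  ]
}
```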


Performance Considerations

  • Geyser Data provides faster Time to First Byte than traditional archive solutions like Glacier or Deep Archive.

  • Use multipart uploads for large objects.

  • Geyser has no egress or retrieval fees, enabling aggressive lifecycle tiering without cost penalties.
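
When sizing multipart uploads, it helps to know how many parts a given object will need; a small helper (hypothetical name, assuming 64 MiB parts) makes the arithmetic explicit:

```python
import math

def part_count(object_size, part_size=64 * 1024 * 1024):
    """Number of multipart-upload parts needed for an object of object_size bytes."""
    return max(1, math.ceil(object_size / part_size))

# A 5 GiB object split into 64 MiB parts:
print(part_count(5 * 1024**3))  # 80
```

S3-compatible APIs generally cap a multipart upload at 10,000 parts, so very large objects may need a larger part size.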


Conclusion

This integration empowers MinIO users with cloud-like tiering to a cost-effective cold storage backend. You can achieve this without vendor lock-in or expensive retrieval penalties. Whether you're archiving old data or managing overflow in a transitory model, Geyser Data offers a highly compatible and economical solution.


Appendix: Useful Commands

```shell
# Sync MinIO bucket to Geyser
mc mirror local/my-hot-bucket geyser/cold-archive-bucket

# Restore archive to MinIO
mc mirror geyser/cold-archive-bucket local/my-hot-bucket

# Set object retention (if using object lock)
mc retention set --default GOVERNANCE 365d geyser/cold-archive-bucket
```

