Public interface for manual multipart upload control

Hi,

I'm integrating GCS into our backup flow, and the goal is to have a single multipart upload performed by multiple workers.

### How it works

We have a process which tars and compresses the backup data. The tarball is written in small chunks into a temp directory, so our goal is to start a multipart upload as soon as the tar process starts and upload each chunk as a part of the multipart upload. Once the tar process is finished, we complete the multipart upload.

This approach means we never need the full data locally before starting the upload, and it also allows distributing the upload across multiple workers under our control.
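To make the flow above concrete, here is a minimal sketch of the orchestration side. It is simplified: it lists the chunk directory once, whereas our real process picks up chunks while the tar is still running. The names `upload_chunks` and `upload_part`, and the zero-padded chunk naming, are hypothetical; `upload_part` stands in for any callable that uploads one file as one part and returns its ETag.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def upload_chunks(chunk_dir, upload_part, max_workers=4):
    """Upload every chunk file in chunk_dir as one part and return part metadata.

    upload_part is a hypothetical callable: (path, part_number) -> etag.
    """
    # Sort so part numbers follow the order in which the chunks were written
    # (assumes zero-padded names such as chunk-00001).
    chunk_paths = sorted(
        os.path.join(chunk_dir, name) for name in os.listdir(chunk_dir)
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            part_number: pool.submit(upload_part, path, part_number)
            for part_number, path in enumerate(chunk_paths, start=1)
        }
        # result() re-raises any exception raised inside a worker.
        return [
            {"PartNumber": number, "ETag": future.result()}
            for number, future in sorted(futures.items())
        ]
```

The returned list is what the parent needs in order to complete the multipart upload once the tar process has finished.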

### The problem

The `upload_chunks_concurrently` method exposed by the Python SDK requires a filename and, internally, divides the file into parts and uploads them in parallel. This means I would need to have the full file locally before starting the multipart upload, which completely breaks our workflow. Also, because the multipart upload is handled by the library internally, it does not allow us to distribute the work across our own workers.

In short, we need full control over all stages of the multipart upload (initiate, upload parts, complete), similar to what boto3 exposes. This seems to be a limitation of the Python SDK specifically: the Java SDK, for example, appears to give full control over the complete flow.

We already have this flow working on AWS S3 using the boto3 library and would like to achieve the same with GCS.
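For reference, the three stages we rely on with boto3 look roughly like the sketch below. The helper names `initiate`, `upload_part`, and `complete` are hypothetical and only illustrate the shape of our code; `s3` is assumed to be a client created with `boto3.client("s3")`, and `create_multipart_upload`, `upload_part`, and `complete_multipart_upload` are the boto3 client calls.

```python
def initiate(s3, bucket, key):
    """Stage 1 (parent): start the multipart upload and return its upload ID."""
    response = s3.create_multipart_upload(Bucket=bucket, Key=key)
    return response["UploadId"]


def upload_part(s3, bucket, key, upload_id, part_number, path):
    """Stage 2 (worker): upload one chunk file as one part."""
    with open(path, "rb") as body:
        response = s3.upload_part(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            PartNumber=part_number,
            Body=body,
        )
    return {"PartNumber": part_number, "ETag": response["ETag"]}


def complete(s3, bucket, key, upload_id, parts):
    """Stage 3 (parent): assemble the parts, listed in ascending part order."""
    ordered = sorted(parts, key=lambda p: p["PartNumber"])
    return s3.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={"Parts": ordered},
    )
```

Only the upload ID and the per-part ETags cross process boundaries, which is what makes the work easy to distribute across our own workers.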

### Workaround

Upon inspecting the `upload_chunks_concurrently` implementation, we were able to come up with a solution that uses the internal classes `XMLMPUContainer` and `XMLMPUPart` (the code below is just to illustrate something similar to our usage):


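```python
import os

from google.cloud import storage
from google.cloud.storage.transfer_manager import XMLMPUContainer, XMLMPUPart

# Placeholder values, for illustration only.
bucket_name = "my-bucket"
object_key = "backups/backup.tar.gz"

client = storage.Client()
upload_url = f"https://storage.googleapis.com/{bucket_name}/{object_key}"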

# === PARENT PROCESS: Initiate upload ===
container = XMLMPUContainer(upload_url, filename=None)
container.initiate(transport=client._http, content_type="application/octet-stream")
upload_id = container.upload_id

# Spawn workers, passing them the upload_url and upload_id...

# === WORKER PROCESS: Upload a single part ===
def upload_part(upload_url, upload_id, part_filename, part_number):
    part = XMLMPUPart(
        upload_url=upload_url,
        upload_id=upload_id,
        filename=part_filename,
        start=0,
        end=os.path.getsize(part_filename),
        part_number=part_number,
    )
    part.upload(transport=client._http)
    return {"PartNumber": part_number, "ETag": part.etag}

# === PARENT PROCESS: Complete upload after all workers finish ===
# parts_metadata is the list of dicts returned by the workers' upload_part calls.
container = XMLMPUContainer(upload_url, filename=None, upload_id=upload_id)
for part in parts_metadata:
    container.register_part(part["PartNumber"], part["ETag"])
container.finalize(transport=client._http)
```

Using these classes gives us total control of the multipart upload process, which lets it integrate seamlessly into our workflow.

### Questions

However, these classes originate from `google.cloud.storage._media` (a private module). Are they considered stable? Is using them directly discouraged?

If their use is discouraged, is there an alternative for streaming multipart uploads that achieves the same result?

Thanks for any help!
