6,726 questions
Best practices
0
votes
0
replies
69
views
Building a Restart-Safe Kafka to BigQuery Pipeline with Durable Checkpointing Using Apache Beam - Dataflow
Problem Statement:
I needed a robust way to ingest data from Kafka to BigQuery using Apache Beam/Dataflow, with at-least-once delivery, durable checkpointing, and safe offset progression—even when ...
Advice
1
vote
1
replies
125
views
Efficiently processing 1 year of daily historical files using Dataflow
I have an Apache Beam pipeline (running on Dataflow) that normally performs a daily batch load from Cloud Storage to BigQuery. The source team has provided 1 year of historical data that needs to be ...
Best practices
0
votes
0
replies
27
views
Maximize throughput for a Dataflow batch job?
I use a Dataflow batch job to run some naively parallelizable C++ processes. The C++ processes take a while (minutes or even hours) to finish, but the default Dataflow scheduler does not seem to ramp ...
0
votes
0
answers
65
views
How to connect from one dataflow in Dataverse to an existing dataflow in Dataverse
I have 2 existing dataflows in a Dataverse environment, dataflow1 and dataflow2. I have created a 3rd dataflow (dataflow3) and I want to get data from dataflow1 and dataflow2 into dataflow3.
I have ...
0
votes
1
answer
127
views
Apache Beam 2.68.0 throws "Using fallback deterministic coder for type" warning
In the latest Apache Beam 2.68.0, the behavior of Coders for non-primitive objects has changed (see the changelog here).
Therefore, I get a warning like this on GCP Dataflow.
"Using ...
0
votes
0
answers
48
views
Snapshot of Power BI table using dataflow
I have a table in Power BI which has a calculated column based on an Azure table and Dataverse. (Dataverse is used for a write-back feature using Power Automate, as a few of the column values get changed.)
Table ...
0
votes
1
answer
128
views
No data lineage in Dataplex for Google Data Fusion pipelines
I have created a Google Data Fusion instance:
Enabled the needed APIs:
Assigned the needed permissions:
I ran my pipelines several times with success status:
I have lineage in the UI:
Then I try to get lineage using REST ...
0
votes
0
answers
58
views
BeamRunJavaPipelineOperator gives a 404 error after the Dataflow job is submitted
I am using Airflow 2.10.5 to trigger Dataflow using BeamRunJavaPipelineOperator.
The logs show that the Dataflow job was submitted, with job ID 2025-09-18_01_16_18-13565731205394440306 ....
0
votes
1
answer
114
views
Dataflow Python SDK failing to authenticate to Kafka using truststore and keystore JKS files with custom Docker image
I am trying to build a Python-based Apache Beam pipeline which is going to read from Kafka. Kafka requires truststore and keystore JKS file-based authentication.
kafka_consumer_config = {
"...
0
votes
1
answer
134
views
Can I enforce the BigQuery table schema when transferring from PostgreSQL with Google Cloud Dataflow?
I'm trying to create a job to mirror a view that I have in my PostgreSQL DB to a BigQuery table in my Google Cloud project through Dataflow. I created the job using the "Job builder", and I'...
0
votes
1
answer
74
views
Dataflow Prime picks an unavailable machine type?
When I use us-central-2 for the following Dataflow job, I get an error. Using us-central-1 works fine. Is this expected? Is there a way to use us-central-2 with Prime? I have to use us-central-2 as my ...
0
votes
1
answer
88
views
Unable to fetch GCP Dataflow worker logs using Python google-cloud-logging despite correct filter
def check_worker_logs(event_uuid, dataflow_project, dataflow_job, timeframe_mins=30):
    # Start time of the worker log
    start_time = (datetime.utcnow() - timedelta(minutes=timeframe_mins))....
0
votes
0
answers
61
views
Automatic data flow issue in SymmetricDS
I am setting up a 3-tier data flow from ci to bo to till nodes using SymmetricDS. Whenever I update a table in the ci database, the change syncs to the bo database but not to the till database. I've set ...
0
votes
0
answers
67
views
Solving version conflict using Apache Beam with ML transforms library
I've been trying for some time to get a Beam pipeline to do data transformations for a fairly simple machine learning transformation, but Apache Beam and Tensorflow-transform won't play nicely ...
0
votes
0
answers
64
views
How to use the ErrorHandler interface in Apache Beam?
I would like to use an ErrorHandler to catch all the errors that happen in my pipeline.
I have seen that there is an interface that allows this: https://beam.apache.org/releases/javadoc/...