Best practices
0 votes
0 replies
69 views

Problem Statement: I needed a robust way to ingest data from Kafka to BigQuery using Apache Beam/Dataflow, with at-least-once delivery, durable checkpointing, and safe offset progression—even when ...
Parag Ghosh
Advice
1 vote
1 reply
125 views

I have an Apache Beam pipeline (running on Dataflow) that normally performs a daily batch load from Cloud Storage to BigQuery. The source team has provided 1 year of historical data that needs to be ...
Saravana Kumar
Best practices
0 votes
0 replies
27 views

I use a Dataflow batch job to run some naively parallelizable C++ processes. The C++ processes take a while (minutes or even hours) to finish, but the default Dataflow scheduler does not seem to ramp ...
bill
0 votes
0 answers
65 views

I have 2 existing dataflows in a Dataverse environment, dataflow1 and dataflow2. I have created a 3rd dataflow (dataflow3) and I want to get data from dataflow1 and dataflow2 into dataflow3. I have ...
grasshopper
0 votes
1 answer
127 views

In the latest Apache Beam 2.68.0, they have changed the behavior of Coders for non-primitive objects. (see the changelog here). Therefore, I get a warning like this on GCP Dataflow. "Using ...
Praneeth Peiris
0 votes
0 answers
48 views

I have a table in Power BI that has a calculated column based on an Azure table and Dataverse. (Dataverse is used for a write-back feature using Power Automate, as a few of the column values get changed.) Table ...
A_B
0 votes
1 answer
128 views

I have created a GDFusion instance, enabled the needed APIs, and assigned the needed permissions. I ran my pipelines several times with success status, and I have lineage in the UI. Then I try to get lineage using REST ...
Sergey Shabalov
0 votes
0 answers
58 views

I am using Airflow 2.10.5 to trigger Dataflow using BeamRunJavaPipelineOperator. The logs show that the Dataflow job was submitted and the Dataflow job ID is 2025-09-18_01_16_18-13565731205394440306 ....
Rex Ubald
0 votes
1 answer
114 views

I am trying to build a Python-based Apache Beam pipeline that is going to read from Kafka. Kafka requires Truststore and Keystore JKS file-based authentication. kafka_consumer_config = { "...
Bhargav Velisetti
0 votes
1 answer
134 views

I'm trying to create a job to mirror a view that I have in my PostgreSQL DB to a BigQuery table in my Google Cloud project through Dataflow. I created the job using the "Job builder", and I'...
Gustavo Trivelatto
0 votes
1 answer
74 views

When I use us-central-2 for the following dataflow job, I got an error. Using us-central-1 is fine. Is this expected? Is there a way to use us-central-2 with prime? I have to use us-central-2 as my ...
bill
0 votes
1 answer
88 views

def check_worker_logs(event_uuid, dataflow_project, dataflow_job, timeframe_mins=30): # Start time of the worker log start_time = (datetime.utcnow() - timedelta(minutes=timeframe_mins))....
katsu
0 votes
0 answers
61 views

I am running a 3-tier data flow from ci to bo to till nodes using SymmetricDS. Whenever I update a table in the ci database, the change syncs to the bo database but not to the till database. I've set ...
venu
0 votes
0 answers
67 views

I've been trying for some time to get a Beam pipeline to do data transformations for a fairly simple machine learning transformation, but Apache Beam and TensorFlow Transform won't play nicely ...
George Chapman-Brown
0 votes
0 answers
64 views

I would like to use an ErrorHandler to catch all the errors that happen during my pipeline. I have seen that there is an interface that allows this: https://beam.apache.org/releases/javadoc/...
Dev Yns
