Add Join Order Benchmark (JOB) to tests/benchmarks #107756
ayakovlev-clickhouse wants to merge 1 commit into
Introduce the Join Order Benchmark (JOB) following the existing structure of the `tpc-h` and `tpc-ds` benchmarks: `init.sql` with the schema, `settings.json`, and a `queries/` folder with the 113 ClickHouse-formatted queries (zero-padded `query_NNx.sql` naming). The `README.md` documents loading the data from the public Parquet exports on S3, preparing it from the original PostgreSQL-style CSV files via the included `convert_csv.py`, and lists the queries that deviate slightly from the canonical JOB queries (to avoid empty results). Co-authored-by: Cursor <cursoragent@cursor.com>
Workflow [PR], commit [3a4b295]
Summary: ✅
AI Review
Summary
This PR adds a standalone Join Order Benchmark under
Missing context / blind spots
Findings
Tests
Final Verdict
Status:
Minimum required action: make
            if complete:
                writer.writerow(fields)
                pending = None
        if pending is not None:
`convert_csv.py` should fail when `parse_record` reports an incomplete final record. With the current EOF path, an input such as `1,"unterminated\n` reaches this block, `parse_record` returns `(None, False)`, and the script exits 0 without writing any error or output; if earlier records were complete, it emits only those rows. That can silently build a truncated JOB dataset from a corrupted download. Please return a non-zero exit code whenever `pending` remains incomplete at EOF instead of dropping it.
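For illustration, the requested EOF behavior could look roughly like the sketch below. It is not the code from this PR: `parse_record` here is a hypothetical stand-in that only handles standard double-quoted CSV, and the `convert` function name, its signature, and the error message are assumptions. The only point is the final check, which reports a dangling `pending` record and returns a non-zero status instead of dropping it.

```python
import csv
import io
import sys


def parse_record(text):
    """Hypothetical stand-in for the PR's parser.

    Returns (fields, True) for a complete record, or (None, False)
    when a double-quoted field is still open at the end of `text`.
    """
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
    if in_quotes:
        return None, False
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return (rows[0] if rows else []), True


def convert(lines, out):
    """Write complete records to `out`; return a process exit status."""
    writer = csv.writer(out, lineterminator="\n")
    pending = None
    for line in lines:
        # Accumulate physical lines until they form one logical record.
        pending = line if pending is None else pending + line
        fields, complete = parse_record(pending)
        if complete:
            writer.writerow(fields)
            pending = None
    if pending is not None:
        # Input ended inside a record: report it instead of dropping it.
        print("error: incomplete final record at EOF", file=sys.stderr)
        return 1
    return 0
```

A script entry point could then end with `sys.exit(convert(sys.stdin, sys.stdout))`, so a truncated input produces a non-zero exit code even when earlier rows were already written.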
alexey-milovidov left a comment
Looks good, though they are not wired into CI now.
Introduce the Join Order Benchmark (JOB) following the existing structure of the `tpc-h` and `tpc-ds` benchmarks: `init.sql` with the schema, `settings.json`, and a `queries/` folder with the 113 ClickHouse-formatted queries.

The `README.md` documents loading the data from the public Parquet exports on S3, preparing it from the original PostgreSQL-style CSV files via the included `convert_csv.py`, and lists the queries that deviate slightly from the canonical JOB queries (to avoid empty results).

Changelog category (leave one):
...