Add option to compile (external) schema by cmake #31
|
@Stinkfist0 thoughts? It touches some of your recent changes. Maybe you have some ideas? |
|
Sure, I can see this might be handy for some. IMHO, the PYTHON_EXECUTABLE biz could be removed and we could assume (and instruct, if this was not instructed yet) that the correct Python must be set to path prior to the CMake run on all platforms (I made the CMake script to abort if Python features were wanted but Python was not in PATH). |
|
BTW just noticed that when I added removal of unused IFC revision source files from the project they're still copied as a part of the installation phase. |
|
Well, the [1] https://github.com/IfcOpenShell/IfcOpenShell/blob/master/src/ifcwrap/CMakeLists.txt#L58 Thanks for noting the installation files issue [3]. I will fix it in this branch to prevent conflicts. [3] https://github.com/IfcOpenShell/IfcOpenShell/blob/master/cmake/CMakeLists.txt#L355 |
|
In the original script I have used a "standard" (I think official Python docs instruct to set this) PYTHONPATH for the location of python.exe. However, PYTHONPATH is appended to PATH, not prepended, which might be better. I don't mind having explicit control for setting the executable (I think in some projects I have seen problems at least on OS X for picking Python 2 vs 3) as long as it's used for all platforms. |
|
I thought the The cmake [1] https://cmake.org/cmake/help/v3.0/module/FindPythonInterp.html |
|
Seems that I have mixed up PYTHONPATH & PYTHONHOME. The latter seems to be the "official" way of telling the location of python.exe, at least on Windows, so it may be better to switch to using that. |
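To see which of these settings is actually in effect on a given machine, a small script along the following lines may help. It is only a sketch: the function name `describe_python_environment` is made up for illustration. As far as the Python documentation describes them, `PYTHONHOME` changes where the standard libraries are looked up and `PYTHONPATH` extends the module search path, so neither necessarily names the interpreter itself; `sys.executable` and a `PATH` lookup are shown alongside for comparison.

```python
import os
import shutil
import sys


def describe_python_environment(env=None):
    """Collect the interpreter-related settings discussed above (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Absolute path of the interpreter running this script.
        "executable": sys.executable,
        # First `python` found on the search path (None if nothing is found).
        "python_on_path": shutil.which("python", path=env.get("PATH")),
        # Raw values of the two environment variables, or None when unset.
        "PYTHONHOME": env.get("PYTHONHOME"),
        "PYTHONPATH": env.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

Running it before and after editing the environment variables shows whether the change affects which interpreter a build would pick up.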
…tion dir Remove unused compiled schema in cpp installation target
|
@Stinkfist0 I see, shall I commit a s/PYTHONPATH/PYTHONHOME/ ? Or does it need more adaptations? |
|
@aothms Sure, these places should be altered: https://github.com/IfcOpenShell/IfcOpenShell/search?utf8=✓&q=PYTHONPATH |
|
Ok, doesn't seem github's search is very reliable or my patch is over-enthusiastic. Your search has 11 results, but my patch has 17 changes. |
Add option to compile (external) schema by cmake + general improvements
|
Note for Windows users: remember to edit existing |
Adds three knobs and three new heartbeat numbers to support the ongoing perf-parity work. None affect behaviour in default runs.

cull[wall|compute|upload] split timer
The existing cull_timer wrapped both the parallel std::async dispatch and the sequential cullModelCpuUpload loop (queueWriteBuffer × 3 per resident chunk × ~120 chunks ≈ 360 wgpu calls per frame). Splitting them ruled upload out as the bottleneck on a 51-model federation scene: compute ≈ 16-17 ms, upload ≈ 1 ms.

WGPU_CULL_THREADS=0: force sequential cull
std::async-per-model was already in place; this env var disables it so we can compare wall time vs sequential and confirm parallelism is working. On the federation scene with 52 models: sequential 74 ms vs parallel 17 ms = 4.4× speedup. Confirmed; the 17 ms floor is not a parallelism failure, it is the cost of culling the largest single model (model 43, 114k instances) with no parallelism within that model.

WGPU_STREAM_DEBUG=1: per-frame [stream-debug] log
Surfaces cands/enq/drained/ev_lru/ev_pri/blocked/resident/cycled/max_load each frame from driveStreamingLoads. The "cycled" / "max_load" pair makes thrash vs eviction-churn vs just-loading distinguishable. Off by default; opt-in via the env var.

Bench-warm timeout dump
When [bench warm] times out (600 frames without 0-loads streak), prints a structured summary: resident/missing/total chunks, cycled count, pool usage, largest free run, avg missing chunk size, and an auto-classifier diagnosis (POOL FRAGMENTED vs WORKING SET > POOL vs FEW-CHUNK CYCLE vs still-loading). Caught a real fragmentation pattern (18 MB largest free run vs ~100 MB typical chunk) on a 51-model run where the dumb classifier would have called it a load-budget problem.

LOD1 firing counter
"lod1 X/Y (saved Z tris, N no-lod1)" suffix on the [frame] log. X = LOD1-selected this frame, Y = LOD1-eligible, Z = tris not drawn vs always-LOD0, N = visible instances with no baked LOD1 (mesh below IFC_LOD_MIN_TRIS). Confirmed the LOD1 path is genuinely firing after the per-chunk LOD1-storage commit, and exposed that ~90% of instances in real scenes are no-lod1 meshes, which is relevant to the future LOD-tier-residency design.

Chunk.load_count + Chunk.lod0/1 layout bookkeeping
Per-chunk reload counter for the thrash detector. lod0/1 layout_count fields prep the data model for distance-tiered residency (Phase B of #31) but aren't acted on yet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
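The split-timer idea in the commit message above can be illustrated roughly as follows. This is a Python sketch rather than the commit's C++ code: `cull_model`, `upload` and `timed_cull` are hypothetical stand-ins, and treating `WGPU_CULL_THREADS=0` as "use the sequential path" is assumed from the description. The point is only that the compute phase and the upload phase are timed separately, so one can be ruled out as the bottleneck.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def cull_model(model):
    """Stand-in for per-model culling: keep the instances flagged visible."""
    return [inst for inst in model if inst["visible"]]


def upload(visible):
    """Stand-in for the sequential per-chunk buffer upload."""
    return len(visible)


def timed_cull(models, parallel=None):
    """Return (compute_ms, upload_ms, uploaded_count) for one frame."""
    if parallel is None:
        # Assumed semantics: WGPU_CULL_THREADS=0 forces the sequential path.
        parallel = os.environ.get("WGPU_CULL_THREADS") != "0"

    t0 = time.perf_counter()
    if parallel:
        # One task per model, mirroring the per-model dispatch described above.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(cull_model, models))
    else:
        results = [cull_model(m) for m in models]
    t1 = time.perf_counter()

    # The upload stays sequential and is measured on its own.
    uploaded = sum(upload(r) for r in results)
    t2 = time.perf_counter()
    return (t1 - t0) * 1e3, (t2 - t1) * 1e3, uploaded
```

Because of Python's global interpreter lock, this sketch will not reproduce the reported speedup; it shows the measurement structure, not the performance.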
The idea is to simplify generating IfcOpenShell code for custom-developed schemas, e.g. [1] [2]. The added CMake code checks that the Python executable has the required dependency (pyparsing) and invokes the two steps necessary to compile an EXPRESS schema into cpp. The generated files are added to the IfcParse project.
[1] https://bitbucket.org/Vertexwahn/bug-reports/downloads/IFC4_P6_longform_draft5_withMinorSyntaxCorrection.exp
[2] https://raw.githubusercontent.com/DURAARK/IFCPointCloud/master/tool/schemas/IFC2X3_TC1_PC.exp
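The dependency check described above is done in CMake in this PR; the same idea can be sketched on the Python side as below. The helper names `missing_modules` and `require` are hypothetical and not part of IfcOpenShell. The sketch only reports whether `pyparsing` can be found by the interpreter that runs it, and does not attempt the two compilation steps.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require(names):
    """Raise RuntimeError listing any modules that are not installed."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError("Missing Python modules: " + ", ".join(missing))
```

Calling `require(["pyparsing"])` with the interpreter intended for the build gives an early, readable failure instead of an import error halfway through schema compilation.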