Ifcfile entity look-up + Kernel::get_layers optimizations by Stinkfist0 · Pull Request #482 · IfcOpenShell/IfcOpenShell

Stinkfist0 · 2018-10-02T09:15:10Z

Recently we reran conversion for a file ("MEP File") that to our best of recollection was converted in somewhat orderly fashion a long time ago. This time, the XML conversion seemed to take forever, so I started to looking into the biggest time spenders in XML serialization code, which lead mostly to IfcFile::getInverse(). This lead me to optimize entity look-ups and to remove one particular slow part of layer retrieval code that doesn't seem to yield any results ever.

Timing includes only the conversion ("-layer code" == unnecessary(?) layer code removed, no other optimizations)
Conference center (http://openifcmodel.cs.auckland.ac.nz/_models/20160124OTC-Conference%20Center.ifc):
before:         1 m 15 s
-layer code:         8 s
after:               7 s

"Massive File" (1.1 GB)
before:         21 m 2 s
-layer code:    19 m 46 s
after:           6 m 58 s

"MEP File" (180 MB)
before:         did not finish (hours?)
-layer code:    14 m 58 s
after:           3 m 25 s

aothms · 2018-10-02T09:30:02Z

Thanks this is brilliant, but there two problems I can think of.

First, I thought of using a vector, but it's not unlikely to have huge outliers in the instance name ids. What do you think of the idea to first scan through the file and try to obtain a measure for how sparse the instance name ids are and then decide on unordered_map vs vector. I can work on that.

Second, caching the instance refs need to somehow take into account that the files can be mutated. This is already a huge pain in the current code base. Or shall we set a flag that a file is going to be immutable and only in that case cache the instance refs?

Stinkfist0 · 2018-10-02T10:13:56Z

I did some testing regarding map vs. unordered_map vs. vector. Vector was the fastest by a wide margin and (naturally) used the least memory. Although the the memory usage reduction was negated by the introduction of the cached map. Map and unordered_map did not have much difference, sometimes one was faster than the other (IIRC unordered_map was a bit faster more often). But in the light of the get_layers "revelation", it might be OK to revert back to unordered_map there, and thinking about it, null checks while iterating the file's contents would be rather awkward. However, is this a common use case? Cannot seem to find any such usage in the IfcOpenShell's own C++ code at least. Edit: ah yes, IfcPython uses this.

Ah yes, I did not take into consideration the mutability of the file fully. Yeah, maybe it would be the best (and easiest) to allow the user of the IfcFile to (re)compute the cache with a public function when the user knows the file will not be mutated.

Stinkfist0 · 2018-10-02T10:19:39Z

Also ideally, maybe one "entities by ref" map would suffice, and I tried to achieve that, but working with IfcEntityList felt rather awkward, so I decided not to pursue that for now.

Stinkfist0 · 2018-10-15T09:43:13Z

I'll try to to continue with this PR this week.

aothms · 2018-10-15T10:21:41Z

Apparently this inverse relationship mutability is even checked, that;s why the travis complains: https://travis-ci.org/IfcOpenShell/IfcOpenShell/builds/436550585#L2105

... null checks while iterating the file's contents would be rather awkward. However, is this a common use case?

Also writing the file to disk does this and probably there's some 3rd party usage that we do not know of.

It's indeed a bit of a leaky abstraction if vector::iterator is passed to users and they need to check for null pointers there. We might need to wrap that with a custom iterator. Maintaining that vector::end() is never null would simplify filtering ranges a bit I think, surely boost has something for that. Instances can also be deleted from files, so I think than the vector needs to be shrunk when deleting the last instance.

If we really decide to switch between a some_map and vector storage at runtime. That custom iterator can perhaps also navigate between those.

In a long forgotten branch I had separated parsing from storing the instances [1] [2]. That would enable to parse the file twice. First time collect some measure on the sparsity of instance numbers. I don't know if you want to go that route.

I'll try to to continue with this PR this week.

Let me know where I can pitch in.

[1] https://github.com/ISBE-TUe/IfcOpenShell-HDF5/blob/hdf5/src/ifcparse/IfcParse.h#L287
[2] https://github.com/ISBE-TUe/IfcOpenShell-HDF5/blob/hdf5/src/ifcparse/IfcParse.cpp#L1072

Stinkfist0 · 2018-12-10T06:59:18Z

I'll finish this this week.

Stinkfist0 · 2018-12-13T14:47:43Z

So, I'm thinking about the simplest ways to handle the entity cache invalidation. Could the modified entity perhaps easily mark the owning file as dirty so the next time the cache is used it would be cleared (or could invalidate only the ID of modified entity)?

aothms · 2018-12-14T11:41:17Z

Perhaps the entire file doesn't need to be marked as dirty? It should be possible to navigate to exactly the correct entries. But agree, let's keep that as a follow up task, as manipulating data and inspecting inverses are usually not done simultaneously. File-wide flag should be fine :+1.

…EntityList in advance. Around 18.4 % speed-up (avg. of first 2000 calls) when converting a somewhat large (176 MB) file to XML.

…s around 3x speed-up (avg. of first 5000 calls to this function).

…ry code (yields 0 layers in my tests). The LayerAssignments() calls can take up to 95 % of the function's execution time yielding no results.

…cache is invalidated.

Stinkfist0 · 2018-12-21T10:25:44Z

OK, implement a really simple cache invalidation. I did some quick testing with a more fine-grained approach but could not really nail it down. Clearing the whole cache should not really affect the performance of typical use cases where the file is never modified though.

aothms · 2018-12-24T12:32:29Z

Clearing the whole cache should not really affect the performance of typical use cases where the file is never modified though.

I agree.

one particular slow part of layer retrieval code that doesn't seem to yield any results ever

I ran through all layer assignments in the small set of test files, and at least there this relationship between layer and individual item was never encountered.

One reason why this 2nd step would be slow is that a LOT OF things in IFC are representation items. Looking at a IfcShellBasedSurfaceModel, all referenced instances: IfcClosedShell - IfcFace - IfcFaceBound - IfcPolyLoop - IfcCartesianPoint are all subclasses of IfcRepresentationItem.

Perhaps an alternative approach would be to set a limit (max_level=2) to prod->entity->file->traverse(prod->Representation()) so that only IfcProductDefinitionShape - IfcShapeRepresentation - and the immediate items are considered.

I think that would be a minimal performance difference. I can imagine how for this e.g. MEP file, there are probably huge graphs of items hidden under these representations. But the top one should be a single shell most of the time.

I mean it just feels odd to eliminate code that the schema so explicitly requires:

TYPE IfcLayeredItem = SELECT
  ( IfcRepresentationItem
  , IfcRepresentation);
END_TYPE;

aothms · 2019-02-06T10:03:09Z

@Stinkfist0 is there still work pending on this? Sorry, lost track a bit.

Stinkfist0 · 2019-02-06T10:07:34Z

Ah, right, for some reason I thought this was already merged! Does the minimal cache invalidation implementation look okay to to you? If yes, I won't be continuing further with this for now.

aothms · 2019-02-06T11:29:51Z

Yes, cache invalidation seems fine to me. There is it a minimal functional difference now, that before a null shared pointer would be returned and now a pointer is returned to an empty list. Of course the later is much better behaviour (my previous self was sometimes too focussed on needless allocations). But I do think there might be parts of the code base that do directly check the shared pointer truth value rather than checking for an empty list.

aothms · 2019-02-06T12:30:26Z

There is it a minimal functional difference now

I will take care of that.

Stinkfist0 changed the title ~~Ifcfile entity look-up optimizations~~ Ifcfile entity look-up + Kernel::get_layers optimizations Oct 2, 2018

Stinkfist0 added 4 commits December 17, 2018 14:28

Optimize IfcFile::entitiesByReference() by reserving capacity for Ifc…

e9cc122

…EntityList in advance. Around 18.4 % speed-up (avg. of first 2000 calls) when converting a somewhat large (176 MB) file to XML.

IfcConvert: print durations of file parsing and XML conversion.

bec35af

IfcFile::entitiesByReference: cache loaded references. Profiling show…

403b1ed

…s around 3x speed-up (avg. of first 5000 calls to this function).

IfcGeom::Kernel::get_layers: remove what would appear to be unnecessa…

cfc962b

…ry code (yields 0 layers in my tests). The LayerAssignments() calls can take up to 95 % of the function's execution time yielding no results.

Stinkfist0 force-pushed the topic/xmlserializer-optimization branch from 9615f53 to cfc962b Compare December 17, 2018 12:28

IfcFile: possibility to mark an entity as modified so that potential …

0d61ef0

…cache is invalidated.

aothms merged commit 000d11d into IfcOpenShell:master Feb 6, 2019

aothms mentioned this pull request Jan 12, 2021

IfcOpenShell Python segfaults when accessing context HasSubContexts after removal #1224

Closed

aothms mentioned this pull request Aug 21, 2021

Optimising geometry iterator initialisation #1674

Closed

RickBrice mentioned this pull request Mar 18, 2024

IfcSectionedSolidHorizontal #4433

Merged

Uh oh!

Conversation

Stinkfist0 commented Oct 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aothms commented Oct 2, 2018

Uh oh!

Stinkfist0 commented Oct 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Stinkfist0 commented Oct 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Stinkfist0 commented Oct 15, 2018

Uh oh!

aothms commented Oct 15, 2018

Uh oh!

Stinkfist0 commented Dec 10, 2018

Uh oh!

Stinkfist0 commented Dec 13, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aothms commented Dec 14, 2018

Uh oh!

Stinkfist0 commented Dec 21, 2018

Uh oh!

aothms commented Dec 24, 2018

Uh oh!

aothms commented Feb 6, 2019

Uh oh!

Stinkfist0 commented Feb 6, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aothms commented Feb 6, 2019

Uh oh!

aothms commented Feb 6, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Stinkfist0 commented Oct 2, 2018 •

edited

Loading

Stinkfist0 commented Oct 2, 2018 •

edited

Loading

Stinkfist0 commented Oct 2, 2018 •

edited

Loading

Stinkfist0 commented Dec 13, 2018 •

edited

Loading

Stinkfist0 commented Feb 6, 2019 •

edited

Loading