Optimize and simplify MarchingSquares.hpp. #2180

Closed
opened 2026-04-06 01:11:49 +02:00 by MrUnknownDE · 0 comments
Owner

Originally created by @dbaarda on 9/15/2025

Optimize and Simplify MarchingSquares.hpp

This slightly improves the programming interface, optimizes MarchingSquares.hpp pretty significantly, and improves the output quality. It also fixes and improves the tests.

The programming interface improvement was removing the default 1 pixel "overlap" that caused some strange scaling artefacts and confused the meaning of the "window" argument. The old code required you ask for a minimum 2x2 window, which minus the overlap meant it actually used a 1x1 window. Now the window you ask for is the window it uses, and you can use a 1x1 window.

The optimizations include reducing memory requirements, improving memory locality for caching, and significantly improving the algorithm performance. The removal of the "overlap" enabled some of the performance improvements, because it means that cell corner tags can be shared between all 4 adjacent cells because they are not slightly offset by the overlap. The algorithm has been generally improved to take advantage of sharing the corner tags, packing them tightly into bitmapped blocks, and can skip over multiple "empty" cells at a time when scanning through tags to get cell exit directions, and when scanning through cell exit directions to find the start of a new ring.

Removing the "overlap" also removed some small scaling artifacts, and SLA/RasterToPolygons.cpp included some compensation scaling (with limited success) that can now be removed. Additionally the edge interpolation includes tweaks to improve the results for the bottom and right edges of extracted polygons. This has resulted in improving the output quality, and the extracted ExPolygons more closely track the raster inputs.

A more detailed breakdown by file;

  • tests/libslic3r/test_clipper_utils.cpp - fix ambiguous Slic3r::intersection_pl() calls so that the tests/libslic3r/all tests compile.
  • src/libslic3r/Format/SL1.cpp
    • change min window from 2x2 to 1x1.
  • src/libslic3r/SLA/RasterToPolygons.cpp
    • change min windowsize from 2x2 to 1x1.
    • change RasterGrayscaleAA default windowsize from 2x2 to 1x1.
    • remove compensation scaling of pixel width and height.
  • tests/libslic3r/CMakeLists.txt
    • uncomment the commented out test_marchingsquares.cpp
  • src/libslic3r/StreamUtils.hpp
    • add stream operators for Points, MultiPoint, Polygon, Polygons, ExPolygon, and ExPolygons. Used by test_marchingsquares.cpp to show info about reference and extracted ExPolygons.
  • tests/libslic3r/test_marchingsquares.cpp
    • fix the tests to compile by removing the now gone RasterBase namespace from sla::RasterBase::PixelDim and sla::RasterBase::Resolution.
    • fix the centering and size of the test output svg files.
    • improve the test output svg files to show the reference ExPolygons (thick red lines), the raster output they generated (green area), and the extracted polygons (thin blue lines).
    • Improve the tests to use CHECK instead of REQUIRE in many places so that additional assertion failures are reported, not just the first.
    • Improve the tests using Catch::Matchers for assertions so we can see the passing/failing numbers, not just pass or fail.
    • Make more tests use the test_expolys() function for more consistent tests and concise definitions.
    • Improve the test_expolys() function to be more comprehensive and optionally more tolerant when extracted rings that touch get merged together by union_ex() inside src/libslic3r/SLA/RasterToPolygons.cpp.
    • Reformat with clang-format.
  • src/libslic3r/MarchingSquares.hpp
    • simplified interface removing "overlap".
    • change m_tags vector containing a uint8_t for each cell with that cell's tags and direction flags into an m_tags vector of uint32_t containing tags for each cell corner packed in row/column order, and an m_dirs vector of uint32_t containing 8 packed cells 4 direction flags. The direction flags indicate the direction (left/down/right/up) a yet-to-be-processed line that passing through will exit the cell.
    • change the algorithm to first scan through and set all the tags for each corner, calling isoval() once per corner instead of 4x per cell. Next it scans through all the tags to set the directions in m_dirs. Then it scans through the m_dirs to build the rings, using the directions flags to get the edges the rings cross into the next cell and clearing them as it goes. Finally it interpolates the edges in the rings into points on the ring. Both the scan of tags to set directions flags, and the scan of directions flags to find rings, process the packed data in blocks and can skip over multiple "empty" blocks of tags/cells at a time.
    • Improve the edge interpolation so that polygon bottom and right edges should be after the "filled pixel", not before it. The interpolation for all crossed edges goes from the "outside in" and stops on the first set pixel, so we move the bottom and right edges out by one.
    • Improve the edge interpolator iterator to be a random access iterator so that the binary search interpolation is O(ln(N)) on incrementing the iterator too.
    • Reformat with clang-format.

Note when looking at diffs there were a lot of trailing whitespace that got stripped by my editor, so use git diff -w to see it more clearly. You can also turn on "ignore whitespace" in the github "changed files" view under the gear icon.

Screenshots/Recordings/Graphs

It was a bit hard to get comparative data because the old tests were so broken, so I've managed to do tests using the new tests and the old implementation for comparison. This required some minor tweaks to the tests to use the old interface window arguments equivalent to the new window arguments.

The images compare the reference polygons (thick red lines), the raster they generated (green areas), and the generated output (thin blue lines);

Old implementation, Old RasterToPolygons, New Test.

one_4x4
ac_10x10

square_with_hole_proportional_2x2_mm_px_full

Note that these show that the extracted polygons are a bit offset. This is very visible in the smaller tests, and less obvious for larger images. I believe this is an artefact caused by RasterToPolygons attempts to compensate for the MarchingSquares output being a bit under-sized.

Old Implementation, New RasterToPolygons, New Test.

one_4x4
ac_10x10
square_with_hole_proportional_2x2_mm_px_full

With the RasterToPolygons scaling compensation removed the extracted polygons are no longer shifted, but they are noticeably smaller. Note that the undersizing appears to be entirely because it doesn't compensate for the top and right side interpolation going one pixel too far.

New Implementation, New RasterToPolygons, New Test.

one_4x4
ac_10x10
square_with_hole_proportional_2x2_mm_px_full
square_with_hole_proportional_2x2_mm_px_half

Note the extracted polygons are significantly better at matching the raster image. There is a little bit of "corner cutting" on some corners, mainly bottom right, which are an almost inevitable artefact of the way marching squares works. One of the problems I encountered was for the "ambiguous case" tests the extracted polygons are so close they touch, and ex_union() when converting these to ExPolygons joined them together, but the old test was looking for multiple polygons.

Also note that using very small windows exactly matching raster images is not the only use-case. Many other use-cases would use large windows, in which case the corner cutting problem goes away because the lines transition through the middle of edges, and nearly never through cell corners.

Speed comparisons;

The best speed comparison I've got is the SL1Import test in test_marchingsquares.cpp , which seems to load a frog mesh model, slice it, output the slices to rasters, convert the rasters to polygons and then use the polygons to regenerate a mesh. This test is completely unchanged in the old and new versions. This is running on my intel laptop;

Old implementation;

$ /usr/bin/time -v tests/libslic3r/Debug/libslic3r_tests -s '[SL1Import]' > tmp/opt-marchsq-oldimpl/test-SL1Import2.txt
	Command being timed: "tests/libslic3r/Debug/libslic3r_tests -s [SL1Import]"
	User time (seconds): 25.96
	System time (seconds): 0.15
	Percent of CPU this job got: 120%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 196316
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 73133
	Voluntary context switches: 1348
	Involuntary context switches: 1125
	Swaps: 0
	File system inputs: 0
	File system outputs: 42384
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

New implementation;

$ /usr/bin/time -v tests/libslic3r/Debug/libslic3r_tests -s '[SL1Import]' > tmp/opt-marchsq-oldimpl/test-SL1Import2.txt
	Command being timed: "tests/libslic3r/Debug/libslic3r_tests -s [SL1Import]"
	User time (seconds): 11.60
	System time (seconds): 0.27
	Percent of CPU this job got: 150%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.87
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 211136
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 73719
	Voluntary context switches: 1667
	Involuntary context switches: 1895
	Swaps: 0
	File system inputs: 0
	File system outputs: 39152
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

So it went from 21.6 seconds down to 7.9 seconds... nearly 3x as fast. And a fair chunk of this test is not even MarchingSquares, but all the loading/slicing/rastering/etc stuff, so the marching squares part must have an even higher relative speed improvement.

Tests

I've been running the exiting unittests. It took me a bit of effort to figure out how, but FTR my general setup started with following the instructions at https://github.com/SoftFever/OrcaSlicer/wiki/How-to-build, but despite the -t argument claiming to build tests it didn't seem to so I had to find the relevant cmake target to build it myself. Also important is setting up ccache and using -p

$ cd src/OrcaSlicer
$ ./build_linux.sh -u
$ ./build_linux.sh -d
$ ./build_linux.sh -stp
$ cmake --build build --target tests/libslic3r/all
$ cd build
$ tests/libslic3r/Debug/libslic3r_tests -t
$ tests/libslic3r/Debug/libslic3r_tests -s '[MarchingSquares]'
$ctest
  etc.

Notes

The reason I tackled this is I'm planning to migrate FillTpmsFK.cpp to use it (I'm pretty sure it will be faster than the marching squares they have), and turn it into a generic surface equation fill pattern generator, and then use it to implement variable density fill patterns like gyroid,but where the "period" density is a function of how close to an external surface you are. This will make the isoval() surface equation function calls by marching-squares much more expensive, so optimising this from calling it 4x per cell to 1x per cell when building the tags should be a nearly 4x win. Note that the tests/benches etc just doing raster lookups barely show this benefit because their isoval() is so cheap, but there was a bunch of other low-hanging fruit to optimize so I got a bunch of other wins.

I'm also not really a C++ coder, so I've probably got some stuff that'll make C++ guys cringe. I'm more old-school C and quick-n-dirty Python these days.

*Originally created by @dbaarda on 9/15/2025* # Optimize and Simplify MarchingSquares.hpp This slightly improves the programming interface, optimizes MarchingSquares.hpp pretty significantly, and improves the output quality. It also fixes and improves the tests. The programming interface improvement was removing the default 1 pixel "overlap" that caused some strange scaling artefacts and confused the meaning of the "window" argument. The old code required you ask for a minimum 2x2 window, which minus the overlap meant it actually used a 1x1 window. Now the window you ask for is the window it uses, and you can use a 1x1 window. The optimizations include reducing memory requirements, improving memory locality for caching, and significantly improving the algorithm performance. The removal of the "overlap" enabled some of the performance improvements, because it means that cell corner tags can be shared between all 4 adjacent cells because they are not slightly offset by the overlap. The algorithm has been generally improved to take advantage of sharing the corner tags, packing them tightly into bitmapped blocks, and can skip over multiple "empty" cells at a time when scanning through tags to get cell exit directions, and when scanning through cell exit directions to find the start of a new ring. Removing the "overlap" also removed some small scaling artifacts, and SLA/RasterToPolygons.cpp included some compensation scaling (with limited success) that can now be removed. Additionally the edge interpolation includes tweaks to improve the results for the bottom and right edges of extracted polygons. This has resulted in improving the output quality, and the extracted ExPolygons more closely track the raster inputs. A more detailed breakdown by file; * tests/libslic3r/test_clipper_utils.cpp - fix ambiguous Slic3r::intersection_pl() calls so that the `tests/libslic3r/all` tests compile. * src/libslic3r/Format/SL1.cpp - change min window from 2x2 to 1x1. * src/libslic3r/SLA/RasterToPolygons.cpp - change min windowsize from 2x2 to 1x1. - change RasterGrayscaleAA default windowsize from 2x2 to 1x1. - remove compensation scaling of pixel width and height. * tests/libslic3r/CMakeLists.txt - uncomment the commented out test_marchingsquares.cpp * src/libslic3r/StreamUtils.hpp - add stream operators for Points, MultiPoint, Polygon, Polygons, ExPolygon, and ExPolygons. Used by test_marchingsquares.cpp to show info about reference and extracted ExPolygons. * tests/libslic3r/test_marchingsquares.cpp - fix the tests to compile by removing the now gone `RasterBase` namespace from `sla::RasterBase::PixelDim` and `sla::RasterBase::Resolution`. - fix the centering and size of the test output svg files. - improve the test output svg files to show the reference ExPolygons (thick red lines), the raster output they generated (green area), and the extracted polygons (thin blue lines). - Improve the tests to use CHECK instead of REQUIRE in many places so that additional assertion failures are reported, not just the first. - Improve the tests using Catch::Matchers for assertions so we can see the passing/failing numbers, not just pass or fail. - Make more tests use the `test_expolys()` function for more consistent tests and concise definitions. - Improve the `test_expolys()` function to be more comprehensive and optionally more tolerant when extracted rings that touch get merged together by `union_ex()` inside `src/libslic3r/SLA/RasterToPolygons.cpp`. - Reformat with `clang-format`. * src/libslic3r/MarchingSquares.hpp - simplified interface removing "overlap". - change `m_tags` vector containing a `uint8_t` for each cell with that cell's tags and direction flags into an `m_tags` vector of `uint32_t` containing tags for each cell corner packed in row/column order, and an `m_dirs` vector of `uint32_t` containing 8 packed cells 4 direction flags. The direction flags indicate the direction (left/down/right/up) a yet-to-be-processed line that passing through will exit the cell. - change the algorithm to first scan through and set all the tags for each corner, calling `isoval()` once per corner instead of 4x per cell. Next it scans through all the tags to set the directions in `m_dirs`. Then it scans through the `m_dirs` to build the rings, using the directions flags to get the edges the rings cross into the next cell and clearing them as it goes. Finally it interpolates the edges in the rings into points on the ring. Both the scan of tags to set directions flags, and the scan of directions flags to find rings, process the packed data in blocks and can skip over multiple "empty" blocks of tags/cells at a time. - Improve the edge interpolation so that polygon bottom and right edges should be after the "filled pixel", not before it. The interpolation for all crossed edges goes from the "outside in" and stops on the first set pixel, so we move the bottom and right edges out by one. - Improve the edge interpolator iterator to be a random access iterator so that the binary search interpolation is O(ln(N)) on incrementing the iterator too. - Reformat with `clang-format`. Note when looking at diffs there were a lot of trailing whitespace that got stripped by my editor, so use `git diff -w` to see it more clearly. You can also turn on "ignore whitespace" in the github "changed files" view under the gear icon. # Screenshots/Recordings/Graphs It was a bit hard to get comparative data because the old tests were so broken, so I've managed to do tests using the new tests and the old implementation for comparison. This required some minor tweaks to the tests to use the old interface window arguments equivalent to the new window arguments. The images compare the reference polygons (thick red lines), the raster they generated (green areas), and the generated output (thin blue lines); ## Old implementation, Old RasterToPolygons, New Test. ![one_4x4](https://github.com/user-attachments/assets/e05286ca-10b3-4922-99b4-4dfd9fe72b10) ![ac_10x10](https://github.com/user-attachments/assets/d89b74f7-79ba-4df0-b159-4acf055bcf81) ![square_with_hole_proportional_2x2_mm_px_full](https://github.com/user-attachments/assets/95347301-ebfc-4639-b6b0-ae9f3147b42e) Note that these show that the extracted polygons are a bit offset. This is very visible in the smaller tests, and less obvious for larger images. I believe this is an artefact caused by RasterToPolygons attempts to compensate for the MarchingSquares output being a bit under-sized. ## Old Implementation, New RasterToPolygons, New Test. ![one_4x4](https://github.com/user-attachments/assets/869e32ec-b58d-4108-9071-c7bd03ac6eb1) ![ac_10x10](https://github.com/user-attachments/assets/355792f2-d525-44e1-8510-d10377cf52ef) ![square_with_hole_proportional_2x2_mm_px_full](https://github.com/user-attachments/assets/4c1f8946-3506-4697-93ce-830ea8967461) With the RasterToPolygons scaling compensation removed the extracted polygons are no longer shifted, but they are noticeably smaller. Note that the undersizing appears to be entirely because it doesn't compensate for the top and right side interpolation going one pixel too far. ## New Implementation, New RasterToPolygons, New Test. ![one_4x4](https://github.com/user-attachments/assets/6cd4dfb4-83ea-452e-8de9-be07edf36646) ![ac_10x10](https://github.com/user-attachments/assets/ef789427-8215-4c47-a169-69819d873ec7) ![square_with_hole_proportional_2x2_mm_px_full](https://github.com/user-attachments/assets/c6836e4b-9cca-47c2-89e8-0b65c54e47b0) ![square_with_hole_proportional_2x2_mm_px_half](https://github.com/user-attachments/assets/d8d75542-69db-4a75-b2b9-a7c6686c17fa) Note the extracted polygons are significantly better at matching the raster image. There is a little bit of "corner cutting" on some corners, mainly bottom right, which are an almost inevitable artefact of the way marching squares works. One of the problems I encountered was for the "ambiguous case" tests the extracted polygons are so close they touch, and `ex_union()` when converting these to ExPolygons joined them together, but the old test was looking for multiple polygons. Also note that using very small windows exactly matching raster images is not the only use-case. Many other use-cases would use large windows, in which case the corner cutting problem goes away because the lines transition through the middle of edges, and nearly never through cell corners. ## Speed comparisons; The best speed comparison I've got is the SL1Import test in `test_marchingsquares.cpp` , which seems to load a frog mesh model, slice it, output the slices to rasters, convert the rasters to polygons and then use the polygons to regenerate a mesh. This test is completely unchanged in the old and new versions. This is running on my intel laptop; Old implementation; ```bash $ /usr/bin/time -v tests/libslic3r/Debug/libslic3r_tests -s '[SL1Import]' > tmp/opt-marchsq-oldimpl/test-SL1Import2.txt Command being timed: "tests/libslic3r/Debug/libslic3r_tests -s [SL1Import]" User time (seconds): 25.96 System time (seconds): 0.15 Percent of CPU this job got: 120% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.61 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 196316 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 73133 Voluntary context switches: 1348 Involuntary context switches: 1125 Swaps: 0 File system inputs: 0 File system outputs: 42384 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ``` New implementation; ```bash $ /usr/bin/time -v tests/libslic3r/Debug/libslic3r_tests -s '[SL1Import]' > tmp/opt-marchsq-oldimpl/test-SL1Import2.txt Command being timed: "tests/libslic3r/Debug/libslic3r_tests -s [SL1Import]" User time (seconds): 11.60 System time (seconds): 0.27 Percent of CPU this job got: 150% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.87 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 211136 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 73719 Voluntary context switches: 1667 Involuntary context switches: 1895 Swaps: 0 File system inputs: 0 File system outputs: 39152 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ``` So it went from 21.6 seconds down to 7.9 seconds... nearly 3x as fast. And a fair chunk of this test is not even MarchingSquares, but all the loading/slicing/rastering/etc stuff, so the marching squares part must have an even higher relative speed improvement. ## Tests I've been running the exiting unittests. It took me a bit of effort to figure out how, but FTR my general setup started with following the instructions at https://github.com/SoftFever/OrcaSlicer/wiki/How-to-build, but despite the `-t` argument claiming to build tests it didn't seem to so I had to find the relevant cmake target to build it myself. Also important is setting up ccache and using `-p` ```bash $ cd src/OrcaSlicer $ ./build_linux.sh -u $ ./build_linux.sh -d $ ./build_linux.sh -stp $ cmake --build build --target tests/libslic3r/all $ cd build $ tests/libslic3r/Debug/libslic3r_tests -t $ tests/libslic3r/Debug/libslic3r_tests -s '[MarchingSquares]' $ctest etc. ``` ## Notes The reason I tackled this is I'm planning to migrate FillTpmsFK.cpp to use it (I'm pretty sure it will be faster than the marching squares they have), and turn it into a generic surface equation fill pattern generator, and then use it to implement variable density fill patterns like gyroid,but where the "period" density is a function of how close to an external surface you are. This will make the `isoval()` surface equation function calls by marching-squares much more expensive, so optimising this from calling it 4x per cell to 1x per cell when building the tags should be a nearly 4x win. Note that the tests/benches etc just doing raster lookups barely show this benefit because their `isoval()` is so cheap, but there was a bunch of other low-hanging fruit to optimize so I got a bunch of other wins. I'm also not really a C++ coder, so I've probably got some stuff that'll make C++ guys cringe. I'm more old-school C and quick-n-dirty Python these days.
Sign in to join this conversation.
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github/OrcaSlicer#2180