Jay Myers, Software Engineer
April 25, 2016

How do you check and see if items rendered in two different test drivers are close enough to one another? How close do you need to come to perfection?


One of several interesting issues I’ve been playing around with lately came out of the ongoing development of a new test driver we’re writing here at Strasz. The problem is that for an organization with a large item bank that has already undergone some kind of layout or format review, changing the test driver could mean changing the way an item is rendered during test delivery. That would invalidate any format review that had been done on these items, meaning that each item would have to be reviewed again, which is very time-consuming.

To walk through why this might happen, let’s say that an item (or part of an item) is stored as HTML and is rendered for test delivery through some kind of web browser. The underlying browser the test driver uses plays a big part in how the item looks – different browsers and versions have different rules for how content should be laid out. This means that if you’re switching from one driver to another, or upgrading the browser your driver uses, you are potentially changing the way each and every item looks. This makes some people uncomfortable.

Realistically, across a large enough gap of browsers and versions (say, a jump from a browser that was common in 2009 to a browser that’s common today) you won’t be able to guarantee that every item in a large bank will render identically with CSS tweaks alone – there are just too many variables to make this feasible. And besides, you probably don’t need everything to be pixel-perfect – you just need to make sure the item isn’t broken: tables are still formatted the same way, text isn’t cut off or misplaced, and so on.

This is the crux of the problem that needs solving. How do you check and see if items rendered in two different test drivers are close enough to one another? Well, if you boil the problem down to its fundamentals you really need three things. The first two are easy – you need a picture of the item rendered in both the old and the new driver. The other piece you need is some kind of a strategy for comparing two images to see if they’re similar enough, and this is where the fun begins.

After some discussions about what was feasible and what would make everyone comfortable, we decided to define “similar enough” as having roughly the same shape. That is to say, text, options, tables, and other rendered elements should be in roughly the same spots and be similarly sized and shaped. To actually implement that, it turns out you can take an image of the item in the old driver, isolate the important pixels (the ones that belong to rendered content rather than the background), and add a sort of buffer zone around each pixel – in image-processing terms, a morphological dilation. The size and shape of this buffer zone can be tuned to adjust what it really means for two items to be close enough to one another. The resulting image is a kind of fuzzy version of the original item that defines all of the locations within which it is acceptable to render content for the item.
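
To make that concrete, here’s a minimal sketch of the buffer-zone idea in Python using NumPy, Pillow, and SciPy. It assumes that “background” means near-white pixels; the threshold and buffer size are illustrative values, not the ones from our actual implementation.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import binary_dilation

def content_mask(path, background_threshold=250):
    """True wherever the screenshot has rendered content
    (i.e., pixels darker than the near-white background)."""
    gray = np.asarray(Image.open(path).convert("L"))
    return gray < background_threshold

def acceptable_region(baseline_path, buffer_px=5):
    """Dilate the baseline content mask so every content pixel is
    surrounded by a buffer zone of acceptable locations. The size
    and shape of the structuring element control what "close
    enough" means."""
    structure = np.ones((2 * buffer_px + 1, 2 * buffer_px + 1), dtype=bool)
    return binary_dilation(content_mask(baseline_path), structure=structure)
```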

Once you have this image of acceptable locations for rendered content, you can overlay the image of the item in the new driver on top of the image containing acceptable locations and see if any of the rendered content falls outside of that acceptable range. If it does, then that item needs to be flagged for manual review. If it doesn’t, then we know that no content has been rendered outside of the areas already deemed to be acceptable.
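
Continuing the sketch above, the overlay check boils down to a couple of boolean operations: any content pixel in the new image that isn’t inside the acceptable region is a violation. The file names here are hypothetical, and the sketch assumes both screenshots share the same dimensions.

```python
def find_violations(new_path, allowed):
    """Coordinates of content pixels in the new screenshot that
    fall outside the acceptable region."""
    outside = content_mask(new_path) & ~allowed
    return np.argwhere(outside)

allowed = acceptable_region("item_123_baseline.png", buffer_px=5)
violations = find_violations("item_123_new.png", allowed)
if len(violations):
    print(f"item_123: flag for manual review ({len(violations)} stray pixels)")
else:
    print("item_123: pass")
```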

Once you have the logic for this kind of comparison implemented, you just need to hook it up to some way of obtaining the baseline (old driver) and new (new driver) images, and you can let it run over as many items as you like. For instance, you could host the new test driver somewhere, feed it each item you want to review, and capture an image of the rendered content as it is displayed. After the run is complete, you will have a list of the items that were compared, their results (pass/fail), and, for each failure, an overlay image with the points of failure highlighted for easy manual review.
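
One way to wire up that capture step, sketched here with Selenium – the hosting URL and item IDs are made up for illustration, and our actual harness may differ:

```python
from selenium import webdriver

def capture_new_driver_images(item_ids):
    """Render each item in the hosted new driver and save a
    screenshot for comparison against the baseline image.
    Assumes chromedriver is available on the PATH."""
    driver = webdriver.Chrome()
    try:
        for item_id in item_ids:
            driver.get(f"https://new-driver.example.com/items/{item_id}")
            driver.save_screenshot(f"{item_id}_new.png")
    finally:
        driver.quit()

capture_new_driver_images(["item_123", "item_456"])
```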

Depending on how high you set your tolerance for content shifting around inside the item, you can avoid having to manually review a very large portion of your item bank. Or if you’re more cautious, you can set the tolerances a bit lower and have a few more items to manually review but leverage the resulting comparison image to quickly see where the differences actually are.
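
For the manual-review side, a small helper (reusing `content_mask` and the `allowed` mask from the earlier snippets) can paint the out-of-bounds pixels onto a copy of the new screenshot so a reviewer can spot the differences at a glance:

```python
def highlight_violations(new_path, allowed, out_path):
    """Save a copy of the new screenshot with out-of-bounds
    content pixels painted red for quick manual review."""
    img = np.asarray(Image.open(new_path).convert("RGB")).copy()
    img[content_mask(new_path) & ~allowed] = (255, 0, 0)
    Image.fromarray(img).save(out_path)

highlight_violations("item_123_new.png", allowed, "item_123_overlay.png")
```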

After everything is set up, you can use the same mechanism to evaluate any potential rendering impact that a change of test driver (or an update to an existing driver) may have on each and every item in your bank, freeing you up to experiment with new delivery strategies or browsers without the fear of having to manually review an entire item bank.

LINKS:

Common in 2009: http://gs.statcounter.com/#browser-ww-monthly-200901-200901-bar

Common today: http://gs.statcounter.com/#browser-ww-monthly-201412-201412-bar

CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
