When sysadmins want to block or unblock something, the domain or URL is usually the easiest thing to target; it's right there in the address bar, after all. Our web filter, LiveStream 5, also provides a more advanced evaluation tool called Expressions. An expression defines a pattern to match against a string of text. In our case, that string is the URL.
Expressions let you cast a slightly wider net when trying to catch a particular type of content with your filtering rules. Of course that also means they can easily cast too wide a net if you're not careful. The name of the game is identifying what the URLs you want to target have uniquely in common.
With that in mind, I wanted to highlight a few uses of expressions in LiveStream 5 that we were impressed with, along with our own advice.
LiveStream provides three pre-defined patterns for URL expressions: Contains, Query contains and Ends with. Here's what that looks like in our management interface:
Cool? Alright, let's dig into some examples.
Blocking a file extension
Granted, your category filtering has probably blocked any peer-to-peer websites already, but .torrent is a good example of a file type you won't want many of your users to be downloading at work or school.
Here's where the Ends with expression type comes in. URLs follow a typical directory structure where the last thing in the path is normally the filename, and therefore the file extension (if present).
http://www.example.com/movies/new/pitch-perfect-7/dl/pp7.torrent
                                                        ^^^^^^^^ Matched!
Just don't forget to include the full stop or you'll end up blocking other things too!
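To see why the full stop matters, here is a minimal Python sketch of an ends-with check. It is not LiveStream's implementation: the ends_with helper and the second URL are made up for illustration, and the sketch assumes the rule is a plain suffix comparison against the whole URL.

```python
def ends_with(url: str, suffix: str) -> bool:
    """Hypothetical stand-in for an 'Ends with' rule (assumed: plain suffix match)."""
    return url.endswith(suffix)


urls = [
    "http://www.example.com/movies/new/pitch-perfect-7/dl/pp7.torrent",
    # Made-up page that merely ends in the word "torrent".
    "http://www.example.com/blog/what-is-a-torrent",
]

for url in urls:
    # Compare the value with and without the leading full stop.
    print(ends_with(url, ".torrent"), ends_with(url, "torrent"), url)
```

With the full stop, only the actual .torrent file matches. Without it, the second URL would be caught as well.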
Unblocking a YouTube video
Granular filtering of YouTube is a complex enough endeavor to warrant its own article (stay tuned for that one), but it also provides a good example of how paying attention to the structure of URLs can allow you to create some very useful expressions.
All YouTube videos have a unique id which allows several different player URLs to load them. As you can see, all of these contain the same id string (2JW1yguRpvg) in some form:
http://www.youtube.com/watch?v=2JW1yguRpvg&feature=player_embedded
                               ^^^^^^^^^^^ Matched!
http://www.youtube.com/watch?feature=player_embedded&v=2JW1yguRpvg
                                                       ^^^^^^^^^^^ Matched!
http://www.youtube.com/embed/2JW1yguRpvg
                             ^^^^^^^^^^^ Matched!
So, to target all of these variations for unblocking, we can use a single Contains expression with a value of 2JW1yguRpvg. Any URL that contains that unique string anywhere will be unblocked.
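As a rough illustration, the same idea can be sketched in Python. The contains helper is hypothetical and assumes a simple case-sensitive substring test, which may not be exactly how LiveStream evaluates the rule. The last URL uses a placeholder id that is not a real video.

```python
VIDEO_ID = "2JW1yguRpvg"


def contains(url: str, value: str) -> bool:
    """Hypothetical stand-in for a 'Contains' rule (assumed: substring test)."""
    return value in url


player_urls = [
    "http://www.youtube.com/watch?v=2JW1yguRpvg&feature=player_embedded",
    "http://www.youtube.com/watch?feature=player_embedded&v=2JW1yguRpvg",
    "http://www.youtube.com/embed/2JW1yguRpvg",
]
# Placeholder id, used only to show a URL the rule leaves alone.
other_url = "http://www.youtube.com/watch?v=XXXXXXXXXXX"

for url in player_urls + [other_url]:
    print(contains(url, VIDEO_ID), url)
```

All three player URLs match on the id alone, regardless of where it sits in the URL, while the unrelated video does not.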
Blocking bad keywords (with care)
The most important thing to keep in mind when blocking a keyword is that it isn't also part of other innocuous words that you don't want to block. An example we often see is schools trying to block URLs that contain the keyword "sex", for younger kids. In this case, not only will websites containing sexual references be blocked, but also any website with references to Essex or Sussex. Whoops.
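One way to catch this before it bites is to try a candidate keyword against a list of URLs you know should stay reachable. The Python sketch below does that. The false_positives helper and the example URLs are invented for illustration, and case-insensitive matching is an assumption rather than documented LiveStream behavior.

```python
def false_positives(keyword: str, known_good_urls: list[str]) -> list[str]:
    """Return the known-good URLs that a contains rule for `keyword` would hit."""
    # Assumption: matching ignores case.
    needle = keyword.lower()
    return [url for url in known_good_urls if needle in url.lower()]


# Made-up URLs standing in for sites your users legitimately need.
known_good = [
    "http://www.example.com/visit-essex",
    "http://www.example.org/sussex/news",
    "http://www.example.net/maths/homework",
]

for keyword in ("sex", "porn"):
    hits = false_positives(keyword, known_good)
    print(keyword, len(hits), hits)
```

Here "sex" hits both the Essex and Sussex pages, while "porn" hits none of the three, which is the kind of difference worth checking before a keyword goes into a block list.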
There are a handful of keywords that match only undesirable content in nearly every case; "porn" is one example. Many anonymous proxy / bypass tools bill themselves as an "unblocker" of this or that website, which makes for another very blockable keyword.
You should definitely think twice before adding a contains expression to your block lists, though. In most cases it's best to rely on a URL classification system that can analyze the actual content of a page—not only the URL—and supplement that with domain-based blocks if necessary.