Preventive maintenance: are you looking in the right places?

It used to be that good preventive maintenance meant walking around, looking, listening, and generally being a good observer. Attention to detail was important. It was the sounds, and sometimes the smell, that told you something was wrong. Writing down meter readings was a normal event. And while some of that remains, most of today’s preventive maintenance can be done at your desk. With a little scripting, the results can be sent to your email, so the checks can be done easily from almost anywhere.

Logs, SNMP traps, and a variety of other items need to be considered when doing good PM today. Not all SNMP traps end up in the logs. Notice I said logs, plural. Most systems have several logs for different subsystems. One log may contain entries that are specific to the hardware, while another may be for the operating system (OS). Some logs relate directly to the application. With computer-based systems you really do need to dig as deep as possible, and on a regular basis. For instance, a quick check of Windows 7 shows log categories that include Application, Security, Setup, System and Services. All of these can show problems that relate to basic system health. It is likely that most of these categories are not looked at until it is too late. Most groups that check logs may only look at the specific logs that relate directly to their area of expertise. Yes, problems with the system will eventually find their way into the application logs, but many times these problems can be caught much earlier when all logs are regularly scanned for issues.

A good example of watching a problem develop can be seen on most digital receivers. Because RF paths are unreliable, forward error correction (FEC) is used to correct errors that occur along the transmission path. At any given time there will be some number of errors, and (hopefully) all, or most, will be corrected using the FEC. Tracking those numbers over time can show degradation in the antenna and receiver. The same is true in a computer NIC, hard drive system or memory. There will be some errors that are corrected, and over time that number may grow. However, when the number of errors increases, either sharply or past a predetermined threshold, it is time to find the source of the errors and correct it. The only way to find those gradual increases is by tracking the numbers over time, just as we did with meter readings.
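The tracking idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the five-reading baseline, the factor of 3 for a "sharp" jump and the sample numbers are all made up, and real thresholds should come from your own history or the manufacturer.

```python
from statistics import mean

def check_error_trend(readings, hard_limit, jump_factor=3.0, baseline_len=5):
    """Return (index, reason) alerts for a series of corrected-error counts.

    readings     -- counts per interval, oldest first (hypothetical data)
    hard_limit   -- the predetermined threshold
    jump_factor  -- how many times the recent average counts as a sharp rise
    baseline_len -- how many previous readings form the recent average
    """
    alerts = []
    for i, value in enumerate(readings):
        if value > hard_limit:
            alerts.append((i, "over threshold"))
            continue
        history = readings[max(0, i - baseline_len):i]
        if len(history) == baseline_len:
            baseline = mean(history)
            if baseline > 0 and value > jump_factor * baseline:
                alerts.append((i, "sharp increase"))
    return alerts

# Made-up daily counts of corrected errors from one receiver or NIC.
daily_corrected_errors = [10, 12, 11, 13, 12, 14, 45, 16, 120]
print(check_error_trend(daily_corrected_errors, hard_limit=100))
# prints [(6, 'sharp increase'), (8, 'over threshold')]
```

The point is not the particular rule, but that the numbers are kept and compared against their own history, just like a column of meter readings.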

Digital systems can easily lull you into complacency. Things work great…until they don’t. Then what? The truth is the system probably left plenty of clues, many of which were likely missed because no one was looking. There are tricks that can be used: management systems can gather and report log errors, and scripts can search log files and their contents. Even something as simple as the size of the log file can be an indicator of developing problems. If the files are normally 2K, a 4K file might be a warning sign that needs closer examination.
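As a rough sketch of that kind of script, the Python below checks one log file for two things: whether it has grown to at least twice its normal size, and which lines mention an error, warning or failure. The search pattern, the file name and the sample log lines are my own placeholders; every system words its log entries differently.

```python
import os
import re
import tempfile

# Assumed keywords; adjust to whatever your equipment actually logs.
SUSPECT = re.compile(r"error|warn|fail", re.IGNORECASE)

def scan_log(path, normal_size_bytes, size_factor=2.0):
    """Return (size_alert, matching_lines) for one log file."""
    size_alert = os.path.getsize(path) >= size_factor * normal_size_bytes
    with open(path, "r", errors="replace") as handle:
        matches = [line.rstrip("\n") for line in handle if SUSPECT.search(line)]
    return size_alert, matches

# Demonstration on a throwaway file with invented log lines.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "system.log")
    with open(sample, "w") as f:
        f.write("10:00 link up\n")
        f.write("10:05 WARNING fan speed low\n")
        f.write("10:06 corrected ECC error count 3\n")
    size_alert, hits = scan_log(sample, normal_size_bytes=2048)
    print("size alert:", size_alert)
    for line in hits:
        print("check:", line)
```

Run from a scheduler and mailed to yourself, something this simple covers a surprising amount of the daily walk-around.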

Another thing to be aware of is the various thresholds employed by the manufacturers. Manufacturers might categorize events as warnings, minor errors and major errors. A lack of major errors does not mean all is well. Look at minor errors and warnings. Ask the manufacturer if logged events other than errors or warnings should be tracked. If you don’t understand the events in the logs, ask about them.

Manufacturers are usually open to sharing that information with their customers. Throughout the software development cycle, all sorts of items are written to the logs as part of the debugging process. Many of the logged events detail how the system is really functioning, much like the margin numbers on digital receivers.

Today’s systems have a remarkable and somewhat uncanny ability to adapt. While that often makes our jobs easier, it can also mask an incredible number of problems. Some of those problems will come back and bite. Some won’t. As with all PM programs, the question is when you want to find out. On your terms? Or after the system is down and it’s too late? Personally, I would rather find out during the day instead of receiving a frantic phone call at 4AM.

Posted in Basics

Automated, but not unattended

Lately, there have been several instances of confusion on the part of management that automated operations can be done without personnel, or, even worse, done with relatively unskilled personnel. There is this thought that the knowledge, skill and experience of top technical people can be saved in a knowledge base and then given to unskilled personnel and everything will be fine. Not true.

A knowledge base or playbook that details do ‘X’ when you see ‘Y’ is fine for common events and fixes that are known to have little effect on the rest of the system. However, in today’s highly integrated systems, fixing one problem can easily cause another if the fix is not handled properly or done in the right order. As an example, you could easily have a server in a network load balance that needs to be restarted due to an issue. Simply going to the start button and restarting it does not remove the server from the load balance, nor does it accommodate the current user sessions properly. Allowing the system to shed current sessions (and not starting new ones) before removing it from a load balance takes care of any users currently on that system. It also allows the other computers in that load balance to take on additional load properly. This is also true of devices connected to storage area networks (SANs). Proper task sequencing is crucial to maintaining the integrity of the files and systems that are connected.
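The ordering described above can be written down as a short Python sketch. Everything here is hypothetical: `FakeLoadBalancer` is a stand-in I invented so the example runs on its own, and a real load balancer will have its own interface and its own names for these steps. What matters is the sequence: stop new sessions, wait for current ones to finish, remove, restart, rejoin.

```python
import time

class FakeLoadBalancer:
    """Stand-in for a real load balancer API (hypothetical, illustration only)."""
    def __init__(self, sessions):
        self.sessions = dict(sessions)            # server name -> active sessions
        self.accepting = {name: True for name in sessions}
        self.members = set(sessions)

    def stop_new_sessions(self, server):
        self.accepting[server] = False

    def active_sessions(self, server):
        # Pretend one session ends each time a draining server is polled.
        if not self.accepting[server] and self.sessions[server] > 0:
            self.sessions[server] -= 1
        return self.sessions[server]

    def remove(self, server):
        self.members.discard(server)

    def add(self, server):
        self.members.add(server)
        self.accepting[server] = True

def restart_in_order(lb, server, restart, poll_seconds=0.0, max_polls=100):
    """Drain, remove, restart and rejoin one server; return the steps taken."""
    steps = []
    lb.stop_new_sessions(server)
    steps.append("drain")
    for _ in range(max_polls):
        if lb.active_sessions(server) == 0:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{server} still has active sessions")
    lb.remove(server)
    steps.append("remove")
    restart(server)
    steps.append("restart")
    lb.add(server)
    steps.append("rejoin")
    return steps

lb = FakeLoadBalancer({"web1": 3, "web2": 5})
restarted = []
print(restart_in_order(lb, "web1", restarted.append))
# prints ['drain', 'remove', 'restart', 'rejoin']
```

Note that the sketch refuses to continue if the sessions never drain. Deciding what to do at that point is exactly where skilled personnel earn their keep.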

Often, automation is used to accomplish complex tasks that are simply tedious or repetitive. Due to the complexity of the operation, things can go wrong on one pass that will work fine on the next. An acquaintance in the software industry is fond of saying ‘there are no gremlins’ and he is right, but in some scripts or automated sequences it is not worth the time to find the occasional random issues – it is easier to re-run the script and have it succeed on the second pass. But sometimes – and this is where skill and experience are important – things go wrong that cannot be solved by simply re-running the script. When this happens, having junior level personnel at the helm can be problematic. Do they know enough to correct the issue without breaking something else? Are they confident enough in their skills to know when to call for help?
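One way to picture the "re-run it, then call for help" pattern is the small Python sketch below. The function name, the limit of two attempts and the sample task are assumptions of mine, not a recommendation for any particular automation system.

```python
def run_with_retry(task, max_attempts=2):
    """Run task(); re-run on failure; hand off to a person after max_attempts.

    Returns (result, attempt_number) on success.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as exc:
            # Keep every failure so whoever is called in can see the history.
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError("needs skilled attention; " + "; ".join(errors))

# A made-up task that fails once and then succeeds.
attempts = {"count": 0}

def flaky_task():
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise OSError("transient glitch")
    return "done"

print(run_with_retry(flaky_task))
# prints ('done', 2)
```

The retry is the easy part. The `RuntimeError` at the end is the moment the paragraph above is about: someone has to read those collected errors and know what they mean.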

Documentation is critical in today’s highly integrated systems. Having a knowledge base that can be used and updated by all personnel is also critical. The key is making it meaningful. If all entries have ‘escalate to engineering’ as step 1 once the problem is identified, that is not much help. Nor is it much help to put in the hands of junior personnel a complex task sequence that should only be performed by knowledgeable maintenance personnel during a maintenance window.

Today’s automation allows a small group of skilled individuals to accomplish things that could not have been done by an army 10 or 20 years ago. At the same time, it still requires someone to watch over it, and step in when things go wrong. Whenever large-scale, automated operations are running, the potential exists for any one of many sequences to fail. When that happens, things can snowball very quickly. Do you have the processes and personnel in place to stop that snowball effect and get things back without significantly impacting your business, operation or customers? If not, now is the time to address it – before that snowball starts rolling.

Posted in Basics

Test early, test often and monitor constantly

Anyone who troubleshoots systems is familiar with the concept of breaking things down into subsystems and narrowing down the problem so the cause can be determined. Today’s systems are often more complex than ever, and the interactions between systems are just as complicated. A problem in one area may not show up for days, weeks or even months. Even more challenging is that the problem may be apparent at one level, with the cause in an entirely different section of the system.

The key to finding these problems, and more importantly, preventing them in the first place, is a rigorous set of test cases combined with a detailed set of test and monitoring tools. Test cases should be developed to determine the suitability of an installation. Test and monitoring tools need to be applied during each portion of a system’s lifecycle. Operational monitoring needs to reflect those test cases and ensure the system is functioning as expected and, just as important, not functioning in ways that are not expected. Network routing is a perfect example of this. If you install a specific network route from point A to point B for a system to work, that system should stop working if the route is removed. Both sides of that test case need to be tested and verified. If the system continues to function after the route is removed, a path between the units exists that likely should not. While it may not cause an issue today, that could change in the future.
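The two-sided route test can be expressed as a small Python sketch. The network here is simulated, and `SimulatedNetwork` and all of its method names are invented for the example. On a real system the three callables would wrap whatever actually installs the route, removes it and checks that the application works.

```python
def verify_route_dependency(can_reach, install_route, remove_route):
    """Run the positive and the negative half of the route test."""
    results = {}
    install_route()
    results["works_with_route"] = can_reach()         # positive case
    remove_route()
    results["fails_without_route"] = not can_reach()  # negative case
    install_route()                                   # leave the route in place
    results["passed"] = (results["works_with_route"]
                         and results["fails_without_route"])
    return results

class SimulatedNetwork:
    """Toy network: the intended route plus an optional path nobody planned."""
    def __init__(self, hidden_path=False):
        self.route_installed = False
        self.hidden_path = hidden_path

    def install_route(self):
        self.route_installed = True

    def remove_route(self):
        self.route_installed = False

    def can_reach(self):
        return self.route_installed or self.hidden_path

clean = SimulatedNetwork()
leaky = SimulatedNetwork(hidden_path=True)
print(verify_route_dependency(clean.can_reach, clean.install_route, clean.remove_route))
print(verify_route_dependency(leaky.can_reach, leaky.install_route, leaky.remove_route))
```

The first network passes both halves. The second passes the positive test and fails the negative one, which is the only way the unplanned path would ever be noticed.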

Your test cases need to test all layers thoroughly. NTP can be a good example of this. The fact that you can ping an NTP source does not ensure it provides quality network time to all devices. I just went through a lengthy exercise where a new NTP system worked, but not completely. Depending on the device querying the server, the response was often rejected. Unfortunately for us, it took 3 or 4 weeks for the clocks to drift sufficiently to see there was a problem.

The system was originally tested from at least 8 different cities in the US, and in all cases, we were able to get a quality time sync. The NTP server was also tested with a variety of operating systems and devices – all looked good. Because a similar system was already in place and working, we missed one critical test – did it work with the domain controller (DC) on our Windows network? Four weeks later, when domain member machines started complaining that their Windows time did not match NTP, we realized we had a problem. This NTP server provides a version 4 response, but our Windows Server 2003 sends a version 3 query and would not interpret the version 4 response. The ‘similar’ system had been installed with 2008, and we were in the process of upgrading ours, but had not cut over to the new DCs. Windows Server 2008 sends a version 4 query and expects a version 4 response.
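A test that goes one layer above ping could look something like the Python sketch below. It builds a client request with a chosen NTP version number and inspects the header of the reply. The header layout (leap indicator, version and mode packed into the first byte, stratum in the second) is standard NTP. The function names and the "usable" rule are my own simplification, and whether a given client accepts a reply with a different version number depends on that client.

```python
import socket

def build_request(version):
    """48-byte NTP client request: LI=0, VN=version, Mode=3 (client)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(47)

def parse_header(packet):
    """Pull leap indicator, version, mode and stratum out of an NTP packet."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    first, stratum = packet[0], packet[1]
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
    }

def looks_usable(header, expected_version):
    """Simplified check: a server reply, synchronized, in the version we expect."""
    return (header["mode"] == 4                  # 4 = server
            and header["leap"] != 3              # 3 = clock not synchronized
            and 1 <= header["stratum"] <= 15     # 0 and 16 are not usable time
            and header["version"] == expected_version)

def query(host, version, timeout=2.0):
    """Send one request to host on UDP port 123 and return the parsed header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(version), (host, 123))
        packet, _ = sock.recvfrom(512)
    return parse_header(packet)

# Offline demonstration with a hand-built version 4 server reply at stratum 2.
reply = bytes([(0 << 6) | (4 << 3) | 4, 2]) + bytes(46)
header = parse_header(reply)
print(header, looks_usable(header, 3), looks_usable(header, 4))
```

Calling `query()` with version 3 and again with version 4 against a new server, from each kind of client network, would have been a cheap addition to our test list.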

As it turned out, the NTP server’s manufacturer had run into this and already had a firmware upgrade that allowed the NTP server to respond to each version appropriately. Those on the manufacturer’s side were convinced this would fix the issue, and it did – but not completely. Responses now came back correct, but due to an incompatible peering between the DC and NTP server, the server once again began rejecting nearly all NTP requests from the DC. Correcting that required a registry entry.

In retrospect, this should have been caught on day one of the installation, but it was easy to assume that if we could ping the NTP server, all else was good. We had synced up plenty of devices, and under the circumstances we had no reason to believe Windows Server 2003 was any different.

When engineering, installing and operating a complex system, thorough testing is your best defense against problems. A complete set of test cases (positive and negative) will gain you needed familiarity with each new system in your facility. Until each test case is understood and accepted, do not assume your new system is working. Your database of test cases should also be amended as problems arise. Additionally, today’s monitoring systems can often provide a way to verify that systems are functioning properly on a day-to-day basis by running the test cases in the background and checking that the responses are correct.

Posted in Audio, Basics, Control, Video

Gotchas

For the last several years I have been involved with specifying and purchasing a variety of systems. These normally start off with a salesperson offering up various solutions that might fit. As the system gets closer to reality, a sales engineer gets involved, and the discussion usually gets far more technical. Often, there is considerable give and take between what the salesperson thought was needed and what the sales engineer specifies. None of this is really a surprise; the job of a salesperson is to sell. Technical knowledge, while helpful, is second to salesmanship. Sales engineers, however, should have that technical knowledge and have a detailed understanding of what their company’s equipment will and will not do. Additionally, it is in the sales engineer’s best interest to understand the customer’s requirements and installation as well as possible.

Often the challenge appears when the field engineer arrives. The challenge comes out of the blue with a statement like “our management network has to be on its own subnet,” or “you can’t have both GigE ports on the same network,” or “even though the NIC is a GigE, it is limited to only 50Mbit/s ingress.” While each of these requirements may be reasonable within the equipment’s design, none should be hidden until installation. If a company is selling a network-capable device, it should act like any other network-capable device. If it has two NICs and both can run with a static IP, there should be no limit as to the addresses assigned. If there are requirements, they should be clearly spelled out in the literature and through the sales process.

When working with multiplexed streams over networks, it is very easy to get caught by these items as technology changes. For instance, HD MPEG-2 was commonly distributed at 15Mbit/s. At that time, 100Mbit/s NICs were also common. It was easy to put one stream on the NIC, and the designers may have limited the NIC drivers to 6 streams, because more than 6 streams at 15Mbit/s (7 × 15 = 105Mbit/s) would potentially swamp the NIC. Enter MPEG-4 at 5Mbit/s. Now you can easily play out 15 streams and not swamp the NIC (only 75Mbit/s). However, if that (long forgotten) driver code is still in place, the limit is 6. Gotcha.
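The stream arithmetic is simple enough to check with a few lines of Python. This is a back-of-the-envelope sketch only: the function name is mine, and it ignores protocol overhead and any headroom you would want in practice.

```python
def max_streams(nic_mbit, stream_mbit, driver_limit=None):
    """Streams that fit on a NIC by raw bandwidth, capped by any driver limit."""
    by_bandwidth = int(nic_mbit // stream_mbit)
    if driver_limit is None:
        return by_bandwidth
    return min(by_bandwidth, driver_limit)

print(max_streams(100, 15))                  # MPEG-2 at 15 Mbit/s -> 6
print(max_streams(100, 5))                   # MPEG-4 at 5 Mbit/s  -> 20
print(max_streams(100, 5, driver_limit=6))   # old driver limit    -> 6
print(15 * 5)                                # 15 MPEG-4 streams   -> 75 Mbit/s
```

The bandwidth says 15 streams fit comfortably. The forgotten limit says 6. Only one of those numbers shows up in the sales literature.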

With technology moving as rapidly as it is, items such as these become more common. Sometimes, it is as simple as a code upgrade. Sometimes it’s not. Code upgrades can take considerable time and that is not always part of your project’s timeline. Workarounds are not really how you want to implement new technology. Speaking from experience, code upgrades do not always correct the problem—and sometimes the new code comes with bigger issues than the old.

One solution is a detailed installation plan prior to purchase. In it, each and every IP address, connection and configuration setting is discussed, primarily with both a field engineer and the sales engineer. That way, any items that conflict with known implementation issues can be identified up front. Once identified, they can be resolved, or the purchase can be delayed, or potentially, other equipment can be considered for the project. Ultimately, all of that information can be detailed as part of the purchase so that both sides fully understand what they are agreeing to. In the unfortunate event that it prevents a sale, that information gives the sales department the ammunition they need to go back to the technical side and get the situation corrected quickly.

Posted in Basics

Network Protection

Who, or what, is protecting your network? For most, it will be a firewall, combined with some network security practices. Stations within larger groups likely have plenty of people at corporate dictating how the networks and IT systems are administered. Smaller stations may be in the unenviable position of doing it themselves or having their IT functions outsourced (possibly through a trade-out) to a local IT shop. I have been in all the above situations, and given sufficient knowledge, having to do it yourself has been the best, simply because you have the control and can ensure the technical systems are secure. Having a knowledgeable firewall and network person to bounce ideas off and collaborate with is even better.

Let’s look at a typical station’s in-house network layout. First, you will have a network more or less equivalent to any other business for running normal IT functions such as file operations, printing documents, etc. On top of that will be sales and traffic which will have databases for inventory, log generation and reconciliation. These operations are close enough to most sales systems that a typical IT shop should be able to handle them without issue. News, possibly weather, and production systems may begin to tax your local IT shop as the systems are built around operations that can be totally outside their comfort level. Finally, there is the engineering network. Among other things, automated playout of the station’s programming is on this network. It needs to be protected. To complicate matters, many of the systems on this network are likely running older operating systems that cannot be upgraded. Think of it this way: how many of the people with access to that network would you trust with a little green screwdriver? If you don’t trust them with a screwdriver, why are you allowing them access to the network?

A quick examination of the IP addresses in use will likely show the entire station is on the same subnet. Very likely, network addresses all start with 192.168… Unless the station and/or group’s addresses are broken up into smaller subnets, you cannot control the routing between them. In other words, the new secretary in sales has complete access to all network devices in engineering/master control. So does the 12-year-old script kiddie that found his way into your network through an unsecured wireless card in the newsroom.

First things first, secure every device on the network through the use of logins and passwords. That way if someone does get to a device, the password and login will slow them down. This can prevent a lot of accidental intrusions. Second, make sure you are not using default passwords, logins or guest accounts. Confirm through testing that those accounts are not functional. Devices on the network that cannot be secured need to be re-evaluated. Not only is it easy to get into these devices once on the network, but there is also potential to launch an attack from one. A separate subnet or group of subnets for unsecured devices can add extra challenges, but can also serve to protect the rest of the network.

By subnetting your network, you create small workgroup- or task-based networks that can be secured through the use of a firewall. Within the firewall, each subnet could be considered a zone with specific traffic within the zone, along with specific traffic in and out of that zone. In both cases, with a good firewall/router, definitions can be very specific as to what sources and destinations are allowed. This is why keeping those unsecured ‘launch’ devices out of a normal zone is so important. By keeping them in an area to themselves, traffic can be restricted in and out, only allowing the addresses, ports and applications necessary to those specific devices.
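To make the subnetting idea concrete, here is a small Python sketch using the standard `ipaddress` module. The address block, the choice of four equal /26 subnets and the zone names are all assumptions for the example. A real station would size each subnet to its own needs.

```python
import ipaddress

# Assumed starting point: one flat network for the whole station.
station = ipaddress.ip_network("192.168.0.0/24")

# Split it into four /26 subnets (62 usable hosts each), one per work group.
subnets = list(station.subnets(new_prefix=26))
zones = dict(zip(
    ["office", "sales_traffic", "news_production", "engineering"],
    subnets,
))

def zone_of(address):
    """Return the name of the zone an address falls in, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in zones.items():
        if ip in network:
            return name
    return None

for name, network in zones.items():
    print(f"{name:16} {network}")
print(zone_of("192.168.0.10"))    # office
print(zone_of("192.168.0.200"))   # engineering
```

Once every device has a zone, the firewall rules between zones become a short list that can be reviewed, rather than an open floor.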

For instance, within a typical firewall, there would be two simple zones: trusted and untrusted. On a firewall protecting a business network from the internet, obviously the trusted side is inside the business, and the untrusted side is the internet. All traffic that originates on the untrusted side can be denied. Traffic that originates on the trusted side can be allowed based on source, destination and application. Responses to the requests, for instance going out to websites, are allowed back through as the firewall understands that the request originated on the trusted side. If you wanted to allow traffic in from the outside, say for remote desktop sessions, that can also be done. Again, this could be based on source, destination and application. To do this, you would open a pinhole in the firewall; the more specific, the better. The pinhole would allow traffic from a specific source to reach a specific destination inside for a specific application. For example, the Chief Engineer might be able to access his desk computer from home using Windows Remote Desktop. The source would be the CE’s home gateway’s IP address. Destination would be the CE’s desk computer’s IP address, and the only application allowed would be Remote Desktop. Access at this level would allow the CE to do anything from a home computer that can be done from the computer in house. Additional security could be accomplished by only allowing login through a public/private key pair.
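The pinhole logic can be modeled in a few lines of Python. This is a toy, not firewall configuration: the `Pinhole` class, the function name and both IP addresses are invented for the example, and it only covers traffic arriving from the untrusted side. It does not model replies to requests that started on the trusted side.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Pinhole:
    source: str        # network or single host allowed in
    destination: str   # network or single host it may reach
    application: str   # the one application allowed

def inbound_allowed(pinholes, source, destination, application):
    """Default deny: inbound traffic passes only if a pinhole matches it."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    return any(
        src in ipaddress.ip_network(p.source)
        and dst in ipaddress.ip_network(p.destination)
        and application == p.application
        for p in pinholes
    )

# Hypothetical addresses: the CE's home gateway and the CE's desk computer.
pinholes = [Pinhole("203.0.113.25/32", "192.168.40.15/32", "remote-desktop")]

print(inbound_allowed(pinholes, "203.0.113.25", "192.168.40.15", "remote-desktop"))  # True
print(inbound_allowed(pinholes, "203.0.113.99", "192.168.40.15", "remote-desktop"))  # False
print(inbound_allowed(pinholes, "203.0.113.25", "192.168.40.15", "file-sharing"))    # False
```

All three parts have to match. Loosening any one of them, say by allowing any source, turns a pinhole into a door.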

Network security, like good maintenance, is a matter of a lot of little things. Over time, these little things add up. Each layer of security you add to the network secures it against both accidental and intentional intrusion. Now is the time to start, before the next virus, next hire, or next intruder targets your facility.

Posted in Basics, Control, General

Organizing the Chaos

Every piece of gear in your facility requires some level of organization. For traditional equipment, the requirements might be as simple as keeping the original box, a warranty card and the owner’s manual. For newer gear, there is a physical list, and then there is a logical or software list. Both need to be organized in a consistent manner. If not, sometime in the future, you will need one or the other and finding it will be like finding a needle in a haystack.

Let’s look first at the physical side. Items on this list include:

  • Original box: Unless the device is very expensive or fragile, you probably won’t need the original box after the first 30 days of the warranty period. Transmitter tubes such as klystrons are an obvious exception to this.
  • Original paperwork such as PO, owner’s manual, warranty card, packing list, installation instructions: All of this can be kept in an expandable file folder. Depending on the organization, some of the paperwork (originals and/or copies) associated with cost and delivery may need to be kept in the Chief Engineer or Accounting office. The unpacking stage is a very good time to record items such as serial number, firmware/software version and license information. Some of the items in this list may be on CDs. More on this later.
  • Physical parts: These may include mating connectors, extender boards, specialty tools, custom cables, spare parts and mounting hardware. This is where things can get tricky. That expanding file folder may not hold the mounting hardware or extender cards. Sometimes it is better to put the mating connectors on the equipment, so they are there when needed. Other times, especially when considering custom cables, it is better not to add them to the gear as it tends to clutter up the racks.

Given the above, make some decisions and stick to them. Implement the decision (process) consistently when every piece of gear arrives. Methodically record serial numbers and relevant info including PO number on the original paperwork and decide what to do with duplicates. Three LCD monitors will come with three manuals, three sets of cables and three sets of mounting hardware. Three folders or one?

Where will you keep the spare parts (if any)? The challenging part is the diversity of sizes. Connectors will fit in a parts drawer, but rack rails, spare fans and the like will not. One idea is to take a tip from the warehouse industry. Put like-size pieces together and track where they are. A simple way to do this is to mark each large piece plainly and note where it is stored. Place that list along with smaller parts in the expanding file folder mentioned above. A master list can be kept in a database. This can be simplified with a barcode reader combined with the ability to print bar codes on self-stick labels.

Turning to the logical and software side, there are many items here that are easily missed, but critical in the event of a problem. Additionally, it seems the list keeps growing. The first decision is where to keep your records. Your laptop should not be the first choice. Unless you run a one-man shop, all these records belong on some sort of shared drive. A document share may work, but a lot of the pieces may not ‘fit.’ Regardless of where you choose, the location needs to be backed up regularly. And, you need to be able to access those backups quickly in the event of failure. Also consider how to handle remote locations such as the transmitter site. Can it reach your chosen storage? If not, this would provide a good argument to get the transmitter site on the network. Items in this category include:

  • Electronic documentation: This will include any files on the original CD, release and technical notes downloaded from the manufacturer’s website, and copies of tech support emails that detail log entries or maintenance procedures. Also keep before and after screen shots when making changes.
  • Original software: Often, equipment includes the original version of software on a disk that is included in the box. If so, make an .iso backup of the disk and include a text file of any information printed on the disk or box in case the original is lost.
  • Configuration files: Any file in the device that you make changes on belongs here including the original and current configuration files. These should include lists of IP addresses and other relevant info in a clear text file if the configuration file is not clear text.
  • License files: Much of today’s equipment has licenses that allow additional features. Any files that you are sent via email need to be also kept here in the event of a problem.
  • Upgrade files: Often, manufacturers package upgrades as .zip or .tar files. Archive these files in case they are needed. Many times, these archives come in handy after a failed upgrade. For instance, the original version was 2.0, you upgrade to 2.1 and all goes well. Then, you upgrade to 2.2 and there is a problem. How do you get back to 2.1 at 2AM?
  • Tools: There may be executable programs provided by the manufacturer that allow diagnostics or log collection. Often along with the executable there may be documentation on how to install and use the tool.
  • Procedure documentation: Often called MoPs (Method of Procedure), these documents may detail how to upgrade a device within your environment. Written by internal staff or possibly the manufacturer, these documents detail the order, steps and checks that need to be done when interacting with a piece of equipment in a professional environment. They often detail health checks as well as testing, upgrade, reboot and back-out procedures. The level of detail varies, but passwords are rarely included to avoid that information getting into the wrong hands. Along those same lines, having too detailed of a MoP can allow anyone to follow the procedure. Fewer details will limit the ability of unskilled personnel to get very far.
  • Passwords: This is a good time to decide who will have passwords and where they will be kept. Also, will the default passwords be changed? If so, good records with solid backup procedures need to be maintained.
  • Electronic maintenance records: If not kept in a central location such as a database, they need to be kept with everything else that applies to the gear. Even if they are in a database, having an updated plain text copy here never hurts.

Today’s facilities are in constant flux. Each device in the facility has its own software and hardware needs. Maintaining today’s facility goes beyond cleaning the physical side, checking for loose screws and making sure the power supply is at the correct voltage. Maintenance today includes backing up configurations, making sure you are on the right level of firmware and having the latest patches in your equipment. Knowing what you have, and being able to find it is the first step in maintaining today’s facilities.

Posted in Basics, General

Spreadsheets or a Database?

I was once told that you can’t improve something, unless you can measure it. While I did not fully understand it at the time, the concept has stuck with me. Every time I am asked about improvements to a system, I think about ways to measure whatever needs ‘improving.’ Often that touches off a lengthy discussion about exactly what would be considered an improvement.

Part of measuring something is documenting the measurement and storing it for later reference. Unfortunately, the tool of choice is often a spreadsheet. Why is that? The best answer I have come up with is that spreadsheets are easy, and databases are hard. Nearly anyone with computer skills can open up and fill out a spreadsheet. Databases require thought, sometimes lots of thought. But what are we storing? Lots of data points. Data points belong in a data base (database). Period.

The problem with spreadsheets is that they are like Post-It notes. They just multiply. Pretty soon they are everywhere. Everyone has their own. All are similar but none actually have any relational hooks, and they are all out of sync. Often there are multiple copies of the same sheet and there is no way to merge them. Every system I know is made up of many data points. These might be meter readings, IP addresses, cabling connections, SNMP alarms, tuning parameters, and the list goes on. For your own sanity, stop the madness. Learn how to use and build a database, or hire someone to do it. Insist that spreadsheets be moved into a common database that is regularly backed up.

A good database to start with is Microsoft’s Access. It is the one I learned on and the tutorials are excellent. Microsoft SQL Server is huge, but a much better long-term choice, especially if you have an all-Microsoft shop. My favorite is MySQL. It interfaces well with both PHP and Apache, it is easy to use from the command line, and open source GUI tools are available. It is free for non-commercial applications, but you should always verify that any software is being used within its license terms. The combination of MySQL, PHP and Apache running on a Linux server is an inexpensive and powerful addition to any shop.

First some terminology: A single sheet in a spreadsheet (commonly called a flat file) is the equivalent of a table in a database. Tables are a basic building block within a database. As a start, the next time you want to open up a new spreadsheet, create a table in a database instead. To be more specific, we are talking about a relational database where individual tables can have defined relationships between each other. Often this is done with a key, a unique value that is in two related tables.

Keys can be text, but are often numeric values. The relationships might be one-to-one, or one-to-many. A one-to-one relationship might be defined between two tables: one table containing names of employees, and another containing cubicle or office information. That way if someone changes offices, all that needs to be changed is the key that relates the two tables. One-to-many relationships are more common. One table might contain names, with another table containing phone numbers. One person could have several phone numbers including office, home, mobile etc.
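The names-and-phone-numbers example can be tried in a few lines. To keep the sketch self-contained it uses SQLite through Python's built-in `sqlite3` module rather than Access or MySQL, and the table names, column names and sample data are made up. The same idea carries over to the other databases, though the exact SQL dialect differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")           # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")     # have SQLite enforce the key

conn.executescript("""
CREATE TABLE people (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE phones (
    phone_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES people(person_id),
    kind      TEXT NOT NULL,
    number    TEXT NOT NULL
);
""")

# One person, many phone numbers: person_id is the key shared by both tables.
conn.execute("INSERT INTO people (person_id, name) VALUES (1, 'Pat Example')")
conn.executemany(
    "INSERT INTO phones (person_id, kind, number) VALUES (?, ?, ?)",
    [(1, "office", "555-0100"), (1, "mobile", "555-0101")],
)

# Changing a number touches one row and nothing else.
conn.execute(
    "UPDATE phones SET number = ? WHERE person_id = ? AND kind = ?",
    ("555-0199", 1, "mobile"),
)

rows = conn.execute("""
    SELECT people.name, phones.kind, phones.number
    FROM people JOIN phones ON phones.person_id = people.person_id
    ORDER BY phones.kind
""").fetchall()
print(rows)
# prints [('Pat Example', 'mobile', '555-0199'), ('Pat Example', 'office', '555-0100')]
```

With the key enforced, the database will refuse a phone number that points at a person who does not exist, which is more than any pile of spreadsheets will do for you.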

Even if there are only a few relationships, databases can come in handy as a central location for numerous tables (spreadsheets). Record locking can prevent two people from changing the same record at the same time, and the database file set can be easily backed up as it is in a single location. The more you need to track, the better databases are at tracking it. Even better is the ability to query your data. Queries are the other basic building block of a database. Queries allow you to select, insert and update your data. In the end, with just a little extra effort, you will find that a relational database offers all the ease of use of any spreadsheet, and a whole lot more. Give it a try, and you will be rewarded for your effort.

Posted in Basics

Life Expectancy and Return on Investment

When I started in broadcast, the majority of the station’s equipment was old. Some was five years old, some ten, but there was plenty of 15- to 20-year-old equipment. At that time, anything less than five years old was considered “new.” Nothing contained microprocessors, but ICs were common, transistors more so, and tubes (besides those in the transmitter) were not completely out of the picture. Station management had long since paid for and depreciated the gear. And they were still seeing a return on that investment.

At that time, the challenge was to keep the gear operating at a reasonable cost throughout its useful life. Today, things are different. Most devices seem to get through their useful life without a problem. They are still fully functional, but no longer useful. They require little or no maintenance and repair, but are useless (and worthless) long before they quit working. The challenge today is to get the gear in, up and running, and paid for before it becomes functionally worthless. Sometimes that life span is measured in months rather than years.

Several considerations weigh heavily on how long you can expect something to be useful. Proprietary systems cost more and are typically useful longer than commodity systems, which usually cost less. Many ‘cutting edge’ systems are based on a combination of high-end server hardware running proprietary software. When that is the case, there is a possibility that upon replacement, the servers can be put to some other use. That gets you a little extra life from the hardware, but does nothing for the software, which can be the greater part of the cost.

Even getting extra life out of the hardware is a gamble. Will the system internals need to be upgraded for the system to be useful in its new role? Are time and personnel available to adapt the old systems, or is it more cost-effective to purchase the correct hardware for the job? With the constant change, it is often easier and cheaper to buy new. A quick test is to look at whether or not you have been able to utilize hardware you have recently removed from service. If you don’t have the time to adapt your current old servers and systems, then unless you anticipate major changes, it is unlikely you will have the time or desire to re-purpose the next round of out-of-service systems.

There is little doubt that purchasing new systems can open doors to new or additional revenue streams. However, in light of the pace of technological change, make sure you include reasonable estimates of how long you will be able to keep that new system useful and in use.

Posted in Basics

Audits, part 2

Part 1 of our audit article certainly pushed some buttons. I received quite a few emails regarding previous and current management that insist upon overtaxing existing infrastructure, and/or are completely oblivious to the fact that there is a limit to how much electricity you can use, even though there are plenty of outlets left… On the positive side, I received a reminder regarding vehicles. Many of the larger production trucks and trailers have meters on them for power, but it is easy to forget about HVAC loads. On the subject of loads, when adding equipment to vehicles, make sure you consider weight and balance, as both can affect handling and licensing.

Certainly, power and HVAC are big items that need to be monitored for capacity and growth. The explosion of networked equipment brings with it many other areas that really should be monitored but rarely are. These include IP addresses, network bandwidth, usernames and passwords, as well as software licensing. Each of these items could be an article in itself, but for now we will touch on the highlights, as facility size can greatly influence how they could be handled. Within a small group or facility, these items might all be handled by one person on an as-needed basis. Larger facilities might have one or two dedicated staff members assigned to the task, or even outsource it to a local IT shop. Most of the larger groups and networks will have complete departments focused on tracking all these items and more. In short, the larger the network, the more people will be needed to keep it healthy.

IP addressing is a simple concept that can easily turn into a nightmare if not tracked. For instance, let’s say you take the easy route, set up DHCP (Dynamic Host Configuration Protocol) and allocate a class C address range (256 addresses, 254 usable) to the facility. Six to twelve months later, you plug in the 255th device and cannot get an address. In the heat of the moment, the last thing on your mind is going to be that DHCP allocation pool running out.
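The arithmetic behind that class C example can be checked with Python's standard ipaddress module. The sketch below is only an illustration: the 192.168.1.0/24 range, the function names and the 80 percent warning level are assumptions, and in practice the lease count would have to come from your DHCP server.

```python
import ipaddress


def pool_size(cidr):
    """Return (total addresses, usable host addresses) for a network."""
    net = ipaddress.ip_network(cidr)
    # hosts() leaves out the network and broadcast addresses.
    return net.num_addresses, len(list(net.hosts()))


def pool_is_low(cidr, leases_in_use, warn_fraction=0.8):
    """True when leases in use reach the (assumed) warning fraction."""
    _, usable = pool_size(cidr)
    return leases_in_use >= usable * warn_fraction


print(pool_size("192.168.1.0/24"))         # (256, 254)
print(pool_is_low("192.168.1.0/24", 210))  # True
```

Run on a schedule and sent to your email, a check like this flags the shrinking pool months before the 255th device shows up.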

Setting static IP addresses and keeping track of them on a spreadsheet might be an entirely easier endeavor, if for no other reason than you have to look at it once in a while to get new addresses. DHCP works very well in an office setting where laptops come and go and do not need a set address. For stationary equipment, I have always had much better luck with a static addressing scheme and keeping track of the addresses in a spreadsheet, database or web application.
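If the static assignments live in a spreadsheet or database export, finding the next free address takes only a few lines. This is a rough sketch using Python's ipaddress module; the function name and the sample addresses are hypothetical, and it assumes the list of assigned addresses is kept up to date.

```python
import ipaddress


def next_free_address(cidr, assigned):
    """Return the first usable address not in the assigned list, or None."""
    used = {ipaddress.ip_address(a) for a in assigned}
    for host in ipaddress.ip_network(cidr).hosts():
        if host not in used:
            return str(host)
    return None  # every usable address is taken


# Hypothetical tracking data, e.g. read from a spreadsheet column.
assigned = ["192.168.10.1", "192.168.10.2", "192.168.10.4"]
print(next_free_address("192.168.10.0/24", assigned))  # 192.168.10.3
```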

If you have no idea how your addresses are set up or used, there are numerous network scanners that can provide considerable detail about your network layout. Before you use them, make sure that you have the necessary rights and permissions so that the scan is not viewed as a hostile threat. One facility I know of ran a scan on what they thought was an isolated network of about 200 machines and found that someone had infiltrated that network and there were more than 1500 addresses that could access it: a major security risk!

Network bandwidth, much like all those extra outlets, is something that can easily be higher than expected if not monitored. Many switches provide GUIs and SNMP alarms that include bandwidth utilization. Like power and cooling, these systems need to be checked regularly and tracked to ensure they are not over capacity and are in line with expected growth patterns. Trigger points for additional capacity can easily be established well in advance. This will provide additional data when submitting budget requests. Tracking data, combined with previously agreed-to limits, can make it much easier for management to approve the necessary funds.
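As a rough sketch of how a trigger point might be checked, the function below turns two readings of an interface byte counter into a utilization percentage. How the counters are collected (SNMP, the switch GUI, a script) is left out; the function names, the 70 percent trigger and the sample numbers are assumptions, and the calculation assumes the counter did not wrap or reset between readings.

```python
def utilization_percent(octets_start, octets_end, interval_s, link_bps):
    """Average link utilization between two byte-counter readings."""
    bits_sent = (octets_end - octets_start) * 8
    return 100.0 * bits_sent / (interval_s * link_bps)


def over_trigger(percent, trigger_percent=70.0):
    """True when utilization reaches the agreed trigger point (assumed 70%)."""
    return percent >= trigger_percent


# Hypothetical readings taken 300 seconds apart on a 1 Gb/s port.
pct = utilization_percent(0, 15_000_000_000, 300, 1_000_000_000)
print(round(pct, 1), over_trigger(pct))  # 40.0 False
```

Logging the percentage each time, rather than only the alarm, gives you the trend data to take to the budget meeting.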

Users and passwords are tricky. Obviously, there should be no unauthorized users on your network, but how many pieces of equipment have the default user and password as the primary way in? If the device is on the network and there is access from the outside, it can be a real security problem. How many people have the administrator or root password to the network servers? Do you trust them all? Can you track them? Administrator passwords need to be well guarded, but not too well. If only one person knows it and they get hit by a bus, then what? A hierarchy of leadership and technical personnel needs to manage the ‘keys’ to the system. Passwords need to be kept secure, but major network passwords should not be trusted to only a single individual. Every business is different, but technical personnel with passwords need to ensure that necessary passwords are accessible by others in the event of an emergency.

As you can see, there are plenty of things within the facility that need to be monitored. If your facility is like most, you are too busy to track these items. Unfortunately, the less they are monitored, the more likely a crisis is. If nothing else, simply mentioning the fact that your current capacity is unknown might get you some help in finding the answers. Good luck!

Posted in Basics, Control, General

Time for an audit?

Facilities change. ICs replaced transistors, digital replaced analog and HD is replacing SD. One would think that with all the miniaturization, heat loads and power requirements would shrink also. Maybe they did in your facility, but not mine. Any gains I might have realized from reduced loads from video and audio gear have been offset two-to-tenfold by the servers and network gear that is now needed. What’s an engineer to do?

One solution is to hire a local HVAC professional and get a facility audit. They will come in, do a survey of the various heat loads and cooling capacity and determine what is needed. More importantly, they can show you where you have too much cooling, or not enough. Balancing the system so that the cooling is where you need it, instead of where you don’t, can make a big difference. They will also determine how much extra capacity you have. This would be a good time to discuss backup cooling. Typically, broadcast facilities have a large cooling unit for the studio. It is needed whenever the studio is in use. If you don’t do daily newscasts, can it be a backup for the equipment area?
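A back-of-the-envelope estimate can help you prepare for that conversation, though it is no substitute for the audit itself. The sketch below uses the standard conversions of about 3.412 BTU per hour per watt and 12,000 BTU per hour per ton of cooling. It assumes all of the electrical power drawn by the equipment ends up as heat in the room and ignores people, lighting and the building itself; the function names and sample figures are made up.

```python
BTU_PER_HR_PER_WATT = 3.412   # 1 watt of load is about 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # 1 ton of cooling is 12,000 BTU/hr


def heat_load_btu_per_hr(equipment_watts):
    """Estimate equipment heat load, assuming all power becomes heat."""
    return equipment_watts * BTU_PER_HR_PER_WATT


def cooling_headroom_tons(equipment_watts, cooling_tons):
    """Cooling capacity left over after the estimated equipment load."""
    load_tons = heat_load_btu_per_hr(equipment_watts) / BTU_PER_HR_PER_TON
    return cooling_tons - load_tons


# Hypothetical room: 10 kW of equipment served by a 5-ton unit.
print(round(heat_load_btu_per_hr(10_000)))         # 34120
print(round(cooling_headroom_tons(10_000, 5), 2))  # 2.16
```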

As the northern hemisphere moves into summer, HVAC may be the primary concern, but don’t forget about power. If you are short on HVAC, do you have enough power capacity to add another cooling unit? Is it on the UPS/generator? How will you upgrade your power? This is another time for a local professional; it may also be the time to get the local municipality involved. The same applies to corporate management.

One way or another, you have some responsibilities. Maybe you have a five-year plan, maybe you need to support corporate’s five-year plan. Maybe you have the responsibility to modernize an old facility. Maybe you are one of the lucky few who get to design and build a new facility. Any way you look at it, you need an understanding of what your facility is capable of supporting and how much of that capacity you are using. The last thing you want is to put in that new server farm over the winter, and have it melt down the first time outside temperatures exceed 90 degrees and your AC can’t keep up.

Posted in Basics, General