Over several postings, we have discussed logs and errors. I, like many, often find digging through logs to be tedious. However, it is a necessary evil. Despite what many tell you, IP, network and computer-based systems do not operate independently. An error in one system can ‘infect’ another, supposedly independent, system. Errors often propagate downstream, but to my way of thinking, a bad SDI signal in a router layer never affected the other SDI streams in that router. With software, and software-based hardware, that is not always true.
Code is code. Well-written code performs better than poorly written code, but in the end, handling errors chews up processor time. The more time spent handling errors and writing messages to the log, the less time there is to do things correctly. Think of it this way: Take your average IP grooming device and run the maximum number of signals through it. One groomer I have worked with recently allows 936Mb/s on its GigE port. Allowing for some overhead, let’s put in 180 signals at 4.5Mb/s each and allow each signal just over 10% additional bandwidth for processing (500kb/s). That gives us an input data rate of 810Mb/s and an output rate of 900Mb/s. With that many signals running through the device, the processor is kept busy. Incoming packets need to be examined, groomed if needed, and output in a very consistent manner so that errors and jitter are not introduced.
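The arithmetic behind those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the numbers quoted above; the function and variable names are made up for illustration and do not come from any particular groomer.

```python
# Figures taken from the example in the text.
PORT_CAPACITY_MBPS = 936.0  # usable rate on the GigE port
SIGNAL_COUNT = 180          # number of signals pushed through the groomer
SIGNAL_RATE_MBPS = 4.5      # input rate of each signal
HEADROOM_MBPS = 0.5         # extra allowance per signal (500kb/s)


def bandwidth_budget(count, rate_mbps, headroom_mbps, capacity_mbps):
    """Return (input total, output total, spare port capacity) in Mb/s."""
    input_total = count * rate_mbps
    output_total = count * (rate_mbps + headroom_mbps)
    spare = capacity_mbps - output_total
    return input_total, output_total, spare


input_total, output_total, spare = bandwidth_budget(
    SIGNAL_COUNT, SIGNAL_RATE_MBPS, HEADROOM_MBPS, PORT_CAPACITY_MBPS
)
headroom_percent = 100 * HEADROOM_MBPS / SIGNAL_RATE_MBPS

print(f"input total:  {input_total:.0f} Mb/s")
print(f"output total: {output_total:.0f} Mb/s")
print(f"spare on port: {spare:.0f} Mb/s")
print(f"per-signal headroom: {headroom_percent:.1f}%")
```

With these numbers only 36Mb/s of the port is left unused, which is why there is so little slack for the processor to absorb extra work.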
If one of the signals is non-existent or creating errors, the processor needs to work harder to handle and log the error. Now, instead of one bad signal, make it twenty, or thirty, or even one hundred. Each new error adds to the processor’s workload. Each increases the likelihood that a ‘good’ signal will get stepped on because the processor was working harder than necessary. Granted, in well-designed equipment, none of these things ‘should’ happen, but in reality, software is written by people, and things get into the code that result in issues.
The bottom line is that the fewer errors in the system, the fewer problems you will have. If you are not looking, you will not find them until it is too late, when the system comes crashing down. Having support contracts for today’s systems is important. Proper installation and maintenance are also important. When you purchase and install a new piece of equipment, have the logs analyzed as part of the checkout. This accomplishes several things. First, going in and getting the logs, or watching tech support do it, shows you where the logs are located. Second, the first round of logs gives you a baseline to work from. Third, it may reveal some things about your system that you should be aware of. For instance, what if, on the new device, you set an input stream at 4.5Mb/s, knowing your constant bit rate (CBR) encoder is running at 4.5Mb/s, and the new device constantly gives you buffer overflow errors? The first thought may be that the thresholds are too tight, but if they are set at 500kb/s, that should provide more than enough room. Further investigation may reveal that, despite the label in the GUI, the 4.5Mb/s rate is the video rate rather than the transport rate. When the audio is added in, the overall transport rate gets very close to the upper limit (5Mb/s). It is likely an easy fix (simply adjust the rates as needed), but it is something that could easily have caused errors at each and every device downstream.
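To see how quickly that margin disappears, here is a small Python sketch of the calculation. The 4.5Mb/s video rate and the 500kb/s threshold come from the example above; the audio and multiplex overhead rates are hypothetical values picked purely for illustration, so substitute the real figures from your own encoder.

```python
# All rates are in kb/s so the arithmetic stays in whole numbers.
VIDEO_KBPS = 4500      # the rate shown in the GUI (video only)
THRESHOLD_KBPS = 500   # overflow threshold configured on the new device
AUDIO_KBPS = 384       # hypothetical audio rate, for illustration only
OVERHEAD_KBPS = 100    # hypothetical multiplex overhead, for illustration only


def transport_margin_kbps(video, audio, overhead, configured, threshold):
    """Return how far the real transport rate sits below the upper limit."""
    transport_rate = video + audio + overhead
    upper_limit = configured + threshold
    return upper_limit - transport_rate


margin = transport_margin_kbps(
    VIDEO_KBPS, AUDIO_KBPS, OVERHEAD_KBPS,
    configured=VIDEO_KBPS, threshold=THRESHOLD_KBPS,
)
print(f"margin below the upper limit: {margin} kb/s")
```

Under these assumed rates, the expected 500kb/s of room shrinks to a few kb/s, and any small burst will trip the buffer overflow error.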
So, where and how do you start? My first recommendation is to start at both ends and work towards the middle. Get a baseline of the first device in the chain as well as the last. My guess is that errors in device #1 are propagating through the system, and as you address them, you will see the number of errors at the end decrease. Use tools such as grep to parse logs, looking for ‘ERR’ or other easily identifiable keywords. Capture and save the logs for later reference. If you are into scripting, much of this can be automated and sent to you via daily emails. However you do it, the point is that you need to do this if you intend to be successful in this new world of IP-based systems. Good luck!
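For those who prefer a script to grep, the baseline idea might look something like the Python sketch below. It is a minimal example, not a finished tool: the keyword list, the function names and the sample log lines are all invented for illustration, and real devices will use their own log formats and keywords.

```python
import re
from collections import Counter

# Hypothetical keywords; adjust to whatever your equipment actually logs.
KEYWORDS = ("ERR", "WARN", "FAIL")


def count_keywords(lines, keywords=KEYWORDS):
    """Count log lines containing each keyword (first match per line)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def compare_to_baseline(current, baseline):
    """Return {keyword: change in count} for keywords whose count changed."""
    changed = {}
    for keyword in set(current) | set(baseline):
        delta = current[keyword] - baseline[keyword]
        if delta != 0:
            changed[keyword] = delta
    return changed


# Made-up log lines standing in for a saved baseline and today's log.
baseline_log = [
    "INFO input 12 locked",
    "ERR input 47 buffer overflow",
]
current_log = [
    "INFO input 12 locked",
    "ERR input 47 buffer overflow",
    "ERR input 48 buffer overflow",
    "WARN input 12 jitter high",
]

baseline_counts = count_keywords(baseline_log)
current_counts = count_keywords(current_log)
changes = compare_to_baseline(current_counts, baseline_counts)

for keyword, delta in sorted(changes.items()):
    print(f"{keyword}: {delta:+d} compared with baseline")
```

Because an open file yields one line at a time, count_keywords can be handed a file object directly, for example `with open(path) as log: count_keywords(log)`, where path points at whichever log you saved. Getting the result into a daily email is left to whatever scheduler and mail setup you already have.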