Yes, in what can be described as some sort of Backup Exec miracle, it has managed to run for 72 hours with 100% of jobs completing successfully across multiple servers via CASO. Only took 5 years of fiddling to get it to run smoothly for a bit, what a result.
I still really don't like Backup Exec much.
Just some Sysadmin's view of the world of Backups for Small/Medium Businesses using Backup Exec and Microsoft Data Protection Manager. Experiences, tips, problems, rants and ideas. We eventually gave up on Backup Exec, so while this was "Backup Exec Hell - The Daily Torture of making Backup Exec 10d, 12d and 12.5 work...", it's now "The Joy of Microsoft DPM". Although it isn't perfect, it's a damn sight better.
Showing posts with label caso. Show all posts
Saturday, 23 October 2010
Wednesday, 6 October 2010
Follow your own advice...
Recently I had an issue.
CASO couldn't talk to all of the managed media servers (sort of making them unmanaged then...) Much muttering and fiddling later and nothing.
So I try and think how to describe the exact issue, and google for it (naturally you google last after messing around because that would be akin otherwise to actually reading the manual that comes with stuff you buy)...
First match... er, this blog.
I refer myself to my own sodding blog to fix an issue. The post in question, June 8th 2009...
Tuesday, 10 November 2009
Reporting in Backup Exec ... no more pain and misery...
If, like me, you manage a largeish Backup Exec installation with several media servers, hundreds of backups and lots of clients, you'll probably be pretty frustrated with the half-assed nature of the Backup Exec logging and reporting capabilities.
For a long time, I've wanted a simple but powerful way to do things like "show me backups that are consistently failing over 'x' period", or "show me the most likely time of day for backup jobs to fail", etc.
So, having looked everywhere and found no sane solution, I've just started writing one. Now I have a great little interface where I can review my backups, see what jobs are failing constantly, review the issue, fix it and then mark it as resolved so it can start being checked again.
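To give a flavour of the two questions above, here is a minimal Python sketch. It is not the tool described in this post: the record layout (job name, start time, success flag), the sample jobs and the function names are all made up for illustration, and getting real job history out of Backup Exec is a separate problem.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical job-history records: (job name, start time, succeeded?).
# This layout is an assumption for illustration, not a Backup Exec format.
HISTORY = [
    ("fileserver-full", datetime(2009, 11, 7, 22, 0), False),
    ("fileserver-full", datetime(2009, 11, 8, 22, 0), False),
    ("fileserver-full", datetime(2009, 11, 9, 22, 0), False),
    ("mail-incr", datetime(2009, 11, 8, 1, 30), True),
    ("mail-incr", datetime(2009, 11, 9, 1, 30), False),
    ("web-incr", datetime(2009, 11, 9, 3, 15), True),
]


def consistently_failing(history, now, days, min_runs=2):
    """Return sorted job names whose every run in the last `days` days failed."""
    cutoff = now - timedelta(days=days)
    runs = defaultdict(list)
    for name, started, ok in history:
        if cutoff <= started <= now:
            runs[name].append(ok)
    # A job counts only if it ran at least `min_runs` times and never succeeded.
    return sorted(
        name for name, results in runs.items()
        if len(results) >= min_runs and not any(results)
    )


def failures_by_hour(history):
    """Count failed runs per hour of day (0-23), most failure-prone hour first."""
    counts = Counter(started.hour for _, started, ok in history if not ok)
    return counts.most_common()


if __name__ == "__main__":
    now = datetime(2009, 11, 10, 9, 0)
    print(consistently_failing(HISTORY, now, days=7))
    print(failures_by_hour(HISTORY))
```

The `min_runs` threshold is there so a job that has only run once in the window doesn't get flagged as "consistently" failing on the strength of a single bad night.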
I'm thinking of adding lots of features and eventually making it something I can sell for a reasonable (read: not outrageous) fee to others who feel the pain...
Any suggestions welcomed...
Monday, 8 June 2009
So it just died....
About a week ago our Backup Exec CASO box decided it had had enough of talking to Managed Media Servers. Randomly declaring known working servers to be "unavailable".
The usual checks started, and nothing. Patches checked, removed etc. Nothing.
The solution. Search for any msgq*.dat files on your servers. Delete them.
Voila. Everything works... I'm not happy...
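If you'd rather not hunt for the files by hand, something like the following Python sketch would do the search. It's only an illustration: where the msgq*.dat files live isn't covered here, so the folder you pass in is up to you, and it defaults to a dry run that just lists what it would delete. Whether the Backup Exec services should be stopped first isn't something this post covers either, so treat that as an open question.

```python
from pathlib import Path


def find_msgq_files(root):
    """Return every file matching msgq*.dat under `root`, searched recursively."""
    return sorted(p for p in Path(root).rglob("msgq*.dat") if p.is_file())


def delete_msgq_files(root, dry_run=True):
    """List matching files; delete them only when dry_run is False."""
    matches = find_msgq_files(root)
    for path in matches:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
    return matches


# Example (hypothetical folder, dry run only):
# delete_msgq_files(r"D:\SomeBackupExecFolder")
```

Run it once as a dry run, check the list looks sane, then call it again with `dry_run=False`.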
Friday, 31 October 2008
Why are error messages not unique
That's what I want to know.
Why is it that you get an error message in Backup Exec, and it spits out an error for you, and so you click on it, which takes you to a web page with a description of that error, right?
Wrong. With Symantec Backup Exec, it just takes you to a list of issues which may or may not be remotely close to what you have issues with, and rarely has any useful answer.
Here I am today again trying to find out why certain jobs keep failing without any sort of sane error reason. Another day fighting Backup Exec.
Labels:
caso,
communication failures,
error messages
Monday, 14 January 2008
Reliability? Can we say we're onto Stage 2?
I'm rather scared, and pleased, to say that we appear to have genuinely made it work. In my last few posts I still suspected that Backup Exec was just being nice to us, but it does appear it is now genuinely acting like a competent backup product.
One server has now been up for 27 days, and is still running backups quite happily, with 2,000 odd jobs run in that time (74 a day or so), and the CASO server just getting on with its jobs and delegation. No more tears.
Our CPS system is still working - that's the most flawless part of Backup Exec I've seen. It's absolutely awesome - it says continuous protection, and it offers just that. We installed it, got over one hurdle of making it listen on a specific IP, and that was that. Our main file servers have just-below-real-time backups 24/7.
Our next challenge (Stage 2, which took 18 months to get to!) is to build a comprehensive reporting and restore testing infrastructure around the software. We want to know everything possible about what it does, so we can provide internal quality reports, ensure we meet SLAs and, last but not least, assure customers of regular, reliable backups.
Restore testing is often overlooked, but certainly not here! Regular (hopefully scheduled) restore tests are an essential, core part of our plan, so we can be sure those backups actually work. As someone else we know found out at great cost, backing up isn't enough!
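One simple way to check a test restore, sketched in Python purely as an illustration (it isn't part of Backup Exec or of anything we've built yet): restore to a scratch folder, then compare checksums against the source. The function names and folder arguments are made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_root, restored_root):
    """Compare a restored tree against its source.

    Returns (missing, mismatched): relative paths absent from the restore,
    and relative paths whose contents differ from the source.
    """
    source_root, restored_root = Path(source_root), Path(restored_root)
    missing, mismatched = [], []
    for src in sorted(p for p in source_root.rglob("*") if p.is_file()):
        rel = src.relative_to(source_root)
        restored = restored_root / rel
        if not restored.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(restored):
            mismatched.append(rel.as_posix())
    return missing, mismatched
```

On a live file server some mismatches are expected, since files change between the backup and the test, so the output is a list of things to look at rather than a pass/fail verdict.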
Labels:
caso,
continuous protection,
restore testing
Friday, 23 November 2007
And now, for CPS...
Something truly amazing has started happening, as Backup Exec is *still* running, 6 days in, and it looks like we may have turned a corner with it in terms of making it think, act and work like a competent backup solution.
As a result we've been able to look at other things. My colleague tried the "CPS", or Continuous Protection Server, part of Backup Exec some time ago and found it was pretty good stuff (it worked straight out of the box), so we figured we'd give it a try. After all, it would be more than a little useful if we could just replicate data from one site to another as part of a backup system, as it would boost our protection against failure of one of our key servers.
Installation of the CPS Server was easy, and worked first time. However, getting it to talk to our remote server was a little more difficult, as it's on a different subnet, and, as a bonus, the CPS Server also happens to have multiple NICs on different networks. To annoy us, CPS decided to pick the wrong IP to bind to (and there's no indication it will do this, nor any GUI to choose it).
It's OK though, easily fixed - with a famous registry edit to set a "PreferredAddress" on the CPS Server, and, on the CPS machine being copied, changing its "Gateway" address for the Veritas software to be the IP of the CPS machine and not its host name (even though it resolves to the same).
Right now we have a working CPS job doing the initial copy of our data, after which it should continuously protect... rock on.
Tuesday, 20 November 2007
I'm still in a dream
I must be, because we've still got 3 working Backup Exec Media Servers, CASO is working, backups are working and, with a couple of exceptions, they all run.
We've got 2 troublesome jobs which fail as they just have issues spitting the files out to the Media Servers, and we'll work on those. There's also one that fails on System State every day: despite Windows itself being able to back it up, Backup Exec insists that "a failure occurred reading an object" and later claims that "Shadow Copy Components" is a corrupt file.
So right now, we've had 3 days and 14 hours of running backups - you know, we may even get to run test restores because it's not crashed.
Sunday, 18 November 2007
Did we hit April 1st? What has been done with the Real Backup Exec?
In a brave move, given the near 3 day successful run with Backup Exec over last week, and given that I was more interested in seeing Bill Bailey live over the weekend than staring into the "Alerts" list on Backup Exec, I figured we'd just see if it could cope again.
Amazingly, it has. So far. It's now late Sunday evening but everything is running and, to top it off, I've got 3 media servers online, running, and doing what they're designed for. We've even found time to expand the storage on the smallest of the servers from around 1TB to 3TB, ready for when it starts to get a real workload.
Thursday, 15 November 2007
"d" is for "tape"
Something is wrong. Very wrong. Backup Exec is still working. That's over a day. It's not crashed or had a funny 10 minutes where it doesn't work. None of this "server paused" rubbish. Meanwhile, in the world of the admins maintaining it, we've had a good old debate about how the whole thing is supposed to work, and how Symantec interpret "tape" backups when they're actually disk backups.
The only thing we did resolve is that the "d" in Backup Exec 10d means "tape". Yes, it means Tape. Which is probably how Symantec would have preferred we kept things.
Wednesday, 14 November 2007
A new morning, a new failure...
It's just another average morning, a little nippy outside, terrible morning TV still exists (when will it die?) and I've got a raft of Backup Exec failures on my hands.
It seems that one of our sites has a condition I'll call "fussity", whereby the CASO server will submit to it the jobs it's supposed to do and, for about 90% of them, it'll just spit the dummy. It throws "loading media" at you for a random period (could be 2 mins, could be 20 mins), then eventually gives up and fails the job. Only for the next one to work. And then the next 9 to fail, and so on... The last time we had a case of "fussity", the only way to stop it being so picky about jobs was to completely uninstall, rename the server, reinstall, re-apply the service pack, and then re-create your devices, media, jobs and so on. I'm hoping to avoid it this time.
Meanwhile the main site figured it wasn't in the mood last night, until around 8:30pm when I rebooted it. It just kept "recovering" its own jobs. Good stuff.
All in a day's admin of Backup Exec.
Tuesday, 13 November 2007
A New Strategy
I'm going to try a new strategy. We're going to create new jobs, one by one, for the servers. Each job will back up using old style Full/Incremental jobs, to the local server, with basic 4 week rotation. Not even synthetics now, so we can kiss goodbye to even more space. Thanks Backup Exec!
Let's see in a few days if that works (see the lifecycle post below) - I'm at the "hope" stage again here...
Queued. You mean "lost the plot"
Can we have a drum roll please....? The reason for our lovely "queued" status today is that at some point, when the CASO box rolled onto its belly and refused to play nicely with all the other devices, it managed to leave a "piece" of "media" in use in most of the Backup to Disk folders. End result - the "maximum concurrent jobs" limit has been reached.
The only fix that seems to work is completely restarting the services (if you just restart Device and Media then the box tends to give up and never run a backup again). However, the trouble with this plan is that there are a couple of jobs running I want to finish first.
We could of course up the concurrent jobs on each device but it gets a bit tedious moving the limits up and down every day.
Really of course, it should JUST BLOODY WORK.
I'm queuing in the rain...
I woke up this morning half expecting Backup Exec not to be working, and, as usual, it isn't. There are 2 possible causes for this:
a) My colleague's theory that Backup Exec works as long as you are physically staring at the Job Monitor is, in fact, completely true, and therefore, because last night I figured I had other things to do and didn't watch it, I must now suffer its failures.
or
b) (and this is my most likely theory) it's just a crock of craaaaaaap.
I'm now staring at the screen and, in typical fashion, with no useful or meaningful reason, it's just "Queued" for about 15 jobs, all of which have been sitting there for 9 hours. Just queued. No alerts, no reasons. But there is one job running, that's been running for 6 hours, and is slowly (very slowly) notching up the byte count.
I am going to completely lose the will to live with this software soon.
Monday, 12 November 2007
More on "Loading Media"...
The earlier problem with "Loading Media" appears to be a false alarm - while the CASO server (the one which is supposed to mean we can manage things centrally without needing to constantly login to all the other boxes) swears blind the job status is "Loading Media", logging onto the actual media server reveals it's working away backing up quite happily.
Yet again the CASO option proves it's worthless.
Sunday, 11 November 2007
And so it begins...
So here we are with a completely failing Backup Exec setup. As usual, it's having random fits and failures. This weekend was going quite well, or rather, "well" in Backup Exec terms.
We had 1 out of 3 media servers running, but the one that was running was happily dealing with jobs it was assigned, until 16:30, and then, for no reason at all as far as we can see, it did the usual "Server Paused" thing. You know, the one where it dumps jobs you've had running for hours for no real reason, and just sits like a lame duck until you reboot it (because nothing else works).
Meanwhile box number 2 is still in "paused" status via the CASO tool, because hell, asking it to be enabled causes jobs to just be sent into an infinite loop of "Queued" and "Ready, On Hold" mixed with a load of failures. Good stuff Symantec.
Box number 3 however is really screwed. We've ended up having to reinstall and then, when that failed, uninstall and reinstall from scratch. Of course, that's really upset things because, despite following the instructions, the orphaned "managed media server" is now stuck on the CASO box, as is the newly reinstalled one of the same name, and so, you know, for good measure, are all the non-existent "Backup to Disk" drives, which it steadfastly refuses to remove.
No point calling Symantec for support, they never actually have an answer unless it's the most basic of problems.
So what shall we do?
I'm going to uninstall box 2 completely, rename the box on the network so it can't be confused anymore, reinstall, apply Service Pack 4 for Backup Exec 10d and then configure it. Once that's done I'll try a backup job, and if that runs (yeah, because it's so likely to work first time...) I'll try putting it into managed media server mode so we can actually use it properly.
This is the first post here but, inspired by a guy (see http://y33dave.wordpress.com/) who maintained a blog of the miserable journey he went on before eventually ditching Backup Exec for CommVault, we'll be keeping this Backup Exec Hell blog going...