Rolling out software is a tricky business. Any change, whether a major software release or a security patch, runs the risk of impacting user productivity. In order to manage this risk, IT administrators often choose to expose software changes to their estate in incremental stages.
The number of release stages IT administrators mandate for a software rollout will, however, often vary, even within one organisation. For example, a critical Windows update might move straight to full rollout without a single test, whereas a Java update that's required for access to critical business systems might be advanced through multiple stages.
To give you an idea of what this can look like, below I've detailed the four release stages we often apply to our software rollouts.
- Test
This stage releases the software solely to a few test machines. These machines are usually virtual, and are representative of the most common baseline software configuration. This stage represents the most basic quality control gateway for deploying the package.
- Initial Pilot
This stage exposes the software package to a small pool of local IT staff. In our environment, this sample represents a fraction of 1% of our computer estate. The aim here is to catch those initial 'gotchas' using a pool of technically savvy users with 'standard' configurations. The confidence level in a successful rollout is medium. The combination of small sample size and user base, however, means that the impact of a failed pilot is relatively low and rollback is relatively painless.
- Extended Pilot
This stage extends the pilot to include machines which are more representative of the target estate. Here the confidence level in a successful rollout is high, and rollback becomes user-impacting. In our environment, this sample represents perhaps 5% of our computer estate.
- Full Rollout
All machines are now targeted with the software rollout. The confidence level in a successful rollout is very high, and the impact of rollback in the event of an issue emerging is high.
Over the years this system worked pretty smoothly for us, with the exception of one niggle: the management overhead of the Extended Pilot group. Initially this pilot group consisted mostly of volunteers who were selected to be representative of our estate. When our customer base was small, the manual processes for keeping abreast of staff and PC changes were manageable. However, as our customer base grew and became increasingly distributed, the process of reinvigorating the group to keep it from going stale became a major pain point.
Eventually, I realised this pilot group in its current form was no longer fit for purpose, and a rethink was required.
Extended Pilot Group Criteria
For the re-think, I went back to basics. What was it I ultimately wanted? Well, ideally I'd want a pilot filter which was maintenance-free and full of active computers. This would enable me to reliably pilot managed software deliveries for Java Run-time Environment rollouts, or IE upgrades etc.
This meant my group membership had to follow three basic criteria:
- Pseudo-random - so that it was representative of the computer population
- Dynamic - to automatically exclude retired or inactive machines
- Scope - to restrict the member total to a percentage of our computer estate
Construction
To figure this out I used the tools available to me: Google and, critically, the Symantec Connect community.
I decided a straightforward way of picking computers consistently in a pseudo-random manner was to use the computer's GUID. The GUID is randomly assigned to each computer when it first checks in to the SMP. So if I were to create a filter of my computers ordered by GUID, the top 10 would represent a consistent and pseudo-random subset.
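As a rough sketch of that idea, assuming the Guid column of the vComputer view used later in this article, ordering by GUID and taking the first ten rows returns the same machines each time it runs, as long as the computer population is unchanged:

```sql
-- Sketch only: the ten computers with the lowest GUIDs.
-- Because GUIDs are assigned randomly, this ordering is not tied to
-- location, hardware or user.
SELECT TOP (10) vc.Guid
FROM vComputer vc
ORDER BY vc.Guid;
```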
Next, to eliminate inactive machines I should restrict membership to computers which were regularly checking in. To obtain computers that had checked in within the last ‘n’ days, I resorted to copying and pasting a chunk of SQL previously written by my colleague Ian Atkin:
SELECT guid
FROM vComputer vc
join resourceupdatesummary rus
    on vc.guid = rus.resourceguid
    AND rus.inventoryclassguid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'
WHERE Datediff(dd,rus.[modifieddate],Getdate()) <= 7
This T-SQL provides a nice list of computers which have checked in within the last 7 days.
The question now was how to build in my last criterion: making the membership scale. That initially seemed simple: use the T-SQL 'SELECT TOP' to limit the returned rows to the top 5%.
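The original TOP query isn't reproduced in this article, so the following is only a reconstruction of what such a filter might look like, built around the check-in query above:

```sql
-- Reconstruction, not the exact query that was tested: the first 5%
-- (by GUID) of computers that have reported in during the last 7 days.
SELECT TOP (5) PERCENT vc.Guid
FROM vComputer vc
JOIN ResourceUpdateSummary rus
    ON vc.Guid = rus.ResourceGuid
   AND rus.InventoryClassGuid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'
WHERE DATEDIFF(dd, rus.ModifiedDate, GETDATE()) <= 7
ORDER BY vc.Guid;
```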
So testing began, and initially it worked beautifully. However, there seemed to be a bug in the SMP membership update processing, which meant that the total returned just grew and grew each time the filter was automatically refreshed as a result of being in a live policy target. I looked to the Symantec Connect community[1] and, being as amazing as they are, they gave me a result: use NTILE.
I had never heard of NTILE before, and although ‘ericg2’ had given me sample usage, I needed to read up on what it did to figure out how to crowbar it into my SQL. If you haven’t seen this before it’s very well explained all over Googleland, but suffice to say it splits the results into ‘n’ percentiles and you can select just one of them. I wanted about 5% so I selected the first of 20 ‘NTILE’s.
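For anyone who wants to see NTILE in isolation first, here is a small self-contained sketch that runs without any SMP tables. The @Demo table variable and the PC names are made up purely for illustration:

```sql
-- Made-up data: ten hypothetical computers, each given a random GUID.
DECLARE @Demo TABLE (Guid uniqueidentifier, Name varchar(20));

INSERT INTO @Demo (Guid, Name)
SELECT NEWID(), 'PC-' + CAST(nums.n AS varchar(10))
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS nums(n);

-- NTILE(5) gives each row a tile number from 1 to 5 in GUID order;
-- with ten rows that works out at two rows per tile.
SELECT Name, Guid, NTILE(5) OVER (ORDER BY Guid) AS tile
FROM @Demo
ORDER BY Guid;

-- Keeping a single tile returns one fifth (about 20%) of the rows.
SELECT d.Name
FROM (SELECT Name, NTILE(5) OVER (ORDER BY Guid) AS tile FROM @Demo) d
WHERE d.tile = 1;
```

When the row count does not divide evenly, SQL Server makes the lower-numbered tiles one row larger, so tile 1 is never smaller than the others.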
The SQL
Anyway, enough rambling, here is the finished crafted SQL.
select vc.guid
from vcomputer vc
join (
    select ntile(20) over (order by guid) AS "ntile", guid from ----- ntile 20 is about 5%
    (
        SELECT guid
        FROM vComputer vc
        join resourceupdatesummary rus
            on vc.guid = rus.resourceguid
            AND rus.inventoryclassguid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'
        WHERE Datediff(dd,rus.[modifieddate],Getdate()) <= 7
    ) xxx
) "grp" on vc.Guid=grp.Guid
where grp.ntile=1
Console View
Variants
It is simple to change the quantity of computers: just change the NTILE number. We also have another pilot filter that contains 50% of the computers that checked in within the last 7 days. The NTILE difference is:
select ntile(2) over (order by guid) AS "ntile", guid from ----- ntile 2 is about 50%
It's probably easiest to think in terms of fractions: ntile 2 = 1/2 = a half. Our 5% is ntile 20 = 1/20 = one twentieth.
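To see what a given NTILE number means in a particular environment before committing to it, one option is to count the computers that land in each tile. This is a sketch that reuses the view, table and inventory class GUID from the filter above; the tile, computers and active aliases are illustrative:

```sql
-- Count how many recently active computers fall into each of the 20 tiles.
-- Change 20 to try other fractions (10 is about 10%, 4 is about 25%).
SELECT grp.tile, COUNT(*) AS computers
FROM (
    SELECT NTILE(20) OVER (ORDER BY active.Guid) AS tile, active.Guid
    FROM (
        SELECT vc.Guid
        FROM vComputer vc
        JOIN ResourceUpdateSummary rus
            ON vc.Guid = rus.ResourceGuid
           AND rus.InventoryClassGuid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'
        WHERE DATEDIFF(dd, rus.ModifiedDate, GETDATE()) <= 7
    ) active
) grp
GROUP BY grp.tile
ORDER BY grp.tile;
```

The row for tile 1 shows how many computers the pilot filter would currently contain.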
Advantages and Disadvantages
Before I leave this, I should point out that whilst this approach resolves most of our previous issues, it isn't perfect.
Pros
- Maintenance free - as your estate grows so will the number of computers returned by a 5%-active query. You might want to play with this figure in your environment to find a percentage that you are comfortable with.
- Fire and forget: any policy with this filter in its target will get a good set of results back. I have confidence that around 100 computers will definitely check in and get my policy.
- Self-renewing: if someone is on holiday, their computer will drop out of the filter after 7 days. So the filter is always (nearly) full of current computers.
- Consistent membership - mostly: because the query sorts by GUID, the top 5% of computers will mostly be predictable, provided they keep checking in.
Cons
- Inconsistent membership: clearly this contradicts the last 'pro' above, but it is important to note that this is dynamic and the exact computers will change pretty much every day. A few stragglers will drop out and be replaced regularly. You may get a computer that is targeted by the filter and receives the policy, but then drops out of the filter. That computer has the update or software, but you may not know it, because it won't show up in the compliance view for the policy. As long as this is kept in mind, there should not be any real surprises. We use central logging for all our deliveries so we can see from there which computers actually got targeted.
- Extra work for rollbacks: in the event of requiring a rollback of your pilot, you can't just apply the rollback script to the same target containing the pilot filter because the computers change. So the target for the rollback policy will have to specifically target the computers based on something that the pilot did, like an inventory item added - for example an add / remove programs entry.
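As an illustration of that last point, a rollback filter could select computers by what the pilot left behind rather than by pilot membership. The sketch below assumes the Add/Remove Programs inventory is held in a table called Inv_AddRemoveProgram with _ResourceGuid and DisplayName columns, and it uses a placeholder product name; check all of these against your own database before relying on it:

```sql
-- Assumption: Inv_AddRemoveProgram, _ResourceGuid and DisplayName are the
-- table and column names for Add/Remove Programs inventory in your SMP.
-- 'Example Product 1.2.3' is a placeholder, not a real package name.
SELECT DISTINCT vc.Guid
FROM vComputer vc
JOIN Inv_AddRemoveProgram arp
    ON arp._ResourceGuid = vc.Guid
WHERE arp.DisplayName = 'Example Product 1.2.3';
```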
Hope this provides some food for thought for you all out there. Happy Piloting!
[1] http://www.symantec.com/connect/forums/altiris-75-filter-sql-very-strange-behaviour
Darren Collins
Applications Packaging and Deployment for IT Services,
Oxford University, UK.