Avoid wind and welds for successful manufacturing
I've been intermittently tied up with two different challenges over recent months. Both were concerned with industrial quality issues but, despite that superficial semblance of a link, they couldn't have been more different. One was in a large plant, part of a regional economic development plan with government start-up funding; the other concerned a small village co-operative. One could offer me live access to its own data stores. The other had no process records or quality assurance expertise of any kind, until I started to design some; no telecommunications either, and the staff would have to do most of the work themselves under my intermittent direction. Both were concerned with manufacture of containers: milled steel in one case; fired clay in the other. Losses in the first, though so eye-wateringly large that my contingency fee was a pinprick, were being covered by deficit funding in the interest of public relations; those in the other were tiny, but would quickly cripple the co-operative, if not stemmed.
The two cases called for different approaches to a similar question: what production factors led to delayed fracture faults in a significant minority of the finished units?
I had encountered a detective-style case similar in some ways to the milled steel problem a couple of years ago, putting a review copy of Statistica Data Miner release 6 to good use. I learnt a lot on that job, and applied the knowledge to the current concerns. It also happened, by a happy coincidence, that Statistica release 7 had just come to hand as this current pair of challenges hove into sight - so I earmarked it immediately for the larger-scale plant. The small village enterprise, though, called for a very different approach: something that could be applied by inexperienced and untrained users, so that every stage didn't have to wait on communications back to me. I had a copy of SigmaStat 3.0 lying around, which had impressed me by the ease with which it was accessed by novice users during a trial run in another industrial context; Systat International advised that a new 3.1 update was available, and sent a copy for use in the co-op.
- In the background, at top and left, Statistica 7.0 signposts first steps on opening. SigmaStat 3.1 shows its simple, straightforward layout, including the project manager pane, foreground at bottom right.
Statistica 6.0 appeared in serial instalments - the base product first, with greyed-out menu options pointing to later additions, such as quality control charts, the data miner, and so on. With the exception (in my copy, anyway) of sequence, association and link analysis, 7.0 sprang fully formed from ... well, from whatever mythologically metaphorical source software springs fully formed. In particular, the Statistica Data Miner (SDM) is there, including the new Text Miner (STM). I have been (as anyone who reads my reviews will be wearily aware by now) a reluctant convert to data mining; but converted I am, since it allows a lot of ground to be opened up to intuition, very quickly generating a useful volume of possibilities for closer examination by more traditional means. Last time round, with version 6, I had only the data miner itself and so only formal quantitative or qualitative data tables were easily available. This time, with STM, I was able to trawl for clues through records such as memos, reports and so on - in fact, almost anything on the plant network - and that was, in the event, how one vital piece of the jigsaw was found.
I'm a great believer in horses for courses, and SigmaStat, a very different breed from Statistica, was the appropriate choice for the ceramics group. Even quality-assurance professionals are not always entirely happy with statistical analysis; people with no background in quality control beyond a quick visual assessment of the finished product, even less so. Friendly is an over-used description for software, but if any analytic product deserves it, then SigmaStat does. Try, for example, doing a t-test on unsuitable data that fails the normality test. SigmaStat neither pushes on regardless to a meaningless answer, nor primly displays a minimal error message. Instead, it calmly mentions the problem and asks if you would like to run a Mann-Whitney rank sum test instead. All you have to do is say 'yes'... you get a Mann-Whitney result, some helpful background information, and a plain English explanation of what it means. Everything is presented in a neat, clear, rich-text report, ready for printing out on half a sheet of paper. Better still is another equally helpful half sheet on why that t-test wasn't a good idea, so you can learn for next time. Nor would my potters have reached that attempt on a t-test unaided. The 'Statistics' menu, instead of the usual maze of statspeak, is organised along task-oriented lines. The first decision is not between tests named after long-dead researchers they last heard of in college (if at all), but between things they might want to do. Pick from headings like 'describe data', 'compare two groups', or 'before and after'. I don't want to suggest that the user never has to do anything. This isn't nursery school; they have to come up against the nitty-gritty eventually, but they will at least arrive at the task through sensible guidance, with helpful advice at their side, and an appropriately selected toolkit to hand.
All of this helped to maximise their autonomy, increasing their ability to manage phases between my necessary interventions.
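For readers who would like to see the shape of that 'compare two groups' fallback, here is a minimal Python sketch. It is an illustration only, not SigmaStat's own code. The normality check used by SigmaStat is not named here, so a Jarque-Bera screen is swapped in purely because it is easy to write with the standard library (it is an asymptotic test and crude for small samples). The Mann-Whitney p-value uses the normal approximation without tie correction, and the function names and sample figures are all hypothetical.

```python
from math import erfc, exp, sqrt
from statistics import mean, pstdev, variance

def jarque_bera_p(sample):
    """Approximate p-value of a Jarque-Bera normality screen (assumed stand-in)."""
    n, m, s = len(sample), mean(sample), pstdev(sample)
    skew = sum((x - m) ** 3 for x in sample) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in sample) / (n * s ** 4)
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return exp(-jb / 2)  # chi-square survival function with 2 degrees of freedom

def mann_whitney(a, b):
    """Return (U for group a, two-sided p) using midranks and a normal approximation."""
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    rank_sum_a, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))

def compare_two_groups(a, b, alpha=0.05):
    """Offer the rank sum test when either group fails the normality screen."""
    if min(jarque_bera_p(a), jarque_bera_p(b)) < alpha:
        u, p = mann_whitney(a, b)
        return "Mann-Whitney rank sum test", u, p
    # Welch t statistic only; a p-value would need a t distribution,
    # which the standard library does not provide.
    t = (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))
    return "t-test", t, None

# Hypothetical wall thicknesses (mm) from two kiln batches; one gross outlier.
batch_1 = [2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.1, 2.2, 2.3, 9.8]
batch_2 = [3.0, 3.2, 3.1, 3.3, 2.9, 3.1, 3.0, 3.2, 3.1, 3.0]
print(compare_two_groups(batch_1, batch_2))
```

The outlier in the first batch is what trips the normality screen, so the sketch reports a rank sum result rather than a t statistic.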
- Composite showing stages in a random forest generation by Statistica. From the data sheet (background, top left) process continues across the frame foreground from left lower centre to right upper centre, leading to the tree screen (background bottom right).
Usability, albeit for a different market, is also a feature of Statistica - and has gained new dimensions in release 7: the new 'by group' facility, for instance; project organisation; extension of spreadsheet behaviour, and metadata provision. Always very flexibly configurable, the working environment becomes even more so.
'By group' enables easy, automated generation of graphical or analytic results for categorised subsets. Selecting the variables 'supervisor' and 'lathe' for by-group use, for example, causes the desired analysis to be repeated for every combination of lathe and supervisor: data for lathe 1 under supervision by A; for lathe 1 under supervision by B; lathe 2 under supervision by A; and so on. Not a startling new idea, but effective streamlining of an existing one: systematic compilation of an overview is much quicker and much less tedious, encouraging thoroughness. Extended brushing facilities and enhanced operational access to text operations (including formula and case handling) offer related gains; so do further enhanced sheet management options. A close and methodical control of process is also encouraged by the project organisation tools, most of which will be familiar in essence but are no less welcome for that. Automatic updating of linked visualisations to reflect changes in the data set, for instance, is part of a move to make the data sheet more like generic spreadsheets. Last but by no means least comes metadata on analyses, cases, variables and visualisations - property information stored alongside the particular object and available to a range of selection and formatting operations. In my particular case, the primary use of metadata was to flag, track, and utilise through subsequent explorations those subsets identified as possible but unconfirmed error zones; another modality added to the data sleuth's psychoperceptual toolkit.
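The by-group idea itself is easy to sketch outside Statistica. The Python fragment below is a hedged illustration, not Statistica's implementation: the records, the measured quantity, and the function name `by_group` are all invented for the example. It repeats one analysis (here simply a mean) for every combination of lathe and supervisor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (lathe, supervisor, measured value).
records = [
    (1, "A", 2.0),
    (1, "A", 2.2),
    (1, "B", 2.5),
    (2, "A", 1.9),
    (2, "B", 2.4),
    (2, "B", 2.6),
]

def by_group(rows, analysis):
    """Apply `analysis` to the values in every (lathe, supervisor) subset."""
    groups = defaultdict(list)
    for lathe, supervisor, value in rows:
        groups[(lathe, supervisor)].append(value)
    # Sort the keys so the overview is always listed in the same order.
    return {key: analysis(values) for key, values in sorted(groups.items())}

summary = by_group(records, mean)
for (lathe, supervisor), result in summary.items():
    print(f"lathe {lathe}, supervisor {supervisor}: mean = {result:.2f}")
```

Any function of a list of values can be passed in place of `mean`, which is the point of the facility: the grouping is written once and the analysis varies.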
SigmaStat's data worksheet also invites comment. It is one of the most intuitive around, for the majority whose operational norms have been shaped by MS Office. Sticking with organisational issues, there is also a welcome new SigmaStat 'Notebook Manager' with the now familiar explorer tree giving windowed access to open notebooks, graphs, worksheets, and reports. There are spreadsheet improvements, too, most of them contributing to a more Excel-like environment and greater, more intuitive flexibility. Transforms are fast, effective and easy to use - from milling and failure dates to a survival analysis report took me just eight mouse clicks, mostly on column headings. If you prefer to stick with the real thing, as my co-op users did, an Excel sheet (blank or full of existing data) can be opened inside SigmaStat, but the methodological advantages of using SigmaStat's own sheet are considerable. It is well designed, statistically aware and, at 32 million rows by 32 thousand columns, offers greater capacity than Excel.
User interface aspects aside, Statistica significantly extends the actual armoury on offer. First up for trial was the nonlinear iterative partial least-squares (NIPALS) algorithm which, to a useful extent, tames dimensionality in principal component and partial least-squares analyses. A number of graphical approaches (including, but by no means restricted to, a good repertoire of quality-control charts invaluable for the case at hand) are integrated with the analytics, giving a multi-textured perceptual space within which to explore. The aim is to simplify the ferreting out of significant patterns within complexly multivariate data sets; this 'scalability', allied to onboard cross-validation options and the simplified handling which results from combination of methods, saved a lot of time and paracetamol. After that came the 'miner' aspects (SDM, text miner, quality control miner) of an impressive Random Forests module - this, again, being very welcome in dealing with large, complex, multivariate data arrays. All aspects can be user-controlled, including tree complexity, forest population, run limits, independent testing sample for predictive validity, and so on, although intelligent defaults step in if required. There is, as in other modules, heavy graphical support, and growing large numbers of trees from a large dataset is a surprisingly speedy process, delivering heavy-duty information with great efficiency. Random forests and NIPALS, in collusion, unearthed several of the clues which led to eventual tracking down of the steel fractures problem.
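For anyone curious about what NIPALS actually does, the following is a minimal Python sketch of the textbook iteration for the first principal component only. It makes no claim to reflect Statistica's implementation; the function name, tolerance, iteration cap and toy data are all assumptions made for illustration. The scores `t` and loadings `p` are refined alternately until the scores stop changing.

```python
from math import sqrt

def nipals_first_component(rows, tol=1e-12, max_iter=500):
    """Return (scores, loadings) of the first principal component via NIPALS."""
    n, m = len(rows), len(rows[0])
    # Mean-centre each column.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Start the scores from the column with the largest sum of squares.
    start = max(range(m), key=lambda j: sum(x[j] ** 2 for x in X))
    t = [x[start] for x in X]
    p = [0.0] * m
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: regress each column of X on the current scores.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        norm = sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project each row of X onto the normalised loadings.
        t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t, p

# Toy data (hypothetical): the second variable is exactly twice the first,
# so the first component should point along the direction (1, 2).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
scores, loadings = nipals_first_component(data)
print(loadings)
```

Further components would be obtained by subtracting the outer product of `t` and `p` from the centred data and repeating. Because only one component is extracted at a time, the method copes with wide data sets without forming a full covariance matrix.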
- Multiple views of SigmaStat 3.1 in use. At top left, the task-oriented statistics menu with t-test selected (the stepped setup of a graph is partially obscured). Around the top right corner are the stages of setting up an ill-advised t-test, with the advisory response. Centre and lower right illustrate, through the format menu, the intuitively 'Excelesque' behaviour of the worksheet.
SigmaStat has put function expansion aside in this incremental release, but the previous full digit upgrade to 3.0 brought a survival analysis kit which is very capable given the intended audience. Since the co-op replaced every fractured pot, we were able to put this module to good use in quantifying the level of the problem. By introducing a serial numbering system, linking each pot to data about its entire history back to the digging of its clay, we were also able to make use of this returns and replacement policy to make a very accurate and complete population census of both fractured and non-fractured units.
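The heart of a survival analysis of this kind is usually a Kaplan-Meier estimate. The Python sketch below assumes that is the method in play and uses invented figures, not the co-op's data. Each pot contributes a time on test and a flag saying whether it fractured (observed) or was still intact when last seen (censored).

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with a failure.

    durations: time on test for each unit (e.g. days since firing)
    observed:  True if the unit fractured at that time, False if it was
               still intact when last seen (censored)
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Gather every unit that fails or is censored at time t.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical pots: days until fracture, or days observed without fracture.
days = [5, 8, 8, 12, 20, 20]
fractured = [True, True, False, True, False, False]
for day, s in kaplan_meier(days, fractured):
    print(f"day {day}: estimated share still intact = {s:.3f}")
```

A complete census of fractured and non-fractured units, of the kind the serial numbering made possible, is what supplies the censored cases that this estimate depends on.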
A big difference between the two products, of course, is automation. SigmaStat is designed to be flown manually; it has a transform function language (extended in this release), but not a macro or control system. This was entirely in line with what the co-op needed: automation would have been neither feasible nor useful in an environment where all other activity was fluid. Statistica, on the other hand, has the impressive Visual Basic superset dialect, which replaced two previous proprietary languages a while back - and this too has been enhanced.
Import options vary between the two products, but each is appropriate to its market. Both have matching output options including good, easy-to-use reporting. It has for a long time been possible to produce paper reports straight from Statistica, without any other presentation software, particularly for internal use, and SigmaStat is also more than capable of delivering at that level. Both can now output PDF as well as HTML. While I have to say that I personally prefer a separate page-assembly stage in an appropriate specialised editor before producing final copy for publication, it's a useful facility for quick and easy sharing of in-line results. Both products make export to page layout a trivial task. A minor touch, which nevertheless gives me a disproportionate amount of quiet joy, is SigmaStat's willingness to set page sizes (US letter by factory default) in SI millimetres instead of the usual Hobson's choice of inches or centimetres. Graphs, like statistical methods, are much better handled by either product than by Excel. Point, click, wave your wand, and hey presto - selective subranges, comparative multiple plots, true histograms, the lot. If you have the latest release of its sibling product SigmaPlot (of which more in a future issue) on the same machine, another mouse-click in SigmaStat will call it up as a more advanced editor for the technical graphics that SigmaStat has generated for you. SigmaStat graphs, like the textual and statistical material, are page-oriented and can be moved straight into your final report; Statistica is similarly accommodated.
And the causes of the fractures? On the milled steel line, Statistica identified the indicators as a particularly obscure combination of milling machine sequence, output delivery rate, bar replacement time, time of day, and temperature. From there on, the detective work had to be extended to factors beyond the production line itself. The culprit turned out to be a poor spot-weld hidden beneath a conveyor surface, where no spot weld should be. Apparently, it was an ad hoc repair, never reported, to a smooth sheet that should have been replaced. Only in the particular set of circumstances identified, were components passing between production stages 'wobbled' by this in a way that interfered fractionally but significantly with the alignment of the immediately subsequent drilling phase.
Meanwhile, in the pottery, SigmaStat located a single critical variable: wind direction and strength outside the building, when damp clay is being prepared for use. We haven't yet figured out how this works, but experimental trials confirm the accuracy of the diagnosis. For the time being, production clay prep is abandoned for other tasks when the particular conditions prevail - and SigmaStat continues to analyse such things as airborne grit content, humidity, internal ventilation, and suchlike.