
Operational Readiness Review for SDSS Data Processing Pipeline

Fermilab, 13 October 1998


 

Review committee:

Todd Boroson – NOAO

Liz Buckley Geer – FNAL/CDF

Roc Cutri – IPAC, Chair

Mike Diesberg – FNAL/DO

Rodger Doxsey – STScI

Mark Fischler – FNAL/PAT
 

I.   General Comments
 

The Review Board is extremely impressed with the work that has gone into the development of the pipelines to date.  The advanced level of the Photometric Pipeline and its readiness for transition into operational mode are evident from the preliminary scientific analyses that have been carried out on the photometric data (i.e. color-color plots).  The current state of the Imaging Pipeline is particularly advanced; the fact that test data from the telescope could be run through the pipeline with essentially no problems is very encouraging and a testament to the readiness of the system.  It is recognized that other aspects of the pipeline, such as the spectroscopy portion, lag behind in development, consistent with the delivery schedule of the instruments.
 

The Board also commends the Project for moving to address two other key issues:
 

1. The hiring of a Survey Operations Director.  A widely distributed project such as the SDSS very much needs direction from a single person who is directly responsible for monitoring the survey schedule and budget.  He or she must be able to make critical decisions, in the context of the main Scientific Requirements of the Survey.  Having a single director will also allow expedient arbitration in the event that critical decisions must be made and there are conflicting opinions among the Science Team.  The Board hopes that the partner institutions agree to cede the authority to the Operations Director to carry out these tasks.
 

2. The Board strongly urges the completion and adoption of a well-defined set of Science Requirements for the Survey.  It is absolutely essential to have such Requirements to provide engineering and scientific “targets” for hardware and software development.  The Requirements should allow the formation of metrics against which to judge the progress of the survey, and should form the basis for critical Science Team decisions.  Achievement of the Science Requirements provides the context for most decisions in the Survey.
 

II.  The section below contains responses to the specific questions regarding the data processing effort put forth in the charge for the review committee.
 

1.  Has the Project properly scoped the work?
 

Basically yes.  The one processing effort that has probably been underestimated is quality assurance.  The SDSS data come from several different instruments and will be fairly complex, requiring an appropriate set of quality review tools, both automated and human-assisted.
 

2. Resources - is the staffing plan adequate for the scope of work?
 

The general feeling among the Board is that the staffing plan is marginally adequate.  As a baseline, we note that the data processing staffing estimates are comparable in number of FTEs to those of the 2MASS effort.  However, the SDSS data processing task is inherently more complex, involving three completely different pipelines dealing with data from three very different instruments.  One advantage the SDSS effort has is that considerable software development is being supplied by the partner institutions.

 
The Board feels that the needs for Quality Assurance in particular have been seriously underestimated.  Experience with the 2MASS pipeline operations indicates that this is a time-intensive process that benefits from human intervention.  There is a strong need, especially early in the survey, to identify unforeseen problems in the data as quickly as possible, whether they originate in hardware or in software.

 

3. Is the skill mix matched to the scope of the work?
 

- The Board suggests that the Project consider making more use of skilled non-Ph.D. personnel for tasks that do not require scientific judgements.  These can include pipeline operation, integration testing and repetitive data analysis.
 

However, there are key tasks involving judgements that affect the validity of the resulting science products, and these require a trained scientist.  It is also recognized that Ph.D. astronomers and physicists bring considerable "value added" benefits to the project.
 

- The Board is somewhat concerned with the operational plan that requires the intervention of a scientist at each intermediate step of the pipeline.  This makes the processing rate and efficiency strongly dependent on the availability and schedule of the scientists.  It is recognized that this configuration will be necessary at the beginning of operations, but the Project should consider moving toward a more “hands-off” pipeline system if it is deemed possible after some experience with operations has been gained (i.e. the Test Year).
 

- The Board recommends that the project move to a full end-to-end test of the processing system as soon as possible during the test year operations.  The test should cover everything from shipping tapes from the observatory, through processing, quality assurance and loading into the archive, to feedback to the observatory.  Such a test should be repeated as often as necessary during the test year to optimize the pipeline operations.  Tests of the Imaging, Calibration, and Spectroscopic pipelines can be performed separately.
 

4. What deficiencies are seen in the Operational Plan?
 

- The description of the staffing plan (XX FTEs at YY%) is somewhat confusing.  The project would benefit from a clear organization chart and clear reporting hierarchy.
 

- Need a clearer description of the division of labor and assignment of tasks.
 

- The Board strongly recommends a single operations manager to manage the survey schedule and make top level task assignments.
 

- Recommend firming up the support that will be provided by the partner institutions. The Board recognizes that there are practical limitations to the number of staff that can be stationed at FNAL and at partner institutions.
 

5. Is the operational/implementation schedule reasonable?
 

- A formal schedule was not presented at the review.  It was not clear what the timetable is for taking all sections of the pipeline to level 2b.  Specific schedule goals should be defined for getting the pieces to 2b.  Will Survey operations start when only the photometric pipeline reaches level 2b?
 

- The Board suggests that the project decide soon whether the Imaging phase of the survey can begin in the near term, and whether it is possible to live with a ~1 year delay in the start of the Spectroscopic phase of the survey.  This scenario will likely result in a much more efficient use of resources and the best chance of completing the survey in an acceptable period of time.
 

- The Project would do well to allow schedule contingency for software changes that must be made in response to hardware changes/improvements.  This will probably be most critical early in operations while instruments are still essentially in shake-down.  However, changes to hardware can occur any time during the survey.  (e.g. an array may have to be replaced, so time will be needed to recharacterize it and modify pipeline parameters if necessary.)
 

6. Is the existing hardware, with planned upgrades, sufficient for the task?

7. Are the throughput and reprocessing margin adequate?
 

- Marginal.  Need better benchmarks.  Perhaps the best assessment will come with the recommended end-to-end tests.  Time sinks are often found in the least expected places.
 

- The Board was somewhat concerned by the apparent lack of contingency in disk space that would be necessary as buffer space while waiting for QA review.  It will prove very inefficient to have to spool processed data to tape in order to avoid holding up the pipeline processing.

- Is there a budget/schedule for upgrades to the hardware system?  Should a data analysis system be added to the hardware plan?
 

- Consider plan for hardware back-up.  Also, make sure that the observatory can operate autonomously while waiting for the processing system to come back on line.
 

- Reprocessing margin seems adequate.  However, the Board recommends that the project consider more efficient piping of individual pieces of the pipelines to minimize need for human intervention.
 

8. Is the methodology for software maintenance sound?
 

- The configuration control plan should be in place before the start of operations.  How will changes be approved?  Will there be a change control board?
 

- Need to design a strategy for upgrading software.  When will new versions be created?  What circumstances will require deliveries of new software versions?
 

- Recommend developing Regression Test Baseline (RTB) data sets as soon as possible to validate software modifications.  Current test data is probably suitable for the photometric pipeline.
 

- Which person on the staffing chart is responsible for integration testing for pipeline software?  If it is B.1.1. (0.1 FTE), then it is probably underscoped.  As the project moves into the operational phase, changes will likely come fast and furious, so there is a need to test modifications rapidly.
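
As an illustration of how an RTB check might work, the following is a minimal sketch only.  It assumes pipeline output can be reduced to a catalog keyed by object ID; the function name, the catalog layout and the tolerance are all hypothetical and are not taken from the SDSS software.

```python
import math


def compare_to_baseline(baseline, candidate, rel_tol=1e-6):
    """Return a list of differences between a baseline and a candidate catalog.

    Both catalogs are dicts mapping an object ID to a dict of measured
    quantities.  This layout is an assumption made for the sketch.
    """
    problems = []
    # Objects that the modified software lost or newly created.
    for obj_id in sorted(baseline.keys() - candidate.keys()):
        problems.append(f"{obj_id}: missing from candidate")
    for obj_id in sorted(candidate.keys() - baseline.keys()):
        problems.append(f"{obj_id}: not in baseline")
    # Objects present in both: compare each baseline quantity within tolerance.
    for obj_id in sorted(baseline.keys() & candidate.keys()):
        for field, expected in baseline[obj_id].items():
            actual = candidate[obj_id].get(field)
            if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
                problems.append(f"{obj_id}.{field}: expected {expected}, got {actual}")
    return problems
```

An empty list would mean the modified software reproduces the baseline within the chosen tolerance; any entries would be reviewed before the change is accepted.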
 

9. Are plans for communications between APO and Fermilab sufficient?
 

- The Project should consider development of some rapid response tools to monitor the health of the telescope and instruments.  These can deal with quick-look data.
 

10. Is the management approach sensible?
 

- The Board recommends that the project consider appointing a single point of contact responsible for oversight of the data processing effort.  This person would report to the Survey Operations Director, and would be the key contact point for all matters dealing with data processing.  He or she should manage the schedule for software milestones, and be responsible for prioritizing software development, following the Survey Director and Science Team recommendations.  Having a single person responsible for this will focus interactions regarding the pipeline development and will minimize distractions for the software engineers.
 

11. Survey Operations Director
 

See item 1 under General Comments above.
 

12. Database/Archive
 

- The Board was concerned that there did not appear to be a good way to scope the hardware and software requirements of the archive.  The project might consider building a simulated database that can be tested on the project hardware and software platforms.  Query performance can then be tested explicitly.
 

- The project scientists should outline a set of standard sample queries based on some of the key science drivers of SDSS.  These queries can in essence become the Archive RTB.
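
The two suggestions above could be combined along the following lines.  This is a sketch only: it uses an in-memory SQLite database purely as a stand-in for the archive, and the table layout, the synthetic values and the two sample queries are hypothetical placeholders rather than actual SDSS schema or science queries.

```python
import sqlite3
import time


def build_simulated_archive(n_objects):
    """Create an in-memory table of synthetic objects (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, dec REAL, g REAL, r REAL)"
    )
    # Deterministic synthetic values so that repeated runs are comparable.
    rows = (
        (i, (i * 0.37) % 360.0, (i * 0.11) % 90.0,
         15.0 + (i % 80) * 0.1, 15.0 + (i % 70) * 0.1)
        for i in range(n_objects)
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Placeholder queries; real ones would come from the project scientists.
SAMPLE_QUERIES = {
    "bright_objects": "SELECT COUNT(*) FROM objects WHERE r < 16.0",
    "red_objects": "SELECT COUNT(*) FROM objects WHERE g - r > 0.5",
}


def run_query_baseline(conn, queries):
    """Run each query once and return {name: (result, elapsed_seconds)}."""
    results = {}
    for name, sql in queries.items():
        start = time.perf_counter()
        value = conn.execute(sql).fetchone()[0]
        results[name] = (value, time.perf_counter() - start)
    return results
```

Running the same query set against simulated databases of increasing size would give a first indication of how query time scales, and the recorded results could serve as the baseline for later archive software changes.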
 

III.  Miscellaneous Comments
 

1. Quality Assurance – Need a well-defined and agreed-upon set of Science Requirements to provide metrics for Quality Assurance.
 

2. Tape Operations – Consider implementing tape compacting procedure to control tape costs.  Also, review archiving procedures.
 

3. Operations – There is a lot in the operations plan that requires JIT (Just In Time) decision and action.  What will happen if FNAL goes to a no-weekend support schedule?
 

4.  Don’t preclude the possibility of ever making the full image data available to the public.  That is, don’t do anything now that would make it impossible to retrieve those data.