Delete calculation results in OQ-engine

Bug #1117052 reported by Marco Pagani
This bug affects 1 person
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: High
Assigned to: Lars Butler

Bug Description

OQ currently stores all the results of the performed calculations in the DB. However, some results are not relevant, and the user might wish to remove them from the DB. For this purpose, it would be useful to have a --dhc option in OQ.

Changed in openquake:
status: New → In Progress
assignee: nobody → Lars Butler (lars-butler)
milestone: none → 0.9.2
importance: Undecided → High
Lars Butler (lars-butler) wrote :

Here are a couple of key points I propose for the implementation of this feature:

- Deletion should only be performed by the `oq_admin` DB user.
- A user should only be able to delete `hazard_calculation` records belonging to himself/herself (see `hazard_calculation.owner_id`).
- All DELETE privileges should be removed for all tables and only granted to the `oq_admin` user, with the exception of tables in the `htemp` schema space. DELETE privileges for `htemp` can remain with `oq_reslt_writer`.
- Add a command line option for `--delete-hazard-calculation|--dhc` and `--delete-risk-calculation|--drc`.
  - Invoking either option should prompt the user for confirmation.
  - For the purpose of running this command in a script or similar context, add a `--yes|-y` option to automatically confirm.
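The DELETE policy in the list above could be scripted roughly as follows. This is a minimal Python sketch that assumes PostgreSQL and assumes that the roles currently holding DELETE are passed in explicitly. The helper `delete_privilege_sql` and the example table names (`uiapi.oq_job`, `htemp.site_data`) are hypothetical and not taken from the engine.

```python
# Hypothetical helper (not part of OpenQuake): emits PostgreSQL statements
# for the proposed DELETE policy. Table and role names are assumptions.

def delete_privilege_sql(tables, writer_roles, admin="oq_admin",
                         temp_schema="htemp", temp_writer="oq_reslt_writer"):
    """Return REVOKE/GRANT statements restricting DELETE to `admin`.

    `tables` is an iterable of (schema, table) pairs. Tables in
    `temp_schema` keep DELETE for `temp_writer`; every other table loses
    DELETE for each role in `writer_roles` and grants it only to `admin`.
    """
    statements = []
    for schema, table in tables:
        name = f"{schema}.{table}"
        if schema == temp_schema:
            # Exception from the proposal: htemp tables remain deletable
            # by the result writer.
            statements.append(f"GRANT DELETE ON {name} TO {temp_writer};")
            continue
        for role in writer_roles:
            statements.append(f"REVOKE DELETE ON {name} FROM {role};")
        statements.append(f"GRANT DELETE ON {name} TO {admin};")
    return statements


if __name__ == "__main__":
    for stmt in delete_privilege_sql(
            [("uiapi", "oq_job"), ("htemp", "site_data")],
            writer_roles=["oq_reslt_writer"]):
        print(stmt)
```

Generating the statements from a table list keeps the `htemp` exception in one place instead of scattering it through a hand-written SQL file.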

--dhc/--drc would delete everything associated with the calculation, including `oq_job` records and output artifacts. It will NOT remove inputs.
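The command-line part of the proposal (`--dhc`/`--drc` plus a confirmation prompt that `--yes`/`-y` skips) could look roughly like this. It is a sketch using `argparse`, not the actual `bin/openquake` code. `build_parser`, `confirm` and `main` are hypothetical names, and the database deletion itself is left as a placeholder comment.

```python
import argparse


def build_parser():
    """Hypothetical parser for the proposed deletion options."""
    parser = argparse.ArgumentParser(prog="openquake")
    action = parser.add_mutually_exclusive_group()
    action.add_argument("--delete-hazard-calculation", "--dhc", type=int,
                        metavar="HC_ID",
                        help="delete a hazard calculation and its outputs")
    action.add_argument("--delete-risk-calculation", "--drc", type=int,
                        metavar="RC_ID",
                        help="delete a risk calculation and its outputs")
    parser.add_argument("--yes", "-y", action="store_true",
                        help="assume 'yes' to confirmation prompts")
    return parser


def confirm(question, assume_yes, read=input):
    """Ask a yes/no question; `assume_yes` skips the prompt (for scripts)."""
    if assume_yes:
        return True
    answer = read(f"{question} [y/N] ").strip().lower()
    return answer in ("y", "yes")


def main(argv=None, read=input):
    args = build_parser().parse_args(argv)
    if args.delete_hazard_calculation is not None:
        kind, calc_id = "hazard", args.delete_hazard_calculation
    elif args.delete_risk_calculation is not None:
        kind, calc_id = "risk", args.delete_risk_calculation
    else:
        return 0  # nothing to delete
    if not confirm(f"Delete {kind} calculation {calc_id}?", args.yes, read):
        print("Aborted; nothing was deleted.")
        return 1
    # The actual database deletion (run as oq_admin) would go here.
    print(f"Deleting {kind} calculation {calc_id}")
    return 0
```

For example, `main(["--dhc", "42"])` asks before doing anything, while `main(["--dhc", "42", "-y"])` proceeds without prompting, which is the behaviour wanted for scripts. The default answer is "no", so an empty reply never deletes anything.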

Another point to think about/discuss: perhaps it would also be useful to be able to delete individual outputs, instead of entire calculations. Given the time we have planned for this task, I think this functionality could be added as well.

matley (matley) wrote :

1) What about a risk calculation that depends on a hazard output which is going to be deleted?

2) I would use --force-inputs (see the convention used in ./manage.py) or --batch instead of --yes, which always assumes a yes/no question.

3) I do not see a use case for deleting individual outputs, other than deleting artifacts produced by failed jobs. So I am in favor of having a --purge-delete-jobs or --delete-job option (which would also delete the inputs). What do you think?

Lars Butler (lars-butler) wrote :

@matley:

1) I'll have to think about it some more. The simplest approach, I think, is the following:
  - The user runs --dhc
  - Check if any of the outputs associated with that calculation are in use by a risk calculation
    - If yes, abort and don't delete anything
    - Else, remove the calculation and associated outputs
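The check-then-delete procedure above, combined with the ownership rule from the original proposal, can be sketched against a toy database. This uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The schema is a simplified assumption: in particular, the `risk_calculation.hazard_output_id` column and the helper names are hypothetical.

```python
import sqlite3

# Simplified, hypothetical stand-in for the real PostgreSQL schema.
SCHEMA = """
CREATE TABLE hazard_calculation (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL);
CREATE TABLE oq_job (id INTEGER PRIMARY KEY,
                     hazard_calculation_id INTEGER REFERENCES hazard_calculation(id));
CREATE TABLE output (id INTEGER PRIMARY KEY, oq_job_id INTEGER REFERENCES oq_job(id));
CREATE TABLE risk_calculation (id INTEGER PRIMARY KEY,
                               hazard_output_id INTEGER REFERENCES output(id));
"""

JOBS_OF_CALC = "SELECT id FROM oq_job WHERE hazard_calculation_id = ?"


class CalculationInUse(Exception):
    """A risk calculation still depends on one of the hazard outputs."""


def delete_hazard_calculation(conn, hc_id, user_id):
    row = conn.execute("SELECT owner_id FROM hazard_calculation WHERE id = ?",
                       (hc_id,)).fetchone()
    if row is None:
        raise LookupError(f"hazard calculation {hc_id} does not exist")
    if row[0] != user_id:
        raise PermissionError("users may only delete their own calculations")
    # Abort if any output of this calculation is used by a risk calculation.
    (in_use,) = conn.execute(
        "SELECT COUNT(*) FROM risk_calculation WHERE hazard_output_id IN "
        f"(SELECT id FROM output WHERE oq_job_id IN ({JOBS_OF_CALC}))",
        (hc_id,)).fetchone()
    if in_use:
        raise CalculationInUse(f"{in_use} risk calculation(s) use its outputs")
    # Single transaction: either everything is deleted, or nothing is.
    # Inputs are deliberately left untouched.
    with conn:
        conn.execute(f"DELETE FROM output WHERE oq_job_id IN ({JOBS_OF_CALC})",
                     (hc_id,))
        conn.execute("DELETE FROM oq_job WHERE hazard_calculation_id = ?", (hc_id,))
        conn.execute("DELETE FROM hazard_calculation WHERE id = ?", (hc_id,))
```

The explicit child-first DELETEs are only there because the toy schema has no cascading foreign keys.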

A more complex alternative would be to 'orphan' the output that is used by the risk calculation, but delete everything else associated with the hazard calculation. It would also be a useful step toward slightly decoupling the `hazard_calculation`, `oq_job`, and `output` records, which is something we'll need in order to import pre-computed results. In other words, we could have `output` records that are not associated with any `oq_job` or `hazard_calculation`.
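The 'orphan' alternative could work roughly as follows: first detach the outputs that a risk calculation still references by clearing their job link, then delete the remaining outputs, the jobs and the calculation. This is a sketch on SQLite with a simplified, hypothetical schema in which `output.oq_job_id` is nullable. The function name and column names are assumptions.

```python
import sqlite3

# Hypothetical, simplified schema. output.oq_job_id allows NULL (the SQLite
# default), so an output can exist without any job or calculation.
SCHEMA = """
CREATE TABLE hazard_calculation (id INTEGER PRIMARY KEY);
CREATE TABLE oq_job (id INTEGER PRIMARY KEY, hazard_calculation_id INTEGER);
CREATE TABLE output (id INTEGER PRIMARY KEY, oq_job_id INTEGER);
CREATE TABLE risk_calculation (id INTEGER PRIMARY KEY, hazard_output_id INTEGER);
"""

JOBS_OF_CALC = "SELECT id FROM oq_job WHERE hazard_calculation_id = ?"


def delete_orphaning_used_outputs(conn, hc_id):
    """Delete a hazard calculation but keep outputs used by risk calculations."""
    with conn:  # one transaction
        # 1. Orphan outputs that are still referenced by a risk calculation.
        conn.execute(
            "UPDATE output SET oq_job_id = NULL "
            f"WHERE oq_job_id IN ({JOBS_OF_CALC}) "
            "AND id IN (SELECT hazard_output_id FROM risk_calculation)",
            (hc_id,))
        # 2. Delete the remaining outputs, the jobs, and the calculation.
        conn.execute(f"DELETE FROM output WHERE oq_job_id IN ({JOBS_OF_CALC})",
                     (hc_id,))
        conn.execute("DELETE FROM oq_job WHERE hazard_calculation_id = ?", (hc_id,))
        conn.execute("DELETE FROM hazard_calculation WHERE id = ?", (hc_id,))
```

After this runs, the risk calculation still points at a valid `output` row, but that row no longer belongs to any `oq_job` or `hazard_calculation`, which is the decoupling described above.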

2) At the moment, the bin/openquake executable doesn't ask for confirmation for any action. That's why I proposed introducing --yes. The use case is very different from that of --force-inputs. We're using --yes in the oq_create_db script (https://github.com/gem/oq-engine/blob/master/bin/oq_create_db). Other utilities, such as apt-get, use this convention, which is why I thought to implement it this way.

3) I spoke with Damiano and, while he agrees that it would be useful to be able to delete individual outputs, we can implement that in the future. At the moment, we only need the capability to delete entire calculations.

Lars Butler (lars-butler) wrote :

Another implementation detail: specify ON DELETE CASCADE for most artifacts associated with the calculation. This makes deletions rather trivial: simply delete the calculation record and let the database clean everything up. Removing DELETE privileges from all users except `oq_admin` (as mentioned above) will also help ensure that accidental deletions don't become a problem.
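The effect of ON DELETE CASCADE can be demonstrated with a small SQLite database. PostgreSQL accepts the same `REFERENCES ... ON DELETE CASCADE` clause, but the three-table schema here is a simplified assumption, and `connect` and `delete_calculation` are hypothetical helpers.

```python
import sqlite3

# Simplified, hypothetical schema: each child row cascades from its parent.
SCHEMA = """
CREATE TABLE hazard_calculation (id INTEGER PRIMARY KEY);
CREATE TABLE oq_job (
    id INTEGER PRIMARY KEY,
    hazard_calculation_id INTEGER NOT NULL
        REFERENCES hazard_calculation(id) ON DELETE CASCADE);
CREATE TABLE output (
    id INTEGER PRIMARY KEY,
    oq_job_id INTEGER NOT NULL REFERENCES oq_job(id) ON DELETE CASCADE);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and their cascades) when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_calculation(conn, hc_id):
    """Deleting the calculation row is enough; the database removes the rest."""
    with conn:
        conn.execute("DELETE FROM hazard_calculation WHERE id = ?", (hc_id,))
```

Deleting calculation 1 removes its `oq_job` rows, and through them their `output` rows, while the rows belonging to other calculations are untouched. The deletion code no longer needs to know about every dependent table.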

Lars Butler (lars-butler) wrote :

Change to DB schema to allow for nice cascading DELETEs: https://github.com/gem/oq-engine/pull/1067

Lars Butler (lars-butler) wrote :

Second and final set of changes: https://github.com/gem/oq-engine/pull/1068

Changed in openquake:
status: In Progress → Fix Committed
Changed in openquake:
status: Fix Committed → Fix Released