mlflow: Container keeps stopping with no logs on WSL

Bug #2030737 reported by Ikram Ul Haq
Affects: Ubuntu Docker Images
Status: New
Importance: Undecided
Assigned to: Michal Hucko

Bug Description

I am trying to run this container on WSL (Windows Subsystem for Linux) after pulling the latest tag. Every time I start the container, it stops immediately with zero logs.

Changed in ubuntu-docker-images:
assignee: nobody → Michal Hucko (michalhucko)
Revision history for this message
Michal Hucko (michalhucko) wrote :

Hi Ikram,
thanks for reaching out. This is expected behavior, as by default the image launches a Python interpreter (per the upstream image). You can run the following command to actually start the server in the container:

docker run -p 5000:5000 docker.io/ubuntu/mlflow:2.1.1_1.0-22.04 mlflow server --host 0.0.0.0
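
For reference, a minimal way to see what is going on (assuming the default command is an interactive Python interpreter, as described above; the container name below is arbitrary):

# Attach a TTY so the default Python REPL has stdin and stays alive
docker run -it docker.io/ubuntu/mlflow:2.1.1_1.0-22.04

# Or run the server detached and follow its logs
docker run -d --name mlflow -p 5000:5000 docker.io/ubuntu/mlflow:2.1.1_1.0-22.04 mlflow server --host 0.0.0.0
docker logs -f mlflow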

Hope it helps,
Michal

Revision history for this message
Ikram Ul Haq (ulhaqi12) wrote :

Hi,
Thank you for the help.
Yes, that worked, but I am working with Hugging Face transformer models and storing them as pyfunc instead of using the transformers module. Sometimes it fails to download the model from MLflow and gives the following error.

raise MlflowException("API request to %s failed with exception %s" % (url, e))
mlflow.exceptions.MlflowException: API request to http:///api/2.0/mlflow/runs/search failed with exception ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

Can you guide me on how to increase the timeout of the MLflow server inside the container?
Thank you

BR,
Ikram

Revision history for this message
Michal Hucko (michalhucko) wrote :

Hi Ikram,
glad it works for you.

Now, regarding your problem: I don't have enough visibility into your deployment setup to give you a clear answer. To give a more informed answer, I would need to know:

- Where is the mlflow server deployed? (Is it with our charm?)
- What's the object store?
- What's the relational DB?
- How is mlflow-server deployed (what parameters do you use for the server)?
- Can you store and retrieve smaller models?
- What's the size of the model you are trying to retrieve?
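
One thing I notice in your traceback: the request URL has an empty host (http:///api/2.0/mlflow/runs/search), which usually points at a tracking URI that is missing its host part. A quick sketch, assuming your client runs on the host machine next to the container:

# Hypothetical local setup: point the MLflow client at the published port
export MLFLOW_TRACKING_URI=http://localhost:5000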

To increase the timeout you need to set the `MLFLOW_HTTP_REQUEST_TIMEOUT` environment variable in the container. You can also try setting `GUNICORN_CMD_ARGS` to "--timeout 600", so running something like:

docker run -p 5000:5000 -e GUNICORN_CMD_ARGS="--timeout 600" docker.io/ubuntu/mlflow:2.1.1_1.0-22.04 mlflow server --host 0.0.0.0
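
Note that `MLFLOW_HTTP_REQUEST_TIMEOUT` is read by the MLflow client making the REST calls, so if your script runs outside the container you can set it there as well. A sketch (the script name is just a placeholder):

# Client side: give MLflow REST calls more time (value in seconds)
export MLFLOW_HTTP_REQUEST_TIMEOUT=600
python your_training_script.py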

Let me know if it helped.

Michal

Revision history for this message
Ikram Ul Haq (ulhaqi12) wrote :

Hi Michal,

Thank you for the help.
I am running the Docker container locally for now and using local storage. I pull the image you provided, run the container locally with Docker, and then access it at http://localhost:5000. Yes, smaller models worked fine; models around 500 MB in size are causing the issues.

But let me set the timeout and hope the issue goes away.

Thank you for your help.

-Ikram
