Midori fills /tmp with html5 videos

Bug #1203240 reported by gue5t gue5t
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Midori Web Browser
Incomplete
Undecided
Unassigned
WebKit
Fix Released
Medium

Bug Description

When Midori plays an html5 video from the web, it seems to save a copy to /tmp with filenames like "midori4-733L0W". These files will eventually fill the filesystem because they aren't automatically deleted in some cases (e.g. if Midori crashes).

To reproduce, play a video like http://vt.tumblr.com/tumblr_mq791sqCBX1qliqrq.mp4 and see the file in /tmp; crash the browser, and the file will stick around even after restarting Midori. Maybe we could put these in a more temporary (process-specific?) location or clean them up in a segv handler or something.

Tags: html5 webkit
Revision history for this message
Cris Dywan (kalikiana) wrote :

For me these files are stored in $XDG_CACHE_HOME - though I remember seeing them in /tmp in the past.

To my knowledge we're neither handling the playback nor setting a path. This should be resolved in WebKitGTK+.

tags: added: html5 webkit
Revision history for this message
gue5t gue5t (gue5t) wrote :

Likely /tmp is the fallback for unset XDG_CACHE_HOME as I have in my current setup--strange, given that the basedir spec says $HOME/.cache should be used--but that's immaterial.

The real issue is a user-experience one: when we crash (fairly often, unfortunately), we leave behind a lot of potentially large video files, and disk or tmpfs space can disappear with midori implicated as the culprit. I don't know if webkit can do much about this, since it doesn't know about application lifetimes but maybe we should push it upstream just in case they can. Or maybe we can get webkit to namespace these temporary files to the midori config dir (e.g put them in a subdir that's a hash of the config path?) and delete the config dir's old ones on startup.

Revision history for this message
In , Pnormand (pnormand) wrote :

Right now preloaded videos go to ~/.cache but they should be stored in the webkit's cache folder instead.

The cache location can be configured using queue2::temp-location property but I'm not sure yet how to access the right queue2 element at runtime. Some investigation is needed :)

Changed in webkit-open-source:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , Sebastian Dröge (slomo) wrote :

It currently is impossible to do that properly, best solution would probably be to expose that queue2 property on uridecodebin and playbin.

Revision history for this message
gue5t gue5t (gue5t) wrote :

I can no longer reproduce this with WebkitGTK+ 2.4.8.

Changed in midori:
status: New → Incomplete
Revision history for this message
In , Cgarcia-f (cgarcia-f) wrote :

Is that cache permanent? I mean, can it be reused among sessions? Otherwise I don't think it should be stored in the home dir at all, but in /tmp

Revision history for this message
In , Claudio Saavedra (csaavedra) wrote :

Carlos nailed it. If it cannot be reused between sessions then it doesn't belong in .cache.

Revision history for this message
In , Sebastian Dröge (slomo) wrote :

It can be used between sessions if webkit gives an appropriate name to the files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main problem with that is that some people put /tmp on a tmpfs... so if you want to watch a long enough movie, your memory runs full.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

(In reply to comment #4)
> It can be used between sessions if webkit gives an appropriate name to the
> files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> problem with that is that some people put /tmp on a tmpfs... so if you want
> to watch a long enough movie, your memory runs full.

/tmp on tmpfs has been default in Fedora for several years now... I was amazed quite recently to learn that other distros are still not doing this. Anyway, the videos would have to go in /var/tmp since they are big. But /var/tmp is not cleaned automatically, so there needs to be some measure to clean up leaked videos in the event of a web process crash.

Revision history for this message
In , Pnormand (pnormand) wrote :

The UIProcess is notified when the WebProcess crashes, so perhaps we could do the clean-up at that moment.

Revision history for this message
In , Sebastian Dröge (slomo) wrote :

Or you could make the videos actually cacheable so they can be reused in future sessions until cleared from the cache ;)

Cleaning up from the webprocess crashes would mean that you need to have a way to identify which files belong to the pid of the crashed process

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

Note that we've gotten multiple reports of ~/.cache rising to excessive size; apparently there is no way to clean up cached video files if there is a web process crash.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

CCing some media folks... how do we want to go about solving this?

Anything cached should go under WebKitWebsiteDataManager's base-cache-dir if it's persistent, or /var/tmp otherwise.

Note that we have to be robust to UI process crashes as well. That happens. We are already careful to never leak in the normal disk cache.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

Probably bug #156369 needs to be fixed at the same time as this.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

(Changing component just so that we can find this bug again.)

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
>
> /tmp on tmpfs has been default in Fedora for several years now... I was
> amazed quite recently to learn that other distros are still not doing this.
> Anyway, the videos would have to go in /var/tmp since they are big. But
> /var/tmp is not cleaned automatically, so there needs to be some measure to
> clean up leaked videos in the event of a web process crash.

I suggest removing the file just after opening it for write.

That way you can continue to use the file for reading or writing to it, by passing the file descriptor id.

The kernel will automatically remove the file once the descriptor id is closed or the process holding the file descriptor id dies (crashes or exits)

So you not longer have to take care of cleaning the files. The file will be automatically cleaned when you not longer use it.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
>
> /tmp on tmpfs has been default in Fedora for several years now...

Personally, I don't think /tmp on tmpfs is a good idea.

In the worst case (you have configured your system without swap) you end wasting precious memory space for tmp files. Which is nuts.

In the best case, you end wasting swap space and risking incurring in more swappiness related freezes than otherwise you would have.

/tmp on tmpfs supporters main argument was they were worried about incrementing their SSD wearing level more than needed. This has been exaggerated a lot. I still have to find anyone that has nuked their SSD because of not having /tmp on tmpfs.

Related:
https://lists.debian.org/debian-devel/2012/05/msg01092.html
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=666096

Revision history for this message
In , eocanha (eocanha) wrote :

(In reply to comment #5)
> (In reply to comment #4)
> > It can be used between sessions if webkit gives an appropriate name to the
> > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > problem with that is that some people put /tmp on a tmpfs... so if you want
> > to watch a long enough movie, your memory runs full.
>
> /tmp on tmpfs has been default in Fedora for several years now... I was
> amazed quite recently to learn that other distros are still not doing this.
> Anyway, the videos would have to go in /var/tmp since they are big. But
> /var/tmp is not cleaned automatically, so there needs to be some measure to
> clean up leaked videos in the event of a web process crash.

A naive suggestion would be to treat /var/tmp as system storing /tmp on disk treat it. Those systems clean /tmp on boot. My suggestion is to use /var/tmp/${USER}/WebKit-media (or some better name) as "tmp", and not cleaning it at webprocess' death (which can be uncontrolled in the very worst case), but at WebKit start. Each time WebKit starts, it cleans the previous "tmp".

More efficient approaches can be implemented on top of that, but I think this one could be fine as a baseline.

Revision history for this message
In , Adrian Perez (aperezdc) wrote :

(In reply to comment #12)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > It can be used between sessions if webkit gives an appropriate name to the
> > > files IIRC. But yes, it should probably go into g_get_tmp_dir(). Main
> > > problem with that is that some people put /tmp on a tmpfs... so if you want
> > > to watch a long enough movie, your memory runs full.
> >
> > /tmp on tmpfs has been default in Fedora for several years now... I was
> > amazed quite recently to learn that other distros are still not doing this.
> > Anyway, the videos would have to go in /var/tmp since they are big. But
> > /var/tmp is not cleaned automatically, so there needs to be some measure to
> > clean up leaked videos in the event of a web process crash.

This is not necessarily true, many (most?) GNU/Linux distributions clean
“/var/tmp”:

  % grep '/var/tmp ' /usr/lib/tmpfiles.d/tmp.conf
  q /var/tmp 1777 root root 30d
  %

Still, using “/var/tmp” is not a good option because many systems are known
*not to* clean it by default ({Free,Open,Net}BSD), and applications are
expected to clean up after themselves when creating files under ”/var/tmp”.

> I suggest removing the file just after opening it for write.

+1

If we can make GStreamer happy without having named files, passing the FD
around is the absolute best option.

> That way you can continue to use the file for reading or writing to it, by
> passing the file descriptor id.
>
> The kernel will automatically remove the file once the descriptor id is
> closed or the process holding the file descriptor id dies (crashes or exits)
>
> So you not longer have to take care of cleaning the files. The file will be
> automatically cleaned when you not longer use it.

As an additional small side bonus unlinking the file after opening causes less name space pollution in the directory used for temporary files. Which reduces
contention for creating new temporary files — anyway this is most likely
irrelevant for web browsing, but still nice :)

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

(In reply to comment #15)
> This is not necessarily true, many (most?) GNU/Linux distributions clean
> “/var/tmp”:

You're right, systemd does this, even when the GNOME option to clean temporary files is disabled....

Revision history for this message
In , eocanha (eocanha) wrote :

Today I tried to solve this by setting the temp-remove[1] property of GstDownloadBuffer to true, only to find that it's already true by default and that the "delete on READY" statement in the documentation doesnt refer to the NULL -> READY -> PAUSED -> PLAYING transition (ie: delete after creation), but to the PLAYING -> PAUSED -> READY -> NULL one (ie: delete on destruction, the current behaviour).

I guess I'll have to try a different approach and delete the file by myself.

[1] https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer-plugins/html/gstreamer-plugins-downloadbuffer.html#GstDownloadBuffer--temp-remove

Revision history for this message
In , eocanha (eocanha) wrote :

Created attachment 300331
Patch

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

LGTM. Would be good for a GStreamer reviewer to review it. Please add it to https://trac.webkit.org/wiki/WebKitGTK/2.14.x after committing.

Revision history for this message
In , Cgarcia-f (cgarcia-f) wrote :
Download full text (4.1 KiB)

Comment on attachment 300331
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

> Source/WebCore/ChangeLog:13
> + Files cached on disk by MediaPlayerPrivateGStreamer are deleted only when the player is closed. If the
> + WebProcess crashed, they're just left there in the cache directory. This patch changes the location
> + of those temporary files to a proper temporary directory (/var/tmp, as those files aren't actually
> + reusable, so they don't belong to a cache directory, and /tmp is a bad place because it's RAM-based on
> + some distros), unlinks (deletes) them right after creation and also deletes any other stalled temporary
> + file on the old legacy cache directory.

Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1348
> + if (!g_strcmp0(className, "GstDownloadBuffer")) {

Early return here instead.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
> + g_signal_handlers_disconnect_by_func(bin, gpointer(uriDecodeBinElementAddedCallback), player);

Use C++ style casts

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
> + processName = "GStreamer";

shouldn't it be WebKitWebProcess by default?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
> + GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));

Use g_build_filename

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
> + player->purgeOldDownloadFiles(oldDownloadTemplate.get());

So, if we purge the old files, then we don't really need to use /var/tmp, no?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1372
> + g_signal_handlers_disconnect_by_func(player->m_downloadBuffer.get(), gpointer(downloadBufferFileCreatedCallback), player);

Use C++ style cast

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1379
> + GUniqueOutPtr<gchar> downloadFile;
> + g_object_get(player->m_downloadBuffer.get(), "temp-location", &downloadFile.outPtr(), nullptr);
> + if (g_unlink(downloadFile.get()) == -1)
> + GST_WARNING("Couldn't unlink media temporary file %s after creation", downloadFile.get());
> + else
> + GST_TRACE("Unlinked media temporary file %s after creation", downloadFile.get());

Do we need a signal for this? Doesn't gst changes it when g_object_set(element, "temp-template", synchronously?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1411
> + GDir* directory = g_dir_open(templatePath.get(), 0, nullptr);
> +
> + if (!directory)
> + return;
> +
> + for (const gchar* fileName = g_dir_read_name(directory); fileName; fileName = g_dir_read_name(directory)) {
> + if (g_str_has_prefix(file...

Read more...

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

(In reply to comment #20)
> Wouldn't it be a problem if /var is in a different partition?

Why would that be a problem?

Revision history for this message
In , eocanha (eocanha) wrote :
Download full text (3.8 KiB)

Comment on attachment 300331
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

Thanks for the rest of the comments I'm not replying, they're really useful.

>> Source/WebCore/ChangeLog:13
>> + file on the old legacy cache directory.
>
> Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.

I don't think GstDownloadBuffer is ready to reuse the media files between sessions with its current design, so as per https://bugs.webkit.org/show_bug.cgi?id=119477#c2 and https://bugs.webkit.org/show_bug.cgi?id=119477#c3 the files shouldn't be considered "cache" but temporary files. They can be huge (eg: 2GB or similar movie sizes), so they aren't appropriate for /tmp (because of some distros using tmpfs) but they're appropriate for /var/tmp as per https://bugs.webkit.org/show_bug.cgi?id=119477#c5

I considered /var/tmp as the consensus solution for storing this kind of files. If it's not the consensus solution, then discuss it first among yourselves before requesting an implementation.

Having /var in a different partition is as problematic as having /home or /tmp in a different partition too. If the former hasn't been a problem until now, using /var shouldn't be a problem too.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
>> + processName = "GStreamer";
>
> shouldn't it be WebKitWebProcess by default?

That's what g_get_prgname() should return, yes. In the very unlikely case that it returns NULL, I'm just defaulting to the same value that GstUriDecodeBin would:

https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/gsturidecodebin.c#L1951

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>> + player->purgeOldDownloadFiles(oldDownloadTemplate.get());
>
> So, if we purge the old files, then we don't really need to use /var/tmp, no?

The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.

This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.

Ideally, when all the users have migrated to a WebKit version having this patch, the purge shouldn't be needed anymore and might be removed.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1379
>> + GST_TRACE("Unlinked media temporary file %s after creation", downloadFile.get());
>
> Do we need a signal for this? Doesn't gst changes it when g_object_set(element, "temp-template", synchronously?

I don't understand this comment. GstDownloadBuffer was already removing the file when the element was destroyed (when destroying the player). That's not enough for us. We want to delete the file immediately after it's created.

The only way I found to get notified of the file cre...

Read more...

Revision history for this message
In , Cgarcia-f (cgarcia-f) wrote :

(In reply to comment #21)
> (In reply to comment #20)
> > Wouldn't it be a problem if /var is in a different partition?
>
> Why would that be a problem?

I don't know that's why I'm asking :-)

Revision history for this message
In , Cgarcia-f (cgarcia-f) wrote :
Download full text (4.8 KiB)

(In reply to comment #22)
> Comment on attachment 300331 [details]
> Patch
>
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=300331&action=review
>
> Thanks for the rest of the comments I'm not replying, they're really useful.
>
> >> Source/WebCore/ChangeLog:13
> >> + file on the old legacy cache directory.
> >
> > Wouldn't it be a problem if /var is in a different partition? I'm not sure if the solution to leaked files is placing them in /tmp to be honest. If we can decide the folder, then we can use a specific one in the user cache dir, and clean it up at startup, for example.
>
> I don't think GstDownloadBuffer is ready to reuse the media files between
> sessions with its current design, so as per
> https://bugs.webkit.org/show_bug.cgi?id=119477#c2 and
> https://bugs.webkit.org/show_bug.cgi?id=119477#c3 the files shouldn't be
> considered "cache" but temporary files. They can be huge (eg: 2GB or similar
> movie sizes), so they aren't appropriate for /tmp (because of some distros
> using tmpfs) but they're appropriate for /var/tmp as per
> https://bugs.webkit.org/show_bug.cgi?id=119477#c5
>
> I considered /var/tmp as the consensus solution for storing this kind of
> files. If it's not the consensus solution, then discuss it first among
> yourselves before requesting an implementation.

Ok, my fault, I tried to help reviewing this, but I didn't read the discussion first, there's even a comment by me suggesting not to use home cache dir :-P

> Having /var in a different partition is as problematic as having /home or
> /tmp in a different partition too. If the former hasn't been a problem until
> now, using /var shouldn't be a problem too.

Yes, sure, but it's more unlikely. Anyway, I only asked. I know, for example, that we write the partial download files in the same download directory instead of /tmp to avoid copies between partitions.

> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
> >> + processName = "GStreamer";
> >
> > shouldn't it be WebKitWebProcess by default?
>
> That's what g_get_prgname() should return, yes. In the very unlikely case
> that it returns NULL, I'm just defaulting to the same value that
> GstUriDecodeBin would:
>
> https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/
> gsturidecodebin.c#L1951

Because gst can't know in which process is running, but we know it.

> >> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
> >> + player->purgeOldDownloadFiles(oldDownloadTemplate.get());
> >
> > So, if we purge the old files, then we don't really need to use /var/tmp, no?
>
> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were
> considered as "cache" or as "temporary". It's the second, so they belong to
> a temporary directory.

Right, ok.

> This purging is here just for legacy files generated by WebKit prior to this
> patch. Michael told me that leaking those legacy files wasn't unacceptable.
> That's why I added the purge method.

Ah, I see, I wonder if we should purge also leaked files from previous sessions too. That would solve the problem for systems where /var/tm...

Read more...

Revision history for this message
In , Calvaris (calvaris) wrote :
Download full text (4.9 KiB)

Comment on attachment 300331
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

Lots of comments and suggestions, let's see this patch again.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1348
>> + const gchar* className = G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element)));
>> + if (!g_strcmp0(className, "GstDownloadBuffer")) {
>
> Early return here instead.

Remove the variable because it is only used at the if. Besides, type should have been const char*.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1353
> + GUniqueOutPtr<gchar> oldDownloadTemplate;

char

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1356
> + const gchar* processName = g_get_prgname();

char

>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1358
>>>> + processName = "GStreamer";
>>>
>>> shouldn't it be WebKitWebProcess by default?
>>
>> That's what g_get_prgname() should return, yes. In the very unlikely case that it returns NULL, I'm just defaulting to the same value that GstUriDecodeBin would:
>>
>> https://github.com/GStreamer/gst-plugins-base/blob/1.8.0/gst/playback/gsturidecodebin.c#L1951
>
> Because gst can't know in which process is running, but we know it.

I am with Carlos here. We don't need the process name actually to name the temp file. I'd go for something like WebKit-Media-xxxxxx. This would be positive for not having to rename in case in the future we can do these downloads thru the network process.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
>> + GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));
>
> Use g_build_filename

char.

Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
> + GST_TRACE("Reconfiguring file download template from '%s' to '%s'\n", oldDownloadTemplate.get(), newDownloadTemplate.get());

The \n is not advisable.

>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>>>> + player->purgeOldDownloadFiles(oldDownloadTemplate.get());
>>>
>>> So, if we purge the old files, then we don't really need to use /var/tmp, no?
>>
>> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.
>>
>> This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.
>>
>> Ideally, when all the users have migrated to a WebKit version having this patch, the purge shoul...

Read more...

Revision history for this message
In , eocanha (eocanha) wrote :

Comment on attachment 300331
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300331&action=review

>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1360
>>> + GUniquePtr<gchar> newDownloadTemplate(g_strdup_printf("/var/tmp/%s-XXXXXX", processName));
>>
>> Use g_build_filename
>
> char.
>
> Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.

This is tricky and, IMHO, not needed.

Having and extra directory under /var/tmp means that we would have to maintain (delete) that directory by ourselves. This is problematic because we can't use the elegant trick of deleting the file right after its creation to delete the directory. If several videos are being played at once by different WebProcesses a race condition can happen:

- Process A creates /var/tmp/$USER
- Process A creates WebKit-Media-123456 there
- Process B tries to create /var/tmp/$USER (but it's already created, no problem)
- Process A deletes WebKit-Media-123456 there and /var/tmp/$USER
- Process B creates WebKit-Media-ABCDEF there but... the directory doesn't exist anymore. ERROR!

Anyway, the directory itself isn't needed because the new temporary files are now deleted right after their creation. This means that no other processes can open them anymore, so there shouldn't be any concern regarding privacy. I've checked it by myself with a "while true; do ls /var/tmp; done" script.

>>>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1364
>>>>> + player->purgeOldDownloadFiles(oldDownloadTemplate.get());
>>>>
>>>> So, if we purge the old files, then we don't really need to use /var/tmp, no?
>>>
>>> The "$HOME/.cache" vs. "/var/tmp" discussion was about if the files were considered as "cache" or as "temporary". It's the second, so they belong to a temporary directory.
>>>
>>> This purging is here just for legacy files generated by WebKit prior to this patch. Michael told me that leaking those legacy files wasn't unacceptable. That's why I added the purge method.
>>>
>>> Ideally, when all the users have migrated to a WebKit version having this patch, the purge shouldn't be needed anymore and might be removed.
>>
>> Right, ok.
>
> Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().

Michael would say that WebKit should try hard to clean its own files. I don't think distros are going to mess with the contents of the user home dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.

Revision history for this message
In , eocanha (eocanha) wrote :

Created attachment 300398
Patch

Revision history for this message
In , Cgarcia-f (cgarcia-f) wrote :

Comment on attachment 300398
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:40
> +#include <WebCore/FileSystem.h>

Don't use <> since you are in WebCore.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:203
> + if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

G_TYPE_CHECK_INSTANCE_TYPE already null checks the pointer.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:204
> + g_signal_handlers_disconnect_by_func(GST_ELEMENT_PARENT(m_source.get()), reinterpret_cast<gpointer>(uriDecodeBinElementAddedCallback), this);

What if it has been disconnected before? I think it would be better to save the handler id and use g_signal_handler_disconnect() and invalidate the id.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
> + if (g_strcmp0(G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element))), "GstDownloadBuffer"))

GST_IS_DOWNLOAD_BUFFER?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
> + // Transform "-XXXXXX" into "-??????" to get a file wildcard expression from the template.
> + for (char* p = &templateFile.get()[strlen(templateFile.get()) - 1]; p >= templateFile.get() && *p == 'X'; --p)
> + *p = '?';

I'm not sure I understand this, maybe you can simply do g_strdelimit (templateFile.get(), "X", '?');

Can't you pass something like "*-XXXXXX*" as pattern to listDirectory?

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1397
> + for (auto filePath : listDirectory(templatePath.get(), templateFile.get())) {

auto&

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1409
> + if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

G_TYPE_CHECK_INSTANCE_TYPE already null checks the pointer.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

(In reply to comment #25)
> Another thing is that I think we should ensure the files are created in a
> different directory for each user and ensure that directory has the proper
> permissions so that it cannot be access by anyone else other than the user.
> I think media people watch online can vary a lot from cats to other things.
> I'd write a function to return the template and ensure it is placed in the
> proper directory.

It should be sufficient to just create the files with 600 permission, right?

Revision history for this message
In , Calvaris (calvaris) wrote :

Comment on attachment 300398
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:203
> + if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

You don't need the m_source && as WEBKIT_IS_WEB_SRC checks for nulls as expected.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
> + GUniqueOutPtr<char> oldDownloadTemplate;
> + g_object_get(element, "temp-template", &oldDownloadTemplate.outPtr(), nullptr);
> +
> + GUniquePtr<char> newDownloadTemplate(g_build_filename(G_DIR_SEPARATOR_S, "var", "tmp", "WebKit-Media-XXXXXX", nullptr));
> + g_object_set(element, "temp-template", newDownloadTemplate.get(), nullptr);
> + GST_TRACE("Reconfiguring file download template from '%s' to '%s'", oldDownloadTemplate.get(), newDownloadTemplate.get());

Nit: you can move getting oldDownloadTemplate after setting newDownloadTemplate (it is going to be actually used before the GST_TRACE and specially relevant for the purge).

Another nit: "Reconfigured" is better because the relevant action already happened.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
> + // Transform "-XXXXXX" into "-??????" to get a file wildcard expression from the template.
> + for (char* p = &templateFile.get()[strlen(templateFile.get()) - 1]; p >= templateFile.get() && *p == 'X'; --p)
> + *p = '?';

Maybe you can use String::replace instead of doing this (it can take position and length.

> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1409
> + if (m_source && WEBKIT_IS_WEB_SRC(m_source.get()))

You don't need the m_source && as WEBKIT_IS_WEB_SRC checks for nulls as expected.

Revision history for this message
In , Calvaris (calvaris) wrote :

(In reply to comment #26)
> > Another thing is that I think we should ensure the files are created in a different directory for each user and ensure that directory has the proper permissions so that it cannot be access by anyone else other than the user. I think media people watch online can vary a lot from cats to other things. I'd write a function to return the template and ensure it is placed in the proper directory.
>
> This is tricky and, IMHO, not needed.
>
> Having and extra directory under /var/tmp means that we would have to
> maintain (delete) that directory by ourselves. This is problematic because
> we can't use the elegant trick of deleting the file right after its creation
> to delete the directory. If several videos are being played at once by
> different WebProcesses a race condition can happen:
>
> - Process A creates /var/tmp/$USER
> - Process A creates WebKit-Media-123456 there
> - Process B tries to create /var/tmp/$USER (but it's already created, no
> problem)
> - Process A deletes WebKit-Media-123456 there and /var/tmp/$USER
> - Process B creates WebKit-Media-ABCDEF there but... the directory doesn't
> exist anymore. ERROR!
>
> Anyway, the directory itself isn't needed because the new temporary files
> are now deleted right after their creation. This means that no other
> processes can open them anymore, so there shouldn't be any concern regarding
> privacy. I've checked it by myself with a "while true; do ls /var/tmp; done"
> script.

Well, I don't think deleting the directory is needed, it is just 4k on disk.

> > Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().
>
> Michael would say that WebKit should try hard to clean its own files. I
> don't think distros are going to mess with the contents of the user home
> dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.

I think I wasn't clear here. Line 1377, if the regular unlink fails (not the purge one, the regular) we are leaking them (under /var/tmp/).

Revision history for this message
In , Calvaris (calvaris) wrote :

Comment on attachment 300398
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1362
>> + GST_TRACE("Reconfiguring file download template from '%s' to '%s'", oldDownloadTemplate.get(), newDownloadTemplate.get());
>
> Nit: you can move getting oldDownloadTemplate after setting newDownloadTemplate (it is going to be actually used before the GST_TRACE and specially relevant for the purge).
>
> Another nit: "Reconfigured" is better because the relevant action already happened.

Forget this, it is wrong.

Revision history for this message
In , eocanha (eocanha) wrote :

Comment on attachment 300398
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=300398&action=review

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:204
>> + g_signal_handlers_disconnect_by_func(GST_ELEMENT_PARENT(m_source.get()), reinterpret_cast<gpointer>(uriDecodeBinElementAddedCallback), this);
>
> What if it has been disconnected before? I think it would be better to save the handler id and use g_signal_handler_disconnect() and invalidate the id.

If the callback has been disconnected before, then g_signal_handlers_disconnect_matched() (the actual function called by the macro) just returns "0" matches and nothing happens.

>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1350
>> + if (g_strcmp0(G_OBJECT_CLASS_NAME(G_OBJECT_GET_CLASS(G_OBJECT(element))), "GstDownloadBuffer"))
>
> GST_IS_DOWNLOAD_BUFFER?

Sorry, that's not possible. GstDownloadBuffer is an element, and as such, comes from a dynamically loaded plugin. It seems that the gstdownloadbuffer.h header isn't distributed for usage from third party applications. If I can't find it in WebKit/WebKitBuild/DependenciesGTK/Root/include/gstreamer-1.0/gst when using jhbuild, I doubt it's going to be available on regular distros.

>>> Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:1395
>>> + *p = '?';
>>
>> I'm not sure I understand this, maybe you can simply do g_strdelimit (templateFile.get(), "X", '?');
>>
>> Can't you pass something like "*-XXXXXX*" as pattern to listDirectory?
>
> Maybe you can use String::replace instead of doing this (it can take position and length.

Carlos: It doesn't work like that. "X" isn't an actual letter X, but the character g_mkstemp()[1] uses as wildcard to construct actual random filenames. That character is equivalent to "?" in shell glob syntax.

Now that I think about it, if we're comfident about what the legacy base name was ("WebKitWebProcess-XXXXXX") and are sure that it doesn't have "X" in any other place than in the pattern, we can do a simple character replacement in the base name without fear to screw legitimate "X" letters up.

[1] https://developer.gnome.org/glib/stable/glib-File-Utilities.html#g-mkstemp

Revision history for this message
In , Calvaris (calvaris) wrote :

After discussing in private I think the best solution because of permissions is sticking to ~/.cache/ but ensuring deleting and purge of old files. I guess a new patch would be needed.

Revision history for this message
In , carloslp (carloslp) wrote :

(In reply to comment #34)
> After discussing in private I think the best solution because of permissions
> is sticking to ~/.cache/ but ensuring deleting and purge of old files. I
> guess a new patch would be needed.

Why you can't you apply there the same approach of unlinking the file just after its created? Its simpler to implement and more robust.

Revision history for this message
In , eocanha (eocanha) wrote :

(In reply to comment #34)
> After discussing in private I think the best solution because of permissions
> is sticking to ~/.cache/ but ensuring deleting and purge of old files. I
> guess a new patch would be needed.

I've checked it (after adding a hack to avoid removing the file, normally external processes can't see it), and the file permissions are "600":

1605111 81156 -rw------- 1 enrique enrique 83103373 Feb 2 19:33 WebKit-Media-VPHGVY

Having this into account, there's no reason why plain /var/tmp is a bad idea, and there are several reasons why it's a good idea (as per the "cache" vs. "temporary" file discussion in previous comments).

Revision history for this message
In , Calvaris (calvaris) wrote :

(In reply to comment #35)
> Why you can't you apply there the same approach of unlinking the file just
> after its created? Its simpler to implement and more robust.

Probably I wasn't clear enough but my proposal implied this inside WebKit code the problem was a possible permissions race condition between creation and deletion.(In reply to comment #36)

> (In reply to comment #34)
> I've checked it (after adding a hack to avoid removing the file, normally
> external processes can't see it), and the file permissions are "600":
>
> 1605111 81156 -rw------- 1 enrique enrique 83103373 Feb 2 19:33
> WebKit-Media-VPHGVY
>
> Having this into account, there's no reason why plain /var/tmp is a bad
> idea, and there are several reasons why it's a good idea (as per the "cache"
> vs. "temporary" file discussion in previous comments).

Then I think we would be good to go once you fix the less serious comments such as extra null checks, chars replacing, etc.

Revision history for this message
In , eocanha (eocanha) wrote :

(In reply to comment #31)
> (In reply to comment #26)
> > > Yep, this will purge the old files. I have a concern though for the files that fall under the GST_WARNING of line 1377 of not being able to unlink. Should we assume that is is very unlikely and forget (and trust distros to purge this from time to time)? Btw, if it is so unlikely, we can tag the decission as UNLIKELY().
> >
> > Michael would say that WebKit should try hard to clean its own files. I
> > don't think distros are going to mess with the contents of the user home
> > dir, even for the .cache dir. Thanks for the UNLIKELY() suggestion, btw.
>
> I think I wasn't clear here. Line 1377, if the regular unlink fails (not the
> purge one, the regular) we are leaking them (under /var/tmp/).

Trying to purge leaked files in the new /var/tmp location doesn't make sense. If the files couldn't be deleted right after creation in the first place (because of a race condition with permissions, where some malicious (root?) process changes the permissions in the milisecond that the file is visible), then ther's no point on trying to delete the file later: the deletion will also fail for the same reasons it failed initially.

Purging files in the old location still makes sense, because with the old implementation more time passed from file creation to file deletion (eg: 2h until the movie finishes being played) and the probability of a crash or kill (the only reason why the file would have been leaked) was higher.

Revision history for this message
In , eocanha (eocanha) wrote :

Created attachment 300517
Patch

Revision history for this message
In , Commit-queue (commit-queue) wrote :

Comment on attachment 300517
Patch

Clearing flags on attachment: 300517

Committed r211627: <http://trac.webkit.org/changeset/211627>

Revision history for this message
In , Commit-queue (commit-queue) wrote :

All reviewed patches have been landed. Closing bug.

Revision history for this message
In , Mcatanzaro (mcatanzaro) wrote :

\o/

Kinda a shame that this went in with a misleading commit name, since the media is most definitely not being stored in WebKit's cache, but oh well.

Revision history for this message
In , carloslp (carloslp) wrote :

Storing this big media files on the homedir is a potentially performance problem on systems where the home-dir is mounted over a shared network file-system (like happens on many shared computers at places like universities)

I think /var/tmp is the right place for this. This directory will be served from local storage, and never from a network file-system.

Also, the possibility of an attacker opening the file in the millisecond that happens between the file is created and the file is unlinked are pretty low.

Adding that to the fact that videos you watch on your browser is not something that I would consider critical information (like a password).

If you are that kind of person, is better you ensure that you are not sharing your computer with anyone when you watch the videos, rating than expecting WebKitGTK+ to do magic tricks to protect your privacy from the other users of your own computer.

So... I'm happy that in the end the simplest approach of storing the file on /var/tmp and unlink it was committed, and we didn't end creating an over-engineered solution to something I don't think is a real problem

Just my 2 cents.

Changed in webkit-open-source:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.