constantly shows wrong temperature (99°C)

Bug #1581594 reported by brian on 2016-05-13
This bug affects 7 people
Affects               Status   Importance  Assigned to   Milestone
libatasmart           Unknown  Unknown
libatasmart (Ubuntu)           Medium      Phillip Susi
  Trusty                       Medium      Unassigned
  Xenial                       Medium      Unassigned
  Yakkety                      Medium      Unassigned

Bug Description

Hello, here's my system info

~$ lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

~$ apt-cache policy udisks2
udisks2:
  Installed: 2.1.7-1ubuntu1
  Candidate: 2.1.7-1ubuntu1
  Version table:
 *** 2.1.7-1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status

Here's the problem: udisks2 constantly reports my SSD temperature as 99°C (210°F), but in reality it's 30°C:

~$ sudo smartctl -A /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
  9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 24
 12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 51
168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0
170 Unknown_Attribute 0x0003 100 100 010 Pre-fail Always - 231
173 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 65539
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 23
194 Temperature_Celsius 0x0023 070 070 030 Pre-fail Always - 30 (Min/Max 30/30)
218 Unknown_Attribute 0x000b 100 100 050 Pre-fail Always - 0
231 Temperature_Celsius 0x0013 100 100 000 Pre-fail Always - 99
241 Total_LBAs_Written 0x0012 100 100 000 Old_age Always - 337

The GUI app ("Disks") shows the wrong temperature too. hddtemp reports the correct value, though:

~$ sudo hddtemp /dev/sda
/dev/sda: PNY EU SSD CS1311 240GB: 30°C
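
A minimal sketch of why two tools can disagree here: the smartctl table above labels both attribute 194 and attribute 231 as Temperature_Celsius, with raw values 30 and 99. The parser below is illustrative only; the function names and the regular expression are assumptions made for this example and are not taken from udisks2, libatasmart, or smartmontools.

```python
import re

# Three rows copied from the smartctl -A output in this report.
SMARTCTL_ROWS = """\
194 Temperature_Celsius 0x0023 070 070 030 Pre-fail Always - 30 (Min/Max 30/30)
231 Temperature_Celsius 0x0013 100 100 000 Pre-fail Always - 99
241 Total_LBAs_Written 0x0012 100 100 000 Old_age Always - 337
"""

# Hypothetical pattern for one attribute row: ID, name, flag, three
# normalized columns, three status columns, then the leading raw number.
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<failed>\S+)\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return one dict per attribute row that matches ROW_RE."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append(
                {"id": int(m["id"]), "name": m["name"], "raw": int(m["raw"])}
            )
    return rows


def temperature_candidates(rows):
    """Map attribute ID to raw value for every row named Temperature*."""
    return {r["id"]: r["raw"] for r in rows if r["name"].startswith("Temperature")}


candidates = temperature_candidates(parse_attributes(SMARTCTL_ROWS))
print(candidates)  # {194: 30, 231: 99}
```

A consumer that simply takes the last temperature-named attribute would report 99, while one that reads attribute 194 would report 30.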

Phillip Susi (psusi) wrote :

It seems that this extra attribute should not be treated as a temperature. I have patched libatasmart to do this. Please try the version in my PPA and see if it resolves the problem for you. You can add my PPA by running sudo add-apt-repository ppa:psusi/ppa; an apt-get update and upgrade should then install the new version of libatasmart.

no longer affects: gnome-disk-utility (Ubuntu)
affects: udisks2 (Ubuntu) → libatasmart (Ubuntu)
Changed in libatasmart (Ubuntu):
assignee: nobody → Phillip Susi (psusi)
status: New → In Progress
brian (blec78) wrote :

It works great now. Thank you :)

tags: added: xenial
Phillip Susi (psusi) wrote :

Martin, could you please review and apply the attached patch?

The attachment "0001-Fix-incorrect-temperature-reporting.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in libatasmart (Ubuntu):
importance: Undecided → Medium
Martin Pitt (pitti) wrote :

@Phillip, thanks for working on this! https://en.wikipedia.org/wiki/S.M.A.R.T. mentions both possibilities for attribute 231. Do you happen to have a more direct reference? Also, can you please create an upstream bug for this as well? Thanks!

Martin Pitt (pitti) on 2016-07-01
Changed in libatasmart (Ubuntu):
status: In Progress → Incomplete
Phillip Susi (psusi) wrote :

Sorry for the late reply, Martin... my bugs mailbox has gotten quite backed up. I emailed smartmontools-devel about the issue and their response was:

There likely was some historic HDD which used attribute 231 for temperature (same for 9 and 220).

And yes, there are devices which report temperature in more than one attribute.

They went on to say they would probably change smartmontools to prefer the more common attribute and I thought that libatasmart should follow suit.

Will create the upstream bug report, though the last one or two I have created with patches attached have gone ignored there for years so...
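
The "prefer the more common attribute" idea can be sketched as a simple ordered lookup. This is a hedged illustration, not the libatasmart patch or the smartmontools change: the preference tuple and function name are hypothetical, and the choice of 194 as the preferred attribute is an assumption based on it carrying the correct reading in this report.

```python
# Hypothetical preference order: read 194 first, and fall back to 231
# only when 194 is absent. Not taken from any real library.
TEMPERATURE_ATTRIBUTE_PREFERENCE = (194, 231)


def pick_temperature(raw_by_id):
    """Return (attribute_id, raw_value) for the preferred temperature
    attribute present in raw_by_id, or None if none is present."""
    for attr_id in TEMPERATURE_ATTRIBUTE_PREFERENCE:
        if attr_id in raw_by_id:
            return attr_id, raw_by_id[attr_id]
    return None


# Raw values from the original report: 194 reads 30, 231 reads 99.
print(pick_temperature({194: 30, 231: 99}))  # (194, 30)
```

With this ordering, a drive that reports only attribute 231 would still have it treated as a temperature, which is the case the later comments worry about.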

Phillip Susi (psusi) on 2016-08-22
Changed in libatasmart (Ubuntu):
status: Incomplete → In Progress
Thomas Mayer (thomas303) wrote :

Confirming this issue:

On Ubuntu 16.04 with kernel 4.4.0-57-generic, wrong temperatures also make it into syslog:

Jan 2 20:22:27 server smartd[876]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 112
Jan 2 20:22:27 server smartd[876]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 111 to 113

Not only is that wrong, it also suggests to admins that they need to take action. Eventually the value ends up in some monitoring tool, which abstracts the RAW_VALUE even further.

Note that in reality my drives are below 40°C, judging by SMART's RAW_VALUE (and by touching the drive with my finger).

Also note that the device model is "In smartctl database", according to smartctl's output. That said, I'd expect smartctl to somehow "know" all variables of my drive, including their meaning.

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-57-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital AV-GP (AF)
Device Model: WDC WD30EURS-63SPKY0
[...]
Firmware Version: 80.00A80
User Capacity: 3.000.592.982.016 bytes [3,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Jan 2 23:27:25 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
  3 Spin_Up_Time 0x0027 177 176 021 Pre-fail Always - 6133
  4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 98
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
  7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
  9 Power_On_Hours 0x0032 072 072 000 Old_age Always - 20909
 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 98
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 55
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 42
194 Temperature_Celsius 0x0022 112 101 000 Old_age Always - 38
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_C...


Thomas Mayer (thomas303) wrote :

I think my case is different: it's not about attribute 231 or duplicate entries for temperature.

Instead, smartctl does something with the temperature's RAW_VALUE and thereby distorts it.

For my case, I filed a separate bug report, #1653560.
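
For reference, the smartd log lines above report values between 110 and 113, which are close to the normalized VALUE column for attribute 194 in the table (112), while the RAW_VALUE column reads 38. One possible reading, which this thread does not confirm, is that the log tracks the normalized value rather than a temperature in degrees. The sketch below only separates the two columns of that row; the function name is hypothetical.

```python
# Row for attribute 194 copied from the smartctl output above.
ROW = "194 Temperature_Celsius 0x0022 112 101 000 Old_age Always - 38"


def split_attribute_row(row):
    """Split one whitespace-separated smartctl attribute row into the
    normalized columns and the first token of RAW_VALUE."""
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalized VALUE column
        "worst": int(fields[4]),   # normalized WORST column
        "thresh": int(fields[5]),  # THRESH column
        "raw": int(fields[9]),     # first token of RAW_VALUE
    }


attr = split_attribute_row(ROW)
print(attr["value"], attr["raw"])  # 112 38
```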

Yury (drvlas) wrote :

Hi,

I'm affected too, but my case is a little different and I don't see how I can apply any patch.
I am trying to install Linux Mint 18.1 on my brand new Kingston 256GB SSD. When I check the SSD with Disks it's OK, but the utility shows a temperature of 100°C.
Then I begin the installation, choose partitions and try to move on. The installer crashes with Error 5.
Is it possible to apply the patch in such a case? How would I do it? I am booting from a live USB stick.

Yury (drvlas) wrote :

Sorry guys,

My case is indeed different; it seems to have been caused by another error.
I ran an integrity check on the USB stick holding the system image, and it showed an error in one file.
After rewriting the image to the USB stick, everything was fine.

Phillip Susi (psusi) wrote :

Yes, that is unrelated to this bug, which is simply a display issue.

Changed in libatasmart (Ubuntu Xenial):
importance: Undecided → Medium
Changed in libatasmart (Ubuntu Trusty):
importance: Undecided → Medium
Changed in libatasmart (Ubuntu Yakkety):
importance: Undecided → Medium
Łukasz Zemczak (sil2100) wrote :

I could sponsor this change for you as it looks sane in this case, but I'm a bit concerned that it's not the right way to go. I think the right way is what the smartmontools-devel reply suggested: when there are multiple temperature readings, we should prefer the one that's more common, and in that case treat the other attribute as a different value. I'm simply worried that this change could affect some of those older HDDs that actually used only 231 to denote temperature. Do we know that those won't be affected?

Phillip Susi (psusi) wrote :

I'm not sure that there are any drives that used only 231. This whole thing seems to be very black magic / ad hoc.

It might be worth checking how the latest upstream release of smartmontools handles this.

Robie Basak (racb) wrote :

I have similar feelings to sil2100. I don't think it makes sense for Ubuntu to do something special here. Until someone figures out what upstream is doing I don't think it's appropriate to add a delta to Ubuntu for this. So unsubscribing ~ubuntu-sponsors for now. Please resubscribe ~ubuntu-sponsors when the upstream situation is determined.

I've added support for SSD Life Left via the quirks infrastructure. The two devices I have access to that abuse attribute 231 store the value differently, so I had to use separate quirks.

The attached patch adds support for Kingston A400 and V300 series SSDs. Side effect: correct temperature reporting!

Let me know if the diff needs any tweaks to make it more Ubuntu-y.
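
The quirk approach described above can be pictured as a table keyed on the drive model. The sketch below is not the attached patch and does not use libatasmart's actual quirk structures; the table, the meaning labels, and the function name are all hypothetical. The "KINGSTON SA400" prefix comes from a model string quoted later in this thread, while the "KINGSTON SV300" prefix is an assumed pattern for the V300 series.

```python
import re

# Hypothetical quirk table: model pattern -> how to read attribute 231.
# Two separate entries mirror the comment that the two drive families
# store the value differently (the actual encodings are not given here).
QUIRKS = [
    (re.compile(r"^KINGSTON SA400"), "ssd-life-left-a400"),
    (re.compile(r"^KINGSTON SV300"), "ssd-life-left-v300"),  # assumed prefix
]


def meaning_of_attribute_231(model):
    """Return the quirk label for a matching model, else the default
    interpretation of attribute 231 as a temperature."""
    for pattern, meaning in QUIRKS:
        if pattern.search(model):
            return meaning
    return "temperature"


print(meaning_of_attribute_231("KINGSTON SA400S37240G"))  # ssd-life-left-a400
```

In this sketch, a drive without a matching entry, such as the PNY model from the original report, still falls through to the default temperature interpretation.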

Sorry, I missed Robie's comment yesterday. Upstream seems inactive/dead. Nothing from Lennart on https://bugs.freedesktop.org/buglist.cgi?product=libatasmart since 2011. No commits to http://git.0pointer.net/libatasmart.git since 2012. I didn't see any indication the development had been picked up anywhere else.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libatasmart (Ubuntu Trusty):
status: New → Confirmed
Changed in libatasmart (Ubuntu Xenial):
status: New → Confirmed
Changed in libatasmart (Ubuntu Yakkety):
status: New → Confirmed
Natetronn (natetronn) wrote :

Not sure I should chime in, since it's already been confirmed, but I'm seeing this as well.

Disks says it's 100°C, and so does Freon.

$ sudo hddtemp /dev/sdb
/dev/sdb: KINGSTON SA400S37240G: 26°C

I'm a bit too tired at this point to try and figure out how to address the issue. I'll read the thread again in the morning.
