installation fails creating user on large multipath system - multipathd show command times out in curtin log with 163 paths
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Release Notes for Ubuntu | Fix Released | Undecided | Unassigned | |
The Ubuntu-power-systems project | Fix Released | Medium | Canonical Foundations Team | |
curtin | Fix Released | Medium | Unassigned | |
subiquity | Fix Released | Undecided | Unassigned | |
multipath-tools (Ubuntu) | Invalid | Undecided | Unassigned | |
Bug Description
The installation fails while on the "creating user" screen. No messages are shown on the screen. The system is a POWER9 ppc64le system with a large number of multipath disks. The installer is the latest pending focal image, updated to 20.05.1.
Logs taken from the system are attached. This bug follows LP 1873728, but seems to be a different failure.
Related branches
- Server Team CI bot: Approve (continuous-integration)
- curtin developers: Pending requested
-
Diff: 214 lines (+69/-59), 2 files modified
curtin/commands/block_meta.py (+30/-34)
tests/unittests/test_commands_block_meta.py (+39/-25)
- Server Team CI bot: Approve (continuous-integration)
- Ryan Harper: Approve
-
Diff: 275 lines (+77/-86), 4 files modified
curtin/block/multipath.py (+3/-12)
curtin/commands/block_meta.py (+30/-34)
tests/unittests/test_block_multipath.py (+5/-15)
tests/unittests/test_commands_block_meta.py (+39/-25)
- Chad Smith: Approve
- Server Team CI bot: Approve (continuous-integration)
-
Diff: 45 lines (+16/-1), 3 files modified
curtin/block/multipath.py (+1/-1)
tests/data/multipath-nvme.txt (+1/-0)
tests/unittests/test_block_multipath.py (+14/-0)

bugproxy (bugproxy) wrote : tarball of log files from installer | #1 |
tags: | added: architecture-ppc64le bugnameltc-185835 severity-medium targetmilestone-inin20041 |
Changed in ubuntu: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
affects: | ubuntu → debian-installer (Ubuntu) |
no longer affects: | debian-installer (Ubuntu) |
Changed in ubuntu-power-systems: | |
importance: | Undecided → Medium |
assignee: | nobody → Canonical Foundations Team (canonical-foundations) |
tags: | added: installer |

Dimitri John Ledkov (xnox) wrote : Re: installation fails creating user on large multipath system | #2 |
Command: ['multipathd', 'show', 'maps', 'raw', 'format', "name=%n multipath='%w' sysfs='%d' paths='%N'"]
Exit code: 1
Reason: -
Stdout: timeout receiving packet
Stderr: ''
Unexpected error while running command.
Command: ['multipathd', 'show', 'maps', 'raw', 'format', "name=%n multipath='%w' sysfs='%d' paths='%N'"]
Exit code: 1
Reason: -
Stdout: timeout receiving packet
Stderr: ''
curtin: Installation failed with exception: Unexpected error while running command.

Dimitri John Ledkov (xnox) wrote : | #3 |
Where large is:
In [11]: probedata=
In [12]: len(probedata[
Out[12]: 163
So just 163 paths.
@IBM how many total paths do you actually expect on that system?
summary: |
- installation fails creating user on large multipath system
+ installation fails creating user on large multipath system - multipathd show command times out in curtin log with 163 paths

Frank Heimes (fheimes) wrote : | #4 |
I guess it would also be interesting to run probert manually and see how long it takes:
time probert
respectively:
time probert --storage

Dimitri John Ledkov (xnox) wrote : | #5 |
2020-05-11 05:32:39,767 DEBUG root:39 start: subiquity/
2020-05-11 05:32:52,099 DEBUG root:39 finish: subiquity/
probably 14 seconds or so

Dimitri John Ledkov (xnox) wrote : | #6 |
From multipath.conf.5 manpage:
uxsock_timeout
CLI receive timeout in milliseconds. For larger systems CLI
commands might timeout before the multipathd lock is released and
the CLI command can be processed. This will result in errors
like "timeout receiving packet" to be returned from CLI
commands. In these cases it is recommended to increase the CLI
timeout to avoid those issues.
The default is: 1000

Dan Watkins (oddbloke) wrote : | #7 |
It sounds like we need to bump this timeout in curtin. My assumption is that it will have no effect on multipath commands unless they were previously timing out, so I assume that we can set this globally.
Changed in curtin: | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in ubuntu-power-systems: | |
status: | New → Triaged |
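Separately from raising uxsock_timeout, a caller could tolerate a busy multipathd by retrying the command when it sees the "timeout receiving packet" message from the log above. The following is only a minimal sketch of that idea, not curtin's code: `run_with_retry`, `default_runner`, and the `attempts`/`delay` parameters are hypothetical names, and the runner is injectable so the logic can be exercised without multipathd installed.

```python
import subprocess
import time

# Command exactly as it appears in the failing curtin log.
SHOW_MAPS = ['multipathd', 'show', 'maps', 'raw', 'format',
             "name=%n multipath='%w' sysfs='%d' paths='%N'"]
TIMEOUT_MARKER = 'timeout receiving packet'


def default_runner(cmd):
    """Run cmd and return (exit_code, stdout)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout


def run_with_retry(cmd, runner=default_runner, attempts=3, delay=1.0,
                   sleep=time.sleep):
    """Hypothetical helper: retry cmd while multipathd reports a CLI timeout."""
    for attempt in range(1, attempts + 1):
        code, out = runner(cmd)
        if code == 0 and TIMEOUT_MARKER not in out:
            return out
        # Give up on non-timeout failures, or once the attempts are used up.
        if TIMEOUT_MARKER not in out or attempt == attempts:
            raise RuntimeError('%r failed (exit %d): %s'
                               % (cmd, code, out.strip()))
        sleep(delay)
```

Retrying only papers over lock contention; it does not shorten the time multipathd needs to answer, so it complements rather than replaces a larger timeout.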

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #8 |
------- Comment From <email address hidden> 2020-05-18 02:06 EDT-------
Sorry for the delay, from time probert --storage
real 0m16.505s
user 0m2.796s
sys 0m2.169s

Michael Hudson-Doyle (mwhudson) wrote : | #9 |
I guess we need to up the limit for probing too -- currently we time it out after 15 seconds.
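To illustrate why a 15 second limit is too tight for a probe measured at about 16.5 seconds, here is a generic asyncio sketch. It is not subiquity's code: `probe`, `probe_with_limit`, the 30 second alternative, and the `SCALE` factor (used only so the example finishes quickly) are all assumptions made for illustration.

```python
import asyncio

SCALE = 0.01  # shrink seconds so the example finishes quickly


async def probe(duration):
    """Stand-in for a storage probe that takes `duration` seconds."""
    await asyncio.sleep(duration)
    return 'probe data'


async def probe_with_limit(duration, limit):
    """Return the probe result, or None if it exceeds `limit` seconds."""
    try:
        return await asyncio.wait_for(probe(duration), timeout=limit)
    except asyncio.TimeoutError:
        return None


# A 16.5 s probe against the 15 s limit, then against a hypothetical 30 s one.
too_tight = asyncio.run(probe_with_limit(16.5 * SCALE, 15 * SCALE))
roomy = asyncio.run(probe_with_limit(16.5 * SCALE, 30 * SCALE))
print(too_tight, roomy)
```

With the tighter limit the probe result is discarded even though the probe itself would have succeeded a moment later.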

Dimitri John Ledkov (xnox) wrote : | #10 |
> So just 163 paths.
> @IBM how many total paths do you actually expect on that system?
Note that 163 is a prime number, which seems like an unusual setup. I expected most devices to have the same number of paths, or at least a total that is divisible by 2, 3, 4 or some such.
Marking this ticket incomplete, and it will self close, unless you provide us with more information.
Changed in subiquity: | |
status: | New → Incomplete |
Changed in curtin: | |
status: | Triaged → Incomplete |
Changed in ubuntu-power-systems: | |
status: | Triaged → Incomplete |

Thierry FAUCK (thierry-j) wrote : | #11 |
I could access a system with a lot of I/O and multipath disks. I could choose a multipath disk and continue, but when the installer asks for the user name the system crashes and asks whether to send a report to Canonical (which I did once).
To be complete, I also had errors on an NVMe disk, but I could confirm that the disk is broken and that it is not related to our issue.
For info, I see in full report there are device P: /devices/
In that report searching for multipath I also found :
[ 1285.250755] device-mapper: table: 253:72: multipath: error getting device
[ 1285.250791] device-mapper: ioctl: error adding target to table
[ 1285.263657] device-mapper: table: 253:72: multipath: error getting device
[ 1285.263674] device-mapper: ioctl: error adding target to table
[ 1286.359539] device-mapper: table: 253:80: multipath: error getting device
[ 1286.359569] device-mapper: ioctl: error adding target to table
[ 1287.327763] device-mapper: table: 253:87: multipath: error getting device
[ 1287.327793] device-mapper: ioctl: error adding target to table
[ 1287.341100] device-mapper: table: 253:87: multipath: error getting device
[ 1287.341120] device-mapper: ioctl: error adding target to table
[ 1287.350460] device-mapper: table: 253:87: multipath: error getting device
[ 1287.350481] device-mapper: ioctl: error adding target to table
File "/snap/
mp_dict = util.load_
File "/snap/
key, value = line.split("=", 1)
ValueError: not enough values to unpack (expected 2, got 1)
UdevDb:

Ryan Harper (raharper) wrote : | #12 |
Hi, can you provide the output of the following commands and attach to the bug?
multipathd show maps raw format "name=%n multipath='%w' sysfs='%d' paths='%N'"
multipathd show paths raw format "device='%d' serial='%z' multipath='%m' host_wwnn='%N' target_wwnn='%n' host_wwpn='%R' target_wwpn='%r' host_adapter='%a'"

Thierry FAUCK (thierry-j) wrote : | #13 |
boston113 has 5 disks with 4 paths each. That's pretty much the smallest FC SAN we'd ever set up.

Thierry FAUCK (thierry-j) wrote : | #14 |
There are a lot of SCSI disks as far as I understand (68), but they are not multipath capable.
Here are the commands I am issuing to get the installer:
wget http://
mkdir iso
mount -o loop -t iso9660 groovy-
kexec -l ./iso/casper/
kexec -e
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[ 98.496213] kexec_core: Starting new kernel
[257090.
[257093.
[ 5.503384] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x4090700 phys_seg 1 prio class 0
[ 5.504824] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
[ 5.504876] Buffer I/O error on dev nvme0n1p2, logical block 24413887, async page read
[ 5.506553] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x4090700 phys_seg 1 prio class 0
[ 5.508097] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
[ 5.508141] Buffer I/O error on dev nvme0n1p2, logical block 24413887, async page read
------------------- these messages are related to a disk used as secure boot -------
multiple messages
mount: mounting /dev/sdaz on /cdrom failed: Device or resource busy
....
Two methods available for IP configuration:
* static: for static IP configuration
* dhcp: for automatic IP configuration
static dhcp (default 'dhcp'):
vlan id (optional):
http://
url:
.....
groovy-
groovy-
groovy-
[ 702.806817] blk_update_request: protection error, dev nvme0c0n1, sector 3124995840 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
[ 702.808426] blk_update_request: protection error, dev nvme0c0n1, sector 3124995840 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
Connecting to plymouth: Connection refused
[FAILED] Failed to start LVM event activation on device 129:99.
[FAILED] Failed to start LVM event activation on device 129:115.
[FAILED] Failed to start LVM event activation on device 130:19.
[ 727.083442] blk_update_request: protection error, dev nvme0c0n1, sector 3124995840 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
[ 727.085050] blk_update_request: protection error, dev nvme0c0n1, sector 3124995840 op 0x0:(READ) flags 0x4010000 phys_seg 1 prio class 0
[ 734.032080] device-mapper: table: 253:74: multipath: error getting device
[ 734.042066] device-mapper: table: 253:74: multipath: erro...

Thierry FAUCK (thierry-j) wrote : | #15 |
When I select a multipath device I get
└──────
There is no way to get to a shell and collect the log and crash info; it has been sent to Canonical!
│ [Close report ] │ │ │ └──────
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
python error .....
and comes back to
┌──────

Thierry FAUCK (thierry-j) wrote : | #16 |
I also checked that with a SCSI disk (not multipath) the installation proceeds, even though there are 88 SCSI disks on that system.

Ryan Harper (raharper) wrote : | #17 |
The installer is multipath aware; so in the case that there are not any multipath devices in the system, then the output from those commands will not include any maps or paths to multipath devices.
However, the error you posted suggested that *something* was returned from those commands and we failed to parse that; thus we're keenly interested in running those commands exactly to capture the output.
I believe you can use F2 to enter debug shell on the installer, from there you can run those commands and hopefully capture the output in some way.

Thierry FAUCK (thierry-j) wrote : | #18 |
multipathd show maps raw format "name=%n multipath='%w' s
name=mpatha multipath=
name=mpathb multipath=
name=mpathc multipath=
name=mpathd multipath=
name=mpathe multipath=
name=mpathf multipath=
name=mpathg multipath=
name=mpathh multipath=
name=mpathi multipath=
name=mpathj multipath=
name=mpathk multipath=
name=mpathl multipath=
name=mpathm multipath=
name=mpathn multipath=
name=mpatho multipath=
name=mpathp multipath=
name=mpathq multipath=
name=mpathr multipath=
name=mpaths multipath=
name=mpatht multipath=
name=mpathu multipath=
name=mpathv multipath=
name=mpathw multipath=
name=mpathx multipath=
name=mpathy multipath=
name=mpathz multipath=
name=mpathaa multipath=
name=mpathab multipath=
name=mpathac multipath=
name=mpathad multipath=
name=mpathae multipath=
name=mpathaf multipath=
name=mpathag multipath=
name=mpathah multipath=
name=mpathai multipath=
name=mpathaj multipath=
name=mpathak multipath=
name=mpathal multipath=
name=mpatham multipath=
name=mpathan multipath=
name=mpathao multipath=
name=mpathap multipath=
name=mpathaq multipath=
name=mpathar multipath=
name=mpathas multipath=
name=mpathat multipath=
name=m...

Thierry FAUCK (thierry-j) wrote : | #19 |

Thierry FAUCK (thierry-j) wrote : | #20 |

Thierry FAUCK (thierry-j) wrote : | #21 |

Ryan Harper (raharper) wrote : | #22 |
> name=nqn.
That's the culprit.
NVME multipath ...
@Dimitri
Was there another bug related to blacklisting NVME multipath by default?
Why does the nvme multipath 'name' field include colon separated fields?
Changed in curtin: | |
status: | Incomplete → In Progress |
Changed in ubuntu-power-systems: | |
status: | Incomplete → In Progress |
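The ValueError in comment #11 comes from splitting each token on "=" after shell-style tokenising. One plausible way an NVMe map name can trigger it, assuming the name contains whitespace and is emitted unquoted by the `name=%n` format, is sketched below. The sample lines and `parse_shell_pairs` are hypothetical (the real multipathd output is truncated in this report); quoting the name, as in the `name='%n'` format seen in later logs, avoids the problem in this sketch.

```python
import shlex


def parse_shell_pairs(line):
    """Tokenise a line like a shell would, then split each token on '='."""
    pairs = {}
    for token in shlex.split(line):
        # Raises ValueError when a token has no '=' in it.
        key, value = token.split("=", 1)
        pairs[key] = value
    return pairs


# Hypothetical sample data: an NVMe-style name with an embedded space.
unquoted = ("name=nqn.example:subsys 1 multipath='eui.0001' "
            "sysfs='nvme0n1' paths='1'")
quoted = ("name='nqn.example:subsys 1' multipath='eui.0001' "
          "sysfs='nvme0n1' paths='1'")

print(parse_shell_pairs(quoted)["name"])
try:
    parse_shell_pairs(unquoted)
except ValueError as err:
    print("unquoted name failed:", err)
```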

Server Team CI bot (server-team-bot) wrote : | #23 |
This bug is fixed with commit 48703979 to curtin on branch master.
To view that commit see the following URL:
https:/
Changed in curtin: | |
status: | In Progress → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Committed |

Thierry FAUCK (thierry-j) wrote : | #24 |
I spent some time trying to test that small patch and the result is not good as far as I can tell
- I copy /snap/subiquity
======
SSH Setup [tar: Removing leading `/' from member names==
lqqqqqqqqqqqqq
Am I doing something wrong (for example, do I need another ISO)? Is there any other way to test the installer?

Patricia Domingues (patriciasd) wrote : | #25 |
Thierry, the curtin fix is not yet available on the server daily-live images.
I'd say copying over curtin manually is not preferable.
I think it would only be worth retesting now if you have a smaller multipath system.
If you don't, we could wait until the fix is published to Ubuntu Server images.

Andrew Cloke (andrew-cloke) wrote : | #26 |
@thierry-j, a fix for this should now be available from the subiquity edge channel.
There are instructions on how to do an edge channel subiquity installation here: https:/
Would you be able to try that out on your "massively multipathed" system and let us know the result?
Many thanks.
Changed in subiquity: | |
status: | Incomplete → In Progress |

Thierry FAUCK (thierry-j) wrote : Re: [Bug 1878041] Re: installation fails creating user on large multipath system - multipathd show command times out in curtin log with 163 paths | #27 |
The installer upgrade fails, looping with these messages:
-- File
"/snap/
line 274, in run super().run()
File
"/snap/
line 488, in run self.urwid_
File
"/snap/
line 286, in run self._run()
File
"/snap/
line 384, in _run self.event_
File
"/snap/
line 1484, in run reraise(*exc_info)
File
"/snap/
line 58, in reraise raise value
File "/snap/
145, in _run
self.
File
"/snap/
line 366, in select_
self.
File
"/snap/
line 349, in start_loading_
os.
AttributeError: 'ErrorReporter' object has no attribute 'crash_directory'
So I couldn't go further and test
Regards

Patricia Domingues (patriciasd) wrote : | #28 |
Thierry, thanks for replying.
I've added more detailed instructions on how to install the latest Subiquity version:
https:/
I was able to install it:
```
latest/edge: 20.07.1+
installed: 20.07.1+
```
Could you please try following the link above?
Let me know if you have any questions.

Michael Hudson-Doyle (mwhudson) wrote : | #29 |
The AttributeError: 'ErrorReporter' object has no attribute 'crash_directory' crash should be fixed now in edge btw.

Thierry FAUCK (thierry-j) wrote : | #30 |
- 1599584416.180427551.install_fail.crash (5.3 MiB, application/octet-stream)
The installer update is now successful, but the system still crashes.
When I press Enter to confirm 'file system destruction/
' after defining the file systems, I get an error, and when I choose
'send to canonical' I get messages

Thierry FAUCK (thierry-j) wrote : | #31 |

Thierry FAUCK (thierry-j) wrote : | #32 |

Thierry FAUCK (thierry-j) wrote : | #33 |

Thierry FAUCK (thierry-j) wrote : | #34 |

Thierry FAUCK (thierry-j) wrote : | #35 |

Michael Hudson-Doyle (mwhudson) wrote : | #36 |
FWIW the failure is now this:
curtin.
Command: ['multipathd', 'show', 'maps', 'raw', 'format', "name=%n multipath='%w' sysfs='%d' paths='%N'"]
Exit code: 1
Reason: -
Stdout: timeout receiving packet
Stderr: ''
So it sounds like the timeout mentioned in comment #7 still needs to be raised?
Changed in ubuntu-power-systems: | |
status: | Fix Committed → In Progress |
Changed in curtin: | |
status: | Fix Committed → In Progress |

Ryan Harper (raharper) wrote : | #37 |
For the timeout, I'm wondering why curtin should be setting this value? It's not accessible via the multipathd command line, rather it's a multipath.conf setting with a default.
I don't think it makes sense for curtin to muck with multipath.conf temporarily when running one or more of the multipathd commands.
Should multipath packaging provide a different default value? If not in the package, where and why?
Changed in curtin: | |
status: | In Progress → Incomplete |

Dimitri John Ledkov (xnox) wrote : | #38 |
I do think multipath should be bumped.

Frank Heimes (fheimes) wrote : | #39 |
For 20.10 a 'known issue' entry was added to the release notes:
"LP: #1878041 - In case of multipath systems with huge amounts of paths, the installer may hit a timeout."
Changed in ubuntu-release-notes: | |
status: | New → Fix Released |
tags: | added: fr-885 |

Michael Hudson-Doyle (mwhudson) wrote : | #40 |
Looking at this a bit more carefully, I think we can remove the call that is failing and replace it with some udev checking instead. I'm actually a bit surprised it fails -- a very similar call was made by probert earlier in the run and that seems to have worked fine (although I can't . I wonder if it is conflicting over the multipathd lock with multipath calls made by udev in response to device changes or something like that. Do you still have access to this system for testing? I can probably make a snap for testing with tomorrow.
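A udev-based check of the kind suggested above could look roughly like the following. This is a sketch under assumptions, not curtin's implementation: it parses the `KEY='value'` lines printed by `udevadm info --query=property --export` and treats a device-mapper UUID starting with `mpath-` as a multipath map. The function names and the sample property text (including the WWID) are hypothetical.

```python
import shlex


def parse_udev_export(text):
    """Parse `udevadm info --query=property --export` output into a dict."""
    props = {}
    for token in shlex.split(text):
        if '=' in token:
            key, value = token.split('=', 1)
            props[key] = value
    return props


def looks_like_multipath(props):
    """Assumption: multipath maps carry a DM_UUID prefixed with 'mpath-'."""
    return props.get('DM_UUID', '').startswith('mpath-')


# Hypothetical property dumps for a multipath map and one of its partitions.
SAMPLE_MAP = ("DEVNAME='/dev/dm-34'\n"
              "DM_NAME='mpathr'\n"
              "DM_UUID='mpath-3600a0b80000000000000000000000001'\n")
SAMPLE_PART = ("DEVNAME='/dev/dm-36'\n"
               "DM_NAME='mpathr-part2'\n"
               "DM_UUID='part2-mpath-3600a0b80000000000000000000000001'\n")

print(looks_like_multipath(parse_udev_export(SAMPLE_MAP)))
print(looks_like_multipath(parse_udev_export(SAMPLE_PART)))
```

Because the properties come from the udev database, a check like this does not need to talk to multipathd and so cannot hit the CLI receive timeout.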

Thierry FAUCK (thierry-j) wrote : | #41 |
Tried to boot the hirsute image dated 01/14; it failed in the initramfs.
Booted with 20.10 subiquity-

Michael Hudson-Doyle (mwhudson) wrote : | #42 |
Hi, can you try again with subiquity-
The initramfs failure sounds worrying, can you file a separate bug about that?

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #43 |
------- Comment From <email address hidden> 2021-01-21 04:14 EDT-------
It still crashes during install after choosing the disk. As asked, here is the content of /var/crash

bugproxy (bugproxy) wrote : crash from install of 01/21 | #44 |
- crash from install of 01/21 (1020.2 KiB, application/x-compressed)
------- Comment (attachment only) From <email address hidden> 2021-01-21 04:16 EDT-------

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #45 |
------- Comment From <email address hidden> 2021-01-21 04:49 EDT-------
By the way, I checked with hirsute: the system now boots and stops at the same point.

Michael Hudson-Doyle (mwhudson) wrote : | #46 |
Thanks for that. The failure is different now (which suggests my changes have helped):
Running command ['sgdisk', '--new', '2:18432:
An error occured handling 'partition-1': ProcessExecutio
Command: ['sgdisk', '--new', '2:18432:
Exit code: 4
Reason: -
Stdout: ''
Stderr: Could not create partition 2 from 18432 to 2424585727
Could not change partition 2's type code to 8300!
Error encountered; not saving changes.
I think curtin has some bugs with multipath disks where the block size is 4096.
The end of the attempted partition in bytes is 2424585727*4096 = 9931103137792, larger than the reported size of the disk, 9931038130176.
The first call to sgdisk was
['sgdisk', '--new', '1:256:2303', '--typecode=
So simply based on this, I would expect the start sector of the second partition to be 2304, not 18432. Chasing code a bit, I think the function calc_dm_
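The arithmetic above can be checked with a short script. It uses only numbers quoted in this report (the 4096-byte logical sector size and disk size from the lsblk output, and the sector values from the sgdisk calls); `fits_on_disk` is a hypothetical helper. The last line notes that 18432 is what 2304 becomes if a 4096-byte sector count is re-expressed in 512-byte units, which is an inference from the numbers rather than something confirmed in this thread.

```python
SECTOR_SIZE = 4096          # LOG-SEC reported by lsblk for mpathr
DISK_BYTES = 9931038130176  # SIZE reported by lsblk for mpathr


def fits_on_disk(start, end, sector_size=SECTOR_SIZE, disk_bytes=DISK_BYTES):
    """True if sectors start..end (inclusive) lie within the disk."""
    return 0 <= start <= end and (end + 1) * sector_size <= disk_bytes


disk_sectors = DISK_BYTES // SECTOR_SIZE
print(disk_sectors)                      # 2424569856
print(2424585727 * SECTOR_SIZE)          # 9931103137792
print(fits_on_disk(256, 2303))           # True
print(fits_on_disk(18432, 2424585727))   # False
print(2304 * SECTOR_SIZE // 512)         # 18432
```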

Michael Hudson-Doyle (mwhudson) wrote : | #47 |
Actually, let's try something simple: can you try subiquity-
Also I'm glad to hear today's hirsute daily works!

Server Team CI bot (server-team-bot) wrote : | #48 |
This bug is fixed with commit ea15dfa9 to curtin on branch master.
To view that commit see the following URL:
https:/
Changed in curtin: | |
status: | Incomplete → Fix Committed |

bugproxy (bugproxy) wrote : crash of 01/22 | #49 |
- crash of 01/22 (5.8 MiB, text/plain)
------- Comment (attachment only) From <email address hidden> 2021-01-22 05:07 EDT-------

bugproxy (bugproxy) wrote : crash itself | #50 |
- crash itself (5.8 MiB, text/plain)
------- Comment (attachment only) From <email address hidden> 2021-01-22 05:09 EDT-------

Thierry FAUCK (thierry-j) wrote : | #51 |
- 1611649013.477067232.install_fail.meta (48 bytes, text/plain)
When I reach the FILE SYSTEM SUMMARY panel and select 'continue', I get the 'an error occured during installation' message within a minute.
I was running with the parameters: snap refresh subiquity --channel=
I didn't update to the new version of the installer, as I believe it would override the previous parameter. I waited for 5 minutes at FILE SYSTEM SUMMARY and nothing happened, so it sounds like the problem occurs when I confirm. The crash contents are attached.

Thierry FAUCK (thierry-j) wrote : | #52 |

Michael Hudson-Doyle (mwhudson) wrote : | #53 |
Hm that's failing in the same sort of way. I've just pushed a new version (0+git.5ef5b352) to edge/mwhudson-hack, can you try that? Hopefully that will work better, or at least have more informative logs.

Thierry FAUCK (thierry-j) wrote : | #54 |
- 1611663228.337142229.install_fail.crash (6.1 MiB, application/octet-stream)
unfortunately same behaviour

Thierry FAUCK (thierry-j) wrote : | #55 |

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #56 |
------- Comment From <email address hidden> 2021-01-26 14:33 EDT-------
Log contains:
get_blockdev_
Running command ['fdasd', '--table', '/dev/dm-36'] with allowed return codes [0] (capture=False)
Checking if /dev/dm-36 is a swap device
Found swap magic: b'\x00\
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/dm-36'] with allowed return codes [0] (capture=True)
/dev/dm-36 is multipath device? False
Running command ['multipath', '-c', '/dev/dm-36'] with allowed return codes [0] (capture=True)
/dev/dm-36 is multipath device member? False
wiping superblock on /dev/dm-36
wiping /dev/dm-36 attempt 1/4
wiping 1M on /dev/dm-36 at offsets [0, -1048576]
successfully wiped device /dev/dm-36 on attempt 1/4
shutdown running on holder type: 'disk' syspath: '/sys/class/
Running command ['lsblk', '--noheadings', '--bytes', '--pairs', '--output=
get_blockdev_
{
"dm-34": {
"ALIGNMENT": "0",
"DISC-ALN": "0",
"DISC-GRAN": "0",
"DISC-MAX": "0",
"DISC-ZERO": "0",
"FSTYPE": "",
"GROUP": "disk",
"KNAME": "dm-34",
"LABEL": "",
"LOG-SEC": "4096",
"MAJ:MIN": "253:34",
"MIN-IO": "4096",
"MODE": "brw-rw----",
"MODEL": "",
"MOUNTPOINT": "",
"NAME": "mpathr",
"OPT-IO": "0",
"OWNER": "root",
"PHY-SEC": "4096",
"RM": "0",
"RO": "0",
"ROTA": "1",
"RQ-SIZE": "256",
"SIZE": "9931038130176",
"STATE": "running",
"TYPE": "mpath",
"UUID": "",
"device_path": "/dev/dm-34"
},
"dm-35": {
"ALIGNMENT": "0",
"DISC-ALN": "0",
"DISC-GRAN": "0",
"DISC-MAX": "0",
"DISC-ZERO": "0",
"FSTYPE": "",
"GROUP": "disk",
"KNAME": "dm-35",
"LABEL": "",
"LOG-SEC": "4096",
"MAJ:MIN": "253:35",
"MIN-IO": "4096",
"MODE": "brw-rw----",
"MODEL": "",
"MOUNTPOINT": "",
"NAME": "mpathr-part1",
"OPT-IO": "0",
"OWNER": "root",
"PHY-SEC": "4096",
"RM": "0",
"RO": "0",
"ROTA": "1",
"RQ-SIZE": "128",
"SIZE": "8388608",
"STATE": "running",
"TYPE": "part",
"UUID": "",
"device_path": "/dev/dm-35"
},
"dm-36": {
"ALIGNMENT": "0",
"DISC-ALN": "0",
"DISC-GRAN": "0",
"DISC-MAX": "0",
"DISC-ZERO": "0",
"FSTYPE": "",
"GROUP": "disk",
"KNAME": "dm-36",
"LABEL": "",
"LOG-SEC": "4096",
"MAJ:MIN": "253:36",
"MIN-IO": "4096",
"MODE": "brw-rw----",
"MODEL": "",
"MOUNTPOINT": "",
"NAME": "mpathr-part2",
"OPT-IO": "0",
"OWNER": "root",
"PHY-SEC": "4096",
"RM": "0",
"RO": "0",
"ROTA": "1",
"RQ-SIZE": "128",
"SIZE": "1073741824",
"STATE": "running",
"TYPE": "part",
"UUID": "",
"device_path": "/dev/dm-36"
}
}
get_blockdev_
Checking if /dev/dm-34 is a swap device
Found swap magic: b'\x00\
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/dm-34'] with allowed return codes [0] (capture=True)
/dev/dm-34 is multipath device? True
Running command ['multipathd', 'show', 'maps', 'raw', 'format', "name='%n' multipath='%w' sysfs='%d' paths='%N'"] with allowed return codes [0] (capture=True...

Michael Hudson-Doyle (mwhudson) wrote : | #57 |
It looks like the snap didn't get updated to the version from edge/mwhudson-hack for that run?

Patricia Domingues (patriciasd) wrote : | #58 |
Hi Thierry thanks again for testing this,
From your crash files, I noticed you're using an image built in October, 2020-10-22 14:33 (url=http://
` 2021-01-26 08:10:48,735 INFO subiquity:135 Starting Subiquity revision 2101 `
```
SnapRevision: 2101
SnapUpdated: False
SnapVersion: 20.09.1+
SourcePackage: subiquity
```
So I'm going to ask you to try again following these steps, please:
https:/
with this latest 20.10 URL: http://
For my own reference I've tested it on P9 witherspoon (bobone).
Let me know if you have any questions.

bugproxy (bugproxy) wrote : | #59 |
------- Comment From <email address hidden> 2021-01-26 16:10 EDT-------
But in the crash file we get the info: Kernel command line: ip=9.40.
or in the meta file:
[ 1.572051] This architecture does not have kernel memory protection.
[ 1.572057] Run /init as init process
[ 1.572059] with arguments:
[ 1.572060] /init
[ 1.572061] snap
[ 1.572062] refresh
[ 1.572063] subiquity
[ 1.572064] ---
[ 1.572065] with environment:
[ 1.572066] HOME=/
[ 1.572067] TERM=linux
[ 1.572068] ip=9.40.
[ 1.572069] url=http://
[ 1.572070] --channel=
Did I mistype the parameters?

Michael Hudson-Doyle (mwhudson) wrote : | #60 |
Yes, it should be subiquity-

Michael Hudson-Doyle (mwhudson) wrote : Fixed in curtin version 21.2. | #61 |
This bug is believed to be fixed in curtin in version 21.2. If this is still a problem for you, please make a comment and set the state back to New
Thank you.
Changed in curtin: | |
status: | Fix Committed → Fix Released |

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #62 |
------- Comment From <email address hidden> 2021-01-27 02:58 EDT-------
Retried with parameter Kernel command line: ip=9.40.
I saw the subiquity build related to the mwhudson-hack , so I expect this time it is good
same behaviour for failure

bugproxy (bugproxy) wrote : crash | #63 |

bugproxy (bugproxy) wrote : meta | #64 |

Patricia Domingues (patriciasd) wrote : | #65 |
Thanks again Thierry, there's a new subiquity version available.
Please, could you try with Hirsute/21.04 -
This URL:
`http://

bugproxy (bugproxy) wrote : crash for hirsute | #66 |
- crash for hirsute (7.1 MiB, text/plain)
------- Comment on attachment From <email address hidden> 2021-01-28 04:40 EDT-------
today's crash

bugproxy (bugproxy) wrote : meta | #67 |

Dimitri John Ledkov (xnox) wrote : | #68 |
The crash file is interesting, it seems that we have many many udev timeouts getting stuck for more than 120s.
I'm now not sure whether these are timeouts in LVM calls made from udev, or somewhere else.
It would be interesting to see if one can complete the install without LVM, with just ext4.
I'm also not sure how to make this more resilient. Clearly things are not broken; there is just a lot to do. Hence the timeouts are getting in the way, rather than being helpful.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla | #69 |
------- Comment From <email address hidden> 2021-01-28 12:03 EDT-------
Comments and time out are related to NVME devices
[ 130.332163] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x2090700 phys_seg 1 prio class 0
[ 130.333596] blk_update_request: protection error, dev nvme0c0n1, sector 3124995968 op 0x0:(READ) flags 0x2010000 phys_seg 1 prio class 0
[ 130.333639] Buffer I/O error on dev nvme0n1p2, logical block 24413887, async page read
Apparently, not using LVM seems to work.
Maybe the problem was using a device that already had LVM on it!

Dimitri John Ledkov (xnox) wrote : | #70 |
Thank you for that.
I am tempted to close out the multipath package task and open an LVM package task.
I'm not sure if we allow reusing existing LVMs; if we do, that would also be an interesting data point and test case.
Normally LVM commands create udev cookies and trigger udev to settle with a timeout. I do wonder if curtin should be doing something more clever here, like looking at all the prober data, disabling that behaviour, and instead doing the udev triggers and waiting by itself, possibly with larger timeouts.
Or whether we can somehow "speed things up" by deactivating all the multipaths and all the "unused" drives not selected for the installation.

Michael Hudson-Doyle (mwhudson) wrote : | #71 |
Oh so the last crash with LVM was this silly thing: https:/
Given that this bug report has fixed two real problems and has resulted in a successful install without LVM, I'm going to close it, and any new problems can be handled in a new report. Feel free to reopen if you disagree :)
Also, thanks for all your testing with this system, it's definitely made the installer better!
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Released |
Changed in subiquity: | |
status: | In Progress → Fix Released |
Changed in multipath-tools (Ubuntu): | |
status: | New → Invalid |

Frank Heimes (fheimes) wrote : | #72 |
Agreeing with Michael's suggestion.
Marking LP 1905412 as affecting the Power project.