Host-based backup of Microsoft Hyper-V VMs.
Re: Windows Server 2019 Hyper-V VM I/O Performance Problem

Post by ChristineAlexa »

JRRW
I hear ya, hence I took the "safe" option (disable hyper-threading), and added a 4th node to the cluster to help make up for some of the lost cycles. All the equipment was/is whatever their top tier enterprise HCL compatibility matrix showed as approved.

And 3 years later, we are still running in this mode (HT disabled) and it runs reasonably well. (Yes, I beat the holy heck out of it before putting it into production after the initial issues, hence much of the testing posted early in this thread.)

Post by tomrs »

I have just come across these same issues. I assume there is still no fix as of yet?

Post by steendp »

I don't think so. We are still struggling with it, despite having an open case with MS.
We tried running without HT but that didn't solve it.

The next step for us is to evaluate another platform (hypervisor or storage), as I don't trust MS to fix it.

Post by ncoker »

Hi All,
Here is our experience with this issue. Really frustrating!
The important requirements:
- BIOS settings: C-States must be disabled -> Dell Performance profile (NOT any variant such as OS- or watt-controlled)
- If your CPU is running below its base speed, your BIOS settings are wrong!
Check here for Dell: https://www.dell.com/support/kbdoc/en-s ... -stack-hci
OS
- OS 2022 latest patch level
Backup Software
- Veeam Backup & Replication v11-v12 (latest patch level)
- Veritas Backup Exec v22.2
Hyper-V
- ReFS 64k cluster size
- Storage Spaces (OS 2022 Version) on a local Hyper-V Node
- Virtual Machines Configuration Version v10.0
- Virtual Machines all with Gen2 and VHDX
- VHDX files > 4 TB and > 30 TB -> To me this looks like the cause! We were not able to reproduce it in any of our lab environments, as we do not have VHDX files this large.

So for us it is a Windows Server 2019/2022 issue, as it happens with more than one backup vendor. I don't believe that two vendors both got the implementation wrong!
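If you want to check whether your own environment matches the large-VHDX pattern suspected above, a minimal Python sketch like the following lists VHDX files over a size threshold. It is an illustration only: the 4 TB threshold (decimal, 10^12 bytes) and the function name `find_large_vhdx` are assumptions. It reads each file's current size, not the virtual disk's maximum size, so a dynamically expanding VHDX can be provisioned larger than it shows here.

```python
import os
from pathlib import Path

# Assumed threshold from the post above: VHDX files larger than 4 TB (decimal TB).
THRESHOLD_BYTES = 4 * 10**12


def find_large_vhdx(root, threshold_bytes=THRESHOLD_BYTES):
    """Return (path, size_in_bytes) pairs for .vhdx files under `root`
    whose current file size exceeds `threshold_bytes`, largest first.

    Note: this is the VHDX file's current size, not the virtual disk's
    maximum size, so dynamically expanding disks may grow much larger.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".vhdx"):  # case-insensitive extension match
                path = Path(dirpath) / name
                size = path.stat().st_size
                if size > threshold_bytes:
                    hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

For example, `find_large_vhdx(r"D:\Hyper-V")` on a host would return any matching disks; the path is hypothetical.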

Post by stephc_msft »

I will be posting an update about the RCT side of this ongoing issue in a few weeks' time... [hopefully in 4 weeks' time (subtle hint)]
Note that this is for the issue with .rct files. It is unclear whether it will help with the other aspect, the one that live migrating the VM to another host overcomes (where a VM unexpectedly gets into a degraded I/O state for one or more of its data disks [VHDXs], and live migrating it to a new host restores its I/O performance).

Post by stephc_msft »

The long-awaited 'RCT fix', for the RCT side of this ongoing issue, should be released in the October 2023 Windows update, i.e. tomorrow!
For WS2019 and WS2022
Please apply and test and let us know how it goes.
Note that this is for the issue with general VM I/O slowdown with .rct files in use.

It is unclear whether this will help with the other issue that many people are seeing, the one that live migrating the VM to another host overcomes (where a VM unexpectedly gets into a degraded I/O state for one or more of its data disks [VHDXs], and live migrating it to a new host restores its I/O performance).

Post by Nick-SAC »

Please confirm whether this fix is for a general performance slowdown or for what I described in the very first post in this thread, i.e. the Event ID 9 Warnings where I/O Requests essentially Stall (Stop, Not Slow Down) for some 10 to 20 seconds.

Thanks

Post by rold »

The update solved the write io performance issue!

Tested with Iometer: 40-50k IOPS on an updated host, 4-5k IOPS on a non-updated host.

Post by benthomas »

This is excellent news @rold!

@stephc_msft, can we get a note in the October 2023 update release notes to say this is in there? I can't see any mention of RCT or VM perf
Ben Thomas | Solutions Advisor | Veeam Vanguard 2023 | VMCE2022 | Microsoft MVP 2018-2023 | BCThomas.com

Post by stephc_msft »

The fix is for a general performance (write performance) slowdown when RCT is in use.
With regard to StorageVSP event 9s, that is a different peculiarity that still isn't fully understood, i.e. how it sometimes claims an I/O write took >10 seconds when tests inside the VMs only show normal but poor latency [e.g. 30 ms in the bad state, 2 ms in the good state].
For info: that event 9 was added in WS2019 and has a default threshold of 10 seconds. Exactly what it measures and how (and whether it may be getting 'confused') is still being investigated.
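To put the numbers above side by side, here is a small Python sketch. It assumes only what the post states (a 10-second default threshold, and 30 ms vs 2 ms latency measured inside the VM); the constant and function names are illustrative. It shows that the bad state is 15 times slower than the good state, yet still roughly 333 times below the threshold, which is why a >10 s event 9 is hard to square with those in-VM measurements. An average latency can hide individual outliers, though, so this comparison alone does not prove the events are spurious.

```python
# Default StorageVSP event 9 threshold, as stated in the post (10 seconds).
EVENT9_THRESHOLD_MS = 10_000


def exceeds_event9_threshold(latency_ms, threshold_ms=EVENT9_THRESHOLD_MS):
    """True if a single I/O latency (in ms) is above the event 9 threshold."""
    return latency_ms > threshold_ms


# Latencies quoted in the post, measured by tests inside the VMs.
GOOD_STATE_MS = 2
BAD_STATE_MS = 30

slowdown = BAD_STATE_MS / GOOD_STATE_MS        # 15x slower than the good state
headroom = EVENT9_THRESHOLD_MS / BAD_STATE_MS  # still about 333x below the threshold

print(f"Bad state is {slowdown:.0f}x slower than the good state")
print(f"Bad state is {headroom:.0f}x below the event 9 threshold")
print("Would 30 ms trigger event 9?", exceeds_event9_threshold(BAD_STATE_MS))
```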

Re the release notes: unfortunately, it is not explicitly mentioned in the October release notes. I'll try to see if we can get anything publicly documented, but I have little control or influence over that.

Post by mkaec »

stephc_msft wrote: Oct 09, 2023 10:15 pm The long-awaited 'RCT fix', for the RCT side of this ongoing issue, should be released in the October 2023 Windows update, i.e. tomorrow!
Wow! I have been waiting 2 years for this post. I had given up hope. Never give up hope!

Post by Nick-SAC »

stephc_msft wrote: Oct 11, 2023 9:54 am With regard to StorageVSP event 9s, that is a different peculiarity that still isn't fully understood, i.e. how it sometimes claims an I/O write took >10 seconds when tests inside the VMs only show normal but poor latency [e.g. 30 ms in the bad state, 2 ms in the good state].
For info: that event 9 was added in WS2019 and has a default threshold of 10 seconds. Exactly what it measures and how (and whether it may be getting 'confused') is still being investigated.
What we were seeing appeared to be an actual Stall of the I/O because concurrent with the Event 9 Warnings on the HV-Host, we would get Read/Write Delay Warnings/Errors with matching times & durations from the Exchange Server running on the VM.
stephc_msft wrote: Oct 11, 2023 10:11 am One scenario where the event 9 might occur and be genuine is at the start of a backup.
The backup creates an avhdx checkpoint.
That avhdx disk starts small.
If the VM is doing significant I/O during the backup, while this temporary checkpoint is in use, its I/O writes go into the avhdx file and the avhdx extends as required.
This extending of the avhdx (during the early stages of the backup, when it is most likely to need extending) can cause a delay, and is the only scenario I have personally found that may cause a StorageVSP event 9 to occur.
The Event 9 Warnings & Delays we were seeing were more often than not also occurring at times when the Backups were not in progress.

Thanks,
Nick

Post by ruddj »

stephc_msft wrote: Oct 09, 2023 10:15 pm The long-awaited 'RCT fix', for the RCT side of this ongoing issue, should be released in the October 2023 Windows update, i.e. tomorrow!
For WS2019 and WS2022
Please apply and test and let us know how it goes.
Note that this is for the issue with general VM I/O slowdown with .rct files in use.
After applying the October fix to our Server 2022 testing cluster, multiple VMs refused to start with 'Incorrect function.' errors.
There are lots of other reports online of similar issues.

The fix is to delete the .mrt and .rct files in the same directory as the VHDs.

I am guessing this is related to the RCT fix possibly making the old files incompatible. Deleting the files will cause Veeam to have to do a more complete scan rather than just use deltas, but at least it lets the VM start up.
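The workaround above can be scripted. This is a minimal Python sketch, not an official tool: it assumes the tracking files sit next to the disks with names ending in `.rct` and `.mrt` (e.g. `disk.vhdx.rct`), it only lists what it would delete unless `dry_run=False` is passed, and the function names are hypothetical. Run it only with the VM shut down, and keep a copy of the files in case they are needed for troubleshooting.

```python
from pathlib import Path

# Extensions of the RCT tracking files mentioned in the post above.
TRACKING_SUFFIXES = {".rct", ".mrt"}


def find_tracking_files(vhd_dir):
    """Return the .rct/.mrt files in the same directory as the VHDs, sorted."""
    return sorted(
        p for p in Path(vhd_dir).iterdir()
        if p.is_file() and p.suffix.lower() in TRACKING_SUFFIXES
    )


def remove_tracking_files(vhd_dir, dry_run=True):
    """List the tracking files and delete them only when dry_run is False.

    Intended to be run with the VM powered off, after backing the files up.
    """
    files = find_tracking_files(vhd_dir)
    for path in files:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return files
```

For example, call `remove_tracking_files(r"C:\ClusterStorage\Volume1\VM01")` to preview, then again with `dry_run=False`; the path is hypothetical. As noted above, expect the next backup to take longer, since the change-tracking history is gone.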

Post by andrew.symons »

Nick-SAC wrote: Oct 11, 2023 3:03 pm What we were seeing appeared to be an actual Stall of the I/O because concurrent with the Event 9 Warnings on the HV-Host, we would get Read/Write Delay Warnings/Errors with matching times & durations from the Exchange Server running on the VM.



The Event 9 Warnings & Delays we were seeing were more often than not also occurring at times when the Backups were not in progress.

Thanks,
Nick
What underlying storage do you have, Nick?

Post by Nick-SAC »

Check out the very first post in this thread where I laid out all the specs on the first Server we were seeing this problem on. Since then we’ve encountered it on other boxes also and with varying Hardware... and then there have been others with the same Hardware where we haven’t seen the problem at all.

All in all it’s been maddeningly intermittent & inconsistent and hasn’t seemed to have any correlation with the Hardware or anything else for that matter.

Thanks,
Nick

Post by andrew.symons »

OK so local non-clustered SAS storage in terms of the VHD storage location.

The reason for asking is that we saw a similar issue with CSV storage on Dell iSCSI SANs: the issue was caused by an odd lost-TCP-packet scenario whereby we would see iSCSIPRT errors and the Event ID 9s.
We had to go down to packet-capture level to find the issue. In essence, the array would tell the host that it had not received a TCP packet through a SYN resend; this was repeated in accordance with the standard TCP timeouts, and eventually an unsolicited reset was sent to the hosts and the iSCSI session needed tearing down and restarting. This whole process took seconds and therefore had an impact. On the systems in question we had MultiPath I/O -> dual separated subnets/switches/NICs/SAN controllers, and at times we hit the issue on multiple paths at once, so the system got very upset because it effectively lost I/O in totality. Ultimately Dell asked us to change the Windows host TCP optimisations for the iSCSI NICs to settings different from their Best Practice Guidance:
IPv4 Checksum Offload from "Disabled" to "Rx & Tx Enabled"
Large Send Offload V2 (IPv4) from "Disabled" to "Enabled"
TCP Checksum Offload (IPv4) from "Disabled" to "Rx & Tx Enabled"
UDP Checksum Offload (IPv4) from "Disabled" to "Rx & Tx Enabled"

This resolved some of the Event ID 9 issues but we did still see some during backups due to Storage Hardware Provider Snapshots initiated from Veeam/Windows in combination with RCT.

I realise this is not your use case, but it may help others who find their way here, and I have seen Dell iSCSI scenarios in this forum.

Also, just as a note on the Dell hardware (I'm sure you have done this already): they regularly release HDD-level firmware updates (both as part of RAID update sets and individually). These can often resolve underlying disk reset/stall issues, which could also be a cause of what you are seeing and would not necessarily be reported anywhere.
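For anyone wanting to check adapters against the settings listed above, a small Python sketch can diff exported NIC properties against the desired values. It assumes you export them with PowerShell, for example `Get-NetAdapterAdvancedProperty -Name "iSCSI1" | Select-Object DisplayName, DisplayValue | ConvertTo-Json` (the adapter name `iSCSI1` is hypothetical), and that your driver uses the same DisplayName strings as the post; these vary between NIC vendors. The values are what Dell asked for in that specific environment, not a general recommendation.

```python
import json

# Target values from the post above. DisplayName strings differ between NIC
# drivers, so treat these keys as assumptions to adapt to your adapters.
DESIRED = {
    "IPv4 Checksum Offload": "Rx & Tx Enabled",
    "Large Send Offload V2 (IPv4)": "Enabled",
    "TCP Checksum Offload (IPv4)": "Rx & Tx Enabled",
    "UDP Checksum Offload (IPv4)": "Rx & Tx Enabled",
}


def load_current(json_text):
    """Parse ConvertTo-Json output into {DisplayName: DisplayValue}."""
    data = json.loads(json_text)
    if isinstance(data, dict):  # a single property is exported as an object, not a list
        data = [data]
    return {item["DisplayName"]: item["DisplayValue"] for item in data}


def settings_to_change(current, desired=DESIRED):
    """Return {DisplayName: (current_value, desired_value)} for every mismatch.

    A setting missing from `current` is reported with a current value of None.
    """
    changes = {}
    for name, want in desired.items():
        have = current.get(name)
        if have != want:
            changes[name] = (have, want)
    return changes
```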

Post by stephc_msft »

ruddj wrote: Oct 17, 2023 1:54 am After applying the October fix to our Server 2022 testing cluster, multiple VMs refused to start with 'Incorrect function.' errors.
Yes, I and others have been made aware of those posts.
Your analysis sounds plausible (also, the fact that RCT is in use is recorded in the .vmcx config file).
There is no confirmation or analysis from the Microsoft side yet, and I am awaiting access to some VMs and ALL their related files to try to understand it more.

Post by stephc_msft »

I have a set of files (vhdx and .rct/.mrt from before the update) that show the issue, and am looking into it.
I also see the "Incorrect function" error if I try to locally mount the VHDX on a post-10B system.
I see vhdmp event log event 24, indicating it's having some issue with the .rct file.

The workaround for now is to delete the .rct and .mrt files to allow the VM to start.
And of course that will mean the next host level backup of the VM will be a full backup, but that is not too unreasonable.

Post by joelg »

andrew.symons wrote: Oct 17, 2023 3:16 pm OK so local non-clustered SAS storage in terms of the VHD storage location.

The reason for asking is that we saw a similar issue with CSV storage on Dell iSCSI SANs: the issue was caused by an odd lost-TCP-packet scenario whereby we would see iSCSIPRT errors and the Event ID 9s.
We had to go down to packet-capture level to find the issue. In essence, the array would tell the host that it had not received a TCP packet through a SYN resend; this was repeated in accordance with the standard TCP timeouts, and eventually an unsolicited reset was sent to the hosts and the iSCSI session needed tearing down and restarting. This whole process took seconds and therefore had an impact. On the systems in question we had MultiPath I/O -> dual separated subnets/switches/NICs/SAN controllers, and at times we hit the issue on multiple paths at once, so the system got very upset because it effectively lost I/O in totality. Ultimately Dell asked us to change the Windows host TCP optimisations for the iSCSI NICs to settings different from their Best Practice Guidance:
IPv4 Checksum Offload from "Disabled" to "Rx & Tx Enabled"
Large Send Offload V2 (IPv4) from "Disabled" to "Enabled"
TCP Checksum Offload (IPv4) from "Disabled" to "Rx & Tx Enabled"
UDP Checksum Offload (IPv4) from "Disabled" to "Rx & Tx Enabled"

This resolved some of the Event ID 9 issues but we did still see some during backups due to Storage Hardware Provider Snapshots initiated from Veeam/Windows in combination with RCT.

I realise this is not your use case, but it may help others who find their way here, and I have seen Dell iSCSI scenarios in this forum.

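Also, just as a note on the Dell hardware (I'm sure you have done this already): they regularly release HDD-level firmware updates (both as part of RAID update sets and individually). These can often resolve underlying disk reset/stall issues, which could also be a cause of what you are seeing and would not necessarily be reported anywhere.
Does Dell have a KB article referencing this issue? Would you be able to provide some detail on how to check that issue? We have a Dell ME4084 connected similarly to your setup.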
Does Dell have a KB article referencing this issue? Would you be able to provide some detail on how to check that issue? We have a Dell ME4084 connected similarly to your setup.

ME4084
Two Dell switches
3 Dell PowerEdge servers

Any info you could provide would be wonderful,
Joel

EDIT: Just checked our settings, and we've already enabled the options you mentioned, probably during a call to Dell a year ago :(

Thanks,
Joel

Post by mkaec »

stephc_msft wrote: Oct 18, 2023 6:15 pm Workaround for now is to delete the .rct and .mrt file, to allow the VM to start
And of course that will mean the next host level backup of the VM will be a full backup, but that is not too unreasonable.
I don't think the next backup is a full backup. It's just a slower scan - an incremental backup that takes longer.

Post by bhead »

Hello,

After installing KB5031361 on two of our Hyper-V nodes, I no longer see any I/O issues after creating a VM backup.
I've had no issues starting or restarting VMs or moving VMs from or to these hosts. I didn't have to delete mrt or rct files in order to start a VM.
Overall the performance of the VMs residing on these hosts seems to be outstanding compared to what we had to deal with until now. VMs will now boot within seconds.

Yet it is still sad that it took years for Microsoft to fix such a bad issue.
I am sure they lost a bunch of customers because of this.

We will continue to roll out the October CU so that we're finally back on track.

Regards

Post by hallsos »

We have installed the Windows Server Oct CU, and unfortunately we are still experiencing one of the two issues reported in this thread: CSVs holding VMs' VHDX disks, with Veeam backups, where the CSV experiences high I/O and goes into a paused state. This issue doesn't occur on CSVs with VMs that do not have Veeam backups enabled. The error we get is below, and the VMs with disks on this CSV of course go into a paused/offline state.

Cluster Shared Volume 'CSVName-CSVD' ('Cluster Virtual Disk (CSVName-CSVD)') has entered a paused state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Post by joelg »

We're still experiencing the issue after the update. Small sample size, but we were averaging 2,300 events per day (30-day sample) before the update and about 1,500 (8.5-day sample) after the update.

I don't think the patch has resolved the issue for us.
Joel

Post by joelg »

Please disregard my last post; I was looking at the wrong installed update. We don't currently have the noted update installed on our servers.

Joel

Post by SodaPop87 »

Hi Everyone,

We recently set up a new Hyper-V 2022 cluster (3 nodes, Dell PE R650) and new storage (Unity 380XT Hybrid). We moved 17 guests from a Hyper-V 2012 R2 cluster to the new cluster, and everything has been fine. Most of these servers are fairly small and not impactful. On Friday, Oct 20, I finally migrated our production SQL database to the new Hyper-V cluster (a 6.7 TB VM in total).
I ran the initial backup job after re-pointing it to the new cluster. This ran the initial full incremental scan and understandably took some time, but completed successfully; it ran over Sunday and finished sometime Monday. No issues so far. Tuesday, no issues. Then comes Wednesday, about 4 days after completing the migration: our application starts hanging/freezing and disconnecting queries after backups run. It seems that about an hour after the merge completes, things stabilize and go back to somewhat normal. I'd understand if slowness/disconnects happened during the merge, but an hour or so after seemed odd.
I've been searching for a few hours, came across this thread, and read all 13 pages; I feel for all of you who have been dealing with this for years. It has been 1 day for me, and I already wanted to smash my head on the desk. There is no worse feeling than moving to brand new hardware that should be better/faster, and running into issues that really make it seem like the move was a waste of money.
Anyway, we had not run our October update (we tend to wait 1-2 months), but I am now installing it on all 3 nodes. Hopefully this helps resolve our issue, but at least this thread provided some workarounds I can try, like performing a Live Migration to one of the other nodes to see if it clears up the disconnects.

Post by SodaPop87 »

Updates applied to my 3 Hyper-V hosts. Unfortunately, a full backup ran on the last Friday of the month after they were applied, so I had to wait a few days. Doing a health check too. I don't think the update resolved the issue fully. Now when I first attempt to log in to the application, it times out. I try again and it takes about two minutes to get me logged in, everything is slow for about another minute or two, then it starts to act somewhat normal. I will see how it acts after its next backup later in the evening, but I believe I may have the issue where a Live Migration needs to be run after backups complete as well.

Post by SodaPop87 »

Second update: I had to perform a Live Migration, which got things back to normal for users. I guess I will be looking for a way to script this in the meantime.
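The post-backup Live Migration workaround can be scripted. Here is a minimal sketch in Python, assuming a failover cluster where the FailoverClusters PowerShell module's `Move-ClusterVirtualMachineRole` cmdlet is available: it builds the command and, only when `dry_run=False`, runs it through `powershell.exe`. The VM role name `SQL01` and node `HV-NODE2` are hypothetical; the cluster role name usually matches the VM name but may not. This only automates the workaround described in this thread; it does not address the underlying cause.

```python
import subprocess


def ps_quote(value):
    """Quote a string as a PowerShell single-quoted literal ('' escapes ')."""
    return "'" + value.replace("'", "''") + "'"


def live_migration_command(vm_role, target_node):
    """Build the powershell.exe argument list for a cluster live migration."""
    script = (
        "Move-ClusterVirtualMachineRole"
        f" -Name {ps_quote(vm_role)}"
        f" -Node {ps_quote(target_node)}"
        " -MigrationType Live"
    )
    return ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command", script]


def live_migrate(vm_role, target_node, dry_run=True):
    """Print the command in dry-run mode; otherwise run it and raise on failure."""
    cmd = live_migration_command(vm_role, target_node)
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, `live_migrate("SQL01", "HV-NODE2")` only prints the command; triggering it after the backup job finishes is left to your own scheduling setup.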

Post by SodaPop87 »

Third update: Since the Live Migration to a new node, backups and connectivity have been stable for now. I believe I have read that the issue can come back a few days later, after a couple of backups? I will post another update at the end of this week and next week if that turns out to be true.

Post by stephc_msft »

stephc_msft wrote: Oct 18, 2023 6:15 pm ... [VM's not starting after the October 2023 Windows Server update]
Workaround for now is to delete the .rct and .mrt file, to allow the VM to start
And of course that will mean the next host level backup of the VM will be a full backup, but that is not too unreasonable.
This issue with the October 2023 Windows Server update (where the fix that helps overcome RCT-related VM I/O performance issues has the unexpected side effect of stopping some VMs from starting) is due to be addressed in the November update.
This cannot be confirmed until closer to the date.

Post by steendp »

After the October update, we have seen fewer instances of slow IOPS. We are still experiencing them, though. I have just live migrated a host that went from 100 IOPS to 35,000 IOPS after migration.
We might have been hit with both issues mentioned in the thread and have managed to resolve one..?