Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow deserialization of XElement with text content using DataContractSerializer #37893

Open
troplin opened this issue Jun 15, 2020 · 6 comments
Assignees
Labels
area-Serialization backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity tenet-performance Performance related issue
Milestone

Comments

@troplin
Copy link

troplin commented Jun 15, 2020

Description

The following code takes about 15 minutes to run on my machine:

var element = new XElement("element", string.Join("\n", Enumerable.Repeat(new string('a', 100), 100000)));
var serializer = new DataContractSerializer(element.GetType());
using var memoryStream = new MemoryStream();
serializer.WriteObject(memoryStream, element);
memoryStream.Seek(0, SeekOrigin.Begin);
var deserialized = (XElement)serializer.ReadObject(memoryStream);

The equivalent code using Save and Load only takes about 200-300 ms:

var element = new XElement("element", string.Join("\n", Enumerable.Repeat(new string('a', 100), 100000)));
using var memoryStream = new MemoryStream();
element.Save(memoryStream);
memoryStream.Seek(0, SeekOrigin.Begin);
var deserialized = XElement.Load(memoryStream);

Going through a string using XmlWriter+StringWriter+XmlReader+StringReader is also only 200-300 ms:

var element = new XElement("element", string.Join("\n", Enumerable.Repeat(new string('a', 100), 100000)));
var serializer = new DataContractSerializer(element.GetType());
using var writer = new StringWriter();
using (var xmlWriter = XmlWriter.Create(writer))
    serializer.WriteObject(xmlWriter, element);
using var reader = new StringReader(writer.ToString());
using var xmlReader = XmlReader.Create(reader);
var deserialized = (XElement)serializer.ReadObject(xmlReader);

Curiously, If I change the content to use "\n" instead of "\r\n" as line separator, the runtime is reduced to about 15 s, which is still way too slow.

Configuration

  • OS: Windows 10 Pro (Build 18363.900)
  • Processor: x64, Core i7-9700 @ 3.00 GHz
  • NET Core: 3.1.2 or 3.1.4 (both are installed, not sure which one is actually used by VS)

Analysis

My analysis shows, that the problem is excessive string concatenation in XElement.AddStringSkipNotify. The content seems to arrive in very small chunks but I don't know the reason why.

@troplin troplin added the tenet-performance Performance related issue label Jun 15, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Xml untriaged New issue has not been triaged by the area owner labels Jun 15, 2020
@ghost
Copy link

ghost commented Jun 15, 2020

Tagging subscribers to this area: @buyaa-n, @krwq
Notify danmosemsft if you want to be subscribed.

@buyaa-n
Copy link
Contributor

buyaa-n commented Jun 15, 2020

Interesting, its indeed very slow, but i don't think its caused from XElement, i did log the timing and it shows XmlObjectSerializer.ReadObject(stream) is very slow:

     Stopwatch watch = Stopwatch.StartNew();
     watch.Start();
     var element = new XElement("element", string.Join("\r\n", Enumerable.Repeat(new string('a', 100), 100000)));
     Console.WriteLine(watch.ElapsedMilliseconds); // 78
     watch.Restart();
     var serializer = new DataContractSerializer(element.GetType());
     using var memoryStream = new MemoryStream();
     serializer.WriteObject(memoryStream, element);
     memoryStream.Seek(0, SeekOrigin.Begin);
     Console.WriteLine(watch.ElapsedMilliseconds); // 146
     watch.Restart();
     var deserialized = (XElement)serializer.ReadObject(memoryStream);
     Console.WriteLine(watch.ElapsedMilliseconds); // 30520

Event without casting to XElement is the same, very slow. So changing the area to area-Serialization. As @troplin mentioned changing the content to use "\r\n" instead of "\n" as line separator, the runtime is increased to almost 30 mins on my laptop, which is very very slow. Tagging @adamsitnik as he might want to see this, also tagging Runtime.Serialization area owners @StephenMolloy @HongGit

@troplin
Copy link
Author

troplin commented Jun 15, 2020

@buyaa-n I've profiled the code and the time is lost in XElement.AddStringSkipNotify (which is called from DataContractSerializer.ReadObject, so you're not wrong).
The problem is (probably), that this method is called many times, each time with a small chunk of content which is then appended to the XElement.content member by string concatenation. This will allocate a new string each time, copying the previous content which grows larger each time.
This again leads to massive GC activity.

@HongGit HongGit removed the untriaged New issue has not been triaged by the area owner label Jun 25, 2020
@HongGit HongGit added this to the Future milestone Jun 25, 2020
@StephenMolloy
Copy link
Member

I can understand the frustration here, and we are going to keep this bug open to look at in a future milestone. 15 minutes is a long time to wait for a serializer. However, it is true that this is not a regression from the 4.8 framework.

What is probably happening here is that XElement likely already has custom serialization code to handle this case and thus can do things smartly and quickly, whereas DataContractSerializer is a general-purpose serializer that doesn't take those same shortcuts. It is almost always true that a special-purpose serializer will be faster than a general-purpose serializer for the case that it was specially designed for.

But again, 15 minutes is extreme. We will try to come back to this in the future. But we are quite low on resources and cannot prioritize this at this time given that it is not a regression from previous frameworks.

@troplin
Copy link
Author

troplin commented Jun 26, 2020

@StephenMolloy The strange thing is that it is as fast as expected when using DataContractSerializer together with XmlReader and StringReader to deserialize from a string.
Since that workaround works well, the issue is not really urgent right now.

Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Serialization backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants