Performance issues - cpu usage #324
How are you running this code in a loop? The first example must be accidentally dropping the entire PLC connection from time to time to be that much slower. The Micro800 cannot pack multiple requests into a PDU, so there should not be a lot of difference between the two examples. @jkoplo @timyhac? Note that a Micro800 is not a fast PLC at all. Reading 1500 tags as fast as you can will probably drive the CPU usage of the PLC very high. Make sure you measure the CPU load of the PLC! The libplctag DLL itself is capable of overwhelming a network connection to a ControlLogix L80 when running on a Raspberry Pi. |
One way the connection could be dropped is if the GC runs and closes the tag handle between iterations of the loop. |
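To make that GC point concrete, here is a minimal C# sketch of the pitfall and the fix. The tag name, gateway, and PLC type are placeholders reused from the examples later in this thread, so treat it as an illustration of the pattern rather than code from this issue:
using System;
using libplctag;
using libplctag.DataTypes;
internal class GcPitfall
{
    static void Main()
    {
        // Anti-pattern: re-creating the tag every iteration means each pass
        // opens a new PLC connection, and nothing keeps the previous handle
        // alive, so the GC can finalize it and drop the connection mid-run.
        //
        // while (true)
        // {
        //     var tag = MakeTag();
        //     tag.Initialize();
        //     tag.Read();
        // }

        // Fix: create the tag once, hold a long-lived reference, and reuse it.
        var longLived = MakeTag();
        longLived.Initialize();
        while (true)
        {
            longLived.Read(); // reuses the existing connection
        }
    }

    static Tag<DintPlcMapper, int> MakeTag() => new Tag<DintPlcMapper, int>
    {
        Name = "TAG_INT_0001",       // placeholder values from later in this thread
        Gateway = "192.168.1.150",
        PlcType = PlcType.Micro800,
        Protocol = Protocol.ab_eip,
        Timeout = TimeSpan.FromSeconds(5)
    }; |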
The example code shows that there are multiple things being tried. These two examples are not really equivalent. I suggest starting with the lowest level of code possible, the C DLL.
The command line should look something like this:
When I run this against my own PLC on my WiFi network, it fails because it takes more than 5000ms (the default timeout) to create all the tags and do the first read. My WiFi network has much higher latency than a wired network. In the log file, you will find a line that looks like this: This is what I get when testing with 100 tags. I am using a ControlLogix L80 system but preventing it from using packed requests to simulate a Micro800. Again, my network is quite bad, so you should see much smaller numbers. Try this and attach the results here. From the numbers I see, I am not sure you can get the total time much lower. |
Thank you for your answer. Later I'll test in C to check whether the CPU problem occurs in C or only in C#. |
It doesn't make sense in the second example to use both ReadAsync and AutoSyncReadInterval. Also, ReadAsync isn't awaited. What Kyle said about tags being garbage collected is correct. Tags are designed to be long-lived. Store them in a local variable to prevent them being garbage collected. For what it's worth, when my PC is idling, it sits at about 12% CPU usage. Everything is relative when talking about performance, so it would be helpful if you can provide some more context about what the problem is. Here is an example C# program that should help to analyze the .NET side of things more precisely:
using System;
using System.Collections.Generic;
using libplctag;
using libplctag.DataTypes;

internal class Program
{
    static void Main(string[] args)
    {
        var test = new LoadTest();
        // Don't measure here
        test.Setup();
        // Measure here
        while (true)
        {
            test.Test();
        }
    }
}

class LoadTest
{
    // Long-lived field keeps the tags from being garbage collected between reads.
    List<Tag<DintPlcMapper, int>> tags = new();

    public void Setup()
    {
        for (int i = 1; i <= 1500; i++)
        {
            var myTag = new Tag<DintPlcMapper, int>
            {
                Name = $"TAG_INT_{i:0000}",
                Gateway = "192.168.1.150",
                Path = null,
                PlcType = PlcType.Micro800,
                Protocol = Protocol.ab_eip,
                Timeout = TimeSpan.FromMilliseconds(500),
                AllowPacking = true
            };
            // Set up the tag (and its connection) once, up front.
            myTag.Initialize();
            tags.Add(myTag);
        }
    }

    public void Test()
    {
        foreach (var tag in tags)
        {
            tag.Read();
        }
    }
} |
@timyhac in your example code it won't do any tag packing since it's reading each tag synchronously. So you'll get one tag read per round trip - the throughput will be pretty low. But if we're just looking at cpu usage overhead maybe that's intended. I really need to carve out some time to finish performance benchmarking and improvements. That said, using the techniques in that performance issue, one of my coworkers benchmarked libplctag.net against Ingear and we were faster in pretty much all cases. This was real-world against a clogix PLC. So we got that going for us, which is nice. |
The Micro800 does not support request packing anyway. When I test this on my network faking a Micro800, it takes more than 5 seconds to just create the 1500 tags. But my network is not good. |
Any update on timing tests? |
I tested using the timyhac example, initializing and adding to a list, then reading. My PC is an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz. |
Hi Vitor, Let me restate the results so that I make certain that I understand them. When you test with Tim's program, it takes 8 seconds to create the tags and then takes 9 seconds to read all 1500 tags. Is that correct? What is the network latency from your PC to the Micro800 PLC? You can use the ping.exe tool on the command line to get this information. I modified the async_stress.c program to do essentially what Tim's .Net version does above. Here are some results simulating a Micro800 (running with the PLC type as Micro800). First I tested my network:
As you can see, this is a bad network. The WiFi latency varies from 6ms to over 16ms. With one packet round trip taking 16ms, the fastest the C library can read tags is about 60 tags per second (1000ms ÷ 16ms ≈ 62 round trips) if the PLC introduces no additional latency. Here is a test of 100 tags:
So in this case, it looks like it takes an average of about 5.47ms per tag read (547ms/100 tags). Now I'll try 1000 tags:
Here we can see that it takes 5.7ms per tag read (5751ms/1000 tags). About the same. But it still takes 5.7 seconds to read all 1000 tags. There were some timeouts that appear to be due to transient latency (my network sometimes hangs). Since there is little change between the 100-tag case and the 1000-tag case in time per tag read, we can simply multiply to find that 1500 tags would take about 8.6 seconds to read. That is not much different from what Vitor measured. The system load on my MacBook Air M1 was between 40 and 60% of a CPU. The reason I say a CPU is that this laptop has 4 performance cores and 4 efficiency cores. As far as I can tell, the program is running on one of the efficiency cores, which has much lower performance. The system as a whole was about 85% idle (this fluctuated). Here's the same test, but run against a ControlLogix:
Notice the much higher variance: 74ms to 7286ms. That is how bad my WiFi network is. But the average is interesting. On average it takes 142ms to read 1000 tags. That is about 40x faster, due to the ability to pack multiple requests per network packet when targeting a ControlLogix. Total load on the system was lower, around 90% idle with a single efficiency core at 30-50%. My results show that while there is CPU load, it isn't that much, and the overall throughput to a ControlLogix is high even with a bad network. There are a lot of variables here. I am using macOS and an ARM CPU instead of Windows and x86-64. This is not the stock version of async_stress.c included in the normal release, but one that I changed to use roughly the same logic as Tim's version. |
This whole issue (#224) has a bunch of performance info, but I started benchmarking here. The code I recommend for reading a bunch of tags is below (really just the one line). I didn't do anything special when instantiating tags - it's about as close to the basic example for DINTs as you can get on CLogix.
internal static async Task ReadWhenAllAsync(List<TagDint> allTags)
{
    var stopwatch = new Stopwatch();
    stopwatch.Restart();
    // Start every read at once so the library can pack requests together,
    // then wait for the whole batch to complete.
    await Task.WhenAll(allTags.Select(x => x.ReadAsync()));
    stopwatch.Stop();
    Console.WriteLine($"Tag ReadWhenAllAsync = {stopwatch.ElapsedMilliseconds} msec");
}
I've also done some long-term testing with this code - running a read loop that was pulling like 100 tags every second over a 24-hour period or something like that. I didn't see any memory leaks or runaway CPU usage. I have some work in a branch that's looking to really dig in and optimize how we manage the async reads to the base library. Unfortunately I haven't had time to finish it up and prove out some hunches. My initial thought is that there should be room to reduce our overhead in terms of memory (and probably CPU). |
Oh, it's also worth noting that @kyle-github has a very basic mock program that pretends to be a PLC for testing. I don't think it supports all tag types, but it would probably be preferable to test against that instead of a Micro800 to help isolate your code's performance. |
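For reference, that mock PLC is the ab_server test tool in the libplctag C repository. A hypothetical invocation is sketched below; the flag spelling is an assumption and may differ by version, so check the tool's own help output first:
ab_server --plc=ControlLogix --tag=TAG_INT:DINT[1500] |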
Just want to add some information to Vitor's tests today. As he mentioned, the ReadAsync function is very fast, but when we use ReadCompleted to actually get the value (with await), for the 1500 tags we will wait 26 seconds until we get the last tag read response. I have also read some interesting documentation from Software Toolbox giving tips to improve ControlLogix communications. Of course it relates to their software, but the hints are interesting: https://support.softwaretoolbox.com/ci/fattach/get/20774/1245932540/redirect/1/filename/Optimizing_ControlLogix_Communications.pdf
According to the documentation above:
Steps to Maximum Throughput
1) Use Global Tags only
2) Use Array Tags
3) Keep PLC Tag Names Short
4) Set PLC CPU time slice to 40%-50%
5) Define Alias Tags with RSLogix
"The use of short tag names in the PLC for tags other than arrays is also of great benefit. This is because the TOP Server packs the PLC tag addresses into the Multi-Item Request packet sent to the ControlLogix. The 500-byte limit is what makes the shortness of the tag addresses so critical. Creating all required PC communications tags under the Global file is one way of shortening the names because Global tags require the least amount of space in the Multi-Item Request Packet. Local (i.e. Program) tags may seem nice, but in the ControlLogix to get at a program tag (vs. a Global) we also have to put in the packet the text 'Program:ProgramName.' plus the tag name! You can see how 500 bytes can go fast."
I am not sure how relevant this is when we talk about libplctag, but since our tests have different tag names (example below):
PROGRAM:MAINPROGRAM.HLL_FT1012.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1033.HIGHLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1033.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1034.HIGHLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1034.LOWLIMIT
PROGRAM:MAINPROGRAM.HLL_FT1037.HIGHLIMIT
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.PGAIN
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.IGAIN
PROGRAM:MAINPROGRAM.TIC_1010_BLOCO.DGAIN
These "big tag names" might be impacting our ability to use the packing feature. Does it make sense? I am trying to understand why our tests have such bad performance against Kyle's tests. Kyle, are you testing read or read async? From my understanding you are reading sync. |
Hi, I am traveling so I don't have access to my personal laptop.
Smaller tag names will help. The tag name is part of the request. The longer the name, the fewer requests can be packed together.
If the ControlLogix has a newer type of network module, or has an L80 or newer CPU module with the built-in network port, the libplctag library will negotiate a 4000-byte packet size.
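As a rough, hedged illustration (the per-item overhead is an assumption here, not a measured value): a symbolic path like PROGRAM:MAINPROGRAM.HLL_FT1012.LOWLIMIT is about 40 characters, so each packed read item might cost roughly 50 bytes once encoding overhead is added. A 500-byte multi-request packet would then fit on the order of 10 such reads, while a 4000-byte packet would fit roughly 80, and halving the name length would roughly double those counts.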
With a ControlLogix or CompactLogix, the library will use the multi-request CIP command to pack as many requests as possible into one packet to the PLC. This packing is transparent to the application, but you need to help the library by queuing the requests at the same time. The async_stress code does that, as do Tim and Jody's examples.
The library will pack requests while it is waiting for a response from the PLC for the previous request.
In my testing, the data type of a tag does not matter much. If your tag has a lot of data, it will use up more space in the response. You can read an entire UDT tag at once, but you need to decode the data using your own method. If most of your tags are individual fields in a UDT, it may make sense to do that.
Micro800 PLCs cannot use packed requests.
Best,
Kyle |
Even if CPU usage or read latency could be reduced, what makes the current CPU usage/read latency too high for your use case? i.e., what problem are you trying to solve? Vitor's message indicates that the problem is that the "PC freezes" and that high CPU usage is a contributing factor - can you elaborate on this?
Some other notes:
|
To add further context, it would also be helpful to know the following:
If possible, it would be helpful to benchmark this with another driver. Can you perform the same test with FactoryTalk Linx Enterprise and the FactoryTalk Live Data Test Client? |
I am using Task Manager to monitor it.
The difference between the 70% and 26% is that 70% is our real product and 26% is a basic test. About PWSys's question, I'll need to check all this information with my client. |
I have been out for a while. Is there any status update on this? |
Any updates? My application is still executing with high CPU usage, and it is not very fast. I got this info about the PLC; I do not know if it helps |
What is the network module? |
Thanks, Vitor. Based on the docs, it looks like this is a standard CompactLogix port. I think that these are only 100Mbps, but I cannot find a speed listed in the docs. CompactLogix PLCs use the internal CPU for both networking and for running the PLC control code. This means that you need to be very, very careful how much load you place on the PLC.
If you go to the web page the network stack shows on port 80, you can navigate to the page that shows the load. On my ControlLogix L80, this is http://10.206.1.40:80/ -> Diagnostics -> Module Diagnostics. You will need to change the IP address to the one for the PLC. This page changes depending on the firmware, so you may have to explore a bit. It should show the current and maximum packet rates. Note that the docs show that you can only have 120 TCP connections to this type of PLC. That is not very many :-(
I have never seen a CompactLogix that supported the large packet size (4002 bytes). That is not to say that they do not exist, but I have not seen one. If this PLC only supports the smaller packet size, then you will not be able to pack as many requests into one packet to the PLC. That will slow down the reads and increase the CPU load on the PC.
@jkoplo @timyhac, do you have any measurements of the overhead of using callbacks? In this case, I wonder if setting up the tags to use auto sync reads would be a better idea? That would eliminate all the cross-DLL marshalling/unmarshalling to trigger reads and only take that hit for the callback.
Vitor, the idea to test here (if Jody and Tim think it might help) is to set up your tags with automatic reads using the auto sync configuration. Create your tags with this attribute set and with a callback that looks for read complete events.
Vitor, is there any way you can get the same PLC type? It is important to make sure our tests are really testing the same configuration as in your customer's factory. |
I don't have much in terms of benchmarks on callbacks, but I figure that .NET must use them for system calls on Windows/Linux for any low-level resource interaction. I'd be shocked if they were part of any sort of bottleneck for us. But that's just my intuition. |
These CompactLogix PLCs do support Large Connections when running firmware 20.11 and up. They are closely related to the L7 ControlLogix PLCs. Going to the controller's web page would be beneficial to see the embedded Ethernet card's utilization. What else is communicating with this PLC? A common scenario is using numerous PanelView terminals, which consume multiple TCP connections. If this PLC is communicating with EtherNet/IP I/O devices, that will consume additional resources. The diagnostics web page will indicate the percentage of utilization as well as the number of TCP and CIP connections in use and available. What is the System Overhead Time Slice? The default is 20%. Can you try increasing this value? I've experienced unpredictable results when communicating with controllers that are overly taxed. Examples include running continuous tasks with a low System Overhead Time Slice, which effectively prioritizes executing logic over communications. |
Ah, thanks for the update about the CompactLogix! I don't have one and have not used one for years. That is good news for the possible performance. There is a fairly large gap in performance between the L7x series CPUs and the L8x series. I have L55, L61, and L81 CPUs in my test lab but no L7x, though I used to have an L71. The L81 can handle a heavy network load. That is in a ControlLogix chassis. Every connected device uses at least one TCP connection (to some extent, even produced/consumed tags and remote IO). So 120 connections is not that many. Every laptop with RSLinx, another OPC server, etc. uses at least one TCP connection. Back to the performance... I am not a C#/.Net programmer, so I cannot come up with sample code. I could try to write up something in Java as that would be close to C#, but things like callbacks might be quite different between Java and .Net. The rough outline would be this:
Note that there is no awaiting at all in this model. The underlying C DLL will call you when the tag read is complete. |
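A minimal C# sketch of that outline, using the libplctag.NET names already mentioned in this thread (AutoSyncReadInterval, ReadCompleted). The gateway, path, and tag names are placeholders, and the exact event/property behavior should be verified against the wrapper's docs:
using System;
using System.Collections.Generic;
using libplctag;
using libplctag.DataTypes;
internal class AutoSyncExample
{
    static void Main()
    {
        var tags = new List<Tag<DintPlcMapper, int>>();
        for (int i = 1; i <= 1500; i++)
        {
            var tag = new Tag<DintPlcMapper, int>
            {
                Name = $"TAG_INT_{i:0000}",                    // placeholder tag names
                Gateway = "192.168.1.150",                     // placeholder address
                Path = "1,0",                                  // assumption: backplane,slot for a *Logix target
                PlcType = PlcType.ControlLogix,
                Protocol = Protocol.ab_eip,
                Timeout = TimeSpan.FromSeconds(5),
                AutoSyncReadInterval = TimeSpan.FromSeconds(1) // the library schedules reads itself
            };
            // The underlying DLL fires this callback when each automatic read
            // completes; there is no polling and no awaiting anywhere.
            tag.ReadCompleted += (sender, e) => Console.WriteLine($"{tag.Name} = {tag.Value}");
            tag.Initialize();
            tags.Add(tag);
        }
        Console.ReadLine(); // keep the process (and the long-lived tags) alive
    }
}
Keeping all the tags in the list matters here too, for the GC reasons discussed earlier in the thread. |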
I think I can test it on my client's PC. I need to talk to him first, but he will probably allow me to test there. Maybe next week I can test there with the auto sync configuration.
I'll try to check this property and try to increase it. Thanks for your help |
I finally tested the async read at my client's PC.
It is at 45%. I tested with a simple project, with 1538 tags. But it reads the tags every 1 second correctly; that is a great difference compared to reading them all at once. A question: does the auto sync work for all PLCs except the Micro800? |
In my tests, I saw that initializing all the tags takes several seconds. The time you saw is expected since the tag creation process waits for completion as each tag is created. I am not sure I understand what you mean by "a great difference between reading it all at once." If you are using auto-sync, then the library will try to spread out the reads so that they are not all at the same time. Auto-sync works for all tag types and for all PLC types. The Micro800 does not allow requests to be packed into one packet, so every tag read is done one at a time. You will not be able to read 1500 tags from a Micro800 in one second: if it takes 3ms to read a tag, then only about 300 tags can be read per second, so 1500 tags would need roughly 4.5 seconds. The CPU load of 25-27% is much better than the 70+% that the client was reporting. At least I think that is what you were saying earlier. Does this solve the problem? 1500 tags is a lot. Note that the library will take some CPU time. My own tests showed that my laptop was using more than a bit of CPU with just C code. Thanks for running these tests! |
I updated my application with the latest libplctag, using auto read. A question: why is it so much faster than Read and ReadAsync? My client will still test there, but I hope he will not complain about 25-30% CPU usage. Thank you for the help |
Good to hear that this is starting to work! What is the read period you are using for the tags? I am not sure exactly why the speed is so much better, but I strongly suspect that it is due to some combination of factors:
I do not know which of these factors might be more important, but your results are very interesting! A different language wrapper might show different results. I suspect that a C application would be different and not show as much difference. |
My 2c is that different things are being measured. Can you post code/information on how you measured this? Note that, internally, ReadAsync piggy-backs off of the event/callback system - it doesn't poll.
I still believe that investigating the root problem your customer is facing will be helpful in understanding the real issue. CPU usage may be a contributing factor, or even a side effect, but it isn't the root problem in itself. |
Best of luck with your project! |
Hello,
I am having some performance issues. The CPU usage is too high. I created a simple test that reads 1500 tags.
1 - Sync read
The CPU usage goes to 10% (my PC is a good one), and it took about 17s to read them all.
2 - Read Async
I tried to read async with AutoSyncReadInterval and ReadCompleted.
It reads each tag every 3s, and the CPU usage goes to 15%.
Is there any way to ensure that CPU usage does not grow too much?