Re: [PATCH 1/2] regedit: Overhaul Unicode line processing

List overview All Threads

newer

older

Re: [v2 2/2] d3d11/tests: Add test...

Code style guidelines on the wiki

Alexandre Julliard

21 Nov 2016 21 Nov '16

7:38 p.m.

Hugh McMaster hugh.mcmaster@outlook.com writes:

...

```
   p = strpbrkW(line, line_endings);
```
```
   if (!p)
```
```
   {
```

       HeapFree(GetProcessHeap(), 0, buf);

```
       return NULL;
```

If there are no line endings you need to read more data instead of failing. Also it looks to me that you are reading the entire file into the buffer instead of just one line.

-- Alexandre Julliard julliard@winehq.org

Show replies by date

Hugh McMaster

22 Nov 22 Nov

12:01 a.m.

New subject: [PATCH 1/2] regedit: Overhaul Unicode line processing

On Tuesday, 22 November 2016 6:38 AM, Alexandre Julliard wrote:

...

Hugh McMaster writes:

...
+ p = strpbrkW(line, line_endings); + if (!p) + { + HeapFree(GetProcessHeap(), 0, buf); + return NULL;

If there are no line endings you need to read more data instead of failing.

By the time p == NULL, we are already at the end of the buffer. And even if we weren't, the buffer would be increased and more data read in, on the second (and subsequent) loops.

This does assume, of course, that the .reg file format is valid. I'm not sure if .reg files are valid if they end without a newline sequence. I'll look into this and write some tests to check.

...

Also it looks to me that you are reading the entire file into the buffer instead of just one line.

The patch splits the buffer contents into lines. But we do end up reading the entire file into the buffer because REG_VAL_BUF_SIZE is set to 4096 bytes. We could decrease this to 1024 bytes if you want. But, in many cases, I expect the same behaviour to occur.

Still, I'm not entirely sure how to implement what you're suggesting with line-by-line reads.

We can't use fgets because \r line endings are valid on Windows.

I suppose we could use fread like it was fgetc, and read character by character until we reach a line ending. But that defeats the purpose of using fread to read in large chunks.

Hugh

Alexandre Julliard

12:37 p.m.

New subject: [PATCH 1/2] regedit: Overhaul Unicode line processing

Hugh McMaster hugh.mcmaster@outlook.com writes:

...

On Tuesday, 22 November 2016 6:38 AM, Alexandre Julliard wrote:

...
Hugh McMaster writes:

...
+ p = strpbrkW(line, line_endings); + if (!p) + { + HeapFree(GetProcessHeap(), 0, buf); + return NULL;

If there are no line endings you need to read more data instead of failing.

By the time p == NULL, we are already at the end of the buffer. And even if we weren't, the buffer would be increased and more data read in, on the second (and subsequent) loops.

There won't be a subsequent loop since you returned failure.

...

This does assume, of course, that the .reg file format is valid. I'm not sure if .reg files are valid if they end without a newline sequence. I'll look into this and write some tests to check.

The function should of course be able to handle a file that doesn't end in a newline, but the problem above would happen on any line that does not fit into the existing buffer.

...

The patch splits the buffer contents into lines. But we do end up reading the entire file into the buffer because REG_VAL_BUF_SIZE is set to 4096 bytes. We could decrease this to 1024 bytes if you want. But, in many cases, I expect the same behaviour to occur.

The initial size won't make a difference, the problem is that you always append data without removing it, so the buffer grows to the entire file size. It shouldn't need to grow bigger than the longest line.

-- Alexandre Julliard julliard@winehq.org

Hugh McMaster

1 p.m.

New subject: [PATCH 1/2] regedit: Overhaul Unicode line processing

On Tuesday, 22 November 2016 11:37 PM, Alexandre Julliard wrote:

...

Hugh McMaster writes:

...
On Tuesday, 22 November 2016 6:38 AM, Alexandre Julliard wrote:

...
Hugh McMaster writes:

...
+ p = strpbrkW(line, line_endings); + if (!p) + { + HeapFree(GetProcessHeap(), 0, buf); + return NULL;

If there are no line endings you need to read more data instead of failing.

By the time p == NULL, we are already at the end of the buffer. And even if we weren't, the buffer would be increased and more data read in, on the second (and subsequent) loops.

There won't be a subsequent loop since you returned failure.

This is a marginal case at best. The first line would need to be use 4095 bytes and not have a newline in it at all for this to fail. Still, I take your point in handling this aspect correctly.

...

...
This does assume, of course, that the .reg file format is valid. I'm not sure if .reg files are valid if they end without a newline sequence. I'll look into this and write some tests to check.

...

The function should of course be able to handle a file that doesn't end in a newline, but the problem above would happen on any line that does not fit into the existing buffer.

Windows allows the final line in a .reg file to end without a newline on all versions except XP.

Our current implementation is broken in this regard.

...

...
The patch splits the buffer contents into lines. But we do end up reading the entire file into the buffer because REG_VAL_BUF_SIZE is set to 4096 bytes. We could decrease this to 1024 bytes if you want. But, in many cases, I expect the same behaviour to occur.

...

The initial size won't make a difference, the problem is that you always append data without removing it, so the buffer grows to the entire file size. It shouldn't need to grow bigger than the longest line.

Continually moving the remaining characters to the start of the buffer hits perfomance. That's why I took the 'read into memory' approach as line parsing is faster in RAM.

But fair enough, I'll rework the patch. Still, I'm interested to know why you dislike reading the file into memory first.

Thanks for the review.

Alexandre Julliard

1:20 p.m.

New subject: [PATCH 1/2] regedit: Overhaul Unicode line processing

Hugh McMaster hugh.mcmaster@outlook.com writes:

...

...
The initial size won't make a difference, the problem is that you always append data without removing it, so the buffer grows to the entire file size. It shouldn't need to grow bigger than the longest line.

Continually moving the remaining characters to the start of the buffer hits perfomance. That's why I took the 'read into memory' approach as line parsing is faster in RAM.

But fair enough, I'll rework the patch. Still, I'm interested to know why you dislike reading the file into memory first.

It's wasteful, it doesn't scale, and there's no reason to do it since you only read one line at a time. And I very much doubt that with the reallocations and the extra memory pressure performance would be any better.

-- Alexandre Julliard julliard@winehq.org

3268

Age (days ago)

3269

Last active (days ago)

wine-devel@winehq.org

4 comments

2 participants

tags (0)

participants (2)

Alexandre Julliard
Hugh McMaster