Continue Reading File After Empty Line C

I noticed that when a txt file contains many blank lines
and I use getline on it, the blank lines are read into the string as well.

So does C++ getline treat a blank line as \n?

What you see as blank lines actually contain, depending on the line-ending convention of the text file, carriage return and/or line feed characters. These are something text editors don't normally show.

Unix is LF only, Windows is CR + LF. Usually.

Open your file(s) in a hex editor and you will see what is hidden to us hoo-mans.

ASCII Codes
https://cplusplus.com/doc/ascii/

CR = 0x0D, LF = 0x0A. Windows line endings are 0x0D 0x0A. I don't do *nix, but its line endings are just 0x0A since it uses LFs only.

notepad++ amongst its other coding features will display the end of line as the bytes that make it up, eg it displays [cr][lf] or something like that for 13/10 (from my memory which is faulty).

And there are 5 or so variations (see the ugly chart at https://en.wikipedia.org/wiki/Newline) , beyond just basic windows and unix, unfortunately. It lists 8 versions, but 4 of them are just cr/lf combos.

Back to the question: it also depends on both how you read the file (stream extraction operators skip whitespace!) and whether your compiler's runtime library recognizes the file's end-of-line format as such. If you WANT the actual bytes, you may need to read the file in binary mode. If your platform is compatible with the file's end of line, you can use getline; then even if the result is empty you know you got an end-of-line sequence (or the end of the file). Getline has a default delimiter ('\n'), but you can override it with whatever you need.
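As a rough illustration of the two points above, here is a minimal sketch that reads from an in-memory string instead of a file. The helper names split_on and read_tokens are made up for this example; only std::getline and operator>> come from the standard library.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split text on an arbitrary delimiter using the three-argument std::getline.
// Two delimiters in a row produce an empty string, just like a blank line does.
std::vector<std::string> split_on(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> fields;
    for (std::string field; std::getline(in, field, delim); )
        fields.push_back(field);
    return fields;
}

// Read whitespace-separated tokens with operator>>, which skips blank
// lines (and all other leading whitespace) without reporting them.
std::vector<std::string> read_tokens(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string token; in >> token; )
        tokens.push_back(token);
    return tokens;
}
```

With these, split_on("a;;b", ';') yields "a", an empty string, and "b", while read_tokens("one\n\n\ntwo\n") yields only "one" and "two".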

If it matters, I usually just use Notepad++'s ability to convert a file's line endings en masse. (Or, if I'm on *nix, I'll just use dos2unix or, worst case scenario, a sed script.)
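For text that is already in memory, the core of that CRLF-to-LF conversion is small enough to sketch in C++. The name crlf_to_lf is hypothetical, and this is only an illustration of the idea, not a description of how Notepad++ or dos2unix work internally.

```cpp
#include <cstddef>
#include <string>

// Replace every "\r\n" pair with "\n".
// A lone '\r' or a lone '\n' is copied through unchanged.
std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;  // drop the CR; the LF is copied on the next iteration
        out += text[i];
    }
    return out;
}
```

To see every byte, the input would have to come from a stream opened in binary mode; in Windows text mode the \r is already gone by the time the program sees the data.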

At least in the modern Windows/Linux/OS X world, line endings shouldn't matter too much, but some programs still care, and you must be aware of how you are targeting your users.

To answer the title question: yes, a blank line is terminated by \n, which C++'s getline() function treats as the signal to return the string read so far from the file (by default; you can tell it to stop at a different character if you wish).

So if your file contains nothing but newline sequences, all you will get from getline() is empty strings.

Often you can just ignore them.
I also ignore lines that have nothing but spaces (or other whitespace characters) on them.

    #include <fstream>
    #include <iostream>
    #include <string>

    // Return s without leading and trailing ASCII whitespace.
    std::string trim( const std::string & s )
    {
      auto first = s.find_first_not_of( " \f\n\r\t\v" );
      auto last  = s.find_last_not_of ( " \f\n\r\t\v" );
      // substr takes (position, count), so the count is last - first + 1
      return (first == s.npos) ? "" : s.substr( first, last - first + 1 );
    }

    int main()
    {
      std::ifstream f( "myfile.txt" );
      std::string s;
      while (getline( f, s ))
      {
        if (trim( s ).empty()) continue;  // skip blank and whitespace-only lines
        std::cout << s << "\n";
      }
    }

Prints all non-blank lines in myfile.txt.

This solution works for ASCII text only. If Unicode handling matters, it gets a little less simple. For newbie stuff you typically don't have to care about exotic Unicode space characters yet.
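If all you need is the yes/no answer "is this line blank?", the same test can be written without building a trimmed copy. This is just one possible sketch; is_blank is a made-up name, and like the code above it only knows about the ASCII whitespace characters that std::isspace recognizes in the default "C" locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True if the line is empty or consists only of whitespace characters.
// The lambda takes unsigned char because passing a negative char value
// to std::isspace is undefined behavior.
bool is_blank(const std::string& line) {
    return std::all_of(line.begin(), line.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}
```

Inside the read loop this would replace the trim( s ).empty() check with is_blank( s ).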

In text mode, getline() reads all chars from the specified stream up to but NOT including the specified terminator char (\n by default) into the specified variable, and then reads and DISCARDS the terminator char unless it hit eof first. Hence if there are no non-terminator chars before the next terminator char, getline() returns an empty string.

For Windows in text mode, the line terminator written as '\n' in code is actually written to the file as \r\n by the text-mode stream. Consider:

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
    	std::ofstream of("tt.txt");

    	of << "l1" << '\n' << "l2" << '\n';
    }

If you look at the file tt.txt with a hex editor you'll see (byte values given here in decimal):

    108 49 13 10 108 50 13 10

If this is now read back with getline():

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
    	std::ifstream ifs("tt.txt");

    	for (std::string s; std::getline(ifs, s); std::cout << s << '\n');
    }

you get the expected:

    l1
    l2

However, if you now read the same file character by character:

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
    	std::ifstream ifs("tt.txt");

    	for (char c; ifs.get(c); std::cout << (int)c << ' ');

    	std::cout << '\n';
    }

you get:

    108 49 10 108 50 10

No 13, even though the hex editor shows the 13's are present. The text-mode stream does the conversion from \r\n to \n.

If you open the same file in binary mode:

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
    	std::ifstream ifs("tt.txt", std::ios::binary);

    	for (char c; ifs.get(c); std::cout << (int)c << ' ');

    	std::cout << '\n';
    }

then you do see the \r\n termination:

    108 49 13 10 108 50 13 10

The interesting one is:

    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
    	std::ifstream ifs("tt.txt", std::ios::binary);

    	for (std::string s; std::getline(ifs, s); std::cout << "!" << s << "!\n");
    }

which displays:

    !l1
    !l2

which at first glance seems OK. But it's not. If you look at the code, the displayed string should be delimited on output by ! on both sides. But the output only shows ! at the start of the line, not at the end. This is because in binary mode the \r before the \n is not translated away but becomes part of the string that is read. \r is a carriage return, which moves the cursor to the start of the line, and the trailing ! is then written over the existing leading ! Yikes!!
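One common way around that stray \r (for binary-mode reads, or for Windows files read on *nix) is to strip it yourself after each std::getline call. A minimal sketch, assuming only a single trailing \r needs removing; getline_any_eol and split_lines are hypothetical helper names, not standard functions.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Like std::getline, but also drops one trailing '\r', so that both
// "\n" and "\r\n" terminated lines come back without line-ending bytes.
std::istream& getline_any_eol(std::istream& in, std::string& line) {
    if (std::getline(in, line) && !line.empty() && line.back() == '\r')
        line.pop_back();
    return in;
}

// Convenience wrapper for trying it out on an in-memory string,
// where no text-mode translation happens at all.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    for (std::string line; getline_any_eol(in, line); )
        lines.push_back(line);
    return lines;
}
```

For example, split_lines("l1\r\nl2\n\r\n") gives "l1", "l2", and an empty string. A \r in the middle of a line is left untouched.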

What happens in text mode if the line termination is just \n instead of \r\n?

No problem. This still works as expected. If there's no \r before the \n then no conversion takes place.

But what about \n\r in text mode? This is a problem. The \n is treated as line terminator but the following \r is treated as the first char of the next line!

So reading a text file in Windows in text mode is OK if the line terminator is either \n or \r\n.

PS. Why \r\n and not \n\r? This goes back to the age of mechanical tele-typewriters. \r caused the mechanical head to move back to the left hand side - which took time. If a char was output during this movement then it wasn't necessarily printed on the left as expected - but somewhere. If a \r was followed by \n (line feed - physically advance the paper by 1 line) then this paper advancement could be done during the time it took for the head to move to the left. So the char to be printed after \r\n printed as expected. Using \n\r didn't work the same!

As these could easily get confused, there is a difference between the C function getline() (a POSIX function, not part of standard C) and the C++ function std::getline():

getline() reads an entire line from stream, storing the address of the buffer containing the text into *lineptr . The buffer is null-terminated and includes the newline character , if one was found.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdelim.html
https://man7.org/linux/man-pages/man3/getline.3.html
https://www.gnu.org/software/libc/manual/html_node/Line-Input.html

std::getline (string)
Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n', for (2)).
If the delimiter is found, it is extracted and discarded (i.e. it is not stored and the next input operation will begin after it).

https://cplusplus.com/reference/string/string/getline/

________

To answer the original question:

For a "blank" line in the input file, the getline() function would give you a string containing a single new-line ('\n') character, as always followed by a terminating NUL character. Conversely, the std::getline() function would give you an empty std::string object.
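The difference can be seen side by side in a small sketch. It assumes a POSIX system, since both getline() and fmemopen() are POSIX functions and not part of standard C or C++. The names first_line_posix and first_line_cpp are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>       // std::FILE, std::fclose
#include <cstdlib>      // std::free
#include <sstream>
#include <string>
#include <stdio.h>      // POSIX: getline(), fmemopen()
#include <sys/types.h>  // POSIX: ssize_t

// First line of `text` as POSIX getline() returns it: newline included.
std::string first_line_posix(std::string text) {
    std::FILE* fp = fmemopen(&text[0], text.size(), "r");  // read from memory
    if (!fp) return std::string();
    char* buf = nullptr;  // getline() allocates (or grows) this with malloc
    size_t cap = 0;
    ssize_t len = ::getline(&buf, &cap, fp);
    std::string line = (len > 0)
        ? std::string(buf, static_cast<std::size_t>(len))
        : std::string();
    std::free(buf);
    std::fclose(fp);
    return line;
}

// First line of `text` as std::getline() returns it: newline discarded.
std::string first_line_cpp(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For the input "\nabc\n", whose first line is blank, first_line_posix returns "\n" while first_line_cpp returns an empty string.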

As others have pointed out, on Windows, unless you open the file in "binary" mode, the difference between \r\n and \n line endings is hidden from you: a \r\n pair is translated into a single '\n' character. That happens before getline() processes the input data! (On *nix, text mode performs no translation, so a \r in the file stays in the string.)

Source: https://cplusplus.com/forum/general/284742/
