Almost all non-trivial programs need to perform input and output (I/O). I/O is such an important subject that entire books have been devoted to Java I/O alone. Examples of I/O include word processors reading document files, programs reading configuration files, and network programs reading information from the Internet. The programs you have seen so far have sent output to the console using the System.out.println method. This is fine for very small demonstration programs, but for real-world programs you need to know about the built-in java.io classes. Java probably does a better job of I/O than most other programming languages, but there are times when it can still seem quite clumsy.
Because I/O is such a big subject, I will cover just the basics here before returning to it later on. Java uses the concept of streams to perform I/O. The term is an analogy with a stream in the real world: data flows through it in sequence, one item after another, much as water flows along a stream. You use the Java I/O classes to create streams that read and write the contents of files.
If you are going to learn about I/O, you need to know about exception handling. Exceptions are conditions that should not happen, i.e. exceptions to the expected situation. An example of an exception is attempting to read a file from a disk only to find that the file does not exist. This can happen when the disk is removable and has unexpectedly been removed. In many programming languages, writing exception handling code is left entirely to the programmer, and as a result programmers sometimes forget about it.
Java tries to gently push you in the direction of always handling exceptions. Many methods in the key classes declare the exceptions they are likely to throw, and the compiler will not accept code that calls them unless you either handle those exceptions or declare that your own method throws them too. Of course you can handle an exception by simply ignoring it, but by the time you have put in the code to ignore it, you may as well handle it properly.
The keywords for handling exceptions are try and catch. You might like to think of these as "try to do this task" and, if there is an exception, "catch the exception and do something about it". Each keyword is followed by a block of code in curly braces, so you will often read about "try/catch blocks". Here is an example of the use of a try/catch block:
try{
DataInputStream dis = new DataInputStream(System.in);
dis.readChar(); // reads two bytes from standard input as a single char
}catch(Exception e){
System.out.println(e.getMessage());
}
I have always found the format of the catch part easy to get wrong. Note that the type of the exception is enclosed in parentheses and that you have to make up a variable name for the exception you are catching. Within the following curly braces you place the code that should run if the exception occurs. In this case I have used the getMessage method of the exception, which is a common way of reporting a meaningful message about what has happened, such as a file that could not be found.
In this example I have only caught the general Exception class, an ancestor of almost every exception you are likely to meet. Good programming practice suggests you should try to catch the specific exceptions that might happen, such as FileNotFoundException. As you learn more about I/O you will quickly become familiar with the most likely exceptions.
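The sketch below shows what catching specific exceptions might look like. The class CatchSpecific and its describe method are invented for this illustration. Because FileNotFoundException is a subclass of IOException, its catch block has to come first; the other way round, the compiler reports an error because the more specific exception would already have been caught.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CatchSpecific {
    // Hypothetical helper: tries to open and read a file, reporting what happened.
    static String describe(String filename) {
        try {
            FileReader fr = new FileReader(filename); // FileNotFoundException if missing
            int first = fr.read();                    // IOException on a read failure
            fr.close();
            return first == -1 ? "empty" : "read ok";
        } catch (FileNotFoundException e) {
            // Most specific exception first: the file could not be opened.
            return "not found: " + filename;
        } catch (IOException e) {
            // Any other I/O problem while reading or closing.
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(args.length > 0 ? args[0] : "FileOut.java"));
    }
}
```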
Here is a complete program that will read in the text of its own code and send it to the console.
1.import java.io.*;
2.public class FileOut{
3. public static void main(String argv[]){
4. FileOut f = new FileOut();
5. f.go();
6. }
7. public void go(){
8. try{
9. FileReader fr = new FileReader("FileOut.java");
10. int ch;
11. while((ch = fr.read())> -1){
12. System.out.print((char) ch);
13. }
14. fr.close();
15. }catch (Exception e){
16. System.out.println(e.getMessage());
17. }
18. }
19.}
Note how on line 4 the program uses new to create an instance of this class. Because main is a static method, it cannot call an instance method such as go directly; it first needs an object to call it on. On the following line, line 5, it calls the go method to get things started.
The first line of the go method is line 8, which starts a try block. This is because we are about to run some code that might throw an exception.
The second use of the new keyword creates an instance of the FileReader class. The program can refer to FileReader by its short name because of the import statement at the very start of the program:
import java.io.*;
The import statement is a little like a path statement in DOS. It doesn't actually include or import any code; it just tells the compiler that when it is looking for a class, it should also look in the packages named in the import statements. At first the number of packages available in Java can seem daunting, but they are divided into quite sensible and predictable areas such as java.io, java.sql, and packages covering the various ways of manipulating graphics. Remembering the names of the methods that classes use is a bit more tricky.
Note how the FileReader constructor is given the name of the file to read. This is known as a constructor parameter. Constructors are special methods that have the same name as the class itself. In this case we have used the version of the FileReader constructor that takes a String containing the name of the file to read.
On line 10 the program declares a variable named ch of type int. The following three lines loop over the FileReader, reading characters and sending them to the console. The read method of FileReader returns the value of each character, or -1 once it reaches the end of the file. It returns an int rather than a char precisely so that it has a spare value, -1, that cannot be confused with any real character. A bit of a fudge is therefore required to output each character: read returns an int, which is a signed 32-bit number, whereas what we want to print is a char, which is an unsigned 16-bit number. This conversion is performed using what is known as a cast, i.e. the value is forced into another type. You write a cast simply by putting parentheses around the name of the data type you want to convert to, as in (char) ch.
9. FileReader fr = new FileReader("FileOut.java");
10. int ch;
11. while((ch = fr.read())> -1){
12. System.out.print((char) ch);
13. }
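The same read loop can be written against the general Reader type, so it works with a FileReader or any other Reader. This is a minimal sketch: the class name ReadAll is invented, and it uses try-with-resources (Java 7 and later) and UncheckedIOException (Java 8 and later), which are newer than the JDK 1.5 features covered here.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ReadAll {
    // Reads every character from any Reader, using the same loop as FileOut.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int ch;
            while ((ch = reader.read()) > -1) {
                sb.append((char) ch); // cast the int returned by read() back to a char
            }
        } catch (IOException e) {
            // Rethrow unchecked so that callers do not need their own try/catch
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the FileReader even if reading fails
        try (Reader fr = new FileReader("FileOut.java")) {
            System.out.print(readAll(fr));
        }
    }
}
```

Note the cast in sb.append((char) ch): without it, StringBuilder.append(int) would append the number itself, so the letter H would come out as 72.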
In the previous examples the exceptions were handled close to where they occur. This can lead to fairly complex code: if you are doing a large amount of I/O, you will end up with a large number of try/catch blocks that can soon make your code hard to read.
The throws keyword allows you to pass exceptions up the call stack to the calling method. It is added to the method declaration, after the parameter list, followed by the exceptions that might be thrown. A typical method performing a file operation might look like this:
public void amethod() throws IOException{
FileOutputStream fout = new FileOutputStream("test.txt");
fout.close();
}
Notice how the throws clause comes after the parameter parentheses and before the opening curly brace. This sample code (which assumes import java.io.*; at the top of the file) simply creates a file called test.txt in the underlying operating system. The file is created, or emptied if it already exists, when the FileOutputStream constructor opens it; the close method then releases it. Both the constructor and close can throw an IOException or one of its subclasses, so without the throws clause (or a try/catch block) this code would not compile. Note that the calling method must in turn either catch the declared exception or declare it in its own throws clause.
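Here is a sketch of the calling side, using the hypothetical names ThrowsDemo, createFile and tryCreate: createFile passes any IOException up the stack with throws, and tryCreate is the caller that finally catches it.

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class ThrowsDemo {
    // Declares IOException instead of handling it, like amethod above.
    static void createFile(String name) throws IOException {
        FileOutputStream fout = new FileOutputStream(name); // creates or empties the file
        fout.close();
    }

    // The caller must either catch the exception or declare it in turn.
    static boolean tryCreate(String name) {
        try {
            createFile(name);
            return true;
        } catch (IOException e) {
            System.out.println("Could not create " + name + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("test.txt"));
    }
}
```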
To allow the creation of complex I/O processing Java uses the concept of "chaining" streams. This means that an instance of one Stream is passed as a parameter to the constructor of another. You can see this in action in the following example.
import java.io.*;
public class Stio {
public static void main(String argv[]){
try{
File f = new File("Output.txt");
FileOutputStream fos= new FileOutputStream(f);
OutputStreamWriter out = new OutputStreamWriter(fos);
out.write("Hello World");
out.close();
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
The FileOutputStream class deals with the actual business of opening a file for writing, and the OutputStreamWriter deals with writing text to it. The OutputStreamWriter class was added in JDK 1.1 and has a constructor that takes a String naming a character encoding, so you are not limited to the platform default. This way you can process files containing text such as Chinese or Cyrillic. The Writer classes understand data as text rather than simply as a sequence of bytes that might be numbers or anything at all.
Note that the File class is a bit misleading: you might expect it to refer exclusively to an actual file and to be concerned with reading and writing that file. In fact a File object only represents a path name, which may point to a directory or to a file that does not exist yet, and it does no reading or writing itself. By chaining streams you can assemble file processing functionality from multiple classes, rather than needing a huge range of discrete file processing classes.
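Here is a sketch of a longer chain, a BufferedWriter wrapping an OutputStreamWriter wrapping a ByteArrayOutputStream, with an explicit character encoding. It writes into memory rather than to a file so the result is easy to inspect, and it uses the OutputStreamWriter constructor that takes a Charset (available since JDK 1.4) rather than the String form mentioned above. The class and method names are invented for the example.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class ChainDemo {
    // Chains BufferedWriter -> OutputStreamWriter -> ByteArrayOutputStream and
    // returns how many bytes the text occupies in the given encoding.
    static int encodedLength(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(bytes, Charset.forName(charsetName)))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the outer writer flushed every link of the chain into 'bytes'
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("Hello World", "UTF-8"));
        // Cyrillic "Privet": six letters, two bytes each in UTF-8
        System.out.println(encodedLength("\u041f\u0440\u0438\u0432\u0435\u0442", "UTF-8"));
    }
}
```

The plain ASCII text Hello World takes 11 bytes, while each of the six Cyrillic letters takes two bytes in UTF-8, so the second call prints 12.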
Text processing is an important task for any programming language. In Java it is done by the Reader and Writer classes, which are in the java.io package. They can handle any character encoding that Java supports, including Unicode, which encompasses just about every writing system on the planet. For the purposes of this course I will concentrate on the ASCII character set. The Reader and Writer classes work in a similar way to the stream classes, except that they are character based rather than byte based.
The following program uses the FileReader and FileWriter classes to open a file, read in its contents and then write them out to a new file. Note that it is not the creation of the File object with "Output.txt" as its constructor parameter that creates the output file; it is the creation of the fwout instance of the FileWriter class.
/**
*Demonstrating FileReader
*and FileWriter
*@author Marcus Green
**/
import java.io.*;
public class Self{
public static void main(String[] args) throws IOException{
File fin = new File("Self.java");
File fout = new File("Output.txt");
/*FileReader takes an instance of File */
FileReader frin = new FileReader(fin);
FileWriter fwout = new FileWriter(fout);
int c;
while((c=frin.read()) !=-1){
fwout.write(c);
}
frin.close();
fwout.close();
}
}
The while loop continues until the read method of FileReader returns -1, at which point it exits. Note how the FileWriter fwout is closed once the loop finishes. Closing matters even in a trivial program like this: FileWriter buffers characters internally, so output that has not been flushed may never reach the file if the stream is not closed. Closing also releases the underlying file handle, and if you were processing large numbers of files, failing to close each stream after use would lead to an accumulation of resources that could hurt performance or exhaust the operating system's limit on open files.
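The effect of buffering can be seen without touching the disk. This sketch, with invented names, uses a BufferedWriter over a StringWriter as a stand-in for the buffer inside FileWriter: immediately after write, nothing has reached the StringWriter; only after flush (which close also performs) does the text arrive.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class FlushDemo {
    // Returns what the underlying StringWriter holds before and after flush().
    static String[] beforeAndAfterFlush(String text) {
        StringWriter target = new StringWriter();
        try (BufferedWriter buffered = new BufferedWriter(target)) {
            buffered.write(text);              // held in the BufferedWriter's buffer
            String before = target.toString(); // nothing has reached the target yet
            buffered.flush();                  // push the buffered characters through
            String after = target.toString();
            return new String[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterFlush("data");
        System.out.println("before: '" + result[0] + "' after: '" + result[1] + "'");
    }
}
```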
Early versions of Java were limited in their support for the type of text processing typified by the Perl language. It was only with JDK 1.4 and 1.5 that Java acquired the kind of text manipulation features that made Perl such a popular language when early web sites were developed: JDK 1.4 added the java.util.regex package, and 1.5 added the java.util.Scanner class, which can parse strings using regular expressions. Regular expressions are a way of manipulating strings with single statements rather than looping through each character. If you have ever used SQL, you can consider regular expressions to be similar to the way SQL processes data declaratively rather than record by record.
The following code performs a search on a string to see if it contains the @ symbol used in email addresses. It is not hard to imagine this sort of code being used to validate the content of a web page field.
import java.util.regex.*;
public class Reg{
private static Pattern pattern;
private static Matcher matcher;
public static void main(String argv[]){
new Reg();
}
Reg(){
pattern = Pattern.compile("@");
matcher = pattern.matcher("someuser@someisp.com");
if(matcher.find()){
System.out.println("match");
System.out.println(matcher.start());
}
}
}
For a simple search like this you could use a standard String method such as indexOf (or loop through the string with charAt), but the regular expression API is capable of some very sophisticated searching and matching.
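The examples in this section use the find method, which succeeds if the pattern occurs anywhere in the text. Matcher also has a matches method, which succeeds only if the pattern covers the whole text. Here is a small sketch comparing the two, with String.indexOf for the simple case; the class and helper names are invented.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // true if the regex occurs anywhere in the text
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // true only if the regex matches the entire text
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String email = "someuser@someisp.com";
        System.out.println(email.indexOf('@'));           // 8
        System.out.println(finds("@", email));            // true
        System.out.println(matchesWhole("@", email));     // false
        System.out.println(matchesWhole(".+@.+", email)); // true
    }
}
```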
Metacharacters are characters that stand in for other characters, in a similar way to how x stands for an unknown value in algebra. To take the previous example, it is possible to imagine a situation where the @ symbol is being used in some context other than an email address (OK, it is unlikely but possible). So what you want to search for is an @ with at least one character on each side:
string@string
/*Searching within strings using
* the dot . meta character
*author Marcus Green
*/
import java.util.regex.*;
public class FindAt{
private static Pattern pattern;
private static Matcher matcher;
public static void main(String argv[]){
/* simple search will find @ in both strings */
new FindAt("@","someuser@someisp.com");
new FindAt("@","@someisp.com");
/*Using the dot metacharacter finds only an embedded @ */
new FindAt(".@.","someuser@someisp.com");
/*results in no match */
new FindAt(".@.","@someisp.com");
}
FindAt(String regex, String text){
pattern = Pattern.compile(regex);
matcher = pattern.matcher(text);
if(matcher.find()){
System.out.print("match ");
System.out.println(matcher.start());
}else{
System.out.println("no match");
}
}
}
The output from running java FindAt will be
match 8
match 0
match 7
no match
The following is taken directly from the JavaDoc API documentation for the Pattern class:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
Imagine you are searching for a National Insurance number as used in the UK. Mine has the format of two letters followed by six digits and then one letter, so I am going to assume that is the general format of all NI numbers. This can be converted into a regular expression as follows (a loose one, since \w matches any letter, digit or underscore, not only letters):
\w\w\d\d\d\d\d\d\w
But some people may have taken to putting dashes between the different sections, so a number might be written either as
WK345676A
or WK-345676-A
It will also make life easier if, instead of hard coding the text and the pattern, I can pass them from the command line. This program does that:
/*Searching within strings using
* parameters passed from
* command line
*author Marcus Green
*/
import java.util.regex.*;
public class RegEx{
private static Pattern pattern;
private static Matcher matcher;
public static void main(String argv[]){
if(argv.length < 2){
System.out.println("Usage: java RegEx regex string");
}else{
new RegEx(argv[0],argv[1]);
}
}
RegEx(String regex, String text){
pattern = Pattern.compile(regex);
matcher = pattern.matcher(text);
if(matcher.find()){
System.out.print("match ");
System.out.println(matcher.start());
}else{
System.out.println("no match");
}
}
}
The command line
java RegEx "\w\w\d\d\d\d\d\d\w" "Wk345676A"
will print match 0, but the command line
java RegEx "\w\w\d\d\d\d\d\d\w" "Wk-345676-A"
will print no match. Of course you can change the regular expression to include the dashes, but then it will no longer match the version without dashes. What you need is the ability to match with or without dashes, which is where quantifiers come in.
As the name implies, quantifiers indicate how many times a match can be made. So you might want to match once or more, or, the case that struck me as odd, zero or more times. Take the National Insurance number with dashes: what you want is to match a dash when it occurs, but also when it does not occur (a zero match). The following example uses the ? quantifier, which means zero or one occurrence, to match with or without dashes.
java RegEx "\w\w-?\d\d\d\d\d\d-?\w" "Wk-345676-A"
Note how the quantifier ? comes after the character to be matched. Another example where an optional (zero or one) match is useful is UK/US spelling. In the UK the words labour and colour have an additional u. You could use the patterns
labou?r and colou?r to match both the US and UK spellings.
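The same patterns can be tried from Java source. A point to watch is that inside a Java string literal every backslash in the regular expression must be doubled, so \w is written "\\w". A sketch with an invented class name:

```java
import java.util.regex.Pattern;

public class OptionalChars {
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The NI pattern from the command line, with each backslash doubled
        String ni = "\\w\\w-?\\d\\d\\d\\d\\d\\d-?\\w";
        System.out.println(finds(ni, "WK345676A"));     // true
        System.out.println(finds(ni, "WK-345676-A"));   // true
        System.out.println(finds("colou?r", "color"));  // true
        System.out.println(finds("colou?r", "colour")); // true
    }
}
```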
The + quantifier matches one or more occurrences of a character. Thus the expression
m+
will match against marcus, but the expression
z+
will not.
Sometimes you might want to match one of several different substrings. For example, a web URL might begin with http, https or ftp. The following pattern will match any of those; it uses parentheses to group the alternatives, which are separated by the | character.
(http|https|ftp).*
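One subtlety of alternation is that the alternatives are tried from left to right. With (http|https|ftp), the text https://example.com is matched by http first, and .* happily swallows the rest, so the captured group is http rather than https. The sketch below (invented names) uses group(1) to show this, together with https? as one way round it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Scheme {
    // Returns the text captured by group 1, or null if there is no match.
    static String scheme(String regex, String url) {
        Matcher m = Pattern.compile(regex).matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "https://example.com";
        // http is tried before https, and succeeds, so group 1 is "http"
        System.out.println(scheme("(http|https|ftp).*", url));
        // s? makes the s optional but greedy, so group 1 is "https"
        System.out.println(scheme("(https?|ftp)://.*", url));
    }
}
```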
Curly braces can be used to specify numeric quantifiers. For example you could search text for duplications of the string “the” with a regular expression
(the){2}
This will search for occurrences of thethe. Of course you could simply search directly for the string thethe, but for a longer string, or if you were generating regular expressions programmatically, this technique could be very useful.
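Here is a quick sketch of numeric quantifiers, also showing the {m,n} form, which means between m and n occurrences. The class and helper names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Counted {
    // Start index of the first match, or -1 if there is none.
    static int firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("(the){2}", "read thethe text")); // 5
        System.out.println(firstMatch("(the){2}", "read the text"));    // -1
        // {2,3}: between two and three digits in a row
        System.out.println(firstMatch("\\d{2,3}", "a1b22c"));           // 3
    }
}
```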
In the term character class, the word class is not used in the object-oriented sense but simply means a category of characters. A character class is written in square brackets and matches any single character from that category, so character classes are used to select by criteria such as "only digits" or "only upper case letters".
The following pattern will match the names smith and smyth but not smoth
sm[iy]th
The pattern effectively says: find sm, followed by one character that is either i or y, followed by th. The expression
sm[ihty]
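would also find a match in smith and smyth, but only because find needs nothing more than sm followed by one of the characters i, h, t or y; it would equally match smt or smh, so it is much less precise.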
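The sketch below checks the names against sm[iy]th and also uses a range, [A-Z], inside the brackets. The ^ and $ anchors (start and end of the input) are standard regular expression syntax not otherwise covered here; the class name is invented.

```java
import java.util.regex.Pattern;

public class CharClass {
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"smith", "smyth", "smoth"}) {
            System.out.println(name + " " + finds("sm[iy]th", name)); // true, true, false
        }
        // One or more upper case letters and nothing else
        System.out.println(finds("^[A-Z]+$", "SMITH")); // true
        System.out.println(finds("^[A-Z]+$", "Smith")); // false
    }
}
```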
A character class can also contain predefined classes such as \d; for example [\d-] matches a single character that is either a digit or a dash.
You can restrict the match at a particular position to a smaller subset of characters by listing them within square brackets. Thus you could search for four-letter words with a vowel in the second position with
"\w[aeiou]\w\w"
This will match any word character, followed by a vowel, followed by two more word characters. Thus it will match come but not glee. You can have some childish fun seeing how it matches various swear words :-).
There are times when you want to treat metacharacters as ordinary text. Thus if you were looking through this text for the string colou?r, you can escape the question mark with the backslash character:
colou\?r
To match a literal backslash, escape it with another backslash, giving \\. Remember also that inside a Java string literal every backslash must itself be doubled, so the regular expression colou\?r is written "colou\\?r" in source code, and a literal backslash becomes "\\\\". This does not apply to the RegEx program above, which takes its pattern from the command line.
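In Java source code the escaping happens twice, once for the regular expression and once for the string literal. This sketch (invented names) also uses Pattern.quote, which returns a pattern that matches its argument literally, so you do not have to escape each metacharacter by hand.

```java
import java.util.regex.Pattern;

public class Escape {
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The regex colou\?r, written as a Java string literal
        System.out.println(finds("colou\\?r", "find colou?r here")); // true
        System.out.println(finds("colou\\?r", "colour"));            // false
        // Pattern.quote escapes the whole string for you
        System.out.println(finds(Pattern.quote("colou?r"), "colou?r")); // true
        // The regex a\\b (a, backslash, b) needs four backslashes in source
        System.out.println(finds("a\\\\b", "a\\b"));                 // true
    }
}
```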
According to the Java API docs for the Formatter class it is
“An interpreter for printf-style format strings”.
This is a reference to the printf function in the standard C library. If you do not have a background in C, knowing this may not be helpful. The format method uses an idiom slightly different from that of most methods: it takes formatting characters embedded in a string to control the way the other parameters are output. For example:
import java.util.*;
public class FormatOut {
public static void main(String args[]) {
Formatter f = new Formatter();
f.format("String %s decimal %d float %f", "mystring", 10, 45.1);
System.out.println(f);
}
}
Note how the formatting characters %s, %d and %f are embedded in the string passed as the first parameter. The output of this program will be
String mystring decimal 10 float 45.100000
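String.format, added alongside Formatter in Java 5, does the same job without creating a Formatter yourself, and System.out.printf writes the result straight to the console. Because %f is locale sensitive (some locales use a comma as the decimal separator), this sketch passes Locale.ROOT; that choice and the class name are assumptions of the example.

```java
import java.util.Locale;

public class FormatDemo {
    // Same format string as FormatOut, built with String.format
    static String describe(String s, int d, double f) {
        // Locale.ROOT fixes the decimal separator as '.' whatever the machine's locale
        return String.format(Locale.ROOT, "String %s decimal %d float %f", s, d, f);
    }

    public static void main(String[] args) {
        System.out.println(describe("mystring", 10, 45.1));
        // printf formats and prints in one step; %n is a platform line separator
        System.out.printf(Locale.ROOT, "%b %c %s%n", true, 'x', "done");
    }
}
```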
It is possible to use format strings to perform a wide range of manipulation and conversion. For example, you can pad out numbers, change the number of decimal places and control the way dates and times are shown. However, the exam objective explicitly says that your understanding can be limited to the following specifiers
%b, %c, %d, %f, %s
The %b specifier is a little odd. According to the API documents
“If arg is a boolean or Boolean, then the result is the string returned by String.valueOf()”
This means that a boolean or Boolean (the wrapper class) with the value true will output true, and one with the value false will output false.
The feature that surprised me was that if the argument is null you get an output of false, while any other argument results in an output of true. This is a little like languages such as C, where any non-zero value is taken to be true.
The %b specifier will output true for any non-null, non-false argument.
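A sketch of the %b rules just described; the helper b is an invented name that simply formats its one argument with %b.

```java
public class BooleanFormat {
    // Formats a single argument with %b
    static String b(Object arg) {
        return String.format("%b", arg);
    }

    public static void main(String[] args) {
        System.out.println(b(Boolean.TRUE)); // true
        System.out.println(b(false));        // false (autoboxed to Boolean)
        System.out.println(b(null));         // false
        System.out.println(b("false"));      // true: a String is neither null nor a Boolean
        System.out.println(b(0));            // true: zero is not treated as false
    }
}
```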
The %c specifier formats its argument as a Unicode character. The ASCII character set traditionally used on computers in the West has only 128 characters, fitting in a single byte, so it cannot represent the characters of many other languages. Unicode is a much larger character set that allows almost any character to be represented. The following format string will output the @ character, as hexadecimal 0040 is the Unicode code point of that character.
format("%c",'\u0040');
The Scanner class makes it easier to break input up into manageable pieces of data. The API documentation gives a rather odd example that uses the word fish as a delimiter. I suspect that more commonly encountered delimiters are a simple blank space or one of the characters used when exporting data, such as a comma or the pipe symbol (|). Here is a trivial example with the default whitespace delimiter.
import java.util.*;
public class GetNumbers{
static int num[] = new int[3];
public static void main(String argv[]){
Scanner scan = new Scanner(System.in);
for(int i = 0; i < 3; i++){
num[i] = scan.nextInt();
}
}
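}
If you run this code it will simply read three int values typed at the console (standard input) and store them in the num array. Note that it reads from standard input, not from the command-line arguments in argv.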
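Because Scanner can also read from a String, the same idea can be tried without typing at the console. A sketch with invented names that sums whitespace-separated integers, using hasNextInt to stop at the first token that is not an int:

```java
import java.util.Scanner;

public class SumInts {
    // Adds up whitespace-separated ints, stopping at the first non-int token.
    static int sum(String input) {
        Scanner scan = new Scanner(input);
        int total = 0;
        while (scan.hasNextInt()) {
            total += scan.nextInt();
        }
        scan.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("10 20 30")); // 60
        System.out.println(sum("1 2 x 3"));  // 3, because x stops the loop
    }
}
```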
The following code is a little more interesting and is adapted from the example given in the API docs that uses a delimiter of “fish”
import java.util.*;
public class ScanIn{
public static void main(String argv[]){
new ScanIn(argv);
}
ScanIn(String argv[]){
System.out.println("Input: "+argv[0]);
Scanner s = new Scanner(argv[0]);
System.out.println("Delimiter: "+argv[1]);
s.useDelimiter(argv[1]);
System.out.println(s.nextInt());
System.out.println(s.nextInt());
}
}
If you run this code with the command line
java ScanIn "1 , 2 , " " , "
The output will be
Input: 1 , 2 ,
Delimiter:  ,
1
2
However, this delimiter only matches a comma with exactly one space on each side. What if you have a slightly irregular file that sometimes has no space, or more than one space, around the commas? The Scanner can use just about any standard regular expression as a delimiter, so you can use
java ScanIn "1 , 2, " "\s*,\s*"
Note that if your delimiter contains a character with a special meaning in regular expressions, e.g. the bar character sometimes used as a separator for data, you will need to use the backslash character to escape it. Thus
java ScanIn "1 | 2 | " "\s*\|\s*"
will parse the input and output only the numbers.
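When the same delimiter is written in Java source rather than on the command line, each backslash has to be doubled. A sketch (invented names) that collects the numbers from a pipe-separated string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PipeFields {
    static List<Integer> parse(String input) {
        List<Integer> numbers = new ArrayList<Integer>();
        Scanner s = new Scanner(input);
        // The Java literal "\\s*\\|\\s*" is the regex \s*\|\s* used on the command line
        s.useDelimiter("\\s*\\|\\s*");
        while (s.hasNextInt()) {
            numbers.add(s.nextInt());
        }
        s.close();
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 | 2 |3|  4 | ")); // [1, 2, 3, 4]
    }
}
```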
Java FAQ on I/O
http://www.javafaq.com/cjioa.html