The aim of this assignment is for you to undertake a neural network exercise in simplified CAPTCHA
recognition. You are recommended to use Weka for this part of the assignment but you can also use
Matlab if you wish. Matlab has some advanced features but is also difficult to use for new users.
For COMP701 students, this is your second assignment worth 50% of the overall course mark (the
first 50% is from your GA assignment). Do not attempt this assignment until you complete
workshop 6 on ANNs.
1. The first part of this assignment is as follows. You will design, develop, train and test a
neural network that can recognise 10 correctly formatted visual patterns of your choice.
You will therefore need to generate 10 training samples (all as binary input matrices)
containing 1s and 0s that represent the pixels of the visual patterns you are attempting to
recognise.
Once you have trained your ANN to recognise the 10 patterns, you will test the ability of
your ANN to still correctly identify the test patterns after you have mutated at least 10%
of the bits in each pattern. YOU MUST NOT RE-TRAIN YOUR ANN ON THE TEST PATTERNS.
You must use the same weights you obtained on the training set to test the mutated
patterns without any further changes to the weights.
This is the recommended procedure:
a. Generate 10 training samples. Don’t forget to add the desired output information.
Do not mutate any of these samples in the training set.
b. Design an ANN for training on these 10 patterns. You must use the ‘Use training set’
option here after reading the patterns into the neural network using the ‘Explorer’
tab. Run the ANN on your character training set. All training is finished at this stage.
c. Once you have trained the network, generate three test files from your training
patterns, as follows:
i. Test1 that contains the same 10 patterns as the training set, except that at
least 10% of the ‘1’ bits somewhere in each input pattern are changed to ‘0’, or
at least 10% of the ‘0’ bits are changed to ‘1’.
ii. Test2 that contains the same 10 patterns as the training set, except that 20%
of the ‘1’ bits somewhere in each input pattern are changed to ‘0’, or 20% of the
‘0’ bits are changed to ‘1’.
iii. Test3 that contains the same 10 patterns as the training set, except that 30%
of the ‘1’ bits somewhere in each input pattern are changed to ‘0’, or 30% of the
‘0’ bits are changed to ‘1’.
d. Test your previously trained network first with Test1, then Test2 and finally Test3.
You must enter these files in the ‘Supplied test set’ option. Make a note of the
results (i.e. the accuracy and other measures of the test sets). DO NOT RE-TRAIN
YOUR ANN ON THE TEST SET EXAMPLES!
e. Write a report (maximum 6 pages) which includes details of how you generated your
training set, how you represented the patterns as pixel matrices, how effective your
ANN was at learning the training patterns including details of all parameters used,
and how accurate your trained ANN was on the three test sets. In your conclusion,
evaluate what you have done, including references to the ability of the network to
degrade gracefully in the face of noise.
2. Consider one or more of the following variations:
a. Amend the ANN architecture so that the network returns good results despite the
increasing severity of changes to the test patterns;
b. Identify at what point the ANN fails to recognise characters, no matter what you do
to try to improve the architecture.
c. Instead of removing or adding bits at random for your test set, remove and add bits
d. You may wish to compare your ANN results with another of the methods under
Functions in Weka.
3. If you are interested in using a different character set, read
You will find example .arff files containing all 26 English capital letters in Canvas under the
relevant Workshop. These files are meant to guide your dataset construction. If, however, you
wish to use these .arff files for your own assignment, please be aware that you may not get the
best marks for that part of the marksheet dealing with ‘Generating training and test sets’. See
the final page of this handout for the marksheet.
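As an illustration of the test-set construction described in step 1c, the following Python sketch shows one possible way to mutate a pattern. It is not part of the required procedure and makes several assumptions: the 5x5 letter "T" matrix, the helper names flatten and mutate, and the choice to flip bits in either direction (1 to 0 or 0 to 1) at random positions are all hypothetical. The number of flipped bits is rounded up so that "at least" the requested percentage is changed.

```python
import random

# Hypothetical 5x5 binary pattern (a capital "T"); your own patterns and
# matrix size may differ.
PATTERN_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def flatten(matrix):
    """Turn a pixel matrix into one flat list of bits, row by row."""
    return [bit for row in matrix for bit in row]


def mutate(bits, percent, rng):
    """Return a copy of bits with at least `percent` % of positions flipped.

    The flip count uses integer ceiling division, so 10% of 25 bits
    gives 3 flips rather than 2.
    """
    n_flips = (len(bits) * percent + 99) // 100
    positions = rng.sample(range(len(bits)), n_flips)  # distinct positions
    mutated = list(bits)  # copy, so the training pattern stays untouched
    for i in positions:
        mutated[i] = 1 - mutated[i]  # 1 becomes 0, 0 becomes 1
    return mutated


original = flatten(PATTERN_T)
rng = random.Random(0)  # fixed seed so the test files can be regenerated

# One mutated copy per noise level, corresponding to Test1, Test2 and Test3.
test_rows = {percent: mutate(original, percent, rng) for percent in (10, 20, 30)}

for percent, row in test_rows.items():
    changed = sum(a != b for a, b in zip(original, row))
    print(f"{percent}%: {changed} of {len(row)} bits changed ->",
          ",".join(str(b) for b in row))
```

Each printed row is a comma-separated list of bits to which the desired output (the class label) would still need to be added before it can be used in a test file. Keeping the training pattern unchanged and mutating only a copy mirrors the requirement that the training set itself is never mutated.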
Further information now follows on data file structure.