Adaptive Information Extraction from Text
by Rule Induction and Generalisation
Fabio Ciravegna
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street,
S1 4DP Sheffield, UK
F.Ciravegna@dcs.shef.ac.uk
Abstract

(LP)2 is a covering algorithm for adaptive Information Extraction from text (IE). It induces symbolic rules that insert SGML tags into texts by learning from examples found in a user-defined tagged corpus. Training is performed in two steps: initially a set of tagging rules is learned; then additional rules are induced to correct mistakes and imprecision in tagging. Induction is performed by bottom-up generalization of examples in the training corpus. Shallow knowledge about Natural Language Processing (NLP) is used in the generalization process. The algorithm has a considerable success story. From a scientific point of view, experiments report excellent results with respect to the current state of the art on two publicly available corpora. From an application point of view, a successful industrial IE tool has been based on (LP)2. Real world applications have been developed and licenses have been released to external companies for building other applications. This paper presents (LP)2, experimental results and applications, and discusses the role of shallow NLP in rule induction.

1. Introduction

By general agreement the main barriers to wide use and commercialization of Information Extraction from text (IE) are the difficulties in adapting systems to new applications. Classical IE has focused on applications to free texts; therefore systems often rely on approaches based on Natural Language Processing (NLP) (e.g. using parsing) [Humphreys et al. 1998; Grishman 1997]. Most systems require the manual development of resources (e.g. grammars) by a user skilled in NLP [Ciravegna 2000]. There is an increasing interest in applying machine learning (ML) to IE in order to build adaptive systems. Up to now, the use of ML has been approached mainly in an NLP-oriented perspective, i.e. in order to reduce the amount of work to be done by the NLP experts in porting systems across free text based scenarios [Cardie 1997; Miller et al. 1998; Yangarber et al. 2000]. Given the current technology, IE experts are still necessary.

In the last years, the increasing importance of the Internet has stressed the central role of texts such as emails, Usenet posts and Web pages. In this context, extralinguistic structures (e.g. HTML tags, document formatting, and ungrammatical stereotypical language) are elements used to convey information. Linguistically intensive approaches are difficult or unnecessary in such cases. For this reason a new research stream on adaptive IE has arisen at the convergence of NLP, Information Integration and Machine Learning. The goal is to produce IE algorithms and systems adaptable to new Internet-related applications/scenarios by using only an analyst's knowledge (i.e. knowledge on the domain/scenario itself) [Kushmerick et al. 1997; Califf 1998; Muslea et al. 1998; Freitag and McCallum 1999; Soderland 1999; Freitag and Kushmerick 2000]. Such algorithms are very effective when applied on highly structured HTML pages, but less effective on unstructured texts (e.g. free texts). In our opinion this is because most successful algorithms make scarce (or no) use of NLP, tending to avoid any generalization over the flat word sequence. When they are applied to unstructured texts, data sparseness becomes a problem.

This paper presents (LP)2, an adaptive IE algorithm designed in this new stream of research that makes use of shallow NLP in order to overcome data sparseness when confronted with NL texts, while keeping effectiveness on highly structured texts. This paper first introduces the algorithm, discusses experimental results and shows how the algorithm compares successfully with the current state of the art. The role and importance of shallow NLP for overcoming data sparseness is then discussed. Finally a successful industrial system for adaptive IE built around (LP)2 is presented and some conclusions and future work are drawn.

2. The Rule Induction Algorithm

(LP)2 learns from a training corpus where a user has highlighted the information to be extracted with different SGML tags. It induces symbolic rules that insert SGML tags into texts in two steps:
1. Sets of tagging rules are induced by bottom-up generalization of tag instances found in the training corpus. Shallow knowledge about NLP is used in the generalization process.
2. Correction rules are induced that refine the tagging by correcting mistakes and imprecision.
This section presents and discusses these two steps.

2.1 Inducing Tagging Rules

A tagging rule is composed of a left hand side, containing a pattern of conditions on a connected sequence of words, and a right hand side that is an action inserting an SGML tag in the texts. Each rule inserts a single SGML tag, e.g. [tag]. This makes (LP)2 different from many adaptive IE algorithms, whose rules recognize whole slot fillers (i.e. insert both [tag] and [/tag] [Califf 1998; Freitag 1998]) or even multi slots [Soderland 1999]. The tagging rule induction algorithm uses positive examples from the training corpus for learning rules. Positive examples are the SGML tags inserted by the user. All the rest of the corpus is considered a pool of negative examples. For each positive example the algorithm: (1) builds an initial rule, (2) generalizes the rule and (3) keeps the k best generalizations of the initial rule. In particular (LP)2's main loop starts by selecting a tag in the training corpus and extracting from the text a window of w words to the left and w words to the right. Each piece of information stored in the 2*w word window is transformed into a condition in the initial rule pattern, e.g. if the third word in the window is "seminar", the condition on the third word in the pattern will be word="seminar". Each initial rule is then generalized. In the generalization process (LP)2 uses generic shallow knowledge about Natural Language as provided by a morphological analyzer, a POS tagger and a user-defined dictionary (or a gazetteer). A lexical item (LexIt in the following) summarizes such knowledge for each word (e.g., companies) via: a lemma (company), a lexical category (noun), case information (lowercase) and a list of user defined classes as defined by a user-defined dictionary or a gazetteer (if available). An initial rule and the associated information in the LexIts are shown in Table 1. Generalization consists in the production of a set of rules derived by relaxing constraints in the initial rule pattern. Conditions are relaxed both by reducing the pattern in length and by substituting constraints on words with constraints on some parts of the additional knowledge. Table 2 shows one of the many generalizations for the rule in Table 1. The last step of the algorithm is the selection of the best generalizations. Each generalization is tested on the training corpus and an accuracy score L=wrong/matched is calculated (this is an error rate, so lower values are better). For each initial instance the k best generalizations are kept that: (1) report better accuracy; (2) cover more positive examples; (3) cover different parts of input [note 1]; (4) have an error rate that is less than a specified threshold.

Note 1: Rules derived from the same seed cover the same portions of input when ineffective constraints are present.

Word    Condition   Associated information              Action
index   word        lemma     LexCat   case   SemCat    Tag
1       the         the       Art      low
2       seminar     seminar   Noun     low
3       at          at        Prep     low
4       4           4         Digit    low
5       pm          pm        Other    low    timeid
6       will        will      Verb     low
Table 1: Starting rule (with associated NLP knowledge) inserting [tag] in the sentence "the seminar at 4 pm will...".

The other generalizations are discarded. Retained rules become part of the best rules pool. When a rule enters the best rules pool, all the instances covered by the rule are removed from the positive examples pool, i.e. covered instances will no longer be used for rule induction ((LP)2 is a covering algorithm). Rule induction continues by selecting new instances and learning rules until the pool of positive examples is empty.

Word    Condition                                    Action
index   Word   Lemma   LexCat   Case   SemCat        Tag
3       at
4                      digit
5                                      timeid
Table 2: A generalization for the rule in Table 1. The pattern is relaxed in length (conditions on words 1, 2 and 6 were removed) and conditions on the other words were substituted by other constraints.

2.1.1 Learning Contextual Rules

When applied on the test corpus, the best rules pool provides good results in terms of precision, but limited effectiveness in terms of recall. This means that such rules insert few tags (low recall), and that such tags are generally correct (high precision). Intuitively this is because the absolute reliability required for rule selection is strict, thus only some of the induced rules will match it. In order to reach acceptable effectiveness, it is necessary to identify additional rules able to raise recall without affecting precision. (LP)2 recovers some of the rules not selected as best rules and tries to constrain their application to make them reliable. Constraints on rule application are derived by exploiting interdependencies among tags. As mentioned, (LP)2 learns rules for inserting tags (e.g., [tag]) independently from other tags (e.g., [tag]). But tags are not independent. There are two ways in which they can influence each other: (1) tags represent slots, therefore an opening tag always requires the corresponding closing tag; (2) slots can be concatenated into linguistic patterns and therefore the presence of a slot can be a good indicator of the presence of another.
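The tagging rule induction loop of Section 2.1 (initial rule from a window, generalization, test on the training corpus, selection, covering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the (LP)2 implementation: the `lexcat` function is a toy stand-in for the POS tagger, generalization only relaxes a word condition to its lexical category or drops it, and the names `induce`, `evaluate`, `k` and `max_error` are hypothetical.

```python
from itertools import product

def lexcat(word):
    # Toy stand-in for a POS tagger (assumption made for this sketch).
    if word.isdigit():
        return "digit"
    if word in {"at", "from"}:
        return "prep"
    return "other"

def window(doc, pos, w):
    # Tokens at offsets -w..w-1 around the insertion point; None pads the edges.
    return [doc[i] if 0 <= i < len(doc) else None for i in range(pos - w, pos + w)]

def initial_rule(doc, pos, w):
    # One condition on the word for every slot of the 2*w window.
    return tuple(("word", t) if t is not None else None for t in window(doc, pos, w))

def generalise(rule):
    # Relax each condition: keep the word, use its lexical category, or drop it.
    options = [[None] if c is None else [c, ("lexcat", lexcat(c[1])), None] for c in rule]
    return [r for r in product(*options) if any(c is not None for c in r)]

def matches(rule, doc, pos, w):
    for cond, tok in zip(rule, window(doc, pos, w)):
        if cond is None:
            continue
        if tok is None:
            return False
        attr, value = cond
        if (tok if attr == "word" else lexcat(tok)) != value:
            return False
    return True

def evaluate(rule, corpus, positives, w):
    # Returns the covered positive instances and the error rate wrong/matched.
    matched = [(d, p) for d, doc in enumerate(corpus)
               for p in range(len(doc) + 1) if matches(rule, doc, p, w)]
    covered = {m for m in matched if m in positives}
    return covered, (len(matched) - len(covered)) / len(matched)

def induce(corpus, positives, w=2, k=1, max_error=0.0):
    best_rules, pool = [], set(positives)
    for seed in sorted(positives):
        if seed not in pool:          # instance already covered by a best rule
            continue
        d, p = seed
        scored = []
        for rule in generalise(initial_rule(corpus[d], p, w)):
            covered, err = evaluate(rule, corpus, positives, w)
            if err <= max_error:
                # repr(rule) is a unique tie-breaker, so sorting never compares rules
                scored.append((err, -len(covered), repr(rule), rule, covered))
        for _, _, _, rule, covered in sorted(scored)[:k]:
            best_rules.append(rule)
            pool -= covered           # covering step
    return best_rules

corpus = ["the seminar at 4 pm will".split(),
          "talk at 3 pm today".split(),
          "meet at noon with 5 people".split()]
positives = {(0, 3), (1, 2)}          # (document, position) where the tag is inserted
rules = induce(corpus, positives)
```

On this toy corpus a single generalized rule covers both tagged instances, so the second instance is never used as a seed. The criterion "cover different parts of input" is not modelled here.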
For example, [tag] can be used as "anchor tag" for inserting [tag] [note 2]. In general it is possible to use one tag to introduce another. (LP)2 is not able to use such contextual information, as it induces single tag rules. The context is reintroduced in (LP)2 as an external constraint used to improve the reliability of unreliable rules. In particular, (LP)2 reconsiders low precision non-best rules for application in the context of tags inserted by the best rules only. For example some rules will be used only to close slots when the best rules were able to open them, but not to close them (i.e., when the best rules are able to insert [tag] but not [/tag]). Selected rules are called contextual rules. As an example, consider a rule inserting a [/tag] tag between a capitalized word and a lowercase word. This is not a best rule as it reports high recall/low precision on the corpus, but it is reliable if used only to close an open [tag]. Thus it will only be applied when the best rules have already recognized an open [tag], but not the corresponding [/tag]. The area of application is the part of the text following a [tag] and within a distance less than or equal to the maximum length allowed for the present slot [note 3]. "Anchor tags" used as contexts can be found either to the right of the rule space application (as in the case above when the anchor tag is [tag]), or to the left as in the opposite case (anchor tag is [/tag]). A detailed description of this process can be found in [Ciravegna 2000a]. Reliability for contextual rules is computed by using the same error rate used for best rules, but only matches in controlled contexts are counted.

Note 2: In the following we just make examples related to the first case as it is more intuitive.
Note 3: The training corpus is used for computing the maximum filler length for each slot.

In conclusion the sets of tagging rules (LP)2 induces are both the best rule pool and the contextual rules. Figure 3 shows the whole algorithm for tagging rule induction.

  Loop for instance in initial-instances
    unless already-covered(instance)
      loop for rule in generalise(instance)
        test(rule)
        if best-rule?(rule)
          then insert(rule, bestrules)
               cover(rule, initial-instances)
          else loop for tag in tag-list
                 if test-in-context(rule,tag,:right)
                   then select-contxtl(rule,tag,:right)
                 if test-in-context(rule,tag,:left)
                   then select-contxtl(rule,tag,:left)
Figure 3: The final algorithm for tagging rule induction.

2.2 Inducing Correction Rules

Tagging rules when applied on the test corpus report some imprecision in slot filler boundary detection. A typical mistake is for example "at [tag] 4 [/tag] pm", where "pm" should have been part of the time expression. For this reason (LP)2 induces rules for shifting wrongly positioned tags to the correct position. It learns from the mistakes made in applying tagging rules on the training corpus. Shift rules consider tags misplaced within a distance d from the correct position. Correction rules are identical to tagging rules, but (1) their patterns match also the tags inserted by the tagging rules and (2) their actions shift misplaced tags rather than adding new ones. An example of an initial correction rule for shifting [/tag] in "at [tag] 4 [/tag] pm" is shown in Table 4.
The induction algorithm used for the best tagging rules is also used for shift rules: initial instance identification, generalization, test and selection. "Wrong Tag" and "Correct Tag" conditions are never relaxed. Positive matches (correct shifts) and negative matches (wrong shifts of correctly assigned tags) are counted. Shift rules are accepted only if they report an acceptable error rate.

Condition                         Additional Information
word   Wrong tag   Correct tag    lemma   LexCat   case   SemCat
at                                at      prep     low
4                                 4       digit    low
pm                                pm      other    low    timeid
Table 4: A correction rule. The action (not shown) shifts the tag from the wrong to the correct position.

3. Extracting Information

In the testing phase information is extracted from the test corpus in four steps: initial tagging, contextual tagging, correction and validation. The best rule pool is initially used to tag the texts. Then contextual rules are applied in the context of the introduced tags. They are applied repeatedly as long as new tags are inserted, since some contextual rules can also match tags inserted by other contextual rules. Then correction rules correct some imprecision. Finally each tag inserted by the algorithm is validated. There is no meaning in producing a start tag (e.g. [tag]) without its corresponding closing tag ([/tag]) and vice versa, therefore uncoupled tags are removed in the validation phase.

4. Experimental Results

(LP)2 was tested in a number of tasks in two languages: English and Italian. In each experiment (LP)2 was trained on a subset of the corpus (some hundreds of texts, depending on the corpus) and the induced rules were tested on unseen texts. Here we report results on two standard tasks for adaptive IE: the CMU seminar announcements and the Austin job announcements [note 4].

Note 4: Corpora available at www.isi.edu/muslea/RISE/index.html

The first task consists of uniquely identifying speaker name, starting time, ending time and location in 485 seminar announcements [Freitag 1998]. Table 5 shows the overall accuracy obtained by (LP)2, and compares it with that obtained by other state of the art algorithms. (LP)2 scores the best results in the task. It definitely outperforms other symbolic approaches (+8.7% wrt Rapier [Califf
1998], +21% wrt Whisk [Soderland 1999]), but it also outperforms statistical approaches (+2.1% wrt BWI [Freitag and Kushmerick 2000] and +4% wrt HMM [Freitag and McCallum 1999]). Moreover (LP)2 is the only algorithm whose results never go below 75% on any slot (second best is BWI: 67.7%).

            (LP)2   BWI    HMM    SRV    Rapier   Whisk
speaker     77.6    67.7   76.6   56.3   53.0     18.3
location    75.0    76.7   78.6   72.3   72.7     66.4
stime       99.0    99.6   98.5   98.5   93.4     92.6
etime       95.5    93.9   62.1   77.9   96.2     86.0
All Slots   86.0    83.9   82.0   77.1   77.3     64.9
Table 5: F-measure (β=1) obtained on CMU seminars. Results for algorithms other than (LP)2 are taken from [Freitag and Kushmerick 2000]. We added the comprehensive ALL SLOTS figure, as it allows better comparison among algorithms. It was computed by:

\[
\text{All Slots} = \frac{\sum_{slot} \left( \text{F-measure}_{slot} \times \text{number of possible slot fillers}_{slot} \right)}{\sum_{slot} \text{number of possible slot fillers}_{slot}}
\]

Concerning (LP)2: results come from a 10 cross-folder experiment using half of the corpus for training. F-measure calculated via the MUC scorer [Douthat 1998]. Average training time per run: 56 min on a 450MHz computer. Window size w=4.

A second task concerned IE from 300 Job Announcements taken from misc.jobs.offered [Califf 1998]. The task consists of identifying for each announcement: message id, job title, salary offered, company offering the job, recruiter, state, city and country where the job is offered, programming language, platform, application area, required and desired years of experience, required and desired degree, and posting date. The results obtained on such a task are reported in Table 6. (LP)2 outperforms both Rapier and Whisk (Whisk obtained lower accuracy than Rapier [Califf 1998]). We cannot compare (LP)2 with BWI as the latter was tested on a very limited subset of slots. In summary, (LP)2 reaches the best results on both the tasks.

Slot          (LP)2   Rapier   BWI
id            100     97.5     100
title         43.9    40.5     50.1
company       71.9    69.5     78.2
salary        62.8    67.4
recruiter     80.6    68.4
state         84.7    90.2
city          93.0    90.4
country       81.0    93.2
language      91.0    80.6
platform      80.5    72.5
application   78.4    69.3
area          66.9    42.4
req-years-e   68.8    67.1
des-years-e   60.4    87.5
req-degree    84.7    81.5
des-degree    65.1    72.2
post date     99.5    99.5
All Slots     84.1    75.1
Table 6: F-measure (β=1) obtained on the Jobs domain using half of the corpus for training.

5. Discussion

(LP)2's main features that are most likely to contribute to the excellence in the experiments are: (1) the induction of symbolic rules (see the conclusions), (2) rule generalization via shallow NLP, (3) the use of single tag rules and (4) the use of correction.

(LP)2 induces rules by instance generalization. Generalization is also used in SRV, Rapier and Whisk. It allows reducing data sparseness by capturing some general aspects beyond the simple flat word structure. Shallow NLP is the basis for generalization in (LP)2. Morphology allows overcoming data sparseness due to number/gender word realizations, while POS tagging information allows generalization over lexical categories. In principle such type of generalization produces rules of better quality than those matching the flat word sequence, rules that tend to report better effectiveness on unseen cases. This is because both morphology and POS tagging are generic NLP processes performing equally well on unseen cases; therefore rules relying on their results apply successfully on unseen cases. This intuition was confirmed experimentally: (LP)2 with generalization ((LP)2G) definitely outperforms a version without generalization ((LP)2NG) on the test corpus, while having comparable results on the training corpus (+57% on the speaker field, +28% on the location field, +11% overall on the CMU task). Moreover in (LP)2G the covering algorithm converges more rapidly than in (LP)2NG, because its rules tend to cover more cases. This means that (LP)2G needs fewer examples in order to be trained, i.e., rule generalization also allows reducing the training corpus size.

Not surprisingly the role of shallow NLP in the reduction of data sparseness is more relevant on semi-structured or free texts (such as the CMU seminars) than on documents with highly standardized language (e.g. HTML pages, or the job announcement task). During the rule selection phase (LP)2 is able to adopt the right level of NLP information for the task at hand: in an experiment on texts written in mixed Italian/English we used an English POS tagger that was completely unreliable on the Italian part of the input. (LP)2G reached the same effectiveness as (LP)2NG, because the rules using the unreliable NLP information were automatically discarded. This suggests that in (LP)2 the use of NLP is a plus when the NLP information is reliable and not a minus when it is not.

The separate recognition of tags is an aspect shared by BWI, while HMM, Rapier and SRV recognize whole slots and Whisk recognizes multislots. Separate tag identification allows further reduction of data sparseness, as it better generalizes over the coupling of slot start/end conditions. For example in order to learn patterns equivalent to the regular expression (`at'|`starting from')DIGIT(`pm'|`am'), (LP)2 just needs two examples, e.g., `at'+`pm' and `starting from'+`am', because the algorithm induces two independent rules for [tag] (`at' + `starting from') and two for [/tag] (`am' + `pm'). In a slot-oriented rule learning strategy four examples (and four rules) will be needed, i.e. `at'+`pm', `at'+`am', `starting from'+`pm', `starting from'+`am'. In a multislot approach the problem is worse and the number of training examples needed increases drastically [Ciravegna 2000a].
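The counting argument above (two examples for single tag rules, four for slot-oriented rules) can be checked with a toy model. This sketch is only an illustration under simplifying assumptions: a "rule" is reduced to a memorized left or right context, the DIGIT filler and any generalization via LexIts are ignored, and all names are hypothetical.

```python
from itertools import product

# Two training examples, each given as (context before the slot, context after the slot).
training = [("at", "pm"), ("starting from", "am")]

# Single tag strategy: opening and closing contexts are learned independently.
open_contexts = {left for left, _ in training}
close_contexts = {right for _, right in training}

def single_tag_accepts(left, right):
    return left in open_contexts and right in close_contexts

# Slot-oriented strategy: each rule couples one start condition with one end condition.
slot_rules = set(training)

def slot_accepts(left, right):
    return (left, right) in slot_rules

# The four combinations described by (`at'|`starting from')DIGIT(`pm'|`am').
all_cases = list(product(["at", "starting from"], ["pm", "am"]))
single_tag_coverage = [c for c in all_cases if single_tag_accepts(*c)]
slot_coverage = [c for c in all_cases if slot_accepts(*c)]

print(len(single_tag_coverage), len(slot_coverage))  # prints "4 2"
```

With the same two examples the independent start/end rules accept all four combinations, while the coupled rules accept only the two pairs that were seen, which is why the slot-oriented strategy needs four examples.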
Another reason for the good experimental results lies in the use of a correction step. Correction is useful in recognizing slots with fillers with high degree of variability (such as the speaker in the CMU experiment), while it does not pay on slots with highly standardized fillers (such as many slots in the Jobs task). (LP)2 using correction rules reports 7% more in terms of accuracy on [tag] than (LP)2 without correction. Imprecision in tagging was also reported by [Califf 1998] who noted up to 5% imprecision on some slots (but she did not introduce any correction steps in Rapier).

Slot     PRE   REC   F-measure
Name     97    82    88.9
Street   96    71    81.6
City     90    90    90
Prov.    97    92    94.4
Email    92    71    80.1
Tel.     93    75    83.0
Fax      100   50    66.6
Zip      100   90    94.7
Table 7: Results of a blind test on 50 resumes. This is not a simple named entity recognition task. A resume may contain many names and addresses (e.g. previous work addresses, names of referees or thesis supervisors and their addresses). The system had to recognize the correct ones.

6. Developing real world applications

(LP)2 was developed as a research prototype, but it quickly turned out to be suitable for real world applications. An industrial system based on (LP)2, LearningPinocchio, was developed. Recently LearningPinocchio has been used in a number of industrial applications. Moreover licenses have been released to external companies for further application development. This section reports on some industrial applications we have directly developed. In one application, the system is used for extracting information from professional resumes written in English. It is used on the results of a spider that surfs the Web to retrieve professional resumes. The spider classifies resumes by topics (e.g. computer science). LearningPinocchio extracts the relevant information and its output is used to populate a database. Table 7 shows some results obtained in such task. Application development time for the IE task required about 24 person hours for scenario definition and revision (the scenario was refined by tagging some texts in different ways and discussing among annotators). A further 10 person hours were needed for tagging about 250 texts. The rule induction process took 72 hours on a 450MHz machine, with window size w=4. Finally system results validation required four person hours.

Two other applications were developed for Kataweb, a major Italian Internet portal. The goal was to extract information from both financial news and classified ads written in Italian and published on the portal pages. LearningPinocchio is used both to generate hyperlinks for cross-referencing texts and to retrieve texts querying the content. The application is currently under final test at the customer's site. Table 8 shows experimental results on financial news.

TAG                      F(1)
Geograph Area            0.70
Currency                 0.85
Stock Exchange Name      0.91
Stock Exchange Index     0.97
Organiz. Name            0.86
Company Share Name       0.85
Company Share Type       0.92
Company Share Category   0.86
ALL SLOTS                0.87
Table 8: Results of blind test on financial news (300 texts).

7. Conclusions and Future Work

(LP)2 is a successful algorithm. On the one hand it outperforms the other state of the art algorithms on two very popular IE tasks. It is important to stress the fact that (LP)2 outperforms also statistical approaches, because in the last years the latter largely outperformed symbolic approaches. There is a clear advantage in using symbolic rules in real world applications. It is possible to inspect the final system results and manually add/modify/remove rules for squeezing additional accuracy (it was not done in the scientific experiments, but it was in the applications).

On the other hand (LP)2 was the basis for building LearningPinocchio, a tool for building adaptive IE applications that is having a considerable commercial success. This shows that adaptive IE is able to produce tools suitable for building real world applications by a final user by using only analyst's knowledge.

Future work on (LP)2 will involve both the improvement of rule formalism expressiveness and the further use of shallow NLP for generalization. Concerning the improvement in rule formalism expressiveness we plan to include some forms of Kleene-star and optionality operators. Such improvement has been shown to be very effective in both BWI and Rapier. Concerning the use of shallow NLP for generalization (i.e., one of the keys of the success in (LP)2) there are two possible improvements. On the one hand (LP)2 will be used in cascade with a Named Entity Recognizer (also implemented by using (LP)2). This will allow further generalization over named entity classes (e.g., the speaker is a person, so it is possible to generalize over such class in the rules). On the other hand (LP)2 is compatible with forms of shallow parsing such as chunking. It is then possible to preprocess the texts with a chunker and to insert tags only at the chunk borders. This is likely to improve precision in border identification.

An interesting question concerns the limits of the tagging-based IE approach used by many adaptive systems, (LP)2 included. Classic MUC-like IE is based on template filling. Template filling is more complex than tagging, as it requires deciding about both coreference of expressions (are "John A. Smith" and "J. Smith" the same person? Two seminars have been identified in a text: are they separate events or are they coreferring?), and slot pairing (two seminars and two speakers have been identified: which is the speaker of the first seminar?). (LP)2 is able to apply default strategies for template merging that solve simple cases of coreferences and slot pairing. Such strategies are powerful enough to cope with many real world tasks, but in some other cases they are not effective enough. For example in the resumes application the customer was interested in retrieving also the triples degree/university/year. LearningPinocchio was able to correctly highlight such information, but often it was not able to pair them correctly, therefore they were not used to populate the database, but only to index texts. Even if some ad hoc strategies would have probably solved the problem in the specific case, it is quite clear that this is a major limitation in the approach. Classical MUC-like IE systems use sophisticated strategies for coreference resolution and template merging, very often based on a mix of NLP knowledge and domain knowledge [Humphreys et al. 1998]. Two problems prevent the use of such techniques. On the one hand deep NLP is not effective in many applications in the Internet realm (e.g. how can you parse an e-mail?). On the other hand it is not clear how to elicit the domain knowledge for coreference from an analyst (adaptability via analyst's knowledge is a strong constraint as mentioned above). Adaptive template filling is an issue worth exploration that we are currently investigating. We work in the direction of further using shallow NLP for improving template filling and merging. To some extent this is also a step in the direction of bridging the gap between classical NLP based IE systems and fully adaptive systems.

Acknowledgments

I developed (LP)2 and LearningPinocchio at ITC-Irst, Centro per la Ricerca Scientifica e Tecnologica, Trento, Italy. LearningPinocchio is property of ITC-Irst, see http://ecate.itc.it:1025/cirave/LEARNING/home.html. The financial application mentioned above was jointly developed with Alberto Lavelli. Thanks to Daniela Petrelli for revising this paper. Errors, if any, are mine.

References

[Califf 1998] Mary E. Califf, Relational Learning Techniques for Natural Language IE, Ph.D. thesis, Univ. Texas, Austin, www.cs.utexas.edu/users/mecaliff

[Cardie 1997] Claire Cardie, `Empirical methods in information extraction', AI Journal, 18(4), 65-79, 1997.

[Ciravegna et al. 2000] Fabio Ciravegna, Alberto Lavelli, and Giorgio Satta, `Bringing information extraction out of the labs: the Pinocchio Environment', in ECAI2000, Proc. of the 14th European Conference on Artificial Intelligence, ed., W. Horn, Amsterdam, 2000. IOS Press.

[Ciravegna 2000a] Fabio Ciravegna, `Learning to Tag for Information Extraction from Text', in F. Ciravegna, R. Basili, R. Gaizauskas (eds.) ECAI Workshop on Machine Learning for Information Extraction, Berlin, August 2000. (www.dcs.shef.ac.uk/~fabio/ecai-workshop.html)

[Douthat 1998] Aaron Douthat, `The message understanding conference scoring software user's manual', in the 7th Message Understanding Conf., www.muc.saic.com

[Freitag 1998] Dayne Freitag, `Information Extraction from HTML: Application of a general learning approach', Proc. of the 15th National Conference on Artificial Intelligence (AAAI-98), 1998.

[Freitag and McCallum 1999] Dayne Freitag and Andrew McCallum, `Information Extraction with HMMs and Shrinkage', AAAI-99 Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999, www.isi.edu/~muslea/RISE/ML4IE/

[Freitag and Kushmerick 2000] Dayne Freitag and Nicholas Kushmerick, `Boosted wrapper induction', in F. Ciravegna, R. Basili, R. Gaizauskas (eds.) ECAI2000 Workshop on Machine Learning for Information Extraction, Berlin, 2000, (www.dcs.shef.ac.uk/~fabio/ecai-workshop.html)

[Grishman 1997] Ralph Grishman, `Information Extraction: Techniques and Challenges', in M.T. Pazienza (ed.), Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, Springer, 1997.

[Humphreys et al. 1998] K. Humphreys, R. Gaizauskas, S. Azzam, C. Huyck, B. Mitchell, H. Cunningham, Y. Wilks, `Description of the University of Sheffield LaSIE-II System as used for MUC-7', in Proc. of the 7th Message Understanding Conference, 1998 (www.muc.saic.com).

[Kushmerick et al. 1997] N. Kushmerick, D. Weld, and R. Doorenbos, `Wrapper induction for information extraction', Proc. of 15th International Joint Conference on Artificial Intelligence, IJCAI-97, 1997.

[Miller et al. 1998] S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone and R. Weischedel, `BBN: Description of the SIFT system as used for MUC-7', in Proc. of the 7th Message Understanding Conference, 1998 (www.muc.saic.com).

[Muslea et al. 1998] I. Muslea, S. Minton, and C. Knoblock, `Wrapper induction for semi-structured, web-based information sources', in Proc. of the Conference on Autonomous Learning and Discovery CONALD-98, 1998.

[Soderland 1999] Steven Soderland, `Learning information extraction rules for semi-structured and free text', Machine Learning, (1), 1-44, 1999.

[Yangarber et al. 2000] Roman Yangarber, Ralph Grishman, Pasi Tapanainen and Silja Huttunen, `Automatic Acquisition of Domain Knowledge for Information Extraction', in Proc. of COLING 2000, 18th Intern. Conference on Computational Linguistics, Saarbrücken, 2000.