Acknowledgements

I want to thank the reviewers of our work, especially the one who was able to appreciate my thinking.

A preliminary version of SynTime was submitted to WWW2017 and was rejected. For some reason, that preliminary version did not express my thinking; instead it conveyed a misunderstanding that was misleading. Nevertheless, one of the reviewers who recommended accepting it was able to identify its novelty.

There is a knowledge gap between me and the frontier of computer science. I have a background in computer science, but my scientific training is in statistics and financial engineering. I was trained to think scientifically: when I design rules, I design them for generality by default. That default, however, did not seem to be the default in computer science, so when preparing the slides for the SynTime presentation,¹ I emphasized generality and heuristics. Still, the most exciting moment in that journey of exploration was when I saw the simplicity and its beauty; at that moment my curiosity was satisfied. The most valuable part of SynTime is the analysis of time expressions and the connection it draws between time expressions, as a part of language, and language itself (Section 3 and the last paragraph of Section 6). It adds a piece of knowledge to the field of language.

Back to the reviewer. The reviewer showed expertise in identifying the main aspects of a problem. The reviewer really knows what is novel and what is interesting. I think the reviewer must be capable of recognizing beauty in messy material and must have done a great deal of very good research. The reviewer's comment is shown below (Review 3). The detailed and informative comment helped me understand which questions computer science mainly cares about, and helped me re-conceptualize the whole work, clear up almost all of the misunderstanding, and turn a technical report (that preliminary version) back into an analytical paper, the one I originally wanted to write. I want to express my gratitude to the reviewer: thank you; you really made me feel that I had company, that at least someone somewhere could appreciate my thinking. The reviewer was probably the first person who nearly understood my thinking, even though that preliminary version was misleading. Anyone who could do that must be a great researcher.

Another review (Review 2; that reviewer knows what insight is) raised the questions "Lack of discussion on alternative (obvious approaches) - CRFs?" and "I was curious to understand why sequence taggers wouldn't do a good job on this task and why the authors dont think about using one." I hope the SynTime and TOMN papers answer your questions.

I love language and computational linguistics. I hope to reveal the beauty of language, which I think of as one of the three non-biological elements that advance our evolution. But at this stage I need more feedback to bridge that knowledge gap.

Finally, I again thank the reviewers for the time they spent reviewing our work, and especially the one who appreciated my thinking and made me feel that I had company.

30 December 2017

1. Thanks to Mr. Dheeraj Rajagopal for his help in presenting SynTime at the ACL conference. I think Dheeraj must be a great speaker: he finished the presentation 4 minutes ahead of the allotted time and still made it go pretty well.

Review 3
Strong Points:	1) well written 2) interesting initial analysis of the data 3) Nice overview and discussion of the observations the authors made in Sections 3.2 4) Novel approach, inspired by POS tags, with a good performance
Weak Points:	1) Pure extraction, no normalization / parsing of the time expressions 2) Lack of runtime/speed evaluation
Detailed Review:	In this papers the authors present a dictionary approach for extracting time expressions from text. While the name suggests that it performs only simple lookups, such as days and months, there are actually rules involved as well. For instance, years are not defined as fixed tokens but regular expressions. Both can be added or removed from the TimeDict. By identifying the tokens of a time expressions through these rules and lookups they assign types to them. The novel idea here is to assign types similarly to POS taggers, which consider only syntax but no semantics. An example is 'ago', which is a typical 'suffix' in time expressions, whereas existing tools analyze this to identify it as a 'modifier' of the time expression. On the one hand, this makes the approach much more light-weight by requiring fewer rules, which makes it more flexibly extensible as the authors argue, however, on the other hand, the deeper knowledge might actually be needed to parse and normalize a time expression correctly, which the authors do not tackle in this paper. The idea is based on an interesting analysis that the authors performed prior to their work with findings like, time expressions are generally very short and the required vocabulary is rather small. As a starting point they always identify a 'time token' in text, which is one category of the types in the dictionary. From there they extend and merge with other types based on certain rules until they find the complete time expression. This is very well explained with good examples in the paper. However, based on their initial analysis 'only' 93% of the time expressions include such a time tokens. Hence, the approach is not applicable to other sorts of expressions and hence the recall cannot be improved further, although it is already pretty high. With a focus only on the extraction of time expressions the presented approach performs very well and outperforms existing tools in most of the cases. 
The evaluation was done on three datasets of which two are existing annotated datasets and one was created by authors, extracted from Twitter. Especially on the tweets dataset the approach performs exceptionally well with much higher scores than the competitors. However, there are two aspects that were not quite clear to me. The best performance was achieved by TimeDict-E, which is the extended versions with manually added terms specific for the dataset. This is understandable, however, in order to be fair in the evaluation, the machine learning approach UWTime that the authors compared against, should have been trained on the same dataset, which I think they did not? Also it is surprising that TimeDict-E only performs slightly better than TimeDict-S, which uses the same rules as SUTime, but with much better results. A short discussion on these results would be nice to add to the camera-ready version. A drawback of this dictionary approach was shown by one example in Section 4.2. The authors claim they cannot detect 'ten brutal months', since 'brutal' is too specific to be included in the dictionary. So I was wondering, as POS tags are anyways required and it is obvious that 'brutal' is an adjective here, why is this not ignored and considered as part of the expression? The authors argue that rule-based approaches may not detect informal expressions, however, this is a case where rules may be useful over pure lookups. Also, I would have been interested in the speed of the tool. As dictionary approaches tend to be fast, such a comparison with existing tools would have been a nice selling point. However, it is not clear how much the required POS tagging has an impact on the performance in comparison to the competitors since POS tagging is generally a slow task. Furthermore, I can imagine that a strength of this approach and its flexibility, may be that it can perform well on different languages by just using another dictionary. 
If the authors had checked that, it would have been a plus.
Originality:	3: (fair (some novel ideas))
Impact:	3: (good (will influence many researchers; likely to be highly cited))
Reproducibility:	4: (excellent (methods and data descriptions are crystal clear; code and/or data made available))
Overall evaluation:	3: (accept)

Review 2
Strong Points:	Detailed paper; Well written; Good insights
Weak Points:	No methods to improve coverage of temporal expressions; Technical contribution; Lack of discussion on alternative (obvious approaches) - CRFs?
Detailed Review:	This paper describes a dictionary based tagger for temporal expressions in text. The authors present a comprehensive and interesting insight into how temporal phrases are expressed. TimeDict describes 15 types of time tokens along with modifiers and numerals. In order to improve coverage of temporal expressions, the authors simply propose looking up training data and labeling unseen phrases in terms of the TimeDict types (Algorithm 1). This doesnt seem scalable and I am not sure I understand why the authors chose to highlight this method? If the purpose is to identify obscure temporal expressions wouldn't something like an IDF score be useful? The authors should clarify this aspect. The TimeDict tagger relies on POS tagging to disambiguate words (Eg "May" ) and define simple rules in terms of TimeDict types to extract Temporal expressions. The authors compare their work against three recent methods and demonstrate its effectiveness. Overall, a detailed well written paper. In some parts the authors spend too much time explaining fairly obvious details. I was curious to understand why sequence taggers wouldn't do a good job on this task and why the authors dont think about using one. Perhaps some discussion can be included on this aspect and maybe also discuss the kinds of temporal phrases that would be hard for the tagger to identify. Eg: " I reached when the sunset", or "I reached at sunset/moon-rise" or "I reached the minute the train left" have a sense of temporality, perhaps not in the explicit sense the authors are looking to address but these pose a challenging problem and have applications in NLU. An ML tagger is probably going to be more effective for such phrases?
Originality:	3: (fair (some novel ideas))
Impact:	2: (fair (typical paper, will have some follow-up work and citations))
Reproducibility:	3: (good (methods and data are described in sufficient detail to re-implement by others) [most acceptable papers should do this])
Overall evaluation:	3: (accept)