Seminar: Learning representations for semantic relational data


Title: Learning representations for semantic relational data

Speaker: Prof. Patrick Gallinari



Host: Sheng Gao, Pattern Recognition Laboratory

Abstract: Learning representations for complex relational data has emerged at the crossroads of several research topics in machine learning. The work is often motivated by the applications themselves and by the nature of the data, which are frequently complex (multimodal, heterogeneous, dynamic) and multi-relational (e.g., biology, social networks). One possible approach is to map these data onto one or more continuous latent spaces, yielding representations on which classical machine learning methods can operate. In recent years, several lines of research have developed these ideas, sometimes independently, and they are now represented in the "Learning Representations" community. The tools deployed rely on statistical modeling, on linear algebra with matrix or tensor factorization, or, more recently, on neural networks. The talk will briefly present some of these methods and show applications in semantic data analysis and social networks.
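As a toy illustration of the latent-space idea described above (our own sketch, not necessarily any method from the talk), a binary relation matrix can be factorized so that every entity receives a low-dimensional embedding on which classical methods can then operate:

```python
import numpy as np

def embed_relation(R, dim=2, lr=0.1, epochs=1000, seed=0):
    """Factorize a binary relation matrix R (n x m) as U @ V.T so that
    every row entity and every column entity gets a dim-dimensional
    embedding. Plain full-batch gradient descent on squared loss with
    a small L2 regularizer; all hyperparameters here are arbitrary."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, dim))
    V = 0.1 * rng.standard_normal((m, dim))
    for _ in range(epochs):
        E = R - U @ V.T                    # reconstruction residual
        U += lr * (E @ V - 0.01 * U)       # gradient step for row embeddings
        V += lr * (E.T @ U - 0.01 * V)     # gradient step for column embeddings
    return U, V
```

Entities that participate in the same relations end up close in the embedding space, which is what makes downstream clustering or classification possible.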

Speaker bio: Patrick Gallinari is a professor of Computer Science at Université Pierre et Marie Curie (UPMC), France. His research domain is primarily statistical machine learning, with applications to domains involving semantic data, such as information retrieval. His recent work has focused on statistical modeling of complex relational data described by sequences, trees, or graphs. Before that, he was a pioneer of neural networks in France, participating in the development of this field in Europe. He was also director of the computer science lab at UPMC for about 10 years.


Seminar: Subspace Clustering – Recent Advances

Title: Subspace Clustering – Recent Advances
Speaker: Dr. Zhouchen Lin (林宙辰), Professor, Peking University
Venue: Conference Room 810, Teaching Building 3

Abstract: We are now in the big-data era, where data are usually high dimensional. How to process high-dimensional data effectively is a critical issue. Fortunately, data typically lie near low-dimensional manifolds. A mixture of subspaces is a simple yet effective model for high-dimensional data, where the membership of the data points in the subspaces may be unknown. There is therefore a need to simultaneously cluster the data into multiple subspaces and find a low-dimensional subspace fitting each group of points. This problem, known as subspace clustering, has found numerous applications. In this talk, I will present my recent work on this problem.
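One widely used family of subspace clustering methods builds a self-representation of the data and then applies spectral clustering to the resulting affinity. The sketch below uses a Least Squares Regression (LSR) style self-representation followed by a hand-rolled spectral step; it is a minimal illustration, not necessarily the method covered in the talk, and the regularization value is an arbitrary assumption.

```python
import numpy as np

def lsr_subspace_clustering(X, k, lam=1e-2, n_iter=50):
    """Cluster the columns of X (d x n) into k subspaces.

    Self-representation step (Least Squares Regression):
        min_Z ||X - X Z||_F^2 + lam ||Z||_F^2
    has the closed form Z = (X^T X + lam I)^{-1} X^T X.
    Spectral clustering is then run on the affinity |Z| + |Z|^T.
    """
    n = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    W = np.abs(Z) + np.abs(Z).T                    # symmetric affinity
    d = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(d, d))    # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                                # spectral embedding
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # tiny k-means with deterministic farthest-point initialization
    centers = U[[0]].copy()
    for _ in range(1, k):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers = np.vstack([centers, U[np.argmax(d2)]])
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

Points drawn from the same subspace reconstruct each other well, so the affinity is approximately block diagonal and spectral clustering recovers the groups.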

Zhouchen Lin received the Ph.D. degree in applied mathematics from Peking University in 2000. He is currently a Professor at the Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University. He is also a Chair Professor at Northeast Normal University and a guest professor at Beijing Jiaotong University. Before March 2012, he was a Lead Researcher in the Visual Computing Group at Microsoft Research Asia. He has been a guest professor at Shanghai Jiaotong University and Southeast University, and a guest researcher at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include computer vision, image processing, computer graphics, machine learning, pattern recognition, and numerical computation and optimization. He is an associate editor of the International Journal of Computer Vision and a Senior Member of the IEEE. He served as an area chair for CVPR 2014.

Call for Papers: Representation Learning Workshop (RL 2014) Joint With ECML/PKDD 2014


2014-03-27: The web page for the workshop is now online.

Important dates

  • Submission deadline:
    June 20, 2014
  • Acceptance notification:
    July 11, 2014
  • Final paper submission:
    July 25, 2014
  • Workshop date:
    September 15, 2014


Representation learning has developed at the crossroads of different disciplines and application domains. It has recently enjoyed enormous success in learning useful representations of data from application areas such as vision, speech, audio, and natural language processing. It has become a research field in itself, with several successful workshops at major machine learning conferences, sessions at the main machine learning conferences (e.g., three sessions on deep learning at ICML 2013, plus related sessions on topics such as tensors and compressed sensing), and the recent ICLR (International Conference on Learning Representations), whose first edition was held in 2013.

We take a broad view of this field and want to attract researchers concerned with statistical learning of representations, including matrix- and tensor-based latent factor models, probabilistic latent models, metric learning, graphical models, and recent techniques such as deep learning, feature learning, compositional models, and non-linear structured prediction models. The focus of this workshop is the application of representation learning approaches, including deep learning, feature learning, metric learning, algebraic and probabilistic latent models, dictionary learning, and other compositional models, to problems in real-world data mining. Papers on new models and learning algorithms that combine aspects of the two fields of representation learning and data mining are especially welcome. This one-day workshop will include a mixture of invited talks and contributed presentations covering a broad range of subjects pertinent to the workshop theme. Besides classical paper presentations, the call also includes demonstrations of applications on these topics. We believe this workshop will accelerate the process of identifying the power of representation learning operating on semantic data.

Topics of Interest

A non-exhaustive list of relevant topics:
– unsupervised representation learning and its applications
– supervised representation learning and its applications
– metric learning and kernel learning and their applications
– hierarchical models for data mining
– optimization for representation learning
– other related applications based on representation learning.

We also encourage submissions which relate research results from other areas to the workshop topics.

Workshop Organizers

  • To Be Announced.

Program Committee

  • Thierry Artières, Université Pierre et Marie Curie, France
  • Samy Bengio, Google, USA
  • Yoshua Bengio, University of Montreal, Canada
  • Antoine Bordes, Facebook NY, USA
  • Léon Bottou, MSR NY, USA
  • Joachim Buhmann, ETH Zurich, Switzerland
  • Zheng Chen, Microsoft, China
  • Ronan Collobert, IDIAP, Switzerland
  • Patrick Fan, Virginia Tech, USA
  • Patrick Gallinari, Université Pierre et Marie Curie, France
  • Huiji Gao, Arizona State University, USA
  • Marco Gori, University of Siena, Italy
  • Sheng Gao, Beijing University of Posts and Telecommunications, China
  • Jun He, Renmin University, China
  • Stefanos Kollias, NTUA, Greece
  • Hugo Larochelle, University of Sherbrooke, Canada
  • Zhanyu Ma, Beijing University of Posts and Telecommunications, China
  • Yann LeCun, NYU Courant Institute and Facebook, USA
  • Nicolas Le Roux, Criteo, France
  • Dou Shen, Baidu, China
  • Alessandro Sperduti, University of Padova, Italy
  • Shengrui Wang, University of Sherbrooke, Canada
  • Jason Weston, Google NY, USA
  • Jun Yan, Microsoft, China
  • Gui-Rong Xue, Alibaba, China
  • Shuicheng Yan, National University of Singapore, Singapore
  • Kai Yu, Baidu, China
  • Benyu Zhang, Google, USA

Submission of Papers

We invite two types of submissions for this workshop:

  • Paper submission

We welcome submissions of unpublished research results. Papers should be between 8 and 12 pages, though additional material can be put in a supplement. Papers should be typeset in the standard ECML/PKDD format; submissions do not need to be anonymized. All submissions will be peer reviewed and evaluated on the basis of their technical content. Template files can be downloaded from the LNCS site.

  • Demo submission

A one-page description of the demonstration, in free format, is required.

We recommend following the format guidelines of ECML/PKDD (Springer LNCS), as this will be the required format for accepted papers.



The Temporal Summarization track at TREC 2013 aims to extract update information about a given event from a time-ordered document stream, specifically the most relevant sentence-level updates and core attribute values, such as the casualty count of a shooting or the current location of a hurricane. The track is new at TREC 2013 and has two tasks. Task 1, Sequential Update Summarization, requires finding update information (sentences) relevant to a given event. Task 2, Value Tracking, requires estimating the values of given attributes of the event.
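Task 1 can be illustrated with a minimal novelty-based selector (a hypothetical sketch, not any participant's actual system): a sentence from the stream is emitted as an update only if it is sufficiently dissimilar from every update already emitted.

```python
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_updates(stream, threshold=0.5):
    """stream: iterable of (timestamp, sentence) pairs in time order.
    Emits the sentences whose maximum cosine similarity to every
    previously emitted update stays below the threshold."""
    emitted_vecs, updates = [], []
    for ts, sentence in stream:
        vec = Counter(sentence.lower().split())
        if all(_cosine(vec, e) < threshold for e in emitted_vecs):
            emitted_vecs.append(vec)
            updates.append((ts, sentence))
    return updates
```

The threshold trades off the gain metrics against comprehensiveness: a lower threshold emits fewer, more novel updates.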








This was the first running of the TREC Temporal Summarization track. Seven teams participated, submitting 27 runs for Task 1 and 7 runs for Task 2. The participating institutions were:

–    The Johns Hopkins University

–    University of Waterloo

–    Human Language Technology Center of Excellence

–    University of Guelph


–    Beijing University of Posts and Telecommunications

–    Beijing University of Technology





For the two tasks of this TREC Temporal Summarization evaluation, we adopted the following approaches:




1. Seven teams submitted 27 runs for Task 1. The table below evaluates the runs on four metrics: Expected Gain, Expected Latency Gain, Comprehensiveness, and Latency Comprehensiveness (sorted by E[Latency Gain]). All five of our runs ranked in the top ten, and one ranked first. (The first two metrics are inversely correlated with the last two, so a run cannot be optimal on all four at once.)


Table 1. Main Task 1 metrics, sorted by E[Latency Gain]



Table 2. F-measure          Table 3. F-measure (with latency penalty)

2. Four teams submitted 7 runs for Task 2. The table below lists the expected value-tracking error for four attributes: location, deaths, injuries, and financial impact. One of our runs achieved the best result on the financial impact attribute.


Table 4. Task 2 value-tracking error by attribute


Task 1: Meng Fanyu, Wu Tong











Task 2: Li Hongyan, Xu Lixin




Summary of the TREC 2013 Contextual Suggestion Track


TREC is an annual international open evaluation of text retrieval organized by the U.S. National Institute of Standards and Technology (NIST). It aims to advance information retrieval research through evaluations of key retrieval technologies on large-scale data, and it has become the most influential evaluation in the field, reflecting its most advanced research and latest breakthroughs. The Contextual Suggestion track focuses on search techniques for complex information needs that depend heavily on context and user interests. The main change in the 2013 track is that the context no longer includes factors such as time and season, only location. The input consists of contexts and profiles: each context corresponds to a location (a city), and each profile corresponds to a single user, characterized by the user's preferences over a set of example suggestions. For each profile-context pair, systems must generate a ranked list of up to 50 suggestions, covering attractions, restaurants, hotels, and so on in that city, each including the venue's url, title, and description.






This year's TREC Contextual Suggestion track attracted 20 teams from universities and institutes worldwide, which submitted 34 runs in total. The participants were:

–    University of North Carolina

–    York University

–    University of Delaware

–    University of Pittsburgh

–    University of Waterloo

–    Georgetown University

–    University of Glasgow

–    Central Institute for Research on Goats

–    Information Sciences Institute

–    University of Amsterdam


–    Beijing University of Posts and Telecommunications



Evaluation proceedings: http://trec.nist.gov/pubs/trec21/



For the requirements of this TREC Contextual Suggestion track, our work comprised three parts: 1) crawling attractions in the 50 given cities, then cleaning the data and generating descriptions; 2) building a user model from the user's ratings of the example attractions and from the attractions' descriptions; 3) predicting the user's ratings for the generated candidate attractions.
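Steps 2 and 3 can be sketched as a simple content-based recommender: build a user profile vector from the descriptions of positively and negatively rated example venues, then score candidate venues by cosine similarity. The representation and weighting below are illustrative assumptions, not our actual evaluation system.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for a venue description."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_profile(rated_examples):
    """rated_examples: list of (description, rating) with ratings in [-1, 1].
    Positively rated venues pull the profile toward their terms;
    negatively rated venues push those terms out."""
    profile = Counter()
    for text, rating in rated_examples:
        for term, tf in tf_vector(text).items():
            profile[term] += rating * tf
    return Counter({t: w for t, w in profile.items() if w > 0})

def rank_candidates(profile, candidates):
    """candidates: list of (venue_id, description); returns ids best-first."""
    scored = [(cosine(profile, tf_vector(d)), vid) for vid, d in candidates]
    return [vid for s, vid in sorted(scored, reverse=True)]
```

The ranked list of venue ids can then be truncated to the 50 suggestions the track requires.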




Among the 27 runs submitted by the 20 participating teams, our scores on all three metrics were around 6th place. Ranked by team, our best results were rank 5 on P@5, rank 6 on TBG, and rank 4 on MRR.

Participants: Zhang Dai, Feng Shan, Fang Zhou, Hou Chengwen


Summary of the TREC 2013 Microblog Track



This year the Microblog track had only one task, real-time ad hoc search. Given a query issued at a specific time, systems retrieve from the Twitter corpus a list of tweets ranked by relevance and assign a score to each. Over the relatively stable collection, the query is expanded for the topic at the given time, and the result returns the top 1,000 highly relevant, informative tweets in reverse chronological order.






This year's TREC Microblog track attracted 20 universities and research institutes, which submitted 71 runs in total. The participants were:

–    Albalqa’ Applied University

–    The University of Michigan

–    Indian Statistical Institute, Kolkata, India

–    Institut de Recherche en Informatique de Toulouse

–    Kobe University

–    Universidade Nova de Lisboa

–    Qatar Computing Research Institute

–    Qatar University

–    InfoLab at University of Delaware

–    The University of Glasgow

–    University of Amsterdam

–    Web Information Systems group, TU Delft

–    Peking University

–    Institute of Computing Technology, Chinese Academy of Sciences

–    Wuhan University

–    Beijing Jiaotong University

–    Beijing University of Posts and Telecommunications



Evaluation forum: http://groups.google.com/group/trec-microblog



For this year's TREC Microblog task, our work focused on two aspects: 1) query expansion of the microblog topics; and 2) relevance-based re-ranking of the returned tweets. For query expansion we used two methods, WAF and TF-IDF-based relevance feedback. For re-ranking we used a linear weighted combination of multiple features, jointly considering topic terms, expansion terms, and URLs contained in the tweets to rank tweet relevance.
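The two steps can be sketched as follows; the TF-IDF feedback scoring and the feature names and weights are illustrative assumptions, not the tuned configuration from our runs.

```python
import math
from collections import Counter

def expand_query(query_terms, feedback_docs, all_docs, k=5):
    """Pick k expansion terms from pseudo-relevant feedback_docs
    (token lists), scoring each term by its frequency in the feedback
    set times its inverse document frequency over all_docs."""
    n = len(all_docs)
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))            # document frequency per term
    tf = Counter()
    for doc in feedback_docs:
        tf.update(doc)                 # term frequency in the feedback set
    scores = {t: tf[t] * math.log(n / df[t])
              for t in tf if t not in query_terms}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return list(query_terms) + top

def rerank(tweets, weights):
    """tweets: dicts of per-tweet feature values plus an "id" key;
    weights: feature name -> weight. Returns ids by descending score."""
    def score(t):
        return sum(w * t.get(f, 0.0) for f, w in weights.items())
    return [t["id"] for t in sorted(tweets, key=score, reverse=True)]
```

In a real run, features such as topic-term match, expansion-term match, and URL presence would be computed per tweet and their weights tuned on held-out topics.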





Participants: Zhu Siming, Gao Zhe, Wang Hui, Yuan Yajing

Summary of the TAC KBP 2013 Evaluation


The goal of the KBP evaluation is to promote research on automated systems that discover information about named entities in large corpora and integrate that information into a knowledge base. The 2013 KBP evaluation comprised three subtasks, all aimed at improving the ability to automatically populate knowledge bases from text: Entity Linking (EL), Slot Filling (SF), and Sentiment Slot Filling (SSF). Our PRIS lab participated in all three.






Entity Linking task: Zuo Naiche, Wang Jianlong

Slot Filling task: Zhang Yichang, Wang Ying

Sentiment Slot Filling task: Tong Xin, Li Dongyu






4.1 Entity Linking


Entity Linking: given a query containing a name string (a person, location, or organization name), a background document ID, and UTF-8 offsets marking the start of the string, the system must output the ID of the KB entry (Wikipedia page) that the name refers to, or a "NILxxxx" ID if there is none.
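The output convention can be illustrated with a toy linker (the `kb_index` structure, the overlap scoring, and the threshold are all hypothetical, not the track's or our system's design): score candidate KB entries against the background document and fall back to a fresh NILxxxx ID when nothing matches.

```python
import itertools

_nil_ids = itertools.count(1)   # generator for fresh NIL identifiers

def link_entity(name, doc_tokens, kb_index, threshold=0.1):
    """kb_index: dict mapping a name string to a list of
    (kb_id, context_tokens) candidates (a hypothetical structure).
    Scores each candidate by token overlap between the background
    document and the candidate's context; returns the best KB id,
    or a fresh NILxxxx id when nothing clears the threshold."""
    doc = set(doc_tokens)
    best_id, best_score = None, threshold
    for kb_id, context in kb_index.get(name, []):
        ctx = set(context)
        score = len(doc & ctx) / max(len(ctx), 1)   # overlap ratio
        if score > best_score:
            best_id, best_score = kb_id, score
    if best_id is None:
        return "NIL%04d" % next(_nil_ids)
    return best_id
```

Real systems use far richer candidate generation and disambiguation features, but the NIL fallback shown here is the required behavior for names absent from the KB.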





4.2 Slot Filling


Given a named entity and a predefined set of attributes, the task is to augment a KB node's attribute values by extracting the relevant values from text. The reference KB is derived from English Wikipedia. The diagnostic Slot Filler Validation task judges whether participants' systems filled the slots correctly.





4.3 Sentiment Slot Filling


Sentiment Slot Filling is a sentiment-element extraction task. As a subproblem of information extraction, it focuses on finding, in a large document collection, the targets corresponding to the holder given in the query, or vice versa. For example, find all named entities that dislike Obama, or conversely, find all entities about which Obama has expressed a positive opinion. Entities are restricted to three types: person, location, and organization names.









ARPANI: Bhilai Institute of Technology, Durg: English Slot Filling

basistech: Basis Technology: Chinese Entity Linking, English Entity Linking, Spanish Entity Linking

BIT: Beijing Institute of Technology: English Slot Filling, Slot Filler Validation

Brandeis: Brandeis University: English Entity Linking

BUPTTeam: Beijing University of Posts and Telecommunications: English Entity Linking

CASIA: Institute of Automation, Chinese Academy of Sciences: English Entity Linking

CMUML: Carnegie Mellon University: English Slot Filling, Temporal Slot Filling

CohenCMU: Carnegie Mellon University: English Slot Filling

columbia_nlp: Columbia University: Sentiment Slot Filling

Compreno: ABBYY: English Slot Filling, Temporal Slot Filling

cornpittmich: Cornell University / University of Pittsburgh: Sentiment Slot Filling

FRDC: Fujitsu R&D Center Co., Ltd.: Chinese Entity Linking

HITS: Heidelberg Institute for Theoretical Studies gGmbH: Chinese Entity Linking, English Entity Linking, Spanish Entity Linking

hltcoe: Johns Hopkins University Human Language Technology Center of Excellence: Cold Start KBP, English Entity Linking

IIRG: University College Dublin: English Slot Filling

INESCID: Instituto Superior Tecnico, INESC-ID: English Entity Linking, Spanish Entity Linking

jhuapl: Johns Hopkins University Applied Physics Laboratory: Slot Filler Validation

lkd: University of Economics, Prague: English Entity Linking

lsv: Saarland University: English Slot Filling

MS_MLI: Microsoft Research: English Entity Linking, Temporal Slot Filling

MSRA_Link: Microsoft Research Asia: English Entity Linking

NYU: New York University: Cold Start KBP, English Slot Filling

OSU: Oregon State University: English Entity Linking

polymtl: Ecole Polytechnique de Montreal: English Entity Linking

poseidon: Harbin Institute of Technology: English Entity Linking

PRIS2013: Beijing University of Posts and Telecommunications: English Entity Linking, English Slot Filling, Sentiment Slot Filling

RPI_BLENDER: Rensselaer Polytechnic Institute: English Entity Linking, English Slot Filling, Slot Filler Validation, Temporal Slot Filling

SAFT_KRes: University of Southern California Information Sciences Institute: English Slot Filling

SINDI: Korea Institute of Science and Technology Information: English Slot Filling

Stanford: Stanford University: English Slot Filling, Slot Filler Validation

SYDNEY_CMCRC: University of Sydney / Capital Markets CMCRC: English Entity Linking

TALP_UPC: TALP Research Center of Technical University of Catalonia (UPC): English Entity Linking, English Slot Filling

THUNLP: Tsinghua University: Chinese Entity Linking, English Entity Linking

TRRD: Thomson Reuters R&D: English Entity Linking

UBC: University of Basque Country: English Entity Linking

UGENT_IBCN: Ghent University – IBCN / iMinds: English Entity Linking

UI_CCG: University of Illinois at Urbana Champaign: English Entity Linking, Slot Filler Validation

UMass_CIIR: CIIR, School of Computer Science, Univ. of Massachusetts Amherst: English Entity Linking

UMass_IESL: University of Massachusetts Amherst, Information Extraction and Synthesis Lab: Cold Start KBP, English Slot Filling

UNED: Universidad Nacional de Educacion a Distancia: English Slot Filling, Temporal Slot Filling

USFD: University of Sheffield: English Entity Linking

utaustin: University of Texas at Austin – AI Lab: English Slot Filling

UWashington: University of Washington: English Entity Linking, English Slot Filling

WebSail: Northwestern University: English Entity Linking

ZZ_INFO_TECH: Zhengzhou Information Science and Technology Institute: English Entity Linking


Summary of the TREC 2013 KBA Evaluation


The 2013 KBA evaluation comprised three subtasks: CCR (Cumulative Citation Recommendation), SSF (Streaming Slot Filling), and TS (Temporal Summarization). The organizers provided 170 entities (150 from Wikipedia and 20 from Twitter), a 4.5 TB raw document collection, and 7,074 annotated documents.









This year's TREC KBA evaluation attracted 13 universities and institutes, which submitted 117 runs in total. The participants were:

–    University of Illinois

–    University of Amsterdam

–    University of Avignon

–    Aix-Marseille University

–    University of Massachusetts

–    University of Delaware

–    CWI the Netherlands

–    University of Wisconsin

–    University of Florida

–    Southern Cross University

–    RetrieWin

–    Beijing Institute of Technology

–    Beijing University of Posts and Telecommunications



Evaluation forum: https://groups.google.com/forum/#!forum/trec-kba


4.1 CCR









Participants: Zhang Weitai, Yang Jing


4.2 SSF










Participants: Ji Jianshu, Zhang Dai

Notice: Prof. Eduard Hovy ("111" Project) to Visit the PRIS Lab

Prof. Eduard Hovy of Carnegie Mellon University will visit our lab for discussion the day after tomorrow, Wednesday, December 11, 2013, from 14:30 to 17:00. Prof. Hovy is an adjunct professor engaged under the 111 Project led by Prof. Guo; he specializes in text data processing, which is closely related to our work. Everyone is encouraged to attend and join the discussion.


14:30–15:30 Prof. Eduard Hovy presents his research at CMU.

15:30–15:50 Break

15:50–17:00 Presentation of the PRIS lab's work.



Professor Eduard Hovy works at the Language Technologies Institute of Carnegie Mellon University. He is Co-Director for Research of the Command, Control, and Interoperability Center for Advanced Data Analysis (CCICADA). He previously worked at the University of Southern California as a Fellow of its Information Sciences Institute (ISI), Director of the Human Language Group, and Research Associate Professor in USC's Computer Science Department, and as Director of Research for ISI's Digital Government Research Center (DGRC).

His research focuses on several topics around aspects of the computational semantics of human language, including text analysis, text summarization and generation, question answering, discourse and dialogue processing, ontologies, annotation, machine translation evaluation, and digital government.
