Generated Xtext FAQ (Please help evaluating automatically extracted FAQs)
Generated Xtext FAQ [message #654093] Mon, 14 February 2011 08:05
Stefan Henss
Hi everybody,

I'd like to draw your attention to a research project I am doing for my bachelor thesis.

It is about automatically creating FAQs from forum and mailing list topics/threads.
The challenge is that the software is assumed to know only the posts of each topic;
further information, such as the category, is not used in any way!

The procedure of the program is:
- Crawl tons of topics and their entries from the whole forum.
- Group (cluster) related topics based on the texts only.
- For each cluster (= one FAQ) select the best questions and a title.
- For each question try to find the best reply as the answer.
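
The steps above can be illustrated with a minimal Python sketch. It is not the thesis software: the crawl step is left out, a greedy bag-of-words cosine clustering stands in for the real clustering model, and all names (`tokenize`, `cluster_topics`, `best_reply`, the `threshold` value) are hypothetical. A topic is assumed to be a list of post texts whose first entry is the question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_topics(topics, threshold=0.3):
    """Greedy stand-in for the clustering step, based on the texts only.

    Each topic joins the first cluster whose seed topic is similar enough,
    otherwise it starts a new cluster. Returns lists of topic indices.
    """
    clusters = []  # pairs of (seed vector, member indices)
    for index, posts in enumerate(topics):
        vector = Counter(tokenize(" ".join(posts)))
        for seed, members in clusters:
            if cosine(vector, seed) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


def best_reply(posts):
    """Pick the reply that shares the most vocabulary with the question."""
    question = Counter(tokenize(posts[0]))
    replies = posts[1:]
    if not replies:
        return None
    return max(replies, key=lambda r: cosine(question, Counter(tokenize(r))))


topics = [
    ["How do I write a value converter for my grammar?",
     "Register the value converter in your runtime module.",
     "Thanks"],
    ["Value converter for my grammar rule is not called",
     "Check the value converter binding."],
    ["Scoping does not resolve cross references",
     "Implement a scope provider for cross references."],
]
clusters = cluster_topics(topics)
answer = best_reply(topics[0])
```

With this toy data the two value converter topics end up in one cluster and the scoping topic in another. Vocabulary overlap is a deliberately crude answer heuristic; it only shows where such a selection step would sit.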

As all of this is done without human interaction, it's hard to always get good results.
So I'd like to ask for your help in evaluating the FAQ's quality and relevance
(I'm not an expert on Xtext :).


Direct link to the Xtext FAQ: http://faqcluster.com/xtext-grammar

(There are some other interesting FAQs as well at http://faqcluster.com/)


Thanks for your help

Stefan
Re: Generated Xtext FAQ [message #654121 is a reply to message #654093] Mon, 14 February 2011 09:32
Sebastian Zarnekow
Hi Stefan,

thanks for the info. I'm really excited about your research project.
Unfortunately, the compiled FAQ does not match my personal impression
of the hot topics in this newsgroup. Furthermore, I think the page is
hard to read due to inappropriate fonts.
Have you considered using indicators such as synonyms for "Thank you for
your helpful answer", "This question came up a couple of times in this
newsgroup", or "Did you search this newsgroup before you asked", as well
as "Have a look at the FAQ / this blog entry / whatever web page"?
People who felt the urge to write a blog post about a topic identified
it as a missing piece, so chances are good that it addresses a
FAQ. I'd expect terms like "value converter", "scoping", "exported
objects", "serializer", or "generator" to come up in the FAQ.
Let me know if you need more input for your research.
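
To make the indicator idea concrete, here is a small Python sketch of how such phrases could be detected in a post. The patterns and the names `INDICATORS` and `indicator_hits` are illustrative assumptions only; a real system would need far more synonyms than these three hand-written patterns.

```python
import re

# Hypothetical indicator patterns, loosely based on the phrases quoted above.
INDICATORS = {
    "thanks": re.compile(r"\bthank(s| you)\b", re.IGNORECASE),
    "repeated_question": re.compile(
        r"\b(came up|asked) (a couple of times|before|already)\b", re.IGNORECASE
    ),
    "pointer": re.compile(r"\bhave a look at\b|\bsee the faq\b", re.IGNORECASE),
}


def indicator_hits(post):
    """Return the names of all indicator patterns found in a post."""
    return {name for name, pattern in INDICATORS.items() if pattern.search(post)}


hits_thanks = indicator_hits("Thank you for your helpful answer")
hits_repeat = indicator_hits("This question came up a couple of times in this newsgroup")
hits_pointer = indicator_hits("Have a look at the FAQ")
hits_none = indicator_hits("How do I define a terminal rule?")
```

A "thanks" hit following a reply would suggest that the reply was a good answer, while "repeated_question" and "pointer" hits would suggest that the topic itself is a frequently asked one.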

Regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com

Re: Generated Xtext FAQ [message #655679 is a reply to message #654121] Tue, 22 February 2011 09:42
Stefan Henss
Hi Sebastian,

Thanks for your reply.

Unfortunately, I failed to point out the main focus of the thesis. I agree that you could add lots of indicators of good questions, answers, etc. But the idea is to evaluate how one specific model/algorithm (latent Dirichlet allocation) performs at extracting FAQs. So selecting questions, answers, etc. should require barely any more data, rules, or algorithms than those provided by the categorization/clustering approach.
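
For readers unfamiliar with latent Dirichlet allocation, the following Python sketch shows one common way to fit it, a collapsed Gibbs sampler. It is only an illustration under assumptions: the thread does not say which inference method or parameters the thesis uses, and the function name `lda_gibbs` and the defaults for `alpha`, `beta`, and `iterations` are made up for this example. Each document is assumed to be a list of tokens.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model with collapsed Gibbs sampling.

    Returns (theta, topic_word): theta[d][k] is the estimated share of
    topic k in document d, topic_word[k] counts the words assigned to k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for word in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(zs)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


docs = [
    ["value", "converter", "grammar", "rule"],
    ["value", "converter", "terminal", "rule"],
    ["scoping", "cross", "reference", "scope"],
    ["scope", "provider", "cross", "reference"],
]
theta, topic_word = lda_gibbs(docs, num_topics=2, iterations=50)
```

Under this reading, a cluster (= one FAQ) could be formed from the documents that share the same dominant topic in `theta`, and the most frequent words in each `topic_word` counter would play the role of the FAQ's keywords.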

But of course, the evaluation/conclusion should point out further approaches to improving quality, and looking for certain phrases seems promising.

Also, the keywords you gave are really helpful. The FAQs are basically defined by a set of keywords (which is obtained automatically). Here are the most important words I obtained for Xtext: http://faqcluster.com/xtext-keywords.txt ... Some of them seem OK, others are too general.

I will filter out some of the less relevant keywords and see what the results will be afterwards.
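
One possible way to do such filtering is sketched below in Python, purely as an assumption: a keyword is treated as "too general" if it occurs in more than a given share of all documents. The function name `filter_general_keywords` and the 50% cutoff are hypothetical and not taken from the actual software.

```python
def filter_general_keywords(keywords, docs, max_doc_share=0.5):
    """Keep only keywords that occur in at most max_doc_share of the documents."""
    doc_sets = [set(doc) for doc in docs]
    if not doc_sets:
        return list(keywords)
    kept = []
    for keyword in keywords:
        share = sum(keyword in words for words in doc_sets) / len(doc_sets)
        if share <= max_doc_share:
            kept.append(keyword)
    return kept


docs = [
    ["grammar", "problem", "rule"],
    ["scoping", "problem"],
    ["problem", "serializer"],
    ["grammar", "question"],
]
kept = filter_general_keywords(["grammar", "problem", "scoping"], docs)
```

Here "problem" appears in three of four documents and is dropped, while "grammar" and "scoping" are kept.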


Kind regards,

Stefan
Re: Generated Xtext FAQ [message #655716 is a reply to message #655679] Tue, 22 February 2011 12:18
Henrik Lindberg
Just a thought: wouldn't it be a good idea to use the available documentation
to establish or augment an ontology?
- henrik
