Is there any easy way to search file with auto-detected charset [message #1729419] |
Thu, 14 April 2016 05:07 |
Zen Huang Messages: 1 Registered: April 2016 |
Junior Member |
Hi,
I am working on an RCP project in which I need to search files that may have different charsets (e.g. UTF-8, GBK, etc.). These files are in an Eclipse project, so they are Eclipse resources. They are downloaded from the Internet, so their charsets are not known in advance.
I use TextSearchQueryProvider, FileTextSearchScope, FileSearchQuery and some other Eclipse internal classes to build the query, which is finally run through the NewSearchUI.runQueryInBackground method.
It works fine except for the charset problem. For instance, file A contains some Chinese words and its charset should be GBK. If the charset of file A hasn't been set correctly in Eclipse (it can be set in the resource's Properties), then file A does not show up in the search results.
Because the charsets vary from file to file, I cannot simply set them all to a single charset such as UTF-8.
I have found two possible solutions to this problem:
1. Use the IFile.setCharset method to set each file's charset programmatically to an auto-detected value. Once the charset is set correctly, the query works as expected.
2. Implement my own FileSearchQuery and related classes, and pass the correct charset when building the InputStreamReader, so that the query job reads the correct file contents and returns the correct results.
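For the first solution, here is a minimal sketch of the kind of detection I have in mind. It is only an assumption about how the charset could be guessed, not an Eclipse API: the class CharsetGuesser and its detect method are hypothetical names, and the approach (try each candidate with a strict decoder and take the first one that decodes without error) is a simple heuristic that can guess wrong. It also assumes the JVM ships the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

// Hypothetical helper, not part of Eclipse: guesses a charset by trial decoding.
final class CharsetGuesser {

    private CharsetGuesser() {
    }

    /**
     * Returns the first candidate charset that decodes the given bytes
     * without any malformed or unmappable input, or empty if none does.
     * Candidates are tried in the order given, so stricter encodings
     * (such as UTF-8) should come first.
     */
    static Optional<Charset> detect(byte[] bytes, String... candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this JVM does not provide
            }
            Charset charset = Charset.forName(name);
            try {
                charset.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes));
                return Optional.of(charset);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

The bytes would come from IFile.getContents(), and the result would then be applied with something like file.setCharset(charset.name(), monitor) before running the query. The order of candidates matters: UTF-8 is listed first because it rejects most non-UTF-8 byte sequences, while a multi-byte charset like GBK accepts many inputs that were not actually written in it.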
However, both solutions are cumbersome, especially the second one: I would have to implement my own FileCharSequenceProvider and many other classes to get the correct file contents.
So my question is: is there an easy way to search files with auto-detected charsets?
Thank you for reading this post.