TIKA
Tika Download
- server.jar
http://tika.apache.org/download.html
java -jar tika-server-1.17.jar
Download the server build; it requires a Java runtime.
Note:
Java 9 does not include the xml.bind package by default, which the server needs, so this has to be resolved separately; Java 8 works without issue.
- docker
docker pull logicalspark/docker-tikaserver # only on initial download/update
docker run --rm -p 9998:9998 logicalspark/docker-tikaserver
- app.jar
The app jar also has a server mode, but it does not speak HTTP, so it cannot be debugged with curl.
- maven
Docker Server
- Test the server
curl -X GET http://localhost:9998/tika
- Get metadata
curl -T test.pdf http://localhost:9998/meta --header "Accept: application/json"
- Get document content
curl -X PUT --data-binary @test.pdf http://localhost:9998/tika --header "Content-Type: application/pdf"
curl -T test.pdf http://localhost:9998/tika --header "Accept: text/html" # returns HTML with tags; the header may be omitted
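The metadata call above can also be made from Go. The following is a minimal sketch, not a definitive client: the names tikaBase, newMetaRequest, parseMeta and fetchMeta are made up for illustration, and the base address simply assumes the port published by the docker run command. Values are decoded into interface{} on the assumption that a metadata key may hold either a single string or a list of strings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// tikaBase is an assumed address (the port published by the docker run command).
const tikaBase = "http://localhost:9998"

// newMetaRequest builds the Go equivalent of:
//   curl -T <file> http://localhost:9998/meta --header "Accept: application/json"
func newMetaRequest(base string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, base+"/meta", body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// parseMeta decodes a JSON object into a generic map, so that both
// string values and list values are accepted.
func parseMeta(data []byte) (map[string]interface{}, error) {
	var meta map[string]interface{}
	if err := json.Unmarshal(data, &meta); err != nil {
		return nil, err
	}
	return meta, nil
}

// fetchMeta uploads the file at path and returns the decoded metadata.
func fetchMeta(base, path string) (map[string]interface{}, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	req, err := newMetaRequest(base, f)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("server returned %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseMeta(data)
}

func main() {
	// Without a file argument, only print usage; nothing is sent.
	if len(os.Args) < 2 {
		fmt.Println("usage: meta <file>")
		return
	}
	meta, err := fetchMeta(tikaBase, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for k, v := range meta {
		fmt.Printf("%s: %v\n", k, v)
	}
}
```

Building the request and parsing the response are kept in separate functions so that each can be checked without a running server.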
Go Implementation
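import (
	"io"
	"net/http"
	"os"
)

var tikaServerUrl = "http://localhost:9998/"

// putRequest uploads the file at filename to url with an HTTP PUT
// and returns the response body as a string.
func putRequest(url, filename string) (string, error) {
	file, err := os.Open(filename)
	if err != nil {
		return "", err
	}
	defer file.Close()
	req, err := http.NewRequest(http.MethodPut, url, file)
	if err != nil {
		return "", err
	}
	client := &http.Client{}
	response, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer response.Body.Close()
	b, err := io.ReadAll(response.Body) // io.ReadAll needs Go 1.16+
	if err != nil {
		return "", err
	}
	return string(b), nil
}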
Documentation
https://wiki.apache.org/tika/TikaJAXRS