Microsoft open-sourced a Python tool for converting files and office documents to Markdown (github.com)
Posted by ☆ Yσɠƚԋσʂ ☆@lemmy.ml to Open Source@lemmy.ml · English · 12 days ago · 23 comments
django@discuss.tchncs.de · 11 days ago
There is nothing special going on. This whole project is just a bunch of Python libraries tied together into a CLI tool. It uses the SpeechRecognition package to connect to the Google speech recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691 Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.
utopiah@lemmy.ml · 11 days ago
Thanks for the clarification. I checked the code you linked and noticed recognize_google. It seems to rely on https://github.com/Uberi/speech_recognition, which in turn seems to rely on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py. So basically they are using an API and sending all the audio data to Google servers?
django@discuss.tchncs.de · 11 days ago
Yes, this is how I read it as well. The library supports using a local model, but they decided to just send the audio data to Google.
utopiah@lemmy.ml · 11 days ago
That might raise a GDPR issue. I don’t think people using such a library assume that they need connectivity or that their data would be sent to a third party.
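For anyone wanting to see the difference discussed above, here is a rough sketch of choosing between the remote and a local recognizer in the SpeechRecognition package. This is not markitdown's code: the helper names `pick_recognizer_name` and `transcribe` are made up for illustration, and the offline path assumes the optional pocketsphinx dependency is installed. `recognize_google` sends the audio to Google's web API, while `recognize_sphinx` runs on the local machine.

```python
def pick_recognizer_name(allow_network: bool) -> str:
    """Return the name of the SpeechRecognition method to call (hypothetical helper)."""
    # recognize_google uploads the audio to Google's web API;
    # recognize_sphinx runs locally via the optional pocketsphinx package.
    return "recognize_google" if allow_network else "recognize_sphinx"


def transcribe(path: str, allow_network: bool = False) -> str:
    """Transcribe a WAV/AIFF/FLAC file, staying offline unless told otherwise."""
    # Imported lazily so the helper above can be used without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    # Look up the chosen backend method on the recognizer and call it.
    method = getattr(recognizer, pick_recognizer_name(allow_network))
    return method(audio)
```

Defaulting `allow_network` to `False` makes uploading audio to a third party an explicit opt-in, which is the behaviour the comments above suggest users would expect.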