September 11, 2012

WebDL: ABC iView and SBS Downloader

Filed under: Technical, by James Bunton @ 7:16 pm

WebDL is a collection of Web TV downloader scripts with a consistent user interface. I’ve previously released these separately, but a while ago I refactored them to share common code and packaged them into a single utility. You can use this interactively or to download any shows matching a glob from a cronjob. Currently supported are ABC iView and SBS OnDemand. I’ll probably add more in the future.

Update 2015-05-24: Please see the Bitbucket project for up to date docs!
Update 2015-05-24: Fixed SBS and Channel 9. Livestreamer is now a required dependency.
Update 2014-07-22: Added notes on version to dependencies.
Update 2013-03-26: The latest version of autograbber.py now accepts a file with a list of patterns instead of taking them from the command line.
Update 2014-02-15: Please see https://bitbucket.org/delx/webdl for bug reports or to post patches.

Dependencies

  • Livestreamer
  • python (2.7, not python 3)
  • python-lxml
  • rtmpdump a1900c3e15
  • ffmpeg / libav

The versions listed above are the ones I have had success with. In particular, note that rtmpdump always reports v2.4 even though many binaries have been built under that version number with different bugs and features. If something doesn’t work, try compiling a new ffmpeg/avconv or rtmpdump to see if it fixes the problem.

Interactive Usage

You can run WebDL interactively to browse categories and episode lists and download TV episodes.

$ ./grabber.py
 1) ABC iView
 2) SBS
 0) Back
Choose> 1
 1) ABC 4 Kids
 2) Arts & Culture
 3) Comedy
 4) Documentary
<snipped>
Choose> 4
 1) ABC Open Series 2012
 2) Art Of Germany
 3) Baby Beauty Queens
 4) Catalyst Series 13
<snipped>
Choose> 4
 1) Catalyst Series 13 Episode 15
 2) Catalyst Series 13 Episode 16
 0) Back
Choose> 1
RTMPDump v2.3
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting ...
INFO: Connected...
Starting download at: 0.000 kB

The parts after each “Choose>” prompt are what you type. Note that you can go back on any screen by typing “0”. At the list of episodes you can download a single episode by typing one number, or multiple episodes by typing several numbers separated by spaces.

Cron Scripted Usage

I run it daily from crontab, using an entry which looks something like this.

# m    h  dom mon dow   command
  0    1   *   *   *     ./autograbber.py /path/to/video-dir/ /path/to/patterns.txt

The patterns.txt file should contain shell-style globs, something like:

ABC iView/*/QI*/*
SBS/Programs/Documentary/*/*

The above will download all episodes of QI from ABC as well as every SBS documentary. Whenever an episode is downloaded it is recorded into downloaded_auto.txt. Even if you move the files somewhere else they will not be redownloaded.
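The download record can be pictured with a short sketch. This is not WebDL's actual code: the helper names and the one-title-per-line layout of the history file are assumptions made purely for illustration.

```python
import os

def load_history(history_path):
    """Return the set of titles already recorded, or an empty set."""
    if not os.path.exists(history_path):
        return set()
    with open(history_path) as f:
        return set(line.rstrip("\n") for line in f if line.strip())

def record_download(history_path, title):
    """Append a title to the history file after a successful download."""
    with open(history_path, "a") as f:
        f.write(title + "\n")

def should_download(history_path, title):
    """True only if the title has never been recorded."""
    return title not in load_history(history_path)
```

Because the check consults only the history file, never the video directory, moving or deleting a downloaded file does not trigger a second download.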

148 comments

Paul says:

Hi I’ve been using this excellent set of scripts for SBS and Plus7 shows for a few weeks. ABC iView has never worked though, always get:

RTMPDump 2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
ERROR: Closing connection: NetStream.Play.StreamNotFound
rtmpdump exited with error code: 1

In the last day or so I think Plus7 also changed something, all downloads now fail with error like this:
Traceback (most recent call last):
File “./grabber.py”, line 56, in <module>
main()
File “./grabber.py”, line 49, in main
if not n.download():
File “/Users/pvdzel/bin/webdl/plus7.py”, line 63, in download
vid = self.get_vid()
File “/Users/pvdzel/bin/webdl/plus7.py”, line 60, in get_vid
raise Exception(“Could not find vid on page ” + self.url)
Exception: Could not find vid on page http://au.tv.yahoo.com/plus7/grimm/-/watch/15601316/wed-12-dec-series-1-episode-6/

If you could get either (or both!) of these working again it would be amazing. Thanks

delx says:

You were right, Plus7 had been changed recently, it should work again now.
iView works for me, what Python and OS version are you using? Can you add a line “print cmd” just underneath “def exec_subprocess(cmd)” in common.py? That’ll print the rtmpdump command and help debugging.
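For anyone unsure where that line goes, here is a rough sketch. Only the function name exec_subprocess and the file common.py come from the comment above; the body is an assumption about what such a helper might look like, not the real WebDL code.

```python
from __future__ import print_function

import subprocess
import sys

def exec_subprocess(cmd):
    # The debugging line: show the exact command before running it.
    print(cmd)
    try:
        returncode = subprocess.call(cmd)
    except OSError as e:
        # Raised when the program is missing or cannot be executed.
        print("Failed to run", cmd[0], e, file=sys.stderr)
        return False
    if returncode != 0:
        print("%s exited with error code: %s" % (cmd[0], returncode))
        return False
    return True
```

Printing the argument list rather than a joined string makes it easy to see how each argument was quoted.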

Paul says:

Wow thanks this is great.
Mac 10.8
Python 2.7.2

Downloading: The Drum 131212.mp4
['rtmpdump', '-o', 'The Drum 131212.mp4', '-r', 'rtmp://cp53909.edgefcs.net/ondemand?auth=daEbDabaIdJdidgaSbid9bEa1aycBaQada3-bqZMQd-8-skm_vHwqN&aifp=v001', '-y', u'mp4:news/DRMs_Tx_1312', '--swfVfy', 'http://www.abc.net.au/iview/images/iview.jpg']
RTMPDump 2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
ERROR: rtmp server sent error
ERROR: rtmp server requested close
rtmpdump exited with error code: 1

delx says:

That was helpful thanks :)
I’ve pushed a fix to correctly escape the authorisation token that iView sends. I guess I never received one in the same format as you.
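As an illustration of what escaping a token means here, the sketch below percent-encodes a token before appending it to the stream URL. The function name, the example host and the token value are all made up, and this is not necessarily how WebDL does it.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

def build_rtmp_url(base_url, auth_token):
    # safe="" means "&" and "=" inside the token are escaped as well,
    # so the token cannot be mistaken for extra query parameters.
    return base_url + "?auth=" + quote(auth_token, safe="")

url = build_rtmp_url("rtmp://example.invalid/ondemand", "abc-123_xyz&aifp=v001")
print(url)
```

Letters, digits, "-" and "_" pass through unchanged; only the "&" and "=" are rewritten as "%26" and "%3D".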

Paul says:

Afraid it still fails for me (I did re-clone and re-added “print cmd” to common.py to debug):
Downloading: The Drum 131212.mp4
['rtmpdump', '-o', 'The Drum 131212.mp4', '-r', 'rtmp://cp53909.edgefcs.net/ondemand?auth=daEbLaZbAdzdSdnb8cxdnd0cVdqadaPdDaS-bqZOOc-8-qkm_xHwqJ%26aifp%3Dv001', '-y', u'mp4:news/DRMs_Tx_1312', '--swfVfy', 'http://www.abc.net.au/iview/images/iview.jpg']
RTMPDump 2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
ERROR: rtmp server sent error
ERROR: rtmp server requested close
rtmpdump exited with error code: 1

Paul says:

Last error should have been:
Downloading: The Drum 131212.mp4
['rtmpdump', '-o', 'The Drum 131212.mp4', '-r', 'rtmp://cp53909.edgefcs.net/ondemand?auth=daEdWbYamdkaxbAaMaZancbcGcqd0buaDdX-bqZO64-8-rll_tHrnI%26aifp%3Dv001', '-y', u'mp4:news/DRMs_Tx_1312', '--swfVfy', 'http://www.abc.net.au/iview/images/iview.jpg']
RTMPDump 2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
ERROR: Closing connection: NetStream.Play.StreamNotFound
rtmpdump exited with error code: 1

I was not correctly connected with the last attempt

delx says:

Ok, fixed for real this time. iView requires different handling for ISPs which provide access to it unmetered like Internode/iiNet compared with metered ISPs like Optus.

Paul says:

Definitely fixed for real, thank you very much for your effort on this.

James says:

How do I download the code?

delx says:

Use Mercurial.
$ hg clone http://delx.net.au/hg/webdl

Otherwise you can just browse to the repository URL above and download the files individually with your browser, use the “raw” link after selecting a file.

Peter says:

Thank you very much for these great tools.
For me it works on SBS but sadly not on ABC.
I’m using Bigpond btw.

Downloading: Making Couples Happy ) Series 2 Episode 3.mp4
RTMPDump v2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
ERROR: Closing connection: NetStream.Play.StreamNotFound
rtmpdump exited with error code: 1

delx says:

Unfortunately I don’t have a Telstra connection to test with. iView does behave differently depending on the ISP. Currently it definitely works with Internode, iiNet and Optus.

If you have any programming skills you could try adding print statements to the download() function in iview.py.

Paul says:

Hi James, latest change to autograbber.py caused me a bit of headscratching till I figured out it was looking for a flat text file as “patternfile” and not the target location as previously passed on commandline. New method is better as I can now direct each target to its own download directory. Another option which has opened up is the possibility of multiple simultaneous sessions, if I split the “downloaded_auto.txt” files into the different download directories according to the targets in each patternfile… hmm off to give that a try now.

Thanks again!

Paul says:

@Peter, fwiw ABC works ok for me on Bigpond. Do you have the latest version of this code? If you have installed Mercurial it’s really simple with “hg clone http://delx.net.au/hg/webdl”. I was getting the same error previously but recent updates fixed this for me.

delx says:

@Paul, I’m glad you figured it out :)

I’ve updated the blog so anybody else who comes along won’t be misled. Thanks!

John says:

Fantastic set of scripts, would be even better if you added channel 9…is this in your plans?

delx says:

Unfortunately Channel 9 uses a form of RTMP streaming which flvstreamer and rtmpdump cannot currently decrypt.

Aidan says:

Fantastic work, thanks.

Just a heads up for others who have the same problem — I couldn’t seem to access any of the kids content on ABC3. It didn’t come up under the ABC4Kids menu. Frustratingly I could see things on the iView interface I couldn’t see with WebDL.

I delved into the code (nice and clean and easy to read BTW) and changed line 67 of iview.py to this:

for category in categories_doc.xpath("//category"):

(I took out the "[@genre='true']" filter) and now I get all the different categories listed on the iView site. It is messier, but I can find everything now.
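To see the effect of that filter in isolation, here is a small sketch. The XML is invented and only mimics the general idea of a category list; it uses the standard library's ElementTree so it runs without extra packages, whereas WebDL itself depends on python-lxml.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: only two of the three categories carry genre="true".
SAMPLE = """
<categories>
  <category id="abc4kids" genre="true"><name>ABC4Kids</name></category>
  <category id="abc3"><name>ABC3</name></category>
  <category id="docs" genre="true"><name>Documentary</name></category>
</categories>
"""

doc = ET.fromstring(SAMPLE)

def names(categories):
    return [c.findtext("name") for c in categories]

# With the attribute predicate, only genre categories are listed.
genre_only = names(doc.findall(".//category[@genre='true']"))

# Without it, every category shows up, including ABC3.
everything = names(doc.findall(".//category"))

print(genre_only)
print(everything)
```

ElementTree needs the leading "." in the path; with lxml's xpath() method the "//category" form quoted above works as written.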

I’m not suggesting this is for everyone, but you might want to make it a user preference, or maybe a “EVERYTHING” sub-menu?

Thanks again

Tir says:

Thanks so much for this, you’re to be greatly, greatly commended! I finally got it working with Ubuntu. Just one query: does it specifically need Python 2.6, or is a later version OK? I installed the former as well.

delx says:

@Aidan, good suggestion. I’ve pushed a new version which should expose all these shows in a nicer way.

@Tir, Glad you got it to work. Python 2.7 will work just fine, I’ve updated the dependencies list to clarify that. Thanks :)

Aidan says:

Nice!

Love the new menu structure for iView, very clean and easy to use. Thanks.

Jez says:

Thanks for your work. This is really awesome, works for ABC, Ch7. I have it working on my NAS (with Optware) and autograbber to download shows to the drive ready for viewing. Nice!

But can’t get it to work for SBS.

Choose> 10
1) The Fabric Of The Cosmos Ep4 – Universe Or Multiverse?
0) Back
Choose> 1
Traceback (most recent call last):
File “./grabber.py”, line 56, in <module>
main()
File “./grabber.py”, line 49, in main
if not n.download():
File “/home/jpl/webdl/sbs.py”, line 48, in download
desc_url = append_to_qs(desc_url, {“manifest”: None})
File “/home/jpl/webdl/common.py”, line 265, in append_to_qs
qs = urlparse.parse_qs(r[3])
AttributeError: ‘module’ object has no attribute ‘parse_qs’

delx says:

@Jez, I’m glad it’s mostly working for you! :)

I guess you’re not using Python 2.6? Try to upgrade if possible. Otherwise you may be able to use this workaround.

  1. Edit common.py
  2. Add ‘import cgi’ to the top of the file
  3. Search for urlparse.parse_qs and replace it with cgi.parse_qs

I’ve only tested this on Python 2.6 or 2.7, so it’s possible that other things may not work. You get to keep the pieces, good luck! :)
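An alternative to editing every call site is a fallback import, sketched below. This is a suggestion under the assumption that only parse_qs is affected, not something taken from the WebDL source; the sample query string is made up.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    try:
        from urlparse import parse_qs  # Python 2.6 and 2.7
    except ImportError:
        from cgi import parse_qs  # Python 2.5 and earlier

# Whichever import succeeded, the call looks the same.
params = parse_qs("releaseUrl=http%3A%2F%2Fexample.invalid%2Fv&x=1&x=2")
print(params)
```

Each key maps to a list of values, which is why the tracebacks in this thread index results with [0].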

Jez says:

@delx

Yup, the work-around was effective. Thanks very much. And yes, python 2.5. I could upgrade (packages for 2.6 are available), but I don’t want to break other stuff, so I’ll avoid this if I can.

Now I just need to compile ffmpeg to support the video codecs, and the conversion to mp4 might start working. But that’s my problem to solve.

One more question: Can you explain a bit more about the fields in the autograbber pattern file? It’s not clear where the data come from, and what they mean. For some channels, I can pick a genre, but on others I can’t. On some, there are more layers of choices. It’s a bit confusing to me.

If I were to add:

*/*/*/*/The Fabric*

Would that find “The Fabric of the Cosmos” on any channel in any category/genre?

delx says:

@Jez, for each line in the file you give to autograbber.py it splits on the ‘/’ character and walks down the tree of menu items that you see in grabber.py. This tree varies depending on the layout of the source material. For example SBS and ABC have category systems with many shows listed multiple times. Channel 7 only has a flat list. This is dictated by the structure of the website that is being scraped for data.
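The walk described above can be sketched roughly as follows. The Node class, the find_matches function and the menu tree are all hypothetical stand-ins written for this example; the real autograbber.py is structured differently.

```python
import fnmatch

class Node(object):
    """Hypothetical menu item: a title plus zero or more child items."""
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

def find_matches(root, pattern):
    """Return the menu paths under root that match a slash-separated glob."""
    parts = pattern.split("/")
    results = []

    def walk(node, depth, path):
        for child in node.children:
            # Each pattern component is matched against one menu level.
            if not fnmatch.fnmatchcase(child.title, parts[depth]):
                continue
            child_path = path + [child.title]
            if depth == len(parts) - 1:
                results.append("/".join(child_path))
            else:
                walk(child, depth + 1, child_path)

    walk(root, 0, [])
    return results

# A made-up menu tree: one source with categories, one with a flat list.
root = Node("root", [
    Node("ABC iView", [
        Node("Comedy", [
            Node("QI Series 10", [
                Node("QI Series 10 Episode 1"),
                Node("QI Series 10 Episode 2"),
            ]),
        ]),
    ]),
    Node("Flat Channel", [
        Node("QI Special"),
    ]),
])

print(find_matches(root, "ABC iView/*/QI*/*"))
print(find_matches(root, "*/QI*"))
```

In this sketch a pattern with five components, such as */*/*/*/The Fabric*, can only match items that sit exactly five levels down, so it finds nothing in a source whose menu is shallower. That is why the number of slashes has to follow each source's own menu depth.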

Jez says:

Thanks @delx. That explains why a different number of slashes for different channels.

The parse_qs work-around also got the youtube.cgi script to work. Youtube downloads galore! So now it has me thinking of an autograbber for Youtube. I can probably write something up in a shell script to do that.

Lance says:

I really love your scripts! It makes getting shows for my son so easy and portable now. I just checked the iview webpage and it was working but the flv streamer part is returning an error today

Connecting …
ERROR: RTMP_Connect0, failed to connect socket. 110 (Connection timed out)
rtmpdump exited with error code: 3

Any ideas about what I can do to fix this?

delx says:

@Lance, thanks. I’ve hacked iview.py to work around the ABC changes. Looks like I’ll need to do some more work on this soon. On the plus side, it seems iView is now available in HD :D

Lance says:

Thank you for fixing that so quickly. Excellent news with the HD streams!!

Jez says:

Oops, another error with SBS. Any clues?

18) Survivorman
19) The Tall Man
20) Treasures Of Ancient Rome
21) Urban Secrets
22) Who Do You Think You Are?
0) Back
Choose> 13
1) -Latest
2) Full Episode
3) Meet The Families
4) Remembering History
0) Back
Choose> 2
1) Jabbed – Love, Fear And Vaccines
0) Back
Choose> 1
Traceback (most recent call last):
File “./grabber.py”, line 56, in <module>
main()
File “./grabber.py”, line 49, in main
if not n.download():
File “/home/jpl/webdl/sbs.py”, line 46, in download
raise Exception(“Failed to get JSON URL for ” + self.title)
Exception: Failed to get JSON URL for Jabbed – Love, Fear And Vaccines

Jez says:

Sorry, that was SBS->Programs->Documentary->Jabbed..->Full episode->Jabbed..

delx says:

@Jez, huh, SBS changed recently too. Also fixed :)

Jez says:

Awesome!! Downloading the show now!

Crikey you’re quick! Thanks mate.

stigli says:

Yep working fine for me. Thanks! On another note, for Ubuntu users, cssselect needs to be installed as well, and the old deprecated ffmpeg (not an expert, but both can be fetched from Ubuntu Software Centre).

dycey says:

Hi, new to this, hoping there is something simple I have missed.
Ubuntu 12.04. , Python 2.7.3, ffmpeg version 0.8.6-4:0.8.6-0ubuntu0.12.04.1,v2.4

errors on execution of grabber.py

$./grabber.py
Traceback (most recent call last):
File “./grabber.py”, line 4, in <module>
from common import load_root_node, natural_sort
File “/home/dycey/mecurial/webdl/common.py”, line 25, in <module>
autosocks.try_autosocks()
File “/home/dycey/mecurial/webdl/autosocks.py”, line 87, in try_autosocks
return configure_socks(host, port)
File “/home/dycey/mecurial/webdl/autosocks.py”, line 64, in configure_socks
print >>sys.stderr, “Failed to use configured SOCKS proxy:”, host, port
NameError: global name ‘sys’ is not defined

any help appreciated.

dycey

delx says:

@dycey
Looks like you’re using a SOCKS proxy. I’ve fixed a missing import bug. You’ll need to have socksipy installed to use a SOCKS proxy.

BlackDalek says:

Using 64bit Ubuntu 13.04 – the iView part is not working for me. It always gives an “ERROR: RTMP_ReadPacket, failed to read RTMP packet body. len: 5665121” (or similar). The file is created but either remains empty (0k) or downloads approx 130K then stalls.

actually… it does not always fail with same RTMP_ReadPacket error — sometimes I get different result such as below –

$ ./grabber.py
1) ABC iView

Choose> 1
1) By Channel
2) By Genre

Choose> 2
1) ABC4Kids
2) Arts & Culture
3) Comedy
4) Documentary

Choose> 4
1) The A-Z Of Contemporary Art
2) ABC Open Series 2013
3) Atlantis: The Evidence
Choose> 3
1) Atlantis: The Evidence
0) Back
Choose> 1
Downloading: Atlantis The Evidence.mp4
RTMPDump v2.4
(c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
Connecting …
INFO: Connected…
Starting download at: 0.000 kB
INFO: Metadata:
INFO: duration 2958.01
INFO: moovPosition 36.00
INFO: width 640.00
INFO: height 360.00

201.960 kB / 19546.11 sec (660.7%)
ERROR: DECODING ERROR, IGNORING BYTES UNTIL NEXT KNOWN PATTERN!
ERROR: HandleMetadata, error decoding meta data packet
WARNING: HandleInvoke, Sanity failed. no string method in invoke packet
WARNING: HandleInvoke, Sanity failed. no string method in invoke packet
ERROR: RTMP_ReadPacket, failed to read RTMP packet body. len: 10606973
204.856 kB / 29058.10 sec (982.3%)
INFO: Connection timed out, trying to resume.

Resuming download at: 204.856 kB
ERROR: DECODING ERROR, IGNORING BYTES UNTIL NEXT KNOWN PATTERN!
ERROR: HandleMetadata, error decoding meta data packet

delx says:

@BlackDalek
I was able to successfully download that file. Maybe there’s a problem with your internet connection? What ISP are you using?

stigli says:

I got a RTMPDump fault as well, but I deleted the webdl folder and then ran the usual command “hg clone http://delx.net.au/hg/webdl” to recreate the webdl folder and its contents, seemed to fix it? Running fine now.

BlackDalek says:

@delx using Internode for my ISP. I will try deleting and recreating the webdl folder as suggested by stigli.

BlackDalek says:

@stigli @delx deleting the webdl directory and re-cloning it seems to have fixed the problem

delx says:

@stigli, @BlackDalek
I’m glad you resolved your problem. For future reference you can update to the latest version by running ‘hg pull ; hg update’. This is easier and faster than deleting and recloning.

Joe Green says:

You are my saviour! The recent (June?) ABC changes broke the unmaintained DOS-based downloader that I had been using for ages, so I had to look for something better-maintained. Great stuff.

Just one question, though. As you noted, iView is now available in HD. However the downloads that I’m getting don’t appear to be. (Well the files are the same kind of size as the older ones, and the resolution is the same, etc.) Have you simply not yet worked out how to get the HD ones, or are they in fact not yet available, or what?

Keep up the good work :-)

Paul says:

Many thanks for the continued updates. Looks like channel 7 have changed things around again, no attempts to grab episodes succeed anymore:

Example:
Traceback (most recent call last):
File “./autograbber.py”, line 65, in <module>
main(destdir, patternfile)
File “./autograbber.py”, line 55, in main
match(download_list, node, search)
File “./autograbber.py”, line 45, in match
match(download_list, child, pattern, count+1)
File “./autograbber.py”, line 45, in match
match(download_list, child, pattern, count+1)
File “./autograbber.py”, line 45, in match
match(download_list, child, pattern, count+1)
File “./autograbber.py”, line 33, in match
if node.download():
File “/Users//bin/webdl/plus7.py”, line 93, in download
vid_id = self.get_video_id()
File “/Users//bin/webdl/plus7.py”, line 87, in get_video_id
raise Exception(“Could not find video id on page ” + self.url)
Exception: Could not find video id on page http://au.tv.yahoo.com/plus7/criminal-minds/-/watch/17903511/mon-8-jul-series-8-episode-15/

The source file:
$ cat patternfile_plus7_criminal.txt
Yahoo Plus7/Criminal Minds/*

Called via:
./autograbber.py ~/Movies patternfile_plus7_criminal.txt

Thanks

John Bartlett says:

And anything on SBS suddenly gives this error

Traceback (most recent call last):
File “./grabber.py”, line 56, in <module>
main()
File “./grabber.py”, line 49, in main
if not n.download():
File “/home/johnb/scripts/flvgrabber/v4/sbs.py”, line 37, in download
desc_url = swf_url_qs[“releaseUrl”][0]
KeyError: ‘releaseUrl’

Please keep up the fantastic work on these scripts

Jez says:

Me too. Yahoo Plus7 failing for me too.

Looks like we’re playing whack-a-mole.

SBS failure:

Attempting to download Mythbusters S7 Ep212 – Deadliest Catch / Crab Napping
http://www.sbs.com.au/ondemand/video/single/35988035616
Traceback (most recent call last):
File “/home/jpl/webdl/autograbber.py”, line 66, in <module>
main(destdir, patternfile)
File “/home/jpl/webdl/autograbber.py”, line 56, in main
match(download_list, node, search)
File “/home/jpl/webdl/autograbber.py”, line 46, in match
match(download_list, child, pattern, count+1)
File “/home/jpl/webdl/autograbber.py”, line 46, in match
match(download_list, child, pattern, count+1)
File “/home/jpl/webdl/autograbber.py”, line 46, in match
match(download_list, child, pattern, count+1)
File “/home/jpl/webdl/autograbber.py”, line 46, in match
match(download_list, child, pattern, count+1)
File “/home/jpl/webdl/autograbber.py”, line 34, in match
if node.download():
File “/home/jpl/webdl/sbs.py”, line 39, in download
desc_url = swf_url_qs[“releaseUrl”][0]
KeyError: ‘releaseUrl’

delx says:

@Joe, I’m glad it works for you :)
I did some work on the new ABC streaming protocol, but I’ve been busy lately and have not had time to finish it.

@Joe, @Paul, @John
SBS made a small change, it has been fixed now.
Seven has switched from their old streaming provider to Brightcove. I think the code for Ch9 and Ch10 should be easily adaptable for this. I’ll still need to extract the API token and map the custom field metadata to filenames correctly. It’ll probably be a little while before I get around to this since I don’t have proper internet at the moment.

Al says:

It doesn’t look good for P7. As far as I can see most shows are encrypted using Flash Access DRM … which remains “unbroken” AFAIK.

Haven’t been able to derive the Brightcove API media token easily but that is of limited value if the videos are all encrypted.

Easy enough to see what happens by dumping a Firefox Flash container process.

Aidan says:

ABC appears broken now too.

“132.703 kB / 4.68 sec (0.1%)rtmpdump exited with error code: -11”

Aidan says:

Rescind that. Seems to be working now. Odd.

Comments are closed.