Jun 27

HOW TO: Download entire channels from youtube on Linux


Hello everyone,

So today I am gonna share a quick how-to. This is not my usual kind of post, but I found this awesome program on Linux that I use to download videos from YouTube, and it works amazingly well.

The reason I needed it is that I keep finding amazing channels full of tutorials on programming and other stuff, and I want to download and save them.

Here is a small tutorial about it:

HOW TO : Download entire channels from youtube on Linux from Marco Giordano on Vimeo.

Here is the Python script I used:
http://www.marcogiordanotd.com/download/youtube.py
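
If you prefer text over video, here is a minimal sketch of one way to grab a whole channel from Python, assuming the youtube-dl command-line tool is installed. This is just an illustration of the idea, not necessarily what the linked youtube.py does:

import subprocess

def download_channel(channel_url, out_dir = "."):
    subprocess.check_call([
        "youtube-dl",
        "-i",                                                  # keep going if a video errors out
        "-o", out_dir + "/%(uploader)s/%(title)s.%(ext)s",     # one folder per channel
        channel_url,
    ])

# e.g. download_channel("https://www.youtube.com/user/someChannel/videos", "/home/me/tutorials")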

Like the video and subscribe to the channel! And of course share it around!

Jun 11

OPENGL : 3D viewport , let the fun begin!


[image: splashScreen]

Hello everyone! It has been quite a bit of time, but trust me when I say there is a lot going on in my life right now. I can't tell you much yet, but you will know soon.

Lately I have been focusing on getting better at C++ and OpenGL. I want to move toward being a pure R&D (research and development) guy rather than staying in rigging.

What I did was create a Maya-like 3D viewport, in order to have a playroom to implement some stuff I have had in mind for a while.

Again, you will have to be a little patient; you will find out later what my final goal is. For now, one step at a time.

Starting to learn OpenGL was not so easy. There is so much to learn just to be able to draw a single triangle (in the proper way) that you can easily find yourself overwhelmed by the amount of information you need to know.

Let's start from the beginning.

After I was able to draw a triangle on a 2D screen, my first goal was to be able to load in an object, because of course I don't want to draw everything manually: vertices, faces, UVs, etc.

I decided I was going to write myself a simple OBJ loader, again just to get a bit more comfortable with parsing text files in C++. Once I did that, this was the result:

[image: objLoaded]

A nice cube, isn't it?

At this stage I was able to load all the information and store all the data into buffers to send to the graphics card.

I wanted my OBJ loader to be a bit more flexible, but I did not want to lose too much time on it since it was out of the scope of my goal, so I just googled for a nice OBJ library and found this:

https://github.com/syoyo/tinyobjloader

It works really well and, if I recall correctly, it automatically converts quads to triangles at "read" time, which is pretty cool, even though I did not push the tests too far on this aspect.
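
Just to give an idea, the core of a simple OBJ reader boils down to a few lines. Here is a Python sketch of the concept (my real loader was C++, and the index handling here is simplified): collect the "v" lines as vertices and fan-triangulate the "f" lines.

def load_obj(path):
    vertices = []
    triangles = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "v":
                vertices.append([float(v) for v in parts[1:4]])
            elif parts[0] == "f":
                # face entries look like "v", "v/vt" or "v/vt/vn"; keep only the vertex index
                idx = [int(p.split("/")[0]) - 1 for p in parts[1:]]
                for i in range(1, len(idx) - 1):    # fan-triangulate quads and n-gons
                    triangles.append((idx[0], idx[i], idx[i + 1]))
    return vertices, triangles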

Once that was done I actually needed a 3D camera to move around my object. My first implementation rotated around the camera pivot, which is not that user-friendly; after a bit of effort it was converted to the same kind of behavior as a Maya camera.

By the way, the camera transformation is performed inside a vertex shader, on a vertex-by-vertex basis.
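
Just to give an idea of what the shader does for each vertex, here is the same math written out with NumPy. This is a sketch of the concept only, not my actual shader code (which is GLSL running on the GPU):

import numpy

def transform_vertex(position, model, view, projection):
    # append the homogeneous coordinate, then apply model -> view -> projection
    v = numpy.append(numpy.asarray(position, dtype=float), 1.0)
    clip = numpy.dot(projection, numpy.dot(view, numpy.dot(model, v)))
    return clip[:3] / clip[3]    # perspective divide to get normalized device coordinates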

At this point it was kind of hard to see the model. The reason is simple: there was no lighting of any kind, so the color was completely flat.
I needed to do something about it, and here the fragment shader comes into play. This shader gave me quite a bit of trouble to get up and running, but after a lot of swearing I was able to plug it properly into the pipeline.

[image: fragment]

As you can see, initially it was just a simple flat shader, but the color was actually being interpolated for each fragment.

The next step was to implement a simple diffuse lighting model, which is really simple: it is based on the cosine rule, which we can resolve with a dot product between the light direction (the camera vector, in my case) and the surface normal. This is the result:

[image: dragonShader]

Now it is starting to look like something nice, isn't it?
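
For the curious, the diffuse term described above boils down to a few lines. Here is a NumPy sketch of the idea (again, not my actual shader, which is GLSL running per fragment on the GPU):

import numpy

def diffuse(normal, light_dir, base_color):
    n = numpy.asarray(normal, dtype=float)
    l = numpy.asarray(light_dir, dtype=float)
    n = n / numpy.linalg.norm(n)
    l = l / numpy.linalg.norm(l)
    intensity = max(float(numpy.dot(n, l)), 0.0)    # cosine of the angle, clamped at zero
    return numpy.asarray(base_color, dtype=float) * intensity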

The last thing I wanted to implement was the ability to load a texture. This task was not too complicated, but again it took a bit of troubleshooting; it is not easy at all to debug shaders, since they run on the graphics card.

[image: texture]

That is it for now. As said in the demo (you can find it below), I don't think I am going to push rendering much further, it is out of scope; probably I will just implement a wireframe shader.

Here is the video demo. I hope you enjoy it; if you like it, please share and subscribe to my Vimeo channel.

Cheers

M.

Jun 04

A new adventure : UNIVERSITY


Yeah, you got it right: I am officially enrolled in a university here in London, the Open University to be exact.

The degree I wish to get is Computing and IT, with a second subject in applied math; basically hardcore programming and math.

The reasons why I decided to embrace this new adventure are many, and it took me a long time to sort them out. Let's start from the beginning.

When I was still in high school I was more attracted to the artistic side of things. I actually wanted to do computer graphics, but as an animator (how foolish of me); then the rigs I started using were bugged or broken and no one was willing to help, so I had to learn rigging.

It was love at first sight. I started studying more and more and began to discover this tech side of me that was completely hidden before. I started learning programming as well!

I started working in the CG industry as a rigger and met some crazy talented people who pushed me even further down that tech path.

Rigging started feeling a bit limited to me; what Maya was offering was not enough, so I started playing with the Maya API and plugins. I wrote many of those, as you might know.

Working with the Maya API, sooner or later you have to deal with linear algebra, and there my "faith" was not that strong anymore. But as soon as I figured out that math was going to be a way to reach my goals, everything changed perspective. I spent the last two years spending my nights programming and studying math, and last year I started taking university courses online from Coursera.

So much cool stuff to learn, and I was putting so much effort into it (sometimes 3 courses at the same time while still working regularly at DNEG).

Then I realized: since I am putting so much effort into this, why not actually get a real degree rather than some certificates that have no real value?

That was when my quest to find a university started. One thing I was sure of was that it could not affect my work: I don't want a part-time job, so ideally I would need a part-time university. Luckily the Open University exists and serves exactly that purpose: a university for people who work and cannot follow regular live lessons and so on.

Funnily enough, the Open University is actually harder than some other universities, because you have to study by yourself, pace yourself, etc., and then you take your exams like at a regular university.

So here it is, a brand new adventure starting. This means, though, that I have decided to kill the artistic side and join the dark tech side; to be honest, I don't regret that one bit. If you have ever worked with me you know that my tech side is much, much, MUCH stronger than my artistic side. I can do okay deformations and such, but it is not my strength.

I tried so many times to work on that and to get better and better at the art side of rigging. For example, I tried several times to do a nice muscle setup, taking care of perfect deformations and so on; after 5 minutes I was already dreaming of a way to implement a nice and fast muscle primitive, or different ways to implement a pose reader, etc.

To be honest, I don't think I will be a rigger much longer. I am getting more and more attracted to pure research and development, even though still on creature development.

Also, having a degree in computer science will of course help me switch my career to pure programming if I ever want to.

Anyway I will keep you posted folks :D

That's it for now!

Mar 15

HOW TO : fix guake terminal position on multi-monitor


Hello everyone! Today I want to talk about the Guake terminal. It is great on Linux because it's so handy.

The only problem is that if you have multiple monitors it starts to freak out, so there is a bit of fixing to do!

Since I run a 3-monitor setup I hit this problem right away, and I was quite bothered by it.

I started googling (of course!) and ran into this guy's post:

http://brightbyte.de/page/Guake_on_the_right_screen

He slightly modified Guake's .py file, which you can find at /usr/lib/guake/guake.py

BEWARE: this fix works for sure on Ubuntu 12.04. I did not try it on later versions; I know that guake.py is not located in the same directory there.

Now, following the directions in that post kind of gets the job done. In my case I got Guake on the right screen, but I wanted it on the middle screen: I have a laptop in the middle, one monitor on the left and one on the right.
What I did was play with this line:

monitor = screen.get_n_monitors() - 2

By default it was -1. I first tried to remove the -1 and Guake ended up on the left, so I tried -2 and that got the job done. So, depending on how many monitors you have, your fix might be different.
I also changed width = 80 to width = 100, because I like it full screen.
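
To make it clearer what that line is doing, here is a rough PyGTK sketch of the idea, not Guake's actual code, just the concept of picking a monitor index and using its geometry to position and size the window:

import gtk

screen = gtk.gdk.screen_get_default()
monitor = screen.get_n_monitors() - 2            # which monitor to use; depends on your layout
rect = screen.get_monitor_geometry(monitor)      # x, y, width, height of that monitor

width_percent = 100                              # was 80 by default; 100 = full monitor width
window_width = int(rect.width * width_percent / 100)
window_x = rect.x                                # start at the left edge of the chosen monitor
print (window_x, window_width)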

Here is how it looks:

[image: temp]

That's it, folks!

Mar 02

TUTORIAL FOR ANIMATORS: add extra deformation to a rig


[image: extraMod]

Hello everyone! Since lately I have been asked by many animators to add some extra deformation on top of their rigs, I decided to make a video tutorial out of it, in order to help other fellow animators out there do the same!

Here is the video tutorial:

How to modify a maya rig to add extra deformation : FOR ANIMATORS from Marco Giordano on Vimeo.

If you like it, please subscribe to the channel and share the video around.

Cheers

M.

Jan 31

IMAGE PROCESSING : PYCUDA


Hello everyone! Today we are talking about PyCuda. I will tell you about my experience and show you some tests. Let's jump into it!

Everything started a couple of weeks ago with the start of an image processing course on Coursera.com. I initially decided to just have a quick look around the course rather than really follow it, but it is really interesting, so I am now taking the course in order to get the certificate. I hope I can manage it alongside the other 3 courses I am taking, but so far it looks good.

In the first week we just did some theory of image processing, so I decided to have a quick look and mess around; you can read what I did in my previous post here.

It was quite a straightforward task, but the computation was taking around 5-6 seconds for a 1440×900 picture, so I decided to try PyCuda and do the processing on my GPU.

It was not too easy to get PyCuda up and running, but with some googling I got it done; I will actually make a post about that in the near future.

PyCuda is deeply bound to NumPy: in order to send stuff to the GPU you can either create GPU data (gpuarray and so on) or create NumPy data that has a straight conversion to C data.

That was the source of all my problems and frustration. It took me a while to figure out how the data is moved to the GPU and back, and also to understand the way NumPy handles arrays and matrices.
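
To give you an idea, here is a tiny sketch of the two ways of moving a NumPy array to the GPU with PyCuda: the explicit alloc-and-copy that my script below uses, and the higher-level gpuarray module.

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a = numpy.random.rand(4, 4).astype(numpy.float32)    # a C-friendly dtype, as discussed above

# explicit way (what the script below does): allocate device memory, copy host -> device
d_a = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(d_a, a)

# gpuarray way: one call, and it keeps the shape/dtype information around for you
g_a = gpuarray.to_gpu(a)
back = g_a.get()    # copy the data back into a regular NumPy array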

Given the way I am used to working with matrices, I would have loved to parse the image left to right and top to bottom, but the array NumPy builds from a PIL image object is indexed [row][column][rgb] (its shape is height x width x 3), which is swapped compared to PIL's (width, height) size convention. That is fine, I am just not so used to it.

Also, when you copy the array into GPU memory you don't get the same matrix shape: on the host I was using a 3-dimensional matrix, something like matrixImage[row][column][rgb], while on the GPU I got a matrixImage[row*column*rgb], which is a flat 1D buffer. It took me a while to figure that out; luckily I found a guy explaining it in a random Stack Overflow post!
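
Here is a tiny example of what I mean about the shapes (the file name is just a placeholder):

import numpy
from PIL import Image

im = Image.open("someImage.jpg")    # placeholder path
print (im.size)                     # PIL reports (width, height), e.g. (1440, 900)

px = numpy.array(im)
print (px.shape)                    # NumPy gives (height, width, 3): rows first, then columns

# on the GPU the same data is just one flat buffer, so pixel (row, col)
# starts at index (row * width + col) * 3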

Once all those small technicalities were settled, I could finally work on the kernel, which is uber simple for a black-and-white effect.

Here is the code I used. In order to run it you need:

- Pillow, or PIL

- NumPy

- PyCuda (and its dependencies)

CODE

import PIL
from PIL import Image
import time

import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

def blackWhite(inPath , outPath , mode = "luminosity",log = 0):

    if log == 1 :
        print ("----------> SERIAL CONVERSION")
    totalT0 = time.time()

    im = Image.open(inPath)
    px = numpy.array(im)

    getDataT1 = time.time()

    print ("-----> Opening path :" , inPath)

    processT0 =  time.time()
    # im.size is (width, height); the numpy array is indexed [row][column][rgb]
    for x in range(im.size[1]):        # x walks the rows (height)
        for y in range(im.size[0]):    # y walks the columns (width)

            r = px[x][y][0]
            g = px[x][y][1]
            b = px[x][y][2]
            if mode == "luminosity" :
                val =  int(0.21 *float(r)  + 0.71*float(g)  + 0.07 * float(b))

            else :
                # cast to int first, otherwise the uint8 values wrap around when summed
                val = int((int(r) + int(g) + int(b)) / 3)

            px[x][y][0] = val
            px[x][y][1] = val
            px[x][y][2] = val

    processT1= time.time()
    #px = numpy.array(im.getdata())
    im = Image.fromarray(px)
    im.save(outPath)

    print ("-----> Saving path :" , outPath)
    totalT1 = time.time()

    if log == 1 :
        print ("Image size : ",im.size)
        print ("get and convert Image data  : " ,getDataT1-totalT0 )
        print ("Processing data : " , processT1 - processT0 )
        print ("Save image time : " , totalT1-processT1)
        print ("total  Execution time : " ,totalT1-totalT0 )
        print ("\n")

def CudablackWhite(inPath , outPath , mode = "luminosity" , log = 0):

    if log == 1 :
        print ("----------> CUDA CONVERSION")

    totalT0 = time.time()

    im = Image.open(inPath)
    px = numpy.array(im)
    px = px.astype(numpy.float32)

    getAndConvertT1 = time.time()

    allocT0 = time.time()
    # allocate device memory and copy the image data from host to device
    d_px = cuda.mem_alloc(px.nbytes)
    cuda.memcpy_htod(d_px, px)

    allocT1 = time.time()

    #Kernel declaration
    kernelT0 = time.time()

    #Kernel grid and block size
    BLOCK_SIZE = 1024
    block = (1024,1,1)
    checkSize = numpy.int32(im.size[0]*im.size[1])           # total number of pixels
    grid = (int(im.size[0]*im.size[1]/BLOCK_SIZE)+1,1,1)     # enough blocks for one thread per pixel

    #Kernel text
    kernel = """

    __global__ void bw( float *inIm, int check ){

        int idx = (threadIdx.x ) + blockDim.x * blockIdx.x ;

        if(idx *3 < check*3)
        {
        int val = 0.21 *inIm[idx*3] + 0.71*inIm[idx*3+1] + 0.07 * inIm[idx*3+2];

        inIm[idx*3]= val;
        inIm[idx*3+1]= val;
        inIm[idx*3+2]= val;
        }
    }
    """

    #Compile and get kernel function
    mod = SourceModule(kernel)
    func = mod.get_function("bw")
    func(d_px,checkSize, block=block,grid = grid)

    kernelT1 = time.time()

    #Get back data from gpu
    backDataT0 = time.time()

    bwPx = numpy.empty_like(px)
    cuda.memcpy_dtoh(bwPx, d_px)
    bwPx = (numpy.uint8(bwPx))

    backDataT1 = time.time()

    #Save image
    storeImageT0 = time.time()
    pil_im = Image.fromarray(bwPx,mode ="RGB")

    pil_im.save(outPath)
    print ("-----> Saving path :" , outPath)

    totalT1 = time.time()

    getAndConvertTime = getAndConvertT1 - totalT0
    allocTime = allocT1 - allocT0
    kernelTime = kernelT1 - kernelT0
    backDataTime = backDataT1 - backDataT0
    storeImageTime =totalT1 - storeImageT0
    totalTime = totalT1-totalT0

    if log == 1 :
        print ("Image size : ",im.size)
        print ("get and convert Image data to gpu ready : " ,getAndConvertTime )
        print ("allocate mem to gpu: " , allocTime )
        print ("Kernel execution time : " , kernelTime)
        print ("Get data from gpu and convert : " , backDataTime)
        print ("Save image time : " , storeImageTime)
        print ("total  Execution time : " ,totalTime )
        print ("\n")
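
Something like this is how you would call the two functions; a hypothetical snippet, the paths simply match the ones you will see in the logs below.

# hypothetical usage; log = 1 prints the timing breakdowns shown in the logs
if __name__ == "__main__":
    blackWhite("C:/Users/Marco/Desktop/jessTest.jpg",
               "C:/Users/Marco/Desktop/jessTestLuminosity.jpg", log = 1)
    CudablackWhite("C:/Users/Marco/Desktop/jessTest.jpg",
                   "C:/Users/Marco/Desktop/jessTestLuminosityCuda.jpg", log = 1)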

As you can see, there is not too much going on, but I know you guys want to see some benchmarks :D and I am not going to disappoint you!

Before starting, here are a couple of stats on my hardware.

CPU :

i7 4900MQ, full specs here: http://ark.intel.com/products/75131

GPU :

NVidia GTX 780M: full specs here

1K IMAGE

I started by working on a 1K image. Below you can see the source, the serial result and the CUDA result; click an image for the full version.

[image: image1K]

[image: image1KBW]

[image: image1KBWCuda]

Here are some nice debug prints outlining the performance:

LOG :

----------> SERIAL CONVERSION
-----> Opening path : C:/Users/Marco/Desktop/jessTest.jpg
-----> Saving path : C:/Users/Marco/Desktop/jessTestLuminosity.jpg
Image size : (1440, 900)
get and convert Image data : 0.04000210762023926
Processing data : 4.67326807975769
Save image time : 0.03500199317932129
total Execution time : 4.748272180557251

----------> CUDA CONVERSION
-----> Saving path : C:/Users/Marco/Desktop/jessTestLuminosityCuda.jpg
Image size : (1440, 900)
get and convert Image data to gpu ready : 0.042001962661743164
allocate mem to gpu: 0.006000041961669922
Kernel execution time : 0.04200291633605957
Get data from gpu and convert : 0.010999917984008789
Save image time : 0.03200197219848633
total Execution time : 0.13300681114196777

OOOOOH YEAAAH, that's the good stuff :D A nice ~35x speed-up on the total time! That's quite impressive, considering that the compute part of the black-and-white filter is so small that it is not really taking full advantage of the GPU.

3K IMAGE

[image: image3K]
[image: image3KBW]
[image: image3KBWCuda]

LOG :

----------> SERIAL CONVERSION
-----> Opening path : C:/Users/Marco/Desktop/image3K.jpg
-----> Saving path : C:/Users/Marco/Desktop/image3KBW.jpg
Image size : (2494, 1663)
get and convert Image data : 0.07900404930114746
Processing data : 14.747843980789185
Save image time : 0.10500597953796387
total Execution time : 14.931854009628296

----------> CUDA CONVERSION
-----> Saving path : C:/Users/Marco/Desktop/image3KBWCuda.jpg
Image size : (2494, 1663)
get and convert Image data to gpu ready : 0.09000515937805176
allocate mem to gpu: 0.014000892639160156
Kernel execution time : 0.03500199317932129
Get data from gpu and convert : 0.03200197219848633
Save image time : 0.10600614547729492
total Execution time : 0.27701616287231445

I am gonna shoot out the rest of the tests and then we can discuss them :D

4K IMAGE
Remember, the order of the pictures is: source, serial, cuda.

[image: image4K (Harry Potter and the Deathly Hallows Part 1)]
[image: image4KBW]
[image: image4KBWCuda]

LOG


----------> SERIAL CONVERSION
-----> Opening path : C:/Users/Marco/Desktop/image4K.jpg
-----> Saving path : C:/Users/Marco/Desktop/image4KBW.jpg
Image size : (4096, 2440)
get and convert Image data : 0.17701005935668945
Processing data : 35.46102809906006
Save image time : 0.2350139617919922
total Execution time : 35.87305212020874

----------> CUDA CONVERSION
-----> Saving path : C:/Users/Marco/Desktop/image4KBWCuda.jpg
Image size : (4096, 2440)
get and convert Image data to gpu ready : 0.21601200103759766
allocate mem to gpu: 0.029001951217651367
Kernel execution time : 0.03500223159790039
Get data from gpu and convert : 0.07800483703613281
Save image time : 0.23401308059692383
total Execution time : 0.592034101486206

Let's move forward!
5K IMAGE

[image: image5K]
[image: image5KBW]
[image: image5KBWCuda]

LOG :

----------> SERIAL CONVERSION
-----> Opening path : C:/Users/Marco/Desktop/image5K.jpg
-----> Saving path : C:/Users/Marco/Desktop/image5KBW.jpg
Image size : (5184, 3456)
get and convert Image data : 0.27501511573791504
Processing data : 67.43985795974731
Save image time : 0.39202213287353516
total Execution time : 68.10689520835876
----------> CUDA CONVERSION
-----> Saving path : C:/Users/Marco/Desktop/image5KBWCuda.jpg
Image size : (5184, 3456)
get and convert Image data to gpu ready : 0.33801889419555664
allocate mem to gpu: 0.0760049819946289
Kernel execution time : 0.22401213645935059
Get data from gpu and convert : 0.18601083755493164
Save image time : 0.40102314949035645
total Execution time : 1.2250699996948242

9K IMAGE
Hell yeah :D I found a 9K image and I am not afraid to use it! The original PNG I downloaded was ~160MB! The actual JPG I used in the code was 50MB, and the site doesn't let me upload files that size, so I had to compress it to 70%; the processed image, though, comes from the HD version. Aaand I actually lost the one converted serially, so I will just add the CUDA one.

[image: image9Kcomp]
[image: image9KBWCuda]

And here is the LOG for the fat 9K resolution.

----------> SERIAL CONVERSION
-----> Opening path : C:/Users/Marco/Desktop/image9K.png
-----> Saving path : C:/Users/Marco/Desktop/image9KBW.png
Image size : (9000, 4637)
get and convert Image data : 0.6430368423461914
Processing data : 144.94929099082947
Save image time : 11.3296480178833
total Execution time : 156.92197585105896
----------> CUDA CONVERSION
-----> Saving path : C:/Users/Marco/Desktop/image9KBWCuda.jpg
Image size : (9000, 4637)
get and convert Image data to gpu ready : 2.886164903640747
allocate mem to gpu: 0.12500715255737305
Kernel execution time : 0.037001848220825195
Get data from gpu and convert : 0.3350191116333008
Save image time : 1.0140578746795654
total Execution time : 4.3972508907318115

Alright, as you guys can see the GPU totally nailed the CPU, and of course that was expected, even though not to that extent. I expected the conversions and memory "travels" back and forth between host and device memory to be a huge factor, but looking at the log, the major part of the time is actually spent converting the data, first into a form the GPU can understand and then into a form PIL can save as an image.

Here is a plot of the performance of CPU vs GPU for this task:

[image: chart_totalTime]

From this simple data it looks like the gain in performance is not completely linear. But again, this is simple data; I should have taken 20-30 runs of the same code and averaged them, so what you see here might not be 100% accurate, but I can confidently say a good 90% of it is.
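
If I ever redo the benchmark properly, a small helper along these lines would do the averaging; just a sketch, not what produced the numbers above:

import time

def average_runtime(func, args = (), runs = 20):
    times = []
    for _ in range(runs):
        t0 = time.time()
        func(*args)
        times.append(time.time() - t0)
    return sum(times) / len(times)

# e.g. average_runtime(CudablackWhite, ("in.jpg", "outCuda.jpg"), runs = 20)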

Here is a plot representing the time spent in data conversion:

[image: chart_dataConvertion]
The time spent converting the data is quite a bit bigger for the 9K image compared to the 5K; that might also be due to the way the NumPy arrays are implemented.

Here is another plot, of the memory allocation time:

[image: chart_memoryAllocation]
Here everything looks pretty normal, no surprises.
The surprises actually start here, in the kernel execution time plot:

[image: chart_kernel]

Again, this data is not completely accurate, but if we look at the deltas between each sample we can say that the kernel took pretty much the same time on all the pictures. That might be because the Python clock cannot sample time finely enough, or simply because I have not reached full usage of the GPU yet, so it can still easily handle that amount of pixels without breaking a sweat.

The last thing I wanted to check was whether I actually got the exact same result out of the CPU and the GPU, so I subtracted the 1K images, and this is the result:

[image: image1KDiff]

No, your monitor is not dirty: what you see in the image is a small delta between the two results. Beware that I actually bumped up the delta by a factor of 10 in order to make it more visible, so it's pretty much nothing. If I have to say where this delta is coming from, I would guess most likely from the many conversions back and forth that the data went through during the whole process.
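
If you want to run the same check, this is roughly what I mean by subtracting the images and bumping the delta; a small sketch, and the file names are placeholders:

import numpy
from PIL import Image

# hypothetical file names; the idea is simply abs(cpu - gpu), scaled up to be visible
cpu = numpy.array(Image.open("jessTestLuminosity.jpg")).astype(numpy.int16)
gpu = numpy.array(Image.open("jessTestLuminosityCuda.jpg")).astype(numpy.int16)

diff = numpy.abs(cpu - gpu) * 10                        # bump the delta by a factor of 10
diff = numpy.clip(diff, 0, 255).astype(numpy.uint8)
Image.fromarray(diff).save("jessTestDiff.jpg")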

If you have an idea don’t be shy and shoot it out!

That's it, folks! I hope you enjoyed it as much as I did (the good and bad times of the development XD).

I am still a noob in this huge and wild world of massively parallel computing, so any feedback is more than welcome!

Cheers

M.
